Data Quality Audit

A programmatic scan of your data model surfaces broken joins, silent data loss, and stale sources. You get a ranked punch list with business risk framing in 2 to 3 weeks.

Your CFO does not trust the dashboard. Your BI team spends half their week reconciling. The board keeps asking “but is this number right?” and nobody in the room can give a clean answer.

That is not a tooling problem. It is a data quality problem. And it is almost certainly worse than you think, because the issues that erode trust are usually invisible until someone catches one manually.

The Data Quality Audit is a programmatic diagnostic. We scan your data model, surface what is broken, rank it by likely business impact, and hand you a prioritized fix list in 2 to 3 weeks.

Key Insight: Most data quality problems are not random. They cluster around a handful of structural patterns, and those patterns repeat across companies. An AI-assisted scan finds them faster than manual review, and ranks them by the business risk they carry, not by how loud the error message is.

What the Audit Finds

The scan covers the full range of issues that cause dashboards to be questioned, reports to disagree, and analysts to spend their Fridays chasing reconciliation gaps.

Orphan keys and broken joins

Records that should join to a dimension table but do not. Often invisible in aggregated reports until someone drills through to a segment that has no label.

Detected in 90%+ of audits
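As an illustration, an orphan-key probe can be expressed as an anti-join. The sketch below is a minimal example, not the audit's actual tooling; the tables `fact_sales` and `dim_customer` and their columns are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical schema: a sales fact table and a customer dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Enterprise'), (2, 'SMB');
    INSERT INTO fact_sales VALUES (10, 1, 500.0), (11, 2, 120.0), (12, 3, 75.0);
""")

# Anti-join: fact rows whose customer_id has no match in the dimension.
orphans = conn.execute("""
    SELECT f.sale_id, f.customer_id, f.amount
    FROM fact_sales AS f
    LEFT JOIN dim_customer AS d ON d.customer_id = f.customer_id
    WHERE d.customer_id IS NULL
    ORDER BY f.sale_id
""").fetchall()

print(orphans)  # [(12, 3, 75.0)]
```

Sale 12 still counts toward total revenue, but it has no segment label, which is exactly the kind of row that only shows up when someone drills through.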

Silent data loss in ETL

Rows dropped during transformation with no error, no log entry, no alert. The numbers add up. They just do not add up to the right thing.

High impact – affects KPI accuracy
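One common way to catch this is to reconcile row counts, keys, and a control total between two pipeline stages. The following is a minimal sketch under assumed names (`reconcile`, the `id` and `amount` fields, and the sample rows are all hypothetical), not a description of any specific ETL tool.

```python
def reconcile(source_rows, target_rows, key, amount):
    """Compare row counts, keys, and a control total between two stages."""
    source_keys = {r[key] for r in source_rows}
    target_keys = {r[key] for r in target_rows}
    return {
        "rows_lost": len(source_rows) - len(target_rows),
        "missing_keys": sorted(source_keys - target_keys),
        "amount_gap": sum(r[amount] for r in source_rows)
        - sum(r[amount] for r in target_rows),
    }

# Hypothetical staging data before the transform.
source = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 40},
]
# After the transform: row 3 was dropped without any error or log entry.
target = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
]

report = reconcile(source, target, key="id", amount="amount")
print(report)  # {'rows_lost': 1, 'missing_keys': [3], 'amount_gap': 40}
```

The target totals are internally consistent, which is why the loss stays invisible until the two stages are compared directly.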

Stale sources pretending to be fresh

A field that looks current but is sourced from a feed that last refreshed three months ago. Usually surfaces when two reports cover the same period and show different totals.

Most common trust-eroding pattern
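A freshness check can be as simple as comparing each source's last load time against an agreed maximum age. This sketch assumes the last load timestamps are available somewhere (the source names, dates, and the 7-day threshold are hypothetical).

```python
from datetime import datetime, timedelta, timezone

def stale_sources(last_loaded, as_of, max_age):
    """Return the names of sources whose last load is older than max_age."""
    return sorted(
        name for name, loaded_at in last_loaded.items()
        if as_of - loaded_at > max_age
    )

# Hypothetical last-load timestamps per source feed.
as_of = datetime(2024, 6, 30, tzinfo=timezone.utc)
last_loaded = {
    "crm_accounts": datetime(2024, 6, 29, tzinfo=timezone.utc),
    "fx_rates": datetime(2024, 3, 28, tzinfo=timezone.utc),
}

stale = stale_sources(last_loaded, as_of, max_age=timedelta(days=7))
print(stale)  # ['fx_rates']
```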

Field-level nulls in critical KPIs

Nulls in a cost center field, a product category, or a currency code that get silently excluded from aggregations. The KPI calculates. It just calculates on the wrong denominator.

High impact – silent denominator errors
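A null-rate probe over the fields a KPI depends on makes the hidden exclusion visible. The sketch below is illustrative only; the field names and rows are hypothetical.

```python
def null_rates(rows, fields):
    """Share of rows where each field is missing (None)."""
    total = len(rows)
    return {
        field: sum(1 for r in rows if r.get(field) is None) / total
        for field in fields
    }

# Hypothetical cost postings; one row has no cost center.
rows = [
    {"cost_center": "CC-100", "currency": "CHF", "amount": 100.0},
    {"cost_center": "CC-100", "currency": "CHF", "amount": 300.0},
    {"cost_center": None, "currency": "CHF", "amount": 200.0},
    {"cost_center": "CC-200", "currency": "EUR", "amount": 400.0},
]

rates = null_rates(rows, ["cost_center", "currency"])
print(rates)  # {'cost_center': 0.25, 'currency': 0.0}

# Amount that a report grouped by cost center would quietly leave out.
excluded_amount = sum(r["amount"] for r in rows if r["cost_center"] is None)
print(excluded_amount)  # 200.0
```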

Inconsistent definitions across reports

Revenue defined as gross in one report and net in another. Headcount calculated differently between HR and Finance. Same label, different logic, different number.

Found in every engagement

Duplicate records with near-matches

The same customer appearing three times under slightly different name spellings. The same invoice in two tables with a one-day date difference. Deduplication that was supposed to happen in the ETL but did not.

High impact – inflates counts and totals
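Near-match detection can be sketched with fuzzy string comparison. The example below uses `difflib.SequenceMatcher` from the Python standard library as one possible similarity measure; the customer names, the `normalize` helper, and the 0.85 threshold are assumptions chosen for illustration, and a real probe would usually compare more than the name.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name):
    """Lowercase, drop punctuation, collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())

def near_duplicates(names, threshold=0.85):
    """Return pairs of names whose normalized similarity meets the threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b))
    return pairs

# Hypothetical customer master with three spellings of one customer.
customers = [
    "Acme Industries AG",
    "ACME Industries A.G.",
    "Acme Industrie AG",
    "Birchwood Logistics",
]

pairs = near_duplicates(customers)
for a, b in pairs:
    print(a, "<->", b)
```

Pairwise comparison is quadratic in the number of records, so on large tables it is typically applied within blocks (for example, same postal code) rather than across the whole table.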

Slowly changing dimensions that break history

A sales region gets renamed. All historical transactions now appear under the new name. The trend line looks like a sudden shift. It is a data model artifact, not a business event.

Corrupts trend analysis silently
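The usual remedy is to keep each version of the dimension with a validity window (often called a Type 2 slowly changing dimension) and look up the name as of the transaction date. A minimal sketch, with hypothetical region names and dates:

```python
from datetime import date

# Hypothetical Type 2 dimension: each version of the region keeps its validity window.
region_history = [
    {"region_id": 7, "name": "Central Europe",
     "valid_from": date(2020, 1, 1), "valid_to": date(2023, 12, 31)},
    {"region_id": 7, "name": "DACH",
     "valid_from": date(2024, 1, 1), "valid_to": date(9999, 12, 31)},
]

def region_name_at(region_id, on_date):
    """Return the region name that was valid on the given date, or None."""
    for version in region_history:
        if (version["region_id"] == region_id
                and version["valid_from"] <= on_date <= version["valid_to"]):
            return version["name"]
    return None

print(region_name_at(7, date(2022, 5, 1)))  # Central Europe
print(region_name_at(7, date(2024, 5, 1)))  # DACH
```

When the dimension instead overwrites the name in place, both lookups return the new name and the history appears to shift.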

Source-to-target drift

The upstream source system changes its schema. The pipeline keeps running. The data in your warehouse quietly diverges from the source of record.

Grows undetected over months
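A drift probe compares the schema the pipeline was built against with the schema the source exposes today. The sketch below assumes both are available as simple column-to-type mappings; the column names and types are hypothetical.

```python
def schema_drift(expected, observed):
    """Compare two column-to-type mappings and report the differences."""
    shared = expected.keys() & observed.keys()
    return {
        "added": sorted(observed.keys() - expected.keys()),
        "removed": sorted(expected.keys() - observed.keys()),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }

# Hypothetical schema the pipeline was written for.
expected = {"invoice_id": "INTEGER", "amount": "DECIMAL", "region": "TEXT"}
# Hypothetical schema the source system exposes now.
observed = {"invoice_id": "INTEGER", "amount": "TEXT", "region_code": "TEXT"}

drift = schema_drift(expected, observed)
print(drift)
# {'added': ['region_code'], 'removed': ['region'], 'retyped': ['amount']}
```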

Suspicious zero-value patterns

Zeros that should be nulls. Nulls that should be zeros. The distinction changes whether a KPI calculation excludes a record or treats it as a valid zero.

Skews aggregations and averages
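A small worked example shows how much the distinction matters. The discount values below are made up; `None` stands for a missing value.

```python
from statistics import mean

# Hypothetical discount percentages; None means "not recorded".
discounts = [10.0, None, 0.0, 20.0]

# Treat the missing value as unknown and leave it out of the average.
avg_nulls_excluded = mean([d for d in discounts if d is not None])

# Treat the missing value as a genuine zero.
avg_nulls_as_zero = mean([0.0 if d is None else d for d in discounts])

print(avg_nulls_excluded)  # 10.0
print(avg_nulls_as_zero)   # 7.5
```

Neither result is wrong in itself. The error is a pipeline that picks one interpretation without anyone having decided which one the business means.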

Time zone and currency inconsistencies

Timestamps recorded in local time in one source, UTC in another. Exchange rates applied at different points in the pipeline. Small differences that compound into material variances at month-end.

Compounds into material month-end variances
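The month-end effect is easy to reproduce. In this sketch a single booking lands in different months depending on which clock the source uses; the timestamp is hypothetical, and a fixed UTC+1 offset is used as a simplification that ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset for illustration only; a real pipeline would use a
# proper time zone database to handle daylight saving.
CET = timezone(timedelta(hours=1))

# Hypothetical booking recorded late on the last day of January, in UTC.
booked_utc = datetime(2024, 1, 31, 23, 30, tzinfo=timezone.utc)
booked_local = booked_utc.astimezone(CET)

month_utc = booked_utc.strftime("%Y-%m")
month_local = booked_local.strftime("%Y-%m")

print(month_utc)    # 2024-01
print(month_local)  # 2024-02
```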

How It Works

The engagement runs in four phases. Total elapsed time is typically 2 to 3 weeks, depending on source complexity and access setup.

Phase 1 · 60 to 90 minutes

Scope call and source inventory

We spend 60 to 90 minutes mapping what data systems are in scope, where the business-critical reports live, and which fields or metrics have already been flagged as unreliable. This determines the probe strategy for Phase 2.

Phase 2 · 3 to 5 days

Programmatic probes across your model

We run automated checks against your data model: join integrity, null distributions, cardinality anomalies, timestamp gaps, aggregation consistency, deduplication health. No manual row-by-row review. The probes surface the full population of issues across the model.

Phase 3 · 2 to 3 days

AI-assisted clustering and business-impact ranking

The raw probe output contains thousands of anomalies. The AI step clusters related findings, deduplicates overlapping signals, and scores each cluster by likely financial or reputational impact. A broken join in a revenue attribution table ranks above a timezone inconsistency in a log table, even if both trigger the same type of alert.
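The idea of clustering and impact ranking can be illustrated with a deliberately simple, deterministic stand-in. This is not the AI step itself: the grouping key, the `table_weight` values (imagined here as coming out of the scope call), and the anomaly records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical business weights per table, e.g. agreed during the scope call.
table_weight = {"fact_revenue": 10, "app_logs": 1}

# Hypothetical raw probe output: one record per anomaly signal.
anomalies = [
    {"table": "fact_revenue", "pattern": "orphan_key", "rows": 1200},
    {"table": "fact_revenue", "pattern": "orphan_key", "rows": 300},
    {"table": "app_logs", "pattern": "timezone_mismatch", "rows": 50000},
]

def rank_findings(anomalies, table_weight):
    """Group signals by (table, pattern), then order clusters by business weight."""
    clusters = defaultdict(lambda: {"signals": 0, "rows": 0})
    for a in anomalies:
        cluster = clusters[(a["table"], a["pattern"])]
        cluster["signals"] += 1
        cluster["rows"] += a["rows"]
    ordered = sorted(
        clusters.items(),
        key=lambda item: table_weight.get(item[0][0], 1),
        reverse=True,
    )
    return [
        {"table": table, "pattern": pattern, **stats}
        for (table, pattern), stats in ordered
    ]

findings = rank_findings(anomalies, table_weight)
for f in findings:
    print(f)
```

The revenue cluster ranks first even though the log-table anomaly touches far more rows, because the ordering follows business weight rather than raw volume.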

Phase 4 · delivery

Findings report and debrief

You receive the full written report with a severity-ranked punch list, a business risk annotation for each finding, and a fix effort estimate. We walk through the top-tier findings in a debrief call so your team can prioritize remediation without needing to read every line.

What You Get

The deliverable is a document your BI lead and CDO can act on immediately. It is not a dashboard. It is not another monitoring tool to configure. It is a ranked list of things that are broken, why they matter, and how hard they are to fix.

✓ Findings report

The full output of the scan, organized by severity tier. Every finding includes the table or field where it was detected, the pattern it matches, and the business process it affects.

✓ Severity-ranked punch list

A single-page prioritized list of the top issues. Designed for a 15-minute standup or a board update. The format is: finding, affected KPI or report, severity, estimated fix effort.

✓ Business risk annotation per finding

For each issue in the top tier, a one-paragraph note explaining what goes wrong if it stays unfixed. Not technical jargon. The actual business consequence: a KPI that overstates, a report that excludes a customer segment, a metric that calculates on incomplete history.

✓ Fix effort estimate

Each finding is tagged with a rough fix effort: small (a single transform fix, 1 to 2 days), medium (pipeline rework, 1 to 2 weeks), or large (structural data model change, requires a scoping conversation). The estimate is for prioritization, not contracting.

✓ Optional follow-up for remediation scoping

If the findings reveal issues large enough to warrant a remediation project, we can scope that separately. It is not assumed, not bundled, and not the goal of the audit.

Is This Right for You?

The audit is best suited to organizations that already have a functioning BI environment and want to baseline trust in it. It is not an onboarding engagement for teams that are still building their first data pipeline.

You are a good fit if:

  • A BI team or data function exists internally.
  • At least one business-critical dashboard is in regular use by leadership.
  • Your data lives in a warehouse, a Qlik app, or a comparable accessible stack.
  • You have heard “the numbers do not match” more than once from the business side.
  • A new CDO, Head of BI, or Analytics Lead has joined and wants to establish a trust baseline before committing to the current setup.
  • A board or audit committee has raised questions about data reliability that your team cannot fully answer yet.

If the primary problem is that no BI infrastructure exists yet, this is not the right starting point. The profitability analysis guide and the finance dashboard guide cover the foundations of building that layer first.

What It Costs

Fixed Fee

CHF 12,000 to CHF 35,000

Number of sources, source complexity, and depth of lineage required drive where in that range a given engagement lands.

Each factor works as follows:

  • Number of sources. A single warehouse with one primary data model is at the lower end. Multiple source systems with separate pipelines and integration points move the engagement toward the middle or upper end.
  • Source complexity. A clean star schema with documented transformations is faster to probe than a legacy model with undocumented joins, mixed grain fact tables, and years of accumulated workarounds.
  • Depth of lineage required. Some engagements only need field-level quality checks. Others require tracing a KPI back through three or four transformation steps to find where a value diverges from the source. Lineage depth is the single largest driver of effort after source count.

The scope call in Phase 1 is enough to determine where in the range the engagement will land. There is no commitment required before that conversation.

Timeline

2 to 3 weeks from kickoff to final report.

Week 1 covers scope, source access setup, and probe execution. Week 2 covers AI-assisted analysis, finding write-up, and severity ranking. The debrief happens in Week 2 or at the start of Week 3, depending on scheduling. Larger engagements with 4 or more source systems may extend to 4 weeks.

Why AI Matters Here

Programmatic probes against a mid-size data model produce thousands of anomaly signals. A human analyst reviewing that output manually would spend weeks before producing anything actionable.

The AI step does three things. It clusters related findings so that 40 variations of the same broken join pattern appear as one ranked finding, not 40 separate line items. It deduplicates signals that originate from the same root cause but surface in different tables. And it scores findings by likely business impact using the context from the scope call: which tables feed which reports, which KPIs the business has already flagged as problematic, which cost centers or revenue lines the CFO cares most about.

The result is a report a BI lead can act on in a week. Not a dashboard of red lights that takes months to prioritize.

Is This a Replacement for Observability Tools?

No. This is a one-time diagnostic, not ongoing monitoring.

Tools like Monte Carlo, Bigeye, and Great Expectations are designed for continuous data observability. They alert you when something changes relative to a learned baseline. They are valuable for mature data platforms that need ongoing quality assurance across production pipelines.

An audit is different. It establishes the current state. It finds problems that have been accumulating undetected, sometimes for years. Many clients use the audit findings to decide whether they need continuous observability at all, and if so, which parts of their pipeline justify the investment. The audit is the diagnosis. Observability tools are the monitoring regime you might set up after.

If you are evaluating both simultaneously, the revenue leakage guide covers how undetected data issues translate into financial exposure over time, which can help frame the business case for either investment.

Frequently Asked Questions

What tools do you need access to?

Read-only access to your data model is the baseline requirement. That typically means a read-only database connection, access to the Qlik app or BI environment where the business-critical reports live, and documentation of the primary ETL or transformation pipeline if it exists. We do not need production write access at any stage.

Do you run probes against production?

The probes are read-only queries. They do not modify, delete, or insert data. For clients who prefer it, we run against a recent snapshot of the production data rather than the live environment. Either approach works. Most clients use production because the snapshot setup adds time and the read-only risk is negligible.

What if we are not on Qlik?

The audit is tool-agnostic. The probe methodology works against any accessible data model: SQL warehouses, Snowflake, BigQuery, Databricks, Power BI semantic layers, or other BI platforms with queryable datasets. If your stack is accessible via SQL or an API, the probes run. The report format is the same regardless of the underlying platform.

Is there a guarantee?

If the programmatic scan does not surface at least five material findings in the top two severity tiers, we refund the full engagement fee. In practice, this has not happened. Every mid-size data environment we have assessed has had material issues in the top tier. The question is never whether findings exist. It is how many there are and how severe.

Can this lead to a remediation project?

It can, but that is not the goal of the audit and it is not assumed in the engagement structure. The findings report includes fix effort estimates. If the top-tier findings point to structural work that would benefit from external support, we can discuss a remediation scope. That conversation happens after the debrief, not before. The audit stands on its own.

How is this different from a Gartner-style data maturity assessment?

A maturity assessment scores your organization against a capability framework. It tells you where you sit on a maturity curve and what the next level looks like. The output is a scorecard and a roadmap of organizational improvements.

This audit is a technical diagnostic. It tells you what is specifically broken in your data, where in the model it is broken, and what it is costing you. The deliverable is a punch list, not a scorecard. There is no maturity level to aspire to. There are concrete issues to fix, ranked by impact.

Request an Audit Scope Call

The scope call takes 60 to 90 minutes. We map your environment, confirm fit, and give you a firm price band before any commitment. No sales deck, no follow-up sequence.

Request an audit scope call →

Not sure if this is the right starting point? The Hidden Money case study shows what a data-driven diagnostic found in a real engagement. The margin erosion guide covers one of the most common patterns the audit finds in mid-market companies.