This is Article 19 of the Qlik Sense Data Modeling Course.
Qlik Sense Course – Article 19 of 28
← Previous Article: Flag-Based Modeling
→ Next Article: Error Handling
What is a Data Quality Framework? A systematic approach to validating data quality at EVERY level of the ETL process – from staging through transform to the final data model. Early detection minimizes the effort required for later error correction.
What Will You Learn About the Data Quality Framework?
After this guide, you will be able to:
- Implement and automate Quality Gates on all ETL layers.
- Use Quality Flags for automatic data validation and for **performant** analyses. (Topic of the previous article)
- Build a Quality Dashboard for transparent monitoring.
- Detect critical errors using thresholds and variables.
Time investment: 25 min reading + 4 hrs hands-on
Quick Win: In 20 minutes you’ll have your first quality checks implemented
How Do Quality Gates Work at Each Level?
The problem: Faulty data spreads unnoticed through all layers and corrupts business decisions! Fixing an error in the application is 10x more expensive than fixing it in the load script.
The solution: Quality Gates after EVERY processing step – detect and fix problems early! This three-layer model is the foundation for all robust Qlik architectures.
The three validation layers:
| Layer | Validation Purpose | Examples |
|---|---|---|
| 1. Staging | Structural integrity of the import | File loaded? Row count? Data types? (Primary: Loading Data) |
| 2. Transform | Compliance with business rules | Required fields? Valid values? Duplicates? |
| 3. Model | Relationships & model hygiene | No Synthetic Keys? Referential integrity? |
What Are the 6 Data Quality Dimensions?
Data quality can be divided into six measurable dimensions – in the Qlik script, each dimension represents a group of `If()` checks and flags. These dimensions are defined by the DAMA Data Management Body of Knowledge, the authoritative reference for data quality standards.
How Is Completeness Evaluated?
Are all expected data present (no NULL values in required fields)?
// Check: Required fields populated?
If(IsNull(CustomerID) OR Len(Trim(CustomerID)) = 0, 1, 0) as DQ_MissingCustomerID
How Is Accuracy Ensured?
Are the values correct and within valid ranges (i.e. do they not contradict reality)?
// Check: Amount in reasonable range?
If(Amount <= 0 OR Amount > 1000000, 1, 0) as DQ_InvalidAmount
How Important Is Consistency?
Are values logically consistent across different fields (e.g. totals match)?
// Check: ShipDate not before OrderDate?
If(ShipDate < OrderDate, 1, 0) as DQ_InvalidDateOrder
What Is Validity?
Do values match the expected format (e.g. numeric, date, defined list)?
// Check: Email contains @ and .?
// (Match() compares exact strings - wildcard matching needs WildMatch())
If(NOT WildMatch(Email, '*@*.*'), 1, 0) as DQ_InvalidEmail
What Is Uniqueness?
Are primary keys truly unique with no duplicates?
// Check: Duplicates by OrderID?
// (Qlik script has no HAVING - filter the aggregation in a preceding load)
LOAD * WHERE DuplicateCount > 1;
LOAD OrderID, Count(OrderID) as DuplicateCount
RESIDENT STG_Sales
GROUP BY OrderID;
How Important Is Timeliness?
Is the data current enough for analysis (e.g. load date vs. today)?
// Check: Last update more than 24h ago?
If(Today() - LoadDate > 1, 1, 0) as DQ_DataStale
How to Implement Structural Validation in the Staging Layer?
In the staging layer, right after loading raw data, you check the basic integrity of the import.
// Load raw data
STG_Sales:
LOAD
OrderID,
CustomerID,
ProductID,
OrderDate,
Amount,
Quantity
FROM
(ooxml, embedded labels);
// Quality Check 1: Was the file loaded at all?
LET vRowCount = NoOfRows('STG_Sales');
TRACE [QC] Staging: Loaded $(vRowCount) rows;
IF $(vRowCount) = 0 THEN
    TRACE [QC] Staging: No data loaded! Stopping script.;
    SET vQuality_Staging = 'FAIL';
    EXIT SCRIPT;
ELSE
    SET vQuality_Staging = 'PASS';
ENDIF
Explanation:
- NoOfRows(): Counts records in the loaded table.
- EXIT SCRIPT: Stops on critical error (e.g. no data loaded) – this is part of Error Handling.
How Do I Check the Row Count Range?
Check whether the record count is within the expected range to detect export errors or data corruption early.
// Define expected range
LET vExpectedMin = 1000;
LET vExpectedMax = 1000000;
IF $(vRowCount) < $(vExpectedMin) OR $(vRowCount) > $(vExpectedMax) THEN
TRACE Staging: Row count outside expected range! Check the export.;
SET vQuality_Staging_RowCount = 'WARN';
ELSE
SET vQuality_Staging_RowCount = 'PASS';
ENDIF
How Does Data Type Validation Work?
Data type validation is essential. If a column that should be numeric (e.g. `Amount`) is loaded as text, all aggregations will fail later.
// Check data types
Temp_DataTypeCheck:
LOAD
Count(*) as TotalRows,
Sum(If(NOT IsNum(Amount), 1, 0)) as AmountNotNumeric,
Sum(If(NOT IsNum(Quantity), 1, 0)) as QuantityNotNumeric
RESIDENT STG_Sales;
LET vAmountNotNumeric = Peek('AmountNotNumeric', 0, 'Temp_DataTypeCheck');
TRACE [QC] Staging: Amount NOT numeric in $(vAmountNotNumeric) rows;
DROP TABLE Temp_DataTypeCheck;
Explanation: `IsNum()` checks whether the value is numeric. If `AmountNotNumeric` is greater than 0, you need to correct the load statement.
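"Correcting the load statement" typically means forcing number interpretation with `Num#()`. A minimal sketch, assuming the source delivers German-formatted numbers (the format codes and the `STG_Sales_Typed` table name are assumptions, not from the original):

// Reinterpret text values as proper numbers (format codes are assumptions)
STG_Sales_Typed:
NOCONCATENATE
LOAD
    OrderID,
    CustomerID,
    ProductID,
    OrderDate,
    Num#(Amount, '#.##0,00', ',', '.') as Amount,
    Num#(Quantity, '###0') as Quantity
RESIDENT STG_Sales;
DROP TABLE STG_Sales;
RENAME TABLE STG_Sales_Typed TO STG_Sales;

After this step, re-run the `IsNum()` check; `AmountNotNumeric` should drop to 0 if the format codes match the source.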
How to implement business rules and quality flags in Qlik Sense?
In the transform layer, you validate the logic and create the actual **Quality Flags**. This is the most performant method for error marking (see Flag-Based Modeling).
TRF_Sales:
LOAD
OrderID,
CustomerID,
ProductID,
Date(OrderDate) as OrderDate,
Amount,
Quantity,
// Quality Flags - Completeness
If(IsNull(CustomerID) OR Len(Trim(CustomerID)) = 0, 1, 0)
as DQ_MissingCustomerID,
If(IsNull(ProductID) OR Len(Trim(ProductID)) = 0, 1, 0)
as DQ_MissingProductID,
// Quality Flags - Range Validation
If(Amount <= 0, 1, 0) as DQ_ZeroOrNegativeAmount,
If(Quantity <= 0, 1, 0) as DQ_InvalidQuantity,
// Quality Flags - Consistency
If(OrderDate > Today(), 1, 0) as DQ_FutureOrderDate,
// Composite Quality Flag: Marks every faulty record
If(IsNull(CustomerID) OR Amount <= 0 OR OrderDate > Today() OR Quantity <= 0,
1, 0) as DQ_HasAnyIssue
FROM (qvd);
Advantages of Quality Flags: Instead of deleting faulty records, you mark them. Business users can then use Set Analysis **in the frontend** (e.g. Sum({<DQ_HasAnyIssue={'0'}>} Amount)) to analyze exclusively the “good” data, while the faulty records are preserved for root cause analysis (error debugging). In addition, the Qlik ErrorMode system variable lets you control script behavior when quality gates fail.
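In chart measures, the flags translate directly into KPIs – a small sketch (the field names follow the example above; the measure selection is illustrative):

// KPI: revenue from clean records only
Sum({<DQ_HasAnyIssue = {'0'}>} Amount)

// KPI: quality score as a share of clean records
1 - Sum(DQ_HasAnyIssue) / Count(OrderID)

// KPI: number of records flagged for review
Sum(DQ_HasAnyIssue)

Because the flags are precomputed 0/1 fields, these expressions stay fast even on large models – the core idea of Flag-Based Modeling.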
How Do You Calculate the Quality Summary?
Summarize the error counts to monitor the health of the data.
Quality_Transform:
LOAD
'TRF_Sales' as TableName,
Count(*) as TotalRecords,
Sum(DQ_HasAnyIssue) as RecordsWithIssues,
1 - (Sum(DQ_HasAnyIssue) / Count(*)) as QualityScore
RESIDENT TRF_Sales;
// Log quality score
LET vQualityScore = Peek('QualityScore', 0, 'Quality_Transform');
TRACE [QC] Transform: Quality Score = $(vQualityScore);
// Threshold Check
IF $(vQualityScore) < 0.95 THEN
    TRACE [QC] Transform: Quality Score below 95% threshold!;
    SET vQuality_Transform = 'ALERT';
ELSE
    SET vQuality_Transform = 'PASS';
ENDIF
Explanation: `QualityScore` is 1 minus the error rate, giving you the share of “good” records. When the defined threshold (here 95%) is not met, an **ALERT** is triggered.
How is referential integrity enforced in the model layer?
In the final model, the relationship between fact and dimension tables must be validated. This is primarily about avoiding *orphaned records*. This is closely related to avoiding Synthetic Keys.
// Load dimensions (keep a renamed copy of each key for the Exists() check)
Dim_Customer:
LOAD CustomerID, CustomerID as DimCustomerID, CustomerName
FROM (qvd);
// Dim_Product is loaded the same way, with ProductID as DimProductID
// Facts with Referential Integrity Check
TRF_Sales:
LOAD
    OrderID,
    CustomerID,
    ProductID,
    Amount,
    // Does CustomerID exist in Dim_Customer?
    If(NOT Exists(DimCustomerID, CustomerID), 1, 0) as DQ_OrphanedCustomer,
    // Does ProductID exist in Dim_Product?
    If(NOT Exists(DimProductID, ProductID), 1, 0) as DQ_OrphanedProduct
FROM (qvd);
// Remove the helper fields so they don't clutter the final model
DROP FIELDS DimCustomerID, DimProductID;
Explanation:
- Exists(DimCustomerID, CustomerID): Checks whether the current record’s `CustomerID` occurs among the dimension keys loaded into `DimCustomerID`.
- Why the renamed copy? A plain `Exists(CustomerID, CustomerID)` also “sees” the `CustomerID` values of fact rows loaded earlier in the same LOAD, so a repeated orphan key would only be flagged on its first occurrence.
- DQ_OrphanedCustomer: 1 when the `CustomerID` exists in the fact table but **no** matching counterpart is found in the dimension.
Why is this important? Orphaned records cause NULL associations, which can lead to missing values in your dimensions and **incorrect aggregations**.
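To make orphans visible during the reload, the flags can be aggregated and written to the log – a minimal sketch (the temp table and variable names are illustrative, not from the original):

// Count orphaned records and write the result to the reload log
Temp_OrphanCheck:
LOAD
    Sum(DQ_OrphanedCustomer) as OrphanedCustomers,
    Sum(DQ_OrphanedProduct) as OrphanedProducts
RESIDENT TRF_Sales;
LET vOrphanedCustomers = Peek('OrphanedCustomers', 0, 'Temp_OrphanCheck');
LET vOrphanedProducts = Peek('OrphanedProducts', 0, 'Temp_OrphanCheck');
TRACE [QC] Model: $(vOrphanedCustomers) orphaned customers, $(vOrphanedProducts) orphaned products;
DROP TABLE Temp_OrphanCheck;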
In production environments, quality gates are most effective when embedded in a three-stage ETL architecture, where each layer – staging, transform, and model – validates data at the appropriate level.
How Does Duplicate Detection Work?
Duplicates are one of the most common causes of unreliable metrics. Identify them before processing the data further.
// Identify duplicates (by OrderID)
Temp_DuplicateCheck:
LOAD
OrderID,
Count(*) as DuplicateCount
RESIDENT STG_Sales
GROUP BY OrderID;
// Add duplicate flag
LEFT JOIN (STG_Sales)
LOAD
OrderID,
If(DuplicateCount > 1, 1, 0) as DQ_IsDuplicate,
DuplicateCount
RESIDENT Temp_DuplicateCheck;
DROP TABLE Temp_DuplicateCheck;
// Log duplicates - aggregation functions like Sum() cannot be used
// directly in LET, so aggregate into a temp table first
Temp_DupCount:
LOAD Sum(DQ_IsDuplicate) as TotalDuplicates
RESIDENT STG_Sales;
LET vDuplicateCount = Peek('TotalDuplicates', 0, 'Temp_DupCount');
DROP TABLE Temp_DupCount;
TRACE [QC] Duplicates: Found $(vDuplicateCount) duplicate records;
How Does the Deduplication Strategy Work?
Choose a strategy: the best results usually come from keeping the “newest” record per key:
// Strategy: keep the newest record per OrderID
TRF_Sales_Latest:
NOCONCATENATE
LOAD *, OrderID as OrderID_Seen
RESIDENT STG_Sales
WHERE NOT Exists(OrderID_Seen, OrderID)
ORDER BY OrderID, OrderDate DESC;
DROP FIELD OrderID_Seen;
Explanation: The resident table is read sorted by `OrderID` and descending `OrderDate`; the `Exists()` check lets only the first – and therefore newest – record per `OrderID` through, because every accepted record adds its key to `OrderID_Seen`. (Aggregation functions such as `FirstSortedValue()` cannot be used row by row in a WHERE clause, which is why this sort-and-Exists pattern is needed.) This type of deduplication is essential for invoices and orders.
What Is Cross-Field Validation?
Use consistency checks to uncover logical errors between related fields.
TRF_Orders:
LOAD
OrderID,
Date(OrderDate) as OrderDate,
Date(ShipDate) as ShipDate,
TotalAmount,
// Cross-Field Consistency Flags
If(ShipDate < OrderDate, 1, 0) as DQ_ShipBeforeOrder,
If(Abs(TotalAmount - (OrderAmount + ShippingCost)) > 0.01, 1, 0)
as DQ_TotalMismatch,
//... additional checks
FROM (qvd);
How Do I Integrate the Quality Dashboard and Set Up Alerting?
Collect quality metrics over time (historization) and use Qlik to react automatically to problems.
// Summarize and historize current load quality
//... (Summary logic)...
// Append to historical log
CONCATENATE(Quality_History)
LOAD * RESIDENT Quality_Summary;
// Save historical data
STORE Quality_History INTO (qvd);
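Note that the CONCATENATE above assumes `Quality_History` already exists – on the very first run it does not. A sketch of a first-run guard (the library path `lib://DataFiles/...` is a placeholder assumption):

// Hypothetical path - adapt to your environment
LET vHistoryQvd = 'lib://DataFiles/Quality_History.qvd';
IF NOT IsNull(FileSize('$(vHistoryQvd)')) THEN
    // History exists: load it and append the current run
    Quality_History:
    LOAD * FROM [$(vHistoryQvd)] (qvd);
    CONCATENATE (Quality_History)
    LOAD * RESIDENT Quality_Summary;
ELSE
    // First run: the history starts with the current summary
    Quality_History:
    LOAD * RESIDENT Quality_Summary;
ENDIF
STORE Quality_History INTO [$(vHistoryQvd)] (qvd);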
How Does Automated Alerting with Thresholds Work?
We use the calculated `vQualityScore` from Layer 2 to send an automated notification if quality falls below a threshold.
// Define thresholds
LET vQualityThreshold = 0.95; // 95% quality required
// Calculate quality score
//... (Logic to calculate vQualityScore)...
// Alert Logic
IF $(vQualityScore) < $(vQualityThreshold) THEN
    TRACE Quality Score $(vQualityScore) below threshold!;
    // Trigger automated action
    // Here e.g. Qlik Automate can send an email:
    // EXECUTE;
    TRACE See Qlik Automate Email notifications for setup.;
ENDIF
What Are the Best Practices for the Data Quality Framework?
The following best practices ensure your framework remains efficient and maintainable.
Layered Validation:
- [ ] Staging Layer: Structural checks (File loaded? Row count?)
- [ ] Transform Layer: Business rules (Required fields? Ranges?)
- [ ] Model Layer: Relationships (Data model problems, No orphans?)
Quality Flags:
- [ ] Mark faulty records, DON’T delete them
- [ ] Clear naming convention (DQ_ prefix)
- [ ] Composite flags for “HasAnyIssue”
Quality Logging & Performance:
- [ ] Store quality metrics in QVD (historical data)
- [ ] Quality dashboard for monitoring
- [ ] Apply performance strategies before critical checks
How Does Performance Affect Quality?
Quality checks have a performance overhead (approx. +11% to +55% additional load time) – but the cost of *incorrect* business decisions far exceeds this effort. It’s a necessary trade-off.
| Scenario | Load Time (1M Records) | Overhead |
|---|---|---|
| Without Quality Checks | 45 sec | – |
| With Staging Checks | 50 sec | +11% |
| With Transform Checks | 57 sec | +27% |
| With All Checks | 70 sec | +55% |
Conclusion: +55% load time, but 100% confidence in data quality – a good trade-off! Ideally, keep checks at the transform layer to keep staging performance as fast as possible.
How to troubleshoot Qlik Sense Data Quality Framework?
Problem: Quality checks slow down loads too much
Solution 1: Selective Checks
Don’t check ALL records, just a sample:
// 10% sample for fast checks
WHERE Rand() < 0.1
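A complete sketch of such a sampled check (the 10% rate and the `Temp_SampleCheck` name are illustrative; `Rand()` is evaluated per record, so the sample size varies slightly between runs):

// Run an expensive completeness check on roughly 10% of the rows
Temp_SampleCheck:
LOAD
    Count(OrderID) as SampledRows,
    Sum(If(IsNull(CustomerID) OR Len(Trim(CustomerID)) = 0, 1, 0)) as SampledMissingIDs
RESIDENT STG_Sales
WHERE Rand() < 0.1;
LET vSampledMissing = Peek('SampledMissingIDs', 0, 'Temp_SampleCheck');
TRACE [QC] Sample: $(vSampledMissing) missing CustomerIDs in ~10% sample;
DROP TABLE Temp_SampleCheck;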
Solution 2: Parallel Processing
Use separate QVDs for quality checks – load main data in parallel:
// Main Load
STORE TRF_Sales INTO (qvd);
// Quality Checks in separate job
// Load TRF_Sales.qvd and run checks
Problem: Too many quality flags make the model confusing
Solution: Composite Flags
Combine multiple checks into one flag (in a later RESIDENT or preceding load, since the individual flags must already exist):
If(DQ_MissingCustomerID = 1 OR
   DQ_ZeroOrNegativeAmount = 1 OR
   DQ_FutureOrderDate = 1,
   1, 0) as DQ_HasAnyIssue
Problem: Quality alerts fire too frequently
Solution: Realistic Thresholds
100% quality is unrealistic – set practical thresholds:
// Only alert on actual degradation
IF $(vQualityScore) < 0.95 THEN
// Alert
ENDIF
What Are the Next Steps?
You can now implement systematic data quality gates! Next:
1. Error Handling: What happens when quality checks fail and the script needs to stop? Error Handling shows you how to catch critical errors and terminate the script in a controlled manner.
2. Performance Optimization: To minimize the performance overhead, you need to further optimize the load times of your applications.
How to implement quality gates in an ETL pipeline?
- Error Handling – What to do when quality checks fail?
- Flag-Based Modeling – Flags for ultimate performance and simple Set Analysis