Three-Stage Architecture: Enterprise ETL Pattern in Qlik

Q: What is the principle of Separation of Concerns in the three-stage architecture?

The monolith problem: Everything in one script – load statements, complex calculations, and the final data model are all mixed together. A bug fix in the ETL process requires reloading source data and risks destabilizing the entire data model. Testing is virtually impossible! The solution: Three clearly separated layers (scripts or blocks) that communicate via QVD files. Each layer has its own responsibility, is independently testable, and the QVDs serve as stable interfaces. This is the culmi

Q: What are the three stages and their responsibilities in Qlik?

Stage Main Task QVD Strategy & Output 1. Staging (Extract) Secure raw data, 1:1 copy of the source. Fastest possible load. STG_*.qvd (Raw Data Cache) 2. Transform Data cleansing, applying business logic, standardization, quality checks. TRF_*.qvd (Clean, Business-Ready Data) 3. Model (App) Create the dimensional model (star schema), performance optimization. Facts/Dims directly in app memory

Q: What performance and maintenance benefits does the three-stage architecture offer?

The architecture delivers massive benefits in production: Stability: When the source database changes (Stage 1), only the staging layer needs to be adjusted and reloaded. Stage 2 and 3 can continue the transformation using the preserved STG_*.qvd files. Efficiency: Faulty transformations (Stage 2) don't require re-querying the slow source database, since the raw data is already in the fast QVD cache (Stage 1). This dramatically speeds up debugging. Reusability: The TRF_*.qvd files (Stage 2 outpu

Q: How do you optimize QVD per layer in the three-stage architecture?

Performance in the model layer depends heavily on how the TRF_*.qvd files are loaded. For a deep dive into QVD file formats and optimization techniques, see the QVD layer optimization guide.

This is Article 24 of the Qlik Sense Data Modeling Course.

📚 Qlik Sense Course – Article 24 of 28

← Previous Article: Subroutines
→ Next Article: Section Access

What is the three-stage architecture? This is the proven pattern for enterprise Qlik development that divides script logic into three clearly defined phases: Extract, Transform, and Model. It is fundamental for maintainable, scalable, and performant Qlik solutions. For more context on how Qlik structures data loading, see the Qlik data modeling documentation.

What will you learn about the three-stage architecture in Qlik?

After this guide you will be able to:

Systematically implement three-stage architecture (separation of concerns principle).
Optimize the QVD strategy for each layer (Staging, Transform, Model).
Place quality gates and data quality checks at the logically correct stage.
Drastically improve the maintainability of your Qlik applications.

Time investment: 25 min reading + 4 hrs hands-on
Quick win: In 30 minutes you will have your first layers structured and dependencies clearly separated.

What is the principle of Separation of Concerns in the three-stage architecture?

The monolith problem: Everything in one script – load statements, complex calculations, and the final data model are all mixed together. A bug fix in the ETL process requires reloading source data and risks destabilizing the entire data model. Testing is virtually impossible!

The solution: Three clearly separated layers (scripts or blocks) that communicate via QVD files. Each layer has its own responsibility, is independently testable, and the QVDs serve as stable interfaces. This is the culmination of modular development (Subroutines and Includes).

What are the three stages and their responsibilities in Qlik?

Stage	Main Task	QVD Strategy & Output
1. Staging (Extract)	Secure raw data, 1:1 copy of the source. Fastest possible load.	`STG_*.qvd` (Raw Data Cache)
2. Transform	Data cleansing, applying business logic, standardization, quality checks.	`TRF_*.qvd` (Clean, Business-Ready Data)
3. Model (App)	Create the dimensional model (star schema), performance optimization.	Facts/Dims directly in app memory

What is the Staging Layer (Raw Extract) in the three-stage architecture?

The staging layer is the gateway to source data. Its main goal is speed and data preservation. The staging layer is also the ideal place to implement incremental loading, since delta loads work most efficiently when applied directly to the raw data extraction.

//============================================
// STAGING LAYER: Raw data preservation
// Input: Database/API/Filesystem | Output: STG_*.qvd
//============================================

STG_Sales:
LOAD
    OrderID,
    CustomerID,
    OrderDate, // Date loaded as number
    Amount
FROM; // Data source

// Save: Fast QVD cache (foundation for Incremental Loading)
STORE STG_Sales INTO (qvd);
DROP TABLE STG_Sales;

Key rules in Stage 1:

1:1 mapping: No calculations, no Date(), Year(), or If() statements. Data is loaded exactly as it comes from the source.
Speed: After the initial load, data access happens exclusively through QVDs. This is the foundation for maximum load performance.
Checks: Only structural quality checks take place here (row count, file access).

What is the Transform Layer (Business Logic) in the three-stage architecture?

This is the heart of the ETL process, where data is transformed into an analytical format.

//============================================
// TRANSFORM LAYER: Business Logic
// Input: STG_*.qvd | Output: TRF_*.qvd
//============================================

TRF_Sales:
LOAD
    OrderID,
    CustomerID,
    // Data transformations: Date values are formatted here
    Date(OrderDate) as OrderDate,
    Year(OrderDate) as Year,
    // Logic: Flag calculations (Flag-based Modeling)
    If(Amount > $(vHighValueThreshold), 1, 0) as IsHighValue,
    // Quality flags
    If(IsNull(CustomerID) OR Len(Trim(CustomerID)) = 0, 1, 0) as DQ_MissingCustomer

FROM (qvd); // Load from Stage 1 QVD

// Save: Transformed QVD (foundation for the final model)
STORE TRF_Sales INTO (qvd);
DROP TABLE TRF_Sales;

Key rules in Stage 2:

Standardization: All date formats, currencies, and numeric fields are unified here.
Business logic: All complex If() statements, field categorizations, and quality flag assignments (DQ_MissingCustomer) happen here.
Independence: This layer loads exclusively from STG_*.qvd files.

How does the Model Layer (Dimensional Model) work in the three-stage architecture?

The model layer, often executed as the final Qlik Sense app script, is solely responsible for creating the analytically optimized data model (typically a star schema).

//============================================
// MODEL LAYER: Final dimensional model
// Input: TRF_*.qvd | Output: App Memory
//============================================

// FACTS TABLE: Load only valid data and select fields
Facts_Sales:
LOAD
    OrderID,
    CustomerID,
    ProductID,
    Amount,
    IsHighValue
FROM (qvd)
WHERE DQ_MissingCustomer = 0 // Filter out erroneous data
  AND DQ_InvalidAmount = 0;

// DIMENSIONS
Dim_Customer:
LOAD
    CustomerID,
    CustomerName,
    Country
FROM (qvd);

Key rules in Stage 3:

Optimized loads: No transformations are performed. This enables optimized QVD loads, which massively reduce load times.
Quality filtering: Final data selection happens here by filtering out erroneous records using the quality flags created in Stage 2.
Model hygiene: Focus on avoiding synthetic keys and correctly linking tables.

How should the folder structure and naming conventions be organized?

A clear folder structure and naming convention is essential for the maintainability of the three-stage approach.

QlikProject/
│
├── Source/         // Source files (Stage 1 input)
│
├── QVD/
│   ├── Staging/    // STG_*.qvd (Stage 1 output / Stage 2 input)
│   └── Transform/  // TRF_*.qvd (Stage 2 output / Stage 3 input)
│
└── Apps/           // QVF applications (Stage 3 script)

Naming Conventions:

Prefix	Layer	Responsibility
`STG_`	Staging	Raw data, cache.
`TRF_`	Transform	Cleansed, enriched data.
`Facts_`	Model (Facts)	Fact values in star schema.
`Dim_`	Model (Dimensions)	Dimensions in star schema.

What performance and maintenance benefits does the three-stage architecture offer?

The architecture delivers massive benefits in production:

Stability: When the source database changes (Stage 1), only the staging layer needs to be adjusted and reloaded. Stage 2 and 3 can continue the transformation using the preserved STG_*.qvd files.

Efficiency: Faulty transformations (Stage 2) don’t require re-querying the slow source database, since the raw data is already in the fast QVD cache (Stage 1). This dramatically speeds up debugging.

Reusability: The TRF_*.qvd files (Stage 2 output) can serve as a central, cleansed data source for dozens of final Qlik apps. A TRF_Customer.qvd only needs to be transformed once but can be loaded by any app.

How do you optimize QVD per layer in the three-stage architecture?

Performance in the model layer depends heavily on how the TRF_*.qvd files are loaded. For a deep dive into QVD file formats and optimization techniques, see the QVD layer optimization guide.

How does Model: Optimized Load (100x faster) work in Qlik architecture?

Since the model layer performs no transformations or calculations, it should always load QVDs in optimized mode.

// Optimized QVD load (very fast!)
Dim_Customer:
LOAD CustomerID, CustomerName
FROM (qvd);
// Only works when there are no calculations or transformations in this LOAD statement.

If you need transformations in the model layer (e.g. a calculation with If()), the load becomes non-optimized and therefore much slower. The goal is to keep such operations in Stage 2.

What are Quality Gates per layer in the three-stage architecture?

Thanks to the clear separation, quality gates can be placed exactly where they make the most sense:

// STAGING: Structural checks and error recovery
CALL CheckFileExists('SourceSales.csv');
CALL CheckRowCount('STG_Sales', 1000); // See Error Handling

// TRANSFORM: Business rules and data cleansing
CALL CheckQuality('TRF_Sales', 'DQ_MissingCustomer'); // See Data Quality Gates
CALL CalculateQualityScore('TRF_Sales');

// MODEL: Model hygiene
CALL CheckOrphanedRecords('Facts_Sales', 'CustomerID');
CALL CheckSyntheticKeys; // Check for data model issues

What are the best practices for the three-stage architecture in Qlik?

For additional ETL architecture inspiration and community-tested patterns, see this ETL architecture patterns on Qlik Community discussion.

Layer Separation:

[ ] Staging = Raw Extract (no Date(), If()).
[ ] Transform = Business Logic and Quality Flags.
[ ] Model = Star Schema, Optimized Loads.

Naming & Structure:

[ ] Consistent use of prefixes STG_, TRF_, Facts_, Dim_.
[ ] Separate QVD folders (/Staging/ and /Transform/).

Dependencies:

[ ] Never create a “skip-layer” dependency (e.g. Model loads directly from source CSV or from STG_*.qvd).
[ ] TRF_ may only load from STG_. Model may only load from TRF_.

How do you troubleshoot three-stage architecture issues in Qlik?

Problem: Performance is still poor despite three-stage architecture

Cause 1: Non-optimized loads in the model layer

The most common cause is an accidental transformation (e.g. an unnecessary Trim()) in the model layer. Any transformation makes the load slower.

Solution: Ensure that ALL cleansing and transformations (including Date() or Trim()) happen in the Transform Layer, so Stage 3 can perform pure optimized loads.

Problem: Debugging is difficult because source data is missing

Solution: Use Stage 1 QVDs

When a problem occurs in Stage 2 (e.g. a duplicate), you don’t need to re-query the slow source database. Simply load the preserved STG_*.qvd files into a temporary debug app and analyze the raw data without putting load on the source system again.

Problem: Data model is messy (synthetic keys, circular references)

Solution: Isolate Stage 3

Ensure that the model layer strictly adheres to star schema principles. Since it only loads from TRF_*.qvd, model errors are often easier to trace back to the linking pattern.

What are the next steps in the three-stage architecture course?

The three-stage architecture is the foundation for all scalable Qlik solutions! Next, we dive deeper into the details:

1. Data security: Access restrictions at the app level. Read the next article: Section Access.

2. Performance tuning: Learn how to maximize QVD optimization and which additional performance strategies (like best practices) to apply within the three-stage model.

Once your three-stage architecture is in place, the next challenge is deploying it reliably across environments. Learn how to set up a deployment pipeline with DEV/TEST/PROD separation and Git version control.

What related topics are covered in the three-stage architecture course?

QVD Optimization – Essential for Stage 3 optimized loads.
Data Quality Gates – Implementing checks in Stage 1 & 2.
Incremental Loading – Most efficiently implemented in Stage 1.

What would help you most right now?

Thanks!