📚 Qlik Sense Data Modeling Course – Article 11 of 28
← Previous Article: Star Schema in Qlik – Performance & Clarity
→ Next Article: Link Tables for Many-to-Many Relationships
What is the difference between Fact and Dimension? Facts contain transactional, measurable data (revenue, quantities) – they answer “What happened?”. Dimensions contain descriptive data (customer, product) – they answer “Who/What/Where/When?”.
What will you learn about Fact vs Dimension design decisions?
After this guide you will be able to:
- Decide whether data belongs in a fact or dimension table
- Define the right granularity for facts
- Design performance-optimal data models
Time investment: 15 min reading + 2 hrs hands-on
Prerequisite: Knowledge of Star Schema
Quick win: In 10 minutes you will know how to correctly classify any table
The decision: Fact or Dimension?
The problem: You have loaded a table with revenue data and you are unsure: Should you keep all fields in one table or split them into facts and dimensions?
The solution: Use a simple decision tree: Is the data transactional and changes frequently? → Fact. Does it describe entities and is relatively stable? → Dimension.
How do you apply the decision tree in practice?
Ask yourself these three questions:
1. Is the data transactional?
- Yes → Probably a fact table
- No → Continue to question 2
2. Does the data contain numeric values that get aggregated?
- Yes (Sum, Avg, Count) → Fact table
- No → Dimension table
3. How many rows does the table have?
- Millions of rows → Fact table
- Thousands to hundreds of thousands → Dimension table
How do you classify revenue data in an example?
// Original table: DataSales.csv
// OrderID, CustomerID, CustomerName, Region, ProductID, ProductName,
// Category, Amount, Quantity, OrderDate
// What belongs where?
Analysis:
- OrderID: Unique key → Fact (Primary Key)
- CustomerID: Reference → Fact (Foreign Key to dimension)
- CustomerName, Region: Descriptive → Dimension
- ProductID: Reference → Fact (Foreign Key to dimension)
- ProductName, Category: Descriptive → Dimension
- Amount, Quantity: Measures for aggregation → Fact
- OrderDate: Transaction timestamp → Fact
How do you choose the right structure for Fact and Dimension?
// Fact table: Only transactions + keys + measures
Facts_Sales:
LOAD
OrderID, // Primary Key
CustomerID, // Foreign Key
ProductID, // Foreign Key
Date(OrderDate) as OrderDate, // Foreign Key to Date dimension
Amount, // Measure
Quantity // Measure
FROM [DataSales.csv]
(txt, utf8, embedded labels, delimiter is ',');
Explanation: The fact table contains only the skeleton of the transaction – keys and numbers. No descriptive text.
// Dimension: Customer attributes
Dim_Customer:
LOAD DISTINCT
CustomerID, // Primary Key
CustomerName,
Region,
Country,
City
FROM [DataSales.csv]
(txt, utf8, embedded labels, delimiter is ',');
Explanation: The dimension contains all information ABOUT the customer. Once per customer, not repeated with every order.
// Dimension: Product attributes
Dim_Product:
LOAD DISTINCT
ProductID, // Primary Key
ProductName,
Category,
SubCategory,
Brand
FROM [DataSales.csv]
(txt, utf8, embedded labels, delimiter is ',');
Explanation: Same here: All information ABOUT the product, once per product.
Performance tip: This split saves 40-60% memory! The names “Samsung Galaxy” and “Electronics” are not repeated millions of times but stored only once in the dimension.
How do you understand and choose the right granularity?
Granularity determines the “detail level” of your fact table. This is one of the most important design decisions!
Rule of thumb: Load data as detailed as possible (atomic level). You can always aggregate later – but you cannot reconstruct details from aggregated data!
What are the granularity levels in the Fact vs Dimension example?
Option 1: Order level
// One record = one order
Facts_Orders:
LOAD
OrderID, // Key
CustomerID,
OrderDate,
OrderTotalAmount, // Sum of all line items
OrderItemCount // Number of items
FROM Orders;
What you can analyze: Revenue per order, average order size
What you CANNOT do: Which products were purchased in this order?
Option 2: Order line level
// One record = one line item in an order
Facts_OrderLines:
LOAD
OrderLineID, // Unique key
OrderID, // Order reference
ProductID, // Which product?
CustomerID,
OrderDate,
LineAmount, // Amount for this line
LineQuantity // Quantity for this line
FROM OrderLines;
What you can analyze: Everything from Option 1 (through aggregation) PLUS product analyses, cross-selling, basket analyses
Trade-off: More rows = more memory (but maximum flexibility)
Option 3: Daily aggregate (pre-aggregated)
// One record = one product per day
Facts_DailySales:
LOAD
ProductID,
Date,
Sum(Amount) as DailyAmount,
Sum(Quantity) as DailyQuantity
FROM OrderLines
GROUP BY ProductID, Date;
What you can analyze: Daily revenue, trends
What you CANNOT do: Customer analyses, individual orders
Best practice: Start with the most detailed level (order line). If performance issues arise, you can always add an aggregated table later – in addition to the detail table, not as a replacement for it.
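This pattern can be sketched in the load script: derive the aggregate from the detailed fact table with a RESIDENT load. The table names follow the examples above; the library path is a hypothetical placeholder.

```
// Additional aggregate, derived from the detail table
Facts_DailySales:
LOAD
    ProductID,
    OrderDate,
    Sum(LineAmount) as DailyAmount,
    Sum(LineQuantity) as DailyQuantity
RESIDENT Facts_OrderLines
GROUP BY ProductID, OrderDate;

// Caution: two fact tables sharing ProductID and OrderDate would create
// a synthetic key in the same app. One common option is to store the
// aggregate as a QVD for a separate reporting app:
STORE Facts_DailySales INTO [lib://Data/Facts_DailySales.qvd] (qvd);
DROP TABLE Facts_DailySales;
```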
How do you avoid common mistakes in Fact vs Dimension design?
Mistake 1: Measures in dimensions
Symptom: Your totals don’t add up, numbers are counted multiple times.
Wrong:
Dim_Product:
LOAD
ProductID,
ProductName,
Category,
AveragePrice, // ← WRONG! This is a measure
TotalSold // ← WRONG! This is a measure
FROM Products;
Correct:
Dim_Product:
LOAD
ProductID,
ProductName,
Category,
StandardPrice // ← OK: A fixed attribute value, not an aggregate
FROM Products;
// In the frontend you then calculate:
// Average price: Avg(Price)
// Total sold: Sum(Quantity)
Why does this matter? If you put measures in dimensions, they get multiplied during joins and your totals are wrong.
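A small worked example with hypothetical numbers illustrates the multiplication: suppose Dim_Product stores TotalSold = 500 for product P1, and P1 appears on 1,000 order lines.

```
// Dim_Product: one row per product, with P1 → TotalSold = 500
// If the dimension is joined into the fact table (denormalization),
// TotalSold is repeated on every one of P1's 1,000 order lines:
//
//   Sum(TotalSold)  →  1,000 × 500 = 500,000 instead of 500
//
// Correct: keep the quantity in the fact table and aggregate there:
//   Sum(LineQuantity)  →  each order line is counted exactly once
```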
Mistake 2: Too many fields in facts
Symptom: Synthetic keys ($Syn tables) in the data model, performance issues.
Wrong:
Facts_Sales:
LOAD
OrderID,
CustomerID,
CustomerName, // ← Redundant! Belongs in dimension
Region, // ← Redundant! Belongs in dimension
ProductID,
ProductName, // ← Redundant! Belongs in dimension
Amount
FROM Sales;
Correct:
Facts_Sales:
LOAD
OrderID,
CustomerID, // Only the ID
ProductID, // Only the ID
Amount
FROM Sales;
// Names and attributes come from the dimensions
Why does this matter? If CustomerName appears in both facts and the customer dimension, Qlik creates a synthetic key. This is inefficient and can lead to problems.
Mistake 3: Wrong granularity
Symptom: You cannot answer all business questions or you have exploding data volumes.
Problem: You load data at order level but need product details:
// Too coarse!
Facts_Orders:
LOAD
OrderID,
OrderTotalAmount // Only total, no product info
FROM Orders;
// Question: "Which products were sold the most?"
// Answer: Impossible to answer!
Solution: Load at the most detailed level needed:
// Properly detailed
Facts_OrderLines:
LOAD
OrderLineID,
OrderID,
ProductID, // Now you can analyze by product
LineAmount,
LineQuantity
FROM OrderLines;
How do you combine different granularities in advanced patterns?
Sometimes you have data at different detail levels. Example: Budget is at region level, sales at customer level.
The challenge: How do you compare budget (per region) with sales (per customer)?
How do you use Generic Keys as the solution?
// Sales at customer level
Facts_Sales:
LOAD
OrderID,
CustomerID & '|Customer' as SalesKey, // Generic key with type identifier
ProductID,
Date(OrderDate) as OrderDate,
Amount,
Quantity,
0 as IsBudget // Flag: 0 = actual sales
FROM Sales;
// Budget at region level
CONCATENATE(Facts_Sales)
LOAD
BudgetID,
Region & '|Region' as SalesKey, // Generic key with type identifier
ProductID,
Date(BudgetMonth) as OrderDate,
BudgetAmount as Amount,
Null() as Quantity, // No quantity for budget
1 as IsBudget // Flag to differentiate
FROM Budget;
Explanation: By appending ‘|Customer’ or ‘|Region’, you make the keys unique. This allows both granularities to coexist.
// Extended customer dimension
Dim_Customer:
LOAD
CustomerID & '|Customer' as SalesKey, // Matching the fact
CustomerID,
CustomerName,
Region,
Country
FROM Customers;
// Add region level
CONCATENATE(Dim_Customer)
LOAD DISTINCT
Region & '|Region' as SalesKey, // Matching the budget fact
Region,
Country
RESIDENT Dim_Customer;
Explanation: The dimension now contains both customer entries (detailed) and region entries (aggregated). Both fact granularities can work with it. Note: LOAD DISTINCT assumes each region belongs to exactly one country – otherwise the region-level key would not be unique.
Result: You can now compare sales and budget even though they have different granularities. Qlik aggregates correctly automatically!
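In the frontend, set analysis on the IsBudget flag (field names as defined in the script above) lets you put actuals and budget side by side – a sketch:

```
// Budget: only rows flagged as budget
Sum({<IsBudget = {1}>} Amount)

// Actual sales: everything that is not flagged as budget
Sum(Amount) - Sum({<IsBudget = {1}>} Amount)
// (or add "0 as IsBudget" to the sales load
//  and use Sum({<IsBudget = {0}>} Amount))
```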
How do your design decisions affect performance?
Your design decisions have a massive impact on performance. The star-schema design patterns from the previous article apply directly here; the table below compares three modeling approaches for roughly one million rows.
| Design | Load Time (1M rows) | Memory | Expression Speed |
|---|---|---|---|
| Star Schema (Facts + Dims) | 15-25 sec | ~150-200 MB | 0.5-2 sec |
| Denormalized (one table) | 8-12 sec | ~400-600 MB | 0.3-1 sec |
| Snowflake (over-normalized) | 25-40 sec | ~180-250 MB | 1-4 sec |
Recommendation: Star Schema is the best compromise – acceptable load time, optimal memory, good performance. For more details see Star Schema in Qlik. For additional guidance, refer to the official Qlik data modeling best practices.
Which naming conventions improve clarity in Fact vs Dimension?
Clear names help you and your team immediately recognize what is what:
// Always prefix facts with "Facts_"
Facts_Sales:
Facts_Inventory:
Facts_Budget:
// Always prefix dimensions with "Dim_"
Dim_Customer:
Dim_Product:
Dim_Calendar:
Benefits:
- In the data model viewer you can see at a glance which tables are facts and which are dimensions
- Faster navigation when debugging
- New team members understand the structure instantly
Checklist: Is your design optimal?
Check your data model with these questions:
Facts:
- [ ] Contains only keys, measures, and timestamps?
- [ ] No redundant attributes (names, descriptions)?
- [ ] Granularity as detailed as needed for analyses?
- [ ] Measures are numeric and aggregatable?
Dimensions:
- [ ] Contains only descriptive attributes?
- [ ] No measures (Sum, Avg, Count)?
- [ ] Primary key is unique (no duplicates)?
- [ ] Hierarchies are denormalized (Region + Country + Continent in one table)?
Overall model:
- [ ] Facts and dimensions connect via IDs?
- [ ] No synthetic keys?
- [ ] No circular references?
What are the next steps for Fact vs Dimension design decisions?
You now have the foundations for clean fact/dimension designs. Next steps:
1. Many-to-many relationships: When a customer has multiple addresses or a product has multiple categories – how do you model that? Link Tables for complex scenarios shows you the solution.
2. Historization: A customer was previously “Bronze” and is now “Gold” – how do you keep the history? Slowly Changing Dimensions explains how.
3. Temporal data: How do you model validity periods and point-in-time analyses? Temporal Data & IntervalMatch is the next step.
What related topics are covered in the Fact vs Dimension course?
- Star Schema in Qlik – Performance & Clarity – The foundation for facts and dimensions
- Resolving Synthetic Keys & Circular References – Avoiding problems
- Link Tables for Many-to-Many Relationships – Modeling complex relationships
Previous: Star Schema | Next: Link Tables