
Star Schema in Qlik Sense: The Complete Guide to Performance & Clarity

KlarMetrics

October 5, 2025 · 17 min read

This is Article 10 of 28 in the Qlik Sense Data Modeling Course.

Previous Article: Resolving Synthetic Keys & Circular References
Next Article: Fact vs Dimension – Design Decisions

What can you learn about Star Schema in Qlik in 25 minutes?

After this guide you will be able to:

  1. Implement star schema architecture in Qlik for 3-8x better expression performance
  2. Make fact vs dimension decisions that reduce memory consumption by 40-60%
  3. Model clean business hierarchies without circular references or synthetic keys

Time investment: 25 min reading + 4 hrs hands-on
Prerequisite: Knowledge of Synthetic Keys & Circular References and basic table relationships
Quick win: In 15 minutes, build a working fact-dimension model with immediate performance improvement

The Star schema (Wikipedia) is a widely adopted pattern in dimensional modeling. For Qlik-specific guidance, see the Qlik data modeling best practices.

How do you get started with your first Star Schema in Qlik?

The problem: Your revenue data is in denormalized form (one large table) and expressions get slower as data volumes grow.

Solution in 4 steps:

  1. Identify transactional data (facts) vs. descriptive data (dimensions)
  2. Extract dimensions into separate tables with unique keys
  3. Keep only keys and measures in the fact table
  4. Leverage Qlik’s automatic associations between facts and dimensions
// Starting point: Denormalized revenue table
SalesFlat:
LOAD * FROM [lib://Data/sales_flat.csv];
// Problems: 10M rows x 25 columns = redundant data, slow aggregations

// Star Schema: Fact table (only transactions + keys)
Sales_Fact:
LOAD
    SalesID,
    CustomerID,    // FK to Customer Dimension
    ProductID,     // FK to Product Dimension
    DateID,        // FK to Date Dimension
    SalesAmount,   // Measure
    Quantity,      // Measure
    Discount       // Measure
FROM [lib://Data/sales_flat.csv];

// Customer Dimension (descriptive attributes)
Customer_Dim:
LOAD DISTINCT
    CustomerID,           // PK
    CustomerName,
    CustomerSegment,
    CustomerRegion,
    CustomerCountry
FROM [lib://Data/sales_flat.csv];

// Product Dimension (descriptive attributes)
Product_Dim:
LOAD DISTINCT
    ProductID,            // PK
    ProductName,
    ProductCategory,
    ProductSubCategory,
    ProductBrand,
    ProductUnitPrice
FROM [lib://Data/sales_flat.csv];

// Date Dimension (time hierarchies)
Date_Dim:
LOAD DISTINCT
    Date(SalesDate) as DateID,    // PK
    Year(SalesDate) as Year,
    Month(SalesDate) as Month,
    Week(SalesDate) as Week,
    WeekDay(SalesDate) as WeekDay
FROM [lib://Data/sales_flat.csv];

Explanation:

  • Fact table: Contains only transaction keys and numeric measures. High granularity, many rows.
  • Dimension tables: Contain descriptive attributes for facts. Low granularity, few rows.
  • Automatic associations: CustomerID in both tables → Qlik links them automatically without JOINs.
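
Because the association does the join work at selection time, front-end expressions need no JOIN syntax. As a quick illustration (chart expressions against the model above; field names follow the script):

```
// Chart dimension: CustomerRegion (comes from Customer_Dim)
// Chart measure – Qlik resolves the CustomerID association automatically:
Sum(SalesAmount)

// Set analysis also traverses the association, e.g. revenue for one segment:
Sum({<CustomerSegment={'Premium'}>} SalesAmount)
```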

Performance: With 10M rows, a typical expression takes ~45s against the denormalized table vs ~8s against the star schema (5.6x speedup).

Checkpoint: Do you see a central fact table with multiple dimension tables around it in the data model? → Perfect! Still one big table? → Troubleshooting

For systematic modeling, see Fact vs Dimension Design.

Which Star Schema pattern fits your project?

Choose the star schema variant based on data volume, business complexity, and performance requirements:

| Pattern | Fact Rows | Dimensions | Complexity | Performance | Ideal for |
| --- | --- | --- | --- | --- | --- |
| Simple Star | < 5M | 3-8 dims | Low | Good | Standard BI, dashboards, prototypes |
| Multiple Facts | 5M-50M | 5-15 dims | Medium | Very good | Enterprise BI, different granularities |
| Conformed Dims | > 50M | 10-30 dims | High | Optimal | Data warehouse, multi-subject-area, enterprise |

Legend:

  • Fact Rows: Number of transaction rows in the largest fact table
  • Dimensions: Number of separate dimension tables in the model
  • Complexity: Development effort and maintenance overhead
  • Performance: Expression speed and memory efficiency

How does Method 1: Simple Star Schema work in Qlik?

The problem

Flat table structures work for small data volumes but quickly become a performance problem as data grows. Redundant data consumes memory and aggregations must iterate over all columns instead of just relevant facts.

The solution

A central fact table (transactional data) surrounded by multiple dimension tables (master data). Each dimension has a primary key, facts have foreign keys to the dimensions.

// Step 1: Define fact table (granularity = one sales record)
Sales_Fact:
LOAD
    SalesID,              // Primary key of the transaction
    CustomerID,           // FK → Customer_Dim
    ProductID,            // FK → Product_Dim
    Date(SalesDate) as DateID,  // FK → Date_Dim
    EmployeeID,           // FK → Employee_Dim
    SalesAmount,          // Measure: What gets aggregated
    Quantity,             // Measure: What gets aggregated
    Discount,             // Measure: What gets aggregated
    CommissionRate        // Measure: What gets aggregated
FROM [lib://Data/sales_transactions.csv];

// Step 2: Customer Dimension (all customer attributes)
Customer_Dim:
LOAD DISTINCT
    CustomerID,           // Primary Key
    CustomerName,
    CustomerSegment,      // Here: Premium, Standard, Basic
    CustomerRegion,
    CustomerCountry,
    CustomerCreditLimit,
    CustomerSignupDate
FROM [lib://Data/customers.csv];  // Separate master data source

// Step 3: Product Dimension (all product attributes)
Product_Dim:
LOAD DISTINCT
    ProductID,            // Primary Key
    ProductName,
    ProductCategory,      // Here: Electronics, Clothing, Books
    ProductSubCategory,
    ProductBrand,
    ProductUnitPrice,
    ProductCost,
    ProductLaunchDate
FROM [lib://Data/products.csv];   // Separate master data source

// Step 4: Employee Dimension (all employee attributes)
Employee_Dim:
LOAD DISTINCT
    EmployeeID,           // Primary Key
    EmployeeName,
    EmployeeDepartment,   // Here: Sales, Marketing, Support
    EmployeeTitle,
    EmployeeHireDate,
    EmployeeManager
FROM [lib://Data/employees.csv];  // Separate master data source

// Step 5: Date Dimension (time hierarchies for better navigation)
Date_Dim:
LOAD DISTINCT             // one row per date, not one per transaction
    DateID,
    Year(DateID) as Year,
    Month(DateID) as Month,
    MonthName(DateID) as MonthName,
    Quarter(DateID) as Quarter,
    Week(DateID) as Week,
    WeekDay(DateID) as WeekDay,
    Day(DateID) as Day
RESIDENT Sales_Fact;
// Here: Generated from sales data, alternatively: Master Calendar

Explanation of strategy:

  • Fact granularity: One record per transaction/event. Contains only keys (for joins) and measures (for aggregation).
  • Dimension denormalization: All attributes of a business entity in one table (Customer*). Few rows, many columns.
  • Natural keys: Business keys (CustomerID, ProductID) instead of technical surrogate keys for better understandability.

Performance context:

  • 5M facts + 4 dimensions: Sum(SalesAmount) 3-5s vs denormalized 15-25s
  • Memory reduction: 60-80% less RAM through eliminated redundancy
  • Development speed: 2-3x faster expression development through clear structure

The 3 most common mistakes (and solutions)

Mistake 1: Measures in dimension tables

  • Symptom: Aggregations produce wrong totals, measures are double-counted
  • Cause: Numeric values (prices, amounts) were placed in dimensions instead of facts
  • Solution: Rule: What gets aggregated belongs in facts, what describes belongs in dimensions
  • Code: ProductUnitPrice in Product_Dim is OK, SalesAmount only in Sales_Fact

Mistake 2: Over-granular dimensions (snowflake instead of star)

  • Symptom: Many small dimension tables, complex joins, poor performance
  • Cause: Region → Country → Continent as separate tables instead of denormalized
  • Solution: Denormalize hierarchies into one dimension table
  • Code: Customer_Dim with CustomerCountry AND CustomerContinent, not a separate Country_Dim

Mistake 3: Business logic in facts instead of dimensions

  • Symptom: Calculated fields everywhere in the code, hard to maintain, inconsistent
  • Cause: Derived attributes (CustomerSegment, ProductCategory) are calculated in expressions
  • Solution: Move business logic to the script, store results in dimensions
  • Code: If(SalesAmount > 10000, 'Premium', 'Standard') as CustomerSegment in Customer_Dim
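
The rule in Mistake 3 can be sketched in the load script: aggregate once per customer, then store the derived segment in the dimension. The 10,000 threshold and segment names are illustrative, not fixed business rules:

```
// Derive CustomerSegment once in the script (illustrative thresholds)
SegmentMap:
MAPPING LOAD
    CustomerID,
    If(Sum(SalesAmount) > 10000, 'Premium', 'Standard') as Segment
RESIDENT Sales_Fact
GROUP BY CustomerID;

Customer_Dim:
LOAD
    CustomerID,
    CustomerName,
    ApplyMap('SegmentMap', CustomerID, 'Standard') as CustomerSegment
FROM [lib://Data/customers.csv];
```

Charts then simply use CustomerSegment as a dimension; no per-expression If() logic remains.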

Best Practices

  • Fact table granularity rule: One row = one business transaction. Never pre-aggregated (Qlik does that at runtime). Test: Can you reconstruct an invoice/order/event from every fact row? Then granularity is correct.
  • Dimension completeness: Every dimension must contain ALL possible attributes of the business entity. Don’t spread them across multiple tables. CustomerSegment, CustomerRegion, CustomerCredit all in Customer_Dim.
  • Surrogate vs natural keys: Use business keys (CustomerID from ERP) instead of technical ones (AutoIncrement). Better debuggability and business users can understand the data. Surrogate keys are only needed for Slowly Changing Dimensions.
  • Date dimension standardization: Always create a separate date dimension with complete hierarchies (Year, Quarter, Month, Week). Never parse dates in expressions. Prepare centralized date logic (fiscal year, holiday calendar) in the dimension.

Checkpoint: Why should hierarchies (Region → Country → Continent) be in one dimension table instead of separate tables?

Answer:

Separate hierarchy tables create a snowflake schema → more joins → worse performance. Qlik optimizes for star schemas with few large dimensions. Denormalized hierarchies in one table also enable better drill-down performance and simpler expression syntax.

For very complex multi-subject areas, see Multiple Facts Pattern.

→ Next step: When you have different granularities (order-level and order-line-level), you need Multiple Facts. Implement Multiple Facts Star Schema.

How does Method 2: Multiple Facts Star Schema work in Qlik?

The problem

Business processes have different granularities: Order header (1x per order) vs order lines (Nx per order) vs payments (Mx per order). A single fact table cannot optimally answer all business questions.

The solution

Multiple fact tables with different granularities that share common dimension tables. Each fact table is optimized for specific business questions.

// Fact 1: Order Header (granularity = one order)
OrderHeader_Fact:
LOAD
    OrderID,              // PK for this fact
    CustomerID,           // FK → Customer_Dim
    EmployeeID,           // FK → Employee_Dim
    Date(OrderDate) as DateID,  // FK → Date_Dim
    OrderTotalAmount,     // Measure: Total order value
    OrderShippingCost,    // Measure: Shipping costs
    OrderDiscount,        // Measure: Total discount
    OrderItemCount        // Measure: Number of line items
FROM [lib://Data/orders.csv];

// Fact 2: Order Lines (granularity = one order line item)
OrderLine_Fact:
LOAD
    OrderLineID,          // PK for this fact
    OrderID,              // FK → OrderHeader_Fact (drill-through)
    ProductID,            // FK → Product_Dim
    CustomerID,           // FK → Customer_Dim (redundant but performance-optimal)
    Date(OrderDate) as DateID,  // FK → Date_Dim (redundant)
    LineQuantity,         // Measure: Quantity for this line
    LineUnitPrice,        // Measure: Unit price
    LineAmount,           // Measure: Line total
    LineDiscountPercent   // Measure: Discount %
FROM [lib://Data/order_lines.csv];

// Fact 3: Payments (granularity = one payment)
Payment_Fact:
LOAD
    PaymentID,            // PK for this fact
    OrderID,              // FK → OrderHeader_Fact
    CustomerID,           // FK → Customer_Dim (redundant)
    Date(PaymentDate) as PaymentDateID,  // FK → Date_Dim
    PaymentAmount,        // Measure: Payment amount
    PaymentMethod,        // Dimension: Credit Card, Bank Transfer
    PaymentStatus         // Dimension: Completed, Pending, Failed
FROM [lib://Data/payments.csv];

// Shared Dimensions (used by all facts)
Customer_Dim:
LOAD DISTINCT
    CustomerID,           // Shared PK
    CustomerName,
    CustomerSegment,
    CustomerRegion
FROM [lib://Data/customers.csv];

Product_Dim:
LOAD DISTINCT
    ProductID,            // Shared PK
    ProductName,
    ProductCategory,
    ProductBrand
FROM [lib://Data/products.csv];

// Advanced: Date Dimension spanning the full range across all facts
// First determine min/max over the fact dates
TempRange:
LOAD Min(DateID) as MinDate, Max(DateID) as MaxDate
RESIDENT OrderHeader_Fact;

LET vMinDate = Num(Peek('MinDate', 0, 'TempRange'));
LET vMaxDate = Num(Peek('MaxDate', 0, 'TempRange'));
DROP TABLE TempRange;

Date_Dim:
LOAD
    DateID,
    Year(DateID) as Year,
    Month(DateID) as Month,
    Quarter(DateID) as Quarter,
    WeekDay(DateID) as WeekDay;
LOAD Date($(vMinDate) + IterNo() - 1) as DateID
AUTOGENERATE 1
WHILE $(vMinDate) + IterNo() - 1 <= $(vMaxDate);

Explanation of strategy:

  • Granularity optimization: Each fact at the optimal level for typical business questions.
  • Controlled redundancy: CustomerID in all facts for performance, even though derivable via OrderID.
  • Shared dimensions: One Customer_Dim for all facts → consistent navigation and filtering.

Performance context:

  • Order-level analysis (revenue per customer): OrderHeader_Fact 2-4s vs OrderLine_Fact 8-15s
  • Product-level analysis (units sold): OrderLine_Fact 3-6s vs OrderHeader_Fact impossible
  • Payment analysis (collection performance): Payment_Fact 1-3s vs calculated from orders 10-20s

The 3 most common mistakes (and solutions)

Mistake 1: Inconsistent dimensions between facts

  • Symptom: CustomerSegment in OrderHeader_Fact differs from Payment_Fact
  • Cause: Each fact table pulls customer data from different sources/timestamps
  • Solution: One central Customer_Dim table, all facts reference it
  • Code: Never load customer attributes directly into facts, always via Customer_Dim

Mistake 2: Over-normalization – too many granular facts

  • Symptom: 15+ fact tables for every mini-process, complex expressions
  • Cause: Every database table becomes a separate fact without business context
  • Solution: Consolidate facts by business processes, not technical tables
  • Code: OrderLines + OrderCharges + OrderTaxes → one OrderLine_Fact with calculated totals
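
One way to sketch this consolidation, assuming hypothetical order_charges.csv and order_taxes.csv sources keyed by OrderLineID:

```
// Fold per-line charges and taxes into the single OrderLine_Fact
ChargeMap:
MAPPING LOAD OrderLineID, ChargeAmount
FROM [lib://Data/order_charges.csv];      // assumed source

TaxMap:
MAPPING LOAD OrderLineID, TaxAmount
FROM [lib://Data/order_taxes.csv];        // assumed source

OrderLine_Fact:
LOAD
    OrderLineID,
    OrderID,
    ProductID,
    LineAmount,
    ApplyMap('ChargeMap', OrderLineID, 0) as LineChargeAmount,
    ApplyMap('TaxMap', OrderLineID, 0)    as LineTaxAmount
FROM [lib://Data/order_lines.csv];
```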

Mistake 3: Missing bridge tables for many-to-many

  • Symptom: Wrong aggregations for customers with multiple addresses or products with multiple categories
  • Cause: M:N relationships were not modeled as separate link tables
  • Solution: Bridge tables for M:N, allocation factors for correct distribution
  • Code: Customer_Address_Bridge with weight factors for primary/secondary address
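
A weighted bridge for the address example might look like this (table, fields, and source path are hypothetical):

```
// Bridge for customers with multiple addresses (M:N)
Customer_Address_Bridge:
LOAD
    CustomerID,             // FK → Customer_Dim
    AddressID,              // FK → Address_Dim
    AllocationWeight        // e.g. 0.7 primary / 0.3 secondary; must sum to 1 per customer
FROM [lib://Data/customer_addresses.csv];   // assumed source

// In expressions, weight the measure so totals are not inflated:
// Sum(SalesAmount * AllocationWeight)
```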

Best Practices

  • Fact consolidation rule: Maximum 5-8 fact tables per subject area. More confuses business users and complicates cross-fact analysis. Rule: One fact per major business process (Sales, Purchases, Production, Service).
  • Redundant foreign keys: CustomerID in all facts even when derivable via OrderID. Performance gain outweighs normalization. Exception: High-cardinality keys (OrderID in Payment) only when truly needed.
  • Date role-playing: One calendar definition but multiple roles (OrderDate, ShipDate, PaymentDate). Because Qlik associates strictly on field names, duplicate the calendar per role (renaming DateID to PaymentDateID etc.) and filter via role-specific fields: Sum({<Year={2024}>} OrderTotalAmount) vs Sum({<PaymentYear={2024}>} PaymentAmount).
  • Fact-to-fact relationships: Explicitly document how facts relate (Order → OrderLine → Payment). Bridge tables for M:N, drill-through keys for 1:N. Never create direct associations between facts without a bridge.
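
Since Qlik associates on field names, a role-playing calendar is usually implemented by duplicating the calendar per role. A minimal sketch, assuming Date_Dim is already loaded as above:

```
// Role-playing calendar: rename the key and attributes per date role
PaymentDate_Dim:
LOAD
    DateID   as PaymentDateID,   // associates to Payment_Fact.PaymentDateID
    Year     as PaymentYear,
    Month    as PaymentMonth,
    Quarter  as PaymentQuarter
RESIDENT Date_Dim;
```

Expressions then filter per role, e.g. Sum({<PaymentYear={2024}>} PaymentAmount).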

Checkpoint: Why should CustomerID be in both OrderHeader_Fact and OrderLine_Fact, even though it’s derivable via OrderID?

Answer:

Performance and flexibility. Direct CustomerID in OrderLine_Fact enables fast customer-to-product analysis without joining through OrderHeader. Qlik can also perform better indexing and memory optimization. The memory overhead is minimal compared to the performance gain for complex expressions.

For enterprise-scale with 50+ facts, see Conformed Dimensions Pattern.

→ Next step: For enterprise environments with many subject areas, you need Conformed Dimensions. Implement Conformed Dimensions.

What are Conformed Dimensions in Method 3?

The problem

Enterprise environments have multiple subject areas (Sales, Marketing, Finance, Operations) with their own fact tables. Without coordination, inconsistent dimension definitions emerge → business users see different “truths” depending on the data area.

The solution

Central, standardized dimension tables shared by all subject areas. Conformed dimensions ensure a unified business view and enable cross-subject-area analysis.

// Conformed Dimension 1: Master Customer (for all subject areas)
Customer_Conformed:
LOAD
    CustomerID,                    // Standardized Business Key
    CustomerName,
    CustomerSegment,               // Consistent: Premium, Gold, Silver, Bronze
    CustomerRegion,                // Consistent: North, South, East, West
    CustomerCountry,               // Consistent: ISO Country Codes
    CustomerTier,                  // Business Rule: Based on 12-month Revenue
    CustomerSignupDate,
    CustomerStatus,                // Active, Inactive, Suspended
    CustomerCreditRating,          // A, B, C, D (Enterprise-wide Standard)
    CustomerIndustryCode,          // NAICS Standard Classification
    // Calculated fields for enterprise consistency
    If(CustomerTier = 'Enterprise', 1, 0) as IsEnterpriseCustomer,
    If(Year(CustomerSignupDate) = Year(Today()), 1, 0) as IsNewCustomer
FROM [lib://MasterData/Customer_Master.qvd] (qvd);

// Subject Area 1: Sales Facts
Sales_Fact:
LOAD
    SalesID,
    CustomerID,                    // References Customer_Conformed
    ProductID,                     // References Product_Conformed
    Date(SalesDate) as DateID,     // References Date_Conformed
    SalesRepID,                    // References Employee_Conformed
    SalesAmount,
    SalesQuantity,
    SalesDiscount
FROM [lib://Sales/sales_transactions.qvd] (qvd);

// Subject Area 2: Marketing Facts
Campaign_Fact:
LOAD
    CampaignID,
    CustomerID,                    // Same Customer_Conformed as Sales
    Date(CampaignDate) as DateID,  // Same Date_Conformed as Sales
    CampaignType,
    CampaignCost,
    CampaignImpressions,
    CampaignClicks,
    CampaignConversions
FROM [lib://Marketing/campaigns.qvd] (qvd);

// Subject Area 3: Service Facts
Service_Fact:
LOAD
    ServiceTicketID,
    CustomerID,                    // Same Customer_Conformed
    ProductID,                     // Same Product_Conformed
    Date(ServiceDate) as DateID,   // Same Date_Conformed
    ServiceRepID,                  // References Employee_Conformed
    ServiceType,
    ServiceDuration,
    ServiceCost,
    ServiceSatisfactionScore
FROM [lib://Service/service_tickets.qvd] (qvd);

// Conformed Product Dimension (shared across all subject areas)
Product_Conformed:
LOAD
    ProductID,                     // Standardized across all systems
    ProductName,
    ProductCategory,               // L1: Electronics, Clothing, Books
    ProductSubCategory,            // L2: Smartphones, Laptops, Accessories
    ProductBrand,
    ProductLaunchDate,
    ProductStatus,                 // Active, Discontinued, Phase-Out
    ProductProfitMargin,           // Calculated by Finance team
    // Calculated business rules
    If(ProductProfitMargin > 0.4, 'High Margin',
       If(ProductProfitMargin > 0.2, 'Medium Margin', 'Low Margin')) as ProductProfitSegment
FROM [lib://MasterData/Product_Master.qvd] (qvd);

// Conformed Date Dimension (enterprise calendar)
Date_Conformed:
LOAD
    DateID,
    Year(DateID) as Year,
    Month(DateID) as Month,
    Quarter(DateID) as Quarter,
    Week(DateID) as Week,
    WeekDay(DateID) as WeekDay,
    // Enterprise-specific business calendar
    If(Month(DateID) >= 7, Year(DateID), Year(DateID)-1) as FiscalYear,  // Fiscal year starts July
    If(WeekDay(DateID) >= 1 and WeekDay(DateID) <= 5, 1, 0) as IsBusinessDay,
    If(Match(DateID, '2024-01-01', '2024-07-04', '2024-12-25'), 1, 0) as IsHoliday
FROM [lib://MasterData/Calendar_Master.qvd] (qvd);

// Conformed Employee Dimension
Employee_Conformed:
LOAD
    EmployeeID,
    EmployeeName,
    EmployeeDepartment,            // Sales, Marketing, Service, Finance
    EmployeeTitle,
    EmployeeRegion,                // Same regions as Customer_Conformed
    EmployeeHireDate,
    EmployeeManager,
    EmployeeStatus                 // Active, On Leave, Terminated
FROM [lib://MasterData/Employee_Master.qvd] (qvd);

Explanation of strategy:

  • Single source of truth: One Customer_Conformed for Sales, Marketing, Service → unified customer view.
  • Cross-subject analysis: Customer revenue (Sales) vs marketing spend (Campaigns) vs service cost (Support) → ROI analysis possible.
  • Business rule centralization: Fiscal year definition, customer tier logic centralized in dimensions → no code duplication.

Performance context:

  • Cross-subject expressions: Customer LTV (3 facts) 8-15s vs separate models 45-90s
  • Consistent aggregations: CustomerSegment definitions identical across all apps
  • Memory efficiency: Shared dimensions 40-60% less memory than duplicate dimensions

The 3 most common mistakes (and solutions)

Mistake 1: Dimension proliferation – too many “conformed” variants

  • Symptom: Customer_Sales, Customer_Marketing, Customer_Service as separate dimensions
  • Cause: Each team creates their own customer definition because the shared one is too complex
  • Solution: One true Customer_Conformed + subject-area-specific attributes in facts or bridge tables
  • Code: Customer_Marketing_Attributes as a separate table, not Customer_Marketing_Dim

Mistake 2: Governance vacuum – no central dimension ownership

  • Symptom: Conformed dimensions are inconsistently modified by different teams
  • Cause: No clear responsibility for master data management
  • Solution: Data governance board defines dimension owner per conformed dimension
  • Code: Customer_Conformed owned by Sales Ops, Product_Conformed owned by Product Management

Mistake 3: Over-conforming – too many attributes in conformed dimensions

  • Symptom: Customer_Conformed has 50+ columns, becomes unwieldy and slow
  • Cause: Each subject-area team adds “their” customer attributes
  • Solution: Core conformed (10-15 attributes) + subject-area extensions (separate tables)
  • Code: Customer_Core_Conformed (Name, Segment, Region) + Customer_Sales_Extension (Territory, Rep)
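
The core-plus-extension split from Mistake 3 can be sketched as two tables sharing the CustomerID key (the extension source path is an assumption):

```
// Core conformed dimension: universal attributes only
Customer_Core_Conformed:
LOAD
    CustomerID,
    CustomerName,
    CustomerSegment,
    CustomerRegion
FROM [lib://MasterData/Customer_Master.qvd] (qvd);

// Subject-area extension: only the Sales apps load this table;
// it associates automatically via CustomerID
Customer_Sales_Extension:
LOAD
    CustomerID,
    SalesTerritory,
    AssignedSalesRep
FROM [lib://Sales/customer_sales_attributes.qvd] (qvd);  // assumed source
```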

Best Practices

  • Dimension ownership matrix: Explicitly define who maintains which conformed dimension (Customer → Sales Ops, Product → Product Mgmt, Employee → HR). Change requests go through the dimension owner. Prevents chaos and inconsistencies.
  • Core vs extended attributes: Conformed dimensions contain only universally relevant attributes (Name, Status, Hierarchy). Subject-area-specific attributes in extension tables or directly in facts. Keeps dimensions focused and maintainable.
  • Version control for dimensions: Conformed dimensions are critical enterprise assets. Git-based version control, change approval process, rollback capability. Track every change with business justification.
  • Cross-subject validation: Automated tests that customer revenue (Sales) + customer cost (Service) + customer acquisition cost (Marketing) = consistent customer P&L. Validate business logic consistency across subject areas.
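
A lightweight version of this cross-subject validation can run at the end of the reload script; the comparison and tolerance handling here are only a sketch:

```
// Reconciliation totals per subject area
SalesTotal:
LOAD Sum(SalesAmount) as SalesSum RESIDENT Sales_Fact;
LET vSalesSum = Num(Peek('SalesSum', 0, 'SalesTotal'));
DROP TABLE SalesTotal;

ServiceTotal:
LOAD Sum(ServiceCost) as ServiceSum RESIDENT Service_Fact;
LET vServiceSum = Num(Peek('ServiceSum', 0, 'ServiceTotal'));
DROP TABLE ServiceTotal;

TRACE P&L inputs: Sales $(vSalesSum), Service cost $(vServiceSum);
// A full implementation would compare these against Finance figures
// and fail the reload when tolerances are exceeded.
```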

Checkpoint: Why should fiscal year definitions be in Date_Conformed instead of calculated separately in each app?

Answer:

Consistency and performance. If the Sales app calculates FiscalYear differently than the Finance app, you get different “YTD” numbers → business confusion. Central definition in the conformed dimension ensures enterprise-wide consistency. Performance bonus: Calculation happens once in ETL instead of on every expression.

For complete enterprise implementation, see Data Governance Frameworks.

→ Next step: Conformed dimensions require enterprise data governance. Data Governance Framework implements standards and quality assurance.

How do all Star Schema patterns compare in performance?

These benchmarks are based on typical enterprise data volumes (standard hardware: 32 GB RAM, SSD, optimized QVD loads).

| Pattern | Expressions (5M Facts) | Cross-Subject Analysis | Memory (Peak) | Development Speed | Maintenance | Enterprise-Ready |
| --- | --- | --- | --- | --- | --- | --- |
| Denormalized Flat | 15-25s | Not possible | 3.2 GB | Fast | High | No |
| Simple Star | 3-6s | Limited | 1.8 GB | Medium | Medium | For smaller orgs |
| Multiple Facts | 2-4s | Good | 2.1 GB | Medium | Medium-High | Yes |
| Conformed Dims | 2-3s | Excellent | 1.9 GB | Slow | Low | Optimal |

Legend:

  • Expressions: Average time for Sum(Amount) by Customer, Product for 5M fact rows
  • Cross-subject analysis: Ability to calculate customer LTV across Sales + Marketing + Service
  • Memory (Peak): Maximum RAM usage during load + first expression calculation
  • Development speed: Time from requirements to working app
  • Maintenance: Long-term effort for changes, new requirements, debugging

Interpretation: Conformed Dimensions have the highest initial complexity but best long-term performance and maintainability. Multiple Facts is the sweet spot for mid-size enterprise environments. Simple Star for teams that want to get started quickly.

How do you troubleshoot common Star Schema issues in Qlik?

How do aggregations affect totals in Star Schema?

Symptom: Sum(SalesAmount) in star schema differs from the original flat table

Most common causes:

  1. Measures in dimensions instead of facts → Solution: Numeric values only in fact tables, dimensions only for descriptive attributes
  2. Many-to-many without bridge table → Solution: Model M:N relationships via separate bridge tables
  3. Duplicate records in facts → Solution: DISTINCT or GROUP BY in fact load, check for true duplicates

Debug steps:

  1. Compare row count: LET vFactRows = NoOfRows('Sales_Fact'); TRACE Fact rows: $(vFactRows); vs the original table
  2. Test with simple Sum(SalesAmount) without dimensions → should be identical
  3. Check whether measures accidentally ended up in dimension tables

Code fix:

// Wrong: Amount in Dimension
Customer_Dim: LOAD CustomerID, CustomerName, CustomerRevenue FROM customers.csv;

// Correct: Only descriptive attributes in Dimensions
Customer_Dim: LOAD CustomerID, CustomerName, CustomerSegment FROM customers.csv;
// Revenue calculated from Facts: Sum(SalesAmount)
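
For the duplicate-record cause above, a sketch of both fixes (use one or the other, not both):

```
// Fix A: exact duplicate rows (LOAD DISTINCT removes them)
Sales_Fact:
LOAD DISTINCT
    SalesID, CustomerID, ProductID, SalesAmount
FROM [lib://Data/sales_flat.csv];

// Fix B: several legitimate source rows per key; aggregate to the intended grain
Sales_Fact:
LOAD
    SalesID, CustomerID, ProductID,
    Sum(SalesAmount) as SalesAmount
FROM [lib://Data/sales_flat.csv]
GROUP BY SalesID, CustomerID, ProductID;
```

Afterwards, verify that Sum(SalesAmount) matches the source system.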

How do you connect facts to dimensions in Star Schema?

Symptom: Fact and dimension tables stand isolated in the data model, no associations

Most common causes:

  1. Key field name mismatch → Solution: CustomerID in facts must be named exactly CustomerID in dimensions
  2. Data type mismatch → Solution: String vs number keys cannot be automatically linked
  3. Key values not present → Solution: Orphaned records in facts without corresponding dimension records

Debug steps:

  1. Check field names: LET vField = FieldName(1, 'Sales_Fact'); TRACE First fact field: $(vField);
  2. Test key overlap: Count(DISTINCT CustomerID) in both tables
  3. Check data types with sample values

Code fix:

// Ensure consistent data types and field names
Sales_Fact:
LOAD
    Text(CustomerID) as CustomerID,  // Force to text
    SalesAmount
FROM sales.csv;

Customer_Dim:
LOAD
    Text(CustomerID) as CustomerID,  // Same type as Facts
    CustomerName
FROM customers.csv;

Why is performance worse after Star Schema conversion?

Symptom: Expressions take longer than before with the denormalized table

Most common causes:

  1. Too many small dimensions → Solution: Consolidate related dimensions (snowflake → star)
  2. Non-optimized QVD loads → Solution: Check that all tables load as optimized QVDs
  3. Over-normalization → Solution: Denormalize hierarchies into dimensions

Debug steps:

  1. Measure memory usage before/after (app file size, or RAM metrics in the Qlik Management Console / monitoring apps)
  2. Profiling: Which specific expressions have gotten slower?
  3. Check QVD load times: “optimized” vs “non-optimized” in the log

Code fix:

// Consolidate related dimensions
Customer_Dim:
LOAD
    CustomerID,
    CustomerName,
    CustomerRegion,    // Denormalized instead of separate Region_Dim
    CustomerCountry,   // Denormalized instead of separate Country_Dim
    CustomerContinent  // Full hierarchy in one table
FROM customers.csv;

See also: QVD Performance Optimization for memory-optimized star schemas

What are the next steps for Star Schema in Qlik?

You can now build performant, clean star schemas that meet enterprise requirements. For production-ready implementation, dive deeper into these topics:

1. Advanced modeling: Star schemas are the foundation for more complex patterns. Fact vs Dimension Design Decisions shows you how to make optimal business entity assignments.

2. Performance maximization: Clean models need optimized load patterns. QVD Optimization for Star Schemas achieves maximum memory efficiency and load speed.

3. Enterprise governance: Conformed dimensions require data governance. Enterprise Data Governance Framework establishes standards, quality gates, and change management for scalable star schema architectures.

What are the next steps in the Star Schema course?

Phases 1 & 2 complete! You have mastered the fundamentals (articles 1-5) and advanced data processing (articles 6-10).

Upcoming phases:

  • Phase 3 (Articles 11-15): Data Modeling Concepts
  • Phase 4 (Articles 16-20): Advanced Development
  • Phase 5 (Articles 21-28): Enterprise & Operations

Related topics:

Previous: Data Model Problems | Next: Fact vs Dimension Design
