Qlik QVDs and Why You Should Be Using Them
note: Hi hello and happy timezome, I’m gonna’ talk about QVDs in Qlik today. This tutorial assumes you have a basic familiarity with the Data Load Editor
What are QVDs?
Behind the scenes a QVD file contains three key components
- An XML header with metadata
- Symbol tables for efficient storage of field values
- The actual data an optimized format
note: QVDs are proprietary file formats created by Qlik. They serve as an intermediate storage format between your source systems (SQL Server, Excel Files, etc.) and your Qlik applications. [READ SLIDE]
Why Use QVDs?
- SPEED
- Reduced Server Load
- Data Transformation Layer
note: ONE—QVDs load data 10-100 times faster than most other data formats or direct connections to databases. QVDs are designed for Qlik, this means for any larger dataset this is the choice. TWO—By extracting data once and storing it as QVDs, it minimizes the load on source systems. Instead of multiple Qlik applications repeatedly querying your databases, they can access the pre-extracted QVDs. THREE—QVDs enable a structured ETL process with clear separation between extraction, transformation, and loading phases
An app before the app
Having a Qlik app to generate QVDs for your production app
note: Because QVDs are so powerful, this leads to the idea of having an app to generate data for the production app. This is a great practice as it has a separate, logical layer for all ETL steps, meaning the Production app can focus purely on the visuals and front-end, leading to less to maintain. This is a structured pipeline approach. That’s beauty
Optimized Loads Actions that COULD Break Them :)
These actions will break optimization and result in a standard load
- Concatenating fields together
- Applying functions to fields during the load, like as Date(), Upper(), or Num()
- Using WHERE clauses to filter the data
- Using MAPPING loads
note: QVDs allow for an ‘optimized load’, which can speed up reading in a large complicated file. I will say I have ran into trial and error with some of these rules, I went to an official Qlik presentation and they also talked about “Just play around and see what breaks it :)“
Practical Framework Example
- Load data from source
- Save data as QVD
- Load data from QVD
- Transform data
- Save data as QVD
- Load data from QVD
note: These 6 steps above are the ones I follow in my app before the app to generate QVDs for production. We’ll walk through each step one by one over in Qlik
1. Load data from source
// Extract from source system
random_data:
LOAD
first_name,
last_name,
integer_column,
float_column
FROM [lib://QVD_Tutorial/random_data_1M.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
TRACE Loaded .csv;
CALL AddTimestamp;
2. Save data as QVD (Extraction Layer)
// Store raw data as QVD
STORE random_data INTO [lib://QVD_Tutorial/random_data.qvd] (qvd);
DROP TABLE random_data;
TRACE Stored QVD;
CALL AddTimestamp;
3. Load data from QVD
// Load the raw QVD using optimized load
random_data:
LOAD *
FROM [lib://QVD_Tutorial/random_data.qvd] (qvd);
TRACE Loaded QVD;
CALL AddTimestamp;
4. Transform data
// Apply transformations
random_data_transformed:
NoConcatenate
LOAD
LOWER(first_name) as first_name,
UPPER(last_name) as last_name,
integer_column - 100 AS integer_column,
float_column
RESIDENT random_data;
DROP TABLE random_data;
5. Save transformed data as QVD
// Store transformed data as QVD
STORE random_data_transformed INTO [lib://QVD_Tutorial/random_data_transformed.qvd] (qvd);
DROP TABLE random_data_transformed;
TRACE Transformations;
CALL AddTimestamp;
6. Load data from transformed QVD for analysis
// Final load for the application
LOAD *
FROM [lib://QVD_Tutorial/random_data_transformed.qvd] (qvd);
TRACE Final Load;
CALL AddTimestamp;
kthxbye
QVDs:
- Dramatically faster data loading
- Reduced strain on source systems
- Structured data management approach
- Support for incremental loading strategies
note: By implementing the layered approach we’ve demonstrated, you create a robust data architecture that separates the E the T and L concerns. This leads to more maintainable applications, faster reload times, and better overall performance. This presentation was made using Obsidian’s Advanced Slides, and my Obsidian Vault with all my notes will be available in the description