Star Schema Benchmark (SSB, 2009)
The Star Schema Benchmark is loosely based on the tables and queries of TPC-H but, unlike TPC-H, it uses a star schema layout.
The bulk of the data sits in a single large fact table, which is surrounded by several small dimension tables.
The queries join the fact table with one or more dimension tables to apply filter criteria, e.g. MONTH = 'JANUARY'.
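The query pattern described above can be sketched with a tiny in-memory example. This is only an illustration using Python's built-in sqlite3 module rather than ClickHouse; the table names (fact_sales, dim_date), the columns, and the rows are hypothetical and not part of the benchmark schema.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE fact_sales (date_key INTEGER, revenue INTEGER);
    INSERT INTO dim_date VALUES (1, 'JANUARY'), (2, 'FEBRUARY');
    INSERT INTO fact_sales VALUES (1, 100), (1, 250), (2, 400);
""")

# Join the fact table to the dimension table and filter on a dimension attribute.
january_revenue = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    WHERE d.month = 'JANUARY'
""").fetchone()[0]

print(january_revenue)
```

The filter is expressed on the small dimension table, while the aggregated measure comes from the fact table; the benchmark queries follow this general shape with more dimensions.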
References:
- Star Schema Benchmark (O'Neil et al.), 2009
- Variations of the Star Schema Benchmark to Test the Effects of Data Skew on Query Performance (Rabl et al.), 2013
First, check out the Star Schema Benchmark repository and compile the data generator:
Then, generate the data. The parameter -s specifies the scale factor. For example, with -s 100, 600 million rows are generated.
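To get a rough feel for other scale factors, the figure above can be extrapolated. The sketch below assumes that the row count grows linearly with the scale factor, which this page does not state explicitly; it most plausibly applies to the fact table only, since dimension tables may scale differently. The function name estimated_rows is hypothetical.

```python
# Row count quoted in the text for -s 100.
ROWS_AT_SF_100 = 600_000_000


def estimated_rows(scale_factor: int) -> int:
    """Estimate the generated row count, assuming linear scaling (an assumption)."""
    return ROWS_AT_SF_100 * scale_factor // 100


for sf in (1, 10, 100, 1000):
    print(f"-s {sf}: about {estimated_rows(sf):,} rows")
```

Treat the result as an order-of-magnitude estimate for planning disk space and load time, not as an exact count.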
Now create tables in ClickHouse:
The data can be imported as follows:
In many use cases of ClickHouse, multiple tables are converted into a single denormalized flat table. This step is optional; the queries below are listed both in their original form and in a form rewritten for the denormalized table.
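The idea behind denormalization can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than ClickHouse, and the tables (fact_sales, dim_date, sales_flat) and their contents are hypothetical. The point is that a query against the star schema and its rewritten form against the flat table return the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales (date_key INTEGER, revenue INTEGER);
    INSERT INTO dim_date VALUES (1, 1993), (2, 1994);
    INSERT INTO fact_sales VALUES (1, 100), (1, 250), (2, 400);

    -- Denormalize once: copy the dimension attributes onto every fact row.
    CREATE TABLE sales_flat AS
    SELECT f.revenue, d.year
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key;
""")

# Original form: join at query time.
original = conn.execute("""
    SELECT SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    WHERE d.year = 1993
""").fetchone()[0]

# Rewritten form: no join, the filter column already lives in the flat table.
rewritten = conn.execute(
    "SELECT SUM(revenue) FROM sales_flat WHERE year = 1993"
).fetchone()[0]

print(original, rewritten)
```

The flat table trades extra storage for queries that need no join, which is why each query below appears in two forms.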
The queries are generated by ./qgen -s <scaling_factor>. Example queries for s = 100:
Q1.1
Denormalized table:
Q1.2
Denormalized table:
Q1.3
Denormalized table:
Q2.1
Denormalized table:
Q2.2
Denormalized table:
Q2.3
Denormalized table:
Q3.1
Denormalized table:
Q3.2
Denormalized table:
Q3.3
Denormalized table:
Q3.4
Denormalized table:
Q4.1
Denormalized table:
Q4.2
Denormalized table:
Q4.3
Denormalized table: