
Anonymized Web Analytics Data

This dataset consists of two tables of anonymized web analytics data: hits (hits_v1) and visits (visits_v1).

The tables can be downloaded as xz-compressed TSV files (.tsv.xz). In addition to the sample used in this document, an extended (7.5GB) version of the hits table containing 100 million rows is available as TSV at https://datasets.clickhouse.com/hits/tsv/hits_100m_obfuscated_v1.tsv.xz.

Download and ingest the data

Download the hits compressed TSV file:
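
The download step might look like the following minimal sketch. It assumes the sample file is published alongside the 100-million-row file at the URL shown (the URL and the output name hits_v1.tsv are assumptions, not taken from the text above) and that curl and xz are installed. The work is wrapped in a function so that nothing is downloaded until you call it.

```shell
# Assumed URL for the sample hits file; adjust it if the file lives elsewhere.
HITS_URL="https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz"
# Local name for the decompressed file (chosen for this example).
HITS_TSV="hits_v1.tsv"

# Stream the archive through xz so the compressed copy never touches disk.
download_hits() {
    curl --fail --location "$HITS_URL" | xz --decompress --stdout > "$HITS_TSV"
}
```

Call download_hits to fetch the file, then inspect the result with ls -lh "$HITS_TSV".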

Create the database and table
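
As a sketch of the database step, the snippet below builds a CREATE DATABASE statement and sends it with clickhouse-client. The database name datasets is an assumption made for these examples; any name works as long as the later steps use the same one. A ClickHouse server reachable with the client's default connection settings is also assumed.

```shell
# Database name used in this example (an assumption; pick any name).
CH_DATABASE="datasets"
CREATE_DB_SQL="CREATE DATABASE IF NOT EXISTS ${CH_DATABASE}"

# Send the statement to the server with clickhouse-client.
create_database() {
    clickhouse-client --query "$CREATE_DB_SQL"
}
```

IF NOT EXISTS makes the step safe to repeat, which helps when the ingest is scripted and re-run.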

For hits_v1

Or for hits_100m_obfuscated

Import the hits data:
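
One way to load the file is to pipe it into an INSERT statement, sketched below. The table name datasets.hits_v1 and the file name hits_v1.tsv are assumptions carried through these examples, and the table must already exist. The max_insert_block_size value is illustrative, not a tuned recommendation.

```shell
# File and table names assumed for this example.
HITS_TSV="hits_v1.tsv"
HITS_TABLE="datasets.hits_v1"
INSERT_HITS_SQL="INSERT INTO ${HITS_TABLE} FORMAT TSV"

# Feed the TSV file to the INSERT statement on standard input.
import_hits() {
    clickhouse-client --query "$INSERT_HITS_SQL" \
        --max_insert_block_size=100000 < "$HITS_TSV"
}
```

Call import_hits once the table has been created.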

Verify the count of rows
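
A simple check, sketched here, compares the number of lines in the TSV file with the row count reported by the server. It relies on the TSV format writing one row per line (newlines inside values are escaped) and assumes the file ends with a newline. The helper name verify_count is hypothetical.

```shell
# Compare the line count of a TSV file with the row count of a table.
# Usage: verify_count FILE TABLE
verify_count() {
    expected=$(wc -l < "$1" | tr -d ' ')
    actual=$(clickhouse-client --query "SELECT count() FROM $2")
    if [ "$expected" = "$actual" ]; then
        echo "OK: $actual rows"
    else
        echo "MISMATCH: file has $expected lines, table has $actual rows"
        return 1
    fi
}
```

For example, verify_count hits_v1.tsv datasets.hits_v1 checks the hits import, assuming those file and table names.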

Download the visits compressed TSV file:
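
The visits download can follow the same pattern as the hits download. In this sketch the URL and the output name visits_v1.tsv are assumptions; curl and xz are assumed to be installed.

```shell
# Assumed URL for the sample visits file; adjust it if the file lives elsewhere.
VISITS_URL="https://datasets.clickhouse.com/visits/tsv/visits_v1.tsv.xz"
# Local name for the decompressed file (chosen for this example).
VISITS_TSV="visits_v1.tsv"

# Stream the archive and decompress it on the fly.
download_visits() {
    curl --fail --location "$VISITS_URL" | xz --decompress --stdout > "$VISITS_TSV"
}
```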

Create the visits table

Import the visits data
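
A sketch of the visits import, assuming the table is named datasets.visits_v1, already exists, and the decompressed file is visits_v1.tsv. The max_insert_block_size value is illustrative.

```shell
# File and table names assumed for this example.
VISITS_TSV="visits_v1.tsv"
VISITS_TABLE="datasets.visits_v1"
INSERT_VISITS_SQL="INSERT INTO ${VISITS_TABLE} FORMAT TSV"

# Feed the TSV file to the INSERT statement on standard input.
import_visits() {
    clickhouse-client --query "$INSERT_VISITS_SQL" \
        --max_insert_block_size=100000 < "$VISITS_TSV"
}
```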

Verify the count

An example JOIN

The hits and visits datasets are used in the ClickHouse test routines. This is one of the queries from the test suite. The rest of the tests are referenced in the Next Steps section at the end of this page.
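
The following is a sketch of a JOIN of this kind, not a verbatim copy of the test-suite query. It counts hits per day, sums visits per day, and joins the two on the date. The table names (datasets.hits_v1, datasets.visits_v1) and the column names (EventDate, StartDate, Sign) are assumptions about the schema; adjust them to match your tables.

```shell
# Daily hits joined with daily visits, top 10 days by hits.
# Table and column names are assumptions about the schema.
JOIN_SQL="
SELECT EventDate, hits, visits
FROM
(
    SELECT EventDate, count() AS hits
    FROM datasets.hits_v1
    GROUP BY EventDate
) AS h
ANY LEFT JOIN
(
    SELECT StartDate AS EventDate, sum(Sign) AS visits
    FROM datasets.visits_v1
    GROUP BY EventDate
) AS v
USING (EventDate)
ORDER BY hits DESC
LIMIT 10
"

# Run the query and print the result as a compact table.
run_join() {
    clickhouse-client --query "$JOIN_SQL" --format PrettyCompact
}
```

Each side is aggregated to one row per date before the join, so the join itself touches at most one row per day from each subquery.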

Next Steps

A Practical Introduction to Sparse Primary Indexes in ClickHouse uses the hits dataset to discuss the differences in ClickHouse indexing compared to traditional relational databases, how ClickHouse builds and uses a sparse primary index, and indexing best practices.

Additional examples of queries to these tables can be found among the ClickHouse stateful tests.

Note

The test suite uses a database named test, and the tables are named hits and visits. You can rename your database and tables, or edit the SQL from the test file.
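
If you would rather edit the SQL than rename anything, a small filter like the sketch below can rewrite the table names as the query is read. The target names datasets.hits_v1 and datasets.visits_v1 are assumptions, and the function name rewrite_test_sql is hypothetical. The substitution is purely textual, so review the output before running queries that mention similarly named objects.

```shell
# Rewrite test-suite table names (test.hits, test.visits) to the names
# assumed in this example. Reads SQL on stdin, writes SQL to stdout.
rewrite_test_sql() {
    sed -e 's/test\.hits/datasets.hits_v1/g' \
        -e 's/test\.visits/datasets.visits_v1/g'
}
```

For example, rewrite_test_sql < query.sql | clickhouse-client runs a single rewritten query, where query.sql stands for a file you saved from the test suite.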