Integrating ClickHouse with Kafka using Named Collections

Introduction

In this guide, we will explore how to connect ClickHouse to Kafka using named collections. Defining named collections in the server configuration file offers several advantages:

Centralized and easier management of configuration settings.
Changes to settings can be made without altering SQL table definitions.
Easier review and troubleshooting of configurations by inspecting a single configuration file.

This guide has been tested on Apache Kafka 3.4.1 and ClickHouse 24.5.1.

Assumptions

This document assumes you have:

A working Kafka cluster.
A ClickHouse cluster set up and running.
Basic knowledge of SQL and familiarity with ClickHouse and Kafka configurations.

Prerequisites

Ensure the user creating the named collection has the necessary access permissions:
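As a minimal sketch, the relevant user settings might look like the fragment below. It assumes the named collection is managed by the `default` user and that the fragment is placed in a user configuration file such as one under `/etc/clickhouse-server/users.d/`; both the user name and the file location are assumptions to adapt to your setup.

```xml
<clickhouse>
    <users>
        <!-- "default" is an assumed user name; use the user that manages named collections -->
        <default>
            <!-- Allow SQL-driven access management for this user -->
            <access_management>1</access_management>
            <!-- Allow creating, altering, and dropping named collections -->
            <named_collection_control>1</named_collection_control>
            <!-- Allow listing named collections and viewing their secret values -->
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
        </default>
    </users>
</clickhouse>
```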

Refer to the User Management Guide for more details on enabling access control.

Configuration

Add the following section to your ClickHouse config.xml file:
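The fragment below is a sketch of what such a section could look like for two Kafka clusters, not a definitive configuration. The collection names `cluster_1` and `cluster_2`, the broker addresses, the topic names, the `JSONEachRow` format, and the credentials are all placeholder assumptions; only the consumer group names are taken from this guide. The `<named_collections>` element goes inside the root `<clickhouse>` element of config.xml.

```xml
<named_collections>
    <cluster_1>
        <!-- ClickHouse Kafka engine parameters (hosts and topic are placeholders) -->
        <kafka_broker_list>c1-kafka-1:9094,c1-kafka-2:9094,c1-kafka-3:9094</kafka_broker_list>
        <kafka_topic_list>cluster_1_clickhouse_topic</kafka_topic_list>
        <kafka_group_name>cluster_1_clickhouse_consumer</kafka_group_name>
        <kafka_format>JSONEachRow</kafka_format>

        <!-- Extended librdkafka options; dots in option names become underscores -->
        <kafka>
            <security_protocol>SASL_SSL</security_protocol>
            <sasl_mechanism>PLAIN</sasl_mechanism>
            <sasl_username>kafka-client</sasl_username>
            <sasl_password>CHANGE_ME</sasl_password>
        </kafka>
    </cluster_1>

    <cluster_2>
        <!-- Same layout for the second Kafka cluster -->
        <kafka_broker_list>c2-kafka-1:9094,c2-kafka-2:9094,c2-kafka-3:9094</kafka_broker_list>
        <kafka_topic_list>cluster_2_clickhouse_topic</kafka_topic_list>
        <kafka_group_name>cluster_2_clickhouse_consumer</kafka_group_name>
        <kafka_format>JSONEachRow</kafka_format>

        <kafka>
            <security_protocol>SASL_SSL</security_protocol>
            <sasl_mechanism>PLAIN</sasl_mechanism>
            <sasl_username>kafka-client</sasl_username>
            <sasl_password>CHANGE_ME</sasl_password>
        </kafka>
    </cluster_2>
</named_collections>
```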

Configuration Notes

Adjust Kafka addresses and related configurations to match your Kafka cluster setup.
The settings that appear before the <kafka> element are ClickHouse Kafka engine parameters. For a full list of parameters, refer to the Kafka engine parameters.
The settings inside the <kafka> element are extended Kafka configuration options that are passed to librdkafka. For more options, refer to the librdkafka configuration.
This example uses the SASL_SSL security protocol and PLAIN mechanism. Adjust these settings based on your Kafka cluster configuration.

Creating Tables and Databases

Create the necessary databases and tables on your ClickHouse cluster. If you run ClickHouse as a single node, omit the ON CLUSTER clause from each SQL command and use a non-replicated engine such as MergeTree instead of ReplicatedMergeTree.

Create the Database
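A minimal sketch of this step, assuming a database named `kafka_testing` and a ClickHouse cluster named `my_cluster` (both names are hypothetical placeholders):

```sql
-- Create the database on every node of the (assumed) cluster my_cluster
CREATE DATABASE IF NOT EXISTS kafka_testing ON CLUSTER my_cluster;
```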

Create Kafka Tables

Create the first Kafka table for the first Kafka cluster:
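One way this could look, assuming a named collection called `cluster_1`, a database `kafka_testing`, a cluster `my_cluster`, and a simple three-column message schema (all of these names and columns are illustrative assumptions):

```sql
-- The Kafka engine takes its broker list, topic, group, and format
-- from the named collection cluster_1 instead of inline settings.
CREATE TABLE kafka_testing.first_kafka_table ON CLUSTER my_cluster
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = Kafka(cluster_1);
```

Because the connection details live in the named collection, the table definition does not need to change when broker addresses or credentials do.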

Create the second Kafka table for the second Kafka cluster:
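A matching sketch for the second cluster, assuming a named collection called `cluster_2` and the same hypothetical database, cluster, and column names:

```sql
-- Same schema as the first Kafka table, but bound to the named collection cluster_2
CREATE TABLE kafka_testing.second_kafka_table ON CLUSTER my_cluster
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = Kafka(cluster_2);
```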

Create Replicated Tables

Create a table for the first Kafka table:
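A minimal sketch of the destination table, assuming the hypothetical names `kafka_testing` and `my_cluster` and the same illustrative columns as the Kafka table. Calling `ReplicatedMergeTree()` without arguments assumes the server has a default replica path and replica name configured (typically through the `{shard}` and `{replica}` macros); otherwise pass the ZooKeeper path and replica name explicitly.

```sql
-- Persistent storage for rows consumed from the first Kafka table
CREATE TABLE kafka_testing.first_replicated_table ON CLUSTER my_cluster
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = ReplicatedMergeTree()
ORDER BY id;
```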

Create a table for the second Kafka table:
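The second destination table can follow the same pattern. As before, the database, cluster, and column names are placeholder assumptions, and the argument-less `ReplicatedMergeTree()` relies on default replication settings being present on the server.

```sql
-- Persistent storage for rows consumed from the second Kafka table
CREATE TABLE kafka_testing.second_replicated_table ON CLUSTER my_cluster
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = ReplicatedMergeTree()
ORDER BY id;
```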

Create Materialized Views

Create a materialized view to insert data from the first Kafka table into the first replicated table:
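A sketch of such a view, assuming the hypothetical table names `first_kafka_table` and `first_replicated_table` in a database `kafka_testing` on a cluster `my_cluster`; the view name `cluster_1_mv` is likewise an assumption:

```sql
-- Each block read from the Kafka table is written to the replicated table
CREATE MATERIALIZED VIEW kafka_testing.cluster_1_mv ON CLUSTER my_cluster
TO kafka_testing.first_replicated_table AS
SELECT
    id,
    first_name,
    last_name
FROM kafka_testing.first_kafka_table;
```

Once the view exists, ClickHouse starts consuming from the topic in the background; no explicit SELECT on the Kafka table is needed.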

Create a materialized view to insert data from the second Kafka table into the second replicated table:
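The second view can mirror the first. The names `cluster_2_mv`, `second_kafka_table`, `second_replicated_table`, `kafka_testing`, and `my_cluster` are all illustrative assumptions:

```sql
-- Each block read from the second Kafka table is written to the second replicated table
CREATE MATERIALIZED VIEW kafka_testing.cluster_2_mv ON CLUSTER my_cluster
TO kafka_testing.second_replicated_table AS
SELECT
    id,
    first_name,
    last_name
FROM kafka_testing.second_kafka_table;
```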

Verifying the Setup

You should now see the corresponding consumer groups on your Kafka clusters:

cluster_1_clickhouse_consumer on cluster_1
cluster_2_clickhouse_consumer on cluster_2

Run the following queries on any of your ClickHouse nodes to see the data in both tables:
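For example, assuming the destination tables are named `first_replicated_table` and `second_replicated_table` in a database `kafka_testing` (hypothetical names), queries along these lines would show recently ingested rows:

```sql
-- Sample a few rows from each destination table
SELECT * FROM kafka_testing.first_replicated_table LIMIT 10;
SELECT * FROM kafka_testing.second_replicated_table LIMIT 10;
```

If either query returns no rows, check that messages are being produced to the topic and that the consumer group appears on the Kafka side.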

Note

In this guide, the same data is ingested into both Kafka topics. In a real deployment, the topics would normally carry different data. You can add as many Kafka clusters as you want.

Example output:

This completes the setup for integrating ClickHouse with Kafka using named collections. Because the Kafka settings are centralized in the ClickHouse config.xml file, you can review and adjust them in one place without changing the table definitions.
