Integrating ClickHouse with Kafka using Named Collections
Introduction
In this guide, we will explore how to connect ClickHouse to Kafka using named collections. Using the configuration file for named collections offers several advantages:
- Centralized and easier management of configuration settings.
- Changes to settings can be made without altering SQL table definitions.
- Easier review and troubleshooting of configurations by inspecting a single configuration file.
This guide has been tested on Apache Kafka 3.4.1 and ClickHouse 24.5.1.
Assumptions
This document assumes you have:
- A working Kafka cluster.
- A ClickHouse cluster set up and running.
- Basic knowledge of SQL and familiarity with ClickHouse and Kafka configurations.
Prerequisites
Ensure the user creating the named collection has the necessary access permissions.
Refer to the User Management Guide for more details on enabling access control.
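As a sketch, assuming the permissions are granted through user-level settings in users.xml (or a file under users.d), and using the default user only as a hypothetical example, the relevant entries could look like this:

```xml
<clickhouse>
    <users>
        <!-- Hypothetical example user: replace with the user that creates the named collection -->
        <default>
            <!-- Enables SQL-driven access control for this user -->
            <access_management>1</access_management>
            <!-- Allows this user to manage named collections -->
            <named_collection_control>1</named_collection_control>
            <!-- Allows this user to list named collections and view their secret values -->
            <show_named_collections>1</show_named_collections>
            <show_named_collections_secrets>1</show_named_collections_secrets>
        </default>
    </users>
</clickhouse>
```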
Configuration
Add a named collections section to your ClickHouse config.xml file, defining one collection for each Kafka cluster.
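A minimal sketch of such a section follows. The collection names cluster_1 and cluster_2, the broker addresses, topic names, and credentials are hypothetical placeholders. The consumer group names match the ones this guide checks for later. The nested <kafka> element holds librdkafka options, with the dots in librdkafka property names written as underscores.

```xml
<clickhouse>
    <named_collections>
        <!-- Hypothetical collection for the first Kafka cluster -->
        <cluster_1>
            <!-- ClickHouse Kafka engine parameters -->
            <kafka_broker_list>c1-kafka-1:9094,c1-kafka-2:9094,c1-kafka-3:9094</kafka_broker_list>
            <kafka_topic_list>cluster_1_clickhouse_topic</kafka_topic_list>
            <kafka_group_name>cluster_1_clickhouse_consumer</kafka_group_name>
            <kafka_format>JSONEachRow</kafka_format>
            <kafka_num_consumers>1</kafka_num_consumers>

            <!-- Extended librdkafka options (security.protocol becomes security_protocol, and so on) -->
            <kafka>
                <security_protocol>SASL_SSL</security_protocol>
                <sasl_mechanism>PLAIN</sasl_mechanism>
                <sasl_username>kafka-client</sasl_username>
                <sasl_password>change-me</sasl_password>
                <auto_offset_reset>latest</auto_offset_reset>
            </kafka>
        </cluster_1>

        <!-- Hypothetical collection for the second Kafka cluster, same layout -->
        <cluster_2>
            <kafka_broker_list>c2-kafka-1:9094,c2-kafka-2:9094,c2-kafka-3:9094</kafka_broker_list>
            <kafka_topic_list>cluster_2_clickhouse_topic</kafka_topic_list>
            <kafka_group_name>cluster_2_clickhouse_consumer</kafka_group_name>
            <kafka_format>JSONEachRow</kafka_format>
            <kafka_num_consumers>1</kafka_num_consumers>

            <kafka>
                <security_protocol>SASL_SSL</security_protocol>
                <sasl_mechanism>PLAIN</sasl_mechanism>
                <sasl_username>kafka-client</sasl_username>
                <sasl_password>change-me</sasl_password>
                <auto_offset_reset>latest</auto_offset_reset>
            </kafka>
        </cluster_2>
    </named_collections>
</clickhouse>
```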
Configuration Notes
- Adjust Kafka addresses and related configurations to match your Kafka cluster setup.
- The settings before the <kafka> element are ClickHouse Kafka engine parameters. For a full list of parameters, refer to the Kafka engine parameters.
- The settings within the <kafka> element are extended Kafka configuration options. For more options, refer to the librdkafka configuration.
- This example uses the SASL_SSL security protocol and the PLAIN mechanism. Adjust these settings based on your Kafka cluster configuration.
Creating Tables and Databases
Create the necessary databases and tables on your ClickHouse cluster. If you run ClickHouse as a single node, omit the ON CLUSTER clause from the SQL commands and use a non-replicated engine, such as MergeTree, instead of ReplicatedMergeTree.
Create the Database
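A sketch, assuming a database named kafka_testing and a ClickHouse cluster named LAB_CLICKHOUSE_CLUSTER defined in remote_servers (both names are hypothetical):

```sql
-- Hypothetical database and cluster names.
-- On a single node, drop the ON CLUSTER clause.
CREATE DATABASE IF NOT EXISTS kafka_testing ON CLUSTER LAB_CLICKHOUSE_CLUSTER;
```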
Create Kafka Tables
Create the first Kafka table for the first Kafka cluster.
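A sketch, assuming a named collection called cluster_1 is defined in config.xml and reusing the hypothetical kafka_testing database and LAB_CLICKHOUSE_CLUSTER cluster names. Passing the collection name as the only engine argument makes the table take all of its Kafka settings from the collection. The column list is also hypothetical and must match the fields of the messages in the topic.

```sql
-- Kafka engine table: reads the topic configured in the cluster_1 named collection.
-- Table, cluster, and column names are hypothetical.
CREATE TABLE kafka_testing.first_kafka_table ON CLUSTER LAB_CLICKHOUSE_CLUSTER
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = Kafka(cluster_1);
```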
Create the second Kafka table for the second Kafka cluster.
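The same sketch for the second cluster only changes the table name and the collection name, assuming a hypothetical named collection called cluster_2 is defined in config.xml:

```sql
-- Kafka engine table: reads the topic configured in the cluster_2 named collection.
-- Table, cluster, and column names are hypothetical.
CREATE TABLE kafka_testing.second_kafka_table ON CLUSTER LAB_CLICKHOUSE_CLUSTER
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = Kafka(cluster_2);
```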
Create Replicated Tables
Create a replicated table to store the data consumed by the first Kafka table.
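A sketch with the same hypothetical names and columns. It assumes the {shard} and {replica} macros are configured on each node, so that ReplicatedMergeTree can use its default Keeper path and replica name without explicit arguments:

```sql
-- Persistent, replicated storage for rows read from first_kafka_table.
-- Assumes {shard} and {replica} macros are defined in the server config.
CREATE TABLE kafka_testing.first_replicated_table ON CLUSTER LAB_CLICKHOUSE_CLUSTER
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = ReplicatedMergeTree
ORDER BY id;
```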
Create a replicated table to store the data consumed by the second Kafka table.
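The corresponding sketch for the second table, under the same assumptions (hypothetical names, {shard} and {replica} macros configured):

```sql
-- Persistent, replicated storage for rows read from second_kafka_table.
CREATE TABLE kafka_testing.second_replicated_table ON CLUSTER LAB_CLICKHOUSE_CLUSTER
(
    id UInt32,
    first_name String,
    last_name String
)
ENGINE = ReplicatedMergeTree
ORDER BY id;
```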
Create Materialized Views
Create a materialized view to insert data from the first Kafka table into the first replicated table.
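A sketch with hypothetical view and table names. The TO clause sends every block the view reads from the Kafka table into the replicated table, so the view itself stores no data:

```sql
-- Forwards rows consumed by first_kafka_table into first_replicated_table.
CREATE MATERIALIZED VIEW kafka_testing.cluster_1_mv ON CLUSTER LAB_CLICKHOUSE_CLUSTER
TO kafka_testing.first_replicated_table
AS SELECT
    id,
    first_name,
    last_name
FROM kafka_testing.first_kafka_table;
```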
Create a materialized view to insert data from the second Kafka table into the second replicated table.
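The same pattern for the second cluster, again with hypothetical names:

```sql
-- Forwards rows consumed by second_kafka_table into second_replicated_table.
CREATE MATERIALIZED VIEW kafka_testing.cluster_2_mv ON CLUSTER LAB_CLICKHOUSE_CLUSTER
TO kafka_testing.second_replicated_table
AS SELECT
    id,
    first_name,
    last_name
FROM kafka_testing.second_kafka_table;
```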
Verifying the Setup
You should now see the corresponding consumer groups on your Kafka clusters:
- cluster_1_clickhouse_consumer on cluster_1
- cluster_2_clickhouse_consumer on cluster_2
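You can also check from the ClickHouse side. Assuming ClickHouse 23.8 or later, which provides the system.kafka_consumers table, the following sketch lists the consumers that each Kafka table has created on the node where you run it (the database name kafka_testing is hypothetical):

```sql
-- Local to the node: run on each replica to see its own consumers.
SELECT
    database,
    table,
    consumer_id,
    assignments.topic
FROM system.kafka_consumers
WHERE database = 'kafka_testing';
```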
Query both replicated tables from any of your ClickHouse nodes to see the ingested data.
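For example, assuming the hypothetical table names kafka_testing.first_replicated_table and kafka_testing.second_replicated_table:

```sql
-- Because the tables are replicated, any node returns the same rows.
SELECT * FROM kafka_testing.first_replicated_table ORDER BY id LIMIT 10;
SELECT * FROM kafka_testing.second_replicated_table ORDER BY id LIMIT 10;
```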
Note
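In this guide, both Kafka topics contain the same data; in a real deployment, they will usually differ. You can add as many Kafka clusters as you need by defining an additional named collection, Kafka table, replicated table, and materialized view for each one.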
This completes the setup for integrating ClickHouse with Kafka using named collections. Because the Kafka configuration lives in the ClickHouse config.xml file, you can review and adjust connection settings in one place without changing any table definitions.