Hive

Not supported in ClickHouse Cloud

The Hive engine allows you to perform SELECT queries on Hive tables stored in HDFS. It currently supports the following input formats:

  • Text: supports only simple scalar column types, except binary.

  • ORC: supports simple scalar column types, except char; among complex types, only types such as array are supported.

  • Parquet: supports all simple scalar column types; among complex types, only types such as array are supported.

Creating a Table

See a detailed description of the CREATE TABLE query.

The table structure can differ from the original Hive table structure:

  • Column names should be the same as in the original Hive table, but you can use only a subset of these columns, in any order. You can also define alias columns calculated from other columns.
  • Column types should be the same as those in the original Hive table.
  • The PARTITION BY expression should be consistent with the original Hive table, and every column used in the PARTITION BY expression should be present in the table structure.

Engine Parameters

  • thrift://host:port — Hive Metastore address

  • database — Remote database name.

  • table — Remote table name.
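
As an illustration of how the three engine parameters and the structure rules fit together, here is a minimal sketch of a table definition. The Metastore address, the database and table names, and all column names are hypothetical; substitute the values for your own Hive deployment.

```sql
-- Sketch only: address, database, table, and column names are hypothetical.
CREATE TABLE hive_events
(
    id Int64,
    name String,
    day String,
    -- Alias column calculated from another column
    name_upper String ALIAS upper(name)
)
ENGINE = Hive('thrift://localhost:9083', 'default', 'events')
PARTITION BY day;
```

Here `day` is assumed to be the partition column of the Hive table, so it appears both in the column list and in the PARTITION BY expression.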

Usage Example

How to Use Local Cache for HDFS Filesystem

We strongly advise you to enable the local cache for remote filesystems. Benchmarks show that it is almost 2x faster with the cache.

Before using the cache, add it to config.xml:

  • enable: If true, ClickHouse maintains a local cache for the remote filesystem (HDFS) after startup.
  • root_dir: Required. The root directory in which to store local cache files for the remote filesystem.
  • limit_size: Required. The maximum size (in bytes) of the local cache files.
  • bytes_read_before_flush: The number of bytes to read before flushing to the local filesystem when downloading a file from the remote filesystem. The default value is 1 MB.
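
A configuration fragment using the four parameters above might look like the following sketch. The enclosing element name `local_cache_for_remote_fs` is not stated in the text above and is an assumption to check against the configuration reference for your ClickHouse version; the directory name and size limit are illustrative values only.

```xml
<!-- Sketch only: the enclosing element name is an assumption; values are illustrative. -->
<local_cache_for_remote_fs>
    <enable>true</enable>
    <root_dir>local_cache</root_dir>
    <!-- Maximum total size of cached files, in bytes (about 1 GiB here) -->
    <limit_size>1073741824</limit_size>
    <!-- 1 MB, the documented default -->
    <bytes_read_before_flush>1048576</bytes_read_before_flush>
</local_cache_for_remote_fs>
```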

When ClickHouse is started with the local cache for remote filesystems enabled, users can still choose not to use the cache for a given query by setting use_local_cache_for_remote_storage = 0. The default value of use_local_cache_for_remote_storage is 1.
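
For example, a single query can bypass the cache with a SETTINGS clause. The table name `test.test_orc` is hypothetical and stands for any table that uses the Hive engine.

```sql
-- Read directly from HDFS for this query only; the table name is hypothetical.
SELECT count()
FROM test.test_orc
SETTINGS use_local_cache_for_remote_storage = 0;
```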

Query Hive Table with ORC Input Format

Create Table in Hive
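
A minimal sketch of a partitioned, ORC-backed Hive table, to be run from the Hive CLI or Beeline. The database, table, and column names are hypothetical, and the column set is kept small on purpose.

```sql
-- HiveQL sketch: all names are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_orc
(
    f_int int,
    f_string string,
    f_array_int array<int>
)
PARTITIONED BY (day string)
STORED AS ORC;
```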

Create Table in ClickHouse

Table in ClickHouse, retrieving data from the Hive table created above:
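
The following sketch assumes a Hive table `test.test_orc` with columns `f_int int`, `f_string string`, `f_array_int array<int>` and a string partition column `day`. These names and the Metastore address are hypothetical.

```sql
-- ClickHouse sketch: names and Metastore address are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_orc
(
    f_int Int32,
    f_string String,
    f_array_int Array(Int32),
    day String
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_orc')
PARTITION BY day;

-- Optional setting: tolerate columns that are missing from the ORC files.
SELECT *
FROM test.test_orc
SETTINGS input_format_orc_allow_missing_columns = 1;
```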

Query Hive Table with Parquet Input Format

Create Table in Hive
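
A minimal sketch of a partitioned, Parquet-backed Hive table, to be run from the Hive CLI or Beeline. The database, table, and column names are hypothetical.

```sql
-- HiveQL sketch: all names are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_parquet
(
    f_bigint bigint,
    f_double double,
    f_string string,
    f_array_string array<string>
)
PARTITIONED BY (day string)
STORED AS PARQUET;
```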

Create Table in ClickHouse

Table in ClickHouse, retrieving data from the Hive table created above:
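
The following sketch assumes a Hive table `test.test_parquet` with columns `f_bigint bigint`, `f_double double`, `f_string string`, `f_array_string array<string>` and a string partition column `day`. These names and the Metastore address are hypothetical.

```sql
-- ClickHouse sketch: names and Metastore address are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_parquet
(
    f_bigint Int64,
    f_double Float64,
    f_string String,
    f_array_string Array(String),
    day String
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_parquet')
PARTITION BY day;

-- Optional setting: tolerate columns that are missing from the Parquet files.
SELECT *
FROM test.test_parquet
SETTINGS input_format_parquet_allow_missing_columns = 1;
```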

Query Hive Table with Text Input Format

Create Table in Hive
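
A minimal sketch of a partitioned Hive table stored as plain text, to be run from the Hive CLI or Beeline. Only simple scalar columns are used, since the Text input format does not support complex types. The database, table, and column names are hypothetical.

```sql
-- HiveQL sketch: all names are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_text
(
    f_int int,
    f_float float,
    f_string string
)
PARTITIONED BY (day string)
STORED AS TEXTFILE;
```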

Create Table in ClickHouse

Table in ClickHouse, retrieving data from the Hive table created above:
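
The following sketch assumes a Hive table `test.test_text` with columns `f_int int`, `f_float float`, `f_string string` and a string partition column `day`. These names and the Metastore address are hypothetical.

```sql
-- ClickHouse sketch: names and Metastore address are hypothetical.
CREATE DATABASE IF NOT EXISTS test;

CREATE TABLE test.test_text
(
    f_int Int32,
    f_float Float32,
    f_string String,
    day String
)
ENGINE = Hive('thrift://localhost:9083', 'test', 'test_text')
PARTITION BY day;

-- Read a single partition; the partition value is illustrative.
SELECT f_int, f_string
FROM test.test_text
WHERE day = '2021-09-18';
```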