• This repository has been archived on 18/Apr/2024
  • Stars
    star
    166
  • Rank 227,748 (Top 5 %)
  • Language
    Java
  • License
    Apache License 2.0
  • Created about 6 years ago
  • Updated 8 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A JDBC proxy from ClickHouse to external databases

ClickHouse JDBC Bridge

Build Release

JDBC bridge for ClickHouse®. It acts as a stateless proxy passing queries from ClickHouse to external datasources. With this extension, you can run distributed query on ClickHouse across multiple datasources in real time, which in a way simplifies the process of building data pipelines for data warehousing, monitoring and integrity check etc.

Overview

Overview

Known Issues / Limitation

  • Connection issue like jdbc-bridge is not running or connect timed out - see performance section and this issue for details

  • Complex data types like Array and Tuple are currently not supported - they're treated as String

  • Pushdown is not supported and query may execute twice because of type inferring

  • Mutation is not fully supported - only insertion in simple cases

  • Scripting is experimental

Quick Start

  • Docker Compose

    git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
    cd clickhouse-jdbc-bridge/misc/quick-start
    docker-compose up -d
    ...
    docker-compose ps
    
               Name                         Command               State              Ports
    --------------------------------------------------------------------------------------------------
    quick-start_ch-server_1         /entrypoint.sh                   Up      8123/tcp, 9000/tcp, 9009/tcp
    quick-start_db-mariadb10_1      docker-entrypoint.sh mysqld      Up      3306/tcp
    quick-start_db-mysql5_1         docker-entrypoint.sh mysqld      Up      3306/tcp, 33060/tcp
    quick-start_db-mysql8_1         docker-entrypoint.sh mysqld      Up      3306/tcp, 33060/tcp
    quick-start_db-postgres13_1     docker-entrypoint.sh postgres    Up      5432/tcp
    quick-start_jdbc-bridge_1       /sbin/my_init                    Up      9019/tcp
    
    # issue below query, and you'll see "ch-server        1" returned
    docker-compose run ch-server clickhouse-client --query="select * from jdbc('self?datasource_column', 'select 1')"
  • Docker CLI

    It's easier to get started using all-in-one docker image:

    # build all-in-one docker image
    git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
    cd clickhouse-jdbc-bridge
    docker build -t my/clickhouse-all-in-one -f all-in-one.Dockerfile .
    
    # start container in background
    docker run --rm -d --name ch-and-jdbc-bridge my/clickhouse-all-in-one
    
    # enter container to add datasource and issue query
    docker exec -it ch-and-jdbc-bridge bash
    
    cp /etc/clickhouse-jdbc-bridge/config/datasources/datasource.json.example \
        /etc/clickhouse-jdbc-bridge/config/datasources/ch-server.json
    
    # you're supposed to see "ch-server        1" returned from ClickHouse
    clickhouse-client --query="select * from jdbc('self?datasource_column', 'select 1')"

    Alternatively, if you prefer the hard way ;)

    # create a network for ClickHouse and JDBC brigde, so that they can communicate with each other
    docker network create ch-net --attachable
    # start the two containers
    docker run --rm -d --network ch-net --name jdbc-bridge --hostname jdbc-bridge clickhouse/jdbc-bridge
    docker run --rm -d --network ch-net --name ch-server --hostname ch-server \
        --entrypoint /bin/bash clickhouse/clickhouse-server -c \
        "echo '<clickhouse><jdbc_bridge><host>jdbc-bridge</host><port>9019</port></jdbc_bridge></clickhouse>' \
            > /etc/clickhouse-server/config.d/jdbc_bridge_config.xml && /entrypoint.sh"
    # add named datasource and query
    docker exec -it jdbc-bridge cp /app/config/datasources/datasource.json.example \
        /app/config/datasources/ch-server.json
    docker exec -it jdbc-bridge cp /app/config/queries/query.json.example \
        /app/config/queries/show-query-logs.json
    # issue below query, and you'll see "ch-server        1" returned from ClickHouse
    docker exec -it ch-server clickhouse-client \
        --query="select * from jdbc('self?datasource_column', 'select 1')"
  • Debian/RPM Package

    Besides docker, you can download and install released Debian/RPM package on existing Linux system.

    Debian/Ubuntu

    apt update && apt install -y procps wget
    wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.1.0/clickhouse-jdbc-bridge_2.1.0-1_all.deb
    apt install --no-install-recommends -f ./clickhouse-jdbc-bridge_2.1.0-1_all.deb
    clickhouse-jdbc-bridge

    CentOS/RHEL

    yum install -y wget
    wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.1.0/clickhouse-jdbc-bridge-2.1.0-1.noarch.rpm
    yum localinstall -y clickhouse-jdbc-bridge-2.1.0-1.noarch.rpm
    clickhouse-jdbc-bridge
  • Java CLI

    wget https://github.com/ClickHouse/clickhouse-jdbc-bridge/releases/download/v2.1.0/clickhouse-jdbc-bridge-2.1.0-shaded.jar
    # add named datasource
    wget -P config/datasources https://raw.githubusercontent.com/ClickHouse/clickhouse-jdbc-bridge/master/misc/quick-start/jdbc-bridge/config/datasources/ch-server.json
    # start jdbc bridge, and then issue below query in ClickHouse for testing
    # select * from jdbc('ch-server', 'select 1')
    java -jar clickhouse-jdbc-bridge-2.1.0-shaded.jar

Usage

In most cases, you'll use jdbc table function to query against external datasources:

select * from jdbc('<datasource>', '<schema>', '<query>')

schema is optional but others are mandatory. Please be aware that the query is in native format of the given datasource. For example, if the query is select * from some_table limit 10, it may work in MariaDB but not in PostgreSQL, as the latter one does not understand limit.

Assuming you started a test environment using docker-compose, please refer to examples below to get familiar with JDBC bridge.

  • Data Source

    -- show datasources and usage
    select * from jdbc('', 'show datasources')
    -- access named datasource
    select * from jdbc('ch-server', 'select 1')
    -- adhoc datasource is NOT recommended for security reason
    select *
    from jdbc('jdbc:clickhouse://localhost:8123/system?compress=false&ssl=false&user=default', 'select 1')
  • Schema

    By default, any adhoc query passed to JDBC bridge will be executed twice. The first run is for type inferring, while the second for retrieving results. Although metadata will be cached(for up to 5 minutes by default), executing same query twice could be a problem - that's where schema comes into play.

    -- inline schema
    select * from jdbc('ch-server', 'num UInt8, str String', 'select 1 as num, ''2'' as str')
    select * from jdbc('ch-server', 'num Nullable(Decimal(10,0)), Nullable(str FixedString(1)) DEFAULT ''x''', 'select 1 as num, ''2'' as str')
    -- named schema
    select * from jdbc('ch-server', 'query-log', 'show-query-logs')
  • Query

    -- adhoc query
    select * from jdbc('ch-server', 'system', 'select * from query_log where user != ''default''')
    select * from jdbc('ch-server', 'select * from query_log where user != ''default''')
    select * from jdbc('ch-server', 'select * from system.query_log where user != ''default''')
    
    -- table query
    select * from jdbc('ch-server', 'system', 'query_log')
    select * from jdbc('ch-server', 'query_log')
    
    -- saved query
    select * from jdbc('ch-server', 'scripts/show-query-logs.sql')
    
    -- named query
    select * from jdbc('ch-server', 'show-query-logs')
    
    -- scripting
    select * from jdbc('script', '[1,2,3]')
    select * from jdbc('script', 'js', '[1,2,3]')
    select * from jdbc('script', 'scripts/one-two-three.js')
  • Query Parameters

    select *
    from jdbc('ch-server?datasource_column&max_rows=1&fetch_size=1&one=1&two=2',
        'select {{one}} union all select {{ two }}')

    Query result:

    ┌─datasource─┬─1─┐
    │ ch-server  │ 1 │
    └────────────┴───┘
  • JDBC Table

    drop table if exists system.test;
    create table system.test (
        a String,
        b UInt8
    ) engine=JDBC('ch-server', '', 'select user as a, is_initial_query as b from system.processes');
  • JDBC Dictionary

    drop dictionary if exists system.dict_test;
    create dictionary system.dict_test
    (
        b UInt64 DEFAULT 0,
        a String
    ) primary key b
    SOURCE(CLICKHOUSE(HOST 'localhost' PORT 9000 USER 'default' TABLE 'test' DB 'system'))
    LIFETIME(MIN 82800 MAX 86400)
    LAYOUT(FLAT());
  • Mutation

    -- use query parameter
    select * from jdbc('ch-server?mutation', 'drop table if exists system.test_table');
    select * from jdbc('ch-server?mutation', 'create table system.test_table(a String, b UInt8) engine=Memory()');
    select * from jdbc('ch-server?mutation', 'insert into system.test_table values(''a'', 1)');
    select * from jdbc('ch-server?mutation', 'truncate table system.test_table');
    
    -- use JDBC table engine
    drop table if exists system.test_table;
    create table system.test_table (
        a String,
        b UInt8
    ) engine=Memory();
    
    drop table if exists system.jdbc_table;
    create table system.jdbc_table (
        a String,
        b UInt8
    ) engine=JDBC('ch-server?batch_size=1000', 'system', 'test_table');
    
    insert into system.jdbc_table(a, b) values('a', 1);
    
    select * from system.test_table;

    Query result:

    ┌─a─┬─b─┐
    │ a │ 1 │
    └───┴───┘
  • Monitoring

    You can use Prometheus to monitor metrics exposed by JDBC bridge.

    curl -v http://jdbc-bridge:9019/metrics

Configuration

  • JDBC Driver

    By default, all JDBC drivers should be placed under drivers directory. You can override that by customizing driverUrls in datasource configuration file. For example:

    {
        "testdb": {
            "driverUrls": [
                "drivers/mariadb",
                "D:\\drivers\\mariadb",
                "/mnt/d/drivers/mariadb",
                "https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/2.7.4/mariadb-java-client-2.7.4.jar"
            ],
            "driverClassName": "org.mariadb.jdbc.Driver",
            ...
        }
    }
  • Named Data Source

    By default, named datasource is defined in configuration file in JSON format under config/datasources directory. You may check examples at misc/quick-start/jdbc-bridge/config/datasources. If you use modern editors like VSCode, you may find it's helpful to use JSON schema for validation and smart autocomplete.

  • Saved Query

    Saved queries and scripts are under scripts directory by default. For example: show-query-logs.sql.

  • Named Query

    Similar as named datasource, named queries are JSON configuration files under config/queries. You may refer to examples at misc/quick-start/jdbc-bridge/config/queries.

  • Logging

    You can customize logging configuration in logging.properties.

  • Vert.x

    If you're familiar with Vert.x, you can customize its configuration by changing config/httpd.json and config/vertx.json.

  • Query Parameters

    All supported query parameters can be found at here. datasource_column=true can be simplied as datasource_column, for example:

    select * from jdbc('ch-server?datasource_column=true', 'select 1')
    
    select * from jdbc('ch-server?datasource_column', 'select 1')
  • Timeout

    Couple of timeout settings you should be aware of:

    1. datasource timeout, for example: max_execution_time in MariaDB
    2. JDBC driver timeout, for example: connectTimeout and socketTimeout in MariaDB Connector/J
    3. JDBC bridge timeout, for examples: queryTimeout in config/server.json, and maxWorkerExecuteTime in config/vertx.json
    4. ClickHouse timeout like max_execution_time, keep_alive_timeout and http_receive_timeout etc.
    5. Client timeout, for example: socketTimeout in ClickHouse JDBC driver

Migration

  • Upgrade to 2.x

    2.x is a complete re-write not fully compatible with older version. You'll have to re-define your datasources and update your queries accordingly.

Build

You can use Maven to build ClickHouse JDBC bridge, for examples:

git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
cd clickhouse-jdbc-bridge
# compile and run unit tests
mvn -Prelease verify
# release shaded jar, rpm and debian packages
mvn -Prelease package

In order to build docker images:

git clone https://github.com/ClickHouse/clickhouse-jdbc-bridge.git
cd clickhouse-jdbc-bridge
docker build -t my/clickhouse-jdbc-bridge .
# or if you want to build the all-ine-one image
docker build --build-arg revision=20.9.3 -f all-in-one.Dockerfile -t my/clickhouse-all-in-one .

Develop

JDBC bridge is extensible. You may take ConfigDataSource and ScriptDataSource as examples to create your own extension.

An extension for JDBC bridge is basically a Java class with 3 optional parts:

  1. Extension Name

    By default, extension class name will be treated as name for the extension. However, you can declare a static member in your extension class to override that, for instance:

    public static final String EXTENSION_NAME = "myExtension";
  2. Initialize Method

    Initialize method will be called once and only once at the time when loading your extension, for example:

    public static void initialize(ExtensionManager manager) {
        ...
    }
  3. Instantiation Method

    In order to create instance of your extension, in general you should define a static method like below so that JDBC bridge knows how(besides walking through all possible constructors):

    public static MyExtension newInstance(Object... args) {
        ...
    }

Assume your extension class is com.mycompany.MyExtension, you can load it into JDBC bridge by:

  • put your extension package(e.g. my-extension.jar) and required dependencies under extensions directory

  • update server.json by adding your extension, for example

    ...
    "extensions": [
        ...
        {
            "class": "com.mycompany.MyExtension"
        }
    ]
    ...

Note: order of the extension matters. The first NamedDataSource extension will be set as default for all named datasources.

Performance

Below is a rough performance comparison to help you understand overhead caused by JDBC bridge as well as its stability. MariaDB, ClickHouse, and JDBC bridge are running on separated KVMs. ApacheBench(ab) is used on another KVM to simulate 20 concurrent users to issue same query 100,000 times after warm-up. Please refer to this in order to setup test environment and run tests by yourself.

Test Case Time Spent(s) Throughput(#/s) Failed Requests Min(ms) Mean(ms) Median(ms) Max(ms)
clickhouse_ping 801.367 124.79 0 1 160 4 1,075
jdbc-bridge_ping 804.017 124.38 0 1 161 10 3,066
clickhouse_url(clickhouse) 801.448 124.77 3 3 160 8 1,077
clickhouse_url(jdbc-bridge) 811.299 123.26 446 3 162 10 3,066
clickhouse_constant-query 797.775 125.35 0 1 159 4 1,077
clickhouse_constant-query(mysql) 1,598.426 62.56 0 7 320 18 2,049
clickhouse_constant-query(remote) 802.212 124.66 0 2 160 8 3,073
clickhouse_constant-query(url) 801.686 124.74 0 3 160 11 1,123
clickhouse_constant-query(jdbc) 925.087 108.10 5,813 14 185 75 4,091
clickhouse(patched)_constant-query(jdbc) 833.892 119.92 1,577 10 167 51 3,109
clickhouse(patched)_constant-query(jdbc-dual) 846.403 118.15 3,021 8 169 50 3,054
clickhouse_10k-rows-query 854.886 116.97 0 12 171 99 1,208
clickhouse_10k-rows-query(mysql) 1,657.425 60.33 0 28 331 123 2,228
clickhouse_10k-rows-query(remote) 854.610 117.01 0 12 171 99 1,201
clickhouse_10k-rows-query(url) 853.292 117.19 5 23 171 105 2,026
clickhouse_10k-rows-query(jdbc) 1,483.565 67.41 11,588 66 297 206 2,051
clickhouse(patched)_10k-rows-query(jdbc) 1,186.422 84.29 6,632 61 237 184 2,021
clickhouse(patched)_10k-rows-query(jdbc-dual) 1,080.676 92.53 4,195 65 216 180 2,013

Note: clickhouse(patched) is a patched version of ClickHouse server by disabling XDBC bridge health check. jdbc-dual on the other hand means dual instances of JDBC bridge managed by docker swarm on same KVM(due to limited resources ;).

Test Case (Decoded) URL
clickhouse_ping http://ch-server:8123/ping
jdbc-bridge_ping http://jdbc-bridge:9019/ping
clickhouse_url(clickhouse) http://ch-server:8123/?query=select * from url('http://ch-server:8123/ping', CSV, 'results String')
clickhouse_url(jdbc-bridge) http://ch-server:8123/?query=select * from url('http://jdbc-bridge:9019/ping', CSV, 'results String')
clickhouse_constant-query http://ch-server:8123/?query=select 1
clickhouse_constant-query(mysql) http://ch-server:8123/?query=select * from mysql('mariadb:3306', 'test', 'constant', 'root', 'root')
clickhouse_constant-query(remote) http://ch-server:8123/?query=select * from remote('ch-server:9000', system.constant, 'default', '')
clickhouse_constant-query(url) http://ch-server:8123/?query=select * from url('http://ch-server:8123/?query=select 1', CSV, 'results String')
clickhouse*_constant-query(jdbc*) http://ch-server:8123/?query=select * from jdbc('mariadb', 'constant')
clickhouse_10k-rows-query http://ch-server:8123/?query=select 1
clickhouse_10k-rows-query(mysql) http://ch-server:8123/?query=select * from mysql('mariadb:3306', 'test', '10k_rows', 'root', 'root')
clickhouse_10k-rows-query(remote) http://ch-server:8123/?query=select * from remote('ch-server:9000', system.10k_rows, 'default', '')
clickhouse_10k-rows-query(url) http://ch-server:8123/?query=select * from url('http://ch-server:8123/?query=select * from 10k_rows', CSV, 'results String')
clickhouse*_10k-rows-query(jdbc*) http://ch-server:8123/?query=select * from jdbc('mariadb', 'small-table')

More Repositories

1

ClickHouse

ClickHouse® is a real-time analytics DBMS
C++
37,339
star
2

clickhouse-go

Golang driver for ClickHouse
Go
2,853
star
3

clickhouse-java

Java client and JDBC driver for ClickHouse
Java
1,324
star
4

clickhouse-presentations

Presentations, meetups and talks about ClickHouse
HTML
981
star
5

ClickBench

ClickBench: a Benchmark For Analytical Databases
HTML
642
star
6

metabase-clickhouse-driver

ClickHouse database driver for the Metabase business intelligence front-end
Clojure
439
star
7

clickhouse_exporter

This is a simple server that periodically scrapes ClickHouse stats and exports them via HTTP for Prometheus(https://prometheus.io/) consumption.
Go
361
star
8

ch-go

Low-level Go Client for ClickHouse
Go
314
star
9

clickhouse-connect

Python driver/sqlalchemy/superset connectors
Python
308
star
10

clickhouse-cpp

C++ client library for ClickHouse
C
300
star
11

NoiSQL

NoiSQL — Generating Music With SQL Queries
SQL
276
star
12

clickhouse-rs

Official pure Rust typed client for ClickHouse DB
Rust
270
star
13

graphouse

Graphouse allows you to use ClickHouse as a Graphite storage.
Java
259
star
14

clickhouse-odbc

ODBC driver for ClickHouse
C
246
star
15

dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Python
245
star
16

adsb.exposed

Interactive visualization and analytics on ADS-B data with ClickHouse
HTML
223
star
17

clickhouse-js

Official JS client for ClickHouse DB
TypeScript
212
star
18

spark-clickhouse-connector

Spark ClickHouse Connector build on DataSourceV2 API
Scala
180
star
19

clickhouse-kafka-connect

ClickHouse Kafka Connector
Java
145
star
20

github-explorer

Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask)
HTML
141
star
21

examples

ClickHouse Examples
Jupyter Notebook
118
star
22

clickhouse-docs

Official documentation for ClickHouse
JavaScript
110
star
23

click-ui

The home of the ClickHouse design system and component library.
TypeScript
73
star
24

pastila

Paste toy-service on top of ClickHouse
HTML
62
star
25

homebrew-clickhouse

ClickHouse Homebrew tap (old repository, unused)
57
star
26

clickhouse-tableau-connector-jdbc

Tableau connector to ClickHouse using JDBC driver
JavaScript
57
star
27

clickpy

PyPI analytics powered by ClickHouse
JavaScript
52
star
28

power-bi-clickhouse

This connector allows you to retrieve data from ClickHouse directly into Power BI for analysis and visualization
45
star
29

reversedns.space

https://reversedns.space/
HTML
44
star
30

clickhouse-academy

ClickHouse Academy training and certification
Python
39
star
31

libhdfs3

HDFS file read access for ClickHouse
C++
33
star
32

ch-bench

Benchmarks for ch
Go
30
star
33

sysroot

Files for cross-compilation
C
27
star
34

CryptoHouse

Artifacts including queries and materialized views used for CryptoHouse
TypeScript
26
star
35

HouseClick

House prices app
JavaScript
23
star
36

ch2rs

Generate Rust structs from ClickHouse rows
Rust
21
star
37

terraform-provider-clickhouse

Terraform Provider for ClickHouse Cloud
Go
21
star
38

web-tables-demo

15
star
39

clickhub

Github analytics powered by the world's fastest real-time analytics database
Python
13
star
40

icudata

Pregenerated data for ICU library
Assembly
11
star
41

clickhouse-website-worker

TypeScript
9
star
42

keeper-extend-cluster

Experiment on how to upgrade single-node clickhouse-keeper to a cluster
Makefile
7
star
43

copier

clickhouse-copier (obsolete)
C++
7
star
44

1trc

1 trillion rows
Python
7
star
45

checkout

Wrapper around actions/checkout for flexible tuning
6
star
46

laion

Supporting code for inserting and searching laion in ClickHouse
Python
5
star
47

bedrock_rag

A simple RAG pipeline for Google Analytics with ClickHouse and Bedrock
Python
5
star
48

aretestsgreenyet

A single-page website to display the status of the open-source ClickHouse CI system.
HTML
4
star
49

fuzz-corpus

Corpuses for libFuzzer-type fuzzers
4
star
50

clickhouse-playground-old

4
star
51

ch-async-inserts-demo

Demo on how to create a Node API that sends data to CH via Async inserts
TypeScript
3
star
52

grpc

Stripped version of grpc
C++
3
star
53

clickhouse-website-content

JavaScript
3
star
54

clickhouse-recipes

Sample code for solving common problems with ClickHouse
Python
3
star
55

clickhouse_vs_snowflake

HTML
2
star
56

clickhouse-com-content

HTML
2
star
57

clickhouse.github.io

HTML
2
star
58

antlr4-runtime

Subtree of antlr4 original repo
C++
2
star
59

kafka-samples

Sample datasets for Kafka
Python
2
star
60

ssl

Minimized libressl
C
2
star
61

libpq

Copy of https://github.com/postgres/postgres/tree/master/src/interfaces/libpq with some files from root
C
2
star
62

llvm

Stripped version of LLVM for use in ClickHouse for runtime code generation.
C++
2
star
63

perspective-forex

Perspective with ClickHouse example using Apache Arrow
JavaScript
2
star
64

clickhouse-typescript-schema

TypeScript
2
star
65

protobuf

add protobuf for libhdfs3
C++
1
star
66

hive-metastore

For files generated with https://github.com/apache/thrift
Thrift
1
star
67

boost-extra

extra boost libs for libhdfs3
C++
1
star
68

UnixODBC

Mirror of http://www.unixodbc.org/
C
1
star
69

doc-pr-preview-test

Testing workflow to build Docusaurus previews for pull requests.
JavaScript
1
star
70

clickhouse-docs-content

1
star
71

clickhouse-blog-images

HTML
1
star
72

bzip2

Forked from https://gitlab.com/federicomenaquintero/bzip2
C
1
star
73

libgsasl

https://www.gnu.org/software/gsasl/
C
1
star
74

readthedocs-stub

HTML
1
star
75

boost

Minimized boost lib
C++
1
star
76

clickhouse-repos-manager

a config and artifacts for packages.clickhouse.com
Python
1
star
77

clickhouse-kafka-transforms

This is meant to hold Clickhouse created kafka transforms.
Java
1
star
78

clickhouse-fivetran-destination

ClickHouse Cloud Fivetran Destination
Go
1
star
79

clickhouse-test.github.io

HTML
1
star
80

rust_vendor

Vendor files from rust dependencies
Rust
1
star
81

simple-logging-benchmark

A simple ClickHouse benchmark for the logging usecase
Python
1
star