• Stars
    star
    314
  • Rank 133,353 (Top 3 %)
  • Language
    Go
  • License
    Apache License 2.0
  • Created about 3 years ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Low-level Go Client for ClickHouse

ch

Low level TCP ClickHouse client and protocol implementation in Go. Designed for very fast data block streaming with low network, cpu and memory overhead.

NB: No pooling, reconnects and not goroutine-safe by default, only single connection. Use clickhouse-go for high-level database/sql-compatible client, pooling for ch-go is available as chpool package.

ClickHouse is an open-source, high performance columnar OLAP database management system for real-time analytics using SQL.

go get github.com/ClickHouse/ch-go@latest

Example

package main

import (
  "context"
  "fmt"

  "github.com/ClickHouse/ch-go"
  "github.com/ClickHouse/ch-go/proto"
)

func main() {
  ctx := context.Background()
  c, err := ch.Dial(ctx, ch.Options{Address: "localhost:9000"})
  if err != nil {
    panic(err)
  }
  var (
    numbers int
    data    proto.ColUInt64
  )
  if err := c.Do(ctx, ch.Query{
    Body: "SELECT number FROM system.numbers LIMIT 500000000",
    Result: proto.Results{
      {Name: "number", Data: &data},
    },
    // OnResult will be called on next received data block.
    OnResult: func(ctx context.Context, b proto.Block) error {
      numbers += len(data)
      return nil
    },
  }); err != nil {
    panic(err)
  }
  fmt.Println("numbers:", numbers)
}
393ms 0.5B rows  4GB  10GB/s 1 job
874ms 2.0B rows 16GB  18GB/s 4 jobs

Results

To stream query results, set Result and OnResult fields of Query. The OnResult will be called after Result is filled with received data block.

The OnResult is optional, but query will fail if more than single block is received, so it is ok to solely set the Result if only one row is expected.

Automatic result inference

var result proto.Results
q := ch.Query{
  Body:   "SELECT * FROM table",
  Result: result.Auto(),
}

Single result with column name inference

var res proto.ColBool
q := ch.Query{
  Body:   "SELECT v FROM test_table",
  Result: proto.ResultColumn{Data: &res},
}

Writing data

See examples/insert.

For table

CREATE TABLE test_table_insert
(
    ts                DateTime64(9),
    severity_text     Enum8('INFO'=1, 'DEBUG'=2),
    severity_number   UInt8,
    body              String,
    name              String,
    arr               Array(String)
) ENGINE = Memory

We prepare data block for insertion as follows:

var (
	body      proto.ColStr
	name      proto.ColStr
	sevText   proto.ColEnum
	sevNumber proto.ColUInt8

	ts  = new(proto.ColDateTime64).WithPrecision(proto.PrecisionNano) // DateTime64(9)
	arr = new(proto.ColStr).Array()                                   // Array(String)
	now = time.Date(2010, 1, 1, 10, 22, 33, 345678, time.UTC)
)

// Append 10 rows to initial data block.
for i := 0; i < 10; i++ {
	body.AppendBytes([]byte("Hello"))
	ts.Append(now)
	name.Append("name")
	sevText.Append("INFO")
	sevNumber.Append(10)
	arr.Append([]string{"foo", "bar", "baz"})
}

input := proto.Input{
	{Name: "ts", Data: ts},
	{Name: "severity_text", Data: &sevText},
	{Name: "severity_number", Data: sevNumber},
	{Name: "body", Data: body},
	{Name: "name", Data: name},
	{Name: "arr", Data: arr},
}

Single data block

if err := conn.Do(ctx, ch.Query{
	// Or "INSERT INTO test_table_insert (ts, severity_text, severity_number, body, name, arr) VALUES"
	// Or input.Into("test_table_insert")
	Body: "INSERT INTO test_table_insert VALUES",
	Input: input,
}); err != nil {
	panic(err)
}

Stream data

// Stream data to ClickHouse server in multiple data blocks.
var blocks int
if err := conn.Do(ctx, ch.Query{
	Body:  input.Into("test_table_insert"), // helper that generates INSERT INTO query with all columns
	Input: input,

	// OnInput is called to prepare Input data before encoding and sending
	// to ClickHouse server.
	OnInput: func(ctx context.Context) error {
		// On OnInput call, you should fill the input data.
		//
		// NB: You should reset the input columns, they are
		// not reset automatically.
		//
		// That is, we are re-using the same input columns and
		// if we will return nil without doing anything, data will be
		// just duplicated.

		input.Reset() // calls "Reset" on each column

		if blocks >= 10 {
			// Stop streaming.
			//
			// This will also write tailing input data if any,
			// but we just reset the input, so it is currently blank.
			return io.EOF
		}

		// Append new values:
		for i := 0; i < 10; i++ {
			body.AppendBytes([]byte("Hello"))
			ts.Append(now)
			name.Append("name")
			sevText.Append("DEBUG")
			sevNumber.Append(10)
			arr.Append([]string{"foo", "bar", "baz"})
		}

		// Data will be encoded and sent to ClickHouse server after returning nil.
		// The Do method will return error if any.
		blocks++
		return nil
	},
}); err != nil {
	panic(err)
}

Writing dumps in Native format

You can use ch-go to write ClickHouse dumps in Native format:

The most efficient format. Data is written and read by blocks in binary format. For each block, the number of rows, number of columns, column names and types, and parts of columns in this block are recorded one after another. In other words, this format is “columnar” – it does not convert columns to rows. This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.

See ./internal/cmd/ch-native-dump for more sophisticated example.

Example:

var (
    colK proto.ColInt64
    colV proto.ColInt64
)
// Generate some data.
for i := 0; i < 100; i++ {
    colK.Append(int64(i))
    colV.Append(int64(i) + 1000)
}
// Write data to buffer.
var buf proto.Buffer
input := proto.Input{
    {"k", colK},
    {"v", colV},
}
b := proto.Block{
    Rows:    colK.Rows(),
    Columns: len(input),
}
// Note that we are using version 54451, proto.Version will fail.
if err := b.EncodeRawBlock(&buf, 54451, input); err != nil {
    panic(err)
}

// You can write buf.Buf to io.Writer, e.g. os.Stdout or file.
var out bytes.Buffer
_, _ = out.Write(buf.Buf)

// You can encode multiple buffers in sequence.
//
// To do this, reset buf and all columns, append new values
// to columns and call EncodeRawBlock again.
buf.Reset()
colV.Reset()
colV.Reset()

Features

  • OpenTelemetry support
  • No reflection or interface{}
  • Generics (go1.18) for Array[T], LowCardinaliy[T], Map[K, V], Nullable[T]
  • Reading or writing ClickHouse dumps in Native format
  • Column-oriented design that operates directly with blocks of data
    • Dramatically more efficient
    • Up to 100x faster than row-first design around sql
    • Up to 700x faster than HTTP API
    • Low memory overhead (data blocks are slices, i.e. continuous memory)
    • Highly efficient input and output block streaming
    • As close to ClickHouse as possible
  • Structured query execution telemetry streaming
  • LZ4, ZSTD or None (just checksums for integrity check) compression
  • External data support
  • Rigorously tested
    • Windows, Mac, Linux (also x86)
    • Unit tests for encoding and decoding
      • ClickHouse Server in Go for faster tests
      • Golden files for all packets, columns
      • Both server and client structures
      • Ensuring that partial read leads to failure
    • End-to-end tests on multiple LTS and stable versions
    • Fuzzing

Supported types

  • UInt8, UInt16, UInt32, UInt64, UInt128, UInt256
  • Int8, Int16, Int32, Int64, Int128, Int256
  • Date, Date32, DateTime, DateTime64
  • Decimal32, Decimal64, Decimal128, Decimal256 (only low-level raw values)
  • IPv4, IPv6
  • String, FixedString(N)
  • UUID
  • Array(T)
  • Enum8, Enum16
  • LowCardinality(T)
  • Map(K, V)
  • Bool
  • Tuple(T1, T2, ..., Tn)
  • Nullable(T)
  • Point
  • Nothing, Interval

Enums

You can use automatic enum inference in proto.ColEnum, this will come with some performance penalty.

To use proto.ColEnum8 and proto.ColEnum16, you need to explicitly provide DDL for them via proto.Wrap:

var v proto.ColEnum8

const ddl = `'Foo'=1, 'Bar'=2, 'Baz'=3`
input := []proto.InputColumn{
  {Name: "v", Data: proto.Wrap(&v, ddl)},
}

Generics

Most columns implement proto.ColumnOf[T] generic constraint:

type ColumnOf[T any] interface {
	Column
	Append(v T)
	AppendArr(vs []T)
	Row(i int) T
}

For example, ColStr (and ColStr.LowCardinality) implements ColumnOf[string]. Same for arrays: new(proto.ColStr).Array() implements ColumnOf[[]string], column of []string values.

Array

Generic for Array(T)

// Array(String)
arr := proto.NewArray[string](new(proto.ColStr))
// Or
arr := new(proto.ColStr).Array()
q := ch.Query{
  Body:   "SELECT ['foo', 'bar', 'baz']::Array(String) as v",
  Result: arr.Results("v"),
}
// Do ...
arr.Row(0) // ["foo", "bar", "baz"]

Dumps

Reading

Use proto.Block.DecodeRawBlock on proto.NewReader:

func TestDump(t *testing.T) {
	// Testing decoding of Native format dump.
	//
	// CREATE TABLE test_dump (id Int8, v String)
	//   ENGINE = MergeTree()
	// ORDER BY id;
	//
	// SELECT * FROM test_dump
	//   ORDER BY id
	// INTO OUTFILE 'test_dump_native.raw' FORMAT Native;
	data, err := os.ReadFile(filepath.Join("_testdata", "test_dump_native.raw"))
	require.NoError(t, err)
	var (
		dec    proto.Block
		ids    proto.ColInt8
		values proto.ColStr
	)
	require.NoError(t, dec.DecodeRawBlock(
		proto.NewReader(bytes.NewReader(data)),
		proto.Results{
			{Name: "id", Data: &ids},
			{Name: "v", Data: &values},
		}),
	)
}

Writing

Use proto.Block.EncodeRawBlock with version 54451 on proto.Buffer with Rows and Columns set:

func TestLocalNativeDump(t *testing.T) {
	ctx := context.Background()
	// Testing clickhouse-local.
	var v proto.ColStr
	for _, s := range data {
		v.Append(s)
	}
	buf := new(proto.Buffer)
	b := proto.Block{Rows: 2, Columns: 2}
	require.NoError(t, b.EncodeRawBlock(buf, 54451, []proto.InputColumn{
		{Name: "title", Data: v},
		{Name: "data", Data: proto.ColInt64{1, 2}},
	}), "encode")

	dir := t.TempDir()
	inFile := filepath.Join(dir, "data.native")
	require.NoError(t, os.WriteFile(inFile, buf.Buf, 0600), "write file")

	cmd := exec.Command("clickhouse-local", "local",
		"--logger.console",
		"--log-level", "trace",
		"--file", inFile,
		"--input-format", "Native",
		"--output-format", "JSON",
		"--query", "SELECT * FROM table",
	)
	out := new(bytes.Buffer)
	errOut := new(bytes.Buffer)
	cmd.Stdout = out
	cmd.Stderr = errOut

	t.Log(cmd.Args)
	require.NoError(t, cmd.Run(), "run: %s", errOut)
	t.Log(errOut)

	v := struct {
		Rows int `json:"rows"`
		Data []struct {
			Title string `json:"title"`
			Data  int    `json:"data,string"`
		}
	}{}
	require.NoError(t, json.Unmarshal(out.Bytes(), &v), "json")
	assert.Equal(t, 2, v.Rows)
	if assert.Len(t, v.Data, 2) {
		for i, r := range []struct {
			Title string `json:"title"`
			Data  int    `json:"data,string"`
		}{
			{"Foo", 1},
			{"Bar", 2},
		} {
			assert.Equal(t, r, v.Data[i])
		}
	}
}

TODO

  • Types
    • Decimal(P, S) API
    • JSON
    • SimpleAggregateFunction
    • AggregateFunction
    • Nothing
    • Interval
    • Nested
    • Geo types
      • Point
      • Ring
      • Polygon
      • MultiPolygon
  • Improved i/o timeout handling for reading packets from server
    • Close connection on context cancellation in all cases
    • Ensure that reads can't block forever

Reference

License

Apache License 2.0

More Repositories

1

ClickHouse

ClickHouse® is a real-time analytics DBMS
C++
37,339
star
2

clickhouse-go

Golang driver for ClickHouse
Go
2,853
star
3

clickhouse-java

Java client and JDBC driver for ClickHouse
Java
1,324
star
4

clickhouse-presentations

Presentations, meetups and talks about ClickHouse
HTML
981
star
5

ClickBench

ClickBench: a Benchmark For Analytical Databases
HTML
642
star
6

metabase-clickhouse-driver

ClickHouse database driver for the Metabase business intelligence front-end
Clojure
439
star
7

clickhouse_exporter

This is a simple server that periodically scrapes ClickHouse stats and exports them via HTTP for Prometheus(https://prometheus.io/) consumption.
Go
361
star
8

clickhouse-connect

Python driver/sqlalchemy/superset connectors
Python
308
star
9

clickhouse-cpp

C++ client library for ClickHouse
C
300
star
10

NoiSQL

NoiSQL — Generating Music With SQL Queries
SQL
276
star
11

clickhouse-rs

Official pure Rust typed client for ClickHouse DB
Rust
270
star
12

graphouse

Graphouse allows you to use ClickHouse as a Graphite storage.
Java
259
star
13

clickhouse-odbc

ODBC driver for ClickHouse
C
246
star
14

dbt-clickhouse

The Clickhouse plugin for dbt (data build tool)
Python
245
star
15

adsb.exposed

Interactive visualization and analytics on ADS-B data with ClickHouse
HTML
223
star
16

clickhouse-js

Official JS client for ClickHouse DB
TypeScript
212
star
17

spark-clickhouse-connector

Spark ClickHouse Connector build on DataSourceV2 API
Scala
180
star
18

clickhouse-jdbc-bridge

A JDBC proxy from ClickHouse to external databases
Java
166
star
19

clickhouse-kafka-connect

ClickHouse Kafka Connector
Java
145
star
20

github-explorer

Everything You Always Wanted To Know About GitHub (But Were Afraid To Ask)
HTML
141
star
21

examples

ClickHouse Examples
Jupyter Notebook
118
star
22

clickhouse-docs

Official documentation for ClickHouse
JavaScript
110
star
23

click-ui

The home of the ClickHouse design system and component library.
TypeScript
73
star
24

pastila

Paste toy-service on top of ClickHouse
HTML
62
star
25

homebrew-clickhouse

ClickHouse Homebrew tap (old repository, unused)
57
star
26

clickhouse-tableau-connector-jdbc

Tableau connector to ClickHouse using JDBC driver
JavaScript
57
star
27

clickpy

PyPI analytics powered by ClickHouse
JavaScript
52
star
28

power-bi-clickhouse

This connector allows you to retrieve data from ClickHouse directly into Power BI for analysis and visualization
45
star
29

reversedns.space

https://reversedns.space/
HTML
44
star
30

clickhouse-academy

ClickHouse Academy training and certification
Python
39
star
31

libhdfs3

HDFS file read access for ClickHouse
C++
33
star
32

ch-bench

Benchmarks for ch
Go
30
star
33

sysroot

Files for cross-compilation
C
27
star
34

CryptoHouse

Artifacts including queries and materialized views used for CryptoHouse
TypeScript
26
star
35

HouseClick

House prices app
JavaScript
23
star
36

ch2rs

Generate Rust structs from ClickHouse rows
Rust
21
star
37

terraform-provider-clickhouse

Terraform Provider for ClickHouse Cloud
Go
21
star
38

web-tables-demo

15
star
39

clickhub

Github analytics powered by the world's fastest real-time analytics database
Python
13
star
40

icudata

Pregenerated data for ICU library
Assembly
11
star
41

clickhouse-website-worker

TypeScript
9
star
42

keeper-extend-cluster

Experiment on how to upgrade single-node clickhouse-keeper to a cluster
Makefile
7
star
43

copier

clickhouse-copier (obsolete)
C++
7
star
44

1trc

1 trillion rows
Python
7
star
45

checkout

Wrapper around actions/checkout for flexible tuning
6
star
46

laion

Supporting code for inserting and searching laion in ClickHouse
Python
5
star
47

bedrock_rag

A simple RAG pipeline for Google Analytics with ClickHouse and Bedrock
Python
5
star
48

aretestsgreenyet

A single-page website to display the status of the open-source ClickHouse CI system.
HTML
4
star
49

fuzz-corpus

Corpuses for libFuzzer-type fuzzers
4
star
50

clickhouse-playground-old

4
star
51

ch-async-inserts-demo

Demo on how to create a Node API that sends data to CH via Async inserts
TypeScript
3
star
52

grpc

Stripped version of grpc
C++
3
star
53

clickhouse-website-content

JavaScript
3
star
54

clickhouse-recipes

Sample code for solving common problems with ClickHouse
Python
3
star
55

clickhouse_vs_snowflake

HTML
2
star
56

clickhouse-com-content

HTML
2
star
57

clickhouse.github.io

HTML
2
star
58

antlr4-runtime

Subtree of antlr4 original repo
C++
2
star
59

kafka-samples

Sample datasets for Kafka
Python
2
star
60

ssl

Minimized libressl
C
2
star
61

libpq

Copy of https://github.com/postgres/postgres/tree/master/src/interfaces/libpq with some files from root
C
2
star
62

llvm

Stripped version of LLVM for use in ClickHouse for runtime code generation.
C++
2
star
63

perspective-forex

Perspective with ClickHouse example using Apache Arrow
JavaScript
2
star
64

clickhouse-typescript-schema

TypeScript
2
star
65

protobuf

add protobuf for libhdfs3
C++
1
star
66

hive-metastore

For files generated with https://github.com/apache/thrift
Thrift
1
star
67

boost-extra

extra boost libs for libhdfs3
C++
1
star
68

UnixODBC

Mirror of http://www.unixodbc.org/
C
1
star
69

doc-pr-preview-test

Testing workflow to build Docusaurus previews for pull requests.
JavaScript
1
star
70

clickhouse-docs-content

1
star
71

clickhouse-blog-images

HTML
1
star
72

bzip2

Forked from https://gitlab.com/federicomenaquintero/bzip2
C
1
star
73

libgsasl

https://www.gnu.org/software/gsasl/
C
1
star
74

readthedocs-stub

HTML
1
star
75

boost

Minimized boost lib
C++
1
star
76

clickhouse-repos-manager

a config and artifacts for packages.clickhouse.com
Python
1
star
77

clickhouse-kafka-transforms

This is meant to hold Clickhouse created kafka transforms.
Java
1
star
78

clickhouse-fivetran-destination

ClickHouse Cloud Fivetran Destination
Go
1
star
79

clickhouse-test.github.io

HTML
1
star
80

rust_vendor

Vendor files from rust dependencies
Rust
1
star
81

simple-logging-benchmark

A simple ClickHouse benchmark for the logging usecase
Python
1
star