• Stars
    star
    345
  • Rank 122,285 (Top 3 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated 5 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Sync data from the other DB to ClickHouse(cluster)

Synch

pypi docker license workflows workflows

ไธญๆ–‡ๆ–‡ๆกฃ

Introduction

Sync data from other DB to ClickHouse, current support postgres and mysql, and support full and increment ETL.

synch

Features

  • Full data etl and real time increment etl.
  • Support DDL and DML sync, current support add column and drop column and change column of DDL, and full support of DML also.
  • Email error report.
  • Support kafka and redis as broker.
  • Multiple source db sync to ClickHouse at the same timeใ€‚
  • Support ClickHouse MergeTree,CollapsingMergeTree,VersionedCollapsingMergeTree,ReplacingMergeTree.
  • Support ClickHouse cluster.

Requirements

  • Python >= 3.7
  • redis, cache mysql binlog file and position and as broker, support redis cluster also.
  • kafka, need if you use kafka as broker.
  • clickhouse-jdbc-bridge, need if you use postgres and set auto_full_etl = true, or exec synch etl command.

Install

> pip install synch

Usage

Config file synch.yaml

synch will read default config from ./synch.yaml, or you can use synch -c specify config file.

See full example config in synch.yaml.

Full data etl

Maybe you need make full data etl before continuous sync data from MySQL to ClickHouse or redo data etl with --renew.

> synch --alias mysql_db etl -h

Usage: synch etl [OPTIONS]

  Make etl from source table to ClickHouse.

Options:
  --schema TEXT     Schema to full etl.
  --renew           Etl after try to drop the target tables.
  -t, --table TEXT  Tables to full etl.
  -h, --help        Show this message and exit.

Full etl from table test.test:

> synch --alias mysql_db etl --schema test --table test --table test2

Produce

Listen all MySQL binlog and produce to broker.

> synch --alias mysql_db produce

Consume

Consume message from broker and insert to ClickHouse,and you can skip error rows with --skip-error. And synch will do full etl at first when set auto_full_etl = true in config.

> synch --alias mysql_db consume -h

Usage: synch consume [OPTIONS]

  Consume from broker and insert into ClickHouse.

Options:
  --schema TEXT       Schema to consume.  [required]
  --skip-error        Skip error rows.
  --last-msg-id TEXT  Redis stream last msg id or kafka msg offset, depend on
                      broker_type in config.

  -h, --help          Show this message and exit.

Consume schema test and insert into ClickHouse:

> synch --alias mysql_db consume --schema test

Monitor

Set true to core.monitoring, which will create database synch in ClickHouse automatically and insert monitoring data.

Table struct:

create table if not exists synch.log
(
    alias String,
    schema String,
    table String,
    num        int,
    type       int, -- 1:producer, 2:consumer
    created_at DateTime
)
    engine = MergeTree partition by toYYYYMM
(
    created_at
) order by created_at;

ClickHouse Table Engine

Now synch support MergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, ReplacingMergeTree.

Use docker-compose(recommended)

Redis Broker, lightweight and for low concurrency
version: "3"
services:
  producer:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one service consume on schema
  consumer.test:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis
volumes:
  redis:
Kafka Broker, for high concurrency
version: "3"
services:
  zookeeper:
    image: bitnami/zookeeper:3
    hostname: zookeeper
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zookeeper:/bitnami
  kafka:
    image: bitnami/kafka:2
    hostname: kafka
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - JMX_PORT=23456
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
    depends_on:
      - zookeeper
    volumes:
      - kafka:/bitnami
  kafka-manager:
    image: hlebalbau/kafka-manager
    ports:
      - "9000:9000"
    environment:
      ZK_HOSTS: "zookeeper:2181"
      KAFKA_MANAGER_AUTH_ENABLED: "false"
    command: -Dpidfile.path=/dev/null
  producer:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one service consume on schema
  consumer.test:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis:/data
volumes:
  redis:
  kafka:
  zookeeper:

Important

  • You need always keep a primary key or unique key without null or composite primary key.
  • DDL sync not support postgres.
  • Postgres sync is not fully test, be careful use it in production.

ThanksTo

Powerful Python IDE Pycharm from Jetbrains.

jetbrains

License

This project is licensed under the Apache-2.0 License.

More Repositories

1

fastapi-cache

fastapi-cache is a tool to cache fastapi response and function result, with backends support redis and memcached.
Python
1,264
star
2

fastapi-limiter

A request rate limiter for fastapi
Python
452
star
3

asyncmy

A fast asyncio MySQL/MariaDB driver with replication protocol support
Python
234
star
4

meilisync

Realtime sync data from MySQL/PostgreSQL/MongoDB to Meilisearch
Python
230
star
5

asynch

An asyncio ClickHouse Python Driver with native (TCP) interface support.
Python
169
star
6

rearq

A distributed task queue built with asyncio and redis, with built-in web interface
Python
147
star
7

swagin

Swagger + Gin = SwaGin, a web framework based on Gin and Swagger
Go
68
star
8

trader

A framework that automated cryptocurrency exchange with strategy
Go
62
star
9

databack

Backup your data from MySQL/PostgreSQL/SSH etc. to any other storages
Python
60
star
10

longurl

A self-hosted short url service
Go
51
star
11

meilisync-admin

A web admin dashboard for meilisync
Python
31
star
12

fibers

Fiber + Swagger = Fibers, a web framework dedicated to providing a FastAPI-like development experience
Go
26
star
13

alarmer

A tool focus on error reporting for your application, like sentry but lightweight
Python
19
star
14

fastapi-rest

Fast restful API based on FastAPI and TortoiseORM
Python
10
star
15

gema-web

Convert from json/xml/yaml to Pydantic/Go/Rust etc.
TypeScript
9
star
16

awesome-web

Search awesome projects
TypeScript
8
star
17

chcli

A Terminal Client for ClickHouse with AutoCompletion and Syntax Highlighting.
Python
6
star
18

fettler

Auto refresh cache of redis with MySQL binlog
Python
4
star
19

telsearch-web

Frontend for telsearch
TypeScript
3
star
20

mccabe

Calculate the cyclomatic complexity of the source code
Python
3
star
21

s3web

Serve static files from any S3 compatible object storage endpoints
Go
2
star
22

ClashForiOS

Swift
2
star
23

xiaoai

ๅฐ็ˆฑ้Ÿณ็ฎฑ้žๅฎ˜ๆ–นSDK
Python
2
star
24

awesome

Search for awesome github project
Go
2
star
25

longurl-web

This is frontend for https://github.com/long2ice/longurl
TypeScript
2
star
26

ergo

Python
2
star
27

gema

Convert from json/xml/yaml to Pydantic/Go/Rust etc.
Python
2
star
28

nvim

My personal neovim config
Lua
2
star
29

sponsor

Sponsor page of long2ice
HTML
2
star
30

youtube-dl-api-server

api server for youtube-dl
Python
2
star
31

talkit

A self hosted comment system
1
star
32

long2ice

My Personal README.
1
star
33

vpsmon

Python
1
star
34

homepage

My homepage
TypeScript
1
star
35

kanp

See video by download
Python
1
star
36

devme

Python
1
star
37

creatable

A tool to create table from file/database to another database
Python
1
star
38

devme-web

TypeScript
1
star
39

hugo-theme-pure

Forked from https://github.com/xiaoheiAh/hugo-theme-pure and make improvements
CSS
1
star
40

fastapi-monitor

Python
1
star
41

meilisync-web

Frontend of meilisearch-admin
Vue
1
star