  • Stars: 3,970
  • Rank: 10,999 (Top 0.3 %)
  • Language: Java
  • License: Apache License 2.0
  • Created over 6 years ago
  • Updated 3 months ago

Repository Details

A data integration framework

ChunJun

Introduction

ChunJun is a distributed data integration framework that is currently based on Apache Flink. It was initially known as FlinkX and was renamed ChunJun on February 22, 2022. It synchronizes and computes data across a variety of heterogeneous data sources. ChunJun has been deployed and is running stably in thousands of companies so far.

Official website of ChunJun: https://dtstack.github.io/chunjun/

Features of ChunJun

ChunJun abstracts different databases into reader/source plugins, writer/sink plugins, and lookup plugins. It has the following features:

  • Based on Flink, a real-time computing engine. Tasks can be configured with a JSON template or a SQL script, and the SQL script is compatible with Flink SQL syntax;
  • Supports distributed operation with submission methods such as flink-standalone, yarn-session, and yarn-per-job;
  • Supports one-click Docker deployment, and can be deployed and run on k8s;
  • Supports synchronization and calculation across more than 20 heterogeneous data sources, such as MySQL, Oracle, SQLServer, Hive, and Kudu;
  • Easy to extend and highly flexible: a newly added data source plugin can interoperate with existing data source plugins immediately, and plugin developers do not need to care about the code logic of other plugins;
  • Supports full synchronization as well as incremental synchronization and interval polling;
  • Supports offline synchronization and calculation, and is also compatible with real-time scenarios;
  • Supports dirty data storage and provides metrics monitoring;
  • Works with the Flink checkpoint mechanism to support resuming from breakpoints and task disaster recovery;
  • Supports synchronizing DDL statements such as 'CREATE TABLE' and 'ALTER COLUMN' in addition to DML data.

Build And Compilation

Get the code

Use git to clone the ChunJun code:

git clone https://github.com/DTStack/chunjun.git

Build

Execute the command in the project directory.

./mvnw clean package

Or execute

sh build/build.sh

Common problems

Compiling the 'chunjun-core' module fails with 'Failed to read artifact descriptor for com.google.errorprone:javac-shaded'

Error message:

[ERROR]Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check(spotless-check)on project chunjun-core:
        Execution spotless-check of goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check failed:Unable to resolve dependencies:
        Failed to collect dependencies at com.google.googlejavaformat:google-java-format:jar:1.7->com.google.errorprone:javac-shaded:jar:9+181-r4173-1:
        Failed to read artifact descriptor for com.google.errorprone:javac-shaded:jar:9+181-r4173-1:Could not transfer artifact
        com.google.errorprone:javac-shaded:pom:9+181-r4173-1 from/to aliyunmaven(https://maven.aliyun.com/repository/public): 
        Access denied to:https://maven.aliyun.com/repository/public/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.pom -> [Help 1]

Solution: Download 'javac-shaded-9+181-r4173-1.jar' from 'https://repo1.maven.org/maven2/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.jar', then install it into the local Maven repository with the command below. Adjust the '-Dfile' path to wherever you saved the jar.

mvn install:install-file -DgroupId=com.google.errorprone -DartifactId=javac-shaded -Dversion=9+181-r4173-1 -Dpackaging=jar -Dfile=./jars/javac-shaded-9+181-r4173-1.jar
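
The download and install steps can be combined into one script. The following is a minimal sketch, assuming 'curl' and 'mvn' are on the PATH. The helper names 'maven_central_url' and 'install_javac_shaded' are illustrative and not part of ChunJun; the coordinates and the './jars' path come from the error message and command above.

```shell
#!/bin/sh
# Sketch: fetch javac-shaded from Maven Central and install it into the local repository.
# Helper names are hypothetical; the coordinates come from the error message above.

GROUP_ID="com.google.errorprone"
ARTIFACT_ID="javac-shaded"
VERSION="9+181-r4173-1"

# Build the Maven Central URL of a jar from its coordinates
# (standard repository layout: group path / artifact / version / artifact-version.jar).
maven_central_url() {
    group_path=$(printf '%s' "$1" | tr '.' '/')
    printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
        "$group_path" "$2" "$3" "$2" "$3"
}

# Download the jar into ./jars, then install it with the same mvn command as above.
install_javac_shaded() {
    jar="./jars/${ARTIFACT_ID}-${VERSION}.jar"
    mkdir -p ./jars &&
    curl -fL -o "$jar" "$(maven_central_url "$GROUP_ID" "$ARTIFACT_ID" "$VERSION")" &&
    mvn install:install-file -DgroupId="$GROUP_ID" -DartifactId="$ARTIFACT_ID" \
        -Dversion="$VERSION" -Dpackaging=jar -Dfile="$jar"
}
```

Source the script and call install_javac_shaded from the project directory. Nothing is downloaded until the function is called.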

Quick Start

The following table shows which Flink version each ChunJun branch corresponds to. If the versions are not aligned, problems such as 'Serialization Exceptions' and 'NoSuchMethod Exception' may occur in tasks.

Branch          Flink version
master          1.16.1
1.12_release    1.12.7
1.10_release    1.10.1
1.8_release     1.8.3
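
To catch a version mismatch before submitting a task, the table can be encoded in a small check. This is only a sketch: 'flink_version_for_branch' and 'check_flink_version' are hypothetical helper names, and the mapping is copied from the table above, so it must be updated by hand when new branches appear.

```shell
#!/bin/sh
# Sketch: map a ChunJun branch to the Flink version listed in the table above.
# Function names are hypothetical helpers, not part of ChunJun.

flink_version_for_branch() {
    case "$1" in
        master)        echo "1.16.1" ;;
        1.12_release)  echo "1.12.7" ;;
        1.10_release)  echo "1.10.1" ;;
        1.8_release)   echo "1.8.3" ;;
        *)             echo "unknown branch: $1" >&2; return 1 ;;
    esac
}

# Succeeds only when the given Flink version is the one listed for the branch.
check_flink_version() {
    expected=$(flink_version_for_branch "$1") || return 1
    if [ "$expected" != "$2" ]; then
        echo "branch $1 expects Flink $expected, found $2" >&2
        return 1
    fi
}
```

For example, check_flink_version "$(git rev-parse --abbrev-ref HEAD)" 1.16.1 compares the currently checked-out branch against a Flink 1.16.1 installation.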

ChunJun supports running tasks in multiple modes. Different modes depend on different environments and steps. The modes are described below.

Local

Local mode does not depend on a Flink or Hadoop environment. It starts a JVM process in the local environment to run tasks.

Steps

Go to the directory of 'chunjun-dist' and execute the command below:

sh bin/chunjun-local.sh -job $SCRIPT_PATH

The "$SCRIPT_PATH" parameter is the path where the task script is located. After the command executes, the task runs locally.

Note:

If you package on Windows and run the shell scripts on Linux, execute sed -i "s/\r//g" bin/*.sh to remove the '\r' characters from the scripts.
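
The in-place flag of sed behaves differently between GNU and BSD/macOS versions, so a variant built on 'tr' may be more portable. The following is a minimal sketch with the same effect as the sed command; 'strip_cr' is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: strip carriage returns from scripts without relying on sed's in-place flag.
# strip_cr is a hypothetical helper name.

strip_cr() {
    for f in "$@"; do
        tmp="$f.tmp.$$"
        # tr -d '\r' deletes every carriage return; writing back with cat
        # keeps the original file's permissions (such as the executable bit).
        tr -d '\r' < "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

Run strip_cr bin/*.sh from the 'chunjun-dist' directory to fix all scripts at once.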

Standalone

Standalone mode depends on a Flink Standalone environment and does not depend on a Hadoop environment.

Steps

1. Add the ChunJun jars
  1. Find the jar directory: if you built the project with Maven, the directory is named 'chunjun-dist'; if you downloaded the tar.gz file from the release page, the decompressed directory is named like 'chunjun-assembly-${revision}-chunjun-dist'.

  2. Copy the jars to the Flink lib directory, for example:

cp -r chunjun-dist $FLINK_HOME/lib

Note: run this operation on every machine in the Flink cluster; otherwise some jobs will fail with ClassNotFoundException.

2. Start the Flink Standalone cluster

sh $FLINK_HOME/bin/start-cluster.sh

After startup succeeds, the Flink web UI listens on port 8081 by default, which you can change in 'flink-conf.yaml'. Open port 8081 of the current machine to reach the web UI of the standalone cluster.

3. Submit task

Go to the directory of 'chunjun-dist' and execute the command below:

sh bin/chunjun-standalone.sh -job chunjun-examples/json/stream/stream.json

After the command executes successfully, you can observe the task status on the Flink web UI.
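
Step 2 notes that the web UI port can be changed in 'flink-conf.yaml'. The sketch below reads the configured port so that the right URL can be opened. It assumes the port is set through the 'rest.port' key in a flat "key: value" layout; check the configuration reference for your Flink version, since this key is not named in the text above. 'flink_web_port' is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read the web UI port from flink-conf.yaml, falling back to the default 8081.
# flink_web_port is a hypothetical helper; the rest.port key and the flat
# "key: value" layout are assumptions about the Flink configuration file.

flink_web_port() {
    conf="$1"
    port=""
    if [ -f "$conf" ]; then
        # Use the last uncommented rest.port entry; commented lines start with '#'.
        port=$(sed -n 's/^[[:space:]]*rest\.port[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | tail -n 1)
    fi
    echo "${port:-8081}"
}
```

For example, echo "http://localhost:$(flink_web_port "$FLINK_HOME/conf/flink-conf.yaml")" prints the address to open on the current machine.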

Yarn Session

Yarn Session mode depends on the Flink jars and a Hadoop environment. The yarn-session must be started before a task is submitted.

Steps

1. Start the yarn-session environment

Yarn Session mode depends on Flink and Hadoop environments. Set $HADOOP_HOME and $FLINK_HOME in advance, and upload 'chunjun-dist' with the yarn-session '-t' parameter.

cd $FLINK_HOME/bin
./yarn-session.sh -t $CHUNJUN_HOME -d

2. Submit task

Look up the application id of the yarn-session ($SESSION_APPLICATION_ID) on the YARN web UI, then enter the 'chunjun-dist' directory and execute the command below, replacing SESSION_APPLICATION_ID with that id:

sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json -confProp {\"yarn.application.id\":\"SESSION_APPLICATION_ID\"}

'yarn.application.id' can also be set in 'flink-conf.yaml'. After the task is submitted successfully, its status can be observed on the YARN web UI.
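
The backslash-escaped JSON in the command above is easy to mistype. One alternative is to generate the '-confProp' value and pass it as a single quoted argument, which hands the same JSON text to the script. This is a sketch only: 'session_conf_prop' is a hypothetical helper, and its check of the id format is deliberately loose.

```shell
#!/bin/sh
# Sketch: build the -confProp JSON for a yarn-session submission.
# session_conf_prop is a hypothetical helper, not part of ChunJun.

session_conf_prop() {
    case "$1" in
        application_[0-9]*_[0-9]*) ;;   # loose check for the usual YARN id shape
        *) echo "not a YARN application id: $1" >&2; return 1 ;;
    esac
    printf '{"yarn.application.id":"%s"}\n' "$1"
}

# Example (run from the chunjun-dist directory):
#   sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json \
#       -confProp "$(session_conf_prop "$SESSION_APPLICATION_ID")"
```

Besides the YARN web UI, the application id can usually be found with yarn application -list.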

Yarn Per-Job

Yarn Per-Job mode depends on Flink and Hadoop environments. Set $HADOOP_HOME and $FLINK_HOME in advance.
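
Because both variables must be set before submission, a short pre-flight check can fail early with a readable message. The following is a minimal sketch; 'require_dir_env' is a hypothetical helper name and only verifies that each variable is set and points to an existing directory.

```shell
#!/bin/sh
# Sketch: fail early when a required environment variable is unset or not a directory.
# require_dir_env is a hypothetical helper name.

require_dir_env() {
    for name in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            return 1
        fi
        if [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            return 1
        fi
    done
}

# Example (run from the chunjun-dist directory):
#   require_dir_env HADOOP_HOME FLINK_HOME &&
#       sh ./bin/chunjun-yarn-perjob.sh -job chunjun-examples/json/stream/stream.json
```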

Steps

Once the configuration is correct, the yarn per-job task can be submitted. Enter the 'chunjun-dist' directory and execute the command below:

sh ./bin/chunjun-yarn-perjob.sh -job chunjun-examples/json/stream/stream.json

After the submission is successful, the task status can be observed on the yarn web.

Docs of Connectors

For details, please visit: https://dtstack.github.io/chunjun/documents/

Contributors

Thanks to all contributors! We are very happy that you contribute to ChunJun.

License

ChunJun is under the Apache 2.0 license. Please visit LICENSE for details.

Contact Us

Join ChunJun Slack. https://join.slack.com/t/chunjun/shared_invite/zt-1hzmvh0o3-qZ726NXmhClmLFRMpEDHYw

More Repositories

1

flinkStreamSQL

Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
Java
2,017
star
2

Taier

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display
Java
1,329
star
3

molecule

🚀 A lightweight Web IDE UI framework.
TypeScript
879
star
4

dt-sql-parser

SQL Parsers for BigData, built with antlr4.
TypeScript
293
star
5

jlogstash

A Java version of logstash
HTML
267
star
6

monaco-sql-languages

SQL languages for monaco-editor
TypeScript
217
star
7

chengying

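Software that supports standardized schema definitions and automated deployment of product packages. It deploys, upgrades, uninstalls, and configures each service in a product package to reduce manual operations cost.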
Go
197
star
8

dt-react-component

React UI component library based on antd package
TypeScript
84
star
9

jfilebeat

A lightweight, filebeat-like log collection tool
Java
68
star
10

DatasourceX

Java
65
star
11

catcher

A Java performance collection tool
Java
51
star
12

flinkx

48
star
13

code-review-practices

Code Review Practices
48
star
14

yice-performance

Yice performance testing platform
TypeScript
46
star
15

doraemon

A management tool to help you organize your daily development
TypeScript
32
star
16

chengying-agent

EasyAgent is an infrastructure component, applied to manage the life-cycle of services on the remote host.
Go
32
star
17

dt-utils

Common front-end utility functions
TypeScript
30
star
18

dt-python-parser

Python Parsers for BigData, built with antlr4.
JavaScript
25
star
19

ko

Project toolkit for React Applications
JavaScript
24
star
20

dt-react-monaco-editor

Monaco editor for React.
TypeScript
22
star
21

jlogstash-input-plugin

Input plugins for the Java version of logstash
Java
21
star
22

jlogstash-performance-testing

Performance comparison between jlogstash and logstash
20
star
23

molecule-examples

The collection of Molecule examples
TypeScript
20
star
24

UED

DTStack UED team (袋鼠云数栈) - http://ued.dtstack.cn/
TypeScript
20
star
25

ant-design-dtinsight-theme

This is a document of DTInsight-theme based on Ant Design.
Less
20
star
26

dt-form-renderer

Render Interaction Form Via JSON
TypeScript
18
star
27

ant-design-testing

TypeScript
16
star
28

babel-plugin-treasure

Based on babel-plugin-import, committed to meeting the AST optimization requirements of a unified library
HTML
11
star
29

chengying-server

Go
10
star
30

Code-Style-Guide

10
star
31

chunjun-web

ChunJun Official Website https://dtstack.github.io/chunjun-web/
JavaScript
8
star
32

jlogstash-output-plugin

Output plugins for the Java version of logstash
Java
7
star
33

jlogstash-filter-plugin

Filter plugins for the Java version of logstash
Java
7
star
34

easyvc-power-meter

an open source component code for demonstrating @easyv/cli.
JavaScript
6
star
35

chengying-schema

Shell
6
star
36

chengying-front

TypeScript
5
star
37

chengying-web

JavaScript
4
star
38

typescript-migration-helper

Helps ES6 + React + Redux projects migrate to TypeScript.
Perl
2
star
39

easyv-cli

Official EasyV component development tool
2
star
40

dtstack-log-java-sdk

Java SDK for DTStack (玳数) logs
Java
2
star
41

create-molecule

Create Molecule Application with create-react-app.
TypeScript
2
star
42

dt-react-codemirror-editor

Codemirror editor for React.
TypeScript
2
star
43

maven-repository

1
star
44

elasticsearch-sql-old

Java
1
star
45

dt-monaco-editor-nls-webpack-plugin

Simplified Chinese Support For Monaco Editor
JavaScript
1
star