CDC Connectors for Apache Flink®

CDC Connectors for Apache Flink® is a set of source connectors for Apache Flink®, ingesting changes from different databases using change data capture (CDC). CDC Connectors for Apache Flink® integrates Debezium as the engine to capture data changes, so it can fully leverage Debezium's capabilities. See the Debezium documentation to learn more about what Debezium is.

This README is meant as a brief walkthrough of the core features of CDC Connectors for Apache Flink®. For fully detailed documentation, please see the Documentation.

Supported (Tested) Databases

Each connector is listed with the database versions it has been tested against and the driver it uses.

mongodb-cdc
  • MongoDB: 3.6, 4.x, 5.0, 6.0
  • MongoDB Driver: 4.3.4
mysql-cdc
  • MySQL: 5.6, 5.7, 8.0.x
  • RDS MySQL: 5.6, 5.7, 8.0.x
  • PolarDB MySQL: 5.6, 5.7, 8.0.x
  • Aurora MySQL: 5.6, 5.7, 8.0.x
  • MariaDB: 10.x
  • PolarDB X: 2.0.1
  • JDBC Driver: 8.0.28
oceanbase-cdc
  • OceanBase CE: 3.1.x, 4.x
  • OceanBase EE: 2.x, 3.x, 4.x
  • OceanBase Driver: 2.4.x
oracle-cdc
  • Oracle: 11, 12, 19, 21
  • Oracle Driver: 19.3.0.0
postgres-cdc
  • PostgreSQL: 9.6, 10, 11, 12, 13, 14
  • JDBC Driver: 42.5.1
sqlserver-cdc
  • SQL Server: 2012, 2014, 2016, 2017, 2019
  • JDBC Driver: 9.4.1.jre8
tidb-cdc
  • TiDB: 5.1.x, 5.2.x, 5.3.x, 5.4.x, 6.0.0
  • JDBC Driver: 8.0.27
db2-cdc
  • Db2: 11.5
  • Db2 Driver: 11.5.0.0
vitess-cdc
  • Vitess: 8.0.x, 9.0.x
  • MySQL JDBC Driver: 8.0.26

    Features

    1. Supports reading database snapshots and continues to read transaction logs with exactly-once processing, even when failures happen.
    2. CDC connectors for the DataStream API: users can consume changes from multiple databases and tables in a single job without deploying Debezium and Kafka.
    3. CDC connectors for the Table/SQL API: users can use SQL DDL to create a CDC source to monitor changes on a single table.

    Usage for Table/SQL API

    Setting up a Flink cluster with the provided connector takes a few steps.

    1. Set up a Flink cluster with version 1.12+ and Java 8+ installed.
    2. Download the connector SQL jars from the Download page (or build them yourself).
    3. Put the downloaded jars under FLINK_HOME/lib/.
    4. Restart the Flink cluster.

    The following example shows how to create a MySQL CDC table source in the Flink SQL Client and execute queries on it.

    -- creates a mysql cdc table source
    CREATE TABLE mysql_binlog (
     id INT NOT NULL,
     name STRING,
     description STRING,
     weight DECIMAL(10,3)
    ) WITH (
     'connector' = 'mysql-cdc',
     'hostname' = 'localhost',
     'port' = '3306',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'inventory',
     'table-name' = 'products'
    );
    
    -- read snapshot and binlog data from mysql, and do some transformation, and show on the client
    SELECT id, UPPER(name), description, weight FROM mysql_binlog;
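
    The same table can also be registered and queried from a Java program through the Table API. The following is a minimal sketch, assuming the same connector options as the SQL Client example above and assuming flink-table-api-java-bridge and the mysql-cdc connector jar are on the classpath; the class name is illustrative.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

    public class MySqlCdcTableExample {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // register the same mysql-cdc table as in the SQL Client example above
        tEnv.executeSql(
            "CREATE TABLE mysql_binlog (" +
            "  id INT NOT NULL," +
            "  name STRING," +
            "  description STRING," +
            "  weight DECIMAL(10,3)" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost'," +
            "  'port' = '3306'," +
            "  'username' = 'flinkuser'," +
            "  'password' = 'flinkpw'," +
            "  'database-name' = 'inventory'," +
            "  'table-name' = 'products'" +
            ")");

        // run the same query and print the continuously updating result
        tEnv.executeSql("SELECT id, UPPER(name), description, weight FROM mysql_binlog").print();
      }
    }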

    Usage for DataStream API

    Include the following Maven dependency (available through Maven Central):

    <dependency>
      <groupId>com.ververica</groupId>
      <!-- add the dependency matching your database -->
      <artifactId>flink-connector-mysql-cdc</artifactId>
      <!-- The dependency is available only for stable releases; SNAPSHOT dependencies have to be built yourself. -->
      <version>2.5-SNAPSHOT</version>
    </dependency>
    
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
    import com.ververica.cdc.connectors.mysql.source.MySqlSource;
    
    public class MySqlSourceExample {
      public static void main(String[] args) throws Exception {
        MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
                .hostname("yourHostname")
                .port(yourPort)
                .databaseList("yourDatabaseName") // set captured database
                .tableList("yourDatabaseName.yourTableName") // set captured table
                .username("yourUsername")
                .password("yourPassword")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();
    
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
        // enable checkpoint
        env.enableCheckpointing(3000);
    
        env
          .fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySQL Source")
          // set 4 parallel source tasks
          .setParallelism(4)
          .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering
    
        env.execute("Print MySQL Snapshot + Binlog");
      }
    }
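
    The JsonDebeziumDeserializationSchema above turns every change record into a JSON string. If a different output format is needed, a custom deserializer can be plugged in by implementing DebeziumDeserializationSchema. The class below is a minimal, hypothetical sketch that emits only each record's topic, which encodes the captured database and table; pass an instance of it to the builder's deserializer(...) call instead.

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.util.Collector;
    import org.apache.kafka.connect.source.SourceRecord;
    import com.ververica.cdc.debezium.DebeziumDeserializationSchema;

    // hypothetical example: emit only the topic of each change record as a String
    public class TopicOnlyDeserializationSchema implements DebeziumDeserializationSchema<String> {

      @Override
      public void deserialize(SourceRecord record, Collector<String> out) {
        // the topic of a Debezium SourceRecord is derived from the captured database and table
        out.collect(record.topic());
      }

      @Override
      public TypeInformation<String> getProducedType() {
        return Types.STRING;
      }
    }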

    Building from source

    • Prerequisites:
      • git
      • Maven
      • At least Java 8
    git clone https://github.com/ververica/flink-cdc-connectors.git
    cd flink-cdc-connectors
    mvn clean install -DskipTests
    

    The dependencies are now available in your local .m2 repository.

    License

    The code in this repository is licensed under the Apache License 2.0.

    Contributing

    CDC Connectors for Apache Flink® welcomes anyone who wants to help out in any way, whether that means reporting problems, helping with documentation, or contributing code changes to fix bugs, add tests, or implement new features. You can report problems or request features in the GitHub Issues.

    Community

    • DingTalk Chinese User Group

      You can search for the group number [33121212] or scan the QR code in the repository README to join the group.

    Documents

    To get started, please see https://ververica.github.io/flink-cdc-connectors/
