Giter VIP home page Giter VIP logo

flink-cdc's Introduction

CDC Connectors for Apache Flink®

CDC Connectors for Apache Flink® is a set of source connectors for Apache Flink®, ingesting changes from different databases using change data capture (CDC). CDC Connectors for Apache Flink® integrates Debezium as the engine to capture data changes. So it can fully leverage the ability of Debezium. See more about what is Debezium.

This README is meant as a brief walkthrough on the core features of CDC Connectors for Apache Flink®. For a fully detailed documentation, please see Documentation.

Supported (Tested) Databases

Connector Database Driver
mongodb-cdc
  • MongoDB: 3.6, 4.x, 5.0
  • MongoDB Driver: 4.3.1
    mysql-cdc
  • MySQL: 5.6, 5.7, 8.0.x
  • RDS MySQL: 5.6, 5.7, 8.0.x
  • PolarDB MySQL: 5.6, 5.7, 8.0.x
  • Aurora MySQL: 5.6, 5.7, 8.0.x
  • MariaDB: 10.x
  • PolarDB X: 2.0.1
  • JDBC Driver: 8.0.27
    oceanbase-cdc
  • OceanBase CE: 3.1.x
  • OceanBase EE (MySQL mode): 2.x, 3.x
  • JDBC Driver: 5.1.4x
    oracle-cdc
  • Oracle: 11, 12, 19
  • Oracle Driver: 19.3.0.0
    postgres-cdc
  • PostgreSQL: 9.6, 10, 11, 12
  • JDBC Driver: 42.2.27
    sqlserver-cdc
  • Sqlserver: 2012, 2014, 2016, 2017, 2019
  • JDBC Driver: 7.2.2.jre8
    tidb-cdc
  • TiDB: 5.1.x, 5.2.x, 5.3.x, 5.4.x, 6.0.0
  • JDBC Driver: 8.0.27
    Db2-cdc
  • Db2: 11.5
  • Db2 Driver: 11.5.0.0

    Features

    1. Supports reading database snapshot and continues to read transaction logs with exactly-once processing even failures happen.
    2. CDC connectors for DataStream API, users can consume changes on multiple databases and tables in a single job without Debezium and Kafka deployed.
    3. CDC connectors for Table/SQL API, users can use SQL DDL to create a CDC source to monitor changes on a single table.

    Usage for Table/SQL API

    We need several steps to setup a Flink cluster with the provided connector.

    1. Setup a Flink cluster with version 1.12+ and Java 8+ installed.
    2. Download the connector SQL jars from the Download page (or build yourself).
    3. Put the downloaded jars under FLINK_HOME/lib/.
    4. Restart the Flink cluster.

    The example shows how to create a MySQL CDC source in Flink SQL Client and execute queries on it.

    -- creates a mysql cdc table source
    CREATE TABLE mysql_binlog (
     id INT NOT NULL,
     name STRING,
     description STRING,
     weight DECIMAL(10,3)
    ) WITH (
     'connector' = 'mysql-cdc',
     'hostname' = 'localhost',
     'port' = '3306',
     'username' = 'flinkuser',
     'password' = 'flinkpw',
     'database-name' = 'inventory',
     'table-name' = 'products'
    );
    
    -- read snapshot and binlog data from mysql, and do some transformation, and show on the client
    SELECT id, UPPER(name), description, weight FROM mysql_binlog;

    Usage for DataStream API

    Include following Maven dependency (available through Maven Central):

    <dependency>
      <groupId>com.ververica</groupId>
      <!-- add the dependency matching your database -->
      <artifactId>flink-connector-mysql-cdc</artifactId>
      <!-- The dependency is available only for stable releases, SNAPSHOT dependency need build by yourself. -->
      <version>2.4-SNAPSHOT</version>
    </dependency>
    
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
    import com.ververica.cdc.connectors.mysql.source.MySqlSource;
    
    public class MySqlSourceExample {
      public static void main(String[] args) throws Exception {
        MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
                .hostname("yourHostname")
                .port(yourPort)
                .databaseList("yourDatabaseName") // set captured database
                .tableList("yourDatabaseName.yourTableName") // set captured table
                .username("yourUsername")
                .password("yourPassword")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();
    
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
        // enable checkpoint
        env.enableCheckpointing(3000);
    
        env
          .fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySQL Source")
          // set 4 parallel source tasks
          .setParallelism(4)
          .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering
    
        env.execute("Print MySQL Snapshot + Binlog");
      }
    }

    Building from source

    • Prerequisites:
      • git
      • Maven
      • At least Java 8
    git clone https://github.com/ververica/flink-cdc-connectors.git
    cd flink-cdc-connectors
    mvn clean install -DskipTests
    

    The dependencies are now available in your local .m2 repository.

    License

    The code in this repository is licensed under the Apache Software License 2.

    Contributing

    CDC Connectors for Apache Flink® welcomes anyone that wants to help out in any way, whether that includes reporting problems, helping with documentation, or contributing code changes to fix bugs, add tests, or implement new features. You can report problems to request features in the GitHub Issues.

    Community

    • DingTalk Chinese User Group

      You can search the group number [33121212] or scan the following QR code to join in the group.

    Documents

    To get started, please see https://ververica.github.io/flink-cdc-connectors/

    flink-cdc's People

    Contributors

    leonardbang avatar wuchong avatar goodboy008 avatar jiabao-sun avatar ruanhang1993 avatar luoyuxia avatar whhe avatar patrickren avatar molsionmo avatar fuyun2024 avatar gtk96 avatar fsk119 avatar minchowang avatar paul8263 avatar ashulin avatar snuyanzin avatar cleverdada avatar teckick avatar amber1990zhang avatar xieyi888 avatar wangxiaojing avatar vanliu-tx avatar xiangtao avatar lsyldliu avatar lidoudou1993 avatar lxin96 avatar legendtkl avatar mylanpangzi avatar fbad avatar empcl avatar

    Forkers

    enterpriseih

    Recommend Projects

    • React photo React

      A declarative, efficient, and flexible JavaScript library for building user interfaces.

    • Vue.js photo Vue.js

      🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

    • Typescript photo Typescript

      TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

    • TensorFlow photo TensorFlow

      An Open Source Machine Learning Framework for Everyone

    • Django photo Django

      The Web framework for perfectionists with deadlines.

    • D3 photo D3

      Bring data to life with SVG, Canvas and HTML. 📊📈🎉

    Recommend Topics

    • javascript

      JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

    • web

      Some thing interesting about web. New door for the world.

    • server

      A server is a program made to process requests and deliver data to clients.

    • Machine learning

      Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

    • Game

      Some thing interesting about game, make everyone happy.

    Recommend Org

    • Facebook photo Facebook

      We are working to build community through open source technology. NB: members must have two-factor auth.

    • Microsoft photo Microsoft

      Open source projects and samples from Microsoft.

    • Google photo Google

      Google ❤️ Open Source for everyone.

    • D3 photo D3

      Data-Driven Documents codes.