
tapdata's Introduction



What is Tapdata ?

Tapdata is a real-time data integration platform based on Change Data Capture (CDC). It synchronizes data in real time among systems such as databases, SaaS services, applications, and files. Synchronization tasks are built with drag-and-drop operations, and every step, from table creation to full and incremental synchronization, is fully automated.

  1. Supported Connectors
  2. FAQ

For more details, please read the docs.

Tapdata is available in three editions, which differ in distribution strategy and functionality: Community Edition, Cloud Service, and Enterprise Edition. Everything in this open-source project corresponds to the Community Edition.

For the differences between the three editions, see https://tapdata.io/

Table of contents

1. Key Features
2. Use Cases
3. Quick Start
4. Examples
5. License
6. Contact Us
7. Build From Source
8. Contributing

Key Features

End-To-End Visual UI

Full Process Automation
Automatic table creation and automatic switching between full and incremental synchronization.

Drag-And-Drop
Operate through an easy-to-use visual interface.

Monitor
Sync performance, task progress, key events, and logs.

Lightweight ETL Solution

Table Filter and Mapping
Rich visual processors for selecting and renaming tables, and for adding, deleting, or renaming fields.

Record Transformation
Transform records easily with JS/Python code.

Union
Union multiple tables into one target.
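As a rough illustration of what record transformation and union do, here is a plain-Python sketch. It does not use Tapdata's processor API, whose function signature is not shown in this document; the functions transform and union and the field names total, internal_note, and _source are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def transform(record: Record) -> Record:
    """Per-record transformation: add a derived field and drop an unwanted one."""
    out = dict(record)  # copy so the source record is not modified
    out["total"] = out.get("price", 0) * out.get("quantity", 0)
    out.pop("internal_note", None)  # delete a field the target does not need
    return out


def union(tables: Dict[str, Iterable[Record]]) -> List[Record]:
    """Union rows from several source tables into one target, tagging the origin."""
    merged: List[Record] = []
    for name, rows in tables.items():
        for row in rows:
            merged.append({**transform(row), "_source": name})
    return merged


if __name__ == "__main__":
    print(union({
        "orders_2023": [{"id": 1, "price": 2.5, "quantity": 4, "internal_note": "x"}],
        "orders_2024": [{"id": 2, "price": 10, "quantity": 1}],
    }))
```

Tagging each row with its source table is one way to keep rows from different tables distinguishable after they land in a single target.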

Build materialized views in MongoDB

Document-style view
Supports embedded documents and arrays, deeply integrated with MongoDB.

Multi-table association
Hierarchical nesting and convenient construction of 1:1 and 1:N models.

Multi-table stream merging
Unified batch and stream processing; multiple tables can be updated in any order.
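To see why stream merging can accept multi-table updates in any order, consider a hypothetical 1:N model in which each order document embeds an items array. The sketch below is not Tapdata's implementation: it stores each child row under its own key, so inserts from the orders and items tables produce the same document whichever arrives first. It assumes distinct rows; repeated updates to the same row still have to be applied in their own order. The table names, apply_events, and the items field are illustrative.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Event = Tuple[str, Row]  # (source table name, changed row)


def apply_events(events: List[Event]) -> Dict[int, Row]:
    """Merge "orders" rows (1 side) and "items" rows (N side) into one document per order_id."""
    docs: Dict[int, Row] = {}
    for table, row in events:
        doc = docs.setdefault(row["order_id"], {"order_id": row["order_id"], "items": {}})
        if table == "orders":
            # Parent fields go to the top level of the document
            doc.update({k: v for k, v in row.items() if k not in ("order_id", "items")})
        elif table == "items":
            # Each child is stored under its own key, so arrival order does not matter
            doc["items"][row["item_id"]] = row
    # Expose the children as an array, sorted by item_id for a deterministic result
    return {
        order_id: {**doc, "items": [doc["items"][k] for k in sorted(doc["items"])]}
        for order_id, doc in docs.items()
    }
```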

Primary Use Cases

  1. Synchronize data from traditional RDBMSs to modern databases such as MongoDB, Elasticsearch (ES), or Redis to support new business use cases.
  2. Consolidate data from various databases into a unified data warehouse.
  3. Stream database changes to Kafka.
  4. Perform data synchronization between heterogeneous databases.
  5. Use MongoDB to build a unified data hub.

Quick Start

Start with docker

To run the complete service, the basic resource requirements are:

  1. At least 5GB of available memory
  2. At least 20GB of available disk space
  3. At least 1 free CPU core

Run docker run -d -p 3030:3030 ghcr.io/tapdata/tapdata:latest, wait about 3 minutes, then open http://localhost:3030/. If everything is OK, you will see the login page. The default username is [email protected] and the default password is admin.
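Instead of waiting a fixed three minutes, a script can poll the address until the login page responds. The following is a minimal sketch using only the Python standard library; the function name wait_for_tapdata and the polling intervals are illustrative choices, not part of Tapdata.

```python
import http.client
import time
import urllib.request


def wait_for_tapdata(url: str = "http://localhost:3030/",
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll url until it returns HTTP 200; give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeout, or HTTP error: service not ready yet
            pass
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_tapdata() else "not ready after 5 minutes")
```

The five-minute default leaves some margin over the three minutes suggested above; adjust it to the host's speed.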

Start with cloud service

Tapdata is also available as a cloud service: you can use the fully managed service, or deploy the engine in your private network.

Try it at https://cloud.tapdata.io/. Google and GitHub account login are supported, the trial is free, and no credit card is needed, so you can start your real-time data journey immediately.

Examples

🗂️ Create a Data Source and Test It

  1. Log in to the Tapdata platform

  2. In the left navigation panel, click Connections

  3. On the right side of the page, click Create

  4. In the pop-up dialog, search and select MySQL

  5. On the page that you are redirected to, fill in the connection information for MySQL

  6. Click Test, make sure all tests pass, then click Save
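When Test fails with a network error, one quick check, run from the machine hosting the Tapdata engine, is whether the MySQL port can be reached at all. The helper below (port_is_reachable is a hypothetical name) only tests TCP connectivity; it says nothing about credentials or privileges, which the Test step also checks.

```python
import socket


def port_is_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


if __name__ == "__main__":
    # Replace with the host and port entered in the connection form (3306 is MySQL's default)
    print(port_is_reachable("127.0.0.1", 3306))
```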

🗂️ Sync Data From MySQL To MongoDB

  1. Create MySQL and MongoDB data sources

  2. In the left navigation panel, click Data Pipelines -> Data Replications

  3. On the right side of the page, click Create

  4. Drag and drop MySQL and MongoDB data sources onto the canvas

  5. Drag a line from the MySQL data source to MongoDB

  6. Configure the MySQL data source and select the data tables you want to synchronize

  7. Click the Save button in the upper right corner, then click the Start button

  8. Observe the indicators and events on the task page until the data is in sync

🗂️ MySQL To PostgreSQL with Simple ETL

  1. Create MySQL and PostgreSQL data sources

  2. In the left navigation panel, click Data Pipelines -> Data Transformation

  3. On the right side of the page, click Create

  4. Drag and drop MySQL and PostgreSQL data sources onto the canvas

  5. Drag a line from the MySQL data source to PostgreSQL

  6. Click the plus sign on the connection line and select Field Rename

  7. Click the Field Rename node and, in the config form, change i_price to price and i_data to data

  8. Click the Save button in the upper right corner, then click the Start button

  9. Observe the indicators and events on the task page until the data is in sync
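The Field Rename node in this example maps i_price to price and i_data to data. Expressed as a plain-Python function, which sketches the effect on each record rather than the node's implementation, the mapping looks like this; rename_fields and RENAMES are hypothetical names.

```python
from typing import Any, Dict

# Mapping used in this example: source column -> target column
RENAMES = {"i_price": "price", "i_data": "data"}


def rename_fields(record: Dict[str, Any], renames: Dict[str, str] = RENAMES) -> Dict[str, Any]:
    """Return a copy of record with keys renamed; unmapped keys are kept as-is."""
    return {renames.get(key, key): value for key, value in record.items()}


if __name__ == "__main__":
    print(rename_fields({"i_id": 1, "i_price": 9.5, "i_data": "x"}))
```

With this approach, if a renamed field collides with an existing one, whichever appears later in the record wins, so it is worth checking target names for conflicts.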

🗂️ Making materialized views in MongoDB

Materialized views are a special feature of Tapdata. They let you take full advantage of MongoDB's document model and create the data model you need. Give it a try!

In this example, I will build a view from two MySQL tables, order and product, embedding product as a document inside order. Here are the steps:

  1. Create MySQL and MongoDB data sources

  2. In the left navigation panel, click Data Pipelines -> Data Transformation

  3. On the right side of the page, click Create

  4. In the upper-left panel, click the MySQL data source, then drag and drop the order table and the product table onto the canvas

  5. From the lower-left panel, drag and drop the "Master-slave merge" node onto the canvas

  6. Drag a line from the order table to Master-slave merge

  7. Drag a line from the product table to Master-slave merge

  8. Drag and drop MongoDB data source onto the canvas, and drag a line from the "Master-slave merge" node to MongoDB node

  9. Click the "Master-slave merge" node, then, in the "Table Name" area on the right, drag the product table into the order table

  10. Click the "Master-slave merge" node, then click the product table and set Data write model to "Match and Merge", Field write path to "product", and Association Conditions to "order_id" => "order_id". The Schema at the bottom changes accordingly

  11. Click the MongoDB node, set the target table name to order_with_product, and set the update condition field to "order_id"

  12. Click the Save button in the upper right corner, then click the Start button

  13. Observe the indicators and events on the task page until the data is in sync

  14. Check the order_with_product collection in MongoDB to see the resulting data model
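The document written to order_with_product can be approximated in plain Python. This sketch assumes "Match and Merge" embeds the whole matching product row under the product field path and leaves orders without a match unchanged; the function name and the handling of the association key inside the embedded document are assumptions, not documented Tapdata behavior.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_order_with_product(orders: List[Row], products: List[Row]) -> List[Row]:
    """Embed each product row under "product" in the order with the same order_id."""
    # Association condition: product.order_id == order.order_id
    product_by_order = {p["order_id"]: p for p in products}
    docs: List[Row] = []
    for order in orders:
        doc = dict(order)
        match = product_by_order.get(order["order_id"])
        if match is not None:
            doc["product"] = dict(match)  # field write path: "product"
        docs.append(doc)
    return docs


if __name__ == "__main__":
    print(build_order_with_product(
        [{"order_id": 1, "amount": 30}, {"order_id": 2, "amount": 5}],
        [{"order_id": 1, "name": "pen"}],
    ))
```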

🗂️ Data consistency check

Using the data validation feature, you can quickly check whether the synchronized data is consistent and accurate.

  1. In the left navigation panel, click Data Pipelines -> Data Validation

  2. On the right side of the page, click Task Consistency Validation

  3. Select a task and set the validation type to "All Fields Validation", which compares all fields of every record

  4. Click Save, then click Execute in the task list

  5. Wait for the validation task to finish, click Result in the task list, and check the validation result
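Conceptually, "All Fields Validation" matches records by key and compares every field. A minimal sketch of that idea, assuming both sides fit in memory and that values are already of comparable types (the real feature must also handle type mapping between databases), could look like this; validate_all_fields and its report layout are hypothetical.

```python
from typing import Any, Dict, Hashable, List, Sequence, Set

Row = Dict[str, Any]


def validate_all_fields(source: List[Row], target: List[Row], key: str,
                        ignore: Sequence[str] = ("_id",)) -> Dict[str, List[Hashable]]:
    """Compare every field of every record, matched by key.

    Returns keys missing from the target, keys whose fields differ, and keys
    present only in the target. Fields in ignore (e.g. MongoDB's _id) are skipped.
    """
    def strip(row: Row) -> Row:
        return {k: v for k, v in row.items() if k not in ignore}

    target_by_key = {row[key]: strip(row) for row in target}
    source_keys: Set[Hashable] = set()
    report: Dict[str, List[Hashable]] = {"missing": [], "mismatched": [], "extra": []}
    for row in source:
        source_keys.add(row[key])
        other = target_by_key.get(row[key])
        if other is None:
            report["missing"].append(row[key])
        elif other != strip(row):
            report["mismatched"].append(row[key])
    report["extra"] = [k for k in target_by_key if k not in source_keys]
    return report
```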

Architecture


License

Tapdata Community is under the Apache 2.0 license. See the LICENSE file for details.

Contact Us


tapdata's Issues

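Tapdata Cloud: connection tests pass, but the sync task still fails

In Connection Management, testing the source database connection shows no problems, and the target database test also passes, but the connection fails when the sync task runs: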
Connect to source mysql failed, config: {snapshot.locking.mode=none, restRetryTime=3, connector.class=io.debezium.connector.mysql.MySqlConnector, dataFlow.id=634d7038c0c94167270a2910, max.queue.size=8000, job.id=634d703bc162f518b15885f2, table.whitelist=demo.student2,demo.student1, database.history.sourceConnId=6345927d8eb5d14b4f74047e, accessCode=1e611290a0134b4dacd4da56bdffcb8af8aaf012f6aa4a0da887b066db8346c3, poll.interval.ms=500, database.history.baseURL=https://cloud.tapdata.net/console/tm/api/, database.history.skip.unparseable.ddl=true, job.name=测试_1, database.whitelist=demo, database.history.job.id=634d703bc162f518b15885f2, binlogStartPoint=true, database.user=test, database.binlog.position={"filename":"master1-bin.000001","position":2335,"gtidSet":null}, database.history.dataFlowId=634d7038c0c94167270a2910, offset.storage=io.tapdata.common.MongoOffsetBackingStore, database.server.id=1181781442, roleId=0, database.server.name=634d703bc162f518b15885f2, userId=63458e7ec162f518b171e383, database.port=3307, database.history.roleId=0, threadName=Connector runner-测试_1-[634d703bc162f518b15885f2], offset.flush.interval.ms=60000, database.history.accessCode=1e611290a0134b4dacd4da56bdffcb8af8aaf012f6aa4a0da887b066db8346c3, baseURL=https://cloud.tapdata.net/console/tm/api/, database.hostname=my ip addr, database.password=********, database.history.userId=63458e7ec162f518b171e383, name=634d703bc162f518b15885f2, database.history.store.only.monitored.tables.ddl=true, max.batch.size=1000, database.history.restRetryTime=3, database.history=io.tapdata.common.MongoDatabaseHistoryBackingStore, snapshot.mode=schema_only}, err: Error reading MySQL variables: Access denied for user 'test'@'my ip addr' (using password: YES), stacks: org.apache.kafka.connect.errors.ConnectException: Error reading MySQL variables: Access denied for user 'test'@'my ip addr' (using password: YES) at io.debezium.connector.mysql.MySqlJdbcContext.querySystemVariables(MySqlJdbcContext.java:329) at 
io.debezium.connector.mysql.MySqlJdbcContext.readMySqlSystemVariables(MySqlJdbcContext.java:309) at io.debezium.connector.mysql.MySqlTaskContext.<init>(MySqlTaskContext.java:81) at io.debezium.connector.mysql.MySqlTaskContext.<init>(MySqlTaskContext.java:54) at io.debezium.connector.mysql.MySqlConnectorTask.createAndStartTaskContext(MySqlConnectorTask.java:428) at io.debezium.connector.mysql.MySqlConnectorTask.initTaskContext(MySqlConnectorTask.java:359) at io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:169) at io.debezium.tapdata.connector.common.BaseSourceTask.start(BaseSourceTask.java:105) at io.debezium.tapdata.embedded.EmbeddedEngine.run(EmbeddedEngine.java:815) at io.tapdata.manager.ConnectorJobManager.lambda$createMySqlEngine$5(ConnectorJobManager.java:921) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.sql.SQLException: Access denied for user 'test'@'my ip addr' (using password: YES) at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:833) at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:453) at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:246) at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:198) at com.tapdata.constant.JdbcUtil.retryByMaxConnect(JdbcUtil.java:2083) at io.debezium.tapdata.jdbc.JdbcConnection.lambda$patternBasedFactory$1(JdbcConnection.java:174) at io.debezium.tapdata.jdbc.JdbcConnection.connection(JdbcConnection.java:786) at io.debezium.tapdata.jdbc.JdbcConnection.connection(JdbcConnection.java:781) at 
io.debezium.tapdata.jdbc.JdbcConnection.connect(JdbcConnection.java:320) at io.debezium.connector.mysql.MySqlJdbcContext.querySystemVariables(MySqlJdbcContext.java:316) ... 12 more

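Can database character sets be converted automatically?

When converting data from Oracle to MySQL, if the corresponding table does not exist in MySQL, table creation fails because MySQL has no matching character set.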
[ERROR] 2022-08-02 14:17:28 [pool-34-thread-5] io.tapdata.manager.TransformerJobManager
Failed to create jdbc table [***], sql: CREATE TABLE
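...(omitted)
default charset=ZHS16GBK , cause: Unknown character set: 'ZHS16GBK'; Will stop job
Could the database character set be made configurable, or could the source character set be queried so that a matching character set is selected automatically during conversion?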
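One way to address this request is to query the Oracle character set (for example with SELECT value FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET') and translate it to a MySQL name before generating CREATE TABLE. The sketch below shows such a translation with a deliberately partial, hypothetical lookup table; falling back to utf8mb4 is a design choice that assumes a Unicode target is acceptable when no exact match is known.

```python
# Partial, illustrative mapping from Oracle NLS_CHARACTERSET values to MySQL charsets
ORACLE_TO_MYSQL_CHARSET = {
    "AL32UTF8": "utf8mb4",
    "ZHS16GBK": "gbk",
    "WE8MSWIN1252": "latin1",  # MySQL's latin1 is actually cp1252
    "US7ASCII": "ascii",
}


def mysql_charset_for(oracle_charset: str, default: str = "utf8mb4") -> str:
    """Translate an Oracle character set name; fall back to a Unicode charset."""
    return ORACLE_TO_MYSQL_CHARSET.get(oracle_charset.strip().upper(), default)


def table_charset_clause(oracle_charset: str) -> str:
    """Build the charset clause for a generated CREATE TABLE statement."""
    return f"DEFAULT CHARSET={mysql_charset_for(oracle_charset)}"


if __name__ == "__main__":
    print(table_charset_clause("ZHS16GBK"))
```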

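Some defects in cli.py in Tapshell

After cli.py sends an HTTP request (e.g., in the save function of DataSource), the response object in the manager project is com.tapdata.tm.base.dto.ResponseMessage, whose error-message property is message. However, cli.py reads msg, so when an HTTP request fails, the shell cannot display the error message correctly.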

# e.g. cli.py line 3169
change res.json()["msg"] to res.json()["message"]

com.tapdata.tm.base.dto.ResponseMessage

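import java.util.Date;

public class ResponseMessage<T> {

	public static final String OK = "ok";

	private String reqId;

	private long ts = new Date().getTime();

	/**
	 * Result code of the request
	 */
	protected String code = OK;

	/**
	 * Error message when request processing fails
	 */
	protected String message;

	/**
	 * Data returned when request processing succeeds
	 */
	protected T data;

	// other members not shown
}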

tapshell cli.py

    @help_decorate("save a connection in idaas system")
    def save(self):
        api = system_server_conf["api"] + "/Connections" + system_server_conf["auth_param"]
        data = self.to_dict()
        res = requests.post(api, json=data)
        show_connections(quiet=True)
        if res.status_code == 200 and res.json()["code"] == "ok":
            self.id = res.json()["data"]["id"]
            self.c = DataSource.get(self.id)
            self.validate(quiet=False)
            return True
        else:
            logger.warn("save Connection fail, err is: {}", res.json())
            # bug: ResponseMessage has no "msg" field, so this lookup fails;
            # the error text is in res.json()["message"]
            logger.warn("save Connection fail, err is: {}", res.json()["msg"])
        return False
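A defensive way to read the error text is to look up message first, as ResponseMessage defines it, and fall back to msg or the raw body. The helper below is a hypothetical sketch, not code from tapshell; it works on the already-decoded JSON dict returned by res.json().

```python
from typing import Any, Dict


def error_message(body: Dict[str, Any]) -> str:
    """Extract the error text from a ResponseMessage-style JSON body.

    "message" is the field ResponseMessage defines; "msg" is only a fallback
    in case some endpoint uses it, and the whole body is the last resort.
    """
    return str(body.get("message") or body.get("msg") or body)
```

In save(), the second warning could then log error_message(res.json()) instead of indexing ["msg"] directly.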

Can a UDF change the schema?

In the current version, a UDF cannot change the schema. Changing the schema in a UDF is very common and useful: for example, if a processor adds a last_update timestamp, the target table should pick it up automatically and be created with a last_update column.
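The behavior requested here can be pictured as two steps: a processor adds a field, and the pipeline compares the processor's output fields with the target table's columns to find columns that must be added. The sketch below illustrates only that comparison; add_last_update and new_columns are hypothetical names, and inferring the schema from sample records is an assumption about how such a feature might work.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]


def add_last_update(record: Record, now: Optional[datetime] = None) -> Record:
    """UDF-style processor that appends a last_update timestamp to a record."""
    stamp = now if now is not None else datetime.now(timezone.utc)
    return {**record, "last_update": stamp}


def new_columns(target_columns: Iterable[str], records: Iterable[Record]) -> List[str]:
    """List fields produced by the processor that the target table does not have yet."""
    existing = set(target_columns)
    added: List[str] = []
    for record in records:
        for field in record:
            if field not in existing and field not in added:
                added.append(field)
    return added
```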

Tapdata bugfix and refactor

refactor:

  • log module
  • request module
  • parameter verification, including check.py (parameter check class) and rules.py (parameter rule definitions)

fix:

  • pipeline.config

test:

  • add an end-to-end test module (tapdata/tapshell/test)

Provide a dedicated command for tapshell

Currently, the only way to get into the tapshell prompt is to run "bash build/quick-use.sh".

tapshell is the primary user interface, so we should provide a dedicated command to start it.

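Plugin interface improvements

Suggested improvements to the ConnectorBase interface

  1. Add the data source type (source or target) to onStart and onStop, so that different initialization can be performed for each type: onStart(TapConnectionContext connectionContext, DataSourceType type)
  2. Replace the Consumer class with semantically meaningful custom callback interfaces (as a temporary compatibility solution); for example, the connectionTest method could use ConnectResult.submitResult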

How to better raise questions and corresponding rules

The original intention of the Tapdata community is to grow together with developers and create unlimited possibilities. Here, we can speak freely and help each other solve problems.

In the hacker world, when you ask a technical question, whether you get a useful answer often depends on how you ask it. "How To Ask Questions The Smart Way" teaches you how to ask questions that get satisfactory answers. Here are some excerpts to share with you.

【Before asking questions】
Before asking a technical question in a group or forum, please try to find the answer by various means first, and avoid asking questions that a search could easily answer. These means include but are not limited to:

If these efforts still do not solve your problem, mention them when you ask for help. This shows that you are not trying to get something for nothing or waste other people's time. Asking questions and starting discussions on relevant technical topics is encouraged in the community, but asking well is an art the asker needs to master.

【When asking questions】

  1. Maintain a friendly and polite attitude
    No one is obliged to answer your question, especially since answering may cost the other person time and energy, so be polite. When asking, it is a good idea to introduce yourself first and describe your problem politely. Whether or not the problem is solved, thank the people who helped, to show proper courtesy.
  2. Describe your question in detail, clearly, accurately, and concisely; this makes a more complete answer more likely.
  3. Use specific wording that others can understand.

【After asking questions】

  1. Express gratitude to the responders.
  2. If your problem has not been solved, keep asking and report back on the problems you encounter, ideally with steps or screenshots attached.

【Say no to the following actions】
Participating in the community is also a social activity that requires social propriety and basic etiquette. Unacceptable behaviors include but are not limited to:

  1. Escalating discussions into personal attacks, or making provocative, insulting, or derogatory comments;
  2. Public or private harassment;
  3. Publishing other people's private information without permission;
  4. Posting advertisements or other inappropriate content without permission;
  5. Spamming or posting without reason, thereby occupying shared resources.

Reference for these community guidelines:
Original text of "How To Ask Questions The Smart Way": http://www.catb.org/~esr/faqs/smart-questions.html
The English version is copyrighted by Eric S. Raymond and Rick Moen.

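Tapdata Cloud: full + incremental sync from local MySQL to local SQL Server; full sync works, but incremental sync does not, and no error is reported

I am using Tapdata Cloud for full plus incremental data synchronization from MySQL to SQL Server.
After the full sync finished, during the incremental sync, I added a row to my local MySQL, but no corresponding row was added in SQL Server. (Attached is the run log, covering roughly 22:09 to just after 22:21; I added the row in MySQL at about 22:13: 日志.txt)
From the milestones, I suspect the incremental read mode may have ended at 22:10, so incremental changes could not be transferred?
The full sync was normal.
So I can't figure out where the problem lies in what I did, or how to fix it. Any help would be appreciated.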
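A commonly cited prerequisite for row-based MySQL CDC, including the Debezium MySQL connector that appears in the earlier stack trace, is that binary logging is enabled with binlog_format=ROW and binlog_row_image=FULL. This report does not establish that this is the cause; the sketch below is only a hypothetical checklist helper. It takes the result of SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'binlog_row_image') as a dict; REQUIRED and binlog_problems are illustrative names.

```python
from typing import Dict, List

# Settings commonly required for row-based MySQL CDC readers such as Debezium
REQUIRED = {"log_bin": "ON", "binlog_format": "ROW", "binlog_row_image": "FULL"}


def binlog_problems(variables: Dict[str, str]) -> List[str]:
    """Return a description of each required binlog setting that is missing or wrong."""
    problems: List[str] = []
    for name, expected in REQUIRED.items():
        actual = variables.get(name)
        if actual is None or actual.upper() != expected:
            problems.append(f"{name}={actual!r}, expected {expected}")
    return problems
```

An empty list means these particular settings look fine, and the cause lies elsewhere.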
