
TiDB-Binlog


TiDB-Binlog introduction

TiDB-Binlog is a tool used to collect TiDB's binary logs with the following features:

  • Data replication

    Synchronize data from the TiDB cluster to heterogeneous databases.

  • Real-time backup and recovery

    Back up the TiDB cluster to a dump file, which can be used for recovery.

  • Multiple output formats

    Supports MySQL, dump files, etc.

  • History replay

    Replay from any history point.

Documentation

Architecture

(architecture diagram)

Service list

Pump

Pump is a daemon that receives real-time binlogs from tidb-server and writes them to sequential disk files synchronously.

Drainer

Drainer collects binlogs from each Pump in the cluster, transforms them into various SQL dialects, and applies them to the downstream database or filesystem.

How to build

To check the code style and build binaries, you can simply run:

make build   # build all components

If you only want to build binaries, you can run:

make pump  # build pump

make drainer  # build drainer

When TiDB-Binlog is built successfully, you can find the binaries in the bin directory.

Run Test

Run all tests, including unit tests and integration tests:

make test

See tests for how to execute and add integration tests.

Deployment

The recommended startup sequence: PD -> TiKV -> Pump -> TiDB -> Drainer

The best way to install TiDB-Binlog is via TiDB-Binlog-Ansible.

Tutorial

Here's a tutorial to experiment with TiDB-Binlog (not for production use).

Config File

Contributing

Contributions are welcomed and greatly appreciated. See CONTRIBUTING.md for details on submitting patches and the contribution workflow.

License

TiDB-Binlog is under the Apache 2.0 license. See the LICENSE file for details.


tidb-binlog's Issues

Recover Table causes drainer exit in TiDB 3.0.0

Drainer exits when we run a recover table command on the TiDB cluster. I think this is a bug.

TiDB SQL commands

drop table sbtest4; 
recover table sysbench.sbtest4 ; 
-- check the DDL jobs

mysql> admin show ddl jobs
    -> ;
+--------+----------+------------+---------------+--------------+-----------+----------+-----------+-----------------------------------+--------+
| JOB_ID | DB_NAME  | TABLE_NAME | JOB_TYPE      | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | START_TIME                        | STATE  |
+--------+----------+------------+---------------+--------------+-----------+----------+-----------+-----------------------------------+--------+
|     64 | sysbench | sbtest5    | create table  | public       |        41 |       63 |         0 | 2019-07-09 19:07:38.022 +0800 CST | synced |
|     62 | db24     |            | create schema | public       |        61 |        0 |         0 | 2019-07-09 15:19:22.836 +0800 CST | synced |
|     60 | sysbench |            | drop table    | none         |        41 |       43 |         0 | 2019-07-09 13:20:28.702 +0800 CST | synced |
|     59 |          |            | drop schema   | none         |        57 |        0 |         0 | 2019-07-09 13:11:52.302 +0800 CST | synced |
|     58 |          |            | create schema | public       |        57 |        0 |         0 | 2019-07-09 10:08:47.516 +0800 CST | synced |
|     56 | sysbench | sbtest4    | recover table | public       |        41 |       49 |         0 | 2019-07-09 09:57:17.215 +0800 CST | synced |
|     55 | sysbench | sbtest4    | drop table    | none         |        41 |       49 |         0 | 2019-07-09 09:56:54.265 +0800 CST | synced |
|     54 | sysbench | sbtest3    | add index     | public       |        41 |       52 |     10000 | 2019-07-09 09:47:21.565 +0800 CST | synced |
|     53 | sysbench | sbtest3    | create table  | public       |        41 |       52 |         0 | 2019-07-09 09:47:19.815 +0800 CST | synced |
|     51 | sysbench | sbtest4    | add index     | public       |        41 |       49 |     10000 | 2019-07-09 09:47:18.565 +0800 CST | synced |
+--------+----------+------------+---------------+--------------+-----------+----------+-----------+-----------------------------------+--------+

Drainer Error

[2019/07/09 13:20:30.290 +08:00] [INFO] [collector.go:280] ["get ddl job"] [job="ID:60, Type:drop table, State:synced, SchemaState:none, SchemaID:41, TableID:43, RowCount:0, ArgLen:0, start time: 2019-07-09 13:20:28.702 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/07/09 13:20:30.291 +08:00] [ERROR] [server.go:267] ["syncer exited abnormal"] [
    error="handle ddl job ID:56, Type:none, State:synced, SchemaState:public, SchemaID:41, TableID:49, RowCount:0, ArgLen:0, start time: 2019-07-09 09:57:17.215 +0800 CST, Err:[meta:1146]table doesn't exist, ErrCount:1, SnapshotVersion:0 failed, the schema info: {\n\t\thasImplicitCol: false,\n\t\tschemaMetaVersion: 0,\n\t\tschemaNameToID: {\n\t\t\tmysql: 3,\n\t\t\tsysbench: 41,\n\t\t\ttest: 1\n\t\t},\n\t\ttableIDToName: {\n\t\t\t11: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: columns_priv\n\t\t\t},\n\t\t\t13: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: GLOBAL_VARIABLES\n\t\t\t},\n\t\t\t15: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: tidb\n\t\t\t},\n\t\t\t17: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: help_topic\n\t\t\t},\n\t\t\t19: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: stats_meta\n\t\t\t},\n\t\t\t21: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: stats_histograms\n\t\t\t},\n\t\t\t23: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: stats_buckets\n\t\t\t},\n\t\t\t25: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: gc_delete_range\n\t\t\t},\n\t\t\t27: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: gc_delete_range_done\n\t\t\t},\n\t\t\t29: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: stats_feedback\n\t\t\t},\n\t\t\t31: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: role_edges\n\t\t\t},\n\t\t\t33: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: default_roles\n\t\t\t},\n\t\t\t35: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: bind_info\n\t\t\t},\n\t\t\t37: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: stats_top_n\n\t\t\t},\n\t\t\t39: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: expr_pushdown_blacklist\n\t\t\t},\n\t\t\t43: {\n\t\t\t\tdb-name: sysbench,\n\t\t\t\ttbl-name: sbtest1\n\t\t\t},\n\t\t\t44: {\n\t\t\t\tdb-name: sysbench,\n\t\t\t\ttbl-name: sbtest2\n\t\t\t},\n\t\t\t5: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: user\n\t\t\t},\n\t\t\t52: {\n\t\t\t\tdb-name: sysbench,\n\t\t\t\ttbl-name: sbtest3\n\t\t\t},\n\t\t\t7: {\n\t\t\t\tdb-name: 
mysql,\n\t\t\t\ttbl-name: db\n\t\t\t},\n\t\t\t9: {\n\t\t\t\tdb-name: mysql,\n\t\t\t\ttbl-name: tables_priv\n\t\t\t}\n\t\t}\n\t}: table sbtest4(49) not found"] [errorVerbose="table sbtest4(49) not found\n
github.com/pingcap/errors.NotFound

This means that handling the recover table DDL job raises this error.

I checked the drainer source code and added skip logic for the ActionRecoverTable DDL job.
I built a new drainer version for testing, and it worked.

// schema.go, around line 393
// job type 25 is model.ActionRecoverTable
        if job.Type == model.ActionRecoverTable {
                log.Info("skip ActionRecoverTable job",
                        zap.Int64("job id", job.ID),
                        zap.Uint8("type", uint8(job.Type)),
                        zap.Int64("schema id", job.SchemaID),
                        zap.Int64("table id", job.TableID))
                return
        }

ordering guarantees when the downstream is Kafka

What are the ordering guarantees when the downstream is Kafka?
With multiple partitions in a Kafka topic, how does drainer decide which partition to send to?
If only one partition is used to guarantee ordering, will the throughput be sufficient?

Make integration tests faster (eg. finished in less than 3 minutes)

The task of running integration tests is the bottleneck of the build pipeline.
We may try to make it faster by:

  1. Reducing the number of SQL statements tested; some of the tests may belong to chaos testing rather than integration testing;
  2. Splitting the task into multiple parts and running them in parallel.

validate that Config.SyncerConfig.To is not nil to fix "drainer panic: runtime error"

launch command and arguments:

$ ~/.go/src/github.com/pingcap/tidb-binlog/bin/drainer --pd-urls=http://arch-dev:2379 --addr=arch-dev:8249 --data-dir=./drainer --L=debug

error logs and stdout:

2017/04/19 11:29:20 version.go:23: [info] Build TS: 2017-04-19 03:28:59
2017/04/19 11:29:20 version.go:24: [info] Go Version: go1.8.1
2017/04/19 11:29:20 version.go:25: [info] Go OS/Arch: linuxamd64
2017/04/19 11:29:20 client.go:97: [info] [pd] create pd client with endpoints [http://arch-dev:2379]
2017/04/19 11:29:20 client.go:179: [info] [pd] leader switches to: http://192.168.56.200:2379, previous:
2017/04/19 11:29:20 client.go:115: [info] [pd] init cluster id 6409853286206803787
2017/04/19 11:29:20 server.go:78: [info] clusterID of drainer server is 6409853286206803787
2017/04/19 11:29:20 client.go:273: [error] [pd] create tso stream error: rpc error: code = 1 desc = context canceled
2017/04/19 11:29:20 client.go:97: [info] [pd] create pd client with endpoints [http://arch-dev:2379]
2017/04/19 11:29:20 client.go:179: [info] [pd] leader switches to: http://192.168.56.200:2379, previous:
2017/04/19 11:29:20 client.go:115: [info] [pd] init cluster id 6409853286206803787
2017/04/19 11:29:20 client.go:97: [info] [pd] create pd client with endpoints [arch-dev:2379]
2017/04/19 11:29:20 client.go:179: [info] [pd] leader switches to: http://192.168.56.200:2379, previous:
2017/04/19 11:29:20 client.go:115: [info] [pd] init cluster id 6409853286206803787
2017/04/19 11:29:20 scan.go:132: [debug] txn getData nextStartKey["mDDLJobHi\xffstory\x00\x00\x00\xfc\x00\x00\x00\x00\x00\x00\x00h"], txn 391268941171523587
2017/04/19 11:29:30 collector.go:167: [info] node arch-dev:8250 get save point {0 0}
2017/04/19 11:29:40 schema.go:38: [info] [local schema/table] map[29:{test t1} 41:{test a} 44:{test b} 35:{test c}]
2017/04/19 11:29:40 schema.go:39: [info] [local schema] map[1:0xc4203e24d0]
2017/04/19 11:29:40 schema.go:40: [info] [ignore schema] map[7:{}]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0xdf2919]

goroutine 168 [running]:
github.com/pingcap/tidb-binlog/drainer/executor.newMysql(0x0, 0xc420155b30, 0x44239b, 0x10, 0xf24220)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/executor/mysql.go:14 +0x29
github.com/pingcap/tidb-binlog/drainer/executor.New(0x1049008, 0x5, 0x0, 0xc42033f1a0, 0x0, 0x1, 0x0)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/executor/executor.go:19 +0x149
github.com/pingcap/tidb-binlog/drainer.createExecutors(0x1049008, 0x5, 0x0, 0x1, 0x1048aea, 0x5, 0x1047314, 0x2, 0x1047f9a)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/util.go:195 +0xab
github.com/pingcap/tidb-binlog/drainer.(*Syncer).run(0xc42025c240, 0xc420447f50, 0x18, 0x20)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:484 +0x9c
github.com/pingcap/tidb-binlog/drainer.(*Syncer).Start(0xc42025c240, 0xc420406000, 0x18, 0x20, 0x0, 0x0)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:86 +0xa7
github.com/pingcap/tidb-binlog/drainer.(*Server).StartSyncer.func1(0xc4201bb180, 0xc420406000, 0x18, 0x20)
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/server.go:175 +0x81
created by github.com/pingcap/tidb-binlog/drainer.(*Server).StartSyncer
        /home/qupeng/.go/src/github.com/pingcap/tidb-binlog/drainer/server.go:180 +0x7a

when syncing to MySQL, timestamp data may be inconsistent if the explicit_defaults_for_timestamp variable differs

In TiDB, explicit_defaults_for_timestamp is ON by default, and the behavior looks like this:

mysql> create table tm(id int auto_increment, t_timestamp TIMESTAMP, primary key(id));
Query OK, 0 rows affected (0.24 sec)

mysql> show create table tm;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                        |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tm    | CREATE TABLE `tm` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `t_timestamp` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> insert into tm(t_timestamp) values(null);
Query OK, 1 row affected (0.13 sec)

mysql> select * from tm;
+----+-------------+
| id | t_timestamp |
+----+-------------+
|  1 | NULL        |
+----+-------------+
1 row in set (0.01 sec)

In MySQL, if explicit_defaults_for_timestamp is OFF, it looks like this
(the MySQL default value is ON for >= 8.0.2 and OFF for <= 8.0.1):

mysql> create table tm(id int auto_increment, t_timestamp TIMESTAMP, primary key(id));
Query OK, 0 rows affected (0.03 sec)

mysql> show create table tm;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                    |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tm    | CREATE TABLE `tm` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `t_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> insert into tm(t_timestamp) values(null);
Query OK, 1 row affected (0.00 sec)

mysql> select * from tm;
+----+---------------------+
| id | t_timestamp         |
+----+---------------------+
|  1 | 2018-05-31 12:55:51 |
+----+---------------------+
1 row in set (0.01 sec)

See the MySQL documentation about explicit_defaults_for_timestamp.

pump should retry notifying registered drainers instead of exiting on startup

When pump starts, it seems to try only once to notify registered drainers, and it exits if that fails:

2019/04/24 12:14:30 version.go:33: [info] Release Version: v3.0.0-beta.1-31-g1a25971
2019/04/24 12:14:30 version.go:34: [info] Git Commit Hash: 1a259716e0aaef328f94e124e02e90be58d756dc
2019/04/24 12:14:30 version.go:35: [info] Build TS: 2019-04-23 12:42:17
2019/04/24 12:14:30 version.go:36: [info] Go Version: go1.11.2
2019/04/24 12:14:30 version.go:37: [info] Go OS/Arch: linuxamd64
2019/04/24 12:14:32 server.go:116: [info] clusterID of pump server is 6683140680737258511
2019/04/24 12:14:32 storage.go:116: [info] options: &{ValueLogFileSize:524288000 Sync:true KVConfig:<nil>}
2019/04/24 12:14:32 storage.go:1090: [info] Storage config: &{BlockCacheCapacity:8388608 BlockRestartInterval:16 BlockSize:4096 CompactionL0Trigger:8 CompactionTableSize:67108864 CompactionTotalSize:536870912 CompactionTotalSizeMultiplier:8 WriteBuffer:67108864 WriteL0PauseTrigger:24 WriteL0SlowdownTrigger:17}
2019/04/24 12:14:32 storage.go:188: [info] gcTS: 407757428283932672, maxCommitTS: 407928127082725377, headPointer: {Fid:1 Offset:2000}, handlePointer: {Fid:1 Offset:2000}
2019/04/24 12:14:32 server.go:316: [info] register success, this pump's node id is localhost.localdomain:8250
2019/04/24 12:14:32 node.go:160: [info] start try to notify drainer:  192.168.236.157:8249
2019/04/24 12:14:32 main.go:71: [error] pump server error, fail to notify all living drainer: notify drainer(192.168.236.157:8249); but return error(rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.236.157:8249: connect: connection refused")

Pump should retry for some period of time or number of retries before exiting. This would greatly simplify cluster startup, and would follow the example of other components that handle startup order more flexibly.

drainer process cannot start when hostname changed

Hi, I came across a problem when the hostname changed.
At first everything went fine. Then I shut down the drainer process; a while later the drainer server's hostname was changed by one of our jobs. After that, when I started the drainer process, it automatically registered a new node ID and tried to start two drainer processes on the same port.
According to the TiDB documentation, I set the old node ID's state to offline using the binlogctl tool, but it still does not work. Some logs below.

After the hostname changed twice, there are three drainer services using the same port on the same host:
+----------------------------------------------------+-------------------+--------+--------------------+---------------------+
| NodeID | Address | State | Max_Commit_Ts | Update_Time |
+----------------------------------------------------+-------------------+--------+--------------------+---------------------+
| gdxx-db-xxx060026-db-tidb.xxxxxxxxxxx.com8249:8249 | xxxxxxxx:8249 | online | 409660349216456705 | 2019-07-10 15:17:19 |
| gdxx-db-213060026-db-xxxxxxxxxxx.com:8249 | xxxxxxxx:8249 | online | 409660349216456705 | 2019-07-10 15:16:46 |
| GDxx-DB-213060026-db-xxxxxxxxxxx.com:8249 | xxxxxxxx:8249 | paused | 409660349216456705 | 2019-07-10 11:43:36 |
+----------------------------------------------------+-------------------+--------+--------------------+---------------------+

After setting its state to offline, the drainer process still cannot start normally:

2019/07/10 15:45:37 main.go:53: [error] start drainer server error, [tikv:9001]PD server timeout[try again later]
github.com/pingcap/errors.AddStack
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/parser/terror.(*Error).GenWithStackByArgs
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]/terror/terror.go:231
github.com/pingcap/tidb/store/tikv.backoffType.TError
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/backoff.go:146
github.com/pingcap/tidb/store/tikv.(*Backoffer).Backoff
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/backoff.go:248
github.com/pingcap/tidb/store/tikv.(*RegionCache).loadRegion
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/region_cache.go:333
github.com/pingcap/tidb/store/tikv.(*RegionCache).LocateKey
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/region_cache.go:152
github.com/pingcap/tidb/store/tikv.(*Scanner).getData
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/scan.go:151
github.com/pingcap/tidb/store/tikv.(*Scanner).Next
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/scan.go:92
github.com/pingcap/tidb/store/tikv.newScanner
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/scan.go:51
github.com/pingcap/tidb/store/tikv.(*tikvSnapshot).Iter
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/store/tikv/snapshot.go:285
github.com/pingcap/tidb/structure.(*TxStructure).iterateHash
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/structure/hash.go:241
github.com/pingcap/tidb/structure.(*TxStructure).HGetAll
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/structure/hash.go:207
github.com/pingcap/tidb/meta.(*Meta).GetAllHistoryDDLJobs
/home/jenkins/workspace/release_tidb_2.1-ga/go/pkg/mod/github.com/pingcap/[email protected]+incompatible/meta/meta.go:663
github.com/pingcap/tidb-binlog/drainer.loadHistoryDDLJobs
/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:83
github.com/pingcap/tidb-binlog/drainer.(*Server).Start
/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:306
main.main
/home/jenkins/workspace/release_tidb_2.1-ga/go/src/github.com/pingcap/tidb-binlog/cmd/drainer/main.go:52
runtime.main
/usr/local/go/src/runtime/proc.go:200
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337

drainer crashes on VIEW ddl

If the drainer encounters a CREATE VIEW statement, it crashes:

2019/04/05 14:24:59 schema.go:235: [debug] handle ddl job id(1109): {"id":1109,"type":21,"schema_id":1062,"table_id":1108,"state":6,"err":null,"err_count":0,"
row_count":0,"raw_args":null,"schema_state":5,"snapshot_ver":0,"start_ts":407500477872209971,"dependency_id":0,"query":"CREATE VIEW v1 AS SELECT JSON_TYPE(JSO
N_OBJECT());","binlog":{"SchemaVersion":1441,"DBInfo":null,"TableInfo":{"id":1108,"name":{"O":"v1","L":"v1"},"charset":"utf8","collate":"utf8_general_ci","col
s":[{"id":1,"name":{"O":"JSON_TYPE(JSON_OBJECT())","L":"json_type(json_object())"},"offset":0,"origin_default":null,"default":null,"default_bit":null,"generat
ed_expr_string":"","generated_stored":false,"dependences":null,"type":{"Tp":0,"Flag":0,"Flen":0,"Decimal":0,"Charset":"","Collate":"","Elems":null},"state":5,
"comment":""}],"index_info":null,"fk_info":null,"state":5,"pk_is_handle":false,"comment":"","auto_inc_id":0,"max_col_id":1,"max_idx_id":0,"update_timestamp":4
07500477885317127,"ShardRowIDBits":0,"partition":null,"compression":""},"FinishedTS":407500477898948611},"version":1,"reorg_meta":null,"priority":0}
2019/04/05 14:24:59 syncer.go:638: [debug] receive publish binlog item: {startTS: 407502545135861761, commitTS: 407502545135861764, node: seastar.local:8250}
2019/04/05 14:24:59 server.go:225: [error] syncer exited, error table v1(1108) not found
github.com/pingcap/errors.NotFoundf
        /Users/kolbe/Devel/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:72
github.com/pingcap/tidb-binlog/drainer.(*Schema).ReplaceTable
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/schema.go:183
github.com/pingcap/tidb-binlog/drainer.(*Schema).handleDDL
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/schema.go:397
github.com/pingcap/tidb-binlog/drainer.(*Schema).handlePreviousDDLJobIfNeed
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/schema.go:238
github.com/pingcap/tidb-binlog/drainer.(*Syncer).run
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:496
github.com/pingcap/tidb-binlog/drainer.(*Syncer).Start
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:96
github.com/pingcap/tidb-binlog/drainer.(*Server).StartSyncer.func1
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/server.go:223
runtime.goexit
        /usr/local/Cellar/go/1.12.1/libexec/src/runtime/asm_amd64.s:1337

drainer failed to exec DDL if the downstream uses ProxySQL

version: v2.1.4
syncer.to is mysql (the downstream is actually a TiDB cluster behind ProxySQL)

sql:

CREATE TABLE films (
  id int(11),
  release_year int(11),
  category_id int(11),
  rating decimal(3,2)
);
insert into films values
(1,2015,1,8.00),
(2,2015,2,8.50),
(3,2015,3,9.00),
(4,2016,2,8.20),
(5,2016,1,8.40),
(6,2017,2,7.00);

error log:

{"log":"2019/07/03 16:44:08 pump.go:115: \u001b[0;37m[info] [pump 13274bc26df3:10060] create pull binlogs client\u001b[0m\n","stream":"stderr","time":"2019-07-03T08:44:08.719759354Z"}
{"log":"2019/07/03 16:44:08 pump.go:115: \u001b[0;37m[info] [pump 33576518e473:10060] create pull binlogs client\u001b[0m\n","stream":"stderr","time":"2019-07-03T08:44:08.719763484Z"}
{"log":"2019/07/03 16:44:08 pump.go:115: \u001b[0;37m[info] [pump 7410ddec094a:10060] create pull binlogs client\u001b[0m\n","stream":"stderr","time":"2019-07-03T08:44:08.719767392Z"}
{"log":"2019/07/03 16:44:09 syncer.go:503: \u001b[0;37m[info] [ddl][start]use `binlog`; CREATE TABLE films (\n","stream":"stderr","time":"2019-07-03T08:44:09.669498645Z"}
{"log":"  id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.669534124Z"}
{"log":"  release_year int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.669542576Z"}
{"log":"  category_id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.669549275Z"}
{"log":"  rating decimal(3,2)\n","stream":"stderr","time":"2019-07-03T08:44:09.669581963Z"}
{"log":");[commit ts]409506059518476290\u001b[0m\n","stream":"stderr","time":"2019-07-03T08:44:09.669588491Z"}
{"log":"2019/07/03 16:44:09 sql.go:107: \u001b[0;31m[error] exec sqls[[use `binlog`; CREATE TABLE films (\n","stream":"stderr","time":"2019-07-03T08:44:09.679585095Z"}
{"log":"  id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.679603484Z"}
{"log":"  release_year int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.679610162Z"}
{"log":"  category_id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.67963325Z"}
{"log":"  rating decimal(3,2)\n","stream":"stderr","time":"2019-07-03T08:44:09.679637648Z"}
{"log":");]] commit failed Error 1105: line 1 column 12 near \"`; CREATE TABLE films (\n","stream":"stderr","time":"2019-07-03T08:44:09.679641773Z"}
{"log":"  id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.67964633Z"}
{"log":"  release_year int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.679650288Z"}
{"log":"  category_id int(11),\n","stream":"stderr","time":"2019-07-03T08:44:09.679654275Z"}
{"log":"  rating decimal(3,2)\n","stream":"stderr","time":"2019-07-03T08:44:09.679658037Z"}
{"log":")`\" (total length 121)\u001b[0m\n","stream":"stderr","time":"2019-07-03T08:44:09.679661988Z"}

And the Docker container that includes drainer restarts again and again.

drainer tries to create MySQL timestamp column with illegal default value

2019/04/05 15:26:01 syncer.go:516: [info] [ddl][start]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[commit ts]407502548517519418
2019/04/05 15:26:01 syncer.go:211: [debug] add job: {binlogTp: 2, mutationTp: Insert, sql: use `timestamp_insert`; create table t (id int, c1 timestamp default null);;, args: [], key: , commitTS: 407502548517519418, nodeID: seastar.local:8250}
2019/04/05 15:26:01 sql.go:87: [error] [exec][sql]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[args][][error]Error 1067: Invalid default value for 'c1'
2019/04/05 15:26:03 syncer.go:638: [debug] receive publish binlog item: {startTS: 407503677174579201, commitTS: 407503677174579201, node: seastar.local:8250}
2019/04/05 15:26:04 sql.go:87: [error] [exec][sql]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[args][][error]Error 1067: Invalid default value for 'c1'
2019/04/05 15:26:06 syncer.go:638: [debug] receive publish binlog item: {startTS: 407503677973856257, commitTS: 407503677973856257, node: seastar.local:8250}
2019/04/05 15:26:07 sql.go:87: [error] [exec][sql]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[args][][error]Error 1067: Invalid default value for 'c1'
2019/04/05 15:26:09 syncer.go:638: [debug] receive publish binlog item: {startTS: 407503678760288257, commitTS: 407503678760288257, node: seastar.local:8250}
2019/04/05 15:26:10 sql.go:87: [error] [exec][sql]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[args][][error]Error 1067: Invalid default value for 'c1'
2019/04/05 15:26:12 syncer.go:638: [debug] receive publish binlog item: {startTS: 407503679547244545, commitTS: 407503679547244545, node: seastar.local:8250}
2019/04/05 15:26:13 sql.go:87: [error] [exec][sql]use `timestamp_insert`; create table t (id int, c1 timestamp default null);;[args][][error]Error 1067: Invalid default value for 'c1'
2019/04/05 15:26:13 syncer.go:360: [fatal] Error 1067: Invalid default value for 'c1'
github.com/pingcap/errors.AddStack
        /Users/kolbe/Devel/go/pkg/mod/github.com/pingcap/[email protected]/errors.go:174
github.com/pingcap/errors.Trace
        /Users/kolbe/Devel/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:12
github.com/pingcap/tidb-binlog/pkg/sql.ExecuteTxnWithHistogram
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/pkg/sql/sql.go:92
github.com/pingcap/tidb-binlog/pkg/sql.ExecuteSQLsWithHistogram
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/pkg/sql/sql.go:58
github.com/pingcap/tidb-binlog/drainer/executor.(*mysqlExecutor).Execute
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/executor/mysql.go:33
github.com/pingcap/tidb-binlog/drainer.execute
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/util.go:125
github.com/pingcap/tidb-binlog/drainer.(*Syncer).sync
        /Users/kolbe/Devel/go/src/github.com/pingcap/tidb-binlog/drainer/syncer.go:356
runtime.goexit
        /usr/local/Cellar/go/1.12.1/libexec/src/runtime/asm_amd64.s:1337

binlog's commit ts less than last ts

When I start drainer, there are lots of errors saying the commit ts is less than the last ts.
I started drainer like this:

sudo ./drainer -config ../conf/drainer.toml -initial-commit-ts 410003709545414657 &

and the log is

2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410003997877600257, and is less than the last ts 410003997877600257
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410003998703353857, and is less than the last ts 410003998703353857
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410003999516000257, and is less than the last ts 410003999516000257
2019/07/25 15:56:33 syncer.go:297: [info] [write save point]410003997169811457
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004000328646657, and is less than the last ts 410004000328646657
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004001141293057, and is less than the last ts 410004001141293057
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004001953939458, and is less than the last ts 410004001953939458
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004002753478659, and is less than the last ts 410004002753478659
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004003566125057, and is less than the last ts 410004003566125057
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004004378771457, and is less than the last ts 410004004378771457
2019/07/25 15:56:33 merge.go:305: [error] binlog's commit ts is 410004005217632257, and is less than the last ts 410004005217632257

and it does not sync any SQL I execute unless I restart drainer. If I restart it, it syncs the operations I executed before the restart and then stops syncing again.

tidb-binlog/diff doesn't support the JSON type

For TiDB:

     t_json: {"key1":"value1","key2":"value2"}
     1 row in set (0.00 sec)

For MySQL:

     t_json: {"key1": "value1", "key2": "value2"}
     1 row in set (0.00 sec)

Note the spaces after the commas and colons: the textual data may differ, but the values should be treated as consistent. The comparison fails now.
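One way to compare such values is to parse both sides and compare the decoded structures, so whitespace and key order no longer matter. Below is a minimal sketch of this idea; jsonEqual is a hypothetical helper, not the actual diff implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether two JSON documents are semantically equal,
// ignoring insignificant whitespace and object-key order.
func jsonEqual(a, b string) (bool, error) {
	var va, vb interface{}
	if err := json.Unmarshal([]byte(a), &va); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &vb); err != nil {
		return false, err
	}
	return reflect.DeepEqual(va, vb), nil
}

func main() {
	tidb := `{"key1":"value1","key2":"value2"}`
	mysql := `{"key1": "value1", "key2": "value2"}`
	eq, err := jsonEqual(tidb, mysql)
	fmt.Println(eq, err) // true <nil>
}
```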

pump should log version info before other logs

2018/04/09 19:35:03 Connected to 10.7.104.184:2181
2018/04/09 19:35:03 Authenticated: id=144229920926728194, timeout=40000
2018/04/09 19:35:03 Re-submitting `0` credentials after reconnect
2018/04/09 19:35:03 config.go:235: [info] get kafka addrs from zookeeper: 10.7.232.216:9092,10.7.104.184:9092,10.7.113.43:9092
2018/04/09 19:35:03 Recv loop terminated: err=EOF
2018/04/09 19:35:03 Send loop terminated: err=<nil>
2018/04/09 19:35:03 version.go:18: [info] Git Commit Hash: 2a7761f992dbf13ed8ccf59ca485e9c5cb5143d6
2018/04/09 19:35:03 version.go:19: [info] Build TS: 2018-04-02 10:27:34
2018/04/09 19:35:03 version.go:20: [info] Go Version: go1.10
2018/04/09 19:35:03 version.go:21: [info] Go OS/Arch: linuxamd64
time="2018-04-09T19:35:03+08:00" level=info msg="[pd] create pd client with endpoints [http://jira-cluster-pd:2379]"
time="2018-04-09T19:35:03+08:00" level=info msg="[pd] leader switches to: http://jira-cluster-pd-f56wf.jira-cluster-pd-peer.jira-tidb.svc:2379, previous: "
time="2018-04-09T19:35:03+08:00" level=info msg="[pd] init cluster id 6537979386539282199"
2018/04/09 19:35:03 server.go:126: [info] clusterID of pump server is 6537979386539282199
2018/04/09 19:35:03 binlogger.go:80: [info] create and lock binlog file data.pump/clusters/6537979386539282199/binlog-0000000000000000-20180409193503
2018/04/09 19:35:06 server.go:451: [info] generate fake binlog successfully

Pump logs Kafka connection info before the version info. If the Kafka connection fails, there will be no version info in the logs, which may make debugging more difficult.

NATS sink

I would like change events to be publishable to NATS.
As I understand it, only Kafka is currently supported?

NATS is written in Go.
Note that NATS and NATS Streaming are different systems. Only NATS Streaming offers durable ACKs, so it should be the target.

correctness bug about drainer consume binlogs from pump

Pump has an incorrect implementation of hadFinished.

The pump client in drainer only updates its current pos periodically when it meets complete binlogs (a prewrite binlog whose matching commit binlog has already arrived, called a complete binlog), except for fake binlogs and rollback binlogs. In some cases, current pos can fall behind pump's end pos even though drainer's pump client considers it finished.
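The prewrite/commit pairing involved here can be sketched as follows. This is a simplified illustration of the matching logic, not the real pump client; the type and method names are hypothetical.

```go
package main

import "fmt"

// binlogMatcher pairs a prewrite (P) binlog with its commit (C) binlog
// to decide whether a binlog is "complete" and the position may advance.
type binlogMatcher struct {
	pending map[int64]bool // start TS of prewrites awaiting commit/rollback
}

func newBinlogMatcher() *binlogMatcher {
	return &binlogMatcher{pending: map[int64]bool{}}
}

// prewrite records a P-binlog by its start TS.
func (m *binlogMatcher) prewrite(startTS int64) {
	m.pending[startTS] = true
}

// commit pairs a C-binlog with its P-binlog; it returns true only when
// the binlog is now complete (both P and C have been seen).
func (m *binlogMatcher) commit(startTS int64) bool {
	if m.pending[startTS] {
		delete(m.pending, startTS)
		return true
	}
	return false
}

// unfinished reports whether any prewrite is still waiting, i.e. the
// current position must not yet be treated as finished.
func (m *binlogMatcher) unfinished() bool { return len(m.pending) > 0 }

func main() {
	m := newBinlogMatcher()
	m.prewrite(100)
	fmt.Println(m.commit(100))  // true: complete binlog
	fmt.Println(m.unfinished()) // false
}
```

The bug described above amounts to treating the position as finished while unfinished() would still be true.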

Chinese characters garbled when tidb-binlog syncs data to Kafka

1. TiDB 2.1.4, Kafka 1.1.1

2. The database and table character set is utf8mb4

3. The Chinese characters are garbled as follows:

������ �¶����̕� ��������� ���������� ���̋����� ��� �2�2019-07-10 11:41:45 ������� �2�2019-07-10 11:41:45 �� �2�2019-07-10 11:41:45 �2�2019-07-10 11:41:45 ������ �¶����̕� ��������� ���������� ���̋����� ��� �2�2019-07-10 11:41:45 ������� �2�2019-07-10 11:41:45 �� �2�2019-07-10 11:41:45 �2�2019-07-10 11:41:45 ���

Unified command line arguments processing

drainer:

  • Doesn't support -v/--version
  • -h/--help is not in usage message
  • uses -L for log level

pump:

  • Doesn't support -v for version
  • -h/--help is not in help message
  • uses -debug for log level
  • -heartbeat-interval takes a uint while -metrics-interval takes an int; what is the reason for the difference between uint and int here?

cistern:

  • Doesn't support -v for version
  • -h/--help is not in help message
  • uses -debug for log level

These tools should process command line arguments in an intuitive manner.
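A unified convention could be built on the standard flag package, with every tool exposing the same version and log-level flags. This is a sketch only; the flag names below (-V, -L) are illustrative, not the tools' current flags.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseFlags defines the flags every tool would share: -V to print the
// version and exit, -L to set the log level. -h/--help is handled by
// the flag package itself and lists both in the usage message.
func parseFlags(args []string) (printVersion bool, logLevel string) {
	fs := flag.NewFlagSet("drainer", flag.ExitOnError)
	v := fs.Bool("V", false, "print version information and exit")
	l := fs.String("L", "info", "log level: debug, info, warn, error, fatal")
	fs.Parse(args)
	return *v, *l
}

func main() {
	printVersion, logLevel := parseFlags(os.Args[1:])
	if printVersion {
		fmt.Println("drainer v0.0.0 (example)")
		return
	}
	fmt.Println("log level:", logLevel)
}
```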

Validation of `-addr` and `-advertise-addr`

When given -addr :8250 without specifying -advertise-addr, both of these options end up being ":8250".

Validation should be added so that an invalid address that cannot be connected to fails the check.
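A minimal version of such a check could reject wildcard or empty hosts in the advertise address, since other components cannot dial ":8250". This is a hypothetical sketch of the validation, not the actual pump code.

```go
package main

import (
	"fmt"
	"net"
)

// validateAdvertiseAddr rejects advertise addresses that peers cannot
// connect to: an empty or wildcard host, or a missing port.
func validateAdvertiseAddr(addr string) error {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid address %q: %v", addr, err)
	}
	if host == "" || host == "0.0.0.0" || host == "::" {
		return fmt.Errorf("advertise address %q must specify a routable host, not a wildcard", addr)
	}
	if port == "" {
		return fmt.Errorf("advertise address %q must specify a port", addr)
	}
	return nil
}

func main() {
	fmt.Println(validateAdvertiseAddr(":8250"))         // rejected: wildcard host
	fmt.Println(validateAdvertiseAddr("10.0.0.1:8250")) // <nil>
}
```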

Something wrong with genInsertSQLs

Well, the actual problem we found is that for tables with a composite primary key, the vals generated by the MySQL genInsertSQLs are missing the data for the second primary-key column. Of course, when TiDB encounters a table with a composite primary key, it should generate a hidden column as the primary key itself, so when syncing to TiDB the problem above should not occur.

(Issue reported by 若曦.)

Make the diff tool support --all-databases and --databases argument flags like mysqldump:

 -A, --all-databases Dump all the databases. This will be same as --databases
                      with all databases selected.
  -B, --databases     Dump several databases. Note the difference in usage; in
                      this case no tables are given. All name arguments are
                      regarded as database names. 'USE db_name;' will be
                      included in the output.

switch from juju/errors to pingcap/errors

I tried to do this switch, but what held me back is that tidb is vendored and still using juju/errors, including some of its esoteric APIs.
The latest version of tidb has switched to using pingcap/errors. Similarly, pd has switched to pkg/errors.
So I am hoping that the vendored tidb can be updated.

One of the main motivations for me to prompt tidb-binlog to do this change is that juju/errors is LGPL code, which makes proper distribution of tidb-binlog more difficult.

check whether the generated nodeID already exists in etcd.

Problem: prevent duplicate node IDs.

Simple way: when a node starts or restarts, read the node ID from the local directory. If there is no local node ID file, the node server should generate one, check whether it already exists in etcd, and then write it to the local node ID file.

better way:....
