alibaba / mongoshake

MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. Built on the MongoDB oplog, it covers migration and synchronization needs and further enables disaster recovery and multi-site active-active deployments.

License: GNU General Public License v3.0

Makefile 0.02% Shell 0.44% Python 5.58% C 0.63% Go 93.29% Dockerfile 0.04%
mongo-database mongodb-oplog replication active-active mongoshake

mongoshake's People

Contributors

diggzhang, hongmi, huangzhuxing, k-nuhdim, libi, manleyliu, maxlinyun, monkeywie, nanmu42, nealgosalia, raydy, vinllen, withlin, zemul, zhangst


mongoshake's Issues

mongoshake context storage with api

The collector context storage mainly covers storing the checkpoint.
The supported types are: database, api.
For api storage, the address is an HTTP URL.
Looking at the config file, the MongoShake context can be stored via an api. How is that API interface defined? Could you give an example? (A purely hypothetical sketch follows below.)
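No official definition is given in this thread; purely as a hypothetical illustration (the endpoint path, JSON shape and field names below are my assumptions, not MongoShake's actual API contract), a minimal HTTP checkpoint store in Go might look like:

// hypothetical_ckpt_api.go - NOT MongoShake's real API; a guess at a minimal
// HTTP checkpoint store: GET returns the saved checkpoint, POST/PUT replaces
// it. The field names ("name", "ckpt") mirror the ckpt_default document shown
// later on this page.
package main

import (
    "encoding/json"
    "net/http"
    "sync"
)

type checkpoint struct {
    Name string `json:"name"` // replica-set name
    Ckpt int64  `json:"ckpt"` // oplog timestamp value
}

func main() {
    var (
        mu   sync.Mutex
        last checkpoint
    )
    http.HandleFunc("/checkpoint", func(w http.ResponseWriter, r *http.Request) {
        mu.Lock()
        defer mu.Unlock()
        switch r.Method {
        case http.MethodGet:
            json.NewEncoder(w).Encode(last)
        case http.MethodPost, http.MethodPut:
            json.NewDecoder(r.Body).Decode(&last)
            w.WriteHeader(http.StatusOK)
        default:
            w.WriteHeader(http.StatusMethodNotAllowed)
        }
    })
    http.ListenAndServe(":8080", nil)
}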

MongoShake user permissions

Without username:password on MongoDB, both replica set -> standalone and sharding -> sharding sync succeeded.
After enabling authentication, the createUser parameters are:
{
    user: "shake",
    pwd: "shake",
    roles: [
        { role: "readWrite", db: "mongoshake" },
        { role: "readWrite", db: "pri" },
        { role: "read", db: "local" },
        { role: "read", db: "admin" },
        "clusterMonitor",
    ]
}
When starting MongoShake it reports: Oplog Tailer initialize failed: worker initialize error.
Looking at the code, it reaches:
w := NewWorker(coordinator, syncer, uint32(i))
if !w.init() {
    return errors.New("worker initialize error")
}
Continuing into the source code:
func NewWorker(coordinator *ReplicationCoordinator, syncer *OplogSyncer, id uint32) *Worker {
    return &Worker{
        coordinator: coordinator,
        syncer:      syncer,
        id:          id,
        queue:       make(chan []*oplog.GenericOplog, conf.Options.WorkerBatchQueueSize),
    }
}
type GenericOplog struct {
    Raw    []byte
    Parsed *PartialLog
}

type PartialLog struct {
    Timestamp     bson.MongoTimestamp `bson:"ts"`
    Operation     string              `bson:"op"`
    Gid           string              `bson:"g"`
    Namespace     string              `bson:"ns"`
    Object        bson.M              `bson:"o"`
    Query         bson.M              `bson:"o2"`
    UniqueIndexes bson.M              `bson:"uk"`
}

At this point I don't know how to track down the error.
Could you help take a look at what the problem is? Thanks.
My email is [email protected]
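Since the failure happens while the workers initialize, one thing worth ruling out is whether the shake user can read the oplog at all. Below is a minimal standalone check, not MongoShake code; the URL and credentials are placeholders, and gopkg.in/mgo.v2 is used here because the vendored github.com/vinllen/mgo exposes the same API:

// oplog_check.go - standalone sketch: verify the "shake" user can read
// local.oplog.rs, which the collector tails. Adjust the URL to your setup.
package main

import (
    "fmt"
    "log"

    mgo "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

func main() {
    session, err := mgo.Dial("mongodb://shake:shake@127.0.0.1:27017/admin")
    if err != nil {
        log.Fatalf("dial failed: %v", err)
    }
    defer session.Close()

    var newest bson.M
    // reading the newest entry exercises the "read" role on the local db
    err = session.DB("local").C("oplog.rs").Find(nil).Sort("-$natural").One(&newest)
    if err != nil {
        log.Fatalf("cannot read local.oplog.rs: %v", err)
    }
    fmt.Println("newest oplog ts:", newest["ts"])
}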

Oplog Tailer initialize failed: no oplog ns in mongo

When using sharding sync,
conf urls:
mongo_urls = mongodb://sync:[email protected]:27021;192.168.123.20:27022;192.168.123.30:27023

but when executing:
./bin/collector -conf=conf/collector.conf

it reports:
[17:15:37 HKT 2018/12/18] [CRIT] (mongoshake/collector.(*ReplicationCoordinator).sanitizeMongoDB:67) There has no oplog collection in mongo db server
Oplog Tailer initialize failed: no oplog ns in mongo

What is happening?

Problems syncing the config database of shards on MongoDB 3.6

On MongoDB 3.6, the collections of the config database on each shard are synced by default, which triggers the following two problems:

1. The log line [Hash object is UNKNOWN. use default value 0.] appears.

2. Sometimes the sync fails (and keeps retrying?); the log is as follows:
[2018/11/21 10:28:08 CST] [CRIT] [executor.(*Executor).execute:108] Replayer-2, executor-10, oplog for [config.cache.chunks.config.system.sessions] op[i] failed. (*errors.errorString) [doUpdateOnInsert run upsert/update[true] failed[After applying the update, the (immutable) field '_id' was found to have been altered to _id: { _id: { uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855), id: UUID("000068ce-5e79-43d2-bdc2-e29a52cfa0b2") } }]], logs 1. firstLog &{6626132753786077187 i config.cache.chunks.config.system.sessions map[_id:map[_id:map[id:{4 [0 0 104 206 94 121 67 210 189 194 226 154 82 207 160 178]} uid:[227 176 196 66 152 252 28 20 154 251 244 200 153 111 185 36 39 174 65 228 100 155 147 76 164 149 153 27 120 82 184 85]]] max:map[_id:map[id:{4 [177 127 180 126 251 47 65 51 159 197 221 69 132 79 56 16]} uid:[227 176 196 66 152 252 28 20 154 251 244 200 153 111 185 36 39 174 65 228 100 155 147 76 164 149 153 27 120 82 184 85]]] shard:shard3 lastmod:12884901888] map[] map[] map[] 0 0}

An oplog entry for one of the collections under the config database looks like this:
{
    "ts" : Timestamp(1542766641, 808),
    "t" : NumberLong(2),
    "h" : NumberLong("-3856466002771816776"),
    "v" : 2,
    "op" : "i",
    "ns" : "config.cache.chunks.config.system.sessions",
    "ui" : UUID("32fce0d3-5412-4005-9921-682f9a4f4d39"),
    "wall" : ISODate("2018-11-21T02:17:21.353Z"),
    "o" : {
        "_id" : {
            "_id" : {
                "id" : UUID("b17fb47e-fb2f-4133-9fc5-dd45844f3810"),
                "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=")
            }
        },
        "max" : {
            "_id" : { "$maxKey" : 1 }
        },
        "shard" : "shard1",
        "lastmod" : Timestamp(3, 1)
    }
}

For the first problem: the _id above is a map in Go, hence the "Hash object is UNKNOWN" message.
For the second problem: it looks like config.cache.chunks.config.system.sessions and config.system.sessions get a duplicate insert, the fallback to update then fails, and I really don't understand why. How can an upsert fail?

Some collections of the config database are sharded in 3.6, and the config database does not need to be backed up in the first place. So
I think this problem can be avoided by filtering out all collections of the config database from the sync, e.g. with the blacklist shown below.
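As an illustration only (a bare database name in the blacklist is used elsewhere on this page, so filtering the whole config database should be expressible the same way), the workaround could be a single line in collector.conf:

filter.namespace.black = config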

logs_get logs_repl logs_success

mongo-shake version: the current master branch.

Configuration:
Apart from changing the source database, the target database, the monitoring port, adding a blacklist, and adjusting the context.start_position time,
everything else is the default configuration.

Problem (two monitoring screenshots were attached):

logs_get always increases first; logs_repl does not seem to follow promptly.

logs_repl and logs_success always lag by up to roughly two minutes, and then catch up with the earlier logs_get in one go.

My understanding is that it waits for some time period and then does logs_repl in one batch? Is that the queue processing time?

Another question: is there a command to show the current mongo-shake version?

Several questions about MongoShake syncing a cluster

1. When syncing a sharded cluster, does the balancer have to stay disabled the whole time? Why?

2. Can a secondary be specified as the sync source?

3. How much impact does MongoShake syncing have on the source database?

Sync interval, and whether the admin database is synced

I ran into two questions while testing.
1: createUser was executed on admin on the src side and the src oplog records it, but it was not synced to the dst side. Is that because MongoShake does not sync the admin DB?

2: What controls the oplog transfer interval from src to dst, checkpoint.interval = 20 or batch_size?
According to the data from bin/mongoshake-stat --port=9100, roughly 255 entries are accumulated before the oplog is applied on the dst side in one batch. Can it be configured to sync every 3 seconds, whether or not 255 entries have accumulated?

A contributor's answer:
1: The admin database is not synced.
2: checkpoint.interval is in ms; the default 5000 ms is 5 s, so the minimum checkpoint interval is 5 s. Batch data is transferred in bulk and there is a buffering mechanism; the flush is only triggered after a batch has been sent. Currently a fixed-interval flush is not possible.

Questions about using MongoShake

1. After MongoShake restarts, can it guarantee that the collected oplog data is neither lost nor duplicated? How is that achieved?

2. For MongoShake's HA, could a service like ZooKeeper be used in the future to deploy a clustered HA setup: one machine active, another standby, and when the active machine dies the standby becomes active and keeps reading the oplog?

3. For the kafka, file, tcp and rpc tunnels, could the consuming side be split out? The server would only read the oplog, and users would write their own client to consume the messages the server publishes; that might make consuming and processing the data more flexible.

4. Currently data is sent to Kafka with a single topic and a single partition to guarantee ordered consumption. Could multi-partition sending be supported (e.g. sending the same collection to the same partition, which still gives per-collection ordering)?

5. Currently data is written with Go's binary.Write. How should Java parse the data received from kafka/tcp/rpc? (A generic framing sketch follows below.)
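Regarding question 5, a generic illustration only: this is not MongoShake's actual wire format, just a demonstration of how Go's encoding/binary lays out fixed-size fields so that a Java DataInputStream can read them back with the matching byte order.

// framing_sketch.go - write a big-endian uint32 length prefix followed by a
// raw payload; any consumer (Java, etc.) reads 4 bytes for the length and
// then that many payload bytes.
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

func frame(payload []byte) []byte {
    buf := new(bytes.Buffer)
    binary.Write(buf, binary.BigEndian, uint32(len(payload))) // length prefix
    buf.Write(payload)                                        // raw bytes
    return buf.Bytes()
}

func main() {
    msg := frame([]byte("hello"))
    fmt.Printf("% x\n", msg) // 00 00 00 05 68 65 6c 6c 6f
}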

Multi-threaded continuous inserts crash the shake process???

Load testing with multiple threads, insert statements only; after running for a while it ends up like this. What is going on?

goroutine 209 [sleep]:
time.Sleep(0x37e11d600)
/home/peng/go/go/src/runtime/time.go:105 +0x14f
vendor/github.com/vinllen/mgo.(*mongoServer).pinger(0xc0002432c0, 0xc0081c2a01)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:407 +0x4a1
created by vendor/github.com/vinllen/mgo.newServer
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:107 +0x156

goroutine 173 [select]:
vendor/github.com/vinllen/mgo.(*mongoCluster).syncServersLoop(0xc006b5e5a0)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/cluster.go:431 +0x2f0
created by vendor/github.com/vinllen/mgo.newCluster
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/cluster.go:83 +0x17d

goroutine 175 [sleep]:
time.Sleep(0x37e11d600)
/home/peng/go/go/src/runtime/time.go:105 +0x14f
vendor/github.com/vinllen/mgo.(*mongoServer).pinger(0xc0001eeff0, 0xc0050e1601)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:407 +0x4a1
created by vendor/github.com/vinllen/mgo.newServer
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:107 +0x156

goroutine 188 [IO wait]:
internal/poll.runtime_pollWait(0x7fdf64675480, 0x72, 0xc000283c98)
/home/peng/go/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00012cf98, 0x72, 0xffffffffffffff00, 0xa5f4a0, 0xdb1698)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00012cf98, 0xc006b89000, 0x10, 0x10)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00012cf80, 0xc006b89030, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc00012cf80, 0xc006b89030, 0x10, 0x10, 0xc00012cf80, 0x0, 0x0)
/home/peng/go/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0001f60f8, 0xc006b89030, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/net/net.go:177 +0x68
vendor/github.com/vinllen/mgo.fill(0xa65940, 0xc0001f60f8, 0xc006b89030, 0x10, 0x10, 0x0, 0x91)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:601 +0x53
vendor/github.com/vinllen/mgo.(*mongoSocket).readLoop(0xc005262900)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:619 +0x122
created by vendor/github.com/vinllen/mgo.newSocket
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:211 +0x1df

goroutine 227 [select]:
vendor/github.com/vinllen/mgo.(*mongoCluster).syncServersLoop(0xc0082ff0e0)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/cluster.go:431 +0x2f0
created by vendor/github.com/vinllen/mgo.newCluster
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/cluster.go:83 +0x17d

goroutine 189 [IO wait]:
internal/poll.runtime_pollWait(0x7fdf646753b0, 0x72, 0xc005942c98)
/home/peng/go/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000160898, 0x72, 0xffffffffffffff00, 0xa5f4a0, 0xdb1698)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000160898, 0xc006b89200, 0x10, 0x10)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc000160880, 0xc006b89290, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc000160880, 0xc006b89290, 0x10, 0x10, 0xc000160880, 0x0, 0x0)
/home/peng/go/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0001f6108, 0xc006b89290, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/net/net.go:177 +0x68
vendor/github.com/vinllen/mgo.fill(0xa65940, 0xc0001f6108, 0xc006b89290, 0x10, 0x10, 0x0, 0x91)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:601 +0x53
vendor/github.com/vinllen/mgo.(*mongoSocket).readLoop(0xc005262d80)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:619 +0x122
created by vendor/github.com/vinllen/mgo.newSocket
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:211 +0x1df

goroutine 229 [sleep]:
time.Sleep(0x37e11d600)
/home/peng/go/go/src/runtime/time.go:105 +0x14f
vendor/github.com/vinllen/mgo.(*mongoServer).pinger(0xc000243680, 0xc0081c2a01)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:407 +0x4a1
created by vendor/github.com/vinllen/mgo.newServer
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/server.go:107 +0x156

goroutine 177 [IO wait]:
internal/poll.runtime_pollWait(0x7fdf646752e0, 0x72, 0xc00027ec98)
/home/peng/go/go/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00012d118, 0x72, 0xffffffffffffff00, 0xa5f4a0, 0xdb1698)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00012d118, 0xc005718200, 0x10, 0x10)
/home/peng/go/go/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00012d100, 0xc005718200, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/internal/poll/fd_unix.go:169 +0x179
net.(*netFD).Read(0xc00012d100, 0xc005718200, 0x10, 0x10, 0xc00012d100, 0x0, 0x0)
/home/peng/go/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0001280a0, 0xc005718200, 0x10, 0x10, 0x0, 0x0, 0x0)
/home/peng/go/go/src/net/net.go:177 +0x68
vendor/github.com/vinllen/mgo.fill(0xa65940, 0xc0001280a0, 0xc005718200, 0x10, 0x10, 0x0, 0x91)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:601 +0x53
vendor/github.com/vinllen/mgo.(*mongoSocket).readLoop(0xc006fa2000)
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:619 +0x122
created by vendor/github.com/vinllen/mgo.newSocket
/home/peng/gopath/src/mongo-shake/src/vendor/github.com/vinllen/mgo/socket.go:211 +0x1df

How to collect oplog entries produced by transactions

I used MongoDB transactions, which produced the oplog entry below, but MongoShake does not collect it. Is it filtered out? How can I receive this message? (A sketch for unpacking such an entry follows after the oplog.)

Versions:
shake 1.4.4
mongodb 4.0

{
    "ts" : Timestamp(1543466613, 2),
    "t" : NumberLong(1),
    "h" : NumberLong(-8027060675486644031),
    "v" : 2,
    "op" : "c",
    "ns" : "admin.$cmd",
    "wall" : ISODate("2018-11-29T04:43:33.411Z"),
    "lsid" : {
        "id" : UUID("9fd21994-9ee6-471c-b7bf-f54b02a70329"),
        "uid" : { "$binary" : "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", "$type" : "00" }
    },
    "txnNumber" : NumberLong(1),
    "stmtId" : 0,
    "prevOpTime" : {
        "ts" : Timestamp(0, 0),
        "t" : NumberLong(-1)
    },
    "o" : {
        "applyOps" : [ 
            {
                "op" : "i",
                "ns" : "mbxt.projectPurchaseReg",
                "ui" : UUID("c54e7a9d-4a89-4f70-b6ac-89f9e285b386"),
                "o" : {
                    "_id" : ObjectId("5bff6e75e039a32bc8b043fa"),
                    "createTime" : NumberLong(1543466613348),
                    "delete" : 0
                }
            }, 
            {
                "op" : "u",
                "ns" : "mbxt.project",
                "ui" : UUID("276fef79-1bc7-426f-87a0-170b922b0f16"),
                "o" : {
                    "$v" : 1,
                    "$set" : {
                        "status" : 4,
                        "updateTime" : NumberLong(1543466613402)
                    }
                },
                "o2" : {
                    "_id" : ObjectId("5bf38f62ec1c53c0a8adab56")
                }
            }
        ]
    }
}
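A minimal sketch (not MongoShake code) of how such an admin.$cmd entry could be flattened into its inner operations, based only on the field names visible in the oplog above (o.applyOps, op, ns, o, o2):

// flatten_applyops.go - unpack a transaction oplog entry's applyOps array
// into individual op documents, using the mgo bson package.
package main

import (
    "fmt"

    "gopkg.in/mgo.v2/bson"
)

func flattenApplyOps(entry bson.M) []bson.M {
    cmd, ok := entry["o"].(bson.M)
    if !ok {
        return nil
    }
    rawOps, ok := cmd["applyOps"].([]interface{})
    if !ok {
        return []bson.M{entry} // not a transaction entry, return as-is
    }
    var ops []bson.M
    for _, o := range rawOps {
        if inner, ok := o.(bson.M); ok {
            ops = append(ops, inner)
        }
    }
    return ops
}

func main() {
    entry := bson.M{
        "op": "c",
        "ns": "admin.$cmd",
        "o": bson.M{"applyOps": []interface{}{
            bson.M{"op": "i", "ns": "mbxt.projectPurchaseReg", "o": bson.M{"_id": 1}},
            bson.M{"op": "u", "ns": "mbxt.project", "o2": bson.M{"_id": 2}},
        }},
    }
    for _, op := range flattenApplyOps(entry) {
        fmt.Println(op["op"], op["ns"])
    }
}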

Skip error

Hi,

I got an error like

[10:14:51 CEST 2018/09/06] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [DB.coll] op[u] failed. (*errors.errorString) [doUpdate run upsert/update[true] failed[cannot use the part (3 of h.3.odt) to traverse the element ({3: null})]], logs 64. firstLog &{6597668025535561848 u DB.coll map[$set:map[h.672.odt:2018-09-05 08:27:51 +0000 UTC h.672.oct:1]] map[_id:ObjectIdHex("594fb8ffa4efa290145b4c18")] map[] map[] 0 0}
and it loops over this error. Is there a way to skip on error?

Thanks in advance.

Indexes are not synced on MongoDB 3.4

Indexes are not synced on MongoDB 3.4: creating an index is not written to the oplog because there is no system.indexes collection. The official statement is:
Changed in version 3.0: MongoDB 3.0 deprecates direct access to the system.indexes collection, which had previously been used to list all indexes in a database.
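If the indexes are needed on the target, one workaround (my assumption, not something stated in this thread) is to create them there by hand; with the mgo driver that could look like the following sketch, where the target address, database, collection and key are placeholders:

// ensure_index.go - recreate an index on the target manually, since 3.4 no
// longer records index builds via system.indexes.
package main

import (
    "log"

    mgo "gopkg.in/mgo.v2"
)

func main() {
    session, err := mgo.Dial("mongodb://127.0.0.1:27017") // target address placeholder
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    err = session.DB("test").C("mycoll").EnsureIndex(mgo.Index{
        Key:        []string{"age"}, // same key as the source index
        Background: true,
    })
    if err != nil {
        log.Fatal(err)
    }
}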

Directly connecting to a hidden node fails

Directly connecting to a hidden node fails, and I tried changing the connection mode (direct + Monotonic) in the source code, but after running it reports the error "Oplog Tailer initialize failed: no reachable servers".

Memory OOM problem

The only parameter that can be set to limit memory is worker.batch_queue_size.

1. What is the relationship between worker.batch_queue_size and the memory ultimately required? Could you give a rough formula? With size=256, 4 GB of memory already hits OOM.

2. shake hit an OOM, but from start to finish the worker metric jobs_in_queue stayed at 0. Is something wrong there?

Help understanding the configuration

The situation is as follows; please advise:

./collector -conf=../conf/collector.conf
Conf.Options check failed: context storage type or address is invalid

Data flow:
mongodb replica set --> mongo-shake --> mongodb

I don't understand this part of the config file; could you explain it?
context.storage = mongoshake
context.address = chkp_default
context.start_position = 2000-01-01T00:00:01Z

Could the project publish a set of example configurations?

mongodb 4.0 transaction support

MongoShake does not currently support syncing the transactions introduced in MongoDB 4.0; we plan to support this in the next version, v1.6.0.

Index-sync messages appear in the log when dml_only=true

Log:
[2018/11/28 11:21:21 CST] [WARN] [oplog.(*PrimaryKeyHasher).DistributeOplogByMod:110] Couldn't extract hash object. collector has mixed up. use Oplog.Namespace instead &{6628744020887601158 i test3.system.indexes map[ns:test3.userinfo201811281115492220 v:1 key:map[_id:hashed] name:_id_hashed] map[] map[] map[] 0 0}

Versions:
shake 1.4.3
mongodb 3.2

Questions about having to disable the balancer for sharded clusters

1. Another issue mentions that when a chunk migration happens, the old shard deletes documents while the new shard inserts them, and reading both at the same time cannot preserve the ordering.

I wonder: could the oplog entries produced by such migrations (fromMigrate) on the sync source simply be ignored? Migrations on the source database would then not concern the target, and the target's balancer would balance its own cluster by itself. That way both clusters could keep their balancers enabled without the data going wrong.

2. If the balancer really must be disabled, then during operations, when the source data becomes unbalanced or needs scaling out and a manual rebalance is wanted, what is the procedure?
The plan I can think of is: first stop shake, then migrate on the source; after the migration, do a full sync to the target and record the point in time t, then start shake and sync from time t. Is there a better approach???

Shake cannot handle commands like moveChunk, which seems a bit limiting.

The source database is mongos

connect source mongodb, set username and password if enable authority.

split by comma(,) if use multiple instance in one replica-set.

split by semicolon(;) if sharding enable.

mongo_urls = mongodb://username:[email protected]:20040,127.0.0.1:20041

This parameter suggests the source can be configured as mongos, but when I configure the source as mongos and use the direct mode, it reports an error that no mongos can be found on the source. Does the direct mode not support a mongos source?

Sharded cluster + unique index: is this handled correctly?

Suppose a collection col is sharded, with its data spread across shard1 and shard2, and a unique index is built on the key field of col, where key is not the shard key. MongoShake parallelizes processing at the collection level.

Here is the problem:
First, on shard1's portion of col, insert key=1 runs, followed by update key=2;
then, on shard2's portion of col, insert key=1 runs.
This is possible on the source, because by the time shard2 inserts, shard1 has already updated the key.

But since there is no global ordering, suppose shake first pulls shard1's insert, then shard2's insert, and then shard1's update; wouldn't shard2's insert then fail?

How does shake handle this, or is it indeed a problem?

The start_position parameter

For the context.start_position parameter in conf/collector.conf, the default is 2000-01-01T00:00:01Z. When starting MongoShake, should a +8 CST time be converted to UTC, i.e. minus 8 hours? In testing I found that MongoShake does not convert this time.
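A small illustration of the timezone point in Go (MongoShake's implementation language): an RFC3339 string ending in Z is already UTC, while a Beijing-time instant needs an explicit +08:00 offset or a manual minus-8-hours conversion. Whether the collector's config parser accepts an explicit offset is not confirmed here; this only shows how the two notations differ.

// start_position_tz.go - the "Z" suffix means UTC; "+08:00" marks CST.
package main

import (
    "fmt"
    "time"
)

func main() {
    utc, _ := time.Parse(time.RFC3339, "2000-01-01T00:00:01Z")
    cst, _ := time.Parse(time.RFC3339, "2000-01-01T00:00:01+08:00")
    fmt.Println(utc.UTC()) // 2000-01-01 00:00:01 +0000 UTC
    fmt.Println(cst.UTC()) // 1999-12-31 16:00:01 +0000 UTC (8 hours earlier)
}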

Several configuration questions (everyone planning to use it will care about these~)

1. For context.storage.url, the comment says that when the source is a sharded cluster it should point to the config server. Is that right? The config server supposedly does not accept writes; shouldn't it point to mongos or some mongod instead?

  • if source database is replica, context will store in the source mongodb.
  • if source database is sharding, context must be given to point the config server address.
  • context.storage.url is only used in sharding. When source mongodb type is replicaSet,
  • checkpoint will write into source mongodb as default.
    context.storage = database
    context.storage.url = mongodb://127.0.0.1:20070

2. Is syncer.reader.buffer_time the interval at which the syncer sends data to the workers? Does the syncer buffer the fetched oplogs for a while and then send them to the workers as a batch?

3. worker and worker.batch_queue_size
In testing I found that in shard mode the number of workers must match the number of shards. Why is that?
What does worker.batch_queue_size mean, and what is the effect of tuning it?

4. What is system_profile for?
http_profile is the RESTful port for fetching statistics and configuration, but what does system_profile do?

5. Is checkpoint.interval the interval at which the checkpoint is persisted? What does the batch_size mentioned in the comment mean?

  • save checkpoint every ${batch_size}. reduce the number
  • of batch_size will saving the checkpoint with higher frequency
  • default size is 16 * 1024
    checkpoint.interval = 5000

6. Can log_file be an absolute path?

Index sync on a MongoDB sharded cluster

MongoDB version: 3.2.18
mongo-shake version: 1.4.3
Problem: all index types were created on the source database, but with mongo-shake's direct mode the indexes were not migrated successfully.
mongo-shake log:
[11:14:23 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[key:map[age:1] name:putongsuoyin ns:test.putongsuoyin] map[] map[] map[] 0 0}
[11:14:23 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[v:1 key:map[_id:1] name:id ns:zjc.coll] map[] map[] map[] 0 0}
[11:14:23 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[ns:zjc.coll v:1 key:map[_id:1] name:id] map[] map[] map[] 0 0}
[11:14:23 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[ns:test.putongsuoyin key:map[age:1] name:putongsuoyin] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[key:map[age:1] name:putongsuoyin ns:test.putongsuoyin] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[v:1 key:map[_id:1] name:id ns:zjc.coll] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[ns:test.putongsuoyin key:map[age:1] name:putongsuoyin] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[v:1 key:map[_id:1] name:id ns:zjc.coll] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[name:id ns:zjc.coll v:1 key:map[_id:1]] map[] map[] map[] 0 0}
[11:14:24 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[ns:test.putongsuoyin key:map[age:1] name:putongsuoyin] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[name:id ns:zjc.coll v:1 key:map[_id:1]] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[ns:test.putongsuoyin key:map[age:1] name:putongsuoyin] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-4, executor-4, oplog for [zjc.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 4. firstLog &{6620556030809997314 i zjc.system.indexes map[v:1 key:map[_id:1] name:id ns:zjc.coll] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-5, executor-5, oplog for [test.system.indexes] op[i] failed. (*mgo.QueryError) [invalid batch request for index creation], logs 9. firstLog &{6621316596503674881 i test.system.indexes map[ns:test.putongsuoyin key:map[age:1] name:putongsuoyin] map[] map[] map[] 0 0}
[11:14:25 CST 2018/11/08] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-1, executor-1, oplog for [zjc.$cmd] op[c] failed. (*mgo.QueryError) [collection already exists], logs 2. firstLog &{6620556030809997313 c zjc.$cmd map[create:coll] map[] map[] map[] 0 0}

Bug? With shake in master/standby mode, changes on the source during a failover are lost

1. Environment:
Source: replica set
Target: a single mongod
shake: two instances, master/standby mode
mongodb 3.2

2. Scenario:
After kill -9 on the primary, a secondary becomes primary a few seconds later. If a record A is inserted into the source during that gap, it cannot be found on the target. Inserting another record B into the source, B is synced, but A is still missing. Only after stopping all shake instances and restarting them (perhaps the checkpoint also needs resetting) does the target finally get A.

3. Configuration of the two shake instances:
(1)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake2
checkpoint.interval = 5000
http_profile = 9100
system_profile = 9200
log_level = info
log_file = collector.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 3
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 3
replayer.executor.upsert = false
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

(2)
mongo_urls = mongodb://172.17.160.241:27001,172.17.160.241:27002,172.17.160.241:27003
collector.id = mongoshake
checkpoint.interval = 5000
http_profile = 9101
system_profile = 9201
log_level = info
log_file = collector2.log
log_buffer = false
filter.namespace.black =
filter.namespace.white =
oplog.gids =
shard_key = collection
syncer.reader.buffer_time = 3
worker = 1
worker.batch_queue_size = 64
worker.oplog_compressor = none
tunnel = direct
tunnel.address = mongodb://172.17.160.242:27001
context.storage = database
context.address = ckpt_default
context.start_position = 2000-01-01T00:00:01Z
master_quorum = true
replayer.dml_only = true
replayer.executor = 1
replayer.executor.upsert = true
replayer.executor.insert_on_dup_update = false
replayer.conflict_write_to = none
replayer.durable = true

Should MongoShake be close to the target or to the source?

The FAQ says that to increase QPS, MongoShake should be placed as close to the target as possible.

However, I think the network transfer of the oplog may be the real bottleneck. MongoShake uses optimizations such as parallel replication, compression before sending, merging operations, and filtering out irrelevant oplogs, so in principle it can shrink the oplog before it reaches the tunnel. If MongoShake is placed close to the source (e.g. in the same data center), less data ends up crossing the network, so being close to the source could actually perform better, couldn't it?

Looking forward to an answer, thanks.

release-v1.0.0-20180718: error when running the compiled binary with the default configuration

After compiling release-v1.0.0-20180718 and running it with the default configuration, it errors out; the error log is as follows:
[11:19:47 CST 2018/07/19] [CRIT] (mongoshake/executor.(*Executor).execute:107) Replayer-0, executor-0, oplog for [db008.test01] op[i] failed. (*errors.errorString) [EOF], logs 1. firstLog &{6579760464337567746 i db008.test01 map[_id:ObjectIdHex("5b50014081ab277c676f2f61") name:canyou ok] map[] map[] map[] 0 0}

Checkpoint doesn't update, and it causes OOM

I use the file tunnel and find that the document in ckpt_default doesn't update. The process keeps consuming memory until it OOMs.
The ckpt_default document is:
{ "_id" : ObjectId("5b7a3b1dd001499fc4c2359e"), "name" : "rs2", "ckpt" : Timestamp(946656001, 0) }

About other tunnel types

I have read the related questions in the FAQ document. Using the file tunnel in a local test, the receiver successfully receives the oplog, but I don't know how to then write it into a database. I have only just started with Golang; could you give an example? (A minimal sketch follows.)
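Not an official example, but a minimal sketch of the final step, assuming you already have the raw BSON bytes of one oplog entry on the receiver side (e.g. the Raw field of the GenericOplog struct shown earlier on this page); it decodes the entry and applies a plain insert to a target MongoDB, ignoring all op types other than "i":

// apply_insert.go - decode one raw oplog entry and insert its "o" document
// into the namespace named by "ns". Target address is a placeholder.
package main

import (
    "log"
    "strings"

    mgo "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

func applyInsert(target *mgo.Session, raw []byte) error {
    var entry bson.M
    if err := bson.Unmarshal(raw, &entry); err != nil {
        return err
    }
    if entry["op"] != "i" { // this sketch only handles inserts
        return nil
    }
    ns := entry["ns"].(string) // e.g. "db.coll"
    parts := strings.SplitN(ns, ".", 2)
    return target.DB(parts[0]).C(parts[1]).Insert(entry["o"])
}

func main() {
    session, err := mgo.Dial("mongodb://127.0.0.1:27017")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()
    // applyInsert(session, raw) would be called from the receiver's handler
    // for each oplog entry it delivers.
}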

When does syncing start?

I created a new db and collection (testshinan.test-shinan) on the source, with two documents A and B inside, at around 2019-01-25 17:00.


Then, on the target, I added document A to testshinan.test-shinan; that is what I understand as the initial full sync.
Then I set filter.namespace.white = testshinan.test-shinan and context.start_position = 2019-01-25T00:00:01Z.

Question:

The process has been running for a long time and my document B still has not arrived. Does it poll through the entire oplog, or does it start from 2019-01-25T00:00:01Z?

Or does the context.start_position time have to match the time in the oplog?

Configuring sync for multiple databases

Hello, is there any more complete documentation? I currently need to back up many MongoDB databases: the goal is an offline environment that syncs the online data, where the offline data never affects the online data, and when pulling online data down any conflict with local data is simply discarded. I saw that your project has this feature, which is very exciting. We previously synced via the oplog as well, but with so many databases and collections we never found a good solution. Could you provide a configuration guide?

With authentication enabled, what user privileges are needed?

What are the minimum privileges required on the source and on the target respectively?
My understanding: on the source, only read on the database holding the oplog (local); on the target, readWrite on the databases being synced plus the mongoshake and mongoshake_conflict databases.

Am I missing anything? Could you write it up?

Mongoshake with Kafka

Hello,
I get the following error after trying to use MongoShake with Kafka: "Oplog Tailer initialize failed: worker initialize error". It only occurs with Kafka; when using other tunnel types like direct/file/RPC this error is not shown. I have a MongoDB replica set of 3 members (1 primary, 2 secondaries) and one Kafka broker with a "mongoshake" topic in it.
mongo_urls is of the form : mongodb://IP_ADDR1:27017,IP_ADDR2:27017,IP_ADDR3:27017
and tunnel.address is of the form mongoshake@BROKER_IP and tunnel is set to kafka.
Does someone have an idea of how to solve this problem?
Thanks

Stream to another DB name

Hi,

With a direct connection, can we sync the collections in the whitelist to a different DB name on the target MongoDB?

Thanks!

Why is my MongoShake sync so slow?

The environment is as follows:
1. MongoShake is deployed on a server of the MongoDB replica set.
2. The server has 32 cores and 64 GB of memory; mongod uses 40 GB.
3. After starting the collector, it uses over 7 GB of virtual memory and consistently about 3 GB of resident memory.
4. The source instance has more than 140 databases.

MongoShake configuration:
1. Version 1.4.5.
2. Apart from the blacklist and the source/target settings, all other parameters are defaults.
3. The blacklist contains 43 db.collection entries and 1 whole database.

Part of the collector log and the mongoshake-stat output were attached as screenshots.

docs/rpc

I'm trying to use mongoshake with RPC; could you provide some examples in the documentation on how to do that?
Also, could you provide some examples of ETL flows using mongoshake?

Syncing with mongo 3.6 - bug

When trying to sync a MongoDB 3.6 instance, mongo-shake prints an error:

[07:40:40 UTC 2018/08/07] [CRIT] (mongoshake/executor.(*Executor).execute:108) Replayer-2, executor-2, oplog for [dev.users] op[u] failed. (*errors.errorString) [doUpdate run upsert/update[true] failed[The $v update field is only recognized internally]], logs 1. firstLog &{6575777281602486273 u dev.users map[$v:1 $set:map[lastname:Zamir]] map[_id:ObjectIdHex("5b41d9f9e35d435d6f507f94")] map[] map[] 0 0}

After a bit of searching online I found https://jira.mongodb.org/browse/SERVER-32240. If I understood it correctly, the error occurs because the $v field was added to update oplog entries in version 3.6.
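One workaround sometimes applied (a sketch of mine, not necessarily what mongo-shake itself does) is to drop the internal $v field from the update document before replaying it:

// strip_v.go - remove the "$v" version field from an update oplog's "o"
// document so the target accepts the update.
package main

import (
    "fmt"

    "gopkg.in/mgo.v2/bson"
)

func stripVersionField(update bson.M) bson.M {
    delete(update, "$v")
    return update
}

func main() {
    o := bson.M{"$v": 1, "$set": bson.M{"lastname": "Zamir"}}
    fmt.Println(stripVersionField(o)) // map[$set:map[lastname:Zamir]]
}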

How should the configuration be written for syncing a sharded cluster?

Say all three of my machines run a mongos, the mongod of three shards (shard1, shard2, shard3), and one config server, as follows:

mongos是192.168.1.1:27021,192.168.1.2:27021,192.168.1.3:27021
configserver是192.168.1.1:27020,192.168.1.2:27020,192.168.1.3:27020
shard1是192.168.1.1:27017,192.168.1.2:27017,192.168.1.3:27017
shard2是192.168.1.1:27018,192.168.1.2:27018,192.168.1.3:27018
shard3是192.168.1.1:27019,192.168.1.2:27019,192.168.1.3:27019
My target side has exactly the same layout as the source, on IP addresses .4, .5 and .6. How should the config file be filled in? (A hedged guess follows after this issue's config fragment.)

# source
mongo_urls = mongodb://192.168.1.1:27017,192.168.1.1:27018,192.168.1.1:27019

#context
context.storage = database
context.storage.url = mongodb://127.0.0.1:27017
context.address = ckpt_default

# target
tunnel = direct
tunnel.address = mongodb://192.168.1.4:27017,192.168.1.4:27018,192.168.1.4:27019

Besides these settings, does the config server also need its own sync channel?
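Going by the config comments quoted in an earlier issue on this page ("split by comma(,) if use multiple instance in one replica-set", "split by semicolon(;) if sharding enable"), one plausible way to fill in the source side, stated as an assumption rather than an official answer, is to list every shard replica set comma-separated inside a shard and semicolon-separated between shards, with context.storage.url pointing at the config servers as those comments suggest (even though another issue above questions whether that is right):

mongo_urls = mongodb://192.168.1.1:27017,192.168.1.2:27017,192.168.1.3:27017;192.168.1.1:27018,192.168.1.2:27018,192.168.1.3:27018;192.168.1.1:27019,192.168.1.2:27019,192.168.1.3:27019
context.storage.url = mongodb://192.168.1.1:27020,192.168.1.2:27020,192.168.1.3:27020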

BSONObj size is invalid

(*mgo.BulkError) [BSONObj size: 21725988 (0x14B8324) is invalid. Size must be between 0 and 16793600(16MB)
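The message means a single bulk request exceeded MongoDB's 16 MB BSON limit. A hedged sketch of the usual remedy, splitting a batch by accumulated BSON size before issuing the bulk write (this is not MongoShake's actual code; the 15 MB limit is an arbitrary safety margin):

// split_batch.go - cut a slice of documents into chunks whose total
// marshalled BSON size stays under a limit.
package main

import (
    "fmt"

    "gopkg.in/mgo.v2/bson"
)

const maxBatchBytes = 15 * 1024 * 1024

func splitBySize(docs []interface{}) [][]interface{} {
    var batches [][]interface{}
    var current []interface{}
    size := 0
    for _, d := range docs {
        raw, err := bson.Marshal(d)
        if err != nil {
            continue // skip documents that cannot be marshalled
        }
        if size+len(raw) > maxBatchBytes && len(current) > 0 {
            batches = append(batches, current)
            current, size = nil, 0
        }
        current = append(current, d)
        size += len(raw)
    }
    if len(current) > 0 {
        batches = append(batches, current)
    }
    return batches
}

func main() {
    docs := []interface{}{bson.M{"a": 1}, bson.M{"b": 2}}
    fmt.Println(len(splitBySize(docs))) // 1
}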

How to avoid unique-index conflicts between workers

In the code, the batcher distributes oplogs to different workers based on the hash of the document id or of the ns, and conflicts are then checked inside each worker. But if hashing is done by id, oplogs of the same ns may be sent to different workers; can unique-index conflicts then occur between different workers, and can such conflicts be detected?

Sharded cluster sync: checkpoint persistence fails

Sharded cluster; the key configuration is as follows:

context.storage = database
context.storage.url = mongodb://127.0.0.1:20070 (this points to the source mongos; is this setting the problem??)
context.address = ckpt_default

The log contains the following:

[2018/10/09 20:35:19 CST] [WARN] [collector.(*OplogSyncer).checkpoint:64] CheckpointOperation updated is not suitable. lowest [0]. current [4065980259915268096]. reason : no candidates ack values found
