Comments (37)
TL; DR
Delete events may be missing in watch response. This issue can only be reproduced when the compact revision happens to be a delete operation.
How to reproduce this issue?
- Start an one member etcd cluster in one terminal
- Execute commands below in another terminal
$ etcdctl put k1 v2 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
$ etcdctl put k1 v3 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2}}
$ etcdctl put k1 v4 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":2}}
$ etcdctl del k1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":2},"deleted":1}
$ etcdctl put k1 v6 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":2}}
- Compact revision 5
$ etcdctl compact 5
compacted revision 5
- Watch starting from rev 5
$ etcdctl watch k1 --rev 5
PUT
k1
v6
Obviously the deletion operation with revision 5 is missing.
Alternatively, If you want to try the robustness test, then run command below on #18145,
$ go test -run TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 -v --count 100 --timeout 1h -failfast
Expected behaviour
As mentioned in #18089 (comment), some revisions may be compacted, and it may impact the watch clients. But it shouldn't never drop any events quietly.
Expected behaviour:
- Either the client side receives all events: no any events are dropped
- or the client side gets an
ErrCompacted
response.
Root cause
Usually etcd keeps the compact revision (see code below). For example, when you compact revision 5, actually etcd only compacts revisions <=4.
etcd/server/storage/mvcc/key_index.go
Line 256 in 8a0054f
But if the compact revision is a delete operation, then etcd doesn't keep it (in other words, etcd will compact/remove it) (see code below),
etcd/server/storage/mvcc/key_index.go
Lines 244 to 247 in 8a0054f
But the problem is that etcdserver doesn't return ErrCompacted
if the startWatchRevision happens to be the last compact revision. Because the if condition is if w.minRev < compactRev
instead of if w.minRev <= compactRev
.
etcd/server/storage/mvcc/watcher_group.go
Line 253 in 8a0054f
Note this isn't a regression. It's an old issue.
Potential solution
It's open to discussion.
When executing compaction operation, we should only keep the latest revision for each key. For example, if the latest revision for "k1" is 10, and user compacts 10, then we should keep 10; otherwise if user compacts any revision <=9, we just compact it, there is no point to keep it. Note that different keys may have different latest revision.
The available
is the revisions to keep,
etcd/server/storage/mvcc/key_index.go
Line 208 in 8a0054f
We also need to change the if condition to if w.minRev <= compactRev
,
etcd/server/storage/mvcc/watcher_group.go
Line 253 in 8a0054f
So the watch client will get an ErrCompacted
response.
Impact
All versions (including 3.4, 3.5 and main) are impacted.
Major, but it might not be easy to reproduce.
- As mentioned above, it can only be reproduced when the compact revision happens to be a delete operation.
- Also usually compaction operation isn't that frequent in production environment. Normally the watch delay won't be greater than compaction interval.
Immediate action
Create a simple e2e test to reliably reproduce this issue. Please refer to the section "How to reproduce this issue?
" above.
Brainstorm the solution.
from etcd.
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Following the clear manual etcdctl
reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0
so it looks like this has been present for at least ~5 years.
from etcd.
I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.
Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.
In last repro:
logger.go:146: 2024-06-07T14:46:59.319+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 1}
[]model.PersistedEvent{
... // 160 identical elements
{Event: {Type: "delete-operation", Key: "key6"}, Revision: 140},
{Event: {Type: "put-operation", Key: "key9", Value: {Value: "138"}}, Revision: 141, ...},
- {Event: model.Event{Type: "delete-operation", Key: "key7"}, Revision: 142},
{Event: {Type: "put-operation", Key: "key3", Value: {Value: "147"}}, Revision: 143},
{Event: {Type: "put-operation", Key: "key3", Value: {Value: "148"}}, Revision: 144},
... // 79 identical elements
}
etcd bootstrap log:
{"level":"info","ts":"2024-06-07T14:46:56.832687+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":141}
from etcd.
Something similar reproduced on v3.4, however this time it was long after crash, and it flagged broken resumable properly. Which just means it provided incorrect event on a newly opened watch which provided revision.
=== RUN TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
logger.go:146: 2024-06-07T15:10:05.793+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:05.793+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063001): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063001): A future release of golang/protobuf will delete this package,
member-0 (2063001): which has long been excluded from the compatibility promise.
member-0 (2063001):
member-0 (2063001): 2024-06-07 15:10:05.801650 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063001): 2024-06-07 15:10:05.801710 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.801731 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.801737 I | etcdmain: etcd Version: 3.4.32
member-0 (2063001): 2024-06-07 15:10:05.801741 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063001): 2024-06-07 15:10:05.801745 I | etcdmain: Go Version: go1.22.0
member-0 (2063001): 2024-06-07 15:10:05.801750 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063001): 2024-06-07 15:10:05.801755 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.802103 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.802482 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063001): 2024-06-07 15:10:05.802489 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063001): 2024-06-07 15:10:05.802494 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063001): 2024-06-07 15:10:05.802500 I | embed: heartbeat = 100ms
member-0 (2063001): 2024-06-07 15:10:05.802505 I | embed: election = 1000ms
member-0 (2063001): 2024-06-07 15:10:05.802510 I | embed: snapshot count = 1000
member-0 (2063001): 2024-06-07 15:10:05.802521 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063001): 2024-06-07 15:10:05.823121 I | etcdserver: starting member ca50e9357181d758 in cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 0
member-0 (2063001): raft2024/06/07 15:10:05 INFO: newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 1
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.837990 W | auth: simple token is not cryptographically signed
member-0 (2063001): 2024-06-07 15:10:05.849603 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063001): 2024-06-07 15:10:05.849682 I | etcdserver: ca50e9357181d758 as single-node; fast-forwarding 9 ticks (election ticks 10)
member-0 (2063001): 2024-06-07 15:10:05.849979 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850015 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850029 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.851460 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.852075 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 is starting a new election at term 1
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became candidate at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became leader at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2
member-0 (2063001): 2024-06-07 15:10:06.425669 I | etcdserver: setting up the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.427632 I | embed: ready to serve client requests
member-0 (2063001): 2024-06-07 15:10:06.427648 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063001): 2024-06-07 15:10:06.428428 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
member-0 (2063001): 2024-06-07 15:10:06.429334 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.429387 I | etcdserver/api: enabled capabilities for version 3.4
logger.go:146: 2024-06-07T15:10:06.436+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063001}
logger.go:146: 2024-06-07T15:10:06.443+0200 INFO Start traffic
member-0 (2063001): 2024-06-07 15:10:06.463482 I | mvcc: store.index: compact 10
member-0 (2063001): 2024-06-07 15:10:06.467626 I | mvcc: finished scheduled compaction at 10 (took 4.101548ms)
member-0 (2063001): 2024-06-07 15:10:06.485280 I | mvcc: store.index: compact 21
member-0 (2063001): 2024-06-07 15:10:06.491005 I | mvcc: finished scheduled compaction at 21 (took 5.681245ms)
member-0 (2063001): 2024-06-07 15:10:06.499354 I | mvcc: store.index: compact 30
member-0 (2063001): 2024-06-07 15:10:06.503435 I | mvcc: finished scheduled compaction at 30 (took 4.021316ms)
member-0 (2063001): 2024-06-07 15:10:06.516624 I | mvcc: store.index: compact 36
member-0 (2063001): 2024-06-07 15:10:06.522273 I | mvcc: finished scheduled compaction at 36 (took 5.604531ms)
member-0 (2063001): 2024-06-07 15:10:06.531697 I | mvcc: store.index: compact 43
member-0 (2063001): 2024-06-07 15:10:06.544126 I | mvcc: finished scheduled compaction at 43 (took 12.389592ms)
member-0 (2063001): 2024-06-07 15:10:06.544147 I | mvcc: store.index: compact 45
member-0 (2063001): 2024-06-07 15:10:06.549828 I | mvcc: finished scheduled compaction at 45 (took 5.663001ms)
member-0 (2063001): 2024-06-07 15:10:06.565709 I | mvcc: store.index: compact 52
member-0 (2063001): 2024-06-07 15:10:06.569921 I | mvcc: finished scheduled compaction at 52 (took 4.110444ms)
member-0 (2063001): 2024-06-07 15:10:06.576837 I | mvcc: store.index: compact 58
member-0 (2063001): 2024-06-07 15:10:06.580559 I | mvcc: finished scheduled compaction at 58 (took 3.700063ms)
member-0 (2063001): 2024-06-07 15:10:06.593001 I | mvcc: store.index: compact 63
member-0 (2063001): 2024-06-07 15:10:06.598580 I | mvcc: finished scheduled compaction at 63 (took 5.512127ms)
member-0 (2063001): 2024-06-07 15:10:06.610192 I | mvcc: store.index: compact 74
member-0 (2063001): 2024-06-07 15:10:06.610655 W | etcdserver: failed to apply request "compaction:<revision:74 > header:<ID:15517310789174624164 > " with response "" took (2.796µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063001): 2024-06-07 15:10:06.614154 I | mvcc: finished scheduled compaction at 74 (took 3.943601ms)
member-0 (2063001): 2024-06-07 15:10:06.628541 I | mvcc: store.index: compact 76
member-0 (2063001): 2024-06-07 15:10:06.632295 I | mvcc: finished scheduled compaction at 76 (took 3.736361ms)
member-0 (2063001): 2024-06-07 15:10:06.638518 I | mvcc: store.index: compact 79
member-0 (2063001): 2024-06-07 15:10:06.642166 I | mvcc: finished scheduled compaction at 79 (took 3.625864ms)
member-0 (2063001): 2024-06-07 15:10:06.695912 I | mvcc: store.index: compact 81
member-0 (2063001): 2024-06-07 15:10:06.699766 I | mvcc: finished scheduled compaction at 81 (took 3.833413ms)
member-0 (2063001): 2024-06-07 15:10:06.811286 I | mvcc: store.index: compact 88
member-0 (2063001): 2024-06-07 15:10:06.815069 I | mvcc: finished scheduled compaction at 88 (took 3.751339ms)
member-0 (2063001): 2024-06-07 15:10:06.926055 I | mvcc: store.index: compact 98
member-0 (2063001): 2024-06-07 15:10:06.929943 I | mvcc: finished scheduled compaction at 98 (took 3.865914ms)
member-0 (2063001): 2024-06-07 15:10:07.006149 I | mvcc: store.index: compact 104
member-0 (2063001): 2024-06-07 15:10:07.009806 I | mvcc: finished scheduled compaction at 104 (took 3.640982ms)
member-0 (2063001): 2024-06-07 15:10:07.047942 I | mvcc: store.index: compact 105
member-0 (2063001): 2024-06-07 15:10:07.051740 I | mvcc: finished scheduled compaction at 105 (took 3.764455ms)
member-0 (2063001): 2024-06-07 15:10:07.161575 I | mvcc: store.index: compact 113
member-0 (2063001): 2024-06-07 15:10:07.165496 I | mvcc: finished scheduled compaction at 113 (took 3.891012ms)
member-0 (2063001): 2024-06-07 15:10:07.169287 I | mvcc: store.index: compact 114
member-0 (2063001): 2024-06-07 15:10:07.173334 I | mvcc: finished scheduled compaction at 114 (took 4.008702ms)
member-0 (2063001): 2024-06-07 15:10:07.177192 I | mvcc: store.index: compact 115
member-0 (2063001): 2024-06-07 15:10:07.177233 I | mvcc: finished scheduled compaction at 115 (took 15.309µs)
member-0 (2063001): 2024-06-07 15:10:07.336646 I | mvcc: store.index: compact 125
member-0 (2063001): 2024-06-07 15:10:07.340373 I | mvcc: finished scheduled compaction at 125 (took 3.708237ms)
member-0 (2063001): 2024-06-07 15:10:07.396406 I | mvcc: store.index: compact 129
member-0 (2063001): 2024-06-07 15:10:07.400353 I | mvcc: finished scheduled compaction at 129 (took 3.92806ms)
member-0 (2063001): 2024-06-07 15:10:07.421834 I | mvcc: store.index: compact 131
member-0 (2063001): 2024-06-07 15:10:07.426046 I | mvcc: finished scheduled compaction at 131 (took 4.182771ms)
member-0 (2063001): 2024-06-07 15:10:07.432261 W | etcdserver: failed to apply request "compaction:<revision:131 > header:<ID:15517310789174624338 > " with response "" took (9.297µs) to execute, err is mvcc: required revision has been compacted
logger.go:146: 2024-06-07T15:10:07.439+0200 INFO Triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (2063029): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":136,"raft_term":2}}
member-0 (2063001): 2024-06-07 15:10:07.470978 I | mvcc: store.index: compact 135
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (2063037): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":137,"raft_term":2},"version":"3.4.32","dbSize":24576,"leader":14578408409545168728,"raftIndex":171,"raftTerm":2,"raftAppliedIndex":171,"dbSizeInUse":16384}}]
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-07T15:10:07.472+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-07T15:10:07.472+0200 INFO Waiting for member to exit {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
member-0 (2063001): 2024-06-07 15:10:07.475266 I | mvcc: finished scheduled compaction at 135 (took 4.25666ms)
member-0 (2063001): panic: failpoint panic: {}
member-0 (2063001):
member-0 (2063001): goroutine 120 [running]:
member-0 (2063001): go.etcd.io/gofail/runtime.actPanic(0x0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (2063001): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (2063001): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bcde0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (2063001): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc00051bba0?)
member-0 (2063001): ./pkg/mod/go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (2063001): go.etcd.io/etcd/etcdserver.(*raftNode).start.func1()
member-0 (2063001): /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:242 +0xae5
member-0 (2063001): created by go.etcd.io/etcd/etcdserver.(*raftNode).start in goroutine 109
member-0 (2063001): /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:165 +0x6f
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO server exited {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO Member exited as expected {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:07.479+0200 INFO spawning process {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063048): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063048): A future release of golang/protobuf will delete this package,
member-0 (2063048): which has long been excluded from the compatibility promise.
member-0 (2063048):
member-0 (2063048): 2024-06-07 15:10:07.488960 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063048): 2024-06-07 15:10:07.489016 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489034 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489040 I | etcdmain: etcd Version: 3.4.32
member-0 (2063048): 2024-06-07 15:10:07.489044 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063048): 2024-06-07 15:10:07.489048 I | etcdmain: Go Version: go1.22.0
member-0 (2063048): 2024-06-07 15:10:07.489052 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063048): 2024-06-07 15:10:07.489056 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063048): 2024-06-07 15:10:07.489120 N | etcdmain: the server is already initialized as member before, starting as etcd member...
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489130 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489546 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063048): 2024-06-07 15:10:07.489553 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063048): 2024-06-07 15:10:07.489558 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063048): 2024-06-07 15:10:07.489562 I | embed: heartbeat = 100ms
member-0 (2063048): 2024-06-07 15:10:07.489566 I | embed: election = 1000ms
member-0 (2063048): 2024-06-07 15:10:07.489571 I | embed: snapshot count = 1000
member-0 (2063048): 2024-06-07 15:10:07.489581 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063048): 2024-06-07 15:10:07.489586 I | embed: initial advertise peer URLs = http://localhost:20001
member-0 (2063048): 2024-06-07 15:10:07.489591 I | embed: initial cluster =
member-0 (2063048): 2024-06-07 15:10:07.498379 I | etcdserver: restarting member ca50e9357181d758 in cluster 34f27e83b3bc2ff at commit index 172
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 became follower at term 2
member-0 (2063048): raft2024/06/07 15:10:07 INFO: newRaft ca50e9357181d758 [peers: [], term: 2, commit: 172, applied: 0, lastindex: 172, lastterm: 2]
member-0 (2063048): 2024-06-07 15:10:07.501601 W | auth: simple token is not cryptographically signed
member-0 (2063048): 2024-06-07 15:10:07.505904 I | mvcc: restore compact to 135
member-0 (2063048): 2024-06-07 15:10:07.508923 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063048): 2024-06-07 15:10:07.509090 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509121 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509492 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063048): 2024-06-07 15:10:07.509806 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:07.509871 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063048): 2024-06-07 15:10:07.509893 I | etcdserver/api: enabled capabilities for version 3.4
member-0 (2063048): 2024-06-07 15:10:07.510650 W | etcdserver: failed to apply request "compaction:<revision:135 > header:<ID:15517310789174624347 > " with response "" took (8.226µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:07.511244 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 is starting a new election at term 2
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became candidate at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became leader at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3
member-0 (2063048): 2024-06-07 15:10:08.716725 I | embed: ready to serve client requests
member-0 (2063048): 2024-06-07 15:10:08.716809 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:08.718833 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
logger.go:146: 2024-06-07T15:10:08.723+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063048}
logger.go:146: 2024-06-07T15:10:08.724+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (2063048): 2024-06-07 15:10:08.734129 I | mvcc: store.index: compact 138
member-0 (2063048): 2024-06-07 15:10:08.734171 W | etcdserver: failed to apply request "compaction:<revision:138 > header:<ID:15517310789175053071 > " with response "" took (1.643µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.737837 I | mvcc: finished scheduled compaction at 138 (took 3.684213ms)
member-0 (2063048): 2024-06-07 15:10:08.749556 I | mvcc: store.index: compact 145
member-0 (2063048): 2024-06-07 15:10:08.755487 I | mvcc: finished scheduled compaction at 145 (took 5.872415ms)
member-0 (2063048): 2024-06-07 15:10:08.785454 I | mvcc: store.index: compact 180
member-0 (2063048): 2024-06-07 15:10:08.793336 I | mvcc: finished scheduled compaction at 180 (took 7.56219ms)
member-0 (2063048): 2024-06-07 15:10:08.801507 I | mvcc: store.index: compact 183
member-0 (2063048): 2024-06-07 15:10:08.807138 I | mvcc: finished scheduled compaction at 183 (took 5.584654ms)
member-0 (2063048): 2024-06-07 15:10:08.811544 I | mvcc: store.index: compact 188
member-0 (2063048): 2024-06-07 15:10:08.817315 I | mvcc: finished scheduled compaction at 188 (took 5.727472ms)
member-0 (2063048): 2024-06-07 15:10:08.823976 I | mvcc: store.index: compact 195
member-0 (2063048): 2024-06-07 15:10:08.823985 W | etcdserver: failed to apply request "compaction:<revision:195 > header:<ID:15517310789175053189 > " with response "" took (2.235µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.827999 I | mvcc: finished scheduled compaction at 195 (took 3.994016ms)
member-0 (2063048): 2024-06-07 15:10:08.907484 I | mvcc: store.index: compact 219
member-0 (2063048): 2024-06-07 15:10:08.911307 I | mvcc: finished scheduled compaction at 219 (took 3.790332ms)
member-0 (2063048): 2024-06-07 15:10:08.938127 I | mvcc: store.index: compact 220
member-0 (2063048): 2024-06-07 15:10:08.941826 I | mvcc: finished scheduled compaction at 220 (took 3.665007ms)
member-0 (2063048): 2024-06-07 15:10:09.027783 I | mvcc: store.index: compact 230
member-0 (2063048): 2024-06-07 15:10:09.031653 I | mvcc: finished scheduled compaction at 230 (took 3.827542ms)
member-0 (2063048): 2024-06-07 15:10:09.157620 I | mvcc: store.index: compact 239
member-0 (2063048): 2024-06-07 15:10:09.161540 I | mvcc: finished scheduled compaction at 239 (took 3.871184ms)
member-0 (2063048): 2024-06-07 15:10:09.232656 I | mvcc: store.index: compact 246
member-0 (2063048): 2024-06-07 15:10:09.236458 I | mvcc: finished scheduled compaction at 246 (took 3.767421ms)
member-0 (2063048): 2024-06-07 15:10:09.413328 I | mvcc: store.index: compact 256
member-0 (2063048): 2024-06-07 15:10:09.417222 I | mvcc: finished scheduled compaction at 256 (took 3.859343ms)
member-0 (2063048): 2024-06-07 15:10:09.438400 I | mvcc: store.index: compact 257
member-0 (2063048): 2024-06-07 15:10:09.442288 I | mvcc: finished scheduled compaction at 257 (took 3.864001ms)
member-0 (2063048): 2024-06-07 15:10:09.508357 I | mvcc: store.index: compact 264
member-0 (2063048): 2024-06-07 15:10:09.513601 I | mvcc: finished scheduled compaction at 264 (took 5.151462ms)
member-0 (2063048): 2024-06-07 15:10:09.657334 I | mvcc: store.index: compact 277
member-0 (2063048): 2024-06-07 15:10:09.661195 I | mvcc: finished scheduled compaction at 277 (took 3.822834ms)
member-0 (2063048): 2024-06-07 15:10:09.678618 I | mvcc: store.index: compact 280
member-0 (2063048): 2024-06-07 15:10:09.682446 I | mvcc: finished scheduled compaction at 280 (took 3.801793ms)
member-0 (2063048): 2024-06-07 15:10:09.692338 I | mvcc: store.index: compact 282
member-0 (2063048): 2024-06-07 15:10:09.696220 I | mvcc: finished scheduled compaction at 282 (took 3.859612ms)
logger.go:146: 2024-06-07T15:10:09.762+0200 INFO Finished traffic
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting complete traffic {"successes": 824, "failures": 44, "successRate": 0.9493087557603687, "period": "3.319285557s", "qps": 248.24619209464416}
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting traffic before failure injection {"successes": 408, "failures": 0, "successRate": 1, "period": "996.375923ms", "qps": 409.48400155189216}
logger.go:146: 2024-06-07T15:10:10.773+0200 INFO Reporting traffic during failure injection {"successes": 25, "failures": 43, "successRate": 0.36764705882352944, "period": "1.284883155s", "qps": 19.45702214455446}
logger.go:146: 2024-06-07T15:10:10.774+0200 INFO Reporting traffic after failure injection {"successes": 391, "failures": 1, "successRate": 0.9974489795918368, "period": "1.038026479s", "qps": 376.6763256142332}
logger.go:146: 2024-06-07T15:10:10.774+0200 INFO Finished simulating traffic {"max-revision": 288}
logger.go:146: 2024-06-07T15:10:10.780+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T15:10:10.925+0200 INFO Linearization success {"duration": "145.013099ms"}
logger.go:146: 2024-06-07T15:10:10.926+0200 INFO Validating watch
logger.go:146: 2024-06-07T15:10:10.926+0200 ERROR Broke watch guarantee {"guarantee": "resumable", "client": 2, "request": {"Key":"","Revision":220,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":false}, "got-event": {"Type":"put-operation","Key":"key6","Value":{"Value":"235","Hash":0},"Revision":221,"IsCreate":false,"PrevValue":null}, "want-event": {"Type":"delete-operation","Key":"key7","Value":{"Value":"","Hash":0},"Revision":220,"IsCreate":false}}
validate.go:49: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
logger.go:146: 2024-06-07T15:10:10.926+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979"}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving member data dir {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/server-TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO no watch operations for client, skip persisting {"client-id": 1}
logger.go:146: 2024-06-07T15:10:10.936+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-1/operations.json"}
logger.go:146: 2024-06-07T15:10:10.937+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-2/watch.json"}
logger.go:146: 2024-06-07T15:10:10.937+0200 INFO no KV operations for client, skip persisting {"client-id": 2}
logger.go:146: 2024-06-07T15:10:10.938+0200 INFO no watch operations for client, skip persisting {"client-id": 3}
logger.go:146: 2024-06-07T15:10:10.938+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-3/operations.json"}
logger.go:146: 2024-06-07T15:10:10.939+0200 INFO no watch operations for client, skip persisting {"client-id": 4}
logger.go:146: 2024-06-07T15:10:10.939+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-4/operations.json"}
logger.go:146: 2024-06-07T15:10:10.940+0200 INFO no watch operations for client, skip persisting {"client-id": 5}
logger.go:146: 2024-06-07T15:10:10.940+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-5/operations.json"}
logger.go:146: 2024-06-07T15:10:10.941+0200 INFO no watch operations for client, skip persisting {"client-id": 6}
logger.go:146: 2024-06-07T15:10:10.941+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-6/operations.json"}
logger.go:146: 2024-06-07T15:10:10.943+0200 INFO no watch operations for client, skip persisting {"client-id": 7}
logger.go:146: 2024-06-07T15:10:10.943+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-7/operations.json"}
logger.go:146: 2024-06-07T15:10:10.944+0200 INFO no watch operations for client, skip persisting {"client-id": 8}
logger.go:146: 2024-06-07T15:10:10.944+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-8/operations.json"}
logger.go:146: 2024-06-07T15:10:10.945+0200 INFO no watch operations for client, skip persisting {"client-id": 9}
logger.go:146: 2024-06-07T15:10:10.945+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-9/operations.json"}
logger.go:146: 2024-06-07T15:10:10.946+0200 INFO no watch operations for client, skip persisting {"client-id": 10}
logger.go:146: 2024-06-07T15:10:10.946+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-10/operations.json"}
logger.go:146: 2024-06-07T15:10:10.947+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/history.html"}
logger.go:146: 2024-06-07T15:10:11.016+0200 INFO killing server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:11.016+0200 INFO stopping server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-07T15:10:11.021+0200 INFO stopped server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
--- FAIL: TestRobustnessExploratory (5.24s)
--- FAIL: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 (5.23s)
from etcd.
Thanks, @jmhbnz, for confirming this on v3.4.0!
I appreciate everyone's insights so far. To clarify my earlier question, it's sometimes valuable to confirm even seemingly obvious details, just to ensure we're all on the same page and can make the most informed decisions going forward.
Now that we know this isn't a recent regression, we can focus on understanding the underlying cause and potential solutions.
from etcd.
One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way. There might be a bug there too.
Still there is a possibility of a bug in recording client responses, it is a little more complicated to record history of watch than KV requests.
etcd/tests/robustness/client/client.go
Lines 268 to 304 in 8a0054f
from etcd.
Added logs to ensure that issue is not one robustness test, got:
They clearly show that revision 125 was not provided to client.
Adding new event response to watch, revs: 124...124
./bin/etcdctl (./bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (3237740): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":123,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":20480,"leader":14578408409545168728,"raftIndex":190,"raftTerm":2,"raftAppliedIndex":189,"dbSizeInUse":16384,"storageVersion":"3.6.0"}}]
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): {"level":"info","ts":"2024-06-10T14:40:21.080206+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-10T14:40:21.080+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-10T14:40:21.080+0200 INFO Waiting for member to exit {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): panic: failpoint panic: {}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): goroutine 140 [running]:
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.actPanic(0x10100409d13?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*term).do(...)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:290
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bdce0?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc000257c90?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 171
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO server exited {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO Member exited as expected {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO starting server... {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
Adding new error response to watch, err: rpc error: code = Unavailable desc = error reading from server: EOF
logger.go:146: 2024-06-10T14:40:21.090+0200 INFO spawning process {"args": ["./bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001", "--snapshot-count=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=100", "--heartbeat-interval=101", "--experimental-compaction-batch-limit=10", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2003573077/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100443+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100561+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100609+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100628+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./bin/etcd","--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","--snapshot-count=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=100","--heartbeat-interval=101","--experimental-compaction-batch-limit=10","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--initial-cluster-token=new","--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001","--initial-cluster-state=new"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100701+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","dir-type":"member"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100719+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100733+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101260+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101669+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"8a0054fec","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":100,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.102192+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0000924a0}"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109701+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109759+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db","took":"7.653391ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110518+0200","caller":"etcdserver/bootstrap.go:412","msg":"recovered v2 store from snapshot","snapshot-index":102,"snapshot-size":"7.2 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110534+0200","caller":"storage/backend.go:108","msg":"Skipping snapshot backend","consistent-index":191,"snapshot-index":102}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110547+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":20480,"backend-size":"20 kB","backend-size-in-use-bytes":12288,"backend-size-in-use":"12 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111143+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":191}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111158+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111260+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111357+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111368+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111377+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111391+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111403+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111447+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111480+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111494+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [ca50e9357181d758], term: 2, commit: 191, applied: 102, lastindex: 192, lastterm: 2]"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.115179+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.118827+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":113}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.119624+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":124}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123574+0200","caller":"mvcc/kvstore.go:413","msg":"resume scheduled compaction","scheduled-compact-revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123661+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141549+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141594+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141643+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":120,"took":"17.934179ms","hash":1204195312,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141667+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141660+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141708+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141769+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141786+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141776+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141804+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141807+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.616829+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617106+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617166+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617184+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617211+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623853+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623883+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623904+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623916+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628269+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628273+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628459+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628566+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628867+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.630495+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-10T14:40:21.633+0200 INFO started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 3237749}
logger.go:146: 2024-06-10T14:40:21.635+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
Adding response watch: header:<cluster_id:238453183653593855 member_id:14578408409545168728 revision:125 raft_term:3 > created:true
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.114627+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":125}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131637+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":125,"took":"16.956782ms","hash":981787830,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131665+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":981787830,"revision":125,"compact-revision":120}
Adding new event response to watch, revs: 126...132
from etcd.
Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?
Excellent question! Unfortunately, there's no simple way for etcd cluster admins to be directly alerted about dropped watch events in a production environment. The robustness tests achieve this by collecting comprehensive data – requests from all clients, WAL entries, etc. – and performing in-depth analysis. This level of data collection and analysis isn't feasible in production due to the sheer volume of information involved.
This highlights the importance of rigorous, continuous robustness testing with diverse traffic types and failure injection. It allows us to thoroughly analyze smaller chunks of data in a controlled environment, identifying issues that might otherwise go unnoticed in the vastness of production data.
On Kubernetes side, I have some plans to improve watch cache observability. Stay tuned!
from etcd.
Always keep the compact revision, and delay compact tombstone revision #18162
Thanks @MadhavJivrajani . We need to add more tests ((or update existing tests)), please see all the proposed test cases as mentioned in #18162 (comment). Note I also added the test cases in the doc (under the option 2).
@fuweid @MadhavJivrajani Please consider a whole plan to resolve this issue. Please feel free to ping on slack if you want any discussion.
from etcd.
cc @MadhavJivrajani @siyuanfoundation @fuweid
Can someone take a look and double check my findings?
from etcd.
/assign
from etcd.
I rerun the report on latest version of #17833 and it found that linearization was broken. Looks like a false positive due to some bug.
from etcd.
Ok, ok I encountered another case.
Failpoint used: raftBeforeLeaderSend=panic()
Scenario: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
Etcd version:
./bin/etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 6f916d428
Go Version: go1.22.4
Go OS/Arch: linux/amd64
Report: 1717749910517706368.tar.gz
Found when adding compaction to operations serathius@5959110
Logs showing broken watch guarantee due to missing log:
=== RUN TestDataReports/1717749910517706368
logger.go:146: 2024-06-07T11:31:24.546+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T11:31:24.950+0200 INFO Linearization success {"duration": "404.080944ms"}
logger.go:146: 2024-06-07T11:31:24.951+0200 INFO Validating watch
logger.go:146: 2024-06-07T11:31:24.960+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 2}
[]model.PersistedEvent{
... // 154 identical elements
{Event: {Type: "put-operation", Key: "key3", Value: {Hash: 3071434148}}, Revision: 139},
{Event: {Type: "put-operation", Key: "key5", Value: {Hash: 2150768673}}, Revision: 140},
- {Event: model.Event{Type: "delete-operation", Key: "key6"}, Revision: 141},
{Event: {Type: "put-operation", Key: "key8", Value: {Value: "121"}}, Revision: 141, ...},
{Event: {Type: "put-operation", Key: "key9", Value: {Value: "122"}}, Revision: 141},
... // 134 identical elements
}
validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
logger.go:146: 2024-06-07T11:31:24.960+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T11:31:24.967+0200 INFO Saving visualization {"path": "./src/go.etcd.io/etcd/tests/robustness/testdata/1717749910517706368/history.html"}
Screenshot showing a transaction for revision 141 and fact that key "key6" was present so delete really happened.
Watch response showing missing delete event
Complete log:
=== RUN TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
(./etcd_--version) (1897882): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897882): Git SHA: 6f916d428
(./etcd_--version) (1897882): Go Version: go1.22.4
(./etcd_--version) (1897882): Go OS/Arch: linux/amd64
(./etcd_--version) (1897894): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897894): Git SHA: 6f916d428
(./etcd_--version) (1897894): Go Version: go1.22.4
(./etcd_--version) (1897894): Go OS/Arch: linux/amd64
logger.go:146: 2024-06-07T10:45:06.314+0200 INFO starting server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:06.314+0200 INFO spawning process {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.3237+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.323787+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.32395+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.323991+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3244+0200","caller":"etcdmain/etcd.go:123","msg":"Initialize and start etcd server","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"empty"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.324433+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324445+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324785+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324963+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":false,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"member-0=http://localhost:20001","initial-cluster-state":"new","initial-cluster-token":"new","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.325529+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003943b0}"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332264+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3323+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"6.84944ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332419+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332446+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345644+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345682+0200","caller":"etcdserver/bootstrap.go:493","msg":"starting local member","local-member-id":"ca50e9357181d758","cluster-id":"34f27e83b3bc2ff"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345701+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345758+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34579+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345798+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345809+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34584+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.360383+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.364937+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372639+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372684+0200","caller":"etcdserver/server.go:585","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-version":"to_be_decided"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372729+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372781+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372793+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372768+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372839+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372854+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.37288+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372974+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3731+0200","caller":"membership/cluster.go:411","msg":"added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548218+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548313+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548384+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548402+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548425+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559268+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559325+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559374+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559391+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.561842+0200","caller":"etcdserver/server.go:2318","msg":"setting up initial cluster version using v3 API","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564116+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564142+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564779+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565461+0200","caller":"membership/cluster.go:594","msg":"set initial cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565615+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565717+0200","caller":"version/monitor.go:116","msg":"cluster version differs from storage version.","cluster-version":"3.6.0","storage-version":"3.5.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565719+0200","caller":"etcdserver/server.go:2338","msg":"cluster version is updated","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56587+0200","caller":"schema/migration.go:65","msg":"updated storage version","new-storage-version":"3.6.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.567317+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:06.570+0200 INFO started server. {"name": "member-0", "pid": 1897904}
logger.go:146: 2024-06-07T10:45:06.577+0200 INFO Start traffic
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.592935+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593112+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":3,"took":"120.206µs","hash":1734659494,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593139+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1734659494,"revision":3,"compact-revision":-1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.694305+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700835+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":37,"took":"6.440584ms","hash":2304164051,"current-db-size-bytes":303104,"current-db-size":"303 kB","current-db-size-in-use-bytes":274432,"current-db-size-in-use":"274 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700886+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2304164051,"revision":37,"compact-revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.742881+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749877+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":58,"took":"6.926064ms","hash":1646957282,"current-db-size-bytes":376832,"current-db-size":"377 kB","current-db-size-in-use-bytes":335872,"current-db-size-in-use":"336 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749911+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1646957282,"revision":58,"compact-revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.802825+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813113+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":77,"took":"10.227838ms","hash":539014635,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":417792,"current-db-size-in-use":"418 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813147+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":539014635,"revision":77,"compact-revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.85343+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857414+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":85,"took":"3.90014ms","hash":1467298815,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":229376,"current-db-size-in-use":"229 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857438+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1467298815,"revision":85,"compact-revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.074676+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079062+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":97,"took":"4.321109ms","hash":4113805127,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079099+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4113805127,"revision":97,"compact-revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.148722+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":99}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153407+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":99,"took":"4.623808ms","hash":1249150654,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":110592,"current-db-size-in-use":"111 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153465+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1249150654,"revision":99,"compact-revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.198642+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":104}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202576+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":104,"took":"3.889339ms","hash":1699712947,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202591+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1699712947,"revision":104,"compact-revision":99}
logger.go:146: 2024-06-07T10:45:07.573+0200 INFO Triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (1897931): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":139,"raft_term":2}}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (1897941): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":140,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":450560,"leader":14578408409545168728,"raftIndex":178,"raftTerm":2,"raftAppliedIndex":177,"dbSizeInUse":294912,"storageVersion":"3.6.0"}}]
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.601459+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":137}
cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
logger.go:146: 2024-06-07T10:45:07.608+0200 INFO Setting up gofailpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
logger.go:146: 2024-06-07T10:45:07.608+0200 INFO Waiting for member to exit {"member": "member-0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610769+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":137,"took":"9.25467ms","hash":583134963,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":368640,"current-db-size-in-use":"369 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610806+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":583134963,"revision":137,"compact-revision":104}
member-0 (1897904): panic: failpoint panic: {}
member-0 (1897904):
member-0 (1897904): goroutine 154 [running]:
member-0 (1897904): go.etcd.io/gofail/runtime.actPanic(0x10100000000?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (1897904): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (1897904): go.etcd.io/gofail/runtime.(*terms).eval(0xc00016fbc0?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (1897904): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc0001f9c90?)
member-0 (1897904): go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
member-0 (1897904): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 168
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO server exited {"name": "member-0", "code": 2}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO Member exited as expected {"member": "member-0"}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO starting server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:07.617+0200 INFO spawning process {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627613+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627706+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627725+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62775+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627828+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"member"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627848+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627856+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628737+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628956+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62947+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000126248}"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.636941+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637029+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"7.640918ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637902+0200","caller":"etcdserver/bootstrap.go:440","msg":"No snapshot found. Recovering WAL from scratch!"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637928+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":520192,"backend-size":"520 kB","backend-size-in-use-bytes":126976,"backend-size-in-use":"127 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639787+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":178}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639855+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640008+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640301+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640316+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640327+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640337+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640351+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640396+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640423+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640438+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 2, commit: 178, applied: 0, lastindex: 179, lastterm: 2]"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.644606+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660157+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660204+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660292+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660307+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660366+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660347+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660385+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660397+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660382+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660464+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.66053+0200","caller":"membership/cluster.go:420","msg":"ignore already added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660578+0200","caller":"membership/cluster.go:586","msg":"updated cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","from":"3.6","to":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944302+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944353+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944416+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.94444+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944485+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.95921+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959254+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959286+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.9593+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964178+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964176+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964348+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.96437+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964837+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
logger.go:146: 2024-06-07T10:45:07.966+0200 INFO started server. {"name": "member-0", "pid": 1897950}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:07.967+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.660836+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668671+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":149,"took":"7.779358ms","hash":3929730471,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":262144,"current-db-size-in-use":"262 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668707+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3929730471,"revision":149,"compact-revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.679649+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":158}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691774+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":158,"took":"12.045594ms","hash":1641924706,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":225280,"current-db-size-in-use":"225 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691827+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1641924706,"revision":158,"compact-revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691851+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":159}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.699572+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":159,"took":"7.690631ms","hash":1830603898,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":147456,"current-db-size-in-use":"148 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.6996+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.69961+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703539+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":161,"took":"3.907723ms","hash":545936533,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":188416,"current-db-size-in-use":"188 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703569+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.726023+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730396+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":168,"took":"4.324136ms","hash":4120988724,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":233472,"current-db-size-in-use":"234 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730417+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4120988724,"revision":168,"compact-revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.797393+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801495+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":213,"took":"4.014924ms","hash":2299796790,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":319488,"current-db-size-in-use":"320 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801512+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2299796790,"revision":213,"compact-revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.822852+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":225}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828806+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":225,"took":"5.900397ms","hash":3531019373,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":65536,"current-db-size-in-use":"66 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828848+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3531019373,"revision":225,"compact-revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.944933+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":240}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948934+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":240,"took":"3.945123ms","hash":2274426623,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":69632,"current-db-size-in-use":"70 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948977+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2274426623,"revision":240,"compact-revision":225}
logger.go:146: 2024-06-07T10:45:09.002+0200 INFO Finished traffic
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting complete traffic {"successes": 678, "failures": 31, "successRate": 0.9562764456981664, "period": "2.425517855s", "qps": 279.52793610748336}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic before failure injection {"successes": 402, "failures": 0, "successRate": 1, "period": "996.95569ms", "qps": 403.22754966171067}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic during failure injection {"successes": 7, "failures": 14, "successRate": 0.3333333333333333, "period": "393.883383ms", "qps": 17.77175758643263}
logger.go:146: 2024-06-07T10:45:10.014+0200 INFO Reporting traffic after failure injection {"successes": 269, "failures": 17, "successRate": 0.9405594405594405, "period": "1.034678782s", "qps": 259.98406914272647}
logger.go:146: 2024-06-07T10:45:10.015+0200 INFO Finished simulating traffic {"max-revision": 248}
logger.go:146: 2024-06-07T10:45:10.027+0200 INFO Validating linearizable operations {"timeout": "5m0s"}
logger.go:146: 2024-06-07T10:45:10.502+0200 INFO Linearization success {"duration": "475.746216ms"}
logger.go:146: 2024-06-07T10:45:10.503+0200 INFO Validating watch
logger.go:146: 2024-06-07T10:45:10.509+0200 ERROR Broke watch guarantee {"guarantee": "reliable", "client": 2, "diff": " []model.PersistedEvent{\n \t... // 154 identical elements\n \t{Event: {Type: \"put-operation\", Key: \"key3\", Value: {Hash: 3071434148}}, Revision: 139},\n \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Hash: 2150768673}}, Revision: 140},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key6\"}, Revision: 141},\n \t{Event: {Type: \"put-operation\", Key: \"key8\", Value: {Value: \"121\"}}, Revision: 141, ...},\n \t{Event: {Type: \"put-operation\", Key: \"key9\", Value: {Value: \"122\"}}, Revision: 141},\n \t... // 134 identical elements\n }\n"}
validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
logger.go:146: 2024-06-07T10:45:10.509+0200 INFO Validating serializable operations
logger.go:146: 2024-06-07T10:45:10.517+0200 INFO Saving robustness test report {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368"}
logger.go:146: 2024-06-07T10:45:10.517+0200 INFO Saving member data dir {"member": "member-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/server-member-0"}
logger.go:146: 2024-06-07T10:45:10.518+0200 INFO no watch operations for client, skip persisting {"client-id": 1}
logger.go:146: 2024-06-07T10:45:10.518+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-1/operations.json"}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-2/watch.json"}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO no KV operations for client, skip persisting {"client-id": 2}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO no watch operations for client, skip persisting {"client-id": 3}
logger.go:146: 2024-06-07T10:45:10.519+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-3/operations.json"}
logger.go:146: 2024-06-07T10:45:10.520+0200 INFO no watch operations for client, skip persisting {"client-id": 4}
logger.go:146: 2024-06-07T10:45:10.520+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-4/operations.json"}
logger.go:146: 2024-06-07T10:45:10.522+0200 INFO no watch operations for client, skip persisting {"client-id": 5}
logger.go:146: 2024-06-07T10:45:10.522+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-5/operations.json"}
logger.go:146: 2024-06-07T10:45:10.523+0200 INFO no watch operations for client, skip persisting {"client-id": 6}
logger.go:146: 2024-06-07T10:45:10.523+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-6/operations.json"}
logger.go:146: 2024-06-07T10:45:10.524+0200 INFO no watch operations for client, skip persisting {"client-id": 7}
logger.go:146: 2024-06-07T10:45:10.524+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-7/operations.json"}
logger.go:146: 2024-06-07T10:45:10.525+0200 INFO no watch operations for client, skip persisting {"client-id": 8}
logger.go:146: 2024-06-07T10:45:10.525+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-8/operations.json"}
logger.go:146: 2024-06-07T10:45:10.526+0200 INFO no watch operations for client, skip persisting {"client-id": 9}
logger.go:146: 2024-06-07T10:45:10.526+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-9/operations.json"}
logger.go:146: 2024-06-07T10:45:10.527+0200 INFO no watch operations for client, skip persisting {"client-id": 10}
logger.go:146: 2024-06-07T10:45:10.527+0200 INFO Saving operation history {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-10/operations.json"}
logger.go:146: 2024-06-07T10:45:10.528+0200 INFO Saving visualization {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/history.html"}
logger.go:146: 2024-06-07T10:45:10.583+0200 INFO killing server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:10.583+0200 INFO stopping server... {"name": "member-0"}
logger.go:146: 2024-06-07T10:45:10.588+0200 INFO stopped server. {"name": "member-0"}
from etcd.
cc @ahrtr @jmhbnz @wenjiaswe @MadhavJivrajani @siyuanfoundation
from etcd.
@fuweid Think this is related to #17780
See the following logs:
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
...
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
logger.go:146: 2024-06-07T10:45:07.967+0200 INFO Finished triggering failpoint {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
Looks like we lost first delete after restoring from restoring from "last compact revision".
Off by one error on tombstone?
from etcd.
Yes. Let me try to reproduce it in my local. Update it later.
from etcd.
Report: 1717749910517706368.tar.gz
For this one, the tombstone of key6
would be deleted by the following compact request compaction:<revision:141 >.
Not sure when the watcher setups watch request. If it setups after compactor deletes that key6
, it won't get the event.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.
2 178 norm header:<ID:15517310785104357973 > compaction:<revision:137 >
2 179 norm header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
3 180 norm
3 181 norm header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
3 182 norm header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
3 183 norm header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
3 184 norm header:<ID:15517310785104689163 > compaction:<revision:141 >
from etcd.
I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.
Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.
Correct. :( Currently, watcher isn't able to send out all the possible deleted events.
from etcd.
Did the client side receive an ErrCompacted? If yes, then it should be expected behaviour.
- Since the real time events are cached, so as long as etcdserver doesn't restart, then the events won't be lost.
- But if the etcdserver restarts for whatever reason after compaction but before sending out the watch response, then the client side may miss some compacted events. But the client will definitely get the latest value against any watched key.
- As long as users don't update a key too frequent, it isn't a big problem
from etcd.
Did the client side receive an ErrCompacted? If yes,
No, it would be recorded in the report.
from etcd.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.
@fuweid, don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.
from etcd.
@fuweid Please feel free to ping me on slack if you want a quick discussion on this issue.
from etcd.
One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way.
It's expected behaviour and by design as long as etcdserver doesn't return any error (e.g ErrCompacted).
from etcd.
Added logs to ensure that issue is not one robustness test, got:
Where did you add the log? Also suggest to provide a clear summary on the exact steps you did.
from etcd.
Code #18145
from etcd.
don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.
Hi @serathius sorry for late reply. In this case 1717749910517706368.tar.gz, both key8
and key9
's latest revision is 141. It won't be compacted. But the revision 141 for key6
is tombstone. It could be deleted before sending delete-operation event to client.(Maybe we should log watch-operations in robustness).
IMO, the robustness replay assumes that the delete-operation event is just sent out after applying to db. Even if the compact request is following that delete request, the robustness replay always assumes that the client should receive that delete-operation event.
However, in real world, delete-operation event depends on compactor which is handled in background and we can't control in replay. That's why I said that watch can't guarantee the client can see all the events in current design.
2 178 norm header:<ID:15517310785104357973 > compaction:<revision:137 >
2 179 norm header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
3 180 norm
3 181 norm header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
3 182 norm header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
3 183 norm header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
3 184 norm header:<ID:15517310785104689163 > compaction:<revision:141 >
Please feel free to ping me on slack if you want a quick discussion on this issue.
@ahrtr will ping you tomorrow. Thanks!
from etcd.
As mentioned above, it can only be reproduced when
the compact revision happens to be a delete operation.
and the delete operation is the last operation for the key. If there is any operations following the delete operation on the key (i.e. adding the key back), then the issue can't be reproduced.
This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.
Also usually compaction operation isn't that frequent in production environment.
It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.
I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window. This should be pretty rare for watch to be 5 minutes behind, however recently I have seen case where lock contention on watch cache caused it to be behind even 2 hours.
from etcd.
@ahrtr confirmed the bug being present on both release-3.4 and release-3.5 branches
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
It's important to know if this is a regression in some newer version, or it was always there. If this is a new bug we might need to consider an announcement.
from etcd.
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Obviously this isn't a regression. It's an old issue. Also updated my above comment.
This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.
Pls see updated section "Impact" above. added and the delete operation is the last operation for the key
It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.
I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window.
- 5 minutes isn't frequent as compare to watch delay.
- If you read the reproducing steps in my above comment, there is also very low possibility to see this issue on first watch response.
from etcd.
Just had a quick talk with @serathius , we need to clearly define the behaviour of compaction, and also the impact on watch, Please anyone feel free to comment in https://docs.google.com/document/d/1APJE38DxENsRLLVapRz1CQ6UD4-uYaFutO12Y01VcNw/edit
Eventually we need to document the behaviour in etcd's official document
from etcd.
Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?
Following the clear manual
etcdctl
reproduce steps provided by @ahrtr I reproduced the issue onv3.4.0
so it looks like this has been present for at least ~5 years.
+1 on successfully reproducing the issue
from etcd.
Actions
- Create a simple e2e test to reproduce this issue. Refer to the section "How to reproduce this issue" in #18089 (comment);
- Update the implementation to ensure it aligns with doc
- Please feel free to comment on the doc
- Update etcd's official document to clarify the compaction behavior
from etcd.
from etcd.
The code to handle compacted version in watch was written 8 years ago, and it never handled the w.minRev == compactRev
case
etcd/server/storage/mvcc/watcher_group.go
Lines 253 to 262 in 5790774
from etcd.
Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?
In kubernetes/kubernetes#123072, apiserver_watch_cache_events_received_total{resource="pods"}
is useful to detect watch event loss as the diff between two watch client (apiserver) diverges.
However, this issue seems difficult to catch by alarms because just one delete event is missed. Diff by 1 could be caused by different times that client restarted.
Trying to monitor in production and measure how many kubernetes clusters are impacted. Any insights are much appreciated !!
from etcd.
Tried replicating #18089 (comment) in an e2e here: #18201
from etcd.
Another instance: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-etcd-robustness-main-amd64/1803129487966081024
(among others in https://testgrid.k8s.io/sig-etcd-robustness#ci-etcd-robustness-main-amd64)
from etcd.
Related Issues (20)
- HashKV should compute the hash of all MVCC keys up to a given revision HOT 6
- Completion command is not included in releases of etcdctl HOT 1
- Etcd compilation process error HOT 4
- Etcd send a corrupt snapshot or missing hash snapshot to a snapshot api call which causes the restoration to fail. HOT 7
- option go_package in proto files HOT 4
- Client v3 and GRPC versions conflict HOT 3
- Member remains learner after promotion HOT 1
- Improvement of OpenSSF Scorecard Score HOT 4
- Enable `unexported-return` linter rule HOT 7
- High memory usage (memory leak?) with 1 node cluster and small DB (but a lot of updates) HOT 9
- The second node took about 7 seconds to join in a single node cluster HOT 9
- Add hashKV checking to robustness test HOT 8
- Introduce a Go workspace HOT 2
- Need to update golang pkg.dev document HOT 6
- support compile `tools` in makefile HOT 2
- Detected disallowed custom content in v2store
- Confusing documentation for `WatchResponse.canceled`
- etcd cluster crash
- Bump Go to 1.21.13/1.22.6 HOT 10
- etcd Go & Java client SDK's retry mechanism may break `Serializable` HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from etcd.