Giter VIP home page Giter VIP logo

Comments (37)

ahrtr avatar ahrtr commented on September 27, 2024 4

TL; DR

Delete events may be missing in watch response. This issue can only be reproduced when the compact revision happens to be a delete operation.

How to reproduce this issue?

  • Start an one member etcd cluster in one terminal
  • Execute commands below in another terminal
$ etcdctl  put k1 v2 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":2,"raft_term":2}}
$ etcdctl  put k1 v3 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":3,"raft_term":2}}
$ etcdctl  put k1 v4 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":4,"raft_term":2}}
$ etcdctl del k1 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":5,"raft_term":2},"deleted":1}
$ etcdctl  put k1 v6 -w json
{"header":{"cluster_id":14841639068965178418,"member_id":10276657743932975437,"revision":6,"raft_term":2}}
  • Compact revision 5
$ etcdctl compact 5
compacted revision 5
  • Watch starting from rev 5
$ etcdctl watch k1 --rev 5
PUT
k1
v6

Obviously the deletion operation with revision 5 is missing.

Alternatively, If you want to try the robustness test, then run command below on #18145,

$ go test -run TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 -v --count 100 --timeout 1h -failfast

Expected behaviour

As mentioned in #18089 (comment), some revisions may be compacted, and it may impact the watch clients. But it shouldn't never drop any events quietly.

Expected behaviour:

  • Either the client side receives all events: no any events are dropped
  • or the client side gets an ErrCompacted response.

Root cause

Usually etcd keeps the compact revision (see code below). For example, when you compact revision 5, actually etcd only compacts revisions <=4.

available[rev] = struct{}{}

But if the compact revision is a delete operation, then etcd doesn't keep it (in other words, etcd will compact/remove it) (see code below),

// remove any tombstone
if revIndex == len(g.revs)-1 && genIdx != len(ki.generations)-1 {
delete(available, g.revs[revIndex])
}

But the problem is that etcdserver doesn't return ErrCompacted if the startWatchRevision happens to be the last compact revision. Because the if condition is if w.minRev < compactRev instead of if w.minRev <= compactRev.

if w.minRev < compactRev {

Note this isn't a regression. It's an old issue.

Potential solution

It's open to discussion.

When executing compaction operation, we should only keep the latest revision for each key. For example, if the latest revision for "k1" is 10, and user compacts 10, then we should keep 10; otherwise if user compacts any revision <=9, we just compact it, there is no point to keep it. Note that different keys may have different latest revision.

The available is the revisions to keep,

func (ki *keyIndex) compact(lg *zap.Logger, atRev int64, available map[Revision]struct{}) {

We also need to change the if condition to if w.minRev <= compactRev,

if w.minRev < compactRev {

So the watch client will get an ErrCompacted response.

Impact

All versions (including 3.4, 3.5 and main) are impacted.

Major, but it might not be easy to reproduce.

  • As mentioned above, it can only be reproduced when the compact revision happens to be a delete operation.
  • Also usually compaction operation isn't that frequent in production environment. Normally the watch delay won't be greater than compaction interval.

Immediate action

Create a simple e2e test to reliably reproduce this issue. Please refer to the section "How to reproduce this issue?" above.

Brainstorm the solution.

from etcd.

jmhbnz avatar jmhbnz commented on September 27, 2024 4

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Following the clear manual etcdctl reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0 so it looks like this has been present for at least ~5 years.

from etcd.

serathius avatar serathius commented on September 27, 2024 2

I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.

Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.

In last repro:

    logger.go:146: 2024-06-07T14:46:59.319+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 1}
  []model.PersistedEvent{
        ... // 160 identical elements
        {Event: {Type: "delete-operation", Key: "key6"}, Revision: 140},
        {Event: {Type: "put-operation", Key: "key9", Value: {Value: "138"}}, Revision: 141, ...},
-       {Event: model.Event{Type: "delete-operation", Key: "key7"}, Revision: 142},
        {Event: {Type: "put-operation", Key: "key3", Value: {Value: "147"}}, Revision: 143},
        {Event: {Type: "put-operation", Key: "key3", Value: {Value: "148"}}, Revision: 144},
        ... // 79 identical elements
  }

etcd bootstrap log:

{"level":"info","ts":"2024-06-07T14:46:56.832687+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":141}

from etcd.

serathius avatar serathius commented on September 27, 2024 2

Something similar reproduced on v3.4, however this time it was long after crash, and it flagged broken resumable properly. Which just means it provided incorrect event on a newly opened watch which provided revision.

=== RUN   TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
    logger.go:146: 2024-06-07T15:10:05.793+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:05.793+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063001): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063001):     A future release of golang/protobuf will delete this package,
member-0 (2063001):     which has long been excluded from the compatibility promise.
member-0 (2063001): 
member-0 (2063001): 2024-06-07 15:10:05.801650 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063001): 2024-06-07 15:10:05.801710 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.801731 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.801737 I | etcdmain: etcd Version: 3.4.32
member-0 (2063001): 2024-06-07 15:10:05.801741 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063001): 2024-06-07 15:10:05.801745 I | etcdmain: Go Version: go1.22.0
member-0 (2063001): 2024-06-07 15:10:05.801750 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063001): 2024-06-07 15:10:05.801755 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063001): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063001): 2024-06-07 15:10:05.802103 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063001): 2024-06-07 15:10:05.802482 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063001): 2024-06-07 15:10:05.802489 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063001): 2024-06-07 15:10:05.802494 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063001): 2024-06-07 15:10:05.802500 I | embed: heartbeat = 100ms
member-0 (2063001): 2024-06-07 15:10:05.802505 I | embed: election = 1000ms
member-0 (2063001): 2024-06-07 15:10:05.802510 I | embed: snapshot count = 1000
member-0 (2063001): 2024-06-07 15:10:05.802521 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063001): 2024-06-07 15:10:05.823121 I | etcdserver: starting member ca50e9357181d758 in cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 0
member-0 (2063001): raft2024/06/07 15:10:05 INFO: newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 became follower at term 1
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.837990 W | auth: simple token is not cryptographically signed
member-0 (2063001): 2024-06-07 15:10:05.849603 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063001): 2024-06-07 15:10:05.849682 I | etcdserver: ca50e9357181d758 as single-node; fast-forwarding 9 ticks (election ticks 10)
member-0 (2063001): 2024-06-07 15:10:05.849979 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850015 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.850029 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063001): 2024-06-07 15:10:05.851460 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063001): raft2024/06/07 15:10:05 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063001): 2024-06-07 15:10:05.852075 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 is starting a new election at term 1
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became candidate at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: ca50e9357181d758 became leader at term 2
member-0 (2063001): raft2024/06/07 15:10:06 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2
member-0 (2063001): 2024-06-07 15:10:06.425669 I | etcdserver: setting up the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.427632 I | embed: ready to serve client requests
member-0 (2063001): 2024-06-07 15:10:06.427648 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063001): 2024-06-07 15:10:06.428428 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
member-0 (2063001): 2024-06-07 15:10:06.429334 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063001): 2024-06-07 15:10:06.429387 I | etcdserver/api: enabled capabilities for version 3.4
    logger.go:146: 2024-06-07T15:10:06.436+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063001}
    logger.go:146: 2024-06-07T15:10:06.443+0200 INFO    Start traffic
member-0 (2063001): 2024-06-07 15:10:06.463482 I | mvcc: store.index: compact 10
member-0 (2063001): 2024-06-07 15:10:06.467626 I | mvcc: finished scheduled compaction at 10 (took 4.101548ms)
member-0 (2063001): 2024-06-07 15:10:06.485280 I | mvcc: store.index: compact 21
member-0 (2063001): 2024-06-07 15:10:06.491005 I | mvcc: finished scheduled compaction at 21 (took 5.681245ms)
member-0 (2063001): 2024-06-07 15:10:06.499354 I | mvcc: store.index: compact 30
member-0 (2063001): 2024-06-07 15:10:06.503435 I | mvcc: finished scheduled compaction at 30 (took 4.021316ms)
member-0 (2063001): 2024-06-07 15:10:06.516624 I | mvcc: store.index: compact 36
member-0 (2063001): 2024-06-07 15:10:06.522273 I | mvcc: finished scheduled compaction at 36 (took 5.604531ms)
member-0 (2063001): 2024-06-07 15:10:06.531697 I | mvcc: store.index: compact 43
member-0 (2063001): 2024-06-07 15:10:06.544126 I | mvcc: finished scheduled compaction at 43 (took 12.389592ms)
member-0 (2063001): 2024-06-07 15:10:06.544147 I | mvcc: store.index: compact 45
member-0 (2063001): 2024-06-07 15:10:06.549828 I | mvcc: finished scheduled compaction at 45 (took 5.663001ms)
member-0 (2063001): 2024-06-07 15:10:06.565709 I | mvcc: store.index: compact 52
member-0 (2063001): 2024-06-07 15:10:06.569921 I | mvcc: finished scheduled compaction at 52 (took 4.110444ms)
member-0 (2063001): 2024-06-07 15:10:06.576837 I | mvcc: store.index: compact 58
member-0 (2063001): 2024-06-07 15:10:06.580559 I | mvcc: finished scheduled compaction at 58 (took 3.700063ms)
member-0 (2063001): 2024-06-07 15:10:06.593001 I | mvcc: store.index: compact 63
member-0 (2063001): 2024-06-07 15:10:06.598580 I | mvcc: finished scheduled compaction at 63 (took 5.512127ms)
member-0 (2063001): 2024-06-07 15:10:06.610192 I | mvcc: store.index: compact 74
member-0 (2063001): 2024-06-07 15:10:06.610655 W | etcdserver: failed to apply request "compaction:<revision:74 > header:<ID:15517310789174624164 > " with response "" took (2.796µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063001): 2024-06-07 15:10:06.614154 I | mvcc: finished scheduled compaction at 74 (took 3.943601ms)
member-0 (2063001): 2024-06-07 15:10:06.628541 I | mvcc: store.index: compact 76
member-0 (2063001): 2024-06-07 15:10:06.632295 I | mvcc: finished scheduled compaction at 76 (took 3.736361ms)
member-0 (2063001): 2024-06-07 15:10:06.638518 I | mvcc: store.index: compact 79
member-0 (2063001): 2024-06-07 15:10:06.642166 I | mvcc: finished scheduled compaction at 79 (took 3.625864ms)
member-0 (2063001): 2024-06-07 15:10:06.695912 I | mvcc: store.index: compact 81
member-0 (2063001): 2024-06-07 15:10:06.699766 I | mvcc: finished scheduled compaction at 81 (took 3.833413ms)
member-0 (2063001): 2024-06-07 15:10:06.811286 I | mvcc: store.index: compact 88
member-0 (2063001): 2024-06-07 15:10:06.815069 I | mvcc: finished scheduled compaction at 88 (took 3.751339ms)
member-0 (2063001): 2024-06-07 15:10:06.926055 I | mvcc: store.index: compact 98
member-0 (2063001): 2024-06-07 15:10:06.929943 I | mvcc: finished scheduled compaction at 98 (took 3.865914ms)
member-0 (2063001): 2024-06-07 15:10:07.006149 I | mvcc: store.index: compact 104
member-0 (2063001): 2024-06-07 15:10:07.009806 I | mvcc: finished scheduled compaction at 104 (took 3.640982ms)
member-0 (2063001): 2024-06-07 15:10:07.047942 I | mvcc: store.index: compact 105
member-0 (2063001): 2024-06-07 15:10:07.051740 I | mvcc: finished scheduled compaction at 105 (took 3.764455ms)
member-0 (2063001): 2024-06-07 15:10:07.161575 I | mvcc: store.index: compact 113
member-0 (2063001): 2024-06-07 15:10:07.165496 I | mvcc: finished scheduled compaction at 113 (took 3.891012ms)
member-0 (2063001): 2024-06-07 15:10:07.169287 I | mvcc: store.index: compact 114
member-0 (2063001): 2024-06-07 15:10:07.173334 I | mvcc: finished scheduled compaction at 114 (took 4.008702ms)
member-0 (2063001): 2024-06-07 15:10:07.177192 I | mvcc: store.index: compact 115
member-0 (2063001): 2024-06-07 15:10:07.177233 I | mvcc: finished scheduled compaction at 115 (took 15.309µs)
member-0 (2063001): 2024-06-07 15:10:07.336646 I | mvcc: store.index: compact 125
member-0 (2063001): 2024-06-07 15:10:07.340373 I | mvcc: finished scheduled compaction at 125 (took 3.708237ms)
member-0 (2063001): 2024-06-07 15:10:07.396406 I | mvcc: store.index: compact 129
member-0 (2063001): 2024-06-07 15:10:07.400353 I | mvcc: finished scheduled compaction at 129 (took 3.92806ms)
member-0 (2063001): 2024-06-07 15:10:07.421834 I | mvcc: store.index: compact 131
member-0 (2063001): 2024-06-07 15:10:07.426046 I | mvcc: finished scheduled compaction at 131 (took 4.182771ms)
member-0 (2063001): 2024-06-07 15:10:07.432261 W | etcdserver: failed to apply request "compaction:<revision:131 > header:<ID:15517310789174624338 > " with response "" took (9.297µs) to execute, err is mvcc: required revision has been compacted
    logger.go:146: 2024-06-07T15:10:07.439+0200 INFO    Triggering failpoint    {"failpoint": "raftBeforeLeaderSend=panic()"}
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (2063029): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":136,"raft_term":2}}
member-0 (2063001): 2024-06-07 15:10:07.470978 I | mvcc: store.index: compact 135
/tmp/etcd-release-3.4-failpoints/bin/etcdctl (/tmp/etcd-release-3.4-failpoints/bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (2063037): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":137,"raft_term":2},"version":"3.4.32","dbSize":24576,"leader":14578408409545168728,"raftIndex":171,"raftTerm":2,"raftAppliedIndex":171,"dbSizeInUse":16384}}]
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-07T15:10:07.472+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-07T15:10:07.472+0200 INFO    Waiting for member to exit      {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
member-0 (2063001): 2024-06-07 15:10:07.475266 I | mvcc: finished scheduled compaction at 135 (took 4.25666ms)
member-0 (2063001): panic: failpoint panic: {}
member-0 (2063001): 
member-0 (2063001): goroutine 120 [running]:
member-0 (2063001): go.etcd.io/gofail/runtime.actPanic(0x0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (2063001): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (2063001):     ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (2063001): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bcde0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (2063001): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc00051bba0?)
member-0 (2063001):     ./pkg/mod/go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (2063001): go.etcd.io/etcd/etcdserver.(*raftNode).start.func1()
member-0 (2063001):     /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:242 +0xae5
member-0 (2063001): created by go.etcd.io/etcd/etcdserver.(*raftNode).start in goroutine 109
member-0 (2063001):     /tmp/etcd-release-3.4-failpoints/etcdserver/raft.go:165 +0x6f
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    server exited   {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    Member exited as expected       {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:07.479+0200 INFO    spawning process        {"args": ["/tmp/etcd-release-3.4-failpoints/bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001", "--snapshot-count=1000", "--snapshot-count=1000", "--initial-cluster-token=new", "--experimental-compaction-batch-limit=100", "--experimental-watch-progress-notify-interval=100ms", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory1937017586/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (2063048): WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
member-0 (2063048):     A future release of golang/protobuf will delete this package,
member-0 (2063048):     which has long been excluded from the compatibility promise.
member-0 (2063048): 
member-0 (2063048): 2024-06-07 15:10:07.488960 W | pkg/flags: unrecognized environment variable ETCD_VERIFY=all
member-0 (2063048): 2024-06-07 15:10:07.489016 W | pkg/flags: unrecognized environment variable ETCD_UNSUPPORTED_ARCH=amd64
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489034 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489040 I | etcdmain: etcd Version: 3.4.32
member-0 (2063048): 2024-06-07 15:10:07.489044 I | etcdmain: Git SHA: 88ff600-FAILPOINTS
member-0 (2063048): 2024-06-07 15:10:07.489048 I | etcdmain: Go Version: go1.22.0
member-0 (2063048): 2024-06-07 15:10:07.489052 I | etcdmain: Go OS/Arch: linux/amd64
member-0 (2063048): 2024-06-07 15:10:07.489056 I | etcdmain: setting maximum number of CPUs to 24, total number of available CPUs is 24
member-0 (2063048): 2024-06-07 15:10:07.489120 N | etcdmain: the server is already initialized as member before, starting as etcd member...
member-0 (2063048): [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
member-0 (2063048): 2024-06-07 15:10:07.489130 W | embed: Running http and grpc server on single port. This is not recommended for production.
member-0 (2063048): 2024-06-07 15:10:07.489546 I | embed: name = TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0
member-0 (2063048): 2024-06-07 15:10:07.489553 I | embed: data dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001
member-0 (2063048): 2024-06-07 15:10:07.489558 I | embed: member dir = /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member
member-0 (2063048): 2024-06-07 15:10:07.489562 I | embed: heartbeat = 100ms
member-0 (2063048): 2024-06-07 15:10:07.489566 I | embed: election = 1000ms
member-0 (2063048): 2024-06-07 15:10:07.489571 I | embed: snapshot count = 1000
member-0 (2063048): 2024-06-07 15:10:07.489581 I | embed: advertise client URLs = http://localhost:20000
member-0 (2063048): 2024-06-07 15:10:07.489586 I | embed: initial advertise peer URLs = http://localhost:20001
member-0 (2063048): 2024-06-07 15:10:07.489591 I | embed: initial cluster = 
member-0 (2063048): 2024-06-07 15:10:07.498379 I | etcdserver: restarting member ca50e9357181d758 in cluster 34f27e83b3bc2ff at commit index 172
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=()
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 became follower at term 2
member-0 (2063048): raft2024/06/07 15:10:07 INFO: newRaft ca50e9357181d758 [peers: [], term: 2, commit: 172, applied: 0, lastindex: 172, lastterm: 2]
member-0 (2063048): 2024-06-07 15:10:07.501601 W | auth: simple token is not cryptographically signed
member-0 (2063048): 2024-06-07 15:10:07.505904 I | mvcc: restore compact to 135
member-0 (2063048): 2024-06-07 15:10:07.508923 I | etcdserver: starting server... [version: 3.4.32, cluster version: to_be_decided]
member-0 (2063048): 2024-06-07 15:10:07.509090 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap.db, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509121 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/snap, suffix: snap, max: 5, interval: 30s
member-0 (2063048): 2024-06-07 15:10:07.509492 I | pkg/fileutil: started to purge file, dir: /tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11009933374/001/member/wal, suffix: wal, max: 5, interval: 30s
member-0 (2063048): raft2024/06/07 15:10:07 INFO: ca50e9357181d758 switched to configuration voters=(14578408409545168728)
member-0 (2063048): 2024-06-07 15:10:07.509806 I | etcdserver/membership: added member ca50e9357181d758 [http://localhost:20001] to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:07.509871 N | etcdserver/membership: set the initial cluster version to 3.4
member-0 (2063048): 2024-06-07 15:10:07.509893 I | etcdserver/api: enabled capabilities for version 3.4
member-0 (2063048): 2024-06-07 15:10:07.510650 W | etcdserver: failed to apply request "compaction:<revision:135 > header:<ID:15517310789174624347 > " with response "" took (8.226µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:07.511244 I | embed: listening for peers on 127.0.0.1:20001
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 is starting a new election at term 2
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became candidate at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: ca50e9357181d758 became leader at term 3
member-0 (2063048): raft2024/06/07 15:10:08 INFO: raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3
member-0 (2063048): 2024-06-07 15:10:08.716725 I | embed: ready to serve client requests
member-0 (2063048): 2024-06-07 15:10:08.716809 I | etcdserver: published {Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]} to cluster 34f27e83b3bc2ff
member-0 (2063048): 2024-06-07 15:10:08.718833 N | embed: serving insecure client requests on 127.0.0.1:20000, this is strongly discouraged!
    logger.go:146: 2024-06-07T15:10:08.723+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 2063048}
    logger.go:146: 2024-06-07T15:10:08.724+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (2063048): 2024-06-07 15:10:08.734129 I | mvcc: store.index: compact 138
member-0 (2063048): 2024-06-07 15:10:08.734171 W | etcdserver: failed to apply request "compaction:<revision:138 > header:<ID:15517310789175053071 > " with response "" took (1.643µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.737837 I | mvcc: finished scheduled compaction at 138 (took 3.684213ms)
member-0 (2063048): 2024-06-07 15:10:08.749556 I | mvcc: store.index: compact 145
member-0 (2063048): 2024-06-07 15:10:08.755487 I | mvcc: finished scheduled compaction at 145 (took 5.872415ms)
member-0 (2063048): 2024-06-07 15:10:08.785454 I | mvcc: store.index: compact 180
member-0 (2063048): 2024-06-07 15:10:08.793336 I | mvcc: finished scheduled compaction at 180 (took 7.56219ms)
member-0 (2063048): 2024-06-07 15:10:08.801507 I | mvcc: store.index: compact 183
member-0 (2063048): 2024-06-07 15:10:08.807138 I | mvcc: finished scheduled compaction at 183 (took 5.584654ms)
member-0 (2063048): 2024-06-07 15:10:08.811544 I | mvcc: store.index: compact 188
member-0 (2063048): 2024-06-07 15:10:08.817315 I | mvcc: finished scheduled compaction at 188 (took 5.727472ms)
member-0 (2063048): 2024-06-07 15:10:08.823976 I | mvcc: store.index: compact 195
member-0 (2063048): 2024-06-07 15:10:08.823985 W | etcdserver: failed to apply request "compaction:<revision:195 > header:<ID:15517310789175053189 > " with response "" took (2.235µs) to execute, err is mvcc: required revision has been compacted
member-0 (2063048): 2024-06-07 15:10:08.827999 I | mvcc: finished scheduled compaction at 195 (took 3.994016ms)
member-0 (2063048): 2024-06-07 15:10:08.907484 I | mvcc: store.index: compact 219
member-0 (2063048): 2024-06-07 15:10:08.911307 I | mvcc: finished scheduled compaction at 219 (took 3.790332ms)
member-0 (2063048): 2024-06-07 15:10:08.938127 I | mvcc: store.index: compact 220
member-0 (2063048): 2024-06-07 15:10:08.941826 I | mvcc: finished scheduled compaction at 220 (took 3.665007ms)
member-0 (2063048): 2024-06-07 15:10:09.027783 I | mvcc: store.index: compact 230
member-0 (2063048): 2024-06-07 15:10:09.031653 I | mvcc: finished scheduled compaction at 230 (took 3.827542ms)
member-0 (2063048): 2024-06-07 15:10:09.157620 I | mvcc: store.index: compact 239
member-0 (2063048): 2024-06-07 15:10:09.161540 I | mvcc: finished scheduled compaction at 239 (took 3.871184ms)
member-0 (2063048): 2024-06-07 15:10:09.232656 I | mvcc: store.index: compact 246
member-0 (2063048): 2024-06-07 15:10:09.236458 I | mvcc: finished scheduled compaction at 246 (took 3.767421ms)
member-0 (2063048): 2024-06-07 15:10:09.413328 I | mvcc: store.index: compact 256
member-0 (2063048): 2024-06-07 15:10:09.417222 I | mvcc: finished scheduled compaction at 256 (took 3.859343ms)
member-0 (2063048): 2024-06-07 15:10:09.438400 I | mvcc: store.index: compact 257
member-0 (2063048): 2024-06-07 15:10:09.442288 I | mvcc: finished scheduled compaction at 257 (took 3.864001ms)
member-0 (2063048): 2024-06-07 15:10:09.508357 I | mvcc: store.index: compact 264
member-0 (2063048): 2024-06-07 15:10:09.513601 I | mvcc: finished scheduled compaction at 264 (took 5.151462ms)
member-0 (2063048): 2024-06-07 15:10:09.657334 I | mvcc: store.index: compact 277
member-0 (2063048): 2024-06-07 15:10:09.661195 I | mvcc: finished scheduled compaction at 277 (took 3.822834ms)
member-0 (2063048): 2024-06-07 15:10:09.678618 I | mvcc: store.index: compact 280
member-0 (2063048): 2024-06-07 15:10:09.682446 I | mvcc: finished scheduled compaction at 280 (took 3.801793ms)
member-0 (2063048): 2024-06-07 15:10:09.692338 I | mvcc: store.index: compact 282
member-0 (2063048): 2024-06-07 15:10:09.696220 I | mvcc: finished scheduled compaction at 282 (took 3.859612ms)
    logger.go:146: 2024-06-07T15:10:09.762+0200 INFO    Finished traffic
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting complete traffic      {"successes": 824, "failures": 44, "successRate": 0.9493087557603687, "period": "3.319285557s", "qps": 248.24619209464416}
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting traffic before failure injection      {"successes": 408, "failures": 0, "successRate": 1, "period": "996.375923ms", "qps": 409.48400155189216}
    logger.go:146: 2024-06-07T15:10:10.773+0200 INFO    Reporting traffic during failure injection      {"successes": 25, "failures": 43, "successRate": 0.36764705882352944, "period": "1.284883155s", "qps": 19.45702214455446}
    logger.go:146: 2024-06-07T15:10:10.774+0200 INFO    Reporting traffic after failure injection       {"successes": 391, "failures": 1, "successRate": 0.9974489795918368, "period": "1.038026479s", "qps": 376.6763256142332}
    logger.go:146: 2024-06-07T15:10:10.774+0200 INFO    Finished simulating traffic     {"max-revision": 288}
    logger.go:146: 2024-06-07T15:10:10.780+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T15:10:10.925+0200 INFO    Linearization success   {"duration": "145.013099ms"}
    logger.go:146: 2024-06-07T15:10:10.926+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T15:10:10.926+0200 ERROR   Broke watch guarantee   {"guarantee": "resumable", "client": 2, "request": {"Key":"","Revision":220,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":false}, "got-event": {"Type":"put-operation","Key":"key6","Value":{"Value":"235","Hash":0},"Revision":221,"IsCreate":false,"PrevValue":null}, "want-event": {"Type":"delete-operation","Key":"key7","Value":{"Value":"","Hash":0},"Revision":220,"IsCreate":false}}
    validate.go:49: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
    logger.go:146: 2024-06-07T15:10:10.926+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979"}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving member data dir  {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/server-TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    no watch operations for client, skip persisting {"client-id": 1}
    logger.go:146: 2024-06-07T15:10:10.936+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-1/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.937+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-2/watch.json"}
    logger.go:146: 2024-06-07T15:10:10.937+0200 INFO    no KV operations for client, skip persisting    {"client-id": 2}
    logger.go:146: 2024-06-07T15:10:10.938+0200 INFO    no watch operations for client, skip persisting {"client-id": 3}
    logger.go:146: 2024-06-07T15:10:10.938+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-3/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.939+0200 INFO    no watch operations for client, skip persisting {"client-id": 4}
    logger.go:146: 2024-06-07T15:10:10.939+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-4/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.940+0200 INFO    no watch operations for client, skip persisting {"client-id": 5}
    logger.go:146: 2024-06-07T15:10:10.940+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-5/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.941+0200 INFO    no watch operations for client, skip persisting {"client-id": 6}
    logger.go:146: 2024-06-07T15:10:10.941+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-6/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.943+0200 INFO    no watch operations for client, skip persisting {"client-id": 7}
    logger.go:146: 2024-06-07T15:10:10.943+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-7/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.944+0200 INFO    no watch operations for client, skip persisting {"client-id": 8}
    logger.go:146: 2024-06-07T15:10:10.944+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-8/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.945+0200 INFO    no watch operations for client, skip persisting {"client-id": 9}
    logger.go:146: 2024-06-07T15:10:10.945+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-9/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.946+0200 INFO    no watch operations for client, skip persisting {"client-id": 10}
    logger.go:146: 2024-06-07T15:10:10.946+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/client-10/operations.json"}
    logger.go:146: 2024-06-07T15:10:10.947+0200 INFO    Saving visualization    {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717765810935903979/history.html"}
    logger.go:146: 2024-06-07T15:10:11.016+0200 INFO    killing server...       {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:11.016+0200 INFO    stopping server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-07T15:10:11.021+0200 INFO    stopped server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
--- FAIL: TestRobustnessExploratory (5.24s)
    --- FAIL: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1 (5.23s)

from etcd.

serathius avatar serathius commented on September 27, 2024 2

Thanks, @jmhbnz, for confirming this on v3.4.0!

I appreciate everyone's insights so far. To clarify my earlier question, it's sometimes valuable to confirm even seemingly obvious details, just to ensure we're all on the same page and can make the most informed decisions going forward.

Now that we know this isn't a recent regression, we can focus on understanding the underlying cause and potential solutions.

from etcd.

serathius avatar serathius commented on September 27, 2024 1

One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way. There might be a bug there too.

Still there is a possibility of a bug in recording client responses, it is a little more complicated to record history of watch than KV requests.

func (c *RecordingClient) watch(ctx context.Context, request model.WatchRequest) clientv3.WatchChan {
ops := []clientv3.OpOption{}
if request.WithPrefix {
ops = append(ops, clientv3.WithPrefix())
}
if request.Revision != 0 {
ops = append(ops, clientv3.WithRev(request.Revision))
}
if request.WithProgressNotify {
ops = append(ops, clientv3.WithProgressNotify())
}
if request.WithPrevKV {
ops = append(ops, clientv3.WithPrevKV())
}
respCh := make(chan clientv3.WatchResponse)
c.watchMux.Lock()
c.watchOperations = append(c.watchOperations, model.WatchOperation{
Request: request,
Responses: []model.WatchResponse{},
})
index := len(c.watchOperations) - 1
c.watchMux.Unlock()
go func() {
defer close(respCh)
for r := range c.client.Watch(ctx, request.Key, ops...) {
c.watchOperations[index].Responses = append(c.watchOperations[index].Responses, ToWatchResponse(r, c.baseTime))
select {
case respCh <- r:
case <-ctx.Done():
return
}
}
}()
return respCh
}

from etcd.

serathius avatar serathius commented on September 27, 2024 1

Added logs to ensure that issue is not one robustness test, got:

They clearly show that revision 125 was not provided to client.

Adding new event response to watch, revs: 124...124 
./bin/etcdctl (./bin/etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (3237740): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":123,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":20480,"leader":14578408409545168728,"raftIndex":190,"raftTerm":2,"raftAppliedIndex":189,"dbSizeInUse":16384,"storageVersion":"3.6.0"}}]
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): {"level":"info","ts":"2024-06-10T14:40:21.080206+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-10T14:40:21.080+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-10T14:40:21.080+0200 INFO    Waiting for member to exit      {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): panic: failpoint panic: {}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): 
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): goroutine 140 [running]:
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.actPanic(0x10100409d13?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*term).do(...)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/[email protected]/runtime/terms.go:290
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*terms).eval(0xc0000bdce0?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc000257c90?)
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 171
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237701):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    server exited   {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "code": 2}
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    Member exited as expected       {"member": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    starting server...      {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0"}
Adding new error response to watch, err: rpc error: code = Unavailable desc = error reading from server: EOF 
    logger.go:146: 2024-06-10T14:40:21.090+0200 INFO    spawning process        {"args": ["./bin/etcd", "--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001", "--snapshot-count=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=100", "--heartbeat-interval=101", "--experimental-compaction-batch-limit=10", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--initial-cluster-token=new", "--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2003573077/001", "name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100443+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100561+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100609+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100628+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./bin/etcd","--name=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","--snapshot-count=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=100","--heartbeat-interval=101","--experimental-compaction-batch-limit=10","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--initial-cluster-token=new","--initial-cluster=TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0=http://localhost:20001","--initial-cluster-state=new"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100701+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","dir-type":"member"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.100719+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.100733+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101260+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.101669+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"8a0054fec","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":100,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.102192+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0000924a0}"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109701+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db) successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.109759+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap/db","took":"7.653391ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110518+0200","caller":"etcdserver/bootstrap.go:412","msg":"recovered v2 store from snapshot","snapshot-index":102,"snapshot-size":"7.2 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110534+0200","caller":"storage/backend.go:108","msg":"Skipping snapshot backend","consistent-index":191,"snapshot-index":102}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.110547+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":20480,"backend-size":"20 kB","backend-size-in-use-bytes":12288,"backend-size-in-use":"12 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111143+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":191}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111158+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111260+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111357+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111368+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111377+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111391+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111403+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111447+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111480+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.111494+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [ca50e9357181d758], term: 2, commit: 191, applied: 102, lastindex: 192, lastterm: 2]"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"warn","ts":"2024-06-10T14:40:21.115179+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.118827+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":113}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.119624+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":124}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123574+0200","caller":"mvcc/kvstore.go:413","msg":"resume scheduled compaction","scheduled-compact-revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.123661+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":120}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141549+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141594+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141643+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":120,"took":"17.934179ms","hash":1204195312,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141667+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141660+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141708+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141769+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141786+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize14022561371/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141776+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141804+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.141807+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.616829+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617106+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617166+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617184+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.617211+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623853+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623883+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623904+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.623916+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628269+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628273+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628459+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628566+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.628867+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:21.630495+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-10T14:40:21.633+0200 INFO    started server. {"name": "TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0", "pid": 3237749}
    logger.go:146: 2024-06-10T14:40:21.635+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
Adding response watch: header:<cluster_id:238453183653593855 member_id:14578408409545168728 revision:125 raft_term:3 > created:true 
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.114627+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":125}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131637+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":125,"took":"16.956782ms","hash":981787830,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
./bin/etcd (TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0) (3237749): {"level":"info","ts":"2024-06-10T14:40:22.131665+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":981787830,"revision":125,"compact-revision":120}
Adding new event response to watch, revs: 126...132 

from etcd.

serathius avatar serathius commented on September 27, 2024 1

Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?

Excellent question! Unfortunately, there's no simple way for etcd cluster admins to be directly alerted about dropped watch events in a production environment. The robustness tests achieve this by collecting comprehensive data – requests from all clients, WAL entries, etc. – and performing in-depth analysis. This level of data collection and analysis isn't feasible in production due to the sheer volume of information involved.

This highlights the importance of rigorous, continuous robustness testing with diverse traffic types and failure injection. It allows us to thoroughly analyze smaller chunks of data in a controlled environment, identifying issues that might otherwise go unnoticed in the vastness of production data.

On Kubernetes side, I have some plans to improve watch cache observability. Stay tuned!

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024 1

Always keep the compact revision, and delay compact tombstone revision #18162

Thanks @MadhavJivrajani . We need to add more tests ((or update existing tests)), please see all the proposed test cases as mentioned in #18162 (comment). Note I also added the test cases in the doc (under the option 2).

@fuweid @MadhavJivrajani Please consider a whole plan to resolve this issue. Please feel free to ping on slack if you want any discussion.

from etcd.

serathius avatar serathius commented on September 27, 2024

cc @MadhavJivrajani @siyuanfoundation @fuweid

Can someone take a look and double check my findings?

from etcd.

fuweid avatar fuweid commented on September 27, 2024

/assign

from etcd.

serathius avatar serathius commented on September 27, 2024

I rerun the report on latest version of #17833 and it found that linearization was broken. Looks like a false positive due to some bug.

from etcd.

serathius avatar serathius commented on September 27, 2024

Ok, ok I encountered another case.

Failpoint used: raftBeforeLeaderSend=panic()
Scenario: TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
Etcd version:

./bin/etcd --version
etcd Version: 3.6.0-alpha.0
Git SHA: 6f916d428
Go Version: go1.22.4
Go OS/Arch: linux/amd64

Report: 1717749910517706368.tar.gz

Found when adding compaction to operations serathius@5959110

Logs showing broken watch guarantee due to missing log:

=== RUN   TestDataReports/1717749910517706368
    logger.go:146: 2024-06-07T11:31:24.546+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T11:31:24.950+0200 INFO    Linearization success   {"duration": "404.080944ms"}
    logger.go:146: 2024-06-07T11:31:24.951+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T11:31:24.960+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 2}
  []model.PersistedEvent{
        ... // 154 identical elements
        {Event: {Type: "put-operation", Key: "key3", Value: {Hash: 3071434148}}, Revision: 139},
        {Event: {Type: "put-operation", Key: "key5", Value: {Hash: 2150768673}}, Revision: 140},
-       {Event: model.Event{Type: "delete-operation", Key: "key6"}, Revision: 141},
        {Event: {Type: "put-operation", Key: "key8", Value: {Value: "121"}}, Revision: 141, ...},
        {Event: {Type: "put-operation", Key: "key9", Value: {Value: "122"}}, Revision: 141},
        ... // 134 identical elements
  }

    validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
    logger.go:146: 2024-06-07T11:31:24.960+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T11:31:24.967+0200 INFO    Saving visualization    {"path": "./src/go.etcd.io/etcd/tests/robustness/testdata/1717749910517706368/history.html"}

Screenshot showing a transaction for revision 141 and fact that key "key6" was present so delete really happened.
image

Watch response showing missing delete event
image

Complete log:

=== RUN   TestRobustnessExploratory/Etcd/LowTraffic/ClusterOfSize1
(./etcd_--version) (1897882): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897882): Git SHA: 6f916d428
(./etcd_--version) (1897882): Go Version: go1.22.4
(./etcd_--version) (1897882): Go OS/Arch: linux/amd64
(./etcd_--version) (1897894): etcd Version: 3.6.0-alpha.0
(./etcd_--version) (1897894): Git SHA: 6f916d428
(./etcd_--version) (1897894): Go Version: go1.22.4
(./etcd_--version) (1897894): Go OS/Arch: linux/amd64
    logger.go:146: 2024-06-07T10:45:06.314+0200 INFO    starting server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:06.314+0200 INFO    spawning process        {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.3237+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.323787+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.32395+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.323991+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3244+0200","caller":"etcdmain/etcd.go:123","msg":"Initialize and start etcd server","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"empty"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.324433+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324445+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324785+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.324963+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":false,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"member-0=http://localhost:20001","initial-cluster-state":"new","initial-cluster-token":"new","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.325529+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003943b0}"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332264+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3323+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"6.84944ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332419+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.332446+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345644+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345682+0200","caller":"etcdserver/bootstrap.go:493","msg":"starting local member","local-member-id":"ca50e9357181d758","cluster-id":"34f27e83b3bc2ff"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345701+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345758+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34579+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345798+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.345809+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.34584+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"warn","ts":"2024-06-07T10:45:06.360383+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.364937+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372639+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372684+0200","caller":"etcdserver/server.go:585","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-version":"to_be_decided"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372729+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372781+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372793+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372768+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372839+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372854+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.37288+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.372974+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.3731+0200","caller":"membership/cluster.go:411","msg":"added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548218+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548313+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548384+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 1"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548402+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.548425+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559268+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559325+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559374+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.559391+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 2"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.561842+0200","caller":"etcdserver/server.go:2318","msg":"setting up initial cluster version using v3 API","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56395+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564116+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564142+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.564779+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565461+0200","caller":"membership/cluster.go:594","msg":"set initial cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565615+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565717+0200","caller":"version/monitor.go:116","msg":"cluster version differs from storage version.","cluster-version":"3.6.0","storage-version":"3.5.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.565719+0200","caller":"etcdserver/server.go:2338","msg":"cluster version is updated","cluster-version":"3.6"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.56587+0200","caller":"schema/migration.go:65","msg":"updated storage version","new-storage-version":"3.6.0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.567317+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:06.570+0200 INFO    started server. {"name": "member-0", "pid": 1897904}
    logger.go:146: 2024-06-07T10:45:06.577+0200 INFO    Start traffic
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.592935+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593112+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":3,"took":"120.206µs","hash":1734659494,"current-db-size-bytes":20480,"current-db-size":"20 kB","current-db-size-in-use-bytes":16384,"current-db-size-in-use":"16 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.593139+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1734659494,"revision":3,"compact-revision":-1}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.694305+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700835+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":37,"took":"6.440584ms","hash":2304164051,"current-db-size-bytes":303104,"current-db-size":"303 kB","current-db-size-in-use-bytes":274432,"current-db-size-in-use":"274 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.700886+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2304164051,"revision":37,"compact-revision":3}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.742881+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749877+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":58,"took":"6.926064ms","hash":1646957282,"current-db-size-bytes":376832,"current-db-size":"377 kB","current-db-size-in-use-bytes":335872,"current-db-size-in-use":"336 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.749911+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1646957282,"revision":58,"compact-revision":37}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.802825+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813113+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":77,"took":"10.227838ms","hash":539014635,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":417792,"current-db-size-in-use":"418 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.813147+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":539014635,"revision":77,"compact-revision":58}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.85343+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857414+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":85,"took":"3.90014ms","hash":1467298815,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":229376,"current-db-size-in-use":"229 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:06.857438+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1467298815,"revision":85,"compact-revision":77}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.074676+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079062+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":97,"took":"4.321109ms","hash":4113805127,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.079099+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4113805127,"revision":97,"compact-revision":85}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.148722+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":99}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153407+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":99,"took":"4.623808ms","hash":1249150654,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":110592,"current-db-size-in-use":"111 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.153465+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1249150654,"revision":99,"compact-revision":97}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.198642+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":104}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202576+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":104,"took":"3.889339ms","hash":1699712947,"current-db-size-bytes":450560,"current-db-size":"451 kB","current-db-size-in-use-bytes":114688,"current-db-size-in-use":"115 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.202591+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1699712947,"revision":104,"compact-revision":99}
    logger.go:146: 2024-06-07T10:45:07.573+0200 INFO    Triggering failpoint    {"failpoint": "raftBeforeLeaderSend=panic()"}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_--command-timeout=1.1s_get_0_-w_json_-w_json_-w_json) (1897931): {"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":139,"raft_term":2}}
./etcdctl (./etcdctl_--endpoints=http://localhost:20000_endpoint_status_-w_json) (1897941): [{"Endpoint":"http://localhost:20000","Status":{"header":{"cluster_id":238453183653593855,"member_id":14578408409545168728,"revision":140,"raft_term":2},"version":"3.6.0-alpha.0","dbSize":450560,"leader":14578408409545168728,"raftIndex":178,"raftTerm":2,"raftAppliedIndex":177,"dbSizeInUse":294912,"storageVersion":"3.6.0"}}]
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.601459+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":137}
    cluster.go:1066: members agree on a leader, members:map[14578408409545168728:0] , leader:14578408409545168728
    logger.go:146: 2024-06-07T10:45:07.608+0200 INFO    Setting up gofailpoint  {"failpoint": "raftBeforeLeaderSend=panic()"}
    logger.go:146: 2024-06-07T10:45:07.608+0200 INFO    Waiting for member to exit      {"member": "member-0"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610769+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":137,"took":"9.25467ms","hash":583134963,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":368640,"current-db-size-in-use":"369 kB"}
member-0 (1897904): {"level":"info","ts":"2024-06-07T10:45:07.610806+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":583134963,"revision":137,"compact-revision":104}
member-0 (1897904): panic: failpoint panic: {}
member-0 (1897904): 
member-0 (1897904): goroutine 154 [running]:
member-0 (1897904): go.etcd.io/gofail/runtime.actPanic(0x10100000000?)
member-0 (1897904):        go.etcd.io/[email protected]/runtime/terms.go:318 +0x65
member-0 (1897904): go.etcd.io/gofail/runtime.(*term).do(...)
member-0 (1897904):        go.etcd.io/[email protected]/runtime/terms.go:290
member-0 (1897904): go.etcd.io/gofail/runtime.(*terms).eval(0xc00016fbc0?)
member-0 (1897904):        go.etcd.io/[email protected]/runtime/terms.go:105 +0xe3
member-0 (1897904): go.etcd.io/gofail/runtime.(*Failpoint).Acquire(0xc0001f9c90?)
member-0 (1897904):        go.etcd.io/[email protected]/runtime/failpoint.go:38 +0x98
member-0 (1897904): go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start.func1()
member-0 (1897904):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:229 +0x716
member-0 (1897904): created by go.etcd.io/etcd/server/v3/etcdserver.(*raftNode).start in goroutine 168
member-0 (1897904):        go.etcd.io/etcd/server/v3/etcdserver/raft.go:166 +0x6f
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    server exited   {"name": "member-0", "code": 2}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    Member exited as expected       {"member": "member-0"}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    starting server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:07.617+0200 INFO    spawning process        {"args": ["./etcd", "--name=member-0", "--listen-client-urls=http://localhost:20000", "--advertise-client-urls=http://localhost:20000", "--listen-peer-urls=http://localhost:20001", "--initial-advertise-peer-urls=http://localhost:20001", "--initial-cluster-token=new", "--data-dir", "/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001", "--snapshot-count=1000", "--experimental-compaction-batch-limit=100", "--initial-cluster-token=new", "--election-timeout=521", "--experimental-snapshot-catchup-entries=100", "--experimental-watch-progress-notify-interval=100ms", "--snapshot-count=1000", "--heartbeat-interval=101", "--initial-cluster=member-0=http://localhost:20001", "--initial-cluster-state=new"], "working-dir": "/tmp/TestRobustnessExploratory2482998486/001", "name": "member-0", "environment-variables": ["GOFAIL_HTTP=127.0.0.1:12381", "ETCD_VERIFY=all", "EXPECT_DEBUG=true", "PATH=./pkg/mod/golang.org/[email protected]/bin:./.gvm/gos/go1.22/bin:./bin:./bin/:./.gvm/pkgsets/go1.22/global/bin:./.gvm/gos/go1.22/bin:./.gvm/pkgsets/go1.22/global/overlay/bin:./.gvm/bin:./.gvm/bin:./bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:./.local/bin:./bin", "ETCD_UNSUPPORTED_ARCH=amd64"]}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627613+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_VERIFY=all"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627706+0200","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_UNSUPPORTED_ARCH=amd64"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627725+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62775+0200","caller":"etcdmain/etcd.go:63","msg":"Running: ","args":["./etcd","--name=member-0","--listen-client-urls=http://localhost:20000","--advertise-client-urls=http://localhost:20000","--listen-peer-urls=http://localhost:20001","--initial-advertise-peer-urls=http://localhost:20001","--initial-cluster-token=new","--data-dir","/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","--snapshot-count=1000","--experimental-compaction-batch-limit=100","--initial-cluster-token=new","--election-timeout=521","--experimental-snapshot-catchup-entries=100","--experimental-watch-progress-notify-interval=100ms","--snapshot-count=1000","--heartbeat-interval=101","--initial-cluster=member-0=http://localhost:20001","--initial-cluster-state=new"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627828+0200","caller":"etcdmain/etcd.go:106","msg":"server has already been initialized","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","dir-type":"member"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.627848+0200","caller":"embed/config.go:913","msg":"Running http and grpc server on single port. This is not recommended for production."}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.627856+0200","caller":"embed/etcd.go:125","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:20001"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628737+0200","caller":"embed/etcd.go:133","msg":"configuring client listeners","listen-client-urls":["http://localhost:20000"]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.628956+0200","caller":"embed/etcd.go:314","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"6f916d428","go-version":"go1.22.4","go-os":"linux","go-arch":"amd64","max-cpu-set":24,"max-cpu-available":24,"member-initialized":true,"name":"member-0","data-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001","wal-dir":"","wal-dir-dedicated":"","member-dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member","force-new-cluster":false,"heartbeat-interval":"101ms","election-timeout":"505ms","initial-election-tick-advance":true,"snapshot-count":1000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":100,"initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[],"experimental-set-member-localaddr":false,"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.62947+0200","logger":"bbolt","caller":"backend/backend.go:197","msg":"Opening db file (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000126248}"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.636941+0200","logger":"bbolt","caller":"[email protected]/db.go:321","msg":"Opening bbolt db (/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db) successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637029+0200","caller":"storage/backend.go:80","msg":"opened backend db","path":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap/db","took":"7.640918ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637902+0200","caller":"etcdserver/bootstrap.go:440","msg":"No snapshot found. Recovering WAL from scratch!"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.637928+0200","caller":"etcdserver/bootstrap.go:231","msg":"recovered v3 backend","backend-size-bytes":520192,"backend-size":"520 kB","backend-size-in-use-bytes":126976,"backend-size-in-use":"127 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639787+0200","caller":"etcdserver/bootstrap.go:579","msg":"restarting local member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","commit-index":178}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.639855+0200","caller":"etcdserver/bootstrap.go:93","msg":"bootstrapping cluster"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640008+0200","caller":"etcdserver/bootstrap.go:100","msg":"bootstrapping storage"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640301+0200","caller":"api/capability.go:76","msg":"enabled capabilities for version","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640316+0200","caller":"membership/cluster.go:291","msg":"recovered/added member from store","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","recovered-remote-peer-id":"ca50e9357181d758","recovered-remote-peer-urls":["http://localhost:20001"],"recovered-remote-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640327+0200","caller":"membership/cluster.go:301","msg":"set cluster version from store","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640337+0200","caller":"etcdserver/bootstrap.go:108","msg":"bootstrapping raft"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640351+0200","caller":"etcdserver/server.go:306","msg":"bootstrap successfully"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640396+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640423+0200","logger":"raft","caller":"[email protected]/raft.go:870","msg":"ca50e9357181d758 became follower at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.640438+0200","logger":"raft","caller":"[email protected]/raft.go:483","msg":"newRaft ca50e9357181d758 [peers: [], term: 2, commit: 178, applied: 0, lastindex: 179, lastterm: 2]"}
member-0 (1897950): {"level":"warn","ts":"2024-06-07T10:45:07.644606+0200","caller":"auth/store.go:1134","msg":"simple token is not cryptographically signed"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660157+0200","caller":"storage/quota.go:93","msg":"enabled backend quota with default value","quota-name":"v3-applier","quota-size-bytes":2147483648,"quota-size":"2.1 GB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660204+0200","caller":"etcdserver/server.go:576","msg":"starting etcd server","local-member-id":"ca50e9357181d758","local-server-version":"3.6.0-alpha.0","cluster-id":"34f27e83b3bc2ff","cluster-version":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660292+0200","caller":"etcdserver/server.go:470","msg":"started as single-node; fast-forwarding election ticks","local-member-id":"ca50e9357181d758","forward-ticks":4,"forward-duration":"404ms","election-ticks":5,"election-timeout":"505ms"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660307+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap.db","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660366+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/snap","suffix":"snap","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660347+0200","caller":"embed/etcd.go:610","msg":"serving peer traffic","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660385+0200","caller":"embed/etcd.go:283","msg":"now serving peer/client/metrics","local-member-id":"ca50e9357181d758","initial-advertise-peer-urls":["http://localhost:20001"],"listen-peer-urls":["http://localhost:20001"],"advertise-client-urls":["http://localhost:20000"],"listen-client-urls":["http://localhost:20000"],"listen-metrics-urls":[]}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660397+0200","caller":"embed/etcd.go:582","msg":"cmux::serve","address":"127.0.0.1:20001"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660382+0200","caller":"fileutil/purge.go:50","msg":"started to purge file","dir":"/tmp/TestRobustnessExploratoryEtcdLowTrafficClusterOfSize11932190500/001/member/wal","suffix":"wal","max":5,"interval":"30s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660464+0200","logger":"raft","caller":"[email protected]/raft.go:1920","msg":"ca50e9357181d758 switched to configuration voters=(14578408409545168728)"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.66053+0200","caller":"membership/cluster.go:420","msg":"ignore already added member","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","added-peer-id":"ca50e9357181d758","added-peer-peer-urls":["http://localhost:20001"],"added-peer-is-learner":false}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.660578+0200","caller":"membership/cluster.go:586","msg":"updated cluster version","cluster-id":"34f27e83b3bc2ff","local-member-id":"ca50e9357181d758","from":"3.6","to":"3.6"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944302+0200","logger":"raft","caller":"[email protected]/raft.go:956","msg":"ca50e9357181d758 is starting a new election at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944353+0200","logger":"raft","caller":"[email protected]/raft.go:899","msg":"ca50e9357181d758 became pre-candidate at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944416+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgPreVoteResp from ca50e9357181d758 at term 2"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.94444+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgPreVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.944485+0200","logger":"raft","caller":"[email protected]/raft.go:883","msg":"ca50e9357181d758 became candidate at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.95921+0200","logger":"raft","caller":"[email protected]/raft.go:1043","msg":"ca50e9357181d758 received MsgVoteResp from ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959254+0200","logger":"raft","caller":"[email protected]/raft.go:1649","msg":"ca50e9357181d758 has received 1 MsgVoteResp votes and 0 vote rejections"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.959286+0200","logger":"raft","caller":"[email protected]/raft.go:938","msg":"ca50e9357181d758 became leader at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.9593+0200","logger":"raft","caller":"[email protected]/node.go:370","msg":"raft.node: ca50e9357181d758 elected leader ca50e9357181d758 at term 3"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964178+0200","caller":"etcdserver/server.go:1774","msg":"published local member to cluster through raft","local-member-id":"ca50e9357181d758","local-member-attributes":"{Name:member-0 ClientURLs:[http://localhost:20000]}","cluster-id":"34f27e83b3bc2ff","publish-timeout":"6.01s"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964176+0200","caller":"embed/serve.go:104","msg":"ready to serve client requests"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964348+0200","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.96437+0200","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.964837+0200","caller":"v3rpc/health.go:62","msg":"grpc service status changed","service":"","status":"SERVING"}
    logger.go:146: 2024-06-07T10:45:07.966+0200 INFO    started server. {"name": "member-0", "pid": 1897950}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:07.967+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.660836+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668671+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":149,"took":"7.779358ms","hash":3929730471,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":262144,"current-db-size-in-use":"262 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.668707+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3929730471,"revision":149,"compact-revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.679649+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":158}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691774+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":158,"took":"12.045594ms","hash":1641924706,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":225280,"current-db-size-in-use":"225 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691827+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":1641924706,"revision":158,"compact-revision":149}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.691851+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":159}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.699572+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":159,"took":"7.690631ms","hash":1830603898,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":147456,"current-db-size-in-use":"148 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.6996+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.69961+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703539+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":161,"took":"3.907723ms","hash":545936533,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":188416,"current-db-size-in-use":"188 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.703569+0200","caller":"mvcc/kvstore.go:248","msg":"previous compaction was interrupted, skip storing compaction hash value"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.726023+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730396+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":168,"took":"4.324136ms","hash":4120988724,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":233472,"current-db-size-in-use":"234 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.730417+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":4120988724,"revision":168,"compact-revision":161}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.797393+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801495+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":213,"took":"4.014924ms","hash":2299796790,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":319488,"current-db-size-in-use":"320 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.801512+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2299796790,"revision":213,"compact-revision":168}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.822852+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":225}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828806+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":225,"took":"5.900397ms","hash":3531019373,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":65536,"current-db-size-in-use":"66 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.828848+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":3531019373,"revision":225,"compact-revision":213}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.944933+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":240}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948934+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":240,"took":"3.945123ms","hash":2274426623,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":69632,"current-db-size-in-use":"70 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.948977+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":2274426623,"revision":240,"compact-revision":225}
    logger.go:146: 2024-06-07T10:45:09.002+0200 INFO    Finished traffic
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting complete traffic      {"successes": 678, "failures": 31, "successRate": 0.9562764456981664, "period": "2.425517855s", "qps": 279.52793610748336}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic before failure injection      {"successes": 402, "failures": 0, "successRate": 1, "period": "996.95569ms", "qps": 403.22754966171067}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic during failure injection      {"successes": 7, "failures": 14, "successRate": 0.3333333333333333, "period": "393.883383ms", "qps": 17.77175758643263}
    logger.go:146: 2024-06-07T10:45:10.014+0200 INFO    Reporting traffic after failure injection       {"successes": 269, "failures": 17, "successRate": 0.9405594405594405, "period": "1.034678782s", "qps": 259.98406914272647}
    logger.go:146: 2024-06-07T10:45:10.015+0200 INFO    Finished simulating traffic     {"max-revision": 248}
    logger.go:146: 2024-06-07T10:45:10.027+0200 INFO    Validating linearizable operations      {"timeout": "5m0s"}
    logger.go:146: 2024-06-07T10:45:10.502+0200 INFO    Linearization success   {"duration": "475.746216ms"}
    logger.go:146: 2024-06-07T10:45:10.503+0200 INFO    Validating watch
    logger.go:146: 2024-06-07T10:45:10.509+0200 ERROR   Broke watch guarantee   {"guarantee": "reliable", "client": 2, "diff": "  []model.PersistedEvent{\n  \t... // 154 identical elements\n  \t{Event: {Type: \"put-operation\", Key: \"key3\", Value: {Hash: 3071434148}}, Revision: 139},\n  \t{Event: {Type: \"put-operation\", Key: \"key5\", Value: {Hash: 2150768673}}, Revision: 140},\n- \t{Event: model.Event{Type: \"delete-operation\", Key: \"key6\"}, Revision: 141},\n  \t{Event: {Type: \"put-operation\", Key: \"key8\", Value: {Value: \"121\"}}, Revision: 141, ...},\n  \t{Event: {Type: \"put-operation\", Key: \"key9\", Value: {Value: \"122\"}}, Revision: 141},\n  \t... // 134 identical elements\n  }\n"}
    validate.go:49: Failed validating watch history, err: broke Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b
    logger.go:146: 2024-06-07T10:45:10.509+0200 INFO    Validating serializable operations
    logger.go:146: 2024-06-07T10:45:10.517+0200 INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368"}
    logger.go:146: 2024-06-07T10:45:10.517+0200 INFO    Saving member data dir  {"member": "member-0", "path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/server-member-0"}
    logger.go:146: 2024-06-07T10:45:10.518+0200 INFO    no watch operations for client, skip persisting {"client-id": 1}
    logger.go:146: 2024-06-07T10:45:10.518+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-1/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    Saving watch operations {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-2/watch.json"}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    no KV operations for client, skip persisting    {"client-id": 2}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    no watch operations for client, skip persisting {"client-id": 3}
    logger.go:146: 2024-06-07T10:45:10.519+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-3/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.520+0200 INFO    no watch operations for client, skip persisting {"client-id": 4}
    logger.go:146: 2024-06-07T10:45:10.520+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-4/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.522+0200 INFO    no watch operations for client, skip persisting {"client-id": 5}
    logger.go:146: 2024-06-07T10:45:10.522+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-5/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.523+0200 INFO    no watch operations for client, skip persisting {"client-id": 6}
    logger.go:146: 2024-06-07T10:45:10.523+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-6/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.524+0200 INFO    no watch operations for client, skip persisting {"client-id": 7}
    logger.go:146: 2024-06-07T10:45:10.524+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-7/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.525+0200 INFO    no watch operations for client, skip persisting {"client-id": 8}
    logger.go:146: 2024-06-07T10:45:10.525+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-8/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.526+0200 INFO    no watch operations for client, skip persisting {"client-id": 9}
    logger.go:146: 2024-06-07T10:45:10.526+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-9/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.527+0200 INFO    no watch operations for client, skip persisting {"client-id": 10}
    logger.go:146: 2024-06-07T10:45:10.527+0200 INFO    Saving operation history        {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/client-10/operations.json"}
    logger.go:146: 2024-06-07T10:45:10.528+0200 INFO    Saving visualization    {"path": "/tmp/TestRobustnessExploratory_Etcd_LowTraffic_ClusterOfSize1/1717749910517706368/history.html"}
    logger.go:146: 2024-06-07T10:45:10.583+0200 INFO    killing server...       {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:10.583+0200 INFO    stopping server...      {"name": "member-0"}
    logger.go:146: 2024-06-07T10:45:10.588+0200 INFO    stopped server. {"name": "member-0"}

from etcd.

serathius avatar serathius commented on September 27, 2024

cc @ahrtr @jmhbnz @wenjiaswe @MadhavJivrajani @siyuanfoundation

from etcd.

serathius avatar serathius commented on September 27, 2024

@fuweid Think this is related to #17780

See the following logs:


member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648367+0200","caller":"mvcc/kvstore.go:330","msg":"restored last compact revision","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":137}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.648847+0200","caller":"mvcc/kvstore.go:404","msg":"kvstore restored","current-rev":140}
...
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:07.966818+0200","caller":"embed/serve.go:188","msg":"serving client traffic insecurely; this is strongly discouraged!","traffic":"grpc+http","address":"127.0.0.1:20000"}
    logger.go:146: 2024-06-07T10:45:07.967+0200 INFO    Finished triggering failpoint   {"failpoint": "raftBeforeLeaderSend=panic()"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.640086+0200","caller":"mvcc/index.go:194","msg":"compact tree index","revision":141}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647928+0200","caller":"mvcc/kvstore_compaction.go:71","msg":"finished scheduled compaction","compact-revision":141,"took":"7.766192ms","hash":361777187,"current-db-size-bytes":520192,"current-db-size":"520 kB","current-db-size-in-use-bytes":258048,"current-db-size-in-use":"258 kB"}
member-0 (1897950): {"level":"info","ts":"2024-06-07T10:45:08.647968+0200","caller":"mvcc/hash.go:143","msg":"storing new hash","hash":361777187,"revision":141,"compact-revision":137}

Looks like we lost first delete after restoring from restoring from "last compact revision".
Off by one error on tombstone?

from etcd.

fuweid avatar fuweid commented on September 27, 2024

Yes. Let me try to reproduce it in my local. Update it later.

from etcd.

fuweid avatar fuweid commented on September 27, 2024

Report: 1717749910517706368.tar.gz

For this one, the tombstone of key6 would be deleted by the following compact request compaction:<revision:141 >.
Not sure when the watcher setups watch request. If it setups after compactor deletes that key6, it won't get the event.
However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.

   2           178      norm    header:<ID:15517310785104357973 > compaction:<revision:137 >
   2           179      norm    header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
   3           180      norm
   3           181      norm    header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
   3           182      norm    header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
   3           183      norm    header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
   3           184      norm    header:<ID:15517310785104689163 > compaction:<revision:141 >

from etcd.

fuweid avatar fuweid commented on September 27, 2024

I managed to reproduce it without multi TXN, just a delete request. This extends the impact to Kubernetes.

Again the common thing is that etcd misses a watch even if it restores kvstore on revision before a delete.

Correct. :( Currently, watcher isn't able to send out all the possible deleted events.

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

Did the client side receive an ErrCompacted? If yes, then it should be expected behaviour.

  • Since the real time events are cached, so as long as etcdserver doesn't restart, then the events won't be lost.
  • But if the etcdserver restarts for whatever reason after compaction but before sending out the watch response, then the client side may miss some compacted events. But the client will definitely get the latest value against any watched key.
    • As long as users don't update a key too frequent, it isn't a big problem

from etcd.

serathius avatar serathius commented on September 27, 2024

Did the client side receive an ErrCompacted? If yes,

No, it would be recorded in the report.

from etcd.

serathius avatar serathius commented on September 27, 2024

However, for the replay, it assumes that etcd process was running. There is mismatch between real events and replay.

@fuweid, don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

@fuweid Please feel free to ping me on slack if you want a quick discussion on this issue.

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

One thing I noticed, that the WatchChan is not broken during etcd downtime, so the time it's down is short enough that the etcd client code retries the request in a transparent way.

It's expected behaviour and by design as long as etcdserver doesn't return any error (e.g ErrCompacted).

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

Added logs to ensure that issue is not one robustness test, got:

Where did you add the log? Also suggest to provide a clear summary on the exact steps you did.

from etcd.

serathius avatar serathius commented on September 27, 2024

Code #18145

from etcd.

fuweid avatar fuweid commented on September 27, 2024

don't understand the statement. Distributed system needs to uphold it's guarantees even if a node is down. Events on WAL used for replay should match what users observes. For watch client should observe either full history or or get ErrCompacted, as @ahrtr mentioned.

Hi @serathius sorry for late reply. In this case 1717749910517706368.tar.gz, both key8 and key9's latest revision is 141. It won't be compacted. But the revision 141 for key6 is tombstone. It could be deleted before sending delete-operation event to client.(Maybe we should log watch-operations in robustness).

IMO, the robustness replay assumes that the delete-operation event is just sent out after applying to db. Even if the compact request is following that delete request, the robustness replay always assumes that the client should receive that delete-operation event.

However, in real world, delete-operation event depends on compactor which is handled in background and we can't control in replay. That's why I said that watch can't guarantee the client can see all the events in current design.

   2           178      norm    header:<ID:15517310785104357973 > compaction:<revision:137 >
   2           179      norm    header:<ID:15517310785104357976 > txn:<success:<request_range:<key:"key1" > > success:<request_delete_range:<key:"key6" > > success:<request_put:<key:"key8" value:"121" > > success:<request_put:<key:"key9" value:"122" > > >
   3           180      norm
   3           181      norm    header:<ID:15517310785104689154 > cluster_member_attr_set:<member_ID:14578408409545168728 member_attributes:<name:"TestRobustnessExploratoryEtcdLowTrafficClusterOfSize1-test-0" client_urls:"http://localhost:20000" > >
   3           182      norm    header:<ID:15517310785104689159 > txn:<success:<request_put:<key:"key2" value:"125" > > success:<request_put:<key:"key6" value:"126" > > success:<request_put:<key:"key1" value:"127" > > success:<request_put:<key:"key0" value:"128" > > >
   3           183      norm    header:<ID:15517310785104689162 > put:<key:"key7" value:"129" >
   3           184      norm    header:<ID:15517310785104689163 > compaction:<revision:141 >

Please feel free to ping me on slack if you want a quick discussion on this issue.

@ahrtr will ping you tomorrow. Thanks!

from etcd.

serathius avatar serathius commented on September 27, 2024

As mentioned above, it can only be reproduced when
the compact revision happens to be a delete operation.
and the delete operation is the last operation for the key. If there is any operations following the delete operation on the key (i.e. adding the key back), then the issue can't be reproduced.

This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.

Also usually compaction operation isn't that frequent in production environment.

It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.

I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window. This should be pretty rare for watch to be 5 minutes behind, however recently I have seen case where lock contention on watch cache caused it to be behind even 2 hours.

from etcd.

serathius avatar serathius commented on September 27, 2024

@ahrtr confirmed the bug being present on both release-3.4 and release-3.5 branches

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

It's important to know if this is a regression in some newer version, or it was always there. If this is a new bug we might need to consider an announcement.

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Obviously this isn't a regression. It's an old issue. Also updated my above comment.

This should be pretty common, delete operations are pretty frequent, for example pods are immutable resource, meaning they are frequently created and then deleted and never resurrected.

Pls see updated section "Impact" above. added and the delete operation is the last operation for the key

It's once every 5 minutes, ask anyone if they want to build a distributed system that makes a mistake every 5 minutes.

I think that the only saving grace is that this requires a slow watch that is reading events from before compaction window.

  • 5 minutes isn't frequent as compare to watch delay.
  • If you read the reproducing steps in my above comment, there is also very low possibility to see this issue on first watch response.

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

Just had a quick talk with @serathius , we need to clearly define the behaviour of compaction, and also the impact on watch, Please anyone feel free to comment in https://docs.google.com/document/d/1APJE38DxENsRLLVapRz1CQ6UD4-uYaFutO12Y01VcNw/edit

Eventually we need to document the behaviour in etcd's official document

from etcd.

henrybear327 avatar henrybear327 commented on September 27, 2024

Can someone take a look if the bug was introduced recently (introduced in one of the patch versions) or is an old one (tracing back to before v3.4)?

Following the clear manual etcdctl reproduce steps provided by @ahrtr I reproduced the issue on v3.4.0 so it looks like this has been present for at least ~5 years.

+1 on successfully reproducing the issue

from etcd.

ahrtr avatar ahrtr commented on September 27, 2024

Actions

  • Create a simple e2e test to reproduce this issue. Refer to the section "How to reproduce this issue" in #18089 (comment);
  • Update the implementation to ensure it aligns with doc
    • Please feel free to comment on the doc
  • Update etcd's official document to clarify the compaction behavior

from etcd.

MadhavJivrajani avatar MadhavJivrajani commented on September 27, 2024

cc @shyamjvs @chaochn47 @dims

from etcd.

siyuanfoundation avatar siyuanfoundation commented on September 27, 2024

The code to handle compacted version in watch was written 8 years ago, and it never handled the w.minRev == compactRev case

if w.minRev < compactRev {
select {
case w.ch <- WatchResponse{WatchID: w.id, CompactRevision: compactRev}:
w.compacted = true
wg.delete(w)
default:
// retry next time
}
continue
}

from etcd.

chaochn47 avatar chaochn47 commented on September 27, 2024

Just as a thought exercise, how can admin of etcd cluster be alerted about the dropped watch events issue in general?

In kubernetes/kubernetes#123072, apiserver_watch_cache_events_received_total{resource="pods"} is useful to detect watch event loss as the diff between two watch client (apiserver) diverges.

However, this issue seems difficult to catch by alarms because just one delete event is missed. Diff by 1 could be caused by different times that client restarted.

Trying to monitor in production and measure how many kubernetes clusters are impacted. Any insights are much appreciated !!

from etcd.

MadhavJivrajani avatar MadhavJivrajani commented on September 27, 2024

Tried replicating #18089 (comment) in an e2e here: #18201

from etcd.

MadhavJivrajani avatar MadhavJivrajani commented on September 27, 2024

Another instance: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-etcd-robustness-main-amd64/1803129487966081024

(among others in https://testgrid.k8s.io/sig-etcd-robustness#ci-etcd-robustness-main-amd64)

from etcd.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.