
Comments (5)

dt commented on July 18, 2024

@aadityasondhi did you recently fix a similar issue?

aadityasondhi commented on July 18, 2024

This one fails on sm.WALBytesIn.

aadityasondhi commented on July 18, 2024

This is a little confusing to me. In Pebble, this metric is an atomic that we only ever increment by an unsigned amount, so it should never decrease.

Here is where it is set.
https://github.com/cockroachdb/pebble/blob/200f9cf1e217afc4e9052891126be661d757007a/db.go#L1974

Here is where it is incremented.
https://github.com/cockroachdb/pebble/blob/200f9cf1e217afc4e9052891126be661d757007a/db.go#L973
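A minimal Go sketch of why this is puzzling, assuming the metric behaves like a plain atomic that only ever receives unsigned increments. The `walCounter` type and its helpers are hypothetical, not Pebble's code. With only `Add`s of non-negative values, successive loads from a single goroutine cannot go backwards, so the decrease is unlikely to originate inside the counter itself.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// walCounter is a hypothetical stand-in for a Pebble-style metric: an
// atomic that is only ever incremented by an unsigned amount.
type walCounter struct {
	bytesIn atomic.Uint64
}

// record adds n bytes; n is unsigned, so the counter can only grow.
func (c *walCounter) record(n uint64) { c.bytesIn.Add(n) }

// load returns the current value.
func (c *walCounter) load() uint64 { return c.bytesIn.Load() }

// observeMonotonic runs `writers` goroutines that each call record(1)
// `perWriter` times while the calling goroutine repeatedly loads the
// value. It reports whether any load was smaller than the previous one,
// along with the final total once all writers have finished.
func observeMonotonic(writers, perWriter int) (decreased bool, total uint64) {
	var c walCounter
	var wg sync.WaitGroup
	done := make(chan struct{})
	for i := 0; i < writers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWriter; j++ {
				c.record(1)
			}
		}()
	}
	go func() { wg.Wait(); close(done) }()

	var prev uint64
	for {
		select {
		case <-done:
			return decreased, c.load()
		default:
			cur := c.load()
			if cur < prev {
				decreased = true
			}
			prev = cur
		}
	}
}

func main() {
	decreased, total := observeMonotonic(4, 10000)
	fmt.Println("decreased:", decreased, "total:", total)
}
```

Running it should print `decreased: false total: 40000`: a single reader of a monotonically incremented atomic never observes it shrink.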

RaduBerinde commented on July 18, 2024

Is it possible that CockroachDB collects and updates metrics in parallel? Two goroutines could read the metrics at around the same time but then update the counters in the reverse order.
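The suspected interleaving can be replayed deterministically in a small Go sketch. Everything here is hypothetical: `monotonicCounter` is a simplified stand-in for a counter whose `Update` rejects decreases (mirroring the `Counters should not decrease` panic in the failure report), not CockroachDB's `metric.Counter`, and the two "goroutines" are simulated as sequential steps on one goroutine so that the order is fixed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// monotonicCounter is a hypothetical, simplified stand-in for a registry
// counter whose Update rejects decreases.
type monotonicCounter struct {
	v uint64
}

// Update publishes val, panicking if it would move the counter backwards.
func (c *monotonicCounter) Update(val uint64) {
	if val < c.v {
		panic("Counters should not decrease")
	}
	c.v = val
}

// reverseOrderUpdate replays the suspected interleaving step by step:
//  1. "goroutine A" snapshots the engine-side source;
//  2. more writes land;
//  3. "goroutine B" snapshots the (larger) source;
//  4. B publishes its snapshot first;
//  5. A publishes its older, smaller snapshot last.
// It returns the recovered panic message, or "" if none occurred.
func reverseOrderUpdate() (msg string) {
	var source atomic.Uint64 // only ever grows, like the engine metric
	var c monotonicCounter
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	source.Add(100)
	snapA := source.Load() // A reads 100
	source.Add(50)
	snapB := source.Load() // B reads 150
	c.Update(snapB)        // B publishes 150
	c.Update(snapA)        // A publishes 100: a decrease, so this panics
	return ""
}

func main() {
	fmt.Println(reverseOrderUpdate())
}
```

It prints `Counters should not decrease`: each snapshot is individually valid, but publishing the older one last moves the counter backwards.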

Indeed, I see another goroutine inside ComputeMetrics in the test failure:

github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicastats.(*ReplicaStats).AverageRatePerSecond(0xc0056aa6c0, {0x17ddc3a65714f7c7?, 0xc008afa4f0?, 0x0?})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/replicastats/replica_stats.go:357 +0x22b
github.com/cockroachdb/cockroach/pkg/kv/kvserver/load.(*ReplicaLoad).getLocked(0xc008afa4e0, 0x3)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/load/replica_load.go:184 +0x98
github.com/cockroachdb/cockroach/pkg/kv/kvserver/load.(*ReplicaLoad).Stats(0xc008afa4e0)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/load/replica_load.go:199 +0x12a
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges.func1(0xc00a8b1608)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3282 +0x872
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*storeReplicaVisitor).Visit(0xc005230b40, 0xc013cf8eb8)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:556 +0x326
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).updateReplicationGauges(0xc005a60c08, {0xe06fb98, 0xc006a0db00})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3231 +0x92f
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).computeMetrics(_, {_, _})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3471 +0xe6
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetrics(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3590
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication.func2(0xc005a60c08)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1443 +0x16e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1(0x97c62c0?, 0xc005a60c08)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x58
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range(0xc003ff4db8, 0xc013cf9710)
	github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:385 +0x19d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores(0xc003ff4d80, 0xc013cf9928)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x8e
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication(0xc004227808)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1432 +0x5ae
github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backuptestutils.StartBackupRestoreTestCluster({0xe0facf0, 0xc0022ba680}, 0x4, {0xc002f58c30, 0x3, 0x4b14d0?})
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backuptestutils/testutils.go:162 +0x967
github.com/cockroachdb/cockroach/pkg/ccl/backupccl.backupRestoreTestSetupWithParams({_, _}, _, _, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/utils_test.go:78 +0x1bf
github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestBackupRestoreExecLocality(0xc0022ba680)
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backup_test.go:477 +0x8da

I think we should add some synchronization around the entire ComputeMetrics operation.
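One way to read that suggestion, sketched in Go under stated assumptions: hold a single mutex across both the snapshot and the publish, so snapshots are published in the order they were taken. The `store` type, its fields, and `computeMetrics` below are hypothetical illustrations, not CockroachDB's `Store`; instead of panicking, the sketch counts decreases so the result can be checked.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// store is a hypothetical stand-in: an engine-side counter that only
// grows, a published counter that must never decrease, and a mutex held
// across the whole compute-and-publish step.
type store struct {
	engineBytes atomic.Uint64 // only ever incremented
	mu          sync.Mutex    // serializes computeMetrics end to end
	published   uint64        // guarded by mu
	decreases   int           // guarded by mu; counts rejected updates
}

// computeMetrics snapshots the engine counter and publishes it. Holding
// mu across both steps means snapshots are published in the order they
// were taken, so the published value never goes backwards.
func (s *store) computeMetrics() {
	s.mu.Lock()
	defer s.mu.Unlock()
	snap := s.engineBytes.Load()
	if snap < s.published {
		s.decreases++ // where the real counter would panic
		return
	}
	s.published = snap
}

// run starts writers that bump the engine counter and callers that
// invoke computeMetrics concurrently, then returns the number of
// decreases observed and the final published value.
func run(goroutines, iters int) (int, uint64) {
	var s store
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				s.engineBytes.Add(1)
			}
		}()
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				s.computeMetrics()
			}
		}()
	}
	wg.Wait()
	s.computeMetrics() // final publish after all writes have landed
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.decreases, s.published
}

func main() {
	decreases, published := run(8, 1000)
	fmt.Println("decreases:", decreases, "published:", published)
}
```

With the lock held end to end, `run(8, 1000)` should report zero decreases and a final published value of 8000. The trade-off is that concurrent callers queue behind one another, which seems tolerable for periodic metrics collection but is worth confirming for the real call sites.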

cockroach-teamcity commented on July 18, 2024

ccl/backupccl.TestBackupRestoreExecLocality failed on master @ c557fb59f6aec659d364e9002fc083c59c6392b6:

Fatal error:

panic: Counters should not decrease [recovered]
	panic: Counters should not decrease [recovered]
	panic: Counters should not decrease [recovered]
	panic: Counters should not decrease

Stack:

goroutine 52 [running]:
testing.tRunner.func1.2({0x95e58a0, 0xe02eb10})
	GOROOT/src/testing/testing.go:1631 +0x3f7
testing.tRunner.func1()
	GOROOT/src/testing/testing.go:1634 +0x6b6
panic({0x95e58a0?, 0xe02eb10?})
	GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/util/leaktest.AfterTest.func2()
	github.com/cockroachdb/cockroach/pkg/util/leaktest/leaktest.go:133 +0x4fa
panic({0x95e58a0?, 0xe02eb10?})
	GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start.func1()
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:393 +0x99
panic({0x95e58a0?, 0xe02eb10?})
	GOROOT/src/runtime/panic.go:770 +0x132
github.com/cockroachdb/cockroach/pkg/util/metric.(*Counter).Update(0xc006905920, 0x998c0ba)
	github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:755 +0x6d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*StoreMetrics).updateEngineMetrics(_, {0xc00e986408, {0x998c0ba, 0x2fe7c34, 0x55320878, 0x6a35, 0xc897, 0x6969, 0x140d1}, {0x2d16, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/metrics.go:3790 +0xfd7
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).computeMetrics(_, {_, _})
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3487 +0x1d8
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetrics(...)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3599
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication.func2(0xc00692ac08)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1443 +0x16e
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1(0x97e6560?, 0xc00692ac08)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x58
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range(0xc004067988, 0xc01245cee0)
	github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:385 +0x19d
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores(0xc004067950, 0xc016db10f8)
	github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x8e
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).WaitForFullReplication(0xc003c78308)
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1432 +0x5ae
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start(0xc003c78308, {0xe0db2d0, 0xc003c45a00})
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:456 +0xa38
github.com/cockroachdb/cockroach/pkg/testutils/testcluster.StartTestCluster({_, _}, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:238 +0x92
github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backuptestutils.StartBackupRestoreTestCluster({0xe126fb0, 0xc003c45a00}, 0x4, {0xc00418ec30, 0x3, 0x4b14d0?})
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backuptestutils/testutils.go:130 +0x313
github.com/cockroachdb/cockroach/pkg/ccl/backupccl.backupRestoreTestSetupWithParams({_, _}, _, _, _, {{{{0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...})
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/utils_test.go:78 +0x1bf
github.com/cockroachdb/cockroach/pkg/ccl/backupccl.TestBackupRestoreExecLocality(0xc003c45a00)
	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/backup_test.go:477 +0x8da
testing.tRunner(0xc003c45a00, 0xa6f5fd8)
	GOROOT/src/testing/testing.go:1689 +0x21f
created by testing.(*T).Run in goroutine 1
	GOROOT/src/testing/testing.go:1742 +0x826
Log preceding fatal error

=== RUN   TestBackupRestoreExecLocality
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestBackupRestoreExecLocality1780957092
    test_log_scope.go:81: use -show-logs to present logs inline
    testcluster.go:393: -- test log scope end --

ERROR: a panic has occurred!
Details cannot be printed yet because we are still unwinding.
Hopefully the test harness prints the panic below, otherwise check the test logs.

test logs left over in: outputs.zip/logTestBackupRestoreExecLocality1780957092
    testcluster.go:393: panic: Counters should not decrease
--- FAIL: TestBackupRestoreExecLocality (102.19s)

Parameters:

  • attempt=1
  • race=true
  • run=3
  • shard=22

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!
