Comments (10)
I see this line (called through validate_chunk_state_witness --> apply_new_chunk --> apply_chunk) may also get to MissingTrieValue:
nearcore/chain/chain/src/runtime/mod.rs
Line 917 in a1a01b4
Err(e) => match e {
Error::StorageError(err) => match &err {
StorageError::FlatStorageBlockNotSupported(_)
| StorageError::MissingTrieValue(..) => Err(err.into()),
_ => panic!("{err}"),
},
_ => Err(e),
},
This won't panic on MissingTrieValue, it'll return an error.
It could panic if it hits another StorageError that isn't accounted for, but I don't know if that's even possible 0_o
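To make the behaviour of that match easier to check, here is a minimal, self-contained sketch of the same shape. The types are simplified stand-ins rather than the real nearcore definitions, and `handle_apply_result` is a hypothetical name used only for illustration:

```rust
use std::fmt;

// Simplified stand-ins for the nearcore error types (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageError {
    StorageInternalError,
    MissingTrieValue(String),
    UnexpectedTrieValue,
    FlatStorageBlockNotSupported(String),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    StorageError(StorageError),
    Other(String),
}

impl From<StorageError> for Error {
    fn from(err: StorageError) -> Self {
        Error::StorageError(err)
    }
}

/// Hypothetical wrapper with the same structure as the quoted match:
/// two storage errors are returned to the caller, any other storage
/// error panics, and non-storage errors pass through unchanged.
pub fn handle_apply_result(result: Result<u64, Error>) -> Result<u64, Error> {
    match result {
        Ok(value) => Ok(value),
        Err(e) => match e {
            Error::StorageError(err) => match &err {
                StorageError::FlatStorageBlockNotSupported(_)
                | StorageError::MissingTrieValue(..) => Err(err.into()),
                _ => panic!("{}", err),
            },
            _ => Err(e),
        },
    }
}
```

In this sketch, passing `Err(Error::StorageError(StorageError::MissingTrieValue(..)))` yields an `Err`, while `StorageError::UnexpectedTrieValue` reaches the `panic!` arm.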
It seems that most of the StorageError variants are fatal errors (corrupted database etc.):
/// Errors which may occur during working with trie storages, storing
/// trie values (trie nodes and state values) by their hashes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageError {
/// Key-value db internal failure
StorageInternalError,
/// Requested trie value by its hash which is missing in storage.
MissingTrieValue(MissingTrieValueContext, CryptoHash),
/// Found trie node which shouldn't be part of state. Raised during
/// validation of state sync parts where incorrect node was passed.
/// TODO (#8997): consider including hash of trie node.
UnexpectedTrieValue,
/// Either invalid state or key-value db is corrupted.
/// For PartialStorage it cannot be corrupted.
/// Error message is unreliable and for debugging purposes only. It's also probably ok to
/// panic in every place that produces this error.
/// We can check if db is corrupted by verifying everything in the state trie.
StorageInconsistentState(String),
/// Flat storage error, meaning that it doesn't support some block anymore.
/// We guarantee that such block cannot become final, thus block processing
/// must resume normally.
FlatStorageBlockNotSupported(String),
/// In-memory trie could not be loaded for some reason.
MemTrieLoadingError(String),
}
Maybe we shouldn't panic on UnexpectedTrieValue, that looks like something that could be triggered by an invalid witness. But OTOH we have tests which check for this error and they don't trigger the panic, so it's probably a different code path. /cc @Longarithm
from nearcore.
Issue is not valid anymore. Closing it
from nearcore.
The backtrace from the linked zulip thread suggests that the panic happened inside <near_store::trie::trie_storage::TrieMemoryPartialStorage as near_store::trie::trie_storage::TrieStorage>::retrieve_raw_bytes
2024-04-16T20:21:23.545144Z DEBUG chunk_tracing{chunk_hash=HnFSQEoLMEnMXK2pxnnnbv7GkwFobanyrd7JJbNS2Rrj}:new_chunk{shard_id=3}:apply_chunk{shard_id=3}:process_state_update:apply{protocol_version=84 num_transactions=19}:process_receipt{receipt_id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP predecessor=operator.meta-pool.near receiver=lockup-meta-pool.near id=GHhLncT5GM2ksuwVzUqPMkzCp132V7xToQZPfUbKeRgP}:run{code.hash=EXekfV3kpFHHsTi4JUDh2MVLCKS3hpKdPbXMuRirxrvY vm_kind=NearVm}: vm: close time.busy=49.3µs time.idle=3.42µs
thread '<unnamed>' panicked at core/store/src/trie/trie_storage.rs:317:16:
!!!CRASH!!!: MissingTrieValue(TrieMemoryPartialStorage, 5FWvfWAJxH1mbCHuzLGwBfL9EYjH8YWVin6Pmp3H8gdM)
stack backtrace:
0: rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: <near_store::trie::trie_storage::TrieMemoryPartialStorage as near_store::trie::trie_storage::TrieStorage>::retrieve_raw_bytes
4: near_store::trie::Trie::internal_retrieve_trie_node
5: near_store::trie::Trie::retrieve_raw_node
6: near_store::trie::Trie::lookup_from_state_column
7: near_store::trie::Trie::get_optimized_ref
8: near_store::trie::Trie::get
9: near_store::trie::update::TrieUpdate::get
10: near_store::get_code
11: node_runtime::actions::execute_function_call
12: node_runtime::Runtime::apply_action
13: node_runtime::Runtime::apply_action_receipt
14: node_runtime::Runtime::apply::{{closure}}
15: node_runtime::Runtime::apply
16: <near_chain::runtime::NightshadeRuntime as near_chain::types::RuntimeAdapter>::apply_chunk
17: near_chain::update_shard::apply_new_chunk
18: core::ops::function::FnOnce::call_once{{vtable.shim}}
19: <rayon_core::job::HeapJob<BODY> as rayon_core::job::Job>::execute
20: rayon_core::registry::WorkerThread::wait_until_cold
But looking at master, there are no unwraps in this function 0_o:
nearcore/core/store/src/trie/trie_storage.rs
Lines 311 to 322 in 0202ed6
That's pretty confusing 0_O.
@staffik you mentioned that the code was at commit b62b6a, but I don't know where to find it, is it on some private branch?
from nearcore.
This was a commit from Alex's PR: b62b6a3.
Maybe this unwrap was removed recently in master.
from nearcore.
This was a commit from Alex's PR: b62b6a3.
Ah thanks for the link, now I see it 👍
Maybe this unwrap was removed recently in master.
I took a look and it's the same code as on master.
Mysterious 0_O
from nearcore.
Oh, we used tracing. It was: https://github.com/near/nearcore/pull/10843/files#diff-e073548a40d97af14f75cf143fab41a1cffe61d159e0b9a6297daeab0b2a5d45R317
from nearcore.
I see this line (called through validate_chunk_state_witness --> apply_new_chunk --> apply_chunk) may also get to MissingTrieValue:
nearcore/chain/chain/src/runtime/mod.rs
Line 917 in a1a01b4
from nearcore.
Oh, we used tracing. It was: https://github.com/near/nearcore/pull/10843/files#diff-e073548a40d97af14f75cf143fab41a1cffe61d159e0b9a6297daeab0b2a5d45R317
Ahh ok, so the panic was caused by custom code that was added for debug purposes. The code on master doesn't have expect("!!!CRASH!!!"), so there's nothing to fix there.
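For anyone reading the backtrace later: frame 2 (core::result::unwrap_failed) is what an expect on an Err value produces. A minimal sketch of that effect follows, using simplified stand-in types. `PartialStorage`, `retrieve` and `retrieve_with_debug_expect` are hypothetical names, not the nearcore API:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the nearcore types (illustrative only).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum MissingTrieValueContext {
    TrieMemoryPartialStorage,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageError {
    MissingTrieValue(MissingTrieValueContext, String),
}

/// Hypothetical partial storage: it only knows the values recorded in it.
pub struct PartialStorage {
    recorded: HashMap<String, Vec<u8>>,
}

impl PartialStorage {
    pub fn new(recorded: HashMap<String, Vec<u8>>) -> Self {
        PartialStorage { recorded }
    }

    /// Error-returning lookup: a missing value is reported to the caller.
    pub fn retrieve(&self, hash: &str) -> Result<Vec<u8>, StorageError> {
        self.recorded.get(hash).cloned().ok_or_else(|| {
            StorageError::MissingTrieValue(
                MissingTrieValueContext::TrieMemoryPartialStorage,
                hash.to_string(),
            )
        })
    }

    /// Same lookup with a debug-only `expect` added: a missing value now
    /// panics with a message of the form "!!!CRASH!!!: <Debug of the error>".
    pub fn retrieve_with_debug_expect(&self, hash: &str) -> Result<Vec<u8>, StorageError> {
        Ok(self.retrieve(hash).expect("!!!CRASH!!!"))
    }
}
```

With a storage that lacks the requested hash, `retrieve` returns `Err(StorageError::MissingTrieValue(..))`, whereas `retrieve_with_debug_expect` panics, which matches the shape of the message in the log above.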
from nearcore.
Idk, it doesn't feel very productive to read the code in hopes of finding a possible panic. AFAIU the panic that spawned this issue can't happen on master, so there's nothing concrete to fix.
I remember that we wanted to fuzz the validation code, maybe that'd be a quicker way to find possible crashes in validation?
And good validation tests would ensure that the validation doesn't crash in the future, when the code changes.
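As a rough illustration of the idea, here is a stdlib-only sketch of a "never panics on arbitrary input" check. Real fuzzing would more likely use dedicated tooling such as cargo-fuzz. Everything here is an assumption made for the example: `validate_witness` is a toy validator, not the nearcore validation entry point, and `XorShift64` and `count_panics` are hypothetical helpers:

```rust
use std::panic;

/// Toy validator (hypothetical): the first byte declares how many
/// payload bytes must follow. It returns an error instead of panicking.
pub fn validate_witness(bytes: &[u8]) -> Result<(), String> {
    let (&declared, rest) = bytes
        .split_first()
        .ok_or_else(|| "empty witness".to_string())?;
    let declared = declared as usize;
    if rest.len() < declared {
        return Err(format!("declared {} bytes, got {}", declared, rest.len()));
    }
    Ok(())
}

/// Tiny xorshift64 generator, so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero.
        XorShift64(seed.max(1))
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feeds pseudo-random inputs to the validator and counts how many of
/// them made it panic. A robust validator should give a count of zero.
pub fn count_panics(iterations: usize, seed: u64) -> usize {
    let mut rng = XorShift64::new(seed);
    let mut panics = 0;
    for _ in 0..iterations {
        let len = (rng.next_u64() % 64) as usize;
        let input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        if panic::catch_unwind(|| validate_witness(&input)).is_err() {
            panics += 1;
        }
    }
    panics
}
```

A regression test could then assert that `count_panics` stays at zero for a fixed seed, so that a future change which reintroduces a panic in validation is caught.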
from nearcore.
Made an issue about fuzzing: #11132
from nearcore.