
ajuna's Introduction

Ajuna has split the repositories into:

Ajuna Network


A game platform parachain built with Substrate.

Prerequisites

Build

  • Using cargo:

    # solochain
    cargo build-ajuna-solo
    
    # parachain with Bajun runtime
    cargo build-bajun-rococo
    cargo build-bajun-kusama
    
    # parachain with Ajuna runtime
    cargo build-ajuna-rococo
    cargo build-ajuna-polkadot
  • Using Docker:

    # solochain
    docker build -f docker/Dockerfile -t ajuna/solochain:latest . --build-arg features=solo --build-arg bin=ajuna-solo
    
    # parachain with Bajun runtime
    docker build -f docker/Dockerfile -t ajuna/parachain-bajun:latest . --build-arg features=bajun --build-arg bin=bajun-para
    
    # parachain with Ajuna runtime
    docker build -f docker/Dockerfile -t ajuna/parachain-ajuna:latest . --build-arg features=ajuna --build-arg bin=ajuna-para

Run

  • Using compiled binaries:

    # solochain
    ./target/release/ajuna-solo --dev --tmp
  • Using Docker:

    # solochain
    docker-compose -f docker/solochain.yml up
    
    # parachain with rococo-local relay chain
    docker-compose -f docker/parachain.yml up

ajuna's People

Contributors

andyjsbell, clangenb, cowboy-bebug, darkfriend77, didacsf, pawanbisht62, raulmartinezm


ajuna's Issues

Rococo Failure: Async Backing panics after switching to 6-second blocks

Description

We updated to async backing and to the Polkadot SDK v1.1.0.

This worked fine and produced blocks with v0.1.24.

Changes for Async Backing

We then added the changes for 6-second blocks, and the node started to panic with v0.1.25.

Expected vs. Actual Behavior

The node should not panic; instead, the parachain runtime panics during block production, as the logs below show.

Logs, Errors or Screenshots

2023-11-06 10:38:30 [Relaychain] :sparkles: Imported #7801225 (0x83d9…1f03)
2023-11-06 10:38:30 [Parachain] :raised_hands: Starting consensus session on top of parent 0x380479f1aa5aa1d55cb5366bd40afc213d8a94e58543ed140eff4816d33ce424
2023-11-06 10:38:30 [Parachain] Migration did not execute. This probably should be removed
2023-11-06 10:38:30 [Parachain] panicked at 'attempt to divide by zero', /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/f60318f/substrate/primitives/consensus/slots/src/lib.rs:70:14
2023-11-06 10:38:30 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2023-11-06 10:38:30 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2023-11-06 10:38:30 [Parachain] :exclamation:️ Inherent extrinsic returned unexpected error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x372f84 - <unknown>!rust_begin_unwind
    1: 0x3228d2 - <unknown>!core::panicking::panic_fmt::hf5c4cd929d4aaa9e
    2: 0x322b33 - <unknown>!core::panicking::panic::h2f041bf6aa990dfd
    3: 0x29b307 - <unknown>!<cumulus_pallet_aura_ext::consensus_hook::FixedVelocityConsensusHook<T,_,_,_> as cumulus_pallet_parachain_system::consensus_hook::ConsensusHook>::on_state_proof::hb41607f68205e002
    4: 0x601ff - <unknown>!frame_support::storage::transactional::with_storage_layer::hb8293059a00b312d
    5: 0x2cbff2 - <unknown>!<cumulus_pallet_parachain_system::pallet::Call<T> as frame_support::traits::dispatch::UnfilteredDispatchable>::dispatch_bypass_filter::{{closure}}::h847eeed1c6663413
    6: 0x2d1fc9 - <unknown>!environmental::using_once::h842ff517359bc5c6
    7: 0x1b6597 - <unknown>!<bajun_runtime::RuntimeCall as frame_support::traits::dispatch::UnfilteredDispatchable>::dispatch_bypass_filter::h921d73cf99248ad3
    8: 0x1b90c6 - <unknown>!<bajun_runtime::RuntimeCall as sp_runtime::traits::Dispatchable>::dispatch::h1e953c0ad270c4d2
    9: 0x14ed8d - <unknown>!<sp_runtime::generic::checked_extrinsic::CheckedExtrinsic<AccountId,Call,Extra> as sp_runtime::traits::Applyable>::apply::h63066e3dd511455b
   10: 0x27d0f5 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::apply_extrinsic::hb105ce907a92d0f9
   11: 0x2a7896 - <unknown>!BlockBuilder_apply_extrinsic. Dropping.
2023-11-06 10:38:30 [Parachain] panicked at 'Aura slot duration cannot be zero.', /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/f60318f/substrate/frame/aura/src/lib.rs:406:9
2023-11-06 10:38:30 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2023-11-06 10:38:30 [Parachain] 1 storage transactions are left open by the runtime. Those will be rolled back.
2023-11-06 10:38:30 [Parachain] :exclamation:️ Inherent extrinsic returned unexpected error: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
    0: 0x372f84 - <unknown>!rust_begin_unwind
    1: 0x3228d2 - <unknown>!core::panicking::panic_fmt::hf5c4cd929d4aaa9e
    2: 0x41a29 - <unknown>!frame_support::storage::transactional::with_storage_layer::h3afb82a03700ac1e
    3: 0x2cf4c9 - <unknown>!environmental::using_once::h0bf9be8bd6bc1041
    4: 0x1b65e5 - <unknown>!<bajun_runtime::RuntimeCall as frame_support::traits::dispatch::UnfilteredDispatchable>::dispatch_bypass_filter::h921d73cf99248ad3
    5: 0x1b90c6 - <unknown>!<bajun_runtime::RuntimeCall as sp_runtime::traits::Dispatchable>::dispatch::h1e953c0ad270c4d2
    6: 0x14ed8d - <unknown>!<sp_runtime::generic::checked_extrinsic::CheckedExtrinsic<AccountId,Call,Extra> as sp_runtime::traits::Applyable>::apply::h63066e3dd511455b
    7: 0x27d0f5 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::apply_extrinsic::hb105ce907a92d0f9
    8: 0x2a7896 - <unknown>!BlockBuilder_apply_extrinsic. Dropping.
2023-11-06 10:38:30 [Parachain] panicked at 'set_validation_data inherent needs to be present in every block!', /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/f60318f/cumulus/pallets/parachain-system/src/lib.rs:248:13
2023-11-06 10:38:30 [Parachain] err=Error { inner: Proposing
Caused by:
    0: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
       WASM backtrace:
       error while executing at wasm backtrace:
           0: 0x372f84 - <unknown>!rust_begin_unwind
           1: 0x3228d2 - <unknown>!core::panicking::panic_fmt::hf5c4cd929d4aaa9e
           2: 0x155828 - <unknown>!<cumulus_pallet_parachain_system::pallet::Pallet<T> as frame_support::traits::hooks::OnFinalize<<<<T as frame_system::pallet::Config>::Block as sp_runtime::traits::HeaderProvider>::HeaderT as sp_runtime::traits::Header>::Number>>::on_finalize::h34e87e0da54a8f27
           3: 0x27da40 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::idle_and_finalize_hook::h0724321d9dba92d2
           4: 0x27dc99 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::finalize_block::hd8b056510bfa4d57
           5: 0x2a793f - <unknown>!BlockBuilder_finalize_block
    1: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
       WASM backtrace:
       error while executing at wasm backtrace:
           0: 0x372f84 - <unknown>!rust_begin_unwind
           1: 0x3228d2 - <unknown>!core::panicking::panic_fmt::hf5c4cd929d4aaa9e
           2: 0x155828 - <unknown>!<cumulus_pallet_parachain_system::pallet::Pallet<T> as frame_support::traits::hooks::OnFinalize<<<<T as frame_system::pallet::Config>::Block as sp_runtime::traits::HeaderProvider>::HeaderT as sp_runtime::traits::Header>::Number>>::on_finalize::h34e87e0da54a8f27
           3: 0x27da40 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::idle_and_finalize_hook::h0724321d9dba92d2
           4: 0x27dc99 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::finalize_block::hd8b056510bfa4d57
           5: 0x2a793f - <unknown>!BlockBuilder_finalize_block }
2023-11-06 10:38:31 [Relaychain] :zzz: Idle (11 peers), best: #7801225 (0x83d9…1f03), finalized #7801221 (0x6bd1…ada0), :arrow_down: 14.2kiB/s :arrow_up: 12.4kiB/s
2023-11-06 10:38:31 [Parachain] :zzz: Idle (4 peers), best: #2336953 (0x3804…e424), finalized #2336953 (0x3804…e424), :arrow_down: 0.7kiB/s :arrow_up: 1.8MiB/s
2023-11-06 10:38:36 [Relaychain] :sparkles: Imported #7801226 (0x4181…ed3d)
2023-11-06 10:38:36 [Relaychain] :recycle:  Reorg on #7801226,0x4181…ed3d to #7801226,0x95a1…f5d9, common ancestor #7801225,0x83d9…1f03
2023-11-06 10:38:36 [Relaychain] :sparkles: Imported #7801226 (0x95a1…f5d9)
2023-11-06 10:38:36 [Relaychain] :zzz: Idle (10 peers), best: #7801226 (0x95a1…f5d9), finalized #7801223 (0x18a0…7309), :arrow_down: 75.3kiB/s :arrow_up: 108.0kiB/s
2023-11-06 10:38:36 [Parachain] :zzz: Idle (4 peers), best: #2336953 (0x3804…e424), finalized #2336953 (0x3804…e424), :arrow_down: 0.6kiB/s :arrow_up: 1.8MiB/s
2023-11-06 10:38:41 [Relaychain] :zzz: Idle (10 peers), best: #7801226 (0x95a1…f5d9), finalized #7801223 (0x18a0…7309), :arrow_down: 55.7kiB/s :arrow_up: 163.8kiB/s
2023-11-06 10:38:41 [Parachain] :zzz: Idle (4 peers), best: #2336953 (0x3804…e424), finalized #2336953 (0x3804…e424), :arrow_down: 0.7kiB/s :arrow_up: 1.9MiB/s
2023-11-06 10:38:42 [Relaychain] :sparkles: Imported #7801227 (0x7516…eb87)

Additional Information

Our suspicion is that the experimental flag was not activated.

We set MinimumPeriod = 0 and expected that, with the experimental flag enabled, it would no longer be used.

assert!(!slot_duration.is_zero(), "Aura slot duration cannot be zero.");
https://github.com/paritytech/polkadot-sdk/blob/f60318f68687e601c47de5ad5ca88e2c3f8139a7/substrate/frame/aura/src/lib.rs#L406C7-L406C7

It seems that the experimental feature is not being set properly.

/// Determine the Aura slot-duration based on the Timestamp module configuration.
pub fn slot_duration() -> T::Moment {
	#[cfg(feature = "experimental")]
	{
		T::SlotDuration::get()
	}

	#[cfg(not(feature = "experimental"))]
	{
		// we double the minimum block-period so each author can always propose within
		// the majority of its slot.
		<T as pallet_timestamp::Config>::MinimumPeriod::get().saturating_mul(2u32.into())
	}
}
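
The two panics in the logs are consistent with the non-experimental branch above being active while MinimumPeriod = 0. The following is a minimal sketch of that arithmetic in plain Rust, not the pallet code itself: legacy_slot_duration and slot_from_timestamp are hypothetical helper names, and the assumption is that a slot index is obtained by dividing a timestamp by the slot duration.

```rust
/// Mirrors the non-experimental branch quoted above:
/// the slot duration is twice the minimum block period.
fn legacy_slot_duration(minimum_period_ms: u64) -> u64 {
    minimum_period_ms.saturating_mul(2)
}

/// Hypothetical helper (an assumption for illustration): derive a slot index
/// from a timestamp. `checked_div` returns `None` for a zero duration
/// instead of panicking with "attempt to divide by zero".
fn slot_from_timestamp(timestamp_ms: u64, slot_duration_ms: u64) -> Option<u64> {
    timestamp_ms.checked_div(slot_duration_ms)
}
```

With a minimum period of 0 the derived duration is 0 and the division has no result, while a 3000 ms minimum period yields a 6000 ms slot duration.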

Steps done so far

To make sure the experimental flag is activated in the wasm, we did a local build

cargo clean --release
cargo -vv build-bajun &> build.log

We checked the build log for the experimental flag and found 3 references, including pallet_aura. With Santiago's help, the upgrade was then set directly on Rococo.
Deployed -> https://rococo.subscan.io/extrinsic/0xf29a3cc9fe9c34b71ec2e6b53647e0c431a46662cd8abf8dbb100a529f6a9661

We then resynced the collators, but the panic still happened.

Integer overflow in ajuna-awesome-avatars pallet

[Medium] Integer overflow in ajuna-awesome-avatars pallet

Summary

The functions validate_percentages and forge_probability use unchecked arithmetic and can overflow.

Issue details

validate_percentages function

The values of p_1 and p_2 are calculated using the sum method over an Iterator. The documentation specifies that 'sum method will panic if the computation overflows and debug assertions are enabled'. An overflow can be triggered with specific values for single_mint_probs and batch_mint_probs.

The call below can be used to reproduce the issue:

origin:     0404040404040404040404040404040404040404040404040404040404040404 (5C9yEy27...)
call:       RuntimeCall::AwesomeAvatars(Call::set_season { season_id: 0, season: Season { name: BoundedVec([], 100), description: BoundedVec([50, 0, 0, 255, 255, 255, 255, 249, 176, 216, 5, 0, 0, 0, 255, 0, 0, 0, 0, 114, 255, 255, 255, 0, 0, 0, 0, 64, 0, 0, 0, 61, 149, 235, 206, 40, 8, 0, 0, 0, 8, 245, 255, 0, 31, 3, 2, 236, 9, 0, 172, 236, 0, 0, 0, 0, 2, 0, 0], 1000), early_start: 0, start: 1280, end: 134744064, max_tier_forges: 135661576, max_variations: 8, max_components: 8, min_sacrifices: 11, max_sacrifices: 0, tiers: BoundedVec([], 6), single_mint_probs: BoundedVec([0, 0, 1, 134, 160], 5), batch_mint_probs: BoundedVec([], 5), base_prob: 8, per_period: 0, periods: 255 } })
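
One way to avoid the overflow in the summation is to accumulate with checked_add instead of Iterator::sum. This is a sketch only: it assumes the probabilities are u8 values that must add up to MAX_PERCENTAGE, and the function bodies are illustrative, not the pallet's actual implementation.

```rust
/// Assumed default, as stated in this report.
const MAX_PERCENTAGE: u8 = 100;

/// Sum the percentages with overflow detection.
/// Returns `None` as soon as the running total no longer fits into a `u8`.
fn checked_total(probs: &[u8]) -> Option<u8> {
    probs.iter().try_fold(0u8, |acc, &p| acc.checked_add(p))
}

/// Hypothetical validation: the percentages are valid only if they
/// add up to exactly `MAX_PERCENTAGE` without overflowing.
fn validate_percentages(probs: &[u8]) -> bool {
    checked_total(probs) == Some(MAX_PERCENTAGE)
}
```

With the single_mint_probs from the call above, [0, 0, 1, 134, 160], the total would be 295, so checked_total returns None and validation fails instead of panicking or wrapping.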

forge_probability function

The subtraction MAX_PERCENTAGE - season.base_prob can overflow if the base_prob value is bigger than MAX_PERCENTAGE (which is set to 100 by default).

The call below can be used to reproduce the issue:

  origin:     0404040404040404040404040404040404040404040404040404040404040404 (5C9yEy27...)
  call:       RuntimeCall::AwesomeAvatars(Call::set_season { season_id: 1, season: Season { name: BoundedVec([], 100), description: BoundedVec([], 1000), early_start: 256, start: 66816, end: 50339627, max_tier_forges: 335740961, max_variations: 3, max_components: 16, min_sacrifices: 1, max_sacrifices: 16, tiers: BoundedVec([RarityTier::Common, RarityTier::Rare, RarityTier::Legendary], 6), single_mint_probs: BoundedVec([10, 90], 5), batch_mint_probs: BoundedVec([20, 80], 5), base_prob: 255, per_period: 65536010, periods: 0 } })
  result:     Ok(PostDispatchInfo { actual_weight: None, pays_fee: Pays::Yes })
  time spent: 491.812µs
  origin:     0101010101010101010101010101010101010101010101010101010101010101 (5C62Ck4U...)
  call:       RuntimeCall::AwesomeAvatars(Call::forge { leader: 0x5f12cc589baddb76c5797d0d61cba75638f5588a507fa9998b2fa4cd4f5ebfaa, sacrifices: [0xdeb1eb514a7f9b278582e5e03088cee263192bd57ed4ddd438e2f22bc9897c43, 0x98b23a560bab185569bbd1d13952d7dcd1330d08e208899c9a785f869989f329] })
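
The subtraction can be guarded with checked_sub, and base_prob can be rejected when the season is set. The sketch below assumes u8 values; remaining_probability and validate_base_prob are hypothetical names and do not reproduce the pallet's forge_probability logic.

```rust
/// Assumed default, as stated in this report.
const MAX_PERCENTAGE: u8 = 100;

/// Compute `MAX_PERCENTAGE - base_prob` without underflow.
/// Returns `None` when `base_prob` exceeds `MAX_PERCENTAGE`.
fn remaining_probability(base_prob: u8) -> Option<u8> {
    MAX_PERCENTAGE.checked_sub(base_prob)
}

/// Hypothetical check to run when a season is configured, so that an
/// invalid `base_prob` never reaches the forging code path.
fn validate_base_prob(base_prob: u8) -> Result<u8, &'static str> {
    remaining_probability(base_prob).ok_or("base_prob exceeds MAX_PERCENTAGE")
}
```

For the base_prob of 255 used in the call above, the check returns an error rather than wrapping around.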

Risk

By triggering these integer overflows, an attacker can:

  1. Crash nodes compiled in debug mode with overflow checks enabled
  2. Cause unexpected behavior and logic inconsistencies on nodes that have overflow checks disabled

The severity of the issue is lowered to medium because only an organizer account can trigger the overflows:

  1. the validate_percentages method can be called only via the set_season extrinsic
  2. the base_prob value has to be set to a higher value than MAX_PERCENTAGE, which can be done by calling set_season

Mitigation

Implement proper integer overflow handling by checking call arguments and using safe arithmetic functions.

Node panicked

Description

While setting up an archive node, the node panicked. Version: 0.1.7-28d91f0318f

   0: sp_panic_handler::set::{{closure}}
   1: std::panicking::rust_panic_with_hook
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/std/src/panicking.rs:702:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/std/src/panicking.rs:588:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/std/src/sys_common/backtrace.rs:138:18
   4: rust_begin_unwind
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/std/src/panicking.rs:584:5
   5: core::panicking::panic_fmt
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/core/src/panicking.rs:142:14
   6: core::result::unwrap_failed
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/core/src/result.rs:1785:5
   7: <sp_state_machine::ext::Ext<H,B> as sp_externalities::Externalities>::storage
   8: <&mut dyn sp_externalities::Externalities as sp_io::storage::Storage>::get_version_1
   9: std::thread::local::LocalKey<T>::with
  10: tracing::span::Span::in_scope
  11: sp_io::storage::get_version_1
  12: sp_io::storage::ExtStorageGetVersion1::call
  13: <sc_executor_wasmtime::imports::Registry as sp_wasm_interface::HostFunctionRegistry>::with_function_context
  14: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  15: <F as wasmtime::func::IntoFunc<T,(wasmtime::func::Caller<T>,A1),R>>::into_func::wasm_to_host_shim
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: wasmtime_runtime::traphandlers::catch_traps::call_closure
  20: wasmtime_setjmp
  21: wasmtime_runtime::traphandlers::<impl wasmtime_runtime::traphandlers::call_thread_state::CallThreadState>::with
  22: wasmtime_runtime::traphandlers::catch_traps
  23: wasmtime::func::invoke_wasm_and_catch_traps
  24: wasmtime::func::typed::TypedFunc<Params,Results>::call
  25: sc_executor_wasmtime::instance_wrapper::EntryPoint::call
  26: sc_executor_wasmtime::runtime::perform_call
  27: <sc_executor_wasmtime::runtime::WasmtimeInstance as sc_executor_common::wasm_runtime::WasmInstance>::call_with_allocation_stats
  28: sc_executor_common::wasm_runtime::WasmInstance::call_export
  29: std::panicking::try
  30: std::thread::local::LocalKey<T>::with
  31: sc_executor::native_executor::WasmExecutor<H>::with_instance::{{closure}}
  32: sc_executor::wasm_runtime::RuntimeCache::with_instance
  33: <sc_executor::native_executor::NativeElseWasmExecutor<D> as sp_core::traits::CodeExecutor>::call
  34: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_aux
  35: sp_state_machine::execution::StateMachine<B,H,Exec>::execute_using_consensus_failure_handler
  36: <sc_service::client::call_executor::LocalCallExecutor<Block,B,E> as sc_client_api::call_executor::CallExecutor<Block>>::contextual_call
  37: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
  38: sp_authority_discovery::AuthorityDiscoveryApi::authorities
  39: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  40: sc_authority_discovery::worker::Worker<Client,Network,Block,DhtEventStream>::run::{{closure}}
  41: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
  42: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  43: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
  44: tokio::runtime::task::core::CoreStage<T>::poll
  45: tokio::runtime::task::harness::Harness<T,S>::poll
  46: std::thread::local::LocalKey<T>::with
  47: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  48: tokio::runtime::scheduler::multi_thread::worker::Context::run
  49: tokio::macros::scoped_tls::ScopedKey<T>::set
  50: tokio::runtime::scheduler::multi_thread::worker::run
  51: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  52: tokio::runtime::task::harness::Harness<T,S>::poll
  53: tokio::runtime::blocking::pool::Inner::run
  54: std::sys_common::backtrace::__rust_begin_short_backtrace
  55: core::ops::function::FnOnce::call_once{{vtable.shim}}
  56: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/alloc/src/boxed.rs:1872:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/alloc/src/boxed.rs:1872:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/70b3681bf621bc0de91ffab711b2350068b4c466/library/std/src/sys/unix/thread.rs:108:17
  57: start_thread
  58: clone

Thread 'tokio-runtime-worker' panicked at 'Externalities not allowed to fail within runtime: "Trie lookup error: Database missing expected key: 0xafd4e766a354dfeab9602d06e100e918d440dea2cefc85555ad6589747c0ded2"', /root/.cargo/git/checkouts/substrate-7e08433d4c370a21/2dff067/primitives/state-machine/src/ext.rs:189

Steps to Reproduce

git clone https://github.com/ajuna-network/Ajuna.git
cd Ajuna
git checkout v0.1.7
cargo build-bajun-kusama
cp target/release/bajun-para /usr/bin/

nohup /usr/bin/bajun-para --chain=/root/data/Ajuna/resources/bajun/bajun-raw.json -d=/root/data/ --state-pruning=archive --rpc-external --rpc-port=$rpc_port --ws-external --ws-port=$ws_port --rpc-cors="*" --bootnodes=/ip4/167.172.165.19/tcp/30333/ws/p2p/12D3KooWGtPZMr2Kvbt8b9qujuRRDsqrsA9855GsDDHWpnjzd7wF -- --chain=/root/data/Ajuna/resources/bajun/kusama.json  --rpc-external --rpc-port=9934 --ws-external --ws-port=9945 --no-telemetry --rpc-cors="*" >> ~/data/bajun-para.log 2>&1 &


Environment

rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /root/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-2022-05-15-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.66.0 (69f9c33d7 2022-12-12)


Post Mortem: How We Recovered Our Bricked Blockchain and Lessons for the Future


Incident Date: 2023-07-25 15:02:25

Resolved: 2023-07-26 11:30:25

Lead: Cédric Decoster

Summary

Our blockchain recently experienced a bricking incident caused by a runtime upgrade whose storage migration took too long to execute. In this post-mortem, we provide an in-depth analysis of the incident, how we managed to resolve it, and the lessons learned to prevent such issues in the future.


Table of Contents

  1. What Happened?
  2. Timeline of Events
  3. Solution Finding
  4. Actions Taken
  5. Root Cause Analysis
  6. Could This Have Been Prevented?
  7. Lessons Learned and Next Steps
  8. Recommendations
  9. Conclusion

What Happened?

In simple terms, our blockchain got stuck in a loop. The chain kept trying to complete a storage migration but failed because the time needed exceeded the allocated time on the collators. This resulted in our blockchain getting "bricked," becoming unusable until we took corrective action.


Timeline of Events

  1. Tested Runtime Upgrade: First, on a solo node, using a cloned storage from production.
  2. Tested on Rococo: Used try-runtime to test the upgrade.
  3. Deployed on Rococo: Deployed the runtime upgrade with storage migration on Rococo.
  4. Further Testing: Conducted additional tests.
  5. Production Testing: Used try-runtime to test the upgrade with production storage.
  6. Production Deployment: Rolled out the runtime upgrade and storage migration on the production chain.
     ValidationFunctionApplied for runtimeUpgrade 0.1.20 on:
     BajunNetwork (Parachain) 2'600'440
     Kusama (Relay) 18'943'684
  7. Chain Bricked: The chain entered an unusable state; our collators no longer produced blocks in time.
2023-07-25 14:29:24 [Parachain] Starting collation. relay_parent=0xd90fb54de6b3dbe9f9da31cc0ed0de4b6776448bff0da3917432fc798e439cb9 at=0x3c07058fcf3ecee9e822df4d8def197646e4d4c12795f7a25a26b182ee93055f
2023-07-25 14:29:24 [Parachain] 🙌 Starting consensus session on top of parent 0x3c07058fcf3ecee9e822df4d8def197646e4d4c12795f7a25a26b182ee93055f
2023-07-25 14:29:24 [Parachain] Updated GlobalConfig
2023-07-25 14:29:24 [Parachain] Migrated current season status
2023-07-25 14:29:24 [Parachain] Updated 991 accounts and 12859 avatars
2023-07-25 14:29:25 [Parachain] Updated 854 avatars in trade
2023-07-25 14:29:25 [Parachain] ⌛️ Discarding proposal for slot 140857947; block production took too long
2023-07-25 14:29:26 [Parachain] Updated 12859 old avatars
2023-07-25 14:29:26 [Parachain] Updated 4003 player account info entries
2023-07-25 14:29:26 [Parachain] Migrated seasons
2023-07-25 14:29:26 [Parachain] Upgraded storage to version StorageVersion(5)
  8. Deep Analysis: Realized that the storage migration was too slow.
  9. Unbricked the Chain: Utilized a powerful collator setup.
  10. Resumed Block Production: The chain returned to normal operation.

Solutions and Workarounds

Possible Solutions:

  1. Powerful Collator: Running the collator on a more powerful i9 CPU.

    • Status: Successful
    • Evaluation: This confirmed that the issue was computational and within a range we could resolve with more processing power: the block would pass the collators and then, hopefully, the validators on the relay chain as well.
  2. Governance Vote for codeSubstitute: This would revert the network to an older, stable version and would require initiating a Root Track.

    • Status: Not Implemented
    • Evaluation: Could take up to 14 days for governance approval. This is a long-term fix.

Resources & External Help:

  • We referenced a blog post by T3rn which provided valuable insights.
  • We also reached out to Parity for support, receiving exceptional assistance from Santiago & Daan.

Actions Taken

  1. Collator Upgrade: Employed a collator with superior single-threaded performance to complete the storage migration.

our normal collators
2023-07-25 14:28:45 [Parachain] :gift: Prepared block for proposing at 2600441 (911 ms) [hash: 0x306e4b6184ef89d486f4cdb1af158c6b83b9466e6edd8f2c84666f67b99e74eb; parent_hash: 0x3c07…055f; extrinsics (2): [0xf8f7…c1e5, 0x92bf…81a3]]

our i9 local machine collator
ajuna-collator-1-1 | 2023-07-26 10:00:10 [Parachain] :gift: Prepared block for proposing at 2600441 (194 ms) [hash: 0xb0c410f751588f826ce9311d4821dea838a97b450d5cdd9dbc69dda513edc957; parent_hash: 0x3c07…055f; extrinsics (2): [0x239d…730b, 0

  2. Review of Test Protocols: Overhauled our testing procedures.
  3. Time Estimation: Developed tools to estimate storage migration times for future use.

Root Cause Analysis

The underlying cause was the lengthy storage migration time needed during the runtime upgrade, which exceeded the capabilities of the existing collators.

Key Errors

  • Error Message: "Discarding proposal for slot; block production took too long."
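
As an illustration of how this error can be detected automatically, the sketch below scans collator log text for the message and extracts the affected slot numbers. The function name discarded_slots is hypothetical, and the line format is assumed to match the log excerpt in the timeline.

```rust
/// Return the slot numbers of all proposals that were discarded
/// because block production took too long.
fn discarded_slots(log: &str) -> Vec<u64> {
    const MARKER: &str = "Discarding proposal for slot ";
    log.lines()
        .filter_map(|line| {
            // Skip lines that do not contain the error message.
            let start = line.find(MARKER)? + MARKER.len();
            // The slot number is the run of digits right after the marker.
            let digits: String = line[start..]
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            digits.parse().ok()
        })
        .collect()
}
```

A monitoring job could alert as soon as the returned list is non-empty for the most recent log window.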

Could This Have Been Prevented?

Absolutely. With more cautious time estimates and exhaustive testing under different conditions, the bricking could have been avoided.


Lessons Learned and Next Steps

  1. Testing: Strengthen testing strategies for runtime upgrades and migrations.
  2. Monitoring: Employ real-time monitoring tools to quickly identify anomalies.
  3. Collaboration: Build stronger relationships with external organizations and experts.

Recommendations

  1. Performance Testing: Adopt rigorous testing for all runtime upgrades, particularly focusing on storage migration time.
  2. Revert Plan: Maintain a well-documented revert plan ready to be deployed.
  3. Hardware Benchmark: Publish minimum hardware requirements for collators.
  4. Monitoring and Alerts: Develop robust monitoring to quickly identify failed block productions.
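
A rough way to apply the performance-testing recommendation is to extrapolate the migration time from a measured per-item cost and compare it with the time available for producing one block. The sketch below is illustrative only: the function names are hypothetical, and both the per-item cost and the budget must come from measurements on real collator hardware.

```rust
use std::time::Duration;

/// Extrapolate the total migration time from a measured per-item cost.
/// Saturates instead of overflowing for very large item counts.
fn estimate_migration(per_item: Duration, items: u32) -> Duration {
    per_item.saturating_mul(items)
}

/// Check whether the estimate fits into the block production budget.
fn fits_budget(estimate: Duration, budget: Duration) -> bool {
    estimate <= budget
}
```

For example, with an assumed cost of 50 µs per item, migrating the 12859 avatars from the logs above would take about 643 ms, which would not fit into an assumed 500 ms budget.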

Conclusion

While the incident was unfortunate, it presented us with invaluable lessons and opportunities for significant process improvements. We are committed to ensuring the resilience and reliability of our blockchain moving forward.

Special thanks to Santiago & Daan from Parity and Christian from Integritee for their incredible support during this crisis.


If you have any questions or concerns, feel free to reach out to us. Thank you for your continued support and trust.



Insecure randomness algorithm usage

[Medium] Insecure randomness algorithm usage

Summary

The source of randomness in ajuna-awesome-avatars pallet is configured to use the pallet_insecure_randomness_collective_flip generator implemented in Substrate. The output of collective flip is highly predictable as it is based on the last 81 blocks and should not be used as a true source of randomness.
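
The predictability can be illustrated with a small sketch. It does not reproduce the collective flip algorithm; it only assumes a generator whose sole inputs are public block hashes and a known phrase, with seed_from_public_history as a hypothetical stand-in.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a generator that depends only on public data:
/// past block hashes and a caller-supplied phrase.
fn seed_from_public_history(block_hashes: &[[u8; 32]], phrase: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    block_hashes.hash(&mut hasher);
    phrase.hash(&mut hasher);
    hasher.finish()
}
```

Because every input is public, anyone who observes the chain can compute the same seed off-chain before deciding whether to submit a transaction.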

Issue details

The randomness is used in random_hash.

#[inline]
fn random_hash(phrase: &[u8], who: &T::AccountId) -> T::Hash {
    let (seed, _) = T::Randomness::random(phrase);
    let seed = T::Hash::decode(&mut TrailingZeroInput::new(seed.as_ref()))
        .expect("input is padded with zeroes; qed");
    ...
}

Risk

The actual risk depends on how sensitive the generated random data is; we therefore assigned medium severity to this issue.

Mitigation

We recommend using a secure source of randomness, either through an oracle backed by a project like drand or through a secure library.
