
tofnd's Introduction

Tofnd: A gRPC threshold signature scheme daemon

Tofnd is a gRPC server written in Rust that wraps the tofn threshold cryptography library.

Setup

The gRPC protobuf file is a separate submodule. To fetch it, please be sure that the --recursive flag is enabled:

git clone git@github.com:axelarnetwork/tofnd.git --recursive

tofnd uses the hyperium/tonic Rust gRPC implementation, which requires:

  • Rust 1.56 or greater
    $ rustup update
    
  • rustfmt to tidy up the code it generates
    $ rustup component add rustfmt
    

tofnd depends on tofn, which needs the GNU Multiple Precision Arithmetic Library (GMP):

  • MacOS: brew install gmp
  • Ubuntu: sudo apt install libgmp-dev

Build binaries

The pipeline builds binaries for the following OS/architectures:

  • Linux AMD64
  • MacOS AMD64
  • MacOS ARM64

See https://github.com/axelarnetwork/tofnd/releases

For any other OS/Architecture, binaries should be built locally.

Running the server

# install tofnd at ./target/release/tofnd
$ cargo install --path . && cd ./target/release

# init tofnd
$ ./tofnd -m create

# IMPORTANT: store the content of ./.tofnd/export file at a safe, offline place, and then delete the file
$ rm ./.tofnd/export

# start tofnd daemon
$ ./tofnd

Terminate the server with ctrl+C.

Password

By default, tofnd prompts for a password from stdin immediately upon launch. This password is used to encrypt on-disk storage. It is the responsibility of the user to keep this password safe.

Users may automate password entry as they see fit. Some examples follow. These examples are not necessarily secure as written; it is the responsibility of the user to secure password entry.

# feed password from MacOS keyring
$ security find-generic-password -a $(whoami) -s "tofnd" -w | ./tofnd

# feed password from 1password-cli
$ op get item tofnd --fields password | ./tofnd

# feed password from Pass
$ pass show tofnd | ./tofnd

# feed password from environment variable `PASSWORD`
$ echo $PASSWORD | ./tofnd

# feed password from a file `password.txt`
$ cat ./password.txt | ./tofnd
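All of these examples pipe the password into tofnd's standard input, and most of the tools shown (for instance `echo` and `pass show`) append a trailing newline. A minimal sketch of reading such input, assuming the password is the first line of stdin with only its line terminator stripped; `read_password` is a hypothetical helper, not tofnd's actual implementation:

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not tofnd's code): read a password from the first
/// line of `reader`, stripping a trailing "\n" or "\r\n" such as the one
/// added by `echo` or `pass show`. Errors if the input is empty.
pub fn read_password<R: BufRead>(mut reader: R) -> io::Result<String> {
    let mut line = String::new();
    let n = reader.read_line(&mut line)?;
    if n == 0 {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "no password provided",
        ));
    }
    // Remove only the line terminator; other whitespace may be part of
    // the password and is kept.
    if line.ends_with('\n') {
        line.pop();
        if line.ends_with('\r') {
            line.pop();
        }
    }
    Ok(line)
}
```

In a binary this could be called as `read_password(std::io::stdin().lock())`. Stripping only the terminator (rather than calling `trim`) keeps leading or trailing spaces that are genuinely part of the password.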

Sophisticated users may explicitly opt out of password entry with the --no-password flag (see below). In this case, on-disk storage is not secure, and it is the responsibility of the user to take additional steps to secure it.

Command line arguments

We use clap to manage command line arguments.

Users can specify:

  1. Tofnd's root folder. Use --directory or -d to specify a full or a relative path. If no argument is provided, then the environment variable TOFND_HOME is used. If no environment variable is set either, the default .tofnd directory is used.
  2. The IP address of the gRPC server. Use --address or -a (default is 0.0.0.0).
  3. The port number of the gRPC server. Use --port or -p (default is 50051).
  4. Mnemonic operations for their tofnd instance. Use --mnemonic or -m (default is existing). For more information on mnemonic options, see Mnemonic.
  5. The option to run in unsafe mode. By default, this option is off, and safe primes are used for keygen. Attention: Use the --unsafe flag only for testing.
  6. The option to skip the password. By default, tofnd expects a password from the standard input. Users that don't want to use passwords can use the --no-password flag. Attention: Use --no-password only for testing.

Output of tofnd --help:

A threshold signature scheme daemon

USAGE:
    tofnd [FLAGS] [OPTIONS]

FLAGS:
        --no-password    Skip providing a password. Disabled by default. **Important note** If --no-password is set,
                         a default (and public) password is used to encrypt.
        --unsafe         Use unsafe primes. Deactivated by default. **Important note** This option should only be used
                         for testing.
    -h, --help           Prints help information
    -V, --version        Prints version information

OPTIONS:
    -a, --address <ip>              [default: 0.0.0.0]
    -d, --directory <directory>     [env: TOFND_HOME=]  [default: .tofnd]
    -m, --mnemonic <mnemonic>       [default: existing]  [possible values: existing, create, import, export]
    -p, --port <port>               [default: 50051]
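The help output shows `TOFND_HOME` as the environment fallback for `--directory` and `.tofnd` as the default, which suggests clap resolves this in tofnd. The same precedence can be sketched as a plain function; `resolve_root` and `resolve_root_from_env` are hypothetical names used only for illustration:

```rust
use std::path::PathBuf;

/// Hypothetical helper: pick tofnd's root folder using the precedence
/// explicit --directory/-d value, then TOFND_HOME, then the default
/// `.tofnd` (relative to the working directory).
pub fn resolve_root(cli_dir: Option<&str>, tofnd_home: Option<&str>) -> PathBuf {
    cli_dir
        .or(tofnd_home)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(".tofnd"))
}

/// Same as `resolve_root`, but reads TOFND_HOME from the process environment.
pub fn resolve_root_from_env(cli_dir: Option<&str>) -> PathBuf {
    let home = std::env::var("TOFND_HOME").ok();
    resolve_root(cli_dir, home.as_deref())
}
```

Keeping the pure function separate from the environment lookup makes the precedence rules easy to test without mutating process-wide environment variables.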

Docker

Setup

To set up a tofnd container, use the create mnemonic command:

docker-compose run -e MNEMONIC_CMD=create tofnd

This will initialize tofnd, and then exit.

Execution

To run a tofnd daemon inside a container, run:

docker-compose up

Storage

We use data containers to persist data across restarts. To clean up storage, remove all tofnd containers, and run

docker volume rm tofnd_tofnd

Testing

For testing purposes, docker-compose.test.yml is available, which is equivalent to ./tofnd --no-password --unsafe. To spin up a test tofnd container, run

docker-compose -f docker-compose.test.yml up

The auto command

In containerized environments the auto mnemonic command can be used. This command is implemented in entrypoint.sh and does the following:

  1. Try to use an existing mnemonic. If successful, launch the tofnd server.
  2. Try to import a mnemonic from file. If successful, launch the tofnd server.
  3. Create a new mnemonic. The newly created mnemonic is automatically written to the file TOFND_HOME/export; rename this file to TOFND_HOME/import to unblock future executions of tofnd. Then launch the tofnd server.
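The fallback order of auto can be modeled as a small decision function. The actual logic lives in entrypoint.sh; this Rust version, including the `AutoStep` enum and its boolean inputs, is a hypothetical illustration only:

```rust
/// Hypothetical model of the `auto` command's decision sequence.
#[derive(Debug, PartialEq, Eq)]
pub enum AutoStep {
    UseExisting,
    ImportFromFile,
    CreateNew,
}

/// Returns the mnemonic step `auto` settles on before launching the server.
/// `has_mnemonic`: a mnemonic is already stored.
/// `import_ok`: importing from TOFND_HOME/import succeeded
/// (false if the file is missing or does not hold a valid mnemonic).
pub fn auto_step(has_mnemonic: bool, import_ok: bool) -> AutoStep {
    if has_mnemonic {
        AutoStep::UseExisting // 1. existing mnemonic
    } else if import_ok {
        AutoStep::ImportFromFile // 2. import from file
    } else {
        AutoStep::CreateNew // 3. create; written to TOFND_HOME/export
    }
}
```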

The rationale behind auto is that users can frictionlessly launch and restart their tofnd nodes without the need to execute multiple commands. auto is currently the default command only in docker-compose.test.yml, but users can edit the docker-compose.yml to use it at their own discretion.

Attention: auto leaves the mnemonic in plain text on disk. You should remove the TOFND_HOME/import file and store the mnemonic at a safe, offline place.

Mnemonic

Tofnd uses the tiny-bip39 crate to enable users to manage mnemonic passphrases. Currently, each party can use only one passphrase.

The mnemonic is used to enable recovery of shares in case of unexpected loss. See more about recovery under the Recover section.

Mnemonic options

The command line API supports the following commands:

  • Existing Starts the gRPC daemon using an existing mnemonic; Fails if no mnemonic exists.

  • Create Creates a new mnemonic, inserts it in the kv-store, exports it to a file and exits; Fails if a mnemonic already exists.

  • Import Prompts user to give a new mnemonic from standard input, inserts it in the kv-store and exits; Fails if a mnemonic exists or if the provided string is not a valid bip39 mnemonic.

  • Export Writes the existing mnemonic to <tofnd_root>/.tofnd/export and exits; Succeeds when there is an existing mnemonic. Fails if no mnemonic is stored, or the export file already exists.
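The preconditions above can be summarized in code. This sketch assumes the only relevant facts are whether a mnemonic is stored and whether the export file already exists; the bip39 validity check performed by Import is not modeled, and `check_preconditions` is a hypothetical name. The accepted strings match the `--mnemonic` possible values in the help output.

```rust
/// The four mnemonic commands accepted by --mnemonic/-m.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MnemonicCmd {
    Existing,
    Create,
    Import,
    Export,
}

impl std::str::FromStr for MnemonicCmd {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "existing" => Ok(MnemonicCmd::Existing),
            "create" => Ok(MnemonicCmd::Create),
            "import" => Ok(MnemonicCmd::Import),
            "export" => Ok(MnemonicCmd::Export),
            other => Err(format!("invalid mnemonic command: {}", other)),
        }
    }
}

/// Hypothetical precondition check mirroring the rules listed above.
/// `stored`: a mnemonic is already in the kv-store.
/// `export_file_exists`: the export file is already on disk.
pub fn check_preconditions(
    cmd: MnemonicCmd,
    stored: bool,
    export_file_exists: bool,
) -> Result<(), &'static str> {
    match cmd {
        MnemonicCmd::Existing if !stored => Err("no mnemonic exists"),
        MnemonicCmd::Create | MnemonicCmd::Import if stored => Err("a mnemonic already exists"),
        MnemonicCmd::Export if !stored => Err("no mnemonic is stored"),
        MnemonicCmd::Export if export_file_exists => Err("export file already exists"),
        _ => Ok(()),
    }
}
```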

Zeroization

We use the zeroize crate to clear sensitive info from memory as a good practice. The data we clean are related to the mnemonic:

  1. entropy
  2. passwords
  3. passphrases

Note that tiny-bip39 also uses zeroize internally.
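tofnd relies on the zeroize crate for this. To show the underlying idea with only the standard library, the sketch below wipes a buffer with volatile writes followed by a compiler fence, which is the general approach zeroize takes so that the compiler cannot optimize away writes to memory that is never read again. `wipe` and `SecretBytes` are hypothetical names:

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite `buf` with zeros using volatile writes, which the optimizer
/// must not elide even if `buf` is never read afterwards.
pub fn wipe(buf: &mut [u8]) {
    for b in buf.iter_mut() {
        // SAFETY: `b` is a valid, aligned, exclusive reference to a u8.
        unsafe { ptr::write_volatile(b, 0) };
    }
    // Keep later memory operations from being reordered before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical secret wrapper that wipes its bytes when dropped.
pub struct SecretBytes(pub Vec<u8>);

impl Drop for SecretBytes {
    fn drop(&mut self) {
        wipe(&mut self.0);
    }
}
```

Wiping a `Vec` only clears its current allocation; copies left behind by earlier reallocations are not touched, which is one reason to prefer a dedicated crate.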

KV Store

To persist information between different gRPCs (i.e. keygen and sign), we use a key-value store based on sled.

Tofnd uses two separate KV Stores:

  1. Share KV Store. Stores all of the user's shares when the keygen protocol is completed, and uses them for the sign protocol. Default path is ./kvstore/shares.
  2. Mnemonic KV Store. Stores the entropy of a mnemonic passphrase. This entropy is used to encrypt and decrypt users' sensitive info, i.e. the content of the Share KV Store. Default path is ./kvstore/mnemonic.

Security

Important note: Currently, the mnemonic KV Store is not encrypted. The mnemonic entropy is stored in clear text on disk. Our current security model assumes secure device access.

Multiple shares

Multiple shares are handled internally. That is, if a party has 3 shares, the tofnd binary spawns 3 protocol execution threads, and each thread invokes tofn functions independently.

When a message is received from the gRPC client, it is broadcast to all shares. This is done in the broadcast module.

At the end of the protocol, the outputs of all N of the party's shares are aggregated, and a single result is created and sent to the client. There are separate modules, keygen result and sign result, that handle the aggregation of results for each protocol.
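The broadcast-then-aggregate pattern can be sketched with standard-library threads and channels. This is a hypothetical model, not tofnd's code (tofnd runs on tokio): each share thread stands in for a tofn protocol execution and simply sums the bytes it receives, so that the aggregation step has something to collect. `run_shares` is an illustrative name.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical model: one thread per share, every incoming message
/// broadcast to all share threads, per-share outputs aggregated.
/// Returns (share_index, output) pairs sorted by share index.
pub fn run_shares(share_count: usize, incoming: Vec<Vec<u8>>) -> Vec<(usize, u64)> {
    let (result_tx, result_rx) = mpsc::channel::<(usize, u64)>();
    let mut share_txs = Vec::with_capacity(share_count);
    let mut handles = Vec::with_capacity(share_count);

    for share_index in 0..share_count {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        share_txs.push(tx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // Stand-in for a tofn execution: consume messages until the
            // broadcaster closes this share's channel.
            let output: u64 = rx.iter().flatten().map(u64::from).sum();
            result_tx.send((share_index, output)).expect("aggregator alive");
        }));
    }
    drop(result_tx); // only the share threads hold result senders now

    // Broadcast: every client message is delivered to every share.
    for msg in incoming {
        for tx in &share_txs {
            tx.send(msg.clone()).expect("share thread alive");
        }
    }
    drop(share_txs); // close the channels so share threads finish

    // Aggregate one output per share.
    let mut results: Vec<(usize, u64)> = result_rx.iter().collect();
    results.sort_by_key(|&(i, _)| i);
    for h in handles {
        h.join().expect("share thread panicked");
    }
    results
}
```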

For tofn support on multiple shares, see here.

gRPCs

Tofnd currently supports the following gRPCs:

  1. keygen
  2. sign
  3. recover

Keygen and sign use bidirectional streaming and recover is unary.

Diagrams

See a generic protocol sequence diagram, here.

See the keygen and sign diagrams for the detailed message flow of each protocol. If you open the .svg files in a new tab (instead of previewing them on GitHub), hyperlinks will be available that point to the code blocks in which the underlying operations are implemented.

Keygen

The keygen gRPC executes the keygen protocol as implemented in tofn and described in GG20.

Keygen is initialized by the following message:

message KeygenInit {
    string new_key_uid;  // keygen's identifier        
    repeated string party_uids;
    repeated uint32 party_share_counts;
    int32 my_party_index;       
    int32 threshold;
}

Successful keygen

On success, the keygen protocol returns a SecretKeyShare struct defined by tofn:

pub struct SecretKeyShare {
    group: GroupPublicInfo,
    share: ShareSecretInfo,
}

This struct includes:

  1. The information that is needed by the party in order to participate in subsequent sign protocols that are associated with the completed keygen.
  2. The public key of the current keygen.

Since multiple shares per party are supported, keygen's result may produce multiple SecretKeyShares. The collection of SecretKeyShares is stored in the Share KV Store as a single value, under the key key_uid.

Each SecretKeyShare is then encrypted using the party's mnemonic, and the encrypted data is sent to the client as bytes, along with the public key. We send the encrypted SecretKeyShares to facilitate recovery in case of data loss.

The gRPC message of keygen's data is the following:

message KeygenOutput {
    bytes pub_key = 1;                       // pub_key
    repeated bytes share_recovery_infos = 2; // recovery info
}

Unsuccessful keygen

The tofn library supports fault detection. That is, if a party does not follow the protocol (e.g. by corrupting zero-knowledge proofs or stalling messages), a fault detection mechanism is triggered, and the protocol ends prematurely with all honest parties composing a faulter list.

In this case, instead of the aforementioned result, keygen returns a Vec<Faulters>, which is sent over the gRPC stream before closing the connection.

File structure

Keygen is implemented in tofnd/src/gg20/keygen, which has the following file structure:

├── keygen
    ├── mod.rs
    ├── init.rs
    ├── execute.rs
    ├── result.rs
    └── types.rs
  • In mod.rs, the handlers of protocol initialization, execution and aggregation of results are called. Also, in case of multiple shares, multiple execution threads are spawned.
  • In init.rs, the verification and sanitization of the Keygen Init message is handled.
  • In execute.rs, the instantiation and execution of the protocol is actualized.
  • In result.rs, the results of all party shares are aggregated, validated and sent to the gRPC client.
  • In types.rs, useful structs that are needed in the rest of the modules are defined.
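A sketch of the kind of verification and sanitization init.rs is described as doing for KeygenInit. The specific rules below (non-empty and distinct uids, one share count per party, an in-range `my_party_index`, and a threshold `t` with `0 <= t < total shares`) are assumptions for illustration, not a transcription of tofnd's checks; `sanitize_keygen_init` and `SanitizedKeygenInit` are hypothetical names.

```rust
use std::collections::HashSet;

/// Rust mirror of the KeygenInit proto message (field names kept).
pub struct KeygenInit {
    pub new_key_uid: String,
    pub party_uids: Vec<String>,
    pub party_share_counts: Vec<u32>,
    pub my_party_index: i32,
    pub threshold: i32,
}

/// Validated form with indices converted to usize.
#[derive(Debug)]
pub struct SanitizedKeygenInit {
    pub new_key_uid: String,
    pub party_uids: Vec<String>,
    pub party_share_counts: Vec<usize>,
    pub my_index: usize,
    pub threshold: usize,
}

pub fn sanitize_keygen_init(msg: KeygenInit) -> Result<SanitizedKeygenInit, String> {
    let n = msg.party_uids.len();
    if msg.new_key_uid.is_empty() {
        return Err("empty new_key_uid".to_string());
    }
    if n == 0 || msg.party_share_counts.len() != n {
        return Err(format!("{} party uids but {} share counts", n, msg.party_share_counts.len()));
    }
    let distinct: HashSet<&str> = msg.party_uids.iter().map(String::as_str).collect();
    if distinct.len() != n {
        return Err("duplicate party uids".to_string());
    }
    if msg.party_share_counts.contains(&0) {
        return Err("every party needs at least one share".to_string());
    }
    if msg.my_party_index < 0 || msg.my_party_index as usize >= n {
        return Err(format!("my_party_index {} out of range", msg.my_party_index));
    }
    let total: usize = msg.party_share_counts.iter().map(|&c| c as usize).sum();
    // Assumed rule: t + 1 shares are needed to sign, so 0 <= t < total.
    if msg.threshold < 0 || msg.threshold as usize >= total {
        return Err(format!("threshold {} invalid for {} total shares", msg.threshold, total));
    }
    Ok(SanitizedKeygenInit {
        my_index: msg.my_party_index as usize,
        threshold: msg.threshold as usize,
        party_share_counts: msg.party_share_counts.iter().map(|&c| c as usize).collect(),
        new_key_uid: msg.new_key_uid,
        party_uids: msg.party_uids,
    })
}
```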

Sign

The sign gRPC executes the sign protocol as implemented in tofn and described in GG20.

Sign is initialized by the following message:

message SignInit {
    string key_uid = 1;     // keygen's identifier
    repeated string party_uids = 2;
    bytes message_to_sign = 3;
}

Successful sign

On success, the sign protocol returns a signature, which is a Vec<u8>.

Since multiple shares per party are supported, sign's result may produce multiple signatures, which are the same across all shares. Only one copy of the signature is sent to the gRPC client.
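Since every share of a party should yield the same signature, aggregation amounts to checking that the copies agree and forwarding one of them. A minimal sketch; `aggregate_signatures` is hypothetical, and treating a mismatch as an error is an assumed design choice:

```rust
/// Hypothetical sketch: reduce per-share signatures to the single copy
/// sent to the gRPC client. All copies are expected to be identical.
pub fn aggregate_signatures(per_share: Vec<Vec<u8>>) -> Result<Vec<u8>, String> {
    let mut iter = per_share.into_iter();
    let first = iter.next().ok_or_else(|| "no share results".to_string())?;
    for (i, sig) in iter.enumerate() {
        if sig != first {
            // `i + 1` because the first share was consumed above.
            return Err(format!("share {} produced a different signature", i + 1));
        }
    }
    Ok(first)
}
```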

Unsuccessful sign

Similarly to keygen, if faulty parties are detected during the execution of sign, the protocol is stopped and a Vec<Faulters> is returned to the client.

Trigger recovery

Sign is started with the special gRPC message SignInit.

message SignInit {
    string key_uid = 1;
    repeated string party_uids = 2;
    bytes message_to_sign = 3;
}

key_uid indicates the session identifier of an executed keygen. To participate in sign, parties need to have their share info stored in the Share KV Store as the value under the key key_uid. If this data is not present on a party's machine (i.e. key_uid does not exist in the Share KV Store), a need_recover gRPC message is sent to the client and the connection is then closed. The need_recover message includes the missing key_uid.

message NeedRecover {
    string session_id = 1;
}

The client then proceeds by triggering recover gRPC, and then starts the sign again for the recovered party. Other participants are not affected.
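The lookup that decides between running sign and replying with need_recover can be sketched with an in-memory map standing in for the sled-backed Share KV Store. `ShareStore` and its methods are hypothetical; only the `NeedRecover { session_id }` shape comes from the proto message above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the Share KV Store:
/// key_uid -> this party's serialized SecretKeyShares (one per share).
#[derive(Default)]
pub struct ShareStore {
    shares: HashMap<String, Vec<Vec<u8>>>,
}

/// Mirrors the NeedRecover proto message.
#[derive(Debug, PartialEq)]
pub struct NeedRecover {
    pub session_id: String,
}

impl ShareStore {
    /// On keygen completion: store all shares under the keygen's key_uid.
    pub fn put(&mut self, key_uid: &str, shares: Vec<Vec<u8>>) {
        self.shares.insert(key_uid.to_string(), shares);
    }

    /// On SignInit: return the shares for `key_uid`, or a NeedRecover
    /// carrying the missing key_uid so the client can trigger recover.
    pub fn shares_for_sign(&self, key_uid: &str) -> Result<&[Vec<u8>], NeedRecover> {
        self.shares
            .get(key_uid)
            .map(|v| v.as_slice())
            .ok_or_else(|| NeedRecover { session_id: key_uid.to_string() })
    }
}
```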

File structure

The sign protocol is implemented in tofnd/src/gg20/sign, which, similar to keygen, has the following file structure:

├── sign
    ├── mod.rs
    ├── init.rs
    ├── execute.rs
    ├── result.rs
    └── types.rs
  • In mod.rs, the handlers of protocol initialization, execution and aggregation of results are called. Also, in case of multiple shares, multiple execution threads are spawned.
  • In init.rs, the verification and sanitization of the Sign Init message is handled. If the absence of shares is discovered, tofnd sends a need_recover message to the client and stops.
  • In execute.rs, the instantiation and execution of the protocol is actualized.
  • In result.rs, the results of all party shares are aggregated, validated and sent to the gRPC client.
  • In types.rs, useful structs that are needed in the rest of the modules are defined.

Recover

As discussed in the Keygen and Sign sections, the recovery of lost keys and shares is supported. In case of sudden data loss, for example due to a hard disk crash, parties are able to recover their shares. This is possible because each party sends its encrypted secret info to the client before storing it inside the Share KV Store.

When keygen is completed, the party's information is encrypted and sent to the client. When the absence of a party's information is detected during sign, Tofnd sends the need_recover message, indicating that recovery must be triggered.

Recovery is a unary gRPC. The client re-sends the KeygenInit message and the encrypted recovery info. This allows Tofnd to reconstruct the Share KV Store by decrypting the recovery info using the party's mnemonic.

message RecoverRequest {
    KeygenInit keygen_init = 1;
    repeated bytes share_recovery_infos = 2;
}

If recovery is successful, a success message is sent; otherwise Tofnd sends a fail message.

message RecoverResponse {
    enum Response {
        success = 0;
        fail = 1;
    }
    Response response = 1;
}
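A sketch of the recover handler's control flow, with decryption passed in as a closure because the mnemonic-based decryption is out of scope here. The function name, the in-memory map, and the rule that an already-stored key_uid is rejected are all assumptions; the key would come from `keygen_init.new_key_uid`.

```rust
use std::collections::HashMap;

/// Mirrors RecoverResponse.Response.
#[derive(Debug, PartialEq, Eq)]
pub enum Response {
    Success,
    Fail,
}

/// Hypothetical recover handler: decrypt every share_recovery_info and,
/// only if all succeed, store them under `key_uid`.
pub fn recover<F>(
    store: &mut HashMap<String, Vec<Vec<u8>>>,
    key_uid: &str,
    share_recovery_infos: &[Vec<u8>],
    decrypt: F,
) -> Response
where
    F: Fn(&[u8]) -> Option<Vec<u8>>,
{
    // Assumption: never overwrite shares that are already stored.
    if store.contains_key(key_uid) || share_recovery_infos.is_empty() {
        return Response::Fail;
    }
    // Collecting into Option<Vec<_>> yields None if any blob fails.
    let shares: Option<Vec<Vec<u8>>> = share_recovery_infos
        .iter()
        .map(|blob| decrypt(blob.as_slice()))
        .collect();
    match shares {
        Some(shares) => {
            store.insert(key_uid.to_string(), shares);
            Response::Success
        }
        None => Response::Fail,
    }
}
```

Decrypting everything before inserting keeps the store unchanged when any single recovery blob is bad.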

Testing

Honest behaviours

Both unit tests and integration tests are provided:

$ cargo test

Malicious behaviours

Tofn supports faulty behaviours to test fault detection. These behaviours are only supported under the malicious feature. See here for more about Rust features.

Tofnd incorporates the malicious feature. You can run malicious tests by:

$ cargo test --all-features

License

All crates licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

tofnd's People

Contributors

axelar-cicd-bot, cgorenflo, erain9, eranrund, fish-sammy, ggutoski, jcs47, kalidax, milapsheth, riceandmeet, sdaveas, talalashraf, tomas-eminger


tofnd's Issues

mysterious github workflow failures

PR #34 started exhibiting mysterious github workflow errors earlier this afternoon. Example:
https://github.com/axelarnetwork/tofnd/pull/34/checks?check_run_id=2422640452

   Compiling tofn v0.1.0 (ssh://git@github.com/axelarnetwork/tofn.git#140a770c)
error[E0308]: mismatched types
   --> src/tests/tofnd_party.rs:156:30
    |
156 |                     result = res.clone();
    |                              ^^^^^^^^^^^ expected struct `Vec`, found struct `SignResult`
    |
    = note: expected struct `Vec<_>`
               found struct `SignResult`

I can't find any code that looks anything like this in tofnd_party.rs in its entire commit history, nor can I find this code anywhere else in either tofn or tofnd repos at the relevant commits.

I re-ran the relevant command (cargo test --release --all-features) on my local machine to reproduce the error, but everything passed.

Moreover, I reverted to a previous commit, c77416c, for which the GitHub workflow has succeeded in the past, and that workflow now also fails with the exact same error.

wat do

Complete a keygen happy path ceremony in axelar-core using rust-tssd

Current status of rust-tssd is a prototype keygen happy path passing only a minimal test. Next goal is the title: Complete a keygen happy path ceremony in axelar-core using rust-tssd.

Tasks

  • Golang client for rust-tssd. I currently have a local golang client repo for rust-tssd with a basic test; this repo should be moved into the rust-tssd repo itself.
  • Better tests: golang client, rust-tssd, thrush.
  • thrush, rust-tssd should never panic---return errors instead.
  • Rust-tssd does not yet store keys it generates. Add a KV store for (pubkey_name, pubkey_data). Need to search for a good KV store rust library.
  • Swap rust-tssd into axelar-core, run a ceremony. Prerequisite: figure out the deserialization needs of the resulting pubkey in axelar-core.
  • Code review?

Investigate: use Rust type system to facilitate enum variant iteration and string display

Currently we have ugliness in tofnd in order to iterate over malicious behaviours and convert them to/from strings. Whenever MaliciousType is changed we need to update code in two places to reflect the change. Example:

  1. let available_behaviours = [
  2. match behaviour {

We would like a way to automate this process, preferably without modifying tofn.
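One way to get a single source of truth without modifying tofn is a declarative macro on the tofnd side that takes one list of variant names and generates both the iteration list and the string conversions. In the sketch below, `MaliciousType` is a stand-in enum with invented variant names, and the approach assumes fieldless variants (variants carrying data, such as a victim index, would need extra handling). Because `variant_name` uses an exhaustive `match`, a variant added upstream but missing from the list becomes a compile error.

```rust
// Stand-in for an enum defined in another crate (hypothetical variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaliciousType {
    Honest,
    R1BadProof,
    R2FalseShare,
}

/// Generates, from ONE list of variant names, an array of all variants
/// plus to/from string functions.
macro_rules! variant_table {
    ($enum_ty:ident { $($variant:ident),+ $(,)? }) => {
        /// Every variant, for iteration.
        pub const ALL_VARIANTS: &[$enum_ty] = &[$($enum_ty::$variant),+];

        /// Variant -> name. Exhaustive, so missing variants fail to compile.
        pub fn variant_name(v: &$enum_ty) -> &'static str {
            match v {
                $($enum_ty::$variant => stringify!($variant)),+
            }
        }

        /// Name -> variant.
        pub fn parse_variant(s: &str) -> Option<$enum_ty> {
            $(if s == stringify!($variant) {
                return Some($enum_ty::$variant);
            })+
            None
        }
    };
}

// The single place that lists the variants.
variant_table!(MaliciousType { Honest, R1BadProof, R2FalseShare });
```

Crates such as strum offer derive macros (`EnumIter`, `EnumString`, `Display`) for the same purpose, but deriving them would require changing the enum's definition in tofn.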

Update the keygen output proto to also send the public share infos

axelar-core votes on the public key that all parties computed to get consensus. But, we also need to vote on the list of all public share infos that each party sees for consensus. This is not feasible right now, because the share infos are embedded inside the recovery info bytes that are returned to axelar-core. Modify the KeygenOutput proto to also send the list of share infos separately (like we do for the public key) so that it can be voted on.

Change microKV

Overview

We need a safe and persistent key-value store in Rust.

Currently we use microKV, but this library is fairly new, and it is unknown whether it will be maintained in the future.

Since we need both storage and encryption, we have two options:

  1. go for a solution that delivers both, or
  2. use a combination of two different libraries for storage and encryption.

Available solutions

Store w/ encryption (Option 1)

library name   last update   stars   dependent projects   comments
microKV        1mo ago       7       0                    currently used
keyring        5mo ago       100     20                   uses OS keyring under the hood

Store w/o encryption + encryption library (Option 2)

Store libs

library name   last update   stars   dependent projects   type
sled           4mo ago       4.5K    100                  key-value store
rkv            2mo ago       200     6                    key-value store
rocksDB        6mo ago       813     60                   database
levelDB        6mo ago       117     4                    database

Encryption libs

library name   last update   stars   dependent projects   comments
sodiumoxide    8mo ago       600     100                  used by microKV

Discussion

If we choose option 1, keyring delivers a complete solution to our issue. Although it is updated relatively infrequently, an encouraging aspect is its popularity and the fact that a good number of projects already use it.

If we choose option 2, the combination of sled and sodiumoxide seems the most appropriate way to go. Using a database for this could be overkill.

Support for timeout / abort / cleanup

Need to add support for timeouts. User notifies tofnd that a party has timed out (or failed for some arbitrary reason). At this point the protocol must abort and drop everything. Tofnd should do the following:

  1. Clean up the grpc stream
  2. Pass the timeout/abort message to tofn so that tofn can clean up

Eventually we'll need a protobuf API change. That's covered under #55 .
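The requested timeout/abort behaviour can be sketched as a receive loop that ends the session on an explicit abort, a closed stream, or inactivity. This uses std's `Receiver::recv_timeout`; tofnd's real streams are async gRPC streams, and `Traffic`, `SessionEnd`, and `drive_session` are hypothetical. Step 2 (passing the abort on to tofn) is not modeled.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// How a protocol session ended (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum SessionEnd {
    Completed(usize),
    TimedOut,
    Aborted,
}

/// Incoming traffic from the client (hypothetical message type).
pub enum Traffic {
    Msg(Vec<u8>),
    Abort, // client reports that a party timed out or failed
}

/// Process messages until `rounds` have arrived, the client aborts, the
/// stream closes, or nothing arrives within `timeout`. Returning drops
/// `rx` and the local state, which is the cleanup in this sketch.
pub fn drive_session(rx: Receiver<Traffic>, rounds: usize, timeout: Duration) -> SessionEnd {
    let mut received: Vec<Vec<u8>> = Vec::new(); // stand-in protocol state
    while received.len() < rounds {
        match rx.recv_timeout(timeout) {
            Ok(Traffic::Msg(m)) => received.push(m),
            Ok(Traffic::Abort) | Err(RecvTimeoutError::Disconnected) => return SessionEnd::Aborted,
            Err(RecvTimeoutError::Timeout) => return SessionEnd::TimedOut,
        }
    }
    SessionEnd::Completed(received.len())
}
```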

Better error handling in tofnd and tofn

Error handling in tofnd and tofn is currently a mess. Need useful error messages and context. Need to choose a specific error handling pattern and enforce it consistently over both tofnd and https://github.com/axelarnetwork/tofn

Suggestions

implement sign in tofn

Goal

  • Implement sign happy path in tofn. (Then: add sign to tofnd, complete a sign ceremony, etc.)

Tasks

  • KV store for tofnd
  • encrypt sensitive traffic in tofn
  • authentication in axelar-core (see axelarnetwork/axelar-core#157 )
  • tofn, tofnd should never panic
  • design change: merge stateless functions into stateful

Eliminate code duplication in several places

There is repeated code in tofnd. Making use of enums and traits can prevent that.

Some candidates are:

  1. fn check_keygen_results(results: Vec<KeygenResult>, expected_crimes: &[Vec<KeygenCrime>]) -> bool {
    and
    fn check_results(results: Vec<SignResult>, expected_crimes: &[Vec<SignCrime>]) {

  2. async fn execute_keygen(
    and
    async fn execute_sign(

  3. pub(crate) fn should_timeout_keygen(&self, traffic: &proto::TrafficOut) -> bool {
    and
    pub(crate) fn should_timeout_sign(&self, traffic: &proto::TrafficOut) -> bool {

  4. pub(crate) fn spoof_keygen(
    and
    pub(crate) fn spoof_sign(&mut self, traffic: &proto::TrafficOut) -> Option<proto::TrafficOut> {
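A trait with an associated crime type is one way to merge the two result checkers in item 1. The trait `ProtocolResult`, the generic `check_results`, and the `SignOutcome`/`SignCrime` types below are hypothetical stand-ins for tofnd's and tofn's real types:

```rust
/// Hypothetical trait shared by keygen and sign results.
pub trait ProtocolResult {
    type Crime: PartialEq;
    /// `None` on success, `Some(crimes per party)` on failure.
    fn crimes(&self) -> Option<&[Vec<Self::Crime>]>;
}

/// One generic checker: true if every result matches the expectation.
/// An expectation with no crimes for any party means "should succeed".
pub fn check_results<R: ProtocolResult>(results: &[R], expected_crimes: &[Vec<R::Crime>]) -> bool {
    let expect_success = expected_crimes.iter().all(|c| c.is_empty());
    results.iter().all(|r| match r.crimes() {
        None => expect_success,
        Some(crimes) => !expect_success && crimes == expected_crimes,
    })
}

// Hypothetical sign types implementing the trait.
#[derive(Debug, Clone, PartialEq)]
pub enum SignCrime {
    R1BadProof,
}

pub enum SignOutcome {
    Signature(Vec<u8>),
    Criminals(Vec<Vec<SignCrime>>),
}

impl ProtocolResult for SignOutcome {
    type Crime = SignCrime;
    fn crimes(&self) -> Option<&[Vec<SignCrime>]> {
        match self {
            SignOutcome::Signature(_) => None,
            SignOutcome::Criminals(c) => Some(c.as_slice()),
        }
    }
}
```

A keygen result type would implement the same trait with its own crime type, so the checker is written once.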

Investigate test crashing when using tokio core_threads = 4

It was observed that malicious tests fail when multiple core threads are used. Investigate the origin of this failure.

The output is:

h (t,n)=(3,4)"}:{round=3}:Sign:{state="sign [Gus-test-sig] party [uid:D, share:4/4]"}:{round=6}:Keygen:{state="keygen [Gus-test-key] party [B] with (t,n)=(2,4)"}:{round=3}:Sign:{state="sign [Gus-test-sig] party [uid:A, share:1/7]"}:{round=2}:Keygen:{state="keygen [Gus-test-key] party [A] with (t,n)=(6,5)"}:{round=3}:Sign:{state="sign [Gus-test-sig] party [uid:D, share:6/9]"}:{round=2}:Keygen:{state="keygen [Gus-test-key] party [A] with (t,n)=(6,5)"}:{round=3}:Sign:{state="sign [Gus-test-sig] party [uid:A, share:1/8]"}:{round=3}:Keygen:{state="keygen [Gus-test-key] party [C] with (t,n)=(7,8)"}:{round=3}:Sign:{state="sign [Gus-test-sig] party [uid:A, share:1/9]"}:{round=1}: tofnd::gg20: sign failure: "Expected sign init message"
thread 'tokio-runtime-worker' panicked at 'sign failure to complete', src/tests/tofnd_party.rs:167:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: RecvError(())', src/tests/tofnd_party.rs:44:45
kv_manager stop

Single source for tofnd.proto file

We should not be copying the tofnd.proto protocol buffers file between repos. Instead, we should have a single authoritative source for this file. Unfortunately, there are no obvious best options. Current leading contender is to put tofnd.proto into its own separate git repo and include it as a git submodule in axelar-core, tofnd, and wherever else it's needed.

Originally posted by @cgorenflo in axelarnetwork/axelar-core#297 (comment)

Update dependencies

The following dependencies are outdated:

package       our version   latest version
tonic         0.3           0.5.0
funty         1.1           1.2
prost         0.6           0.8
tokio         0.2           1.8
tonic-build   0.3           0.5

Before updating we need to investigate if we have conflicts with some of the new versions since some of the updates are major.

Accommodate Keygen Crimes

Until now, we only accommodate crimes for sign. Add support for keygen crimes, as well.

  • get keygen result in keygen.rs. This has already been done for Sign. We need this to retrieve the result (SecretKeyShare or Crimes) from a keygen execution and check that it was the expected one
  • refactor test cases to support keygen test cases
  • add tests for keygen

Thorough testing of behaviours

We only provide trivial tests for malicious behaviours with a single malicious type per test case. We should expand those into more complex scenarios, and find bugs that are possibly lurking.

This will be unblocked when axelarnetwork/tofn#35 is merged.

Eliminate duplicates in SignResult

The result contains the uid of the criminal one time for each of his shares because all shares that belong to the malicious party are registered as individual criminals, and multiple shares map to the same uid.

Example: if a malicious party A has shares with tofn indices 0,1,2, the criminal_list will contain [0:Malicious, 1:Malicious, 2:Malicious] (which correspond to [tofn_index, CrimeType] for each pair).

In tofnd, we examine the uid of the criminal, rather than his tofn share indices. This means that the criminal vector will become [A:Malicious, A:Malicious, A:Malicious] (which correspond to [tofnd_index, CrimeType] for each pair).

We need to eliminate these duplicates.
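Deduplication needs the mapping from tofn share indices to party uids, which follows from `party_share_counts` (a party with 3 shares owns 3 indices). A sketch, assuming shares are numbered consecutively in party order and keeping only the first crime reported per party; both function names are hypothetical:

```rust
/// Map a tofn share index to its party index, assuming consecutive
/// numbering: counts [3, 1] mean shares 0,1,2 -> party 0, share 3 -> party 1.
pub fn share_to_party(party_share_counts: &[usize], share_index: usize) -> Option<usize> {
    let mut end = 0;
    for (party, &count) in party_share_counts.iter().enumerate() {
        end += count;
        if share_index < end {
            return Some(party);
        }
    }
    None
}

/// Turn a per-share criminal list into a per-party list, keeping the
/// first crime reported for each party uid.
pub fn dedup_criminals<C: Clone>(
    party_uids: &[String],
    party_share_counts: &[usize],
    share_crimes: &[(usize, C)],
) -> Vec<(String, C)> {
    let mut seen = vec![false; party_uids.len()];
    let mut out = Vec::new();
    for (share_index, crime) in share_crimes {
        let party = share_to_party(party_share_counts, *share_index)
            .filter(|&p| p < party_uids.len());
        if let Some(p) = party {
            if !seen[p] {
                seen[p] = true;
                out.push((party_uids[p].clone(), crime.clone()));
            }
        }
    }
    out
}
```

Keeping only the first crime is a simplification; if different shares of one party could be charged with different crimes, collecting a set of crimes per party would preserve that information.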

Fix tofnd test

Currently, tests pass even if a protocol fails. As discussed, we need to fix the test in tofnd_party.rs to make sure keygen (and sign) completes. If not, the test should fail.

Originally posted by @ggutoski in #23 (comment)

Take into account the result of sign in tests

In tofnd tests, we only assert that the protocol terminates gracefully. It would be helpful to also check at the tofnd level that the protocol result is the expected one, although analogous tests already exist in tofn.

This will be unblocked when #34 is merged.

weighted shares failure in axelar-core test

@sammy1991106 reports that a local cluster test with weighted share allocation 1,2,3,4,5 failed to complete keygen. See axelar-core branch feat/weighted-tss-share. Logs and discussion posted to #tofn channel in slack.

This is basically what I did to "fake" the 1, 2, 3, 4, 5 scenario:

[Screenshot: Screen Shot 2021-04-15 at 3.13.29 AM]

Sanitize {key,sig}IDs

IDs are user input fields, so we need to make sure that no malicious behaviour can be injected through these strings.
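A minimal allow-list check for IDs. The accepted alphabet (ASCII letters, digits, `-`, `_`) and the 128-byte limit are assumptions chosen for illustration, and `sanitize_id` is a hypothetical name:

```rust
/// Assumed maximum ID length in bytes.
pub const MAX_ID_LEN: usize = 128;

/// Hypothetical allow-list validation for user-supplied key/sig IDs:
/// non-empty, at most MAX_ID_LEN bytes, ASCII alphanumerics, '-' and '_' only.
pub fn sanitize_id(id: &str) -> Result<&str, String> {
    if id.is_empty() {
        return Err("id is empty".to_string());
    }
    if id.len() > MAX_ID_LEN {
        return Err(format!("id longer than {} bytes", MAX_ID_LEN));
    }
    let bad = id
        .chars()
        .find(|&c| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'));
    if let Some(c) = bad {
        return Err(format!("invalid character {:?} in id", c));
    }
    Ok(id)
}
```

An allow-list is generally safer than trying to enumerate dangerous characters, since anything not explicitly permitted is rejected.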

Make use of test tofn structs

There are some structs in tofn that are defined only for testing purposes. Some of these structs are useful in tofnd tests, as well. Investigate what is the best way to import them. Right now we just duplicate code.

Examples:
https://github.com/axelarnetwork/tofn/blob/f05bb0150261fefd383cb0737c7b5ecd15308612/src/protocol/gg20/sign/malicious/tests/test_cases.rs#L41-L68

Options:

  • check if we can import structs from test modules
  • check if we can expose them in a mod that is visible to tofnd

rust-gmp dependency build break

The build is currently broken on master: https://github.com/axelarnetwork/tofnd/runs/2196540191

#25 125.5 error[E0463]: can't find crate for `serde_derive`
#25 125.5   --> /usr/local/cargo/git/checkouts/rust-gmp-ec80bc1e51a8ce8f/641a89f/src/lib.rs:13:1
#25 125.5    |
#25 125.5 13 | extern crate serde_derive;
#25 125.5    | ^^^^^^^^^^^^^^^^^^^^^^^^^^ can't find crate
#25 125.5 
#25 125.5 error: aborting due to previous error

Research options for coloured terminal output redirected to a file

Log output has pretty colours in the terminal. Unfortunately, when this log output is redirected to a file the resulting text contains unreadable cruft about colour. We had a similar issue with Binance's tss-lib. What to do? Is it possible to get both (1) pretty coloured logs in the terminal, and (2) readable plaintext logs when redirected to file?
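One common answer is to emit ANSI colour codes only when stdout is a terminal, so redirecting to a file yields plain text. A std-only sketch (it requires Rust 1.70+ for `std::io::IsTerminal`; the helper names are hypothetical); it also honours the widely used `NO_COLOR` environment variable convention:

```rust
use std::io::IsTerminal;

/// Wrap `msg` in ANSI green only when `use_color` is true.
pub fn format_line(msg: &str, use_color: bool) -> String {
    if use_color {
        format!("\x1b[32m{}\x1b[0m", msg)
    } else {
        msg.to_string()
    }
}

/// Colour only when stdout is a terminal and NO_COLOR is unset.
/// Redirecting stdout to a file makes `is_terminal()` return false.
pub fn stdout_wants_color() -> bool {
    std::io::stdout().is_terminal() && std::env::var_os("NO_COLOR").is_none()
}
```

If logs go through tracing-subscriber's fmt builder, its `with_ansi(bool)` setting could be fed this value.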

Custom config parameters for tofn malicious behaviours

Tofnd allows the end-user to specify a malicious behaviour in a config file or CLI flag. We currently enforce that all malicious behaviours include a victim arg, even those malicious variants that do not have a victim arg. For those variants, the victim arg is simply ignored:

We do this only because it's easy. Instead, we should demand a victim arg only for those malicious variants that actually need it.

In general, each malicious variant might have its own custom args list. For each such variant, we'll need to write custom code to parse config data for that variant. That could become tedious in the future but we should do it anyway. See discussion in axelarnetwork/tofn#38.

Better handling of test files

Overview

Use testdir to allow for concurrent execution of all async rust tests that create conflicting kv-stores.

Summary

tests/mod.rs::basic_keygen_and_sign and tests/mod.rs::restart_one_party try to create kv-stores under the same directory. So far, we had to toggle these tests so that only one runs at a time.

By using a library such as testdir, we can assign unique directories to each test to enable concurrent execution.
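A std-only sketch of the same idea as testdir: give each test a fresh directory under the system temp dir, named after the test plus the process id and a counter. `unique_test_dir` is a hypothetical helper; crates like testdir or tempfile additionally handle cleanup.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::process;
use std::sync::atomic::{AtomicUsize, Ordering};

static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Create a fresh, unique directory for one test so that concurrently
/// running tests never open kv-stores at the same path.
pub fn unique_test_dir(test_name: &str) -> io::Result<PathBuf> {
    let n = COUNTER.fetch_add(1, Ordering::SeqCst);
    let dir = std::env::temp_dir()
        .join(format!("tofnd-test-{}-{}-{}", test_name, process::id(), n));
    // `create_dir` (not `create_dir_all`) fails if the path already
    // exists, so stale data from an earlier run is never reused.
    fs::create_dir(&dir)?;
    Ok(dir)
}
```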

Sprint 13: swap from tssd to tofnd, begin to harden tofnd

Goal

Swap from tssd to tofnd. All tasks that do not block this goal are postponed. The intent is to have everyone using tofnd exclusively so that we can find pain points and build a track record. Any time remaining in the sprint upon completion of this goal will be spent on non-blocking tasks.

Blocking tasks

  • Do whatever tests are needed to give us confidence that we can swap from tssd to tofnd.
  • Get tofnd working with AWS, axelarate the way tssd currently works.
  • Code review tofn, tofnd. Merge into master.

Non-blocking tasks

A to-do list of ongoing tasks to be rolled over to future sprints.

  • encrypt sensitive traffic in tofn
  • authentication in axelar-core (see axelarnetwork/axelar-core#157 )
  • do not store party IDs in tofnd---store them in axelar-core instead. Restrict the tofnd API so that all party IDs are indices in 0..share_count.
  • tofn, tofnd should never panic
  • tofn keygen code clean: merge stateless functions into stateful
  • KV store: currently using microkv. Seeking alternatives. A leading contender is kv but there are concerns about in-memory security.
  • Eliminate use of zengo crates. Use k256 for math.
  • When to make the switch from GG20 to CGGMP? GG20 is currently incomplete, so perhaps it's best to switch now.
  • What to do about safe primes?
  • Testing/benchmark for 1-25 validators.
  • Edge case testing for bad messages.
  • Cleanup function to clear intermediate state in case of timeouts, if the protocol stalls.
  • Restarts of the nodes: a node had a share, participated in the protocol, went offline, and then restarted. Want to make sure it can continue participating in the protocol.
  • Useful error messages and context. Use thiserror and anyhow crates to reduce boilerplate for error handling as described in Rust: Structuring and handling errors in 2020 - nick.groenen.me

Allow cross-feature tests in github actions

We have GitHub Actions for testing only the regular build. Tests with --features malicious need to be added.

The following options didn't work

- name: Run cargo test
  uses: actions-rs/cargo@v1
  with:
    command: test
    args: --all-features

- name: Run cargo test
  uses: actions-rs/cargo@v1
  with:
    command: test --all-features

Sign stalls with sufficiently large share counts

The following scenario causes Sign to stall:

  • uids: 5
  • party_share_counts: [1,2,3,4,10]
  • threshold: 13
  • sign_participants: [0,1,2,3,4]

Not sure if this is a tofn or tofnd issue, but am leaning towards tofn since the changes there are more drastic.

Last working commits:
tofnd: 3c760
tofn: cde2a

Diffs between the last working and the first non-working commits:
tofnd: https://github.com/axelarnetwork/tofnd/compare/3c76050..be8f15e
tofn: https://github.com/axelarnetwork/tofn/compare/cde2aeb..36a85b0

Even though the last working commit of tofnd works for that scenario, it still fails with [1,2,3,4,20]. A workaround is to increase the capacity of the channel between the sign threads and the aggregator from 4 to 100 messages. Intermediate values might also work but have not been tested. However, this does not fix the problem on the master branch of tofnd.
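For context on why capacity matters: a bounded channel's capacity counts buffered messages, not bytes, and once the buffer is full a blocking send waits until the receiver drains something. If the aggregator is not draining while more sign threads (one per share) try to send, they stall. The sketch below uses the standard library's `std::sync::mpsc::sync_channel` purely as an illustration; tofnd's actual channel type and the function name `accepted_before_full` are assumptions.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: push `n` messages into a bounded channel of the given
/// capacity while nobody drains it, and count how many fit before it is full.
fn accepted_before_full(capacity: usize, n: usize) -> usize {
    // keep `_rx` alive so the channel stays connected for the whole loop
    let (tx, _rx) = sync_channel::<Vec<u8>>(capacity);
    let mut accepted = 0;
    for i in 0..n {
        // message sizes grow on purpose: the byte size does not affect capacity
        match tx.try_send(vec![0u8; 1024 * (i + 1)]) {
            Ok(()) => accepted += 1,
            // a blocking send() would stall at this point instead of returning
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is still alive"),
        }
    }
    accepted
}
```

With capacity 4, only 4 of 10 outputs fit; with capacity 100, all 10 do. Raising the capacity hides the stall without fixing whatever stops the receiver from draining.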

multiple shares breaks under broadcasting of p2p messages

Support for fault attribution requires that all p2p messages are broadcast to all parties. (eg. p2p messages from Alice to Bob should also be delivered to Charlie.) This is now implemented in tofn for both keygen and sign in branch https://github.com/axelarnetwork/tofn/tree/blame

I began modifying tofnd to support this new requirement in branch https://github.com/axelarnetwork/tofnd/tree/blame2 . It works for test cases with only one share per party, but breaks if any party has >1 share. I tried to fix it but gave up. Instead I put a few TODOs in the code to mark potential trouble spots. The blame2 tofnd branch has very few changes from master, so you should be able to see all I've done just by viewing the diffs.

I suspect we might be able to completely eliminate the need for the TofndP2pMsg struct now that all parties need to receive all p2p messages but I'm not certain. Please take a look @sdaveas and see if you can get it working. You now know that part of tofnd better than I do.
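One piece of bookkeeping that becomes error-prone once a party holds more than one share is mapping tofn share indices to tofnd parties: with broadcast p2p, every incoming message must be fanned out to all of the receiving party's local shares. A minimal sketch of that mapping, assuming shares are numbered contiguously in party order; `share_owners` and `local_shares` are hypothetical names, not tofnd code.

```rust
/// Hypothetical: for every tofn share index, the index of the party that owns it.
/// Assumes shares are assigned contiguously in party order,
/// e.g. party_share_counts = [1, 2] -> shares 0, 1, 2 owned by parties 0, 1, 1.
fn share_owners(party_share_counts: &[usize]) -> Vec<usize> {
    party_share_counts
        .iter()
        .enumerate()
        .flat_map(|(party, &count)| std::iter::repeat(party).take(count))
        .collect()
}

/// Hypothetical: the share indices held locally by `party`. Under
/// broadcast-everything, each incoming p2p message is delivered to all of them,
/// not only to the share it was originally addressed to.
fn local_shares(party_share_counts: &[usize], party: usize) -> Vec<usize> {
    share_owners(party_share_counts)
        .into_iter()
        .enumerate()
        .filter(|&(_, owner)| owner == party)
        .map(|(share, _)| share)
        .collect()
}
```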

Make tests run faster

Test cases are taking too much time to complete. An easy way to speed them up is to run each test case as a separate test. That way Rust's parallel test execution will kick in.

Steps

  1. Create an execute_test_case(test_case: &TestCase) function.
  2. Create a separate test for each test case:
    #[test]
    fn test_spoofs() {
        execute_test_case(&SPOOFS);
    }
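The two steps can be sketched as follows. `TestCase`, its fields, and the `BASIC`/`MULTI_SHARE` scenarios are hypothetical placeholders, and this `execute_test_case` only validates the scenario instead of running keygen and sign. The speedup comes from the default `cargo test` harness, which runs separate `#[test]` functions concurrently on multiple threads (tunable with `--test-threads`).

```rust
/// Hypothetical description of one protocol scenario.
struct TestCase {
    uid_count: usize,
    party_share_counts: &'static [usize],
    threshold: usize,
}

// Hypothetical scenarios; MULTI_SHARE mirrors the stalling case reported above.
const BASIC: TestCase = TestCase { uid_count: 3, party_share_counts: &[1, 1, 1], threshold: 1 };
const MULTI_SHARE: TestCase = TestCase {
    uid_count: 5,
    party_share_counts: &[1, 2, 3, 4, 10],
    threshold: 13,
};

/// Step 1: one function that runs a single scenario. This stand-in only
/// checks that the scenario is well-formed.
fn execute_test_case(test_case: &TestCase) {
    let total_shares: usize = test_case.party_share_counts.iter().sum();
    assert_eq!(test_case.party_share_counts.len(), test_case.uid_count);
    assert!(test_case.threshold < total_shares, "threshold must be below the share count");
}

// Step 2: one #[test] per scenario so the harness can run them in parallel.
#[test]
fn test_basic() {
    execute_test_case(&BASIC);
}

#[test]
fn test_multi_share() {
    execute_test_case(&MULTI_SHARE);
}
```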
    

Discussion

Perhaps an even better way to speed things up is to use the core_threads option of tokio. This caused problems in the past (see #58) and needs to be revisited.

Epic: mainnet must-haves

Migrated from Tss checklist notion doc.

  • fault attribution
    • easy faults
    • hard faults, requiring additional rounds of msgs
    • deserialization faults
    • timeouts
    • tests
    • Design note: Begin each round with a list of messages received from other parties. 3 states: timeout (or other declared failure), deserialization failure, or successfully deserialized message.
  • multiple shares per validator for stake-weighted threshold crypto
  • authentication of messages (Stelios)
  • tofn, tofnd should never panic (Stelios?)
  • use sled for KV store (done in #15)
  • better handling of test files. (done in #18)
  • Eliminate use of zengo crates. Use k256 for math. axelarnetwork/tofn#58
  • check for places where we need secure erasures of secrets in memory
  • research logging libraries, choose one and deploy it in tofn and tofnd. #19
  • Sad path should return specific crimes axelarnetwork/tofn#44
  • check the tofn, tofnd APIs to make sure no leakage of secrets is allowed via various queries, "injection attacks" (put a command inside some field to trigger a malicious behavior), etc.
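The design note for fault attribution (three states per incoming message) maps naturally onto an enum. A minimal sketch, assuming a toy wire format where a valid message is exactly 4 bytes; `Incoming`, `deserialize`, and `faulty_senders` are hypothetical names, not tofn's API.

```rust
/// Hypothetical per-sender status at the start of a round, following the design
/// note: timeout (or other declared failure), deserialization failure, or success.
#[derive(Debug, Clone, PartialEq)]
enum Incoming<M> {
    Timeout,
    DeserializationFailure,
    Message(M),
}

/// Hypothetical deserializer: `None` means nothing arrived before the deadline;
/// in this toy format a valid message is exactly 4 little-endian bytes.
fn deserialize(bytes: Option<&[u8]>) -> Incoming<u32> {
    match bytes {
        None => Incoming::Timeout,
        Some(b) if b.len() == 4 => {
            let mut arr = [0u8; 4];
            arr.copy_from_slice(b);
            Incoming::Message(u32::from_le_bytes(arr))
        }
        Some(_) => Incoming::DeserializationFailure,
    }
}

/// Senders that can be blamed before the round logic even runs.
fn faulty_senders<M>(round_input: &[Incoming<M>]) -> Vec<usize> {
    round_input
        .iter()
        .enumerate()
        .filter(|(_, msg)| !matches!(msg, Incoming::Message(_)))
        .map(|(from, _)| from)
        .collect()
}
```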

Keygen enhancements

Nice to have but not needed for mainnet

  • robust keygen: new protocol design. (Do we want robust sign, too?)
  • Useful error messages and context. #28
  • Eliminate confusion on party vs participant indices axelarnetwork/tofn#11
  • git submodule for shared protobuf file #9
    • tofnd: done in #64
    • axelar-core: TODO
  • do not store party IDs in tofnd axelarnetwork/axelar-core#171
  • Switch from GG20 to CGGMP

Facilitate timeout faults

As discussed, we can respond to the Abort message with the list of parties we have not received an answer from.
Note that malicious parties can choose not to respond to the Abort message. We need to think about whether this complicates our task.

Originally posted by @sdaveas in axelarnetwork/tofn#65 (comment)
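Computing the list of parties to report on Abort is straightforward. A minimal sketch, assuming each party tracks the set of party indices it has heard from in the current round; `unresponsive_parties` is a hypothetical name. As noted above, a malicious party may send no list or a false one, so these reports cannot be taken at face value.

```rust
use std::collections::HashSet;

/// Hypothetical: on Abort, report every party (by index) we have not received
/// a message from in the current round, excluding ourselves.
fn unresponsive_parties(party_count: usize, me: usize, received_from: &HashSet<usize>) -> Vec<usize> {
    (0..party_count)
        .filter(|p| *p != me && !received_from.contains(p))
        .collect()
}
```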

Enable sad-path tests via conditional compilation for malicious behaviour

Motivation

We must allow users of tofnd to test the sad path. Thus, we must somehow expose an option for tofnd to behave maliciously (eg. emit a zk proof or commitment that fails to verify). There is a danger that this malicious behaviour could be triggered by accident in real-world use. To minimize this danger I propose we use conditional compilation. That way, malicious behaviour is possible only if the tofnd binary was built to enable it.

Tasks

  • Research options for conditional compilation
  • Implement conditional compilation at the appropriate parts in tofnd to enable malicious behaviour
  • Whatever solution we implement will probably need to be replicated in tofn, too. Talk to @ggutoski about this. I suspect tofn will be easier than tofnd.

Suggestions for first steps

I think cargo "features" is the way to go. Enable malicious behaviour at compile time:

cargo build --features malicious

Recommended reading:

  1. Features - The Cargo Book
  2. Conditional compilation - The Rust Reference
  3. Traps to avoid: Cargo [features] explained with examples - DEV Community

Implementation suggestions

Tofnd currently instantiates a non-malicious tofn::Sign struct here:

From tofnd/src/gg20/sign.rs, lines 220 to 224 at commit be8f15e:

let mut sign = Sign::new(
    &secret_key_share,
    &participant_tofn_indices,
    &message_to_sign,
)?;

Tofn needs to expose a malicious version (eg. SignMaliciousR1BadProof). Tofnd should use conditional compilation here to instantiate SignMaliciousR1BadProof instead of Sign.

Unfortunately, I see no way to achieve our goal without modifying happy-path production code to facilitate it. My first instinct is to replace the tofn method call Sign::new with a call to a factory function like this:

fn new_sign( /* same args as Sign::new */ ) -> Box<dyn SignOutputter>

where SignOutputter is a new trait that replaces the concrete tofn type Sign. (What is Box and dyn? You can learn here: Implementing an Object-Oriented Design Pattern - The Rust Programming Language. Feel free to chat with @ggutoski about it.)

This new trait should:

  1. have the Protocol trait defined in tofn as a supertrait (i.e. SignOutputter: Protocol), so that every SignOutputter is also a Protocol.
  2. have a single method clone_output() that returns the output of the completed protocol.

Happy-path tofn Sign already has a clone_output method, so it's trivial to make Sign implement the new SignOutputter trait. (Until recently, this method was named get_result.)

I don't know whether the new SignOutputter trait should be defined in tofn or tofnd.
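A minimal sketch of the proposed trait and factory. `Protocol` and `Sign` here are simplified stand-ins, not tofn's real types (the real `Protocol` has more methods, and the real `clone_output` returns tofn's sign output rather than bytes). The point is that `Protocol` is a supertrait of `SignOutputter`, so both sets of methods are callable through a `Box<dyn SignOutputter>`, and the `ok_or(...)?` call site does not change.

```rust
/// Stand-in for tofn's Protocol trait (assumption: one method only).
trait Protocol {
    fn expecting_more_msgs_this_round(&self) -> bool;
}

/// Proposed trait: Protocol is its supertrait.
trait SignOutputter: Protocol {
    fn clone_output(&self) -> Option<Vec<u8>>;
}

/// Stand-in for tofn's happy-path Sign.
struct Sign {
    output: Option<Vec<u8>>,
}

impl Protocol for Sign {
    fn expecting_more_msgs_this_round(&self) -> bool {
        self.output.is_none()
    }
}

impl SignOutputter for Sign {
    fn clone_output(&self) -> Option<Vec<u8>> {
        self.output.clone()
    }
}

/// Factory: callers only see the trait object, never the concrete type.
fn new_sign(message_to_sign: &[u8]) -> Box<dyn SignOutputter> {
    // fake "signature" (reversed message) so the example has an output
    Box::new(Sign { output: Some(message_to_sign.iter().rev().cloned().collect()) })
}

/// Mirrors the existing call site in tofnd.
fn finish(sign: &dyn SignOutputter) -> Result<Vec<u8>, &'static str> {
    if sign.expecting_more_msgs_this_round() {
        return Err("sign protocol has not finished");
    }
    Ok(sign.clone_output().ok_or("sign output is `None`")?)
}
```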

Sign::clone_output is currently called here in tofnd:

Ok(sign.clone_output().ok_or("sign output is `None`")?)

There should be no need to modify this line of code to accommodate this proposal.

Confine conditional compilation to one source file

I suggest all conditional compilation be confined to a single new source file (eg. malicious_feature.rs or something). That's where the new new_sign factory function should go. It'll look something like this:

// sad path: "malicious" feature is enabled
#[cfg(feature = "malicious")]
fn new_sign( /* args */ ) -> Box<dyn SignOutputter> {
  Box::new( SignMaliciousR1BadProof::new( /* args */ ) )
}

// happy path: "malicious" feature is disabled
#[cfg(not(feature = "malicious"))]
fn new_sign( /* args */ ) -> Box<dyn SignOutputter> {
  Box::new( Sign::new( /* args */ ) )
}

Print a warning to stdout on launch

Currently, on startup tofnd prints something like the following to stdout:

tofnd listen addr 0.0.0.0:50051, use ctrl+c to shutdown
kv_manager found existing db [.kvstore]

Code to print this message occurs in main.rs here:

From tofnd/src/main.rs, lines 23 to 26 at commit be8f15e:

println!(
    "tofnd listen addr {:?}, use ctrl+c to shutdown",
    incoming.local_addr()?
);

When compiled in malicious mode, it should instead look something like this:

WARNING: THIS tofnd BINARY WAS COMPILED IN 'MALICIOUS' MODE. MALICIOUS BEHAVIOUR IS INTENTIONALLY INSERTED INTO SOME MESSAGES. THIS BEHAVIOUR WILL CAUSE OTHER tofnd PROCESSES TO IDENTIFY THE CURRENT PROCESS AS MALICIOUS.
tofnd listen addr 0.0.0.0:50051, use ctrl+c to shutdown
kv_manager found existing db [.kvstore]

As suggested above, malicious code should be confined to a single new source file. Thus, the above println! in main.rs should be replaced with a function call (something like print_startup_message) that's implemented elsewhere with conditional compilation to print the warning whenever the "malicious" feature is set.
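A minimal sketch of such a function, assuming a `malicious` cargo feature; `startup_message`, `print_startup_message`, and `MALICIOUS_WARNING` are hypothetical names. Only the constant is conditionally compiled, so the banner logic itself is identical in both builds. Without `--features malicious`, the `not(...)` branch is compiled and no warning is printed.

```rust
use std::net::SocketAddr;

// Compiled only into binaries built with `--features malicious`.
#[cfg(feature = "malicious")]
const MALICIOUS_WARNING: Option<&str> = Some(
    "WARNING: THIS tofnd BINARY WAS COMPILED IN 'MALICIOUS' MODE. \
     MALICIOUS BEHAVIOUR IS INTENTIONALLY INSERTED INTO SOME MESSAGES.",
);

// Default (happy-path) builds carry no warning.
#[cfg(not(feature = "malicious"))]
const MALICIOUS_WARNING: Option<&str> = None;

/// Builds the startup banner; the warning line is the only build-dependent part.
fn startup_message(addr: SocketAddr) -> String {
    let listen = format!("tofnd listen addr {}, use ctrl+c to shutdown", addr);
    match MALICIOUS_WARNING {
        Some(warning) => format!("{}\n{}", warning, listen),
        None => listen,
    }
}

/// Would replace the println! currently in main.rs.
fn print_startup_message(addr: SocketAddr) {
    println!("{}", startup_message(addr));
}
```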

keygen improvements

user-story:

  • when a user wants to become a validator, they run Paillier keygen and record their public key on the axelar-chain. No user can become a validator without registering this key first.

  • either through axelar / or tss itself, users can export their Paillier keys as a mnemonic (24-48 words...).

  • at the end of each keygen, encrypted shares are recorded on chain (shares are encrypted under Paillier key with the corresponding proofs of correctness).

  • recovery mode: I was a validator in the past, I lost my keys, but I have my mnemonic. I want to re-register as a validator and recover a) my Paillier key and b) my master key shares (if any) from the encryptions on chain.

  • do we add new queries at the core to import/export keys?

don't expect results of malicious parties in tests

When a crime is detected, parties can immediately stop executing the protocol. Although this is expected behaviour on tofn's side, our test framework in tofnd expects all threads to return. The problem manifests when criminals are exposed: honest parties stop, but criminals keep waiting for messages. As a result, some tests run forever.

A temporary, hacky solution is to give criminals multiple shares. That way each of a criminal's shares can notify the others that the protocol is over. A proper solution is to decouple the protocol-related data from the communication-related data inside the TofndParty struct:

pub(super) struct TofndParty {
    db_name: String,
    client: proto::gg20_client::Gg20Client<tonic::transport::Channel>,
    server_handle: JoinHandle<()>,
    server_shutdown_sender: oneshot::Sender<()>,
    server_port: u16,
    #[cfg(feature = "malicious")]
    pub(super) malicious_data: PartyMaliciousData,
}

When this is done, we will not need to wait for threads to return in order to reclaim their resources (as is done below).

From tofnd/src/tests/mod.rs, lines 467 to 474 at commit 4179451:

// execute the protocol in a spawn
let handle = tokio::spawn(async move {
    let result = party
        .execute_sign(init, channel_pair, delivery, &participant_uid)
        .await;
    (party, result)
});
sign_join_handles.push((i, handle));
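The underlying issue is that a party blocked on a receive never learns that the protocol is over. One way for it to find out is for the communication handles to be dropped when the honest parties finish: with `std::sync::mpsc`, `recv()` returns `Err(RecvError)` once every sender is gone, instead of blocking forever. This sketch uses std threads and channels, not tofnd's tokio/gRPC plumbing, and `run_stalled_party` and `WaitOutcome` are hypothetical names.

```rust
use std::sync::mpsc;
use std::thread;

/// What the waiting ("criminal") party observed.
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Received(u32),
    ChannelClosed,
}

/// Hypothetical test harness: one party keeps waiting for messages while
/// `honest_parties` senders each send one message and then stop.
fn run_stalled_party(honest_parties: usize) -> Vec<WaitOutcome> {
    let (tx, rx) = mpsc::channel::<u32>();
    let waiter = thread::spawn(move || {
        let mut seen = Vec::new();
        loop {
            match rx.recv() {
                Ok(msg) => seen.push(WaitOutcome::Received(msg)),
                // every sender was dropped: the protocol is over, so return
                Err(mpsc::RecvError) => {
                    seen.push(WaitOutcome::ChannelClosed);
                    return seen;
                }
            }
        }
    });
    let mut handles = Vec::new();
    for i in 0..honest_parties {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            tx.send(i as u32).unwrap();
            // honest party stops here; its sender is dropped with the closure
        }));
    }
    drop(tx); // the harness drops its own sender as well
    for h in handles {
        h.join().unwrap();
    }
    waiter.join().unwrap() // returns instead of hanging
}
```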

add extra logging information to help with debugging

Currently, we lack visibility into where tss is in a computation, which prevents us from debugging quickly.
Since keygen and sign are computationally heavy protocols, it would be useful to log more details about our progress during the computations.
e.g., for keygen with 25 validators, we could capture:

  • "starting keygen phase 1, generating 1 broadcast message."
  • "finished generating phase 1 messages."
  • "starting keygen phase 2, generating 25 p2p messages."
  • "finished generating 12/25 messages."
  • "finished generating all phase 2 messages.", etc.

Maybe this needs to be captured in tofn rather than tofnd.
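A minimal sketch of emitting the phase 2 progress messages listed above. The function name, the placeholder message contents, and the choice of logging only at the halfway mark are assumptions; the logger is passed in as a closure to keep the example dependency-free, whereas real code would likely use a logging crate.

```rust
/// Hypothetical phase 2 step: generate one p2p message per peer and report
/// progress through `log`.
fn generate_p2p_messages(peer_count: usize, mut log: impl FnMut(String)) -> Vec<Vec<u8>> {
    log(format!("starting keygen phase 2, generating {} p2p messages.", peer_count));
    let mut msgs = Vec::with_capacity(peer_count);
    for i in 0..peer_count {
        msgs.push(vec![i as u8]); // placeholder for the expensive computation
        let done = i + 1;
        // log once at the halfway mark so output stays short for large peer counts
        if done == peer_count / 2 {
            log(format!("finished generating {}/{} messages.", done, peer_count));
        }
    }
    log("finished generating all phase 2 messages.".to_string());
    msgs
}
```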

Attempt to InitKeygen for a duplicate key ID fails silently

Description/Reasoning

  • When keygen starts all parties store the keygen in their KV store
  • When an already existing key ID is used for keygen, keygen init fails in tofnd and the socket connection to the client is closed without any message.
  • This happens when a key ID is reused for keygen init, regardless of keygen success or failure

i.e. A key ID can never be reused to start the keygen operation

Current Behaviour

  • In case of a duplicate key ID, a tofnd client (vald) has no way of differentiating between an error (keygen failure: "kv_manager key eth-master-E53757BAC7 already reserved") and a network send/receive omission failure.

  • This causes axelard to commit the KeygenInit request, responding with message broadcast success

  • This causes c2d2 controller to believe keygen is in progress, then wait until timeout (e.g. 3 hours) to consider the keygen a de-facto failure.

    • This situation led c2d2 into a stuck state where it could not complete key rotation, because it kept reusing the same key ID until it could generate a key with that ID successfully.
  • A workaround is used in c2d2 to address this: we always randomize the new key ID after keygen failure.

    • This is a necessary mechanism at the application layer regardless (sergey to add rationale where appropriate), but tofnd should make this failure case explicit to the client regardless. KeygenInit should not leave the client in an undefined state.

Expected Behaviour

  • KeygenInit should not leave the client in an undefined state.
  • @sdaveas Stelios @ggutoski
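One way to make this failure explicit is for the KV reservation to return a typed error that tofnd forwards to the client (for example as an explicit failure response or gRPC status) instead of closing the stream silently. A minimal sketch with an in-memory map; `KvStore`, `KeygenInitError`, and `handle_keygen_init` are hypothetical names, and the error text mirrors the log line quoted above.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, PartialEq)]
enum KeygenInitError {
    KeyAlreadyReserved(String),
}

impl fmt::Display for KeygenInitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            KeygenInitError::KeyAlreadyReserved(key) => {
                write!(f, "kv_manager key {} already reserved", key)
            }
        }
    }
}

/// Hypothetical in-memory stand-in for tofnd's KV store.
#[derive(Default)]
struct KvStore {
    entries: HashMap<String, Option<Vec<u8>>>,
}

impl KvStore {
    /// Reserve a key ID; fails if the ID was ever reserved before.
    fn reserve(&mut self, key: &str) -> Result<(), KeygenInitError> {
        match self.entries.entry(key.to_string()) {
            Entry::Occupied(_) => Err(KeygenInitError::KeyAlreadyReserved(key.to_string())),
            Entry::Vacant(slot) => {
                slot.insert(None); // value is filled in when keygen completes
                Ok(())
            }
        }
    }
}

/// The error becomes part of the reply to the client instead of being swallowed.
fn handle_keygen_init(kv: &mut KvStore, key_id: &str) -> Result<(), String> {
    kv.reserve(key_id).map_err(|e| e.to_string())
}
```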

Steps to reproduce (for bugs)

  1. Start a keygen
  2. wait for keygen success or force a failure
  3. attempt keygen with same keyID

Relevant Logs or Files

Migrate to new tofn criminal return type

tofn has now migrated SignOutput sad-path return type from Vec<Criminal> to Vec<Vec<Crime>>. Need to upgrade tofnd to accept this new return type. The only use of tofn's Criminal is in proto_helpers.rs:

use tofn::protocol::{gg20::sign::SignOutput, CrimeType, Criminal};

We can then delete the Criminal struct from tofn as well.
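A minimal sketch of adapting the new shape, assuming the outer index of `Vec<Vec<Crime>>` is the index of the accused party and an empty inner vector means "no crimes". `Crime` and `ReportedCriminal` are hypothetical stand-ins for tofn's crime type and the per-criminal entry tofnd reports, not the real definitions.

```rust
/// Stand-in for tofn's crime type (variant names are hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Crime {
    BadProof,
    BadCommitment,
}

/// Stand-in for the per-criminal entry tofnd sends to its client.
#[derive(Debug, PartialEq)]
struct ReportedCriminal {
    party_index: usize,
    crime: Crime,
}

/// Flatten per-party crime lists into one entry per (party, crime) pair.
fn criminals_from_crimes(crimes: &[Vec<Crime>]) -> Vec<ReportedCriminal> {
    crimes
        .iter()
        .enumerate()
        .flat_map(|(party_index, party_crimes)| {
            party_crimes
                .iter()
                .map(move |crime| ReportedCriminal { party_index, crime: crime.clone() })
        })
        .collect()
}
```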

Ensure correct sender in tofn traffic

Currently, the sender of a tofnd message is not authenticated. Thus, malicious parties could spoof messages from other parties.

Tofnd currently processes incoming traffic as follows: we ignore all fields of TrafficIn messages except the binary payload, which is passed directly to tofn for deserialization:

From tofnd/src/gg20/protocol.rs, lines 106 to 117 at commit 56068f8:

while protocol.expecting_more_msgs_this_round() {
    let traffic = chan.receiver.next().await.ok_or(format!(
        "{}: stream closed by client before protocol has completed",
        round
    ))?;
    if traffic.is_none() {
        warn!("ignore incoming msg: missing `data` field");
        continue;
    }
    let traffic = traffic.unwrap();
    protocol.set_msg_in(&traffic.payload)?;
}

This binary payload contains a from field indicating to tofn the tofn index of the message sender. See the tofn code:

struct MsgMeta {
    msg_type: MsgType,
    from: usize,
    payload: MsgBytes,
}

It is easy for a malicious actor to dig into the binary payload and spoof this from field and therefore send messages on behalf of other parties.

Instead of ignoring TrafficIn metadata, tofnd must somehow verify that the from_party_uid field in TrafficIn is consistent with the from field wrapped in the tofn binary payload.

Tofn currently does not expose the format of the binary payload. Thus, in order to enable tofnd to perform the above mentioned authentication, tofn must expose the necessary data in its API. This requirement is posted in a separate issue in the tofn repo: axelarnetwork/tofn#42
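Once tofn exposes the embedded `from` index, the consistency check itself is small. A minimal sketch, assuming tofnd knows each party's uid and share count, and that shares are numbered contiguously in party order; `sender_is_consistent` is a hypothetical name. Note that this check only ties a message to the claimed party; it relies on `from_party_uid` itself being authenticated upstream.

```rust
/// Hypothetical check: accept a message only if the tofn `from` index embedded
/// in the payload belongs to the party named in TrafficIn's from_party_uid.
fn sender_is_consistent(
    party_uids: &[&str],
    party_share_counts: &[usize],
    from_party_uid: &str,
    payload_from: usize,
) -> bool {
    let mut start = 0;
    for (uid, &count) in party_uids.iter().zip(party_share_counts) {
        // assumed contiguous range of tofn indices owned by this party
        let range = start..start + count;
        if *uid == from_party_uid {
            return range.contains(&payload_from);
        }
        start += count;
    }
    false // unknown uid: reject
}
```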
