For the Delendum Benchmarking project our first benchmarking scenario will be to itera

There is a typo - swapw2 should be <code class="notra

You can try something like: <div class="highlight highlight-source-rust notranslat

When I use the above program, I get <div class="snippet-clipboard-content notransl

Benchmarking 1: Create a hash chain scenario for Rescue Prime of 1, 10, 100, and 1000 hashes about examples HOT 21 CLOSED

Dominik1999 commented on August 29, 2024

Benchmarking 1: Create a hash chain scenario for Rescue Prime of 1, 10, 100, and 1000 hashes

from examples.

Comments (21)

bobbinth commented on August 29, 2024 1

There is a typo - swapw2 should be swapw.2.

from examples.

bobbinth commented on August 29, 2024 1

I removed the push.4.0.0.0 because we wanted to benchmark the systems using an external input of vec![0_u8, 32]. I am not sure if it makes a difference.

We can't do this. These are not a part of the input - these are needed to initialize capacity portion of the hasher state.

Program 1 takes a bit longer to be executed and proven

This is likely because the number of cycles for both programs gets rounded to the same power of two. Using back-of-the-envelope computations I get, for 1000 iterations:

Program 1: 1.1M cycles
Program 2: 1.9M cycles

But both of them get rounded to the next power of two which is about 2M cycles. This happens because trace length must always be a power of two.

from examples.

bobbinth commented on August 29, 2024 1

I removed the push.4.0.0.0 because we wanted to benchmark the systems using an external input of vec![0_u8, 32]. I am not sure if it makes a difference.

We can't do this. These are not a part of the input - these are needed to initialize capacity portion of the hasher state.

btw, we should really add an independently-computed result to make sure the program does the right thing.

from examples.

bobbinth commented on August 29, 2024 1

You can try something like:

let input = [BaseElement::ZERO; 4];
let result1 = Rp64_256::hash_elements(&input);
let result2 = Rp64_256::hash_elements(result1.as_elements());

from examples.

itzmeanjan commented on August 29, 2024

I just wrote assembly routine for computing Rescue Prime digest over 32 -bytes input. See 0xPolygonMiden/miden-vm#576

from examples.

bobbinth commented on August 29, 2024

I think if we need to compute a hash chain, we could do it more efficiently than using a general 1x1 hashing. For example, let's assume we have 4 elements on the stack which are the start of the hash chain (the seed) and that we want to hash it 100 times. The code could look like this:

# stack start: [a3, a2, a1, a0, ...]
push.4.0.0.0
mem_storew.0 # save [4, 0, 0, 0] into memory location 0
swapw
push.0.0.0.0 # => stack state [0, 0, 0, 0, a3, a2, a1, a0, 0, 0, 0, 4, ...]

# compute hash chain
repeat.100
    rpperm
    mem_loadw.0 # overwrite the top word with [4, 0, 0, 0]
    swapw2
    mem_loadw.1 # overwrite top word with [0, 0, 0, 0]
end

# drop everything but the digest from the stack
dropw
swapw
dropw

This whole thing should take about 720 cycles vs. the approach with general 1x1 hash which would take 2100+ cycles.

from examples.

bobbinth commented on August 29, 2024

Actually, the above will probably take closer to 1100 cycles as just 100 hashes will require 800 rows in the hasher chiplet + we need 200 memory accesses, ~100 rows for hashing the program itself.

from examples.

itzmeanjan commented on August 29, 2024

Yes makes sense, much better way to solve this. Though we may consider to expose an API for hashing 32 -bytes ( 4 field elements ), for which we can keep 0xPolygonMiden/miden-vm#576.

from examples.

Dominik1999 commented on August 29, 2024

When I use the above program, I get

thread 'main' panicked at 'Could not compile source: ParsingError("instruction 'swapw2' is invalid")', src/benches/iter_rescue_prime.rs:67:14

Can you provide such a program as @bobbinth sketches using your 1-to-1 RP hash @itzmeanjan ?

I would want to include it right here using the defined input vec![0_u8, 32] as stack_init

from examples.

Dominik1999 commented on August 29, 2024

Indeed, sorry, I could have found that myself.

Ok, so now we have this as our hash chain Miden assembly program:

Program 1:

# stack start: [a3, a2, a1, a0, ...]
begin
    mem_storew.0 # save [4, 0, 0, 0] into memory location 0
    swapw
    push.0.0.0.0 # => stack state [0, 0, 0, 0, a3, a2, a1, a0, 0, 0, 0, 4, ...]

# compute hash chain
    repeat.{}
        rpperm
        mem_loadw.0 # overwrite the top word with [4, 0, 0, 0]
        swapw.2
        mem_loadw.1 # overwrite top word with [0, 0, 0, 0]
    end

    # drop everything but the digest from the stack
    dropw
    swapw
    dropw
end

I removed the push.4.0.0.0 because we wanted to benchmark the systems using an external input of vec![0_u8, 32]. I am not sure if it makes a difference.

But, when I compare Program 1 against

Program 2:

begin
    repeat.{}
        rphash
    end
end

then Program 1 takes a bit longer to be executed and proven. (running it several times, it shows that Program 1 and Program 2 oscillate around the same time)

Program 1 Benchmarks (in the 1000 hashes scenario):

+ begin job_number:   3 iter_rescue_prime
+ job_name:           "iter_rescue_prime"
+ job_size:           1000
+ proof_duration:     1.040473916s
+ verify_duration:    2.318083ms
+ falsify_duration:   2.230666ms
+ output_bytes:       128
+ proof_bytes:        72980
+ end job_number:     3

Program 2 Benchmarks (in the 1000 hashes scenario):

+ begin job_number:   3 iter_rescue_prime
+ job_name:           "iter_rescue_prime"
+ job_size:           1000
+ proof_duration:     1.036488333s
+ verify_duration:    2.303875ms
+ falsify_duration:   2.272583ms
+ output_bytes:       128
+ proof_bytes:        73413
+ end job_number:     3

That should not be the case, right?

from examples.

Dominik1999 commented on August 29, 2024

Maybe we can talk about Program 1 @itzmeanjan and see how we can write it to create the most optimal hash chain for the given input [0u8; 32]? Do you have an idea?

I think it also should involve the RP 1-to-1 hash?

from examples.

itzmeanjan commented on August 29, 2024

I removed the push.4.0.0.0 because we wanted to benchmark the systems using an external input of vec![0_u8, 32]. I am not sure if it makes a difference.

As @bobbinth said it will produce incorrect result.

Note

May be avoid optimized version that @bobbinth suggested ( in #14 (comment) ) during first time and use 0xPolygonMiden/miden-vm#576 which exposes an API for hashing 32 -bytes input ( i.e. 4 field elements on stack, compared to 8 of them expected by rphash ) --- can be better choice, because that API won't let you misuse it and it's highly likely that it will produce correct result, though it's expensive. We can use optimized version in next round.

from examples.

itzmeanjan commented on August 29, 2024

@Dominik1999 this is the version of rescue prime I mentioned in my last message.

from examples.

Dominik1999 commented on August 29, 2024

Ok, it is almost done. However, when I hash using Miden assembly, I get a different result than when I hash in a pure Rust code.

But for me, it was also non-trivial to apply the rescue prime hash function using winter-crypto, so I am sure I made some mistakes in the implementation.

num_iter is the same in both cases it is 1.

Function 1: returns: [11330409471891329886, 5501098472172155749, 3153259110699805053, 2526470497159956267]

let input = vec![0u64; 4];

let source = format!(
            "  
            # stack start: [a3=0, a2=0, a1=0, a0=0, ...]
            begin
                push.4.0.0.0
                mem_storew.0 # save [4, 0, 0, 0] into memory location 0
                swapw
                push.0.0.0.0 # => stack state [0, 0, 0, 0, a3, a2, a1, a0, 0, 0, 0, 4, ...]
            
            # compute hash chain
                repeat.{}
                    rpperm
                    mem_loadw.0 # overwrite the top word with [4, 0, 0, 0]
                    swapw.2
                    mem_loadw.1 # overwrite top word with [0, 0, 0, 0]
                end
            
                # drop everything but the digest from the stack
                dropw
                swapw
                dropw
            end",
            num_iter
        );

Function 2: returns: [3777539536708172870, 16895864046036429333, 80443676073207975, 11079369254521998972]

    fn host_compute(&mut self) -> Option<Self::ComputeOut> {
        let mut input = vec![0u8; 32];
        let mut output = Self::ComputeOut::from(vec![0; 4]);
        
        for _i in 0..self.num_iter{
            let h_output = Rp64_256::hash(&input);
            output.copy_from_slice(&h_output.as_elements().iter().map(|x| x.as_int()).collect::<Vec<u64>>());
            h_output.as_elements().iter().enumerate().for_each(|(i, x)| input[i*8..(i+1)*8].copy_from_slice(&x.as_int().to_be_bytes()));
        }
        
        Some(output)
    }

With Function 2 I got help from @itzmeanjan. Also for Function 2 isn't there an easier way to put the output of the hash as input again?

Any idea?

Note that the input is different in both cases vec![0u64; 4] vs. vec![0u8; 32] which could explain the different result. But it is all 0's anyway.

from examples.

bobbinth commented on August 29, 2024

We should use hash_elements() instead of hash() as these functions work differently when the underlying data contains valid field element. See note on hash output consistency here.

from examples.

Dominik1999 commented on August 29, 2024

OK, I think I am even closer now, but still I get a different result

    fn host_compute(&mut self) -> Option<Self::ComputeOut> {
        let mut input = vec![BaseElement::new(0); 4];
        let mut output = Self::ComputeOut::from(vec![0; 4]);
        
        for _i in 0..self.num_iter{

            let h_output = Rp64_256::hash_elements(&input);
            h_output.as_elements().iter().enumerate().for_each(|(i, x)| input[i].clone_from(&x));

            output.copy_from_slice(&h_output.as_elements().iter().map(|x| x.as_int()).collect::<Vec<u64>>());
        }
        
        Some(output)
    }

returns [2526470497159956267, 3153259110699805053, 5501098472172155749, 11330409471891329886]

I am wondering what the minimal implementation would be to use Rp64_256::hash_elements() 2 times in a row for input = [0; 4].

from examples.

Dominik1999 commented on August 29, 2024

Ok, so I tried three different Rescue Prime Assmebly Programs now and compared it to a Rust Implementation

    let input = vec![BaseElement::ZERO; 4];
    let h_output = Rp64_256::hash_elements(&input);

I get different results.

Rust Hashing: [2526470497159956267, 3153259110699805053, 5501098472172155749, 11330409471891329886]
Assembly Program 1: [11330409471891329886, 5501098472172155749, 3153259110699805053, 2526470497159956267]
Assembly Program 2: [11330409471891329886, 5501098472172155749, 3153259110699805053, 2526470497159956267]
Assembly Program 3: [9233990305047322442, 13303861308640222486, 13320606106853856250, 14241590595577346877]

Here is the gist https://gist.github.com/Dominik1999/a984156dcafdeb9fa2d602ca67e79b74

Assembly Program 1 (Bobbin's Suggestion)
Assembly Program 2 (Anjan's Suggestion)
Assembly Program 3 (Dominik's Suggestion - simply apply rphash)

from examples.

bobbinth commented on August 29, 2024

Seems everything works as expected, right? Programs 1 and 2 give the same result as Rust (the result is reversed because it is in the stack order). Program 3 gives a different result because it hashes 8 elements rather than 4 (if you were to use let input = vec![BaseElement::ZERO; 8]; then assembly output would have been the same as Rust).

from examples.

Dominik1999 commented on August 29, 2024

You are right. Took me a while to remember (and a hint from @itzmeanjan) that Assembly returns the wrong order.

from examples.

itzmeanjan commented on August 29, 2024

You are right. Took me a while to remember (and a hint from @itzmeanjan) that Assembly returns the wrong order.

Commit delendum-xyz/zk-benchmarking@54d5bc0 should fix issues in Rescue Prime hash chain.

from examples.

Dominik1999 commented on August 29, 2024

Closed in delendum-xyz/zk-benchmarking#5

from examples.

Benchmarking 1: Create a hash chain scenario for Rescue Prime of 1, 10, 100, and 1000 hashes about examples HOT 21 CLOSED

Comments (21)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent