a16z / jolt
The simplest and most extensible ZK VM. Fast and fully open source from a16z crypto and friends. ⚡
Home Page: https://jolt.a16zcrypto.com
License: MIT License
I cloned the repo and ran the first command in the README. It turns out it requires the latest Rust (nightly). I was using Rust installed via brew on my Mac, and the build failed:
```
jon@Jonathans-MBP-3 Lasso % git log -1
commit 97ccfd9ac1ec5e261c2f0fd45d5b494cc324b3ac (HEAD -> master, origin/master, origin/HEAD)
Author: Michael Zhu <[email protected]>
Date:   Fri Aug 11 09:59:15 2023 -0400

    Update README.md

jon@Jonathans-MBP-3 Lasso % cargo --version
cargo 1.71.0 (cfd3bbd8f 2023-06-08)
jon@Jonathans-MBP-3 Lasso % cargo build --release
   Compiling ark-lasso v0.4.0 (/Users/jon/src/Lasso)
error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:1
  |
3 | #![feature(extend_one)]
  | ^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:4:1
  |
4 | #![feature(associated_type_defaults)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:6:1
  |
6 | #![feature(generic_const_exprs)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:12
  |
3 | #![feature(extend_one)]
  |            ^^^^^^^^^^

For more information about this error, try `rustc --explain E0554`.
error: could not compile `ark-lasso` (lib) due to 4 previous errors
```
The solution is to install nightly via rustup (I answered yes to overwriting the brew install):

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup default nightly
cargo build --release
```
Zeromorph is described in this paper: https://eprint.iacr.org/2023/917. But it's very focused on hiding/zk, which is not an initial priority for Jolt.
HyperKZG is described and implemented here: https://github.com/microsoft/Nova/blob/main/src/provider/hyperkzg.rs
Switching to these commitment schemes will reduce verifier costs to logarithmic and slightly speed up the prover too.
Currently we call `SubtableStrategy::materialize_subtables` before doing any of the dereferencing work, creating `NUM_SUBTABLES * MEMORY_SIZE` subtable evaluations up front; this work is not a blocker.

Initially this strategy was used for Spark tables, which can only be computed in O(memory_size) time if computed all at once, but we no longer have this restriction: most of our SOS tables can compute a single T[k] evaluation in O(1) time without computing all other M evaluations.
We could explore a lazy subtable evaluation strategy where we only evaluate T[k] as needed and cache the results. This may result in speedups for tables where T[k] is expensive to compute.
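A minimal sketch of what such a lazy, cached strategy could look like. The `LazySubtable` type and its per-entry `eval` rule (AND over the two packed 8-bit halves of the index) are illustrative assumptions, not code from the repo:

```rust
use std::collections::HashMap;

/// Hypothetical lazy subtable: evaluates T[k] on demand and caches results.
/// `eval` stands in for a per-entry materialization rule; the real rule is
/// table-specific.
struct LazySubtable {
    cache: HashMap<usize, u64>,
}

impl LazySubtable {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// O(1) evaluation of a single entry, no full materialization.
    fn eval(index: usize) -> u64 {
        let x = (index >> 8) as u64 & 0xff;
        let y = index as u64 & 0xff;
        x & y
    }

    fn get(&mut self, index: usize) -> u64 {
        *self.cache.entry(index).or_insert_with(|| Self::eval(index))
    }
}

fn main() {
    let mut t = LazySubtable::new();
    // Only the entries we actually touch are ever computed.
    assert_eq!(t.get((0b1010 << 8) | 0b0110), 0b0010);
    assert_eq!(t.cache.len(), 1);
}
```

The trade-off is exactly the one the issue raises: for tables where T[k] is cheap, the hash-map overhead may exceed the cost of dense materialization, so this only pays off for expensive entries.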
Proof recursion will eventually be needed for on-chain Lasso proofs.
There are multiple verifications being done within Lasso and only some will need to be 'made into circuits' for this to work.
Why?
I want to open this ticket up early as it should influence design decisions such as hash selection.
Also, if there is already a plan for this I would love to learn about it!
This is a major issue involving the following sub-issues:
- Switch the finite field to GF(2^128).
  - This will force changing all of the "subtables and combining functions" in the instruction lookups (Lasso).
  - It will also require modest changes to the Lasso lookup argument (described in the Binius paper, Section 4.4).
- RISC-V addition and multiplication will no longer be amenable to solution via lookups. We will instead use the gadgets (in this case, meaning constraint systems) given for them in the Binius paper, Section 5.3. This involves the same changes/approach as supporting precompiles (Issue 210).
- It also probably makes sense to handle XOR operations without lookups, since they will correspond to addition over GF(2^128).
- Implement recursion/continuations. This will require precompiles for native field arithmetic over GF(2^128), and for Keccak or Grøstl hashing (see the Binius paper, Appendix A).
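The XOR point follows from the field's characteristic: in GF(2^128), addition is coefficient-wise addition mod 2, i.e. bitwise XOR on a `u128` bit-representation, so an XOR instruction needs no lookup table at all. A minimal sketch:

```rust
/// In GF(2^128), field addition is carry-less: it is bitwise XOR on the
/// u128 bit-representation. Every element is its own additive inverse,
/// so subtraction is also XOR.
fn gf2_128_add(a: u128, b: u128) -> u128 {
    a ^ b
}

fn main() {
    let (a, b) = (0xdead_beefu128, 0x1234_5678u128);
    assert_eq!(gf2_128_add(a, a), 0); // x + x = 0
    assert_eq!(gf2_128_add(gf2_128_add(a, b), b), a); // adding b twice cancels
}
```

Multiplication in GF(2^128), by contrast, is carry-less polynomial multiplication reduced by an irreducible polynomial, which is why it still needs a gadget or lookup.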
Currently each dimension (1..C) is processed serially, but this can be parallelized.
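Since each dimension produces an independent result, the loop can fan out across threads. The codebase gates parallelism behind a "multicore" feature (typically via rayon); this stdlib-only sketch with scoped threads just illustrates the shape, and `process_dimension` is a hypothetical stand-in for the per-dimension work:

```rust
use std::thread;

/// Hypothetical per-dimension work: each of the C dimensions produces an
/// independent result, so the dimensions can run on separate threads.
fn process_dimension(dim: usize) -> Vec<u64> {
    (0..4).map(|i| (dim * 10 + i) as u64).collect()
}

fn process_all(c: usize) -> Vec<Vec<u64>> {
    let mut out: Vec<Vec<u64>> = vec![Vec::new(); c];
    thread::scope(|s| {
        for (dim, slot) in out.iter_mut().enumerate() {
            // Each thread writes to a disjoint &mut slot -- no locking needed.
            s.spawn(move || *slot = process_dimension(dim));
        }
    });
    out
}

fn main() {
    let res = process_all(4);
    assert_eq!(res[2], vec![20, 21, 22, 23]);
}
```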
Surge makes extensive use of const generics to allow statically sized arrays throughout, rather than `Vec`s or slices. In order to use this pattern through our generic `SubtableStrategy` traits we require `feature(generic_const_exprs)`, which is highly unstable. Beyond the instability, we have strange artifacts on many of our impls, such as `[(); S::NUM_SUBTABLES]: Sized`. These tell the compiler that the const generic expression is bounded.

It's likely that the overhead of using slices and length asserts everywhere is small enough to be irrelevant, so it's worth considering whether the benefit is worth the downside.
Currently N, the size of the big table (N = M^c), is stored in various places as a u64. We'd like to explicitly support a big table size of N = 2^128. We should be able to do this by storing only M (the subtable size) and c, never explicitly computing N.
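The key observation is that log2(N) = c * log2(M) stays tiny even when N itself overflows every machine integer, so all size arithmetic can be done on exponents. A small sketch (function names are illustrative):

```rust
/// Work with log2 sizes instead of N itself: log2(N) = c * log2(M),
/// which stays small even when N = M^c overflows a u64.
fn log2_table_size(log2_m: u32, c: u32) -> u32 {
    c * log2_m
}

fn main() {
    // M = 2^16, c = 8  =>  N = 2^128, which a u64 cannot hold...
    assert_eq!((1u64 << 16).checked_pow(8), None);
    // ...but its log2 is perfectly representable.
    assert_eq!(log2_table_size(16, 8), 128);
}
```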
For a 64-core machine at a cycle count of ~16M, Jolt spends ~1.8% of its time in a segment called `compute_chunks_operands` (here). This segment allocates and computes C different chunks for each instruction. For example, for the EQ instruction we split the input operands X, Y into 4 8-bit chunks (WORD_SIZE / C). We can then compute EQ over each chunk individually.
Idea for acceleration: split `chunks_x` and `chunks_y` into mutable slices, iterate over each, and compute the values in parallel, writing to the slice indices directly.
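A sketch of that idea under stated assumptions: 32-bit operands, C = 4 8-bit chunks, most-significant chunk first (the ordering and helper name are illustrative, not the repo's). Because the two output slices are disjoint, the writes need no synchronization:

```rust
use std::thread;

/// Split a 32-bit operand into C chunks of WORD_SIZE / C bits each,
/// writing directly into a caller-provided mutable slice.
fn compute_chunks(x: u32, out: &mut [u8]) {
    let c = out.len();
    let bits = 32 / c;
    for (i, slot) in out.iter_mut().enumerate() {
        let shift = bits * (c - 1 - i); // most-significant chunk first
        *slot = ((x >> shift) & ((1u32 << bits) - 1)) as u8;
    }
}

fn main() {
    let (x, y) = (0xAABB_CCDDu32, 0x1122_3344u32);
    let mut chunks_x = [0u8; 4];
    let mut chunks_y = [0u8; 4];
    // chunks_x and chunks_y are disjoint, so both writes can run in parallel.
    thread::scope(|s| {
        s.spawn(|| compute_chunks(x, &mut chunks_x));
        s.spawn(|| compute_chunks(y, &mut chunks_y));
    });
    assert_eq!(chunks_x, [0xAA, 0xBB, 0xCC, 0xDD]);
    assert_eq!(chunks_y, [0x11, 0x22, 0x33, 0x44]);
}
```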
It may be helpful to review the tracing strategy for performance testing.
Understanding Jolt verifier complexity in terms of RISC-V instructions will help us understand recursive properties and on-chain proving complexity.
Some ideas:
Hi, developers of Lasso, I recently got to know and learn about this impressive project.

Now that I have an idea of how to write a subtable, I'm getting excited and want to solve some more complicated problems with the help of the subtables in `src/subtables`, like calculating `((a AND b) XOR b) OR a` with the help of the `AND`, `OR`, and `XOR` subtables.

But I don't have a strong background in lookup arguments or Surge, so I don't even know where to start. If Lasso can be used to solve real problems like this, could anybody help me with it? A doc or tutorial would help a lot. Thanks in advance.
Many zkVMs today use continuations, which means they break the execution of a computer program into chunks and prove each chunk independently before recursively aggregating the proofs into one. One major reason to do this is to control the prover's space requirements.
Continuations can be implemented especially efficiently when Jolt is combined with a homomorphic commitment scheme like HyperKZG or Zeromorph.
Still, implementing this performantly will be a major endeavor and will rely on some upcoming research.
`GrandProducts::build_grand_product_inputs` can easily be parallelized over the sparsity, for ~20% savings at S=2^20. Gate the feature using "multicore".
This issue is rather a question. The `JoltInstruction` trait contains a function called `combine_lookups` (here). I understand this is meant to represent the polynomial g from the Lasso and Jolt preprints (e.g. Jolt preprint, Definition 2.6), and it receives the lookups `T_i(r_j)` of suitable chunks in suitable subtables. So `combine_lookups` should not, e.g., use the instruction's input data such as `self.0` (in the implementors that have such a field) directly, correct?

Also, the `degree` function in the `JoltInstruction` trait should return the degree of the polynomial g computed by `combine_lookups`. But is this degree variable-wise, or monomial-wise? In other words, should `g(x_1, x_2) = x_1 * x_2` return 1 or 2? I'm guessing the latter, since the Jolt paper stipulates g should be a multilinear polynomial and hence have variable-wise degree <= 1.

Thank you for sharing this excellent library!
`ReadWriteMemory::get_polys_r1cs()` is extremely slow (here) because it clones all of the underlying vectors not once but twice. On a ~16M cycle count trace on a 64-core machine it takes 2% of end-to-end time.

First: remove the double clones of `self.v_read` and `self.v_write_rd`. The double clones come from a clone in `DensePolynomial::evals()` and then another in `.flat_map(...).collect()`.

Second: refactor `get_polys_r1cs()` to return lifetimed references to the underlying vectors via `DensePolynomial::evals_ref()` rather than cloning/reallocating at all.
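A minimal sketch of the before/after shape, with a toy `DensePolynomial` over `u64` standing in for the real field-element type (the struct and methods here are illustrative stand-ins, not jolt-core's actual definitions):

```rust
/// Toy stand-in for a dense polynomial; the real type lives in jolt-core.
struct DensePolynomial {
    z: Vec<u64>,
}

impl DensePolynomial {
    /// Clones the whole evaluation vector -- the pattern the issue flags.
    fn evals(&self) -> Vec<u64> {
        self.z.clone()
    }

    /// Borrows instead: no allocation; the caller's use is tied to &self.
    fn evals_ref(&self) -> &[u64] {
        &self.z
    }
}

/// Flattening via evals() clones every polynomial and then collect()
/// allocates again; flattening over evals_ref() copies each element once.
fn flatten(polys: &[DensePolynomial]) -> Vec<u64> {
    polys
        .iter()
        .flat_map(|p| p.evals_ref().iter().copied())
        .collect()
}

fn main() {
    let polys = vec![
        DensePolynomial { z: vec![1, 2] },
        DensePolynomial { z: vec![3, 4] },
    ];
    assert_eq!(flatten(&polys), vec![1, 2, 3, 4]);
    assert_eq!(polys[0].evals(), vec![1, 2]); // cloning path, for comparison
}
```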
It may be helpful to review the tracing strategy for performance testing.
Jolt currently uses the grand product argument from Thaler13, which has roughly log^2(n)-sized proofs for products of n values. Section 6 of the Quarks paper reduces this proof size to close to log(n), with a modest increase in prover commitment costs.
More instructions: The Jolt codebase currently implements the RISC-V 32-bit Base Integer instruction set (RV32I), but the Jolt construction is highly flexible. Add support for the RISC-V “M” extension for integer multiplication and division, as described in the Jolt paper. Further, Jolt can trivially support the 64-bit variant RV64IM.
This is newly initialized, so it would have len 0.

```rust
let mut rand: Vec<F> = Vec::new();
assert_eq!(rand.len(), rand_prod.len());
```
`DensifiedRepresentation::from` can be parallelized across the C dimensions. Use the `multicore` feature to gate.
Feature gate with "multicore"
M > 2^20 crashes Lasso.
This is likely due to OOM when constructing the dereferenced memory tuples (address, value, timestamp).
We can likely detect that this will happen ahead of time based on the parameters and provide a more useful error.
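A sketch of such a preflight check. The 3-field, 8-bytes-per-field tuple model and the budget constant are illustrative assumptions, not measured numbers from the codebase:

```rust
/// Rough preflight check before materializing the (address, value,
/// timestamp) tuples, returning a descriptive error instead of OOMing.
fn check_memory_params(m: u64, budget_bytes: u64) -> Result<(), String> {
    let tuple_bytes = 3 * 8; // (address, value, timestamp), 8 bytes each
    match m.checked_mul(tuple_bytes) {
        Some(needed) if needed <= budget_bytes => Ok(()),
        Some(needed) => Err(format!(
            "materializing memory tuples needs ~{needed} bytes, budget is \
             {budget_bytes}; reduce M or raise the limit"
        )),
        None => Err("memory requirement overflows u64; M is far too large".to_string()),
    }
}

fn main() {
    assert!(check_memory_params(1 << 20, 1 << 30).is_ok());
    assert!(check_memory_params(1 << 40, 1 << 30).is_err());
}
```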
Currently Surge supports being used as a multi-dimensional sparse polynomial commitment scheme, known in the Lasso paper as "Spark". To support this, we generate and pass through C different log2(sparsity)-sized randomness vectors for evaluating the multilinear extension of the sparse polynomial.
Currently we generate both `eq_randomness: Vec<F>` of size log-s and `spark_randomness: [Vec<F>; C]` with each vec of size log-m. For Surge we use only `eq_randomness`.

The `SubtableStrategy::materialize_subtables` and `SubtableStrategy::evaluate_subtable_mle` function signatures depend on `r: &[Vec<F>; C]` despite it being an unused parameter in all subtable strategies beyond Spark. This means that we pass `spark_randomness` deep into prover and verifier workflows.

Spark is useful in its own right, and support would be nice, but it does carry a substantial readability/maintainability cost.
Incorporating precompiles into Jolt is essential to achieving developer adoption, since all zkVMs, Jolt included, are simply too slow today without them.
The most natural approach is to have the prover "collect" all the inputs to the various invocations of the precompile together and apply (a data-parallel version) of the precompile to the collected inputs.
If the precompile consists of a hand-specified constraint system (which is how the current generation of zkVMs implements precompiles), we would have developers specify the constraint system in an appropriate constraint specification language, and then apply a back-end for that constraint system to prove satisfiability.
Today’s zkVMs all use AIR as the constraint system. We could reuse these precompiles, applying BabySpartan or SuperSpartan (our sum-check-based SNARKs for AIR and Plonkish constraint systems) to prove their satisfiability.
Long-term, the fastest precompiles will be special-purpose sum-check-based SNARKs that avoid constraint systems entirely. In this case, we’d probably have developers specify them by expressing the prover and verifier directly, in Rust.
For a 64-core machine at a cycle count of ~16M, Jolt spends ~4% of its time in a segment called `InstructionLookups::polynomialize` (here). This segment allocates and computes the offline memory checking (a, v, t) polynomials and other inputs for the Jolt instruction execution. Currently much of the time is spent serially. We should parallelize this to get a speedup as the core count increases.
Similar to #292.
It may be helpful to review the tracing strategy for performance testing.
`iai-callgrind` provides high-precision, high-granularity performance benchmarks of Rust code (using Valgrind under the hood). It would be nice to have `iai-callgrind` benchmarks for hot paths in our code, including:
The on-chain verifier will be much cheaper after switching the polynomial commitment scheme to one with logarithmic verifier costs (see this other issue)
Hi,
I was looking into the benchmarks and subtables examples.
Thank you for the great work 🙏
Are the indices actually the lookups? If so, a brief explanation of how they work would be helpful. For example, using "And(2^128, 2^4)", how would one see that the specific value [1010] is in the table? I.e., in a zkVM you could take as input [0,1] and [1,0], AND them, and get [0,0]. Using the Lasso lib, how would you check that [0,0] is in the table?
https://github.com/a16z/Lasso/blob/b0db15bb83c81912ba514b614e60b4d3f5f42e7e/src/benches/bench.rs#L13
It is also used here as x_indices and y_indices with no explanation of what these things are.
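For intuition, here is a hedged sketch of the usual construction for an AND subtable: a lookup index packs the two operand chunks, and the table entry stored at that index is their AND, so "checking the value is in the table" is just indexing. The `(x << b) | y` bit layout and helper names are illustrative assumptions, not necessarily Lasso's exact encoding:

```rust
/// For an AND subtable over b-bit chunks, a lookup index packs the two
/// operand chunks as (x << b) | y.
fn and_index(x: u8, y: u8, b: u32) -> usize {
    ((x as usize) << b) | (y as usize)
}

/// The materialized table entry at that index is x & y.
fn and_entry(index: usize, b: u32) -> u8 {
    let x = (index >> b) as u8;
    let y = (index & ((1 << b) - 1)) as u8;
    x & y
}

fn main() {
    // Operands [1,0,1,0] and [0,1,1,0] as 4-bit chunks:
    let (x, y) = (0b1010u8, 0b0110u8);
    let idx = and_index(x, y, 4);
    // The "lookup" is just indexing: the entry at idx equals x & y.
    assert_eq!(and_entry(idx, 4), 0b0010);
}
```

Under this reading, `x_indices` and `y_indices` would be the per-chunk halves of such packed indices, one per dimension.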
Add verifier time and size to benchmarks
Add more descriptive errors to errors.rs
Running a 1024k trace length program gets the following error:

```
thread 'main' panicked at tracer/src/emulator/mmu.rs:489:22:
Unknown memory mapping 5050505.
```
https://github.com/a16z/Lasso/blob/jolt/tracer/src/emulator/mmu.rs#L489
Some preliminary investigation follows (mostly to demonstrate methods).

To find bloated libraries: `cargo bloat --release --crates`

Find the source of these deps with: `cargo tree -i <dependency>`

Most of these can be traced back to `cranelift-codegen <- wasmer <- circom-scotia`. We don't use wasm in circom-scotia, so we can likely kill this dependency with proper feature gating (potentially upstream).

We can profile the whole build with `cargo build -p jolt-core --timings --release`. This reveals a ton of time spent building dependencies that we don't care about.
Hey @sragss @moodlezoup,

The write-up says Lasso is using Hyrax. Hyrax is actually a scheme from this paper: https://eprint.iacr.org/2017/1132.pdf. The same paper introduced a polynomial commitment scheme for multilinear polynomials, and I am not an expert on naming, but I am told the PCS in that paper is referred to as Hyrax-PCS 😉.
Write-up:
https://github.com/a16z/Lasso/blob/master/EngineeringOverview.md
`DotProductProofLog`: reading the code makes sense. But is this Hyrax, or just an efficient way to commit to polynomial vectors using a Bulletproofs-style IPA?
I was directed to another version of Spartan when asking around about Hyrax-PCS:
https://github.com/microsoft/Spartan2/tree/hyrax_pc2/src/provider
[EDIT]
Re-read the Hyrax paper. This all makes sense now.
This issue can be removed.
Thanks!
The current Rust verifier is not optimized at all (no parallelization, etc.)
Surge uses the dense multilinear polynomial commitment scheme from microsoft/Spartan.
We can swap this PCS with any other dense multilinear PCS for different performance characteristics. There are not many other dense multilinear PCS written in Rust, but it's worth generalizing the interface.
Currently `M` (the size of memory) is a const generic parameter on the `SubtableStrategy` trait. `M` is only used for the bit packing in `AndSubtableStrategy::combine_lookups`. It will likely be used by some subset of the yet-to-be-created tables, but not all.

Should `M` be moved to a generic on the `AndSubtableStrategy` impl? This would allow us to remove the `M` generic in many other places.
src/subprotocols/mod.rs
We'd like to support the use of multiple SubtableStrategies in recursive fashion to allow a single proof of lookups over multiple large tables. The strategy for combining large tables is identical to that of the combination of Subtables described in Surge with the addition of some "selector bits" out front.
We use `EqPolynomial::evals` everywhere. On inspection of the Gantt chart or direct tracing logs, you can see how often it is used. Any improvements here would speed up the system significantly.

Best steps are:
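For context, the standard O(2^n) "doubling" construction of these evaluations can be sketched as follows; `f64` stands in for a field element here, so this is an illustration of the algorithm, not the repo's implementation:

```rust
/// Compute eq(r, x) for all x in {0,1}^n: start from [1] and, for each
/// r_i, split every partial product into its (1 - r_i) and r_i halves.
fn eq_evals(r: &[f64]) -> Vec<f64> {
    let mut evals = vec![1.0];
    for &ri in r {
        let mut next = Vec::with_capacity(evals.len() * 2);
        for &e in &evals {
            next.push(e * (1.0 - ri)); // x_i = 0 branch
            next.push(e * ri); //         x_i = 1 branch
        }
        evals = next;
    }
    evals
}

fn main() {
    let evals = eq_evals(&[0.25, 0.5, 0.75]);
    assert_eq!(evals.len(), 8);
    // eq is the MLE of the Kronecker delta, so the evaluations sum to 1.
    let sum: f64 = evals.iter().sum();
    assert!((sum - 1.0).abs() < 1e-12);
}
```

The work is dominated by the 2^n multiplications in the final rounds, which is where parallelizing or batching would help most.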
When building, the following error occurs:

```
package `clap_derive v4.4.2` cannot be built because it requires rustc 1.70.0 or newer, while the currently active rustc version is 1.66.0-nightly
```

A solution could be to update the version of the (nightly) rust-toolchain used.
Fix Github Workflows to run tests, potentially some linting, and add badge to README
For a 64-core machine at a cycle count of ~16M, Jolt spends ~3.5% of its time in a segment called `memory_trace_processing` (here). This segment allocates and computes the offline memory checking (a, v, t) polynomials used for the combined registers and RAM. Some additional details can be found in the wiki.

Currently this ~300-line segment takes ~3.5% of end-to-end time because it is computed completely serially, with no use of additional CPU cores. We should parallelize this to get up to a `NUM_CPU`x speedup.

The goal is to fill out the following.

Trace-length sized:
- a_ram
- v_read
- v_write_rd
- v_write_ram
- t_read
- t_write_ram

Memory sized:
- v_final
- t_final
It may be helpful to review the tracing strategy for performance testing.
PolynomialRepresentation has become a grab bag of storage for all of Jolt. We can refactor to make more specific structs throughout.
`benches/dimensions.rs` was intended to help understand the performance of various subtables and parts of the prover/verifier workflow. Unfortunately Criterion is quite bad at long-running benchmarks (see), leading `src/bench.rs` to become the main source of benchmarks.

`sudo cargo flamegraph` gives some idea of the work, but requires `export RAYON_NUM_THREADS=1` to be useful and does not give a good idea of system-level parallelized performance. Additionally, `bench.rs` does not have the proper granularity to understand where time is spent.
Within a given round of sumcheck, we compute evaluations of a polynomial after the `combine` or `g` function is applied. The number of points we compute is determined by the degree of the resulting polynomial, as n+1 points uniquely define a degree-n polynomial. We can compute evaluations at any points, so we've chosen (0, 1, 2, ..., degree).

Currently we naively compute these points, but there are lots of optimizations here, particularly for (0, 1). See `// TODO(#28)` for details.
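The "n+1 points pin down a degree-n polynomial" fact is just Lagrange interpolation; a minimal sketch over `f64` (standing in for a field element), recovering a quadratic from its values at 0, 1, 2:

```rust
/// Lagrange-interpolate the unique polynomial of degree < xs.len()
/// through the points (xs[i], ys[i]) and evaluate it at `at`.
fn lagrange_eval(xs: &[f64], ys: &[f64], at: f64) -> f64 {
    let mut acc = 0.0;
    for (i, (&xi, &yi)) in xs.iter().zip(ys).enumerate() {
        let mut basis = 1.0;
        for (j, &xj) in xs.iter().enumerate() {
            if i != j {
                basis *= (at - xj) / (xi - xj);
            }
        }
        acc += yi * basis;
    }
    acc
}

fn main() {
    // p(x) = 3x^2 + 2x + 1, sampled at 0, 1, 2 (three points, degree 2).
    let p = |x: f64| 3.0 * x * x + 2.0 * x + 1.0;
    let xs = [0.0, 1.0, 2.0];
    let ys = [p(0.0), p(1.0), p(2.0)];
    // Those three evaluations determine p everywhere, e.g. at x = 5.
    assert!((lagrange_eval(&xs, &ys, 5.0) - p(5.0)).abs() < 1e-9);
}
```

This is also why (0, 1) are special: the verifier's consistency check uses s(0) + s(1), so those two evaluations are computed every round and are worth optimizing first.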
Throughout Surge we overload the name `r` to mean various different pieces of verifier-provided (Fiat-Shamir) randomness. We should standardize the naming of these different `r` parameters and clean up the references.
In the build section of https://github.com/a16z/jolt?tab=readme-ov-file#build, `cargo build --profile build-fast jolt-core` should change to `cargo build --profile build-fast -p jolt-core`.
Zeromorph paper: https://eprint.iacr.org/2023/917
This will make the Jolt proofs much shorter than when using the Hyrax commitment, and will also slightly speed up the prover (Hyrax commitments are big enough that serializing their group elements actually has a time cost).
A Rust implementation of Zeromorph is here, so it just needs to be integrated into Jolt (specifically, ported to arkworks): https://github.com/lurk-lab/arecibo/blob/dev/src/provider/non_hiding_zeromorph.rs
A Solidity implementation of the Zeromorph verifier (plus sum-check) is here: https://github.com/Maddiaa0/honk-verifier