a16z / jolt
The simplest and most extensible ZK VM. Fast and fully open source from a16z crypto and friends. ⚡
Home Page: https://jolt.a16zcrypto.com
License: MIT License
I cloned the repo and ran the first command in the README. It turns out it requires the latest Rust (nightly). I was using Rust installed via brew on my Mac, and the build failed:
```
jon@Jonathans-MBP-3 Lasso % git log -1
commit 97ccfd9ac1ec5e261c2f0fd45d5b494cc324b3ac (HEAD -> master, origin/master, origin/HEAD)
Author: Michael Zhu <[email protected]>
Date:   Fri Aug 11 09:59:15 2023 -0400

    Update README.md

jon@Jonathans-MBP-3 Lasso % cargo --version
cargo 1.71.0 (cfd3bbd8f 2023-06-08)
jon@Jonathans-MBP-3 Lasso % cargo build --release
   Compiling ark-lasso v0.4.0 (/Users/jon/src/Lasso)
error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:1
  |
3 | #![feature(extend_one)]
  | ^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:4:1
  |
4 | #![feature(associated_type_defaults)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:6:1
  |
6 | #![feature(generic_const_exprs)]
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: `#![feature]` may not be used on the stable release channel
 --> src/lib.rs:3:12
  |
3 | #![feature(extend_one)]
  |            ^^^^^^^^^^

For more information about this error, try `rustc --explain E0554`.
error: could not compile `ark-lasso` (lib) due to 4 previous errors
```
The solution is to install nightly via rustup (I answered yes to overwriting the brew install):

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
rustup default nightly
cargo build --release
```
Zeromorph is described in this paper: https://eprint.iacr.org/2023/917. But it's very focused on hiding/zk, which is not an initial priority for Jolt.
HyperKZG is described and implemented here: https://github.com/microsoft/Nova/blob/main/src/provider/hyperkzg.rs
Switching to these commitment schemes will reduce verifier costs to logarithmic and slightly speed up the prover too.
Currently we call `SubtableStrategy::materialize_subtables` before doing any of the dereferencing work, creating `NUM_SUBTABLES * MEMORY_SIZE` subtable evaluations up front; this work is not a blocker.

Initially this strategy was used for Spark tables, which can only be computed in O(memory_size) time if computed all at once, but we no longer have this restriction: most of our SOS tables can compute a single T[k] evaluation in O(1) time without computing all other M evaluations.
We could explore a lazy subtable evaluation strategy where we only evaluate T[k] as needed and cache the results. This may result in speedups for tables where T[k] is expensive to compute.
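A minimal sketch of what such a lazy, cached strategy could look like. The `LazySubtable` type and its per-entry `eval` rule (AND over the two packed 8-bit halves of the index) are illustrative assumptions, not code from the repo:

```rust
use std::collections::HashMap;

/// Hypothetical lazy subtable: evaluates T[k] on demand and caches results.
/// `eval` stands in for a per-entry materialization rule; the real rule is
/// table-specific.
struct LazySubtable {
    cache: HashMap<usize, u64>,
}

impl LazySubtable {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// O(1) evaluation of a single entry, no full materialization.
    fn eval(index: usize) -> u64 {
        let x = (index >> 8) as u64 & 0xff;
        let y = index as u64 & 0xff;
        x & y
    }

    fn get(&mut self, index: usize) -> u64 {
        *self.cache.entry(index).or_insert_with(|| Self::eval(index))
    }
}

fn main() {
    let mut t = LazySubtable::new();
    // Only the entries we actually touch are ever computed.
    assert_eq!(t.get((0b1010 << 8) | 0b0110), 0b0010);
    assert_eq!(t.cache.len(), 1);
}
```

The trade-off is exactly the one the issue raises: for tables where T[k] is cheap, the hash-map overhead may exceed the cost of dense materialization, so this only pays off for expensive entries.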
Proof recursion will eventually be needed for on-chain Lasso proofs.
There are multiple verifications being done within Lasso and only some will need to be 'made into circuits' for this to work.
Why?
I want to open this ticket up early as it should influence design decisions such as hash selection.
Also, if there is already a plan for this I would love to learn about it!
This is a major issue involving the following sub-issues:
- Switch the finite field to GF(2^128).
  - This will force changing all of the "subtables and combining functions" in the instruction lookups (Lasso).
  - It will also require modest changes to the Lasso lookup argument (described in the Binius paper, Section 4.4).
- RISC-V addition and multiplication will no longer be amenable to solution via lookups. We will instead use the gadgets (in this case, meaning constraint systems) given for them in the Binius paper, Section 5.3. This involves the same changes/approach as supporting precompiles (Issue 210).
- It also probably makes sense to handle XOR operations without lookups, since they will correspond to addition over GF(2^128).
- Implement recursion/continuations. This will require precompiles for native field arithmetic over GF(2^128), and for Keccak or Grøstl hashing (see the Binius paper, Appendix A).
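The XOR point follows from the field's characteristic: in GF(2^128), addition is coefficient-wise addition mod 2, i.e. bitwise XOR on a `u128` bit-representation, so an XOR instruction needs no lookup table at all. A minimal sketch:

```rust
/// In GF(2^128), field addition is carry-less: it is bitwise XOR on the
/// u128 bit-representation. Every element is its own additive inverse,
/// so subtraction is also XOR.
fn gf2_128_add(a: u128, b: u128) -> u128 {
    a ^ b
}

fn main() {
    let (a, b) = (0xdead_beefu128, 0x1234_5678u128);
    assert_eq!(gf2_128_add(a, a), 0); // x + x = 0
    assert_eq!(gf2_128_add(gf2_128_add(a, b), b), a); // adding b twice cancels
}
```

Multiplication in GF(2^128), by contrast, is carry-less polynomial multiplication reduced by an irreducible polynomial, which is why it still needs a gadget or lookup.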
Currently each dimension (1..C) is processed serially, but this can be parallelized.
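Since each dimension produces an independent result, the loop can fan out across threads. The codebase gates parallelism behind a "multicore" feature (typically via rayon); this stdlib-only sketch with scoped threads just illustrates the shape, and `process_dimension` is a hypothetical stand-in for the per-dimension work:

```rust
use std::thread;

/// Hypothetical per-dimension work: each of the C dimensions produces an
/// independent result, so the dimensions can run on separate threads.
fn process_dimension(dim: usize) -> Vec<u64> {
    (0..4).map(|i| (dim * 10 + i) as u64).collect()
}

fn process_all(c: usize) -> Vec<Vec<u64>> {
    let mut out: Vec<Vec<u64>> = vec![Vec::new(); c];
    thread::scope(|s| {
        for (dim, slot) in out.iter_mut().enumerate() {
            // Each thread writes to a disjoint &mut slot -- no locking needed.
            s.spawn(move || *slot = process_dimension(dim));
        }
    });
    out
}

fn main() {
    let res = process_all(4);
    assert_eq!(res[2], vec![20, 21, 22, 23]);
}
```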
Surge makes extensive use of const generics to allow statically sized arrays throughout, rather than `Vec`s or slices. In order to use this pattern through our generic `SubtableStrategy` traits we require `feature(generic_const_exprs)`, which is highly unstable. Beyond the instability, we have strange artifacts on many of our impls, such as `[(); S::NUM_SUBTABLES]: Sized`. These tell the compiler that the const generic expression is bounded.

It's likely that the overhead of using slices and length asserts everywhere is small enough to be irrelevant, so it's worth considering whether the benefit is worth the downside.
Currently N, the size of the big table (N = M^c), is stored in various places as a u64. We'd like to explicitly support a big table size of N = 2^128. We should be able to do this by storing only M (the subtable size) and c, never explicitly computing N.
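The key observation is that log2(N) = c * log2(M) stays tiny even when N itself overflows every machine integer, so all size arithmetic can be done on exponents. A small sketch (function names are illustrative):

```rust
/// Work with log2 sizes instead of N itself: log2(N) = c * log2(M),
/// which stays small even when N = M^c overflows a u64.
fn log2_table_size(log2_m: u32, c: u32) -> u32 {
    c * log2_m
}

fn main() {
    // M = 2^16, c = 8  =>  N = 2^128, which a u64 cannot hold...
    assert_eq!((1u64 << 16).checked_pow(8), None);
    // ...but its log2 is perfectly representable.
    assert_eq!(log2_table_size(16, 8), 128);
}
```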
For a 64-core machine at a cycle count of ~16M, Jolt spends ~1.8% of its time in a segment called `compute_chunks_operands` (here). This segment allocates and computes C different chunks for each instruction. For example, for the EQ instruction we split the input operands X, Y into 4 8-bit chunks (WORD_SIZE / C). We can then compute EQ over each chunk individually.
Idea for acceleration: split `chunks_x` and `chunks_y` into mutable slices, iterate over each, and compute the values in parallel, writing to the slice indices directly.
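A sketch of that idea under stated assumptions: 32-bit operands, C = 4 8-bit chunks, most-significant chunk first (the ordering and helper name are illustrative, not the repo's). Because the two output slices are disjoint, the writes need no synchronization:

```rust
use std::thread;

/// Split a 32-bit operand into C chunks of WORD_SIZE / C bits each,
/// writing directly into a caller-provided mutable slice.
fn compute_chunks(x: u32, out: &mut [u8]) {
    let c = out.len();
    let bits = 32 / c;
    for (i, slot) in out.iter_mut().enumerate() {
        let shift = bits * (c - 1 - i); // most-significant chunk first
        *slot = ((x >> shift) & ((1u32 << bits) - 1)) as u8;
    }
}

fn main() {
    let (x, y) = (0xAABB_CCDDu32, 0x1122_3344u32);
    let mut chunks_x = [0u8; 4];
    let mut chunks_y = [0u8; 4];
    // chunks_x and chunks_y are disjoint, so both writes can run in parallel.
    thread::scope(|s| {
        s.spawn(|| compute_chunks(x, &mut chunks_x));
        s.spawn(|| compute_chunks(y, &mut chunks_y));
    });
    assert_eq!(chunks_x, [0xAA, 0xBB, 0xCC, 0xDD]);
    assert_eq!(chunks_y, [0x11, 0x22, 0x33, 0x44]);
}
```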
It may be helpful to review the tracing strategy for performance testing.
Understanding Jolt verifier complexity in terms of RISC-V instructions will help us understand recursive properties and on-chain proving complexity.
Some ideas:
Hi, developers of Lasso, I recently got to know and learn about this impressive project.

Now that I have an idea of how to write a subtable, I'm getting excited and want to solve some more complicated problems with the help of the subtables in `src/subtables`, like calculating `((a AND b) XOR b) OR a` with the help of the `AND`, `OR`, and `XOR` subtables.

But I don't have a strong background in lookup arguments or Surge, so I don't even know where to start. If Lasso can be used to solve real problems like this, could anybody help me with it? A doc or tutorial would help a lot. Thanks in advance.
Many zkVMs today use continuations, which means they break the execution of a computer program into chunks and prove each chunk independently before recursively aggregating the proofs into one. One major reason to do this is to control the prover's space requirements.
Continuations can be implemented especially efficiently when Jolt is combined with a homomorphic commitment scheme like HyperKZG or Zeromorph.
Still, implementing this performantly will be a major endeavor and will rely on some upcoming research.
`GrandProducts::build_grand_product_inputs` can easily be parallelized over the sparsity, for ~20% savings at S=2^20. Gate the feature using "multicore".
This issue is rather a question. The `JoltInstruction` trait contains a function called `combine_lookups` (here). I understand this is meant to represent the polynomial g from the Lasso and Jolt preprints (e.g. Jolt preprint, Definition 2.6), and it receives the lookups `T_i(r_j)` of suitable chunks in suitable subtables. So `combine_lookups` should not, e.g., use the instruction's input data such as `self.0` (in the implementors that have such a field) directly, correct?

Also, the `degree` function in the `JoltInstruction` trait should return the degree of the polynomial g computed by `combine_lookups`. But is this degree variable-wise, or monomial-wise? In other words, should `g(x_1, x_2) = x_1 * x_2` return 1 or 2? I'm guessing the latter, since the Jolt paper stipulates g should be a multilinear polynomial and hence have variable-wise degree <= 1.

Thank you for sharing this excellent library!
`ReadWriteMemory::get_polys_r1cs()` is extremely slow (here) because it clones all of the underlying vectors not once but twice. On a ~16M cycle count trace on a 64-core machine it takes 2% of end-to-end time.

First: remove the double clones of `self.v_read` and `self.v_write_rd`. The double clones come from a clone in `DensePolynomial::evals()` and then another in `.flat_map(...).collect()`.

Second: refactor `get_polys_r1cs()` to return lifetimed references to the underlying vectors via `DensePolynomial::evals_ref()` rather than cloning/reallocating at all.
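A minimal sketch of the before/after shape, with a toy `DensePolynomial` over `u64` standing in for the real field-element type (the struct and methods here are illustrative stand-ins, not jolt-core's actual definitions):

```rust
/// Toy stand-in for a dense polynomial; the real type lives in jolt-core.
struct DensePolynomial {
    z: Vec<u64>,
}

impl DensePolynomial {
    /// Clones the whole evaluation vector -- the pattern the issue flags.
    fn evals(&self) -> Vec<u64> {
        self.z.clone()
    }

    /// Borrows instead: no allocation; the caller's use is tied to &self.
    fn evals_ref(&self) -> &[u64] {
        &self.z
    }
}

/// Flattening via evals() clones every polynomial and then collect()
/// allocates again; flattening over evals_ref() copies each element once.
fn flatten(polys: &[DensePolynomial]) -> Vec<u64> {
    polys
        .iter()
        .flat_map(|p| p.evals_ref().iter().copied())
        .collect()
}

fn main() {
    let polys = vec![
        DensePolynomial { z: vec![1, 2] },
        DensePolynomial { z: vec![3, 4] },
    ];
    assert_eq!(flatten(&polys), vec![1, 2, 3, 4]);
    assert_eq!(polys[0].evals(), vec![1, 2]); // cloning path, for comparison
}
```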
It may be helpful to review the tracing strategy for performance testing.
Jolt currently uses the grand product argument from Thaler13, which has roughly log^2(n)-sized proofs for products of n values. Section 6 of the Quarks paper reduces this proof size to close to log(n), with a modest increase in prover commitment costs.
More instructions: The Jolt codebase currently implements the RISC-V 32-bit Base Integer instruction set (RV32I), but the Jolt construction is highly flexible. Add support for the RISC-V “M” extension for integer multiplication and division, as described in the Jolt paper. Further, Jolt can trivially support the 64-bit variant RV64IM.
This is newly initialized, so it would have len 0.

```rust
let mut rand: Vec<F> = Vec::new();
assert_eq!(rand.len(), rand_prod.len());
```
`DensifiedRepresentation::from` can be parallelized across the C dimensions. Use the `multicore` feature to gate.
Feature gate with "multicore"
M > 2^20 crashes Lasso.
This is likely due to OOM when constructing the dereferenced memory tuples (address, value, timestamp).
We can likely detect that this will happen ahead of time based on the parameters and provide a more useful error.
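A sketch of such a preflight check. The 3-field, 8-bytes-per-field tuple model and the budget constant are illustrative assumptions, not measured numbers from the codebase:

```rust
/// Rough preflight check before materializing the (address, value,
/// timestamp) tuples, returning a descriptive error instead of OOMing.
fn check_memory_params(m: u64, budget_bytes: u64) -> Result<(), String> {
    let tuple_bytes = 3 * 8; // (address, value, timestamp), 8 bytes each
    match m.checked_mul(tuple_bytes) {
        Some(needed) if needed <= budget_bytes => Ok(()),
        Some(needed) => Err(format!(
            "materializing memory tuples needs ~{needed} bytes, budget is \
             {budget_bytes}; reduce M or raise the limit"
        )),
        None => Err("memory requirement overflows u64; M is far too large".to_string()),
    }
}

fn main() {
    assert!(check_memory_params(1 << 20, 1 << 30).is_ok());
    assert!(check_memory_params(1 << 40, 1 << 30).is_err());
}
```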
Currently Surge supports being used as a multi-dimensional sparse polynomial commitment scheme, known in the Lasso paper as "Spark". To support this, we generate and pass through C different log2(sparsity)-sized randomness vectors for evaluating the multilinear extension of the sparse polynomial.
Currently we generate both `eq_randomness: Vec<F>` of size log-s and `spark_randomness: [Vec<F>; C]` with each vec of size log-m. For Surge we use only `eq_randomness`.

The `SubtableStrategy::materialize_subtables` and `SubtableStrategy::evaluate_subtable_mle` function signatures depend on `r: &[Vec<F>; C]` despite it being an unused parameter in all subtable strategies beyond Spark. This means that we pass `spark_randomness` deep into prover and verifier workflows.

Spark is useful in its own right, and support would be nice, but it does carry a substantial readability/maintainability cost.
Incorporating precompiles into Jolt is essential to achieving developer adoption, since all zkVMs, Jolt included, are simply too slow today without them.
The most natural approach is to have the prover "collect" all the inputs to the various invocations of the precompile together and apply (a data-parallel version) of the precompile to the collected inputs.
If the precompile consists of a hand-specified constraint system (which is how the current generation of zkVMs implements precompiles), we would have developers specify the constraint system in an appropriate constraint specification language, and then apply a back-end for that constraint system to prove satisfiability.
Today’s zkVMs all use AIR as the constraint system. We could reuse these precompiles, applying BabySpartan or SuperSpartan (our sum-check-based SNARKs for AIR and Plonkish constraint systems) to prove their satisfiability.
Long-term, the fastest precompiles will be special-purpose sum-check-based SNARKs that avoid constraint systems entirely. In this case, we’d probably have developers specify them by expressing the prover and verifier directly, in Rust.
For a 64-core machine at a cycle count of ~16M, Jolt spends ~4% of its time in a segment called `InstructionLookups::polynomialize` (here). This segment allocates and computes the offline memory checking (a, v, t) polynomials and other inputs for the Jolt instruction execution. Currently much of the time is spent serially. We should parallelize this to get a speedup as the core count increases.
Similar to #292.
It may be helpful to review the tracing strategy for performance testing.
`iai-callgrind` provides high-precision, high-granularity performance benchmarks of Rust code (using Valgrind under the hood). It would be nice to have `iai-callgrind` benchmarks for hot paths in our code, including:
The on-chain verifier will be much cheaper after switching the polynomial commitment scheme to one with logarithmic verifier costs (see this other issue)
Hi,
I was looking into the benchmarks and subtables examples.
Thank you for the great work 🙏
Are the indices actually the lookups? If so, a brief explanation of how they work would be helpful. For example, using "And(2^128, 2^4)", how would one see that the specific value [1010] is in the table? I.e., in a zkVM you could take as input [0,1] and [1,0], AND them, and get [0,0]. Using the Lasso lib, how would you check that [0,0] is in the table?
https://github.com/a16z/Lasso/blob/b0db15bb83c81912ba514b614e60b4d3f5f42e7e/src/benches/bench.rs#L13
It is also used here as x_indices and y_indices with no explanation of what these things are.
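For intuition, here is a hedged sketch of the usual construction for an AND subtable: a lookup index packs the two operand chunks, and the table entry stored at that index is their AND, so "checking the value is in the table" is just indexing. The `(x << b) | y` bit layout and helper names are illustrative assumptions, not necessarily Lasso's exact encoding:

```rust
/// For an AND subtable over b-bit chunks, a lookup index packs the two
/// operand chunks as (x << b) | y.
fn and_index(x: u8, y: u8, b: u32) -> usize {
    ((x as usize) << b) | (y as usize)
}

/// The materialized table entry at that index is x & y.
fn and_entry(index: usize, b: u32) -> u8 {
    let x = (index >> b) as u8;
    let y = (index & ((1 << b) - 1)) as u8;
    x & y
}

fn main() {
    // Operands [1,0,1,0] and [0,1,1,0] as 4-bit chunks:
    let (x, y) = (0b1010u8, 0b0110u8);
    let idx = and_index(x, y, 4);
    // The "lookup" is just indexing: the entry at idx equals x & y.
    assert_eq!(and_entry(idx, 4), 0b0010);
}
```

Under this reading, `x_indices` and `y_indices` would be the per-chunk halves of such packed indices, one per dimension.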
Add verifier time and size to benchmarks
Add more descriptive errors to errors.rs
Running a 1024k trace length program gets the following error:

```
thread 'main' panicked at tracer/src/emulator/mmu.rs:489:22:
Unknown memory mapping 5050505.
```
https://github.com/a16z/Lasso/blob/jolt/tracer/src/emulator/mmu.rs#L489
Some preliminary investigation follows (mostly to demonstrate methods).

To find bloated libraries: `cargo bloat --release --crates`

Find the source of these deps with: `cargo tree -i <dependency>`

Most of these can be traced back to `cranelift-codegen <- wasmer <- circom-scotia`. We don't use wasm in circom-scotia, so we can likely kill this dependency with proper feature gating (potentially upstream).

We can profile the whole build with `cargo build -p jolt-core --timings --release`. This reveals a ton of time spent building dependencies that we don't care about.
Hey @sragss @moodlezoup,

The write-up says Lasso is using Hyrax. Hyrax is actually a scheme from this paper: https://eprint.iacr.org/2017/1132.pdf. The same paper introduced a polynomial commitment scheme for multilinear polynomials, and I am not an expert on naming, but I am told the PCS in that paper is referred to as Hyrax-PCS 😉.
Write-up:
https://github.com/a16z/Lasso/blob/master/EngineeringOverview.md
`DotProductProofLog`: reading the code makes sense. But is this Hyrax, or just an efficient way to commit to polynomial vectors using a Bulletproofs-style IPA?
I was directed to another version of Spartan when asking around about Hyrax-PCS:
https://github.com/microsoft/Spartan2/tree/hyrax_pc2/src/provider
[EDIT]
Re-read the Hyrax paper. This all makes sense now.
This issue can be removed.
Thanks!
The current Rust verifier is not optimized at all (no parallelization, etc.)
Surge uses the dense multilinear polynomial commitment scheme from microsoft/Spartan.
We can swap this PCS with any other dense multilinear PCS for different performance characteristics. There are not many other dense multilinear PCS written in Rust, but it's worth generalizing the interface.
Currently `M` (the size of memory) is a const generic parameter on the `SubtableStrategy` trait. `M` is only used for the bit packing in `AndSubtableStrategy::combine_lookups`. It will likely be used by some subset of the yet-to-be-created tables, but not all.

Should `M` be moved to a generic on the `AndSubtableStrategy` impl? This would allow us to remove the `M` generic in many other places.
src/subprotocols/mod.rs
We'd like to support the use of multiple SubtableStrategies in recursive fashion to allow a single proof of lookups over multiple large tables. The strategy for combining large tables is identical to that of the combination of Subtables described in Surge with the addition of some "selector bits" out front.
We use `EqPolynomial::evals` everywhere. On inspection of the Gantt chart or direct tracing logs, you can see how often it is used. Any improvements here would speed up the system significantly.

Best steps are:
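For context, the standard O(2^n) "doubling" construction of these evaluations can be sketched as follows; `f64` stands in for a field element here, so this is an illustration of the algorithm, not the repo's implementation:

```rust
/// Compute eq(r, x) for all x in {0,1}^n: start from [1] and, for each
/// r_i, split every partial product into its (1 - r_i) and r_i halves.
fn eq_evals(r: &[f64]) -> Vec<f64> {
    let mut evals = vec![1.0];
    for &ri in r {
        let mut next = Vec::with_capacity(evals.len() * 2);
        for &e in &evals {
            next.push(e * (1.0 - ri)); // x_i = 0 branch
            next.push(e * ri); //         x_i = 1 branch
        }
        evals = next;
    }
    evals
}

fn main() {
    let evals = eq_evals(&[0.25, 0.5, 0.75]);
    assert_eq!(evals.len(), 8);
    // eq is the MLE of the Kronecker delta, so the evaluations sum to 1.
    let sum: f64 = evals.iter().sum();
    assert!((sum - 1.0).abs() < 1e-12);
}
```

The work is dominated by the 2^n multiplications in the final rounds, which is where parallelizing or batching would help most.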
When building, the following error occurs:

```
package `clap_derive v4.4.2` cannot be built because it requires rustc 1.70.0 or newer, while the currently active rustc version is 1.66.0-nightly
```

A solution could be to update the version of the (nightly) rust-toolchain used.
Fix Github Workflows to run tests, potentially some linting, and add badge to README
For a 64-core machine at a cycle count of ~16M, Jolt spends ~3.5% of its time in a segment called `memory_trace_processing` (here). This segment allocates and computes the offline memory checking (a, v, t) polynomials used for the combined registers and RAM. Some additional details can be found in the wiki.

Currently this ~300-line segment takes ~3.5% of end-to-end time because it is computed completely serially, with no use of additional CPU cores. We should parallelize this to get up to a `NUM_CPU`x speedup.

The goal is to fill out the following.

Trace-length sized:
- a_ram
- v_read
- v_write_rd
- v_write_ram
- t_read
- t_write_ram

Memory sized:
- v_final
- t_final
It may be helpful to review the tracing strategy for performance testing.
PolynomialRepresentation has become a grab bag of storage for all of Jolt. We can refactor to make more specific structs throughout.
`benches/dimensions.rs` was intended to help understand the performance of various subtables and parts of the prover/verifier workflow. Unfortunately Criterion is quite bad at long-running benchmarks (see), leading `src/bench.rs` to become the main source of benchmarks.

`sudo cargo flamegraph` gives some idea of the work, but requires `export RAYON_NUM_THREADS=1` to be useful and does not give a good idea of system-level parallelized performance. Additionally, `bench.rs` does not have the proper granularity to understand where time is spent.
Within a given round of sumcheck, we compute evaluations of a polynomial after the `combine` or `g` function is applied. The number of points we compute is determined by the degree of the resulting polynomial, as n+1 points uniquely define a degree-n polynomial. We can compute evaluations at any points, so we've chosen (0, 1, 2, ..., degree).

Currently we naively compute these points, but there are lots of optimizations here, particularly for (0, 1). See `// TODO(#28)` for details.
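The "n+1 points pin down a degree-n polynomial" fact is just Lagrange interpolation; a minimal sketch over `f64` (standing in for a field element), recovering a quadratic from its values at 0, 1, 2:

```rust
/// Lagrange-interpolate the unique polynomial of degree < xs.len()
/// through the points (xs[i], ys[i]) and evaluate it at `at`.
fn lagrange_eval(xs: &[f64], ys: &[f64], at: f64) -> f64 {
    let mut acc = 0.0;
    for (i, (&xi, &yi)) in xs.iter().zip(ys).enumerate() {
        let mut basis = 1.0;
        for (j, &xj) in xs.iter().enumerate() {
            if i != j {
                basis *= (at - xj) / (xi - xj);
            }
        }
        acc += yi * basis;
    }
    acc
}

fn main() {
    // p(x) = 3x^2 + 2x + 1, sampled at 0, 1, 2 (three points, degree 2).
    let p = |x: f64| 3.0 * x * x + 2.0 * x + 1.0;
    let xs = [0.0, 1.0, 2.0];
    let ys = [p(0.0), p(1.0), p(2.0)];
    // Those three evaluations determine p everywhere, e.g. at x = 5.
    assert!((lagrange_eval(&xs, &ys, 5.0) - p(5.0)).abs() < 1e-9);
}
```

This is also why (0, 1) are special: the verifier's consistency check uses s(0) + s(1), so those two evaluations are computed every round and are worth optimizing first.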
Throughout Surge we overload the name `r` to mean various different pieces of verifier-provided (Fiat-Shamir) randomness. We should standardize the naming of these different `r` parameters and clean up the references.
In the build section of https://github.com/a16z/jolt?tab=readme-ov-file#build, `cargo build --profile build-fast jolt-core` should change to `cargo build --profile build-fast -p jolt-core`.
Zeromorph paper: https://eprint.iacr.org/2023/917
This will make the Jolt proofs much shorter than when using the Hyrax commitment, and will also slightly speed up the prover (Hyrax commitments are big enough that serializing their group elements actually has a time cost).
A Rust implementation of Zeromorph is here, so it just needs to be integrated into Jolt (specifically, ported to arkworks): https://github.com/lurk-lab/arecibo/blob/dev/src/provider/non_hiding_zeromorph.rs
A Solidity implementation of the Zeromorph verifier (plus sum-check) is here: https://github.com/Maddiaa0/honk-verifier