zkemail / zk-email-verify
Verify any text in any sent or received email, cryptographically and via only trusting the sending mailserver.
Home Page: https://prove.email
License: MIT License
We can generate the witness while we download the zkey, thus saving about 12 seconds. Pretty low priority. This trick should be upstreamed to heyanon and circom-starter.
Since there are other sha256-rsa implementations in circom (like https://github.com/zkp-application/circom-rsa-verify), we should be able to use them to fuzz random inputs and ensure both circuits always give the same answer, or formally verify that they say identical things.
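The fuzzing half of this can be sketched as a small differential harness. Here `verifyA`, `verifyB`, and `genInput` are hypothetical wrappers (e.g. around our circuit's witness generation and circom-rsa-verify's) — the harness only checks that the two always agree:

```javascript
// Differential fuzzing sketch: run both implementations on the same
// generated inputs and report the first input on which they disagree.
// verifyA/verifyB/genInput are placeholders for real circuit wrappers.
function differentialFuzz(verifyA, verifyB, genInput, rounds = 100) {
  for (let i = 0; i < rounds; i++) {
    const input = genInput(i);
    if (verifyA(input) !== verifyB(input)) {
      return { agreed: false, counterexample: input };
    }
  }
  return { agreed: true, counterexample: null };
}
```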
snarkjs versions < 0.6.11 allow double spending. Bump all the forks of snarkjs to avoid double-spend hacks.
Allow anyone to mask any email and reveal only the from: email address plus their chosen subset of the body. You currently can't do something like proving ownership of an email in a domain, because BCCs break the soundness of that membership proof.
There are a few twitter packages with this line because the currently deployed circuits want it. We should recompile the circuits to not use this anymore and update those keys across the frontend and S3 buckets, and remove this signal entirely.
Zephyr reported that we might have a padding bug when the RSA key begins with zeros. Add a test for this in the circom tests for RSA for 1024 bit keys, for websites with poorer security.
Need to add a mitigation for a critical vulnerability: I can pretend to be another email address by crafting an address like <max_len_minus_10>@gmail.commydomain.com. Since the prefix <max_len_minus_10>@gmail.com already reaches max_len, the circuit truncates there and thinks I'm the latter person.
Easy to fix by ensuring the array index via QuinSelector like this pseudocode:
message_id_regex_reveal[message_id_idx + max_message_id_len] === 0
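The constraint above can be mirrored in plain JS to make the fix concrete: after the revealed window of `max_message_id_len` bytes starting at `message_id_idx`, the next byte of the padded array must be 0. In-circuit the variable index lookup would use a QuinSelector; here it is just an array access, and the names mirror the pseudocode:

```javascript
// Returns true iff the byte immediately past the revealed window is 0,
// which rules out the truncation splice described above (a crafted
// <long>@gmail.commydomain.com can't be cut into <long>@gmail.com).
function revealIsZeroTerminated(paddedBytes, messageIdIdx, maxMessageIdLen) {
  return (paddedBytes[messageIdIdx + maxMessageIdLen] ?? 0) === 0;
}
```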
Perhaps useful to get other people using the code here, but also a beginner task for me to read the code
For some reason, in the V0, either the regex or the packing fails: parsing the currency "TEST" from the circuit output fails and skips the last letter.
Create a script to update tokens not in Token registry based on the default uniswap list - https://github.com/Uniswap/default-token-list/tree/main/src/tokens
Needed for users to create new circuits.
This might be easiest as a fork/PR to zkrepl.dev, so that it's one click to start the 3-hour process of generating and making these files. We should be able to target specific architectures for the binaries via, for instance, passing -march=icelake-server -mtune=icelake-server to the C compiler, or target unknown in rust.
Ask me for the docker image for rapidsnark.
The vast majority of sent emails use 2048 bit RSA, but a minority of clients use 1024 bit RSA or 4096 bit RSA. Parameterize the circuits so that we can easily recompile circuits for different length keys. Likely will be done in tandem with issue 16 to streamline compilation of new regexes.
Start with a simple Puppeteer end-to-end test that pastes a valid email and address in, generates a proof, and verifies the proof. Benchmark this test on Browserstack to see where this code does and doesn't work.
Utilize hashing (either just naively hash all the public inputs and check a pre-image, or do Dmitry's new idea of efficient hashing on both sides of the proof) to reduce public input size for larger circuits. Not needed for Twitter verification for the time being, due to the convenience of small SHAs.
Requested by external team for use in Cosmos WASM. We should start by compiling some for proof of twitter and email wallet.
Run the 7-byte unpacking step (the helper function packedNBytesToString in the code) in JavaScript on the public inputs shown to the user, so they can more easily parse what is made public.
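A minimal sketch of what that unpacking looks like (the real packedNBytesToString helper's signature may differ): each field element packs up to 7 ASCII bytes little-endian, so the string is recovered by peeling off one byte at a time from each signal:

```javascript
// Unpack an array of field elements (as strings or BigInts), each packing
// up to bytesPerSignal ASCII bytes little-endian, into a readable string.
// Zero bytes are padding and are skipped.
function unpackSignals(signals, bytesPerSignal = 7) {
  let out = "";
  for (const s of signals) {
    let v = BigInt(s);
    for (let i = 0; i < bytesPerSignal; i++) {
      const byte = Number(v & 0xffn);
      v >>= 8n;
      if (byte !== 0) out += String.fromCharCode(byte);
    }
  }
  return out;
}
```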
Tasks
Add a button that deploys an autoscaled prover on the cloud for a quick circom proof. Add a note that your email address will be revealed to the server side prover as well as the password reset code.
On the sampritipanda fork: https://github.com/sampritipanda/snarkjs/commits/zkemail. The last commit in this fork takes vivek's snarkjs fork, upgrades it to the latest version of snarkjs, fixes merge conflicts randomly until it worked, and rewrites main.cjs by basically deleting and rebuilding the package, I think. This was needed because generate_witness.js was generated using a newer version of snarkjs/circom while vivek's version was an older one, which was causing some frontend issues.
It seems main branch works however, which is strange.
Benchmark the wasm generated by the Rust in ark-circom against the one generated by snarkjs, and see if the former gives significant speed or memory improvements.
For some reason, only email bodies downloaded from gmail clients work. There is probably some dumb string parsing issue when downloading the email body, but detecting this and fixing it would make it a lot more general of a system. Easy for yush to generate an email that can be read on the outlook and gmail clients, so just ask him for an email if you need a copy to get started on this issue!
From PR #90 (https://github.com/zkemail/zk-email-verify/pulls#issuecomment-1653670956) -- the Twitter tests seem to be broken in Docker.
Edit: They should be tested with BOTH the stable npm packages and the latest pushes to main branch here. The former is to verify that the website will still build, and the latter are to verify that changes to the core libraries have an e2e sanity test.
This may not even be possible. There are two routes I can see.
More elegant route. Find an email from Twitter that includes the follower count (this may not exist). Make a ZK circuit to mask just that out, and then convert to an integer and prove that you have an account with at least K followers. To write this body regex, good to quickly tackle issue 16 first to utilize a new regex.
Less elegant route. This will not work in the long term and adds an extra trust assumption on the Merkle tree calculation accuracy. You can make (and periodically re-upload) a huge merkle tree of all Twitter usernames, and the ZK circuit proves membership of your account in that tree, and masks out the follower count to the nearest power of 10. This doesn't really work since you trust that the person constructing the Merkle tree didn't screw with it, which is very hard.
Raw DKIM solidity doesn't work due to calldata blowup. Compare the gas usage of an already existing solidity implementation, with and without calldata, to this implementation.
We made a new version of zk-regex and added circom circuits for common regexes to zk-regex-circom.
We will integrate zk-email-verify with zk-regex.
TODO:
When running the twitter demo, I got this error after witness gen:
twitter-verifier-zkeys.s3.amazonaws.com/e388b82/twitter.zkeyj.gz:1 Failed to load resource: net::ERR_CONNECTION_RESET
Storage of twitter.zkeyh.gz successful!
Storage of twitter.zkeyj.gz unsuccessful, make sure IndexedDB is enabled in your browser.
index-df1a66e7.js:226 TypeError: Failed to fetch
index-df1a66e7.js:454 Circuit inputs: Object
index-df1a66e7.js:454 zk-dl: 909916.0541992188 ms
index-df1a66e7.js:454 Starting proof generation
index-df1a66e7.js:226 generating proof for input
index-df1a66e7.js:160 witness calculation: 49734.8779296875 ms
index-df1a66e7.js:158 Uncaught (in promise) Error: Reading out of bounds
at uZ.readToBuffer (index-df1a66e7.js:158:21140)
at uZ.read (index-df1a66e7.js:158:21359)
at iF (index-df1a66e7.js:158:252217)
at async Rf (index-df1a66e7.js:158:252813)
at async nCe (index-df1a66e7.js:158:281177)
at async eSe (index-df1a66e7.js:160:6829)
at async kSe (index-df1a66e7.js:226:10996)
at async onClick (index-df1a66e7.js:454:11626)
The page then hung on Status: generating-proof. We need to add
Make this an argument, not a hardcoded twitter verifier url.
helpers/src/dkim has a bunch of JS. Replace that with TS.
Use the ENS DNSSEC proveAndCheck contract to automatically update the DNS key. Requires checking that it is the mail record URL, and automatically parsing the bytes out into our 17ish packed signals. Low priority since so few websites use DNSSEC.
The .eml used has a base64-encoded attachment. Running generate_input.ts generates this error:
Error: No public key found on DKIM verification result
Logging result.results[0] from result = await dkimVerify(email); yields this output:
So the bodyHash and bodyHashExpecting do not match, so it's failing here:
Instead of outputting the entire RSA key to check against the solidity contract, output the hash so we can save calldata gas. This requires
Not sure why this happens, but it seems to happen occasionally on fresh computers, and I cannot reproduce the bug. Perhaps a decent 80/20 here: if the files don't start downloading and don't exist for, say, 10 seconds straight, tell the user to refresh the page.
Move twitter example to another repo (fork-able by others to build on top of zk-email) to keep this one cleaner and only for libraries.
In progress here - https://github.com/zkemail/proof-of-twitter
Following these instructions should help us get rid of create-react-app (CRA), which slows down development and load times due to bloat. Vite is also much cleaner than webpack, which has caused me hell in the past, especially with TypeScript and ECMAScript version incompatibilities.
Updating create-react-app broke a bunch of stuff so I think this is easier than upgrading. But it's also a second task here to update as many packages as possible without breaking the app.
We need to cleanup the helpers package etc so these imports can come from the root!
You are importing from @zk-email/helpers/src which is the .ts code - to transpile ts code in node_modules we would need additional config
You can change the import to /dist - import { toCircomBigIntBytes } from "@zk-email/helpers/dist/binaryFormat";
The .tar.gz decompression step doesn't work -- it compresses fine and uploads fine to the s3 bucket (I think), but when downloading and decompressing (see zkp.ts in targz_frontend branch), the files are bigger than when they were uploaded, and the snarkjs fullProve step gives the error 'zkeyb invalid file format'. This details the error.
Solving this will let us add compressed downloads, halving the download size (decompression is very fast). I think we need to use the zlib library in JS to do this: https://nodejs.org/api/zlib.html#zlib_zlib_unzip_buffer_options_callback.
Can use Tunnel (https://tunnel.dev/) for this, or any other collaborative PR-preview provider.
Currently, we load the zkey in chunks to ensure that we can actually fit it into memory. However, there is no verifier.sol generated. So, we either need to fork snarkjs to generate a verifier.sol from the chunked zkeys, or also output a raw zkey that corresponds to the same chunked zkey, that we can call the normal snarkjs verifier generation from.
Edit: This is now WIP at https://github.com/foolo/dkim-lookup!
DKIM is usually a nested DNS record. For instance, for replit, we can see here: https://easydmarc.com/tools/dkim-lookup?domain=replit.com that the DKIM is under the selector "google" and has the value:
Selector: google
Record value:
v=DKIM1; k=rsa; p=MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAk6RNxaxuNyiPhlH6rlgMOXNTaffcVsK+3E6lK1x8c7MO0w7on9zmaiApGE/2hBWQqRpy6EmRdUf6MJH5TmwM++51W4xR0TmTd1JvsbBR/9yjpR++vOahVkrdh0xPaq1zghHYaqNgsOThivw8Hgd8xWQzPPDcw7T+czQS0/Xe/nijU0dVlQX/s+evJpxP7VV/FzlMQvknMj1bCqAgzUFa1mXMO/ZfzHirpGVcJ+h1fMYOIzU4iV3KUIn6i1mg3T+Kw41MFW04F/4nnIQKTTFNGuI+T+6Ss1M1VcjlAxlwYZCJPE0Iy3cOWRBWsgXFZWx2rATlEtkasmf1NFpJu1nATwIDAQAB
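A record like the one above is just semicolon-separated tag=value pairs, so extracting the key fields (v, k, p) is straightforward. A minimal sketch (the function name is illustrative; a robust parser would also handle quoted values and folding):

```javascript
// Parse a DKIM TXT record's tag=value pairs into an object.
// Splitting on the first "=" per segment keeps base64 "=" padding
// in the p value intact.
function parseDkimRecord(record) {
  const tags = {};
  for (const part of record.split(";")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    tags[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return tags;
}
```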
Scrape the Alexa top 1M websites (and a list of 50 websites that we manually add) for their DKIM keys every day, and archive all the answers in a simple UI where someone can just type in a website and see all its past DKIM keys. Note that these DNS records change roughly daily, and we want all selectors, including non-Google ones. Looking for a simple frontend, as well as a script that can be run daily without being ratelimited. I recommend hitting DNS directly.
One way to do this is in python, use something like pydig to query the data, store it in a postgreSQL database, and provide a fastapi webserver for browsing it. Approximately 400 non-compressible bytes per entry times 1M sites changing daily would be a max of 400MB per day of data (thanks npulido for the suggestion).
Eventually, include dynamic checking (i.e. for each site, store the gap between the last n checks, and check more often around the distribution of those times).
Enable the --O1 and --O2 flags in circom and see how much the constraint count goes down. If it's significant, make sure unconstrained public variables used in the circuit aren't optimized out (i.e. any unused variables used to constrain or something), then update the build scripts and built circuits.
Thinking needed for website to create feedback loop
Done
Inside the body are the attachments. Create a circuit that handles the encoding/decoding of that section, and proves validity of some sub-part of that attachment. Note that this is probably blocked on making the circuits way faster via lookups or faster proving systems, because currently SHA is unsustainable at that scale.
This concerns the helpers package on DKIM verification and input generation
X-Message-ID to Message-ID
#90 -- https://github.com/zkemail/zk-email-verify/pulls#issuecomment-1653670956
Improved array indexing may reduce circuit size.
Have the regex match "this email was sent to" or "this email was meant for", and ensure there are enough HTML divs around it that it can't be injected. This will let the user prove Twitter ownership from any email. Also increase the max size of the email body to accommodate this.
This will remove dependence on the password reset email, and any follower notification email can be used.
If DKIM fails, try replacing all TABs in body with spaces: "another weird case, is the email supports TABs (ascii 9) rather than spaces". Note that this is an easy find and replace of tabs > spaces on the frontend + in the body parsing js code.
If the email verification fails, it might be due to forwarding, which can insert labels that convert the subject from "This is the subject" to "[Label] This is the subject". We have to strip the label from the beginning of the line in cases where DKIM verification fails.
Note that we would need to test all 2^n permutations of edge cases (here n = 2, as the emails can be tabbed/not tabbed and labeled/not labeled), adding exponentially more time to verify in the case of a failure.
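The two normalizations described above (this issue and the TABs issue) are simple string transforms to try before re-running DKIM verification. A sketch with illustrative function names:

```javascript
// Replace TABs (ASCII 9) with spaces in the email body, for mailers
// that tab-indent rather than space-indent.
function normalizeBody(body) {
  return body.replace(/\t/g, " ");
}

// Strip a forwarding "[Label] " prefix from the Subject line, e.g.
// "[Label] This is the subject" -> "This is the subject".
function stripForwardingLabel(subject) {
  return subject.replace(/^\[[^\]]*\]\s*/, "");
}
```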
I think the zkp2p fixes are here:
see: https://github.com/zkp2p/zk-p2p/blob/develop/client/src/components/ProofGen/validation/hdfc.tsx#L85
and: https://github.com/zkp2p/zk-p2p/blob/develop/client/src/components/ProofGen/validation/venmo.tsx#L62
We can use identical code and primitives to prove JWTs instead. Write this primitive in a branch. Contact yush for how to do this!