Comments (7)
Yes! So we created this local-only app: https://github.com/zkemail/selector-scraper
It scrapes the selectors from the last 10,000 emails in your personal inbox, then displays them in a very simple, very ugly list on a frontend.
Turns out, like most things, you can cover a good 20% of all websites with only about 40 selectors,
and all the rest are one-offs.
So if we slightly modified this script to add these selectors to a database along with the date, and then read from that database, we could have a historical registry.
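A minimal sketch of what such a registry could look like, assuming a single SQLite table keyed by domain and selector. The table name, column names, and function names here are hypothetical and are not taken from selector-scraper. Dates are ISO 8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) a hypothetical selector registry database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS selectors (
               domain     TEXT NOT NULL,
               selector   TEXT NOT NULL,
               first_seen TEXT NOT NULL,
               last_seen  TEXT NOT NULL,
               PRIMARY KEY (domain, selector)
           )"""
    )
    return conn


def record_selector(conn, domain, selector, email_date):
    """Record that `selector` was seen for `domain` in an email dated `email_date`.

    `email_date` is an ISO 8601 date string such as '2023-05-01'.
    """
    domain = domain.lower()
    # Create the row if it does not exist yet.
    conn.execute(
        "INSERT OR IGNORE INTO selectors VALUES (?, ?, ?, ?)",
        (domain, selector, email_date, email_date),
    )
    # Widen the observed date range if this email is older or newer.
    conn.execute(
        """UPDATE selectors
           SET first_seen = min(first_seen, ?), last_seen = max(last_seen, ?)
           WHERE domain = ? AND selector = ?""",
        (email_date, email_date, domain, selector),
    )
    conn.commit()


def selectors_for(conn, domain):
    """Return (selector, first_seen, last_seen) rows for a searched domain."""
    rows = conn.execute(
        "SELECT selector, first_seen, last_seen FROM selectors "
        "WHERE domain = ? ORDER BY selector",
        (domain.lower(),),
    )
    return rows.fetchall()
```

Keeping only a first-seen and last-seen date per selector is one possible design; storing one row per sighting would preserve more history at the cost of a larger table.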
Then we just need a very simple, pretty client-side website (e.g. one existing one is https://easydmarc.com/tools/dkim-lookup but we can do better) that also offers historical results and any matched selectors in the db for any searched domain (which we can get since all your emails are timestamped).
i ran this on one of my non-primary inboxes and got this list:
selector_db_dump.txt
from zk-email-verify.
Olof: instead of a database with dkim key(s) for each domain, we make a database with selectors for each domain, and then a website which fetches the selectors for a specific domain from the DB, and then gets the dkim keys on-the-fly with a dns lookup (that happens in the client's browser) to the domain of interest?
Well, the database should store historical DKIM keys, plus maybe a signature from the user uploading them. And yes, in real time we can also get the latest one from local client DNS (as well as locally calculate the Poseidon hash to compare to the on-chain one). Unfortunately, right now there isn't a great way to verify them except by trusting certain signatories.
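For reference, a DKIM key lives in a DNS TXT record at `<selector>._domainkey.<domain>`, and the record body is a semicolon-separated tag list whose `p=` tag holds the base64-encoded public key (RFC 6376). The sketch below only builds the query name and parses a record that has already been fetched; the DNS lookup itself and the Poseidon hash comparison are left out, and the function names are hypothetical.

```python
import base64


def dkim_query_name(selector, domain):
    """DNS name whose TXT record holds the DKIM key for this selector."""
    return f"{selector}._domainkey.{domain}"


def parse_dkim_record(txt):
    """Split a DKIM TXT record such as 'v=DKIM1; k=rsa; p=...' into tags."""
    tags = {}
    for part in txt.split(";"):
        name, sep, value = part.partition("=")
        if sep:  # skip empty or malformed fragments
            tags[name.strip()] = value.strip()
    return tags


def public_key_der(txt):
    """Return the decoded key bytes, or None if the key is missing or revoked.

    An empty 'p=' tag means the key has been revoked.
    """
    p = parse_dkim_record(txt).get("p", "")
    p = "".join(p.split())  # long keys may be folded with whitespace
    return base64.b64decode(p) if p else None
```

The decoded bytes are what a client would compare against a stored historical key for the same domain and selector.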
from zk-email-verify.
@Divide-By-0 In https://github.com/zkemail/selector-scraper we store the selectors into a sqlite db. Which of the following do we want?
- Modify selector-scraper so that it stores selectors and also fetches and stores DKIM keys, and switch it from SQLite to PostgreSQL.
- Create a new app that goes through all the selectors in the SQLite db, fetches the DKIM keys online, and puts them in a PostgreSQL db (together with info about selectors, dates, etc.)?
from zk-email-verify.
Well this sqlite one was the quickest to put up, but yeah I'd recommend moving to postgresql generally.
If you keep the current app, we'd have to find a way to adapt the public scraper code so that it only has permission to add records (with signatures), not direct db access, and we'd need some basic DDoS protection. I would say you should do whatever is easiest for you; I'm fine keeping it as the same site or as two separate sites.
from zk-email-verify.
@Divide-By-0
Ok, thanks! I'm creating a Next.js app which has a Vercel Postgres database. I also created an uploader script (yes, another script :) ) that reads domains+selectors from the emails.db sqlite3 file, then fetches DKIM records from the DNS server, and uploads everything to the Postgres server on Vercel, where the data can then be used by the end-user website. Right now this uploader script connects directly to the database, but later we can change it to go through an API route instead. We can also change the uploader script so that it reads domains+selectors from some common file format, and we can then write data scrapers for email providers other than Gmail, as long as their output has that common format.
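A rough sketch of the "common file format" idea, assuming a tab-separated file with one domain and one selector per line. That format, the record fields, and the function names are assumptions for illustration, not what the uploader script actually uses. The DNS lookup is passed in as a function so that any resolver can be plugged in.

```python
import csv
from datetime import datetime, timezone


def read_pairs(fileobj):
    """Yield unique (domain, selector) pairs from a tab-separated file."""
    seen = set()
    for row in csv.reader(fileobj, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        pair = (row[0].strip().lower(), row[1].strip())
        if all(pair) and pair not in seen:
            seen.add(pair)
            yield pair


def build_records(pairs, lookup_txt, now=None):
    """Look up each pair and build the records an uploader could send.

    `lookup_txt` is a caller-supplied function that takes a DNS name and
    returns the TXT record value as a string, or None if nothing is found.
    """
    now = now or datetime.now(timezone.utc)
    records = []
    for domain, selector in pairs:
        value = lookup_txt(f"{selector}._domainkey.{domain}")
        if value is None:
            continue  # no DKIM record published under this selector
        records.append(
            {
                "domain": domain,
                "selector": selector,
                "value": value,
                "fetched_at": now.isoformat(),
            }
        )
    return records
```

With this split, a scraper for another email provider only has to emit the two-column file, and the upload step (database connection or API route) only has to consume the list of records.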
from zk-email-verify.
@Divide-By-0 I worked a bit more on this website.
It's live on https://dkim-lookup.vercel.app/ and the code is here: https://github.com/foolo/dkim-lookup/tree/main/dkim-lookup-app
Current features are briefly:
- A database with historic and current keys for a list of domains and selectors (which have been scraped from our private email history).
- Regular batch job to update selectors via DNS lookup, and if there are updates, store a new db record with timestamp.
- Simple UI frontend for searching by domain.
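The batch-update step in the list above can be sketched as follows. This is a minimal illustration using SQLite and hypothetical table and function names, not the code behind the live site (which uses Postgres). The idea is append-only history: a new row is written only when the freshly fetched value differs from the latest stored one.

```python
import sqlite3


def open_key_history(path=":memory:"):
    """Open (or create) a hypothetical append-only DKIM record history."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dkim_records (
               id         INTEGER PRIMARY KEY AUTOINCREMENT,
               domain     TEXT NOT NULL,
               selector   TEXT NOT NULL,
               value      TEXT NOT NULL,
               fetched_at TEXT NOT NULL
           )"""
    )
    return conn


def store_if_changed(conn, domain, selector, value, fetched_at):
    """Insert a new timestamped row only if the record value changed.

    Returns True if a row was inserted, False if the value was unchanged.
    """
    latest = conn.execute(
        "SELECT value FROM dkim_records "
        "WHERE domain = ? AND selector = ? ORDER BY id DESC LIMIT 1",
        (domain, selector),
    ).fetchone()
    if latest is not None and latest[0] == value:
        return False
    conn.execute(
        "INSERT INTO dkim_records (domain, selector, value, fetched_at) "
        "VALUES (?, ?, ?, ?)",
        (domain, selector, value, fetched_at),
    )
    conn.commit()
    return True
```

A scheduled job would call `store_if_changed` once per known domain and selector after each DNS lookup, so older rows are never overwritten and remain available as historical results.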
Question:
Regarding "Scrape the alexa top 1M websites". We discussed this a while ago and I think we chose the email-inbox-scraping approach instead (?) for the reason that there is no direct way of knowing the selector names for a particular domain. Do we still want this feature in some form or another? For example we could loop the 1M-list and guess among the 25 most common selectors?
Then there is also the problem that the user-facing domain is not always the same as the DNS domain for DKIM key lookup. E.g. example.com may use examplemail.com for DKIM verification, so we won't necessarily find anything if we search for selectors directly on the domains from the Alexa list.
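The guessing approach could look roughly like this: try each candidate selector against a domain and keep the ones that return a record. The candidate list is supplied by the caller, and the lookup function is injected, so both are assumptions rather than part of any existing script. As noted above, a miss does not prove the domain has no DKIM keys, since the keys may be published under a different domain.

```python
def probe_selectors(domain, candidate_selectors, lookup_txt):
    """Return {selector: record} for every candidate that resolves.

    `lookup_txt` is a caller-supplied function that takes a DNS name and
    returns the TXT record value as a string, or None if nothing is found.
    """
    found = {}
    for selector in candidate_selectors:
        value = lookup_txt(f"{selector}._domainkey.{domain}")
        if value:
            found[selector] = value
    return found
```

For a 1M-domain list and 25 candidates this means up to 25 million lookups, so rate limiting and caching would matter in practice.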
from zk-email-verify.
@Divide-By-0 Another example: on the 1M-list we would find yahoo.com, but when we scrape emails, the from-address and the DKIM domain are cc.yahoo-inc.com:
From: Yahoo <[email protected]>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cc.yahoo-inc.com; s=fz2048;
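For illustration, the domain (`d=`) and selector (`s=`) can be pulled out of a raw message like this. It is a minimal sketch using Python's standard `email` module; the function name is hypothetical and this is not necessarily how selector-scraper does it.

```python
from email import message_from_string


def dkim_domains_and_selectors(raw_email):
    """Return a list of (domain, selector) pairs, one per DKIM-Signature header."""
    msg = message_from_string(raw_email)
    results = []
    # A message can carry several DKIM-Signature headers.
    for header in msg.get_all("DKIM-Signature", []):
        tags = {}
        for part in str(header).split(";"):
            name, sep, value = part.partition("=")
            if sep:  # skip empty fragments such as after a trailing ';'
                tags[name.strip()] = value.strip()
        if "d" in tags and "s" in tags:
            results.append((tags["d"], tags["s"]))
    return results
```

Applied to the header above, this yields the pair `("cc.yahoo-inc.com", "fz2048")`, which maps to the DNS name `fz2048._domainkey.cc.yahoo-inc.com`.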
from zk-email-verify.