Comments (7)

Divide-By-0 commented on June 16, 2024

yes! so we created this local only app: https://github.com/zkemail/selector-scraper
that scrapes the selectors from the last 10,000 emails in your personal inbox then displays them in a very simple, very ugly list on a frontend
turns out, like most things, you can cover a good 20% of all websites with only about 40 selectors
and all the rest are one-off
so if we just slightly modified this script to add these selectors to a database with the date then we read from that database, we could have a historical registry
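A rough sketch of what such a historical registry could look like, assuming a single sqlite table keyed on (domain, selector) with the earliest and latest email dates at which the pair was seen. The table name, column names, and helper functions are made up for illustration and are not taken from selector-scraper:

```python
import sqlite3


def open_registry(path=":memory:"):
    """Open (or create) the registry. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS selectors (
               domain     TEXT NOT NULL,
               selector   TEXT NOT NULL,
               first_seen TEXT NOT NULL,
               last_seen  TEXT NOT NULL,
               PRIMARY KEY (domain, selector)
           )"""
    )
    return conn


def record_selector(conn, domain, selector, seen_at):
    """Record that `selector` was seen for `domain` in an email dated `seen_at`.

    `seen_at` is an ISO 8601 date string, so plain string comparison
    orders the dates correctly.
    """
    domain = domain.lower()
    row = conn.execute(
        "SELECT first_seen, last_seen FROM selectors WHERE domain = ? AND selector = ?",
        (domain, selector),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO selectors (domain, selector, first_seen, last_seen) VALUES (?, ?, ?, ?)",
            (domain, selector, seen_at, seen_at),
        )
    else:
        # Widen the observed date range instead of adding a duplicate row.
        conn.execute(
            "UPDATE selectors SET first_seen = ?, last_seen = ? WHERE domain = ? AND selector = ?",
            (min(row[0], seen_at), max(row[1], seen_at), domain, selector),
        )
    conn.commit()


def selectors_for(conn, domain):
    """Return (selector, first_seen, last_seen) rows for a searched domain."""
    return conn.execute(
        "SELECT selector, first_seen, last_seen FROM selectors WHERE domain = ? ORDER BY selector",
        (domain.lower(),),
    ).fetchall()
```

The date range per selector is what would let a frontend show "this selector was in use between X and Y" for a searched domain.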

so then we just need a very simple, pretty client-side website (e.g. one existing one is https://easydmarc.com/tools/dkim-lookup but we can do better) that also offers historical results and any matched selectors in the db for any searched domain (which we can get since all your emails are timestamped)

i ran this on one of my non-primary inboxes and got this list:
selector_db_dump.txt

from zk-email-verify.

Divide-By-0 commented on June 16, 2024

Olof: instead of a database with dkim key(s) for each domain, we make a database with selectors for each domain, and then a website which fetches the selectors for a specific domain from the DB, and then gets the dkim keys on-the-fly with a dns lookup (that happens in the client's browser) to the domain of interest?

Well the database should store historical dkim keys, plus maybe a signature from the user uploading them -- and yes in real time, we can also get the latest one from local client DNS (as well as locally calculate the poseidon hash to compare to the onchain one). Unfortunately rn there isn't a great way to verify them except by trusting certain signatories for now.
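For reference, a DKIM public key is published as a TXT record at `<selector>._domainkey.<domain>`, and the record body is a semicolon-separated tag list. The sketch below only builds the query name and parses a record that has already been fetched; the DNS lookup itself and the poseidon hash are left out, and both function names are hypothetical:

```python
def dkim_query_name(selector, domain):
    """DNS name whose TXT record holds the DKIM public key."""
    return f"{selector}._domainkey.{domain}"


def parse_dkim_record(txt):
    """Split a DKIM TXT record such as 'v=DKIM1; k=rsa; p=...' into a tag dict."""
    tags = {}
    for part in txt.split(";"):
        name, sep, value = part.partition("=")
        if not sep:
            continue  # empty trailing piece or a malformed tag
        name = name.strip()
        value = value.strip()
        if name == "p":
            # The base64 key may be folded across whitespace; join it back up.
            value = "".join(value.split())
        tags[name] = value
    return tags
```

The `p` tag is the base64-encoded public key, which is the part a registry would store and a client would compare against the onchain value.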

foolo commented on June 16, 2024

@Divide-By-0 In https://github.com/zkemail/selector-scraper we store the selectors into a sqlite db. Which of the following do we want?

  1. modify selector-scraper so that it stores selectors and fetches+stores DKIM keys, and also modify it to use PostgreSQL instead of sqlite.
  2. create a new app that goes through all the selectors from the sqlite db, fetches the DKIM keys online and puts them in a PostgreSQL db (together with info about selectors, dates, etc.)?

Divide-By-0 commented on June 16, 2024

Well this sqlite one was the quickest to put up, but yeah I'd recommend moving to postgresql generally.

If you keep the current app, we'd have to find a way to adapt the public scraper code so that it only has permission to add records (with signatures), not direct db access, and add some basic DDoS protection. I would say you should do whatever is easiest for you, I'm fine keeping it as the same site or as two separate sites.

foolo commented on June 16, 2024

@Divide-By-0
Ok, thanks! I'm creating a Next.js app which has a Vercel Postgres database. I also created an uploader script (yes, another script :) ) that reads domains+selectors from the emails.db sqlite3 file, then fetches DKIM records from the DNS server, and uploads everything to the Postgres server on Vercel, where the data can then be used by the end-user website. Right now this uploader script connects directly to the database, but later we can change it to go through an API route instead. We can also change the uploader script so that it reads domains+selectors from some common file format, and we can then write data scrapers for email providers other than Gmail, as long as their output uses that common format.

foolo commented on June 16, 2024

@Divide-By-0 I worked a bit more on this website.
It's live on https://dkim-lookup.vercel.app/ and the code is here: https://github.com/foolo/dkim-lookup/tree/main/dkim-lookup-app

Current features are briefly:

  • A database with historic and current keys for a list of domains and selectors (which have been scraped from our private email history).
  • Regular batch job to update selectors via DNS lookup, and if there are updates, store a new db record with timestamp.
  • Simple UI frontend for searching by domain.
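The "store a new record only if there are updates" step of the batch job could be sketched roughly as below. This is not the code from dkim-lookup-app: it uses sqlite instead of Postgres to stay self-contained, the schema is invented, and `lookup` is a hypothetical callable standing in for the real DNS query:

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the key history table. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS dkim_records (
               id         INTEGER PRIMARY KEY AUTOINCREMENT,
               domain     TEXT NOT NULL,
               selector   TEXT NOT NULL,
               value      TEXT NOT NULL,
               fetched_at TEXT NOT NULL
           )"""
    )
    return conn


def update_record(conn, domain, selector, lookup, now):
    """Fetch the current DKIM record and store it only if it changed.

    `lookup(domain, selector)` returns the TXT record as a string, or None
    if nothing was found. Returns True when a new row was written.
    """
    value = lookup(domain, selector)
    if value is None:
        return False
    latest = conn.execute(
        "SELECT value FROM dkim_records WHERE domain = ? AND selector = ? "
        "ORDER BY id DESC LIMIT 1",
        (domain, selector),
    ).fetchone()
    if latest is not None and latest[0] == value:
        return False  # unchanged since the last run, keep the history as it is
    conn.execute(
        "INSERT INTO dkim_records (domain, selector, value, fetched_at) VALUES (?, ?, ?, ?)",
        (domain, selector, value, now),
    )
    conn.commit()
    return True
```

Because old rows are never overwritten, every key rotation leaves a timestamped entry behind, which is what makes the historical search possible.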

Question:

Regarding "Scrape the Alexa top 1M websites". We discussed this a while ago and I think we chose the email-inbox-scraping approach instead (?) because there is no direct way of knowing the selector names for a particular domain. Do we still want this feature in some form or another? For example, we could loop over the 1M list and guess among the 25 most common selectors?
Then there is also the problem that the user-facing domain is not always the same as the DNS domain for DKIM key lookup. E.g. example.com may use examplemail.com for DKIM verification, so we won't necessarily find anything if we search for selectors directly on the domains from the Alexa list.
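To make the guessing idea concrete, here is a small sketch that expands a domain list and a selector list into the DNS names one would have to query. The selector names in `COMMON_SELECTORS` are illustrative placeholders, not the actual most common ones from our scraped data, and `candidate_queries` is a hypothetical helper:

```python
# Illustrative placeholders only; the real list would come from the scraped selector db.
COMMON_SELECTORS = ["google", "selector1", "selector2", "default", "k1"]


def candidate_queries(domains, selectors=COMMON_SELECTORS):
    """Yield every '<selector>._domainkey.<domain>' name to try, one per guess."""
    for domain in domains:
        domain = domain.strip().lower()
        if not domain:
            continue  # skip blank lines in the input list
        for selector in selectors:
            yield f"{selector}._domainkey.{domain}"
```

With 1M domains and 25 selectors this comes to 25 million TXT lookups per full pass, and it still only covers domains that sign with their own user-facing domain.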

foolo commented on June 16, 2024

@Divide-By-0 Another example: on the 1M list we would find yahoo.com, but when we scrape emails, the domain in both the From address and the DKIM signature is cc.yahoo-inc.com:

From: Yahoo <[email protected]>
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cc.yahoo-inc.com; s=fz2048;
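The `d=` and `s=` tags in the header above are the signing domain and the selector, which is all a scraper needs for the later DNS lookup. A minimal sketch of pulling them out of a raw message with Python's standard `email` module follows; the function names are hypothetical and this is not how selector-scraper is necessarily implemented:

```python
from email import message_from_string


def dkim_domain_and_selector(header_value):
    """Return the (d, s) tags of one DKIM-Signature header value."""
    tags = {}
    for part in header_value.split(";"):
        # Split on the first '=' only, since base64 values may end in '='.
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = value.strip()
    return tags.get("d"), tags.get("s")


def selectors_in_message(raw_message):
    """Return (domain, selector) for every DKIM-Signature header in a raw email."""
    msg = message_from_string(raw_message)
    return [
        dkim_domain_and_selector(value)
        for value in msg.get_all("DKIM-Signature", [])
    ]
```

Applied to the header above this gives `cc.yahoo-inc.com` as the domain and `fz2048` as the selector, so the key would be looked up under cc.yahoo-inc.com rather than yahoo.com.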
