Hamlet, an email corpus

This repository contains emails randomly generated with our tool, Mr. MIME (mrmime). Mr. MIME is a library to parse and encode emails. On top of it, we developed a tool that randomly generates an email and checks a kind of isomorphism:

$ x = generate (seed)
$ test x = decode (encode (x))
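The property above can be sketched in OCaml, the language Mr. MIME is written in. This is a toy model, not Mr. MIME's API: the `email` type and the `generate`, `encode` and `decode` functions below are hypothetical stand-ins that only show the shape of the check (generate from a seed, encode, decode, compare).

```ocaml
(* Hypothetical toy model: an email is a list of header fields and a body.
   These are NOT Mr. MIME's types or functions. *)
type email = { headers : (string * string) list; body : string }

(* Seeded generator: with a given build, the same seed yields the same email. *)
let generate seed =
  let st = Random.State.make [| seed |] in
  let word () =
    String.init (1 + Random.State.int st 8) (fun _ ->
        Char.chr (Char.code 'a' + Random.State.int st 26))
  in
  let n = 1 + Random.State.int st 4 in
  { headers = List.init n (fun i -> (Printf.sprintf "X-Field-%d" i, word ()));
    body = word () }

(* Encode: one "Name: value" line per field, an empty line, then the body. *)
let encode e =
  String.concat "" (List.map (fun (k, v) -> k ^ ": " ^ v ^ "\r\n") e.headers)
  ^ "\r\n" ^ e.body

(* Decode: header lines run up to the first empty line; the rest is the body. *)
let decode s =
  let strip_cr l =
    let n = String.length l in
    if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l
  in
  let rec split_headers acc = function
    | [] -> (List.rev acc, [])
    | l :: rest ->
        if strip_cr l = "" then (List.rev acc, rest)
        else split_headers (l :: acc) rest
  in
  let header_lines, body_lines =
    split_headers [] (String.split_on_char '\n' s)
  in
  let parse_field l =
    let l = strip_cr l in
    let i = String.index l ':' in
    (String.sub l 0 i, String.sub l (i + 2) (String.length l - i - 2))
  in
  { headers = List.map parse_field header_lines;
    body = String.concat "\n" body_lines }

(* The property checked for every seed: decode (encode x) = x. *)
let roundtrip seed =
  let x = generate seed in
  decode (encode x) = x

let () =
  for seed = 0 to 999 do
    assert (roundtrip seed)
  done;
  print_endline "round-trip holds for seeds 0..999"
```

The real generator explores far more of the email grammar than this toy does; the point is only that the test compares the decoded value with the original one rather than comparing byte strings.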

This corpus contains valid emails, and Mr. MIME does not alter them when it parses or encodes them. Our MUA should do the same! Like the Enron corpus, Hamlet aims to improve the email stack.

Contents of emails

Because they are randomly generated, the emails are not human-readable, but they are still valid. An email client should be able to process and transfer them (decode and encode them) without alteration.

The goal of this corpus is to provide an alternative to the Enron database without privacy concerns, since we generated these emails instead of collecting them. It lets us test our email implementation and check its compliance with the standards (see RFC 5322).

In our development process, this corpus acts as an "oracle" to cover edge cases of the standards and to check a kind of "isomorphism".

How to reproduce Hamlet

This corpus was generated with our Mr. MIME tool using a specific seed. You can reproduce the generation with:

$ git clone https://github.com/mirage/mrmime
$ cd mrmime
$ opam pin add -y .
$ dune exec corpus/generate.exe -- crowbar --multi 1000000 --seed 0

Isomorphism

The equality we define between a generated email m and its encoded-then-decoded counterpart m' is not strictly exact. What is actually checked is:

  • structural equality: both emails must have the same structure. For example, if m is a multipart email with 3 parts, m' is also a multipart email with 3 parts, and the equality is checked recursively on each of the parts;

  • partial header equality: the same headers are present, in the same order, but values are compared only for the headers that mrmime fully parses (such as Content-Type, Content-Transfer-Encoding or Date). There is one small exception for the Content-Type header: its boundary parameter may differ between m and m';

  • content equality.
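A minimal sketch of this relaxed equality, again on a hypothetical toy type rather than Mr. MIME's real data structures. The list of "fully parsed" fields mirrors the examples named above, and the way the boundary parameter is dropped (a naive split on `;` that ignores quoting) is a simplifying assumption for illustration.

```ocaml
(* Hypothetical toy model of a MIME tree; NOT Mr. MIME's types. *)
type part =
  | Leaf of (string * string) list * string          (* headers, content *)
  | Multipart of (string * string) list * part list  (* headers, sub-parts *)

(* Fields whose values are assumed to be fully parsed, so compared by value. *)
let parsed_fields = [ "content-type"; "content-transfer-encoding"; "date" ]

(* Drop any "boundary=..." parameter from a Content-Type value.
   Naive: splits on ';' and ignores quoted strings. *)
let strip_boundary v =
  String.split_on_char ';' v
  |> List.map String.trim
  |> List.filter (fun p ->
         let p = String.lowercase_ascii p in
         not (String.length p >= 9 && String.sub p 0 9 = "boundary="))
  |> String.concat "; "

(* Same field names in the same order (names compared case-insensitively,
   as in RFC 5322); values compared only for the parsed fields. *)
let headers_equal h1 h2 =
  List.length h1 = List.length h2
  && List.for_all2
       (fun (k1, v1) (k2, v2) ->
         let k1 = String.lowercase_ascii k1 and k2 = String.lowercase_ascii k2 in
         k1 = k2
         && (not (List.mem k1 parsed_fields)
            || if k1 = "content-type" then strip_boundary v1 = strip_boundary v2
               else v1 = v2))
       h1 h2

(* Structural equality, recursing into every sub-part; contents must match. *)
let rec equal m m' =
  match (m, m') with
  | Leaf (h, c), Leaf (h', c') -> headers_equal h h' && c = c'
  | Multipart (h, ps), Multipart (h', ps') ->
      headers_equal h h'
      && List.length ps = List.length ps'
      && List.for_all2 equal ps ps'
  | _ -> false

let () =
  let ct b = ("Content-Type", "multipart/mixed; boundary=" ^ b) in
  let leaf s = Leaf ([ ("Subject", s) ], "hello") in
  let m = Multipart ([ ct "aaa" ], [ leaf "one"; leaf "two" ]) in
  (* Different boundary and different Subject value: still equal. *)
  let m' = Multipart ([ ct "zzz" ], [ leaf "one"; leaf "changed" ]) in
  assert (equal m m');
  (* Different number of parts: not equal. *)
  assert (not (equal m (Multipart ([ ct "aaa" ], [ leaf "one" ]))));
  (* Different content: not equal. *)
  assert (
    not
      (equal m
         (Multipart ([ ct "aaa" ], [ leaf "one"; Leaf ([ ("Subject", "two") ], "bye") ]))));
  print_endline "relaxed equality behaves as described"
```

Subject is deliberately left out of the parsed fields here, so its value is ignored while its presence and position still count; Date, being in the list, is compared by value.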

In this way, Mr. MIME does not semantically alter any important part of your email: what you parse is what you can read.

License

The corpus is under the CC0 license.

hamlet has received funding from the Next Generation Internet Initiative (NGI) within the framework of the DAPSI Project.

Contributors

dinosaure, lyrm