
email-classification-challenge's People

Contributors

louismartin, pdubreuil1, zaccharieramzi

Forkers

victor-iyi

email-classification-challenge's Issues

Take multiple recipients as one

An idea that is not documented at all: would it be possible to treat a group of email addresses as a single recipient?
Say there is a working group of 9 people around a topic: all 9 addresses appear in every email exchanged (8 recipients and one sender).
I don't know how we could use this, but it would be a way to reduce the number of recipients.
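
One possible way to detect such groups, as a rough sketch only: count how often the same set of participants (sender plus recipients) shows up, and treat any recurring set as a single unit. The function names, the `min_count` threshold and the `example.com` addresses are all assumptions made for illustration, not something from the repository.

```python
from collections import Counter


def participant_set(sender, recipients):
    """All addresses involved in one email, regardless of who sent it."""
    return frozenset([sender, *recipients])


def find_groups(emails, min_count=2):
    """Return participant sets with at least 3 members seen min_count times or more."""
    counts = Counter(participant_set(s, r) for s, r in emails)
    return [g for g, c in counts.most_common() if c >= min_count and len(g) > 2]


def recipients_for(sender, group):
    """Predict every member of the group except the sender."""
    return sorted(group - {sender})


# Hypothetical toy data: (sender, recipients) pairs.
emails = [
    ("a@example.com", ["b@example.com", "c@example.com"]),
    ("b@example.com", ["c@example.com", "a@example.com"]),
    ("c@example.com", ["d@example.com"]),
]
groups = find_groups(emails)
print(recipients_for("c@example.com", groups[0]))
```

With this, the classifier would only have to pick a group label, and the individual recipients follow from removing the sender from the group.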

Combination of personal and global info

My idea here is to combine two types of info for each sender: personal and global.
Personal would be a classifier trained only on this particular sender. Global would be trained with no consideration for the sender.
This would handle cases where, for example, a subordinate addresses his boss as "Sir" while a partner addresses him by name.
We could weight the personal classifier by how many training examples we have for it.
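
A minimal sketch of that weighting, assuming both classifiers output a score per candidate recipient. The shrinkage formula `n / (n + k)` and the constant `k` are assumptions chosen for illustration; the issue does not specify how the weight should be computed.

```python
def combine_scores(personal, global_scores, n_personal, k=20.0):
    """Blend personal and global recipient scores.

    personal, global_scores: dicts mapping recipient -> score.
    n_personal: number of training emails available for this sender.
    k: hypothetical constant; the personal model gets half the weight
       once the sender has k training emails.
    """
    weight = n_personal / (n_personal + k)
    candidates = set(personal) | set(global_scores)
    return {
        c: weight * personal.get(c, 0.0) + (1.0 - weight) * global_scores.get(c, 0.0)
        for c in candidates
    }


personal = {"boss@example.com": 1.0}
global_scores = {"boss@example.com": 0.0, "partner@example.com": 1.0}

# A sender with no history falls back entirely on the global model.
print(combine_scores(personal, global_scores, n_personal=0))
# A sender with 20 emails gets an even blend of both models.
print(combine_scores(personal, global_scores, n_personal=20))
```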

Too long body

Some emails have very long bodies. I think the useful information in them is not every word but rather the length of the email. Solution: pass the real length of the email as a feature, and crop the content from the inside (from the inside because the informative words are usually at the beginning or at the end).

Cropping is also useful to avoid polluting the bag of words (BoW).
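
As a sketch of what this could look like, the helper below keeps the first and last tokens of the body and returns the original token count alongside. The name `crop_body` and the `head`/`tail` defaults are assumptions, and whitespace tokenization is only a stand-in for whatever tokenizer the pipeline uses.

```python
def crop_body(body, head=50, tail=50):
    """Keep the first `head` and last `tail` tokens; also return the full length."""
    tokens = body.split()
    n_tokens = len(tokens)
    if n_tokens <= head + tail:
        # Short enough: nothing to crop.
        return " ".join(tokens), n_tokens
    # Slice from an explicit index so that tail=0 keeps nothing from the end.
    kept = tokens[:head] + tokens[n_tokens - tail:]
    return " ".join(kept), n_tokens


body = " ".join(str(i) for i in range(10))
cropped, length = crop_body(body, head=2, tail=3)
print(cropped)  # 0 1 7 8 9
print(length)   # 10
```

The cropped text would feed the bag of words, while the length goes in as a separate numeric feature.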

Simplest rule

Suggested by @pdubreuil1: if the email contains "Hi Pierre", then "Pierre" is likely to be a recipient.
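
A rough sketch of such a rule: look for a greeting, take the word that follows it, and keep the candidate addresses whose local part contains that name. The list of greeting words and the substring matching are assumptions; real data would likely need a proper mapping from first names to addresses.

```python
import re

# Hypothetical set of greeting words; extend as needed.
GREETING = re.compile(r"\b(?:hi|hello|dear)\s+([A-Za-z]+)", re.IGNORECASE)


def recipients_from_greeting(body, candidates):
    """Return candidate addresses whose local part contains the greeted name."""
    match = GREETING.search(body)
    if match is None:
        return []
    name = match.group(1).lower()
    return [addr for addr in candidates if name in addr.split("@")[0].lower()]


candidates = ["pierre.dupont@example.com", "marie@example.com"]
print(recipients_from_greeting("Hi Pierre, see attached.", candidates))
```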

Test to see if recipient is actually an email

Many recipient entries are not valid email addresses because of data mishandling. For now, to check whether a recipient is indeed an email address, we only check that it contains "@".
However, there are more thorough email address checkers out there. We should try to use those.
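
As an intermediate step, a regular expression already catches much more than the "@" test. The pattern below is a deliberately simple assumption (one "@", a non-empty local part, a domain with at least one dot); it is not a full RFC 5322 validator and will reject some unusual but valid addresses.

```python
import re

# Simplified pattern, stricter than '"@" in s' but not RFC-complete.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_email(candidate):
    """Return True if the whole string looks like a single email address."""
    return EMAIL_PATTERN.fullmatch(candidate.strip()) is not None


for candidate in ["john.doe@example.com", "john.doe@", "@example.com", "jo hn@example.com"]:
    print(candidate, is_email(candidate))
```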

Forwarded mails / answers

Some mails contain forwarded information, which is both a curse and a blessing.
It is a curse because the bag of words is completely polluted by the forwarded text, but a blessing because we get all the email addresses of the previous recipients!

Example: mid: 51172

On July 16th, SDG&E filed a Motion requesting certain actions from theCommission necessary to implement its Memorandum of Understanding with DWR.There are ten CPUC Implementing Decisions provided for in the MOU. Themotion either requests that the action occur or references another CPUCproceeding where the matter is pending.Please let me know if Enron is interested in pursuing any matters related tothe SDG&E MOU.Jeanne Bennett-----Original Message-----From: Ruiz, Annie [mailto:[email protected]]Sent: Monday, July 16, 2001 5:33 PMTo: [email protected]; [email protected]; Brill, Thomas;[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected];[email protected]; [email protected].; [email protected];[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; Melville, Keith;[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected];
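
One way to get both benefits, sketched under assumptions: split the body at the "-----Original Message-----" marker seen in the example, keep only the text before it for the bag of words, and pull the addresses out of the forwarded part. The sample body and its `example.com` addresses are made up, and other mail clients use different forwarding markers that this does not handle.

```python
import re

SEPARATOR = "-----Original Message-----"
# Simplified address pattern (an assumption, not RFC-complete).
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def split_forwarded(body):
    """Return (text written by the sender, addresses found in the forwarded part)."""
    new_text, _, forwarded = body.partition(SEPARATOR)
    return new_text, ADDRESS.findall(forwarded)


# Hypothetical body, with the line breaks lost as in the example above.
body = (
    "Please let me know.Jeanne-----Original Message-----"
    "From: Ruiz, Annie [mailto:annie@example.com]"
    "Sent: Monday, July 16, 2001 5:33 PM"
    "To: bob@example.com; carol@example.org;"
)
new_text, addresses = split_forwarded(body)
print(new_text)
print(addresses)
```

The extracted addresses could then be used as strong candidate recipients, while `new_text` keeps the BoW clean.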
