louismartin / email-classification-challenge
Altegard challenge in collaboration w/ Linagora
Home Page: https://inclass.kaggle.com/c/master-data-science-mva-data-competition-2017
Maybe we should not remove these kinds of words!
An idea that is not documented anywhere: would it be possible to treat a group of email addresses as a single recipient?
Say there is a working group of 9 people around one topic: all 9 addresses appear in every email exchange (8 recipients and one sender).
I don't know yet how we could use this, but it would be a way to reduce the number of recipients.
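A minimal sketch of how such groups could be spotted, assuming each email is available as a `(sender, recipients)` pair. The function name `frequent_participant_groups` and the `min_count` threshold are hypothetical, not part of the repo:

```python
from collections import Counter


def frequent_participant_groups(emails, min_count=2):
    """Count how often each exact set of participants (sender + recipients) occurs.

    emails: iterable of (sender, list_of_recipients) pairs (assumed layout).
    Returns {frozenset_of_addresses: count} for groups of 3+ people
    that appear at least `min_count` times.
    """
    counts = Counter(
        frozenset([sender, *recipients]) for sender, recipients in emails
    )
    return {
        group: n
        for group, n in counts.items()
        if n >= min_count and len(group) > 2
    }


emails = [
    ("a@x.com", ["b@x.com", "c@x.com"]),
    ("b@x.com", ["a@x.com", "c@x.com"]),
    ("a@x.com", ["d@x.com"]),
]
groups = frequent_participant_groups(emails)
```

A group found this way could then be predicted as one unit: whoever in the group is not the sender becomes a recipient.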
TEDLT
My idea here is to combine two types of info for each sender: personal and global.
Personal would be a classifier trained only on this particular sender. Global would be trained with no consideration for the sender.
It would allow for cases when a subordinate calls his boss "Sir", but a partner calls him by his name for example.
We could weight the personal classifier by how many training examples we had for it.
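One possible way to do the weighting, as a rough sketch: blend the two score sets with a weight that grows with the number of training emails for the sender. The function `blend_scores` and the smoothing constant `k` are assumptions for illustration, and the scores are assumed to be per-recipient values in [0, 1]:

```python
def blend_scores(personal, global_, n_personal, k=50.0):
    """Blend per-recipient scores from a sender-specific and a global model.

    personal, global_: dicts mapping recipient -> score in [0, 1].
    n_personal: number of training emails available for this sender.
    k: hypothetical smoothing constant; a larger k keeps trusting
       the global model until more personal data is available.
    """
    w = n_personal / (n_personal + k)  # 0 with no data, tends to 1 with lots
    recipients = set(personal) | set(global_)
    return {
        r: w * personal.get(r, 0.0) + (1.0 - w) * global_.get(r, 0.0)
        for r in recipients
    }


scores = blend_scores({"a": 1.0}, {"a": 0.0, "b": 1.0}, n_personal=50, k=50.0)
```

With no training emails for a sender, the weight is 0 and the result falls back to the global scores.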
Some emails have very long bodies. I think the useful signal there is less the individual words than the length of the email itself. Solution: pass the real length of the email as a feature, and maybe crop the content a little. Crop from the middle, because the informative words are usually either at the beginning or at the end.
Cropping is also useful to avoid polluting the BoW.
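A small sketch of the cropping idea, assuming whitespace tokenization. The function name `crop_body` and the `head`/`tail` sizes are hypothetical and would need tuning:

```python
def crop_body(body, head=100, tail=100):
    """Keep the first `head` and last `tail` words of an email body.

    Returns (cropped_text, original_word_count) so the real length
    can still be used as a separate feature.
    """
    words = body.split()
    n = len(words)
    if n <= head + tail:
        return " ".join(words), n
    # Slice with n - tail so that tail=0 yields an empty tail.
    return " ".join(words[:head] + words[n - tail:]), n


text, length = crop_body("a b c d e f", head=2, tail=1)
```

The cropped text would go to the BoW, and the word count would go in as a numeric feature next to it.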
Suggested by @pdubreuil1 : If the email contains "Hi Pierre", you know that "Pierre" is likely to be a recipient.
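A rough sketch of that heuristic, assuming the greeting is at the very start of the body and that the first name shows up in the local part of the address. The greeting word list and the helper names are assumptions:

```python
import re

# Hypothetical list of greeting words; extend as needed.
GREETING_RE = re.compile(
    r"^\s*(?:hi|hello|hey|dear)[ \t]+([A-Za-z]+)", re.IGNORECASE
)


def greeted_name(body):
    """Return the lowercased first name after an opening greeting, or None."""
    match = GREETING_RE.match(body)
    return match.group(1).lower() if match else None


def boost_candidates(body, candidates):
    """Return the candidate addresses whose local part contains the greeted name."""
    name = greeted_name(body)
    if name is None:
        return []
    return [c for c in candidates if name in c.split("@")[0].lower()]


hits = boost_candidates(
    "Hi Pierre, see attached.",
    ["pierre.dubois@example.com", "anna@example.com"],
)
```

The matched addresses could get a score bonus rather than being forced into the prediction, since first names are ambiguous.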
A lot of email addresses are wrong due to data mishandling. For now, to check whether a recipient is indeed an email address, we just check that it has "@" in it.
However, I know that there are more elaborate and more effective email address checkers out there. We should try to use those.
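As a middle ground before pulling in a dedicated checker, here is a pragmatic regex sketch. It is deliberately simple and is not a full RFC 5322 validator; the pattern and the name `is_plausible_email` are assumptions:

```python
import re

# Local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+'-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def is_plausible_email(token):
    """Return True if the token looks like a single well-formed address."""
    return EMAIL_RE.fullmatch(token.strip()) is not None


checks = {
    token: is_plausible_email(token)
    for token in ["john.doe@enron.com", "john@", "a@b", "john doe@enron.com"]
}
```

This already rejects tokens that merely contain "@", such as truncated addresses or fragments glued together by the mishandling.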
Some mails contain forwarded information, which is both a curse and a blessing.
It is a curse because the bag of words is completely polluted by all the forwarded content, but a blessing because we get all the email addresses of the previous recipients!
Example: mid: 51172
On July 16th, SDG&E filed a Motion requesting certain actions from theCommission necessary to implement its Memorandum of Understanding with DWR.There are ten CPUC Implementing Decisions provided for in the MOU. Themotion either requests that the action occur or references another CPUCproceeding where the matter is pending.Please let me know if Enron is interested in pursuing any matters related tothe SDG&E MOU.Jeanne Bennett-----Original Message-----From: Ruiz, Annie [mailto:[email protected]]Sent: Monday, July 16, 2001 5:33 PMTo: [email protected]; [email protected]; Brill, Thomas;[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected];[email protected]; [email protected].; [email protected];[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected];[email protected]; [email protected]; Melville, Keith;[email protected]; [email protected]; [email protected];[email protected]; [email protected]; [email protected]; [email protected];[email protected]; [email protected];
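A sketch of how both sides could be handled: split the body at the forward marker, keep only the new text for the BoW, and harvest addresses from the forwarded part. It assumes the marker is exactly `-----Original Message-----` as in the example above; the sample body and addresses below are made up:

```python
import re

ADDRESS_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
FORWARD_MARKER = "-----Original Message-----"


def split_forwarded(body):
    """Split a body into (new_text, forwarded_text) at the first marker."""
    new_text, _, forwarded = body.partition(FORWARD_MARKER)
    return new_text, forwarded


def forwarded_addresses(body):
    """Return the unique lowercased addresses found in the forwarded part, in order."""
    _, forwarded = split_forwarded(body)
    seen = []
    for address in ADDRESS_RE.findall(forwarded):
        address = address.lower()
        if address not in seen:
            seen.append(address)
    return seen


body = (
    "Please review.-----Original Message-----"
    "From: Ruiz, Annie [mailto:annie@example.com]Sent: Monday "
    "To: bob@example.org; Carol@Example.org; bob@example.org;"
)
new_text, _ = split_forwarded(body)
addresses = forwarded_addresses(body)
```

The harvested addresses are only candidates: they still need to be matched against the known recipient list before being used in a prediction.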