Enron Email Analysis

USAGE

You'll need Docker and the ability to run Docker as your current user.

You'll need to build the container:

> docker build . -t enron_env

This Docker container is based on rocker/verse. To run RStudio Server:

> docker run -v `pwd`:/home/rstudio -p 8787:8787 -e PASSWORD=mypass -t enron_env

Then connect to the machine on port 8787 (for example, http://localhost:8787 when running locally).

Username: rstudio
Password: mypass

Make

Use the Makefile as a recipe book for building the artifacts found in the derived directories.

Example:

In the local project directory, to build the artifact Example.csv:

> make derived_data/Example.csv

In the Makefile, each rule names its artifact before the colon and lists its dependencies after the colon. Use the artifact name as the make target.

Data

The data from Here must be placed in the source_data directory.

Introduction

This dataset is the Enron Corpus, which was collected and prepared by the CALO Project (Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 0.5M messages in total. The data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.

The email dataset was later purchased by Leslie Kaelbling at MIT, and turned out to have a number of integrity problems. A number of folks at SRI, notably Melinda Gervasio, worked hard to correct these problems, and it is thanks to them that the dataset is available. The dataset here does not include attachments, and some messages have been deleted "as part of a redaction effort due to requests from affected employees." Invalid email addresses were converted to something of the form user@enron.com whenever possible (i.e., recipient is specified in some parse-able format like "Doe, John" or "Mary K. Smith") and to no_address@enron.com when no recipient was specified.
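Because placeholder addresses stand in for missing recipients, any analysis of who emailed whom may want to drop them first. The following is a minimal sketch in Python, not the project's own code (the project appears to be R-based). It assumes the placeholder is no_address@enron.com, as in the corpus's published description. The function name `real_recipients` and the sample message are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses

# Assumption: placeholder the corpus uses when no recipient was specified.
PLACEHOLDER = "no_address@enron.com"


def real_recipients(raw_message: str) -> list[str]:
    """Return the To/Cc addresses of a raw message, minus the placeholder."""
    msg = message_from_string(raw_message)
    # Collect every To and Cc header value (a message may have neither).
    headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses splits comma-separated lists into (name, address) pairs.
    addresses = [addr.lower() for _name, addr in getaddresses(headers)]
    return [a for a in addresses if a and a != PLACEHOLDER]


# Made-up message for illustration only; not taken from the corpus.
sample = (
    "From: alice@enron.com\n"
    "To: bob@enron.com, no_address@enron.com\n"
    "Subject: test\n"
    "\n"
    "Body text.\n"
)
print(real_recipients(sample))
```

Filtering at parse time keeps the placeholder from showing up as an artificially popular recipient in later counts.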

This dataset, along with a thorough explanation of its origin, is available at Here

Methodology

This project will use natural language processing to analyze emails between Enron employees to gain deeper insights into the collapse of the company.
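As a rough illustration of what a first text-processing step could look like, the sketch below parses one raw message, lowercases the body, and counts word frequencies. It is written in Python using only the standard library and is an assumption about the general approach, not the project's actual pipeline. The function name `body_word_counts` and the sample message are hypothetical.

```python
import re
from collections import Counter
from email import message_from_string


def body_word_counts(raw_message: str) -> Counter:
    """Count lowercase word tokens in the body of a single raw email."""
    msg = message_from_string(raw_message)
    body = msg.get_payload()
    # Multipart messages return a list of parts; this sketch skips them.
    if not isinstance(body, str):
        return Counter()
    # Very simple tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", body.lower())
    return Counter(tokens)


# Made-up message for illustration only; not taken from the corpus.
sample = (
    "From: alice@enron.com\n"
    "Subject: Q3 numbers\n"
    "\n"
    "Please review the numbers. The numbers look off.\n"
)
counts = body_word_counts(sample)
print(counts["numbers"], counts["please"])
```

A real analysis would likely add stop-word removal and aggregate counts across all messages; those steps are left out to keep the sketch short.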

Preliminary Plots
