
action-item-detection's Introduction

Action-Item-Detection

Classify sentences as actionable or non-actionable.

Actionable item: a sentence that asks someone to do something. Example: "Please create an assignment and forward it by EOD"

=================================================

First, download the raw data from https://www.kaggle.com/wcukierski/enron-email-dataset and put it in the enron-email-dataset folder. The file is about 1.2 GB and is named emails.csv.

================================================

Then run extract_main_content.py:

  • Input: emails.csv
  • It extracts the main content from each raw email and saves it in batches of 5000 email bodies per file, named content1.csv, content2.csv, ..., content104.csv. Samples (content1.csv and content2.csv) are present in the enron-email-dataset folder.
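The extraction step described above can be sketched with the Python standard library. This is a minimal sketch and not the code in extract_main_content.py: it assumes each raw email is an RFC 822 style string (headers, a blank line, then the body), and the helper names `main_content`, `batches` and `write_batches` are hypothetical. Reading the raw strings out of emails.csv is left out.

```python
import csv
import email

BATCH_SIZE = 5000  # email bodies per output file, as described above


def main_content(raw_message):
    """Parse a raw message and return its body without the headers."""
    msg = email.message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart messages return a list of parts; join the text of each part.
    if isinstance(payload, list):
        payload = "\n".join(str(part.get_payload()) for part in payload)
    return payload.strip()


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_batches(bodies, prefix="content", size=BATCH_SIZE):
    """Write content1.csv, content2.csv, ... with one email body per row."""
    names = []
    for i, chunk in enumerate(batches(bodies, size), start=1):
        name = f"{prefix}{i}.csv"
        with open(name, "w", newline="", encoding="utf-8") as fh:
            csv.writer(fh).writerows([body] for body in chunk)
        names.append(name)
    return names
```

Calling `write_batches([main_content(raw) for raw in raw_messages])` would then produce one numbered file per 5000 bodies.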

==========================================

When you run actionable_sentences.py:

  • Input: content1.csv, or any one of the content(1 to 104).csv files. It first extracts the sentences from the main content and then reports whether each sentence is actionable, for all 5000 email bodies in the chosen file.
  • This is the file that contains the rule-based model for classifying a sentence as actionable or not.

======================================

When you run content2sentence.py:

  • Input: content1.csv
  • Output: sentence1.csv
  • It splits the content into sentences and labels each one by applying our rule-based model.
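To illustrate the idea of splitting content into sentences and labeling them with rules, here is a minimal sketch. The repository's actual rules are not reproduced in this README, so the cue phrases and verb list below are hypothetical stand-ins, and the regex splitter is a naive assumption rather than the tokenizer the project uses.

```python
import re

# Hypothetical cues; the real rule set lives in actionable_sentences.py.
REQUEST_CUES = re.compile(
    r"\b(please|kindly|could you|can you|would you|let me know)\b",
    re.IGNORECASE,
)
IMPERATIVE_VERBS = {"send", "forward", "create", "call", "review", "schedule", "update", "drop"}


def split_sentences(content):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", content.strip())
    return [p for p in parts if p]


def is_actionable(sentence):
    """Return True when the sentence contains a request cue or starts with a listed verb."""
    if REQUEST_CUES.search(sentence):
        return True
    words = re.findall(r"[a-z']+", sentence.lower())
    return bool(words) and words[0] in IMPERATIVE_VERBS


content = "The meeting went well. Please create an assignment and forward it by EOD. Send me the notes!"
rows = [(s, is_actionable(s)) for s in split_sentences(content)]
```

Each `(sentence, label)` pair in `rows` corresponds to one row of a sentence*.csv style file.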

=====================================

train-data folder: a sample training dataset with two columns, sentence and label (actionable or non-actionable).

=============================

When you run rule-based-model-testdata.py:

  • Input: test.csv (the given test file)
  • It applies our rule-based model to the given test.csv.

Accuracy of the rule-based model: 66.769%
sensitivity: 0.5642201834862385
specificity: 0.7729393468118196
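For reference, the three reported metrics can be computed from true and predicted labels as follows. This is a generic sketch with toy labels, not the evaluation code from the repository. The function name `evaluate` is hypothetical, and it assumes both classes appear in `y_true`.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, sensitivity and specificity for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # actionable, predicted actionable
    tn = sum(1 for t, p in pairs if not t and not p)  # non-actionable, predicted non-actionable
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actionable sentences found
        "specificity": tn / (tn + fp),  # share of non-actionable sentences rejected
    }


# Toy labels for illustration only.
metrics = evaluate(
    [True, True, True, False, False],
    [True, False, True, False, True],
)
```

Sensitivity measures how many truly actionable sentences the model catches, while specificity measures how many non-actionable sentences it correctly leaves alone.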

==============================

When you run ml-classification-model.py:

  • Input: sentence1.csv, sentence2.csv and test.csv
  • Output: accuracy, sensitivity and specificity on the test.csv dataset

This module loads the labeled sentences from the sentence*.csv files, randomly picks 4000 actionable and 5000 non-actionable sentences for training, and then predicts labels for the test.csv dataset.

Results on the given test dataset:

  • RandomForestClassifier: accuracy = 63.9167309175%, sensitivity = 0.599388379204893, specificity = 0.6796267496111975
  • MultinomialNB: accuracy = 63.2228218967%, sensitivity = 0.6926605504587156, specificity = 0.5707620528771384
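The sampling step (4000 actionable and 5000 non-actionable sentences) might look like the sketch below. It is an assumption about how the selection could be done, not the code in ml-classification-model.py. The function name, the `(sentence, label)` row shape and the fixed seed are all hypothetical.

```python
import random


def sample_training_set(rows, n_actionable=4000, n_non_actionable=5000, seed=0):
    """Pick a fixed number of rows per class from (sentence, label) pairs.

    label is True for actionable sentences. If a class has fewer rows than
    requested, all of its rows are used.
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    actionable = [r for r in rows if r[1]]
    non_actionable = [r for r in rows if not r[1]]
    picked = rng.sample(actionable, min(n_actionable, len(actionable)))
    picked += rng.sample(non_actionable, min(n_non_actionable, len(non_actionable)))
    rng.shuffle(picked)  # avoid feeding the classifier one class after the other
    return picked
```

The returned list can then be vectorized and passed to a classifier such as RandomForestClassifier or MultinomialNB.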

============================================

Note: the test.csv file is not present in the repository. Put it in the enron-email-dataset folder.

test.csv is a hand-crafted labeled dataset with the format <sentence>|<True|False>
Example: Please drop me a mail regarding meeting|True
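A line in this pipe-separated format can be parsed with a few lines of Python. This is a minimal sketch. The helper name `parse_test_line` is hypothetical, and it assumes the label is always the text after the last `|` on the line.

```python
def parse_test_line(line):
    """Split 'sentence|True' into (sentence, bool), using the last '|' as separator."""
    sentence, _, label = line.rstrip("\n").rpartition("|")
    return sentence.strip(), label.strip().lower() == "true"


example = parse_test_line("Please drop me a mail regarding meeting|True\n")
```

Splitting on the last `|` keeps sentences that themselves contain a pipe character intact.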

action-item-detection's People

Contributors

vaibhavsanjaylalka


action-item-detection's Issues

Getting this error while running actionable_sentences.py

$ python actionable_sentences.py
Start processing the cleaned content
Traceback (most recent call last):
  File "actionable_sentences.py", line 77, in <module>
    emails_df = pd.read_csv('./enron-email-dataset/content1.csv', names='m')
  File "/Users/ksachdeva/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/Users/ksachdeva/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 459, in _read
    _validate_names(kwds.get("names", None))
  File "/Users/ksachdeva/opt/anaconda3/lib/python3.8/site-packages/pandas/io/parsers.py", line 444, in _validate_names
    raise ValueError("Names should be an ordered collection.")
ValueError: Names should be an ordered collection.
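
The traceback points at `names='m'`: pandas rejects a bare string there because `names` has to be an ordered, list-like collection. A likely fix, assuming the script intends a single column called `m`, is to pass a one-element list. The sketch below uses an in-memory CSV as a stand-in for content1.csv.

```python
import io

import pandas as pd

# Stand-in for ./enron-email-dataset/content1.csv (one email body per row).
sample = io.StringIO("first email body\nsecond email body\n")

# names must be list-like, so wrap the single column name in a list.
emails_df = pd.read_csv(sample, names=["m"])
```

In actionable_sentences.py the corresponding change would be `pd.read_csv('./enron-email-dataset/content1.csv', names=['m'])`.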

@vaibhavsanjaylalka

Sentence2.csv missing

Context: Sentence2.csv referenced in the file ml-classification-model.py is missing.
Please share the data.
