email-summarization's Introduction

email-summarization

A module for e-mail summarization that uses clustering of skip-thought sentence embeddings.
The code in this repository complements this Medium article.
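
The idea behind the module can be illustrated with a small, self-contained sketch. It is not the code in email_summarization.py. It assumes the sentence vectors have already been computed (toy 2-D vectors stand in for skip-thought embeddings here), clusters them with plain k-means using a deterministic farthest-point initialisation, keeps from each cluster the sentence closest to the centroid, and emits the kept sentences in their original order. The helper names, the roughly sqrt(n) default for the number of clusters, and the closest-to-centroid selection rule are assumptions made for illustration. It should run on both Python 2 and 3.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = float(len(vectors))
    return [sum(column) / n for column in zip(*vectors)]


def init_centroids(vectors, k):
    """Farthest-point initialisation: deterministic, spreads centroids apart."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector farthest from every centroid chosen so far.
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return centroids, labels


def summarize_vectors(sentences, vectors, n_clusters=None):
    """Pick one representative sentence per cluster, in document order."""
    if n_clusters is None:
        # Hypothetical heuristic: about sqrt(n) clusters for n sentences.
        n_clusters = max(1, int(math.ceil(math.sqrt(len(sentences)))))
    centroids, labels = kmeans(vectors, n_clusters)
    chosen = []
    for c in range(n_clusters):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        # The sentence whose vector lies closest to the cluster centroid.
        best = min(members, key=lambda i: squared_distance(vectors[i], centroids[c]))
        chosen.append(best)
    # Keep the summary in the order the sentences appeared in the email.
    return " ".join(sentences[i] for i in sorted(chosen))


# Toy data: 2-D vectors standing in for real skip-thought embeddings.
sentences = ["Hi all,", "The quarterly review has moved to Friday.",
             "It starts at 3 pm.", "The room is unchanged.", "Thanks,", "Bob"]
vectors = [[0.0, 2.0], [10.0, 10.0], [10.0, 8.0], [10.0, 12.0], [0.0, 0.0], [0.0, 4.0]]
print(summarize_vectors(sentences, vectors, n_clusters=2))
# prints: Hi all, The quarterly review has moved to Friday.
```

With real data, each vector would be the skip-thought encoding of the corresponding sentence; the clustering and selection steps do not depend on the vector dimension.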

Instructions

  • The code is written in Python 2.
  • The module uses the code released with the Skip-Thoughts paper, which is available on GitHub. Do:
    git clone https://github.com/ryankiros/skip-thoughts
    
  • The Skip-Thoughts code uses Theano. Make sure Theano is installed and that GPU acceleration is working; execution is faster on a GPU.
  • Clone this repository and copy the file email_summarization.py to the root of the cloned skip-thoughts repository. Do:
    git clone https://github.com/jatana-research/email-summarization
    cp email-summarization/email_summarization.py skip-thoughts/
    
  • Install dependencies. Do:
    pip install -r email-summarization/requirements.txt
    python -c 'import nltk; nltk.download("punkt")'
    
  • Download the pre-trained models. The total download size is around 5 GB. Do:
    mkdir skip-thoughts/models
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/dictionary.txt
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/utable.npy
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/btable.npy
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/uni_skip.npz.pkl
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz
    wget -P ./skip-thoughts/models http://www.cs.toronto.edu/~rkiros/models/bi_skip.npz.pkl
    
  • Verify the MD5 hashes of the downloaded files to ensure that the files haven't been corrupted during the download. Do:
    md5sum skip-thoughts/models/*
    
    The output should be:
    9a15429d694a0e035f9ee1efcb1406f3 bi_skip.npz
    c9b86840e1dedb05837735d8bf94cee2 bi_skip.npz.pkl
    022b5b15f53a84c785e3153a2c383df6 btable.npy
    26d8a3e6458500013723b380a4b4b55e dictionary.txt
    8eb7c6948001740c3111d71a2fa446c1 uni_skip.npz
    e1a0ead377877ff3ea5388bb11cfe8d7 uni_skip.npz.pkl
    5871cc62fc01b79788c79c219b175617 utable.npy
    
  • Change lines 23-24 of skip-thoughts/skipthoughts.py so that they point to the downloaded models:
    path_to_models = 'models/'
    path_to_tables = 'models/'
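
The MD5 check can also be scripted in Python with the standard hashlib module, which helps on systems where md5sum is not available. A minimal sketch: the function names md5_of and verify_models are hypothetical, and the expected hashes are copied from the md5sum output listed above.

```python
import hashlib
import os

# Expected MD5 hashes, copied from the md5sum output in the instructions.
EXPECTED_MD5 = {
    "bi_skip.npz": "9a15429d694a0e035f9ee1efcb1406f3",
    "bi_skip.npz.pkl": "c9b86840e1dedb05837735d8bf94cee2",
    "btable.npy": "022b5b15f53a84c785e3153a2c383df6",
    "dictionary.txt": "26d8a3e6458500013723b380a4b4b55e",
    "uni_skip.npz": "8eb7c6948001740c3111d71a2fa446c1",
    "uni_skip.npz.pkl": "e1a0ead377877ff3ea5388bb11cfe8d7",
    "utable.npy": "5871cc62fc01b79788c79c219b175617",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB models never sit fully in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_models(model_dir, expected=EXPECTED_MD5):
    """Return (filename, problem) pairs; an empty list means every file matches."""
    problems = []
    for name in sorted(expected):
        path = os.path.join(model_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif md5_of(path) != expected[name]:
            problems.append((name, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    for name, problem in verify_models("skip-thoughts/models"):
        print(name + ": " + problem)
```

Any file reported as "checksum mismatch" should be downloaded again with the corresponding wget command.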
    

Running the module

  • Find an English email dataset online, or create a small one yourself.
  • The module expects a list of emails as input and returns a list of summaries.
  • Open the Python interpreter in the skip-thoughts/ folder and do:
    >>> from email_summarization import summarize
    >>> summaries = summarize(emails) # emails is a Python list containing English emails.
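
summarize takes a plain Python list of email bodies. One way to build that list, assuming (hypothetically) that each email is stored as its own UTF-8 .txt file in a folder, is sketched below; load_emails and the folder layout are illustrative and not part of the repository.

```python
import io
import os


def load_emails(folder, extension=".txt"):
    """Read every *.txt file in `folder` (sorted by name) into a list of strings."""
    emails = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extension):
            continue
        # io.open with an explicit encoding behaves the same on Python 2 and 3.
        with io.open(os.path.join(folder, name), encoding="utf-8") as handle:
            text = handle.read().strip()
        if text:  # skip empty files so they never reach the summarizer
            emails.append(text)
    return emails
```

From the skip-thoughts/ folder, the result can then be passed on directly, e.g. summaries = summarize(load_emails("my_emails")), where my_emails is a placeholder folder name.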
    

email-summarization's People

Contributors

kushalchauhan98

email-summarization's Issues

Theano error while training

When running the train module from this GitHub repository:

import train
train.trainer(sentences)  # sentences: a list of training sentences (strings)

While running the above code, we get an error from Theano. The error files are attached:

[error.txt](https://github.com/jatana-research/email-summarization/files/2414610/error.txt)
error2.txt

Memory Error

I have been facing a memory error in my project. I am using an EC2 instance with 8 GB of RAM.

code doesn't execute

Hi, I followed your Medium article; great article, and thanks for sharing. But when I try to run the code, it stops executing after the line below, which is in the init_params(options) function in skipthoughts.py.

params = get_layer(options['encoder'])[0](options, params, prefix='encoder',
                                              nin=options['dim_word'], dim=options['dim'])

I couldn't figure out the issue. Can you spot it?

FYI: I'm using Python 3.6.4 :: Anaconda custom and I've fixed the print statement and import errors.

Encoding error: KeyError: 'UNK'

I am facing an issue in the encode function of skipthoughts.py.
The line below throws an error:
uembedding[j,ind] = model['utable']['UNK']

KeyError: 'UNK'

I am using the following package versions:
Theano==0.7.0
numpy==1.15.1
keras==2.2.2
python==2.7.15
