deep-summarization's Introduction

Deep Summarization

Uses recurrent neural networks (LSTM and GRU units) to build Seq2Seq encoder-decoder models, with and without an attention mechanism, that summarize Amazon food reviews into abstractive tips.

Contents

Encoder Decoder Model

[Figure: encoder-decoder model diagram]
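As a rough illustration of what the attention mechanism adds to an encoder-decoder model, here is a minimal sketch that is not taken from the project's code. Without attention, the decoder sees only the encoder's final state; with attention, each decoding step builds a context vector as a weighted sum of all encoder states. Dot-product scoring is used here purely for brevity and is an assumption; the function names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Scoring is a plain dot product (an assumption made for this sketch).
    """
    scores = [sum(d * h_i for d, h_i in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Toy example: three encoder states of size 2 and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

The encoder states that align best with the current decoder state receive the largest weights, so the context vector changes from one output word to the next.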

DataSet

  • DataSet Information - This dataset consists of reviews of fine foods from Amazon. The data spans a period of more than 10 years and includes all ~500,000 reviews up to October 2012. Each review includes product and user information, a rating, and a plaintext review.

The dataset can be downloaded from here

A sample record from the dataset looks like this:

product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have
found them all to be of good quality. The product looks more like a stew than a
processed meat and it smells better. My Labrador is finicky and she appreciates this
product better than most.

The input review is stored under the key review/text, and the target summary that we wish to generate is stored under the key review/summary. For the purpose of this project, only these two fields are extracted by the provided extracter script; all other fields are ignored.
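The extraction step can be sketched roughly as follows. This is not the project's extracter script, only a minimal illustration under two assumptions: records are separated by blank lines, and a line that does not start with a `product/` or `review/` key continues the previous field. The function name `parse_reviews` is hypothetical.

```python
def parse_reviews(lines):
    """Yield (review_text, summary) pairs from 'key: value' records.

    Assumes blank lines separate records (an assumption of this sketch).
    """
    record, last_key = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current record.
            if "review/text" in record and "review/summary" in record:
                yield record["review/text"], record["review/summary"]
            record, last_key = {}, None
            continue
        key, sep, value = line.partition(": ")
        if sep and key.startswith(("product/", "review/")):
            record[key] = value
            last_key = key
        elif last_key is not None:
            # Wrapped line: append it to the previous field.
            record[last_key] += " " + line.strip()
    # Flush the last record if the input does not end with a blank line.
    if "review/text" in record and "review/summary" in record:
        yield record["review/text"], record["review/summary"]


sample = """product/productId: B001E4KFG0
review/score: 5.0
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have
found them all to be of good quality.
"""
pairs = list(parse_reviews(sample.splitlines()))
print(pairs)
```

Each pair holds the model input first and the target tip second; writing the pairs to a CSV file is left out of the sketch.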

Installation Requirements

  1. Create a barebone virtual environment and activate it
virtualenv deepsum --no-site-packages
source deepsum/bin/activate
  2. Install the project requirements
pip install -r requirements.txt

Run Instructions

  1. Extract the reviews and target tips using the following command
python extracter_script.py raw_data/finefoods.txt extracted_data/review_summary.csv

NOTE: Extract the downloaded dataset and place it in the raw_data directory before running the above command.

  2. Then run the seed script, which creates the required permuted training and testing datasets and also trains and evaluates the model
# Simple - No Attention
python train_scripts/train_script_gru_simple_no_attn.py

This runs the simple GRU-cell encoder-decoder model without the attention mechanism.
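What "permuted training and testing dataset" could mean in practice is sketched below. This is not the seed script itself: the 80/20 split, the fixed seed, and the function name `permute_and_split` are all assumptions made for illustration.

```python
import random


def permute_and_split(pairs, test_fraction=0.2, seed=0):
    """Shuffle (review, tip) pairs and split them into train and test lists.

    test_fraction and seed are hypothetical defaults, not project settings.
    """
    rng = random.Random(seed)  # fixed seed keeps the permutation repeatable
    shuffled = list(pairs)     # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


pairs = [("review %d" % i, "tip %d" % i) for i in range(10)]
train, test = permute_and_split(pairs)
print(len(train), len(test))
```

Fixing the permutation once, before any model is trained, is presumably why this script has to run first: the later scripts can then be compared on the same split.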

  3. Once the seed script has finished, run any of the following scripts, in any order.
  • For Models without Attention Mechanism
# Simple - No Attention
python train_scripts/train_script_lstm_simple_no_attn.py

# Stacked Simple - No Attention
python train_scripts/train_script_gru_stacked_simple_no_attn.py
python train_scripts/train_script_lstm_stacked_simple_no_attention.py

# Bidirectional - No Attention
python train_scripts/train_script_gru_bidirectional_no_attn.py
python train_scripts/train_script_lstm_bidirectional_no_attn.py

# Stacked Bidirectional - No Attention
python train_scripts/train_script_gru_stacked_bidirectional_no_attn.py
python train_scripts/train_script_lstm_stacked_bidirectional_no_attention.py

  • For Models with Attention Mechanism
# Simple - Attention
python train_scripts/train_script_gru_simple_attn.py
python train_scripts/train_script_lstm_simple_attn.py

# Stacked Simple - Attention
python train_scripts/train_script_gru_stacked_simple_attn.py
python train_scripts/train_script_lstm_stacked_simple_attention.py

# Bidirectional - Attention
python train_scripts/train_script_gru_bidirectional_attn.py
python train_scripts/train_script_lstm_bidirectional_attn.py

# Stacked Bidirectional - Attention
python train_scripts/train_script_gru_stacked_bidirectional_attn.py
python train_scripts/train_script_lstm_stacked_bidirectional_attention.py
  4. Finally, exit the virtual environment once you are done with the project. You can reactivate it later.
deactivate

Documentation

The documentation was generated automatically and may therefore contain errors. Please report any you find in the issue tracker. Some methods have no documentation. This is not an error, but laziness on my part. I will add the missing documentation when I get some free time.

To access documentation, just open index.html located at

docs/build/html/index.html

in your favorite browser. For now the documentation can only be opened locally. I will try hosting it on GitHub Pages once I get time.

References

  1. J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW, 2013.

  2. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.

  3. Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).

deep-summarization's People

Contributors

lprakash

deep-summarization's Issues

RuntimeError and ImportError

RuntimeError: module compiled against API version 0xc but this version of numpy is 0xa
Traceback (most recent call last):
File "extracter_script.py", line 1, in <module>
from helpers.extracter import Spider
File "/home/user/Documents/deep-summarization/helpers/extracter.py", line 2, in <module>
import pandas as pd
File "/home/user/Documents/deep-summarization/deepsum/lib/python2.7/site-packages/pandas/__init__.py", line 31, in <module>
"extensions first.".format(module))
ImportError: C extension: umpy.core.multiarray failed to import not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.

Kindly, help me out with this error.

generates empty summaries

I let train_script_gru_simple_no_attn.py run for a long time, and each iteration the generated summary is completely empty. Is this normal? How many iterations is it supposed to run before exiting? After 1500 iterations the generated summaries are still empty.

On Windows getting value error

I have installed TensorFlow on Windows (with Anaconda) and am trying to run the project there.

command : python train_scripts/train_script_gru_simple_no_attn.py

ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.GRUCell object at 0x000002063DDCBBE0> with a different variable scope than its first use. First use of cell was with scope 'train_test/embedding_rnn_seq2seq/embedding_rnn_decoder/rnn_decoder/gru_cell', this attempt is with scope 'train_test/embedding_rnn_seq2seq/rnn/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

Please help
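The error message above already names the usual cause: a list built with `[cell] * num_layers` repeats one cell object, while a list comprehension creates a new cell per layer. The difference can be seen in plain Python without TensorFlow. The `Cell` class below is a hypothetical stand-in, not a TensorFlow class, and whether this is the exact cause in the project's scripts is an assumption.

```python
class Cell(object):
    """Hypothetical stand-in for an RNN cell; not a TensorFlow class."""

    def __init__(self, units):
        self.units = units


num_layers = 3

# List multiplication repeats a reference to ONE object.
shared = [Cell(64)] * num_layers

# A comprehension calls the constructor once per layer.
distinct = [Cell(64) for _ in range(num_layers)]

print(len({id(c) for c in shared}))    # prints 1
print(len({id(c) for c in distinct}))  # prints 3
```

The same reasoning applies when one cell instance is passed to both the encoder and the decoder: each place that needs its own weights needs its own instance.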

It's not running

I have followed your "Run Instructions" and am facing an issue in the 2nd step, which is:
"Then run the seed script to create the required permuted training and testing dataset and also train and evaluate the model"
When I ran the command "python train_scripts/train_script_gru_simple_no_attn.py", it gave me the following error:

TypeError: cannot create 'sys.version_info' instances

please help
