MP-CNN Variations

This is a PyTorch implementation of MP-CNN as a base model with modifications and additions such as attention and sparse features.

Here is the MP-CNN paper:

The datasets and the GloVe word embeddings are available at https://git.uwaterloo.ca/jimmylin/Castor-data.

Clone Castor-data next to this repository so that the directory layout looks like this:

├── MP-CNN-Variants
│   ├── README.md
│   ├── ...
├── Castor-data
│   ├── README.md
│   ├── ...
│   ├── msrvid/
│   ├── sick/
│   └── GloVe/
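
As a quick sanity check before training, the layout above can be verified with a few lines of Python. This is a minimal sketch rather than part of the repository: the helper name `missing_data_dirs` is hypothetical, and it assumes only the three subdirectories shown in the tree (`msrvid`, `sick`, `GloVe`) need to be present.

```python
from pathlib import Path

# Subdirectories of Castor-data shown in the layout above.
EXPECTED_SUBDIRS = ("msrvid", "sick", "GloVe")


def missing_data_dirs(repo_dir):
    """Return the expected Castor-data subdirectories that do not exist.

    repo_dir is the path to the MP-CNN-Variants checkout; Castor-data is
    assumed to be its sibling directory, as in the layout above.
    """
    data_root = Path(repo_dir).resolve().parent / "Castor-data"
    return [
        str(data_root / name)
        for name in EXPECTED_SUBDIRS
        if not (data_root / name).is_dir()
    ]

# Example (run from inside the MP-CNN-Variants checkout):
#   for path in missing_data_dirs("."):
#       print("missing:", path)
```

An empty list means every directory from the tree was found.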

Note that the original paper does not use dropout, so the commands below pass --dropout 0 to mimic this behaviour and allow a fair comparison with the results reported below.

To visualize the training process, add the --tensorboard flag to log to TensorBoard.

SICK Dataset

To run MP-CNN on the SICK dataset mimicking the original paper as closely as possible, use the following command:

python main.py mpcnn.sick.model --dataset sick --epochs 19 --dropout 0 --lr 0.0005

| Implementation and config | Pearson's r | Spearman's ρ | MSE |
|---|---|---|---|
| Paper | 0.8686 | 0.8047 | 0.2606 |
| PyTorch using above config | 0.8692 | 0.8145 | 0.2520 |
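
For readers unfamiliar with the three metrics in the table, the following is a minimal pure-Python sketch of how Pearson's r, Spearman's ρ, and MSE can be computed from predicted and gold similarity scores. It is an illustration only and not the evaluation code used by this repository; all function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def mse(preds, golds):
    """Mean squared error between predictions and gold scores."""
    return sum((p - g) ** 2 for p, g in zip(preds, golds)) / len(preds)
```

Pearson's r measures linear agreement, while Spearman's ρ only depends on the ordering of the scores, which is why the two columns can move independently.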

TrecQA Dataset

To run MP-CNN on TrecQA, you first need to run the get_trec_eval.sh script in utils.

Then, you can run:

python main.py mpcnn.trecqa.model --arch mpcnn --dataset trecqa --epochs 5 --holistic-filters 200 --lr 0.00018 --regularization 0.0006405 --dropout 0

| Implementation and config | MAP | MRR |
|---|---|---|
| Paper | 0.762 | 0.854 |
| PyTorch using above config | 0.774 | 0.836 |

The paper results are reported in Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks.
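
MAP and MRR are computed by trec_eval in this repository. To make the two columns concrete, here is a minimal pure-Python sketch of both metrics. It is not a replacement for trec_eval: the function names and the input format (a dict mapping a question id to a list of `(score, label)` pairs) are hypothetical, and skipping questions with no correct answer is an assumption of this sketch.

```python
def average_precision(ranked_labels):
    """Average of precision@k over the positions k of correct answers."""
    hits = 0
    precisions = []
    for rank, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0


def reciprocal_rank(ranked_labels):
    """1 / rank of the first correct answer, or 0.0 if there is none."""
    for rank, is_correct in enumerate(ranked_labels, start=1):
        if is_correct:
            return 1.0 / rank
    return 0.0


def map_mrr(questions):
    """Return (MAP, MRR) over a dict of qid -> [(score, label), ...].

    Assumption: questions without any correct answer are skipped.
    """
    aps, rrs = [], []
    for candidates in questions.values():
        # Rank candidate answers by model score, highest first.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        labels = [label for _, label in ranked]
        if not any(labels):
            continue
        aps.append(average_precision(labels))
        rrs.append(reciprocal_rank(labels))
    if not aps:
        return 0.0, 0.0
    return sum(aps) / len(aps), sum(rrs) / len(rrs)


# Toy example with two questions (scores and labels are made up).
example = {
    "q1": [(0.9, 1), (0.5, 0), (0.1, 1)],
    "q2": [(0.8, 0), (0.7, 1)],
}
mean_ap, mean_rr = map_mrr(example)
```

MRR only rewards the position of the first correct answer, whereas MAP accounts for every correct answer in the ranking.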

WikiQA Dataset

You also need trec_eval for this dataset, similar to TrecQA.

Then, you can run:

python main.py mpcnn.wikiqa.model --arch mpcnn --dataset wikiqa --epochs 5 --holistic-filters 100 --lr 0.0001 --regularization 0.0002 --dropout 0

| Implementation and config | MAP | MRR |
|---|---|---|
| Paper | 0.693 | 0.709 |
| PyTorch using above config | 0.699 | 0.714 |

The paper results are reported in Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks.

Other Datasets

MSRVID

To run MP-CNN on the MSRVID dataset, use the following command:

python main.py mpcnn.msrvid.model --dataset msrvid --batch-size 16 --lr 0.0005 --epsilon 1e-7 --epochs 32 --dropout 0 --regularization 0.001

You should be able to obtain a Pearson's r of 0.8980 (untuned). For reference, the performance reported in the paper is 0.9090.

MSRP Dataset

To run MP-CNN on the MSRP dataset, use the following command:

python main.py mpcnn.msrp.model --dataset msrp --epochs 15

To see all available options, use:

python main.py --help

Experimental

This repo contains some scripts for hyperparameter optimization using watermill, with some hacks because the library is in alpha. As a result, the imports in hyperparameter_tuning_{random,hyperband}.py and utils/hyperband.py will not work for you at the moment.
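
Until those scripts are usable, a simple random search can be driven from the command line. The sketch below is not the repository's watermill-based code: it only generates `main.py` command lines using flags that appear in this README. The function names, the per-trial model file names, and the sampling ranges are all assumptions chosen for illustration.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Sample a value whose logarithm is uniform between low and high."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search_commands(n_trials, seed=0):
    """Build one SICK training command per random trial.

    The ranges for --lr and --regularization are illustrative
    assumptions, not tuned values from the thesis or the paper.
    """
    rng = random.Random(seed)
    commands = []
    for trial in range(n_trials):
        lr = sample_log_uniform(rng, 1e-5, 1e-3)
        reg = sample_log_uniform(rng, 1e-5, 1e-2)
        commands.append(
            # Hypothetical model file name, one per trial.
            f"python main.py mpcnn.sick.trial{trial}.model --dataset sick "
            f"--epochs 19 --dropout 0 --lr {lr:.6g} --regularization {reg:.6g}"
        )
    return commands


# Print the commands so they can be inspected or piped to a shell.
for command in random_search_commands(3):
    print(command)
```

Sampling on a log scale spreads trials evenly across orders of magnitude, which suits parameters such as the learning rate.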

References

For results, please see my Master's thesis here:

@mastersthesis{tu2018experimental,
  title={An Experimental Analysis of Multi-Perspective Convolutional Neural Networks},
  author={Tu, Zhucheng},
  year={2018},
  school={University of Waterloo}
}

mp-cnn-variants's Issues

Training with SemEval STS data set

Hi,
I want to use your model on the SemEval STS data set. The training set has 22592 pairs of sentences.
I got out of memory on the GPU (2 GPUs (NVIDIA® Tesla® K80) 24 GB GDDR5):
"Runtime Error: cuda runtime error (2): out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:66".
And in the model:
n_feat = n_feat_h + n_feat_v + EXT_FEATS # n_feat = 44427

Do I need to reduce n_feat or increase the GPU's memory to be able to train on the STS?
Do I need to change anything else?
Thanks.
