
bso's Introduction

Code for Sequence-to-Sequence Learning as Beam-Search Optimization (Wiseman and Rush, 2016).

This code is adapted from a much earlier version of Yoon Kim's seq2seq-attn code.

For questions/concerns/bugs feel free to contact swiseman at seas.harvard.edu.

Running Experiments

First prepare the data as in data_prep/.

All seq2seq baselines use the seq2seq-attn code.

Word-ordering Experiments

Pretrain with

th pretrain.lua -data_file wo-train.hdf5 -val_data_file wo-val.hdf5 -savefile wopt -num_layers 2 -rnn_size 256 -word_vec_size 256 -save_after 10 -adagrad -layer_etas 0.02,0.01,0.2 -epochs 10 -curriculum 1 -dropout 0.2

Unconstrained train with

th bso_train.lua -data_file wo-train.hdf5 -val_data_file wo-val.hdf5 -savefile wosave -num_layers 2 -rnn_size 256 -word_vec_size 256 -adagrad -layer_etas 0.02,0.02,0.2 -curriculum 0 -epochs 39 -train_from wopt_epoch10.00_*.t7 -dropout 0.2 -max_beam_size 6 -beam_size 2

Constrained training is accomplished by adding the argument '-con wo' to the above.
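
For reference, the constrained command can be assembled from the unconstrained one as sketched below. The sketch only appends '-con wo' to the arguments already listed; the variable names BSO_ARGS and CON_CMD are illustrative and not part of the repository. The command is printed rather than run, so it can be checked before launching training.

```shell
# Arguments copied verbatim from the unconstrained command above.
# Single quotes keep the shell from expanding the * in -train_from here.
BSO_ARGS='-data_file wo-train.hdf5 -val_data_file wo-val.hdf5 -savefile wosave -num_layers 2 -rnn_size 256 -word_vec_size 256 -adagrad -layer_etas 0.02,0.02,0.2 -curriculum 0 -epochs 39 -train_from wopt_epoch10.00_*.t7 -dropout 0.2 -max_beam_size 6 -beam_size 2'

# Constrained variant: the same arguments plus '-con wo'.
CON_CMD="th bso_train.lua $BSO_ARGS -con wo"

# Print the full command for inspection (hypothetical helper step).
echo "$CON_CMD"
```

Once the printed line looks right, it can be pasted into the shell to start training.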

Predict with

th predict.lua -val_data_file wo-val.hdf5 -model wosave_epoch39*.t7 -src_file wo-src-val.txt -src_dict wo.src.dict -targ_dict wo.targ.dict -beam_size 5 -con wo -output_file val-unconstrwo-preds.out

Train the seq2seq baseline as

th train.lua -data_file wo-train.hdf5 -val_data_file wo-val.hdf5 -savefile wos2s -num_layers 2 -rnn_size 256 -word_vec_size 256 -save_after 10 -param_init 0.1 -adagrad -layer_lrs 0.02,0.01,0.2 -lr_decay 1 -epochs 30 -curriculum 1 -dropout 0.2

(and use the epoch with the lowest validation perplexity)
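
Picking that epoch by hand is error-prone when many checkpoints are saved. The sketch below is one way to automate it. It ASSUMES that checkpoint names have the form <savefile>_epoch<E>_<score>.t7 and that the trailing <score> is the validation perplexity; check this against the files your run actually produces. The function name best_checkpoint is hypothetical and not part of this code base.

```shell
# Print the checkpoint whose trailing score is smallest.
# ASSUMPTION: names look like <savefile>_epoch<E>_<score>.t7 and <score>
# is the validation perplexity; this is not confirmed by the README.
best_checkpoint() {
  for f in "$@"; do
    score="${f##*_}"      # drop everything up to the last underscore
    score="${score%.t7}"  # drop the .t7 extension
    printf '%s\t%s\n' "$score" "$f"
  done | LC_ALL=C sort -n | head -n 1 | cut -f 2
}

# Example call (run it in the directory holding the checkpoints):
#   best_checkpoint wos2s_epoch*.t7
```

The printed filename can then be passed to predict.lua via -model.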

Dependency Parsing Experiments

Pretrain with

th pretrain.lua -data_file dep-train.hdf5 -val_data_file dep-val.hdf5 -savefile deppt -num_layers 2 -rnn_size 300 -word_vec_size 300 -save_after 5 -adagrad -layer_etas 0.02,0.02,0.2 -epochs 5 -curriculum 1 -dropout 0.3 -pre_word_vecs_enc dep_src_w2v.h5 -pre_word_vecs_dec dep_targ_w2v.h5

Constrained train with

th bso_train.lua -data_file dep-train.hdf5 -val_data_file dep-val.hdf5 -savefile condep -num_layers 2 -rnn_size 300 -word_vec_size 300 -save_after 16 -adagrad -curriculum 0 -epochs 16 -train_from deppt_epoch5_*.t7 -dropout 0.3 -max_beam_size 6 -beam_size 2 -layer_etas 0.02,0.02,0.1 -ignore_eos -src_dict dep.src.dict -targ_dict dep.targ.dict -con sr

(Unconstrained training can be accomplished by leaving out the '-con sr' argument)

Predict with

th predict.lua -val_data_file dep-val.hdf5 -model condep_epoch16_*.t7 -gpuid 1 -src_file dep-src-val.txt -src_dict dep.src.dict -targ_dict dep.targ.dict -beam_size 5 -con sr -output_file val-condepb5-preds.out

Train the seq2seq baseline as

th train.lua -data_file dep-train.hdf5 -val_data_file dep-val.hdf5 -savefile deps2s -num_layers 2 -rnn_size 300 -word_vec_size 300 -adagrad -layer_lrs 0.02,0.02,0.2 -lr_decay 1 -epochs 25 -curriculum 1 -dropout 0.3 -pre_word_vecs_enc dep_src_w2v.h5 -pre_word_vecs_dec dep_targ_w2v.h5

(and use the epoch with the lowest validation perplexity)

MT Experiments

Pretrain with

th pretrain.lua -data_file mixer-train.hdf5 -val_data_file mixer-val.hdf5 -savefile mixerpt -num_layers 1 -rnn_size 256 -word_vec_size 256 -save_after 3 -adagrad -layer_etas 0.02,0.02,0.2 -epochs 3 -curriculum 1 -dropout 0.2

Train with

th bso_train.lua -data_file mixer-train.hdf5 -val_data_file mixer-val.hdf5 -savefile mixersave -num_layers 1 -rnn_size 256 -word_vec_size 256 -save_after 21 -adagrad -curriculum 0 -epochs 21 -train_from mixerpt_epoch3_*.t7 -dropout 0.2 -max_beam_size 6 -beam_size 2 -layer_etas 0.02,0.02,0.1 -mt_delt_multiple 1

Predict with

th predict.lua -val_data_file mixer-val.hdf5 -model mixersave_epoch21_*.t7 -src_file valid.de-en.de -src_dict mixer.src.dict -targ_dict mixer.targ.dict -beam_size 5 -output_file val-mixer-preds.out

Train the seq2seq baseline as

th train.lua -data_file mixer-train.hdf5 -val_data_file mixer-val.hdf5 -savefile mixers2s -num_layers 1 -rnn_size 256 -word_vec_size 256 -adagrad -layer_lrs 0.02,0.02,0.2 -epochs 15 -lr_decay 1 -curriculum 1 -dropout 0.2

(and use the epoch with the lowest validation perplexity).

MIT License.
