variational_mmt

TL;DR

This code base reproduces the results reported in the ACL 2019 paper "Latent Variable Model for Multi-modal Translation". We propose a conditional variational auto-encoder for multi-modal neural machine translation (MMT) that models the interaction between visual and textual features through a latent variable. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in the target-language decoder and also to predict image features. Importantly, our model formulation uses visual and textual inputs during training but does not require images to be available at test time. Please refer to the paper for more details.
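As a rough guide to what such a model optimises, a conditional variational auto-encoder of the kind described above is typically trained by maximising an evidence lower bound. The sketch below follows that standard form and is an assumption, not a transcription of the paper. The symbols x (source sentence), y (target sentence), v (image features), z (latent variable), and θ, λ (model and inference parameters) are introduced here for illustration only.

```latex
\mathcal{L}(\theta, \lambda) =
  \mathbb{E}_{q_\lambda(z \mid x, y, v)}
    \big[ \log p_\theta(y \mid x, z) + \log p_\theta(v \mid z) \big]
  - \mathrm{KL}\big( q_\lambda(z \mid x, y, v) \,\|\, p_\theta(z \mid x) \big)
```

Under this reading, the prior over z is conditioned on the source sentence alone, which would explain why images are not needed at test time. The exact formulation is given in the paper.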

Before you start

Before you start, please ensure that:

  • You have installed the right version of PyTorch and all the dependencies according to requirements.txt;
  • If you want to use your own version of the Multi30k data set, you have changed the respective variable names in the run_*.sh files as required.

If you want to use the exact version of the Multi30k data set used in the paper:

  • download a tarball containing all files (PyTorch binaries and image features) for the translated Multi30k data set experiments here. The tarball includes:
    • flickr30k_train_resnet50_cnn_features.hdf5: training set image features, 29K examples.
    • flickr30k_valid_resnet50_cnn_features.hdf5: validation set image features, 1,014 examples.
    • flickr30k_test_resnet50_cnn_features.hdf5: 2016 test set image features, 1K examples.
    • flickr30k_test_2017_flickr_resnet50_cnn_features.hdf5: 2017 test set image features, 1K examples.
    • flickr30k_test_2017_mscoco_resnet50_cnn_features.hdf5: ambiguous MSCOCO test set image features, 461 examples.
    • m30k.{train,valid}.1.pt, m30k.vocab.pt: PyTorch binaries containing sentences in training/validation sets and vocabulary.
    • {train,val,test_2016_flickr,test_2017_flickr,test_2017_mscoco}.lc.norm.tok.bpe-en-de-30000.{en,de}: text files containing train/validation/test sets.
  • download a tarball containing all files (PyTorch binaries and image features) for the backtranslated comparable + translated Multi30k data set experiments here. The tarball includes:
    • flickr30k_train_translated-5x-comparable-1x_resnet50_cnn_features.shuffled.hdf5: this file contains features for 290,000 images, i.e. 29K translated Multi30k images five times each (145K) and 29K comparable Multi30k images also five times each (145K). We upsample the translated Multi30k images so that they make up about half of the images used when training the model in this setting.
    • concat-multi30k-translational-5times-comparable-1time-shuffled_correct.{train,valid}.1.pt, concat-multi30k-translational-5times-comparable-1time-shuffled_correct.vocab.pt: PyTorch binaries containing sentences in training/validation sets and vocabulary.
  • ensure that variable names are correct in the corresponding run_translated_m30k_only.sh and run_additional_data.sh files. Image features were extracted as described in the paper, i.e. using a pretrained ResNet-50 convolutional neural network.
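After unpacking a tarball, it can be useful to confirm that nothing is missing before starting a long training run. The following is a minimal sketch for the translated Multi30k tarball. The file names are copied from the list above, while the helper name `missing_files` and the assumption that all files sit directly inside one directory are illustrative and not part of the code base.

```python
from pathlib import Path

# File names copied from the tarball description above. That they all sit
# directly inside one directory is an assumption about how the tarball unpacks.
TRANSLATED_M30K_FILES = [
    "flickr30k_train_resnet50_cnn_features.hdf5",
    "flickr30k_valid_resnet50_cnn_features.hdf5",
    "flickr30k_test_resnet50_cnn_features.hdf5",
    "flickr30k_test_2017_flickr_resnet50_cnn_features.hdf5",
    "flickr30k_test_2017_mscoco_resnet50_cnn_features.hdf5",
    "m30k.train.1.pt",
    "m30k.valid.1.pt",
    "m30k.vocab.pt",
]


def missing_files(data_dir, expected=TRANSLATED_M30K_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling `missing_files("/path/to/data")` returns an empty list when every listed file is present. The same helper can be reused for the second tarball by passing a different `expected` list.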

To train a model using only the translated Multi30k, use the shell script run_translated_m30k_only.sh; to train a model using the back-translated comparable + translated Multi30k, use run_additional_data.sh. However, before you run these scripts:

  • change the DATA_PATH and MODEL_PATH variables (in both run_translated_m30k_only.sh and run_additional_data.sh) so that they point to the directory containing the training data (decompressed from the tarball mentioned above) and to the directory where you wish to store model checkpoints, respectively.
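Before editing the scripts, you may want to check the two locations you intend to use. The sketch below is a hypothetical helper, not part of the code base: it verifies that the data directory exists and creates the checkpoint directory if needed. The function name `prepare_paths` and its behaviour are assumptions made for illustration.

```python
from pathlib import Path


def prepare_paths(data_path, model_path):
    """Validate the data directory and create the checkpoint directory.

    data_path and model_path are the values you plan to assign to
    DATA_PATH and MODEL_PATH in the run_*.sh scripts.
    """
    data_dir = Path(data_path).expanduser()
    model_dir = Path(model_path).expanduser()
    if not data_dir.is_dir():
        raise FileNotFoundError(f"DATA_PATH is not a directory: {data_dir}")
    # Create the checkpoint directory (and any parents) if it is missing.
    model_dir.mkdir(parents=True, exist_ok=True)
    return data_dir, model_dir
```

The returned paths can then be copied into the DATA_PATH and MODEL_PATH assignments in the shell scripts.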

Training

To see how to call the train_mm_vi_model1.py script, please refer to the run_*.sh scripts or run train_mm_vi_model1.py --help.

Training a model on the translated Multi30k

To train a model using the Translated Multi30k data set only (~29K source/target/image triplets), run:

run_translated_m30k_only.sh

This bash script assumes you have a GPU available with at least 12 GB of memory (e.g. TitanX, 1080Ti, etc.) and sets all the hyperparameters to reproduce the results in the paper.
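To see how much memory your GPUs report before launching the script, something like the following can help. It is a minimal sketch that relies on PyTorch's `torch.cuda` device queries. The helper names `bytes_to_gib` and `report_gpus` are hypothetical, and the function returns an empty list if PyTorch or CUDA is unavailable.

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


def report_gpus():
    """Return a list of (device name, total memory in GiB) per visible GPU."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this environment.
    if not torch.cuda.is_available():
        return []  # No CUDA device is visible.
    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        report.append((props.name, bytes_to_gib(props.total_memory)))
    return report
```

Note that cards usually report slightly less than their nominal capacity, so compare the numbers against the requirement with some tolerance.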

Training a model on the back-translated comparable and translated Multi30k

To train a model using the back-translated comparable Multi30k in addition to the translated Multi30k data set (total of ~145K source/target/image triplets), simply run:

run_additional_data.sh

This bash script also assumes you have a GPU available with at least 12 GB of memory (e.g. TitanX, 1080Ti, etc.) and sets all the hyperparameters to reproduce the results in the paper.

Decoding a translation

The bash scripts above not only train a model but, once training finishes, also decode the Multi30k validation, test 2016, and test 2017 sets, as well as the ambiguous MSCOCO 2017 test set. By default, the model used for translation is the one with the best BLEU4 score on the validation set.
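The selection rule described above amounts to picking the checkpoint with the highest validation BLEU4. The sketch below shows that rule in isolation. It is not the code the scripts actually run, and the function name `select_best_checkpoint` and the dictionary input format are assumptions made for illustration.

```python
def select_best_checkpoint(valid_bleu):
    """Return the checkpoint name with the highest validation BLEU4 score.

    valid_bleu maps a checkpoint name to its BLEU4 score on the validation
    set. If several checkpoints tie, the first one in the mapping is kept.
    """
    if not valid_bleu:
        raise ValueError("no checkpoints to choose from")
    return max(valid_bleu, key=valid_bleu.get)
```

If you decode manually with translate_mm_vi.py, a rule like this can be applied to your own validation scores to decide which checkpoint to use.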

To see how to use the translate_mm_vi.py script directly, please refer to the run_*.sh scripts or call translate_mm_vi.py --help.

Citation

If you use this code base, please consider citing our paper.

@inproceedings{calixto-etal-2019-latent,
    title = "Latent Variable Model for Multi-modal Translation",
    author = "Calixto, Iacer and Rios, Miguel and Aziz, Wilker",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P19-1642",
    pages = "6392--6405",
}
