List index out of range

MaUde

Metric for automatic Unreferenced dialog evaluation.

Contains code of the paper titled "Learning an Unreferenced Metric for Online Dialogue Evaluation" to appear at ACL 2020, Arxiv

Installation

pip install -r requirements.txt
Install ParlAI

Getting the data

Get the convai2 train and test data and pre-trained Distilbert embeddings here. Download and unzip in the folder convai2_data.
Get the trained model checkpoints from here. Download and unzip into the folder full_acl_runs.
For individual licensing reasons we cannot release the train/test data of MultiWoz, Frames and DailyDialog. Please send me a mail if you need them!
Run inference using ./run_inference.sh

N.B. - For model names and checkpoints, please refer to run_inference.sh script.

Computing Backtranslation

We use FairSeq to compute back-translations. Our modified scripts are present in scripts folder, to run cd into that folder and run ./run_trans.sh.

Computing Corruption Files

In the data dump we already provide the corruption files used for training. To generate new corruption files on the dataset, use scripts/compute_corrupt.py.

Training Script

Uses Pytorch Lightning as the boilerplate for reproducibility.

python codes/trainer.py --mode train \
    --corrupt_type all \ 
    --batch_size 64 \
    --model_save_dir /checkpoint/koustuvs/dialog_metric/all_dailydialog_10_runs \
    --learn_down True --downsample True --down_dim 300 \
    --optim adam,lr=0.001 --dropout 0.2 --decoder_hidden 200 \ 
    --data_name convai2 \ 
    --data_loc /checkpoint/koustuvs/dialog_metric/convai2_data/ \
    --use_cluster

For baselines, add the appropriate flag:

--train_baseline [infersent/ruber/bertnli]

An example training script is provided at run_training.sh

Inference Script

# CUDA_VISIBLE_DEVICES=0 python codes/inference.py \ 
    --id $MODEL_ID --model_save_dir $MODEL_SAVE_DIR \
    --model_version $VERSION --train_mode nce \ 
    --corrupt_pre $DATA_LOC --test_suffix true_response \ 
    --test_column true_response --results_file "results.jsonl"

Outputs the results in a jsonl file. To measure human correaltion with See et al 2019, specify --human_eval flag and --human_eval_file location.
We have also added the script to run inference on our trained checkpoints - run_inference.sh.

Acknowledgements

Pytorch Lightning - https://github.com/williamFalcon/pytorch-lightning
HuggingFace Transformers - https://github.com/huggingface/transformers
FairSeq - https://github.com/pytorch/fairseq
Liming Vie's RUBER implementation - https://github.com/liming-vie/RUBER
Pytorch 1.2.0 - https://pytorch.org/
ParlAI - https://parl.ai/
See et al 2019 data - https://parl.ai/projects/controllable_dialogue/

Questions

Please send a mail to [email protected] for questions / clarifications.
Open an Issue

Citation

If our work is useful for your research, consider citing it using the following bibtex:

@article{sinha2020maude,
  Author = {Koustuv Sinha and Prasanna Parthasarathi and Jasmine Wang and Ryan Lowe and William L. Hamilton and Joelle Pineau},
  Title = {Learning an Unreferenced Metric for Online Dialogue Evaluation},
  Year = {2020},
  journal = {ACL},
  arxiv = {2005.00583},
  url = {https://arxiv.org/abs/2005.00583}
}

License

This work is CC-BY-NC 4.0 (Attr Non-Commercial Inter.) licensed, as found in the LICENSE file.

facebookresearch / online_dialog_eval Goto Github PK

online_dialog_eval's Introduction

MaUde

Installation

Getting the data

Computing Backtranslation

Computing Corruption Files

Training Script

Inference Script

Acknowledgements

Questions

Citation

License

online_dialog_eval's People

Contributors

Stargazers

Watchers

Forkers

online_dialog_eval's Issues

Recommend Projects

Recommend Topics

Recommend Org