
online_dialog_eval's Introduction

MaUde

Metric for automatic Unreferenced dialog evaluation.

Contains the code for the paper "Learning an Unreferenced Metric for Online Dialogue Evaluation", to appear at ACL 2020 (arXiv: 2005.00583).

Installation

  • pip install -r requirements.txt
  • Install ParlAI

Getting the data

  • Get the ConvAI2 train and test data and pre-trained DistilBERT embeddings here. Download and unzip them into the folder convai2_data.
  • Get the trained model checkpoints from here. Download and unzip them into the folder full_acl_runs.
  • Because of their individual licenses, we cannot release the train/test data of MultiWOZ, Frames and DailyDialog. Please email me if you need them!
  • Run inference using ./run_inference.sh

N.B. - For model names and checkpoints, please refer to run_inference.sh script.
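
Before running inference, it can help to confirm that both folders named above are in place. The following is a minimal sketch, assuming the folders sit in the directory from which the scripts are run; the helper missing_dirs is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names come from the download instructions above.
# The helper itself is hypothetical and not part of the repository.
EXPECTED_DIRS = ("convai2_data", "full_acl_runs")


def missing_dirs(root=".", expected=EXPECTED_DIRS):
    """Return the expected folder names that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling missing_dirs() from the repository root returns an empty list when both folders exist, and the names of the missing ones otherwise.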

Computing Backtranslation

We use FairSeq to compute back-translations. Our modified scripts are in the scripts folder. To run them, cd into that folder and run ./run_trans.sh.

Computing Corruption Files

In the data dump we already provide the corruption files used for training. To generate new corruption files on the dataset, use scripts/compute_corrupt.py.
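
To give a rough idea of what a corrupted response can look like, here is a generic sketch of syntactic corruptions (shuffling, dropping and repeating words) and one semantic corruption (swapping in an unrelated utterance). It is an illustration only: the function names, probabilities and sampling choices are assumptions and are not taken from scripts/compute_corrupt.py.

```python
import random


def word_shuffle(response, rng):
    """Syntactic corruption: shuffle the word order of the response."""
    words = response.split()
    rng.shuffle(words)
    return " ".join(words)


def word_drop(response, rng, drop_prob=0.3):
    """Syntactic corruption: drop each word with probability drop_prob."""
    words = response.split()
    kept = [w for w in words if rng.random() >= drop_prob]
    # Keep at least the first word so the result is not empty.
    return " ".join(kept if kept else words[:1])


def word_repeat(response, rng, repeat_prob=0.3):
    """Syntactic corruption: duplicate each word with probability repeat_prob."""
    out = []
    for word in response.split():
        out.append(word)
        if rng.random() < repeat_prob:
            out.append(word)
    return " ".join(out)


def random_utterance(response, pool, rng):
    """Semantic corruption: replace the response with a different one from pool."""
    candidates = [u for u in pool if u != response]
    return rng.choice(candidates) if candidates else response
```

Passing a seeded random.Random instance as rng keeps the generated corruptions reproducible across runs.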

Training Script

Training uses PyTorch Lightning as the boilerplate, for reproducibility.

python codes/trainer.py --mode train \
    --corrupt_type all \
    --batch_size 64 \
    --model_save_dir /checkpoint/koustuvs/dialog_metric/all_dailydialog_10_runs \
    --learn_down True --downsample True --down_dim 300 \
    --optim adam,lr=0.001 --dropout 0.2 --decoder_hidden 200 \
    --data_name convai2 \
    --data_loc /checkpoint/koustuvs/dialog_metric/convai2_data/ \
    --use_cluster

For baselines, add the appropriate flag:

--train_baseline [infersent/ruber/bertnli]

An example training script is provided in run_training.sh.

Inference Script

CUDA_VISIBLE_DEVICES=0 python codes/inference.py \
    --id $MODEL_ID --model_save_dir $MODEL_SAVE_DIR \
    --model_version $VERSION --train_mode nce \
    --corrupt_pre $DATA_LOC --test_suffix true_response \
    --test_column true_response --results_file "results.jsonl"
  • Outputs the results to a JSONL file. To measure correlation with the human evaluations of See et al. (2019), specify the --human_eval flag and the --human_eval_file location.
  • We have also added a script to run inference on our trained checkpoints: run_inference.sh.
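
A JSONL file holds one JSON object per line, so the results can be loaded with the standard library alone. This is a minimal sketch; the helper read_jsonl is hypothetical, and no assumption is made about which fields the inference script writes.

```python
import json


def read_jsonl(path):
    """Load one JSON object per non-empty line of the file at `path`."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, read_jsonl("results.jsonl") returns a list of dictionaries, whose keys can then be inspected to see what was recorded.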

Acknowledgements

Questions

Citation

If our work is useful for your research, consider citing it using the following bibtex:

@article{sinha2020maude,
  Author = {Koustuv Sinha and Prasanna Parthasarathi and Jasmine Wang and Ryan Lowe and William L. Hamilton and Joelle Pineau},
  Title = {Learning an Unreferenced Metric for Online Dialogue Evaluation},
  Year = {2020},
  journal = {ACL},
  arxiv = {2005.00583},
  url = {https://arxiv.org/abs/2005.00583}
}

License

This work is CC-BY-NC 4.0 (Attr Non-Commercial Inter.) licensed, as found in the LICENSE file.

online_dialog_eval's Issues

List index out of range

I followed the steps in Readme.md, but I hit an error when I ran "sh run_inference.sh". Here is the error message:

Traceback (most recent call last):
  File "codes/inference.py", line 164, in <module>
    model_save_path = all_saved_models[0]
IndexError: list index out of range
I checked the variable model_save_path in "glob.glob(model_save_path)" and found that its value was "full_acl_runs/na_all/lightning_logs/version_20488119/checkpoints/*.ckpt". I think the path is right.

How can I solve this?
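
For context, glob.glob returns an empty list when no file matches the pattern, so indexing the result with [0] raises exactly this IndexError. Relative patterns are resolved against the current working directory. The sketch below shows a guarded lookup that fails with a clearer message; the helper first_checkpoint is hypothetical and is not the code in codes/inference.py.

```python
import glob


def first_checkpoint(pattern):
    """Return the first path matching `pattern`, or raise a descriptive error."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        # An empty list means nothing on disk matched the pattern.
        raise FileNotFoundError(
            f"No checkpoint files match {pattern!r}; "
            "check that the checkpoints were unzipped into that folder."
        )
    return matches[0]
```

Running it with the pattern quoted above shows whether any .ckpt file is actually present at that location.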

Where to find ~/mlp/latentDialogAnalysis/fine_tune_convai2.txt?

Hi! I'm trying to fine-tune a language model, but I can't find ~/mlp/latentDialogAnalysis/fine_tune_convai2.txt, and the pytorch-transformer page doesn't mention it either.

Would it be possible to upload it here? Thanks so much in advance.

Human correlation notebook has errors

I attempted to run the notebook and it mostly works, but I had issues with some code blocks. Please see your notebook with some small deltas on Google Colab (https://colab.research.google.com/drive/10Ro6K14cpgTQuEV6aZKIU7Kzy2MFHsu-?usp=sharing). I've added try except blocks around the cells that are problems.

For the most part, I don't think the errors are a big deal, but I'd like to know what score_scaled should be. It is used here

conv_dial_scores.enjoy.corr(conv_dial_scores.score_scaled, method='pearson')

Thanks!
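
The definition of score_scaled is not given in this issue. One observation that may help: Pearson correlation is unchanged by a positive linear rescaling. So if score_scaled is a min-max-scaled copy of the raw model score (an assumption, not confirmed by the notebook), the correlation equals the one computed on the unscaled score. A pure-Python sketch with made-up numbers:

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def min_max_scale(xs):
    """Rescale values linearly to [0, 1]; requires max(xs) > min(xs)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Made-up values for illustration only, not data from the notebook.
enjoy = [1.0, 2.0, 3.0, 4.0]
score = [0.2, 0.4, 0.5, 0.9]

raw_corr = pearson(enjoy, score)
scaled_corr = pearson(enjoy, min_max_scale(score))
print(math.isclose(raw_corr, scaled_corr))
```

Under this assumption, the choice of scaling would not change the reported Pearson value, only the range of the scores.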

What does corrupt_type: all_context do?

What is the difference between "all_context" and "all" for the corrupt type option? I understand that "only semantics" and "only syntax" refer to training with only semantic negative samples and only syntactic negative samples, respectively, but I can't tell how "all_context" differs from "all".

Issue running inference

I'm getting a FileNotFoundError when trying to run sh run_inference.sh:
FileNotFoundError: [Errno 2] No such file or directory: '/private/home/koustuvs/mlp/latentDialogAnalysis/logs/log.txt'
There seem to be some defaults in args.py that generate this path, but I can't seem to find where this particular error occurs.
The culprit call is

model = model_module.load_from_metrics(

which calls a LightningModule function, but I don't see where logs get involved there.

Meta Cleanup Issue

Cleaning up the repo in preparation for the public release for ACL 2020.

  • Clean up notebook #1
  • Clean up .zip files
  • Remove Seq2Seq folder
  • Remove backup folder
