CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Authors: Shiyang Li*, Semih Yavuz*, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou and Caiming Xiong (*Equal Contribution)

Abstract

Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood. We propose controllable counterfactuals (CoCo) to bridge this gap and evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system successfully tackle the request if the user responded differently but still consistently with the dialogue flow? CoCo leverages turn-level belief states as counterfactual conditionals to produce novel conversation scenarios in two steps: (i) counterfactual goal generation at turn level by dropping and adding slots followed by replacing slot values, and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow. Evaluating state-of-the-art DST models on the MultiWOZ dataset with CoCo-generated counterfactuals results in a significant performance drop of up to 30.8% (from 49.4% to 18.6%) in absolute joint goal accuracy. In comparison, widely used techniques like paraphrasing only affect the accuracy by at most 2%. Human evaluations show that CoCo-generated conversations reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations, further strengthening its reliability and promise to be adopted as part of the robustness evaluation of DST models.
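As a rough illustration of step (i), the Python sketch below drops, adds, and then replaces slots in a turn-level belief state. It is a minimal sketch under stated assumptions, not the repository's implementation: the slot inventory `SLOT_VALUES`, the function name `counterfactual_goal`, and the `max_drop`/`max_add` limits are all hypothetical, and new slots are sampled without any domain or slot co-occurrence constraints that a real goal generator would likely need.

```python
import random
from typing import Dict, List

# A turn-level belief state maps "domain-slot" names to values.
BeliefState = Dict[str, str]

# Hypothetical slot/value inventory used only for this illustration.
SLOT_VALUES: Dict[str, List[str]] = {
    "restaurant-food": ["italian", "chinese", "indian"],
    "restaurant-area": ["centre", "north", "south"],
    "restaurant-pricerange": ["cheap", "moderate", "expensive"],
    "restaurant-book people": ["2", "4", "6"],
}


def counterfactual_goal(
    turn_state: BeliefState,
    slot_values: Dict[str, List[str]],
    rng: random.Random,
    max_drop: int = 1,
    max_add: int = 1,
) -> BeliefState:
    """Hypothetical drop -> add -> replace pass over a turn-level belief state."""
    state = dict(turn_state)  # never mutate the caller's belief state

    # (a) Drop up to `max_drop` slots, always keeping at least one original slot.
    if state:
        n_drop = rng.randint(0, min(max_drop, len(state) - 1))
        for slot in rng.sample(list(state), n_drop):
            del state[slot]

    # (b) Add up to `max_add` slots that were not in the original turn state,
    #     with values sampled from the inventory.
    candidates = [s for s in slot_values if s not in turn_state]
    n_add = rng.randint(0, min(max_add, len(candidates)))
    for slot in rng.sample(candidates, n_add):
        state[slot] = rng.choice(slot_values[slot])

    # (c) Replace every remaining value with a different one from the inventory
    #     (slots without alternative values keep their current value).
    for slot, value in list(state.items()):
        options = [v for v in slot_values.get(slot, []) if v != value]
        if options:
            state[slot] = rng.choice(options)
    return state
```

Calling `counterfactual_goal({"restaurant-food": "italian", "restaurant-area": "centre"}, SLOT_VALUES, random.Random(0))` returns a new goal in which every surviving original slot carries a different value; passing a seeded `random.Random` instance keeps the output reproducible.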

Paper link: https://arxiv.org/pdf/2010.12850.pdf

Model Architecture


The overall pipeline of CoCo. The left part shows the training phase of the utterance generation model: the encoder processes the concatenation of the system utterance and the turn-level belief state, and the decoder conditions on the encoder's output to generate the user utterance. The input and output of this model are shown in the box at the lower left. The right part depicts the inference phase: the counterfactual goal generator first modifies the original belief state fed from the left part into a new one, which is then fed, together with the same conversation history, to the trained utterance generator. The generator produces new user utterances by beam search, and undesired utterances are then filtered out. Note that the conversational turns used in the inference phase do not have to come from the training data.
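The encoder input and the filtering step described above can be sketched in Python as follows. This is an illustration under assumptions: the `<system>`/`<state>` markers, the `slot = value` linearization, and the helper names are hypothetical rather than the format used in ./coco-dst, and `filter_candidates` is a simplified string-match stand-in for CoCo's actual filtering of undesired utterances.

```python
from typing import Dict, List


def serialize_belief_state(turn_state: Dict[str, str]) -> str:
    # Hypothetical linearization: sorted "domain-slot = value" pairs joined by " ; ".
    return " ; ".join(f"{slot} = {value}" for slot, value in sorted(turn_state.items()))


def build_encoder_input(system_utterance: str, turn_state: Dict[str, str]) -> str:
    # Concatenate the system utterance with the turn-level belief state.
    # The "<system>" and "<state>" markers are hypothetical, not the repo's tokens.
    return f"<system> {system_utterance} <state> {serialize_belief_state(turn_state)}"


def filter_candidates(candidates: List[str], turn_state: Dict[str, str]) -> List[str]:
    # Simplified stand-in for the filtering step: keep only beam candidates
    # that mention every slot value of the (counterfactual) turn-level goal.
    kept = []
    for utterance in candidates:
        text = utterance.lower()
        if all(value.lower() in text for value in turn_state.values()):
            kept.append(utterance)
    return kept
```

In practice the candidates would come from the trained sequence-to-sequence model, for example via Transformers' `generate` with `num_beams` and `num_return_sequences`. Plain substring matching is brittle (it accepts "north" inside "northern", for instance), which is one reason a learned filter may be preferable.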

Installation

The package's general requirements are:

  • Python >= 3.7
  • PyTorch >= 1.5 (see the official PyTorch installation instructions)
  • Transformers >= 3.0.2 (see the official Transformers installation instructions)

The package can be installed by running:

sh setup.sh

Usage

This section explains how to prepare the MultiWOZ dataset, train the CoCo model, and run it for evaluation and data augmentation.

Data

The data archive includes the preprocessed MultiWOZ 2.1 and MultiWOZ 2.2 datasets. Download and uncompress it, then place the resulting multiwoz folder under the root of the repository as ./multiwoz.

Details of CoCo:

See ./coco-dst/README.md

Details of TRADE:

See ./trade-dst/README.md

Details of SimpleTOD:

See ./simpletod/README.md

Details of TripPy:

See ./trippy-public/README.md

Citation

@article{SHIYANG2020CoCoCC,
  title={CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers},
  author={Shiyang Li and Semih Yavuz and Kazuma Hashimoto and Jia Li and Tong Niu and Nazneen Rajani and Xifeng Yan and Yingbo Zhou and Caiming Xiong},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.12850}
}

Questions?

For any questions, feel free to open issues, or shoot emails to

License

The code is released under BSD 3-Clause - see LICENSE for details.

This code includes other open source software components: trade-dst, simpletod, and trippy-public. Each of these components has its own license; please see the ./trade-dst, ./simpletod, and ./trippy-public folders.
