GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

This repository contains the source code for the following paper:

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems
Lishan Huang, Zheng Ye, Jinghui Qin, Xiaodan Liang; EMNLP 2020

Model Overview

[Figure: overview of the GRADE model]

Prerequisites

Create a virtual environment (recommended):

conda create -n GRADE python=3.6
source activate GRADE

Install the required packages:

pip install -r requirements.txt

Install Texar locally:

cd texar-pytorch
pip install .

Note: Make sure that CUDA 10.1 is installed in your environment.
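
One way to check the installed toolkit version is to parse the output of nvcc --version. The sketch below is only an illustration and is not part of this repo: parse_cuda_release and installed_cuda_release are hypothetical helpers, and the code assumes that nvcc is on your PATH and prints a line containing "release X.Y". A missing nvcc does not necessarily mean CUDA is unavailable, so treat a None result as inconclusive.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the 'X.Y' release string from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the CUDA toolkit release reported by nvcc, or None if nvcc is not found."""
    if shutil.which("nvcc") is None:
        return None
    # stdout=PIPE and universal_newlines=True keep this compatible with Python 3.6.
    result = subprocess.run(
        ["nvcc", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_cuda_release(result.stdout)
```

Calling installed_cuda_release() inside the GRADE environment and comparing the result with "10.1" gives a quick sanity check before training.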

Data Preparation

GRADE is trained on the DailyDialog dataset (Li et al., 2017).

For convenience, we provide the processed DailyDialog data. Download it and unzip it into the data directory. You should also download tools and unzip it into the root directory of this repo.

If you want to prepare the training data from scratch, follow these steps:

  1. Install Lucene;
  2. Run the preprocessing script:
cd ./script
bash preprocess_training_dataset.sh

Training

To train GRADE, please run the following script:

cd ./script
bash train.sh

Note that the checkpoint of our final GRADE model is provided. You can download it and unzip it into the root directory.

Evaluation

We evaluate GRADE and other baseline metrics on three chit-chat datasets (DailyDialog, ConvAI2 and EmpatheticDialogues). The corresponding evaluation data in the evaluation directory has the following file structure:

.
└── evaluation
    ├── eval_data
    │   └── DIALOG_DATASET_NAME
    │       └── DIALOG_MODEL_NAME
    │           ├── human_ctx.txt
    │           └── human_hyp.txt
    └── human_score
        ├── DIALOG_DATASET_NAME
        │   └── DIALOG_MODEL_NAME
        │       └── human_score.txt
        └── human_judgement.json

Note: all of the human judgement data we provide for metric evaluation is in human_judgement.json.
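
Before running the evaluation, it can help to confirm that the files from the tree above are in place. The following is a minimal sketch, not code from this repo: missing_eval_files is a hypothetical helper, and the dataset and model names you pass in are whatever directory names you actually use.

```python
from pathlib import Path


def missing_eval_files(root, dataset, model):
    """Return the expected evaluation files that are absent under `root`."""
    evaluation = Path(root) / "evaluation"
    expected = [
        evaluation / "eval_data" / dataset / model / "human_ctx.txt",
        evaluation / "eval_data" / dataset / model / "human_hyp.txt",
        evaluation / "human_score" / dataset / model / "human_score.txt",
        evaluation / "human_score" / "human_judgement.json",
    ]
    # Keep only the paths that do not exist as regular files.
    return [str(path) for path in expected if not path.is_file()]
```

An empty list means every file named in the tree was found for that dataset and model.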

To evaluate GRADE, please run the following script:

cd ./script
bash eval.sh

Using GRADE

To use GRADE on your own dialog dataset:

  1. Put the whole dataset (raw data) into ./preprocess/dataset;
  2. Update the function load_dataset in ./preprocess/extract_keywords.py for loading the dataset;
  3. Prepare the context-response data that you want to evaluate and convert it into the following format:
.
└── evaluation
    └── eval_data
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── human_ctx.txt
                └── human_hyp.txt
  4. Run the following script to evaluate the context-response data with GRADE:
cd ./script
bash inference.sh
  5. Finally, the scores given by GRADE can be found at:
.
└── evaluation
    └── infer_result
        └── YOUR_DIALOG_DATASET_NAME
            └── YOUR_DIALOG_MODEL_NAME
                ├── non_reduced_results.json
                └── reduced_results.json
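
The eval_data layout and the location of the result files described above can be sketched in Python as follows. This is an illustrative helper, not part of the repo: stage_eval_data, result_paths and count_lines are hypothetical names, and the line-count check rests on the assumption (not stated in this README) that the context and response files are line-aligned. Check the provided evaluation data for the exact file format before relying on it.

```python
import shutil
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    with open(str(path), encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def stage_eval_data(root, dataset, model, ctx_file, hyp_file):
    """Copy context/response files into evaluation/eval_data/<dataset>/<model>.

    Assumption: line i of the response file answers line i of the
    context file, so both files must have the same number of lines.
    """
    n_ctx, n_hyp = count_lines(ctx_file), count_lines(hyp_file)
    if n_ctx != n_hyp:
        raise ValueError(f"{n_ctx} contexts but {n_hyp} responses")
    target = Path(root) / "evaluation" / "eval_data" / dataset / model
    target.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(str(ctx_file), str(target / "human_ctx.txt"))
    shutil.copyfile(str(hyp_file), str(target / "human_hyp.txt"))
    return target


def result_paths(root, dataset, model):
    """Return the two result files listed in the infer_result tree."""
    base = Path(root) / "evaluation" / "infer_result" / dataset / model
    return base / "non_reduced_results.json", base / "reduced_results.json"
```

After staging the data and running inference.sh, result_paths tells you where to look for the scores. The JSON schema of those files is not described here, so inspect them before writing a parser.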
