
airdialogue_model's Introduction

AirDialogue Model

Prerequisites

General

  • python (verified on 3.7)
  • wget

Python Packages

  • tensorflow (verified on 1.15.0)
  • airdialogue

1. Prepare Dataset

The AirDialogue dataset and its metadata can be downloaded using our download script, which also downloads the nltk corpus used for preprocessing.

bash ./scripts/download.sh

We will also generate a set of synthesized context pairs for self-play training. These context pairs contain the initial conditions and the optimal decisions for the synthesized dialogues. Here we also generate the out-of-domain evaluation set (OOD1). See the AirDialogue paper for more details.

bash ./scripts/gen_syn.sh --ood1
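For a quick sanity check, one can peek at a generated context pair. This is an illustrative sketch only: the path below is an assumption based on the file names listed in Section 8a, and the exact line format depends on gen_syn.sh.

# Illustrative sketch: the path is an assumption following the file names
# in Section 8a; adjust it to wherever gen_syn.sh writes its output.
with open("./data/airdialogue/tokenized/train.selfplay.data") as f:
  print(f.readline())  # one context pair: initial conditions + optimal action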

2. Preprocessing

We preprocess the training dataset before we can begin training the model.

bash ./scripts/preprocess.sh -p train

3. Training

Supervised Learning

The first step is to train our model using supervised learning.

python airdialogue_model_tf.py --task_type TRAINEVAL --num_gpus 8 \
                            --input_dir ./data/airdialogue/tokenized \
                            --out_dir ./data/out_dir \
                            --num_units 256 --num_layers 2

Training with Reinforcement Learning Self-play

The second step is to train the model using self-play, starting from the supervised learning checkpoint.

python airdialogue_model_tf.py --task_type SP_DISTRIBUTED --num_gpus 8 \
                            --input_dir ./data/airdialogue/tokenized \
                            --self_play_pretrain_dir ./data/out_dir \
                            --out_dir ./data/selfplay_out_dir

Examine Training Meta Information

Training metadata will be written to the output directory and can be examined using TensorBoard. The following command inspects the training procedure of the supervised learning model.

tensorboard --logdir=./data/out_dir

To view the training metadata for the self-play model, point --logdir to ./data/selfplay_out_dir instead.

4. Evaluating on the AirDialogue dev set

Preprocessing

Similar to training, we first need to preprocess the dev dataset. Here we also preprocess the OOD1 dataset for evaluation.

bash ./scripts/preprocess.sh -p dev --ood1

Predicting

We use the following script to evaluate our trained model on the dev set. Following the AirDialogue paper, we also evaluate the model's performance on the OOD1 evaluation set that we generated. The evaluation script first looks for the self-play model; if it is not found, it falls back to the supervised model.

bash ./scripts/evaluate.sh -p dev -a ood1
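The fallback can be pictured as follows. This is a hypothetical sketch of the lookup, not the script's actual code; the directory names match the training commands above.

import os

# Hypothetical sketch: prefer the self-play checkpoint, otherwise fall back
# to the supervised one. TensorFlow writes a "checkpoint" index file into
# the model directory when a checkpoint is saved.
def pick_model_dir(selfplay_dir="./data/selfplay_out_dir",
                   supervised_dir="./data/out_dir"):
  if os.path.isfile(os.path.join(selfplay_dir, "checkpoint")):
    return selfplay_dir
  return supervised_dir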

5. Scoring

Once the prediction files are generated, we rely on the AirDialogue toolkit for scoring. We are currently working on the scoring script.

airdialogue score --pred_data ./data/out_dir/dev_inference_out.txt \
                  --true_data ./data/airdialogue/tokenized/dev.infer.tar.data \
                  --true_kb ./data/airdialogue/tokenized/dev.infer.kb \
                  --task infer \
                  --output ./data/out_dir/dev_bleu.json
airdialogue score --pred_data ./data/out_dir/dev_selfplay_out.txt \
                  --true_data ./data/airdialogue/json/dev_data.json \
                  --true_kb ./data/airdialogue/json/dev_kb.json \
                  --task selfplay \
                  --output ./data/out_dir/dev_selfplay.json
airdialogue score --pred_data ./data/out_dir/ood1_selfplay_out.txt \
                  --true_data ./data/airdialogue/json/ood1_data.json \
                  --true_kb ./data/airdialogue/json/ood1_kb.json \
                  --task selfplay \
                  --output ./data/out_dir/ood1_selfplay.json
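Each command writes its scores to a JSON file. To inspect a result programmatically, something like the following sketch works; the internal structure of the JSON output is an assumption here, so we simply pretty-print the whole object.

import json

# Sketch: load and pretty-print a score file produced by `airdialogue score`.
# The exact keys in the output are not assumed; we dump everything.
with open("./data/out_dir/dev_selfplay.json") as f:
  print(json.dumps(json.load(f), indent=2))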

6. Evaluating on the AirDialogue test set

We are currently working on the evaluation process for the test set.

7. Benchmark Results

We are currently working on benchmarking the results.

8. Misc

a. Task and Dataset Alignments

Stage: Training

| Task | Data files | Source |
| --- | --- | --- |
| Supervised | train.data, train.kb | AirDialogue |
| Self-play | train.selfplay.data, train.selfplay.kb | synthesized (meta1) |

Stage: Testing-Dev

| Task | Data files | Source |
| --- | --- | --- |
| Inference | dev.infer.src.data, dev.infer.tar.data, dev.infer.kb | AirDialogue |
| Self-play Eval | dev.selfplay.eval.data, dev.selfplay.eval.kb | AirDialogue |
| Self-play Eval | ood1.selfplay.data, ood1.selfplay.eval.kb | synthesized (meta2) |
| Eval | dev.eval.data, dev.eval.kb | AirDialogue |

Stage: Testing-Test (hidden)

| Task | Data files | Source |
| --- | --- | --- |
| Inference | test.infer.src.data, test.infer.tar.data, test.infer.kb | AirDialogue |
| Self-play Eval | test.selfplay.eval.data, test.selfplay.eval.kb | AirDialogue |
| Self-play Eval | ood2.selfplay.data, ood2.selfplay.kb | synthesized (meta3) |
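Before launching training, a quick check that the expected files from the table are in place can save a failed run. This is an illustrative sketch, assuming the tokenized directory used by the training commands in Section 3.

import os

# Illustrative sketch: check the tokenized training files from the table
# above under the directory passed as --input_dir during training.
TOKENIZED_DIR = "./data/airdialogue/tokenized"
for name in ["train.data", "train.kb",
             "train.selfplay.data", "train.selfplay.kb"]:
  status = "ok" if os.path.isfile(os.path.join(TOKENIZED_DIR, name)) else "MISSING"
  print(f"{name}: {status}")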

b. Working with Synthesized Data

As an alternative to the AirDialogue dataset, we can verify our model using the synthesized data.

Training

To generate a synthesized dataset for training, pass the -s option to the data generation script. By default, synthesized data will be placed under ./data/synthesized/.

bash ./scripts/gen_syn.sh -s --ood1

We will then need to preprocess the synthesized data for training:

bash ./scripts/preprocess.sh -s -p train

Similar to experiments on the AirDialogue dataset, we can train a supervised model for the synthesized data:

python airdialogue_model_tf.py --task_type TRAINEVAL --num_gpus 8 \
                            --input_dir ./data/synthesized/tokenized \
                            --out_dir ./data/synthesized_out_dir \
                            --num_units 256 --num_layers 2

With supervised model pre-training, we can also train the synthesized model using self-play:

python airdialogue_model_tf.py --task_type SP_DISTRIBUTED --num_gpus 8 \
                            --input_dir ./data/synthesized/tokenized \
                            --self_play_pretrain_dir ./data/synthesized_out_dir \
                            --out_dir ./data/synthesized_selfplay_out_dir
Testing

Before testing on the dev data, we first need to preprocess the dev dataset.

bash ./scripts/preprocess.sh -p dev --ood1 -s

We can then execute the evaluation script on the synthesized dev set.

bash ./scripts/evaluate.sh -p dev -a ood1 \
                           -m ./data/synthesized_out_dir \
                           -o ./data/synthesized_out_dir \
                           -i ./data/synthesized/tokenized/
Scoring
airdialogue score --pred_data ./data/synthesized_out_dir/dev_inference_out.txt \
                  --true_data ./data/synthesized/tokenized/dev.infer.tar.data \
                  --true_kb ./data/synthesized/tokenized/dev.infer.kb \
                  --task infer \
                  --output ./data/synthesized_out_dir/dev_bleu.json
airdialogue score --pred_data ./data/synthesized_out_dir/dev_selfplay_out.txt \
                  --true_data ./data/synthesized/json/dev_data.json \
                  --true_kb ./data/synthesized/json/dev_kb.json \
                  --task selfplay \
                  --output ./data/synthesized_out_dir/dev_selfplay.json
airdialogue score --pred_data ./data/synthesized_out_dir/ood1_selfplay_out.txt \
                  --true_data ./data/synthesized/json/ood1_data.json \
                  --true_kb ./data/synthesized/json/ood1_kb.json \
                  --task selfplay \
                  --output ./data/synthesized_out_dir/ood1_selfplay.json

One can repeat the same steps for the synthesized test set as well. Please refer to the AirDialogue paper for results on the synthesized dataset.

airdialogue_model's People

Contributors

josephch405, w31w31


airdialogue_model's Issues

Inconsistent BatchSize

for i in tqdm(list(range(0, ceil))):
  start_ind = i * batch_size
  end_ind = min(i * batch_size + batch_size, len(selfplay_data))
  batch_data = selfplay_data[start_ind:end_ind]
  batch_kb = selfplay_kb[start_ind:end_ind]
  # We indicate that agent1 talks first. Keep in mind that we will
  # swap between agent1 and agent2.
  speaker = flip % 2
  generated_data, _, summary = dialogue.talk(hparams.max_dialogue_len,
                                             batch_data, batch_kb, agent1,
                                             agent2, worker_step,
                                             batch_size, speaker)

In line 144, we should replace batch_size with end_ind - start_ind in the call to dialogue.talk. Otherwise, there will be an inconsistent batch-size issue in the last iteration, where the final batch can be smaller than batch_size.
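A minimal sketch of the suggested fix, keeping the dialogue.talk signature shown above:

# Suggested fix (sketch): pass the size of the current slice, which can be
# smaller than batch_size on the final iteration.
actual_batch_size = end_ind - start_ind
generated_data, _, summary = dialogue.talk(hparams.max_dialogue_len,
                                           batch_data, batch_kb, agent1,
                                           agent2, worker_step,
                                           actual_batch_size, speaker)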

Unexpected Error When Preprocessing

After running bash scripts/preprocess.sh -p train, I got the following error:

mode = airdialogue
word cutoff = 10
nltk data path = ./data/nltk
data path = ./data/airdialogue
partition = train
json path = ./data/airdialogue/json
tokenized path = ./data/airdialogue/tokenized
tokenizing train data...
2020-05-18 20:39:43.271598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
loading data: 642868it [04:13, 2532.79it/s]
process kb: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 321459/321459 [01:57<00:00, 2728.20it/s]
process raw data: 0%| | 0/321459 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/bin/airdialogue", line 4, in
import('pkg_resources').run_script('airdialogue==0.1', 'airdialogue')
File "/opt/conda/lib/python3.7/site-packages/pkg_resources/init.py", line 667, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/opt/conda/lib/python3.7/site-packages/pkg_resources/init.py", line 1471, in run_script
exec(script_code, namespace, namespace)
File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/EGG-INFO/scripts/airdialogue", line 33, in
File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 276, in main
File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/prepro_main.py", line 162, in load_data_from_jsons
File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 359, in process_main_data
File "/opt/conda/lib/python3.7/site-packages/airdialogue-0.1-py3.7.egg/airdialogue/prepro/tokenize_lib.py", line 286, in get_dialogue_boundary
AssertionError: start token appeared twice: Hello Hello How may I help you ? Can you help me to change my recent reservation because my trip dates are got postponed ? I will help you with that please share your name to proceed further ? Edward hall here . Please wait for a while . Sure , take your own time . There is no active reservation found under your name to amend it . That 's ok , thank you for checking . Thank you for choosing us .

Is that because of Python 3?

Multiple bugs for evaluating selfplay

  1. In README, Section 5 Scoring:
airdialogue score --pred_data ./data/out_dir/dev_selfplay_out.txt \
                  --true_data ./data/airdialogue/tokenized/dev.selfplay.eval.data \
                  --true_kb ./data/airdialogue/tokenized/dev.selfplay.eval.kb \
                  --task selfplay \
                  --output ./data/out_dir/dev_selfplay.json

It loads the tokenized true_data and true_kb.
However, according to
https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L240-L256
it actually needs JSON files.
Maybe change it to

                  --true_data ./data/airdialogue/json/dev_data.json \
                  --true_kb ./data/airdialogue/json/dev_kb.json \

?

  2. After fixing the previous bug, another one appears:

https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L247

it processes pred_json_obj['action'] using action_obj_to_str. This step, however, has already been done when generating dev_selfplay_out.txt.

Maybe remove action_obj_to_str?

  3. After that, another one appears:
    https://github.com/josephch405/airdialogue/blob/c74072f8667d92839dc39e98b386ce8e932c8c68/airdialogue/evaluator/evaluator_main.py#L252

pred_json_obj is not compatible with json_obj_to_tokens, because pred_json_obj does not have the key dialogue. Instead, pred_json_obj has a key called utterance.

I can get the program to run by replacing that line with

pred_raw_text = pred_json_obj['utterance'].replace('<t1> ','').replace('<t2> ','').split(' ')

However, I think that may not be the optimal solution.
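A slightly cleaner, hypothetical variant of the same workaround: filter out the speaker tags while tokenizing, instead of string-replacing them.

# Hypothetical variant of the workaround above: drop the <t1>/<t2> speaker
# tags while splitting the predicted utterance into tokens.
pred_raw_text = [t for t in pred_json_obj['utterance'].split()
                 if t not in ('<t1>', '<t2>')]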
