
glmp's Introduction

Global-to-local Memory Pointer Networks for Task-Oriented Dialogue

License: MIT

This is the PyTorch implementation of the paper: Global-to-local Memory Pointer Networks for Task-Oriented Dialogue. Chien-Sheng Wu, Richard Socher, Caiming Xiong. ICLR 2019. [PDF][Open Review]

This code is written for PyTorch >= 0.4. If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:

@inproceedings{wu2019global,
  title={Global-to-local Memory Pointer Networks for Task-Oriented Dialogue},
  author={Wu, Chien-Sheng and Socher, Richard and Xiong, Caiming},
  booktitle={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2019}
}

Abstract

End-to-end task-oriented dialogue is challenging since knowledge bases are usually large, dynamic and hard to incorporate into a learning framework. We propose the global-to-local memory pointer (GLMP) networks to address this issue. In our model, a global memory encoder and a local memory decoder are proposed to share external knowledge. The encoder encodes dialogue history, modifies global contextual representation, and generates a global memory pointer. The decoder first generates a sketch response with unfilled slots. Next, it passes the global memory pointer to filter the external knowledge for relevant information, then instantiates the slots via the local memory pointers. We empirically show that our model can improve copy accuracy and mitigate the common out-of-vocabulary problem. As a result, GLMP is able to improve over the previous state-of-the-art models on both the simulated bAbI Dialogue dataset and the human-human Stanford Multi-domain Dialogue dataset in automatic and human evaluation.
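As a rough illustration of the sketch-then-fill idea described in the abstract (a toy sketch, not the repository's implementation; the tags and KB entries below are made up), the decoder first emits a templated response with sketch tags and then fills each tag by pointing into the external knowledge memory:

# Illustrative sketch of GLMP's two-stage decoding (not the repo's code).
# The "@poi"/"@distance" tags and the toy KB are invented for this example.

kb_memory = [
    ("starbucks", "poi"),
    ("5_miles", "distance"),
    ("coffee_shop", "poi_type"),
]

# Stage 1: the sketch decoder generates a response with unfilled slot tags.
sketch_response = ["@poi", "is", "@distance", "away"]

# Toy "global memory pointer": a soft mask over KB entries (hand-set here).
global_pointer = [0.9, 0.8, 0.1]

def fill_slot(tag):
    """Stage 2: pick the KB entry whose type matches the sketch tag,
    weighted by the global pointer (a stand-in for the local pointer attention)."""
    candidates = [(score, word) for score, (word, typ) in zip(global_pointer, kb_memory)
                  if "@" + typ == tag]
    return max(candidates)[1] if candidates else tag

final_response = [fill_slot(tok) if tok.startswith("@") else tok for tok in sketch_response]
print(" ".join(final_response))  # -> "starbucks is 5_miles away"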

Train a model for task-oriented dialog datasets

We created myTrain.py to train models. To train GLMP on bAbI dialogue tasks 1-5, run:

❱❱❱ python3 myTrain.py -lr=0.001 -l=1 -hdd=128 -dr=0.2 -dec=GLMP -bsz=8 -ds=babi -t=1 

or, for GLMP on SMD:

❱❱❱ python3 myTrain.py -lr=0.001 -l=1 -hdd=128 -dr=0.2 -dec=GLMP -bsz=8 -ds=kvr

During training, the model with the best validation performance is saved. If you want to reuse a saved model, add -path=path_name_model to the call. The model is evaluated using per-response accuracy, WER, F1, and BLEU.
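For reference, here is a minimal sketch of how per-response accuracy and entity F1 can be computed (illustrative only; the repository's actual evaluation lives in models/GLMP.py and may differ in details):

# Minimal, illustrative metric sketch; not the repository's evaluation code.

def per_response_accuracy(predictions, references):
    """Fraction of system responses that match the gold response exactly."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references) if references else 0.0

def entity_f1(pred_entities, gold_entities):
    """Micro-averaged F1 over KB entities mentioned in the responses."""
    tp = sum(len(set(p) & set(g)) for p, g in zip(pred_entities, gold_entities))
    pred_total = sum(len(set(p)) for p in pred_entities)
    gold_total = sum(len(set(g)) for g in gold_entities)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / gold_total if gold_total else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0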

Test a model for task-oriented dialog datasets

We created myTest.py to evaluate trained models. To test GLMP on bAbI tasks 1-5, run:

❱❱❱ python myTest.py -ds=babi -path=<path_to_saved_model> 

or, for GLMP on SMD:

❱❱❱ python myTest.py -ds=kvr -path=<path_to_saved_model> -rec=1

Memory Access Visualization

Memory attention visualization in the SMD navigation domain. The left column is the global memory pointer G, the middle column is the memory pointer without global weighting, and the right column is the final memory pointer.
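A minimal way to produce this kind of attention heatmap from exported weights is sketched below (the array and labels are placeholders, not outputs of this repository):

# Illustrative heatmap of memory attention weights; not the repo's plotting code.
import numpy as np
import matplotlib.pyplot as plt

# attention: (num_memory_entries, num_decode_steps), e.g. exported from the decoder.
attention = np.random.rand(12, 6)              # placeholder data
memory_labels = ["kb_entry_%d" % i for i in range(12)]
step_labels = ["t%d" % j for j in range(6)]

fig, ax = plt.subplots(figsize=(6, 8))
im = ax.imshow(attention, aspect="auto", cmap="Blues")
ax.set_xticks(range(len(step_labels)))
ax.set_xticklabels(step_labels)
ax.set_yticks(range(len(memory_labels)))
ax.set_yticklabels(memory_labels)
ax.set_xlabel("decoding step")
ax.set_ylabel("memory entry")
fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.savefig("memory_attention.png")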

Architecture

Enjoy! :)

glmp's People

Contributors

jasonwu0731


glmp's Issues

Training for each task

Hello, thank you for sharing your code.

I have two questions; please help me answer them:

  1. Do you train a new model for each task, or do you take a model trained on a previous task and fine-tune it on another task?

  2. To build a full conversational chatbot, I should use the model trained on task 5, right?

Thank you for your answer.

Will it work with real data?

Hi, first of all, thanks for publishing your work, it's amazing. This is not a bug, but a question:
Do you think this kind of model is ready to work with real data? Say I have real transcriptions for an airline company, where the agent and the client are talking with a goal (buy tickets, check in, etc.). Would the model be able to generate the agent's sentences correctly, or is that still too complicated?
Do you think any end-to-end model is ready to do this at the moment? Or is the only way to label intents and entities, as in Dialogflow, etc.?

Thanks for your opinion!

hard to reproduce the results reported in this paper

Hi, I ran this code with "python3 myTrain.py -lr=0.001 -l=1 -hdd=128 -dr=0.2 -dec=GLMP -bsz=8 -ds=kvr" and with "python3 myTrain.py -lr=0.001 -l=3 -hdd=128 -dr=0.2 -dec=GLMP -bsz=8 -ds=kvr", but the F1 score only reaches 56-58. I don't know why that happens; can you help me out?

Read data script

Hi,
First of all, great work.

I noticed that in Mem2Seq you provided the read-data script, but in GLMP you didn't.

Can you provide the read_data script for kvr and babi datasets?

Thanks.
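In the meantime, here is a rough reader sketched against the standard bAbI-dialogue text format (lines of the form "<turn_id> <user utterance>\t<system response>", with blank lines separating dialogues); the repository's own preprocessing may differ:

# Minimal reader sketch for bAbI-dialogue-style files; illustrative only.

def read_dialogues(path):
    dialogues, current = [], []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                       # blank line ends a dialogue
                if current:
                    dialogues.append(current)
                    current = []
                continue
            _, _, rest = line.partition(" ")   # drop the leading turn id
            if "\t" in rest:                   # user / system turn
                user, system = rest.split("\t", 1)
                current.append({"user": user, "system": system})
            else:                              # KB triple or API result line
                current.append({"kb": rest})
    if current:
        dialogues.append(current)
    return dialogues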

How to set the hyper-parameters

Hi,
How did you set the hyper-parameters to get the results in the paper? I tried setting the hidden size to 128 and the hop count K to 3, but the results are far from those in the paper. Below is the result of my run:
ACC SCORE: 0.1227
F1 SCORE: 0.5788
CAL F1: 0.7344
WET F1: 0.5486
NAV F1: 0.5058

I also tried setting the hidden size to 256, but the result was not as good as with a hidden size of 128. How do you set the hyper-parameters?

ZeroDivisionError: float division by zero when evaluating on BABI

Hi, thanks for posting the code for the paper.

I did have an issue when training on bAbI. When running the following command:
python myTrain.py -lr=0.001 -l=1 -hdd=128 -dr=0.2 -dec=GLMP -bsz=8 -ds=babi -t=1

It throws the following error:

L:0.58,LE:0.09,LG:0.38,LP:0.10: 100% 753/753 [01:34<00:00,  7.82it/s]
STARTING EVALUATION
100% 61/61 [00:13<00:00,  4.57it/s]
Traceback (most recent call last):
  File "myTrain.py", line 49, in <module>
    acc = model.evaluate(dev, avg_best, early_stop)
  File "/content/GLMP/models/GLMP.py", line 261, in evaluate
    F1_score = F1_pred / float(F1_count)
ZeroDivisionError: float division by zero

Just glancing at the code, it seems that the F1 score is initialized to zero and is not changed for anything but the KVR dataset.
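A simple guard along these lines would avoid the crash (an untested sketch against the line shown in the traceback, not a verified patch to the repository):

# Possible guard around models/GLMP.py line 261: only divide when there are
# gold entities to score against (untested sketch).
F1_score = F1_pred / float(F1_count) if F1_count > 0 else 0.0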

How to report the results in your paper

When I run your code, I sometimes get better results (e.g. BLEU SCORE: 15.45; F1 SCORE: 0.6038309388456686; though not all F1 scores are better) than the scores reported in your paper.
Do you run the code several times and average the results, or do you just select the test results corresponding to your best validation scores? I ask because you do not fix a random seed in your code.
Thanks for your code release!
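For reference, fixing the seeds before training would make runs more comparable (a generic PyTorch snippet, not something the released code does by default):

# Generic seed-fixing snippet for more reproducible runs (not part of this repo).
import random
import numpy as np
import torch

SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False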

error in models/GLMP.py

In models/GLMP.py, F1_score is used without being defined.
Also, I am confused about data/KVR/train.txt + dev.txt + test.txt: where do they come from, and why are they processed in that format?

API calls

Hi, first of all - thank you for your great work and for sharing the code!

Would you please help me with the following question: according to the task formulation for the bAbI dataset, the system should make API calls to retrieve information, for instance about restaurants, correct? If I understood it correctly, GLMP is also able to deal with this; however, I failed to find the place in the code where it actually happens.
Could you please tell me whether GLMP makes API calls and, if so, where in the code this happens? One more question: do you think it is possible to replace this kind of API call with calls to ElasticSearch?

I would be very grateful for your answers!

Many thanks in advance!
