
mttod's Introduction

MTTOD

This is the code for the paper "Improving End-to-End Task-Oriented Dialogue System with A Simple Auxiliary Task".

Checkout source code and data from the GitHub repository

To download data.zip properly, the git-lfs (Large File Storage) extension must be installed.

# clone repository as usual
git clone https://github.com/bepoetree/MTTOD.git
cd MTTOD
# check file size of data.zip
ls -l data.zip
# unzip
unzip data.zip -d data/

# The file size of data.zip should be about 52 MB. If it is not, git-lfs is not installed or the checkout did not complete correctly.
# Please ensure that git-lfs is installed on your system (in Ubuntu or Debian, run "apt install git-lfs" with sudo).
# Then retry the LFS checkout with the following commands:
git lfs install
git lfs pull
git checkout -f HEAD
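# (optional) verify that data.zip is tracked by LFS; it should appear in this list
git lfs ls-files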

Environment setting

Our Python version is 3.6.9.

The required packages can be installed with the following commands.

pip install -r requirements.txt
python -m spacy download en_core_web_sm

Data Preprocessing

For the experiments, we use MultiWOZ2.0 and MultiWOZ2.1.

  • (MultiWOZ2.0) annotated_user_da_with_span_full.json: A fully annotated version of the original MultiWOZ2.0 data released by the developers of ConvLab, available here.
  • (MultiWOZ2.1) data.json: The original MultiWOZ2.1 data released by researchers at the University of Cambridge, available here.
  • (MultiWOZ2.2) data.json: The MultiWOZ2.2 dataset converted to the same format as MultiWOZ2.1 using the script here.

We use the preprocessing scripts implemented by Zhang et al., 2020. Please see here for details.

python preprocess.py -version $VERSION
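For example, to preprocess MultiWOZ2.1 ($VERSION is 2.0, 2.1, or 2.2):

python preprocess.py -version 2.1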

Training

Our implementation supports a single GPU. Please use a smaller batch size if an out-of-memory error occurs.

  • MTTOD without auxiliary task (for the ablation)
python main.py -version $VERSION -run_type train -model_dir $MODEL_DIR
  • MTTOD with auxiliary task
python main.py -version $VERSION -run_type train -model_dir $MODEL_DIR -add_auxiliary_task

The checkpoints will be saved at the end of each epoch (the default number of training epochs is 10).
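For example, a concrete run with the auxiliary task on MultiWOZ2.1 (the model directory name below is just an example):

python main.py -version 2.1 -run_type train -model_dir mttod_aux -add_auxiliary_task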

Inference

python main.py -run_type predict -ckpt $CHECKPOINT -output $MODEL_OUTPUT -batch_size $BATCH_SIZE

All checkpoints are saved in $MODEL_DIR with names such as 'ckpt-epoch10'.

The result file ($MODEL_OUTPUT) will be saved in the checkpoint directory.

To reduce inference time, it is recommended to set a large $BATCH_SIZE. In our experiments, it is set to 16 for inference.
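For example, using the epoch-10 checkpoint with a batch size of 16 (the output filename preds.json is arbitrary):

python main.py -run_type predict -ckpt $MODEL_DIR/ckpt-epoch10 -output preds.json -batch_size 16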

You can download our trained model here.

Evaluation

We use the evaluation scripts implemented by Zhang et al., 2020.

python evaluator.py -data $CHECKPOINT/$MODEL_OUTPUT

Standardized Evaluation

For the MultiWOZ benchmark, we recommend using the standardized evaluation script.

# MultiWOZ2.2 is used for the benchmark (MultiWOZ2.2 should be preprocessed prior to this step)
python main.py -run_type predict -ckpt $CHECKPOINT -output $MODEL_OUTPUT -batch_size $BATCH_SIZE -version 2.2
# convert the output format for the standardized evaluation
python convert.py -input $CHECKPOINT/$MODEL_OUTPUT -output $CONVERTED_MODEL_OUTPUT

# clone the standardized evaluation repository
git clone https://github.com/Tomiinek/MultiWOZ_Evaluation
cd MultiWOZ_Evaluation
pip install -r requirements.txt

# do standardized evaluation
python evaluate.py -i $CONVERTED_MODEL_OUTPUT -b -s -r
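Here -i points to the converted prediction file, and -b, -s, and -r request the BLEU, inform/success, and richness metrics of the standardized evaluation script (see the MultiWOZ_Evaluation README for the full option list).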

Acknowledgements

This code is based on the released code (https://github.com/thu-spmi/damd-multiwoz/) for "Task-Oriented Dialog Systems that Consider Multiple Appropriate Responses under the Same Context", which is distributed under the Apache License, Version 2.0. Copyright 2019- Yichi Zhang.

For the pre-trained language model, we use Hugging Face's Transformers (https://huggingface.co/transformers/index.html#), which is distributed under the Apache License, Version 2.0. Copyright 2018- The Hugging Face team. All rights reserved.

We are grateful for their excellent work.

mttod's People

Contributors

bepoetree, dalgarak


mttod's Issues

Questions about DB states.

Hi, nice work!
A small question: it seems that you don't use the DB state when training the end-to-end task.
resp_outputs = self.model(attention_mask=attention_mask, encoder_outputs=encoder_outputs, lm_labels=resp_labels, return_dict=False, decoder_type="resp")

About evaluator.py

Hi, there!

I was confused by lines 386-387 in your evaluator.py.

if t == 0:
      continue

Why do you not include the first turn in the evaluation? After removing this check, the score rises by about 1 point.

Question about prediction

Hello, thanks for your work.

Recently I have been using the MTTOD model on the CrossWOZ dataset, and I have a question about the prediction code.

In the prediction code you provide, you first generate the belief state from the dialog history, and then you use the dbpn from the dataset as input to predict the dialog act and response.

Actually, when we deploy a dialog system, we obtain the dbpn by querying the database with the belief state generated and the turn domain predicted by the model, instead of using the dbpn and turn domain from the dataset directly. In the code you provide, the belief state is generated before the dialog act, and the dialog act is generated after the dbpn, so I cannot obtain the dbpn before generating the dialog act.

So, if we want to build a dialog system as a service, how should we manage the gap between your prediction procedure and the real-world prediction procedure?
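As a rough sketch of how this gap could be bridged in deployment, the snippet below replaces the gold dbpn with a pointer looked up from the generated belief state. The toy database, helper names, and bucket boundaries are illustrative assumptions; only the [db_N] token format mirrors reader.py.

# Hedged sketch of a deployment-style DB lookup (not the repository's predict code).
# TOY_DB and the bucket scheme below are made-up placeholders.
TOY_DB = {"restaurant": [{"food": "chinese"}, {"food": "british"}, {"food": "chinese"}]}

def count_matches(domain, belief_state):
    # count DB entries consistent with the generated belief state
    entries = TOY_DB.get(domain, [])
    return sum(all(e.get(slot) == value for slot, value in belief_state.items()) for e in entries)

def db_pointer_token(num_matches):
    # buckets assumed here: 0, 1, 2, 3-or-more matches
    return "[db_{}]".format(min(num_matches, 3))

# the generated belief state and predicted turn domain come from the first decoding
# pass; the looked-up token then conditions act/response decoding instead of the
# dataset's dbpn.
print(db_pointer_token(count_matches("restaurant", {"food": "chinese"})))  # -> [db_2]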

Problem about the evaluation data

MTTOD/reader.py

Line 121 in 7256673

if data_type != "test" and k == 1 or k >= 17:

I found that with the original code, one test example (num_turn=18) will be discarded. Simply changing the above code fixes the issue:
if data_type != "test" and (k == 1 or k >= 17):
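For context, the parentheses matter because Python's and binds more tightly than or, so the unparenthesized condition also discards turns with k >= 17 from the test split:

# quick illustration of the operator-precedence issue described above
data_type, k = "test", 18
original = data_type != "test" and k == 1 or k >= 17      # True  -> turn discarded
suggested = data_type != "test" and (k == 1 or k >= 17)   # False -> turn kept
print(original, suggested)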

DB states as training labels

Hi, I think something may be slightly wrong here. The model should not learn to predict db_state tokens, so I think it doesn't make sense to take db_state tokens as labels. Maybe putting the DB states into the inputs makes more sense.
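One way to act on this suggestion, sketched under the assumption that the [db_N] special-token ids are known and that the labels feed a Hugging Face seq2seq cross-entropy loss (which ignores label id -100):

import torch

# Hedged sketch: exclude DB-state tokens from the response labels so the decoder
# is not trained to predict them. db_token_ids is a hypothetical collection of
# the [db_N] special-token ids.
def mask_db_tokens(resp_labels, db_token_ids):
    mask = torch.zeros_like(resp_labels, dtype=torch.bool)
    for tok_id in db_token_ids:
        mask |= resp_labels.eq(tok_id)
    return resp_labels.masked_fill(mask, -100)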

Reader getspan convert_tokens_to_string IndexError: list index out of range when training

Great work on MTTOD. I'm trying to run the code, but unfortunately I could not reproduce the exact environment (transformers 4.5, etc.) you specified.

Here's my env

Python 3.10.13
transformers 4.37.2

When I run the code as follows:

python main.py -version 2.2 -run_type train -model_dir withaux -add_auxiliary_task

I get the following error:

02/20/2024 02:32:56 [INFO] Device: cuda (the number of GPUs: 1)
02/20/2024 02:32:56 [INFO] Set random seed to 42
/opt/conda/envs/mttod/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py:240: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.
For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with truncation is True.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with model_max_length or pass max_length when encoding/padding.
- To avoid this warning, please instantiate this tokenizer with model_max_length set to your preferred value.
  warnings.warn(
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
02/20/2024 02:32:57 [INFO] Encode data and save to data/MultiWOZ_2.2/processed/encoded_data.pkl
train: 0%| | 2/8433 [00:00<02:27, 57.21it/s]
Traceback (most recent call last):
  File "/home/ubuntu/mttod/main.py", line 64, in <module>
    main()
  File "/home/ubuntu/mttod/main.py", line 55, in main
    runner = MultiWOZRunner(cfg)
  File "/home/ubuntu/mttod/runner.py", line 297, in __init__
    reader = MultiWOZReader(cfg.backbone, cfg.version)
  File "/home/ubuntu/mttod/reader.py", line 534, in __init__
    super(MultiWOZReader, self).__init__(backbone)
  File "/home/ubuntu/mttod/reader.py", line 420, in __init__
    train = self.encode_data("train")
  File "/home/ubuntu/mttod/reader.py", line 658, in encode_data
    resp_span = self.get_span(
  File "/home/ubuntu/mttod/reader.py", line 838, in get_span
    value = self.tokenizer.convert_tokens_to_string(lex_tokens)
  File "/opt/conda/envs/mttod/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 425, in convert_tokens_to_string
    tokens[0] = tokens[0].lstrip(SPIECE_UNDERLINE)
IndexError: list index out of range
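The traceback indicates that convert_tokens_to_string is being called with an empty lex_tokens list, which newer transformers versions no longer tolerate (tokens[0] is indexed unconditionally). A possible workaround, offered only as a guess rather than an official fix, is to guard the call site in reader.py's get_span:

# hypothetical guard around the reader.py line shown in the traceback
value = self.tokenizer.convert_tokens_to_string(lex_tokens) if lex_tokens else ""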

Question about generating the act and response

MTTOD/runner.py

Line 826 in 7256673

decoder_input_ids=resp_decoder_input_ids,

In the transformers library, decoder_input_ids is not a parameter of the generate function. I tried deleting this parameter and the evaluation result did not change, which shows that this parameter does not do anything in the generation stage.
In other words, the original code uses the generated DB result in the predict stage, which is wrong in a real-world setting. How can I fix this problem? Thanks.

Question about database format

In the paper, "The generated belief state is used to query a domain-specific database and the DB state DB_t is determined by the number of matching entities." And in the model figure, the DB result is also domain-specific, like <sos_db>restaurant > 3<eos_db>.
But the code only contains the number of entities:

MTTOD/reader.py

Line 635 in a267de5

db_token = "[db_{}]".format(pointer.index(1))

So, is there any inconsistency between the code and the paper?
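For concreteness, the two encodings being compared look like this (the pointer vector and its bucket meaning are illustrative only):

# what reader.py produces: a single bucket token derived from the one-hot pointer
pointer = [0, 1, 0, 0]
code_style = "[db_{}]".format(pointer.index(1))   # -> "[db_1]"

# the domain-qualified form shown in the paper's figure
paper_style = "<sos_db> restaurant > 3 <eos_db>"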

Additional auxiliary tasks

Congratulations on the good paper and code with their excellent results!
Can you share whether any other auxiliary tasks were tested during the research before arriving at the one that is currently used, and what the outcomes were? Also, can you speculate on additional tasks that you think could achieve good results on the MultiWOZ dataset with T5?
