
dialoglue's Issues

hyper parameters for MultiWOZ

Hi! For all datasets in the DialoGLUE benchmark, I can reproduce similar results, except on MultiWOZ.
For ConvBERT-DG, your joint goal accuracy is around 58, but I can only get 56, which is the same as what the original TripPy reported.
I wonder whether you used different hyper-parameters for TripPy? If so, could you share them?

Thank you!

The original hyper-parameters for TripPy are as follows:

--do_lower_case \
--learning_rate=1e-4 \
--num_train_epochs=10 \
--max_seq_length=180 \
--per_gpu_train_batch_size=48 \
--per_gpu_eval_batch_size=1 \
--output_dir=${OUT_DIR} \
--save_epochs=2 \
--logging_steps=10 \
--warmup_proportion=0.1 \
--eval_all_checkpoints \
--adam_epsilon=1e-6 \
--label_value_repetitions \
--swap_utterances \
--append_history \
--use_history_labels \
--delexicalize_sys_utts \
--class_aux_feats_inform \
--class_aux_feats_ds \

reproducibility for slot filling task


slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

The slot names are collected into a Python set, and the BIO label list is then built by iterating over that set.
But a set has no guaranteed iteration order, and in my experiments vocab.txt differed between two runs. So I changed the code to

slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
slots = sorted(list(slots))
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

and get the same vocab.txt for every run.
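
A minimal sketch of the same fix, wrapped in a function so it can be checked in isolation. The helper name build_bio_vocab and the toy train_data rows are assumptions for illustration; only the 'labels' and 'slot' keys come from the snippet above. Python randomizes string hashes per process by default, which is why iterating over a set of strings can give a different order on each run; sorting removes that dependence.

```python
import json


def build_bio_vocab(train_data):
    """Return ["O", "B-x", "I-x", ...] with slot names in sorted order."""
    slots = {slot["slot"] for row in train_data for slot in row.get("labels", [])}
    vocab = ["O"]
    for slot in sorted(slots):  # sorted() makes the order independent of hashing
        vocab.extend(prefix + slot for prefix in ("B-", "I-"))
    return vocab


# Toy rows (hypothetical), in the same shape as the snippet above.
train_data = [
    {"labels": [{"slot": "time"}, {"slot": "people"}]},
    {"labels": [{"slot": "date"}]},
    {},  # rows without a "labels" key contribute nothing
]

vocab = build_bio_vocab(train_data)
print(json.dumps(vocab))
```

Since the label-to-index mapping is derived from this list, a stable order also keeps saved checkpoints and dumped predictions comparable across runs.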

Cannot download pre-trained model

When running

python \
        --train_data_path data_utils/dialoglue/hwu/train.csv \
        --val_data_path data_utils/dialoglue/hwu/val.csv \
        --test_data_path data_utils/dialoglue/hwu/test.csv \
        --token_vocab_path bert-base-uncased-vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 0 --learning_rate 6e-5 \
        --model_name_or_path convbert-dg --task intent --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs \

there is an error:

OSError: Model name 'convbert-dg' was not found in model name list. We assumed '' was a path, a model identifier, or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.

How can I download the checkpoints/pre-trained models mentioned in the README?

HWU64 odd number of samples

The HWU64 dataset contains 25k samples according to the original paper. The DialoGLUE paper states the same number of samples.

However, the README states 11k samples.

If I count the number of samples which are actually in the HWU64 part of DialoGLUE then I get 12,112 samples (12k).

My questions:

  • Is there a reason for the difference in numbers in the original HWU64 and in the DialoGLUE HWU64? Or is it a bug?
  • Did you compute the performance of the intent prediction models on 25k, 12k or 11k samples?

Thank you for your answers :)

Will the imbalance of the label affect the final result?

For example, suppose there are two categories, A and B, and 100 samples in total, 99 of which belong to category A and one to category B. The final output is then likely to be category A, even if the input is more similar to category B.
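
To make the scenario concrete, the sketch below scores a hypothetical model that always predicts category A on the 99/1 split described above. It only illustrates how plain accuracy can hide the problem while a per-class metric exposes it; it says nothing about how any DialoGLUE model actually behaves. The function names are made up for this example.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of samples whose prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_recall(gold, pred):
    """Average of per-class recall, so every class counts equally."""
    totals = Counter(gold)
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


gold = ["A"] * 99 + ["B"]
pred = ["A"] * 100  # a model that always outputs the majority class

acc = accuracy(gold, pred)
mr = macro_recall(gold, pred)
print(acc, mr)
```

Here accuracy is 0.99 while macro-averaged recall is 0.5, because the single category B sample is never recovered.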

What's the impact of running MLM during training?

Hi, thank you very much for sharing the code! In the function train(), during each training epoch, if args.mlm_during is true, the "Run MLM during training" part runs, but it doesn't seem to change "model". Is this because of the line "model.bert_model = pre_model.bert_model.bert", so that "model" and "pre_model" share the same weights since they refer to the same object in memory? Thank you very much!
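
A small illustration of the reference semantics the question relies on, using plain Python stand-ins rather than the repository's classes. Encoder, Wrapper, PreModel and TaskModel are hypothetical names; the only line taken from the question is the assignment itself. Attribute assignment in Python binds a reference and does not copy, so an in-place update made through one name is visible through the other. Whether that is the whole story in the repository depends on code not shown here.

```python
class Encoder:
    """Stand-in for the shared encoder; weights is a mutable list."""

    def __init__(self):
        self.weights = [0.0, 0.0]


class Wrapper:
    """Stand-in for the object that exposes the encoder as .bert."""

    def __init__(self, encoder):
        self.bert = encoder


class PreModel:
    """Stand-in for the MLM model."""

    def __init__(self):
        self.bert_model = Wrapper(Encoder())


class TaskModel:
    """Stand-in for the task model."""

    def __init__(self):
        self.bert_model = None


pre_model = PreModel()
model = TaskModel()
model.bert_model = pre_model.bert_model.bert  # binds a reference, no copy

# An in-place update made through pre_model (as an optimizer step would do)...
pre_model.bert_model.bert.weights[0] += 1.0

# ...is visible through model, because both names point at one object.
shared = model.bert_model is pre_model.bert_model.bert
print(shared, model.bert_model.weights)
```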

How to run TripPy DST using ConvBERT?

As mentioned in the README:

To train/evaluate the model using our modifications (i.e., MLM pre-training), you can use trippy/DO.example.advanced.

But trippy/DO.example.advanced uses BERT instead of ConvBERT. I wonder how to reproduce the results of ConvBERT and ConvBERT-DG. Thanks!

Error when downloading data

Error when running bash in data_utils dir

mkdir: dialoglue: File exists
Do you wish to download dataset hwu?
1) Yes
2) No
#? 1
Downloading dataset hwu into ../dialoglue/hwu
Getting train data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [01:10<00:00,  1.10s/it]
Getting test data
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [01:08<00:00,  1.07s/it]
Creating categories.json file
Dataset has been downloaded
Creating train_10.csv, etc...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 4190.11it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:00<00:00, 5320.93it/s]
Do you wish to download dataset clinc?
1) Yes
2) No
#? 1
Downloading dataset clinc into ../dialoglue/clinc
Dataset has been downloaded
Creating train_10.csv, etc...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 4146.97it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 4657.23it/s]
Do you wish to download dataset banking?
1) Yes
2) No
#? 1
Downloading dataset banking into ../dialoglue/banking
Getting file: train.csv
Getting file: test.csv
Getting file: categories.json
Dataset has been downloaded
Creating train_10.csv, etc...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 4635.72it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 3855.20it/s]
Processing dialoglue/hwu/
Processing dialoglue/banking/
Processing dialoglue/clinc/
Done downloading intent datasets
Cloning into 'task-specific-datasets'...
remote: Enumerating objects: 103, done.
remote: Counting objects: 100% (103/103), done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 103 (delta 58), reused 77 (delta 45), pack-reused 0
Receiving objects: 100% (103/103), 1001.92 KiB | 339.00 KiB/s, done.
Resolving deltas: 100% (58/58), done.
Traceback (most recent call last):
  File "", line 14, in <module>
    train_data = json.load(open(dataset + "train_0.json"))
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/restaurant8k/train_0.json'
Traceback (most recent call last):
  File "", line 30, in <module>
    sub_train_data = json.load(open(dataset + sub + "/train_0.json"))
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/dstc8_sgd/Buses_1/train_0.json'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:01:15 --:--:--     0curl: (7) Failed to connect to port 80: Operation timed out
unzip:  cannot find or open, or
Traceback (most recent call last):
  File "", line 97, in <module>
    data = read_data(data_file)
  File "", line 16, in read_data
    f = open(data_file)
FileNotFoundError: [Errno 2] No such file or directory: 'top-dataset-semantic-parsing/train.tsv'
cp: top-dataset-semantic-parsing/train.txt: No such file or directory
cp: top-dataset-semantic-parsing/train_10.txt: No such file or directory
cp: top-dataset-semantic-parsing/eval.txt: No such file or directory
cp: top-dataset-semantic-parsing/test.txt: No such file or directory
cp: top-dataset-semantic-parsing/vocab.*: No such file or directory
rm: No such file or directory
Cloning into 'trippy-public'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (77/77), done.
remote: Compressing objects: 100% (55/55), done.
remote: Total 77 (delta 21), reused 60 (delta 12), pack-reused 0
Unpacking objects: 100% (77/77), done.
mv: rename dialoglue/multiwoz/MULTIWOZ2.1 to dialoglue/multiwoz/MULTIWOZ2.1/MULTIWOZ2.1: Invalid argument
Traceback (most recent call last):
  File "", line 61, in <module>
    train += load_top("dialoglue/top/")
  File "", line 21, in load_top
    data = open(fn+"train.txt").readlines()
FileNotFoundError: [Errno 2] No such file or directory: 'dialoglue/top/train.txt'



Thanks for the nice work. Since the proposed model is based on TripPy for dialogue state tracking evaluation, I wonder how MultiWOZ is preprocessed. I found that TripPy's original preprocessing script may change some ground-truth labels (e.g., the time-slot value "10:30" may be changed to just "10"), so, if convenient, could you please do a simple comparison between the ground-truth labels before and after preprocessing? Since you are launching a leaderboard, I hope the evaluation can be as precise as possible. Thanks!
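
A rough sketch of what such a comparison could look like, assuming each turn's labels are available as a flat slot-to-value dict both before and after preprocessing. The helper name diff_labels, the dict layout, and the slot names are assumptions for illustration, not the format used by TripPy or DialoGLUE.

```python
def diff_labels(before, after):
    """Return {slot: (old, new)} for every slot whose value changed."""
    changed = {}
    for slot in sorted(set(before) | set(after)):
        old, new = before.get(slot), after.get(slot)
        if old != new:
            changed[slot] = (old, new)
    return changed


# Hypothetical labels for one turn, before and after preprocessing.
before = {"restaurant-book time": "10:30", "restaurant-area": "centre"}
after = {"restaurant-book time": "10", "restaurant-area": "centre"}

changes = diff_labels(before, after)
print(changes)
```

Running this over every turn and counting the non-empty results would give a quick estimate of how many ground-truth labels the preprocessing alters.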

About Observers in the paper

Hi Mehri,
Thanks for your awesome work 'Example-Driven Intent Prediction with Observers', and for open-sourcing your codebase.
How did you add observers to the BERT model in your codebase? I can't find anything related to [OBS]. Did you use [PAD] as [OBS]? And how did you make the observers tokens that are not attended to?
Looking forward to your reply.

ONNX conversion issue


Following this thread, I tried to convert the convbert-dg model to ONNX format with the following snippet:

model = IntentBertModel('bert-base-uncased', dropout=0.1, num_intent_labels=len(intent_label_to_idx))
model.load_state_dict(torch.load(os.path.join(model_path, ""), map_location=torch.device('cpu')))

model_onnx_path = "model.onnx"
max_seq_length = 100
input_ids = torch.LongTensor(1, max_seq_length).to(device)
token_type_ids = torch.LongTensor(1, max_seq_length).to(device)
attention_mask = torch.LongTensor(1, max_seq_length).to(device)

dummy_input = (input_ids, attention_mask, token_type_ids)
input_names = ["input_ids", "attention_mask", "token_type_ids"]
output_names = ["output"]
torch.onnx.export(model, dummy_input, model_onnx_path, \
    input_names=input_names, output_names=output_names, \

and this throws,

RuntimeError: index out of range: Tried to access index -1457520640 out of table with 30521 rows.

Could you please point out anything that is wrong in the snippet I am using?
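
One thing worth checking in the snippet above: torch.LongTensor(1, max_seq_length) allocates a tensor without initializing it, so the dummy ids can hold arbitrary values, which would be consistent with the out-of-range index in the error. A minimal sketch of building dummy inputs with valid values instead is shown below. The helper name make_dummy_inputs and the vocab_size bound are assumptions; this may not be the only issue with the export.

```python
import torch


def make_dummy_inputs(max_seq_length, vocab_size):
    """Build (input_ids, attention_mask, token_type_ids) with valid values."""
    # Token ids drawn from [0, vocab_size), so every id is a valid embedding row.
    input_ids = torch.randint(0, vocab_size, (1, max_seq_length), dtype=torch.long)
    # Attend to every position; all tokens belong to segment 0.
    attention_mask = torch.ones(1, max_seq_length, dtype=torch.long)
    token_type_ids = torch.zeros(1, max_seq_length, dtype=torch.long)
    return input_ids, attention_mask, token_type_ids


# Assumption: any bound no larger than the model's embedding table works here.
input_ids, attention_mask, token_type_ids = make_dummy_inputs(
    max_seq_length=100, vocab_size=30000
)
dummy_input = (input_ids, attention_mask, token_type_ids)
```

The resulting dummy_input tuple has the same order as in the snippet above and can be passed to torch.onnx.export in its place.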

Error while using tokenizers

While running the following command:
pip3 install -r requirements.txt

--train_data_path data_utils/dialoglue/hwu/train.csv
--val_data_path data_utils/dialoglue/hwu/val.csv
--test_data_path data_utils/dialoglue/hwu/test.csv
--token_vocab_path bert-base-uncased-vocab.txt
--train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5
--model_name_or_path convbert-dg --task intent --do_lowercase --max_seq_length 50 --mlm_pre --mlm_during --dump_outputs \

I get the following error:

Namespace(adam_epsilon=1e-08, device=0, do_lowercase=True, dropout=0.1, dump_outputs=True, grad_accum=2, learning_rate=6e-05, logging_steps=100, max_grad_norm=-1.0, max_seq_length=100, mlm_data_path='', mlm_during=True, mlm_pre=True, model_name_or_path='convbert-dg', num_epochs=100, output_dir='', patience=5, repeat=1, seed=42, task='intent', test_data_path='data_utils/dialoglue/banking/test.csv', token_vocab_path='bert-base-uncased-vocab.txt', train_batch_size=32, train_data_path='data_utils/dialoglue/banking/train.csv', val_data_path='data_utils/dialoglue/banking/val.csv', weight_decay=0.0)
Errors: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Traceback (most recent call last):
File "", line 566, in
scores.append(train(args, i))
File "", line 305, in train
File "/home/kapilpathak/py37-venv/lib/python3.7/site-packages/tokenizers/implementations/", line 30, in init
tokenizer = Tokenizer(WordPiece(vocab_file, unk_token=str(unk_token)))
Exception: Error while initializing WordPiece

The error persists even if I change tasks and datasets.
Any suggestions?
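
The "No such file or directory" message just before the WordPiece exception suggests the path given to --token_vocab_path may not exist relative to the working directory. A small sketch for checking that before the tokenizer is constructed follows; resolve_vocab_path is a hypothetical helper, not part of the repository, and the temporary file only stands in for a real vocab file.

```python
import tempfile
from pathlib import Path


def resolve_vocab_path(token_vocab_path):
    """Return the absolute vocab path, or raise a clear error if it is missing."""
    path = Path(token_vocab_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"WordPiece vocab file not found: {path} (cwd: {Path.cwd()})"
        )
    return str(path)


# Demonstration with a temporary stand-in for the vocab file.
with tempfile.TemporaryDirectory() as tmp:
    vocab_file = Path(tmp) / "vocab.txt"
    vocab_file.write_text("[PAD]\n[UNK]\n")
    found_name = Path(resolve_vocab_path(vocab_file)).name

    try:
        resolve_vocab_path(Path(tmp) / "missing-vocab.txt")
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(found_name, missing_raises)
```

If the check fails for bert-base-uncased-vocab.txt, the file most likely needs to be downloaded or the path adjusted to where it actually lives.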

Reproducing few-shot experiments on Multiwoz2.1


I am working on few-shot experiments on MultiWOZ2.1. However, I ran into the same problem as in #7.

BERT + pre + multi trained on few-shot dataset achieved ~0.49 JGA on the test set (with random seed 42).

I modified a small part of your code, and the diff is listed here (GitHub comparing changes). I ran the experiment directly with DO.example.advanced.


  • GPU: RTX 3090
  • PyTorch: 1.7.0+cu110

I wonder whether my training/evaluation process was wrong, given that I got such high performance even in the few-shot setting.

Thanks for your reply in advance!

Some questions about the trippy dst

I have several questions:

  1. The original performance of TripPy is 55.3% on MultiWOZ 2.1 (in the paper). Your BERT-base DST achieves 56.3%. So where does the improvement come from? I notice that the original TripPy repo mentions:

    With a sequence length of 180, you should expect the following average JGA: 56% for MultiWOZ 2.1
    Best performance can be achieved by using the maximum sequence length of 512.

    Did you change the code, or does the TripPy repo simply perform better than the TripPy paper reports?

  2. How can I reproduce the DST experiments? My guess:

  • for BERT(56.3%): run DO.example.advanced without any modification (although the max seq length is set to 180)
  • for CONVBERT-DG(58.57%): run DO.example.advanced with --model_name_or_path="convbert-dg"

Looking forward to your reply :)
