microsoft / unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Home Page: https://aka.ms/GeneralAI
License: MIT License
It's very nice to get the test results, but in practice they are not useful for research, because we shouldn't touch the test set.
So if it is convenient, please release the dev results as well. I know this is a demanding request, since providing a useful guide is already very generous, but I make it because of the high quality of the repo and its maintainers.
Thanks a lot.
Thanks for answering my previous issue.
I have a new (easy) one: do you plan to release the UniLM model in languages other than English?
Thanks in advance for your response
Philippe
Excellent work!
Would it be possible to provide fine-tuned models for Generative QA along with training and inference instructions similar to those provided for Abstractive Summarization and Question Generation?
Hi
I've noticed some authors in common between the two papers (https://arxiv.org/pdf/1905.03197.pdf and https://arxiv.org/pdf/1909.10481v3.pdf). How are these two projects related?
Thanks, and congratulations on such impressive work!
Philippe
I have a very limited dataset of 225 samples. The task is similar to Gigaword headline generation. The statistics for my source and target sequences look like this:
Source:
(After BERT tokenizer)
Target:
(After BERT tokenizer)
I used an 80-20 split: trained on 180 samples and tested on 45. I tried running decoding with different values of max_tgt_length, and got the following results:
What's happening here, and what is a good workaround given the variation in my data?
For pre-training the seq2seq LM, how do you construct the training examples? In particular, given an unannotated corpus, what are the source segment and the target segment?
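For illustration, here is a minimal sketch of one plausible construction, assuming (as for BERT-style pairs) that two adjacent spans of the same document serve as source and target; the helper name, the fixed split ratio, and the packing details are my own assumptions, not the repo's actual preprocessing:

```python
def make_seq2seq_example(sentences, tokenizer, max_len=512, src_ratio=0.75):
    """Hypothetical sketch: pack two adjacent spans of one document as the
    source and target segments of a seq2seq pre-training example."""
    tokens = [t for s in sentences for t in tokenizer.tokenize(s)][:max_len - 3]
    cut = int(len(tokens) * src_ratio)        # assumed split point
    src, tgt = tokens[:cut], tokens[cut:]
    # UniLM-style packing: [CLS] source [SEP] target [SEP], with segment ids
    # distinguishing the two; masked-LM targets would then be sampled from both.
    input_tokens = ["[CLS]"] + src + ["[SEP]"] + tgt + ["[SEP]"]
    segment_ids = [0] * (len(src) + 2) + [1] * (len(tgt) + 1)
    return input_tokens, segment_ids
```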
Using the Docker image to run the code, "pip install --user --editable ." succeeds,
but I can't fine-tune on Gigaword with "run_seq2seq.py".
Hi, I would like to know more about the comparison between the standard language model fine-tuning (teacher forcing) and the masked language model fine-tuning (same objective as pre-training) in your paper. In my opinion, for the text generation task, the most popular fine-tuning approach would be the teacher forcing in language modeling as it mimics the generation process during testing. Thanks!
ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
Hello,
I noticed that BertForSeq2SeqDecoder is slow on CPU, mainly due to the while loop inside the forward method: it iterates N times, where N is the difference between next_pos and output_length, and next_pos increments by 1 at each iteration.
Can you explain what the following are:
Do you have any idea how to optimize the code to get rid of the while loop?
Thanks a lot; any help explaining those variables would be appreciated.
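A note for anyone hitting the same slowdown: the loop is autoregressive (step t needs the token produced at step t-1), so it cannot simply be removed; each iteration can only be made cheaper by caching the hidden states of already-decoded positions, which is presumably what prev_embedding and prev_encoded_layers are for. A minimal, hypothetical sketch of that pattern — model_step and its cache argument are stand-ins, not the repo's actual API:

```python
import torch

@torch.no_grad()
def greedy_decode(model_step, src_ids, max_new_tokens, sep_id):
    """Illustrative incremental decoding with state caching.

    model_step(new_ids, cache) is assumed to run the transformer only over
    the newly added position, reusing cached per-layer states for everything
    decoded so far, and to return (logits, updated_cache)."""
    out, ids, cache = [], src_ids, None
    for _ in range(max_new_tokens):             # sequential by necessity
        logits, cache = model_step(ids, cache)  # only the new position is computed
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        out.append(next_id)
        if (next_id == sep_id).all():           # stop at end-of-sequence
            break
        ids = next_id                           # feed back just the new token
    return torch.cat(out, dim=1)
```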
I'm having trouble reproducing the results on CNN/DM dataset.
I downloaded the data and the fine-tuned model provided in the README, and I followed the commands to predict the test set.
Everything runs fine, but at the end I have the following results:
1 ROUGE-1 Average_R: 0.62689 (95%-conf.int. 0.62269 - 0.63111)
1 ROUGE-1 Average_P: 0.13695 (95%-conf.int. 0.13561 - 0.13828)
1 ROUGE-1 Average_F: 0.22101 (95%-conf.int. 0.21918 - 0.22288)
1 ROUGE-2 Average_R: 0.33142 (95%-conf.int. 0.32673 - 0.33603)
1 ROUGE-2 Average_P: 0.06949 (95%-conf.int. 0.06832 - 0.07078)
1 ROUGE-2 Average_F: 0.11266 (95%-conf.int. 0.11089 - 0.11456)
1 ROUGE-L Average_R: 0.52624 (95%-conf.int. 0.52179 - 0.53061)
1 ROUGE-L Average_P: 0.11465 (95%-conf.int. 0.11345 - 0.11598)
1 ROUGE-L Average_F: 0.18509 (95%-conf.int. 0.18333 - 0.18698)
/root/code/unilm/src/cnndm_model/cnndm_model.bin.test.alp1.0
ROUGE-F(1/2/l): 22.10/11.27/18.51
ROUGE-R(1/2/3/l): 62.69/33.14/52.62
It's weird, because I checked the prediction file (cnndm_model.bin.test.alp1.0.post) and compared it with the one provided in the README, and most of the time there are only a few differences.
Here is a comparison of the last few lines of the file (left is the 'official' one, right is mine)
When running run_seq2seq.py to fine-tune the model on summarization datasets, the program always crashes with "Segmentation fault (core dumped)". The command is as follows:
export CUDA_VISIBLE_DEVICES=0,1,2,3
python biunilm/run_seq2seq.py \
  --do_train --fp16 --amp --num_workers 0 \
  --bert_model ../bert-large-cased/ --new_segment_ids --tokenized_input \
  --output_dir ../summ_model/bert_save \
  --log_dir ../summ_model/bert_log \
  --model_recover_path ../storage/unilmv1-large-cased.bin \
  --max_seq_length 768 --max_position_embeddings 768 \
  --trunc_seg a --always_truncate_tail \
  --max_len_a 568 --max_len_b 200 \
  --mask_prob 0.7 --max_pred 140 \
  --train_batch_size 48 --gradient_accumulation_steps 2 \
  --learning_rate 0.00003 --warmup_proportion 0.1 --label_smoothing 0.1 \
  --num_train_epochs 30
What causes the Segmentation fault error? Thanks for your help!
A docker image for UniLM would be great.
Hi, I have a problem evaluating performance on the QG task. Using your released evaluation scripts, I can reproduce your reported performance; but when I use https://github.com/xinyadu/nqg, I get a different result.
Why do I get this result?
Thank you for your help.
The multi-GPU command line uses only a single GPU, even when the visible devices are 0,1.
Any suggestions?
Thank you for open-sourcing the code.
It seems that there is an issue with the question generation evaluation script. The script produces nearly the same results as reported in the paper, but it contains some post-processing steps that are the main source of the improvement.
Most authors have used nlg-eval to report performance. With nlg-eval, the scores on the test dataset, comparing the released generation output against the gold questions (test.q.tok.txt), are as follows:
Bleu_1: 0.407580
Bleu_2: 0.275720
Bleu_3: 0.201373
Bleu_4: 0.151140
METEOR: 0.161781
ROUGE_L: 0.436765
I am requesting you to please clarify the difference.
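For anyone wanting to reproduce this comparison, a hedged example of the nlg-eval Python API (pip install nlg-eval); the hypothesis file name here is a placeholder for the released generation output:

```python
from nlgeval import compute_metrics

# Placeholder file names: released generation output vs. tokenized gold questions.
metrics = compute_metrics(hypothesis='qg_generated_output.txt',
                          references=['test.q.tok.txt'])
print(metrics)  # Bleu_1..Bleu_4, METEOR, ROUGE_L, CIDEr
```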
It seems you don't use Whole Word Masking in pre-training.
Whole Word Masking has been shown to be useful for BERT, so will you try it for UniLM (and release the pre-trained model)?
Thanks!
I was installing a particular commit of NVIDIA/apex by doing this on Colab:
%%writefile setup.sh
git clone -q https://github.com/NVIDIA/apex.git
cd apex
git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8
cd ..
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
I'm getting the error below. The problem only occurs with this particular commit; if I install the master branch, it installs fine.
.....
csrc/scale_check_overflow.cpp:14:3: note: in expansion of macro ‘AT_CHECK’
AT_CHECK(grads.type().is_cuda(), "grads must be a CUDA tensor");
^
.....
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Running setup.py install for apex ... error
Cleaning up...
Removing source in /tmp/pip-req-build-wmagyiis
Removed build tracker '/tmp/pip-req-tracker-9l2sbkhe'
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-wmagyiis/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-wmagyiis/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-dmtu3t6t/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
Exception information:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/cli/base_command.py", line 153, in _main
status = self.run(options, args)
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/commands/install.py", line 455, in run
use_user_site=options.use_user_site,
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/req/init.py", line 62, in install_given_reqs
**kwargs
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/req/req_install.py", line 888, in install
cwd=self.unpacked_source_directory,
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/utils/subprocess.py", line 275, in runner
spinner=spinner,
File "/usr/local/lib/python3.6/dist-packages/pip/_internal/utils/subprocess.py", line 242, in call_subprocess
raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools,
While running inference on custom data, I get the following error:
RuntimeError: "add_cpu/sub_cpu" not implemented for 'Half'
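For context, an assumption based on the error rather than a confirmed diagnosis: 'Half' (fp16) kernels are not implemented on CPU in this PyTorch version, so the message usually means fp16 weights or inputs are being run on CPU. The usual workaround is to drop --fp16/--amp, or to cast the recovered model back to float32 for CPU inference; a minimal sketch with a hypothetical helper:

```python
import torch

def prepare_cpu_model(model, checkpoint_path):
    """Hypothetical helper: load a checkpoint for CPU inference in float32,
    since fp16 ('Half') ops are unavailable on CPU."""
    state = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state, strict=False)
    return model.float().eval()  # undo any .half() cast before CPU inference
```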
As mentioned in the paper, during fine-tuning you masked some tokens in the summaries and then predicted those tokens. But during inference, when only test data is given, you don't have the summaries. So how does prediction happen at inference time? I mean, how did you give the inputs for inference, and how did the decoding work?
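A sketch of my understanding of the answer (a paraphrase of the paper's description, not the repo's exact code): at inference time the target is generated left to right; at each step a [MASK] token is appended, the model predicts it through the seq2seq attention mask, and the prediction replaces the [MASK] before the next step:

```python
def masked_lm_decode(predict_mask, src_tokens, max_tgt_len):
    """Illustrative greedy decoding for a seq2seq-finetuned masked LM.

    predict_mask(tokens) is a stand-in for a forward pass that returns the
    predicted token at the position of the final [MASK]."""
    prefix = ["[CLS]"] + src_tokens + ["[SEP]"]
    target = []
    for _ in range(max_tgt_len):
        pred = predict_mask(prefix + target + ["[MASK]"])  # predict appended mask
        if pred == "[SEP]":                                # [SEP] ends generation
            break
        target.append(pred)                                # feed prediction back
    return target
```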
Hi. I want to try QG using decode_seq2seq.py. It works when I try the sample data, but when I use other data, I encounter KeyError: 'H.E.':
File "/root/code/unilm/src/pytorch_pretrained_bert/tokenization.py", line 117, in convert_tokens_to_ids
ids.append(self.vocab[token])
KeyError: 'H.E.' # or another out-of-vocabulary token
I am guessing that the provided model is for machines with a CUDA-capable device.
Do you happen to have a pre-trained CPU version of cnndm_model.bin?
@@ -165,7 +165,7 @@ def main():
print(args.model_recover_path)
for model_recover_path in glob.glob(args.model_recover_path.strip()):
logger.info("***** Recover model: %s *****", model_recover_path)
- model_recover = torch.load(model_recover_path)
+ model_recover = torch.load(model_recover_path, map_location="cpu")
DATA_DIR=../cnndm_data
MODEL_RECOVER_PATH=../cnndm_model.bin
EVAL_SPLIT=test
export PYTORCH_PRETRAINED_BERT_CACHE=/tmp/bert-cased-pretrained-cache
# run decoding
python biunilm/decode_seq2seq.py --fp16 --amp --bert_model bert-large-cased --new_segment_ids --mode s2s --need_score_traces \
--input_file ${DATA_DIR}/${EVAL_SPLIT}.src --split ${EVAL_SPLIT} --tokenized_input \
--model_recover_path ${MODEL_RECOVER_PATH} \
--max_seq_length 768 --max_tgt_length 128 \
--batch_size 64 --beam_size 5 --length_penalty 0 \
--forbid_duplicate_ngrams --forbid_ignore_word ".|[X_SEP]"
11/04/2019 15:55:06 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt from cache at /tmp/bert-cased-pretrained-cache/cee054f6aafe5e2cf816d2228704e326446785f940f5451a5b26033516a4ac3d.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "biunilm/decode_seq2seq.py", line 254, in <module>
main()
File "biunilm/decode_seq2seq.py", line 147, in main
amp_handle = amp.init(enable_caching=True)
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/amp.py", line 65, in init
handle = AmpHandle(enable_caching, verbose)
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/handle.py", line 14, in __init__
self._default_scaler = LossScaler()
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/apex/amp/scaler.py", line 35, in __init__
self._overflow_buf = torch.cuda.IntTensor([0])
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:51
[1] 72305 exit 1 python biunilm/decode_seq2seq.py --fp16 --amp --bert_model bert-large-cased
Without --amp:
Traceback (most recent call last):
File "biunilm/decode_seq2seq.py", line 254, in <module>
main()
File "biunilm/decode_seq2seq.py", line 216, in main
position_ids, input_mask, task_idx=task_idx, mask_qkv=mask_qkv)
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1409, in forward
return self.beam_search(input_ids, token_type_ids, position_ids, attention_mask, task_idx=task_idx, mask_qkv=mask_qkv)
File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1528, in beam_search
output_all_encoded_layers=True, prev_embedding=prev_embedding, prev_encoded_layers=prev_encoded_layers, mask_qkv=mask_qkv)
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1062, in forward
input_ids, token_type_ids, attention_mask)
File "/home/john/code/unilm/src/pytorch_pretrained_bert/modeling.py", line 1037, in get_extended_attention_mask
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0
File "/home/john/.virtualenvs/unilm/lib/python3.6/site-packages/torch/tensor.py", line 371, in __rsub__
return _C._VariableFunctions.rsub(self, other)
RuntimeError: "add_cpu" not implemented for 'Half'
Packages:
pytorch-pretrained-bert 0.4.0
torch 1.1.0
tensorboardX 1.9
apex 0.1
I want to know the evaluation method for the generation tasks in the paper. Do you report the test result at the epoch with the highest validation score, or the result of the last epoch? If it is the latter, is the random seed fixed for each training run? Thank you so much!
In your paper, you generate five million answerable and four million unanswerable examples to improve question answering. Can you provide the generated examples so we can reproduce the results? Thank you very much.
As titled:
export PYTORCH_PRETRAINED_BERT_CACHE=/{tmp_folder}/bert-cased-pretrained-cache
From this command I can't understand where I can find bert-cased-pretrained-cache. I tried to pip install it separately, but there's no bert-cased-pretrained-cache package.
Thanks for open-sourcing the code!
After reading your paper, I have a question about the fine-tuning procedure for abstractive summarization (and more generally any seq2seq task).
I understand the idea: similarly to BERT and to UniLM pre-training, fine-tuning on abstractive summarization masks some tokens and predicts them, in order to learn a bidirectional representation of tokens.
But at inference time, since we don't have access to the whole summary (it is yet to be generated), we can only apply a left-to-right LM.
That seems like a pretty big discrepancy between training and testing.
What I don't understand is that people have already tried to use BERT (trained as a bidirectional encoder) as a left-to-right LM, but the results were really low.
And in your case, the results are very high!
So my questions are:
Did I miss something? Did I misunderstand, and there is in fact no discrepancy?
If I understood right, why do you fine-tune the seq2seq model with a bidirectional LM objective, and not a left-to-right LM?
Thanks for your contribution! Will you consider releasing a model based on BERT-base?
ModuleNotFoundError: No module named 'pytorch_pretrained_bert'
I'm not able to convert raw custom data into the preprocessed format required by the model. Could you please help?
I checked #11 but was not able to implement it.
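In case it helps others, a hedged sketch of the kind of preprocessing the --tokenized_input commands in these issues seem to expect: parallel source/target files with one whitespace-joined, WordPiece-tokenized example per line. The function name and exact file convention are assumptions, not the repo's documented pipeline:

```python
from pytorch_pretrained_bert.tokenization import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-large-cased', do_lower_case=False)

def write_tokenized(pairs, src_path, tgt_path):
    """Write parallel source/target files, one tokenized example per line."""
    with open(src_path, 'w') as fs, open(tgt_path, 'w') as ft:
        for src, tgt in pairs:
            fs.write(' '.join(tokenizer.tokenize(src)) + '\n')
            ft.write(' '.join(tokenizer.tokenize(tgt)) + '\n')
```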
Hello authors, are the results given in the paper the evaluation results of the last epoch, or the best test results on the dev set across epochs? Also, if it is the first case, was the random seed left unfixed? Thanks!
After installing the environment, I ran run_seq2seq.py. When I load the model, a segmentation fault (core dump) occurs. My environment is PyTorch 1.1.0, CUDA 10.1, torchvision 0.3.0. Why does the core dump occur?
Hi
Thanks for sharing this.
I've tried to run the question generation part.
I can manage to make it work, but only with a batch size of 8, because of the 16GB limit of my GPU.
So I wanted to switch to FP16 to increase the batch size and speed up training.
I'm getting this error:
10/16/2019 09:32:57 - INFO - main - device: cuda n_gpu: 1, distributed training: False, 16-bits training: True
10/16/2019 09:32:57 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt from cache at /tmp/bert-cased-pretrained-cache/cee054f6aafe5e2cf816d2228704e326446785f940f5451a5b26033516a4ac3d.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
Loading Train Dataset /root/unilm/data/train
Load 75722 documents
10/16/2019 09:33:02 - INFO - main - enable fp16 with amp
10/16/2019 09:33:02 - INFO - main - ***** Recover model: /root/unilm/models/unilmv1-large-cased.bin *****
10/16/2019 09:33:03 - INFO - pytorch_pretrained_bert.modeling - loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz from cache at /tmp/bert-cased-pretrained-cache/7fb0534b83c42daee7d3ddb0ebaa81387925b71665d6ea195c5447f1077454cd.eea60d9ebb03c75bb36302aa9d241d3b7a04bba39c360cf035e8bf8140816233
10/16/2019 09:33:03 - INFO - pytorch_pretrained_bert.modeling - extracting archive file /tmp/bert-cased-pretrained-cache/7fb0534b83c42daee7d3ddb0ebaa81387925b71665d6ea195c5447f1077454cd.eea60d9ebb03c75bb36302aa9d241d3b7a04bba39c360cf035e8bf8140816233 to temp dir /tmp/tmppli0_vk5
10/16/2019 09:33:14 - INFO - pytorch_pretrained_bert.modeling - Model config {
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"ffn_type": 0,
"fp32_embedding": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 1024,
"initializer_range": 0.02,
"intermediate_size": 4096,
"label_smoothing": 0.1,
"max_position_embeddings": 512,
"new_pos_ids": false,
"num_attention_heads": 16,
"num_hidden_layers": 24,
"num_qkv": 0,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"relax_projection": 0,
"seg_emb": false,
"task_idx": 3,
"type_vocab_size": 6,
"vocab_size": 28996
}
10/16/2019 09:33:49 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForPreTrainingLossMask not initialized from pretrained model: ['crit_mask_lm_smoothed.one_hot']
10/16/2019 09:33:50 - INFO - main - ***** CUDA.empty_cache() *****
10/16/2019 09:33:50 - INFO - main - ***** Running training *****
10/16/2019 09:33:50 - INFO - main - Batch size = 4
10/16/2019 09:33:50 - INFO - main - Num steps = 9465
Epoch: 0%| | 0/1 [00:01<?, ?it/s]
Traceback (most recent call last):
File "biunilm/run_seq2seq.py", line 483, in
main()
File "biunilm/run_seq2seq.py", line 461, in main
optimizer.step()
File "/root/.local/lib/python3.6/site-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/optimizers/fp16_optimizer.py", line 157, in step
grads_groups_flat.append(_flatten_dense_tensors([p.grad for p in group]))
File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 192, in _flatten_dense_tensors
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
File "/opt/conda/lib/python3.6/site-packages/torch/_utils.py", line 192, in
flat = torch.cat([t.contiguous().view(-1) for t in tensors], dim=0)
AttributeError: 'NoneType' object has no attribute 'contiguous'
Iter (loss=5.700): 0%|
when running this command line:
python3 biunilm/run_seq2seq.py --do_train --num_workers 0 \
  --bert_model bert-large-cased --new_segment_ids --tokenized_input \
  --data_dir ${DATA_DIR} --src_file train.pa.tok.txt --tgt_file train.q.tok.txt \
  --output_dir ${OUTPUT_DIR}/bert_save \
  --log_dir ${OUTPUT_DIR}/bert_log \
  --model_recover_path ${MODEL_RECOVER_PATH} \
  --max_seq_length 512 --max_position_embeddings 512 \
  --mask_prob 0.7 --max_pred 48 \
  --train_batch_size 8 --gradient_accumulation_steps 2 \
  --learning_rate 0.00002 --warmup_proportion 0.1 --label_smoothing 0.1 \
  --num_train_epochs 1 \
  --amp \
  --fp16
Any clue where this could come from?
Thanks in advance
Philippe
Hello,
First of all, thank you so much for open sourcing your code. This is great work! I have a quick question on the tasks in
https://github.com/microsoft/unilm/blob/master/src/pytorch_pretrained_bert/modeling.py#L1214-L1217
Can you define task_idx? I see from seq2seq_loader.py that task_idx=3 is for the seq2seq LM and left2right LM. I think that task_idx=0 is for the bidirectional LM. However, I am not sure about 1 and 2.
I really appreciate your help!
UniLM team, awesome work!! I am able to generate very good quality questions and am thoroughly impressed: some of the generated questions are simply amazing. Generative models are the future and carry insane potential!!
I have one question though:
I followed your approach of using the [SEP] tag after passages to provide hints for drawing out meaningful questions. However, I am not sure how I can scale this to my dataset. I am thinking of applying NER to my passages and, piggybacking on selected entities, generating [SEP] hints for every passage.
Is there a better and faster approach? Obviously NER does not always pick the hints I would wish to capture, so I may lose a lot of information with the NER approach, but I cannot think of any better alternative. Manually curating [SEP] hints is not feasible for my research work.
Thank you & keep doing the amazing work!
Anshoo
Hi, I find that during seq2seq decoding you pad the source sequence to the length max_src_length with '[PAD]', but during training there is no such padding of the source sequence. Would this introduce any inconsistency?
I am a bit confused about the implementation of Equation 3 of the paper, where the mask matrix M is used. Can you please describe how the mask matrix M is implemented in the code, and in which part of the code Equation 3 is applied?
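For readers with the same question, the additive-mask reading (consistent with the get_extended_attention_mask snippet quoted in another issue above, where blocked entries become -10000.0): M is zero where attention is allowed and a large negative value where it is blocked, and it is added to the attention logits before the softmax. A minimal sketch of the seq2seq variant; shapes and names are illustrative, not the repo's exact code:

```python
import torch

def seq2seq_attention_mask(src_len, tgt_len):
    """Additive mask M for Eq. 3: Attention = softmax(Q K^T / sqrt(d_k) + M).

    Source positions attend bidirectionally within the source; target
    positions attend to all source positions and only leftward within the
    target. Blocked entries get -10000.0 so they vanish after softmax."""
    n = src_len + tgt_len
    allowed = torch.zeros(n, n)
    allowed[:, :src_len] = 1.0                 # every position sees the source
    allowed[src_len:, src_len:] = torch.tril(  # causal within the target
        torch.ones(tgt_len, tgt_len))
    return (1.0 - allowed) * -10000.0          # 0 if allowed, -1e4 if blocked
```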
git version 2.23.0.windows.1
alias=`whoami | cut -d'.' -f2`; docker run -it --rm --runtime=nvidia --ipc=host --privileged -v /home/${alias}:/home/${alias} pytorch/pytorch:1.1.0-cuda10.0-cudnn7.5-devel bash
##Returns:
C:\Program Files\Docker\Docker\Resources\bin\docker.exe: Error response from daemon: Mount denied:
The source path "C:/Program Files/Git/home/zzj04;C"
doesn't exist and is not known to Docker.
See 'C:\Program Files\Docker\Docker\Resources\bin\docker.exe run --help'.
I would appreciate it if another user could share the sample commands that reproduce the interesting results included with the arXiv paper. No success so far with the original MS repo, and only limited, repetitious output using the current Hugging Face unilm fork and the same prompt.
Cheers
Seed line from 1984 used in the original paper:
Winston sat back. A sense of complete helplessness had descended upon him.
Published output:
He was now helpless and unmoving. What would that
mean? To become a monk under her mother’s rule. What would it
mean for him to be put at the center of the most dangerous of all of
demons, the demon known as Azrael? He would never know exactly
what that was, or even how it would end, unless he could sense the
strength of Azrael’s mind and soul. It wasn’t only the existence of
his mother, it was all he knew, and every moment. He was trapped
within her embrace, a trapped, torn, battered, crippling presence he
only knew it would mean was eternity. And in the instant that he
was lost his life changed forever, when Azraelis’ spirit touched his
mind, he was engulfed in this dark and mystical, so enchanting and
beautiful force.
Hello, I'm trying to use your model on the QG task; however, fine-tuning takes a lot of time.
It takes 8h per epoch on the SQuAD dataset (no fp16, since my GPU is not supported).
The GPU is a 1080 Ti (11 GB), and I need to set the batch size to 1.
I wonder why this model is so slow to fine-tune (compared to other pre-trained models like GPT and BERT)?
Another question is about the decoding part: each step predicts a word and feeds the updated sequence back into the model to predict the next word. Is this right? (Sorry, the code is too complex for me to follow.)
The link to the CNN/DM data points to an already preprocessed dataset.
How can we reproduce a similar dataset from the official .story files?
What is the hardware setup used in the training here?
We have V100, 2080 RTX, 1080 GTX ti, 1060 GTX, and 960m people here. Hopefully it works on diverse setups.
Thanks for open-sourcing the repo; the code is great and really easy to reproduce thanks to Docker, your detailed explanations in the README, and your fine-tuned checkpoints!
I have a question about predictions post-processing.
I could reproduce the paper's results on the CNN/DM dataset with the command provided in the README. My results are:
R-1 | R-2 | R-L |
---|---|---|
43.06 | 20.42 | 40.32 |
But if I run the same command without the truncation (--trunc_len 0 instead of --trunc_len 70), results are much lower:
R-1 | R-2 | R-L |
---|---|---|
42.05 | 19.90 | 39.44 |
Is this normal? In other codebases I've never seen predictions being truncated, so I'm wondering why it is necessary with UniLM.
I'm also curious to hear your opinion on why the score is lower without truncation.
If I want to train UniLM from scratch for another abstractive summarization task (not in English), how do I do it?
I guess the fine-tuning and inference code from the README can be reused, but I'm not sure how to do the pre-training. Can you share the pre-training code used for the CNN summarization model? Thanks!
In this line of the CNN/DM evaluation code (line 239 at commit d22a233), '1' is replaced by '#'.
I don't understand it. Can someone explain the reason for this post-processing?
Hi, where does the Bleu module come from? I encounter this when running eval:
File "src/qg/eval_on_unilm_tokenized_ref.py", line 4, in <module>
from bleu.bleu import Bleu
ImportError: No module named bleu.bleu
Hello,
Thank you very much for sharing this codebase together with good documentation, much appreciated! :-)
Is there any timeline or some information for the upcoming V2 release?
Regards,
Fabian