mit-han-lab / lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
Home Page: https://arxiv.org/abs/2004.11886
License: Other
It seems the iwslt14.de-en pretrained model is missing.
Has the pretrained model for this dataset been released? If so, how can I get it?
Hello, in transformer_multibranch_v2.py, the TransformerEncoderLayer class contains the following code:

if args.encoder_branch_type is None:  # default=None?
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.encoder_branch_type)  # completion inferred; the snippet was cut off here in the original post

I just wonder: is args.encoder_branch_type actually non-None (so that the multibranch path is taken) in the released configs?
It can easily be seen that, instead of attempting to model both global and local contexts, the attention module in LSRA only focuses on capturing global context (no diagonal pattern in the attention map), leaving local-context capture to the convolution branch.
I understand that the attention branch is in charge of the global features via the original attention module, but I wonder why the paper says "(no diagonal pattern)"?
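For context, here is a minimal sketch of the two-branch LSRA idea described in the quoted passage; the channel split, module choices, and sizes are assumptions for illustration, not the repository's exact implementation.

import torch
import torch.nn as nn

class LSRASketch(nn.Module):
    """Illustrative Long-Short Range Attention block (hypothetical sizes).

    The input channels are split in half: one half goes through standard
    multi-head attention (global context), the other through a depthwise
    convolution (local context); the outputs are concatenated back.
    """

    def __init__(self, embed_dim=256, num_heads=4, kernel_size=3):
        super().__init__()
        half = embed_dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads)
        # Symmetric padding keeps the sequence length unchanged.
        self.conv = nn.Conv1d(half, half, kernel_size,
                              padding=kernel_size // 2, groups=half)

    def forward(self, x):  # x: (seq_len, batch, embed_dim)
        left, right = x.chunk(2, dim=-1)
        global_out, _ = self.attn(left, left, left)
        # Conv1d expects (batch, channels, seq_len).
        local_out = self.conv(right.permute(1, 2, 0)).permute(2, 0, 1)
        return torch.cat([global_out, local_out], dim=-1)

Because the conv branch already covers neighboring positions, the attention branch is free to spend its capacity on long-range dependencies, which is why its attention maps lose the diagonal pattern.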
Hi,
Can you please share the trained checkpoints for the quantized and quantized+pruned models (shown in this plot - https://github.com/mit-han-lab/lite-transformer#further-compress-transformer-by-182x)?
I am interested in testing it for the translation and the summarization tasks. I would appreciate it if you can share those checkpoints.
Thank you
Thanks so much for open-sourcing your code!
Would it be possible to provide the training/config for the wikitext-103 dataset?
Thanks
Thank you, thank you.
Hello,
I tried evaluating the model. I am getting the following error:
Traceback (most recent call last):
File "generate.py", line 191, in <module>
cli_main()
File "generate.py", line 187, in cli_main
main(args)
File "generate.py", line 105, in main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/home/shalinis/lite-transformer/fairseq/tasks/fairseq_task.py", line 246, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 146, in generate
encoder_outs = model.forward_encoder(encoder_input)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in forward_encoder
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in <listcomp>
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 314, in forward
x = layer(x, encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 693, in forward
x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/multibranch.py", line 37, in forward
x = branch(q.contiguous(), incremental_state=incremental_state)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py", line 131, in forward
output = self.linear2(output)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
return F.linear(input, self.weight, self.bias)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
Namespace(ignore_case=False, order=4, ref='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in <module>
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen // self.stat.predlen
ZeroDivisionError: integer division or modulo by zero
Kindly help me in resolving this.
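For what it's worth, the ZeroDivisionError is downstream of the CUDA error: generation produced no hypotheses, so predlen is 0 when BLEU is computed. A defensive guard (a sketch, not the repository's code; the real fix is rebuilding the CUDA extensions for your GPU architecture) could look like:

import math

def brevity(reflen, predlen):
    # Guard against empty system output (predlen == 0), which otherwise
    # raises the ZeroDivisionError shown in the traceback above.
    if predlen == 0:
        return 0.0
    return min(1.0, math.exp(1 - reflen / predlen))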
What is get_input_transform() in fairseq/models/transformer_multibranch_v2.py used for? I'm trying to apply factorized embedding parameterization like the ALBERT model and was wondering if I could somehow use this function.
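For reference, ALBERT-style factorized embedding parameterization can be sketched independently of get_input_transform(); the dimensions below are illustrative assumptions.

import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding: a V x E lookup followed by an
    E x H projection, so parameters scale as V*E + E*H instead of V*H."""

    def __init__(self, vocab_size=32000, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, tokens):  # tokens: (batch, seq_len) of token ids
        return self.project(self.lookup(tokens))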
could you share the tensorboard log file? thank you so much!
In the original fairseq code, the encoder_padding_mask is applied after linear1, as follows:

x = self.linear1(x)
if self.act is not None:
    x = self.act(x)
if encoder_padding_mask is not None:
    x = x.masked_fill(encoder_padding_mask.transpose(0, 1).unsqueeze(2), 0)
x = self.conv(x)
x = self.linear2(x)
However, in your code, the encoder_padding_mask is applied before linear1, as follows:

mask = key_padding_mask
if mask is not None:
    q = q.masked_fill(mask.transpose(0, 1).unsqueeze(2), 0)
x = branch(q.contiguous(), incremental_state=incremental_state)

I think the bias in linear1 makes the encoder_padding_mask useless in your code: after the masked positions pass through linear1, the bias term makes them non-zero again, so the convolution still mixes in padded positions. In my experiments, the second approach gets a bad result and the first one gets a normal result.
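A minimal repro of the point above (toy sizes; the tensor names are illustrative): masking before a linear layer with a bias does not keep the padded positions at zero, whereas masking after it does.

import torch
import torch.nn as nn

torch.manual_seed(0)
linear1 = nn.Linear(4, 4)  # bias=True by default
x = torch.randn(3, 4)      # 3 positions, 4 channels
pad = torch.tensor([False, False, True])  # last position is padding

# Mask BEFORE the linear layer: the bias re-populates the padded row.
x_pre = x.masked_fill(pad.unsqueeze(1), 0)
y_pre = linear1(x_pre)
print(y_pre[2])  # equals linear1.bias, i.e. generally non-zero

# Mask AFTER the linear layer: the padded row stays exactly zero.
y_post = linear1(x).masked_fill(pad.unsqueeze(1), 0)
print(y_post[2])  # all zeros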
Hi, can you please provide the code used to compress the model by 18.2x using pruning and quantization.
Thanks.
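Since the compression code is not included in the repository, here is a generic magnitude-pruning sketch of the kind of technique involved; the sparsity level and the restriction to Linear layers are assumptions for illustration, not the authors' exact recipe.

import torch
import torch.nn as nn

def magnitude_prune_(module: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of every Linear layer, in place."""
    for m in module.modules():
        if isinstance(m, nn.Linear):
            w = m.weight.data
            k = int(w.numel() * sparsity)
            if k == 0:
                continue
            # k-th smallest absolute value serves as the pruning threshold.
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > threshold).to(w.dtype))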
Is there any guidance on preparing the cnndm dataset?
I cannot find any clue about how to download and prepare the cnndm dataset. Any guidance would be appreciated.
Hello, could you tell me how to install the dynamicconv_cuda package used in dynamicconv_layer.py? Thanks!
Hello. As I see in the framework, in the encoder layer the convolution padding is set to kernel_size // 2, while elsewhere the padding is set to kernel_size - 1. I wonder why they don't stay the same. What's the reason?
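For illustration, the usual reason for the two choices, sketched with assumed shapes (not the repository's exact code): symmetric padding (kernel_size // 2) lets an encoder position see both directions while preserving length, whereas a decoder convolution must be causal, so it pads by kernel_size - 1 and trims the extra outputs on the right.

import torch
import torch.nn as nn

k, c, t = 3, 8, 10  # kernel size, channels, sequence length
x = torch.randn(1, c, t)

# Encoder: symmetric padding keeps the length and sees both directions.
enc_conv = nn.Conv1d(c, c, k, padding=k // 2)
assert enc_conv(x).shape[-1] == t

# Decoder: pad by k - 1, then keep only the first t outputs so position i
# never sees positions > i (causality for autoregressive decoding).
dec_conv = nn.Conv1d(c, c, k, padding=k - 1)
y = dec_conv(x)[..., :t]
assert y.shape[-1] == t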
Hi, when I run the following command:
bash configs/wmt14.en-fr/prepare.sh
the program seems to be stuck at a "DeprecationWarning" message the whole time:
Cloning Moses github repository (for tokenization scripts)...
Cloning into 'mosesdecoder'...
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 147554 (delta 16), reused 13 (delta 5), pack-reused 147514
Receiving objects: 100% (147554/147554), 129.76 MiB | 862.00 KiB/s, done.
Resolving deltas: 100% (114002/114002), done.
Checking connectivity... done.
Cloning Subword NMT repository (for BPE pre-processing)...
Cloning into 'subword-nmt'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 559 (delta 19), reused 28 (delta 9), pack-reused 509
Receiving objects: 100% (559/559), 330.05 KiB | 223.00 KiB/s, done.
Resolving deltas: 100% (325/325), done.
Checking connectivity... done.
training-parallel-europarl-v7.tgz already exists, skipping download
training-parallel-commoncrawl.tgz already exists, skipping download
training-parallel-un.tgz already exists, skipping download
training-parallel-nc-v9.tgz already exists, skipping download
training-giga-fren.tar already exists, skipping download
test-full.tgz already exists, skipping download
gzip: giga-fren.release2.fixed.*.gz: No such file or directory
/home/yutliu/Code/lite-transformer
pre-processing train data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
[the same three-line tokenizer banner repeats five times for en, then five times with Language: fr]
pre-processing test data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: fr
Number of threads: 8
splitting train and valid...
learn_bpe.py on data/wmt14_en_fr/tokenized/tmp/train.fr-en...
subword-nmt/learn_bpe.py:267: DeprecationWarning: this script's location has moved to /home/yutliu/Code/lite-transformer/subword-nmt/subword_nmt. This symbolic link will be removed in a future version. Please point to the new location, or install the package and use the command 'subword-nmt'
DeprecationWarning
Could you tell me how to solve it?
Could you share the code of the register_task for abstractive summarization? My results are not good, and I don't know what the problem is.
Sorry, I can't find the code of the CNN modules in transformer_multibranch_v2.py. Can you show me where it is?
Hello, I am confused by your results on WMT'14 En-De and WMT'14 En-Fr:
I wonder how you obtained the transformer proposed by Vaswani et al. (2017) for WMT at different parameter counts such as 2.8M and 5.7M. By pruning, I guess?
There are some problems in configs/wmt14.en-fr/prepare.sh. There is a conflict between the folder names on lines 50 and 138 of the script, which causes a file-not-found error when it runs.
parser.add_argument('--decoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)
parser.add_argument('--encoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)
As you can see, the code above sets the per-layer convolution kernel sizes for the encoder and decoder layers. I just wonder why they are not kept the same across all the layers?
Hi, the download link for the preprocessed wmt16_en_de data is invalid.
Could you provide a new link?
Thank you very much.
https://github.com/mit-han-lab/lite-transformer/tree/master/fairseq/modules/lightconv_layer
I am new to lite-transformer. Thank you!
@Michaelvll @chenw23
bash: /opt/tiger/conda/lib/libtinfo.so.6: no version information available (required by bash)
Cloning Moses github repository (for tokenization scripts)...
fatal: destination path 'mosesdecoder' already exists and is not an empty directory.
Cloning Subword NMT repository (for BPE pre-processing)...
fatal: destination path 'subword-nmt' already exists and is not an empty directory.
training-parallel-europarl-v7.tgz already exists, skipping download
training-parallel-commoncrawl.tgz already exists, skipping download
training-parallel-un.tgz already exists, skipping download
training-parallel-nc-v9.tgz already exists, skipping download
training-giga-fren.tar already exists, skipping download
test-full.tgz already exists, skipping download
gzip: giga-fren.release2.fixed.*.gz: No such file or directory
/home/tiger/lite-transformer
pre-processing train data...
rm: cannot remove 'data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.tags.en-fr.tok.en': No such file or directory
Tokenizer Version 1.1
Language: en
Number of threads: 8
[the same banner repeats five times]
rm: cannot remove 'data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.tags.en-fr.tok.fr': No such file or directory
Tokenizer Version 1.1
Language: fr
Number of threads: 8
[the same banner repeats five times]
pre-processing test data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: fr
Number of threads: 8
splitting train and valid...
learn_bpe.py on data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.fr-en...
apply_bpe.py to train.en...
subword-nmt/apply_bpe.py:416: ResourceWarning: unclosed file <_io.TextIOWrapper name='data/wmt14_en_fr/wmt14.tokenized.en-fr/code' mode='r' encoding='UTF-8'>
args.codes = codecs.open(args.codes.name, encoding='utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
apply_bpe.py to valid.en...
apply_bpe.py to test.en...
apply_bpe.py to train.fr...
apply_bpe.py to valid.fr...
apply_bpe.py to test.fr...
[the same ResourceWarning repeats after each file]
clean-corpus.perl: processing data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/bpe.train.en & .fr to data/wmt14_en_fr/wmt14.tokenized.en-fr/train, cutoff 1-250, ratio 1.5
Input sentences: 40811694 Output sentences: 35762532
clean-corpus.perl: processing data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/bpe.valid.en & .fr to data/wmt14_en_fr/wmt14.tokenized.en-fr/valid, cutoff 1-250, ratio 1.5
...
Input sentences: 30639 Output sentences: 26854
Traceback (most recent call last):
File "/opt/tiger/conda/bin/fairseq-preprocess", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/tiger/lite-transformer/fairseq_cli/preprocess.py", line 1
../preprocess.py
^
SyntaxError: invalid syntax
Are there any suggestions? Thanks!
The Google Drive link for the processed wmt16_en_de data is not found!
Hi, thanks a lot for the great work.
I am new to NLP and I ran into some problems preprocessing the CNN/DM dataset (producing the BPE files for train and val).
Could you please provide the shell scripts for cnndm dataset preprocessing (BPE) that match the test set you provided on Google Drive?
Thanks a lot, much appreciated.
Hi, thanks for your great work!
I am curious how you measured the FLOPs/MACs in your reported numbers. Does the torchprofile package handle the measurement of the customized CUDA code?
Best,
Haoran
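For reference, basic torchprofile usage looks like the sketch below (the stand-in model and input shape are assumptions for illustration); MACs are only counted for operations the tracer recognizes, so custom CUDA kernels may need separate handling.

import torch
from torchprofile import profile_macs

# Hypothetical stand-in model; substitute the actual lite-transformer model.
model = torch.nn.Linear(512, 512)
inputs = torch.randn(1, 512)

macs = profile_macs(model, inputs)  # counts multiply-accumulate operations
print(f"{macs / 1e6:.2f} MMACs")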
Can you tell me the details of the model pruning? Thank you!
As we know, the conventional attention module captures features as in Fig. 3(b), including the diagonal and other positions; this ability is its nature. But I wonder: when we add a branch that captures local features, why does the attention module no longer capture features as before (i.e., the diagonal as well as other positions), and instead capture only global features?
Hi
I am trying to convert the lite-transformer model to ONNX, but I am hitting a lot of problems in the process.
I can't get past the errors. Does anybody have positive experience exporting this model to ONNX?
Error message:
Traceback (most recent call last):
File "generate.py", line 202, in <module>
cli_main()
File "generate.py", line 198, in cli_main
main(args)
File "generate.py", line 110, in main
torch.onnx.export(model, args=(dummy_1, dummy_3, dummy_2), f='output.onnx', keep_initializers_as_inputs=True, opset_version=9, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/__init__.py", line 230, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 92, in export
use_external_data_format=use_external_data_format)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 538, in _export
fixed_batch_size=fixed_batch_size)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 374, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 327, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 135, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py", line 116, in forward
self._force_outplace,
File "/opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py", line 105, in wrapper
out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type NoneType
Thanks.
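The final error says the traced model returned a NoneType among its outputs, which the JIT tracer cannot flatten. A common workaround, sketched below under the assumption that the non-tensor outputs can be dropped for export (the wrapper name and forward signature are hypothetical), is to export a thin wrapper that returns only tensors:

import torch
import torch.nn as nn

class OnnxExportWrapper(nn.Module):
    """Wraps a model whose forward returns dicts/None so that tracing
    only ever sees plain tensors (hypothetical output selection)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, src_tokens, src_lengths, prev_output_tokens):
        out = self.model(src_tokens, src_lengths, prev_output_tokens)
        # Keep only tensor outputs; drop None and other non-tensor extras.
        if isinstance(out, tuple):
            return tuple(o for o in out if isinstance(o, torch.Tensor))
        return out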
Hi,
I tried evaluating using the provided checkpoint. I get the following error:
root@jetson:/nlp/lite-transformer/lite-transformer# configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 valid
Traceback (most recent call last):
File "generate.py", line 192, in <module>
cli_main()
File "generate.py", line 188, in cli_main
main(args)
File "generate.py", line 32, in main
task = tasks.setup_task(args)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/__init__.py", line 17, in setup_task
return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/translation.py", line 166, in setup_task
args.source_lang, args.target_lang = data_utils.infer_language_pair(paths[0])
File "/nlp/lite-transformer/lite-transformer/fairseq/data/data_utils.py", line 24, in infer_language_pair
for filename in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: 'data/binary/wmt14_en_fr'
Namespace(ignore_case=False, order=4, ref='/data/nlp/embed200//exp/valid_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/data/nlp/embed200//exp/valid_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in <module>
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
I am new to lite-transformer.
@Michaelvll @chenw23 Thank you very much.
Could you share some more information on how you quantized the model? Did you use any packages for quantization?
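The repository does not ship the quantization code; below is a generic symmetric per-tensor 8-bit quantization sketch of the kind of technique typically used, not necessarily the authors' exact method.

import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor 8-bit quantization: returns int8 codes and a scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().max())  # error bounded by roughly scale / 2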
Thank you for open-sourcing your codebase :)
I was wondering if you are going to release the summarization checkpoints for the CNN/DM dataset, as you have reported those results in your paper?
Hi,
in the README file there are instructions to prepare the other datasets, but they are missing for the CNN/DailyMail dataset. Since you are providing the checkpoint for this case, it would be great if you could include the data-preparation instructions too.
Thanks.
I am new to lite-transformer.
@Michaelvll Thank you!
@chenw23 Thank you!
Hello,
I have trained the transformer from scratch for WMT En-Fr, following the instructions in the guidelines. However, I cannot get good results compared to the pretrained model provided in the repository.
Result of Model (Trained from scratch) :
BLEU4 = 2.00, 19.9/2.8/0.8/0.3 (BP=1.000, ratio=0.965, syslen=79863, reflen=82793)
Result of Pretrained model:
BLEU4 = 35.70, 64.6/41.9/29.1/20.6 (BP=1.000, ratio=0.990, syslen=81934, reflen=82793)
Attached is the training log.
17may_train_transformers_adam_resume_epoch16.txt
Could you please have a look at the logs and help me regenerate the results from the paper?