mit-han-lab / lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
Home Page: https://arxiv.org/abs/2004.11886
License: Other
It seems the iwslt14.de-en pretrained model is missing.
Has the pretrained model for this dataset been released? If so, how can I get it?
Hello, in transformer_multibranch_v2.py, the TransformerEncoderLayer class contains the following code:

if args.encoder_branch_type is None:  # default=None?
    self.self_attn = MultiheadAttention(
        self.embed_dim, args.encoder_attention_heads,
        dropout=args.attention_dropout, self_attention=True,
    )
else:
    layers = []
    embed_dims = []
    heads = []
    num_types = len(args.encoder_branch_type)  # completion inferred; the snippet was cut off here in the original post

I just wonder: is args.encoder_branch_type actually non-None (so that the multibranch path is taken) in the released configs?
It can easily be seen that, instead of attempting to model both global and local contexts, the attention module in LSRA only focuses on capturing global context (no diagonal pattern in the attention map), leaving local-context capture to the convolution branch.
I understand that the attention branch is in charge of the global features via the original attention module, but I wonder why the paper says "(no diagonal pattern)"?
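For context, here is a minimal sketch of the two-branch LSRA idea described in the quoted passage; the channel split, module choices, and sizes are assumptions for illustration, not the repository's exact implementation.

import torch
import torch.nn as nn

class LSRASketch(nn.Module):
    """Illustrative Long-Short Range Attention block (hypothetical sizes).

    The input channels are split in half: one half goes through standard
    multi-head attention (global context), the other through a depthwise
    convolution (local context); the outputs are concatenated back.
    """

    def __init__(self, embed_dim=256, num_heads=4, kernel_size=3):
        super().__init__()
        half = embed_dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads)
        # Symmetric padding keeps the sequence length unchanged.
        self.conv = nn.Conv1d(half, half, kernel_size,
                              padding=kernel_size // 2, groups=half)

    def forward(self, x):  # x: (seq_len, batch, embed_dim)
        left, right = x.chunk(2, dim=-1)
        global_out, _ = self.attn(left, left, left)
        # Conv1d expects (batch, channels, seq_len).
        local_out = self.conv(right.permute(1, 2, 0)).permute(2, 0, 1)
        return torch.cat([global_out, local_out], dim=-1)

Because the conv branch already covers neighboring positions, the attention branch is free to spend its capacity on long-range dependencies, which is why its attention maps lose the diagonal pattern.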
Hi,
Can you please share the trained checkpoints for the quantized and quantized+pruned models (shown in this plot - https://github.com/mit-han-lab/lite-transformer#further-compress-transformer-by-182x)?
I am interested in testing it for the translation and the summarization tasks. I would appreciate it if you can share those checkpoints.
Thank you
Thanks so much for open-sourcing your code!
Would it be possible to provide the training/config for the wikitext-103 dataset?
Thanks
Thank you, thank you.
Hello,
I tried evaluating the model. I am getting the following error:
Traceback (most recent call last):
File "generate.py", line 191, in <module>
cli_main()
File "generate.py", line 187, in cli_main
main(args)
File "generate.py", line 105, in main
hypos = task.inference_step(generator, models, sample, prefix_tokens)
File "/home/shalinis/lite-transformer/fairseq/tasks/fairseq_task.py", line 246, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 146, in generate
encoder_outs = model.forward_encoder(encoder_input)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in forward_encoder
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/lite-transformer/fairseq/sequence_generator.py", line 539, in <listcomp>
return [model.encoder(**encoder_input) for model in self.models]
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 314, in forward
x = layer(x, encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/models/transformer_multibranch_v2.py", line 693, in forward
x, _ = self.self_attn(query=x, key=x, value=x, key_padding_mask=encoder_padding_mask)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/multibranch.py", line 37, in forward
x = branch(q.contiguous(), incremental_state=incremental_state)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/lite-transformer/fairseq/modules/dynamicconv_layer/dynamicconv_layer.py", line 131, in forward
output = self.linear2(output)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
return F.linear(input, self.weight, self.bias)
File "/home/shalinis/.conda/envs/integration/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: no kernel image is available for execution on the device
Namespace(ignore_case=False, order=4, ref='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/home/shalinis/lite-transformer/checkpoints/wmt14.en-fr/attention/multibranch_v2/embed496//exp/test_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in <module>
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/home/shalinis/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen // self.stat.predlen
ZeroDivisionError: integer division or modulo by zero
Kindly help me in resolving this.
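For what it's worth, the ZeroDivisionError is downstream of the CUDA error: generation produced no hypotheses, so predlen is 0 when BLEU is computed. A defensive guard (a sketch, not the repository's code; the real fix is rebuilding the CUDA extensions for your GPU architecture) could look like:

import math

def brevity(reflen, predlen):
    # Guard against empty system output (predlen == 0), which otherwise
    # raises the ZeroDivisionError shown in the traceback above.
    if predlen == 0:
        return 0.0
    return min(1.0, math.exp(1 - reflen / predlen))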
What is get_input_transform() in fairseq/models/transformer_multibranch_v2.py used for? I'm trying to apply factorized embedding parameterization like the ALBERT model and was wondering if I could somehow use this function.
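For reference, ALBERT-style factorized embedding parameterization can be sketched independently of get_input_transform(); the dimensions below are illustrative assumptions.

import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding: a V x E lookup followed by an
    E x H projection, so parameters scale as V*E + E*H instead of V*H."""

    def __init__(self, vocab_size=32000, embed_dim=128, hidden_dim=512):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, tokens):  # tokens: (batch, seq_len) of token ids
        return self.project(self.lookup(tokens))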
could you share the tensorboard log file? thank you so much!
In the original fairseq code, the encoder_padding_mask is applied after linear1, as follows:

x = self.linear1(x)
if self.act is not None:
    x = self.act(x)
if encoder_padding_mask is not None:
    x = x.masked_fill(encoder_padding_mask.transpose(0, 1).unsqueeze(2), 0)
x = self.conv(x)
x = self.linear2(x)
However, in your code, the encoder_padding_mask is applied before linear1, as follows:

mask = key_padding_mask
if mask is not None:
    q = q.masked_fill(mask.transpose(0, 1).unsqueeze(2), 0)
x = branch(q.contiguous(), incremental_state=incremental_state)

I think the bias in linear1 makes the encoder_padding_mask useless in your code: after the masked positions pass through linear1, the bias term makes them non-zero again, so the convolution still mixes in padded positions. In my experiments, the second approach gets a bad result and the first one gets a normal result.
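A minimal repro of the point above (toy sizes; the tensor names are illustrative): masking before a linear layer with a bias does not keep the padded positions at zero, whereas masking after it does.

import torch
import torch.nn as nn

torch.manual_seed(0)
linear1 = nn.Linear(4, 4)  # bias=True by default
x = torch.randn(3, 4)      # 3 positions, 4 channels
pad = torch.tensor([False, False, True])  # last position is padding

# Mask BEFORE the linear layer: the bias re-populates the padded row.
x_pre = x.masked_fill(pad.unsqueeze(1), 0)
y_pre = linear1(x_pre)
print(y_pre[2])  # equals linear1.bias, i.e. generally non-zero

# Mask AFTER the linear layer: the padded row stays exactly zero.
y_post = linear1(x).masked_fill(pad.unsqueeze(1), 0)
print(y_post[2])  # all zeros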
Hi, can you please provide the code used to compress the model by 18.2x using pruning and quantization.
Thanks.
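Since the compression code is not included in the repository, here is a generic magnitude-pruning sketch of the kind of technique involved; the sparsity level and the restriction to Linear layers are assumptions for illustration, not the authors' exact recipe.

import torch
import torch.nn as nn

def magnitude_prune_(module: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of every Linear layer, in place."""
    for m in module.modules():
        if isinstance(m, nn.Linear):
            w = m.weight.data
            k = int(w.numel() * sparsity)
            if k == 0:
                continue
            # k-th smallest absolute value serves as the pruning threshold.
            threshold = w.abs().flatten().kthvalue(k).values
            w.mul_((w.abs() > threshold).to(w.dtype))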
Is there any guidance on preparing the cnndm dataset?
I cannot find any clue about how to download and prepare the cnndm dataset. Any guidance would be appreciated.
Hello, could you tell me how to install the dynamicconv_cuda package used in dynamicconv_layer.py? Thanks!
Hello. As I see in the framework, in the encoder layer the convolution padding is set to kernel_size // 2, while elsewhere the padding is set to kernel_size - 1. I wonder why they don't stay the same. What's the reason?
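For illustration, the usual reason for the two choices, sketched with assumed shapes (not the repository's exact code): symmetric padding (kernel_size // 2) lets an encoder position see both directions while preserving length, whereas a decoder convolution must be causal, so it pads by kernel_size - 1 and trims the extra outputs on the right.

import torch
import torch.nn as nn

k, c, t = 3, 8, 10  # kernel size, channels, sequence length
x = torch.randn(1, c, t)

# Encoder: symmetric padding keeps the length and sees both directions.
enc_conv = nn.Conv1d(c, c, k, padding=k // 2)
assert enc_conv(x).shape[-1] == t

# Decoder: pad by k - 1, then keep only the first t outputs so position i
# never sees positions > i (causality for autoregressive decoding).
dec_conv = nn.Conv1d(c, c, k, padding=k - 1)
y = dec_conv(x)[..., :t]
assert y.shape[-1] == t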
Hi, when I run the following command:
bash configs/wmt14.en-fr/prepare.sh
the program seems to be stuck at a "DeprecationWarning" message the whole time:
Cloning Moses github repository (for tokenization scripts)...
Cloning into 'mosesdecoder'...
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 147554 (delta 16), reused 13 (delta 5), pack-reused 147514
Receiving objects: 100% (147554/147554), 129.76 MiB | 862.00 KiB/s, done.
Resolving deltas: 100% (114002/114002), done.
Checking connectivity... done.
Cloning Subword NMT repository (for BPE pre-processing)...
Cloning into 'subword-nmt'...
remote: Enumerating objects: 50, done.
remote: Counting objects: 100% (50/50), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 559 (delta 19), reused 28 (delta 9), pack-reused 509
Receiving objects: 100% (559/559), 330.05 KiB | 223.00 KiB/s, done.
Resolving deltas: 100% (325/325), done.
Checking connectivity... done.
training-parallel-europarl-v7.tgz already exists, skipping download
training-parallel-commoncrawl.tgz already exists, skipping download
training-parallel-un.tgz already exists, skipping download
training-parallel-nc-v9.tgz already exists, skipping download
training-giga-fren.tar already exists, skipping download
test-full.tgz already exists, skipping download
gzip: giga-fren.release2.fixed.*.gz: No such file or directory
/home/yutliu/Code/lite-transformer
pre-processing train data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
[the same three-line tokenizer banner repeats five times for en, then five times with Language: fr]
pre-processing test data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: fr
Number of threads: 8
splitting train and valid...
learn_bpe.py on data/wmt14_en_fr/tokenized/tmp/train.fr-en...
subword-nmt/learn_bpe.py:267: DeprecationWarning: this script's location has moved to /home/yutliu/Code/lite-transformer/subword-nmt/subword_nmt. This symbolic link will be removed in a future version. Please point to the new location, or install the package and use the command 'subword-nmt'
DeprecationWarning
Could you tell me how to solve it?
Could you share the code of the register_task for abstractive summarization? My results are not good, and I don't know what the problem is.
Sorry, I can't find the code of the CNN modules in transformer_multibranch_v2.py. Can you show me where it is?
Hello, I am confused by your results on WMT'14 En-De and WMT'14 En-Fr:
I wonder how you obtained the transformer proposed by Vaswani et al. (2017) for WMT at different parameter counts such as 2.8M and 5.7M. By pruning, I guess?
There are some problems in configs/wmt14.en-fr/prepare.sh. There is a conflict between the folder names on lines 50 and 138 of the script, which causes a file-not-found error when it runs.
parser.add_argument('--decoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)
parser.add_argument('--encoder-kernel-size-list', nargs='*', default=[3, 7, 15, 31, 31, 31, 31], type=int)
As you can see, the code above sets the per-layer convolution kernel sizes for the encoder and decoder layers. I just wonder why they are not kept the same across all the layers?
Hi, the download link for the preprocessed wmt16_en_de data is invalid.
Could you provide a new link?
Thank you very much.
https://github.com/mit-han-lab/lite-transformer/tree/master/fairseq/modules/lightconv_layer
I am new to lite-transformer. Thank you!
@Michaelvll @chenw23
bash: /opt/tiger/conda/lib/libtinfo.so.6: no version information available (required by bash)
Cloning Moses github repository (for tokenization scripts)...
fatal: destination path 'mosesdecoder' already exists and is not an empty directory.
Cloning Subword NMT repository (for BPE pre-processing)...
fatal: destination path 'subword-nmt' already exists and is not an empty directory.
training-parallel-europarl-v7.tgz already exists, skipping download
training-parallel-commoncrawl.tgz already exists, skipping download
training-parallel-un.tgz already exists, skipping download
training-parallel-nc-v9.tgz already exists, skipping download
training-giga-fren.tar already exists, skipping download
test-full.tgz already exists, skipping download
gzip: giga-fren.release2.fixed.*.gz: No such file or directory
/home/tiger/lite-transformer
pre-processing train data...
rm: cannot remove 'data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.tags.en-fr.tok.en': No such file or directory
Tokenizer Version 1.1
Language: en
Number of threads: 8
[the same banner repeats five times]
rm: cannot remove 'data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.tags.en-fr.tok.fr': No such file or directory
Tokenizer Version 1.1
Language: fr
Number of threads: 8
[the same banner repeats five times]
pre-processing test data...
Tokenizer Version 1.1
Language: en
Number of threads: 8
Tokenizer Version 1.1
Language: fr
Number of threads: 8
splitting train and valid...
learn_bpe.py on data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/train.fr-en...
apply_bpe.py to train.en...
subword-nmt/apply_bpe.py:416: ResourceWarning: unclosed file <_io.TextIOWrapper name='data/wmt14_en_fr/wmt14.tokenized.en-fr/code' mode='r' encoding='UTF-8'>
args.codes = codecs.open(args.codes.name, encoding='utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
apply_bpe.py to valid.en...
apply_bpe.py to test.en...
apply_bpe.py to train.fr...
apply_bpe.py to valid.fr...
apply_bpe.py to test.fr...
[the same ResourceWarning repeats after each file]
clean-corpus.perl: processing data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/bpe.train.en & .fr to data/wmt14_en_fr/wmt14.tokenized.en-fr/train, cutoff 1-250, ratio 1.5
Input sentences: 40811694 Output sentences: 35762532
clean-corpus.perl: processing data/wmt14_en_fr/wmt14.tokenized.en-fr/tmp/bpe.valid.en & .fr to data/wmt14_en_fr/wmt14.tokenized.en-fr/valid, cutoff 1-250, ratio 1.5
...
Input sentences: 30639 Output sentences: 26854
Traceback (most recent call last):
File "/opt/tiger/conda/bin/fairseq-preprocess", line 11, in <module>
load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
return ep.load()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
return self.resolve()
File "/opt/tiger/conda/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/home/tiger/lite-transformer/fairseq_cli/preprocess.py", line 1
../preprocess.py
^
SyntaxError: invalid syntax
Are there any suggestions? Thanks!
The Google Drive link for the processed wmt16_en_de data is not found!
Hi, thanks a lot for the great work.
I am new to NLP and I ran into some problems preprocessing the CNN/DM dataset (producing the BPE files for train and val).
Could you please provide the shell scripts for cnndm dataset preprocessing (BPE) that match the test set you provided on Google Drive?
Thanks a lot, much appreciated.
Hi, thanks for your great work!
I am curious how you measured the FLOPs/MACs in your reported numbers. Does the torchprofile package handle the measurement of the customized CUDA code?
Best,
Haoran
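For reference, basic torchprofile usage looks like the sketch below (the stand-in model and input shape are assumptions for illustration); MACs are only counted for operations the tracer recognizes, so custom CUDA kernels may need separate handling.

import torch
from torchprofile import profile_macs

# Hypothetical stand-in model; substitute the actual lite-transformer model.
model = torch.nn.Linear(512, 512)
inputs = torch.randn(1, 512)

macs = profile_macs(model, inputs)  # counts multiply-accumulate operations
print(f"{macs / 1e6:.2f} MMACs")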
Can you tell me the details of the model pruning? Thank you!
As we know, the conventional attention module captures features as in Fig. 3(b), including the diagonal and other positions; this ability is its nature. But I wonder: when we add a branch that captures local features, why does the attention module no longer capture features as before (i.e., the diagonal as well as other positions), and instead capture only global features?
Hi
I am trying to convert the lite-transformer model to ONNX, but I am hitting a lot of problems in the process.
I can't get past the errors. Does anybody have positive experience exporting this model to ONNX?
Error message:
Traceback (most recent call last):
File "generate.py", line 202, in <module>
cli_main()
File "generate.py", line 198, in cli_main
main(args)
File "generate.py", line 110, in main
torch.onnx.export(model, args=(dummy_1, dummy_3, dummy_2), f='output.onnx', keep_initializers_as_inputs=True, opset_version=9, operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/__init__.py", line 230, in export
custom_opsets, enable_onnx_checker, use_external_data_format)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 92, in export
use_external_data_format=use_external_data_format)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 538, in _export
fixed_batch_size=fixed_batch_size)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 374, in _model_to_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "/opt/conda/lib/python3.6/site-packages/torch/onnx/utils.py", line 327, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py", line 135, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 726, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py", line 116, in forward
self._force_outplace,
File "/opt/conda/lib/python3.6/site-packages/torch/jit/_trace.py", line 105, in wrapper
out_vars, _ = _flatten(outs)
RuntimeError: Only tuples, lists and Variables supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. But got unsupported type NoneType
Thanks.
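The final error says the traced model returned a NoneType among its outputs, which the JIT tracer cannot flatten. A common workaround, sketched below under the assumption that the non-tensor outputs can be dropped for export (the wrapper name and forward signature are hypothetical), is to export a thin wrapper that returns only tensors:

import torch
import torch.nn as nn

class OnnxExportWrapper(nn.Module):
    """Wraps a model whose forward returns dicts/None so that tracing
    only ever sees plain tensors (hypothetical output selection)."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, src_tokens, src_lengths, prev_output_tokens):
        out = self.model(src_tokens, src_lengths, prev_output_tokens)
        # Keep only tensor outputs; drop None and other non-tensor extras.
        if isinstance(out, tuple):
            return tuple(o for o in out if isinstance(o, torch.Tensor))
        return out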
Hi,
I tried evaluating using the provided checkpoint. I get the following error:
root@jetson:/nlp/lite-transformer/lite-transformer# configs/wmt14.en-fr/test.sh /data/nlp/embed200/ 0 valid
Traceback (most recent call last):
File "generate.py", line 192, in <module>
cli_main()
File "generate.py", line 188, in cli_main
main(args)
File "generate.py", line 32, in main
task = tasks.setup_task(args)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/__init__.py", line 17, in setup_task
return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
File "/nlp/lite-transformer/lite-transformer/fairseq/tasks/translation.py", line 166, in setup_task
args.source_lang, args.target_lang = data_utils.infer_language_pair(paths[0])
File "/nlp/lite-transformer/lite-transformer/fairseq/data/data_utils.py", line 24, in infer_language_pair
for filename in os.listdir(path):
FileNotFoundError: [Errno 2] No such file or directory: 'data/binary/wmt14_en_fr'
Namespace(ignore_case=False, order=4, ref='/data/nlp/embed200//exp/valid_gen.out.ref', sacrebleu=False, sentence_bleu=False, sys='/data/nlp/embed200//exp/valid_gen.out.sys')
Traceback (most recent call last):
File "score.py", line 88, in <module>
main()
File "score.py", line 84, in main
score(f)
File "score.py", line 78, in score
print(scorer.result_string(args.order))
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 127, in result_string
return fmt.format(order, self.score(order=order), *bleup,
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 103, in score
return self.brevity() * math.exp(psum / order) * 100
File "/nlp/lite-transformer/lite-transformer/fairseq/bleu.py", line 117, in brevity
r = self.stat.reflen / self.stat.predlen
ZeroDivisionError: division by zero
I am new to lite-transformer.
@Michaelvll @chenw23 Thank you very much.
Could you share some more information on how you quantized the model? Did you use any packages for quantization?
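The repository does not ship the quantization code; below is a generic symmetric per-tensor 8-bit quantization sketch of the kind of technique typically used, not necessarily the authors' exact method.

import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor 8-bit quantization: returns int8 codes and a scale."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().max())  # error bounded by roughly scale / 2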
Thank you for open-sourcing your codebase :)
I was wondering if you are going to release the summarization checkpoints for the CNN/DM dataset, as you have reported those results in your paper?
Hi,
in the README file there are instructions to prepare the other datasets, but they are missing for the CNN/DailyMail dataset. Since you are providing the checkpoint for this case, it would be great if you could include the data-preparation instructions too.
Thanks.
I am new to lite-transformer.
@Michaelvll Thank you!
@chenw23 Thank you!
Hello,
I have trained the transformer from scratch for WMT En-Fr, following the instructions in the guidelines. However, I cannot get good results compared to the pretrained model provided in the repository.
Result of Model (Trained from scratch) :
BLEU4 = 2.00, 19.9/2.8/0.8/0.3 (BP=1.000, ratio=0.965, syslen=79863, reflen=82793)
Result of Pretrained model:
BLEU4 = 35.70, 64.6/41.9/29.1/20.6 (BP=1.000, ratio=0.990, syslen=81934, reflen=82793)
Attached is the training log.
17may_train_transformers_adam_resume_epoch16.txt
Could you please have a look at the logs and help me regenerate the results from the paper?