lancopku / prime
A simple module that consistently outperforms self-attention and the Transformer model on major NMT datasets, with SoTA performance.
License: Other
Hi,
I followed all the commands in https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md#iwslt14-de-en and trained for 20,000 steps. The BLEU score for the best checkpoint was 35.07, and the BLEU score for the average of the last 10 checkpoints was 35.78; the perplexity was 4.7+. The repo says the BLEU score for the best checkpoint is around 35.7. Is there a mistake in my setup, or do I have to tune the length penalty and beam size to get the reported numbers? It would be helpful if you could clarify these doubts. Thanks!
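Since checkpoint averaging features in the numbers above, here is a minimal sketch of what that step does. fairseq-based repos usually ship a script for this (commonly scripts/average_checkpoints.py); the helper below is illustrative, not the repo's code:

    import torch

    def average_checkpoints(paths):
        # Element-wise mean of the 'model' weights across the given checkpoints.
        avg, n = None, len(paths)
        for p in paths:
            state = torch.load(p, map_location='cpu')['model']
            if avg is None:
                avg = {k: v.clone().float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    avg[k] += v.float()
        return {k: v / n for k, v in avg.items()}

On top of averaging, tuning the beam size and length penalty at generation time (fairseq's --beam and --lenpen flags) typically moves BLEU by a few tenths, which is the size of the gap reported here.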
Traceback (most recent call last):
  File "train.py", line 311, in <module>
    cli_main()
  File "train.py", line 306, in cli_main
    main(args)
  File "train.py", line 49, in main
    model = task.build_model(args)
  File "/home/xgzhu/MUSE/fairseq/tasks/fairseq_task.py", line 169, in build_model
    return models.build_model(args, self)
  File "/home/xgzhu/MUSE/fairseq/models/__init__.py", line 50, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/home/xgzhu/MUSE/fairseq/models/transformer.py", line 188, in build_model
    encoder = TransformerCombineEncoder(args, src_dict, encoder_embed_tokens)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in __init__
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in <listcomp>
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 157, in __init__
    dropout=args.attention_dropout, cur_attn_type='es'
  File "/home/xgzhu/MUSE/fairseq/modules/multihead_attention.py", line 93, in __init__
    num_heads=dynamic_num_heads, weight_dropout=0.1, )
  File "/home/xgzhu/MUSE/fairseq/modules/dynamic_convolution.py", line 73, in __init__
    self.weight_linear = Linear(self.query_size, num_heads * kernel_size * 1, bias=bias)
  File "/home/xgzhu/MUSE/fairseq/modules/linear.py", line 7, in Linear
    init_method = args.init_method if 'init_method' in args else 'xavier'
TypeError: argument of type 'NoneType' is not iterable
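The TypeError comes from the last frame: 'init_method' in args applies the in operator to args, which is None at this call site, and None has no __contains__. A minimal sketch of one possible guard, assuming Linear takes an optional argparse-style namespace (the signature is inferred from the traceback, not copied from the repo):

    import torch.nn as nn

    def Linear(in_features, out_features, bias=True, args=None):
        # argparse.Namespace supports the `in` operator, but None does not,
        # so membership-test args only after checking it is not None.
        if args is not None and 'init_method' in args:
            init_method = args.init_method
        else:
            init_method = 'xavier'
        m = nn.Linear(in_features, out_features, bias=bias)
        if init_method == 'xavier':
            nn.init.xavier_uniform_(m.weight)
        if bias:
            nn.init.constant_(m.bias, 0.0)
        return m

The cleaner fix is to thread the real args namespace from MultiheadAttention down into DynamicConv so it never arrives as None.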
I am changing the TransformerCombineEncoder to do a seq-to-seq job, but I get NaN losses after some steps. Does anyone have experience with this?
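A generic PyTorch debugging sketch (not from this repo) for tracking down where the NaNs first appear:

    import torch

    # Make autograd raise at the op whose gradient first becomes NaN/Inf.
    # This slows training noticeably, so enable it only while debugging.
    torch.autograd.set_detect_anomaly(True)

    def assert_finite(name, t):
        # Hypothetical helper: call on losses, logits, or attention weights.
        if not torch.isfinite(t).all():
            raise RuntimeError(f'non-finite values in {name}')

Common culprits in seq2seq training are a too-large learning rate during warmup, missing gradient clipping (fairseq's --clip-norm), and fp16 loss-scale underflow.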
Nice work here, and I really love the results of this paper. Just wondering: is the MUSE code already in this repo?
Hi there,
Thanks so much for the great work!
I'm currently trying to reproduce the IWSLT14 de-en (Prime model) results on a single P100 GPU, following the exact script at https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md.
However, I'm unable to reproduce the results: after training finished, the perplexity was above 100, and the BLEU score was below 30.
Do you have any suggestions? What is the expected perplexity / curve?
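For calibration, a quick sketch of how the reported loss maps to perplexity, assuming fairseq's usual base-2 logging convention (use math.exp(loss) instead if your build logs natural-log loss):

    def perplexity_from_loss(loss_base2: float) -> float:
        # fairseq conventionally reports per-token loss in base 2, so ppl = 2 ** loss.
        return 2.0 ** loss_base2

    print(perplexity_from_loss(2.2))   # ~4.6, a healthy converged IWSLT14 de-en run
    print(perplexity_from_loss(6.6))   # ~97, i.e. the model has barely learned

A perplexity above 100 therefore points to a training problem rather than a decoding one; on a single GPU, check the data preprocessing and keep the effective batch size comparable to the reference setup (e.g. via fairseq's --update-freq).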
In one of the example sentences in the appendix, the letters ä, ö, and ß are missing. Please correct the sentence
und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,in der wir sozusagen die elektrischen verhltnisse der insel im mastabeins zu drei ganz genau abbilden knnen.
to the corrected form:
und deswegen haben wir uns entschlossen in berlin eine halle zu bauen, in der wir sozusagen die elektrischen verhältnisse der insel im maßstab eins zu drei ganz genau abbilden können.
(English: "and that is why we decided to build a hall in berlin in which we can, so to speak, reproduce the electrical conditions of the island quite exactly at a scale of one to three.")
In LaTeX, these letters can be encoded as {\ss} for ß and \"a / \"o for ä and ö (or "a / "o with babel's german shorthands).
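For reference, a minimal LaTeX rendering of the corrected sentence with these escapes (assuming the source does not load a UTF-8 input encoding):

    und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,
    in der wir sozusagen die elektrischen verh\"altnisse der insel im
    ma\ss{}stab eins zu drei ganz genau abbilden k\"onnen.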
Cheers!