lancopku / prime
A simple module that consistently outperforms self-attention and the Transformer model on major NMT datasets, with SoTA performance.
License: Other
Hi,
I followed all the commands in https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md#iwslt14-de-en and trained for 20,000 steps. The BLEU score for the best checkpoint was 35.07, and the BLEU score for the average of the last 10 checkpoints was 35.78; the perplexity was 4.7+. The repo says the BLEU score for the best checkpoint is around 35.7. Is there a mistake in my setup, or do I have to tune the length penalty and beam size to get the reported numbers? It would be helpful if you could clarify these doubts. Thanks!
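Since checkpoint averaging features in the numbers above, here is a minimal sketch of what that step does. fairseq-based repos usually ship a script for this (commonly scripts/average_checkpoints.py); the helper below is illustrative, not the repo's code:

    import torch

    def average_checkpoints(paths):
        # Element-wise mean of the 'model' weights across the given checkpoints.
        avg, n = None, len(paths)
        for p in paths:
            state = torch.load(p, map_location='cpu')['model']
            if avg is None:
                avg = {k: v.clone().float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    avg[k] += v.float()
        return {k: v / n for k, v in avg.items()}

On top of averaging, tuning the beam size and length penalty at generation time (fairseq's --beam and --lenpen flags) typically moves BLEU by a few tenths, which is the size of the gap reported here.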
Traceback (most recent call last):
  File "train.py", line 311, in <module>
    cli_main()
  File "train.py", line 306, in cli_main
    main(args)
  File "train.py", line 49, in main
    model = task.build_model(args)
  File "/home/xgzhu/MUSE/fairseq/tasks/fairseq_task.py", line 169, in build_model
    return models.build_model(args, self)
  File "/home/xgzhu/MUSE/fairseq/models/__init__.py", line 50, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/home/xgzhu/MUSE/fairseq/models/transformer.py", line 188, in build_model
    encoder = TransformerCombineEncoder(args, src_dict, encoder_embed_tokens)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in __init__
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in <listcomp>
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 157, in __init__
    dropout=args.attention_dropout, cur_attn_type='es'
  File "/home/xgzhu/MUSE/fairseq/modules/multihead_attention.py", line 93, in __init__
    num_heads=dynamic_num_heads, weight_dropout=0.1, )
  File "/home/xgzhu/MUSE/fairseq/modules/dynamic_convolution.py", line 73, in __init__
    self.weight_linear = Linear(self.query_size, num_heads * kernel_size * 1, bias=bias)
  File "/home/xgzhu/MUSE/fairseq/modules/linear.py", line 7, in Linear
    init_method = args.init_method if 'init_method' in args else 'xavier'
TypeError: argument of type 'NoneType' is not iterable
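The TypeError comes from the last frame: 'init_method' in args applies the in operator to args, which is None at this call site, and None has no __contains__. A minimal sketch of one possible guard, assuming Linear takes an optional argparse-style namespace (the signature is inferred from the traceback, not copied from the repo):

    import torch.nn as nn

    def Linear(in_features, out_features, bias=True, args=None):
        # argparse.Namespace supports the `in` operator, but None does not,
        # so membership-test args only after checking it is not None.
        if args is not None and 'init_method' in args:
            init_method = args.init_method
        else:
            init_method = 'xavier'
        m = nn.Linear(in_features, out_features, bias=bias)
        if init_method == 'xavier':
            nn.init.xavier_uniform_(m.weight)
        if bias:
            nn.init.constant_(m.bias, 0.0)
        return m

The cleaner fix is to thread the real args namespace from MultiheadAttention down into DynamicConv so it never arrives as None.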
I am changing the TransformerCombineEncoder to do a seq-to-seq job, but I get NaN losses after some steps. Does anyone have experience with this?
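A generic PyTorch debugging sketch (not from this repo) for tracking down where the NaNs first appear:

    import torch

    # Make autograd raise at the op whose gradient first becomes NaN/Inf.
    # This slows training noticeably, so enable it only while debugging.
    torch.autograd.set_detect_anomaly(True)

    def assert_finite(name, t):
        # Hypothetical helper: call on losses, logits, or attention weights.
        if not torch.isfinite(t).all():
            raise RuntimeError(f'non-finite values in {name}')

Common culprits in seq2seq training are a too-large learning rate during warmup, missing gradient clipping (fairseq's --clip-norm), and fp16 loss-scale underflow.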
Nice work here, and I really love the results of this paper. Just wondering: is the MUSE code already in this repo?
Hi there,
Thanks so much for the great work!
I'm currently trying to reproduce the IWSLT14 de-en (Prime model) results on a single P100 GPU, following the exact script at https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md.
However, I'm unable to reproduce the results: after training finished, the perplexity was above 100, and the BLEU score was below 30.
Do you have any suggestions? What is the expected perplexity / curve?
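For calibration, a quick sketch of how the reported loss maps to perplexity, assuming fairseq's usual base-2 logging convention (use math.exp(loss) instead if your build logs natural-log loss):

    def perplexity_from_loss(loss_base2: float) -> float:
        # fairseq conventionally reports per-token loss in base 2, so ppl = 2 ** loss.
        return 2.0 ** loss_base2

    print(perplexity_from_loss(2.2))   # ~4.6, a healthy converged IWSLT14 de-en run
    print(perplexity_from_loss(6.6))   # ~97, i.e. the model has barely learned

A perplexity above 100 therefore points to a training problem rather than a decoding one; on a single GPU, check the data preprocessing and keep the effective batch size comparable to the reference setup (e.g. via fairseq's --update-freq).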
In one of the example sentences in the appendix, the letters ä, ö, and ß are missing. Please correct the sentence
und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,in der wir sozusagen die elektrischen verhltnisse der insel im mastabeins zu drei ganz genau abbilden knnen.
to the corrected form:
und deswegen haben wir uns entschlossen in berlin eine halle zu bauen, in der wir sozusagen die elektrischen verhältnisse der insel im maßstab eins zu drei ganz genau abbilden können.
(English: "and that is why we decided to build a hall in berlin in which we can, so to speak, reproduce the electrical conditions of the island quite exactly at a scale of one to three.")
In LaTeX, these letters can be encoded as {\ss} for ß and \"a / \"o for ä and ö (or "a / "o with babel's german shorthands).
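For reference, a minimal LaTeX rendering of the corrected sentence with these escapes (assuming the source does not load a UTF-8 input encoding):

    und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,
    in der wir sozusagen die elektrischen verh\"altnisse der insel im
    ma\ss{}stab eins zu drei ganz genau abbilden k\"onnen.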
Cheers!