
Prime

News

2019/12/10: We have renamed the model from MUSE (parallel MUlti-Scale attEntion) to PRIME (PaRallel Intersected Multi-scale AttEntion).

Introduction

Core Code:

Relevant links:

About the paper:

TL;DR: A simple module consistently outperforms self-attention and Transformer models on major NMT datasets, achieving state-of-the-art performance.

We ask three questions:

  • Is attention alone good enough?
  • Is parallel representation learning applicable to sequence data and tasks?
  • How can we design a module that combines the inductive biases of both convolution and self-attention?

We find that stand-alone self-attention has shortcomings, and we present a new module that maps the input to a hidden space and performs three operations in parallel: self-attention, convolution, and nonlinearity. Simply stacking this module outperforms all previous models, including the Transformer (Vaswani et al., 2017), on major NMT tasks under standard settings.
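The parallel design described above can be sketched in plain Python. This is an illustrative toy, not the repo's implementation: real PRIME uses learned multi-head attention, dynamic convolution, and trained projections, and all names below are hypothetical. The key idea shown is that one shared projection feeds all three branches, whose outputs are then summed:

```python
import math

def matvec(W, x):
    # Apply a projection matrix W (list of rows) to vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def self_attention(seq):
    # Toy single-head dot-product self-attention over a list of vectors.
    out = []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in seq]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(len(q))])
    return out

def depthwise_conv(seq, kernel=(0.25, 0.5, 0.25)):
    # Toy depthwise convolution with a fixed kernel and zero padding.
    pad = len(kernel) // 2
    dim = len(seq[0])
    padded = [[0.0] * dim] * pad + seq + [[0.0] * dim] * pad
    return [[sum(k * padded[i + j][d] for j, k in enumerate(kernel))
             for d in range(dim)]
            for i in range(len(seq))]

def pointwise_ffn(seq):
    # Toy position-wise nonlinearity (ReLU).
    return [[max(0.0, v) for v in x] for x in seq]

def prime_block(seq, W_shared):
    # One shared projection feeds all three branches (the "shared
    # projection" the paper argues is key); branch outputs are summed.
    h = [matvec(W_shared, x) for x in seq]
    branches = [self_attention(h), depthwise_conv(h), pointwise_ffn(h)]
    return [[sum(b[i][d] for b in branches) for d in range(len(h[0]))]
            for i in range(len(h))]

seq = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity "shared projection" for the demo
out = prime_block(seq, W)
print(len(out), len(out[0]))   # 2 2
```

In the real model the projection maps to a lower-dimensional space and the branch outputs are recombined before the next layer; this sketch only mirrors the parallel-branch structure.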

Key features:

  • Designs a multi-branch schema that evolves self-attention, and is the first to successfully combine convolution and self-attention in one module for sequence tasks, via the proposed shared projection.
  • Achieves SOTA on three major translation datasets: WMT14 En-Fr, WMT14 En-De, and IWSLT14 De-En.
  • Learns sequence representations in parallel, and thus has potential for acceleration.

Results:

  1. Outperforms previous models on large NMT datasets, and also scales down to small datasets and the base model setting.
  2. The shared projection is key to combining convolution and self-attention; the model generates better long sequences and has potential for acceleration.
Task           Size   Test (BLEU)
IWSLT14 De-En  Base   36.3
WMT14 En-De    Large  29.9
WMT14 En-Fr    Large  43.5

Requirements and Installation

  • PyTorch version >= 1.0.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • Tested with torch==1.3.1 and CUDA 10.0

Installing from source

To install from source and develop locally:

pip install --editable . --user

We provide pre-trained models and detailed training and evaluation examples in examples/parallel_intersected_multi-scale_attention(Prime)/README.md.

Citation

Please cite as:

@article{zhao2019muse,
  title={MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning},
  author={Zhao, Guangxiang and Sun, Xu and Xu, Jingjing and Zhang, Zhiyuan and Luo, Liangchen},
  journal={arXiv preprint arXiv:1911.09483},
  year={2019}
}

Notes

The code is based on fairseq-0.6.2.

Contributors

jingjingxupku, zhaoguangxiang


Issues

muse code?

Nice work here, and I really love the results of this paper. I'm just wondering: is the MUSE code already in this repo?

TypeError: argument of type 'NoneType' is not iterable

Traceback (most recent call last):
  File "train.py", line 311, in <module>
    cli_main()
  File "train.py", line 306, in cli_main
    main(args)
  File "train.py", line 49, in main
    model = task.build_model(args)
  File "/home/xgzhu/MUSE/fairseq/tasks/fairseq_task.py", line 169, in build_model
    return models.build_model(args, self)
  File "/home/xgzhu/MUSE/fairseq/models/__init__.py", line 50, in build_model
    return ARCH_MODEL_REGISTRY[args.arch].build_model(args, task)
  File "/home/xgzhu/MUSE/fairseq/models/transformer.py", line 188, in build_model
    encoder = TransformerCombineEncoder(args, src_dict, encoder_embed_tokens)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in __init__
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 57, in <listcomp>
    for i in range(args.encoder_layers)
  File "/home/xgzhu/MUSE/fairseq/models/combine_transformer.py", line 157, in __init__
    dropout=args.attention_dropout, cur_attn_type='es'
  File "/home/xgzhu/MUSE/fairseq/modules/multihead_attention.py", line 93, in __init__
    num_heads=dynamic_num_heads, weight_dropout=0.1, )
  File "/home/xgzhu/MUSE/fairseq/modules/dynamic_convolution.py", line 73, in __init__
    self.weight_linear = Linear(self.query_size, num_heads * kernel_size * 1, bias=bias)
  File "/home/xgzhu/MUSE/fairseq/modules/linear.py", line 7, in Linear
    init_method = args.init_method if 'init_method' in args else 'xavier'
TypeError: argument of type 'NoneType' is not iterable
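The crash comes from the last frame: `'init_method' in args` is evaluated while `args` is `None`, and `None` does not support the `in` operator. A hedged sketch of a defensive rewrite of that line (`resolve_init_method` is a hypothetical helper for illustration, not the repo's code):

```python
from argparse import Namespace

def resolve_init_method(args):
    # The failing line -- `'init_method' in args` -- raises TypeError when
    # args is None. getattr with a default tolerates both a missing
    # attribute and args being None in a single guard.
    return getattr(args, 'init_method', 'xavier')

print(resolve_init_method(None))                             # xavier
print(resolve_init_method(Namespace(init_method='normal')))  # normal
```

A root-cause fix would instead make sure the call site in dynamic_convolution.py passes the args namespace through to the Linear helper.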

Reproducing IWSLT14-de-en results

Hi there,
Thanks so much for the great work!
I'm currently trying to reproduce IWSLT14-de-en (Prime model) results on a single P100 GPU. I follow the exact script at https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md.
However, I'm unable to reproduce the results: perplexity is above 100 after training finishes, and the BLEU score is below 30.

Do you have any suggestions? What is the expected perplexity / curve?

Spelling in the paper appendix

In one of the example sentences in the appendix, the letters ä, ö, and ß are missing. Please correct the sentence

und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,in der wir sozusagen die elektrischen verhltnisse der insel im mastabeins zu drei ganz genau abbilden knnen.

to:

und deswegen haben wir uns entschlossen in berlin eine halle zu bauen, in der wir sozusagen die elektrischen verhältnisse der insel im maßstab eins zu drei ganz genau abbilden können.

In LaTeX, these letters can be encoded as {\ss} (ß), {\"o} (ö), and {\"a} (ä).
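As an illustration, the corrected sentence could be written in ASCII-only LaTeX like this (a sketch; with `\usepackage[utf8]{inputenc}` the umlauts could also be typed directly):

```latex
% ASCII-only encoding of the corrected appendix sentence
und deswegen haben wir uns entschlossen in berlin eine halle zu bauen,
in der wir sozusagen die elektrischen verh{\"a}ltnisse der insel im
ma{\ss}stab eins zu drei ganz genau abbilden k{\"o}nnen.
```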

Cheers!

anyone running into 'nan'

I am adapting the TransformerCombineEncoder for a seq-to-seq job, but I get 'nan' after some steps. Has anyone run into this?
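One common source of NaNs when repurposing an attention encoder is a fully masked attention row: every score is -inf, so an unguarded softmax computes 0/0. A minimal stdlib sketch of the failure mode and a guard (illustrative only, not the repo's code -- whether this is the cause here is an assumption):

```python
import math

def masked_softmax(scores):
    """Softmax over attention scores, where masked positions are -inf."""
    m = max(scores)
    if m == float("-inf"):
        # Every position is masked: exp(-inf - -inf) is NaN and the
        # normalizer is 0, so an unguarded softmax yields NaN here.
        # Fall back to zero weights instead of propagating NaN.
        return [0.0] * len(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

neg_inf = float("-inf")
print(masked_softmax([1.0, neg_inf]))      # [1.0, 0.0]
print(masked_softmax([neg_inf, neg_inf]))  # [0.0, 0.0]
```

If this matches your symptom, check whether your padding mask can mask out an entire row (e.g. an all-pad source sentence); exploding gradients are the other usual suspect, and gradient clipping helps there.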

IWSLT'14 DE-EN Numbers

Hi,

I followed all the commands mentioned in https://github.com/lancopku/Prime/blob/master/examples/parallel_intersected_multi-scale_attention(Prime)/README.md#iwslt14-de-en and ran training for 20000 steps. The BLEU score for the best checkpoint was 35.07, and the BLEU score for the average of the last 10 checkpoints was 35.78; PPL was above 4.7. The repo mentions that the BLEU score for the best checkpoint is around 35.7. Is there a mistake in my setup, or do I have to tune the length penalty and beam size to reach the reported numbers? It would be helpful if you could clarify. Thanks!
