jayparks / transformer

530.0 7.0 121.0 57 KB

A PyTorch implementation of "Attention is All You Need" and "Weighted Transformer Network for Machine Translation"

Python 100.00%

self-attention attention-mechanism attention-is-all-you-need machine-translation

transformer's Issues

How to keep the constraints sum(k)=1 and sum(α)=1?

In the original paper (Weighted Transformer), the authors mention that "all bounds are respected during each training step by projection."

I have no idea what "by projection" means, and I don't know how to keep the constraints sum(k)=1 and sum(α)=1.

It seems there is no particular processing in this repository except for initialization. Could you please explain?
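
For readers with the same question, one common reading of "by projection" is a Euclidean projection onto the probability simplex, applied to each weight vector after every optimizer step. The sketch below is an assumption about what such a step could look like, not the paper's or this repository's confirmed method; the function name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(values):
    """Euclidean projection of a list of floats onto {w : w_i >= 0, sum(w) = 1}.

    Sort-based algorithm: find a threshold theta such that
    sum(max(v - theta, 0)) == 1, then clip at zero.
    """
    sorted_desc = sorted(values, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, u in enumerate(sorted_desc, start=1):
        running_sum += u
        candidate = (running_sum - 1.0) / j
        # The valid indices form a prefix, so the last hit is the one we keep.
        if u - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in values]


# Hypothetical usage: weights that drifted away from the simplex after an update.
alpha = project_to_simplex([0.3, 0.3, 0.6])
```

In a training loop, the same projection would be applied to the k and α vectors right after the parameter update, so that each vector stays non-negative and sums to 1.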

This is a piece of junk code, don't look at it anymore!

Differences between the paper and your code

A dropout between the two FC layers in the FFN.
In the embedding layers, you should multiply the weights by sqrt(d_model).
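
The second point refers to the embedding scaling described in "Attention is All You Need", where the embedding output is multiplied by sqrt(d_model) before the positional encoding is added. A minimal sketch on a plain Python list follows; the helper name `scale_embedding` is hypothetical, and in a PyTorch model the same factor would be applied to the output of the embedding layer.

```python
import math


def scale_embedding(vector, d_model):
    """Multiply every component of an embedding vector by sqrt(d_model)."""
    factor = math.sqrt(d_model)
    return [x * factor for x in vector]


# Toy example with d_model = 4, so the factor is 2.0.
scaled = scale_embedding([1.0, -0.5], d_model=4)
```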

How to generate train, src, and tgt, or how to run the code?

RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'index'

This error occurs when I run train.py

Traceback (most recent call last):
  File "train.py", line 208, in <module>
    main(opt)
  File "train.py", line 72, in main
    train_loss, train_sents = train(model, criterion, optimizer, train_iter, model_state)
  File "train.py", line 113, in train
    dec_inputs, dec_inputs_len)
  File "C:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\transformer-master\transformer\models.py", line 152, in forward
    enc_outputs, enc_self_attns = self.encoder(enc_inputs, enc_inputs_len, return_attn)
  File "C:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\transformer-master\transformer\models.py", line 59, in forward
    enc_outputs = self.src_emb(enc_inputs)
  File "C:\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Anaconda3\lib\site-packages\torch\nn\modules\sparse.py", line 110, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1110, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.LongTensor for argument #3 'index'
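
This message usually means the embedding weights live on the GPU while the index tensor is still on the CPU. A minimal sketch of one way to avoid the mismatch is to move each batch tensor to the device of the model's parameters before the forward pass. The helper `move_to_model_device` is hypothetical and not part of this repository, and the small `nn.Embedding` stands in for the real model.

```python
import torch
import torch.nn as nn


def move_to_model_device(model, *tensors):
    """Return the given tensors moved to the device holding the model's parameters."""
    device = next(model.parameters()).device
    return tuple(t.to(device) for t in tensors)


# Stand-in for the real model: a single embedding layer.
model = nn.Embedding(num_embeddings=10, embedding_dim=4)
if torch.cuda.is_available():
    model = model.cuda()

enc_inputs = torch.tensor([[1, 2, 3]])  # created on the CPU
(enc_inputs,) = move_to_model_device(model, enc_inputs)
enc_outputs = model(enc_inputs)  # indices and weights now share a device
```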

Can't run the whole project

......?

error in Linear

self.w_q = Linear([d_model, d_k * n_heads])

TypeError Traceback (most recent call last)
in
2 d_k = 16
3 n_heads = 6
----> 4 w_q = Linear([d_model, d_k * n_heads])
5 w_q

TypeError: __init__() missing 1 required positional argument: 'out_features'

Thanks for sharing. I get this error; can you give me some advice?
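
The message is consistent with a constructor that expects `in_features` and `out_features` as two separate arguments but receives a single list. The sketch below reproduces that with a hypothetical stand-in class (not the repository's `Linear`), assuming the same two required arguments; `d_model = 512` is also an assumption, since the issue does not show its value.

```python
class Linear:
    """Hypothetical stand-in with the same two required constructor arguments."""

    def __init__(self, in_features, out_features, bias=True):
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias


d_model, d_k, n_heads = 512, 16, 6  # d_model is assumed

# Passing one list binds it to in_features and leaves out_features missing.
try:
    Linear([d_model, d_k * n_heads])
    failed = False
except TypeError:
    failed = True

# Passing the two sizes separately (or unpacking the list with *) works.
w_q = Linear(d_model, d_k * n_heads)
```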

How to add some functionalities to this code?

Hi. I would like to add some beam search features that I have read about, such as coverage penalty and length normalization, but I don't know where to start. Can you help, please?
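
As a possible starting point, both features can be written as a rescoring function applied to finished beam hypotheses. The sketch below follows the formulas from the GNMT paper (Wu et al., 2016); the function names, the default values of `alpha` and `beta`, and the nested-list attention layout are assumptions for illustration and are not taken from this repository.

```python
import math


def length_penalty(length, alpha=0.6):
    """GNMT length normalization: ((5 + |Y|) / 6) ** alpha."""
    return ((5.0 + length) / 6.0) ** alpha


def coverage_penalty(attention, beta=0.2):
    """GNMT coverage penalty.

    attention[j][i] is the attention probability from target step j
    to source position i.
    """
    num_src = len(attention[0])
    total = 0.0
    for i in range(num_src):
        covered = sum(step[i] for step in attention)
        # Clamp below to avoid log(0) for source positions never attended to.
        total += math.log(max(min(covered, 1.0), 1e-12))
    return beta * total


def rescore(log_prob, length, attention, alpha=0.6, beta=0.2):
    """Score used to rank finished hypotheses."""
    return log_prob / length_penalty(length, alpha) + coverage_penalty(attention, beta)


# Toy hypothesis: two target steps that together fully cover two source tokens.
full_coverage = [[0.5, 0.5], [0.5, 0.5]]
score = rescore(-2.0, length=1, attention=full_coverage)
```

Ranking hypotheses by this score instead of the raw log-probability reduces the bias toward short outputs and penalizes translations that ignore parts of the source.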

Why does this repo use the earlier labels as the input of the decoder?

''' train.py line 104
enc_inputs, enc_inputs_len = batch.src
dec_, dec_inputs_len = batch.trg
dec_inputs = dec_[:, :-1]
dec_targets = dec_[:, 1:]
dec_inputs_len = dec_inputs_len - 1
'''
In the original Transformer paper, the input of the decoder is its earlier outputs, not the labels.
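
For context, the slicing quoted above matches the usual teacher-forcing setup for training: the decoder is fed the gold sequence shifted right by one position and is asked to predict the next token at each step, while at inference time it consumes its own earlier outputs. A toy illustration on a plain list follows; the token strings and the `<s>` and `</s>` markers are made up for the example.

```python
BOS, EOS = "<s>", "</s>"

# One gold target sentence, wrapped in begin and end markers.
target = [BOS, "ich", "bin", "hier", EOS]

dec_inputs = target[:-1]   # what the decoder sees:   <s> ich bin hier
dec_targets = target[1:]   # what it should predict:  ich bin hier </s>

# Each pair reads: given this input token (and the ones before it), predict that token.
pairs = list(zip(dec_inputs, dec_targets))
```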

only integer tensors of a single element can be converted to an index

When I run the code, there is an error in model.py (line 53):

self.layers = nn.ModuleList( [self.layer_type(d_k, d_v, d_model, d_ff, n_heads, dropout) for _ in range(n_layers)])

only integer tensors of a single element can be converted to an index

How can I fix it?

Usage of this repository

I would like to know how to use this repository, because I am getting errors that I don't understand while running it.

Some error in position encoding

The error "ValueError: only one element tensors can be converted to Python scalars" occurs at L79 in modules.py: input_pos = tensor([list(range(1, len+1)) + [0]*(max_len-len) for len in input_len]).
I would like to know how to fix it. I would be very grateful for a reply!
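
One possible cause, not verified against the repository, is that `input_len` is a tensor, so each `len` in the comprehension is itself a tensor and the list arithmetic produces nested tensors. A sketch of a workaround is to convert every length to a plain Python int first (for a tensor, `input_len.tolist()` does this) and only then build the padded index rows. The helper name `build_position_indices` is hypothetical.

```python
def build_position_indices(lengths, max_len):
    """Build 1-based position indices per sequence, padded with 0 up to max_len."""
    max_len = int(max_len)
    rows = []
    for n in lengths:
        n = int(n)  # plain int, so range() and list repetition behave as expected
        rows.append(list(range(1, n + 1)) + [0] * (max_len - n))
    return rows


# Two sequences of length 3 and 1, padded to length 4.
input_pos = build_position_indices([3, 1], max_len=4)
```

The resulting list of lists contains only Python ints, so it can be passed to a tensor constructor afterwards.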

I don't want to debug...

The test set is useless and there are lots of bugs....

Thanks for sharing, but I have a question: how do I use it?

ModuleNotFoundError: No module named 'torchtext'

TypeError: __init__() missing 1 required positional argument: 'out_features'

python3 train.py -model_path models -data_path models/preprocess-train.t7
Namespace(batch_size=128, d_ff=2048, d_k=64, d_model=512, d_v=64, data_path='models/preprocess-train.t7', display_freq=100, dropout=0.1, log=None, lr=0.0002, max_epochs=10, max_grad_norm=None, max_src_seq_len=50, max_tgt_seq_len=50, model_path='models', n_heads=8, n_layers=6, n_warmup_steps=4000, share_embs_weight=False, share_proj_weight=False, weighted_model=False)
Loading training and development data..
Creating new model parameters..
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    main(opt)
  File "train.py", line 47, in main
    model, model_state = create_model(opt)
  File "train.py", line 27, in create_model
    model = Transformer(opt) # Initialize a model state.
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/models.py", line 110, in __init__
    opt.max_src_seq_len, opt.src_vocab_size, opt.dropout, opt.weighted_model)
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/models.py", line 54, in __init__
    [self.layer_type(d_k, d_v, d_model, d_ff, n_heads, dropout) for _ in range(n_layers)])
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/models.py", line 54, in <listcomp>
    [self.layer_type(d_k, d_v, d_model, d_ff, n_heads, dropout) for _ in range(n_layers)])
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/layers.py", line 11, in __init__
    self.enc_self_attn = MultiHeadAttention(d_k, d_v, d_model, n_heads, dropout)
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/sublayers.py", line 53, in __init__
    self.multihead_attn = _MultiHeadAttention(d_k, d_v, d_model, n_heads, dropout)
  File "/media/vivien/A/NEW-SMT/transformer-new-master/transformer/sublayers.py", line 19, in __init__
    self.w_q = Linear([d_model, d_k * n_heads])
TypeError: __init__() missing 1 required positional argument: 'out_features'

Thanks for sharing, but I have a question: how do I use it?

how to use it ?