ubiqus / opennmt-py

This project forked from opennmt/opennmt-py

Open Source Neural Machine Translation in PyTorch

License: MIT License

Python 83.53% Shell 4.79% Perl 7.09% Smalltalk 0.37% Emacs Lisp 3.30% JavaScript 0.16% NewLisp 0.31% Ruby 0.32% Slash 0.05% SystemVerilog 0.04% Dockerfile 0.03%

opennmt-py's Issues

'ZeroDivisionError: division by zero' while training

Hello everyone,
I am very new to opennmt-py. I am trying to run this model for a summarization task.
I am training the model on Google Colab, using CNNDM as the dataset.
The training dataset uses 50000 lines; the validation and testing datasets use 5000.
The number of train_steps is 12000.
When training reached step 10000, I got the following error.

[2018-10-10 13:20:14,520 INFO] Step 10000/12000; acc: 99.76; ppl: 1.00; xent: 0.00; lr: 0.15000; 45/ 40 tok/s; 37616 sec
[2018-10-10 13:20:16,528 INFO] Loading valid dataset from data/cnndm/CNNDM.valid.1.pt, number of examples: 0
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    main(opt)
  File "train.py", line 27, in main
    single_main(opt)
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/train_single.py", line 133, in main
    opt.valid_steps)
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/trainer.py", line 196, in train
    step, valid_stats=valid_stats)
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/trainer.py", line 356, in _report_step
    valid_stats=valid_stats)
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/utils/report_manager.py", line 91, in report_step
    lr, step, train_stats=train_stats, valid_stats=valid_stats)
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/utils/report_manager.py", line 147, in _report_step
    self.log('Validation perplexity: %g' % valid_stats.ppl())
  File "/content/gdrive/My Drive/OpenNMT-py/onmt/utils/statistics.py", line 97, in ppl
    return math.exp(min(self.loss / self.n_words, 100))
ZeroDivisionError: division by zero

Can anyone please help me figure out what I am doing wrong and why I am getting this error? Do I need a larger training dataset?
Looking forward to your help.

Thanks in advance
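
For context, the log above reports "number of examples: 0" for the validation set, so the word count used as the divisor in ppl() is most likely still zero when validation finishes. One possible cause (an assumption, not confirmed in this report) is that every validation example was filtered out during preprocessing, for instance by a sequence-length limit. The sketch below uses a simplified stand-in class, not the project's actual Statistics implementation, to show how such a division could be guarded:

```python
import math


class Statistics:
    """Simplified stand-in for the accumulator seen in the traceback."""

    def __init__(self, loss=0.0, n_words=0):
        self.loss = loss
        self.n_words = n_words

    def ppl(self):
        # With an empty validation set nothing is accumulated, so
        # n_words stays 0; return NaN instead of dividing by zero.
        if self.n_words == 0:
            return float("nan")
        return math.exp(min(self.loss / self.n_words, 100))


empty = Statistics()
print(empty.ppl())  # NaN signals "no validation data" without crashing

full = Statistics(loss=6.0, n_words=3)
print(full.ppl())   # exp(6.0 / 3)
```

A guard like this only hides the symptom; the validation file would still need to contain examples for the reported perplexity to mean anything.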

Training crashes at end

I set up a conda environment with PyTorch 0.4 and installed this fork of OpenNMT-py, as I mentioned in the original issue I posted here: OpenNMT#743

I then ran:

python preprocess.py -train_src C:\src\torchevere-offensive-classifier\training\character\train_src.txt -train_tgt C:\src\torchevere-offensive-classifier\training\character\train_dst.txt -valid_src C:\src\torchevere-offensive-classifier\training\character\val_src.txt -valid_tgt C:\src\torchevere-offensive-classifier\training\character\val_dst.txt -save_data data/character/tc-offense-classifier-character_v3 -src_seq_length 5000 -tgt_seq_length 5000

python train.py -data data/character/tc-offense-classifier-character_v3 -save_model tc-offense-classifier-character_v3 -gpuid 0 -layers 3 -learning_rate_decay 0.99 -train_steps 10000 -rnn_size 500

Everything ran smoothly until it crashed at the end.
Here's the output:

(pyTorchOffensive) C:\src\pyopennmt\ubiqus\OpenNMT-py>python train.py -data data/character/tc-offense-classifier-character_v3 -save_model tc-offense-classifier-character_v3 -gpuid 0 -layers 3 -learning_rate_decay 0.99 -train_steps 10000 -rnn_size 500
		Loading train dataset from data/character\tc-offense-classifier-character_v3.train.1.pt, number of examples: 3384
		 * vocabulary size. source = 165; target = 6
		Building model...
		Intializing model parameters.
		NMTModel(
		  (encoder): RNNEncoder(
		    (embeddings): Embeddings(
		      (make_embedding): Sequential(
		        (emb_luts): Elementwise(
		          (0): Embedding(165, 500, padding_idx=1)
		        )
		      )
		    )
		    (rnn): LSTM(500, 500, num_layers=3, dropout=0.3)
		  )
		  (decoder): InputFeedRNNDecoder(
		    (embeddings): Embeddings(
		      (make_embedding): Sequential(
		        (emb_luts): Elementwise(
		          (0): Embedding(6, 500, padding_idx=1)
		        )
		      )
		    )
		    (dropout): Dropout(p=0.3)
		    (rnn): StackedLSTM(
		      (dropout): Dropout(p=0.3)
		      (layers): ModuleList(
		        (0): LSTMCell(1000, 500)
		        (1): LSTMCell(500, 500)
		        (2): LSTMCell(500, 500)
		      )
		    )
		    (attn): GlobalAttention(
		      (linear_in): Linear(in_features=500, out_features=500, bias=False)
		      (linear_out): Linear(in_features=1000, out_features=500, bias=False)
		      (softmax): Softmax()
		      (tanh): Tanh()
		    )
		  )
		  (generator): Sequential(
		    (0): Linear(in_features=500, out_features=6, bias=True)
		    (1): LogSoftmax()
		  )
		)
		* number of parameters: 13862506
		encoder:  6094500
		decoder:  7768006
		Making optimizer for training.
		
		Start training...
		Loading train dataset from data/character\tc-offense-classifier-character_v3.train.1.pt, number of examples: 3384
		Step 50, 10000; acc:  27.34; ppl:  16.07; xent:   2.78; lr: 1.00000; 14099 / 3200 tok/s;      4 sec
		GPU 0: for information we completed an epoch at step 54
		Loading train dataset from data/character\tc-offense-classifier-character_v3.train.1.pt, number of examples: 3384
		Step 100, 10000; acc:  74.22; ppl:   1.45; xent:   0.37; lr: 1.00000; 26876 / 2614 tok/s;      6 sec
		GPU 0: for information we completed an epoch at step 107
		
		. . . 
		
		Loading train dataset from data/character\tc-offense-classifier-character_v3.train.1.pt, number of examples: 3384
		Step 9950, 10000; acc: 100.00; ppl:   1.00; xent:   0.00; lr: 1.00000; 16583 / 2462 tok/s;    616 sec
		GPU 0: for information we completed an epoch at step 9965
		Loading train dataset from data/character\tc-offense-classifier-character_v3.train.1.pt, number of examples: 3384
		Step 10000, 10000; acc: 100.00; ppl:   1.00; xent:   0.00; lr: 1.00000; 19345 / 2416 tok/s;    619 sec
		Loading valid dataset from data/character\tc-offense-classifier-character_v3.valid.1.pt, number of examples: 376
		Traceback (most recent call last):
		  File "train.py", line 41, in <module>
		    main(opt)
		  File "train.py", line 28, in main
		    single_main(opt)
		  File "C:\src\pyopennmt\ubiqus\OpenNMT-py\train_single.py", line 120, in main
		    opt.valid_steps)
		  File "C:\src\pyopennmt\ubiqus\OpenNMT-py\onmt\trainer.py", line 176, in train
		    valid_stats = self.validate(valid_iter)
		  File "C:\src\pyopennmt\ubiqus\OpenNMT-py\onmt\trainer.py", line 208, in validate
		    for batch in valid_iter:
		  File "C:\src\pyopennmt\ubiqus\OpenNMT-py\onmt\inputters\inputter.py", line 423, in __iter__
		    for batch in self.cur_iter:
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\iterator.py", line 151, in __iter__
		    self.train)
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\batch.py", line 27, in __init__
		    setattr(self, name, field.process(batch, device=device, train=train))
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\field.py", line 188, in process
		    tensor = self.numericalize(padded, device=device, train=train)
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\field.py", line 287, in numericalize
		    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\field.py", line 287, in <listcomp>
		    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
		  File "C:\Anaconda3\envs\pyTorchOffensive\lib\site-packages\torchtext\data\field.py", line 287, in <listcomp>
		    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
		KeyError: '🏻'

Luckily, there was still a model file that was produced during the middle of the training (half way through the specified number of training steps), and I was able to use that with translate.py. And thankfully, it solved the issue with having blanks in the output file, as I mentioned in that original issue (OpenNMT#743).

However, it's still an issue to have the training crash at the end of the process, so I'm reporting this bug.
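
The KeyError in the traceback comes from a plain lookup, self.vocab.stoi[x], on a token that appears in the validation data but is missing from the vocabulary. As a minimal sketch of the usual remedy, the hypothetical helper below maps unseen tokens to an unknown-token index. The vocabulary contents, the index value, and the function name are illustrative assumptions, not torchtext or OpenNMT-py code:

```python
# Hypothetical vocabulary; index 0 is reserved for unknown tokens.
UNK_INDEX = 0
stoi = {"<unk>": UNK_INDEX, "a": 1, "b": 2}


def numericalize(tokens, stoi, unk_index=UNK_INDEX):
    """Map tokens to indices, sending unseen tokens to unk_index."""
    # dict.get never raises KeyError, unlike stoi[tok].
    return [stoi.get(tok, unk_index) for tok in tokens]


# "\U0001F3FB" is the emoji skin-tone modifier from the traceback.
print(numericalize(["a", "\U0001F3FB", "b"], stoi))  # [1, 0, 2]
```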

bug coverage_attn in multi-gpu mode

Could you please help me with how to use the new GPU options? I used the flags -gpuid 0 1 -gpu_verbose 0 -gpu_rank 0 for the training script, which resulted in the following error:

Traceback (most recent call last):
  File "/data/projects/opennmt-ubiqus/train_multi.py", line 43, in run
    single_main(opt)
  File "/data/projects/opennmt-ubiqus/train_single.py", line 120, in main
    opt.valid_steps)
  File "/data/projects/opennmt-ubiqus/onmt/trainer.py", line 143, in train
    if self.gpu_verbose > 1:
TypeError: unorderable types: list() > int()

Here's the output from nvidia-smi, just in case.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000A20E:00:00.0 Off |                    0 |
| N/A   70C    P0    65W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000C0B5:00:00.0 Off |                    0 |
| N/A   38C    P0    72W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
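
The message "unorderable types: list() > int()" suggests that gpu_verbose reached the trainer as a list rather than an integer. One way that can happen, assumed here rather than confirmed from the fork's option definitions, is an argparse option declared with nargs. The parser below is hypothetical and only reproduces the difference between the two declarations:

```python
import argparse

# Hypothetical parser, not the fork's real option definitions.
parser = argparse.ArgumentParser()
parser.add_argument("-gpuid", type=int, nargs="+", default=[])
# Declared with nargs: the parsed value is a list even for one number.
parser.add_argument("-verbose_as_list", type=int, nargs="+", default=[0])
# Declared without nargs: the parsed value is a plain int.
parser.add_argument("-verbose_as_int", type=int, default=0)

opt = parser.parse_args(
    ["-gpuid", "0", "1", "-verbose_as_list", "0", "-verbose_as_int", "0"]
)

print(opt.gpuid)               # a list of device ids
print(opt.verbose_as_int > 1)  # comparing int with int works

try:
    opt.verbose_as_list > 1    # comparing list with int fails in Python 3
except TypeError as err:
    print("TypeError:", err)
```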

Fully merged into OpenNMT-py master?

Hi,

At a glance, it looks like this was fully merged into the main OpenNMT-py master (that this repo was forked from). Is that correct?

UserWarning: Implicit dimension choice for softmax has been deprecated.

I understand that this repository is still a work in progress, but I just wanted to point this out.

onmt/modules/copy_generator.py:93: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  prob = F.softmax(logits)
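
The warning asks for an explicit dimension because normalizing over different axes gives different results. In PyTorch the call would take the form F.softmax(logits, dim=...); which dimension is correct depends on the shape of logits at that line, which this report does not show. The plain-Python sketch below, with hypothetical helper names and no PyTorch dependency, illustrates why the choice matters:

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_2d(matrix, dim):
    """Softmax over rows (dim=1) or columns (dim=0) of a list of lists."""
    if dim == 1:
        return [softmax(row) for row in matrix]
    if dim == 0:
        columns = [softmax(list(col)) for col in zip(*matrix)]
        return [list(row) for row in zip(*columns)]
    raise ValueError("dim must be 0 or 1")


logits = [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]]
by_row = softmax_2d(logits, dim=1)  # each row sums to 1
by_col = softmax_2d(logits, dim=0)  # each column sums to 1
print(by_row)
print(by_col)
```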

Still getting blanks in translate.py output

This is a carry-over from: OpenNMT#743

When I run translate.py on a model that was trained to completion via the steps here: #4

I'm getting blanks in the output even for small input files, but only some of the predictions are blank. I also tried running:

python translate.py -model C:\src\pyopennmt\ubiqus\OpenNMT-py\tc-offense-classifier-character_v5_step_1000000.pt -src C:\src\torchevere-offensive-classifier\test2.txt -v

so that I could see each prediction in the console, and many of the prediction values are still blank. Any idea what could be causing this?

How to train without gpu?

I get:

$ python train.py -data data/demo -save_model demo-model
...
Loading train dataset from data/demo.train.1.pt, number of examples: 10000
Traceback (most recent call last):
  File "train.py", line 41, in <module>
    main(opt)
  File "train.py", line 28, in main
    single_main(opt)
  File "/toknas/hugh/git/OpenNMT-Ubiqus/train_single.py", line 120, in main
    opt.valid_steps)
  File "/toknas/hugh/git/OpenNMT-Ubiqus/onmt/trainer.py", line 142, in train
    if (i % self.n_gpu == self.gpu_rank):
ZeroDivisionError: integer division or modulo by zero
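
The failing line computes i % self.n_gpu, and on a machine without a GPU n_gpu is presumably 0, which makes the modulo undefined. A minimal sketch of a guard follows, assuming that n_gpu == 0 means CPU-only training handled by a single process. The function name is hypothetical and is not part of the trainer:

```python
def batches_for_rank(batches, n_gpu, gpu_rank):
    """Yield the batches this process should handle.

    Assumption: n_gpu == 0 means CPU-only training, in which case the
    single process handles every batch.
    """
    for i, batch in enumerate(batches):
        # Short-circuit before the modulo so n_gpu == 0 never divides.
        if n_gpu == 0 or i % n_gpu == gpu_rank:
            yield batch


print(list(batches_for_rank(range(4), n_gpu=0, gpu_rank=0)))  # [0, 1, 2, 3]
print(list(batches_for_rank(range(4), n_gpu=2, gpu_rank=1)))  # [1, 3]
```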

TypeError: __init__() got an unexpected keyword argument 'dtype'

I have already installed the dependencies properly, but when I run preprocess.py, it returns this error:

Traceback (most recent call last):
  File "preprocess.py", line 218, in <module>
    main()
  File "preprocess.py", line 205, in main
    fields = inputters.get_fields(opt.data_type, src_nfeats, tgt_nfeats)
  File "/opt/conda/lib/python3.6/site-packages/OpenNMT_py-0.4-py3.6.egg/onmt/inputters/inputter.py", line 46, in get_fields
    return TextDataset.get_fields(n_src_features, n_tgt_features)
  File "/opt/conda/lib/python3.6/site-packages/OpenNMT_py-0.4-py3.6.egg/onmt/inputters/text_dataset.py", line 244, in get_fields
    postprocessing=make_src, sequential=False)
TypeError: __init__() got an unexpected keyword argument 'dtype'

But everything works fine when I run OpenNMT master on the same dataset in another Docker container.
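
An "unexpected keyword argument" error of this kind usually means the class being constructed comes from a library version whose constructor does not know that keyword. Whether that applies here is an assumption, since the report does not list installed versions. The sketch below uses two hypothetical stand-in classes (not real library classes) to show how a constructor can be checked for a keyword before it is called:

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if cls.__init__ takes `name` or arbitrary **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


# Hypothetical stand-ins for an older and a newer version of a field class.
class OldField:
    def __init__(self, sequential=True, tensor_type=None):
        self.tensor_type = tensor_type


class NewField:
    def __init__(self, sequential=True, dtype=None):
        self.dtype = dtype


print(accepts_keyword(OldField, "dtype"))  # False
print(accepts_keyword(NewField, "dtype"))  # True
```

Running the same check against the class that raises the error in each Docker environment would show whether the two installs differ.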
