Comments (4)
Hi @kobeyy, thanks so much! That is indeed a weird phenomenon, and I am not sure where it originates. One difference is that during validation, greedy decoding is used instead of beam search, but that should only speed things up. I'll look into it!
Which version of the code were you running, i.e., what is your latest commit?
from joeynmt.
I've only recently discovered your repository, so it was with the following code:
commit: a7cff61
Since then I've been doing more tests and discovered something else that could be related. When training on the same dataset for 50 epochs, the time to validate the dev set changes dramatically. In the beginning it takes more than 600s to validate 926 inputs; after some training it suddenly drops to about 50s for the same inputs.
Does this have something to do with the initialization of the weights? Or is this perhaps a specific property of the Transformer?
```
Validation result (greedy) at epoch 1, step 400: bleu: 0.00, loss: 32008.7148, ppl: 4.3102, duration: 671.1562s
Validation result (greedy) at epoch 2, step 800: bleu: 6.26, loss: 15272.6094, ppl: 2.0079, duration: 675.3320s
Validation result (greedy) at epoch 3, step 1200: bleu: 9.61, loss: 11819.4004, ppl: 1.7151, duration: 40.2812s
Validation result (greedy) at epoch 4, step 1600: bleu: 19.98, loss: 8547.9072, ppl: 1.4772, duration: 669.6965s
Validation result (greedy) at epoch 5, step 2000: bleu: 32.51, loss: 5353.7832, ppl: 1.2768, duration: 697.3001s
Validation result (greedy) at epoch 6, step 2400: bleu: 37.37, loss: 4489.4004, ppl: 1.2274, duration: 537.8312s
Validation result (greedy) at epoch 7, step 2800: bleu: 38.12, loss: 4351.2305, ppl: 1.2197, duration: 671.8434s
Validation result (greedy) at epoch 8, step 3200: bleu: 41.90, loss: 3476.2017, ppl: 1.1719, duration: 41.2239s
Validation result (greedy) at epoch 8, step 3600: bleu: 40.06, loss: 3471.8345, ppl: 1.1717, duration: 46.6077s
Validation result (greedy) at epoch 9, step 4000: bleu: 42.01, loss: 2777.8691, ppl: 1.1352, duration: 722.4088s
Validation result (greedy) at epoch 10, step 4400: bleu: 43.41, loss: 3119.1055, ppl: 1.1530, duration: 678.4914s
Validation result (greedy) at epoch 11, step 4800: bleu: 47.62, loss: 2516.2986, ppl: 1.1217, duration: 183.5982s
Validation result (greedy) at epoch 12, step 5200: bleu: 46.87, loss: 2443.6604, ppl: 1.1180, duration: 47.7615s
Validation result (greedy) at epoch 13, step 5600: bleu: 51.19, loss: 2202.4766, ppl: 1.1058, duration: 66.7764s
Validation result (greedy) at epoch 14, step 6000: bleu: 51.08, loss: 2038.4586, ppl: 1.0975, duration: 195.3814s
Validation result (greedy) at epoch 15, step 6400: bleu: 50.86, loss: 2025.7654, ppl: 1.0969, duration: 68.2886s
Validation result (greedy) at epoch 15, step 6800: bleu: 54.32, loss: 2014.4696, ppl: 1.0963, duration: 669.3979s
Validation result (greedy) at epoch 16, step 7200: bleu: 53.02, loss: 2027.4260, ppl: 1.0970, duration: 345.0356s
Validation result (greedy) at epoch 17, step 7600: bleu: 54.21, loss: 1696.3250, ppl: 1.0805, duration: 63.9025s
Validation result (greedy) at epoch 18, step 8000: bleu: 53.67, loss: 1767.0493, ppl: 1.0840, duration: 115.9756s
Validation result (greedy) at epoch 19, step 8400: bleu: 55.62, loss: 1683.1099, ppl: 1.0799, duration: 184.7958s
Validation result (greedy) at epoch 20, step 8800: bleu: 55.62, loss: 1680.7856, ppl: 1.0797, duration: 74.7237s
Validation result (greedy) at epoch 21, step 9200: bleu: 53.13, loss: 1638.7617, ppl: 1.0777, duration: 201.6353s
Validation result (greedy) at epoch 22, step 9600: bleu: 55.51, loss: 1904.4341, ppl: 1.0908, duration: 127.1939s
Validation result (greedy) at epoch 23, step 10000: bleu: 56.80, loss: 1537.6284, ppl: 1.0727, duration: 48.0172s
Validation result (greedy) at epoch 23, step 10400: bleu: 57.24, loss: 1485.4012, ppl: 1.0701, duration: 53.8170s
Validation result (greedy) at epoch 24, step 10800: bleu: 54.43, loss: 1584.8862, ppl: 1.0750, duration: 49.0060s
Validation result (greedy) at epoch 25, step 11200: bleu: 56.70, loss: 1465.6007, ppl: 1.0692, duration: 46.4052s
Validation result (greedy) at epoch 26, step 11600: bleu: 57.45, loss: 1452.8262, ppl: 1.0686, duration: 50.4125s
Validation result (greedy) at epoch 27, step 12000: bleu: 56.70, loss: 1488.7253, ppl: 1.0703, duration: 44.9463s
Validation result (greedy) at epoch 28, step 12400: bleu: 57.88, loss: 1439.3315, ppl: 1.0679, duration: 51.4236s
Validation result (greedy) at epoch 29, step 12800: bleu: 57.45, loss: 1384.6335, ppl: 1.0652, duration: 45.4507s
Validation result (greedy) at epoch 30, step 13200: bleu: 57.67, loss: 1414.4309, ppl: 1.0667, duration: 50.1187s
Validation result (greedy) at epoch 30, step 13600: bleu: 60.04, loss: 1348.5345, ppl: 1.0635, duration: 43.3917s
Validation result (greedy) at epoch 31, step 14000: bleu: 58.64, loss: 1366.7507, ppl: 1.0644, duration: 43.5456s
Validation result (greedy) at epoch 32, step 14400: bleu: 57.78, loss: 1329.0974, ppl: 1.0625, duration: 43.6657s
Validation result (greedy) at epoch 33, step 14800: bleu: 58.75, loss: 1336.3790, ppl: 1.0629, duration: 53.3218s
Validation result (greedy) at epoch 34, step 15200: bleu: 57.78, loss: 1321.9717, ppl: 1.0622, duration: 49.5135s
Validation result (greedy) at epoch 35, step 15600: bleu: 57.88, loss: 1360.4719, ppl: 1.0641, duration: 46.0718s
Validation result (greedy) at epoch 36, step 16000: bleu: 59.50, loss: 1285.9434, ppl: 1.0605, duration: 53.3062s
Validation result (greedy) at epoch 37, step 16400: bleu: 60.26, loss: 1312.4065, ppl: 1.0617, duration: 45.7327s
Validation result (greedy) at epoch 38, step 16800: bleu: 60.15, loss: 1306.4736, ppl: 1.0614, duration: 46.0401s
Validation result (greedy) at epoch 38, step 17200: bleu: 58.96, loss: 1293.1626, ppl: 1.0608, duration: 46.2133s
Validation result (greedy) at epoch 39, step 17600: bleu: 60.48, loss: 1269.6205, ppl: 1.0597, duration: 47.1750s
Validation result (greedy) at epoch 40, step 18000: bleu: 60.37, loss: 1248.1321, ppl: 1.0586, duration: 46.4387s
Validation result (greedy) at epoch 41, step 18400: bleu: 59.83, loss: 1252.2852, ppl: 1.0588, duration: 45.3749s
Validation result (greedy) at epoch 42, step 18800: bleu: 60.15, loss: 1252.2458, ppl: 1.0588, duration: 47.2642s
Validation result (greedy) at epoch 43, step 19200: bleu: 60.15, loss: 1243.7896, ppl: 1.0584, duration: 46.7049s
Validation result (greedy) at epoch 44, step 19600: bleu: 59.07, loss: 1226.6882, ppl: 1.0576, duration: 46.5180s
Validation result (greedy) at epoch 45, step 20000: bleu: 60.15, loss: 1231.0714, ppl: 1.0578, duration: 44.7068s
Validation result (greedy) at epoch 45, step 20400: bleu: 60.91, loss: 1210.8223, ppl: 1.0568, duration: 46.6818s
Validation result (greedy) at epoch 46, step 20800: bleu: 58.96, loss: 1215.4613, ppl: 1.0570, duration: 46.7588s
Validation result (greedy) at epoch 47, step 21200: bleu: 61.23, loss: 1208.6156, ppl: 1.0567, duration: 46.6501s
Validation result (greedy) at epoch 48, step 21600: bleu: 61.23, loss: 1191.9607, ppl: 1.0559, duration: 47.3566s
Validation result (greedy) at epoch 49, step 22000: bleu: 61.12, loss: 1211.2007, ppl: 1.0568, duration: 45.6916s
Validation result (greedy) at epoch 50, step 22400: bleu: 61.56, loss: 1207.3948, ppl: 1.0567, duration: 50.6894s
```
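The duration pattern in the log above would be consistent with greedy decoding always running for the full maximum output length early in training (before the model reliably emits EOS), and finishing early once it has learned to do so. A back-of-the-envelope sketch with assumed numbers (batch size, per-step cost, and lengths are all hypothetical, chosen only to show the orders of magnitude involved):

```python
# Rough cost model for batched greedy decoding; every number below is an
# assumption for illustration, not a measurement from joeynmt.
time_per_step = 0.07        # assumed seconds per decoder step per batch
n_batches = 926 / 10        # 926 dev inputs, assumed batch size of 10

max_output_length = 100     # assumed decoding length limit
avg_finish_step = 7         # assumed step at which a trained model emits EOS

# Without EOS early stopping, every batch pays for max_output_length steps;
# with it, decoding cost is bounded by the longest hypothesis in the batch.
slow = n_batches * max_output_length * time_per_step
fast = n_batches * avg_finish_step * time_per_step
print(f"{slow:.0f}s vs {fast:.0f}s")
```

Under these made-up numbers the two regimes differ by the same order of magnitude as the ~670s vs. ~45s durations in the log.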
Hi @kobeyy,
thanks for the additional insights. Could you try again with the latest version? I added some code that stops greedy decoding once EOS has been generated, so it should be faster now.
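A minimal sketch of that idea (illustrative names, not joeynmt's actual API): in batched greedy decoding, track which hypotheses have already emitted EOS and break out of the step loop as soon as all of them have, instead of always decoding for the full maximum length.

```python
# Hypothetical sketch of EOS early stopping in batched greedy decoding.
# `toy_next_token` stands in for a real model's argmax step.
EOS = 3
MAX_LEN = 100

def toy_next_token(prefix):
    # Pretend the model emits EOS once the prefix reaches length 5.
    return EOS if len(prefix) >= 5 else 7

def greedy_decode(batch_size, max_len=MAX_LEN):
    hyps = [[] for _ in range(batch_size)]
    finished = [False] * batch_size
    steps = 0
    for _ in range(max_len):
        steps += 1
        for i, hyp in enumerate(hyps):
            if not finished[i]:
                tok = toy_next_token(hyp)
                hyp.append(tok)
                if tok == EOS:
                    finished[i] = True
        if all(finished):  # stop as soon as every hypothesis has emitted EOS
            break
    return hyps, steps

hyps, steps = greedy_decode(4)
print(steps)  # 6 steps instead of MAX_LEN = 100
```

Early in training, a model that rarely produces EOS would keep `finished` all-False and run the full `max_len` steps every batch, which matches the long validation durations reported above.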
Closing this due to inactivity.