Comments (18)
you can git pull and see how it goes.
from opennmt-py.
I didn't find the bug yet. ㄒoㄒ I only found that v2 uses add_qkvbias=True, add_ffnbias=True, while v3 uses add_qkvbias=False, add_ffnbias=False.
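If that bias mismatch is the culprit, restoring the v2 behavior in a v3 training config might look like this (a hedged sketch; the flag names are the ones quoted above, so verify them against your OpenNMT-py version):

```yaml
# Sketch of a v3 training-config fragment reproducing the v2 defaults:
# biases enabled in both the attention projections and the feed-forward layers.
add_qkvbias: true
add_ffnbias: true
```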
from opennmt-py.
I am fixing this in #2491
from opennmt-py.
I am fixing this in #2491
Thanks very much for your reply and effort; please let me know when you fix it. I'm continuing to look for the bug as well.
from opennmt-py.
you can git pull and see how it goes.
Actually, I created a new environment (python=3.8, pytorch=2.0.1, same as before) and pip installed from your new source.
I met a small error about CUDA:
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:545.)
attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Both fused kernels do not support non-null attn_mask. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:191.)
attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:547.)
attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Flash attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:326.)
attn_output = F.scaled_dot_product_attention(
Traceback (most recent call last):
  File "/home/zw/anaconda3/envs/opennmt3_attn/bin/onmt_translate", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_translate')())
  File "/home/zw/OpenNMT-py/onmt/bin/translate.py", line 57, in main
    translate(opt)
  File "/home/zw/OpenNMT-py/onmt/bin/translate.py", line 37, in translate
    _, _ = translator._translate(
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 399, in _translate
    batch_data = self.translate_batch(batch, attn_debug)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 786, in translate_batch
    return self._translate_batch_with_strategy(batch, decode_strategy)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 822, in _translate_batch_with_strategy
    src, enc_final_hs, enc_out, src_len = self._run_encoder(batch)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 793, in _run_encoder
I set gpu: -1 to use the CPU to calculate the attention map again, but the result is the same as before (not on a diagonal line).
from opennmt-py.
If you use attn_debug: true in your config file, the path should NOT go through line 481 in multi_headed_attn.py.
For me it's printing the matrix, but I did not visualize it as a map.
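Background on why those paths differ: F.scaled_dot_product_attention returns only the attention output, not the weight matrix, so the attn_debug path has to compute softmax(QKᵀ/√d) explicitly. A stdlib-only sketch of that explicit computation (illustrative; not the actual multi_headed_attn.py code):

```python
import math

def attention_with_weights(q, k, v):
    """Single-head attention that also returns the weight matrix,
    as the attn_debug path must (fused SDPA kernels do not expose it)."""
    d = len(q[0])
    # Raw scores: q @ k^T / sqrt(d)
    scores = [[sum(qi * ki for qi, ki in zip(qrow, krow)) / math.sqrt(d)
               for krow in k] for qrow in q]
    # Row-wise numerically stable softmax
    weights = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Output: weights @ v
    out = [[sum(w * vrow[j] for w, vrow in zip(wrow, v))
            for j in range(len(v[0]))] for wrow in weights]
    return out, weights

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out, w = attention_with_weights(q, k, v)
```

The returned `w` is exactly the kind of per-target-token distribution that attn_debug prints.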
from opennmt-py.
one example:
[2023-10-18 17:35:40,089 INFO] ▁Howeve , ▁follow ▁the ▁recent ▁murder ▁of ▁Austra ▁travel ▁agent ▁Michel le ▁Smith ▁in ▁Phuket , ▁Thaila ▁may ▁also ▁be ▁lookin ▁to ▁repair ▁its ▁bat ter ed ▁touris ▁image , ▁leadin ▁to ▁an ▁a c qui t tal .
▁Nach 0.0859985 0.0413208 0.0749512 0.0071526 0.0626831 0.0106964 0.0012608 0.0289917 0.0087433 0.0005326 0.0179901 0.0001922 0.0006528 0.0029335 0.0112305 0.0160065 0.4006348 0.0574036 0.0296478 0.0098724 0.0116959 0.0037117 0.0045128 0.0263977 0.0007720 0.0001991 0.0003815 0.0046959 0.0015087 0.0041008 0.0016375
▁der 0.0165253 0.0081329 0.0267792 0.0301819 0.0224915 0.2446289 0.0090866 0.0593567 0.0060272 0.0051193 0.0680542 0.0002248 0.0012150 0.0047913 0.0470886 0.0227661 0.0820312 0.0078964 0.0010853 0.0005684 0.0021362 0.0008941 0.0097122 0.0037022 0.0005226 0.0002301 0.0003335 0.0028038 0.0015602 0.0026112 0.0006208
▁jüngste 0.0054321 0.0037117 0.0093155 0.0209808 0.0087891 0.6967773 0.0193787 0.0148010 0.0305176 0.0808105 0.0125046 0.0001609 0.0013285 0.0178070 0.0025959 0.0055618 0.0034389 0.0012836 0.0005956 0.0002749 0.0006208 0.0005350 0.0125656 0.0004704 0.0004418 0.0001382 0.0004015 0.0052605 0.0091476 0.0008526 0.0003960
n 0.0010328 0.0008292 0.0004218 0.0001789 0.0002055 0.0015440 0.0002787 0.0030403 0.0016813 0.0002389 0.0020466 0.0000414 0.0003743 0.0003228 0.0046577 0.0039749 0.0064545 0.0007787 0.0004127 0.0002192 0.0011396 0.0002778 0.0009475 0.0007772 0.0002218 0.0000331 0.0001452 0.0008178 0.0006871 0.0010452 0.0002398
▁Er 0.0007892 0.0013170 0.0026932 0.0084305 0.0033512 0.7421875 0.0217285 0.0066109 0.0390320 0.1056519 0.0054817 0.0001131 0.0006909 0.0241852 0.0012255 0.0027561 0.0008450 0.0006638 0.0001560 0.0001645 0.0006657 0.0003746 0.0073967 0.0001256 0.0002594 0.0001361 0.0002525 0.0016146 0.0074692 0.0003622 0.0002594
mord 0.0007887 0.0030155 0.0048904 0.0126495 0.0007138 0.6713867 0.0455933 0.0021286 0.0024586 0.1678467 0.0014791 0.0001935 0.0001980 0.0079880 0.0000440 0.0070572 0.0002542 0.0007067 0.0001779 0.0006466 0.0002412 0.0018854 0.0072212 0.0002315 0.0013638 0.0008001 0.0006576 0.0002145 0.0020046 0.0010586 0.0001594
ung 0.0012188 0.0014143 0.0005441 0.0003843 0.0001823 0.0011406 0.0010624 0.0012712 0.0003500 0.0003798 0.0007195 0.0000673 0.0004013 0.0005999 0.0007949 0.0068398 0.0007615 0.0006204 0.0005713 0.0005255 0.0007777 0.0008206 0.0004485 0.0008445 0.0003242 0.0001042 0.0004051 0.0002396 0.0002861 0.0021477 0.0001315
▁des 0.0007930 0.0030632 0.0007038 0.0005856 0.0003963 0.0022011 0.0020294 0.1138306 0.0132675 0.0063820 0.0977173 0.0002357 0.0035858 0.0014009 0.0276489 0.0189819 0.0217590 0.0013418 0.0007973 0.0004430 0.0040016 0.0007944 0.0011463 0.0014257 0.0002353 0.0000500 0.0001875 0.0013151 0.0004439 0.0017757 0.0001558
▁aus 0.0001901 0.0004411 0.0001414 0.0004246 0.0001154 0.0352783 0.0022945 0.0039444 0.8784180 0.0325623 0.0134430 0.0001414 0.0022411 0.0005507 0.0004609 0.0012980 0.0009642 0.0000862 0.0002782 0.0000227 0.0000595 0.0000373 0.0004365 0.0001119 0.0001292 0.0000083 0.0000620 0.0115204 0.0026150 0.0002112 0.0000669
t 0.0006952 0.0057983 0.0015392 0.0033035 0.0022316 0.0117416 0.0206909 0.1164551 0.0869751 0.0442200 0.0244446 0.0004168 0.0054474 0.0023422 0.0092621 0.0352478 0.0300751 0.0061684 0.0017004 0.0005555 0.0001525 0.0010481 0.0016546 0.0027657 0.0014601 0.0001019 0.0007272 0.0097504 0.0032158 0.0037479 0.0002416
ral 0.0002061 0.0010529 0.0001287 0.0002878 0.0000911 0.0004544 0.0027370 0.0082321 0.0006528 0.0002832 0.0004997 0.0000581 0.0003412 0.0006051 0.0034847 0.0115051 0.0027981 0.0002861 0.0001154 0.0001605 0.0000146 0.0002358 0.0001343 0.0004034 0.0000476 0.0000387 0.0000864 0.0002295 0.0000517 0.0007749 0.0000222
ischen 0.0010996 0.0011530 0.0002440 0.0002328 0.0002596 0.0010004 0.0014763 0.0018587 0.0022316 0.0008483 0.0008683 0.0000640 0.0003564 0.0004997 0.0006504 0.0061798 0.0011349 0.0012150 0.0008049 0.0003297 0.0003128 0.0004168 0.0002297 0.0013838 0.0002059 0.0000590 0.0002407 0.0006075 0.0004604 0.0030346 0.0002059
▁Reisebüro 0.0000035 0.0000150 0.0000053 0.0000269 0.0000136 0.0022507 0.0001953 0.0002052 0.9721680 0.0127258 0.0002309 0.0000219 0.0004015 0.0001224 0.0000547 0.0000329 0.0000476 0.0000045 0.0000177 0.0000026 0.0000031 0.0000029 0.0000503 0.0000044 0.0000129 0.0000017 0.0000032 0.0094604 0.0012093 0.0000175 0.0000119
s 0.0025311 0.0042610 0.0015135 0.0009851 0.0012169 0.0048103 0.0027599 0.0170898 0.0487976 0.0184631 0.0412292 0.0012627 0.0049362 0.0014591 0.0105743 0.0177002 0.0117645 0.0026226 0.0026875 0.0008326 0.0044174 0.0014706 0.0030785 0.0031452 0.0012169 0.0001462 0.0004294 0.0039330 0.0027580 0.0051422 0.0010757
▁Michel 0.0002379 0.0017462 0.0003862 0.0006685 0.0005007 0.0125046 0.0020695 0.0419312 0.0675049 0.0230865 0.4025879 0.0064125 0.1013794 0.0021515 0.0755005 0.0099411 0.0419312 0.0026417 0.0019274 0.0003743 0.0040474 0.0006723 0.0037670 0.0005488 0.0001751 0.0000721 0.0000657 0.0014706 0.0007801 0.0009084 0.0003388
le 0.0015497 0.0028629 0.0008440 0.0003760 0.0001333 0.0002794 0.0027294 0.0021706 0.0010281 0.0003471 0.0014811 0.0002249 0.0026150 0.0003040 0.0008426 0.0184479 0.0013275 0.0014915 0.0010290 0.0014639 0.0016384 0.0009985 0.0001712 0.0013151 0.0003917 0.0000668 0.0004075 0.0000847 0.0000927 0.0054398 0.0005212
▁Smith 0.0000983 0.0003881 0.0000415 0.0000544 0.0000206 0.0003023 0.0004036 0.0018778 0.0251007 0.0020905 0.0089417 0.0002304 0.0873413 0.0003276 0.0020981 0.0051155 0.0023212 0.0001991 0.0002565 0.0001134 0.0006962 0.0002468 0.0003352 0.0000741 0.0000099 0.0000026 0.0000229 0.0005388 0.0001798 0.0003412 0.0000533
▁in 0.0044098 0.0023251 0.0013466 0.0002215 0.0003569 0.0023537 0.0005231 0.0117569 0.0024872 0.0011492 0.0108566 0.0010490 0.0009871 0.0014648 0.0158081 0.0132751 0.0025749 0.0030823 0.0017653 0.0012703 0.0747681 0.0026035 0.0094757 0.0005240 0.0004458 0.0000748 0.0001910 0.0007095 0.0011787 0.0019236 0.0011749
▁Phuket 0.0008979 0.0022869 0.0004442 0.0003762 0.0003319 0.0025139 0.0004528 0.0169678 0.0150528 0.0010405 0.0316772 0.0001663 0.0027542 0.0018349 0.4785156 0.0150375 0.0530701 0.0017824 0.0016146 0.0005326 0.0025558 0.0006175 0.0032310 0.0009012 0.0001982 0.0000542 0.0001032 0.0051117 0.0006824 0.0010843 0.0002930
▁könnte 0.0004787 0.0004864 0.0027790 0.0000560 0.0001649 0.0100021 0.0001402 0.0006685 0.0050812 0.0030670 0.0002706 0.0000681 0.0000150 0.0002825 0.0002749 0.0007539 0.0022984 0.0054207 0.0017738 0.0014210 0.7744141 0.0032940 0.0764160 0.0003395 0.0020218 0.0000910 0.0001603 0.0018940 0.0210724 0.0005684 0.0046158
▁Thailand 0.0005946 0.0014944 0.0003960 0.0001021 0.0001647 0.0017452 0.0000532 0.0110397 0.0074272 0.0004871 0.0097275 0.0000372 0.0008616 0.0003238 0.0301971 0.0068436 0.8095703 0.0007310 0.0019569 0.0003219 0.0017023 0.0001878 0.0014420 0.0011787 0.0000848 0.0000095 0.0000278 0.0020370 0.0004897 0.0003183 0.0001566
▁jedoch 0.0061684 0.0079803 0.0022430 0.0006428 0.0017548 0.0035858 0.0004854 0.0050659 0.0050621 0.0003700 0.0053635 0.0002059 0.0000364 0.0012445 0.0050201 0.0270691 0.0300140 0.0049553 0.0191650 0.0051117 0.0410461 0.0054169 0.0428162 0.0389099 0.0042381 0.0012035 0.0015125 0.0239410 0.0080872 0.0088043 0.0012836
▁auch 0.0081406 0.0138092 0.0049934 0.0012779 0.0028419 0.0065384 0.0010071 0.0034561 0.0022163 0.0005760 0.0050926 0.0005407 0.0000370 0.0027313 0.0053253 0.0477295 0.0214233 0.0094299 0.0121765 0.0066757 0.0598450 0.0104446 0.0513306 0.0444336 0.0054741 0.0024586 0.0015926 0.0182495 0.0090027 0.0133972 0.0023956
▁versuchen 0.0035515 0.0113297 0.0027599 0.0016737 0.0037003 0.0257874 0.0020962 0.0019531 0.0119400 0.0006895 0.0019264 0.0009823 0.0000482 0.0050163 0.0019121 0.0383606 0.0112228 0.0101166 0.0162811 0.0087280 0.0687256 0.0205688 0.3723145 0.0484009 0.0065422 0.0072556 0.0023460 0.0641479 0.0214844 0.0112000 0.0023174
, 0.0022259 0.0052872 0.0007224 0.0003748 0.0005550 0.0028992 0.0005422 0.0047607 0.0039825 0.0003059 0.0048599 0.0000482 0.0000993 0.0006962 0.0032558 0.0206451 0.0242767 0.0014057 0.0050926 0.0010843 0.0104141 0.0034332 0.0513000 0.0290222 0.0045128 0.0023575 0.0022316 0.0335693 0.0117798 0.0079269 0.0007353
▁sein 0.0020504 0.0070724 0.0006337 0.0005155 0.0004134 0.0021458 0.0003884 0.0037556 0.0016870 0.0001005 0.0044899 0.0000210 0.0000429 0.0002546 0.0025463 0.0337524 0.0296478 0.0028934 0.0061531 0.0012007 0.0117416 0.0057297 0.1265869 0.0770874 0.0118866 0.0031147 0.0030155 0.0581970 0.0328674 0.0134048 0.0009785
▁zer 0.0000677 0.0004997 0.0001374 0.0001329 0.0000210 0.0076866 0.0004785 0.0000558 0.0024471 0.0010509 0.0000854 0.0000038 0.0000005 0.0000395 0.0000032 0.0004692 0.0000116 0.0001276 0.0005136 0.0000857 0.0032139 0.0006738 0.0197449 0.0011644 0.1105347 0.0120697 0.0052719 0.1872559 0.6191406 0.0015354 0.0007987
schlagen 0.0005274 0.0035038 0.0001217 0.0002179 0.0000175 0.0037060 0.0015402 0.0000867 0.0000440 0.0001144 0.0001802 0.0000314 0.0000053 0.0000995 0.0000575 0.0075836 0.0000300 0.0012264 0.0009747 0.0002499 0.0006776 0.0049973 0.0038891 0.0031700 0.5332031 0.0704956 0.0296783 0.0129089 0.2128906 0.0172424 0.0002279
es 0.0016747 0.0010757 0.0002573 0.0001006 0.0000851 0.0007548 0.0002773 0.0008502 0.0012541 0.0003057 0.0004854 0.0000122 0.0000439 0.0002031 0.0006456 0.0050163 0.0016079 0.0016460 0.0007696 0.0004570 0.0032063 0.0007477 0.0019426 0.0027256 0.0016041 0.0002284 0.0008845 0.0059509 0.0044899 0.0051041 0.0008984
▁touristis 0.0000222 0.0001825 0.0000159 0.0000277 0.0000017 0.0024815 0.0000647 0.0000084 0.0258179 0.0021305 0.0000588 0.0000013 0.0000041 0.0000296 0.0000054 0.0001688 0.0000145 0.0000191 0.0002563 0.0000427 0.0003541 0.0001367 0.0086136 0.0002792 0.0031872 0.0003688 0.0006223 0.4084473 0.5327148 0.0003748 0.0001888
s 0.0014601 0.0013714 0.0001988 0.0000955 0.0000462 0.0007925 0.0001900 0.0009766 0.0007033 0.0007143 0.0006208 0.0000103 0.0000480 0.0001242 0.0005674 0.0038872 0.0011835 0.0006456 0.0008502 0.0005260 0.0017004 0.0009766 0.0023918 0.0057907 0.0014858 0.0003152 0.0006924 0.0034714 0.0092621 0.0046768 0.0010662
▁Image 0.0002882 0.0009017 0.0001315 0.0001738 0.0000147 0.0080490 0.0007572 0.0001887 0.0080872 0.0133286 0.0001597 0.0000077 0.0000141 0.0002508 0.0000563 0.0011005 0.0001000 0.0001731 0.0011559 0.0007029 0.0010099 0.0015516 0.0103378 0.0060234 0.0093689 0.0032635 0.0036449 0.1153564 0.7319336 0.0040779 0.0032387
▁zu 0.0060768 0.0024776 0.0008850 0.0007381 0.0003514 0.0015354 0.0005293 0.0022354 0.0002458 0.0003533 0.0021725 0.0000204 0.0000738 0.0004213 0.0009513 0.0096512 0.0013952 0.0011473 0.0028515 0.0035095 0.0068817 0.0167542 0.1679688 0.0269318 0.0169678 0.0168304 0.0089722 0.0312500 0.0610657 0.0207520 0.0065918
▁reparier 0.0003493 0.0004835 0.0005031 0.0004964 0.0001776 0.0033207 0.0002384 0.0000968 0.0002446 0.0010328 0.0001121 0.0000041 0.0000156 0.0001498 0.0000575 0.0003972 0.0002400 0.0004175 0.0004630 0.0003424 0.0041237 0.0047340 0.5527344 0.0026855 0.0159302 0.0042839 0.0023270 0.0413208 0.3300781 0.0016356 0.0054741
en 0.0007606 0.0007167 0.0001259 0.0000694 0.0000634 0.0002174 0.0001144 0.0008335 0.0003395 0.0000685 0.0005674 0.0000122 0.0001410 0.0001109 0.0010643 0.0035477 0.0022926 0.0005260 0.0005426 0.0004063 0.0009871 0.0004704 0.0008264 0.0009203 0.0002230 0.0000517 0.0001949 0.0008545 0.0004385 0.0023403 0.0002770
, 0.0010061 0.0015659 0.0002911 0.0001829 0.0000637 0.0003855 0.0002708 0.0011396 0.0003664 0.0001475 0.0016785 0.0000461 0.0001807 0.0001645 0.0014610 0.0099945 0.0060234 0.0011110 0.0020657 0.0014458 0.0077209 0.0026283 0.0162201 0.0049744 0.0007911 0.0004959 0.0009928 0.0021629 0.0062714 0.0172272 0.0095596
▁was 0.0018272 0.0026932 0.0006180 0.0003705 0.0000315 0.0009723 0.0003407 0.0014820 0.0006046 0.0001621 0.0017519 0.0000185 0.0001051 0.0003448 0.0008888 0.0216827 0.0186920 0.0011978 0.0021858 0.0017738 0.0078354 0.0026627 0.0220642 0.0116043 0.0020142 0.0008283 0.0013266 0.0050659 0.0173645 0.0428467 0.0523987
▁zu 0.0013704 0.0014372 0.0003417 0.0001783 0.0000992 0.0015345 0.0000826 0.0039444 0.0014696 0.0000422 0.0083466 0.0000682 0.0000954 0.0001580 0.0044098 0.0106277 0.0101395 0.0003068 0.0008779 0.0003753 0.0014515 0.0006199 0.0142441 0.0067787 0.0026131 0.0008526 0.0005345 0.0251007 0.0104065 0.0221252 0.0429993
▁einem 0.0002360 0.0004559 0.0000472 0.0000460 0.0000152 0.0023575 0.0000835 0.0020924 0.0036621 0.0001019 0.0026035 0.0000177 0.0000482 0.0001273 0.0041695 0.0024986 0.0019474 0.0002196 0.0001051 0.0001900 0.0008941 0.0002787 0.0159454 0.0012779 0.0021420 0.0003543 0.0003114 0.0776978 0.0096512 0.0068398 0.0185547
▁Frei 0.0000033 0.0000148 0.0000024 0.0000058 0.0000009 0.0004978 0.0000075 0.0000789 0.0008879 0.0000491 0.0000111 0.0000024 0.0000016 0.0000066 0.0000113 0.0000149 0.0000088 0.0000023 0.0000057 0.0000086 0.0000394 0.0000093 0.0007944 0.0000212 0.0004785 0.0001886 0.0000308 0.0157623 0.0066757 0.0001007 0.0020752
spruch 0.0001540 0.0009656 0.0000727 0.0001253 0.0000011 0.0008078 0.0001842 0.0001998 0.0002069 0.0009112 0.0001996 0.0000659 0.0001110 0.0002193 0.0011625 0.0011196 0.0001373 0.0000586 0.0001537 0.0002116 0.0000296 0.0003152 0.0007091 0.0004272 0.0024834 0.0022049 0.0005860 0.0044975 0.0231323 0.0015926 0.0053062
▁führen 0.0007825 0.0029373 0.0005622 0.0002073 0.0000259 0.0006423 0.0008011 0.0011740 0.0006576 0.0001522 0.0004501 0.0000645 0.0000592 0.0005441 0.0006056 0.0186310 0.0087509 0.0014277 0.0036316 0.0033855 0.0281372 0.0072174 0.0315247 0.0115585 0.0026760 0.0005536 0.0013494 0.0012693 0.0072746 0.0510254 0.0285339
▁könnte 0.0016336 0.0018511 0.0009007 0.0001321 0.0001057 0.0006022 0.0003371 0.0023041 0.0008903 0.0000521 0.0002730 0.0000247 0.0000696 0.0002135 0.0008593 0.0073891 0.0042305 0.0009794 0.0014000 0.0006204 0.0047340 0.0008492 0.0023003 0.0036812 0.0016775 0.0002890 0.0007701 0.0019569 0.0017786 0.0164795 0.0072136
. 0.0054817 0.0072899 0.0011044 0.0006199 0.0004292 0.0005994 0.0010157 0.0016489 0.0011578 0.0003247 0.0009160 0.0001694 0.0005522 0.0011091 0.0046844 0.0201721 0.0060883 0.0027313 0.0054550 0.0051422 0.0069275 0.0040436 0.0061989 0.0059586 0.0008211 0.0004394 0.0013885 0.0024509 0.0022659 0.0121307 0.0024033
0.0185699 0.0157471 0.0041237 0.0035992 0.0015364 0.0017004 0.0038853 0.0041771 0.0031700 0.0008717 0.0030422 0.0005856 0.0022621 0.0028591 0.0070610 0.0492859 0.0418396 0.0187225 0.0097122 0.0091171 0.0036640 0.0073280 0.0026073 0.0167389 0.0016975 0.0006580 0.0030022 0.0049171 0.0034142 0.0276947 0.0044022
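A note on reading such a dump: each row is one target token's attention distribution over the source tokens, so each row should sum to ~1 and, for a roughly monotonic language pair, the largest weight should sit near the diagonal. A minimal stdlib-only sketch (not OpenNMT-py code) of those two properties:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for 3 target tokens over 4 source tokens; large values on the
# diagonal mimic the "diagonal attention" expected from a monotonic alignment.
raw = [
    [4.0, 0.1, 0.1, 0.1],
    [0.1, 4.0, 0.1, 0.1],
    [0.1, 0.1, 4.0, 0.1],
]
attn = [softmax(row) for row in raw]

for i, row in enumerate(attn):
    assert abs(sum(row) - 1.0) < 1e-9  # every row is a probability distribution
    assert row.index(max(row)) == i    # the peak sits on the diagonal
```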
from opennmt-py.
Also, please install torch 2.1.0 so that you don't get the error when not using attn_debug.
from opennmt-py.
Also, please install torch 2.1.0 so that you don't get the error when not using attn_debug.
Thanks for the reminder, I'll try.
from opennmt-py.
In addition, I ran some controlled experiments (same dataset, same v2 environment, same steps, same config except the vocabs) and found it may be related to the special tokens (averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2) in v3.
Attention map of Vocab_1 (130 single characters) at 20000 steps:
Attention map of Vocab_2 (130 single characters + averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2) at 20000 steps:
The attention weights are clearly dispersed when these three new tokens are added.
Can you explain the function of these tokens (averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2)?
from opennmt-py.
IIRC those are only there to pad vocab_size to a multiple of X (most likely 8). If you don't set this setting, it won't happen.
But please let's focus on one issue at a time: did it fix your attn_debug issue?
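As a rough sketch of that padding behavior (illustrative only; the real logic lives in onmt/inputters/inputter.py and may differ in detail, and the filler-token names are the ones quoted in this thread):

```python
def pad_vocab_to_multiple(tokens, multiple=8):
    """Append filler tokens until the vocab size is a multiple of `multiple`.

    Sketch of the behavior described above, not the actual inputter.py code.
    """
    padded = list(tokens)
    i = 0
    while len(padded) % multiple != 0:
        padded.append(f"averyunlikelytoken{i}")
        i += 1
    return padded

# 3 specials + 10 characters = 13 entries -> padded to 16 with 3 fillers
vocab = ["<blank>", "<s>", "</s>"] + [chr(ord("a") + k) for k in range(10)]
padded = pad_vocab_to_multiple(vocab)
```

This matches the observation above: a 133-entry vocab (130 characters plus 3 specials) would gain exactly three averyunlikelytokenN entries to reach a multiple of 8.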
from opennmt-py.
If you use attn_debug: true in your config file, the path should NOT go through line 481 in multi_headed_attn.py.
For me it's printing the matrix, but I did not visualize it as a map.
Yes, but we can take a short sentence as an example and look at the data matrix like this.
from opennmt-py.
IIRC those are only there to pad vocab_size to a multiple of X (most likely 8). If you don't set this setting, it won't happen. But please let's focus on one issue at a time: did it fix your attn_debug issue?
The attn_debug is still not fixed.
Actually, I find that introducing these special tokens from v3 into the v2 environment leads to the loss of diagonal attention.
For a clearer example:
Attention map of predicting a sentence with a model trained with Vocab_1 (130 single characters) at 10000 steps:
Attention map of predicting a sentence with a model trained with Vocab_2 (130 single characters + <blank> + <s> + </s> + averyunlikelytoken0 + averyunlikelytoken1 + averyunlikelytoken2) at 10000 steps:
from opennmt-py.
Turn off the padding by setting vocab_size_multiple: 1 (the default is 8) and see how it goes.
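For reference, the suggested change as a config fragment (a sketch; the option name is the one quoted above, so check it against your own config):

```yaml
# Disable padding of the vocab to a multiple of 8 (8 is the default).
vocab_size_multiple: 1
```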
from opennmt-py.
Turn off the padding by setting vocab_size_multiple: 1 (the default is 8) and see how it goes.
Okay, I'll try. Thanks.
from opennmt-py.
Turn off the padding by setting vocab_size_multiple: 1 (the default is 8) and see how it goes.
I did some further tests; it does have something to do with that setting. The diagonal attention is missing!
I hope there is a solution without retraining, as the training cycle for our project can take months.
Looking forward to your reply.
Many, many thanks!
from opennmt-py.
I don't know what you are printing in your graphs or how you are actually doing it, so you will have to dig in and figure it out yourself.
The only thing I can tell you is that this setting ONLY pads the vocab to a multiple of 8 by adding vocab items at the end, here:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/inputters/inputter.py#L52
How you retrieve the attentions and so on is beyond my knowledge, so dig into your custom code.
from opennmt-py.
I don't know what you are printing in your graphs or how you are actually doing it, so you will have to dig in and figure it out yourself. The only thing I can tell you is that this setting ONLY pads the vocab to a multiple of 8 by adding vocab items at the end, here: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/inputters/inputter.py#L52 How you retrieve the attentions and so on is beyond my knowledge, so dig into your custom code.
Okay, thanks again.
from opennmt-py.