
Comments (18)

vince62s commented on June 7, 2024

you can git pull and see how it goes.


zw-SIMM commented on June 7, 2024

I haven't found the bug yet. ㄒoㄒ The only difference I've found is that v2 has add_qkvbias=True, add_ffnbias=True, while v3 has add_qkvbias=False, add_ffnbias=False.


vince62s commented on June 7, 2024

I am fixing this in #2491


zw-SIMM commented on June 7, 2024

I am fixing this in #2491

Thanks very much for your reply and effort; please let me know when you fix it. I'm continuing to look for the bug as well.


zw-SIMM commented on June 7, 2024

you can git pull and see how it goes.

Actually, I created a new environment (Python 3.8, PyTorch 2.0.1, same as before) and pip-installed from your new source.
I hit a small CUDA-related error:

/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:545.)
  attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Both fused kernels do not support non-null attn_mask. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:191.)
  attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:547.)
  attn_output = F.scaled_dot_product_attention(
/home/zw/OpenNMT-py/onmt/modules/multi_headed_attn.py:481: UserWarning: Flash attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.h:326.)
  attn_output = F.scaled_dot_product_attention(
Traceback (most recent call last):
  File "/home/zw/anaconda3/envs/opennmt3_attn/bin/onmt_translate", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_translate')())
  File "/home/zw/OpenNMT-py/onmt/bin/translate.py", line 57, in main
    translate(opt)
  File "/home/zw/OpenNMT-py/onmt/bin/translate.py", line 37, in translate
    _, _ = translator._translate(
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 399, in _translate
    batch_data = self.translate_batch(batch, attn_debug)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 786, in translate_batch
    return self._translate_batch_with_strategy(batch, decode_strategy)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 822, in _translate_batch_with_strategy
    src, enc_final_hs, enc_out, src_len = self._run_encoder(batch)
  File "/home/zw/OpenNMT-py/onmt/translate/translator.py", line 793, in _run_encoder
 

I set gpu: -1 to compute the attention map on CPU again, but the result is the same as before (not on a diagonal line).
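
The warnings come from PyTorch's fused scaled-dot-product-attention kernels, which in torch 2.0.x reject a non-null attn_mask, exactly as the UserWarnings above state. A minimal workaround sketch, assuming one just wants to force the non-fused math kernel; the tensors and shapes below are hypothetical, and upgrading to torch 2.1.0 (as suggested later in the thread) is the cleaner fix:

import torch
import torch.nn.functional as F

# Hypothetical shapes: batch 2, 4 heads, sequence length 8, head dim 16.
# Requires a CUDA device; on CPU the math path is taken anyway.
q = torch.randn(2, 4, 8, 16, device="cuda")
k = torch.randn(2, 4, 8, 16, device="cuda")
v = torch.randn(2, 4, 8, 16, device="cuda")
mask = torch.ones(2, 1, 8, 8, dtype=torch.bool, device="cuda")

# Disable the fused flash / memory-efficient kernels, which the warnings
# above report as rejecting a non-null attn_mask, and force the math kernel.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_mem_efficient=False, enable_math=True
):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)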


vince62s commented on June 7, 2024

If you use attn_debug: true in your config file, the path should NOT go through line 481 in multi_headed_attn.py.

For me it prints the matrix, but I did not visualize it as a map.


vince62s commented on June 7, 2024

One example:

[2023-10-18 17:35:40,089 INFO] ▁Howeve , ▁follow ▁the ▁recent ▁murder ▁of ▁Austra ▁travel ▁agent ▁Michel le ▁Smith ▁in ▁Phuket , ▁Thaila ▁may ▁also ▁be ▁lookin ▁to ▁repair ▁its ▁bat ter ed ▁touris ▁image , ▁leadin ▁to ▁an ▁a c qui t tal .
▁Nach 0.0859985 0.0413208 0.0749512 0.0071526 0.0626831 0.0106964 0.0012608 0.0289917 0.0087433 0.0005326 0.0179901 0.0001922 0.0006528 0.0029335 0.0112305 0.0160065 0.4006348 0.0574036 0.0296478 0.0098724 0.0116959 0.0037117 0.0045128 0.0263977 0.0007720 0.0001991 0.0003815 0.0046959 0.0015087 0.0041008 0.0016375
▁der 0.0165253 0.0081329 0.0267792 0.0301819 0.0224915 0.2446289 0.0090866 0.0593567 0.0060272 0.0051193 0.0680542 0.0002248 0.0012150 0.0047913 0.0470886 0.0227661 0.0820312 0.0078964 0.0010853 0.0005684 0.0021362 0.0008941 0.0097122 0.0037022 0.0005226 0.0002301 0.0003335 0.0028038 0.0015602 0.0026112 0.0006208
▁jüngste 0.0054321 0.0037117 0.0093155 0.0209808 0.0087891 0.6967773 0.0193787 0.0148010 0.0305176 0.0808105 0.0125046 0.0001609 0.0013285 0.0178070 0.0025959 0.0055618 0.0034389 0.0012836 0.0005956 0.0002749 0.0006208 0.0005350 0.0125656 0.0004704 0.0004418 0.0001382 0.0004015 0.0052605 0.0091476 0.0008526 0.0003960
n 0.0010328 0.0008292 0.0004218 0.0001789 0.0002055 0.0015440 0.0002787 0.0030403 0.0016813 0.0002389 0.0020466 0.0000414 0.0003743 0.0003228 0.0046577 0.0039749 0.0064545 0.0007787 0.0004127 0.0002192 0.0011396 0.0002778 0.0009475 0.0007772 0.0002218 0.0000331 0.0001452 0.0008178 0.0006871 0.0010452 0.0002398
▁Er 0.0007892 0.0013170 0.0026932 0.0084305 0.0033512 0.7421875 0.0217285 0.0066109 0.0390320 0.1056519 0.0054817 0.0001131 0.0006909 0.0241852 0.0012255 0.0027561 0.0008450 0.0006638 0.0001560 0.0001645 0.0006657 0.0003746 0.0073967 0.0001256 0.0002594 0.0001361 0.0002525 0.0016146 0.0074692 0.0003622 0.0002594
mord 0.0007887 0.0030155 0.0048904 0.0126495 0.0007138 0.6713867 0.0455933 0.0021286 0.0024586 0.1678467 0.0014791 0.0001935 0.0001980 0.0079880 0.0000440 0.0070572 0.0002542 0.0007067 0.0001779 0.0006466 0.0002412 0.0018854 0.0072212 0.0002315 0.0013638 0.0008001 0.0006576 0.0002145 0.0020046 0.0010586 0.0001594
ung 0.0012188 0.0014143 0.0005441 0.0003843 0.0001823 0.0011406 0.0010624 0.0012712 0.0003500 0.0003798 0.0007195 0.0000673 0.0004013 0.0005999 0.0007949 0.0068398 0.0007615 0.0006204 0.0005713 0.0005255 0.0007777 0.0008206 0.0004485 0.0008445 0.0003242 0.0001042 0.0004051 0.0002396 0.0002861 0.0021477 0.0001315
▁des 0.0007930 0.0030632 0.0007038 0.0005856 0.0003963 0.0022011 0.0020294 0.1138306 0.0132675 0.0063820 0.0977173 0.0002357 0.0035858 0.0014009 0.0276489 0.0189819 0.0217590 0.0013418 0.0007973 0.0004430 0.0040016 0.0007944 0.0011463 0.0014257 0.0002353 0.0000500 0.0001875 0.0013151 0.0004439 0.0017757 0.0001558
▁aus 0.0001901 0.0004411 0.0001414 0.0004246 0.0001154 0.0352783 0.0022945 0.0039444 0.8784180 0.0325623 0.0134430 0.0001414 0.0022411 0.0005507 0.0004609 0.0012980 0.0009642 0.0000862 0.0002782 0.0000227 0.0000595 0.0000373 0.0004365 0.0001119 0.0001292 0.0000083 0.0000620 0.0115204 0.0026150 0.0002112 0.0000669
t 0.0006952 0.0057983 0.0015392 0.0033035 0.0022316 0.0117416 0.0206909 0.1164551 0.0869751 0.0442200 0.0244446 0.0004168 0.0054474 0.0023422 0.0092621 0.0352478 0.0300751 0.0061684 0.0017004 0.0005555 0.0001525 0.0010481 0.0016546 0.0027657 0.0014601 0.0001019 0.0007272 0.0097504 0.0032158 0.0037479 0.0002416
ral 0.0002061 0.0010529 0.0001287 0.0002878 0.0000911 0.0004544 0.0027370 0.0082321 0.0006528 0.0002832 0.0004997 0.0000581 0.0003412 0.0006051 0.0034847 0.0115051 0.0027981 0.0002861 0.0001154 0.0001605 0.0000146 0.0002358 0.0001343 0.0004034 0.0000476 0.0000387 0.0000864 0.0002295 0.0000517 0.0007749 0.0000222
ischen 0.0010996 0.0011530 0.0002440 0.0002328 0.0002596 0.0010004 0.0014763 0.0018587 0.0022316 0.0008483 0.0008683 0.0000640 0.0003564 0.0004997 0.0006504 0.0061798 0.0011349 0.0012150 0.0008049 0.0003297 0.0003128 0.0004168 0.0002297 0.0013838 0.0002059 0.0000590 0.0002407 0.0006075 0.0004604 0.0030346 0.0002059
▁Reisebüro 0.0000035 0.0000150 0.0000053 0.0000269 0.0000136 0.0022507 0.0001953 0.0002052 0.9721680 0.0127258 0.0002309 0.0000219 0.0004015 0.0001224 0.0000547 0.0000329 0.0000476 0.0000045 0.0000177 0.0000026 0.0000031 0.0000029 0.0000503 0.0000044 0.0000129 0.0000017 0.0000032 0.0094604 0.0012093 0.0000175 0.0000119
s 0.0025311 0.0042610 0.0015135 0.0009851 0.0012169 0.0048103 0.0027599 0.0170898 0.0487976 0.0184631 0.0412292 0.0012627 0.0049362 0.0014591 0.0105743 0.0177002 0.0117645 0.0026226 0.0026875 0.0008326 0.0044174 0.0014706 0.0030785 0.0031452 0.0012169 0.0001462 0.0004294 0.0039330 0.0027580 0.0051422 0.0010757
▁Michel 0.0002379 0.0017462 0.0003862 0.0006685 0.0005007 0.0125046 0.0020695 0.0419312 0.0675049 0.0230865 0.4025879 0.0064125 0.1013794 0.0021515 0.0755005 0.0099411 0.0419312 0.0026417 0.0019274 0.0003743 0.0040474 0.0006723 0.0037670 0.0005488 0.0001751 0.0000721 0.0000657 0.0014706 0.0007801 0.0009084 0.0003388
le 0.0015497 0.0028629 0.0008440 0.0003760 0.0001333 0.0002794 0.0027294 0.0021706 0.0010281 0.0003471 0.0014811 0.0002249 0.0026150 0.0003040 0.0008426 0.0184479 0.0013275 0.0014915 0.0010290 0.0014639 0.0016384 0.0009985 0.0001712 0.0013151 0.0003917 0.0000668 0.0004075 0.0000847 0.0000927 0.0054398 0.0005212
▁Smith 0.0000983 0.0003881 0.0000415 0.0000544 0.0000206 0.0003023 0.0004036 0.0018778 0.0251007 0.0020905 0.0089417 0.0002304 0.0873413 0.0003276 0.0020981 0.0051155 0.0023212 0.0001991 0.0002565 0.0001134 0.0006962 0.0002468 0.0003352 0.0000741 0.0000099 0.0000026 0.0000229 0.0005388 0.0001798 0.0003412 0.0000533
▁in 0.0044098 0.0023251 0.0013466 0.0002215 0.0003569 0.0023537 0.0005231 0.0117569 0.0024872 0.0011492 0.0108566 0.0010490 0.0009871 0.0014648 0.0158081 0.0132751 0.0025749 0.0030823 0.0017653 0.0012703 0.0747681 0.0026035 0.0094757 0.0005240 0.0004458 0.0000748 0.0001910 0.0007095 0.0011787 0.0019236 0.0011749
▁Phuket 0.0008979 0.0022869 0.0004442 0.0003762 0.0003319 0.0025139 0.0004528 0.0169678 0.0150528 0.0010405 0.0316772 0.0001663 0.0027542 0.0018349 0.4785156 0.0150375 0.0530701 0.0017824 0.0016146 0.0005326 0.0025558 0.0006175 0.0032310 0.0009012 0.0001982 0.0000542 0.0001032 0.0051117 0.0006824 0.0010843 0.0002930
▁könnte 0.0004787 0.0004864 0.0027790 0.0000560 0.0001649 0.0100021 0.0001402 0.0006685 0.0050812 0.0030670 0.0002706 0.0000681 0.0000150 0.0002825 0.0002749 0.0007539 0.0022984 0.0054207 0.0017738 0.0014210 0.7744141 0.0032940 0.0764160 0.0003395 0.0020218 0.0000910 0.0001603 0.0018940 0.0210724 0.0005684 0.0046158
▁Thailand 0.0005946 0.0014944 0.0003960 0.0001021 0.0001647 0.0017452 0.0000532 0.0110397 0.0074272 0.0004871 0.0097275 0.0000372 0.0008616 0.0003238 0.0301971 0.0068436 0.8095703 0.0007310 0.0019569 0.0003219 0.0017023 0.0001878 0.0014420 0.0011787 0.0000848 0.0000095 0.0000278 0.0020370 0.0004897 0.0003183 0.0001566
▁jedoch 0.0061684 0.0079803 0.0022430 0.0006428 0.0017548 0.0035858 0.0004854 0.0050659 0.0050621 0.0003700 0.0053635 0.0002059 0.0000364 0.0012445 0.0050201 0.0270691 0.0300140 0.0049553 0.0191650 0.0051117 0.0410461 0.0054169 0.0428162 0.0389099 0.0042381 0.0012035 0.0015125 0.0239410 0.0080872 0.0088043 0.0012836
▁auch 0.0081406 0.0138092 0.0049934 0.0012779 0.0028419 0.0065384 0.0010071 0.0034561 0.0022163 0.0005760 0.0050926 0.0005407 0.0000370 0.0027313 0.0053253 0.0477295 0.0214233 0.0094299 0.0121765 0.0066757 0.0598450 0.0104446 0.0513306 0.0444336 0.0054741 0.0024586 0.0015926 0.0182495 0.0090027 0.0133972 0.0023956
▁versuchen 0.0035515 0.0113297 0.0027599 0.0016737 0.0037003 0.0257874 0.0020962 0.0019531 0.0119400 0.0006895 0.0019264 0.0009823 0.0000482 0.0050163 0.0019121 0.0383606 0.0112228 0.0101166 0.0162811 0.0087280 0.0687256 0.0205688 0.3723145 0.0484009 0.0065422 0.0072556 0.0023460 0.0641479 0.0214844 0.0112000 0.0023174
, 0.0022259 0.0052872 0.0007224 0.0003748 0.0005550 0.0028992 0.0005422 0.0047607 0.0039825 0.0003059 0.0048599 0.0000482 0.0000993 0.0006962 0.0032558 0.0206451 0.0242767 0.0014057 0.0050926 0.0010843 0.0104141 0.0034332 0.0513000 0.0290222 0.0045128 0.0023575 0.0022316 0.0335693 0.0117798 0.0079269 0.0007353
▁sein 0.0020504 0.0070724 0.0006337 0.0005155 0.0004134 0.0021458 0.0003884 0.0037556 0.0016870 0.0001005 0.0044899 0.0000210 0.0000429 0.0002546 0.0025463 0.0337524 0.0296478 0.0028934 0.0061531 0.0012007 0.0117416 0.0057297 0.1265869 0.0770874 0.0118866 0.0031147 0.0030155 0.0581970 0.0328674 0.0134048 0.0009785
▁zer 0.0000677 0.0004997 0.0001374 0.0001329 0.0000210 0.0076866 0.0004785 0.0000558 0.0024471 0.0010509 0.0000854 0.0000038 0.0000005 0.0000395 0.0000032 0.0004692 0.0000116 0.0001276 0.0005136 0.0000857 0.0032139 0.0006738 0.0197449 0.0011644 0.1105347 0.0120697 0.0052719 0.1872559 0.6191406 0.0015354 0.0007987
schlagen 0.0005274 0.0035038 0.0001217 0.0002179 0.0000175 0.0037060 0.0015402 0.0000867 0.0000440 0.0001144 0.0001802 0.0000314 0.0000053 0.0000995 0.0000575 0.0075836 0.0000300 0.0012264 0.0009747 0.0002499 0.0006776 0.0049973 0.0038891 0.0031700 0.5332031 0.0704956 0.0296783 0.0129089 0.2128906 0.0172424 0.0002279
es 0.0016747 0.0010757 0.0002573 0.0001006 0.0000851 0.0007548 0.0002773 0.0008502 0.0012541 0.0003057 0.0004854 0.0000122 0.0000439 0.0002031 0.0006456 0.0050163 0.0016079 0.0016460 0.0007696 0.0004570 0.0032063 0.0007477 0.0019426 0.0027256 0.0016041 0.0002284 0.0008845 0.0059509 0.0044899 0.0051041 0.0008984
▁touristis 0.0000222 0.0001825 0.0000159 0.0000277 0.0000017 0.0024815 0.0000647 0.0000084 0.0258179 0.0021305 0.0000588 0.0000013 0.0000041 0.0000296 0.0000054 0.0001688 0.0000145 0.0000191 0.0002563 0.0000427 0.0003541 0.0001367 0.0086136 0.0002792 0.0031872 0.0003688 0.0006223 0.4084473 0.5327148 0.0003748 0.0001888
s 0.0014601 0.0013714 0.0001988 0.0000955 0.0000462 0.0007925 0.0001900 0.0009766 0.0007033 0.0007143 0.0006208 0.0000103 0.0000480 0.0001242 0.0005674 0.0038872 0.0011835 0.0006456 0.0008502 0.0005260 0.0017004 0.0009766 0.0023918 0.0057907 0.0014858 0.0003152 0.0006924 0.0034714 0.0092621 0.0046768 0.0010662
▁Image 0.0002882 0.0009017 0.0001315 0.0001738 0.0000147 0.0080490 0.0007572 0.0001887 0.0080872 0.0133286 0.0001597 0.0000077 0.0000141 0.0002508 0.0000563 0.0011005 0.0001000 0.0001731 0.0011559 0.0007029 0.0010099 0.0015516 0.0103378 0.0060234 0.0093689 0.0032635 0.0036449 0.1153564 0.7319336 0.0040779 0.0032387
▁zu 0.0060768 0.0024776 0.0008850 0.0007381 0.0003514 0.0015354 0.0005293 0.0022354 0.0002458 0.0003533 0.0021725 0.0000204 0.0000738 0.0004213 0.0009513 0.0096512 0.0013952 0.0011473 0.0028515 0.0035095 0.0068817 0.0167542 0.1679688 0.0269318 0.0169678 0.0168304 0.0089722 0.0312500 0.0610657 0.0207520 0.0065918
▁reparier 0.0003493 0.0004835 0.0005031 0.0004964 0.0001776 0.0033207 0.0002384 0.0000968 0.0002446 0.0010328 0.0001121 0.0000041 0.0000156 0.0001498 0.0000575 0.0003972 0.0002400 0.0004175 0.0004630 0.0003424 0.0041237 0.0047340 0.5527344 0.0026855 0.0159302 0.0042839 0.0023270 0.0413208 0.3300781 0.0016356 0.0054741
en 0.0007606 0.0007167 0.0001259 0.0000694 0.0000634 0.0002174 0.0001144 0.0008335 0.0003395 0.0000685 0.0005674 0.0000122 0.0001410 0.0001109 0.0010643 0.0035477 0.0022926 0.0005260 0.0005426 0.0004063 0.0009871 0.0004704 0.0008264 0.0009203 0.0002230 0.0000517 0.0001949 0.0008545 0.0004385 0.0023403 0.0002770
, 0.0010061 0.0015659 0.0002911 0.0001829 0.0000637 0.0003855 0.0002708 0.0011396 0.0003664 0.0001475 0.0016785 0.0000461 0.0001807 0.0001645 0.0014610 0.0099945 0.0060234 0.0011110 0.0020657 0.0014458 0.0077209 0.0026283 0.0162201 0.0049744 0.0007911 0.0004959 0.0009928 0.0021629 0.0062714 0.0172272 0.0095596
▁was 0.0018272 0.0026932 0.0006180 0.0003705 0.0000315 0.0009723 0.0003407 0.0014820 0.0006046 0.0001621 0.0017519 0.0000185 0.0001051 0.0003448 0.0008888 0.0216827 0.0186920 0.0011978 0.0021858 0.0017738 0.0078354 0.0026627 0.0220642 0.0116043 0.0020142 0.0008283 0.0013266 0.0050659 0.0173645 0.0428467 0.0523987
▁zu 0.0013704 0.0014372 0.0003417 0.0001783 0.0000992 0.0015345 0.0000826 0.0039444 0.0014696 0.0000422 0.0083466 0.0000682 0.0000954 0.0001580 0.0044098 0.0106277 0.0101395 0.0003068 0.0008779 0.0003753 0.0014515 0.0006199 0.0142441 0.0067787 0.0026131 0.0008526 0.0005345 0.0251007 0.0104065 0.0221252 0.0429993
▁einem 0.0002360 0.0004559 0.0000472 0.0000460 0.0000152 0.0023575 0.0000835 0.0020924 0.0036621 0.0001019 0.0026035 0.0000177 0.0000482 0.0001273 0.0041695 0.0024986 0.0019474 0.0002196 0.0001051 0.0001900 0.0008941 0.0002787 0.0159454 0.0012779 0.0021420 0.0003543 0.0003114 0.0776978 0.0096512 0.0068398 0.0185547
▁Frei 0.0000033 0.0000148 0.0000024 0.0000058 0.0000009 0.0004978 0.0000075 0.0000789 0.0008879 0.0000491 0.0000111 0.0000024 0.0000016 0.0000066 0.0000113 0.0000149 0.0000088 0.0000023 0.0000057 0.0000086 0.0000394 0.0000093 0.0007944 0.0000212 0.0004785 0.0001886 0.0000308 0.0157623 0.0066757 0.0001007 0.0020752
spruch 0.0001540 0.0009656 0.0000727 0.0001253 0.0000011 0.0008078 0.0001842 0.0001998 0.0002069 0.0009112 0.0001996 0.0000659 0.0001110 0.0002193 0.0011625 0.0011196 0.0001373 0.0000586 0.0001537 0.0002116 0.0000296 0.0003152 0.0007091 0.0004272 0.0024834 0.0022049 0.0005860 0.0044975 0.0231323 0.0015926 0.0053062
▁führen 0.0007825 0.0029373 0.0005622 0.0002073 0.0000259 0.0006423 0.0008011 0.0011740 0.0006576 0.0001522 0.0004501 0.0000645 0.0000592 0.0005441 0.0006056 0.0186310 0.0087509 0.0014277 0.0036316 0.0033855 0.0281372 0.0072174 0.0315247 0.0115585 0.0026760 0.0005536 0.0013494 0.0012693 0.0072746 0.0510254 0.0285339
▁könnte 0.0016336 0.0018511 0.0009007 0.0001321 0.0001057 0.0006022 0.0003371 0.0023041 0.0008903 0.0000521 0.0002730 0.0000247 0.0000696 0.0002135 0.0008593 0.0073891 0.0042305 0.0009794 0.0014000 0.0006204 0.0047340 0.0008492 0.0023003 0.0036812 0.0016775 0.0002890 0.0007701 0.0019569 0.0017786 0.0164795 0.0072136
. 0.0054817 0.0072899 0.0011044 0.0006199 0.0004292 0.0005994 0.0010157 0.0016489 0.0011578 0.0003247 0.0009160 0.0001694 0.0005522 0.0011091 0.0046844 0.0201721 0.0060883 0.0027313 0.0054550 0.0051422 0.0069275 0.0040436 0.0061989 0.0059586 0.0008211 0.0004394 0.0013885 0.0024509 0.0022659 0.0121307 0.0024033
0.0185699 0.0157471 0.0041237 0.0035992 0.0015364 0.0017004 0.0038853 0.0041771 0.0031700 0.0008717 0.0030422 0.0005856 0.0022621 0.0028591 0.0070610 0.0492859 0.0418396 0.0187225 0.0097122 0.0091171 0.0036640 0.0073280 0.0026073 0.0167389 0.0016975 0.0006580 0.0030022 0.0049171 0.0034142 0.0276947 0.0044022
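
One way to turn a printed matrix like this into a picture is to parse the rows back into an array and plot a heatmap. A minimal sketch, assuming the block above was saved verbatim to a text file; the file name and the header/row parsing are hypothetical:

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file holding the block above verbatim: the first line is the
# source tokens, each later line is "tgt_token w1 ... wN" (the final row
# has no target token, only weights).
with open("attn_dump.txt", encoding="utf-8") as f:
    lines = [ln.strip() for ln in f if ln.strip()]

src_tokens = lines[0].split()
tgt_tokens, rows = [], []
for ln in lines[1:]:
    parts = ln.split()
    if parts[0].replace(".", "", 1).isdigit():  # row without a target token
        tgt_tokens.append("")
        rows.append([float(x) for x in parts])
    else:
        tgt_tokens.append(parts[0])
        rows.append([float(x) for x in parts[1:]])

attn = np.array(rows)  # shape (num_tgt, num_src); a diagonal band is expected
plt.figure(figsize=(10, 8))
plt.imshow(attn, aspect="auto")
plt.xticks(range(len(src_tokens)), src_tokens, rotation=90, fontsize=6)
plt.yticks(range(len(tgt_tokens)), tgt_tokens, fontsize=6)
plt.tight_layout()
plt.savefig("attn_map.png", dpi=150)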


vince62s commented on June 7, 2024

Also, please install torch 2.1.0 so that you don't get the error when not using attn_debug.


zw-SIMM commented on June 7, 2024

Also, please install torch 2.1.0 so that you don't get the error when not using attn_debug.

Thanks for the reminder, I'll try.


zw-SIMM commented on June 7, 2024

In addition, I did some controlled-variable experiments (same dataset, same v2 environment, same steps, same config except for the vocabs) and found that the issue may be related to the special tokens (averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2) introduced in v3.

Attention map of Vocab_1 (130 single characters) at 20000 steps:
[attention map image]

Attention map of Vocab_2 (130 single characters + averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2) at 20000 steps:
[attention map image]

The attention weights are clearly more dispersed when these three new tokens are added.

Can you explain the function of these tokens (averyunlikelytoken0, averyunlikelytoken1, averyunlikelytoken2)?


vince62s commented on June 7, 2024

IIRC those are only there to pad vocab_size to a multiple of X (most likely 8). If you don't set that option, it won't happen.
But please, let's focus on one issue at a time: did it fix your attn_debug issue?
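
The padding scheme itself is easy to reconstruct. A minimal sketch of what the thread describes, not the actual inputter.py code: append averyunlikelytokenN entries until the vocab length is a multiple of vocab_size_multiple. It also explains why exactly three such tokens appear above: 130 characters plus <blank>, <s>, </s> gives 133 entries, and the next multiple of 8 is 136.

def pad_vocab(tokens, multiple=8):
    # Append averyunlikelytokenN entries until len(tokens) is a multiple of
    # `multiple`, mirroring the described behaviour of OpenNMT-py's
    # vocab_size_multiple option (a sketch, not the real code).
    padded = list(tokens)
    i = 0
    while len(padded) % multiple != 0:
        padded.append(f"averyunlikelytoken{i}")
        i += 1
    return padded

vocab = ["<blank>", "<s>", "</s>"] + [f"char{i}" for i in range(130)]  # 133 entries
padded = pad_vocab(vocab)
assert len(padded) == 136
assert padded[-3:] == ["averyunlikelytoken0", "averyunlikelytoken1", "averyunlikelytoken2"]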


zw-SIMM commented on June 7, 2024

If you use attn_debug: true in your config file, the path should NOT go through line 481 in multi_headed_attn.py.

For me it prints the matrix, but I did not visualize it as a map.

Yes, but we can take a short sentence as an example and look at the data matrix, like this:
[attention matrix image]


zw-SIMM commented on June 7, 2024

IIRC those are only there to pad vocab_size to a multiple of X (most likely 8). If you don't set that option, it won't happen. But please, let's focus on one issue at a time: did it fix your attn_debug issue?

The attn_debug issue is still not fixed.

Actually, I find that introducing these special tokens from v3 into the v2 environment leads to the loss of diagonal attention.

For a clearer example:

Attention map of a sentence predicted by a model trained with Vocab_1 (130 single characters) at 10000 steps:
[attention map image]

Attention map of a sentence predicted by a model trained with Vocab_2 (130 single characters + <blank> + <s> + </s> + averyunlikelytoken0 + averyunlikelytoken1 + averyunlikelytoken2) at 10000 steps:
[attention map image]


vince62s commented on June 7, 2024

Turn it off with vocab_size_multiple: 1 (the default is 8) and see how it goes.


zw-SIMM commented on June 7, 2024

Turn it off with vocab_size_multiple: 1 (the default is 8) and see how it goes.

Okay, I'll try. Thanks.


zw-SIMM commented on June 7, 2024

Turn it off with vocab_size_multiple: 1 (the default is 8) and see how it goes.

I did some further tests; it does have something to do with that setting. The diagonal attention is missing!

[attention map image]

I'm hoping for a solution that doesn't require retraining, as the training cycle for our project can take months.
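
One cheap check before resorting to retraining is to confirm whether the trained checkpoint's vocab actually contains the padding tokens. A rough sketch, under the assumption that a v3-style checkpoint stores its vocabs under a "vocab" key; the exact layout varies between OpenNMT-py versions, so inspect the keys first:

import torch

ckpt = torch.load("model_step_10000.pt", map_location="cpu")  # hypothetical path
print(list(ckpt.keys()))  # confirm what this checkpoint actually stores

# ASSUMPTION: vocabs live under a "vocab" key mapping side -> token sequence;
# adapt this to whatever the printout above shows for your version.
for side, voc in ckpt.get("vocab", {}).items():
    tokens = [str(t) for t in voc]  # ASSUMPTION: the vocab object is iterable
    pads = [t for t in tokens if t.startswith("averyunlikelytoken")]
    print(f"{side}: {len(tokens)} entries, padding tokens: {pads}")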

Looking forward to your reply.

Many Many Thanks!


vince62s commented on June 7, 2024

I don't know what you are printing in your graphs or how you are actually doing it, so you will have to dig in and figure it out yourself.
The only thing I can tell you is that this setting ONLY pads the vocab to a multiple of 8 by adding vocab items at the end, here:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/inputters/inputter.py#L52
How you retrieve the attentions and so on is beyond my knowledge, so dig into your custom code.


zw-SIMM commented on June 7, 2024

I don't know what you are printing in your graphs or how you are actually doing it, so you will have to dig in and figure it out yourself. The only thing I can tell you is that this setting ONLY pads the vocab to a multiple of 8 by adding vocab items at the end, here: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/inputters/inputter.py#L52 How you retrieve the attentions and so on is beyond my knowledge, so dig into your custom code.

Okay, thanks again.
