Comments (5)
Hello @matthew-burnard,
Could you please add fp16: True to the config file (i.e. configs/transformer_reverse.yaml or configs/rnn_reverse.yaml) and try to run the same train command again?
training:
[...]
use_cuda: False
print_valid_sents: [0, 3, 6]
keep_best_ckpts: 2
fp16: True # <- add this line!
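To illustrate why this flag matters on CPU, here is a minimal sketch of how a mixed-precision setup might pick its autocast dtype from use_cuda and fp16. The function name resolve_amp_dtype is illustrative, not actual JoeyNMT API; the CPU/bfloat16 behavior reflects PyTorch's general autocast support, not necessarily JoeyNMT's exact code path.

```python
# Hypothetical sketch (not JoeyNMT's actual implementation):
# choose a mixed-precision dtype from the two config flags.
def resolve_amp_dtype(use_cuda: bool, fp16: bool) -> str:
    """Pick an autocast dtype for training.

    On CUDA, float16 autocast is the usual choice; PyTorch's CPU
    autocast supports bfloat16, so a CPU run with fp16 enabled
    would typically fall back to that.
    """
    if not fp16:
        return "float32"  # full precision, no autocast
    return "float16" if use_cuda else "bfloat16"
```

With use_cuda: False and fp16: True, as in the config above, this sketch would select bfloat16 for the CPU run.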
In my CPU environment, I've confirmed it works with the modification above:
2022-10-27 22:42:32,654 - INFO - joeynmt.training - Train stats:
device: cpu
n_gpu: 0
16-bits training: True
gradient accumulation: 1
batch size per device: 10
effective batch size (w. parallel & accumulation): 10
2022-10-27 22:42:32,654 - INFO - joeynmt.training - EPOCH 1
2022-10-27 22:44:04,839 - INFO - joeynmt.training - Epoch 1, Step: 100, Batch Loss: 52.500000, Batch Acc: 0.072272, Tokens per Sec: 149, Lr: 0.001000
2022-10-27 22:45:38,308 - INFO - joeynmt.training - Epoch 1, Step: 200, Batch Loss: 46.500000, Batch Acc: 0.082955, Tokens per Sec: 151, Lr: 0.001000
2022-10-27 22:47:09,797 - INFO - joeynmt.training - Epoch 1, Step: 300, Batch Loss: 53.500000, Batch Acc: 0.098496, Tokens per Sec: 151, Lr: 0.001000
2022-10-27 22:48:42,891 - INFO - joeynmt.training - Epoch 1, Step: 400, Batch Loss: 63.250000, Batch Acc: 0.103837, Tokens per Sec: 149, Lr: 0.001000
2022-10-27 22:50:16,389 - INFO - joeynmt.training - Epoch 1, Step: 500, Batch Loss: 46.750000, Batch Acc: 0.122228, Tokens per Sec: 151, Lr: 0.001000
2022-10-27 22:51:48,991 - INFO - joeynmt.training - Epoch 1, Step: 600, Batch Loss: 34.250000, Batch Acc: 0.141818, Tokens per Sec: 150, Lr: 0.001000
2022-10-27 22:53:23,885 - INFO - joeynmt.training - Epoch 1, Step: 700, Batch Loss: 46.750000, Batch Acc: 0.171323, Tokens per Sec: 149, Lr: 0.001000
2022-10-27 22:54:56,047 - INFO - joeynmt.training - Epoch 1, Step: 800, Batch Loss: 23.750000, Batch Acc: 0.241879, Tokens per Sec: 148, Lr: 0.001000
2022-10-27 22:56:30,321 - INFO - joeynmt.training - Epoch 1, Step: 900, Batch Loss: 27.625000, Batch Acc: 0.356111, Tokens per Sec: 149, Lr: 0.001000
2022-10-27 22:58:06,413 - INFO - joeynmt.training - Epoch 1, Step: 1000, Batch Loss: 32.750000, Batch Acc: 0.504765, Tokens per Sec: 144, Lr: 0.001000
2022-10-27 22:58:06,413 - INFO - joeynmt.prediction - Predicting 1000 example(s)... (Greedy decoding with min_output_length=1, max_output_length=30, return_prob='none', generate_unk=True, repetition_penalty=-1, no_repeat_ngram_size=-1)
Predicting...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [02:10<00:00, 7.63it/s]
2022-10-27 23:00:17,941 - INFO - joeynmt.metrics - nrefs:1|case:mixed|eff:no|tok:13a|smooth:exp|version:2.3.1
2022-10-27 23:00:17,941 - INFO - joeynmt.prediction - Evaluation result (greedy) bleu: 44.19, loss: 31.47, ppl: 6.88, acc: 0.65, generation: 131.0036[sec], evaluation: 0.5055[sec]
2022-10-27 23:00:17,943 - INFO - joeynmt.training - Hooray! New best validation result [bleu]!
2022-10-27 23:00:17,963 - INFO - joeynmt.training - Example #0
2022-10-27 23:00:17,964 - INFO - joeynmt.training - Source: 33 9 15 3 14 33 32 42 23 12 14 17 4 35 0 48 46 36 46 27 2 34 35 17 36 39 7 14 9 0
2022-10-27 23:00:17,964 - INFO - joeynmt.training - Reference: 0 9 14 7 39 36 17 35 34 2 27 46 36 46 48 0 35 4 17 14 12 23 42 32 33 14 3 15 9 33
2022-10-27 23:00:17,964 - INFO - joeynmt.training - Hypothesis: 0 9 14 7 9 36 17 35 34 2 2 27 12 21 46 0 23 17 4 4 4 4 9
2022-10-27 23:00:17,964 - INFO - joeynmt.training - Example #3
2022-10-27 23:00:17,966 - INFO - joeynmt.training - Source: 10 43 37 32 6 9 25 36 21 29 16 7 18 27 30 46 37 15 7 48 18
2022-10-27 23:00:17,966 - INFO - joeynmt.training - Reference: 18 48 7 15 37 46 30 27 18 7 16 29 21 36 25 9 6 32 37 43 10
2022-10-27 23:00:17,966 - INFO - joeynmt.training - Hypothesis: 18 48 7 15 37 46 30 27 18 13 13 29 29 36 25 9 32 32 43 43
2022-10-27 23:00:17,966 - INFO - joeynmt.training - Example #6
2022-10-27 23:00:17,967 - INFO - joeynmt.training - Source: 0 38 14 26 20 34 10 36 11 32 29 21
2022-10-27 23:00:17,967 - INFO - joeynmt.training - Reference: 21 29 32 11 36 10 34 20 26 14 38 0
2022-10-27 23:00:17,967 - INFO - joeynmt.training - Hypothesis: 21 29 32 11 36 10 34 26 26 14
Sorry, I haven't tested it on CPU before. We know the tutorial is somewhat outdated; we are working on updating it for JoeyNMT version 2, but it will take more time...
from joeynmt.
Reminder for myself:
- force fp16 = True if use_cuda = False
- update the reverse task config files to use fp16
- update the reverse task tutorial
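The first reminder item could be sketched as a small config-normalization step. This is a hypothetical illustration of the planned validation, not actual JoeyNMT code; the function name normalize_training_cfg and the dict-based config are assumptions.

```python
# Hypothetical sketch of the planned config validation:
# force fp16 on when training on CPU, per the reminder above.
def normalize_training_cfg(cfg: dict) -> dict:
    cfg = dict(cfg)  # copy; don't mutate the caller's config
    if not cfg.get("use_cuda", False) and not cfg.get("fp16", False):
        # CPU training needs the fp16 code path in this setup
        cfg["fp16"] = True
    return cfg
```

Applied to the reverse-task config above (use_cuda: False), this would set fp16: True automatically, so users would no longer need to add the line by hand.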
Confirmed that this solves the problem on my end.
Nice! Thank you for letting us know this. We will fix it.
solved in #202