
Always got zeros during validation about human-pose-estimation.pytorch HOT 9 CLOSED

ysCatherine commented on September 28, 2024

Always got zeros during validation

from human-pose-estimation.pytorch.

Comments (9)

leoxiaobin commented on September 28, 2024

Please check whether you have correctly disabled cuDNN for batch norm (BN), or disable cuDNN globally by setting CUDNN.ENABLED=False in your YAML config file.
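A minimal sketch of what a global CUDNN.ENABLED switch amounts to, assuming the config is a nested mapping with a CUDNN section holding ENABLED (and optionally BENCHMARK and DETERMINISTIC) keys. The helper name apply_cudnn_flags is hypothetical, and the repository's own config loader is not reproduced here. The backend argument is any object with enabled, benchmark and deterministic attributes; in practice that would be torch.backends.cudnn.

```python
from typing import Any, Mapping

# Hypothetical mapping from config keys to cuDNN backend attributes.
_KEY_TO_ATTR = {
    "ENABLED": "enabled",
    "BENCHMARK": "benchmark",
    "DETERMINISTIC": "deterministic",
}


def apply_cudnn_flags(cfg: Mapping[str, Any], backend: Any) -> None:
    """Copy the CUDNN section of a config onto a cuDNN backend object.

    Only keys that are present are applied; missing keys leave the
    backend's current value untouched.
    """
    section = cfg.get("CUDNN", {})
    for key, attr in _KEY_TO_ATTR.items():
        if key in section:
            # ENABLED=False switches cuDNN off globally, batch norm included.
            setattr(backend, attr, bool(section[key]))


# Hypothetical usage (requires PyYAML and PyTorch; the file name is a placeholder):
#   import torch, yaml
#   with open("your_config.yaml") as f:
#       cfg = yaml.safe_load(f)
#   apply_cudnn_flags(cfg, torch.backends.cudnn)
```

Passing the backend in as a parameter keeps the helper independent of PyTorch, so the same function can be checked against a plain stand-in object.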

leoxiaobin commented on September 28, 2024

Our code has been tested with PyTorch 0.4.0 and 0.4.1; I have not tested it with 0.5.0. Please try PyTorch 0.4.0 or 0.4.1, and follow the README to disable cuDNN.
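One way to catch an untested PyTorch early is a version guard at startup. This is a minimal sketch, not part of the repository: the helper is_tested_pytorch is hypothetical, and the only versions it accepts are the two named above. It ignores local build tags such as "+cu90" and treats post-release suffixes of a tested version as tested.

```python
from typing import Iterable

# The two versions named in this thread as tested.
TESTED_VERSIONS = ("0.4.0", "0.4.1")


def is_tested_pytorch(version: str, tested: Iterable[str] = TESTED_VERSIONS) -> bool:
    """Return True if `version` is one of the tested releases."""
    base = version.split("+", 1)[0]  # drop a local build tag like "+cu90"
    # "0.4.1.post2" counts as 0.4.1, but "0.4.10" does not.
    return any(base == t or base.startswith(t + ".") for t in tested)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this interpreter")
    else:
        if not is_tested_pytorch(torch.__version__):
            print(f"warning: PyTorch {torch.__version__} is untested; try 0.4.0 or 0.4.1")
```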

ysCatherine commented on September 28, 2024

I have switched to PyTorch 0.4.0, but the results still do not look right:

2018-10-16 20:17:50,715 Epoch: [139][0/696] Time 1.198s (1.198s) Speed 26.7 samples/s Data 0.906s (0.906s) Loss 0.00049 (0.00049) Accuracy 0.856 (0.856)
2018-10-16 20:18:19,306 Epoch: [139][100/696] Time 0.286s (0.295s) Speed 111.8 samples/s Data 0.000s (0.009s) Loss 0.00045 (0.00048) Accuracy 0.859 (0.854)
2018-10-16 20:18:47,906 Epoch: [139][200/696] Time 0.285s (0.290s) Speed 112.3 samples/s Data 0.000s (0.005s) Loss 0.00046 (0.00048) Accuracy 0.848 (0.851)
2018-10-16 20:19:16,534 Epoch: [139][300/696] Time 0.289s (0.289s) Speed 110.6 samples/s Data 0.000s (0.003s) Loss 0.00056 (0.00048) Accuracy 0.815 (0.851)
2018-10-16 20:19:45,108 Epoch: [139][400/696] Time 0.285s (0.288s) Speed 112.3 samples/s Data 0.000s (0.003s) Loss 0.00046 (0.00048) Accuracy 0.854 (0.852)
2018-10-16 20:20:13,669 Epoch: [139][500/696] Time 0.285s (0.288s) Speed 112.2 samples/s Data 0.000s (0.002s) Loss 0.00054 (0.00048) Accuracy 0.805 (0.852)
2018-10-16 20:20:42,283 Epoch: [139][600/696] Time 0.285s (0.287s) Speed 112.3 samples/s Data 0.000s (0.002s) Loss 0.00051 (0.00048) Accuracy 0.837 (0.852)
2018-10-16 20:21:10,280 Test: [0/93] Time 1.022 (1.022) Loss 0.0015 (0.0015) Accuracy 0.014 (0.014)
2018-10-16 20:21:24,151 | Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
2018-10-16 20:21:24,152 |---|---|---|---|---|---|---|---|---|---|
2018-10-16 20:21:24,152 | 256x256_pose_resnet_50_d256d256d256 | 0.000 | 0.238 | 0.222 | 1.598 | 6.941 | 0.625 | 0.142 | 1.452 | 0.075 |
2018-10-16 20:21:24,155 => saving checkpoint to output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3
2018-10-16 20:21:24,917 saving final model state to output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3/final_state.pth.tar

Could this also be related to the Python or OpenCV version, or is it something else?

ysCatherine commented on September 28, 2024

After setting CUDNN.ENABLED=False, it seems everything works well now.
Test: [0/93] Time 1.235 (1.235) Loss 0.0004 (0.0004) Accuracy 0.934 (0.934)

| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 96.623 | 94.803 | 88.018 | 81.772 | 87.017 | 82.450 | 77.869 | 87.525 | 33.690 |
=> saving checkpoint to output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3

But without cuDNN, training and testing become very slow. Is there a Dockerfile that reproduces the environment of your experiments? If so, could you share it with me? Thank you : )

leoxiaobin commented on September 28, 2024

So I guess you may not have disabled cuDNN for BN correctly. Please follow our steps to do so.

leoxiaobin commented on September 28, 2024

In the future, I will add a Dockerfile.

ysCatherine commented on September 28, 2024

I changed the last argument of the call below to "False":
return torch.batch_norm(
    input, weight, bias, running_mean, running_var,
    training, momentum, eps, False
)
and I set the environment variable PYTORCH. Is there anything I missed?
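The edited call is the torch.batch_norm(...) inside PyTorch's functional batch_norm, whose last argument decides whether cuDNN is used; unpatched, it reads torch.backends.cudnn.enabled at call time. Because of that, a different technique from the README steps (and untested against this repository) would be to switch the global flag off only around batch-norm calls, leaving cuDNN on for convolutions, which may help with the slowdown mentioned earlier. Whether this also avoids the zero validation scores is an assumption. The class CudnnSuspender and the subclass BatchNorm2dNoCudnn in the comments are hypothetical names. A reference count and lock are used because nn.DataParallel runs replicas in threads, and a naive save/restore could leave the global flag in the wrong state.

```python
import threading
from contextlib import contextmanager
from typing import Any, Iterator


class CudnnSuspender:
    """Keep `backend.enabled` False while at least one caller is inside `suspended()`.

    The reference count lets several threads enter at once: the flag is saved
    by the first one in and restored only by the last one out.
    """

    def __init__(self, backend: Any) -> None:
        self._backend = backend
        self._lock = threading.Lock()
        self._depth = 0
        self._saved = True

    @contextmanager
    def suspended(self) -> Iterator[None]:
        with self._lock:
            if self._depth == 0:
                self._saved = self._backend.enabled
                self._backend.enabled = False
            self._depth += 1
        try:
            yield
        finally:
            with self._lock:
                self._depth -= 1
                if self._depth == 0:
                    self._backend.enabled = self._saved


# Hypothetical PyTorch usage (not from the repository, untested here):
#
#   import torch
#   import torch.nn as nn
#
#   _SUSPENDER = CudnnSuspender(torch.backends.cudnn)
#
#   class BatchNorm2dNoCudnn(nn.BatchNorm2d):
#       def forward(self, x):
#           # F.batch_norm reads torch.backends.cudnn.enabled when called, so
#           # this call skips cuDNN; other threads running at the same moment
#           # may also see it disabled.
#           with _SUSPENDER.suspended():
#               return super().forward(x)
```

The trade-off is that the flag is process-global: while any replica is inside a batch-norm call, concurrent convolutions in other threads also fall back to the non-cuDNN path, which is slower but should not change results.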

leoxiaobin commented on September 28, 2024

Please make sure that the PyTorch installation loaded at runtime is the one you modified, not another PyTorch installation.
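A quick way to confirm which installation is actually loaded is to ask the running interpreter. This is a minimal sketch, not from the repository: the helper batch_norm_cudnn_arg is hypothetical, and it assumes the torch.batch_norm(...) call contains no nested parentheses, as in the snippet quoted above. It reports the interpreter path, the PyTorch version, the file that F.batch_norm was loaded from, and the current last argument of that call.

```python
import inspect
import re
import sys
from typing import Optional


def batch_norm_cudnn_arg(source: str) -> Optional[str]:
    """Return the last argument of the torch.batch_norm(...) call in `source`.

    Unpatched it is `torch.backends.cudnn.enabled`; after the edit it should be `False`.
    """
    match = re.search(r"torch\.batch_norm\((.*?)\)", source, re.DOTALL)
    if match is None:
        return None
    args = [arg.strip() for arg in match.group(1).split(",") if arg.strip()]
    return args[-1] if args else None


if __name__ == "__main__":
    print("python:", sys.executable)
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        print("PyTorch is not importable from this interpreter")
    else:
        print("version:", torch.__version__)
        print("functional.py:", inspect.getsourcefile(F.batch_norm))
        print("cudnn argument:", batch_norm_cudnn_arg(inspect.getsource(F.batch_norm)))
```

If the reported file is not the one under the PYTORCH path that was edited, the training script is importing a different copy of PyTorch.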

ysCatherine commented on September 28, 2024

Thank you for your patient replies; they were helpful. I finally found the problem.
