Hello, I have a question about training CIFAR-100 (weighted-soft-label-distillation, CLOSED)

bellymonster commented on June 29, 2024

Hello, I have a question about training CIFAR-100

from weighted-soft-label-distillation.

Comments (3)

DeepLearningHB commented on June 29, 2024

FYI:
Epoch: [28][0/196] Time 0.330 (0.330) Data 0.283 (0.283) Loss 18.4588 (18.4588) Acc@1 75.391 (75.391) Acc@5 95.703 (95.703)
Epoch: [28][100/196] Time 0.067 (0.070) Data 0.002 (0.006) Loss 18.1703 (17.9548) Acc@1 77.344 (77.023) Acc@5 93.750 (95.440)
[Train]* Acc@1 76.274 Acc@5 95.244
Test: [0/79] Time 0.042 (0.068) Loss 21.2584 (18.1365) Acc@1 63.281 (63.281) Acc@5 89.844 (89.844)
Acc@1 61.430 Acc@5 87.290
Epoch: [29][0/196] Time 0.293 (0.293) Data 0.244 (0.244) Loss 17.1295 (17.1295) Acc@1 80.078 (80.078) Acc@5 95.312 (95.312)
Epoch: [29][100/196] Time 0.067 (0.069) Data 0.002 (0.005) Loss 17.7287 (17.6846) Acc@1 79.688 (77.970) Acc@5 94.141 (95.796)
[Train]* Acc@1 45.784 Acc@5 57.934
Test: [0/79] Time 0.042 (0.068) Loss nan (nan) Acc@1 0.000 (0.000) Acc@5 3.125 (3.125)
Acc@1 1.000 Acc@5 5.000
Epoch: [30][0/196] Time 0.332 (0.332) Data 0.286 (0.286) Loss nan (nan) Acc@1 0.781 (0.781) Acc@5 3.125 (3.125)
Epoch: [30][100/196] Time 0.067 (0.070) Data 0.003 (0.006) Loss nan (nan) Acc@1 0.391 (1.002) Acc@5 3.516 (5.105)

woshichase commented on June 29, 2024

Hi, thanks for your interest. We have not run into this loss explosion ourselves.
Apart from re-checking your training settings, I would suggest checking whether the baseline experiment (without the soft loss) hits the same problem. If the baseline runs normally, the abnormality is likely caused by the soft loss. In that case you can set alpha (originally 2.25 for CIFAR-100) to a smaller value, or check whether (1 - exp(-Ls/Lt)) falls outside the range (0, 1), although that is unlikely to happen.
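
For reference, the range check suggested above can be sketched per sample in plain Python. This is only an illustration under assumptions: Ls and Lt are taken to be the student's and teacher's per-sample losses (inferred from the notation), the helper names are hypothetical and not taken from the repository, and division by zero is mapped to inf/nan by hand to mimic what floating-point tensor division would produce.

```python
import math

def soft_loss_weight(loss_student: float, loss_teacher: float) -> float:
    """Return 1 - exp(-Ls/Lt) for one sample (hypothetical helper)."""
    if loss_teacher == 0.0:
        # Mimic IEEE-754 tensor division: 0/0 -> nan, x/0 -> inf for x > 0.
        ratio = math.nan if loss_student == 0.0 else math.inf
    else:
        ratio = loss_student / loss_teacher
    return 1.0 - math.exp(-ratio)

def weight_in_range(weight: float) -> bool:
    """True if the weight is finite and lies in [0, 1]."""
    return math.isfinite(weight) and 0.0 <= weight <= 1.0

# (Ls, Lt) pairs chosen purely for illustration.
for ls, lt in [(1.2, 0.8), (0.0, 0.5), (0.0, 0.0)]:
    w = soft_loss_weight(ls, lt)
    print(f"Ls={ls} Lt={lt} weight={w} ok={weight_in_range(w)}")
```

In this sketch the last pair yields a NaN weight, which is one way a single bad sample could turn the whole batch loss into NaN. Whether that is what happened in the log above is not confirmed by the thread.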

DeepLearningHB commented on June 29, 2024

I solved this problem by adding a small epsilon to focal_weight :) It works well now!
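
The comment does not say where the epsilon was added or how large it was. A minimal sketch of one plausible placement, assuming focal_weight is computed as 1 - exp(-Ls/Lt) and the epsilon goes into the denominator, could look like this. The function signature and the value 1e-7 are assumptions for illustration only.

```python
import math

EPS = 1e-7  # assumed value; the thread does not state the one actually used

def focal_weight(loss_student: float, loss_teacher: float, eps: float = EPS) -> float:
    """Return 1 - exp(-Ls / (Lt + eps)); eps keeps the denominator non-zero."""
    ratio = loss_student / (loss_teacher + eps)
    return 1.0 - math.exp(-ratio)

# With eps in place, a zero teacher loss no longer produces 0/0.
print(focal_weight(0.0, 0.0))
print(focal_weight(1.2, 0.8))
```

The same idea carries over to a tensor implementation by adding the epsilon to the teacher-loss tensor before dividing.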
