weighted-soft-label-distillation's Issues
Hyper-parameters settings?
Hi, thank you for your nice work! Could you give me some advice on how to set the temperature coefficient and loss-weight hyper-parameters?
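Not one of the authors, but it may help to see what the two hyper-parameters control in a standard Hinton-style KD objective (a minimal pure-Python sketch of the generic formulation, not the authors' exact WSLD loss; the T=4, alpha=0.9 defaults are common starting points in the KD literature, not values from this paper):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; larger T gives a softer distribution."""
    m = max(z / T for z in logits)
    exps = [math.exp(z / T - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.9):
    """alpha trades off hard-label CE against the distillation term;
    the T**2 factor keeps the KD gradient magnitude comparable across T."""
    p_s = softmax(student_logits)                 # student at T=1 for the CE term
    ce = -math.log(p_s[label])
    q_t = softmax(teacher_logits, T)              # softened teacher targets
    q_s = softmax(student_logits, T)              # softened student predictions
    kl = sum(t * math.log(t / s) for t, s in zip(q_t, q_s))
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

Raising T smooths the teacher distribution (exposing more "dark knowledge" in the non-target classes), while alpha shifts weight between the hard labels and the teacher.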
Assumption 1: a gap between "KD helps calibrate" and "KD reduces variance".
Your work is exciting and inspiring.
But there is a large gap between "KD helps calibrate" and "KD reduces variance": better calibration could also come from bias reduction, i.e. reducing the bias between the predicted probability and the accuracy, as defined in the calibration error.
In fact, as defined in Eq. 2 of Guo's calibration paper, couldn't the main reason for the lower ECE be understood as a reduction in the bias of p?
Minor questions on Eq.(2)
Hi, I have read your excellent work several times. The bias-variance idea is very interesting!
However, due to my limited knowledge of bias/variance decompositions, I found the variance term in Equation (2) hard to understand. E.g.,
How to prove that
I have referred to Heskes's paper, and it seems that the derivation relies on the normalization constant Z in Equation (1). It is easy to prove that
if Z in Equation (1) is a constant value, but I still could not find the relationship between this term and Equation (2).
Could you please kindly provide the detailed derivations of the variance term in Equation (2)? Thanks in advance.
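Not one of the authors, but here is how I convinced myself of the decomposition, assuming the "systematic" prediction in Eq. (1) is the Z-normalized geometric mean over training sets D, as in Heskes's paper (my own notation, which may differ from the paper's):

```latex
% Assume \bar{q}(x) = \frac{1}{Z}\exp\!\big(\mathbb{E}_D[\log q(x)]\big),
% i.e. the Z-normalized geometric mean of the learned distribution q.
\begin{align}
\mathbb{E}_D\!\left[\mathrm{KL}(t\,\|\,q)\right]
  &= -H(t) - \sum_x t(x)\,\mathbb{E}_D[\log q(x)] \\
  &= -H(t) - \sum_x t(x)\big(\log\bar{q}(x) + \log Z\big) \\
  &= \mathrm{KL}(t\,\|\,\bar{q}) - \log Z,
\end{align}
% while the expected divergence of q from its geometric mean is
\begin{align}
\mathbb{E}_D\!\left[\mathrm{KL}(\bar{q}\,\|\,q)\right]
  &= \sum_x \bar{q}(x)\log\bar{q}(x)
     - \sum_x \bar{q}(x)\,\mathbb{E}_D[\log q(x)]
   = -\log Z,
\end{align}
% so the two combine into
\mathbb{E}_D\!\left[\mathrm{KL}(t\,\|\,q)\right]
  = \underbrace{\mathrm{KL}(t\,\|\,\bar{q})}_{\text{bias}}
  + \underbrace{\mathbb{E}_D\!\left[\mathrm{KL}(\bar{q}\,\|\,q)\right]}_{\text{variance}}.
```

In other words, the variance term equals $-\log Z$, which is why the derivation only goes through when $Z$ is treated as the normalizer of the geometric mean rather than an arbitrary constant.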
KD Loss keeps raising during training
Hello, I am using WSLD on my own dataset, but the KD loss keeps increasing. Is this normal? Could you provide a training log on CIFAR-100 or ImageNet, so we can see what a normal WSLD training run looks like?
Hello, I have a question about training CIFAR-100
Hi, I read your paper with great interest, thank you.
I implemented your method using the code here and tested it on CIFAR-100,
but in my case, gradient explosion occurred.
It works well for the first 15 epochs, but from epoch 16, both the accuracy and the loss collapse to 0.
Even after lowering the learning rate (0.05 -> 0.01), I could not solve the gradient explosion problem.
How can I solve this?
thank you
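Not one of the authors, but gradient-norm clipping is a common first mitigation for this kind of mid-training blow-up. The idea, sketched standalone in pure Python (it mirrors what PyTorch's `torch.nn.utils.clip_grad_norm_` does; the `max_norm` value is arbitrary and would need tuning):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm never exceeds max_norm.
    Gradients already within the bound are returned unchanged."""
    total = math.sqrt(sum(g * g for g in grads))
    if total > max_norm:
        scale = max_norm / (total + 1e-6)   # small eps avoids division issues
        grads = [g * scale for g in grads]
    return grads
```

In a real training loop this would be applied between the backward pass and the optimizer step; it caps the update size without changing the gradient direction, which often gets a run past the point where it previously diverged.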
Missing dataset file
Since the dataset is not provided, I tried to reproduce it myself, but there are some inconsistencies in the details. Could you please provide the dataset file?
The pretrained teacher and hyper-parameters on CIFAR-100
Hi, thanks for the interesting work. I am trying to reproduce the results on CIFAR-100 but have failed so far. I have some questions about the CIFAR-100 implementation and would appreciate any suggestions. Specifically, is the training-loss implementation on CIFAR-100 the same as that on ImageNet, except
Hi, I cannot reproduce your reported performance on CIFAR-100.
Hi there, I'm trying to use your method on CIFAR-100.
However, I cannot reproduce your performance even though I followed your script and hyper-parameter settings.
For instance, the ResNet110-ResNet32 pair reached 74.12% in your paper, but in my implementation it only reached 72.91%.
I was able to reproduce your performance only for the resnet56-resnet20 pair (72.01 / 72.15).
I think that is quite a large performance gap between your results and mine.
In addition, your repository only contains ImageNet training script.
If you don't mind uploading the CIFAR-100 training script, I would be able to train your method with it.
Thanks!