
Comments (4)

s9xie commented on September 13, 2024

@happynear
This is a good question, and probably a common one. Of course one can tune gamma on the validation set, but this is really annoying. We tried that, but soon came up with another way to implement our formulation while avoiding overfitting.

So if you look at our experiment configuration files, you can see that we adopted an early-stopping policy during training: we first train the network with DSN for a number of epochs (determined on the validation set), then discard all the companion losses and continue training the network with only the output loss.

Gamma is now implicitly and dynamically determined by the loss value reached at the point where we stop early; empirically, this is essential for DSN to achieve good performance.
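For readers trying to reproduce this, here is a minimal sketch of the schedule described above, assuming a PyTorch-style model whose forward pass returns the output-layer logits together with a list of hidden-layer logits; `num_dsn_epochs` and `alphas` are hypothetical placeholders for the validation-chosen stopping point and the per-layer weights:

```python
# Minimal DSN-style training loop (sketch, not the authors' original Caffe setup).
# Assumes model(x) returns (final_logits, [hidden_logits_1, ..., hidden_logits_M]).
def train(model, loader, criterion, optimizer, num_epochs, num_dsn_epochs, alphas):
    for epoch in range(num_epochs):
        # Early-stop policy: keep the companion losses only for the first
        # num_dsn_epochs epochs, then train with the output loss alone.
        use_companion = epoch < num_dsn_epochs
        for x, y in loader:
            final_logits, hidden_logits = model(x)
            loss = criterion(final_logits, y)  # overall (output-layer) loss
            if use_companion:
                for alpha_m, logits_m in zip(alphas, hidden_logits):
                    loss = loss + alpha_m * criterion(logits_m, y)  # companion losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```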


zhangliliang commented on September 13, 2024

In @happynear's comment, \gamma is set to prevent the hinge loss from becoming 0.
However, from my point of view, \gamma is set to *make* the hinge loss of the hidden layers become 0 (i.e., to make its gradient vanish), and \alpha_m plays the same role.
But I don't understand the purpose of making the gradient vanish in the paper. Is it to speed up training, since part of the backpropagation can then be skipped? @s9xie
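(For reference: the thresholding discussed here is, as I read the paper's formulation, the per-layer term \alpha_m [\ell_m - \gamma]_+, so once a hidden layer's companion loss drops below \gamma that term, and hence its gradient, vanishes. A rough sketch with hypothetical names:)

```python
import torch

def companion_term(hidden_loss, alpha_m, gamma):
    """Companion objective for one hidden layer: alpha_m * [hidden_loss - gamma]_+.

    The max(., 0) means the term (and its gradient) is exactly zero once the
    hidden layer's loss has dropped below the threshold gamma.
    """
    return alpha_m * torch.clamp(hidden_loss - gamma, min=0.0)
```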


s9xie commented on September 13, 2024

@happynear @zhangliliang Sorry, yes, it is not "preventing the hinge loss from being zero" but making it vanish. I assume that was a typo in the original question?

In our paper we explained:
"This way, the overall goal of producing good classification of the output layer is not altered and the companion objective just acts as a proxy or regularization."
Intuitively, we should emphasize the role of the overall loss during training; this "early stop" policy is a good way to avoid over-fitting the lower layers to their local losses.


sh0416 commented on September 13, 2024

@s9xie I am working on implementing your method. So you mean that you don't explicitly use gamma, right? I am also curious about the other hyperparameter, alpha, whose search space grows exponentially as the number of layers increases. In your paper you use a relatively small architecture, i.e., a 3-layer NN. How do you tune this hyperparameter?

Thanks,

