
Comments (4)

yhhhli commented on July 21, 2024

Hi,

When we first tested our algorithm without weight normalization, we also observed that problem: the gradients of the clipping parameter in several layers can suddenly explode. We then tried smaller learning rates for the clipping parameter, but the performance was not good.

We then concluded that the problem is that the weight distribution changes significantly during training, and there is no heuristic that tells us when to increase the LR (to accommodate the shift in the weight distribution) or when to decrease it (to stabilize training). Therefore, we came up with a method to normalize the weights. Weight normalization is inspired by Batch Normalization for activations, since we find that learning the clipping parameter for activation quantization does not have the NaN issue.
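For reference, weight normalization here means standardizing the real-valued weights to zero mean and unit variance before clipping and quantization, so the learnable clipping threshold sees a stable distribution at every step. A minimal PyTorch sketch (not the exact code from the apot_quantization repo; `apot_quantize` and `alpha` below are placeholders):

```python
import torch

def normalize_weight(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Standardize real-valued weights to zero mean / unit variance so the
    # learnable clipping threshold sees a stable distribution each step.
    return (w - w.mean()) / (w.std() + eps)

# Hypothetical use inside a quantized layer's forward pass:
#   w_norm = normalize_weight(self.weight)
#   w_q = apot_quantize(w_norm, self.alpha)  # alpha: learnable clipping threshold
```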


GendalfSeriyy commented on July 21, 2024

Thanks for the answer!
We also noticed that when the first and last layers are quantized to the same precision as the other layers, the network does not learn either. To be more precise, the network trains for a certain number of epochs, but then the accuracy drops to 10% and no longer improves. Have you carried out such experiments?


yhhhli commented on July 21, 2024

I think weight normalization cannot be applied to the last layer, because the output of the last layer is the output of the network and there is no BN layer to standardize its distribution. For the last layer, maybe you can apply the DoReFa scheme to quantize the weights and our APoT quantization to the activations.
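For context, the DoReFa weight quantization mentioned above maps weights into [0, 1] with tanh, quantizes them uniformly to k bits, and rescales to [-1, 1]. A rough sketch (training would additionally need a straight-through estimator for the rounding step, omitted here):

```python
import torch

def dorefa_quantize_weight(w: torch.Tensor, k: int) -> torch.Tensor:
    # Map weights into [0, 1] with tanh, quantize uniformly to k bits,
    # then rescale to [-1, 1].
    n = float(2 ** k - 1)
    w = torch.tanh(w)
    w = w / (2 * w.abs().max()) + 0.5   # into [0, 1]
    w_q = torch.round(w * n) / n        # k-bit uniform quantization
    return 2 * w_q - 1                  # back to [-1, 1]
```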


eeewuh commented on July 21, 2024

Thanks for the great work and the clarification on the weight_norm! I want to ask: after applying weight normalization to the real-valued weights, should the learning rate for \alpha be the same as for the weights, or should the lr and weight_decay for \alpha be adjusted (like the settings in your commented code, `# customize the lr and wd for clipping thresholds`)?
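For what it's worth, one common way to give \alpha its own lr and weight_decay in PyTorch is optimizer parameter groups. A minimal sketch, assuming the clipping thresholds are parameters whose names contain "alpha" (an illustration, not the repo's actual settings or hyperparameters):

```python
import torch

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Separate the clipping thresholds (assumed to be named '*alpha*')
    # from the regular weights so each group gets its own lr / wd.
    alpha_params = [p for n, p in model.named_parameters() if 'alpha' in n]
    other_params = [p for n, p in model.named_parameters() if 'alpha' not in n]
    return torch.optim.SGD(
        [
            {'params': other_params},                                    # default lr / wd
            {'params': alpha_params, 'lr': 0.01, 'weight_decay': 1e-4},  # custom for alpha
        ],
        lr=0.1, momentum=0.9, weight_decay=1e-4,
    )
```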

