Comments (4)
Hi,
When we first tested our algorithm without weight normalization, we also ran into that problem: the gradients of the clipping parameter in several layers would suddenly explode. We then tried smaller learning rates for the clipping parameter, but the performance was not good.
We think the problem is that the distribution of the weights changes significantly during training, and there is no heuristic that can tell when to increase the LR (to accommodate the shift in the weight distribution) or when to decrease it (to stabilize training). Therefore, we came up with a method to normalize the weights. Weight normalization is inspired by Batch Normalization on activations, because we found that learning the clipping parameter for activation quantization does not have the NaN issue.
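For illustration, here is a minimal sketch of what such weight normalization could look like (assuming a zero-mean, unit-variance normalization of the real-valued weights before quantization; `weight_normalize` and `apot_quantize` are illustrative names, not this repo's API):

```python
import torch

def weight_normalize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Normalize the real-valued weights to zero mean and unit variance,
    so the learned clipping parameter sees a stable weight distribution
    across training steps."""
    return (w - w.mean()) / (w.std() + eps)

# Illustrative use inside a quantized layer's forward pass:
# w_q = apot_quantize(weight_normalize(self.weight), self.alpha, bits=4)
```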
Thanks for the answer!
We also noticed that when quantizing the first and last layers to the same precision as the other layers, the network does not learn either. To be more precise, the network trains for a certain number of epochs, but then the accuracy drops to 10% and no longer improves. Have you carried out such experiments?
I think weight normalization cannot be applied to the last layer, because the output of the last layer is the output of the network, and there is no BN afterwards to standardize its distribution. For the last layer, maybe you can apply the DoReFa scheme to quantize the weights and our APoT quantization to the activations.
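For reference, a minimal sketch of the DoReFa-style weight quantizer mentioned above (this is the standard DoReFa-Net formulation with a straight-through estimator, not the code from this repo):

```python
import torch

def dorefa_quantize_weight(w: torch.Tensor, bits: int) -> torch.Tensor:
    """DoReFa-Net weight quantization: squash weights into [0, 1] with
    tanh, quantize uniformly to 2^bits levels, and rescale to [-1, 1]."""
    if bits == 32:  # full precision: pass through
        return w
    t = torch.tanh(w)
    w01 = t / (2 * t.abs().max()) + 0.5  # map to [0, 1]
    n = 2 ** bits - 1
    # round with a straight-through estimator so gradients still flow to w
    w01_q = w01 + (torch.round(w01 * n) / n - w01).detach()
    return 2 * w01_q - 1  # map back to [-1, 1]
```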
Thanks for the great work and the clarification on the weight_norm! I want to ask: after applying weight normalization to the real-valued weights, should the lr for \alpha be the same as for the weights, or should there be some adjustment to the lr and weight_decay for \alpha (like the settings in your commented code at APoT_Quantization/ImageNet/main.py, line 181 in a818104)?
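To make the question concrete, this is the kind of per-parameter-group setting being asked about. A hedged sketch: the assumption that the clipping parameters are named `alpha`, and the lr/weight_decay values themselves, are illustrative, not the repo's actual settings:

```python
import torch
import torch.nn as nn

class QuantLayer(nn.Module):
    """Stand-in for a quantized layer with a learned clipping threshold."""
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(8, 8))
        self.alpha = nn.Parameter(torch.tensor(3.0))  # clipping parameter

model = QuantLayer()

# Put the clipping parameters in their own group so they can get a
# different lr and weight_decay than the ordinary weights.
alpha_params = [p for n, p in model.named_parameters() if 'alpha' in n]
other_params = [p for n, p in model.named_parameters() if 'alpha' not in n]

optimizer = torch.optim.SGD(
    [
        {'params': other_params, 'lr': 0.1, 'weight_decay': 1e-4},
        {'params': alpha_params, 'lr': 0.01, 'weight_decay': 0.0},
    ],
    momentum=0.9,
)
```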
Related Issues (20)
- Why the size of Res20_2bit is the same as Res20_32bit? HOT 1
- uniform_quantization HOT 1
- Size and accuracy HOT 5
- Lightning Integration
- Technical details HOT 2
- about uniform quantization HOT 2
- about CIFAR10 part main.py resume function HOT 2
- the precision a4w4 of training MobilenetV2 is nearly 0 HOT 4
- a4w4 Resnet18 is 1.7% lower than that in the paper?
- The MUL unit of APOT HOT 1
- Need Suggestion
- Some results about resnet20 on cifar10
- quantization bit of apot HOT 2
- about training time
- difference between paper and code in quan_layer HOT 2
- calculate MAC
- Hyper-Params on MobileNet_V2 HOT 1
- The migration of this QAT function? HOT 5
- NaN loss for 8bit HOT 1
- Differences between quant_layer.py HOT 5