
Comments (8)

phelps-matthew commented on July 19, 2024

Yep, your intuition is correct - ideally one would aim to quantize during training (or emulate quantization). I would surmise that post-training quantization would perform poorly on a model that's been compressed via structured multi-hashing (SMH), due to the nonlinear mapping of weights to the reduced weight matrices (V1 and V2).

In theory, there shouldn't be any issue doing both SMH and quantization - however, I haven't dug into the implementation internals of QAT to confirm whether it's compatible with FeatherMap out of the box. Let me know how it goes!


phelps-matthew commented on July 19, 2024

On further thought, I think one should apply QAT after wrapping in FeatherMap. FeatherNet, as a layer, will just expose self.V1 and self.V2 as the weights to be updated, which should then be quantized (or have quantization emulated) and trained. E.g., something like

base_model = ResNet50()
f_model = FeatherNet(base_model, compress=0.10)
# now apply quantization awareness to f_model and thus V1 and V2
# train
# evaluate and convert
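
For concreteness, here's a rough sketch using PyTorch's eager-mode torch.quantization API. I haven't tested this against FeatherNet's internals, so treat the qconfig/prepare_qat/convert steps as an assumption about how the two compose (resnet50 from torchvision is just a stand-in for the base model, and the FeatherNet import from feathermap is omitted here):

import torch
import torch.quantization as tq
from torchvision.models import resnet50

base_model = resnet50()
f_model = FeatherNet(base_model, compress=0.10)

# Attach a QAT qconfig and insert fake-quant/observer modules; training then
# updates V1 and V2 while int8 quantization is emulated in the forward pass.
f_model.train()
f_model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(f_model, inplace=True)

# ... usual training loop over f_model goes here ...

# Keep emulating quantization for evaluation, then convert to real int8
# weights only when you're ready to deploy.
f_model.eval()
int8_model = tq.convert(f_model)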


varun19299 commented on July 19, 2024

Thank you for your reply!

How about evaluation: what order would it follow?


varun19299 commented on July 19, 2024

I'll also try comparing this method to iterative pruning ("To prune or not to prune", Zhu et al. 2017) and some dynamic sparse training techniques (RigL, Evci et al. 2020).


phelps-matthew commented on July 19, 2024

> Thank you for your reply!
>
> How about evaluation: what order would it follow?

For accuracy and other metric evaluation, you can make use of the GPU if you keep the model in f_model.eval() mode. However, if you want to benchmark inference time, then you'd want to use f_model.deploy(). Presumably one would only need to actually go to reduced precision when deploying - if QAT can continue to emulate quantization during evaluation, I'd do that.
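
In code, the order I have in mind is roughly the following (val_loader and the metric bookkeeping are placeholders, and I haven't verified how QAT's fake-quant behaves here):

import torch

f_model.eval()  # accuracy/metric evaluation on GPU; quantization still emulated
with torch.no_grad():
    for images, labels in val_loader:
        outputs = f_model(images)
        # ... accumulate accuracy or other metrics ...

# Switch to deploy mode only when benchmarking inference time; this is where
# you'd actually drop to reduced precision.
f_model.deploy()
# ... timed forward passes go here ...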


phelps-matthew commented on July 19, 2024

> I'll also try comparing this method to iterative pruning ("To prune or not to prune", Zhu et al. 2017) and some dynamic sparse training techniques (RigL, Evci et al. 2020).

Awesome. One of the cool things about FeatherMap is the ability to compound it with other compression methods. I'm very curious to see what kind of performance you might get compared to 'unstacked' compression methods.


varun19299 commented on July 19, 2024

> For accuracy and other metric evaluation, you can make use of the GPU if you keep the model in f_model.eval() mode. However, if you want to benchmark inference time, then you'd want to use f_model.deploy(). Presumably one would only need to actually go to reduced precision when deploying - if QAT can continue to emulate quantization during evaluation, I'd do that.

I'm actually just interested in compressing the weights V_1 and V_2. So I don't need to worry about eval? (model.state_dict() should have V_1 and V_2?)


phelps-matthew commented on July 19, 2024

Yes, the state_dict will save V1 and V2 as the weights, as well as the batchnorm layers.
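
If you want to double-check, something like this will list what actually gets saved - the exact key names depend on how FeatherNet registers its parameters and buffers:

import torch

# V1, V2, and the batchnorm parameters/buffers should show up here.
for name, tensor in f_model.state_dict().items():
    print(name, tuple(tensor.shape))

# Checkpoint just the compressed weights; the filename is arbitrary.
torch.save(f_model.state_dict(), "feathernet_checkpoint.pt")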

