
Comments (8)

phelps-matthew commented on July 19, 2024

Yep, your intuition is correct - ideally one would aim to quantize during training (or emulate quantization). I would surmise that post-training quantization would perform poorly on a model that's been compressed via structured multi-hashing (SMH), due to the nonlinear mapping of weights to the reduced weight matrices (V1 and V2).

In theory, there shouldn't be any issue doing both SMH and quantization - however, I haven't dug into the implementation internals of QAT to confirm whether it's compatible with FeatherMap out of the box. Let me know how it goes!


phelps-matthew commented on July 19, 2024

On further thought, I think one should apply QAT after wrapping in FeatherMap. FeatherNet, as a layer, will just expose self.V1 and self.V2 as the weights to be updated, which should then be quantized (or have quantization emulated) and trained. E.g., something like

base_model = ResNet50()
f_model = FeatherNet(base_model, compress=0.10)
# now apply quantization awareness to f_model and thus V1 and V2
# train
# evaluate and convert
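
For concreteness, here's a rough sketch using PyTorch's eager-mode torch.quantization API. I haven't tested this against FeatherNet's internals, so treat the qconfig/prepare_qat/convert steps as an assumption about how the two compose (resnet50 from torchvision is just a stand-in for the base model, and the FeatherNet import from feathermap is omitted here):

import torch
import torch.quantization as tq
from torchvision.models import resnet50

base_model = resnet50()
f_model = FeatherNet(base_model, compress=0.10)

# Attach a QAT qconfig and insert fake-quant/observer modules; training then
# updates V1 and V2 while int8 quantization is emulated in the forward pass.
f_model.train()
f_model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(f_model, inplace=True)

# ... usual training loop over f_model goes here ...

# Keep emulating quantization for evaluation, then convert to real int8
# weights only when you're ready to deploy.
f_model.eval()
int8_model = tq.convert(f_model)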


varun19299 commented on July 19, 2024

Thank you for your reply!

How about evaluation: what order would it follow?


varun19299 commented on July 19, 2024

I'll also try comparing this method to iterative pruning ("To prune or not to prune", Zhu et al. 2017) and some dynamic sparse training techniques (RigL, Evci et al. 2020).


phelps-matthew commented on July 19, 2024

> Thank you for your reply!
>
> How about evaluation: what order would it follow?

For accuracy and other metric evaluation, you can make use of the GPU if you keep the model in f_model.eval() mode. However, if you want to benchmark inference time, then you'd want to use f_model.deploy(). Presumably one would only need to actually go to reduced precision when deploying - if QAT can continue to emulate quantization during evaluation, I'd do that.
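
In code, the order I have in mind is roughly the following (val_loader and the metric bookkeeping are placeholders, and I haven't verified how QAT's fake-quant behaves here):

import torch

f_model.eval()  # accuracy/metric evaluation on GPU; quantization still emulated
with torch.no_grad():
    for images, labels in val_loader:
        outputs = f_model(images)
        # ... accumulate accuracy or other metrics ...

# Switch to deploy mode only when benchmarking inference time; this is where
# you'd actually drop to reduced precision.
f_model.deploy()
# ... timed forward passes go here ...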


phelps-matthew commented on July 19, 2024

> I'll also try comparing this method to iterative pruning ("To prune or not to prune", Zhu et al. 2017) and some dynamic sparse training techniques (RigL, Evci et al. 2020).

Awesome. One of the cool things about FeatherMap is the ability to compound it with other compression methods. I'm very curious to see what kind of performance you might get compared to 'unstacked' compression methods.


varun19299 commented on July 19, 2024

> For accuracy and other metric evaluation, you can make use of the GPU if you keep the model in f_model.eval() mode. However, if you want to benchmark inference time, then you'd want to use f_model.deploy(). Presumably one would only need to actually go to reduced precision when deploying - if QAT can continue to emulate quantization during evaluation, I'd do that.

I'm actually just interested in compressing the weights V_1 and V_2. So I don't need to worry about eval? (model.state_dict() should have V_1 and V_2?)


phelps-matthew commented on July 19, 2024

Yes, the state_dict will save V1 and V2 as the weights, as well as the batchnorm layers.
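
If you want to double-check, something like this will list what actually gets saved - the exact key names depend on how FeatherNet registers its parameters and buffers:

import torch

# V1, V2, and the batchnorm parameters/buffers should show up here.
for name, tensor in f_model.state_dict().items():
    print(name, tuple(tensor.shape))

# Checkpoint just the compressed weights; the filename is arbitrary.
torch.save(f_model.state_dict(), "feathernet_checkpoint.pt")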

