ONNX Model Compressor
Quantization Tool Proposal
Intel Neural Compressor (INC) is a tool for generating optimized ONNX models; it supports techniques such as post-training quantization (PTQ) and quantization-aware training (QAT). The tool can also be used for distillation and pruning to generate sparse, quantized ONNX models. It has broad model coverage (300+ models) spanning key domains such as vision, NLP, and recommendation systems. Since its release, INC has seen high popularity in the ONNX community. It has also been integrated into the Hugging Face Optimum pipeline, and it is the tool used to produce the int8 quantized models in the ONNX Model Zoo.
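At the tensor level, the PTQ technique mentioned above reduces to an affine mapping from float32 to int8 using a scale and zero-point derived from calibration data. The following is a minimal, self-contained sketch of that math, for illustration only; it is not the INC API, and the function names are hypothetical:

```python
import numpy as np

def compute_qparams(calib_data, num_bits=8):
    """Derive an affine scale/zero-point from the observed calibration range."""
    qmin, qmax = 0, 2 ** num_bits - 1        # uint8 target range
    rmin = min(float(calib_data.min()), 0.0)  # range must include zero so that
    rmax = max(float(calib_data.max()), 0.0)  # 0.0 is exactly representable
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map float32 values onto the uint8 grid."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from the uint8 grid."""
    return (q.astype(np.float32) - zero_point) * scale
```

A real PTQ tool applies this per tensor (or per channel) across the whole graph, choosing ranges from calibration batches; the round-trip error of each value is bounded by one quantization step (the scale).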
While the ONNX ecosystem is seeing high adoption in industry, there hasn't been significant community contribution toward ONNX model compression tooling. Intel therefore wants to contribute an open-source project to the ONNX community that can help accelerate deployment of sparse and quantized ONNX models.
Proposal
Migrate Intel Neural Compressor to https://github.com/onnx/neural-compressor
Maintain vendor-neutral branding (Neural Compressor) and welcome community contributions to enhance Neural Compressor with broader hardware support.
Questions:
- (Question proposed by Tangri, Saurabh from Intel) How would Neural Compressor scale to non-Intel hardware?
- (Question proposed by Tangri, Saurabh from Intel) Why not remove support for non-ONNX models in Neural Compressor?
  Answer: (by Tangri, Saurabh from Intel) I feel interoperability has been a strength of the ONNX standard since its inception, and a quantization tool that supports other frameworks should be seen as an expression of that same openness. Yes, we can remove/move non-ONNX perf data on the landing page so we don't appear to be promoting non-ONNX frameworks.
  Follow-up question: How are model pruning and distillation related to ONNX?
- Regarding requirements in "Rules for all repos" and "Requirements for new, contributed repos": who will be actively maintaining the repo?
- "There are some questions raised about the tool, particularly around expansion to non-Intel hardware." How can the tool be expanded to non-Intel hardware?
- (Gary from Microsoft) Some of the Intel code is redundant with what we have in microsoft/onnxruntime and microsoft/onnxconverter-common. I think it would be better to collaborate on one set of tools. How will the tool be used by the converters and onnxruntime?
Rules for all repos
- Must be owned and managed by one of the ONNX SIGs (ArchInfra SIG)
- Must be actively maintained (Who will be actively maintaining the repo?)
- Must adopt the ONNX Code of Conduct (check)
- Must adopt the standard ONNX license(s) (already Apache-2.0 licensed)
- Must adopt the ONNX CLA bot (check)
- Must adopt all ONNX automation (like LGTM) (check)
- Must have CI or other automation in place for repos containing code to ensure quality (needs CI pipelines with good code coverage)
- All OWNERS must be members in standing as defined by the ability to vote in Steering Committee elections. (check)
Requirements for new, contributed repos
We are happy to accept contributions as repos under the ONNX organization of new projects that meet the following requirements:
- Project is closely related to ONNX ((Question proposed by Tangri, Saurabh from Intel) Why not remove support for non-ONNX models in Neural Compressor?)
- Adds value to the ONNX ecosystem (check)
- Determined to need a new repo rather than a folder in an existing repo (Is it possible to move into ONNX Optimizer?)
- All contributors must have signed the ONNX CLA (check)
- Licenses of dependencies must be acceptable (check)
- Commitment to maintain the repo (Who will be actively maintaining the repo?)
- Approval of the SIG that will own the repo
- Approval of the Steering Committee