megvii-research / sparsebit
A model compression and acceleration toolbox based on PyTorch.
License: Apache License 2.0
My setup:
CUDA 10.2
Python 3.8

Install Sparsebit with:

    git clone https://github.com/megvii-research/Sparsebit.git
    cd sparsebit
    python3 setup.py develop --user
    pip3 install tensorrt-8.2.5.1-cp38-none-linux_x86_64.whl

After installing, I ran /root/Sparsebit/examples/cifar10_ptq/main.ipynb, which raised "RuntimeError: Ninja is required to load C++ extensions", so I also ran pip install ninja. Running main.ipynb again raised the following error:
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/root/Sparsebit/examples/cifar10_ptq/main.ipynb Cell 2 in <cell line: 24>()
     21 import torchvision.datasets as datasets
     22 from model import resnet20
---> 24 from sparsebit.quantization import QuantModel, parse_qconfig
File ~/Sparsebit/sparsebit/quantization/__init__.py:1, in <module>
----> 1 from .quant_model import *
      2 from .quant_config import parse_qconfig
File ~/Sparsebit/sparsebit/quantization/quant_model.py:18, in <module>
     15 import onnx
     17 from sparsebit.utils import update_config
---> 18 from sparsebit.quantization.modules import *
     19 from sparsebit.quantization.observers import Observer
     20 from sparsebit.quantization.quantizers import Quantizer
File ~/Sparsebit/sparsebit/quantization/modules/__init__.py:16, in <module>
     12 return real_register
     15 # register the module files that need registration here
---> 16 from .base import QuantOpr, MultipleInputsQuantOpr
     17 from .activations import *
     18 from .conv import *
...
-> 1775 module = importlib.util.module_from_spec(spec)
   1776 assert isinstance(spec.loader, importlib.abc.Loader)
   1777 spec.loader.exec_module(module)
ImportError: /root/Sparsebit/sparsebit/quantization/torch_extensions/build/fake_quant.so: cannot open shared object file: No such file or directory
Any plans for FQ-ViT?
This looks like a low-cost setup. For training LLaMA 7B, is a 2080 Ti strictly required (because of its 11 GB of memory), or would eight GPUs with 8 GB each also work? Could you share the full configuration of your machine?
Because I am using a server in my lab, I don't have sudo access. When I try to run the setup, it shows fatal error: cuda.h: No such file or directory.
Should I install a full CUDA toolkit?
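For what it's worth, a quick user-level check (no sudo needed) is to see where PyTorch's extension builder thinks the CUDA toolkit lives; the cuda.h error usually means CUDA_HOME is unset or points at a runtime-only install without headers:

    # Minimal sketch: inspect the CUDA_HOME that torch's C++ extension builder uses.
    from torch.utils.cpp_extension import CUDA_HOME
    print(CUDA_HOME)  # None, or a headerless path, explains "cuda.h: No such file or directory"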
python3 main.py qconfig_lsq.yaml --epochs=0
BACKEND: virtual
W:
  QSCHEME: per-channel-symmetric
  QUANTIZER:
    TYPE: lsq
    BIT: 4
A:
  QSCHEME: per-tensor-affine
  QUANTIZER:
    TYPE: lsq
    BIT: 4
QADD:
  ENABLE_QUANT: true
Traceback (most recent call last):
File "main.py", line 439, in <module>
main()
File "main.py", line 223, in main
qmodel.export_onnx(
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 257, in export_onnx
self.add_extra_info_to_onnx(name)
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 313, in add_extra_info_to_onnx
weight_dequant = nodes[tensor_inputs[onnx_op.input[1]][0]]
IndexError: list index (1) out of range
Thank you for opening this repo.
Will you support lower-bit (2/4-bit) quantization?
There is a bug here. I thought of a simple way to fix it, which is applicable to QAT of ViT:

    elif "input_quantizer.scale" in dict(_module.state_dict()).keys():
        _module.input_quantizer.set_fake_fused()  # bug: quant_state flips back and forth
    else:
        print("no_set_fake_fused:", _user.name, _module.input_quantizer_generated)
For example, there is a layer constructed as follows:

    while len(emb_out.shape) < len(h.shape):
        emb_out = emb_out[..., None]

When quantizing this layer, we hit this error:

    torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
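For context, torch.fx cannot trace Python control flow that depends on a Proxy's shape. One common workaround (a minimal sketch of the general technique, not sparsebit's own fix; the function name is mine) is to move the shape-dependent loop into a function excluded from tracing with torch.fx.wrap:

    import torch
    import torch.fx

    @torch.fx.wrap
    def broadcast_like(emb_out, h):
        # Wrapped functions are recorded as opaque call_function nodes, so the
        # len()/while on shapes runs eagerly instead of being traced.
        while len(emb_out.shape) < len(h.shape):
            emb_out = emb_out[..., None]
        return emb_out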
I rewrote the related code in main.py as follows:

    # set the head and tail of the model to 8 bits
    model.model.patch_embed_proj.weight_quantizer.set_bit(bit=8)
    model.model.head.input_quantizer.set_bit(bit=8)
    model.model.head.weight_quantizer.set_bit(bit=8)
    # model.model.conv1.weight_quantizer.set_bit(bit=8)
    # model.model.fc.input_quantizer.set_bit(bit=8)
    # model.model.fc.weight_quantizer.set_bit(bit=8)

and trained for 90 epochs, but I can't reproduce the result given in the README.
Homework answer, Q3. In this code:

    data.transpose(self.qdesc.ch_axis, 0)

the 0 should be 1: according to the code, the first axis (0) is the calibration batch size and the second axis (1) is the channel axis.
Also, self.qdesc.ch_axis == 0 indicates an observer for weights, which should not be handled here.
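A minimal sketch of the point above (the tensor layout is assumed from the description: calibration data stacked as (batch, channel, ...); the helper name is mine):

    import torch

    def per_channel_rows(data: torch.Tensor, ch_axis: int = 1) -> torch.Tensor:
        # Move the channel axis to the front, then flatten the rest, so each
        # row collects every calibration value belonging to one channel.
        return data.transpose(ch_axis, 0).flatten(1)

    calib = torch.randn(128, 64, 8, 8)    # (calibration batch, channels, H, W)
    print(per_channel_rows(calib).shape)  # torch.Size([64, 8192])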
Running the following code raises an error:
import torchvision
import torch
from sparsebit.quantization import QuantModel, parse_qconfig
qconfig_path = "./qconfig.yaml"
# BACKEND: virtual
# W:
#   QSCHEME: per-channel-symmetric
#   QUANTIZER:
#     TYPE: lsq
#     BIT: 4
# A:
#   QSCHEME: per-tensor-affine
#   QUANTIZER:
#     TYPE: lsq
#     BIT: 4
# QADD:
#   ENABLE_QUANT: true
model = torchvision.models.densenet121(pretrained=True)
qconfig = parse_qconfig(qconfig_path)
model = QuantModel(model, config=qconfig)
inp = torch.randn(2, 3, 224, 224)
out = model(inp)
python3 main.py qconfig_lsq.yaml --epochs=0
Traceback (most recent call last):
File "main.py", line 428, in <module>
main()
File "main.py", line 219, in main
qmodel.export_onnx(
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 254, in export_onnx
self.add_extra_info_to_onnx(name)
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 298, in add_extra_info_to_onnx
input_dequant = nodes[tensor_inputs[onnx_op.input[0]][0]]
KeyError: 'conv1.weight_quantizer.scale'
It shows the following error:
Traceback (most recent call last):
File "/home/missa/dev/Sparsebit/large_language_models/alpaca-qlora/generate.py", line 29, in
model = PeftQModel.from_pretrained(
File "/home/missa/miniconda3/envs/sparsebitv6/lib/python3.9/site-packages/peft/peft_model.py", line 135, in from_pretrained
config = PEFT_TYPE_TO_CONFIG_MAPPING[PeftConfig.from_pretrained(model_id).peft_type].from_pretrained(model_id)
File "/home/missa/miniconda3/envs/sparsebitv6/lib/python3.9/site-packages/peft/utils/config.py", line 95, in from_pretrained
if os.path.isfile(os.path.join(pretrained_model_name_or_path, CONFIG_NAME)):
File "/home/missa/miniconda3/envs/sparsebitv6/lib/python3.9/posixpath.py", line 76, in join
a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
I see CHECKPOINT_PATH = None in generate.py, is this expected?
Code:

    B = int(windows.shape[0] / (H * W / window_size / window_size))

Error:

    TypeError: int() argument must be a string, a bytes-like object or a number, not 'Proxy'
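This happens during torch.fx symbolic tracing: windows.shape[0] is a Proxy, and int() on a Proxy cannot be traced. One possible rewrite (a sketch, assuming H, W, and window_size are plain Python ints at trace time; the function name is mine) keeps the expression symbolic by using integer floor division:

    import torch

    def window_batch(windows: torch.Tensor, H: int, W: int, window_size: int):
        # Floor division on a Proxy stays a traceable op; int(Proxy) does not.
        return windows.shape[0] // (H * W // window_size // window_size)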
Thank you for this great share!
Do you have any plans to add English Readme/Docs?
Does this work for the Swin Transformer?
Homework, Q4: export the ONNX file and convert it to a TensorRT engine. I used trtexec --workspace=4096 --int8 --onnx=./qresnet18.onnx with TensorRT 8.2.5 and 8.0.3, but encountered an unsupported op, as follows:
[07/30/2022-08:13:33] [I] TensorRT version: 8003
[07/30/2022-08:13:33] [I] [TRT] [MemUsageChange] Init CUDA: CPU +250, GPU +0, now: CPU 257, GPU 482 (MiB)
[07/30/2022-08:13:33] [I] Start parsing network model
[07/30/2022-08:13:33] [I] [TRT] ----------------------------------------------------------------
[07/30/2022-08:13:33] [I] [TRT] Input filename:   ./qresnet18.onnx
[07/30/2022-08:13:33] [I] [TRT] ONNX IR version:  0.0.7
[07/30/2022-08:13:33] [I] [TRT] Opset version:    13
[07/30/2022-08:13:33] [I] [TRT] Producer name:    pytorch
[07/30/2022-08:13:33] [I] [TRT] Producer version: 1.12.0
[07/30/2022-08:13:33] [I] [TRT] Domain:
[07/30/2022-08:13:33] [I] [TRT] Model version:    0
[07/30/2022-08:13:33] [I] [TRT] Doc string:
[07/30/2022-08:13:33] [I] [TRT] ----------------------------------------------------------------
[07/30/2022-08:13:33] [E] Error[3]: onnx::QuantizeLinear_710: invalid weights type of Int8
[07/30/2022-08:13:33] [E] [TRT] ModelImporter.cpp:720: While parsing node number 0 [Identity -> "onnx::QuantizeLinear_872"]:
[07/30/2022-08:13:33] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[07/30/2022-08:13:33] [E] [TRT] ModelImporter.cpp:722: input: "onnx::QuantizeLinear_710" output: "onnx::QuantizeLinear_872" name: "Identity_0" op_type: "Identity"
Do I need to add some other settings?
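The parser is rejecting an int8 weight initializer routed through an Identity node. One hedged workaround (a sketch, not an official fix, and it may not cover every case; the file name comes from the command above) is to fold such Identity nodes with onnx-simplifier before running trtexec:

    import onnx
    from onnxsim import simplify  # pip install onnx-simplifier

    model = onnx.load("./qresnet18.onnx")
    simplified, ok = simplify(model)  # folds constants and pass-through Identity nodes
    assert ok, "simplified model failed the checker"
    onnx.save(simplified, "./qresnet18_sim.onnx")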
import torchvision
import torch
from sparsebit.quantization import QuantModel, parse_qconfig
qconfig_path = "./qconfig_lsq.yaml"
# BACKEND: virtual
# W:
#   QSCHEME: per-channel-symmetric
#   QUANTIZER:
#     TYPE: lsq
#     BIT: 4
# A:
#   QSCHEME: per-tensor-affine
#   QUANTIZER:
#     TYPE: lsq
#     BIT: 4
# QADD:
#   ENABLE_QUANT: true
model = torchvision.models.mobilenet_v2(pretrained=True)
qconfig = parse_qconfig(qconfig_path)
qmodel = QuantModel(model, config=qconfig)
qmodel.eval()
inp = torch.randn(2, 3, 224, 224)
out = qmodel(inp)
print(out.shape)
with torch.no_grad():
    qmodel.export_onnx(
        inp, name="mobilenet_v2_4w4f.onnx", extra_info=True
    )
Traceback (most recent call last):
File "dump_onnx.py", line 17, in <module>
qmodel.export_onnx(
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 256, in export_onnx
self.add_extra_info_to_onnx(name)
File "/data/Project/Sparsebit/sparsebit/quantization/quant_model.py", line 294, in add_extra_info_to_onnx
onnx_op = onnx_model.graph.node[op_pos]
IndexError: list index (914) out of range
This issue is easily reproduced with the QAT example, with quant min/max disabled in the PyTorch ONNX operator. The error message comes from quantizers/quant_tensor.py and says the dimensions of scale and zero_point are inconsistent with the input tensor. I checked the scale and zero_point shapes of the first-layer convolution and got [3136, 1, 1, 1], where 3136 = 64*7*7, instead of [64, 1, 1, 1]...
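For reference, a minimal sketch of the expected shapes (the 64-output-channel 7x7 conv is assumed from the numbers in the report): a per-channel weight quantizer should carry one scale per output channel, which is what PyTorch's per-channel fake-quant op requires:

    import torch

    weight = torch.randn(64, 3, 7, 7)               # first conv: 64 output channels
    scale = torch.ones(64)                          # one scale per output channel, not 64*7*7
    zero_point = torch.zeros(64, dtype=torch.int32)
    q = torch.fake_quantize_per_channel_affine(
        weight, scale, zero_point, axis=0, quant_min=-128, quant_max=127
    )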
The documentation link cannot be opened.
File "main.py", line 282, in
main()
File "main.py", line 148, in main
qmodel.export_onnx(
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quant_model.py", line 260, in export_onnx
self.add_extra_info_to_onnx(name)
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quant_model.py", line 304, in add_extra_info_to_onnx
input_dequant = nodes[tensor_inputs[onnx_op.input[0]][0]]
KeyError: 'onnx::QuantizeLinear_711'
At this line:

    elif n.op == "call_function":
        new_module = QMODULE_MAP[n.target](n)  # the node is passed in as a module to fetch related parameters

KeyError: <built-in method conv2d of type object at 0x7fd352f76780>
An error occurs when I run cifar10_qat_pact/main.py, as follows:
Traceback (most recent call last):
File "/root/miniconda3/envs/sb/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/sb/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
..........
File "/root/Sparsebit/examples/cifar10_qat_pact/main.py", line 311, in <module>
train(
File "/root/Sparsebit/examples/cifar10_qat_pact/main.py", line 149, in train
output = model(images)
File "/root/miniconda3/envs/sb/lib/python3.8/site-packages/torch-1.11.0-py3.8-linux-x86_64.egg/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/Sparsebit/sparsebit/quantization/quant_model.py", line 198, in forward
return self.model.forward(*args)
File "<eval_with_key>.129", line 8, in forward
File "/root/miniconda3/envs/sb/lib/python3.8/site-packages/torch-1.11.0-py3.8-linux-x86_64.egg/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/Sparsebit/sparsebit/quantization/modules/conv.py", line 39, in forward
x_in = self.input_quantizer(x_in)
File "/root/miniconda3/envs/sb/lib/python3.8/site-packages/torch-1.11.0-py3.8-linux-x86_64.egg/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/root/Sparsebit/sparsebit/quantization/quantizers/base.py", line 54, in forward
x_dq = self._forward(x, scale, zero_point)
TypeError: _forward() takes 2 positional arguments but 4 were given
I found that PACT's _forward is defined as def _forward(self, x): and accepts only two positional arguments, while DoReFa's accepts four: def _forward(self, x, scale, zero_point):. In fact, DoReFa also only needs two of them. So I see two possible fixes:

Either: change PACT's def _forward(self, x): to def _forward(self, x, scale, zero_point):

Or: change DoReFa's def _forward(self, x, scale, zero_point): to def _forward(self, x): and change x_dq = self._forward(x, scale, zero_point) to:

    if self.TYPE in ("PACT", "DoReFa"):
        x_dq = self._forward(x)
    else:
        x_dq = self._forward(x, scale, zero_point)

Either change resolves the error.
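A self-contained sketch of the first option (class and attribute names are assumed, not copied from sparsebit): keep the base class's calling convention and simply ignore the arguments PACT does not use:

    import torch

    class PACTQuantizer:
        def __init__(self, alpha: float = 6.0):
            self.alpha = torch.nn.Parameter(torch.tensor(alpha))  # learnable clip value

        def _forward(self, x, scale, zero_point):
            # Same signature the base Quantizer's forward expects; PACT only
            # clips to [0, alpha], so scale/zero_point are accepted but unused.
            return torch.minimum(torch.relu(x), self.alpha)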
With no modifications, I used your PTQ code to export a DeiT ONNX model, but an error occurs when running inference on the ONNX model with onnxruntime:
onnxruntime.capi.onnxruntime_pybind11_state.InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:QuantizeLinear_2 : No Op registered for QuantizeLinear with domain_version of 13
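The message means the installed onnxruntime build does not register QuantizeLinear for opset 13, which landed in later releases. A quick hedged check (the file name is hypothetical; upgrading onnxruntime is one likely fix):

    import onnx
    import onnxruntime as ort

    model = onnx.load("deit_ptq.onnx")  # hypothetical file name
    print("model opsets:", [(o.domain, o.version) for o in model.opset_import])
    print("onnxruntime:", ort.__version__)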
/opt/python3.8.6/bin/python /home/hongyang/codebase/quantization_code/Sparsebit/examples/quantization_aware_training/cifar10/basecase/main.py
Traceback (most recent call last):
File "/opt/python3.8.6/bin/ninja", line 33, in <module>
sys.exit(load_entry_point('ninja', 'console_scripts', 'ninja')())
File "/opt/python3.8.6/lib/python3.8/site-packages/ninja-1.11.1-py3.8-linux-x86_64.egg/ninja/__init__.py", line 51, in ninja
raise SystemExit(_program('ninja', sys.argv[1:]))
File "/opt/python3.8.6/lib/python3.8/site-packages/ninja-1.11.1-py3.8-linux-x86_64.egg/ninja/__init__.py", line 47, in _program
return subprocess.call([os.path.join(BIN_DIR, name)] + args, close_fds=False)
File "/opt/python3.8.6/lib/python3.8/subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "/opt/python3.8.6/lib/python3.8/subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/python3.8.6/lib/python3.8/subprocess.py", line 1592, in _execute_child
self._posix_spawn(args, executable, env, restore_signals,
File "/opt/python3.8.6/lib/python3.8/subprocess.py", line 1543, in _posix_spawn
self.pid = os.posix_spawn(executable, args, env, **kwargs)
PermissionError: [Errno 13] Permission denied: '/opt/python3.8.6/lib/python3.8/site-packages/ninja-1.11.1-py3.8-linux-x86_64.egg/ninja/data/bin/ninja'
Traceback (most recent call last):
File "/home/hongyang/codebase/quantization_code/Sparsebit/examples/quantization_aware_training/cifar10/basecase/main.py", line 23, in <module>
from sparsebit.quantization import QuantModel, parse_qconfig
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/__init__.py", line 1, in <module>
from .quant_model import *
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quant_model.py", line 18, in <module>
from sparsebit.quantization.modules import *
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/modules/__init__.py", line 17, in <module>
from .base import QuantOpr, MultipleInputsQuantOpr
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/modules/base.py", line 4, in <module>
from sparsebit.quantization.quantizers import build_quantizer
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quantizers/__init__.py", line 9, in <module>
from .base import Quantizer
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quantizers/base.py", line 4, in <module>
from sparsebit.quantization.observers import build_observer
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/observers/__init__.py", line 10, in <module>
from . import minmax, percentile, mse, moving_average, kl_histogram, aciq
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/observers/mse.py", line 6, in <module>
from sparsebit.quantization.quantizers.quant_tensor import STE
File "/home/hongyang/codebase/quantization_code/Sparsebit/sparsebit/quantization/quantizers/quant_tensor.py", line 13, in <module>
fake_quant_kernel = load(
File "/opt/python3.8.6/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1202, in load
return _jit_compile(
File "/opt/python3.8.6/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1425, in _jit_compile
_write_ninja_file_and_build_library(
File "/opt/python3.8.6/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1506, in _write_ninja_file_and_build_library
verify_ninja_availability()
File "/opt/python3.8.6/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1562, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
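Both tracebacks above come from the same mechanism: sparsebit's quant_tensor.py JIT-compiles its fake-quant kernel with torch.utils.cpp_extension.load, which writes a build.ninja file and shells out to the ninja binary. A minimal sketch of the same pattern (the kernel source path is hypothetical); note that the PermissionError above means the ninja executable bundled in the egg lacks the execute bit, so making it executable is one likely fix:

    from torch.utils.cpp_extension import load, verify_ninja_availability

    verify_ninja_availability()  # raises "Ninja is required to load C++ extensions" if ninja is missing
    kernel = load(
        name="fake_quant",
        sources=["torch_extensions/fake_quant.cpp"],  # hypothetical path
        verbose=True,  # prints the ninja build commands for debugging
    )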