This is where I keep my research/productive work.
For more information about me see my website
Tinkering, partially done projects and non-professional work can be found under my private account
GPU-accelerated Neural Network layers using Approximate Multiplications for PyTorch
Home Page: https://etrommer.de/torch-approx
License: MIT License
TorchApprox and adaPT need to be compared in terms of runtime.
This should be kept on a separate branch to not interfere with productive code.
The noise mode of the layer currently adds a zero-mean noise tensor with a learnable standard deviation.
torch-approx/src/torchapprox/layers/approx_layer.py
Lines 50 to 60 in 5740d50
This is quite specific to the origin of torch-approx as a backend for AGN Approx. To make it more generally useful, this feature should be moved to a separate branch, and the noise implementation should be replaced with a more generic interface that adds Gaussian noise with a fixed mean and standard deviation to the layer output.
torch-approx/src/torchapprox/layers/approx_layer.py
Lines 48 to 74 in 8284cf7
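As a rough sketch of what such a generic interface could look like, the module below adds Gaussian noise with a fixed (non-learnable) mean and standard deviation to whatever tensor it receives. The class name `GaussianOutputNoise` and its parameters are assumptions made for illustration; they are not part of torch-approx.

```python
import torch
from torch import nn


class GaussianOutputNoise(nn.Module):
    """Hypothetical helper: adds N(mean, std^2) noise to a layer output."""

    def __init__(self, mean: float = 0.0, std: float = 0.1) -> None:
        super().__init__()
        if std < 0:
            raise ValueError("std must be non-negative")
        # Plain Python floats: fixed values, not learnable parameters
        self.mean = mean
        self.std = std

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # randn_like draws standard-normal samples with y's shape, dtype and device
        noise = self.mean + self.std * torch.randn_like(y)
        return y + noise


# Usage: chain the noise module behind an ordinary layer
layer = nn.Sequential(nn.Linear(20, 10), GaussianOutputNoise(mean=0.0, std=0.05))
y = layer(torch.rand(4, 20))  # output keeps the shape of the wrapped layer: (4, 10)
```

Keeping the noise in a separate module means the wrapped layer does not need to know about it, which may make it easier to maintain on a separate branch.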
A comparison with TFApprox was requested.
This should be kept separate from productive code, similar to #6.
The test case is not fully defined yet. Most likely scenario: comparison of Conv2D inference speed.
Benchmarking has shown that Im2Col + ApproxGeMM is extremely slow for depthwise-separable convolutions.
This should be addressed by adding dedicated approximate DWConv operators.
The accurate FP32 DWConv operators should be used as a template.
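One plausible contributor to this slowdown (an assumption, not a measured result) is that an Im2Col implementation which loops over groups issues one GEMM per group, and for a depthwise convolution each of those GEMMs is tiny. The sketch below only does the shape arithmetic; `GemmWorkload` and `im2col_workload` are hypothetical names, and the layer sizes are the ones that appear in the test configurations in the log further down.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GemmWorkload:
    num_gemms: int  # GEMM calls per forward pass (one per group)
    m: int          # rows: output channels per group
    k: int          # reduction dim: input channels per group * kernel area
    n: int          # columns: output pixels per image

    @property
    def macs_per_gemm(self) -> int:
        """Multiply-accumulates performed by a single GEMM call, per image."""
        return self.m * self.k * self.n


def im2col_workload(
    in_channels: int,
    out_channels: int,
    kernel_size: Tuple[int, int],
    out_dims: Tuple[int, int],
    groups: int,
) -> GemmWorkload:
    """Shape of the per-group GEMMs for an Im2Col-based grouped convolution."""
    kh, kw = kernel_size
    oh, ow = out_dims
    return GemmWorkload(
        num_gemms=groups,
        m=out_channels // groups,
        k=(in_channels // groups) * kh * kw,
        n=oh * ow,
    )


dense = im2col_workload(8, 16, (3, 3), (2, 2), groups=1)
depthwise = im2col_workload(8, 8, (3, 3), (2, 2), groups=8)
print(dense, dense.macs_per_gemm)
print(depthwise, depthwise.macs_per_gemm)
```

For these sizes the dense layer maps to a single 16x72x4 GEMM, while the depthwise layer maps to eight 1x9x4 GEMMs, so per-call overhead is likely to dominate. A dedicated DWConv kernel avoids splitting the work this way.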
Hi etrommer, I ran into errors when running the unit tests with "poetry run pytest test". I installed poetry in a conda environment (python=3.10.13) and cloned your code. Then I installed the packages with "poetry install --with "dev,extras"" and installed the additional dependencies as well as the pre-commit hooks without problems. However, several unit tests fail and then all remaining tests report errors. I also ran the benchmark; there are a few failures, but most of the rest seems fine. Could you help me solve the errors? Thanks :)
My CUDA version is 11.7, and the following is the output log of the unit tests and the benchmark.
============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/zhaojun/torch-approx
configfile: pyproject.toml
plugins: cov-3.0.0, benchmark-4.0.0
collected 436 items
test/test_approx_layer.py .............................FFFFFFEEEEEEEEEEE [ 10%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 27%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 37%]
test/test_approx_mm.py EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 48%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 64%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 81%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 96%]
test/test_dwconv2d.py EEEEEEEEEEEEEE [100%]
==================================== ERRORS ====================================
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig0-layer_config6] _____
@pytest.fixture(autouse=True)
def fix_seed():
"""
Run before every test.
- Fixes random seed to make test reproducible
- Sets CUDA to blocking to allow for benchmarking of normally asynchronous kernels
"""
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
np.random.seed(42)
torch.manual_seed(42)
test/conftest.py:36:
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/random.py:40: in manual_seed
torch.cuda.manual_seed_all(seed)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:113: in manual_seed_all
_lazy_call(cb, seed_all=True)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/__init__.py:183: in _lazy_call
callable()
def cb():
for i in range(device_count()):
default_generator = torch.cuda.default_generators[i]
default_generator.manual_seed(seed)
E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:111: RuntimeError
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig1-layer_config0] _____
[............................... similar errors .............................................]
=================================== FAILURES ===================================
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config0] ______________
device = 'cuda'
layer_config = (<class 'torch.nn.modules.linear.Linear'>, (4, 20), (20, 10), {})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}
@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
input_dims = layer_config[1]
layer, ref_layer = generate_models(layer_config, device, weight_qconfig)
x = torch.rand(input_dims, device=device)
xref = copy.deepcopy(x)
y = layer(x)
test/test_approx_layer.py:165:
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_linear.py:46: in approx_fwd
y = self.approx_op(x, w, quant_params, self.htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/operators/lut.py:82: in forward
return ApproxGeMM.apply(x, w, self.lut, quant_params, htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
x = tensor([[0.8243, 0.2120, 0.7301, 0.3219, 0.7536, 0.2120, 0.9263, 0.1413, 0.3454,
0.9970, 0.5495, 0.2512, 0.92... 0.7536, 0.7065, 0.3297, 0.9106, 0.3925, 0.1727, 0.9813, 0.3690, 0.2591,
0.9185, 0.9891]], device='cuda:0')
w = tensor([[ 0.0501, -0.2194, -0.0449, -0.2056, -0.1538, -0.0086, 0.1054, -0.0415,
0.0086, -0.0950, -0.1158, ...0225, 0.0881, -0.2074]], device='cuda:0',
grad_fn=)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
quant_params = QuantizationParameters(x_scale=tensor([0.0079], device='cuda:0'), x_zero_point=tensor([0], device='cuda:0', dtype=torch.int32), w_scale=tensor([0.0017], device='cuda:0'), w_zero_point=tensor([0], device='cuda:0', dtype=torch.int32))
htp_model = None
@staticmethod
def forward( # type: ignore
x: torch.Tensor,
w: torch.Tensor,
lut: torch.Tensor,
quant_params: "QuantizationParameters",
htp_model: Optional[Callable],
) -> torch.Tensor:
"""
Approximate forward operation
"""
x_q = torch.round((x / quant_params.x_scale) + quant_params.x_zero_point)[
:, None, :
]
w_q = torch.round(
(w / quant_params.w_scale[:, None]) + quant_params.w_zero_point[:, None]
).T
if htp_model is None:
y_q = approx(x_q.char(), w_q.char(), lut).float()
E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
src/torchapprox/operators/approxgemm.py:39: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config1] ______________
device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 1})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}
@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
input_dims = layer_config[1]
layer, ref_layer = generate_models(layer_config, device, weight_qconfig)
x = torch.rand(input_dims, device=device)
xref = copy.deepcopy(x)
y = layer(x)
test/test_approx_layer.py:165:
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)
x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -125., -26.],
[-117., -88., -4.],
[ 60., -24., 5.]],
[[ -54.,...
[[ 22., -123., 89.],
[ 25., 91., -126.],
[ 62., -107., -40.]]]], device='cuda:0')
conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)
def _im2col_conv2d(
x_q: torch.FloatTensor,
w_q: torch.FloatTensor,
conv_args: Conv2dArgs,
lut: torch.ShortTensor,
out_dims: Tuple[int, int],
) -> torch.FloatTensor:
# Pre-allocate output tensor
y_q = torch.empty(
x_q.size(0),
conv_args.out_channels,
math.prod(out_dims),
device=x_q.device,
dtype=torch.int32,
)
w_s8 = w_q.char()
for group in range(conv_args.groups):
# Calculate lower and upper channel index for current group
in_ch_lower, in_ch_upper = _group_limits(
group, conv_args.groups, conv_args.in_channels
)
out_ch_lower, out_ch_upper = _group_limits(
group, conv_args.groups, conv_args.out_channels
)
# Im2Col operation
x_unfold_s8 = torch.nn.functional.unfold(
x_q[
:,
in_ch_lower:in_ch_upper,
:,
],
kernel_size=conv_args.kernel_size,
padding=conv_args.padding,
stride=conv_args.stride,
dilation=conv_args.dilation,
).char()
# Reshape weights to 2D
w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
conv_args.out_channels // conv_args.groups, -1
)
# ApproxGeMM
y_q[:, out_ch_lower:out_ch_upper] = approx(
w_flat_s8,
x_unfold_s8,
lut,
)
E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
src/torchapprox/operators/conv2d.py:200: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config4] ______________
device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}
@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
input_dims = layer_config[1]
layer, ref_layer = generate_models(layer_config, device, weight_qconfig)
x = torch.rand(input_dims, device=device)
xref = copy.deepcopy(x)
y = layer(x)
test/test_approx_layer.py:165:
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)
x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -127., -26.],
[-119., -89., -5.],
[ 61., -24., 5.]]],
[[[ -55...
[[[-119., -126., 53.],
[ -20., 118., 20.],
[ 50., -8., -123.]]]], device='cuda:0')
conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=8)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)
def _im2col_conv2d(
x_q: torch.FloatTensor,
w_q: torch.FloatTensor,
conv_args: Conv2dArgs,
lut: torch.ShortTensor,
out_dims: Tuple[int, int],
) -> torch.FloatTensor:
# Pre-allocate output tensor
y_q = torch.empty(
x_q.size(0),
conv_args.out_channels,
math.prod(out_dims),
device=x_q.device,
dtype=torch.int32,
)
w_s8 = w_q.char()
for group in range(conv_args.groups):
# Calculate lower and upper channel index for current group
in_ch_lower, in_ch_upper = _group_limits(
group, conv_args.groups, conv_args.in_channels
)
out_ch_lower, out_ch_upper = _group_limits(
group, conv_args.groups, conv_args.out_channels
)
# Im2Col operation
x_unfold_s8 = torch.nn.functional.unfold(
x_q[
:,
in_ch_lower:in_ch_upper,
:,
],
kernel_size=conv_args.kernel_size,
padding=conv_args.padding,
stride=conv_args.stride,
dilation=conv_args.dilation,
).char()
# Reshape weights to 2D
w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
conv_args.out_channels // conv_args.groups, -1
)
# ApproxGeMM
y_q[:, out_ch_lower:out_ch_upper] = approx(
w_flat_s8,
x_unfold_s8,
lut,
)
E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
src/torchapprox/operators/conv2d.py:200: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config5] ______________
device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 8, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}
@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
input_dims = layer_config[1]
layer, ref_layer = generate_models(layer_config, device, weight_qconfig)
x = torch.rand(input_dims, device=device)
xref = copy.deepcopy(x)
y = layer(x)
test/test_approx_layer.py:165:
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:237: in forward
y_q = dwconv2d(x_q, w_q, lut, conv_args.stride, conv_args.padding)
x = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b69d0>
w = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b6d40>
lut = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b4fe0>
stride = (1, 1), padding = (0, 0)
def dwconv2d(
x: torch.FloatTensor,
w: torch.FloatTensor,
lut: torch.ShortTensor,
stride: int = 1,
padding: int = 0,
) -> torch.FloatTensor:
"""
Approximate 2D Depthwise Convolution
"""
x = x.char()
w = w.char()
assert x.device == w.device
assert x.is_cuda
assert (
x.dtype == w.dtype == torch.int8
), "Input operands need to be 8-Bit signed Integer"
assert lut.dtype == torch.int32, "LUT needs to be 32 bit signed Integer"
def make_tuple(val):
if not isinstance(val, tuple):
return (val, val)
return val
stride = make_tuple(stride)
padding = make_tuple(padding)
lut = lut.to(x.device)
small = ta_backend.use_dwconv2d_small(x, w, 1, 1, *stride, *padding)
if small:
out = ta_backend.dwconv2d_small(x, w, lut, 1, 1, *stride, *padding, True)
else:
out = ta_backend.dwconv2d(x, w, lut, 1, 1, *stride, *padding, *padding, True)
return out.float()
E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
src/torchapprox/operators/backend.py:70: RuntimeError
=============================== warnings summary ===============================
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25
/home/zhaojun/anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging # type: ignore[attr-defined]
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config0]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config1]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config2]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config3]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config4]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config5]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config6]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config0]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config1]
......
======================================================================= short test summary info =================================================================================================
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[mobilenet_v2-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[effcientnet_b0-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[vgg16-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[alexnet-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet18-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet50-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
TorchApprox (TA) currently uses its own quantizer implementation. It would be cleaner to provide the approximate layer implementations as subclasses of the modules in torch.ao.nn.qat.modules and to use the observer/quantizer API that is provided by native PyTorch.
Sphinx deployment is a bit too complicated for the scope of this project.
Check whether mkdocs is an option with slightly less overhead.
Add (micro-)benchmarks to compare throughput of different inference modes
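A minimal timing harness for such a comparison might look like the sketch below. It is not the project's benchmark code: `measure_throughput`, `compare_modes`, and the mode names `mode_a` and `mode_b` are placeholders, and the lambdas stand in for real inference calls.

```python
import time
from typing import Callable, Dict


def measure_throughput(
    fn: Callable[[], object],
    items_per_call: int,
    warmup: int = 3,
    rounds: int = 20,
) -> float:
    """Return processed items per second over `rounds` calls of `fn`."""
    for _ in range(warmup):  # discard start-up effects such as caching
        fn()
    start = time.perf_counter()
    for _ in range(rounds):
        fn()
    # Guard against a zero reading on very coarse timers
    elapsed = max(time.perf_counter() - start, 1e-9)
    return rounds * items_per_call / elapsed


def compare_modes(
    modes: Dict[str, Callable[[], object]], items_per_call: int
) -> Dict[str, float]:
    """Measure every mode with the same settings and collect the results."""
    return {name: measure_throughput(fn, items_per_call) for name, fn in modes.items()}


# Placeholder workloads standing in for inference in two different modes
modes = {
    "mode_a": lambda: sum(i * i for i in range(10_000)),
    "mode_b": lambda: sum(i * i for i in range(20_000)),
}
results = compare_modes(modes, items_per_call=32)
for name, rate in results.items():
    print(f"{name}: {rate:.1f} items/s")
```

GPU kernels run asynchronously, so a real benchmark would need to synchronize the device inside `fn` (for example with `torch.cuda.synchronize()`) before the timer is read.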
Hi,
The documentation page seems to be unavailable since the latest commit.
Add a feature to replace the LUT operation with an inlined C function that performs operand transformations according to the logic of a given approximate multiplier (AM).
This will be helpful in benchmarking 12-Bit and 16-Bit AMs where a LUT would be too large.
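To illustrate the trade-off, the sketch below models an approximate multiplier as a plain function, builds the equivalent 8-bit LUT from it, and computes how large a full LUT becomes at wider operand widths. `approx_mul` is a made-up multiplier that simply zeroes the low operand bits; it does not correspond to any real AM. The 4-byte entry size follows the int32 LUT dtype asserted in the log above, and indexing by the two's-complement bit pattern is an assumption that appears consistent with the LUT rows printed there.

```python
from typing import Callable, List


def approx_mul(a: int, b: int, dropped_bits: int = 2) -> int:
    """Hypothetical AM: zero the lowest bits of both operands, then multiply."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


def to_signed(index: int, bits: int) -> int:
    """Interpret an unsigned LUT index as a two's-complement value."""
    return index - (1 << bits) if index >= 1 << (bits - 1) else index


def build_lut(mul: Callable[[int, int], int], bits: int = 8) -> List[List[int]]:
    """Tabulate `mul` for every pair of signed `bits`-wide operands."""
    size = 1 << bits
    return [
        [mul(to_signed(i, bits), to_signed(j, bits)) for j in range(size)]
        for i in range(size)
    ]


def lut_mul(lut: List[List[int]], a: int, b: int, bits: int = 8) -> int:
    """Look up a product, indexing by the operands' bit patterns."""
    mask = (1 << bits) - 1
    return lut[a & mask][b & mask]


def lut_bytes(bits: int, entry_bytes: int = 4) -> int:
    """Memory needed for a full LUT with 2^bits x 2^bits entries."""
    return (1 << (2 * bits)) * entry_bytes


approx_lut = build_lut(approx_mul)   # LUT form of the multiplier
exact_lut = build_lut(lambda a, b: a * b)  # exact products, for reference

for bits in (8, 12, 16):
    print(f"{bits}-bit LUT: {lut_bytes(bits) / 2**20:g} MiB")
```

With 4-byte entries, an 8-bit LUT needs 256 KiB, a 12-bit LUT 64 MiB, and a 16-bit LUT 16 GiB, whereas the functional form needs no table at all.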
Benchmarking of model accuracy when retrained using several modes is required.
Likely candidates for comparison:
The ApproxConv2d operator is currently composed of several separate Autograd Functions. Refactoring those into a single one will likely reduce problems with excessive memory consumption, because fewer tensors need to be tracked.
Improve documentation, specifically: