etrommer / torch-approx

GPU-accelerated Neural Network layers using Approximate Multiplications for PyTorch

Home Page: https://etrommer.de/torch-approx

License: MIT License

C++ 3.48% C 0.88% Python 53.86% Cuda 21.07% Shell 0.21% Jupyter Notebook 20.51%

python approximate-computing convolutional-layers deep-learning fully-connected library machine-learning neural-network paper python3

torch-approx's Introduction

This is where I keep my research/productive work.

For more information about me see my website

Tinkering, partially done projects and non-professional work can be found under my private account

torch-approx's Issues

Benchmark against adaPT

TorchApprox and adaPT need to be compared in terms of runtime.

This should be kept on a separate branch to not interfere with productive code.

Layer Noise Mode Interface

The noise mode of the layer currently adds a zero mean tensor with learnable standard deviation.

torch-approx/src/torchapprox/layers/approx_layer.py

Lines 50 to 60 in 5740d50

def stdev(self) -> torch.nn.Parameter:
    """
    The relative standard deviation of the Additive Gaussian noise that is added
    to the computation output. Scaling is done relative to the current batch's standard deviation.
    This is only used when the mode is set to `noise`. It will have no effect in other modes.
    """
    return self._stdev

@stdev.setter
def stdev(self, noise_std: float):
    self._stdev = torch.nn.Parameter(torch.tensor(noise_std), requires_grad=True)

This is quite specific to the origin of torch-approx as a backend for AGN Approx. To make it more generally useful, this feature should be kept on a separate branch, and the noise implementation should be replaced with a more generic interface that adds Gaussian noise of a fixed mean and standard deviation to the layer output.

torch-approx/src/torchapprox/layers/approx_layer.py

Lines 48 to 74 in 8284cf7

@property
def stdev(self) -> float:
    """
    Perturbation Error Relative Standard Deviation

    Returns:
        Currently configured perturbation standard deviation
    """
    return self._stdev.item()

@stdev.setter
def stdev(self, val: float):
    self._stdev = torch.tensor([val], device=self.weight.device)  # type: ignore

@property
def mean(self) -> float:
    """
    Perturbation Error mean

    Returns:
        Currently configured perturbation mean
    """
    return self._mean.item()

@mean.setter
def mean(self, val: float):
    self._mean = torch.tensor([val], device=self.weight.device)  # type: ignore
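
The generic interface described in this issue can be illustrated without any framework. The following is only a sketch under assumptions: `perturb` and its parameters are hypothetical names that do not exist in torch-approx, plain Python lists stand in for tensors, and the `relative` flag reflects the "relative to the batch's standard deviation" wording of the first docstring.

```python
import random
import statistics


def perturb(outputs, mean=0.0, stdev=0.1, relative=True, rng=None):
    """Add Gaussian noise to a flat list of layer outputs.

    Hypothetical helper, not part of torch-approx. If `relative` is True,
    `stdev` is scaled by the standard deviation of `outputs`.
    """
    rng = rng or random.Random()
    scale = statistics.pstdev(outputs) if relative else 1.0
    return [y + rng.gauss(mean, stdev * scale) for y in outputs]


# Seeded generator so the example is reproducible
rng = random.Random(42)
clean = [float(i % 7) for i in range(10_000)]
noisy = perturb(clean, mean=0.5, stdev=0.1, rng=rng)

# The injected error should have roughly the configured mean and relative stdev
errors = [n - c for n, c in zip(noisy, clean)]
print(statistics.fmean(errors))
print(statistics.pstdev(errors) / statistics.pstdev(clean))
```

Keeping `mean` and `stdev` as plain floats, as in the second snippet, makes the perturbation a fixed configuration value rather than a learnable parameter.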

Benchmark against TFApprox

A comparison with TFApprox was requested.

This should be kept separate from productive code, similar to #6.

Test case is not fully-defined yet. Most likely scenario: Comparison of Conv2D inference speed.
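
Since the test case is still open, the following is only a rough sketch of what an inference-speed comparison could measure, using nothing but the standard library. `time_callable` and `dummy_conv_workload` are hypothetical names; the workload is a stand-in and not an actual Conv2D call. For GPU kernels, the timed call would additionally have to block until the kernel has finished, otherwise only the launch is measured.

```python
import statistics
import time


def time_callable(fn, *args, warmup=3, rounds=10):
    """Return the median wall-clock time of fn(*args) in seconds."""
    # Warm-up runs are discarded so one-time setup cost is not measured
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def dummy_conv_workload(n):
    # Stand-in for a Conv2D forward pass (assumption for illustration only)
    return sum(i * i for i in range(n))


small = time_callable(dummy_conv_workload, 1_000)
large = time_callable(dummy_conv_workload, 100_000)
print(f"small: {small:.6f} s, large: {large:.6f} s")
```

Reporting the median rather than the mean keeps single outlier runs from dominating the comparison.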

Implement Approximate Depthwise Convolution Kernels

Benchmarking has shown that Im2Col + ApproxGeMM is extremely slow for Depthwise-Separable Convolution Operations.

This should be addressed by adding dedicated Approximate DWConv operators.

Accurate FP32 DWConv operators should be used as a template.
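
As a reference for what a dedicated operator has to compute, here is a plain-Python sketch of a depthwise convolution in which every multiplication is replaced by a table lookup. It is not the CUDA kernel and makes several assumptions: stride 1, no padding, nested lists instead of tensors, and an exact product table standing in for an approximate multiplier. The function names are hypothetical.

```python
def signed_mul_lut():
    """256x256 table of int8 products, indexed by the operands' two's-complement bytes."""
    def to_signed(byte):
        return byte - 256 if byte > 127 else byte
    return [[to_signed(a) * to_signed(b) for b in range(256)] for a in range(256)]


def approx_dwconv2d(x, w, lut):
    """Depthwise 2D convolution (stride 1, no padding) with LUT multiplication.

    x: [channels][height][width] int8 values
    w: [channels][kh][kw] int8 values, one filter per input channel
    """
    out = []
    for x_c, w_c in zip(x, w):  # each channel is convolved with its own filter
        kh, kw = len(w_c), len(w_c[0])
        oh, ow = len(x_c) - kh + 1, len(x_c[0]) - kw + 1
        plane = [[0] * ow for _ in range(oh)]
        for i in range(oh):
            for j in range(ow):
                acc = 0
                for di in range(kh):
                    for dj in range(kw):
                        # "& 0xFF" maps a signed int8 to its byte index
                        acc += lut[x_c[i + di][j + dj] & 0xFF][w_c[di][dj] & 0xFF]
                plane[i][j] = acc
        out.append(plane)
    return out


lut = signed_mul_lut()
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[-1, -2, -3], [-4, -5, -6], [-7, -8, -9]],
]
w = [
    [[1, 0], [0, -1]],
    [[2, 2], [2, 2]],
]
print(approx_dwconv2d(x, w, lut))
# [[[-4, -4], [-4, -4]], [[-24, -32], [-48, -56]]]
```

Because each channel only touches its own filter, there is no reduction across channels, which is why mapping the operation onto one small GeMM per group uses the hardware poorly.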

FAILED & ERROR when running Unit Tests

Hi etrommer, I ran into errors when running the unit tests with "poetry run pytest test". I installed poetry in a conda environment (python=3.10.13) and cloned your code. Then I installed the packages with "poetry install --with "dev,extras"", and the additional dependencies and pre-commit hooks installed fine. However, the unit tests report several failures, followed by errors for all remaining tests. I also ran the benchmark; apart from a few failures, most of it seems fine. Could you help me solve the errors? Thanks :)

My cuda version is 11.7 and the following is the output log of unit test and the benchmark.

unit test

============================= test session starts ==============================
platform linux -- Python 3.10.13, pytest-7.4.2, pluggy-1.3.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/zhaojun/torch-approx
configfile: pyproject.toml
plugins: cov-3.0.0, benchmark-4.0.0
collected 436 items

test/test_approx_layer.py .............................FFFFFFEEEEEEEEEEE [ 10%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 27%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 37%]
test/test_approx_mm.py EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 48%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 64%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 81%]
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE [ 96%]
test/test_dwconv2d.py EEEEEEEEEEEEEE [100%]

==================================== ERRORS ====================================
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig0-layer_config6] _____

@pytest.fixture(autouse=True)
def fix_seed():
    """
    Run before every test.
    - Fixes random seed to make test reproducible
    - Sets CUDA to blocking to allow for benchmarking of normally asynchronous kernels
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    np.random.seed(42)

  torch.manual_seed(42)

test/conftest.py:36:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/random.py:40: in manual_seed
torch.cuda.manual_seed_all(seed)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:113: in manual_seed_all
_lazy_call(cb, seed_all=True)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/__init__.py:183: in _lazy_call
callable()

def cb():
    for i in range(device_count()):
        default_generator = torch.cuda.default_generators[i]

      default_generator.manual_seed(seed)

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/cuda/random.py:111: RuntimeError
_____ ERROR at setup of test_layer_fwd[cuda-weight_qconfig1-layer_config0] _____

[............................... similar errors .............................................]
=================================== FAILURES ===================================
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config0] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.linear.Linear'>, (4, 20), (20, 10), {})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_linear.py:46: in approx_fwd
y = self.approx_op(x, w, quant_params, self.htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/operators/lut.py:82: in forward
return ApproxGeMM.apply(x, w, self.lut, quant_params, htp_model)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]

x = tensor([[0.8243, 0.2120, 0.7301, 0.3219, 0.7536, 0.2120, 0.9263, 0.1413, 0.3454,
0.9970, 0.5495, 0.2512, 0.92... 0.7536, 0.7065, 0.3297, 0.9106, 0.3925, 0.1727, 0.9813, 0.3690, 0.2591,
0.9185, 0.9891]], device='cuda:0')
w = tensor([[ 0.0501, -0.2194, -0.0449, -0.2056, -0.1538, -0.0086, 0.1054, -0.0415,
0.0086, -0.0950, -0.1158, ...0225, 0.0881, -0.2074]], device='cuda:0',
grad_fn=)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
quant_params = QuantizationParameters(x_scale=tensor([0.0079], device='cuda:0'), x_zero_point=tensor([0], device='cuda:0', dtype=torch.int32), w_scale=tensor([0.0017], device='cuda:0'), w_zero_point=tensor([0], device='cuda:0', dtype=torch.int32))
htp_model = None

@staticmethod
def forward(  # type: ignore
    x: torch.Tensor,
    w: torch.Tensor,
    lut: torch.Tensor,
    quant_params: "QuantizationParameters",
    htp_model: Optional[Callable],
) -> torch.Tensor:
    """
    Approximate forward operation
    """

    x_q = torch.round((x / quant_params.x_scale) + quant_params.x_zero_point)[
        :, None, :
    ]
    w_q = torch.round(
        (w / quant_params.w_scale[:, None]) + quant_params.w_zero_point[:, None]
    ).T

    if htp_model is None:

      y_q = approx(x_q.char(), w_q.char(), lut).float()

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/approxgemm.py:39: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config1] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 1})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)

x_q = tensor([[[[105., 27., 93., 41.],
[ 96., 27., 118., 18.],
[ 44., 127., 70., 32.],
... [ 18., 102., 123., 77.],
[ 99., 57., 5., 16.],
[ 80., 83., 114., 84.]]]], device='cuda:0')
w_q = tensor([[[[ 29., -125., -26.],
[-117., -88., -4.],
[ 60., -24., 5.]],

     [[ -54.,...
     [[  22., -123.,   89.],
      [  25.,   91., -126.],
      [  62., -107.,  -40.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=1)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM

      y_q[:, out_ch_lower:out_ch_upper] = approx(

            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError

______________ test_layer_fwd[cuda-weight_qconfig0-layer_config4] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 16, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:240: in forward
y_q = _im2col_conv2d(x_q, w_q, conv_args, lut, out_dims)

    [[[ -55...
    [[[-119., -126.,   53.],
      [ -20.,  118.,   20.],
      [  50.,   -8., -123.]]]], device='cuda:0')

conv_args = Conv2dArgs(in_channels=8, out_channels=16, kernel_size=(3, 3), stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups=8)
lut = tensor([[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 1, 2, ..., -3, -2, -1],
[ 0, 2, 4, ..., -6, -4, -2]... ..., 9, 6, 3],
[ 0, -2, -4, ..., 6, 4, 2],
[ 0, -1, -2, ..., 3, 2, 1]], dtype=torch.int32)
out_dims = (2, 2)

def _im2col_conv2d(
    x_q: torch.FloatTensor,
    w_q: torch.FloatTensor,
    conv_args: Conv2dArgs,
    lut: torch.ShortTensor,
    out_dims: Tuple[int, int],
) -> torch.FloatTensor:
    # Pre-allocate output tensor
    y_q = torch.empty(
        x_q.size(0),
        conv_args.out_channels,
        math.prod(out_dims),
        device=x_q.device,
        dtype=torch.int32,
    )

    w_s8 = w_q.char()
    for group in range(conv_args.groups):
        # Calculate lower and upper channel index for current group
        in_ch_lower, in_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.in_channels
        )
        out_ch_lower, out_ch_upper = _group_limits(
            group, conv_args.groups, conv_args.out_channels
        )

        # Im2Col operation
        x_unfold_s8 = torch.nn.functional.unfold(
            x_q[
                :,
                in_ch_lower:in_ch_upper,
                :,
            ],
            kernel_size=conv_args.kernel_size,
            padding=conv_args.padding,
            stride=conv_args.stride,
            dilation=conv_args.dilation,
        ).char()

        # Reshape weights to 2D
        w_flat_s8 = w_s8[out_ch_lower:out_ch_upper].view(
            conv_args.out_channels // conv_args.groups, -1
        )

        # ApproxGeMM

      y_q[:, out_ch_lower:out_ch_upper] = approx(

            w_flat_s8,
            x_unfold_s8,
            lut,
        )

E RuntimeError: CUDA error: invalid argument
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/conv2d.py:200: RuntimeError
______________ test_layer_fwd[cuda-weight_qconfig0-layer_config5] ______________

device = 'cuda'
layer_config = (<class 'torch.nn.modules.conv.Conv2d'>, (2, 8, 4, 4), (8, 8, 3), {'groups': 8})
weight_qconfig = functools.partial(<class 'torch.ao.quantization.fake_quantize.FakeQuantize'>, observer=<class 'torch.ao.quantization.observer.MinMaxObserver'>, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric, quant_min=-128, quant_max=127){}

@pytest.mark.parametrize("layer_config", layer_configs)
@pytest.mark.parametrize("weight_qconfig", weight_quant_configs)
def test_layer_fwd(device, layer_config, weight_qconfig):
    input_dims = layer_config[1]
    layer, ref_layer = generate_models(layer_config, device, weight_qconfig)

    x = torch.rand(input_dims, device=device)
    xref = copy.deepcopy(x)

  y = layer(x)

test/test_approx_layer.py:165:

../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
src/torchapprox/layers/approx_wrapper.py:60: in forward
y_q = self.wrapped(x_q, x_scale, x_zero_point)
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/nn/modules/module.py:1538: in _call_impl
result = forward_call(*args, **kwargs)
src/torchapprox/layers/approx_conv2d.py:185: in forward
return ApproxLayer.forward(self, x_q, x_scale, x_zero_point, bias)
src/torchapprox/layers/approx_layer.py:212: in forward
y = self.approx_fwd(x, w, quant_params)
src/torchapprox/layers/approx_conv2d.py:155: in approx_fwd
y = ApproxConv2dOp.apply(
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/autograd/function.py:506: in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
src/torchapprox/operators/conv2d.py:237: in forward
y_q = dwconv2d(x_q, w_q, lut, conv_args.stride, conv_args.padding)

x = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b69d0>
w = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b6d40>
lut = <[RuntimeError('CUDA error: an illegal memory access was encountered\nCompile with TORCH_USE_CUDA_DSA to enable device-side assertions.\n') raised in repr()] Tensor object at 0x7f91966b4fe0>
stride = (1, 1), padding = (0, 0)

def dwconv2d(
    x: torch.FloatTensor,
    w: torch.FloatTensor,
    lut: torch.ShortTensor,
    stride: int = 1,
    padding: int = 0,
) -> torch.FloatTensor:
    """
    Approximate 2D Depthwise Convolution
    """
    x = x.char()
    w = w.char()

    assert x.device == w.device
    assert x.is_cuda
    assert (
        x.dtype == w.dtype == torch.int8
    ), "Input operands need to be 8-Bit signed Integer"
    assert lut.dtype == torch.int32, "LUT needs to be 32 bit signed Integer"

    def make_tuple(val):
        if not isinstance(val, tuple):
            return (val, val)
        return val

    stride = make_tuple(stride)
    padding = make_tuple(padding)

    lut = lut.to(x.device)
    small = ta_backend.use_dwconv2d_small(x, w, 1, 1, *stride, *padding)
    if small:
        out = ta_backend.dwconv2d_small(x, w, lut, 1, 1, *stride, *padding, True)
    else:
        out = ta_backend.dwconv2d(x, w, lut, 1, 1, *stride, *padding, *padding, True)

  return out.float()

E RuntimeError: CUDA error: an illegal memory access was encountered
E Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

src/torchapprox/operators/backend.py:70: RuntimeError
=============================== warnings summary ===============================
../anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25
/home/zhaojun/anaconda3/envs/approx/lib/python3.10/site-packages/torch/utils/cpp_extension.py:25: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
from pkg_resources import packaging # type: ignore[attr-defined]

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config0]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config1]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config2]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config3]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config4]
FAILED test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config5]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig0-layer_config6]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config0]
ERROR test/test_approx_layer.py::test_layer_fwd[cuda-weight_qconfig1-layer_config1]
......

benchmark
==========================
benchmarks/test_bench_torchapprox.py .F........................................F........................................F........................................F........................................F........ [ 70%]
................................F....................................... [100%]

======================================================================= short test summary info =================================================================================================
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[mobilenet_v2-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[effcientnet_b0-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[vgg16-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[alexnet-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet18-lut] - AssertionError: LUT needs to be signed 32 Bit Integer
FAILED benchmarks/test_bench_torchapprox.py::test_bench_torchapprox[resnet50-lut] - AssertionError: LUT needs to be signed 32 Bit Integer

Use PyTorch quantization interface

TA currently uses its own Quantizer implementation. It would be cleaner to provide the approximate layer implementations as subclasses of the modules in torch.ao.nn.qat.modules and to use the observer/quantizer API that native PyTorch provides.
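
For orientation, the arithmetic that any such quantizer performs can be sketched in plain Python. This is an illustration under assumptions, not the PyTorch API: the function names are hypothetical, the rounding follows the `round(x / scale + zero_point)` pattern visible in the `ApproxGeMM` forward pass quoted in the test log, and the symmetric scale convention (largest magnitude maps to `qmax`) is an assumption.

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Map a float to a signed 8-bit integer with clamping."""
    q = round(x / scale + zero_point)
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point=0):
    """Map a quantized integer back to its float approximation."""
    return (q - zero_point) * scale


def symmetric_scale(values, qmax=127):
    """Per-tensor symmetric scale: the largest magnitude maps to qmax."""
    return max(abs(v) for v in values) / qmax


weights = [1.27, -0.638, 0.2, 0.004]
scale = symmetric_scale(weights)
quantized = [quantize(v, scale) for v in weights]
restored = [dequantize(q, scale) for q in quantized]
print(quantized)
# [127, -64, 20, 0]
```

Values outside the representable range saturate at `qmin` or `qmax`, and for in-range values the round-trip error stays within half a quantization step.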

Change documentation from Sphinx to mkdocs

Sphinx deployment is a bit too complicated for the scope of this project.

Check if mkdocs is an option with slightly less overhead.

Add Benchmarks

Add (micro-)benchmarks to compare throughput of different inference modes

Documentation page not available

Hi,
The documentation page seems to be unavailable after the latest commit.

Support inline compilation of C approximate functions

Add the feature to replace the LUT operation with an inlined C function that performs operand transformations according to the logic of a given AM.

This will be helpful in benchmarking 12-Bit and 16-Bit AMs where a LUT would be too large.
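
A quick calculation shows why a full table stops being practical at larger bitwidths, and what a functional replacement might look like. This is a sketch under assumptions: 4 bytes per entry matches the int32 LUT dtype seen in the test log, and `truncated_mul` is a made-up approximate multiplier used only for illustration (written in Python here rather than as inlined C).

```python
def lut_size_bytes(bits, bytes_per_entry=4):
    """Size of a full product table for two `bits`-wide operands."""
    return (2 ** bits) ** 2 * bytes_per_entry


def truncated_mul(a, b, drop_bits=4):
    """Hypothetical approximate multiplier.

    Rounds each operand down to a multiple of 2**drop_bits by clearing
    its lowest bits, then multiplies the results exactly.
    """
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)


for bits in (8, 12, 16):
    print(bits, lut_size_bytes(bits))
# 8 262144
# 12 67108864
# 16 17179869184

print(truncated_mul(1000, 1000))
# 984064
```

An 8-bit table needs 256 KiB, a 12-bit table 64 MiB, and a 16-bit table 16 GiB, whereas a function that transforms the operands needs no table at all.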

Accuracy Benchmarking

Benchmarking of model accuracy when retrained using several modes is required.

Likely candidates for comparison:

Baseline (retraining with accurate multiplication but same hyperparameters)
Gaussian Noise of same stdev as multiplier
Behavioral Simulation (LUT)
Regression Models

Refactor ApproxConv2d operator into torch.autograd.Function

The ApproxConv2d operator is currently composed of several separate Autograd Functions. Refactoring those into a single one will likely reduce problems with excessive memory consumption, because fewer tensors need to be tracked.

Set up Sphinx

Improve documentation, specifically:

Set up Sphinx in Github Actions
Add preliminary content to README.md
