Giter VIP home page Giter VIP logo

halutmatmul's Introduction

Stella Nera: A halutmatmul based Accelerator

Algorithmic CI

PyTorch Layer Test | PyTest Python Linting Mypy - Typechecking

ML CI

ResNet9 - 92%+ accuracy

Hardware CI

HW Synth + PAR OpenROAD RTL Linting HW Design Verification

Paper

Abstract

The recent Maddness method approximates Matrix Multiplication (MatMul) without the need for multiplication by using a hash-based version of product quantization (PQ). The hash function is a decision tree, allowing for efficient hardware implementation, as multiply-accumulate operations are replaced by decision tree passes and LUT lookups. Stella Nera is the first Maddness accelerator achieving 15x higher area efficiency (GMAC/s/mm^2) and 25x higher energy efficiency (TMAC/s/W) than direct MatMul accelerators in the same technology. In a commercial 14 nm technology and scaled to 3 nm, we achieve an energy efficiency of 161 TOp/s/[email protected] with a Top-1 accuracy on CIFAR-10 of over 92.5% using ResNet9.

Algorithmic - Maddness

Maddness Animation

ResNet-9 LUTs, Thresholds, Dims

Halutmatmul example

import numpy as np
from halutmatmul.halutmatmul import HalutMatmul

A = np.random.random((10000, 512))
A_train = A[:8000]
A_test = A[8000:]
B = np.random.random((512, 10))
C = np.matmul(A_test, B)

hm = HalutMatmul(C=32, K=16)
hm.learn_offline(A_train, B)
C_halut = hm.matmul_online(A_test)

mse = np.square(C_halut - C).mean()
print(mse)

Installation

# install conda environment & activate
# mamba is recommended for faster install
conda env create -f environment_gpu.yml
conda activate halutmatmul

# IIS prefixed env
conda env create -f environment_gpu.yml --prefix /scratch/janniss/conda/halutmatmul_gpu

Differentiable Maddness

Differentiable Maddness

Hardware - OpenROAD flow results from CI - NOT OPTIMIZED

All completely open hardware results are NOT OPTIMIZED! The results are only for reference and to show the flow works. In the paper results from commercial tools are shown. See this as a community service to make the hardware results more accessible.

All Designs NanGate45
All Report All
History History

Open Hardware Results Table

NanGate45 halut_matmul halut_encoder_4 halut_decoder
Area [μm^2] 128816 46782 24667.5
Freq [Mhz] 166.7 166.7 166.7
GE 161.423 kGE 58.624 kGE 30.911 kGE
Std Cell [#] 65496 23130 12256
Voltage [V] 1.1 1.1 1.1
Util [%] 50.4 48.7 52.1
TNS 0 0 0
Clock Net Clock Net Clock Net Clock Net
Routing Routing Routing Routing
GDS GDS Download GDS Download GDS Download

Full design (halutmatmul)

Run locally with:

git submodule update --init --recursive
cd hardware
ACC_TYPE=INT DATA_WIDTH=8 NUM_M=8 NUM_DECODER_UNITS=4 NUM_C=16 make halut-open-synth-and-pnr-halut_matmul

References

Citation

@article{schonleber2023stella,
  title={Stella Nera: Achieving 161 TOp/s/W with Multiplier-free DNN Acceleration based on Approximate Matrix Multiplication},
  author={Sch{\"o}nleber, Jannis and Cavigelli, Lukas and Andri, Renzo and Perotti, Matteo and Benini, Luca},
  journal={arXiv preprint arXiv:2311.10207},
  year={2023}
}

halutmatmul's People

Contributors

joennlae avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

halutmatmul's Issues

Train and test on model level

Hi, thanks for your work! I want to know the standard way to train halutmatmul on model level, as I am trying various model architectures with different datasets to examine accuracy loss from applying PQ in inference. That is, train it with same training dataset used to train the model and perform inference to collect accuracy. Currently with example.py, it only shows matrix multiplication level of training and inference. If you can point me to the right file(s) to look for that will be great, thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.