ape's Introduction

APE on CUDA

This project is an APE implementation for NVIDIA GPUs using the cuBLAS backend.

APE emulates high-bitwidth computation with low-bitwidth data types. For example, APE can use $3$ or $6$ low-bitwidth Tensor Core computations to emulate one FP32 computation, with up to $5.3\times$ theoretical speedup. This project provides the following:

  • FP32-precision GEMM implementations on Tensor Cores, offering various representation ranges.
  • Auto-adapted algorithm selection that guarantees end-to-end correctness.
  • INT16 GEMM implementation using Tensor Cores.
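To make the "3 or 6" count concrete, here is a minimal C++ sketch assuming a simple limb decomposition, not APE's actual kernels: an FP32 value is split into three BF16-representable pieces by bit truncation, and a product is rebuilt from the 6 largest limb products. Each BF16 x BF16 product is exact in FP32, as it would be on a Tensor Core with BF16 inputs and FP32 accumulation. The helpers `truncToBf16`, `splitBf16x3`, and `emulatedMulBf16` are hypothetical names, not part of ape.h, and APE may round rather than truncate when splitting.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Keep only the upper 16 bits of an FP32 value, i.e. truncate it to a
// BF16-representable number (sign, 8-bit exponent, 7 explicit mantissa bits).
// Truncation is an assumption of this sketch.
float truncToBf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF0000u;
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// Split x into three BF16-representable limbs with x == l[0] + l[1] + l[2].
// Both subtractions are exact in FP32 because each residual has at most
// 16 significant bits (valid for normal inputs well above FP32 underflow).
void splitBf16x3(float x, float limbs[3]) {
    assert(std::isfinite(x));
    limbs[0] = truncToBf16(x);
    float r = x - limbs[0];
    limbs[1] = truncToBf16(r);
    limbs[2] = r - limbs[1];
}

// Emulate a * b with 6 limb products A[i] * B[j] where i + j <= 2.
// Each product is exact in FP32 (8 + 8 significant bits <= 24); only the
// FP32 accumulation rounds. The three smallest terms (i + j > 2) are dropped.
float emulatedMulBf16(float a, float b) {
    float A[3], B[3];
    splitBf16x3(a, A);
    splitBf16x3(b, B);
    float acc = 0.0f;
    // Accumulate from the smallest-magnitude terms (s = 2) to the largest (s = 0).
    for (int s = 2; s >= 0; --s)
        for (int i = 0; i <= s; ++i)
            acc += A[i] * B[s - i];
    return acc;
}
```

Three 8-bit BF16 significands together cover FP32's 24-bit significand, which is consistent with the "no precision loss" note for APE_ALGO_FP32B. A two-limb split would keep only the terms with i + j <= 1, i.e. 3 products. The sketch is scalar for clarity; a GEMM would apply the same split element-wise to A and B and issue one low-bitwidth GEMM per kept term.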

For more details, please see our paper.

Usage

Build

mkdir build && cd build
cmake ..
make -j

API

APE provides a BLAS-like API: users only need to include ape.h to accelerate FP32 applications with APE directly.

void apeGemmFP32(ApeTrans transa, ApeTrans transb, int m, int n, int k, const float *alpha, const float *A, int lda, const float *B, int ldb, const float *beta, float *C, int ldc, const ApeAlgo algo = APE_ALGO_AUTO);
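The parameters follow the usual BLAS GEMM convention, C = alpha * op(A) * op(B) + beta * C. The CPU reference below sketches those semantics under stated assumptions: column-major storage as in cuBLAS, a hypothetical `Trans` enum standing in for `ApeTrans`, and alpha/beta passed by value rather than by pointer. It is meant for checking results on small inputs, not as APE's implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for ApeTrans; the real enum is declared in ape.h.
enum class Trans { N, T };

// CPU reference for C = alpha * op(A) * op(B) + beta * C, assuming the
// column-major (BLAS/cuBLAS) convention: element (i, j) of a matrix with
// leading dimension ld is stored at index i + j * ld.
// op(A) is m x k, op(B) is k x n, and C is m x n.
void referenceGemm(Trans transa, Trans transb, int m, int n, int k,
                   float alpha, const float* A, int lda,
                   const float* B, int ldb,
                   float beta, float* C, int ldc) {
    // Stored A is m x k (N) or k x m (T); stored B is k x n (N) or n x k (T).
    assert(lda >= (transa == Trans::N ? m : k));
    assert(ldb >= (transb == Trans::N ? k : n));
    assert(ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                float a = (transa == Trans::N) ? A[i + p * lda] : A[p + i * lda];
                float b = (transb == Trans::N) ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```

Comparing such a reference against apeGemmFP32 output on the same inputs is one way to observe the precision differences between the algorithms listed below.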

FP32 GEMM supports $5$ algorithms:

  • APE_ALGO_AUTO: Select the fastest algorithm that does not overflow.

  • APE_ALGO_AUTO_STRICT: Select the fastest algorithm that neither overflows nor underflows.

  • APE_ALGO_FP32F: Use FP16-emulated FP32. (1-bit precision loss, narrow representation range; overflow may occur.)

  • APE_ALGO_FP32B: Use BF16-emulated FP32. (No precision loss, large representation range; overflow does not occur.)

  • APE_ALGO_FP32T: Use TF32-emulated FP32. (1-bit precision loss, large representation range; overflow does not occur.)
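APE_ALGO_FP32F is the only FP32 option above documented as overflow-prone. The reason is the exponent width: FP16 has a 5-bit exponent with a largest finite value of 65504, while BF16 and TF32 keep FP32's 8-bit exponent. A minimal sketch of an input range check follows, assuming overflow is driven by input magnitudes; `fp16MayOverflow` is a hypothetical helper, not APE's selection logic, and it does not model the underflow condition that APE_ALGO_AUTO_STRICT also guards against.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Largest finite FP16 value. BF16 and TF32 keep FP32's 8-bit exponent and
// therefore cover essentially the whole FP32 range.
constexpr float kFp16Max = 65504.0f;

// Hypothetical pre-check: returns true if converting any input element to
// FP16 could overflow. The threshold is conservative: under
// round-to-nearest-even, magnitudes below 65520 still round to 65504.
bool fp16MayOverflow(const float* x, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        assert(!std::isnan(x[i]));
        if (std::fabs(x[i]) > kFp16Max) return true;
    }
    return false;
}
```

A selector in the spirit of APE_ALGO_AUTO could run such a check on A and B and fall back to a BF16- or TF32-based algorithm when it fails; how APE actually ranks the algorithms by speed is described in the paper, not here.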

void apeGemmINT16(ApeTrans transa, ApeTrans transb, int m, int n, int k, const int16_t *alpha, const int16_t *A, int lda, const int16_t *B, int ldb, const int32_t *beta, int32_t *C, int ldc, ApeAlgo algo = APE_ALGO_AUTO);

INT16 GEMM supports $2$ algorithms:

  • APE_ALGO_AUTO: Select an algorithm that does not overflow.

  • APE_ALGO_INT16: Use INT8 to emulate INT16. (The upper bound is $32639$, compared with $32767$ for native INT16, so overflow may occur.)
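The $32639$ bound is consistent with a signed-digit split in which an INT16 value is written as hi * 256 + lo with both digits in INT8's [-128, 127] range, since 127 * 256 + 127 = 32639. The sketch below assumes that split and rebuilds an INT16 dot product from four INT8 dot products with INT32 accumulation, the shape of an INT8 Tensor Core GEMM. `splitInt16` and `emulatedDotInt16` are hypothetical helpers, and APE's kernels may combine the partial products differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical signed-digit split x = hi * 256 + lo with hi, lo in [-128, 127].
// Every negative INT16 splits; the split fails only for x > 32639.
bool splitInt16(std::int16_t x, std::int8_t& hi, std::int8_t& lo) {
    int v = x;
    int l = (((v + 128) % 256) + 256) % 256 - 128;  // low digit in [-128, 127]
    int h = (v - l) / 256;                           // exact: v - l is a multiple of 256
    if (h < -128 || h > 127) return false;
    hi = static_cast<std::int8_t>(h);
    lo = static_cast<std::int8_t>(l);
    return true;
}

// a.b = 65536 * (ah.bh) + 256 * (ah.bl + al.bh) + (al.bl), where each
// partial dot product is an INT8 x INT8 sum accumulated in INT32.
// The final combination uses INT64 so the sketch returns the exact result.
std::int64_t emulatedDotInt16(const std::int16_t* a, const std::int16_t* b, int k) {
    std::int32_t hh = 0, hl = 0, lh = 0, ll = 0;  // INT32 accumulators
    for (int p = 0; p < k; ++p) {
        std::int8_t ah = 0, al = 0, bh = 0, bl = 0;
        bool ok = splitInt16(a[p], ah, al) && splitInt16(b[p], bh, bl);
        assert(ok && "input exceeds the 32639 upper bound");
        hh += ah * bh;
        hl += ah * bl;
        lh += al * bh;
        ll += al * bl;
    }
    return 65536LL * hh + 256LL * (static_cast<std::int64_t>(hl) + lh) + ll;
}
```

With an unsigned low digit in [0, 255] the full INT16 range would split, so the documented 32639 limit suggests both digits are treated as signed INT8 values.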

Authors

Citation

Ma, Zixuan, et al. "Efficiently emulating high-bitwidth computation with low-bitwidth hardware." Proceedings of the 36th ACM International Conference on Supercomputing. 2022.

If you find this work useful in your research, please cite it using the following BibTeX:

@inproceedings{ma2022efficiently,
author = {Ma, Zixuan and Wang, Haojie and Feng, Guanyu and Zhang, Chen and Xie, Lei and He, Jiaao and Chen, Shengqi and Zhai, Jidong},
title = {Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware},
year = {2022},
isbn = {9781450392815},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3524059.3532377},
doi = {10.1145/3524059.3532377},
booktitle = {Proceedings of the 36th ACM International Conference on Supercomputing},
articleno = {5},
numpages = {12},
keywords = {emulation, tensor core, domain specific accelerator},
location = {Virtual Event},
series = {ICS '22}
}

ape's People

Contributors

johndevostok, yz-chen18


ape's Issues

Where is the implementation of Section 5.3 of your paper in the project?

Excuse me, where can I find the implementation of Section 5.3 of your paper in the project? In Section 5.3 you say that you built a model to predict which data types to choose and implemented a lightweight wrapper, but I can't find it in the project. Could it be open-sourced?
Thank you very much!
