Fast-MobileNetV2

Optimized CUDA Kernels for Fast MobileNetV2 Inference

Development Steps

  • ① Implement MobileNetV2 with PyTorch, and parse the given ONNX model with Python to analyze the network structure. --- mobilenet_v2/nn/onnx/
  • ② Implement MobileNetV2 with C++ (only sequential layer structures and weights, no forward computation), and parse the given ONNX model with Python to extract the weights. --- mobilenet_v2/nn/
  • ③ Implement wrappers and tests for cuDNN/cuBLAS primitives: Conv, Gemm, and Pool. --- mobilenet_v2/cudnn/
    • Gemm can be implemented with cuBLAS or treated as a 1x1 Conv2d in cuDNN; we take the former approach.
  • ④ Implement cuDNN-accelerated MobileNetV2 using the wrappers and the C++ network implemented above. --- mobilenet_v2/cudnn/
  • ⑤ Implement and optimize CUDA kernels: Conv, Gemm, and Pool. --- mobilenet_v2/fast_mobilenet/
    • Conv can be implemented with Im2Col + Gemm or with the Winograd algorithm; we implemented only the former.
  • ⑥ Implement our Fast-MobileNetV2 as a whole. --- mobilenet_v2/fast_mobilenet/
  • ⑦ Compare and optimize: e.g. parameter tuning, model-specific / hardware-specific optimization, ...
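
The Im2Col + Gemm formulation from step ⑤ and the "Gemm as 1x1 Conv2d" remark from step ③ can be illustrated with a small CPU-only sketch. This is not the repository's CUDA code: it assumes a single image, stride 1, and no padding, and all function names are made up for illustration.

```python
# Minimal sketch: 2-D convolution as Im2Col + Gemm (single image, stride 1, no padding).
# All names are illustrative assumptions; this is not the repository's CUDA kernel.

def im2col(x, kh, kw):
    """x: [C][H][W] nested lists -> matrix [C*kh*kw][out_h*out_w]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset); one column per output pixel.
                cols.append([x[ch][oy + i][ox + j]
                             for oy in range(out_h) for ox in range(out_w)])
    return cols, out_h, out_w

def gemm(a, b):
    """Plain matrix product: a is [M][K], b is [K][N]."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b)))
             for n in range(len(b[0]))] for m in range(len(a))]

def conv2d_im2col(x, weight):
    """weight: [OC][C][kh][kw]. Returns [OC][out_h][out_w]."""
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    cols, out_h, out_w = im2col(x, kh, kw)
    # Flatten each filter in (channel, row, col) order to match the im2col rows.
    w_mat = [[v for ch in f for row in ch for v in row] for f in weight]
    out = gemm(w_mat, cols)
    return [[r[y * out_w:(y + 1) * out_w] for y in range(out_h)] for r in out]

def conv2d_direct(x, weight):
    """Reference: straightforward nested-loop convolution."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    return [[[sum(f[ch][i][j] * x[ch][oy + i][ox + j]
                  for ch in range(c) for i in range(kh) for j in range(kw))
              for ox in range(w - kw + 1)]
             for oy in range(h - kh + 1)] for f in weight]

x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]        # C=2, H=W=3
w2x2 = [[[[1, 0], [0, -1]], [[0, 1], [1, 0]]]]  # OC=1, C=2, 2x2 kernel
w1x1 = [[[[2]], [[3]]]]                          # OC=1, C=2, 1x1 kernel

print(conv2d_im2col(x, w2x2))  # same values as conv2d_direct(x, w2x2)
print(conv2d_im2col(x, w1x1))  # a 1x1 conv is a Gemm over the flattened image
```

With a 1x1 kernel the im2col matrix is just each channel flattened to one row, so the convolution reduces to a single matrix product. That is why a Gemm layer and a 1x1 Conv2d are interchangeable.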

Test Steps

nn

  • Re-implement MobileNetV2 ONNX model with PyTorch and test inference:

    (conda) >> cd mobilenet_v2/nn/onnx/
    (conda) >> python pytorchMobileNetV2.py
  • Save weights in MobileNetV2 ONNX model to plain-text files:

    (conda) >> cd mobilenet_v2/nn/weights/
    (conda) >> python save_weights.py
  • Show MobileNetV2 topology in C++ and check loaded weights:

    >> cd mobilenet_v2/nn/examples/
    >> make show
    >> ./show.out
    >> make check
    >> ./check.out
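
As a rough illustration of the plain-text weight hand-off between the Python and C++ sides, the sketch below writes one tensor to a text file and reads it back. The file layout (shape on the first line, flattened values on the second) and the function names are assumptions made for this example; the format actually written by save_weights.py may differ.

```python
import os
import tempfile

# Hypothetical layout: line 1 holds the shape, line 2 the flattened values.
# This is an assumed format for illustration, not the one save_weights.py uses.

def save_tensor_txt(path, shape, values):
    """Write a tensor as two whitespace-separated text lines."""
    with open(path, "w") as f:
        f.write(" ".join(str(d) for d in shape) + "\n")
        # repr() of a Python float round-trips exactly through float().
        f.write(" ".join(repr(float(v)) for v in values) + "\n")

def load_tensor_txt(path):
    """Read a tensor back and check that the value count matches the shape."""
    with open(path) as f:
        shape = [int(d) for d in f.readline().split()]
        values = [float(v) for v in f.readline().split()]
    expected = 1
    for d in shape:
        expected *= d
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, found {len(values)}")
    return shape, values

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_weight.txt")  # hypothetical file name
    save_tensor_txt(path, [2, 3], [0.5, -1.25, 3.0, 0.1, 2.0, -7.5])
    shape, values = load_tensor_txt(path)
    print(shape, values)
```

Checking the element count against the shape on load is a cheap way to catch truncated or mismatched files before they reach the C++ side.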

cudnn

  • Show the CUDA and cuDNN versions:

    >> cd mobilenet_v2/cudnn/
    >> bash version.sh
  • Operator tests:

    >> cd mobilenet_v2/cudnn/tests/test_op/
    >> make
    >> ./testConv.o
    >> ./testGemm.o
    >> ./testPool.o
    >> ./testAdd.o
  • Network test:

    (conda) >> cd mobilenet_v2/cudnn/tests/test_net/
    (conda) >> python generate_data.py
    (conda) >> conda deactivate
    >> make
    >> ./testCudnnMobileNetV2.o
    >> source ~/.bashrc
    (conda) >> python compare_cudnn_onnx.py
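
The last step of the network test compares the C++/CUDA output with the ONNX reference. One common way to do such a check is sketched below. The tolerance value, the function names, and the flat-list data layout are assumptions for illustration; the repository's comparison scripts may use different criteria.

```python
# Sketch of an output comparison between a reference run and a GPU run.
# atol=1e-4 is an assumed tolerance, not a value taken from the repository.

def max_abs_diff(ref, out):
    """Largest element-wise absolute difference between two flat lists."""
    if len(ref) != len(out):
        raise ValueError("outputs have different lengths")
    return max(abs(r - o) for r, o in zip(ref, out))

def argmax(v):
    """Index of the largest element (the predicted class for logits)."""
    return max(range(len(v)), key=v.__getitem__)

def outputs_match(ref, out, atol=1e-4):
    """True if all values agree within atol and the top-1 class is the same."""
    return max_abs_diff(ref, out) <= atol and argmax(ref) == argmax(out)

reference = [0.1, 2.5, -1.0]           # e.g. logits from the ONNX model
close = [0.10001, 2.49999, -1.00002]   # small floating-point drift
far = [0.1, 2.0, -1.0]                 # a real numerical discrepancy

print(outputs_match(reference, close))
print(outputs_match(reference, far))
```

Checking both the maximum absolute difference and the top-1 index separates harmless floating-point drift from errors that would change the predicted class.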

our kernels

  • Operator tests:

    >> cd mobilenet_v2/fast_mobilenet/tests/test_op/
    >> make
    >> ./testConv.o
    >> ./testGemm.o
    >> ./testPool.o
    >> ./testAdd.o
    >> ./testIm2Col.o
  • Network test:

    (conda) >> cd mobilenet_v2/fast_mobilenet/tests/test_net/
    (conda) >> python generate_data.py
    (conda) >> conda deactivate
    >> make
    >> ./testFastMobileNetV2.o
    >> source ~/.bashrc
    (conda) >> python compare_fast_onnx.py

Test Environment

  • NVIDIA Tesla V100 GPU
  • CUDA version 10.2.89
  • cuDNN version 8.2.4
  • Run the Python sources of this repo in an Anaconda environment (we use Python 3.9.7)
  • Do NOT run the CUDA sources of this repo in an Anaconda environment

Reference

[1] Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2018.

[2] NVIDIA Corporation. "NVIDIA cuDNN Documentation." available at: https://docs.nvidia.com/deeplearning/cudnn/api/index.html

[3] NVIDIA Corporation. "NVIDIA cuBLAS Documentation." available at: https://docs.nvidia.com/cuda/cublas/index.html

[4] Lavin, Andrew, and Scott Gray. "Fast algorithms for convolutional neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2016.

[5] Mark Harris. "CUDA Pro Tip: Write Flexible Kernels with Grid-Stride Loops." available at: https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/

[6] Mark Harris. "Optimizing Parallel Reduction in CUDA." available at: https://vuduc.org/teaching/cse6230-hpcta-fa12/slides/cse6230-fa12--05b-reduction-notes.pdf
