
Speed Up Neural Network Inference

In this work, I implement Sparse-sparse Convolution in Caffe and test it on several well-known neural networks to measure the acceleration.
The experimental results show that this method yields a good speedup for layers with sparse inputs and sparse weights compared to the traditional convolution method, and can match the speed of Sparse-matrix-dense-matrix Multiplication (SpMDM), the current state-of-the-art approach. To reduce the matrix-format conversion penalty, I modify this algorithm and propose a variant that outputs its result matrix in sparse format.

Usage

Replace caffe/src/caffe/layers/base_conv_layer.cpp with my modified version and re-compile Caffe.

Algorithm
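
The core of the method is a sparse-times-sparse matrix product whose result is kept directly in sparse (CSR) format, so no dense intermediate has to be written out or converted afterwards. The following is a minimal sketch of that idea using a Gustavson-style row-by-row expansion; the CsrMatrix type and the SpGemm function are illustrative assumptions for exposition and do not mirror the actual code in base_conv_layer.cpp.

```cpp
#include <algorithm>
#include <vector>

// Minimal CSR container (illustrative; not Caffe's internal representation).
struct CsrMatrix {
  int rows = 0, cols = 0;
  std::vector<int>   row_ptr;  // size rows + 1
  std::vector<int>   col_idx;  // size nnz
  std::vector<float> values;   // size nnz
};

// Sparse-sparse product C = A * B with the result emitted in CSR format
// (Gustavson-style: expand each row of A against the matching rows of B).
CsrMatrix SpGemm(const CsrMatrix& A, const CsrMatrix& B) {
  CsrMatrix C;
  C.rows = A.rows;
  C.cols = B.cols;
  C.row_ptr.assign(A.rows + 1, 0);

  std::vector<float> acc(B.cols, 0.0f);   // dense accumulator for one output row
  std::vector<bool>  used(B.cols, false); // marks columns touched in this row
  std::vector<int>   touched;             // list of touched columns

  for (int i = 0; i < A.rows; ++i) {
    touched.clear();
    for (int pa = A.row_ptr[i]; pa < A.row_ptr[i + 1]; ++pa) {
      const int   k    = A.col_idx[pa];
      const float a_ik = A.values[pa];
      for (int pb = B.row_ptr[k]; pb < B.row_ptr[k + 1]; ++pb) {
        const int j = B.col_idx[pb];
        if (!used[j]) { used[j] = true; touched.push_back(j); }
        acc[j] += a_ik * B.values[pb];
      }
    }
    // Emit the accumulated row directly as sparse entries.
    std::sort(touched.begin(), touched.end());
    for (int j : touched) {
      C.col_idx.push_back(j);
      C.values.push_back(acc[j]);
      acc[j]  = 0.0f;
      used[j] = false;
    }
    C.row_ptr[i + 1] = static_cast<int>(C.col_idx.size());
  }
  return C;
}
```

Because only the nonzero entries of each row are visited, the work scales with the sparsity of the input and the weights rather than with the full dense dimensions, which is where the speedup over Caffe's default im2col-plus-dense-GEMM path comes from.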

Result

We have implemented Sparse-sparse Convolution through the Caffe interface and tested the speedup on several convolution layers of CaffeNet and ResNet-50. The trained weights can be downloaded from SkimCaffe. The test inputs are 1000 images from ImageNet covering 20 different classes. The tests were run on Ubuntu 18.04 with an Intel Core i5-8250U CPU. The results for CaffeNet and ResNet-50 are shown below. The red bars represent the time to convert the input matrix to CSR format (input2CSR), the blue bars the time of the Sparse-sparse Matrix Multiplication, the green bars the time of the original default convolution, and the purple bars the time of SpMDM.

From the result we can find:

  • Different layer architectures result in different speeds for Sparse-sparse Convolution: the higher the ratio of weight size to input size, the faster the computation.

  • Because of the large input size and low sparsity of the CaffeNet convolution layers, the algorithm even slows down the convolution there; the input2CSR conversion penalty stunts the performance.

  • Sparse-sparse Convolution achieves a good speedup on the last few layers of ResNet-50. Because the network is deep enough, the input matrices reach high sparsity. The speedup reaches 2.92×∼3.88×, which is close to the state-of-the-art speed. However, the input2CSR step is still a significant penalty (see the conversion sketch after this list).
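
The input2CSR penalty discussed above comes from the conversion itself: before the sparse multiplication can start, every element of the dense input has to be scanned once, regardless of how sparse it turns out to be, and this cost is paid on every forward pass. A minimal sketch of such a conversion is shown below; the DenseToCsr name and the eps threshold are illustrative assumptions, and CsrMatrix is the struct from the sketch in the Algorithm section.

```cpp
#include <cmath>
#include <vector>

// Convert a dense row-major matrix to CSR (the "input2CSR" step).
// Uses the CsrMatrix struct defined in the Algorithm sketch above.
// Every element is inspected exactly once, so the cost is O(rows * cols)
// even when almost all entries are zero; this is the overhead shown as
// the red bars in the results.
CsrMatrix DenseToCsr(const float* dense, int rows, int cols,
                     float eps = 0.0f) {
  CsrMatrix csr;
  csr.rows = rows;
  csr.cols = cols;
  csr.row_ptr.assign(rows + 1, 0);
  for (int i = 0; i < rows; ++i) {
    for (int j = 0; j < cols; ++j) {
      const float v = dense[i * cols + j];
      if (std::fabs(v) > eps) {          // keep only (near-)nonzero entries
        csr.col_idx.push_back(j);
        csr.values.push_back(v);
      }
    }
    csr.row_ptr[i + 1] = static_cast<int>(csr.col_idx.size());
  }
  return csr;
}
```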
