convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Work in progress! I am still working through each convolution module in each library. THIS IS NOT AN EXHAUSTIVE LIST!
  • After getting an initial baseline with the single module below (and initial benchmark scripts), I will benchmark a full AlexNet/MattNet/Overfeat.

Machine: 6-core Intel i7-3930K @ 3.20GHz + NVIDIA Titan Black + Ubuntu 14.04 x86_64

###Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)

| Original Library | Class/Function Benchmarked | Total Time (ms) | Total forward time (ms) | Total backward time (ms) | Peak Memory | Formula | Limitations |
|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | conv2d_fft | 1178 | 304 | 874 | | | |
| Caffe | ConvolutionLayer | 1787 | 537 | 1250 | | | |
| cuda-convnet2 * | ConvLayer | 1818 | 416 | 1402 | | | |
| NVidia CuDNN * | cudnn.SpatialConvolution | 1861 | 513 | 1348 | | | |
| Torch-7 | nn.SpatialConvolutionBHWD | 1892 | 581 | 1311 | | | |
| Torch-7 | SpatialConvolutionMM | 1936 | 581 | 1355 | | | |
| Theano (experimental) | CorrMM | 2063 | 630 | 1433 | | | |
| cuda-convnet** | pylearn2.cuda_convnet | 3287 | 727 | 2560 | | | |
| ccv | ccv_convnet_layer | 809+bw | 809 | | | | |
| Theano (legacy) | conv2d | 70774 | 3833 | 66941 | | | |
| cherry-picking**** | best per layer | 985 | 191 | 794 | | | |

  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module that uses FFT to calculate convolutions. According to @benanne, it uses a lot of memory.
  • **** The last row shows results obtainable when choosing the best-performing library for each layer.
  • L1 - Input: 128x128, Batch-size: 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
  • L2 - Input: 64x64, Batch-size: 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
  • L3 - Input: 32x32, Batch-size: 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
  • L4 - Input: 16x16, Batch-size: 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
  • L5 - Input: 13x13, Batch-size: 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
  • The table is ranked by the total time of the forward + backward calls, summed over all layers (L1 + L2 + L3 + L4 + L5).
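
As a rough way to relate the layer configurations above to the timings, the sketch below computes the output size and multiply-accumulate count of each layer. It assumes square inputs and a "valid" convolution with no padding, which the list above does not state. The `ConvLayer` class, its methods and `effective_gflops` are hypothetical helpers for illustration, not part of any benchmarked library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """One benchmarked layer configuration (hypothetical helper)."""
    name: str
    input_size: int  # height == width of the square input
    batch: int
    in_maps: int
    out_maps: int
    kernel: int      # height == width of the square kernel
    stride: int = 1

    def output_size(self) -> int:
        # ASSUMPTION: "valid" convolution, i.e. no zero padding.
        return (self.input_size - self.kernel) // self.stride + 1

    def forward_macs(self) -> int:
        # One multiply-accumulate per (output pixel, input map, kernel tap).
        out = self.output_size()
        return (self.batch * self.out_maps * out * out
                * self.in_maps * self.kernel * self.kernel)


LAYERS = [
    ConvLayer("L1", 128, 128, 3, 96, 11),
    ConvLayer("L2", 64, 128, 64, 128, 9),
    ConvLayer("L3", 32, 128, 128, 128, 9),
    ConvLayer("L4", 16, 128, 128, 128, 7),
    ConvLayer("L5", 13, 128, 384, 384, 3),
]


def effective_gflops(layer: ConvLayer, time_ms: float) -> float:
    """Forward throughput implied by a measured time, counting 2 FLOPs per MAC."""
    seconds = time_ms / 1000.0
    return 2 * layer.forward_macs() / seconds / 1e9


if __name__ == "__main__":
    for layer in LAYERS:
        print(layer.name, layer.output_size(), layer.forward_macs())
```

For example, `effective_gflops(LAYERS[0], 63)` estimates the forward throughput implied by a 63 ms L1 time, under the same no-padding assumption. FFT-based modules do not perform this many arithmetic operations, so the figure is only a common yardstick across libraries, not a measure of work actually done.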

#####Breakdown

forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | conv2d_fft | 138 | 73 | 30 | 9 | 39 | 304 |
| Caffe | ConvolutionLayer<Dtype> | 100 | 205 | 158 | 35 | 39 | 537 |
| cuda-convnet2 * | ConvLayer | 63 | 241 | 86 | 9 | 17 | 416 |
| NVidia CuDNN | cudnn.SpatialConvolution | 94 | 274 | 101 | 12 | 32 | 513 |
| Torch-7 | nn.SpatialConvolutionBHWD | 182 | 279 | 94 | 11 | 15 | 581 |
| Torch-7 | nn.SpatialConvolutionMM | 105 | 239 | 168 | 32 | 37 | 581 |
| Theano (experimental) | CorrMM | 100 | 251 | 197 | 38 | 44 | 630 |
| cuda-convnet** | pylearn2.cuda_convnet | 92 | 412 | 159 | 19 | 45 | 727 |
| ccv | ccv_convnet_layer | 121 | 437 | 182 | 23 | 44 | 809 |
| Theano (legacy) | conv2d | 408 | 2310 | 739 | 99 | 277 | 3833 |
| cherry-picking**** | best per layer | 63 | 72 | 30 | 9 | 17 | 191 |

backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | conv2d_fft | 449 | 218 | 89 | 28 | 90 | 874 |
| Caffe | ConvolutionLayer<Dtype> | 307 | 599 | 242 | 42 | 60 | 1250 |
| cuda-convnet2 * | ConvLayer | 586 | 570 | 190 | 19 | 37 | 1402 |
| NVidia CuDNN | cudnn.SpatialConvolution | 226 | 736 | 297 | 32 | 57 | 1348 |
| Torch-7 | nn.SpatialConvolutionBHWD | 513 | 562 | 187 | 21 | 28 | 1311 |
| Torch-7 | nn.SpatialConvolutionMM | 301 | 673 | 270 | 47 | 64 | 1355 |
| Theano (experimental) | CorrMM | 282 | 733 | 295 | 51 | 72 | 1433 |
| cuda-convnet** | pylearn2.cuda_convnet | 618 | 1305 | 473 | 50 | 114 | 2560 |
| ccv | ccv_convnet_layer | | | | | | |
| Theano (legacy) | conv2d | 53997 | 9752 | 2202 | 299 | 691 | 66941 |
| cherry-picking**** | best per layer | 285 | 337 | 118 | 17 | 37 | 794 |

gradInput

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | conv2d_fft | 250 | 111 | 54 | 19 | 48 | 482 |
| Caffe | ConvolutionLayer<Dtype> | 86 | 271 | 120 | 20 | 26 | 523 |
| cuda-convnet2 * | ConvLayer | 131 | 230 | 82 | 8 | 16 | 467 |
| Theano (experimental) | CorrMM | 87 | 328 | 142 | 25 | 31 | 613 |
| NVidia CuDNN | cudnn.SpatialConvolution | 111 | 421 | 180 | 17 | 21 | 750 |
| Torch-7 | nn.SpatialConvolutionBHWD | 276 | 277 | 102 | 11 | 14 | 680 |
| Torch-7 | nn.SpatialConvolutionMM | 91 | 302 | 129 | 23 | 27 | 572 |
| cuda-convnet** | pylearn2.cuda_convnet | 155 | 647 | 230 | 23 | 47 | 1102 |
| ccv | ccv_convnet_layer | | | | | | |
| Theano (legacy) | conv2d | 53340 | 2690 | 1044 | 171 | 406 | 57651 |
| cherry-picking**** | best per layer | 86 | 230 | 82 | 8 | 16 | 422 |

gradWeights

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | L1 | L2 | L3 | L4 | L5 | Total |
|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | conv2d_fft | 199 | 107 | 35 | 9 | 42 | 392 |
| Caffe | ConvolutionLayer<Dtype> | 221 | 328 | 122 | 22 | 34 | 727 |
| cuda-convnet2 * | ConvLayer | 455 | 340 | 108 | 11 | 21 | 935 |
| NVidia CuDNN | cudnn.SpatialConvolution | 115 | 315 | 117 | 15 | 36 | 598 |
| Torch-7 | nn.SpatialConvolutionBHWD | 237 | 285 | 85 | 10 | 14 | 631 |
| Torch-7 | nn.SpatialConvolutionMM | 210 | 371 | 141 | 24 | 37 | 783 |
| Theano (experimental) | CorrMM | 195 | 405 | 153 | 26 | 41 | 820 |
| cuda-convnet** | pylearn2.cuda_convnet | 463 | 658 | 243 | 27 | 67 | 2069 |
| ccv | ccv_convnet_layer | | | | | | |
| Theano (legacy) | conv2d | 657 | 7062 | 1158 | 128 | 285 | 9290 |
| cherry-picking**** | best per layer | 199 | 107 | 36 | 9 | 21 | 372 |
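
To make the relationship between the breakdown tables concrete, here is a minimal sketch that rebuilds the backward time as gradInput + gradWeights, adds the forward time to get the total, and ranks libraries by that total. The per-layer numbers are copied from the tables above for three libraries only. The dictionary and function names are hypothetical and not taken from the benchmark scripts. `best_per_layer` illustrates the idea behind the cherry-picking row, but because it sees only these three libraries, its result is not the published row.

```python
# Per-layer times in ms (L1..L5), copied from the breakdown tables above.
FORWARD = {
    "Caffe": [100, 205, 158, 35, 39],
    "cuda-convnet2": [63, 241, 86, 9, 17],
    "Torch-7 nn.SpatialConvolutionMM": [105, 239, 168, 32, 37],
}
GRAD_INPUT = {
    "Caffe": [86, 271, 120, 20, 26],
    "cuda-convnet2": [131, 230, 82, 8, 16],
    "Torch-7 nn.SpatialConvolutionMM": [91, 302, 129, 23, 27],
}
GRAD_WEIGHTS = {
    "Caffe": [221, 328, 122, 22, 34],
    "cuda-convnet2": [455, 340, 108, 11, 21],
    "Torch-7 nn.SpatialConvolutionMM": [210, 371, 141, 24, 37],
}


def backward_times(lib):
    """Per-layer backward time: gradInput + gradWeights."""
    return [gi + gw for gi, gw in zip(GRAD_INPUT[lib], GRAD_WEIGHTS[lib])]


def total_time(lib):
    """Forward + backward time summed over L1..L5."""
    return sum(FORWARD[lib]) + sum(backward_times(lib))


def ranking():
    """Libraries ordered from fastest to slowest total time."""
    return sorted(FORWARD, key=total_time)


def best_per_layer(table):
    """For each layer, the (time_ms, library) pair of the fastest entry in `table`."""
    n_layers = len(next(iter(table.values())))
    return [min((times[i], lib) for lib, times in table.items())
            for i in range(n_layers)]


if __name__ == "__main__":
    for lib in ranking():
        print(lib, total_time(lib))
    print(best_per_layer(FORWARD))
```

Keeping the per-layer lists as the source of truth and deriving the totals from them makes it easy to re-rank when a single layer is re-measured.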

Contributors

f0k, liuliu, nicholas-leonard, nouiz, soumith