
SimpleNet: Let's keep it simple, using simple architectures to outperform deeper and more complex architectures.

This repository contains the architectures, models, logs, etc. pertaining to the SimpleNet paper (Let's keep it simple: using simple architectures to outperform deeper architectures): https://arxiv.org/abs/1608.06037

SimpleNet-V1 outperforms deeper and heavier architectures such as AlexNet, VGGNet, ResNet, GoogleNet, etc. on a series of benchmark datasets such as CIFAR10/100, MNIST, and SVHN. It also achieves a higher accuracy (currently 60.97/83.54) on ImageNet than AlexNet, NIN, SqueezeNet, etc., with only 5.4M parameters. Slimmer versions of the architecture also hold up very well against more complex architectures such as ResNet and WRN.
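
To make the design principle concrete, below is a minimal PyTorch sketch of a SimpleNet-style network: a plain stack of 3x3 convolutions, each followed by batch normalization and ReLU, with a few max-pooling stages and no residual or inception modules. The layer widths and pooling positions are illustrative assumptions for CIFAR-sized inputs, not the exact published configuration; see the paper and the model files in this repository for the real architecture.

```python
# Illustrative sketch only: the widths and pooling positions below are
# assumptions, not the published SimpleNet configuration (see the model files).
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch):
    """3x3 conv -> BatchNorm -> ReLU, the only building block used."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class SimpleNetStyle(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            conv_bn_relu(3, 64),
            conv_bn_relu(64, 128),
            conv_bn_relu(128, 128),
            nn.MaxPool2d(2),          # 32x32 -> 16x16
            conv_bn_relu(128, 128),
            conv_bn_relu(128, 256),
            nn.MaxPool2d(2),          # 16x16 -> 8x8
            conv_bn_relu(256, 256),
            conv_bn_relu(256, 256),
            nn.MaxPool2d(2),          # 8x8 -> 4x4
            conv_bn_relu(256, 512),
            nn.AdaptiveAvgPool2d(1),  # global pooling before the classifier
        )
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))


if __name__ == "__main__":
    # quick shape check on a CIFAR-sized batch
    logits = SimpleNetStyle()(torch.randn(2, 3, 32, 32))
    print(logits.shape)  # torch.Size([2, 10])
```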

The files are being uploaded/updated.

Results Overview:

| Dataset  | Accuracy    |
| -------- | ----------- |
| CIFAR10  | 94.75/95.32 |
| CIFAR100 | 73.42/74.86 |
| MNIST    | 99.75       |
| SVHN     | 98.21       |
| ImageNet | 60.97/83.54 |

Comparison with other architectures

Table 1: Statistics of different architectures

| Model        | AlexNet | GoogleNet | ResNet152 | VGGNet16 | NIN   | Ours |
| ------------ | ------- | --------- | --------- | -------- | ----- | ---- |
| #Param       | 60M     | 7M        | 60M       | 138M     | 7.6M  | 5.4M |
| #OP          | 1140M   | 1600M     | 11300M    | 15740M   | 1100M | 652M |
| Storage (MB) | 217     | 51        | 230       | 512.24   | 29    | 20   |

Table 2: Top CIFAR10/100 results

| Method                                   | #Params      | CIFAR10       | CIFAR100      |
| ---------------------------------------- | ------------ | ------------- | ------------- |
| VGGNet (original 16L / Enhanced) [49]    | 138m         | 91.4 / 92.45  | -             |
| ResNet-110L / 1202L [4]*                 | 1.7m / 10.2m | 93.57 / 92.07 | 74.84 / 72.18 |
| Stochastic depth-110L / 1202L [39]       | 1.7m / 10.2m | 94.77 / 95.09 | 75.42 / -     |
| Wide Residual Net (16/8) / (28/10) [31]  | 11m / 36m    | 95.19 / 95.83 | 77.11 / 79.5  |
| Highway Network [30]                     | -            | 92.40         | 67.76         |
| FitNet [43]                              | 1M           | 91.61         | 64.96         |
| Fractional Max-pooling* (1 test) [13]**  | 12M          | 95.50         | 73.61         |
| Max-out (k=2) [12]                       | 6M           | 90.62         | 65.46         |
| Network in Network [38]                  | 1M           | 91.19         | 64.32         |
| Deeply Supervised Network [50]           | 1M           | 92.03         | 65.43         |
| Max-out Network In Network [51]          | -            | 93.25         | 71.14         |
| All you need is a good init (LSUV) [52]  | -            | 94.16         | -             |
| Our Architecture ۞                       | 5.48m        | 94.75         | -             |
| Our Architecture ۩¹                      | 5.48m        | 95.32         | 73.42         |

*Note that Fractional Max-pooling [13] uses deeper architectures and also uses extreme data augmentation. ۞ denotes no zero-padding or normalization, with dropout, and ۩ denotes standard data augmentation with dropout. To our knowledge, our architecture achieves the state-of-the-art result without the aforementioned data augmentations.
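
For context, the "standard data augmentation" referred to above is usually 4-pixel padding with random 32x32 crops plus random horizontal flips (footnote 1 at the bottom of this page links the augmentation code used by the stochastic depth paper). The torchvision pipeline below is a hedged sketch of that common recipe, not the exact preprocessing used to produce the reported numbers.

```python
# Sketch of the usual CIFAR "standard augmentation" recipe (pad-4 random crop
# + horizontal flip); the exact pipeline behind the reported numbers may differ.
from torchvision import datasets, transforms

# Commonly used CIFAR-10 channel statistics (an assumption, not from this repo).
cifar_mean = (0.4914, 0.4822, 0.4465)
cifar_std = (0.2470, 0.2435, 0.2616)

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # zero-pad by 4, take a random 32x32 crop
    transforms.RandomHorizontalFlip(),      # flip left-right with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize(cifar_mean, cifar_std),
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
```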

Table 3: MNIST results

| Method                                                                                        | Error rate |
| --------------------------------------------------------------------------------------------- | ---------- |
| Regularization of Neural Networks using DropConnect [16]**                                     | 0.21%      |
| Multi-column Deep Neural Networks for Image Classification [53]**                              | 0.23%      |
| APAC: Augmented Pattern Classification with Neural Networks [54]**                             | 0.23%      |
| Batch-normalized Max-out Network in Network [51]                                               | 0.24%      |
| Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree [55]** | 0.29%      |
| Fractional Max-Pooling [13]**                                                                  | 0.32%      |
| Max-out network (k=2) [12]                                                                     | 0.45%      |
| Network in Network [38]                                                                        | 0.45%      |
| Deeply Supervised Network [50]                                                                 | 0.39%      |
| RCNN-96 [56]                                                                                   | 0.31%      |
| Our architecture*                                                                              | 0.25%      |

*Note that we did not aim for state-of-the-art performance here; as pointed out in the previous sections, we wanted to see whether the architecture performs well without any changes.

**Achieved using an ensemble or extreme data-augmentation

Table 4: SVHN results

| Method                                   | Error (%) |
| ---------------------------------------- | --------- |
| Network in Network                       | 2.35      |
| Deeply Supervised Net                    | 1.92      |
| ResNet (reported by Huang et al. (2016)) | 2.01      |
| ResNet with Stochastic Depth             | 1.75      |
| Wide ResNet                              | 1.64      |
| Our architecture                         | 1.79      |

Table 5: ImageNet 2012 results

| Method                     | Top-1/Top-5 |
| -------------------------- | ----------- |
| AlexNet (60M)              | 57.2/80.3   |
| SqueezeNet (1.3M)          | 57.5/80.3   |
| Network in Network (7.5M)  | 59.36/-     |
| VGGNet16 (138M)            | 70.5/-      |
| GoogleNet (8M)             | 68.7/-      |
| Wide ResNet (11.7M)        | 69.6/89.07  |
| Our architecture*          | 60.97/83.54 |

*Trained for only 300K iterations (still training).
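
Table 5 reports top-1/top-5 accuracy. As a generic illustration of how those two numbers are computed from model outputs (not the evaluation code used in this repository), a short PyTorch helper could look like this:

```python
# Generic top-1/top-5 computation from logits; illustration only.
import torch


def topk_accuracy(logits, targets, ks=(1, 5)):
    """logits: (N, num_classes), targets: (N,) -> accuracy in percent for each k."""
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)    # (N, maxk) predicted class ids
    hits = pred.eq(targets.unsqueeze(1))  # (N, maxk) boolean matches
    return [hits[:, :k].any(dim=1).float().mean().item() * 100 for k in ks]


# usage: top1, top5 = topk_accuracy(model(images), labels)
```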

Table 6: Results of the slimmed versions on different datasets

| Model    | Ours        | Maxout | DSN   | ALLCNN | dasNet | ResNet(32) | WRN   | NIN   |
| -------- | ----------- | ------ | ----- | ------ | ------ | ---------- | ----- | ----- |
| #Param   | 310K/460K   | 6M     | 1M    | 1.3M   | 6M     | 475K       | 600K  | 1M    |
| CIFAR10  | 91.98/92.33 | 90.62  | 92.03 | 92.75  | 90.78  | 91.6       | 93.15 | 91.19 |
| CIFAR100 | 64.68/66.82 | 65.46  | 65.43 | 66.29  | 66.22  | 67.37      | 69.11 | 64.32 |

| Other datasets    | Our result |
| ----------------- | ---------- |
| MNIST (310K)*     | 99.72      |
| SVHN (310K)*      | 97.63      |
| ImageNet (310K)** | 37.34/63.4 |

*Since the results for these datasets were presented in their respective sections above, we do not repeat the comparisons here. **The ImageNet result belongs to the latest iteration of a still-ongoing training run.

CIFAR10 extended results:

| Method                                                             | Accuracy | #Params |
| ------------------------------------------------------------------ | -------- | ------- |
| VGGNet (original 16L) [49]                                          | 91.4     | 138m    |
| VGGNet (Enhanced) [49]                                              | 92.45    | 138m    |
| ResNet-110 [4]                                                      | 93.57    | 1.7m    |
| ResNet-1202 [4]                                                     | 92.07    | 10.2m   |
| Stochastic depth-110L [39]                                          | 94.77    | 1.7m    |
| Stochastic depth-1202L [39]                                         | 95.09    | 10.2m   |
| Wide Residual Net [31]                                              | 95.19    | 11m     |
| Wide Residual Net [31]                                              | 95.83    | 36m     |
| Highway Network [30]                                                | 92.40    | -       |
| FitNet [43]                                                         | 91.61    | 1M      |
| SqueezeNet [45] (tested by us)                                      | 79.58    | 1.3M    |
| ALLCNN [44]                                                         | 92.75    | -       |
| Fractional Max-pooling (1 test) [13]                                | 95.50    | 12M     |
| Max-out (k=2) [12]                                                  | 90.62    | 6M      |
| Network in Network [38]                                             | 91.19    | 1M      |
| Deeply Supervised Network [50]                                      | 92.03    | 1M      |
| Batch normalized Max-out Network In Network [51]                    | 93.25    | -       |
| All you need is a good init (LSUV) [52]                             | 94.16    | -       |
| Generalizing Pooling Functions in Convolutional Neural Networks [55] | 93.95    | -       |
| Spatially-sparse convolutional neural networks [59]                 | 93.72    | -       |
| Scalable Bayesian Optimization Using Deep Neural Networks [60]      | 93.63    | -       |
| Recurrent Convolutional Neural Network for Object Recognition [57]  | 92.91    | -       |
| RCNN-160 [56]                                                       | 92.91    | -       |
| Our Architecture                                                    | 94.75    | 5.4m    |
| Our Architecture using data augmentation¹                           | 95.32    | 5.4m    |

CIFAR100 extended results:

| Method                                                         | Accuracy   |
| -------------------------------------------------------------- | ---------- |
| GoogleNet with ELU [19]*                                        | 75.72      |
| Spatially-sparse convolutional neural networks [59]             | 75.7       |
| Fractional Max-Pooling (12M) [13]                               | 73.61      |
| Scalable Bayesian Optimization Using Deep Neural Networks [60]  | 72.60      |
| All you need is a good init [52]                                | 72.34      |
| Batch-normalized Max-out Network In Network (k=5) [51]          | 71.14      |
| Network in Network [38]                                         | 64.32      |
| Deeply Supervised Network [50]                                  | 65.43      |
| ResNet-110L [4]                                                 | 74.84      |
| ResNet-1202L [4]                                                | 72.18      |
| WRN [31]                                                        | 77.11/79.5 |
| Highway [30]                                                    | 67.76      |
| FitNet [43]                                                     | 64.96      |
| Our Architecture 1                                              | 73.45      |
| Our Architecture 2                                              | 74.86      |

*Achieved using several data-augmentation tricks.

Flops and Parameter Comparison:

| Model                | MACC   | COMP   | ADD    | DIV    | EXP  | Activations | Params | Size (MB) |
| -------------------- | ------ | ------ | ------ | ------ | ---- | ----------- | ------ | --------- |
| SimpleNet            | 652M   | 0.838M | 10     | 10     | 10   | 1M          | 5M     | 20.9      |
| SqueezeNet           | 861M   | 10M    | 226K   | 1.51M  | 1K   | 13M         | 1M     | 4.7       |
| Inception v4*        | 12270M | 21.9M  | 5.34M  | 897K   | 1K   | 73M         | 43M    | 163       |
| Inception v3*        | 5710M  | 16.5M  | 2.59M  | 1.71M  | 11K  | 33M         | 24M    | 91        |
| Inception-ResNet-v2* | 9210M  | 17.6M  | 2.36M  | 1K     | 1K   | 74M         | 32M    | 210       |
| ResNet-152           | 11300M | 22.33M | 35.27M | 22.03M | 1K   | 100.26M     | 60.19M | 230       |
| ResNet-50            | 3870M  | 10.9M  | 1.62M  | 1.06M  | 1K   | 47M         | 26M    | 97.70     |
| AlexNet              | 1140M  | 1.77M  | 478K   | 955K   | 478K | 2M          | 62M    | 217.00    |
| GoogleNet            | 1600M  | 16.1M  | 883K   | 166K   | 833K | 10M         | 7M     | 22.82     |
| Network in Network   | 1100M  | 2.86M  | 370K   | 1K     | 1K   | 3.8M        | 8M     | 29        |
| VGG16                | 15740M | 19.7M  | 1K     | 1K     | 1K   | 29M         | 138M   | 512.2     |

*Inception v3, v4, and Inception-ResNet-v2 did not have a Caffe model, so we report their size-related information from mxnet² and tensorflow³, respectively. Inception-ResNet-v2 would take 60 days of training with 2 Titan X GPUs to achieve the reported accuracy⁴.

As can be seen, our architecture has both far fewer parameters (with the exception of SqueezeNet) and far fewer operations than the others.
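
The per-layer arithmetic behind these counts is straightforward: a convolution with kernel size k, c_in input channels, and c_out output channels has k*k*c_in*c_out weights (plus c_out biases) and costs about k*k*c_in*c_out multiply-accumulates per output pixel. The helper below is a small sketch of that bookkeeping for a single layer, meant as an illustration of the arithmetic rather than the tool used to produce the tables above.

```python
# Rough parameter / MACC counter for a single convolution layer; a sketch of
# the arithmetic behind the tables above, not the tool that produced them.
def conv_cost(k, c_in, c_out, h_out, w_out, bias=True):
    params = k * k * c_in * c_out + (c_out if bias else 0)
    maccs = k * k * c_in * c_out * h_out * w_out  # one MACC per weight per output pixel
    return params, maccs


# Example: a 3x3 conv going from 128 to 256 channels on a 16x16 feature map.
p, m = conv_cost(3, 128, 256, 16, 16)
print(f"{p / 1e6:.2f}M params, {m / 1e6:.1f}M MACCs")  # ~0.30M params, ~75.5M MACCs
```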

1. Data-augmentation method used by the stochastic depth paper: https://github.com/Pastromhaug/caffe-stochastic-depth
2. https://github.com/dmlc/mxnet-model-gallery/blob/master/imagenet-1k-inception-v3.md
3. https://github.com/tensorflow/models/tree/master/slim
4. https://github.com/revilokeb/inception_resnetv2_caffe
