
dsn's Introduction

Deeply-supervised Nets

Update: For experiments on CIFAR-100 and SVHN, please use the same architecture as provided for CIFAR-10. These architecture/hyper-parameter settings generalize quite well and can achieve the numbers reported in the paper.

Also, for CIFAR-100 and SVHN we used softmax losses instead of hinge losses for both the output supervision and the deep supervision, since there are known convergence issues with the hinge loss on more than 10 classes in Caffe.
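For reference, here is a minimal NumPy sketch of the two kinds of per-example supervision loss being contrasted; this is an illustration only, not the repository's Caffe code, and the one-vs-all squared hinge form is an assumption based on the paper.

import numpy as np

def softmax_loss(scores, y):
    # Softmax cross-entropy with a log-sum-exp stabilization.
    s = scores - scores.max()
    return -s[y] + np.log(np.exp(s).sum())

def squared_hinge_loss(scores, y, margin=1.0):
    # One-vs-all squared hinge loss: target +1 for the true class, -1 otherwise.
    t = -np.ones_like(scores)
    t[y] = 1.0
    slack = np.maximum(0.0, margin - t * scores)
    return (slack ** 2).sum()

scores = np.array([1.2, -0.3, 0.4])
print(softmax_loss(scores, 0), squared_hinge_loss(scores, 0))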

Please cite DSN in your publications if it helps your research:

[1] Chen-Yu Lee*, Saining Xie*, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-Supervised Nets. In Proceedings of AISTATS 2015. (* indicates equal contributions)

If you have problems reproducing the experiments, feel free to contact the authors.

Deeply Supervised Nets

This DSN code is based on an older version of the Caffe framework and is provided for reproducing the results reported in our paper. With minimal engineering effort, though, you can apply the idea to your own code as well as to new network architectures.

We preprocess the data following the methods used in the maxout networks and Network in Network papers; please find the details here. Basically, we only apply GCN (global contrast normalization) to the benchmark datasets. Note that the scale of the data is [0, 1] instead of [0, 255]. This is a tricky part when you use your own data: you should tune the learning rate accordingly. Also, tools/cifar-float-data might be useful if you want to generate your own LevelDB database from the GCN-processed data.
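As a rough illustration of what GCN does to each image, here is a minimal NumPy sketch; it follows the common formulation from the maxout paper and is not the exact preprocessing script used for this repository.

import numpy as np

def global_contrast_normalize(X, scale=1.0, eps=1e-8):
    # X: (n_images, n_pixels) float array. Subtract each image's mean and
    # divide by its norm so every image has roughly the same contrast.
    X = X - X.mean(axis=1, keepdims=True)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return scale * X / np.maximum(norms, eps)

Because the values end up on a small scale rather than [0, 255], the learning rates in the provided solver files assume that smaller scale; this is the tuning issue mentioned above.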

To reproduce the results more easily, you can also download our processed LevelDB files here.

The configuration files are in the examples folder. You can run the train_full.sh script and it will automatically complete the training. The final result may vary from machine to machine. For CIFAR-10 you should be able to get an error rate of 9.65% or better; please contact me if you have problems reproducing this. Thank you.

dsn's People

Contributors

s9xie


dsn's Issues

Getting an error while reproducing results from the shared processed LevelDB files

I am using DSN on Ubuntu 12.04 with CUDA 5.5.

I am getting the following error when running ./train_full.sh:

$ ./train_full.sh
I0622 00:04:40.815856 28860 train_net.cpp:26] Starting Optimization
I0622 00:04:40.816721 28860 solver.cpp:26] Creating training net.
I0622 00:04:40.816889 28860 net.cpp:70] Creating Layer data
I0622 00:04:40.816900 28860 net.cpp:105] data -> data
I0622 00:04:40.816915 28860 net.cpp:105] data -> label
I0622 00:04:40.816928 28860 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.816994 28860 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fec1db18b7d google::LogMessage::Fail()
@ 0x7fec1db1ac7f google::LogMessage::SendToLog()
@ 0x7fec1db1876c google::LogMessage::Flush()
@ 0x7fec1db1b51d google::LogMessageFatal::~LogMessageFatal()
@ 0x45d17f caffe::DataLayer<>::SetUp()
@ 0x43a434 caffe::Net<>::Init()
@ 0x43bb2a caffe::Net<>::Net()
@ 0x426c7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fec1b8607ed (unknown)
@ 0x40fdcd (unknown)
Aborted (core dumped)
I0622 00:04:40.889854 28863 finetune_net.cpp:25] Starting Optimization
I0622 00:04:40.890429 28863 solver.cpp:26] Creating training net.
I0622 00:04:40.890560 28863 net.cpp:70] Creating Layer data
I0622 00:04:40.890580 28863 net.cpp:105] data -> data
I0622 00:04:40.890600 28863 net.cpp:105] data -> label
I0622 00:04:40.890625 28863 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.890717 28863 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fb874271b7d google::LogMessage::Fail()
@ 0x7fb874273c7f google::LogMessage::SendToLog()
@ 0x7fb87427176c google::LogMessage::Flush()
@ 0x7fb87427451d google::LogMessageFatal::~LogMessageFatal()
@ 0x45a63f caffe::DataLayer<>::SetUp()
@ 0x42b564 caffe::Net<>::Init()
@ 0x42cc5a caffe::Net<>::Net()
@ 0x435d7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fb871fb97ed (unknown)
@ 0x40fdfd (unknown)
Aborted (core dumped)
I0622 00:04:40.968624 28866 finetune_net.cpp:25] Starting Optimization
I0622 00:04:40.969331 28866 solver.cpp:26] Creating training net.
I0622 00:04:40.969472 28866 net.cpp:70] Creating Layer data
I0622 00:04:40.969487 28866 net.cpp:105] data -> data
I0622 00:04:40.969501 28866 net.cpp:105] data -> label
I0622 00:04:40.969517 28866 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.969605 28866 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fdde68fdb7d google::LogMessage::Fail()
@ 0x7fdde68ffc7f google::LogMessage::SendToLog()
@ 0x7fdde68fd76c google::LogMessage::Flush()
@ 0x7fdde690051d google::LogMessageFatal::~LogMessageFatal()
@ 0x45a63f caffe::DataLayer<>::SetUp()
@ 0x42b564 caffe::Net<>::Init()
@ 0x42cc5a caffe::Net<>::Net()
@ 0x435d7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fdde46457ed (unknown)
@ 0x40fdfd (unknown)
Aborted (core dumped)
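The check failure above simply means the LevelDB directory cifar10_gcn_padded-leveldb/cifar-train-leveldb cannot be found relative to the directory the script is run from. A small, purely hypothetical Python check along these lines (not part of the repository) can confirm the layout before launching training:

import os
import sys

# Path taken from the error log; add any other 'source' paths listed in the
# prototxt files you are using.
required = ["cifar10_gcn_padded-leveldb/cifar-train-leveldb"]
missing = [p for p in required if not os.path.isdir(p)]
if missing:
    sys.exit("Missing LevelDB directories: " + ", ".join(missing) +
             " -- extract the downloaded LevelDB files here, or edit the "
             "'source' fields in the prototxt files.")
print("LevelDB paths found; ok to run ./train_full.sh")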

Number of parameters seems larger than in the NIN paper, contrary to your paper's statement?

It seems like you use the same skeleton as the NIN model except for the supervised layers, and the numbers such as channel counts and filter sizes are the same. The supervised layers even introduce more parameters. However, your paper states that the number of parameters is arranged to stay comparable to the original NIN model.
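One way to settle such a comparison is simply to count parameters layer by layer from the two prototxt files. A hypothetical helper (the layer tuples below are illustrative, not the actual DSN or NIN configuration):

def conv_params(layers):
    # layers: list of (in_channels, out_channels, kernel_size) tuples.
    # Each conv layer has cin * cout * k * k weights plus cout biases.
    return sum(cin * cout * k * k + cout for cin, cout, k in layers)

# Example with made-up NIN-style numbers: a 5x5 conv followed by two 1x1
# "cccp" layers (one mlpconv block).
example_block = [(3, 192, 5), (192, 160, 1), (160, 96, 1)]
print(conv_params(example_block))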

Theano code for DSN?

In the paper you mention using a Theano implementation of DSN for experiments on MNIST. Is this code available?

Where is the parameter \gamma?

In formulation (3), there is a factor \gamma. This parameter is set to prevent the hinge loss from being 0. However, I haven't found this parameter in the code.
A loss of 0 is quite common in deep learning, and this phenomenon is usually called "overfitting". In deep learning, people usually use dropout to prevent the loss from getting to zero too early.
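For what it's worth, a plausible reading of \gamma in formulation (3) is as a threshold that zeroes out a companion loss once that loss is already small enough. A minimal sketch under that assumption (not the repository's Caffe code):

def thresholded_companion_loss(companion_loss, gamma):
    # [loss - gamma]_+ : once the hidden-layer classifier's loss drops below
    # gamma, this term (and its gradient) vanishes, so the companion objective
    # stops influencing training for that branch.
    return max(0.0, companion_loss - gamma)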

About the "relu_cccp6"

I notice that you commented out the "relu_cccp6" layer. If I re-enable (uncomment) the "relu_cccp6" layer, the loss is always 10 after every iteration. Why?

Release the configurations for MNIST, etc.

Hi, I reproduced the experiment and appreciate that DSN does a great job. I tried an idea to improve the SVM and got a slight improvement on CIFAR-10.
I am curious whether it would also work on other datasets under the DSN framework.
Do the authors have plans to release the configurations for the other datasets (MNIST, SVHN, etc.)?

Is SVM implemented?

Hi, I looked at the code and the ip_svm layer is simply an InnerProduct layer.

Am I missing something, or is the SVM not implemented here and instead just an InnerProduct layer with a squared hinge loss? With SGD, perhaps you consider that an approximation of an SVM?
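For context, a linear layer trained with a squared hinge loss plus weight decay is exactly the linear L2-SVM objective, so optimizing it with SGD is a standard way to get SVM-style classifiers without a separate solver. A minimal NumPy sketch of that objective (illustrative only, not the repository's Caffe implementation):

import numpy as np

def l2svm_loss_and_grad(W, X, T, C=1.0):
    # W: (d, k) weights, X: (n, d) features, T: (n, k) targets in {-1, +1}.
    scores = X @ W
    slack = np.maximum(0.0, 1.0 - T * scores)            # per-class hinge slack
    loss = 0.5 * (W ** 2).sum() + C * (slack ** 2).sum()
    dscores = -2.0 * C * T * slack                        # d(loss)/d(scores)
    grad = W + X.T @ dscores                              # includes weight decay
    return loss, grad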

On weights of the loss layers

I noticed that the weights of the loss layers are realized by setting the blobs_lr parameter of the InnerProduct layer in this project. This is equivalent to formulation (3) for training the InnerProduct layer itself. However, the gradients backpropagated to the conv layers below will not be influenced by the weight (0.001 in the prototxt file).
In other words, this realization just learns the InnerProduct layers of the intermediate SVMs slowly, but applies the gradients of the classifiers to the net with equal strength, so all SVMs effectively have the same weight.
Caffe provides a parameter called "loss_weight", which as far as I can see is the correct way to realize the model described in the paper.
This is just my opinion; if I am wrong, please reply.
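To make the distinction concrete, here is a minimal sketch (illustrative pseudo-update steps, not Caffe code) of the two mechanisms the issue compares: scaling a layer's learning rate only slows that layer's own updates, while scaling the loss itself also scales the gradient sent back to the layers below.

def update_with_lr_scale(w_cls, g_cls, g_below, base_lr, lr_scale):
    # blobs_lr-style: only the classifier's own step size is scaled; the
    # gradient flowing to lower layers is unchanged.
    w_cls_new = w_cls - base_lr * lr_scale * g_cls
    return w_cls_new, g_below

def update_with_loss_weight(w_cls, g_cls, g_below, base_lr, loss_weight):
    # loss_weight-style: the whole companion loss is scaled, so both the
    # classifier's step and the gradient flowing to lower layers are scaled.
    w_cls_new = w_cls - base_lr * loss_weight * g_cls
    return w_cls_new, loss_weight * g_below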
