
dsn's Introduction

Deeply-supervised Nets

Update: For experiments on CIFAR-100 and SVHN, please use the same architecture as provided for CIFAR-10. These architecture/hyper-parameter settings generalize quite well and can achieve the numbers reported in the paper.

Also, for CIFAR-100 and SVHN we used softmax losses instead of hinge losses for both the output supervision and the deep supervision, since there are known convergence issues with the hinge loss on more than 10 classes in Caffe.
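For reference, here is a minimal NumPy sketch of the two kinds of per-example supervision loss being contrasted; this is an illustration only, not the repository's Caffe code, and the one-vs-all squared hinge form is an assumption based on the paper.

import numpy as np

def softmax_loss(scores, y):
    # Softmax cross-entropy with a log-sum-exp stabilization.
    s = scores - scores.max()
    return -s[y] + np.log(np.exp(s).sum())

def squared_hinge_loss(scores, y, margin=1.0):
    # One-vs-all squared hinge loss: target +1 for the true class, -1 otherwise.
    t = -np.ones_like(scores)
    t[y] = 1.0
    slack = np.maximum(0.0, margin - t * scores)
    return (slack ** 2).sum()

scores = np.array([1.2, -0.3, 0.4])
print(softmax_loss(scores, 0), squared_hinge_loss(scores, 0))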

Please cite DSN in your publications if it helps your research:

[1] Chen-Yu Lee*, Saining Xie*, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-Supervised Nets. In Proceedings of AISTATS 2015. (* indicates equal contributions)

If you have problems reproducing the experiments, feel free to contact the authors.

Deeply Supervised Nets

This DSN code is based on an older version of the Caffe framework and is provided for reproducing the results reported in our paper. With minimal engineering effort, though, you can apply the idea to your own code as well as to new network architectures.

We preprocess the data following the methods used in the maxout networks and Network in Network papers; please find the details here. Basically, we only apply GCN (global contrast normalization) to the benchmark datasets. Note that the scale of the data is [0, 1] instead of [0, 255]. This is a tricky part when you use your own data: you should tune the learning rate accordingly. Also, tools/cifar-float-data might be useful if you want to generate your own LevelDB database from the GCN-processed data.
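As a rough illustration of what GCN does to each image, here is a minimal NumPy sketch; it follows the common formulation from the maxout paper and is not the exact preprocessing script used for this repository.

import numpy as np

def global_contrast_normalize(X, scale=1.0, eps=1e-8):
    # X: (n_images, n_pixels) float array. Subtract each image's mean and
    # divide by its norm so every image has roughly the same contrast.
    X = X - X.mean(axis=1, keepdims=True)
    norms = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    return scale * X / np.maximum(norms, eps)

Because the values end up on a small scale rather than [0, 255], the learning rates in the provided solver files assume that smaller scale; this is the tuning issue mentioned above.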

To reproduce the results more easily, you can also download our processed LevelDB files here.

The configuration files are in the examples folder. You can run the train_full.sh script and it will automatically complete the training. The final result may vary from machine to machine. For CIFAR-10 you should be able to get an error rate of 9.65% or better; please contact me if you have problems reproducing this. Thank you.

dsn's People

Contributors

s9xie


dsn's Issues

Getting an error while reproducing results from the shared processed LevelDB files

I am using DSN on Ubuntu 12.04 with CUDA 5.5.

I am getting the following error when running ./train_full.sh:

$ ./train_full.sh
I0622 00:04:40.815856 28860 train_net.cpp:26] Starting Optimization
I0622 00:04:40.816721 28860 solver.cpp:26] Creating training net.
I0622 00:04:40.816889 28860 net.cpp:70] Creating Layer data
I0622 00:04:40.816900 28860 net.cpp:105] data -> data
I0622 00:04:40.816915 28860 net.cpp:105] data -> label
I0622 00:04:40.816928 28860 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.816994 28860 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fec1db18b7d google::LogMessage::Fail()
@ 0x7fec1db1ac7f google::LogMessage::SendToLog()
@ 0x7fec1db1876c google::LogMessage::Flush()
@ 0x7fec1db1b51d google::LogMessageFatal::~LogMessageFatal()
@ 0x45d17f caffe::DataLayer<>::SetUp()
@ 0x43a434 caffe::Net<>::Init()
@ 0x43bb2a caffe::Net<>::Net()
@ 0x426c7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fec1b8607ed (unknown)
@ 0x40fdcd (unknown)
Aborted (core dumped)
I0622 00:04:40.889854 28863 finetune_net.cpp:25] Starting Optimization
I0622 00:04:40.890429 28863 solver.cpp:26] Creating training net.
I0622 00:04:40.890560 28863 net.cpp:70] Creating Layer data
I0622 00:04:40.890580 28863 net.cpp:105] data -> data
I0622 00:04:40.890600 28863 net.cpp:105] data -> label
I0622 00:04:40.890625 28863 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.890717 28863 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fb874271b7d google::LogMessage::Fail()
@ 0x7fb874273c7f google::LogMessage::SendToLog()
@ 0x7fb87427176c google::LogMessage::Flush()
@ 0x7fb87427451d google::LogMessageFatal::~LogMessageFatal()
@ 0x45a63f caffe::DataLayer<>::SetUp()
@ 0x42b564 caffe::Net<>::Init()
@ 0x42cc5a caffe::Net<>::Net()
@ 0x435d7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fb871fb97ed (unknown)
@ 0x40fdfd (unknown)
Aborted (core dumped)
I0622 00:04:40.968624 28866 finetune_net.cpp:25] Starting Optimization
I0622 00:04:40.969331 28866 solver.cpp:26] Creating training net.
I0622 00:04:40.969472 28866 net.cpp:70] Creating Layer data
I0622 00:04:40.969487 28866 net.cpp:105] data -> data
I0622 00:04:40.969501 28866 net.cpp:105] data -> label
I0622 00:04:40.969517 28866 data_layer.cpp:148] Opening leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
F0622 00:04:40.969605 28866 data_layer.cpp:151] Check failed: status.ok() Failed to open leveldb cifar10_gcn_padded-leveldb/cifar-train-leveldb
IO error: cifar10_gcn_padded-leveldb/cifar-train-leveldb/LOCK: No such file or directory
*** Check failure stack trace: ***
@ 0x7fdde68fdb7d google::LogMessage::Fail()
@ 0x7fdde68ffc7f google::LogMessage::SendToLog()
@ 0x7fdde68fd76c google::LogMessage::Flush()
@ 0x7fdde690051d google::LogMessageFatal::~LogMessageFatal()
@ 0x45a63f caffe::DataLayer<>::SetUp()
@ 0x42b564 caffe::Net<>::Init()
@ 0x42cc5a caffe::Net<>::Net()
@ 0x435d7c caffe::Solver<>::Solver()
@ 0x40e79f main
@ 0x7fdde46457ed (unknown)
@ 0x40fdfd (unknown)
Aborted (core dumped)
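The check failure above simply means the LevelDB directory cifar10_gcn_padded-leveldb/cifar-train-leveldb cannot be found relative to the directory the script is run from. A small, purely hypothetical Python check along these lines (not part of the repository) can confirm the layout before launching training:

import os
import sys

# Path taken from the error log; add any other 'source' paths listed in the
# prototxt files you are using.
required = ["cifar10_gcn_padded-leveldb/cifar-train-leveldb"]
missing = [p for p in required if not os.path.isdir(p)]
if missing:
    sys.exit("Missing LevelDB directories: " + ", ".join(missing) +
             " -- extract the downloaded LevelDB files here, or edit the "
             "'source' fields in the prototxt files.")
print("LevelDB paths found; ok to run ./train_full.sh")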

Number of parameters seems larger than in the NIN paper, contrary to your paper's statement?

It seems like you use the same skeleton as the NIN model except for the supervised layers, and the numbers such as channel counts and filter sizes are the same. The supervised layers even introduce more parameters. However, your paper states that the number of parameters is arranged to stay comparable to the original NIN model.
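One way to settle such a comparison is simply to count parameters layer by layer from the two prototxt files. A hypothetical helper (the layer tuples below are illustrative, not the actual DSN or NIN configuration):

def conv_params(layers):
    # layers: list of (in_channels, out_channels, kernel_size) tuples.
    # Each conv layer has cin * cout * k * k weights plus cout biases.
    return sum(cin * cout * k * k + cout for cin, cout, k in layers)

# Example with made-up NIN-style numbers: a 5x5 conv followed by two 1x1
# "cccp" layers (one mlpconv block).
example_block = [(3, 192, 5), (192, 160, 1), (160, 96, 1)]
print(conv_params(example_block))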

Theano code for DSN?

In the paper you mention using a Theano implementation of DSN for experiments on MNIST. Is this code available?

Where is the parameter \gamma?

In formulation (3), there is a factor \gamma. This parameter is set to prevent the hinge loss from being 0. However, I haven't found this parameter in the code.
A loss of 0 is quite common in deep learning, and this phenomenon is usually called "overfitting". In deep learning, people usually use dropout to prevent the loss from getting to zero too early.
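For what it's worth, a plausible reading of \gamma in formulation (3) is as a threshold that zeroes out a companion loss once that loss is already small enough. A minimal sketch under that assumption (not the repository's Caffe code):

def thresholded_companion_loss(companion_loss, gamma):
    # [loss - gamma]_+ : once the hidden-layer classifier's loss drops below
    # gamma, this term (and its gradient) vanishes, so the companion objective
    # stops influencing training for that branch.
    return max(0.0, companion_loss - gamma)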

About the "relu_cccp6"

I notice that you commented out the "relu_cccp6" layer. If I re-enable (uncomment) the "relu_cccp6" layer, the loss is always 10 after every iteration. Why?

Release the configurations for MNIST, etc.

Hi, I reproduced the experiment and appreciate that DSN does a great job. I tried an idea to improve the SVM and got a slight improvement on CIFAR-10.
I am curious whether it would also work on other datasets under the DSN framework.
Do the authors have plans to release the configurations for the other datasets (MNIST, SVHN, etc.)?

Is SVM implemented?

Hi, I looked at the code and the ip_svm layer is simply an InnerProduct layer.

Am I missing something, or is the SVM not implemented here and instead just an InnerProduct layer with a squared hinge loss? With SGD, perhaps you consider that an approximation of an SVM?
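For context, a linear layer trained with a squared hinge loss plus weight decay is exactly the linear L2-SVM objective, so optimizing it with SGD is a standard way to get SVM-style classifiers without a separate solver. A minimal NumPy sketch of that objective (illustrative only, not the repository's Caffe implementation):

import numpy as np

def l2svm_loss_and_grad(W, X, T, C=1.0):
    # W: (d, k) weights, X: (n, d) features, T: (n, k) targets in {-1, +1}.
    scores = X @ W
    slack = np.maximum(0.0, 1.0 - T * scores)            # per-class hinge slack
    loss = 0.5 * (W ** 2).sum() + C * (slack ** 2).sum()
    dscores = -2.0 * C * T * slack                        # d(loss)/d(scores)
    grad = W + X.T @ dscores                              # includes weight decay
    return loss, grad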

On weights of the loss layers

I noticed that the weights of the loss layers are realized by setting the blobs_lr parameter of the InnerProduct layer in this project. This is equivalent to formulation (3) for training the InnerProduct layer itself. However, the gradients backpropagated to the conv layers below will not be influenced by the weight (0.001 in the prototxt file).
In other words, this realization just learns the InnerProduct layers of the intermediate SVMs slowly, but applies the gradients of the classifiers to the net with equal strength, so all SVMs effectively have the same weight.
Caffe provides a parameter called "loss_weight", which as far as I can see is the correct way to realize the model described in the paper.
This is just my opinion; if I am wrong, please reply.
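To make the distinction concrete, here is a minimal sketch (illustrative pseudo-update steps, not Caffe code) of the two mechanisms the issue compares: scaling a layer's learning rate only slows that layer's own updates, while scaling the loss itself also scales the gradient sent back to the layers below.

def update_with_lr_scale(w_cls, g_cls, g_below, base_lr, lr_scale):
    # blobs_lr-style: only the classifier's own step size is scaled; the
    # gradient flowing to lower layers is unchanged.
    w_cls_new = w_cls - base_lr * lr_scale * g_cls
    return w_cls_new, g_below

def update_with_loss_weight(w_cls, g_cls, g_below, base_lr, loss_weight):
    # loss_weight-style: the whole companion loss is scaled, so both the
    # classifier's step and the gradient flowing to lower layers are scaled.
    w_cls_new = w_cls - base_lr * loss_weight * g_cls
    return w_cls_new, loss_weight * g_below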
