liuzhuang13 / DenseNetCaffe
Caffe code for Densely Connected Convolutional Networks (DenseNets)
Hi liuzhuang13,
Thanks for your great work. However, I don't understand why, in train_densenet.prototxt, the BatchNorm layer's lr_mult and decay_mult are set to 0. Can you explain?
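For context, a minimal sketch of the usual Caffe convention (my understanding of Caffe's BatchNorm semantics, not taken from this repo's files): the three parameter blobs of a BatchNorm layer hold the running mean, the running variance, and a moving-average factor, and the layer updates them itself during the forward pass, so lr_mult=0 and decay_mult=0 keep the solver from applying gradient updates or weight decay to them. The learnable gamma (and optionally beta) live in a separate Scale layer.

from caffe import layers as L

def bn_scale(bottom):
    # The three BatchNorm blobs (running mean, running variance, moving-average
    # factor) are updated by the layer itself in the forward pass, so the
    # solver must not touch them: lr_mult=0 and decay_mult=0.
    bn = L.BatchNorm(bottom, in_place=True,
                     param=[dict(lr_mult=0, decay_mult=0)] * 3)
    # The learnable affine transform (gamma, optionally beta) is a Scale layer.
    scale = L.Scale(bn, bias_term=True, in_place=True)
    return scale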
Hi, I am using your prototxt to learn from scratch, instead of fine-tuning from ImageNet, because my dataset is totally different from ImageNet.
Looking at your prototxt, I found that the convolution layer does not have an lr_mult setting here:
conv = L.Convolution(relu, kernel_size=ks, stride=stride,
num_output=nout, pad=pad, bias_term=False,
weight_filler=dict(type='msra'), bias_filler=dict(type='constant'))
I think the lr_mult should be added, as follows:
conv = L.Convolution(relu, kernel_size=ks, stride=stride,
num_output=nout, pad=pad, bias_term=False,
weight_filler=dict(type='msra'), bias_filler=dict(type='constant', value=0),
param=[dict(lr_mult=1, decay_mult=1)])
Am I right?
When I train this DenseNet on my dataset, I find that some weight diff/data values are NaN, and I don't know how to solve this problem. For example:
weight diff/data: nan nan 0.000005 0.000005 0.000005 ... 0.000005 nan
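A hedged sketch, not a verified fix for this particular issue: two common mitigations for NaN gradients in Caffe are lowering base_lr and enabling the solver's gradient clipping, both plain SolverParameter fields.

from caffe.proto import caffe_pb2

solver = caffe_pb2.SolverParameter()
solver.base_lr = 0.01          # try a smaller learning rate first
solver.clip_gradients = 10.0   # clip the global L2 norm of the gradients
with open('solver_clip.prototxt', 'w') as f:
    f.write(str(solver))       # protobuf str() emits the prototxt text format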
What should I do if I want to specify a layer name in make_densenet.py?
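One way to control layer names with pycaffe (a hedged sketch, not how this repo's script is written): when a net is built on a NetSpec object, the attribute name you assign each layer to becomes its layer/top name in the generated prototxt.

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=[dict(dim=[1, 3, 32, 32])])
# The attribute name ('conv1_custom') becomes the layer/top name in the prototxt.
n.conv1_custom = L.Convolution(n.data, kernel_size=3, num_output=16, pad=1,
                               bias_term=False, weight_filler=dict(type='msra'))
print(n.to_proto())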
@liuzhuang13 @gaohuang Thanks for the great work. I want to apply the pre-trained model to my own dataset, but it achieved worse results than other architectures. I think I may have generated the deploy.prototxt file incorrectly. Specifically, I don't know how to write the batch_norm, scale, and dropout layers in a deploy.prototxt so that they function properly during the test phase. Could you share your deploy.prototxt as a reference? I appreciate any help.
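A hedged sketch (not the authors' deploy file) of generating a deploy net with pycaffe's NetSpec: in the TEST phase Caffe's BatchNorm automatically switches to its stored global statistics and Dropout becomes a pass-through, so those layers need no rewriting; only the Data and loss layers are replaced. The input shape below is an assumption.

import caffe
from caffe import layers as L

def make_deploy(path='deploy.prototxt'):
    n = caffe.NetSpec()
    n.data = L.Input(shape=[dict(dim=[1, 3, 224, 224])])  # assumed input size
    # ... append the same BatchNorm/Scale/ReLU/Convolution stack as in
    # train_densenet.prototxt, minus the SoftmaxWithLoss/Accuracy layers ...
    with open(path, 'w') as f:
        f.write(str(n.to_proto()))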
Hi. First off, thank you for so promptly porting this to Caffe; I assume this was directly because of the request on the user group?
I was wondering if you have trained this network (in Caffe) yet, and whether the weights are available.
Hi, can you tell me how many neurons there are in each layer, and how many layers there are in one dense block of the DenseNet-121 architecture?
Hello,
I am trying to understand how the number of outputs in the transition layers is computed (in the 121, 169, 201, and 161 configurations). Looking at the Python script for generating the architectures, there seem to be some discrepancies: it uses only a single 3x3 conv layer per dense layer, while the provided prototxts use a 1x1 and a 3x3. The number of conv layers also differs: the script uses a constant number of conv layers (N) per block, while the provided configurations use different ones (e.g. 6, 12, 24, 16 for DenseNet-121).
If I follow the same approach as the script and just sum the numbers of outputs of all previous convolutional layers up to the transition layer, I get a completely different number.
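For what it's worth, here is the channel bookkeeping I believe the DenseNet-BC prototxts follow (an inference from the DenseNet-BC design; the script in this repo builds the plain variant without the 1x1 bottleneck or compression, which is why the numbers differ): each dense layer adds growth_rate channels by concatenation, and each transition layer halves the channel count.

def densenet_bc_channels(block_sizes=(6, 12, 24, 16), growth_rate=32,
                         init_channels=64, compression=0.5):
    channels = init_channels
    for i, n_layers in enumerate(block_sizes):
        channels += n_layers * growth_rate   # concatenation inside the block
        if i < len(block_sizes) - 1:         # a transition follows every block but the last
            channels = int(channels * compression)
        print('after block %d: %d channels' % (i + 1, channels))
    return channels

densenet_bc_channels()  # prints 128, 256, 512, 1024 for DenseNet-121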
Hi,
Have you tried running DenseNet on a GTX 1080? I'm not able to load it even with batch size 1, as the GPU runs out of memory. I'm wondering if something needs to be tweaked in the Caffe implementation of DenseNet.
I built a DenseNet with the default parameters (depth=40, batch sizes 64 and 50), adapted the number of outputs to 3 for my dataset (160x160x3 px images), resulting in ~1M parameters (which is not too big). When running the solver, I get an "error == cudaSuccess (2 vs. 0) out of memory" error on a Tesla K80.
Any ideas?
Hi, thanks for sharing!
I noticed that your team has released the Torch model trained on ImageNet. Could you please also release the Caffe model trained on ImageNet?
Hi,
I read your code and saw that the number of feature maps before the first dense block is twice the growth rate k. Can I choose another multiple, like three or four times?
About the number of conv layers: for example, DenseNet-121 (BC) uses 6, 12, 24, 16. Do you have any rule or hint for designing these numbers? What happens if I choose them to be equal?
Thanks in advance
Thank you for sharing this nice work!
This is not a bug; I just want to clarify a few points. For DenseNet-121, DenseNet-169, and so on: what does the 121 mean, and how is it computed? If it is the depth of the network, what is its relationship with the L term? (See the layer-count sketch after the solver below.)
Update: this is my solver, using the Adam method:
train_net: "train_densenet_BC.prototxt"
display: 20
lr_policy: "step"
gamma: 0.1
stepsize: 20000
power: 0.75
# lr for normalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train_dense"
type:"Adam"
Nice paper! I just have a minor detail question for reimplementing it.
In https://github.com/liuzhuang13/DenseNetCaffe/blob/master/make_densenet.py#L8, you use:
scale = L.Scale(batch_norm, bias_term=False, ...)
This would correspond to batch normalization with a learned gamma, but without a beta.
In https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L28, you use:
convFactory:add(cudnn.SpatialBatchNormalization(nChannels))
This includes a learnable beta. So I think the Caffe code needs to be adapted to match the Torch implementation.
On a side note, the convolutions (both in Caffe and Torch, if I see correctly) all have a bias term, but that will be rendered meaningless by the following batch normalization.
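A minimal sketch of the suggested fix, as I understand it (an assumption on my part, not the authors' code): enable the bias (beta) in the Scale layer so the Caffe model matches Torch's SpatialBatchNormalization, and drop the convolution bias that the following batch normalization would cancel anyway.

from caffe import layers as L

def bn_relu_conv(bottom, nout, ks, stride, pad):
    batch_norm = L.BatchNorm(bottom, in_place=True)
    # bias_term=True gives the Scale layer a learnable beta as well as gamma,
    # matching Torch's SpatialBatchNormalization.
    scale = L.Scale(batch_norm, bias_term=True, in_place=True)
    relu = L.ReLU(scale, in_place=True)
    # bias_term=False drops the convolution bias, which the batch
    # normalization of the next BN-ReLU-Conv unit would cancel anyway.
    conv = L.Convolution(relu, kernel_size=ks, stride=stride,
                         num_output=nout, pad=pad, bias_term=False,
                         weight_filler=dict(type='msra'))
    return conv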
Hi, do you have any plans to evaluate DenseNet on the ImageNet classification task?
Could you upload the model pretrained on ImageNet?
I find it hard to get the network to converge with my own data and initialization settings.
Thanks a lot.
I've been struggling to recreate your CIFAR-100 results, and am wondering if you could share how you achieved the reported 27% error rate on CIFAR-100 without augmentation.
Hi, I'm new here; sorry for my English.
I'm having some trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain the structure and content of each layer in this model that I built?
modelbatch = Sequential()
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))
modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))
modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500, \
epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()
These are, I think, all the layers of my model:
print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)
I will appreciate your help, thanks in advance.
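For reference, this is standard Keras behaviour rather than anything specific to this model: Dense.get_weights() returns [kernel, bias], while BatchNormalization.get_weights() returns four vectors of per-feature length, in the order [gamma (scale), beta (shift), moving_mean, moving_variance]. That is what the four (512,) arrays of layer 1 above are. A compact way to inspect the whole stack:

# Print each layer's name together with the shapes of its weight arrays.
# For BatchNormalization layers the four equal-length vectors are, in order,
# gamma (scale), beta (shift), moving_mean and moving_variance.
for layer in modelbatch.layers:
    print(layer.name, [w.shape for w in layer.get_weights()])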
Hi Liu,
First, congrats on the Best Paper Award.
I read the Python code in make_densenet.py, and I'm pretty confused about the argument named depth:
#change the line below to experiment with different setting
#depth -- must be 3n+4
#first_output -- #channels before entering the first dense block, set it to be comparable to growth_rate
#growth_rate -- growth rate
#dropout -- set to 0 to disable dropout, non-zero number to set dropout rate
def densenet(data_file, mode='train', batch_size=64, depth=40, first_output=16, growth_rate=12, dropout=0.2):
data, label = L.Data(source=data_file, backend=P.Data.LMDB, batch_size=batch_size, ntop=2,
transform_param=dict(mean_value=128))
What is the meaning of this argument, why must it be 3n+4, and what is n anyway?
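For what it's worth, here is where I believe the 3n+4 constraint comes from for the plain CIFAR-style DenseNet this script builds (my reading of the plain configuration, not confirmed by the authors): n is the number of 3x3 conv layers in each of the three dense blocks, and the remaining four weighted layers are the initial convolution, the two transition convolutions, and the final classifier.

def plain_densenet_depth(n):
    # initial conv + 3 dense blocks of n convs each + 2 transition convs + final FC
    return 1 + 3 * n + 2 + 1

assert plain_densenet_depth(12) == 40  # the default depth=40 implies n = 12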
What is the path of the snapshot models? It isn't shown in solver.prototxt.
I followed the instructions and didn't change the settings in solver.prototxt, but the loss quickly converged to 87.3365. It is said that this happens because the learning rate is too large and the features before the softmax layer overflow to inf. So I am wondering what settings I should use with this network.
Thanks a lot!