
densenetcaffe's People

Contributors

liuzhuang13

densenetcaffe's Issues

Why is the BatchNorm layer's lr_mult 0?

Hi liuzhuang13,
Thanks for your great work. However, I don't understand why the BatchNorm layer's lr_mult and decay_mult are set to 0 in train_densenet.prototxt. Can you explain it?
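For context, the usual pycaffe pattern (a sketch of the common convention, not necessarily the exact code in make_densenet.py) is to freeze the BatchNorm layer's three internal blobs, because they hold the running mean, running variance and moving-average factor, which the layer updates itself during the forward pass rather than via the solver; the learnable gamma/beta live in a separate Scale layer:

from caffe import layers as L

def bn_scale_relu(bottom):
    # The three BatchNorm blobs are statistics, not parameters, so the solver
    # must not touch them: lr_mult = 0 and decay_mult = 0 for all three.
    bn = L.BatchNorm(bottom, param=[dict(lr_mult=0, decay_mult=0)] * 3)
    # The learnable scale (gamma) and shift (beta) are trained here instead.
    scale = L.Scale(bn, bias_term=True, in_place=True)
    relu = L.ReLU(scale, in_place=True)
    return bn, scale, relu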

Convolution layer setting when learn from scratch

Hi, I am using your prototxt to train from scratch instead of fine-tuning from ImageNet, because my dataset is totally different from ImageNet.

Looking at your prototxt, I found that your convolution layer does not have the lr_mult setting here:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant'))

I think the lr_mult should be added as follows:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant', value=0),
                    param=[dict(lr_mult=1, decay_mult=1)])

Am I right?
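For what it's worth, here is a minimal sketch of the same call with an explicit param entry (my own cleanup, assuming the weights are the only learnable blob: with bias_term=False there is no bias blob, so bias_filler is ignored and can be dropped). If I remember Caffe's defaults correctly, an omitted param entry already implies lr_mult=1 and decay_mult=1, so the explicit entry mainly documents the intent:

conv = L.Convolution(relu, kernel_size=ks, stride=stride,
                     num_output=nout, pad=pad, bias_term=False,
                     weight_filler=dict(type='msra'),
                     param=[dict(lr_mult=1, decay_mult=1)])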

Training problem

When I train this DenseNet on my dataset, I find that some weight diff/data values are NaN, and I don't know how to solve this problem.
such as:weight diff/data:nan nan 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 nan
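As a side note (my own suggestion, not from this thread), with pycaffe one can watch for the first layer whose weights or gradients turn NaN; the solver file name below is a placeholder:

import numpy as np
import caffe

# Hypothetical solver file; substitute your own.
solver = caffe.get_solver('solver.prototxt')
for it in range(1000):
    solver.step(1)
    for name, blobs in solver.net.params.items():
        w = blobs[0]
        if np.isnan(w.data).any() or np.isnan(w.diff).any():
            print('NaN in layer %s at iteration %d' % (name, it))

Lowering base_lr or setting clip_gradients in the solver are common first remedies when the diffs blow up like this.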

Is there deploy.prototxt available?

@liuzhuang13 @gaohuang Thanks for the great work. I want to apply the pre-trained model to my own dataset, but it achieves worse results than other architectures. I think I may have generated the deploy.prototxt file incorrectly. Specifically, I don't know how to write the batch_norm, scale, and dropout layers in a deploy.prototxt so that they function properly during the test phase. Could you share your deploy.prototxt as a reference? I appreciate any help.
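For what it's worth, my understanding (an assumption, not the authors' actual file) is that at test time Caffe's BatchNorm switches to its stored global statistics on its own and Dropout becomes a pass-through, so those layers can be copied from the train prototxt unchanged; only the Data layer needs to be replaced by an Input layer and the loss by a plain Softmax. A rough pycaffe sketch, where build_body and the input shape are hypothetical placeholders:

from caffe import layers as L, to_proto

def make_deploy(build_body, path='deploy.prototxt'):
    # Hypothetical input shape; adjust to your data.
    data = L.Input(shape=[dict(dim=[1, 3, 224, 224])])
    # build_body is assumed to stack the same BatchNorm/Scale/Dropout/Convolution
    # layers as the training net; nothing in them needs to change for deployment.
    fc = build_body(data)
    prob = L.Softmax(fc)  # plain Softmax instead of SoftmaxWithLoss
    with open(path, 'w') as f:
        f.write(str(to_proto(prob)))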

Trained Weights

Hi. First off, thank you for porting this to Caffe so promptly; I assume this was directly because of the request on the user group?

I was wondering whether you have trained this network (in Caffe) yet, and whether there are weights available?

number of neurons

Hi, can you tell me how many neurons there are in each layer and how many layers there are in one dense block of the DenseNet-121 architecture?
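For concreteness, here is a small sketch of the channel arithmetic as I understand DenseNet-121 (my reading of the paper, not an answer from the authors): growth rate k = 32, 64 channels enter the first block, the four blocks have 6, 12, 24 and 16 layers, and each layer is BN-ReLU-1x1 conv (4k outputs) followed by BN-ReLU-3x3 conv (k outputs):

# Channel counts per layer in the first dense block of DenseNet-121,
# assuming k = 32 and 64 channels entering the block.
k = 32
channels = 64
for layer in range(1, 7):
    # each layer sees everything produced so far and adds k new feature maps
    print('layer %d: input %d -> 1x1 conv %d -> 3x3 conv %d'
          % (layer, channels, 4 * k, k))
    channels += k
print('block output channels:', channels)  # 64 + 6*32 = 256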

Number of outputs in the transition layer

Hello,
I am trying to understand how the number of outputs in the transition layer is computed (in the 121, 169, 201, and 161 configurations). Looking at the Python script for generating the architectures, there seem to be some discrepancies: it uses only a single 3x3 conv layer per unit in the dense block, while the provided prototxts use a 1x1 followed by a 3x3. The number of conv layers also seems to differ: the script uses a constant number of conv layers (N) per block, while the provided configurations use different ones (e.g. 6, 12, 24, 16 for DenseNet-121).
If I follow the same approach as the script and just sum the number of outputs of all previous convolutional layers up to the transition layer, I get a completely different number.
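My reading (an assumption based on the paper, since this repo's script only covers the basic CIFAR variant): the ImageNet configurations are DenseNet-BC, so each transition layer also compresses the channel count by theta = 0.5, which is why simply summing the previous outputs gives a different number. A quick check for DenseNet-121:

# Transition-layer output channels for DenseNet-121 (BC variant),
# assuming k = 32, 64 initial channels and compression theta = 0.5.
k, theta = 32, 0.5
channels = 64
for n_layers in (6, 12, 24):          # the last block has no transition
    channels += n_layers * k          # concatenations inside the block
    channels = int(channels * theta)  # the 1x1 transition conv halves it
    print('transition output channels:', channels)
# prints 128, 256, 512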

Out of Memory on 1080

Hi,

Have you tried running DenseNet on a GTX 1080? I'm not able to load it even with batch size 1, as the GPU runs out of memory. I'm wondering whether something needs to be tweaked in the Caffe implementation of DenseNet.

out of memory

I built a DenseNet with the default parameters (depth=40, batch sizes 64 and 50), adapted the number of outputs to 3 for my dataset (160x160x3 px images), resulting in ~1 million parameters (which is not too big). When running the solver, I get an "error == cudaSuccess (2 vs. 0) out of memory" error on a Tesla K80.
Any ideas?

CaffeModel_Trained_on_ImageNet

Hi, thanks for sharing!
I noticed that your team has released the Torch model trained on ImageNet. Could you please release your Caffe model trained on ImageNet as well?

About the number of feature maps in the first block and "conv" layers in the BC model

Hi,
I read your code and I saw that the number of feature maps before entering the first dense block is twice the growth rate k. Can I choose another multiple, like three or four times (see the small sketch at the end of this post)?

About the number of "conv" layers: for DenseNet-121 (BC), for example, it is 6, 12, 24, 16. Do you have any rule or hint for designing these numbers? What would happen if I chose them all equal?

Thanks in advance
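A small sketch of what I mean, based on the densenet() signature quoted further down this page (the LMDB path is a placeholder, and I have not verified how other multiples affect accuracy):

# first_output is the number of channels before the first dense block;
# DenseNet-BC uses 2 * growth_rate, but other multiples are just a parameter choice.
growth_rate = 12
densenet('train.lmdb', mode='train', batch_size=64, depth=40,
         first_output=3 * growth_rate, growth_rate=growth_rate, dropout=0.2)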

What is the meaning of 121 in the notation DenseNet-121?

Thank you for sharing this nice work!
This is not a bug; I just want to clarify a couple of points.

  1. In your Table 1 you use DenseNet-121, DenseNet-169, ... What does 121 mean and how is it computed? If it is the depth of the network, what is its relationship with the L term? (A rough count is sketched right after this list.)
  2. In your solver.prototxt, why do you use such a big learning rate? A very small learning rate like 0.001 is usually used for the Adam method, instead of 0.1. Is the reason that you used a different method (Nesterov momentum), so you can use a very high learning rate of 0.1? Is that right?
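My own back-of-the-envelope count for DenseNet-121 (an assumption based on the paper's Table 1, not confirmed in this thread): one initial conv, two convs (1x1 bottleneck + 3x3) per layer across the 6+12+24+16 block layers, three transition convs, and the final classifier:

# Layer count behind the name DenseNet-121 (my reading of the paper).
block_config = (6, 12, 24, 16)
depth = 1 + 2 * sum(block_config) + 3 + 1  # conv1 + 2 convs per block layer + transitions + classifier
print(depth)  # 121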

Update: this is my solver using the Adam method:

train_net: "train_densenet_BC.prototxt"
display: 20
lr_policy: "step"
gamma: 0.1
stepsize: 20000
power: 0.75
# lr for normalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train_dense"
type:"Adam"

Batch normalization with or without learned offset

Nice paper! I just have a minor detail question for reimplementing it.

In https://github.com/liuzhuang13/DenseNetCaffe/blob/master/make_densenet.py#L8, you use:
scale = L.Scale(batch_norm, bias_term=False, ...)
This would correspond to batch normalization with learned gamma, but without beta.
In https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L28, you use:
convFactory:add(cudnn.SpatialBatchNormalization(nChannels))
This includes a learnable beta, so I think the Caffe code needs to be adapted to match the Torch implementation (see the one-line sketch below).

On a side note, the convolutions (both in Caffe and Torch, if I see correctly) all have a bias term, but that will be rendered meaningless by the following batch normalization.
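If it helps, the change I would expect is a single argument (a sketch; the filler choices and the elided arguments are my own assumptions): setting bias_term=True on the Scale layer gives it both the multiplicative gamma and an additive beta, matching Torch's SpatialBatchNormalization.

# Learn both gamma and beta in the Scale layer (sketch):
scale = L.Scale(batch_norm, bias_term=True, in_place=True,
                filler=dict(value=1), bias_filler=dict(value=0))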

Theoretical questions about layers in a DNN with batch normalization using Keras

Hi, I'm new here; I'm sorry for my English.

I have some trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain to me the structure and contents of each layer in this model that I built?

modelbatch = Sequential()
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500, \
                         epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()

These are, I think, all the layers of my model:

print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)

I will appreciate your help, thanks in advance.
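As an aside (my own note, not part of the original question): a Keras BatchNormalization layer stores four per-feature vectors, which is why layers 1, 5 and 9 above each show four arrays of the same shape. Printing the weight names makes the roles explicit:

# For BatchNormalization layers the four arrays are gamma, beta,
# moving_mean and moving_variance, one value per feature.
for i, layer in enumerate(modelbatch.layers):
    names = [w.name for w in layer.weights]
    shapes = [w.shape for w in layer.get_weights()]
    print(i, layer.name, list(zip(names, shapes)))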

About the 3n+4

Hi Liu,

First, congratulations on the Best Paper Award.
I read the Python code in make_densenet.py, and I'm pretty confused about the argument named depth:

#change the line below to experiment with different setting
#depth -- must be 3n+4
#first_output -- #channels before entering the first dense block, set it to be comparable to growth_rate
#growth_rate -- growth rate
#dropout -- set to 0 to disable dropout, non-zero number to set dropout rate
def densenet(data_file, mode='train', batch_size=64, depth=40, first_output=16, growth_rate=12, dropout=0.2):
    data, label = L.Data(source=data_file, backend=P.Data.LMDB, batch_size=batch_size, ntop=2, 
              transform_param=dict(mean_value=128))

What is the meaning of this argument, why must it be 3n+4, and what is n anyway?
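My understanding (an assumption based on the paper's CIFAR setup, not an answer from the authors): the script builds 3 dense blocks of n plain 3x3 conv layers each, and the remaining 4 layers are the initial conv, the two transition convs and the final classifier, hence depth = 3n + 4. With the default depth=40 that gives n = 12 layers per block:

# depth = 3*n + 4 for the basic CIFAR DenseNet: 3 dense blocks of n conv
# layers, plus the initial conv, two transition convs and the classifier.
depth = 40
n = (depth - 4) // 3
print(n)  # 12 conv layers per dense block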

The loss stays at 87.3365 during training and doesn't change

I followed the instructions and didn't change the settings in solver.prototxt, but the loss quickly converged to 87.3365. It's said that this happens because the learning rate is too large and the input to the softmax layer becomes inf. So I am wondering what settings I should use with this network.
Thanks a lot!
