
densenetcaffe's People

Contributors

liuzhuang13

densenetcaffe's Issues

Why is the BatchNorm layer's lr_mult 0?

Hi liuzhuang13,
Thanks for your great work. However, I don't understand why the BatchNorm layer's lr_mult and decay_mult are set to 0 in train_densenet.prototxt. Can you explain it?
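For context, the usual pycaffe pattern (a sketch of the common convention, not necessarily the exact code in make_densenet.py) is to freeze the BatchNorm layer's three internal blobs, because they hold the running mean, running variance and moving-average factor, which the layer updates itself during the forward pass rather than via the solver; the learnable gamma/beta live in a separate Scale layer:

from caffe import layers as L

def bn_scale_relu(bottom):
    # The three BatchNorm blobs are statistics, not parameters, so the solver
    # must not touch them: lr_mult = 0 and decay_mult = 0 for all three.
    bn = L.BatchNorm(bottom, param=[dict(lr_mult=0, decay_mult=0)] * 3)
    # The learnable scale (gamma) and shift (beta) are trained here instead.
    scale = L.Scale(bn, bias_term=True, in_place=True)
    relu = L.ReLU(scale, in_place=True)
    return bn, scale, relu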

Convolution layer setting when learn from scratch

Hi, I am using your prototxt to train from scratch instead of fine-tuning from ImageNet, because my dataset is totally different from ImageNet.

Looking at your prototxt, I found that your convolution layer does not have the lr_mult setting here:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant'))

I think the lr_mult should be added as follows:

conv = L.Convolution(relu, kernel_size=ks, stride=stride, 
                    num_output=nout, pad=pad, bias_term=False, 
                    weight_filler=dict(type='msra'), bias_filler=dict(type='constant', value=0),
                    param=[dict(lr_mult=1, decay_mult=1)])

Am I right?
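For what it's worth, here is a minimal sketch of the same call with an explicit param entry (my own cleanup, assuming the weights are the only learnable blob: with bias_term=False there is no bias blob, so bias_filler is ignored and can be dropped). If I remember Caffe's defaults correctly, an omitted param entry already implies lr_mult=1 and decay_mult=1, so the explicit entry mainly documents the intent:

conv = L.Convolution(relu, kernel_size=ks, stride=stride,
                     num_output=nout, pad=pad, bias_term=False,
                     weight_filler=dict(type='msra'),
                     param=[dict(lr_mult=1, decay_mult=1)])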

Training problem

When I train this DenseNet on my dataset, I find that some weight diff/data values are NaN, and I don't know how to solve this problem.
such as:weight diff/data:nan nan 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 nan
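As a side note (my own suggestion, not from this thread), with pycaffe one can watch for the first layer whose weights or gradients turn NaN; the solver file name below is a placeholder:

import numpy as np
import caffe

# Hypothetical solver file; substitute your own.
solver = caffe.get_solver('solver.prototxt')
for it in range(1000):
    solver.step(1)
    for name, blobs in solver.net.params.items():
        w = blobs[0]
        if np.isnan(w.data).any() or np.isnan(w.diff).any():
            print('NaN in layer %s at iteration %d' % (name, it))

Lowering base_lr or setting clip_gradients in the solver are common first remedies when the diffs blow up like this.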

Is there deploy.prototxt available?

@liuzhuang13 @gaohuang Thanks for the great work. I want to apply the pre-trained model to my own dataset, but it achieves worse results than other architectures. I think I may have generated the deploy.prototxt file incorrectly. Specifically, I don't know how to write the batch_norm, scale, and dropout layers in a deploy.prototxt so that they function properly during the test phase. Could you share your deploy.prototxt as a reference? I appreciate any help.
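For what it's worth, my understanding (an assumption, not the authors' actual file) is that at test time Caffe's BatchNorm switches to its stored global statistics on its own and Dropout becomes a pass-through, so those layers can be copied from the train prototxt unchanged; only the Data layer needs to be replaced by an Input layer and the loss by a plain Softmax. A rough pycaffe sketch, where build_body and the input shape are hypothetical placeholders:

from caffe import layers as L, to_proto

def make_deploy(build_body, path='deploy.prototxt'):
    # Hypothetical input shape; adjust to your data.
    data = L.Input(shape=[dict(dim=[1, 3, 224, 224])])
    # build_body is assumed to stack the same BatchNorm/Scale/Dropout/Convolution
    # layers as the training net; nothing in them needs to change for deployment.
    fc = build_body(data)
    prob = L.Softmax(fc)  # plain Softmax instead of SoftmaxWithLoss
    with open(path, 'w') as f:
        f.write(str(to_proto(prob)))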

Trained Weights

Hi. First off, thank you for porting this to Caffe so promptly; I assume this was directly because of the request on the user group?

I was wondering whether you have trained this network (in Caffe) yet, and whether there are weights available?

number of neurons

Hi, can you tell me how many neurons there are in each layer and how many layers there are in one dense block of the DenseNet-121 architecture?
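For concreteness, here is a small sketch of the channel arithmetic as I understand DenseNet-121 (my reading of the paper, not an answer from the authors): growth rate k = 32, 64 channels enter the first block, the four blocks have 6, 12, 24 and 16 layers, and each layer is BN-ReLU-1x1 conv (4k outputs) followed by BN-ReLU-3x3 conv (k outputs):

# Channel counts per layer in the first dense block of DenseNet-121,
# assuming k = 32 and 64 channels entering the block.
k = 32
channels = 64
for layer in range(1, 7):
    # each layer sees everything produced so far and adds k new feature maps
    print('layer %d: input %d -> 1x1 conv %d -> 3x3 conv %d'
          % (layer, channels, 4 * k, k))
    channels += k
print('block output channels:', channels)  # 64 + 6*32 = 256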

Number of outputs in the transition layer

Hello,
I am trying to understand how the number of outputs in the transition layer is computed (in the 121, 169, 201, and 161 configurations). Looking at the Python script for generating the architectures, there seem to be some discrepancies: it uses only a single 3x3 conv layer per unit in the dense block, while the provided prototxts use a 1x1 followed by a 3x3. The number of conv layers also seems to differ: the script uses a constant number of conv layers (N) per block, while the provided configurations use different ones (e.g. 6, 12, 24, 16 for DenseNet-121).
If I follow the same approach as the script and just sum the number of outputs of all previous convolutional layers up to the transition layer, I get a completely different number.
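My reading (an assumption based on the paper, since this repo's script only covers the basic CIFAR variant): the ImageNet configurations are DenseNet-BC, so each transition layer also compresses the channel count by theta = 0.5, which is why simply summing the previous outputs gives a different number. A quick check for DenseNet-121:

# Transition-layer output channels for DenseNet-121 (BC variant),
# assuming k = 32, 64 initial channels and compression theta = 0.5.
k, theta = 32, 0.5
channels = 64
for n_layers in (6, 12, 24):          # the last block has no transition
    channels += n_layers * k          # concatenations inside the block
    channels = int(channels * theta)  # the 1x1 transition conv halves it
    print('transition output channels:', channels)
# prints 128, 256, 512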

Out of Memory on 1080

Hi,

Have you tried running DenseNet on a GTX 1080? I'm not able to load it even with batch size 1, as the GPU runs out of memory. I'm wondering whether something needs to be tweaked in the Caffe implementation of DenseNet.

out of memory

I built a DenseNet with the default parameters (depth=40, batch sizes 64 and 50), adapted the number of outputs to 3 for my dataset (160x160x3 px images), resulting in ~1 million parameters (which is not too big). When running the solver, I get an "error == cudaSuccess (2 vs. 0) out of memory" error on a Tesla K80.
Any ideas?

CaffeModel_Trained_on_ImageNet

Hi, thanks for sharing!
I noticed that your team has released the Torch model trained on ImageNet. Could you please release your Caffe model trained on ImageNet as well?

About the number of feature maps in the first block and "conv" layers in the BC model

Hi,
I read your code and I saw that the number of feature maps before entering the first dense block is twice the growth rate k. Can I choose another multiple, like three or four times (see the small sketch at the end of this post)?

About the number of "conv" layers: for DenseNet-121 (BC), for example, it is 6, 12, 24, 16. Do you have any rule or hint for designing these numbers? What would happen if I chose them all equal?

Thanks in advance
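A small sketch of what I mean, based on the densenet() signature quoted further down this page (the LMDB path is a placeholder, and I have not verified how other multiples affect accuracy):

# first_output is the number of channels before the first dense block;
# DenseNet-BC uses 2 * growth_rate, but other multiples are just a parameter choice.
growth_rate = 12
densenet('train.lmdb', mode='train', batch_size=64, depth=40,
         first_output=3 * growth_rate, growth_rate=growth_rate, dropout=0.2)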

What is the meaning of 121 in the notation DenseNet-121?

Thank you for sharing this nice work!
This is not a bug; I just want to clarify a couple of points.

  1. In your Table 1 you use DenseNet-121, DenseNet-169, ... What does 121 mean and how is it computed? If it is the depth of the network, what is its relationship with the L term? (A rough count is sketched right after this list.)
  2. In your solver.prototxt, why do you use such a big learning rate? A very small learning rate like 0.001 is usually used for the Adam method, instead of 0.1. Is the reason that you used a different method (Nesterov momentum), so you can use a very high learning rate of 0.1? Is that right?
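My own back-of-the-envelope count for DenseNet-121 (an assumption based on the paper's Table 1, not confirmed in this thread): one initial conv, two convs (1x1 bottleneck + 3x3) per layer across the 6+12+24+16 block layers, three transition convs, and the final classifier:

# Layer count behind the name DenseNet-121 (my reading of the paper).
block_config = (6, 12, 24, 16)
depth = 1 + 2 * sum(block_config) + 3 + 1  # conv1 + 2 convs per block layer + transitions + classifier
print(depth)  # 121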

Update: this is my solver using the Adam method:

train_net: "train_densenet_BC.prototxt"
display: 20
lr_policy: "step"
gamma: 0.1
stepsize: 20000
power: 0.75
# lr for normalized softmax
base_lr: 0.001
# high momentum
momentum: 0.99
# no gradient accumulation
iter_size: 1
max_iter: 100000
weight_decay: 0.0005
snapshot: 4000
snapshot_prefix: "snapshot/train_dense"
type:"Adam"

Batch normalization with or without learned offset

Nice paper! I just have a minor detail question for reimplementing it.

In https://github.com/liuzhuang13/DenseNetCaffe/blob/master/make_densenet.py#L8, you use:
scale = L.Scale(batch_norm, bias_term=False, ...)
This would correspond to batch normalization with learned gamma, but without beta.
In https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L28, you use:
convFactory:add(cudnn.SpatialBatchNormalization(nChannels))
This includes a learnable beta, so I think the Caffe code needs to be adapted to match the Torch implementation (see the one-line sketch below).

On a side note, the convolutions (both in Caffe and Torch, if I see correctly) all have a bias term, but that will be rendered meaningless by the following batch normalization.
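If it helps, the change I would expect is a single argument (a sketch; the filler choices and the elided arguments are my own assumptions): setting bias_term=True on the Scale layer gives it both the multiplicative gamma and an additive beta, matching Torch's SpatialBatchNormalization.

# Learn both gamma and beta in the Scale layer (sketch):
scale = L.Scale(batch_norm, bias_term=True, in_place=True,
                filler=dict(value=1), bias_filler=dict(value=0))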

Theoretical questions about layers in a DNN with batch normalization using Keras

Hi, I'm new here; I'm sorry for my English.

I have some trouble understanding DNN models that use batch normalization, specifically in Keras. Can somebody explain to me the structure and contents of each layer in this model that I built?

modelbatch = Sequential()
modelbatch.add(Dense(512, input_dim=1120))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(256))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('relu'))
modelbatch.add(Dropout(0.5))

modelbatch.add(Dense(num_classes))
modelbatch.add(BatchNormalization())
modelbatch.add(Activation('softmax'))
# Compile model
modelbatch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train the model
start = time.time()
model_info = modelbatch.fit(X_2, y_2, batch_size=500, \
                         epochs=20, verbose=2, validation_data=(X_test, y_test))
end = time.time()

These are, I think, all the layers of my model:

print(modelbatch.layers[0].get_weights()[0].shape)
(1120, 512)
print(modelbatch.layers[0].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[0].shape)
(512,)
print(modelbatch.layers[1].get_weights()[1].shape)
(512,)
print(modelbatch.layers[1].get_weights()[2].shape)
(512,)
print(modelbatch.layers[1].get_weights()[3].shape)
(512,)
print(modelbatch.layers[4].get_weights()[0].shape)
(512, 256)
print(modelbatch.layers[4].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[0].shape)
(256,)
print(modelbatch.layers[5].get_weights()[1].shape)
(256,)
print(modelbatch.layers[5].get_weights()[2].shape)
(256,)
print(modelbatch.layers[5].get_weights()[3].shape)
(256,)
print(modelbatch.layers[8].get_weights()[0].shape)
(256, 38)
print(modelbatch.layers[8].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[0].shape)
(38,)
print(modelbatch.layers[9].get_weights()[1].shape)
(38,)
print(modelbatch.layers[9].get_weights()[2].shape)
(38,)
print(modelbatch.layers[9].get_weights()[3].shape)
(38,)

I will appreciate your help, thanks in advance.
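As an aside (my own note, not part of the original question): a Keras BatchNormalization layer stores four per-feature vectors, which is why layers 1, 5 and 9 above each show four arrays of the same shape. Printing the weight names makes the roles explicit:

# For BatchNormalization layers the four arrays are gamma, beta,
# moving_mean and moving_variance, one value per feature.
for i, layer in enumerate(modelbatch.layers):
    names = [w.name for w in layer.weights]
    shapes = [w.shape for w in layer.get_weights()]
    print(i, layer.name, list(zip(names, shapes)))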

About the 3n+4

Hi Liu,

First, congratulations on the Best Paper Award.
I read the Python code in make_densenet.py, and I'm pretty confused about the argument named depth:

#change the line below to experiment with different setting
#depth -- must be 3n+4
#first_output -- #channels before entering the first dense block, set it to be comparable to growth_rate
#growth_rate -- growth rate
#dropout -- set to 0 to disable dropout, non-zero number to set dropout rate
def densenet(data_file, mode='train', batch_size=64, depth=40, first_output=16, growth_rate=12, dropout=0.2):
    data, label = L.Data(source=data_file, backend=P.Data.LMDB, batch_size=batch_size, ntop=2, 
              transform_param=dict(mean_value=128))

What is the meaning of this argument, why must it be 3n+4, and what is n anyway?
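My understanding (an assumption based on the paper's CIFAR setup, not an answer from the authors): the script builds 3 dense blocks of n plain 3x3 conv layers each, and the remaining 4 layers are the initial conv, the two transition convs and the final classifier, hence depth = 3n + 4. With the default depth=40 that gives n = 12 layers per block:

# depth = 3*n + 4 for the basic CIFAR DenseNet: 3 dense blocks of n conv
# layers, plus the initial conv, two transition convs and the classifier.
depth = 40
n = (depth - 4) // 3
print(n)  # 12 conv layers per dense block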

The loss stays at 87.3365 during training and doesn't change

I followed the instructions and didn't change the settings in solver.prototxt, but the loss quickly converged to 87.3365. It's said that this happens because the learning rate is too large and the input to the softmax layer becomes inf. So I am wondering what settings I should use with this network.
Thanks a lot!
