
Comments (9)

hussam789 commented on June 1, 2024

@rwightman You can try SGE-Net (from the same authors as SK-Net); it seems to have fewer parameters than SK-Net.

An implementation can be found here


sweaterr commented on June 1, 2024

First, use_resnet_d=False is correct.
We implemented BigLittleNet with reference to its official implementation.
We found that BigLittleNet's official implementation already includes the concept of ResNet-D:
in both resnet_d_projection_shortcut and bl_projection_shortcut, an average
pooling layer with a stride of 2 is added before the convolution (only the pooling size differs).
So we described it in the paper as D + BL.
However, when using BL, we did not use the tweak that replaces the 7x7 convolution with three 3x3 convolutions (hence use_resnet_d=False), because it made training unstable.
I admit it is a little tricky. We will explain it further in v2 of our paper.
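
For anyone comparing implementations, here is a minimal PyTorch sketch (not the official TensorFlow code, written purely for illustration) of the projection shortcut described above, where a stride-2 average pooling is placed before a stride-1 1x1 convolution:

```python
import torch.nn as nn

def resnet_d_shortcut(in_ch: int, out_ch: int, stride: int = 2, pool_size: int = 2) -> nn.Sequential:
    """ResNet-D style projection shortcut (sketch).

    Spatial downsampling is done by the average pooling, so the following
    1x1 convolution can keep stride 1 and does not discard activations,
    unlike the plain stride-2 1x1 shortcut in the original ResNet.
    """
    return nn.Sequential(
        nn.AvgPool2d(kernel_size=pool_size, stride=stride, ceil_mode=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )
```

The BL projection shortcut follows the same pattern with a different pooling size, which is why we describe the combination as D + BL.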

Second, I think you are right: the number of Little branches should be three in the figure.
Thank you for spotting it. We will correct this in v2 of the paper.

I am very grateful for your interest in our work.
If you have any further questions, please feel free to ask. 😄


rwightman commented on June 1, 2024

@sweaterr thanks for the quick response. I haven't trained it yet; that will take some time, so I'm double-checking that I have things right first.

The FLOP increase is roughly in line with the parameter increase I see, so it's likely correct. I had already noticed the r=2, thanks.

It'd be interesting to run a vanilla ResNet101 or ResNet101-D, without SK and BL but with all of your training-time techniques, and see where that ends up; its throughput is higher than R50D+SK's. I may run that experiment myself once I get everything else working...


sweaterr commented on June 1, 2024

@rwightman We looked more closely at SK's FLOPS and number of parameters to see why our implementation has higher FLOPS and parameter counts than the SGE paper that @hussam789 pointed out.
In the SGE paper's implementation of SK, group convolution is applied to the additional 3x3 kernel in the SK module, whereas we apply normal convolutions.
It seems that group convolution greatly reduces FLOPS and the number of parameters compared with our version.
We will also run additional experiments to see whether there is a performance improvement if we change our implementation from normal convolution to group convolution.
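
To get a feel for the gap (a rough sketch; the 256-channel width is illustrative and not taken from the paper), compare a normal 3x3 convolution with a grouped one as used in the original SKNet:

```python
import torch.nn as nn

channels = 256  # illustrative channel count, not from the paper

# Normal 3x3 convolution: channels * channels * 3 * 3 weights.
dense = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

# Grouped 3x3 convolution with groups=32 (as in the original SKNet):
# channels * (channels // 32) * 3 * 3 weights, i.e. 32x fewer.
grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                    groups=32, bias=False)

print(sum(p.numel() for p in dense.parameters()))    # 589824
print(sum(p.numel() for p in grouped.parameters()))  # 18432
```

The same factor carries over to FLOPS, since a convolution's compute scales with its weight count, which is consistent with the gap observed above.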


rwightman commented on June 1, 2024

@sweaterr yeah, the original SKNet50 (paper and Caffe impl) is closer to a ResNeXt-50, since it has groups=32 for both 3x3 SK convs. Not sure why, but the SK in the SGE repo has one 3x3 grouped and the other not.


sweaterr commented on June 1, 2024

Yes, it is actually described in https://github.com/clovaai/assembled-cnn/blob/master/scripts/train_assemble_from_scratch.sh

Let me know if you have any more questions, thanks!


vikramkalabi commented on June 1, 2024

Thank you for pointing me to the script. I am trying to reimplement this in PyTorch and am not well versed in TensorFlow, so please cut me some slack if I have not followed your code correctly.

After looking at the script and comparing it with the paper, I want to know whether this script (config) yields the best model, in terms of the structure stated in the paper.

According to the script, this set of flags determines the structure of the network (the anti-aliasing flag is illustrated with a sketch after the list):

--resnet_version=2 \
--resnet_size=50 \
--use_sk_block=True \
--anti_alias_type=sconv \
--anti_alias_filter_size=3 \
--use_dropblock=True \
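
If it helps the PyTorch port, here is a sketch of anti-aliased downsampling in the style of Zhang (2019), which is my reading of what anti_alias_type=sconv with anti_alias_filter_size=3 refers to; treat the exact placement and filter as assumptions rather than the repo's actual behavior:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling (sketch): blur with a fixed binomial filter,
    then subsample. Assumes anti_alias_filter_size=3 means a 3x3 [1, 2, 1] filter."""
    def __init__(self, channels, filt_size=3, stride=2):
        super().__init__()
        coeffs = {2: [1., 1.], 3: [1., 2., 1.], 5: [1., 4., 6., 4., 1.]}[filt_size]
        a = torch.tensor(coeffs)
        filt = a[:, None] * a[None, :]
        filt = filt / filt.sum()
        # One copy of the filter per channel, applied as a depthwise convolution.
        self.register_buffer("filt", filt[None, None].repeat(channels, 1, 1, 1))
        self.stride = stride
        self.pad = filt_size // 2
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, [self.pad] * 4, mode="reflect")
        return F.conv2d(x, self.filt, stride=self.stride, groups=self.channels)
```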

Looking at the default values in the code, I see that use_resnet_d is False; however, in Section 3.1 of the paper you mention that ResNet-D and SK apply to all blocks in all stages. Should use_resnet_d be set to True?


Another question is about the Little branch from each of the stages. Looking at the network architecture, I see there are only two Little branches. However, when I follow your code I see that the first three stages share a common set of arguments, except for the input and output number of filters. So should there be one more Little branch and merge layer in the architecture depicted?


This is some excellent work you have done. Thanks for sharing.


rwightman commented on June 1, 2024

@sweaterr Hello, my question is also regarding the ResNet50 config you've used. I've got a PyTorch impl of SKNets working... with DropBlock, no BL yet. With a config based on what I've understood from your paper and this impl, the ResNet50 or ResNet50-D comes out at around 37-38M parameters. That's closer to a ResNet101 than a ResNet50; is this correct?
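
For a quick sanity check on those numbers (a sketch using stock torchvision models, not the SK/DropBlock variant under discussion):

```python
import torchvision.models as models

def count_params_m(model):
    # Trainable parameters, in millions.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

print(f"resnet50:  {count_params_m(models.resnet50()):.1f}M")   # ~25.6M
print(f"resnet101: {count_params_m(models.resnet101()):.1f}M")  # ~44.5M
```

So 37-38M really does sit between a vanilla ResNet50 and a ResNet101.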


sweaterr commented on June 1, 2024

@rwightman Hi, I have never counted the number of parameters in each model.
But I guess that our setting of r=2 in the SK block (I remember r=16 in the original SK paper) can increase the number of parameters and FLOPS.
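
For intuition, here is a simplified PyTorch sketch of the SK fuse/select stage (illustration only: no BN/ReLU after the reduction, and the 256-channel width is made up, not from our code) showing how the reduction ratio r sets the size of the attention layers:

```python
import torch
import torch.nn as nn

class SKFuse(nn.Module):
    """Fuse/select stage of an SK block (sketch, not the assembled-cnn code).

    The hidden width is d = max(channels // r, min_dim), so r=2 makes the two
    attention layers much larger than the original paper's r=16.
    """
    def __init__(self, channels, num_branches=2, r=2, min_dim=32):
        super().__init__()
        d = max(channels // r, min_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.reduce = nn.Conv2d(channels, d, kernel_size=1)                  # C -> d
        self.expand = nn.Conv2d(d, channels * num_branches, kernel_size=1)   # d -> C per branch
        self.num_branches, self.channels = num_branches, channels

    def forward(self, branches):                  # list of (N, C, H, W) tensors
        stacked = torch.stack(branches, dim=1)    # (N, B, C, H, W)
        z = self.reduce(self.pool(stacked.sum(1)))
        attn = self.expand(z).view(-1, self.num_branches, self.channels, 1, 1)
        return (stacked * attn.softmax(dim=1)).sum(1)

for r in (2, 16):
    n = sum(p.numel() for p in SKFuse(256, r=r).parameters())
    print(f"r={r}: {n} parameters in the fuse/select stage")
```

With 256 channels, the r=2 version is roughly 4x larger than r=16 in this stage alone, which explains part of the parameter gap.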

We also observed that our implementation of the SK block increases FLOPS a lot;
this is described in the appendix of the paper.

So if your implementation's top-1 accuracy matches the paper, I think your implementation is correct.
Thank you for your interest in our work and for the feedback. It helps us think about what we might be missing.

