
Comments (9)

hussam789 commented on June 1, 2024

@rwightman You can try SGE-Net (from the same authors as SK-Net); it seems to have fewer parameters than SK-Net.

An implementation can be found here


sweaterr commented on June 1, 2024

First, use_resnet_d=False is correct.
We implemented BigLittleNet with reference to its official implementation.
We found that BigLittleNet's official implementation already includes the concept of ResNet-D:
in both resnet_d_projection_shortcut and bl_projection_shortcut, an average
pooling layer with a stride of 2 is added before the convolution (only the pooling size differs).
So we described it in the paper as D + BL.
However, when using BL, we did not use the tweak that replaces the 7x7 convolution with three 3x3 convolutions (hence use_resnet_d=False), because it made training unstable.
I admit it is a little tricky. We will explain it further in v2 of our paper.
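
For anyone comparing implementations, here is a minimal PyTorch sketch (not the official TensorFlow code, written purely for illustration) of the projection shortcut described above, where a stride-2 average pooling is placed before a stride-1 1x1 convolution:

```python
import torch.nn as nn

def resnet_d_shortcut(in_ch: int, out_ch: int, stride: int = 2, pool_size: int = 2) -> nn.Sequential:
    """ResNet-D style projection shortcut (sketch).

    Spatial downsampling is done by the average pooling, so the following
    1x1 convolution can keep stride 1 and does not discard activations,
    unlike the plain stride-2 1x1 shortcut in the original ResNet.
    """
    return nn.Sequential(
        nn.AvgPool2d(kernel_size=pool_size, stride=stride, ceil_mode=True),
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )
```

The BL projection shortcut follows the same pattern with a different pooling size, which is why we describe the combination as D + BL.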

Second, I think you are right: the number of Little branches should be three in the figure.
Thank you for spotting it. We will correct this in v2 of the paper.

I am very grateful for your interest in our work.
If you have any further questions, please feel free to ask. 😄


rwightman commented on June 1, 2024

@sweaterr thanks for the quick response. I haven't trained it yet; that will take some time, so I'm double-checking that I have things right first.

The FLOP increase is roughly in line with the parameter increase I see, so it's likely correct. I had already noticed the r=2, thanks.

It'd be interesting to run a vanilla ResNet101 or ResNet101-D, without SK and BL but with all of your training-time techniques, and see where that ends up; its throughput is higher than R50D+SK's. I may run that experiment myself once I get everything else working...


sweaterr commented on June 1, 2024

@rwightman We looked more closely at SK's FLOPS and number of parameters to see why our implementation has higher FLOPS and parameter counts than the SGE paper that @hussam789 pointed out.
In the SGE paper's implementation of SK, group convolution is applied to the additional 3x3 kernel in the SK module, whereas we apply normal convolutions.
It seems that group convolution greatly reduces FLOPS and the number of parameters compared with our version.
We will also run additional experiments to see whether there is a performance improvement if we change our implementation from normal convolution to group convolution.
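
To get a feel for the gap (a rough sketch; the 256-channel width is illustrative and not taken from the paper), compare a normal 3x3 convolution with a grouped one as used in the original SKNet:

```python
import torch.nn as nn

channels = 256  # illustrative channel count, not from the paper

# Normal 3x3 convolution: channels * channels * 3 * 3 weights.
dense = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

# Grouped 3x3 convolution with groups=32 (as in the original SKNet):
# channels * (channels // 32) * 3 * 3 weights, i.e. 32x fewer.
grouped = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                    groups=32, bias=False)

print(sum(p.numel() for p in dense.parameters()))    # 589824
print(sum(p.numel() for p in grouped.parameters()))  # 18432
```

The same factor carries over to FLOPS, since a convolution's compute scales with its weight count, which is consistent with the gap observed above.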


rwightman commented on June 1, 2024

@sweaterr yeah, the original SKNet50 (paper and Caffe impl) is closer to a ResNeXt-50, since it has groups=32 for both 3x3 SK convs. Not sure why, but the SK in the SGE repo has one 3x3 grouped and the other not.


sweaterr commented on June 1, 2024

Yes, it is actually described in https://github.com/clovaai/assembled-cnn/blob/master/scripts/train_assemble_from_scratch.sh

Let me know if you have any more questions, thanks!


vikramkalabi commented on June 1, 2024

Thank you for pointing me to the script. I am trying to reimplement this in PyTorch and am not well versed in TensorFlow, so please cut me some slack if I have not followed your code correctly.

After looking at the script and comparing it with the paper, I want to know whether this script (config) yields the best model, in terms of the structure stated in the paper.

According to the script, this set of flags determines the structure of the network (the anti-aliasing flag is illustrated with a sketch after the list):

--resnet_version=2 \
--resnet_size=50 \
--use_sk_block=True \
--anti_alias_type=sconv \
--anti_alias_filter_size=3 \
--use_dropblock=True \
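
If it helps the PyTorch port, here is a sketch of anti-aliased downsampling in the style of Zhang (2019), which is my reading of what anti_alias_type=sconv with anti_alias_filter_size=3 refers to; treat the exact placement and filter as assumptions rather than the repo's actual behavior:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurPool2d(nn.Module):
    """Anti-aliased downsampling (sketch): blur with a fixed binomial filter,
    then subsample. Assumes anti_alias_filter_size=3 means a 3x3 [1, 2, 1] filter."""
    def __init__(self, channels, filt_size=3, stride=2):
        super().__init__()
        coeffs = {2: [1., 1.], 3: [1., 2., 1.], 5: [1., 4., 6., 4., 1.]}[filt_size]
        a = torch.tensor(coeffs)
        filt = a[:, None] * a[None, :]
        filt = filt / filt.sum()
        # One copy of the filter per channel, applied as a depthwise convolution.
        self.register_buffer("filt", filt[None, None].repeat(channels, 1, 1, 1))
        self.stride = stride
        self.pad = filt_size // 2
        self.channels = channels

    def forward(self, x):
        x = F.pad(x, [self.pad] * 4, mode="reflect")
        return F.conv2d(x, self.filt, stride=self.stride, groups=self.channels)
```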

Looking at the default values in the code, I see that use_resnet_d is False; however, in Section 3.1 of the paper you mention that ResNet-D and SK apply to all blocks in all stages. Should use_resnet_d be set to True?


Another question is about the Little branch from each of the stages. Looking at the network architecture, I see there are only two Little branches. However, when I follow your code I see that the first three stages share a common set of arguments, except for the input and output number of filters. So should there be one more Little branch and merge layer in the architecture depicted?


This is some excellent work you have done. Thanks for sharing.


rwightman commented on June 1, 2024

@sweaterr Hello, my question is also regarding the ResNet50 config you've used. I've got a PyTorch impl of SKNets working... with DropBlock, no BL yet. With a config based on what I've understood from your paper and this impl, the ResNet50 or ResNet50-D comes out at around 37-38M parameters. That's closer to a ResNet101 than a ResNet50; is this correct?
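
For a quick sanity check on those numbers (a sketch using stock torchvision models, not the SK/DropBlock variant under discussion):

```python
import torchvision.models as models

def count_params_m(model):
    # Trainable parameters, in millions.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

print(f"resnet50:  {count_params_m(models.resnet50()):.1f}M")   # ~25.6M
print(f"resnet101: {count_params_m(models.resnet101()):.1f}M")  # ~44.5M
```

So 37-38M really does sit between a vanilla ResNet50 and a ResNet101.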


sweaterr commented on June 1, 2024

@rwightman Hi, I have never counted the number of parameters in each model.
But I guess that our setting of r=2 in the SK block (I remember r=16 in the original SK paper) can increase the number of parameters and FLOPS.
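
For intuition, here is a simplified PyTorch sketch of the SK fuse/select stage (illustration only: no BN/ReLU after the reduction, and the 256-channel width is made up, not from our code) showing how the reduction ratio r sets the size of the attention layers:

```python
import torch
import torch.nn as nn

class SKFuse(nn.Module):
    """Fuse/select stage of an SK block (sketch, not the assembled-cnn code).

    The hidden width is d = max(channels // r, min_dim), so r=2 makes the two
    attention layers much larger than the original paper's r=16.
    """
    def __init__(self, channels, num_branches=2, r=2, min_dim=32):
        super().__init__()
        d = max(channels // r, min_dim)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.reduce = nn.Conv2d(channels, d, kernel_size=1)                  # C -> d
        self.expand = nn.Conv2d(d, channels * num_branches, kernel_size=1)   # d -> C per branch
        self.num_branches, self.channels = num_branches, channels

    def forward(self, branches):                  # list of (N, C, H, W) tensors
        stacked = torch.stack(branches, dim=1)    # (N, B, C, H, W)
        z = self.reduce(self.pool(stacked.sum(1)))
        attn = self.expand(z).view(-1, self.num_branches, self.channels, 1, 1)
        return (stacked * attn.softmax(dim=1)).sum(1)

for r in (2, 16):
    n = sum(p.numel() for p in SKFuse(256, r=r).parameters())
    print(f"r={r}: {n} parameters in the fuse/select stage")
```

With 256 channels, the r=2 version is roughly 4x larger than r=16 in this stage alone, which explains part of the parameter gap.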

We also observed that our implementation of the SK block increases FLOPS a lot;
this is described in the appendix of the paper.

So if your implementation's top-1 accuracy matches the paper, I think your implementation is correct.
Thank you for your interest in our work and for the feedback. It helps us think about what we might be missing.

