Comments (9)
@rwightman You can try SGE-Net (by the same authors as SK-Net); it seems to have fewer parameters than SK-Net.
Implementation can be found here
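For a rough sense of why SGE might be lighter, here is a back-of-the-envelope comparison. This is a sketch under my own assumptions, not code from either repo: I assume SK's attention uses the fuse/select FC layers with `d = max(C/r, L)` from the SK-Net paper, and that an SGE unit learns only one scale and one bias per group (my reading of the SGE-Net paper); the helper names are hypothetical.

```python
# Back-of-the-envelope per-block attention-parameter comparison.
# Assumption: SK fuse FC maps c -> d, then one select FC d -> c per branch,
# with d = max(c // r, l) as in the SK-Net paper (l = 32).
def sk_attention_params(c, r=16, m=2, l=32):
    d = max(c // r, l)
    return c * d + m * d * c

# Assumption: an SGE unit learns just one scale and one bias per group.
def sge_params(groups=64):
    return 2 * groups

sk = sk_attention_params(256)   # attention params in one 256-channel SK block
sge = sge_params()              # params in one SGE unit: two scalars per group
```

Under these assumptions the SGE unit's cost is negligible next to SK's FC layers, which would explain the parameter gap.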
from assembled-cnn.
First, `use_resnet_d=False` is right.
We implemented BigLittleNet with reference to the official BigLittleNet implementation, and we found that the official implementation already includes the ResNet-D concept: in both `resnet_d_projection_shortcut` and `bl_projection_shortcut`, an average pooling layer with stride 2 is added before the convolution (only the pooling size differs). That is why we described it in the paper as D + BL.
However, when using BL we did not use the tweak that replaces the 7x7 convolution with three 3x3 convolutions (so it becomes `use_resnet_d=False`), because it made training unstable.
I admit it is a little tricky; we will explain it further in v2 of our paper.
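As a quick illustration of the D-style projection shortcut described above: the stride-2 average pooling has no weights, so it moves the downsampling out of the 1x1 convolution without changing the parameter count. A minimal sketch (the helper and the 256→512 channel example are my own, not from the repo; bias is omitted since BN follows the conv):

```python
# Weight-parameter count of a conv layer (bias omitted, BN follows).
def conv_params(c_in, c_out, k, groups=1):
    return (c_in // groups) * c_out * k * k

# Plain ResNet projection shortcut: a stride-2 1x1 conv, e.g. 256 -> 512.
plain = conv_params(256, 512, 1)
# ResNet-D style shortcut: stride-2 avg pooling (0 params) + stride-1 1x1 conv.
resnet_d = 0 + conv_params(256, 512, 1)
assert plain == resnet_d  # the D tweak changes downsampling, not param count
```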
Second, I think you are right. The # of LittleBranch should be three in the picture.
Thank you for finding it. We will correct this in v2 of the paper.
I am very grateful for your interest in our work.
If you have any further questions please feel free to ask. 😄
@sweaterr thanks for the quick response, I haven't trained it yet, that'll take some time so I'm double checking to make sure I have things right first.
The flop increase is roughly in line with the parameter increase I see, so it's likely correct. I already noticed the r=2, thanks.
It'd be interesting to run a vanilla ResNet101 or ResNet101-D, without SK and BL but with all of your training-time techniques, and see where that ends up; its throughput is higher than R50D+SK's. I may run that experiment myself once I get everything else working...
@rwightman We looked more closely at SK's FLOPS and number of parameters to see why our implementation had higher FLOPS and parameter counts than the SGE paper that @hussam789 pointed out.
In the SGE paper's implementation of SK, group convolutions are applied to the additional 3x3 kernel in the SK module, whereas we apply normal convolutions.
It seems that group convolution greatly reduces the FLOPS and parameter count compared to ours.
We will also run additional experiments to see whether changing our implementation from normal to group convolution improves performance.
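The size of that gap is easy to check with the standard conv parameter formula. A sketch (channel count 256 and `groups=32` are illustrative choices, matching the ResNeXt-style grouping mentioned in this thread):

```python
# Weight-parameter count of a conv layer: (c_in / groups) * c_out * k * k.
def conv_params(c_in, c_out, k, groups=1):
    return (c_in // groups) * c_out * k * k

normal = conv_params(256, 256, 3)              # ungrouped 3x3, as in assembled-cnn
grouped = conv_params(256, 256, 3, groups=32)  # groups=32, as in the SGE repo's SK
assert normal == 32 * grouped                  # grouping cuts params by the group count
```

FLOPS scale the same way per spatial position, which matches the observation that grouping shrinks both counts.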
@sweaterr yeah, the original SKNet50 (paper and caffe impl) is closer to a ResNeXt-50 as it has groups=32 for both 3x3 SK convs. Not sure why, but the SK in the SGE repo has one 3x3 grouped and the other not.
Yes, it is actually described in https://github.com/clovaai/assembled-cnn/blob/master/scripts/train_assemble_from_scratch.sh
Let me know if you have more questions, thanks!
Thank you for pointing me to the script. I am trying to reimplement this in PyTorch and I am not well versed in TensorFlow, so please cut me some slack if I have not correctly followed your code.
After looking at the script and comparing it with the paper, I would like to know whether this script (config) yields the best model, with the structure stated in the paper.
According to the script, this set of flags determines the structure of the network:

```sh
--resnet_version=2 \
--resnet_size=50 \
--use_sk_block=True \
--anti_alias_type=sconv \
--anti_alias_filter_size=3 \
--use_dropblock=True \
```
Looking at the default values in the code, I see `use_resnet_d` is `False`; however, in section 3.1 of the paper you mention that ResNet-D and SK apply to all blocks in all stages. Should `use_resnet_d` be set to `True`?
Another question is about the little branch from each of the stages. Looking at the network architecture, I see there are only two little branches. However, when I follow your code, I see that the first three stages share a common set of arguments except for the input and output number of filters. So should there be one more little branch and merge layer in the architecture depicted?
This is some excellent work you have done. Thanks for sharing.
@sweaterr hello, my question is also regarding the ResNet50 config you've used. I've got a PyTorch impl of SKNets working... with DropBlock, no BL yet. With a config based off what I've understood from your paper and this impl, the ResNet50 or ResNet50-D is coming out at around 37-38M parameters. That's closer to a ResNet101 than a ResNet50, is this correct?
@rwightman Hi, I have never counted the number of parameters in each model.
But I guess that our setting of `r=2` in the SK block (I remember `r=16` in the original SK paper) can increase the number of parameters and FLOPS.
We also observed that our implementation of the SK block increases FLOPS a lot; this is described in the appendix of the paper.
So if your implementation's top-1 accuracy matches the paper, I think your implementation is correct.
Thank you for your interest in our work and for the feedback. It helps us see what we are missing.