slwang9353 / mobileformer

MobileFormer in torch

Python 100.00%

mobileformer's Introduction

MobileFormer

An implementation of MobileFormer proposed by Yinpeng Chen, Xiyang Dai et al.

Including

[1] Mobile-Former proposed in: 
                        Yinpeng Chen, Xiyang Dai et al., Mobile-Former: Bridging MobileNet and Transformer. 
                        arxiv.org/abs/2108.05895
[2] Dynamic ReLU proposed in: 
                        Yinpeng Chen, Xiyang Dai et al., Dynamic ReLU. 
                        arxiv.org/abs/2003.10027v2
[3] Lite-BottleNeck proposed in: 
                        Yunsheng Li, Yinpeng Chen et al., MicroNet: Improving Image Recognition with Extremely Low FLOPs. 
                        arxiv.org/abs/2108.05894v1
[4] Adam-W proposed in:
                        Ilya Loshchilov & Frank Hutter, Decoupled Weight Decay Regularization.
                        arxiv.org/abs/1711.05101v3
[5] Mixup proposed in:
                        Hongyi Zhang, Moustapha Cisse et al., Mixup: Beyond Empirical Risk Minimization.
                        arxiv.org/abs/1710.09412
[6] Multi-FocalLoss (not used), focal loss is proposed in:
                        Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár, Focal Loss for Dense Object Detection.
                        arxiv.org/abs/1708.02002

Note

(1) Due to the expanded DW conv used in strided Mobile-Former blocks, 
    the out_channel should be divisible by expand_size of the next block.
(2) Adam-W and Mixup are embedded in train.py.
(3) Use run() in train.py to train ('run') or search ('search'). There is an example in train.py.
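
For readers unfamiliar with Mixup [5], the rule itself is small enough to sketch in plain Python. This is not the code from train.py: the function name `mixup_pair`, the default `alpha=0.2`, and the list-based toy inputs are assumptions made for illustration. It only shows the idea from the paper, which is to draw a coefficient from Beta(alpha, alpha) and blend two examples and their one-hot labels with it.

```python
import random


def mixup_pair(x_a, x_b, y_a, y_b, alpha=0.2, rng=random):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha).

    Hypothetical helper for illustration; alpha=0.2 is an assumed default,
    not a value taken from this repository.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Two toy flattened "images" with one-hot labels for a 3-class problem.
x_mix, y_mix, lam = mixup_pair(
    [0.0, 1.0], [1.0, 0.0],
    [1, 0, 0], [0, 0, 1],
    rng=random.Random(0),  # seeded only to make the sketch repeatable
)
print(f"lam={lam:.3f} x_mix={x_mix} y_mix={y_mix}")
```

In a training loop the same coefficient is usually applied to a whole batch and a shuffled copy of it, and the loss is the matching blend of the two label losses.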

'###### The '#'s #######'

'##### are aligned #####'

No pretrained parameters are available for now.

About Training:

Following DeiT, here is an optional set of learning rates and weight decays for grid search (if you want):
    LR from [5e-4, 3e-4, 5e-5] * batchsize / 256 (or 512)
    WD from [0.03, 0.04, 0.05]
Training is looooooooong for a CNN, but for a transformer it's OK (maybe).
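
The grid above can be enumerated as follows. This is a minimal sketch, not the search code in train.py: `build_grid` and its `lr_divisor` argument are hypothetical names, and only the candidate values and the `batchsize / 256 (or 512)` scaling come from the text.

```python
from itertools import product

BASE_LRS = [5e-4, 3e-4, 5e-5]       # base learning rates listed above
WEIGHT_DECAYS = [0.03, 0.04, 0.05]  # weight decay candidates listed above


def build_grid(batch_size, lr_divisor=256):
    """Return every (lr, wd) pair, with the base LR scaled by batch_size / lr_divisor.

    Hypothetical helper; pass lr_divisor=512 for the alternative scaling.
    """
    scale = batch_size / lr_divisor
    return [(base_lr * scale, wd) for base_lr, wd in product(BASE_LRS, WEIGHT_DECAYS)]


grid = build_grid(batch_size=128)
for lr, wd in grid:
    print(f"lr={lr:.2e}  wd={wd}")
```

Each pair would then be passed to one search run. With 3 learning rates and 3 weight decays that is 9 runs in total.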


mobileformer's Issues

Code issue

Hi, thanks for your reproduction. I found a small bug in your code, shown in the picture. BatchNorm1d raises an error when the batch size of the input is 1. When testing the model with the batch size set to 1, it raises an error saying that BatchNorm1d requires an input batch size > 1. I hope you can fix this small bug. Thanks!

ReLU6

Hello, why do you use ReLU6 instead of ReLU in the code?

Have you tested the model named MobileFormer?

Does it conform to Mobile-Former?

Why are there no MobileFormer modules at all in the model when I run summary on it? I have not changed the code.

Official Release?

Thanks for the great share.
I wonder if this repo is the official release of the original paper?

Thanks!

NameError: name 'Mixup' is not defined

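Question about model training

Hello, thank you very much for sharing the code with us. When we trained the model on the Food-101 dataset, we found that the model predicts class 64 for every input, the output tensors are all identical, and the loss does not change, so the model is not being optimized. We only plugged in the dataset and did not modify the network. We would like to know what went wrong.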

Model will not train

When the code is run in Google Colab, the validation accuracy does not improve beyond about 0.11 on CIFAR-10 when running the search, and it is often even below 0.1.

FPS fluctuates

Each line of data below is the average over 100 model inference runs, but the results still fluctuate. Which value should I take?

# {'fps': 63.5, 'time_mean': 15.7, 'time_std': 0.5}
# {'fps': 62.8, 'time_mean': 15.9, 'time_std': 0.1}
# {'fps': 64.9, 'time_mean': 15.4, 'time_std': 0.2}
# {'fps': 63.6, 'time_mean': 15.7, 'time_std': 0.1}
# {'fps': 64.5, 'time_mean': 15.5, 'time_std': 0.1}
# {'fps': 64.1, 'time_mean': 15.6, 'time_std': 0.2}
# {'fps': 61.2, 'time_mean': 16.3, 'time_std': 0.1}
# {'fps': 62.2, 'time_mean': 16.1, 'time_std': 0.1}
# {'fps': 63.7, 'time_mean': 15.7, 'time_std': 0.4}
# {'fps': 65.0, 'time_mean': 15.4, 'time_std': 0.1}
# {'fps': 63.9, 'time_mean': 15.7, 'time_std': 0.1}
# {'fps': 63.9, 'time_mean': 15.7, 'time_std': 0.1}
# {'fps': 61.1, 'time_mean': 16.4, 'time_std': 0.5}
# {'fps': 64.2, 'time_mean': 15.6, 'time_std': 0.7}
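
One possible way to get a single number from rows like these is to pool the per-run means and report the overall mean with its spread. The following is a sketch, not an answer from the repository author. It assumes `time_mean` is in milliseconds, which is consistent with `fps` being roughly `1000 / time_mean` in the rows above, and the variable names are chosen here for illustration.

```python
from statistics import mean, stdev

# time_mean values copied from the 14 rows above (assumed to be milliseconds).
time_means_ms = [15.7, 15.9, 15.4, 15.7, 15.5, 15.6, 16.3,
                 16.1, 15.7, 15.4, 15.7, 15.7, 16.4, 15.6]

overall_ms = mean(time_means_ms)   # pooled mean latency per inference
spread_ms = stdev(time_means_ms)   # run-to-run standard deviation
overall_fps = 1000.0 / overall_ms  # FPS derived from the pooled latency

print(f"latency: {overall_ms:.2f} ms +/- {spread_ms:.2f} ms, fps: {overall_fps:.1f}")
```

FPS is derived from the pooled latency rather than by averaging the per-run FPS values, because the mean of reciprocals is not the reciprocal of the mean.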

The model's parameters do not seem to match the paper
