This issue won't be closed until we update our manuscript on arXiv.

from yolos.
Hi @gaopengcuhk, thanks for your interest in our work, and good question!
For the small- and base-sized models, the added parameters mainly come from positional embeddings (PE): initially, we add randomly initialized (512 / 16) x (864 / 16) PEs at every Transformer layer to align with the DETR settings. But we later found that interpolating the pre-trained first-layer PE to a larger size only, i.e., (800 / 16) x (1344 / 16), without adding extra PEs at the intermediate layers, strikes a better accuracy / parameter trade-off: 36.6 AP vs. 36.1 AP, and 24.6 M (22.1 M + 2.5 M) vs. 30.7 M (22.1 M + 8.6 M) parameters. The tiny-sized model adopts this configuration.
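As a side note, the first-layer interpolation described above can be sketched as follows. This is a hypothetical minimal example, not the released YOLOS code: it assumes a ViT-style PE tensor with a leading [CLS] slot and uses bicubic resizing.

```python
# Hypothetical sketch: resizing a pre-trained ViT positional embedding
# grid to a larger input resolution (not the official YOLOS code).
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed, old_hw, new_hw, num_extra_tokens=1):
    """pos_embed: (1, num_extra_tokens + H*W, dim) from pre-training."""
    extra = pos_embed[:, :num_extra_tokens]     # e.g. the [CLS] token PE
    patch_pe = pos_embed[:, num_extra_tokens:]  # per-patch PE
    dim = patch_pe.shape[-1]
    # reshape the flat patch PEs into a 2-D grid, resize, flatten back
    patch_pe = patch_pe.reshape(1, old_hw[0], old_hw[1], dim).permute(0, 3, 1, 2)
    patch_pe = F.interpolate(patch_pe, size=new_hw, mode="bicubic",
                             align_corners=False)
    patch_pe = patch_pe.permute(0, 2, 3, 1).reshape(1, -1, dim)
    return torch.cat([extra, patch_pe], dim=1)

# pre-training grid (512/16) x (864/16) = 32 x 54;
# fine-tuning grid (800/16) x (1344/16) = 50 x 84
pe = torch.randn(1, 1 + 32 * 54, 192)
pe_new = interpolate_pos_embed(pe, (32, 54), (50, 84))
```

Only the per-patch part of the embedding is resized; the [CLS] slot is passed through unchanged, which is the usual convention when adapting ViT checkpoints to new resolutions.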
We have added a detailed description in the Appendix and will submit it to arXiv soon (next week, hopefully). The pre-trained models will also be released soon, please stay tuned :)
Another question, why only add the prediction head on the last layer? Have you tried to add the prediction head to the last several layers like DETR?
Thanks for your valuable question!
We tried this configuration in our early study, and it gave no improvement.
Our guess for the reason is: for DETR, deep supervision works because the supervision is "deep enough", i.e., the decoders are stacked on top of at least a 50- / 101-layer ResNet backbone and 6 Transformer encoder layers, while YOLOS, with a much shallower network, cannot benefit from deep supervision.
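For context, DETR-style deep supervision can be sketched like this. This is hypothetical toy code, not from either repository; the dimensions and the shared classification head are illustrative assumptions.

```python
# Hypothetical sketch of deep supervision: the same prediction head is
# applied to every intermediate layer's output, and each output would
# receive the training loss (losses are then averaged or summed).
import torch
import torch.nn as nn

class DeepSupervisedStack(nn.Module):
    def __init__(self, dim=192, num_classes=92, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.cls_head = nn.Linear(dim, num_classes)  # shared across layers

    def forward(self, x):
        outs = []
        for layer in self.layers:
            x = layer(x)
            outs.append(self.cls_head(x))  # one prediction per layer
        return outs  # during training, compute a loss on every element

tokens = torch.randn(2, 100, 192)  # (batch, queries/tokens, dim)
outs = DeepSupervisedStack()(tokens)
```

In DETR this auxiliary loss is applied at each of the 6 decoder layers, which already sit on top of a deep ResNet; the point above is that with only a shallow stack, supervising intermediate layers buys nothing.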
Another question: it seems you add the position embedding to x at every layer, while in DeiT the position embedding is only added before the first layer. Is this important in YOLOS?
We have actually answered this in #3 (comment): YOLOS with only the first-layer PE added is better in terms of both AP and parameter efficiency :)
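Some back-of-the-envelope arithmetic on why per-layer PEs are expensive (illustrative assumptions: a DeiT-small-like backbone with hidden size 384 and 12 layers; the paper's exact accounting may differ slightly):

```python
# Illustrative arithmetic, not the paper's exact parameter accounting.
dim = 384                      # assumed hidden size of a DeiT-small backbone
h, w = 512 // 16, 864 // 16    # 32 x 54 patch grid used during fine-tuning
per_layer = h * w * dim        # parameters of one randomly initialized PE
depth = 12                     # assumed number of Transformer layers

every_layer = depth * per_layer  # PE added at all layers
first_only = per_layer           # PE added only at the first layer
print(per_layer, every_layer)    # ~0.66 M per layer, ~8.0 M in total
```

So each extra per-layer PE costs roughly two thirds of a million parameters, which is why dropping the intermediate-layer PEs shrinks the model noticeably while the AP even improves slightly.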
Thank you very much for your reply.
We have updated our manuscript on arXiv, and as such I'm closing this issue. Let us know if you have further questions.