Comments (11)
Hi,
You can check the zero-initialized linear layer in the following code. The same zero-initialized head is also used in Big Transfer (BiT), and I am sharing one more related link.
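For readers without the linked code at hand, here is a minimal sketch of what a zero-initialized classification head could look like in PyTorch. The sizes and variable names are illustrative assumptions, not copied from the repository.

```python
import torch
import torch.nn as nn

hidden_size, num_classes = 768, 10  # illustrative values, not taken from the repo

# Classification head: a single linear layer applied to the [CLS] token.
head = nn.Linear(hidden_size, num_classes)

# Zero-initialize weight and bias so fine-tuning starts from neutral logits.
nn.init.zeros_(head.weight)
nn.init.zeros_(head.bias)

cls_token = torch.randn(4, hidden_size)  # stand-in for the encoder's [CLS] output
logits = head(cls_token)
print(logits.abs().max().item())  # 0.0, every logit is zero before training
```

With all logits at zero, the softmax over classes is uniform at the start of fine-tuning, so the head carries no bias toward any class.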
from vit-pytorch.
OK. But where do you remove the two linear layers and replace them with a single layer? I only see that you zero-initialized a single linear layer.
from vit-pytorch.
Another issue is that you use a Conv2d layer to extract features in the embedding layer. Is this the "hybrid architecture" described in the original paper? Could you also support a version that does not rely on a CNN?
from vit-pytorch.
As far as I know, the two linear layers in the paper form the MLP head, whose activation function is tanh. You can check that part at the following link (pre_logits and head).
Additionally, this repository creates a single linear layer sized to the number of target classes, without loading the pre_logits and head weights.
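A rough sketch of that idea, assuming a PyTorch-style state dict: the pre-trained pre_logits and head entries are simply dropped before loading, and the new single head keeps its own initialization. TinyViTHead and the checkpoint keys below are hypothetical stand-ins, not the repository's actual loading code.

```python
import torch
import torch.nn as nn

class TinyViTHead(nn.Module):
    """Hypothetical stand-in: an 'encoder' plus a single classification head."""
    def __init__(self, hidden_size, num_classes):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.head = nn.Linear(hidden_size, num_classes)
        nn.init.zeros_(self.head.weight)
        nn.init.zeros_(self.head.bias)

    def forward(self, x):
        return self.head(self.encoder(x))

# Pretend pre-trained checkpoint with pre_logits and a 1000-class head.
pretrained = {
    "encoder.weight": torch.randn(32, 32),
    "encoder.bias": torch.randn(32),
    "pre_logits.weight": torch.randn(32, 32),
    "pre_logits.bias": torch.randn(32),
    "head.weight": torch.randn(1000, 32),
    "head.bias": torch.randn(1000),
}

model = TinyViTHead(hidden_size=32, num_classes=10)

# Keep everything except the layers that belong to the pre-training head.
skipped = ("pre_logits.", "head.")
filtered = {k: v for k, v in pretrained.items() if not k.startswith(skipped)}

# strict=False lets the new head stay at its own (zero) initialization.
result = model.load_state_dict(filtered, strict=False)
print(result.missing_keys)     # the head parameters that were not loaded
print(result.unexpected_keys)  # empty, since pre_logits was filtered out
```

The encoder weights come from the checkpoint, while the 10-class head is trained from scratch during fine-tuning.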
from vit-pytorch.
As far as I know, the hybrid model replaces the Conv2d with a different backbone. In a simple implementation, you can remove the Conv2d and use a BiT backbone for this part. Please refer to timm for a detailed implementation of the hybrid model.
ViT uses Conv2d for patch embedding, and I think avoiding CNNs entirely is a big challenge.
from vit-pytorch.
Please check Equation 1 in the original paper. That's the non-hybrid version.
from vit-pytorch.
The current implementation is the same as Equation (1).
Please refer to the following link for the part about Equation (1).
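For reference, Equation (1) of the ViT paper can be written as below, where x_p^i is the i-th flattened patch, P is the patch size, C the number of channels, N the number of patches, and D the embedding dimension:

```latex
\mathbf{z}_0 = [\mathbf{x}_{\text{class}};\ \mathbf{x}_p^1\mathbf{E};\ \mathbf{x}_p^2\mathbf{E};\ \cdots;\ \mathbf{x}_p^N\mathbf{E}] + \mathbf{E}_{pos},
\qquad \mathbf{E} \in \mathbb{R}^{(P^2 \cdot C) \times D},\quad \mathbf{E}_{pos} \in \mathbb{R}^{(N+1) \times D}
```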
from vit-pytorch.
Line 144: x = self.patch_embeddings(x)
Is self.patch_embeddings a Conv2d?
from vit-pytorch.
That's correct!
Conv2d's kernel_size is the patch size.
from vit-pytorch.
@jeonsworld https://github.com/google-research/vision_transformer/blob/0040316f123353eaba186e7be914e58e656cc120/vit_jax/models.py#L215
Okay, now I get your point. Look at this comment: the ViT authors say that s2d (space-to-depth) + embedding is the same as a single Conv operation. I think this is the key. So, in essence, ViT is also a CNN-based model, and we cannot say that we should give up on CNNs.
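The equivalence can be checked numerically. The sketch below is not the repository's code; it compares a Conv2d whose kernel_size and stride both equal the patch size (an assumption about the setup) against explicitly flattening the patches and applying a linear projection that reuses the same weights. All sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W = 2, 3, 32, 32   # illustrative batch and image sizes
P, D = 16, 64               # patch size and embedding dimension

x = torch.randn(B, C, H, W)

# Path 1: Conv2d with kernel_size = stride = patch size.
conv = nn.Conv2d(C, D, kernel_size=P, stride=P)
out_conv = conv(x).flatten(2).transpose(1, 2)                   # (B, N, D)

# Path 2: cut the image into flattened patches, then project linearly,
# reusing the conv weights as the projection matrix E of Equation (1).
patches = F.unfold(x, kernel_size=P, stride=P).transpose(1, 2)  # (B, N, C*P*P)
E = conv.weight.reshape(D, C * P * P)                           # (D, C*P*P)
out_linear = patches @ E.t() + conv.bias                        # (B, N, D)

print(out_conv.shape)                                   # torch.Size([2, 4, 64])
print(torch.allclose(out_conv, out_linear, atol=1e-4))  # True
```

Because the patches do not overlap, this Conv2d is just a compact way to compute the per-patch linear projection.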
from vit-pytorch.
Thanks for the nice comment. I think this gave me a better insight.
from vit-pytorch.