TensorFlow implementation of the Vision Transformer ("An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale")
Home Page: https://openreview.net/pdf?id=YicbFdNTTy
vision-transformer's Issues
Thank you for sharing your code.
Do you have any idea how we can train the model on our own dataset?
I look forward to hearing from you.
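Training on a custom dataset usually means wrapping the images in a `tf.data` pipeline and compiling the model with a classification loss. A minimal sketch, assuming a Keras-style model; the `build_model` stand-in below is hypothetical and should be replaced by the repo's actual ViT constructor:

```python
import tensorflow as tf

# Hypothetical placeholder -- swap in the repository's ViT model here.
def build_model(num_classes, image_size=72):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(image_size, image_size, 3)),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes),
    ])

def make_dataset(images, labels, batch_size=32, training=True):
    # images: float array (N, H, W, 3); labels: int array (N,)
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    if training:
        ds = ds.shuffle(buffer_size=len(images))
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

model = build_model(num_classes=10)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(make_dataset(x_train, y_train), epochs=10)
```

For small datasets, fine-tuning from pretrained weights rather than training from scratch is usually the better choice, since ViTs are data-hungry.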
How can I visualize the attention map as shown in your paper's Figure 6, and the mean attention distance as in Figure 7 (right)?
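Figure 6 of the paper uses attention rollout (Abnar & Zuidema): average the attention weights over heads, add the identity to account for residual connections, renormalize, and multiply the matrices across layers. A NumPy sketch, assuming you can extract per-layer attention tensors of shape `(heads, tokens, tokens)` from the model:

```python
import numpy as np

def attention_rollout(attn_layers):
    """Combine per-layer attention matrices into a single token-to-token
    relevance map via attention rollout."""
    rollout = None
    for attn in attn_layers:                    # attn: (heads, tokens, tokens)
        a = attn.mean(axis=0)                   # average over heads
        a = a + np.eye(a.shape[0])              # residual connection
        a = a / a.sum(axis=-1, keepdims=True)   # renormalize rows
        rollout = a if rollout is None else a @ rollout
    return rollout

# The CLS-token row of the result gives per-patch relevance; drop the CLS
# column and reshape it to the patch grid to overlay on the input image.
```

For Figure 7 (right), the mean attention distance is computed per head by weighting the pixel distance between query and key patch centers with the attention weights, then averaging over queries.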
I notice that the official pretrained models are only available in JAX format.
Do you have the original pretrained weights in TF format? Or how can the JAX weights be converted to TF?
This model gives me an error when I update the embedding dimension to 512 and the MLP dimension to 512. Can you please look into this?
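A likely cause (an assumption, since the traceback isn't shown): multi-head attention splits the embedding dimension evenly across heads, so the embedding dimension must be divisible by the head count. 512 fails with the ViT-Base default of 12 heads, but works with 8 or 16. A small check that makes the constraint explicit:

```python
def check_vit_config(embed_dim, num_heads):
    """Validate that embed_dim splits evenly across attention heads;
    returns the per-head dimension."""
    if embed_dim % num_heads != 0:
        raise ValueError(
            f"embed_dim={embed_dim} is not divisible by num_heads={num_heads}; "
            f"pick a head count that divides it (e.g. 8 -> head_dim "
            f"{embed_dim // 8})."
        )
    return embed_dim // num_heads

# 512 % 12 != 0, so embed_dim=512 with the default 12 heads raises;
# embed_dim=512 with 8 heads gives head_dim 64.
```

If the error persists with a valid head count, posting the full traceback would help narrow it down.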