- End-to-End Object Detection with Transformers [pdf]
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [pdf]
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [pdf]
- Training data-efficient image transformers & distillation through attention [pdf]
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [pdf]
- SepViT: Separable Vision Transformer [pdf]
- ViTGAN: Training GANs with Vision Transformers [pdf]
- Learning Texture Transformer Network for Image Super-Resolution [pdf]