All_about_transformers

Awesome Transformers (self-attention) in Computer Vision

About transformers

  • Attention Is All You Need, NeurIPS 2017
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019
  • Efficient Transformers: A Survey, arXiv 2020
    • Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler
    • [paper]
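The core operation of the Transformer introduced in *Attention Is All You Need* is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. Below is a minimal single-head sketch in plain Python. It assumes Q, K and V have already been projected, and it leaves out the learned projection matrices, multi-head splitting and masking. The helper names are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V, no masking.

    q: n_q x d_k, k: n_k x d_k, v: n_k x d_v (already projected).
    """
    d_k = len(k[0])
    out = []
    for qi in q:
        # One score per key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]  # identical keys -> uniform weights
    v = [[2.0, 0.0], [4.0, 2.0]]
    print(scaled_dot_product_attention(q, k, v))  # [[3.0, 1.0]], the mean of the v rows
```

Because both keys are identical, the query weights them equally and the output is the plain average of the value rows.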

Combining CNN with self-attention

  • Attention augmented convolutional networks, ICCV 2019, image classification
  • Self-Attention Generative Adversarial Networks, ICML 2019, generative models (GANs)
  • VideoBERT: A Joint Model for Video and Language Representation Learning, ICCV 2019, video processing
    • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid
    • [paper]
  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision, arXiv 2020, image classification
    • Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Masayoshi Tomizuka, Kurt Keutzer, Peter Vajda
    • [paper]
  • Feature Pyramid Transformer, ECCV 2020, detection and segmentation
  • Revisiting Stereo Depth Estimation From a Sequence-to-Sequence Perspective with Transformers, arXiv 2020, depth estimation
    • Zhaoshuo Li, Xingtong Liu, Francis X. Creighton, Russell H. Taylor, Mathias Unberath
    • [paper] [official code]
  • End-to-end Lane Shape Prediction with Transformers, arXiv 2020, lane detection

DETR Family

  • End-to-end object detection with transformers, ECCV 2020, object detection
  • Deformable DETR: Deformable Transformers for End-to-End Object Detection, arXiv 2020, object detection
  • End-to-End Object Detection with Adaptive Clustering Transformer, arXiv 2020, object detection
    • Minghang Zheng, Peng Gao, Xiaogang Wang, Hongsheng Li, Hao Dong
    • [paper]
  • UP-DETR: Unsupervised Pre-training for Object Detection with Transformers, arXiv 2020, object detection
    • Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen
    • [paper]
  • DETR for Pedestrian Detection, arXiv 2020, pedestrian detection
    • Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng
    • [paper]
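DETR frames detection as set prediction. During training, the Hungarian algorithm matches each ground-truth object one-to-one to a prediction so that the total matching cost is as small as possible. Predictions left without a match are supervised as "no object". The sketch below finds that same optimal assignment by brute force over permutations, which is only practical for tiny inputs. It treats the cost matrix as given (in DETR it combines class-probability and box terms), and `optimal_matching` is a hypothetical helper name.

```python
from itertools import permutations
from typing import List, Tuple


def optimal_matching(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (prediction, ground_truth) pairs minimizing the total cost.

    cost[i][j] is the cost of assigning prediction i to ground-truth object j.
    Assumes at least as many predictions as ground-truth objects. Brute force
    for illustration only; real code would use the Hungarian algorithm.
    """
    n_pred, n_gt = len(cost), len(cost[0])
    if n_pred < n_gt:
        raise ValueError("need at least as many predictions as ground-truth objects")
    best_total, best_perm = float("inf"), ()
    # perm[g] is the prediction index assigned to ground-truth object g.
    for perm in permutations(range(n_pred), n_gt):
        total = sum(cost[p][g] for g, p in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return [(p, g) for g, p in enumerate(best_perm)]


if __name__ == "__main__":
    # 3 predictions, 2 ground-truth boxes; prediction 2 stays unmatched ("no object").
    cost = [[1.0, 9.0],
            [9.0, 1.0],
            [5.0, 5.0]]
    print(optimal_matching(cost))  # [(0, 0), (1, 1)]
```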

Stand-alone transformers for Computer Vision

Self-attention only in a local neighborhood

  • Image Transformer, ICML 2018
    • Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
    • [paper] [official code]
  • Stand-alone self-attention in vision models, NeurIPS 2019
  • On the relationship between self-attention and convolutional layers, ICLR 2020
  • Exploring self-attention for image recognition, CVPR 2020
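The local-neighborhood approach restricts each pixel's attention to a small window around it, similar to a convolution's receptive field. This minimal sketch assumes identity query/key/value projections and omits relative position terms. The stand-alone self-attention paper uses learned projections and relative position embeddings. `local_self_attention` and its `radius` parameter are hypothetical names.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # H x W x C


def local_self_attention(x: FeatureMap, radius: int = 1) -> FeatureMap:
    """Each pixel attends only to pixels within `radius` (a (2r+1) x (2r+1) window).

    Simplification: queries, keys and values are the raw features (identity
    projections); windows are clipped at the image border.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            q = x[i][j]
            # Gather the neighborhood, clipped to the image bounds.
            neigh = [x[a][b]
                     for a in range(max(0, i - radius), min(h, i + radius + 1))
                     for b in range(max(0, j - radius), min(w, j + radius + 1))]
            scores = [sum(qa * ka for qa, ka in zip(q, k)) / math.sqrt(c) for k in neigh]
            m = max(scores)
            exps = [math.exp(s - m) for s in scores]
            z = sum(exps)
            row.append([sum(e / z * k[ch] for e, k in zip(exps, neigh)) for ch in range(c)])
        out.append(row)
    return out


if __name__ == "__main__":
    # 1 x 5 single-channel map: with radius=1, the last pixel lies outside
    # pixel 0's window, so changing it cannot affect pixel 0's output.
    a = [[[0.1], [0.2], [0.3], [0.4], [0.5]]]
    b = [[[0.1], [0.2], [0.3], [0.4], [9.0]]]
    print(local_self_attention(a)[0][0] == local_self_attention(b)[0][0])  # True
```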

Scalable approximations to global self-attention

  • Generating long sequences with sparse transformers, arXiv 2019
  • Scaling autoregressive video models, ICLR 2020
    • Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
    • [paper]
  • Axial attention in multidimensional transformers, arXiv 2019
  • Axial-deeplab: Stand-alone axial-attention for panoptic segmentation, ECCV 2020
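Axial attention approximates global 2D self-attention by attending along one axis at a time: first a row pass, then a column pass. The two passes together cost O(H·W·(H+W)), compared with O((H·W)^2) for full attention over all pixels. Even so, after both passes every output pixel can depend on every input pixel. This minimal sketch again assumes identity projections, a single head and no positional terms. `attend_1d` and `axial_attention` are hypothetical helper names.

```python
import math
from typing import List

Vec = List[float]
FeatureMap = List[List[Vec]]  # H x W x C


def attend_1d(seq: List[Vec]) -> List[Vec]:
    """Full self-attention along one axis (identity Q/K/V projections)."""
    c = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in seq]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * k[ch] for e, k in zip(exps, seq)) for ch in range(c)])
    return out


def axial_attention(x: FeatureMap) -> FeatureMap:
    h, w = len(x), len(x[0])
    # Row pass: each pixel attends to the W pixels in its row.
    rows = [attend_1d(row) for row in x]
    # Column pass: each pixel attends to the H pixels in its column.
    cols = [attend_1d([rows[i][j] for i in range(h)]) for j in range(w)]
    # cols is indexed [column][row]; transpose back to H x W x C.
    return [[cols[j][i] for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    a = [[[0.1], [0.2]], [[0.3], [0.4]]]
    b = [[[1.0], [0.2]], [[0.3], [0.4]]]  # only the top-left pixel differs
    # The bottom-right output still changes: the row pass mixes (0,0) into (0,1),
    # then the column pass mixes (0,1) into (1,1).
    print(axial_attention(a)[1][1] != axial_attention(b)[1][1])  # True
```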

Global self-attention with image preprocessing

  • Generative pretraining from pixels, ICML 2020, iGPT
    • Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever
    • [paper] [official code]
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, arXiv 2020, ViT
    • Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby
    • [paper] [pytorch implementation]
  • Pre-Trained Image Processing Transformer, arXiv 2020, IPT
    • Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao
    • [paper]
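ViT preprocesses an image into a token sequence by cutting it into non-overlapping P x P patches and flattening each patch. For a 224 x 224 RGB input with P = 16, this gives 14 x 14 = 196 tokens of 16·16·3 = 768 values each. This minimal sketch covers only the patch-flattening step. It omits the learned linear projection, the class token and the position embeddings. `patchify` is a hypothetical name.

```python
from typing import List

Image = List[List[List[float]]]  # H x W x C


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into flattened patch x patch tokens, row-major."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten each patch in (row, column, channel) order.
            tokens.append([image[top + a][left + b][ch]
                           for a in range(patch)
                           for b in range(patch)
                           for ch in range(c)])
    return tokens


if __name__ == "__main__":
    img = [[[0.0] * 3 for _ in range(224)] for _ in range(224)]
    tokens = patchify(img, 16)
    print(len(tokens), len(tokens[0]))  # 196 768
```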

Global self-attention on 3D point clouds

  • Point Transformer, arXiv 2020, point cloud classification + part/semantic segmentation
    • Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip Torr, Vladlen Koltun
    • [paper]

Unified text-vision tasks

Focused on VQA

  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, NeurIPS 2019
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers, EMNLP 2019
  • VisualBERT: A Simple and Performant Baseline for Vision and Language, arXiv 2019
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations, ICLR 2020
  • UNITER: UNiversal Image-TExt Representation Learning, ECCV 2020
    • Yen-Chun Chen, Linjie Li, Licheng Yu, Ahmed El Kholy, Faisal Ahmed, Zhe Gan, Yu Cheng, Jingjing Liu
    • [paper] [official code]

Focused on Image Retrieval

  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, AAAI 2020
  • ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data, arXiv 2020
    • Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti
    • [paper]
  • Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, ECCV 2020
    • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiaowei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
    • [paper] [official code]

Focused on OCR

  • LayoutLM: Pre-training of Text and Layout for Document Image Understanding, KDD 2020

Multi-Task

  • 12-in-1: Multi-Task Vision and Language Representation Learning, CVPR 2020

Contributors

ankitshah009
