Triplet Attention

Authors - Diganta Misra 1†, Trikay Nalamada 1,2†, Ajay Uppili Arasanipalai 1,3†, Qibin Hou 4

1. Landskape 2. IIT Guwahati 3. University of Illinois, Urbana-Champaign 4. National University of Singapore

† - Denotes Equal Contribution

Abstract - Benefiting from the capability of building inter-dependencies among channels or spatial locations, attention mechanisms have been extensively studied and broadly used in a variety of computer vision tasks recently. In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure. For an input tensor, triplet attention builds inter-dimensional dependencies by the rotation operation followed by residual transformations and encodes inter-channel and spatial information with negligible computational overhead. Our method is simple as well as efficient and can be easily plugged into classic backbone networks as an add-on module. We demonstrate the effectiveness of our method on various challenging tasks including image classification on ImageNet-1k and object detection on MSCOCO and PASCAL VOC datasets. Furthermore, we provide extensive insight into the performance of triplet attention by visually inspecting the GradCAM and GradCAM++ results. The empirical evaluation of our method supports our intuition on the importance of capturing dependencies across dimensions when computing attention weights.
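The three-branch structure described in the abstract can be illustrated with a small NumPy sketch. This is not the repository's implementation: the function names are hypothetical, the kernel weights are random, and several details (pooling by stacked max and mean, a k x k convolution followed by a sigmoid, averaging the three branch outputs, and leaving out batch normalization) are assumptions based on our reading of the paper rather than anything stated in this README.

```python
import numpy as np


def z_pool(x):
    """Stack max and mean over axis 0: (D, A, B) -> (2, A, B). Assumed pooling."""
    return np.stack([x.max(axis=0), x.mean(axis=0)])


def conv2d_same(x, w):
    """Naive zero-padded cross-correlation: x (2, A, B), w (2, k, k) -> (A, B)."""
    k = w.shape[-1]
    p = k // 2
    a, b = x.shape[1], x.shape[2]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            # Contribution of kernel tap (i, j), summed over both pooled channels.
            out += np.tensordot(w[:, i, j], xp[:, i:i + a, j:j + b], axes=1)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, w):
    """Scale x (D, A, B) by a sigmoid map of shape (A, B) built from its Z-pool."""
    return x * sigmoid(conv2d_same(z_pool(x), w))


def triplet_attention(x, w_cw, w_hc, w_hw):
    """x has shape (C, H, W); each w_* has shape (2, k, k) with odd k."""
    # Branch 1: rotate so H is pooled away and the gate spans (C, W).
    b1 = attention_gate(x.transpose(1, 0, 2), w_cw).transpose(1, 0, 2)
    # Branch 2: rotate so W is pooled away and the gate spans (H, C).
    b2 = attention_gate(x.transpose(2, 1, 0), w_hc).transpose(2, 1, 0)
    # Branch 3: no rotation, C is pooled away and the gate spans (H, W).
    b3 = attention_gate(x, w_hw)
    # Assumed aggregation: simple average of the three branches.
    return (b1 + b2 + b3) / 3.0


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 5))  # toy (C, H, W) input
k = 7                               # kernel size, as in the k = 7 models
weights = [0.1 * rng.standard_normal((2, k, k)) for _ in range(3)]
y = triplet_attention(x, *weights)
print(y.shape)  # (8, 6, 5)
```

The output keeps the input shape, which is what lets a module of this kind be inserted into an existing backbone without changing the surrounding layers. Each branch adds only one small two-channel kernel, consistent with the negligible overhead claimed above.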

Figure 1. (a) Squeeze-and-Excitation block. (b) Convolutional Block Attention Module (CBAM); GMP denotes Global Max Pooling. (c) Global Context (GC) block. (d) Triplet Attention (ours).

Figure 2. GradCAM and GradCAM++ comparisons for ResNet-50 based on sample images from ImageNet dataset.

For generating GradCAM and GradCAM++ results, please follow the code on this repository.

Pretrained Models:

ImageNet:

| Model | Parameters | GFLOPs | Top-1 Error | Top-5 Error | Weights |
| --- | --- | --- | --- | --- | --- |
| ResNet-18 + Triplet Attention (k = 3) | 11.69 M | 1.823 | 29.67% | 10.42% | Google Drive |
| ResNet-18 + Triplet Attention (k = 7) | 11.69 M | 1.825 | 28.91% | 10.01% | Google Drive |
| ResNet-50 + Triplet Attention (k = 7) | 25.56 M | 4.169 | 22.52% | 6.326% | Google Drive |
| ResNet-50 + Triplet Attention (k = 3) | 25.56 M | 4.131 | 23.88% | 6.938% | Google Drive |
| MobileNet v2 + Triplet Attention (k = 3) | 3.506 M | 0.322 | 27.38% | 9.23% | Google Drive |
| MobileNet v2 + Triplet Attention (k = 7) | 3.51 M | 0.327 | 28.01% | 9.516% | Google Drive |

MS-COCO:

All models are trained with the 1x learning schedule.

Detectron2:

Object Detection:

| Backbone | Detectors | AP | AP50 | AP75 | APS | APM | APL | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 + Triplet Attention (k = 7) | Faster R-CNN | 39.2 | 60.8 | 42.3 | 23.3 | 42.5 | 50.3 | Google Drive |
| ResNet-50 + Triplet Attention (k = 7) | RetinaNet | 38.2 | 58.5 | 40.4 | 23.4 | 42.1 | 48.7 | Google Drive |
| ResNet-50 + Triplet Attention (k = 7) | Mask RCNN | 39.8 | 61.6 | 42.8 | 24.3 | 42.9 | 51.3 | Google Drive |

Instance Segmentation:

| Backbone | Detectors | AP | AP50 | AP75 | APS | APM | APL | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 + Triplet Attention (k = 7) | Mask RCNN | 35.8 | 57.8 | 38.1 | 18 | 38.1 | 50.7 | Google Drive |

Person Keypoint Detection:

| Backbone | Detectors | AP | AP50 | AP75 | APM | APL | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 + Triplet Attention (k = 7) | Keypoint RCNN | 64.7 | 85.9 | 70.4 | 60.3 | 73.1 | Google Drive |

BBox AP results using Keypoint RCNN:

| Backbone | Detectors | AP | AP50 | AP75 | APS | APM | APL | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 + Triplet Attention (k = 7) | Keypoint RCNN | 54.8 | 83.1 | 59.9 | 37.4 | 61.9 | 72.1 | Google Drive |

MMDetection:

Object Detection:

| Backbone | Detectors | AP | AP50 | AP75 | APS | APM | APL | Weights |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 + Triplet Attention (k = 7) | Faster R-CNN | 39.3 | 60.8 | 42.7 | 23.4 | 42.8 | 50.3 | Google Drive |
| ResNet-50 + Triplet Attention (k = 7) | RetinaNet | 37.6 | 57.3 | 40.0 | 21.7 | 41.1 | 49.7 | Google Drive |

Cite our work:

@misc{misra2020rotate,
      title={Rotate to Attend: Convolutional Triplet Attention Module}, 
      author={Diganta Misra and Trikay Nalamada and Ajay Uppili Arasanipalai and Qibin Hou},
      year={2020},
      eprint={2010.03045},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
