py-bottom-up-attention's Introduction

Bottom-up Attention with Detectron2

A detectron2 implementation with exactly the same model and weights as the Caffe VG Faster R-CNN provided in bottom-up-attention.

The original bottom-up-attention is implemented in Caffe, which is not easy to install and is inconsistent with training code written in PyTorch. This project therefore transfers the weights and models to detectron2, which can be installed in a few lines and has a PyTorch front end.

The features extracted with this repo are compatible with the LXMERT code and pre-trained models here. Results have been locally verified.

Installation

git clone https://github.com/airsplay/py-bottom-up-attention.git
cd py-bottom-up-attention

# Install python libraries
pip install -r requirements.txt
pip install 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

# Install detectron2
python setup.py build develop

# or if you are on macOS
# MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build develop

# or, as an alternative to `setup.py`, do
# pip install [--editable] .

Demos

Object Detection

demo vg detection

Feature Extraction

With Attributes:

  1. Single image: demo extraction
  2. Single image (Given boxes): demo extraction

Without Attributes:

  1. Single image: demo extraction
  2. Single image (Given boxes): demo extraction

Feature Extraction Scripts for MS COCO

Note: this script does not include attributes. If you want to use attributes, please modify it according to the demo.

  1. For MS COCO (VQA): vqa script

Note

  1. The default weights are the same as the 'alternative pretrained model' in the original GitHub repo here, which is trained with 36 bounding boxes. If you want to use the original detection model trained with 10~100 bounding boxes, please use the following weights:
    http://nlp.cs.unc.edu/models/faster_rcnn_from_caffe_attr_original.pkl
    
  2. The coordinates generated by the code are (x_left_corner, y_top_corner, x_right_corner, y_bottom_corner). Here is a visualization. Suppose box = [x0, y0, x1, y1]; it annotates an RoI of:
    0-------------------------------------
     |                                   |
     y0 box[1]   |-----------|           |
     |           |           |           |
     |           |  Object   |           |
     y1 box[3]   |-----------|           |
     |                                   |
    H----------x0 box[0]-----x1 box[2]----
     0                                   W
    
  3. If the link breaks, please contact me (at [email protected]) directly and I will share the weights with you.
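
The box convention in note 2 can be exercised with a small sketch. The helper names below (`box_size`, `crop_roi`) are hypothetical and not part of this repo. The image is a plain nested list standing in for an H x W array, and the coordinates are assumed to be pixel values that round to valid indices.

```python
def box_size(box):
    """Return (width, height) of a box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = box
    return x1 - x0, y1 - y0


def crop_roi(image, box):
    """Crop the RoI from an image stored as a list of rows.

    Rows index y (top to bottom) and columns index x (left to right),
    so the y range selects rows and the x range selects columns.
    """
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical image with H=480 rows and W=640 columns.
H, W = 480, 640
image = [[0] * W for _ in range(H)]

box = [100.0, 50.0, 300.0, 200.0]  # [x0, y0, x1, y1]
roi = crop_roi(image, box)

print(box_size(box))          # (200.0, 150.0)
print(len(roi), len(roi[0]))  # 150 200
```

Note that the crop indexes rows with the y values first, which is the usual source of confusion when a box is stored x-first.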

External Links

  1. The original Caffe implementation https://github.com/peteanderson80/bottom-up-attention, and its docker image.
  2. bottom-up-attention.pytorch maintained by MIL-LAB.

Proof of Correctness

  1. As shown in demo

Note: You might find small differences between the Caffe features and the PyTorch features in this verification demo. This is because the verification uses the "Given box" setup instead of "Predicted boxes". If the features are extracted from scratch (i.e., features with predicted boxes), they are exactly the same.

In detail: "Given box" extracts features from the final predicted boxes (after box regression), whereas extraction with predicted boxes uses the features of the proposals. The two pipelines are illustrated below:

Feature extraction (using predicted boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                      |-------------------> Feature --> Label
                                                                  |-> Attribute

Feature extraction (using given boxes):

ResNet --> RPN --> RoiPooling + Res5 --> Box Regression --> BOX
                                           |--> RoIPooling + Res5 --> Feature --> Label
                                                                              |-> Attribute
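
To make the difference between the two pipelines concrete, here is a toy sketch. It is not the detectron2 implementation: it assumes a tiny hand-made feature map and uses plain mean pooling as a stand-in for RoIPooling + Res5. The name `mean_pool`, the box values, and the one-cell regression shift are all illustrative assumptions.

```python
def mean_pool(feature_map, box):
    """Average a 2-D feature map over the cells covered by [x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    values = [v for row in feature_map[y0:y1] for v in row[x0:x1]]
    return sum(values) / len(values)


# Toy 6x6 feature map in which every cell holds its column index.
feature_map = [[float(x) for x in range(6)] for _ in range(6)]

proposal = [0, 0, 4, 4]   # hypothetical RPN proposal
regressed = [1, 0, 5, 4]  # hypothetical final box after box regression

# "Predicted boxes": the feature is pooled from the proposal,
# while the reported box is the regressed one.
feat_predicted = mean_pool(feature_map, proposal)

# "Given boxes": the final box is fed back in and pooled again,
# so the feature comes from the regressed region instead.
feat_given = mean_pool(feature_map, regressed)

print(feat_predicted)  # 1.5
print(feat_given)      # 2.5
```

Both pipelines report the same final box, but the pooled regions differ, which is why the features in the "Given box" verification do not match exactly.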

Acknowledgement

The Caffe2PyTorch conversion code (not released here) is based on Ruotian Luo's PyTorch-ResNet project. The project also draws on Ross Girshick's older py-faster-rcnn.

References

Detectron2:

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

Bottom-up Attention:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle={CVPR},
  year = {2018}
}

LXMERT:

@inproceedings{tan2019lxmert,
  title={LXMERT: Learning Cross-Modality Encoder Representations from Transformers},
  author={Tan, Hao and Bansal, Mohit},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  year={2019}
}
