

Person Re-identification via Attention Pyramid (APNet)

The official PyTorch implementation of the TIP'20 submission "Person Re-identification via Attention Pyramid".

This repo only contains the channel-wise attention (SE-layer) implementation. To reproduce the spatial-attention results in our paper, please refer to RGA-S by Microsoft and simply change the attention agent. We also want to thank FastReID, which serves as the codebase of our implementation.

Introduction

Recently, attention mechanisms have been widely used in ReID systems to facilitate high-performance identification, and they demonstrate powerful representation ability by discovering discriminative regions and mitigating misalignment. However, detecting salient regions with an attention model faces the dilemma of jointly capturing both coarse and fine-grained clues, since the focus varies as the image scale changes. To address this issue, we propose an effective attention pyramid network (APNet) to jointly learn attentions at different scales. Our attention pyramid imitates the process of human visual perception, which tends to notice the foreground person over the cluttered background and, on closer observation, further focuses on the specific color of the shirt. Please see Figure 1 below and our paper for the details of the method.

We validate our method on the Market-1501, DukeMTMC-reID and MSMT17 datasets, and it shows superior performance on all of them. Please check the Result section for the detailed quantitative and qualitative results.

Figure 1: The architecture of the Attention Pyramid Network (APNet). Our APNet adopts the "split-attend-merge-stack" principle: it first splits the feature maps into multiple parts, obtains the attention map of each part, and constructs the attention map for the current pyramid level by merging the part attention maps. At the deeper pyramid level, we split the features into more fine-grained parts and learn the fine-grained attention guided by the coarse attention. Finally, attentions of different granularities are stacked into an attention pyramid and applied to the original input feature by element-wise product.
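
To make the principle concrete, below is a minimal, self-contained PyTorch sketch of one way to realize a pyramid level with the channel-wise (SE-style) attention agent described above. The class names, the choice to split along the height dimension, and the part counts per level are illustrative assumptions, not the repo's actual API; please refer to the code in this repo for the official implementation.

import torch
import torch.nn as nn

class SEAttention(nn.Module):
    # Channel-wise attention agent (SE-style): squeeze spatially, excite per channel.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.agent = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.agent(x)  # (B, C, 1, 1) channel attention

class PyramidLevel(nn.Module):
    # One pyramid level: split the feature map into `parts` stripes along the height,
    # attend to each stripe, then merge the stripe attentions back into a full map.
    def __init__(self, channels, parts):
        super().__init__()
        self.parts = parts
        self.agents = nn.ModuleList([SEAttention(channels) for _ in range(parts)])

    def forward(self, x):
        stripes = torch.chunk(x, self.parts, dim=2)  # split along height
        attentions = [agent(s).expand_as(s) for agent, s in zip(self.agents, stripes)]
        return torch.cat(attentions, dim=2)  # merge into the attention map of this level

def apply_pyramid(x, levels):
    # Stack: each level's attention is applied by element-wise product, so the finer
    # levels are learned under the guidance of the coarser ones.
    for level in levels:
        x = x * level(x)
    return x

feat = torch.randn(2, 256, 16, 8)                                   # toy feature map
pyramid = nn.ModuleList([PyramidLevel(256, p) for p in (1, 2, 4)])  # coarse to fine
out = apply_pyramid(feat, pyramid)                                  # same shape as feat

Swapping SEAttention for a spatial agent such as RGA-S corresponds to the "change the attention agent" step mentioned above.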

Requirements

  • Python 3.6+
  • PyTorch 1.5+
  • CUDA 10.0+

Configurations other than the above are untested, and we recommend following our setting.

To install all the dependencies, please follow the instructions below.

conda create -n apnet python=3.7 -y
conda activate apnet
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
conda install ignite -c pytorch
git clone https://github.com/CHENGY12/APNet.git
cd APNet
pip install -r requirements.txt

To download the pretrained ResNet-50 model, please run the following commands in your Python console:

from torchvision.models import resnet50

# Downloads the ImageNet-pretrained ResNet-50 weights into the local torch cache.
resnet50(pretrained=True)

The downloaded weights should then be located at RESNET_PATH=/home/YOURNAME/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth or /home/YOURNAME/.cache/torch/checkpoints/resnet50-19c8e357.pth, depending on your torchvision version.

Downloading

  • Market-1501
  • DukeMTMC-reID
  • MSMT17

Preparation

After downloading the datasets above, move them to the Datasets/ folder in the project root directory and rename the dataset folders to 'market1501', 'duke' and 'msmt17', respectively. That is, the Datasets/ folder should be organized as:

|-- market1501
    |-- bounding_box_train
    |-- bounding_box_test
    |-- ...
|-- duke
    |-- bounding_box_train
    |-- bounding_box_test
    |-- ...
|-- msmt17
    |-- bounding_box_train
    |-- bounding_box_test
    |-- ...
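
To double-check the layout, you can run a small sketch like the one below from the project root (it is not part of the repo and only uses the folder names from the structure above):

from pathlib import Path

# Verify that the expected dataset folders from the layout above are present.
root = Path("Datasets")
for dataset in ("market1501", "duke", "msmt17"):
    for split in ("bounding_box_train", "bounding_box_test"):
        folder = root / dataset / split
        print(folder, "OK" if folder.is_dir() else "MISSING")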

Usage

Training

Change the PRETRAIN_PATH parameter in configs/default.yml to your RESNET_PATH. To train with a different pyramid level, edit the LEVEL parameter in configs/default.yml.
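
If you are unsure where these two entries sit inside configs/default.yml, a small helper like the following (PyYAML assumed to be installed; this script is not part of the repo) prints every place a key occurs so you can edit the file by hand:

import yaml

def find_key(node, key, path=""):
    # Recursively print every location of `key` in the nested config dictionary.
    if isinstance(node, dict):
        for k, v in node.items():
            p = f"{path}.{k}" if path else k
            if k == key:
                print(p, "=", v)
            find_key(v, key, p)

with open("configs/default.yml") as f:
    cfg = yaml.safe_load(f)

find_key(cfg, "PRETRAIN_PATH")
find_key(cfg, "LEVEL")

Then launch training: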

sh train.sh

Evaluation

sh test.sh

Result

Dataset         Top-1 (%)   mAP (%)
Market-1501     96.2        90.5
DukeMTMC-reID   90.4        81.5
MSMT17          83.7        63.5

Figure 2: Visualizations of the attention maps at different pyramid levels. We adopt Grad-CAM to visualize the learned attention maps of our attention pyramid. For each sample, from left to right, we show the input image, the attention of the first pyramid level, and the attention of the second pyramid level. We can observe that the attentions at different pyramid levels capture salient clues of different scales.

