
OpenIBL

Introduction

OpenIBL is an open-source PyTorch-based codebase for image-based localization, also known as place recognition. It supports multiple state-of-the-art methods and contains the official implementation of our ECCV-2020 spotlight paper, SFRS. We support single/multi-node multi-GPU distributed training and testing, launched with Slurm or PyTorch.

Official implementation:

  • SFRS: Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 Spotlight) [paper] [Blog(Chinese)]

Unofficial implementation:

  • NetVLAD: CNN Architecture for Weakly Supervised Place Recognition (CVPR'16)
  • SARE: Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization (ICCV'19)

Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight)

NetVLAD first proposed a VLAD layer trained with a triplet loss; SARE then introduced two softmax-based losses (sare_ind and sare_joint) to improve training. Our SFRS is trained in generations with self-enhanced soft-label losses, achieving state-of-the-art performance.
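For intuition, the VLAD aggregation at the heart of all these methods can be sketched in a few lines of plain Python: each local feature is soft-assigned to the cluster centers and its residuals are accumulated, then the result is L2-normalized. This is a simplified illustration of the idea only, not the repo's PyTorch implementation:

```python
import math

def vlad(features, centers, alpha=1.0):
    """Soft-assignment VLAD aggregation (toy sketch of the NetVLAD idea).
    features: list of d-dim local descriptors; centers: K cluster centers."""
    K, d = len(centers), len(centers[0])
    desc = [[0.0] * d for _ in range(K)]
    for x in features:
        # soft-assignment weights: softmax over negative squared distances
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        exps = [math.exp(-alpha * dist) for dist in dists]
        s = sum(exps)
        w = [e / s for e in exps]
        # accumulate weighted residuals (x - c_k) per cluster
        for k, c in enumerate(centers):
            for i in range(d):
                desc[k][i] += w[k] * (x[i] - c[i])
    # flatten and L2-normalize into a single global descriptor
    flat = [v for row in desc for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]
```

In the real layer the soft-assignment weights are produced by a learnable convolution rather than plain distances, which is what makes the aggregation end-to-end trainable.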

Installation

This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, but it should run with any recent PyTorch version >= 1.0.0 (0.4.x may also work).

python setup.py develop

Preparation

Datasets

Currently, we support the Pittsburgh, Tokyo 24/7, and Tokyo Time Machine datasets. Instructions for accessing these datasets can be found here.

cd examples && mkdir data

Download the raw datasets and unzip them so that the directory looks like

examples/data
├── pitts
│   ├── raw
│   │   ├── pitts250k_test.mat
│   │   ├── pitts250k_train.mat
│   │   ├── pitts250k_val.mat
│   │   ├── pitts30k_test.mat
│   │   ├── pitts30k_train.mat
│   │   ├── pitts30k_val.mat
│   │   └── Pittsburgh/
└── tokyo
    ├── raw
    │   ├── tokyo247/
    │   ├── tokyo247.mat
    │   ├── tokyoTM/
    │   ├── tokyoTM_train.mat
    │   └── tokyoTM_val.mat

Pre-trained Weights

mkdir logs && cd logs

After preparing the pre-trained weights, the file tree should be

logs
├── vd16_offtheshelf_conv5_3_max.pth  # refer to (1)
└── vgg16_pitts_64_desc_cen.hdf5  # refer to (2)

(1) ImageNet-pretrained weights for the VGG16 backbone from MatConvNet

The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Download the file directly from Google Drive and save it under logs/.

(2) initial cluster centers for VLAD layer

Note: this step is important, as the VLAD layer cannot work with random initialization.

The original cluster centers provided by NetVLAD are highly recommended. You can download them directly from Google Drive and save them under logs/.

Alternatively, compute the centers yourself by running the script

./scripts/cluster.sh vgg16
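Conceptually, this step clusters local descriptors to obtain the K centers the VLAD layer is initialized with. As a rough illustration of that idea (not the repo's implementation, which clusters VGG16 conv features), here is plain k-means (Lloyd's algorithm) on toy 2-D points:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate nearest-center assignment and mean update.
    Toy sketch only; the repo's cluster script works on CNN descriptors."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # recompute each center as the mean of its cluster
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = [sum(vals) / len(cl) for vals in zip(*cl)]
    return centers
```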

Train

All training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet versions of NetVLAD and SARE. Note: the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and testing directly on the other datasets.

The default scripts use 4 GPUs (~11 GB memory each) for training, where each GPU loads one tuple (anchor, positive(s), negatives).

  • To speed up training, increase GPUS to use more GPUs, or increase --tuple-size to load more tuples per GPU;
  • If your GPU does not have enough memory (e.g. <11 GB), reduce --pos-num (SFRS only) or --neg-num to use fewer positives or negatives per tuple.

PyTorch launcher: single-node multi-GPU distributed training

NetVLAD:

./scripts/train_baseline_dist.sh triplet

SARE:

./scripts/train_baseline_dist.sh sare_ind
# or
./scripts/train_baseline_dist.sh sare_joint

SFRS (state-of-the-art):

./scripts/train_sfrs_dist.sh

Slurm launcher: single/multi-node multi-GPU distributed training

Change GPUS and GPUS_PER_NODE in the scripts according to your needs.

NetVLAD:

./scripts/train_baseline_slurm.sh <PARTITION NAME> triplet

SARE:

./scripts/train_baseline_slurm.sh <PARTITION NAME> sare_ind
# or
./scripts/train_baseline_slurm.sh <PARTITION NAME> sare_joint

SFRS (state-of-the-art):

./scripts/train_sfrs_slurm.sh <PARTITION NAME>

Test

During testing, the Python scripts automatically compute the PCA weights from Pitts30k-train or load them directly from local files. Generally, model_best.pth.tar, which is selected by validation during training, performs best.
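As a toy illustration of what the PCA step does, the leading principal component of a set of centered descriptors can be found with power iteration. This is a sketch only; the repo computes the full PCA-whitening weights from Pitts30k-train:

```python
def top_pca_component(X, iters=100):
    """Leading principal component of row-vectors X via power iteration.
    Toy stand-in for the PCA step used to reduce descriptor dimensionality."""
    n, d = len(X), len(X[0])
    # center the data
    mean = [sum(col) / n for col in zip(*X)]
    Xc = [[x[i] - mean[i] for i in range(d)] for x in X]
    # power iteration on the (implicit) covariance matrix X^T X
    v = [1.0] * d
    for _ in range(iters):
        Xv = [sum(r[i] * v[i] for i in range(d)) for r in Xc]          # X v
        w = [sum(Xc[j][i] * Xv[j] for j in range(n)) for i in range(d)]  # X^T (X v)
        norm = sum(c * c for c in w) ** 0.5 or 1.0
        v = [c / norm for c in w]
    return v
```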

The default scripts use 8 GPUs (~11 GB memory each) for testing.

  • To speed up testing, increase GPUS to use more GPUs, increase --test-batch-size for a larger batch size per GPU, or add --sync-gather for faster gathering from multiple threads;
  • If your GPU does not have enough memory (e.g. <11 GB), reduce --test-batch-size for a smaller batch size per GPU.

PyTorch launcher: single-node multi-GPU distributed testing

Pitts250k-test:

./scripts/test_dist.sh <PATH TO MODEL> pitts 250k

Pitts30k-test:

./scripts/test_dist.sh <PATH TO MODEL> pitts 30k

Tokyo 24/7:

./scripts/test_dist.sh <PATH TO MODEL> tokyo

Slurm launcher: single/multi-node multi-GPU distributed testing

Pitts250k-test:

./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> pitts 250k

Pitts30k-test:

./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> pitts 30k

Tokyo 24/7:

./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> tokyo

Trained models

Note: the NetVLAD and SARE models and results below were produced by this repo, so they may differ slightly from those in the original papers.

| Model    | Trained on     | Tested on      | Recall@1 | Recall@5 | Recall@10 | Download Link |
| -------- | -------------- | -------------- | -------- | -------- | --------- | ------------- |
| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4%    | 95.0%    | 96.5%     | Google Drive  |
| SARE_ind | Pitts30k-train | Tokyo 24/7     | 81.0%    | 88.6%    | 90.2%     | same as above |
| SFRS     | Pitts30k-train | Pitts250k-test | 90.7%    | 96.4%    | 97.6%     | Google Drive  |
| SFRS     | Pitts30k-train | Tokyo 24/7     | 85.4%    | 91.1%    | 93.3%     | same as above |
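The Recall@N metric used in the table counts a query as correct if at least one of its top-N retrieved database images is a true match. A minimal sketch of the computation (an illustrative helper, not code from the repo):

```python
def recall_at_n(ranked_db_ids, ground_truth, n):
    """Fraction of queries whose top-n retrieved database ids contain a true match.
    ranked_db_ids[q]: db ids ranked by similarity; ground_truth[q]: set of correct ids."""
    hits = 0
    for q, ranked in enumerate(ranked_db_ids):
        if any(db_id in ground_truth[q] for db_id in ranked[:n]):
            hits += 1
    return hits / len(ranked_db_ids)
```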

Citation

If you find this repo useful for your research, please consider citing the paper:

@inproceedings{ge2020self,
    title={Self-supervising Fine-grained Region Similarities for Large-scale Image Localization},
    author={Yixiao Ge and Haibo Wang and Feng Zhu and Rui Zhao and Hongsheng Li},
    booktitle={European Conference on Computer Vision},
    year={2020},
}

Acknowledgements

The structure of this repo is inspired by open-reid, and part of the code is inspired by pytorch-NetVlad.
