# OpenIBL

## Introduction

OpenIBL is an open-source PyTorch-based codebase for image-based localization, i.e. place recognition. It supports multiple state-of-the-art methods and includes the official implementation of our ECCV 2020 spotlight paper SFRS. Single-/multi-node multi-GPU distributed training and testing are supported, launched by either `slurm` or `pytorch`.
**Official implementation:**
- SFRS: Self-supervising Fine-grained Region Similarities for Large-scale Image Localization (ECCV'20 Spotlight) [paper] [Blog(Chinese)]

**Unofficial implementations:**
- NetVLAD: CNN architecture for weakly supervised place recognition (CVPR'16) [paper] [official code (MatConvNet)]
- SARE: Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization (ICCV'19) [paper] [official code (MatConvNet)]
## Self-supervising Fine-grained Region Similarities (ECCV'20 Spotlight)

NetVLAD first proposed a VLAD layer trained with a `triplet` loss; SARE then introduced two softmax-based losses (`sare_ind` and `sare_joint`) to boost training. Our SFRS is trained in generations with self-enhanced soft-label losses and achieves state-of-the-art performance.
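To make the loss options concrete, here is a minimal PyTorch sketch contrasting the triplet loss with a softmax-based loss in the spirit of `sare_ind`, assuming L2-normalized global descriptors; this is an illustration, not the repo's exact implementation:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, pos, negs, margin=0.1):
    # anchor: (D,), pos: (D,), negs: (N, D) -- L2-normalized descriptors
    d_pos = torch.sum((anchor - pos) ** 2)            # squared distance to the positive
    d_negs = torch.sum((anchor - negs) ** 2, dim=1)   # squared distance to each negative
    return F.relu(d_pos - d_negs + margin).sum()      # hinge on every (pos, neg) pair

def sare_ind_style_loss(anchor, pos, negs):
    # Each (positive, negative) pair forms a 2-way softmax: the anchor should be
    # attracted to the positive and repelled from the negative.
    d_pos = torch.sum((anchor - pos) ** 2)
    d_negs = torch.sum((anchor - negs) ** 2, dim=1)
    logits = torch.stack([-d_pos.expand_as(d_negs), -d_negs], dim=1)  # (N, 2)
    target = torch.zeros(len(negs), dtype=torch.long)  # class 0 = the positive
    return F.cross_entropy(logits, target)
```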
## Installation

This repo was tested with Python 3.6, PyTorch 1.1.0, and CUDA 9.0, but it should run with any recent PyTorch version >= 1.0.0 (0.4.x may also work).

```shell
python setup.py develop
```
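As a quick sanity check after installation, one can try loading a released model through `torch.hub`; the hub entry name below is an assumption based on the repo's hub interface, so verify it against the repo's `hubconf.py`:

```python
import torch

# Assumed hub entry; check the repo's hubconf.py for the exact identifier.
model = torch.hub.load('yxgeee/OpenIBL', 'vgg16_netvlad', pretrained=True).eval()

# Forward a dummy image to obtain a global descriptor for retrieval.
img = torch.randn(1, 3, 480, 640)
with torch.no_grad():
    desc = model(img)
print(desc.shape)
```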
## Preparation

### Datasets

Currently, we support the Pittsburgh, Tokyo 24/7, and Tokyo Time Machine datasets. Instructions for obtaining these datasets can be found here.

```shell
cd examples && mkdir data
```

Download the raw datasets and unzip them so that the directory tree looks like:
```
examples/data
├── pitts
│   └── raw
│       ├── pitts250k_test.mat
│       ├── pitts250k_train.mat
│       ├── pitts250k_val.mat
│       ├── pitts30k_test.mat
│       ├── pitts30k_train.mat
│       ├── pitts30k_val.mat
│       └── Pittsburgh/
└── tokyo
    └── raw
        ├── tokyo247/
        ├── tokyo247.mat
        ├── tokyoTM/
        ├── tokyoTM_train.mat
        └── tokyoTM_val.mat
```
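The `.mat` files above store the database/query splits and ground-truth metadata. Here is a minimal sketch for inspecting one of them with SciPy; the field names inside the struct vary per file, so print the keys instead of assuming them:

```python
from scipy.io import loadmat

# Load one of the ground-truth structs downloaded above.
mat = loadmat('examples/data/pitts/raw/pitts30k_train.mat')

# Show the top-level keys (database images, queries, UTM coordinates, etc.
# live inside a nested struct whose exact layout depends on the file).
print([k for k in mat.keys() if not k.startswith('__')])
```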
### Pre-trained Weights

```shell
mkdir logs && cd logs
```

After preparing the pre-trained weights, the file tree should be:

```
logs
├── vd16_offtheshelf_conv5_3_max.pth  # refer to (1)
└── vgg16_pitts_64_desc_cen.hdf5      # refer to (2)
```
**(1) ImageNet-pretrained weights for the VGG16 backbone from MatConvNet**

The official repos of NetVLAD and SARE are based on MatConvNet. To reproduce their results, we need to load the same pretrained weights. Download the file directly from Google Drive and save it under `logs/`.
**(2) Initial cluster centers for the VLAD layer**

Note: this step is essential, as the VLAD layer cannot work with random initialization. The original cluster centers provided by NetVLAD are highly recommended; download them directly from Google Drive and save them under `logs/`. Alternatively, compute the centers yourself by running:

```shell
./scripts/cluster.sh vgg16
```
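Conceptually, the clustering script runs k-means over local conv5_3 features to obtain the VLAD cluster centers. Below is a rough standalone sketch of that idea; the sampling and storage details are assumptions, not the script's actual behavior:

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def compute_vlad_centers(feature_maps, num_clusters=64, samples_per_image=100):
    """feature_maps: iterable of (C, H, W) conv5_3 tensors from the VGG16 backbone."""
    descs = []
    for fmap in feature_maps:
        c, h, w = fmap.shape
        local = F.normalize(fmap.reshape(c, h * w).t(), dim=1)  # (H*W, C) local descriptors
        idx = torch.randperm(local.shape[0])[:samples_per_image]
        descs.append(local[idx].detach().numpy())
    # The k-means centroids serve as the VLAD layer's initial cluster centers.
    return KMeans(n_clusters=num_clusters).fit(np.concatenate(descs)).cluster_centers_
```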
## Train

All training details (hyper-parameters, trained layers, backbones, etc.) strictly follow the original MatConvNet versions of NetVLAD and SARE. Note: the results of all three methods (SFRS, NetVLAD, SARE) can be reproduced by training on Pitts30k-train and directly testing on the other datasets.

The default scripts adopt 4 GPUs (requiring ~11 GB each) for training, where each GPU loads one tuple (anchor, positive(s), negatives):
- To speed up training, increase `GPUS` to use more GPUs, or increase `--tuple-size` to load more tuples on one GPU;
- If your GPU does not have enough memory (e.g. < 11 GB), reduce `--pos-num` (SFRS only) or `--neg-num` to use fewer positives or negatives in one tuple (see the memory sketch after this list).
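To see how these flags trade off against memory, here is a hedged sketch of the batch one GPU assembles per step under the tuple scheme above; the flag names match the scripts, while the image resolution and layout are illustrative assumptions:

```python
import torch

tuple_size = 1   # --tuple-size: tuples loaded per GPU
pos_num = 1      # --pos-num: positives per tuple (SFRS uses several)
neg_num = 10     # --neg-num: negatives per tuple

# One tuple = 1 anchor + pos_num positives + neg_num negatives, so each GPU
# forwards tuple_size * (1 + pos_num + neg_num) images through the backbone.
images_per_gpu = tuple_size * (1 + pos_num + neg_num)
batch = torch.randn(images_per_gpu, 3, 480, 640)   # illustrative resolution
print(images_per_gpu, tuple(batch.shape))
```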
### PyTorch launcher: single-node multi-GPU distributed training

NetVLAD:
```shell
./scripts/train_baseline_dist.sh triplet
```

SARE:
```shell
./scripts/train_baseline_dist.sh sare_ind
# or
./scripts/train_baseline_dist.sh sare_joint
```

SFRS (state-of-the-art):
```shell
./scripts/train_sfrs_dist.sh
```
### Slurm launcher: single/multi-node multi-GPU distributed training

Change `GPUS` and `GPUS_PER_NODE` in the scripts according to your setup.
NetVLAD:
```shell
./scripts/train_baseline_slurm.sh <PARTITION NAME> triplet
```

SARE:
```shell
./scripts/train_baseline_slurm.sh <PARTITION NAME> sare_ind
# or
./scripts/train_baseline_slurm.sh <PARTITION NAME> sare_joint
```

SFRS (state-of-the-art):
```shell
./scripts/train_sfrs_slurm.sh <PARTITION NAME>
```
## Test

During testing, the Python scripts will automatically compute the PCA weights from Pitts30k-train or load them directly from local files (a sketch of this PCA step follows the list below). Generally, `model_best.pth.tar`, which is selected by validation during training, performs best.

The default scripts adopt 8 GPUs (requiring ~11 GB each) for testing:
- To speed up testing, increase `GPUS` to use more GPUs, increase `--test-batch-size` for a larger batch size on one GPU, or add `--sync-gather` for faster gathering from multiple threads;
- If your GPU does not have enough memory (e.g. < 11 GB), reduce `--test-batch-size` for a smaller batch size on one GPU.
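The PCA step mentioned above reduces and whitens the high-dimensional VLAD descriptors before retrieval. Here is a minimal NumPy sketch of that idea; the output dimension and numerical details are illustrative assumptions, not the repo's exact code:

```python
import numpy as np

def fit_pca_whitening(train_descs, out_dim=4096):
    """train_descs: (N, D) global descriptors, e.g. extracted on Pitts30k-train."""
    mean = train_descs.mean(axis=0)
    centered = train_descs - mean
    # Eigen-decomposition of the covariance; keep the top out_dim components.
    u, s, _ = np.linalg.svd(centered.T @ centered / len(centered))
    proj = u[:, :out_dim] / np.sqrt(s[:out_dim] + 1e-12)  # whitened projection (D, out_dim)
    return mean, proj

def apply_pca(descs, mean, proj):
    reduced = (descs - mean) @ proj
    # Re-normalize so retrieval can use inner-product / L2 on unit vectors.
    return reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
```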
### PyTorch launcher: single-node multi-GPU distributed testing

Pitts250k-test:
```shell
./scripts/test_dist.sh <PATH TO MODEL> pitts 250k
```

Pitts30k-test:
```shell
./scripts/test_dist.sh <PATH TO MODEL> pitts 30k
```

Tokyo 24/7:
```shell
./scripts/test_dist.sh <PATH TO MODEL> tokyo
```
### Slurm launcher: single/multi-node multi-GPU distributed testing

Pitts250k-test:
```shell
./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> pitts 250k
```

Pitts30k-test:
```shell
./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> pitts 30k
```

Tokyo 24/7:
```shell
./scripts/test_slurm.sh <PARTITION NAME> <PATH TO MODEL> tokyo
```
## Trained models

Note: the NetVLAD and SARE models and results listed here were trained with this repo, so they differ slightly from those in the original papers.

| Model | Trained on | Tested on | Recall@1 | Recall@5 | Recall@10 | Download Link |
| --- | --- | --- | --- | --- | --- | --- |
| SARE_ind | Pitts30k-train | Pitts250k-test | 88.4% | 95.0% | 96.5% | Google Drive |
| SARE_ind | Pitts30k-train | Tokyo 24/7 | 81.0% | 88.6% | 90.2% | same as above |
| SFRS | Pitts30k-train | Pitts250k-test | 90.7% | 96.4% | 97.6% | Google Drive |
| SFRS | Pitts30k-train | Tokyo 24/7 | 85.4% | 91.1% | 93.3% | same as above |
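For reference, Recall@K above counts a query as correct if at least one of its top-K retrieved database images lies within the ground-truth distance threshold (commonly 25 m for these benchmarks). A hedged NumPy sketch of the metric:

```python
import numpy as np

def recall_at_k(query_descs, db_descs, query_utm, db_utm, ks=(1, 5, 10), thresh=25.0):
    # Rank database images by descending similarity (unit-normalized descriptors).
    ranked = np.argsort(-(query_descs @ db_descs.T), axis=1)       # (num_q, num_db)
    # Geographic distance between every query and database image (UTM coordinates).
    dists = np.linalg.norm(query_utm[:, None, :] - db_utm[None, :, :], axis=2)
    hits = dists <= thresh                                         # ground-truth matches
    return {k: float(np.mean([hits[q, ranked[q, :k]].any()
                              for q in range(len(query_descs))])) for k in ks}
```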
## Citation

If you find this repo useful for your research, please consider citing the paper:

```
@inproceedings{ge2020self,
  title={Self-supervising Fine-grained Region Similarities for Large-scale Image Localization},
  author={Yixiao Ge and Haibo Wang and Feng Zhu and Rui Zhao and Hongsheng Li},
  booktitle={European Conference on Computer Vision},
  year={2020},
}
```
## Acknowledgements

The structure of this repo is inspired by open-reid, and part of the code is inspired by pytorch-NetVlad.