
Generalized Contrastive Loss

License: MIT

Visual place recognition is a challenging task in computer vision and a key component of camera-based localization and navigation systems. Recently, Convolutional Neural Networks (CNNs) have achieved strong results and good generalization capabilities. They are usually trained using pairs or triplets of images labeled in a binary fashion as either similar or dissimilar. In practice, the similarity between two images is not binary but continuous. Furthermore, training these CNNs is computationally complex and involves costly pair and triplet mining strategies. We propose a Generalized Contrastive Loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a siamese CNN. Furthermore, we propose three techniques for the automatic annotation of image pairs with labels indicating their degree of similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes datasets. We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts. Our models trained on MSLS outperform state-of-the-art methods, including NetVLAD, and generalize well to the Pittsburgh, TokyoTM, and Tokyo 24/7 datasets. Furthermore, training a siamese network using the GCL function requires no pair mining.
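In short, the GCL replaces the binary label y ∈ {0, 1} of the standard contrastive loss with a graded similarity ψ ∈ [0, 1]. With d the Euclidean distance between the two embeddings and τ the margin, the loss takes the form below (a sketch consistent with the reference implementation discussed in the Issues section; the notation is ours, not the paper's):

\mathcal{L}_{\mathrm{GCL}} = \psi \cdot \tfrac{1}{2} d^{2} + (1 - \psi) \cdot \tfrac{1}{2} \max(0, \tau - d)^{2}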

Paper and license

The code is licensed under the MIT License.

If you use our code, please cite our paper:

@inproceedings{leyvavallina2021gcl,
  title={Data-efficient Large Scale Place Recognition with Graded Similarity Supervision},
  author={María Leyva-Vallina and Nicola Strisciuglio and Nicolai Petkov},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}

Network activation

[Figure: network activation maps]

Contact details

If you have any questions, please contact us at:

  1. María Leyva-Vallina: m.leyva.vallina at rug dot nl
  2. Nicola Strisciuglio: n.strisciuglio at utwente dot nl

How to use this library

Download the data

  1. MSLS: The dataset is available on request here. For the new GT annotations, please request them here.
  2. Pittsburgh: The whole dataset is available on request here, and the train/val splits for Pitts30k are available here.
  3. TokyoTM: The dataset is available on request here.
  4. Tokyo 24/7: The dataset is available on request here.
  5. TB-Places: The dataset is available here. For the new GT annotations, please request them here.

Download the models

All our models (and labels) can be downloaded from here.

Our results

MSLS

| Method | PCA_w | Dim | MSLS-val (R@1 / R@5 / R@10) | MSLS-test (R@1 / R@5 / R@10) | Pitts30k (R@1 / R@5 / R@10) | Tokyo 24/7 (R@1 / R@5 / R@10) | RobotCar Seasons v2, all (0.25m/2° / 0.5m/5° / 5.0m/10°) | Extended CMU Seasons, all (0.25m/2° / 0.5m/5° / 5.0m/10°) |
|---|---|---|---|---|---|---|---|---|
| NetVLAD-GCL | N | 32768 | 62.7 / 75.0 / 79.1 | 41.0 / 55.3 / 61.7 | 52.5 / 74.1 / 81.7 | 20.3 / 45.4 / 49.5 | 3.3 / 14.1 / 58.2 | 3.0 / 9.7 / 52.3 |
| NetVLAD-GCL | Y | 4096 | 63.2 / 74.9 / 78.1 | 41.5 / 56.2 / 61.3 | 53.5 / 75.2 / 82.9 | 28.3 / 41.9 / 54.9 | 3.4 / 14.2 / 58.8 | 3.1 / 9.7 / 52.4 |
| VGG-GeM-GCL | N | 512 | 65.9 / 77.8 / 81.4 | 41.7 / 55.7 / 60.6 | 61.6 / 80.0 / 86.0 | 34.0 / 51.1 / 61.3 | 3.7 / 15.8 / 59.7 | 3.6 / 11.2 / 55.8 |
| VGG-GeM-GCL | Y | 512 | 72.0 / 83.1 / 85.8 | 47.0 / 60.8 / 65.5 | 73.3 / 85.9 / 89.9 | 47.6 / 61.0 / 69.2 | 5.4 / 21.9 / 69.2 | 5.7 / 17.1 / 66.3 |
| ResNet50-GeM-GCL | N | 2048 | 66.2 / 78.9 / 81.9 | 43.3 / 59.1 / 65.0 | 72.3 / 87.2 / 91.3 | 44.1 / 61.0 / 66.7 | 2.9 / 14.0 / 58.8 | 3.8 / 11.8 / 61.6 |
| ResNet50-GeM-GCL | Y | 1024 | 74.6 / 84.7 / 88.1 | 52.9 / 65.7 / 71.9 | 79.9 / 90.0 / 92.8 | 58.7 / 71.1 / 76.8 | 4.7 / 20.2 / 70.0 | 5.4 / 16.5 / 69.9 |
| ResNet152-GeM-GCL | N | 2048 | 70.3 / 82.0 / 84.9 | 45.7 / 62.3 / 67.9 | 72.6 / 87.9 / 91.6 | 34.0 / 51.8 / 60.6 | 2.9 / 13.1 / 63.5 | 3.6 / 11.3 / 63.1 |
| ResNet152-GeM-GCL | Y | 2048 | 79.5 / 88.1 / 90.1 | 57.9 / 70.7 / 75.7 | 80.7 / 91.5 / 93.9 | 69.5 / 81.0 / 85.1 | 6.0 / 21.6 / 72.5 | 5.3 / 16.1 / 66.4 |
| ResNeXt-GeM-GCL | N | 2048 | 75.5 / 86.1 / 88.5 | 56.0 / 70.8 / 75.1 | 64.0 / 81.2 / 86.6 | 37.8 / 53.6 / 62.9 | 2.7 / 13.4 / 65.2 | 3.5 / 10.5 / 58.8 |
| ResNeXt-GeM-GCL | Y | 1024 | 80.9 / 90.7 / 92.6 | 62.3 / 76.2 / 81.1 | 79.2 / 90.4 / 93.2 | 58.1 / 74.3 / 78.1 | 4.7 / 21.0 / 74.7 | 6.1 / 18.2 / 74.9 |
To reproduce them

First, clone the mapillary_sls repository and set the MAPILLARY_ROOT environment variable on your machine, substituting "MYDIR" with the path where you cloned it:

export MAPILLARY_ROOT="/MYDIR/mapillary_sls/"

Then create the JSON index files for the MSLS dataset by running src/labeling/create_json_idx.py (replace PATH-TO-DATASET with the directory that contains the MSLS dataset):

python3 src/labeling/create_json_idx.py --dataset msls --root_dir PATH-TO-DATASET/MSLS/

Run the extract_predictions.py script to compute the map and query features and the top-k predictions. For instance:

python3 extract_predictions.py --dataset MSLS --root_dir PATH-TO-DATASET/MSLS/ --subset val --model_file models/MSLS/MSLS_resnet152_GeM_480_GCL.pth --backbone resnet152 --pool GeM --f_length 2048

This will produce the results on the MSLS validation set for this model. If you select --subset test, the file results/MSLS/test/MSLS_resnet152_GeM_480_GCL_predictions.txt will be generated; to evaluate these predictions, submit the file to the MSLS evaluation server.
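For instance, the corresponding test-set run keeps the same model and flags and only changes the subset:

python3 extract_predictions.py --dataset MSLS --root_dir PATH-TO-DATASET/MSLS/ --subset test --model_file models/MSLS/MSLS_resnet152_GeM_480_GCL.pth --backbone resnet152 --pool GeM --f_length 2048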

To apply PCA whitening, run the apply_pca.py script with the appropriate parameters. For instance, for the example above, run:

python3 apply_pca.py --dataset MSLS --root_dir PATH-TO-DATASET/MSLS/ --subset val --name MSLS_resnet152_GeM_480_GCL 

To reproduce our experiments, we include a series of evaluation scripts in the 'scripts' folder for the MSLS, Pittsburgh, Tokyo 24/7, TokyoTM, RobotCar Seasons v2, and Extended CMU Seasons datasets. These scripts need the index files for each dataset, which are available here, and our model files, available here.

Train your own models

To train a model on MSLS using the GCL function, execute train.py with the appropriate parameters. For example:

python3 train.py --root_dir PATH-TO-DATASET/MSLS/ --cities val --backbone vgg16 --use_gpu --pool GeM --last_layer 2 

IMPORTANT NOTES FOR TRAINING

  1. Make sure that the 'train_val' directory of the MSLS dataset contains the graded ground-truth label files (they should sit at the same level as the city folders inside train_val).
  2. Download the ground-truth files from our DataVerse repository.
  3. The password to extract the zip file with the labels is 'gcl2022'.
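For example, assuming the downloaded archive is named msls_gcl_labels.zip (a hypothetical name; use the actual file name from the repository), it can be extracted with:

unzip -P gcl2022 msls_gcl_labels.zip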


Issues

About attention maps

Hello, I'm interested in your work. How can I obtain the attention maps shown in Figure 15? Do you have any related code available?
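For reference, one generic way to produce such maps is sketched below, assuming features is the backbone output before pooling; this is a common visualization recipe, not necessarily how the paper's figures were produced:

import torch
import torch.nn.functional as F

def activation_map(features, image_size):
    # features: (1, C, h, w) feature map from the backbone, before pooling
    amap = features.mean(dim=1, keepdim=True)  # average over channels -> (1, 1, h, w)
    amap = F.interpolate(amap, size=image_size, mode="bilinear", align_corners=False)
    amap = (amap - amap.min()) / (amap.max() - amap.min() + 1e-8)  # normalize to [0, 1]
    return amap.squeeze()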

Could you please share the code for generating the ground-truth labels of MSLS or the other datasets?

Great work! Just from the perspective of the datasets, the results are very attractive. Your code is clearly organized and I have learned a lot while reading it, but I have a question that I hope you can answer.

I am curious about how you determine the ground-truth labels for the MSLS dataset. I looked through your code on GitHub and found nothing relevant. Did you publish this part of the code? If so, please tell me where it is.

Thank you for taking the time to read my issue and I hope you can answer it.

Request password for new GT annotation files

Hi María and Nicola,

Thanks for your impressive work.

However, when I tried to unzip the new GT annotation files downloaded via the link provided in this GitHub repo, a password was required.

May I know the password?

Thank you.

How to get the value of Extended CMU Seasons?

Hello @marialeyvallina, @nicstrisc, I really appreciate your work.
I have a question: Table 6 shows that the Extended CMU Seasons results of ResNeXt-GeM-GCL are:
Urban:11.1 / 28.7 / 87.4;
Suburban:4.2 / 14.6 / 77.4;
Park:3.1 / 11.1 / 57.7

And the mean result is:
6.1 / 18.2 / 74.9

Is it weighted in some way? How is it computed?
I am always looking forward to your kind response.
Best regards.
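For reference, a plain unweighted average of the three conditions gives (11.1 + 4.2 + 3.1) / 3 ≈ 6.1, (28.7 + 14.6 + 11.1) / 3 ≈ 18.1, and (87.4 + 77.4 + 57.7) / 3 ≈ 74.2; the first figure matches but the other two do not exactly, which is consistent with the suspicion that the mean is weighted, for example by the number of queries per condition.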

I can't load pre-trained models on MSLS

Dear Maria,

Thanks for your work and effort. I can't test your models that use GeM pooling; loading fails with the following error:

model.load_state_dict(torch.load(args.resume_model)["model_state_dict"])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BaseNet:
	Unexpected key(s) in state_dict: "pool.p".
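The unexpected key pool.p is the learnable exponent of a GeM pooling layer, which suggests the checkpoint was trained with GeM while the instantiated model uses a parameter-free pooling; passing --pool GeM when building the model is likely the cleaner fix. A minimal workaround sketch, assuming model has already been constructed and this key is the only mismatch:

import torch

state = torch.load("models/MSLS/MSLS_resnet152_GeM_480_GCL.pth", map_location="cpu")["model_state_dict"]
state.pop("pool.p", None)  # drop the GeM exponent the current model does not expect
model.load_state_dict(state, strict=False)  # non-strict load tolerates any remaining mismatch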

GCL reproduce

Hi, I am trying to reproduce the results in the paper using ResNet152 as the backbone and the GCL loss. However, I get a NaN error during training. Here is my implementation of the GCL loss.

import torch
from IPython import embed  # used below as a debugging hook

class GCLoss(torch.nn.Module):

    def __init__(self, margin=0.5):
        super(GCLoss, self).__init__()
        self.margin = margin

    def forward(self, out1, out2, label):  # out1, out2 are the output features; label is the graded similarity (FoV overlap)
        dist_square = torch.sum((out1 - out2) ** 2, dim=1)  # squared Euclidean distance
        dist = torch.sqrt(dist_square)
        loss = label * 0.5 * dist_square + (1 - label) * 0.5 * torch.relu(self.margin - dist) ** 2
        if torch.isnan(loss).sum() > 0:
            embed()  # drop into an interactive shell when a NaN appears
        loss_m = torch.mean(loss)
        return loss_m

Could you please give me some advice? Did I get something wrong?
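For what it's worth, one common source of NaNs in this formulation: the gradient of torch.sqrt is infinite at 0, so a pair of identical embeddings (dist_square == 0) produces NaN gradients. A frequently used stabilization, offered as an assumption rather than a confirmed fix for this case, is to clamp before the square root:

# clamp away from zero so the gradient of sqrt stays finite
dist = torch.sqrt(torch.clamp(dist_square, min=1e-12))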

Siamese Vs. Single network in training

Hi, I have read your paper and the code. The work is cool and fantastic.
However, I am confused about the siamese network here. The paper says that the two networks share weights and have the same structure, so what would be different if the inputs went through a single network only once? For example, what about concatenating the two tensors into one batch and feeding it to the network, as in the sketch below?
Could you please provide an ablation study of siamese vs. single network on MSLS? Thanks a lot.
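A minimal sketch of the single-network variant the question describes, assuming net, x1, and x2 are defined; because the two branches share weights, this is equivalent to two separate forward passes up to BatchNorm batch statistics:

import torch

# run both inputs through the shared-weight network as one concatenated batch
out = net(torch.cat([x1, x2], dim=0))
out1, out2 = out.chunk(2, dim=0)  # split back into the two branches' outputs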

Inquire about the 7Scenes relabeling methods

Hi,
Your work "Data-efficient Large Scale Place Recognition with Graded Similarity Supervision" is fascinating and inspire me a lot. I want to follow but curious about how to relabel the 7 Scenes dataset based on 3D overlap. I'll be really appreciate if you can provide a pseudo or source code for understanding.
Best wishes
Yingxiu Chang

How to get the new GT annotations

How can I get the new GT annotations for the TB-Places dataset? I filled in my email in the form linked by the authors, but I haven't received a reply for a long time. Where should I download them? Thank you.

About Figure 14 in the paper

Hello @marialeyvallina, Figure 14 (CNN activations) in the paper looks great. I want to reproduce it, but I have failed so far. Would it be possible to provide the relevant implementation code? Of course, detailed guidance would be even better.
I am always looking forward to your kind response.
Best regards.

How to reproduce the results on RobotCar Seasons v2?

Hi @marialeyvallina, @nicstrisc,
I really appreciate your great work. I'm trying to reproduce the results on RobotCar Seasons v2. I have obtained MSLS_resnet50_avg_480_GCL_predictions.npy. Could you please tell me how to get from there to the results in the paper? Could you please provide the subsequent code? Of course, detailed guidance would be even better.
I am always looking forward to your kind response.
Best regards.

Where does `create_model` come from?

I see a function create_model being called in the extract_predictions.py file, but I'm not sure where it comes from.

I tried using timm's create_model, but it seems to load an incompatible state dictionary. Is there another create_model I should be looking at? What version of timm do you use?

PCA whitening

Hi,

Interesting work! I'm excited to try the method out in my own research.

But I have a question regarding the descriptor PCA whitening implementation mentioned in the paper: I couldn't find it in the code. Did I just miss it, or is the implementation not included in the release?

Thanks!
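For reference, a minimal sketch of descriptor PCA whitening as it is commonly done for retrieval; map_feats and query_feats are assumed NumPy arrays of database and query descriptors, 1024 is an example output dimension, and this is not necessarily the repository's implementation:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA(n_components=1024, whiten=True)
map_white = pca.fit_transform(map_feats)   # fit the whitening on the map/database descriptors
query_white = pca.transform(query_feats)   # apply the same transform to the queries

# re-L2-normalize, as is common for retrieval descriptors
map_white /= np.linalg.norm(map_white, axis=1, keepdims=True)
query_white /= np.linalg.norm(query_white, axis=1, keepdims=True)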

Release date for the generalized contrastive loss code?

Hello dear authors, thank you for your interesting work. Are you going to release the code for GCL? I am experimenting with different contrastive losses and would like to try your proposed GCL, but I could not find the code. Can you please confirm whether the loss function code will be released in the near future?
