G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling

Official TensorFlow implementation of G-SimCLR (Guided-SimCLR), as described in the paper [G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling] (link to be updated soon) by Souradip Chakraborty*, Aritra Roy Gosthipaty* and Sayak Paul*.

*Equal contribution.

The paper has been accepted at ICDM 2020 for the Deep Learning for Knowledge Transfer (DLKT) workshop.

Abstract:

In the realm of computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data. The representations learned with supervision are not only of high quality but also help the model enhance its accuracy. However, the collection and annotation of a large dataset are costly and time-consuming. To avoid this, there has been a lot of research in the field of unsupervised visual representation learning, especially in a self-supervised setting. Amongst the recent advancements in self-supervised methods for visual recognition, in SimCLR Chen et al. show that good quality representations can indeed be learned without explicit supervision. In SimCLR, the authors maximize the similarity of augmentations of the same image and minimize the similarity of augmentations of different images. A linear classifier trained with the representations learned using this approach yields 76.5% top-1 accuracy on the ImageNet ILSVRC-2012 dataset. In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function (as used in SimCLR), it is beneficial to not have images of the same category in the same batch. In an unsupervised setting, the information about which images belong to the same category is missing. We use the latent space representations of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels. With this a priori information we batch images so that no two images from the same category are found in the same batch. We report comparable performance enhancements on the CIFAR10 dataset and a subset of the ImageNet dataset. We refer to our method as G-SimCLR.
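The batching idea described in the abstract can be illustrated with a small, dependency-free sketch. It is not taken from the repository notebooks: the function name `guided_batches`, its parameters, and the greedy largest-cluster-first strategy are assumptions made for illustration. It only shows one possible way to build batches in which each pseudo label appears at most once.

```python
import random
from collections import defaultdict


def guided_batches(pseudo_labels, batch_size, seed=0):
    """Group sample indices into batches with at most one sample per pseudo label.

    pseudo_labels[i] is the (hypothetical) k-means cluster id of sample i.
    Returns a list of batches, each a list of sample indices.
    """
    rng = random.Random(seed)

    # Collect the indices that belong to each pseudo label and shuffle them.
    by_label = defaultdict(list)
    for index, label in enumerate(pseudo_labels):
        by_label[label].append(index)
    for indices in by_label.values():
        rng.shuffle(indices)

    batches = []
    while any(by_label.values()):
        # Labels that still have unused samples, largest cluster first,
        # so that big clusters are drained as evenly as possible.
        available = [label for label, indices in by_label.items() if indices]
        available.sort(key=lambda label: len(by_label[label]), reverse=True)
        chosen = available[:batch_size]
        # Take exactly one sample from each chosen label.
        batches.append([by_label[label].pop() for label in chosen])
    return batches


# Toy example: ten samples spread over four pseudo labels.
labels = [0, 0, 1, 1, 2, 2, 3, 3, 0, 1]
for batch in guided_batches(labels, batch_size=4):
    print(batch, [labels[i] for i in batch])
```

In this sketch a batch can never be larger than the number of distinct pseudo labels, and the last batches may be smaller than `batch_size` once some clusters run out of samples. Whether such short batches are kept or dropped is a design choice that the text above does not specify.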

Datasets Used:

Architectures Used:

  1. ResNet20 used for CIFAR10.
  2. ResNet50 used for ImageNet subset.
  3. Denoising Autoencoder built from scratch.

Folder Structure:

.
├── CIFAR10
│   ├── Autoencoder.ipynb
│   ├── SimCLR_Pseudo_Labels
│   │   ├── Fine_Tune_10_Perc.ipynb
│   │   ├── Linear_Evaluation.ipynb
│   │   └── SimCLR_Pseudo_Labels_Training.ipynb
│   ├── Supervised_Training_CIFAR10.ipynb
│   └── Vanilla_SimCLR
│       ├── Fine_tune_10Perc.ipynb
│       ├── Linear_Evaluation.ipynb
│       └── SimCLR_Training.ipynb
├── Imagenet_Subset
│   ├── Autoencoder
│   │   ├── Deep_Autoencoder.ipynb
│   │   └── Shallow_Autoencoder.ipynb
│   ├── SimCLR_Pseudo_Labels
│   │   ├── Deep Autoencoder
│   │   │   ├── Fine_Tune_10Perc.ipynb
│   │   │   ├── Linear_Evaluation.ipynb
│   │   │   └── SimCLR_Pseudo_Labels_Training.ipynb
│   │   └── Shallow Autoencoder
│   │       ├── Fine_tune_10Perc.ipynb
│   │       ├── Linear_Evaluation.ipynb
│   │       └── SimCLR_Pseudo_Labels_Training.ipynb
│   ├── Supervised_Training_Imagenet_Subset.ipynb
│   └── Vanilla_SimCLR
│       ├── Fine_tune_10Perc.ipynb
│       ├── Linear_Evaluation.ipynb
│       └── SimCLR_Training.ipynb
└── README.md

Loss Curves:

Loss (NT-Xent) curves as obtained from the G-SimCLR training with the CIFAR10 and ImageNet Subset datasets respectively.
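For readers unfamiliar with the quantity plotted in these curves, the following is a minimal, dependency-free sketch of the NT-Xent loss as defined for SimCLR. It is not the repository's TensorFlow implementation, which would be vectorized. The function names and the default temperature of 0.5 are assumptions chosen for illustration, and the embeddings are assumed to be non-zero vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss for N positive pairs.

    view_a[i] and view_b[i] are the projections of two augmentations of
    image i. Every other embedding in the batch acts as a negative.
    The temperature default is an assumed value, not one from the source.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)  # 2N embeddings
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Temperature-scaled similarities to all embeddings except i itself.
        logits = [
            cosine_similarity(z[i], z[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        positive_logit = cosine_similarity(z[i], z[positive]) / temperature
        # Numerically stable log-sum-exp over the denominator terms.
        largest = max(logits)
        log_denominator = largest + math.log(
            sum(math.exp(value - largest) for value in logits)
        )
        # -log(exp(positive) / sum(exp(all others)))
        total += log_denominator - positive_logit
    return total / (2 * n)


# Two images, two views each. Matching views give a lower loss than
# views that are paired with the wrong image.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)
```

Because every non-positive embedding in the batch appears in the denominator, two different images of the same category are pushed apart by this loss, which is the situation the guided batching is meant to avoid.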

Pretrained Weights:

Results Reported:

Linear Evaluation

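| Method                          | Variant | CIFAR10 | Imagenet Subset |
|---------------------------------|---------|---------|-----------------|
| Fully supervised                | -       | 73.62   | 67.6            |
| SimCLR with minor modifications | P1      | 37.69   | 52.8            |
| SimCLR with minor modifications | P2      | 39.4    | 48.4            |
| SimCLR with minor modifications | P3      | 39.92   | 52.4            |
| G-SimCLR (ours)                 | P1      | 38.15   | 56.4            |
| G-SimCLR (ours)                 | P2      | 41.01   | 56.8            |
| G-SimCLR (ours)                 | P3      | 40.5    | 60              |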

Fine-tuning (10% labeled data)

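| Method                          | CIFAR10 | Imagenet Subset |
|---------------------------------|---------|-----------------|
| Fully supervised                | 73.62   | 67.6            |
| SimCLR with minor modifications | 42.21   | 49.2            |
| G-SimCLR (ours)                 | 43.1    | 56              |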

where,

  • P1 denotes the feature backbone network plus the non-linear projection head with its final layer removed
  • P2 denotes the feature backbone network plus the non-linear projection head with its final two layers removed
  • P3 denotes the feature backbone network only
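The three variants can be read as different truncation points of the same network. The sketch below illustrates this with plain lists of layer names. The three-layer projection head and all layer names are hypothetical, since the depth of the head is not stated here, and `evaluation_stack` is an illustrative helper rather than a function from the notebooks.

```python
def evaluation_stack(backbone, projection_head, variant):
    """Return the layers used as the feature extractor for P1, P2 or P3."""
    # Number of projection-head layers removed from the end for each variant.
    layers_dropped = {
        "P1": 1,                     # drop the final layer
        "P2": 2,                     # drop the final two layers
        "P3": len(projection_head),  # drop the whole head
    }
    dropped = layers_dropped[variant]
    kept_head = projection_head[: len(projection_head) - dropped]
    return list(backbone) + kept_head


# Hypothetical layer names, assuming a three-layer projection head.
backbone = ["resnet_backbone"]
projection_head = ["dense_1", "dense_2", "dense_3"]

for variant in ("P1", "P2", "P3"):
    print(variant, evaluation_stack(backbone, projection_head, variant))
```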
