
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations in PyTorch

arXiv link for the SimCLR paper: A Simple Framework for Contrastive Learning of Visual Representations

Introduction

This repository contains my implementation of the SimCLR paper in PyTorch. I've written a blog post about it here. SimCLR presents a simple framework for learning representations from unlabeled image data using contrastive learning. A composition of two data augmentation operations, namely random cropping and color jittering, produces two views of each image; positive pairs are then defined as views of the same image and negative pairs as views of different images within a batch.
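For illustration, a minimal two-view augmentation pipeline along these lines could be built with torchvision as below. This is only a sketch: the repository's actual transforms live in utils/transforms.py, and the crop size, jitter strengths, and the extra flip/grayscale operations here are assumptions taken from the paper rather than from this code.

import torchvision.transforms as T

# Illustrative SimCLR-style augmentation: two stochastic views of one image
# form a positive pair. Parameter values are assumptions, not necessarily the
# ones used in utils/transforms.py.
simclr_augment = T.Compose([
    T.RandomResizedCrop(size=224),
    T.RandomHorizontalFlip(),
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    T.RandomGrayscale(p=0.2),
    T.ToTensor(),
])

def two_views(pil_image):
    # Applying the same random pipeline twice yields two different views.
    return simclr_augment(pil_image), simclr_augment(pil_image)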

Setup Dependencies

Python 3 is recommended for running the experiments. The results using this code have been generated with: torch 1.4.0, torchvision 0.5.0, scikit-learn 0.23.2, numpy 1.19.1, matplotlib 3.3.2, and seaborn 0.11.0.

Project Structure

The skeletal overview of this project is as follows:

.
├── utils/
│     ├── __init__.py
│     ├── model.py
│     ├── ntxent.py
│     ├── plotfuncs.py
│     └── transforms.py
├── results/
│    ├── model/
│    │     ├── lossesfile.npz
│    │     ├── model.pth
│    │     ├── optimizer.pth
│    ├── plots/
│    │     ├── training_losses.png
├── linear_evaluation.py
├── main.py
├── simclr.py
├── README.md
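
Among these files, utils/ntxent.py implements the NT-Xent (normalized temperature-scaled cross-entropy) loss from the SimCLR paper. The snippet below is a minimal reference version of that loss, not the repository's exact implementation; the temperature value is an assumption.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, d) projections of two augmented views of the same N images.
    # Positive pairs are (z1[i], z2[i]); every other sample in the batch is a negative.
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # drop self-similarity terms
    # The positive for index i sits N positions away in the concatenated batch.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)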

Experiment Configuration

You can pass command line arguments to main.py (SimCLR training) and linear_evaluation.py (linear classifier evaluation on top of the learned representations).

file main.py

  1. datapath: Path to the data root folder containing the train and test folders
  2. respath: Path to the results directory where the saved model and evaluation graphs will be stored
  3. -bs: Batch size for self-supervised training (default: 250)
  4. -nw: Number of workers for data loading (default: 2)
  5. -c: if present, use CUDA
  6. --multiple_gpus: if present, use multiple GPUs (when available)

file linear_evaluation.py

  1. datapath: Path to the data root folder containing the train and test folders
  2. modelpath: Path to the trained self-supervised model
  3. respath: Path to the results directory where the saved model and evaluation graphs will be stored
  4. -bs: Batch size for linear evaluation (default: 250)
  5. -nw: Number of workers for data loading (default: 2)
  6. -c: if present, use CUDA
  7. --multiple_gpus: if present, use multiple GPUs (when available)
  8. --remove_top_layers: number of top layers to remove from the overall network; these layers form the projection head (see the sketch after this list)
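
For intuition, removing the top layers means discarding the projection head so that the linear classifier is trained on the encoder output. The sketch below illustrates the idea only; the way layers are stored and the names used here are hypothetical and not taken from this repository.

import torch.nn as nn

def strip_top_layers(layers, remove_top_layers):
    # 'layers' is assumed to be an ordered list of modules whose last entries
    # form the projection head; dropping them exposes the encoder features.
    if remove_top_layers > 0:
        layers = layers[:-remove_top_layers]
    return nn.Sequential(*layers)

# Hypothetical usage for a 2-layer projection head:
#   encoder = strip_top_layers(list(full_model.children()), remove_top_layers=2)
#   classifier = nn.Linear(feature_dim, num_classes)   # trained on frozen encoder output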

Example usage:

This command runs self-supervised training on the dataset at '../milli_imagenet' and stores the results in the 'results' directory, with a batch size of 250, using CUDA and multiple GPUs:

python main.py '../milli_imagenet' 'results' -bs 250 -c --multiple_gpus &

This command runs the linear evaluator with a batch size of 125, using CUDA and multiple GPUs. The dataset is at '../milli_imagenet/', the saved self-supervised model is at 'results/model/model.pth', and any outputs are written to 'results':

python linear_evaluation.py '../milli_imagenet/' 'results/model/model.pth' 'results' -c --multiple_gpus -bs 125 

Dataset

We used this version of the imagenet-5-categories dataset, which has a total of 1250 train and 250 test images. For linear evaluation, we used 250 train images (i.e. 10% of the train set).

Results

class      precision  recall  f1-score  support
car        0.8600     0.8600  0.8600    50
airplane   0.7679     0.8600  0.8113    50
elephant   0.7500     0.7200  0.7347    50
dog        0.5345     0.6200  0.5741    50
cat        0.7632     0.5800  0.6591    50

Test Accuracy: 72.80% (with a 2-layer projection head)

A pretrained model (trained for 1000 epochs), together with the optimizer state and loss file, is available here.

References

SimCLR paper by Chen et al.:

@misc{chen2020simple,
      title={A Simple Framework for Contrastive Learning of Visual Representations}, 
      author={Ting Chen and Simon Kornblith and Mohammad Norouzi and Geoffrey Hinton},
      year={2020},
      eprint={2002.05709},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
