
simclr's Introduction

SimCLR

A PyTorch implementation of SimCLR based on the ICML 2020 paper A Simple Framework for Contrastive Learning of Visual Representations.

Network architecture (image from the paper).

Requirements

  • PyTorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  • thop
pip install thop

Dataset

The CIFAR10 dataset is used in this repo; PyTorch downloads it into the data directory automatically.
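For reference, a minimal sketch of the automatic download (the repo's actual loaders also pass a transform, as shown in the issues below):

from torchvision.datasets import CIFAR10

# the first call downloads CIFAR10 into ./data automatically
train_data = CIFAR10(root='data', train=True, download=True)
test_data = CIFAR10(root='data', train=False, download=True)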

Usage

Train SimCLR

python main.py --batch_size 1024 --epochs 1000 
optional arguments:
--feature_dim                 Feature dim for latent vector [default value is 128]
--temperature                 Temperature used in softmax [default value is 0.5]
--k                           Top k most similar images used to predict the label [default value is 200]
--batch_size                  Number of images in each mini-batch [default value is 512]
--epochs                      Number of sweeps over the dataset to train [default value is 500]

Linear Evaluation

python linear.py --batch_size 1024 --epochs 200 
optional arguments:
--model_path                  The pretrained model path [default value is 'results/128_0.5_200_512_500_model.pth']
--batch_size                  Number of images in each mini-batch [default value is 512]
--epochs                      Number of sweeps over the dataset to train [default value is 100]

Results

There are some differences between this implementation and the official one; the model (ResNet50) is trained on a single NVIDIA Tesla V100 (32 GB) GPU:

  1. No Gaussian blur is used;
  2. The Adam optimizer with learning rate 1e-3 replaces the LARS optimizer (see the sketch after this list);
  3. No linear learning rate scaling is used;
  4. No linear warmup or cosine LR schedule is used.
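As referenced in point 2, a minimal sketch of the optimizer setup; the weight_decay value is an assumption, since only Adam and lr=1e-3 are specified here:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 10)  # placeholder standing in for the SimCLR Model(feature_dim)
# Adam with a constant lr of 1e-3, replacing LARS; weight_decay=1e-6 is an assumption
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)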
Evaluation Protocol | Params (M) | FLOPs (G) | Feature Dim | Batch Size | Epoch Num | τ | K | Top1 Acc % | Top5 Acc % | Download
KNN | 24.62 | 1.31 | 128 | 512 | 500 | 0.5 | 200 | 89.1 | 99.6 | model (Baidu code: gc5k)
Linear | 23.52 | 1.30 | - | 512 | 100 | - | - | 92.0 | 99.8 | model (Baidu code: f7j2)

simclr's People

Contributors: leftthomas

simclr's Issues

results reported in your repo are fine-tuning and not linear evaluation

Hello. I've seen in your code that after pre-training, linear.py fine-tunes the whole network, while the paper performs linear evaluation rather than fine-tuning (it only trains the linear classifier layer with the ResNet-50 features frozen). In the paper, the result obtained with linear evaluation on CIFAR10 for 500 pre-training epochs and a batch size of 512 is around 93%. Your result is close (92%), but it comes from fine-tuning the whole network (not linear evaluation as in the paper), so it is logical to obtain a higher score than with linear evaluation. In fact, by fine-tuning, the results should outperform the supervised ResNet-50 baseline, which can reach 93.62%.

Therefore, could you tell me the score you got via linear evaluation, without fine-tuning? Thanks!

Low accuracy after linear evaluation

Hello,

I ran main.py followed by linear.py using the default arguments. At the end of linear.py, my top-1 test accuracy only reached 25%. Why do I not achieve 92% top-1 accuracy after the 100 epochs?

Thanks

Calculation of Loss in train() seems wrong?

In the train() function, while calculating the InfoNCE loss, sim_matrix has shape [2B, 2B-1]. However, shouldn't it be [2B, 2(B-1)]?

In the given code, the mask_select on line 29 also selects the augmented positive sample and treats it as a negative, so the loss calculation looks inaccurate to me. Correct me if my understanding is wrong. Thanks!
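To make the shapes concrete, here is a minimal sketch (not the repo's exact code) of masking only the diagonal of the similarity matrix; each row keeps 2B-1 entries because the positive stays in the denominator, as in Eq. (1) of the paper:

import torch
import torch.nn.functional as F

B, temperature = 4, 0.5
out = F.normalize(torch.randn(2 * B, 128), dim=-1)  # both augmented views, stacked

sim = torch.exp(torch.mm(out, out.t()) / temperature)  # [2B, 2B]
mask = ~torch.eye(2 * B, dtype=torch.bool)             # drop self-similarity only
sim_matrix = sim[mask].view(2 * B, 2 * B - 1)
print(sim_matrix.shape)  # torch.Size([8, 7]): 2B-1 columns, positive pair included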

Why do you modify 'conv1' and remove maxpooling from ResNet50?

Thanks very much for such a straightforward implementation of SimCLR.

Why do you change conv1's stride from 2 to 1 and kernel_size from 7 to 3, and remove the maxpooling from torchvision's resnet? Do those changes contribute to the performance?

SimCLR/model.py

Lines 12 to 16 in cee178b

for name, module in resnet50().named_children():
    if name == 'conv1':
        module = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    if not isinstance(module, nn.Linear) and not isinstance(module, nn.MaxPool2d):
        self.f.append(module)

The encoder of torchvision's resnet downsamples the feature maps 5 times (at 'conv1', 'maxpool', 'layer2', 'layer3', 'layer4'), so the output ends up at 1/32 the size of the input image. However, the encoder of your SimCLR downsamples the input only 3 times (1/8). Are those changes made simply because you're using CIFAR10 with 32-pixel images as the dataset? Wouldn't they be inappropriate for datasets with larger images, such as ImageNet?
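A minimal sketch (rebuilt independently from torchvision, mirroring the quoted changes, with the pooling/classifier heads dropped so the spatial size is visible) to verify the 1/8 downsampling:

import torch
import torch.nn as nn
from torchvision.models import resnet50

modules = []
for name, module in resnet50().named_children():
    if name == 'conv1':
        # stride-1 3x3 stem instead of the stock stride-2 7x7
        module = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    if name in ('maxpool', 'avgpool', 'fc'):
        continue
    modules.append(module)
encoder = nn.Sequential(*modules)

x = torch.randn(1, 3, 32, 32)
print(encoder(x).shape)  # torch.Size([1, 2048, 4, 4]): 32 -> 4, i.e. 1/8 downsampling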

Planning to run SimCLR on my dataset

Hi,
Thanks for your nice work.
I am planning to run SimCLR on my dataset. I wonder if I need to adjust the network structure or add some tricks. I'd appreciate any advice.

Pretrain model

Hi, can you put your pretrained model on a different drive service?
I can't register an account to download your pretrained model.

Batch size doesn't affect results as expected

In my experiments, evaluation accuracy gets worse with larger batch sizes. For example, at the default settings, batch size 256 gets 79.23% yet 512 gets 78.38%, and the same holds for ResNet18 and ResNet50. I think this contradicts the SimCLR paper's results.
Does anyone have the same issue?

Batch size 1024 in V100?

Hi,

Thanks for the great codebase.

I've tried training with the proposed batch_size of 1024 on a V100, but it runs out of memory. Am I doing something wrong, or is the practical limit the default 512 (instead of the 1024 proposed in the README)? For me, the memory usage at batch size 512 is 29.9 GB, close to the 32 GB limit.

Thanks!

Pretrained model load error

Hi leftthomas,
When I try to load your trained model 128_0.5_200_512_500_model.pth from the Baidu drive, I get the following error.
code:
torch.load("pretrained/128_0.5_200_512_500_model.pth")
error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-2b50ac73dd6d> in <module>
----> 1 torch.load("pretrained/128_0.5_200_512_500_model.pth")

~/py37/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    527             with _open_zipfile_reader(f) as opened_zipfile:
    528                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 529         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    530 
    531 

~/py37/lib/python3.7/site-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    707     for key in deserialized_storage_keys:
    708         assert key in deserialized_objects
--> 709         deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
    710         if offset is not None:
    711             offset = f.tell()

RuntimeError: storage has wrong size: expected 4262493158830214735 got 512

torch.__version__ = '1.4.0'
Python version = 3.7.0
Can you help me solve this?
Thanks
Tao

Pretrained Model Links

Thank you for this amazing work.
I am struggling to download the pretrained models from Baidu disk. I tried various things such as the baidu-dl extension, pandownloader, etc., none of which works. Is it possible for you to upload them to a different service like Google Drive or Dropbox? Thanks in advance.

How to carry out semi-supervised training?

Firstly, thanks for your code!

As for my question, my idea is:
  • dataset_train is modified to a large number of unlabeled training-set images
  • memory_data is changed to a small number of labeled training-set images
  • test_data is changed to the validation-set images

Are these modifications correct? Or does dataset_train also need to include the small number of labeled training-set images?

I look forward to your prompt reply!

Data augmentation in linear evaluation

Hello. Nice work. Regarding the linear evaluation after pre-training (linear.py), I've noticed that you use data augmentation just like in pre-training, as you do:

train_data = CIFAR10(root='data', train=True, transform=utils.train_transform, download=True)

However, the paper mentions that they do not apply any data augmentation during linear evaluation. Have you experimented without data augmentation in the linear evaluation training?
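For comparison, a minimal sketch of an augmentation-free transform for linear evaluation (assuming the CIFAR10 normalization stats match those in utils.py):

from torchvision import transforms

# plain ToTensor + Normalize, no random crop/flip/color jitter
eval_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
])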

DataParallel to avoid memory issue

Change the model creation from line 114 in main.py to the following:

    # model setup and optimizer config
    model = Model(feature_dim)
    flops, params = profile(model, inputs=(torch.randn(1, 3, 32, 32),))
    flops, params = clever_format([flops, params])
    print('# Model Params: {} FLOPs: {}'.format(params, flops))
    model = torch.nn.DataParallel(model).cuda()

Otherwise, you will face a CUDA out-of-memory issue even if you have enough GPU cards to support a higher batch size.

question about learning rate

hi @leftthomas

I have 2 questions below:

1- May I know the number of pre-training epochs that you used? Is it 1k? I understand that the number of linear evaluation epochs is 100, but what about the number of pre-training epochs?

2- I've seen that you use a constant learning rate of 1e-3 without any decay. I really appreciate that you want to keep everything simple and avoid the warmup, LARS, etc., but shouldn't you at least decay the lr at some point? Have you tried this?
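For what it's worth, a hypothetical sketch of adding cosine decay on top of the constant-lr Adam setup (the model and T_max here are illustrative):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(10, 10)  # placeholder for the SimCLR model
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=500)  # decay over the 500 training epochs

for epoch in range(500):
    # ... train for one epoch ...
    scheduler.step()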

Problem in dictionary keys

Did anyone face problems with key mismatches between the pretrained model and the encoder model when using strict=True?

loss function problem

Is there no need for l2 normalization in the loss function? This is not the same as the implementation of the torch.nn.functional.cosine_similarity function on the PyTorch official website.
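For context, a minimal sketch showing why a plain dot product of l2-normalized features already equals cosine similarity, which is presumably why no extra normalization step appears in the loss:

import torch
import torch.nn.functional as F

a, b = torch.randn(4, 128), torch.randn(4, 128)
a_n, b_n = F.normalize(a, dim=-1), F.normalize(b, dim=-1)

dot = (a_n * b_n).sum(dim=-1)               # dot product of unit vectors
cos = F.cosine_similarity(a, b, dim=-1)     # torch's built-in
print(torch.allclose(dot, cos, atol=1e-6))  # True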

Benchmarking on CIFAR-100

I adapted this repo to work on CIFAR-100. I noticed that the accuracy computed using KNN varies greatly between subsequent epochs, from around 40% down to 1%. I believe that the learning rate of 1e-3 is quite reasonable. Did you experiment with this dataset? Any ideas on what might be different?
Thanks!

Contrastive loss value

Hi,
thanks for your work. I was wondering about the convergence value of the contrastive loss; I see it around 3~4 while training the model, but I don't know how far it will go down. Any idea about this? Also, are you able to share graphs or logs of the loss function for your trained model? That would be helpful in understanding the model's behavior.
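One way to put the 3~4 range in context: if all pairwise similarities were equal, the NT-Xent loss would reduce to -log(1/(2B-1)) = log(2B-1), a rough chance-level baseline:

import math

B = 512  # the default batch size
print(math.log(2 * B - 1))  # ~6.93; a loss of 3~4 is well below chance level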

I don't know how part of the code works

Hello, I don't quite understand the following code. Could you tell me what its specific process looks like? Thank you!
I don't know how pred_labels is obtained:
# [B, N]: similarity between each query feature and the whole feature bank
sim_matrix = torch.mm(feature, feature_bank)
# [B, K]
sim_weight, sim_indices = sim_matrix.topk(k=k, dim=-1)
# [B, K]
sim_labels = torch.gather(feature_labels.expand(data.size(0), -1), dim=-1, index=sim_indices)
sim_weight = (sim_weight / temperature).exp()

# counts for each class
one_hot_label = torch.zeros(data.size(0) * k, c, device=sim_labels.device)
# [B*K, C]
one_hot_label = one_hot_label.scatter(dim=-1, index=sim_labels.view(-1, 1), value=1.0)
# weighted score ---> [B, C]
pred_scores = torch.sum(one_hot_label.view(data.size(0), -1, c) * sim_weight.unsqueeze(dim=-1), dim=1)

pred_labels = pred_scores.argsort(dim=-1, descending=True)
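To illustrate, a tiny worked example of the weighted kNN vote (B=1 query, K=3 neighbours, C=2 classes; the numbers are made up):

import torch

sim_weight = torch.tensor([[0.9, 0.3, 0.1]])  # exp(sim / temperature)-style weights
sim_labels = torch.tensor([[1, 0, 0]])        # classes of the 3 nearest neighbours

one_hot = torch.zeros(3, 2).scatter(dim=-1, index=sim_labels.view(-1, 1), value=1.0)
pred_scores = torch.sum(one_hot.view(1, 3, 2) * sim_weight.unsqueeze(dim=-1), dim=1)
print(pred_scores)                            # tensor([[0.4000, 0.9000]])

pred_labels = pred_scores.argsort(dim=-1, descending=True)
print(pred_labels[:, 0])                      # tensor([1]): the query is voted class 1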

Why no upscale back to original size in augmentations?

The SimCLR paper says:

In this work, we sequentially apply three simple augmentations: random
cropping followed by resize back to the original size, random color distortions, and random Gaussian blur

but it seems like the augmentations used in this repository do a random crop without afterwards resizing the crop back to the original size. Why the difference? Am I misunderstanding the SimCLR paper?

I guess the paper's description wouldn't quite make sense here anyway, since resizing each crop back to its own original size would cause the images in a batch to have different sizes.
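For reference, assuming the repo's pipeline uses torchvision's RandomResizedCrop (as is typical), that single transform performs both steps, cropping a random region and resizing it back to a fixed size, so every image in the batch ends up 32x32:

from PIL import Image
from torchvision import transforms

crop = transforms.RandomResizedCrop(32)  # random crop + resize back to 32x32 in one op
img = Image.new('RGB', (32, 32))
print(crop(img).size)  # (32, 32) regardless of the sampled crop region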

Training

Are you using all the training data in CIFAR for fine-tuning the model?

Question about linear finetune

Thanks so much for the code. However, I am a little bit confused about the linear finetune. In linear.py, you use the original CIFAR10 dataset to finetune the classifier, so it becomes supervised learning. To my understanding, the whole training process (including pre-training and finetuning) should be based on the assumption that there is little labeled data.
train_data = CIFAR10(root='data', train=True, transform=utils.train_transform, download=True)
In Appendix B.5, the original paper says it uses 10% of the labels.
I am still not sure how to fine-tune before evaluation. Should I only use a labeled subset of CIFAR10, or use the full CIFAR10 with only 10% of the data labeled?
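A hypothetical sketch of the first option, fine-tuning on a random 10% labeled subset of CIFAR10 (the split and names are illustrative):

import numpy as np
from torch.utils.data import Subset
from torchvision import transforms
from torchvision.datasets import CIFAR10

train_data = CIFAR10(root='data', train=True, transform=transforms.ToTensor(), download=True)

rng = np.random.default_rng(0)
idx = rng.choice(len(train_data), size=len(train_data) // 10, replace=False)
labeled_subset = Subset(train_data, idx.tolist())  # 5000 labeled images out of 50000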
