
simclr's Introduction

SimCLR

A PyTorch implementation of SimCLR based on the ICML 2020 paper A Simple Framework for Contrastive Learning of Visual Representations.

Network architecture (image from the paper).

Requirements

  • PyTorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  • thop
pip install thop

Dataset

The CIFAR10 dataset is used in this repo; PyTorch downloads it into the data directory automatically.
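For reference, a minimal sketch of the automatic download (the repo's actual loaders also pass a transform, as shown in the issues below):

from torchvision.datasets import CIFAR10

# the first call downloads CIFAR10 into ./data automatically
train_data = CIFAR10(root='data', train=True, download=True)
test_data = CIFAR10(root='data', train=False, download=True)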

Usage

Train SimCLR

python main.py --batch_size 1024 --epochs 1000 
optional arguments:
--feature_dim                 Feature dim for latent vector [default value is 128]
--temperature                 Temperature used in softmax [default value is 0.5]
--k                           Top k most similar images used to predict the label [default value is 200]
--batch_size                  Number of images in each mini-batch [default value is 512]
--epochs                      Number of sweeps over the dataset to train [default value is 500]

Linear Evaluation

python linear.py --batch_size 1024 --epochs 200 
optional arguments:
--model_path                  The pretrained model path [default value is 'results/128_0.5_200_512_500_model.pth']
--batch_size                  Number of images in each mini-batch [default value is 512]
--epochs                      Number of sweeps over the dataset to train [default value is 100]

Results

There are some differences between this implementation and the official one; the model (ResNet50) is trained on a single NVIDIA Tesla V100 (32 GB) GPU:

  1. No Gaussian blur is used;
  2. The Adam optimizer with learning rate 1e-3 replaces the LARS optimizer (see the sketch after this list);
  3. No linear learning rate scaling is used;
  4. No linear warmup or cosine LR schedule is used.
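As referenced in point 2, a minimal sketch of the optimizer setup; the weight_decay value is an assumption, since only Adam and lr=1e-3 are specified here:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 10)  # placeholder standing in for the SimCLR Model(feature_dim)
# Adam with a constant lr of 1e-3, replacing LARS; weight_decay=1e-6 is an assumption
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-6)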
Evaluation Protocol | Params (M) | FLOPs (G) | Feature Dim | Batch Size | Epoch Num | τ | K | Top1 Acc % | Top5 Acc % | Download
KNN | 24.62 | 1.31 | 128 | 512 | 500 | 0.5 | 200 | 89.1 | 99.6 | model (Baidu code: gc5k)
Linear | 23.52 | 1.30 | - | 512 | 100 | - | - | 92.0 | 99.8 | model (Baidu code: f7j2)

simclr's People

Contributors: leftthomas

simclr's Issues

results reported in your repo are fine-tuning and not linear evaluation

Hello. I've seen in your code that after pre-training, linear.py fine-tunes the whole network, while the paper performs linear evaluation rather than fine-tuning (it only trains the linear classifier layer with the ResNet-50 features frozen). In the paper, the result obtained with linear evaluation on CIFAR10 for 500 pre-training epochs and a batch size of 512 is around 93%. Your result is close (92%), but it comes from fine-tuning the whole network (not linear evaluation as in the paper), so it is logical to obtain a higher score than with linear evaluation. In fact, by fine-tuning, the results should outperform the supervised ResNet-50 baseline, which can reach 93.62%.

Therefore, could you tell me the score you got via linear evaluation, without fine-tuning? Thanks!

Low accuracy after linear evaluation

Hello,

I ran main.py followed by linear.py using the default arguments. At the end of linear.py, my top-1 test accuracy only reached 25%. Why do I not achieve 92% top-1 accuracy after the 100 epochs?

Thanks

Calculation of Loss in train() seems wrong?

In the train() function, while calculating the InfoNCE loss, sim_matrix has shape [2B, 2B-1]. However, shouldn't it be [2B, 2(B-1)]?

In the given code, the mask_select on line 29 also selects the augmented positive sample and treats it as a negative, so the loss calculation looks inaccurate to me. Correct me if my understanding is wrong. Thanks!
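To make the shapes concrete, here is a minimal sketch (not the repo's exact code) of masking only the diagonal of the similarity matrix; each row keeps 2B-1 entries because the positive stays in the denominator, as in Eq. (1) of the paper:

import torch
import torch.nn.functional as F

B, temperature = 4, 0.5
out = F.normalize(torch.randn(2 * B, 128), dim=-1)  # both augmented views, stacked

sim = torch.exp(torch.mm(out, out.t()) / temperature)  # [2B, 2B]
mask = ~torch.eye(2 * B, dtype=torch.bool)             # drop self-similarity only
sim_matrix = sim[mask].view(2 * B, 2 * B - 1)
print(sim_matrix.shape)  # torch.Size([8, 7]): 2B-1 columns, positive pair included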

Why do you modify 'conv1' and remove maxpooling from ResNet50?

Thanks very much for such a straightforward implementation of SimCLR.

Why do you change conv1's stride from 2 to 1 and kernel_size from 7 to 3, and remove the maxpooling from torchvision's resnet? Do those changes contribute to the performance?

SimCLR/model.py

Lines 12 to 16 in cee178b

for name, module in resnet50().named_children():
    if name == 'conv1':
        module = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    if not isinstance(module, nn.Linear) and not isinstance(module, nn.MaxPool2d):
        self.f.append(module)

The encoder of torchvision's resnet downsamples the feature maps 5 times (at 'conv1', 'maxpool', 'layer2', 'layer3', 'layer4'), so the output ends up at 1/32 the size of the input image. However, the encoder of your SimCLR downsamples the input only 3 times (1/8). Are those changes made simply because you're using CIFAR10 with 32-pixel images as the dataset? Wouldn't they be inappropriate for datasets with larger images, such as ImageNet?
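A minimal sketch (rebuilt independently from torchvision, mirroring the quoted changes, with the pooling/classifier heads dropped so the spatial size is visible) to verify the 1/8 downsampling:

import torch
import torch.nn as nn
from torchvision.models import resnet50

modules = []
for name, module in resnet50().named_children():
    if name == 'conv1':
        # stride-1 3x3 stem instead of the stock stride-2 7x7
        module = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    if name in ('maxpool', 'avgpool', 'fc'):
        continue
    modules.append(module)
encoder = nn.Sequential(*modules)

x = torch.randn(1, 3, 32, 32)
print(encoder(x).shape)  # torch.Size([1, 2048, 4, 4]): 32 -> 4, i.e. 1/8 downsampling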

Planning to run SimCLR on my dataset

Hi,
Thanks for your nice work.
I am planning to run SimCLR on my dataset. I wonder if I need to adjust the network structure or add some tricks. I'd appreciate any advice.

Pretrain model

Hi, can you put your pretrained model on a different drive service?
I can't register an account to download your pretrained model.

Batch size doesn't affect results as expected

In my experiments, evaluation accuracy gets worse with larger batch sizes. For example, at the default settings, batch size 256 gets 79.23% yet 512 gets 78.38%, and the same holds for ResNet18 and ResNet50. I think this contradicts the SimCLR paper's results.
Does anyone have the same issue?

Batch size 1024 in V100?

Hi,

Thanks for the great codebase.

I've tried training with the proposed batch_size of 1024 on a V100, but it runs out of memory. Am I doing something wrong, or is the practical limit the default 512 (instead of the 1024 proposed in the README)? For me, the memory usage at batch size 512 is 29.9 GB, close to the 32 GB limit.

Thanks!

Pretrained model load error

Hi leftthomas,
When I try to load your trained model 128_0.5_200_512_500_model.pth from the Baidu drive, I get the following error.
code:
torch.load("pretrained/128_0.5_200_512_500_model.pth")
error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-2b50ac73dd6d> in <module>
----> 1 torch.load("pretrained/128_0.5_200_512_500_model.pth")

~/py37/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    527             with _open_zipfile_reader(f) as opened_zipfile:
    528                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 529         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    530 
    531 

~/py37/lib/python3.7/site-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    707     for key in deserialized_storage_keys:
    708         assert key in deserialized_objects
--> 709         deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
    710         if offset is not None:
    711             offset = f.tell()

RuntimeError: storage has wrong size: expected 4262493158830214735 got 512

torch.__version__ = '1.4.0'
Python version = 3.7.0
Can you help me solve this?
Thanks
Tao

Pretrained Model Links

Thank you for this amazing work.
I am struggling to download the pretrained models from Baidu disk. I tried various things such as the baidu-dl extension, pandownloader, etc., none of which works. Is it possible for you to upload them to a different service like Google Drive or Dropbox? Thanks in advance.

How to carry out semi-supervised training?

Firstly, thanks for your code!

As for my question, my idea is:
  • dataset_train is modified to a large number of unlabeled training-set images
  • memory_data is changed to a small number of labeled training-set images
  • test_data is changed to the validation-set images

Are these modifications correct? Or does dataset_train also need to include the small number of labeled training-set images?

I look forward to your prompt reply!

Data augmentation in linear evaluation

Hello. Nice work. Regarding the linear evaluation after pre-training (linear.py), I've noticed that you use data augmentation just like in pre-training, as you do:

train_data = CIFAR10(root='data', train=True, transform=utils.train_transform, download=True)

However, the paper mentions that they do not apply any data augmentation during linear evaluation. Have you experimented without data augmentation in the linear evaluation training?
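For comparison, a minimal sketch of an augmentation-free transform for linear evaluation (assuming the CIFAR10 normalization stats match those in utils.py):

from torchvision import transforms

# plain ToTensor + Normalize, no random crop/flip/color jitter
eval_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]),
])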

DataParallel to avoid memory issue

Change the model creation from line 114 in main.py to the following:

    # model setup and optimizer config
    model = Model(feature_dim)
    flops, params = profile(model, inputs=(torch.randn(1, 3, 32, 32),))
    flops, params = clever_format([flops, params])
    print('# Model Params: {} FLOPs: {}'.format(params, flops))
    model = torch.nn.DataParallel(model).cuda()

Otherwise, you will face a CUDA out-of-memory issue even if you have enough GPU cards to support a higher batch size.

question about learning rate

hi @leftthomas

I have 2 questions below:

1- May I know the number of pre-training epochs that you used? Is it 1k? I understand that the number of linear evaluation epochs is 100, but what about the number of pre-training epochs?

2- I've seen that you use a constant learning rate of 1e-3 without any decay. I really appreciate that you want to keep everything simple and avoid the warmup, LARS, etc., but shouldn't you at least decay the lr at some point? Have you tried this?
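For what it's worth, a hypothetical sketch of adding cosine decay on top of the constant-lr Adam setup (the model and T_max here are illustrative):

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Linear(10, 10)  # placeholder for the SimCLR model
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=500)  # decay over the 500 training epochs

for epoch in range(500):
    # ... train for one epoch ...
    scheduler.step()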

Problem in dictionary keys

Did anyone face problems with key mismatches between the pretrained model and the encoder model when using strict=True?

loss function problem

Is there no need for l2 normalization in the loss function? This is not the same as the implementation of the torch.nn.functional.cosine_similarity function on the PyTorch official website.
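For context, a minimal sketch showing why a plain dot product of l2-normalized features already equals cosine similarity, which is presumably why no extra normalization step appears in the loss:

import torch
import torch.nn.functional as F

a, b = torch.randn(4, 128), torch.randn(4, 128)
a_n, b_n = F.normalize(a, dim=-1), F.normalize(b, dim=-1)

dot = (a_n * b_n).sum(dim=-1)               # dot product of unit vectors
cos = F.cosine_similarity(a, b, dim=-1)     # torch's built-in
print(torch.allclose(dot, cos, atol=1e-6))  # True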

Benchmarking on CIFAR-100

I adapted this repo to work on CIFAR-100. I noticed that the accuracy computed using KNN varies greatly between subsequent epochs, from around 40% down to 1%. I believe that the learning rate of 1e-3 is quite reasonable. Did you experiment with this dataset? Any ideas on what might be different?
Thanks!

Contrastive loss value

Hi,
thanks for your work. I was wondering about the convergence value of the contrastive loss; I see it around 3~4 while training the model, but I don't know how far it will go down. Any idea about this? Also, are you able to share graphs or logs of the loss function for your trained model? That would be helpful in understanding the model's behavior.
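One way to put the 3~4 range in context: if all pairwise similarities were equal, the NT-Xent loss would reduce to -log(1/(2B-1)) = log(2B-1), a rough chance-level baseline:

import math

B = 512  # the default batch size
print(math.log(2 * B - 1))  # ~6.93; a loss of 3~4 is well below chance level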

I don't know how part of the code works

Hello, I don't quite understand the following code. Could you tell me what its specific process looks like? Thank you!
I don't know how pred_labels is obtained:
# [B, N]: similarity between each query feature and the whole feature bank
sim_matrix = torch.mm(feature, feature_bank)
# [B, K]
sim_weight, sim_indices = sim_matrix.topk(k=k, dim=-1)
# [B, K]
sim_labels = torch.gather(feature_labels.expand(data.size(0), -1), dim=-1, index=sim_indices)
sim_weight = (sim_weight / temperature).exp()

# counts for each class
one_hot_label = torch.zeros(data.size(0) * k, c, device=sim_labels.device)
# [B*K, C]
one_hot_label = one_hot_label.scatter(dim=-1, index=sim_labels.view(-1, 1), value=1.0)
# weighted score ---> [B, C]
pred_scores = torch.sum(one_hot_label.view(data.size(0), -1, c) * sim_weight.unsqueeze(dim=-1), dim=1)

pred_labels = pred_scores.argsort(dim=-1, descending=True)
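To illustrate, a tiny worked example of the weighted kNN vote (B=1 query, K=3 neighbours, C=2 classes; the numbers are made up):

import torch

sim_weight = torch.tensor([[0.9, 0.3, 0.1]])  # exp(sim / temperature)-style weights
sim_labels = torch.tensor([[1, 0, 0]])        # classes of the 3 nearest neighbours

one_hot = torch.zeros(3, 2).scatter(dim=-1, index=sim_labels.view(-1, 1), value=1.0)
pred_scores = torch.sum(one_hot.view(1, 3, 2) * sim_weight.unsqueeze(dim=-1), dim=1)
print(pred_scores)                            # tensor([[0.4000, 0.9000]])

pred_labels = pred_scores.argsort(dim=-1, descending=True)
print(pred_labels[:, 0])                      # tensor([1]): the query is voted class 1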

Why no upscale back to original size in augmentations?

The SimCLR paper says:

In this work, we sequentially apply three simple augmentations: random
cropping followed by resize back to the original size, random color distortions, and random Gaussian blur

but it seems like the augmentations used in this repository do a random crop without afterwards resizing the crop back to the original size. Why the difference? Am I misunderstanding the SimCLR paper?

I guess the paper's description wouldn't quite make sense here anyway, since resizing each crop back to its own original size would cause the images in a batch to have different sizes.
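For reference, assuming the repo's pipeline uses torchvision's RandomResizedCrop (as is typical), that single transform performs both steps, cropping a random region and resizing it back to a fixed size, so every image in the batch ends up 32x32:

from PIL import Image
from torchvision import transforms

crop = transforms.RandomResizedCrop(32)  # random crop + resize back to 32x32 in one op
img = Image.new('RGB', (32, 32))
print(crop(img).size)  # (32, 32) regardless of the sampled crop region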

Training

Are you using all the training data in CIFAR for fine-tuning the model?

Question about linear finetune

Thanks so much for the code. However, I am a little bit confused about the linear finetune. In linear.py, you use the original CIFAR10 dataset to finetune the classifier, so it becomes supervised learning. To my understanding, the whole training process (including pre-training and finetuning) should be based on the assumption that there is little labeled data.
train_data = CIFAR10(root='data', train=True, transform=utils.train_transform, download=True)
In Appendix B.5, the original paper says it uses 10% of the labels.
I am still not sure how to fine-tune before evaluation. Should I only use a labeled subset of CIFAR10, or use the full CIFAR10 with only 10% of the data labeled?
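A hypothetical sketch of the first option, fine-tuning on a random 10% labeled subset of CIFAR10 (the split and names are illustrative):

import numpy as np
from torch.utils.data import Subset
from torchvision import transforms
from torchvision.datasets import CIFAR10

train_data = CIFAR10(root='data', train=True, transform=transforms.ToTensor(), download=True)

rng = np.random.default_rng(0)
idx = rng.choice(len(train_data), size=len(train_data) // 10, replace=False)
labeled_subset = Subset(train_data, idx.tolist())  # 5000 labeled images out of 50000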
