
Comments (20)

TengdaHan commented on July 16, 2024
  1. In our paper, the 52.3% linear-probe result for InfoNCE-rgb was trained for 800 epochs; when InfoNCE-rgb is trained for 500 epochs, I get a 46.8% linear-probe result. I have updated the NeurIPS final version and will (soon) update the arXiv version to correct this. You can also check this helpful issue: #3 (comment)
  2. RandomResizeCrop was also discussed in the issue above. The "consistent" option is still clip-wise: in the pre-training stage, I concatenate the two clips into one tensor, then apply "RandAug1" to the first half and "RandAug2" to the second half. In the fine-tuning stage, this consistent augmentation has no effect (see the sketch after this list).
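For illustration, here is a minimal sketch of clip-wise consistent augmentation (the function names and PIL-frame format are assumptions, not the repo's actual code):

import random
import torchvision.transforms.functional as TF

def consistent_random_crop(clip, size=128):
    # Sample the crop window ONCE, then apply it to every frame,
    # so the augmentation is "consistent" within the clip.
    w, h = clip[0].size  # clip: list of PIL images
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return [TF.crop(f, top, left, size, size) for f in clip]

def augment_two_clips(clip1, clip2):
    # Each clip gets its own independently sampled (but internally
    # consistent) augmentation, like "RandAug1"/"RandAug2" above.
    return consistent_random_crop(clip1), consistent_random_crop(clip2)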

TengdaHan commented on July 16, 2024

I do not use dropout in the pre-training stage. Self-supervised pre-training is expected to overfit the huge dataset (though it is usually limited by model capacity), so there is no need to constrain the network's capacity by dropping out nodes.
I use dropout in the downstream classification tasks to avoid quickly overfitting the UCF101 and HMDB51 training sets (which are much smaller). A sketch of this split follows below.
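As a sketch of the pattern described above (the head below is hypothetical, not the repo's exact module; 0.9 is the dropout value mentioned later in this thread):

import torch.nn as nn

class DownstreamHead(nn.Module):
    # Hypothetical classification head: heavy dropout before the final
    # FC layer slows overfitting on small datasets like UCF101/HMDB51.
    # The pre-training model itself contains no dropout.
    def __init__(self, feature_dim=1024, num_classes=101, dropout=0.9):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features):
        # features come from the (frozen or fine-tuned) backbone
        return self.fc(self.dropout(features))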

TengdaHan commented on July 16, 2024

Hi! Sorry for the late reply.

  1. Yes, the output you provided shows the model is initialized correctly.
    You can check here: https://github.com/TengdaHan/CoCLR/blob/main/utils/utils.py#L88
    The "weights not used from pretrained file" is actually None; the "weights not loaded into new model" are all related to the momentum queue. I choose to re-accumulate these variables for a better-quality queue (a loading sketch follows after this list).
  2. Alternation stage: my accuracy is always between 0 and 1 (before converting to a percentage). Do you mean your accuracy is less than 0.01? That would be strange; accuracy in the alternation stage should be similar to or better than in the InfoNCE stage.
  3. I don't get "AttributeError: 'str' object has no attribute 'decode'" with the same code.
  4. By the way, I just slightly updated the code, since I am also running more experiments with the same version. You can have a look.
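For reference, a sketch of loading pretrained weights while skipping the queue buffers so they are re-accumulated from scratch (the 'queue' key name and checkpoint layout are assumptions; the actual logic is in utils/utils.py linked above):

import torch

def load_pretrained_except_queue(model, ckpt_path):
    # Assumes the checkpoint stores weights under 'state_dict'.
    state = torch.load(ckpt_path, map_location='cpu')['state_dict']
    model_state = model.state_dict()
    # Keep only weights that exist in the new model and are not part
    # of the momentum queue; the queue is rebuilt during training.
    filtered = {k: v for k, v in state.items()
                if k in model_state and 'queue' not in k}
    msg = model.load_state_dict(filtered, strict=False)
    print('weights not loaded (re-accumulated):', msg.missing_keys)
    return model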

junmin98 commented on July 16, 2024

Thank you for your answer! I understand most of it (including the accuracy point: mine is also between 0 and 1).
But I have one more question.

  1. I ran it again with the code you pushed a few hours ago, but I still get "AttributeError: 'str' object has no attribute 'decode'". Do you use Python 2?

KT27-A commented on July 16, 2024

It seems OK to simply delete the 'decode' call. But now the model seems hard to converge: after 20 epochs it is at 0.09 top-1 accuracy. Have you run into this? Thanks.
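For what it's worth, this error typically appears when .decode() is called on a value that newer library versions already return as str (older versions returned bytes). A defensive workaround is to decode only when needed:

def to_str(x):
    # Avoid "'str' object has no attribute 'decode'": only decode bytes.
    return x.decode('utf-8') if isinstance(x, bytes) else x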

junmin98 commented on July 16, 2024

Thank you for your reply.
In your case, do you mean you got 0.09 top-1 accuracy after running 20 epochs?

KT27-A commented on July 16, 2024

Yeah. It has now run for 200 epochs and reached 0.69 top-1 accuracy during training. Does that make sense? I am a little confused about why it converges so slowly with Adam and lr 0.001.

TengdaHan commented on July 16, 2024

To answer the earlier question: I use Python 3.

TengdaHan commented on July 16, 2024

Here is the train/val curve from one of my experiments fine-tuning an InfoNCE-UCF101-RGB pre-trained model, with the learning rate reduced by 0.1x at epoch 300. At 20 epochs I get 40+% accuracy, but it's true that at 200 epochs I get only ~70% accuracy.
[figure: infonce-ft-ucf-128-example — train/val accuracy curves]
I think the reason for the slow convergence is the 0.9 dropout I use (to prevent fast overfitting).
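For reference, the schedule described here can be sketched as follows (model, epoch count, and the training step are assumed; Adam at lr 1e-3 matches the settings mentioned in this thread, and the weight decay value is an assumption):

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
# Decay the learning rate by 0.1x at epoch 300, as in the curve above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[300], gamma=0.1)

for epoch in range(500):
    train_one_epoch(model, optimizer)  # hypothetical training step
    scheduler.step()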

KT27-A commented on July 16, 2024

Got it. Thank you!

KT27-A commented on July 16, 2024

Hi Tengda, I finished the pre-training and fixed several minor bugs in eval/main_classifier.py. Now I get 0.452 linear-evaluation accuracy after 100 epochs, which is lower than the 0.523 reported in the paper. Am I misunderstanding some method or config? Besides, I found that you use non-consistent RandomResizeCrop in pre-training but consistent RandomResizeCrop in fine-tuning; could you please tell me your hypothesis for this setting? Thanks, and looking forward to your reply. The command I used for evaluation is:

CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d \
    --dataset ucf101 --ds 1 --batch_size 32 -j 0 --center_crop \
    --test log-eval-linclr/ucf101-128_sp1_lincls_s3d_Adam_bs32_lr0.001_dp0.9_wd0.001_seq1_len32_ds1_train-last_pt\=..-log-pretrain-infonce_k2048_ucf101-2clip-128_s3d_bs32_lr0.001_seq2_len32_ds1-model-model_best_epoch292.pth.tar/model/model_best_epoch95.pth.tar

KT27-A commented on July 16, 2024

Got it. Thanks for your prompt and clear answers.

KT27-A commented on July 16, 2024
Regarding the 800 epochs mentioned above: did you mean that you train for 300 epochs in the pre-training stage and 800 epochs in the fine-tuning stage?

KT27-A commented on July 16, 2024
Hi Tengda, I found another thing that confuses me: the training set of UCF101 split 1 has 9537 videos, but when I set bs=32, the total number of batches per epoch is 149, which is half of 9537//32=298. I haven't figured out why this happens.

TengdaHan commented on July 16, 2024
  1. The roadmap of our paper's Table 1 experiments is (pre-training epochs in parentheses):
InfoNCE-rgb(300) --------> CoCLR-Cycle x2 (100x2) --------> our CoCLR-rgb: 500 epochs in total, 70.2% linear probe.
InfoNCE-rgb(300) ---> continue training InfoNCE(200) ---> the InfoNCE-rgb baseline, for a fair comparison: 500 epochs in total, 46.8% linear probe.

The 800 epochs I mentioned above are also pre-training epochs. InfoNCE-rgb(300+200) is the fair comparison and should be in Table 1, but I unnecessarily put in an InfoNCE-rgb(800) result, which I have corrected in the NeurIPS final version.
I hope this is clear now. BTW, thanks for the feedback.

  2. My batch_size is the batch size per GPU:
    dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
    Are you using 2 GPUs? If yes, then 9537//(32*2)=149 iterations per epoch is correct (see the sketch below).
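A sketch of why this happens: with distributed training, each process (GPU) only sees its own shard of the dataset (names below are assumptions; `dataset` is assumed defined, and a DDP process group must be initialized):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# With 2 GPUs, each process gets 9537 / 2 samples, so iterations
# per epoch = 9537 // (32 * 2) = 149, not 9537 // 32 = 298.
train_sampler = DistributedSampler(dataset)
train_loader = DataLoader(dataset, batch_size=32, sampler=train_sampler,
                          shuffle=False, num_workers=8, pin_memory=True)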

KT27-A commented on July 16, 2024
Got it, thanks. I found that there is no dropout in main_nce.py or main_coclr.py, but there is dropout in eval/main_classifier.py (when fine-tuning the entire network). Does this mean you only use dropout when fine-tuning the entire network?

KT27-A commented on July 16, 2024

Got it. Thanks.

KT27-A commented on July 16, 2024

Hi Tengda, I found that A.RandomSizedCrop is used for validation and test, while people usually use an isotropic resize + center crop for inference. Could you please tell me why you chose this setting? I tested isotropic resize + center crop and there is not much performance difference. Thanks.

elif mode == 'val' or mode == 'test':
    transform = transforms.Compose([
        A.RandomSizedCrop(size=224, consistent=True, bottom_area=0.2),
        A.Scale(args.img_dim),
        A.ToTensor(),
    ])
    return transform

TengdaHan commented on July 16, 2024

Val is just for monitoring performance, so it doesn't really matter.
For test, I actually use an "isotropic" 10-crop ((4 corners + center) * 2 flips):

transform = transforms.Compose([

The line you pointed out is overwritten for the final inference.
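For illustration, a 10-crop test transform for a single image can be sketched with torchvision (the repo implements its own clip-wise version; the 128/112 sizes are assumptions):

import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(128),   # isotropic resize of the short side
    transforms.TenCrop(112),  # (4 corners + center) * 2 flips -> 10 crops
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),  # (10, C, H, W)
])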

KT27-A commented on July 16, 2024

Got it. Thanks.
