
Comments (20)

TengdaHan commented on July 16, 2024
  1. In our paper, the 52.3% linear-probe result for InfoNCE-rgb was trained for 800 epochs; when InfoNCE-rgb is trained for 500 epochs, I get a 46.8% linear-probe result. I have updated the NeurIPS final version and will (soon) update the arXiv version to correct this. You can also check this helpful issue: #3 (comment)
  2. RandomResizeCrop was also discussed in the issue above. The "consistent" option is still clip-wise: in the pre-training stage, I concatenate the two clips into one tensor, then apply "RandAug1" to the first half and "RandAug2" to the second half. In the fine-tuning stage, this consistent augmentation has no effect (see the sketch after this list).
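For illustration, here is a minimal sketch of clip-wise consistent augmentation (the function names and PIL-frame format are assumptions, not the repo's actual code):

import random
import torchvision.transforms.functional as TF

def consistent_random_crop(clip, size=128):
    # Sample the crop window ONCE, then apply it to every frame,
    # so the augmentation is "consistent" within the clip.
    w, h = clip[0].size  # clip: list of PIL images
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return [TF.crop(f, top, left, size, size) for f in clip]

def augment_two_clips(clip1, clip2):
    # Each clip gets its own independently sampled (but internally
    # consistent) augmentation, like "RandAug1"/"RandAug2" above.
    return consistent_random_crop(clip1), consistent_random_crop(clip2)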

TengdaHan commented on July 16, 2024

I do not use dropout in the pre-training stage. Self-supervised pre-training is expected to overfit the huge dataset (though it is usually limited by model capacity), so there is no need to constrain the network's capacity by dropping out nodes.
I use dropout in the downstream classification tasks to avoid quickly overfitting the UCF101 and HMDB51 training sets (which are much smaller). A sketch of this split follows below.
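As a sketch of the pattern described above (the head below is hypothetical, not the repo's exact module; 0.9 is the dropout value mentioned later in this thread):

import torch.nn as nn

class DownstreamHead(nn.Module):
    # Hypothetical classification head: heavy dropout before the final
    # FC layer slows overfitting on small datasets like UCF101/HMDB51.
    # The pre-training model itself contains no dropout.
    def __init__(self, feature_dim=1024, num_classes=101, dropout=0.9):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, features):
        # features come from the (frozen or fine-tuned) backbone
        return self.fc(self.dropout(features))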

TengdaHan commented on July 16, 2024

Hi! Sorry for the late reply.

  1. Yes, the output you provided shows the model is initialized correctly.
    You can check here: https://github.com/TengdaHan/CoCLR/blob/main/utils/utils.py#L88
    The "weights not used from pretrained file" is actually None; the "weights not loaded into new model" are all related to the momentum queue. I choose to re-accumulate these variables for a better-quality queue (a loading sketch follows after this list).
  2. Alternation stage: my accuracy is always between 0 and 1 (before converting to a percentage). Do you mean your accuracy is less than 0.01? That would be strange; accuracy in the alternation stage should be similar to or better than in the InfoNCE stage.
  3. I don't get "AttributeError: 'str' object has no attribute 'decode'" with the same code.
  4. By the way, I just slightly updated the code, since I am also running more experiments with the same version. You can have a look.
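For reference, a sketch of loading pretrained weights while skipping the queue buffers so they are re-accumulated from scratch (the 'queue' key name and checkpoint layout are assumptions; the actual logic is in utils/utils.py linked above):

import torch

def load_pretrained_except_queue(model, ckpt_path):
    # Assumes the checkpoint stores weights under 'state_dict'.
    state = torch.load(ckpt_path, map_location='cpu')['state_dict']
    model_state = model.state_dict()
    # Keep only weights that exist in the new model and are not part
    # of the momentum queue; the queue is rebuilt during training.
    filtered = {k: v for k, v in state.items()
                if k in model_state and 'queue' not in k}
    msg = model.load_state_dict(filtered, strict=False)
    print('weights not loaded (re-accumulated):', msg.missing_keys)
    return model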

junmin98 commented on July 16, 2024

Thank you for your answer! I understand most of it (including the accuracy point: mine is also between 0 and 1).
But I have one more question.

  1. I ran it again with the code you pushed a few hours ago, but I still get "AttributeError: 'str' object has no attribute 'decode'". Do you use Python 2?

KT27-A commented on July 16, 2024

It seems OK to simply delete the 'decode' call. But now the model seems hard to converge: after 20 epochs it is at 0.09 top-1 accuracy. Have you run into this? Thanks.
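For what it's worth, this error typically appears when .decode() is called on a value that newer library versions already return as str (older versions returned bytes). A defensive workaround is to decode only when needed:

def to_str(x):
    # Avoid "'str' object has no attribute 'decode'": only decode bytes.
    return x.decode('utf-8') if isinstance(x, bytes) else x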

junmin98 commented on July 16, 2024

Thank you for your reply.
In your case, do you mean you got 0.09 top-1 accuracy after running 20 epochs?

KT27-A commented on July 16, 2024

Yeah. It has now run for 200 epochs and reached 0.69 top-1 accuracy during training. Does that make sense? I am a little confused about why it converges so slowly with Adam and lr 0.001.

TengdaHan commented on July 16, 2024

To answer the earlier question: I use Python 3.

TengdaHan commented on July 16, 2024

Here is the train/val curve from one of my experiments fine-tuning an InfoNCE-UCF101-RGB pre-trained model, with the learning rate reduced by 0.1x at epoch 300. At 20 epochs I get 40+% accuracy, but it's true that at 200 epochs I get only ~70% accuracy.
[figure: infonce-ft-ucf-128-example — train/val accuracy curves]
I think the reason for the slow convergence is the 0.9 dropout I use (to prevent fast overfitting).
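For reference, the schedule described here can be sketched as follows (model, epoch count, and the training step are assumed; Adam at lr 1e-3 matches the settings mentioned in this thread, and the weight decay value is an assumption):

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)
# Decay the learning rate by 0.1x at epoch 300, as in the curve above.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[300], gamma=0.1)

for epoch in range(500):
    train_one_epoch(model, optimizer)  # hypothetical training step
    scheduler.step()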

KT27-A commented on July 16, 2024

Got it. Thank you!

KT27-A commented on July 16, 2024

Hi Tengda, I finished the pre-training and fixed several minor bugs in eval/main_classifier.py. Now I get 0.452 linear-evaluation accuracy after 100 epochs, which is lower than the 0.523 reported in the paper. Am I misunderstanding some method or config? Besides, I found that you use non-consistent RandomResizeCrop in pre-training but consistent RandomResizeCrop in fine-tuning; could you please tell me your hypothesis for this setting? Thanks, and looking forward to your reply. The command I used for evaluation is:

CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d \
    --dataset ucf101 --ds 1 --batch_size 32 -j 0 --center_crop \
    --test log-eval-linclr/ucf101-128_sp1_lincls_s3d_Adam_bs32_lr0.001_dp0.9_wd0.001_seq1_len32_ds1_train-last_pt\=..-log-pretrain-infonce_k2048_ucf101-2clip-128_s3d_bs32_lr0.001_seq2_len32_ds1-model-model_best_epoch292.pth.tar/model/model_best_epoch95.pth.tar

KT27-A commented on July 16, 2024

Got it. Thanks for your prompt and clear answers.

KT27-A commented on July 16, 2024
Regarding the 800 epochs mentioned above: did you mean that you train for 300 epochs in the pre-training stage and 800 epochs in the fine-tuning stage?

KT27-A commented on July 16, 2024
Hi Tengda, I found another thing that confuses me: the training set of UCF101 split 1 has 9537 videos, but when I set bs=32, the total number of batches per epoch is 149, which is half of 9537//32=298. I haven't figured out why this happens.

TengdaHan commented on July 16, 2024
  1. The roadmap of our paper's Table 1 experiments is (pre-training epochs in parentheses):
InfoNCE-rgb(300) --------> CoCLR-Cycle x2 (100x2) --------> our CoCLR-rgb: 500 epochs in total, 70.2% linear probe.
InfoNCE-rgb(300) ---> continue training InfoNCE(200) ---> the InfoNCE-rgb baseline, for a fair comparison: 500 epochs in total, 46.8% linear probe.

The 800 epochs I mentioned above are also pre-training epochs. InfoNCE-rgb(300+200) is the fair comparison and should be in Table 1, but I unnecessarily put in an InfoNCE-rgb(800) result, which I have corrected in the NeurIPS final version.
I hope this is clear now. BTW, thanks for the feedback.

  2. My batch_size is the batch size per GPU:
    dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
    Are you using 2 GPUs? If yes, then 9537//(32*2)=149 iterations per epoch is correct (see the sketch below).
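A sketch of why this happens: with distributed training, each process (GPU) only sees its own shard of the dataset (names below are assumptions; `dataset` is assumed defined, and a DDP process group must be initialized):

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# With 2 GPUs, each process gets 9537 / 2 samples, so iterations
# per epoch = 9537 // (32 * 2) = 149, not 9537 // 32 = 298.
train_sampler = DistributedSampler(dataset)
train_loader = DataLoader(dataset, batch_size=32, sampler=train_sampler,
                          shuffle=False, num_workers=8, pin_memory=True)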

KT27-A commented on July 16, 2024
Got it, thanks. I found that there is no dropout in main_nce.py or main_coclr.py, but there is dropout in eval/main_classifier.py (when fine-tuning the entire network). Does this mean you only use dropout when fine-tuning the entire network?

KT27-A commented on July 16, 2024

Got it. Thanks.

KT27-A commented on July 16, 2024

Hi Tengda, I found that A.RandomSizedCrop is used for validation and test, while people usually use an isotropic resize + center crop for inference. Could you please tell me why you chose this setting? I tested isotropic resize + center crop and there is not much performance difference. Thanks.

elif mode == 'val' or mode == 'test':
    transform = transforms.Compose([
        A.RandomSizedCrop(size=224, consistent=True, bottom_area=0.2),
        A.Scale(args.img_dim),
        A.ToTensor(),
    ])
    return transform

TengdaHan commented on July 16, 2024

Val is just for monitoring performance, so it doesn't really matter.
For test, I actually use an "isotropic" 10-crop ((4 corners + center) * 2 flips):

transform = transforms.Compose([

The line you pointed out is overwritten for the final inference.
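For illustration, a 10-crop test transform for a single image can be sketched with torchvision (the repo implements its own clip-wise version; the 128/112 sizes are assumptions):

import torch
from torchvision import transforms

ten_crop = transforms.Compose([
    transforms.Resize(128),   # isotropic resize of the short side
    transforms.TenCrop(112),  # (4 corners + center) * 2 flips -> 10 crops
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),  # (10, C, H, W)
])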

KT27-A commented on July 16, 2024

Got it. Thanks.
