tengdahan / coclr
[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.
License: Apache License 2.0
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 --epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_ -j 4 --pretrain /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-rgb-128-s3d-ep399.pth.tar /mypath/CoCLR/pretrained_by_TH/InfoNCE-ucf101-f-128-s3d-ep396.pth.tar
Why are the weights of the pretrained model not used?
I ran the command above in the terminal, and training is now in progress (i.e. Cycle 1, FlowMining).
However, acc@1 and acc@5 never go above 1. Is this the expected value, or is something wrong?
++ Additional
If something is wrong, there is one thing I'm concerned about.
In lmdb_dataset.py, i.decode() raised an error:
AttributeError: 'str' object has no attribute 'decode'
To fix this, I changed the code as follows:
self.db_keys_flow = msgpack.loads(txn.get(b'keys'), raw=True)
self.db_order_flow = msgpack.loads(txn.get(b'order'), raw=True)
.
.
self.db_order_rgb = msgpack.unpackb(txn.get(b'order'),raw=True)
.
.
raw_rgb = msgpack.loads(txn.get(self.get_video_id_rgb[vname].encode('ascii')), raw=True)
raw_flow = msgpack.loads(txn.get(self.get_video_id_flow[vname].encode('ascii')), raw=True)
I added raw=True. Could this be causing the problem?
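Some context on the AttributeError: since msgpack 1.0, unpackb/loads default to raw=False, so msgpack string values come back as Python str instead of bytes, and calling .decode() on a str fails in exactly this way. Passing raw=True restores the older bytes behaviour. A version-agnostic alternative is to normalize the keys after unpacking; to_str_keys below is a hypothetical helper for illustration, not part of the CoCLR code, and the example key is only illustrative.

```python
from typing import Iterable, List, Union


def to_str_keys(keys: Iterable[Union[bytes, str]]) -> List[str]:
    """Return keys as str, whether msgpack unpacked them as bytes (raw=True) or str (raw=False).

    Hypothetical helper for illustration; not part of the CoCLR code base.
    """
    return [k.decode("utf-8") if isinstance(k, bytes) else k for k in keys]


# Same key as msgpack >= 1.0 returns it by default (str) and with raw=True (bytes).
print(to_str_keys(["v_ApplyEyeMakeup_g01_c01"]))
print(to_str_keys([b"v_ApplyEyeMakeup_g01_c01"]))
```

With such a helper, the dictionary built from self.db_order would no longer depend on which msgpack version is installed.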
Thanks for your code; it helps me a lot. I have some questions:
Hi Tengda,
Thanks for your code! Can you tell me more about the format of train_split01.csv?
When I try:
CUDA_VISIBLE_DEVICES=0,1,2 python main_classifier.py --net s3d --dataset ucf101 --seq_len 32 --ds 1 --batch_size 32 --train_what last --epochs 30 --schedule 60 80 --optim sgd --lr 1e-1 --wd 1e-3 --final_bn --pretrain CoCLR-ucf101-rgb-128-s3d-ep182.pth
Out:
usage: main_classifier.py [-h] [--net NET] [--model MODEL] [--dataset DATASET]
[--which_split WHICH_SPLIT] [--seq_len SEQ_LEN]
[--num_seq NUM_SEQ] [--num_fc NUM_FC] [--ds DS]
[--batch_size BATCH_SIZE] [--optim OPTIM] [--lr LR]
[--schedule [SCHEDULE [SCHEDULE ...]]] [--wd WD]
[--dropout DROPOUT] [--epochs EPOCHS]
[--start_epoch START_EPOCH] [--gpu GPU]
[--train_what TRAIN_WHAT] [--img_dim IMG_DIM]
[--print_freq PRINT_FREQ] [--eval_freq EVAL_FREQ]
[--reset_lr] [--prefix PREFIX] [-j WORKERS] [--cos]
[--resume RESUME] [--pretrain PRETRAIN]
[--test TEST] [--retrieval] [--dirname DIRNAME]
[--center_crop] [--five_crop] [--ten_crop]
main_classifier.py: error: unrecognized arguments: --final_bn
How should the --final_bn command-line option be used? After the error above, I removed --final_bn. The script then runs normally, but it prints:
Weights not loaded into new model:
final_bn.weight
final_bn.bias
final_bn.running_mean
final_bn.running_var
final_bn.num_batches_tracked
final_fc.0.weight
final_fc.0.bias
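A list like this is what one would expect if the loader reports every parameter of the new classifier that has no counterpart in the pretrained checkpoint: final_bn.* and final_fc.* appear to be the newly added classification head, which a self-supervised checkpoint would not contain. That this is how main_classifier.py builds its report is an assumption; the sketch below only shows the general key-matching idea with plain dicts (load_matching and all key names are hypothetical).

```python
from typing import Dict, List, Tuple


def load_matching(new_state: Dict[str, object],
                  pretrained: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Copy pretrained values whose keys exist in the new model; report the rest.

    Illustrative sketch: real code would operate on torch state_dicts and
    would typically also check tensor shapes before copying.
    """
    merged = dict(new_state)
    not_loaded = []
    for key in new_state:
        if key in pretrained:
            merged[key] = pretrained[key]
        else:
            not_loaded.append(key)  # e.g. a freshly initialized classifier head
    return merged, not_loaded


new_model = {"backbone.conv1.weight": "init", "final_bn.weight": "init", "final_fc.0.weight": "init"}
checkpoint = {"backbone.conv1.weight": "pretrained"}
merged, missing = load_matching(new_model, checkpoint)
print("Weights not loaded into new model:", missing)
```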
Thanks
Hi, thanks for the amazing work. I used the ucf101-infonce pre-trained model you released and tested retrieval with eval/main_classifier.py. Surprisingly, I get:
NN@1 | @5 | @10 | @20 | @50
36.11 | 52.31 | 61.72 | 71 | 82.05
This is about 3 percentage points higher than the NN@1 of 33.1% you reported. Can you verify the result of the pre-trained model? Has anyone else run into this? Is there a difference between the default config in the code and the one you used?
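For reference, NN@k in this kind of clip-retrieval evaluation is commonly computed by using each test clip as a query against the training-set features and counting a hit if any of its k nearest neighbours (by cosine similarity) shares the query's class. The sketch below assumes that protocol; it is not taken from eval/main_classifier.py, whose details (feature pooling, normalization, clip sampling) may differ.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nn_at_k(test_feats, test_labels, train_feats, train_labels,
            ks=(1, 5, 10, 20, 50)) -> Dict[int, float]:
    """Return {k: recall@k in percent}: a query is a hit if any of its top-k neighbours share its label."""
    hits = {k: 0 for k in ks}
    for feat, label in zip(test_feats, test_labels):
        sims = [cosine(feat, t) for t in train_feats]
        order = sorted(range(len(train_feats)), key=lambda i: sims[i], reverse=True)
        for k in ks:
            if label in (train_labels[i] for i in order[:k]):
                hits[k] += 1
    return {k: 100.0 * hits[k] / len(test_feats) for k in ks}


# Toy example: the nearest neighbour has the wrong class, the second one is correct.
print(nn_at_k([[0.9, 0.1]], [1], [[1.0, 0.0], [0.0, 1.0]], [0, 1], ks=(1, 2)))  # {1: 0.0, 2: 100.0}
```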
Hi authors,
If yes, can you show me some demo code for such a downstream task?
Hello,
I learned a lot from your paper, but when I fine-tuned the model and tested it with ten-crop, I got an error saying that the requested crop size (224, 224) is bigger than the input size (128, 128). Your paper states that the input is 128 x 128, so I would like to know why this error is raised.
Are the frames and optical flow always 128 x 128 throughout the work?
Is the UCF101 dataset you provided 128 x 128? The dataset I generated myself is only 10 GB. Could you elaborate on the dataset parameters, such as frame size and quality?
I have been stuck on this for a long time and would appreciate any help. Thank you very much.
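The error itself follows from how five-crop and ten-crop work: each crop is a fixed-size window taken from the four corners and the centre of the frame (ten-crop adds the same five crops of the horizontally flipped frame), so the crop size cannot exceed the frame size. With 128 x 128 frames a 224 crop is impossible, and the test crop size would need to be at most 128; which value the authors intended is not stated here, and the 112 below is only an example. five_crop_boxes is a hypothetical pure-Python sketch, not the torchvision implementation.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width)


def five_crop_boxes(img_h: int, img_w: int, crop_h: int, crop_w: int) -> List[Box]:
    """Corner and centre crop boxes; ten-crop repeats these on the horizontally flipped frame."""
    if crop_h > img_h or crop_w > img_w:
        raise ValueError(
            f"Requested crop size {(crop_h, crop_w)} is bigger than input size {(img_h, img_w)}"
        )
    top_c = int(round((img_h - crop_h) / 2.0))
    left_c = int(round((img_w - crop_w) / 2.0))
    return [
        (0, 0, crop_h, crop_w),                            # top-left
        (0, img_w - crop_w, crop_h, crop_w),               # top-right
        (img_h - crop_h, 0, crop_h, crop_w),               # bottom-left
        (img_h - crop_h, img_w - crop_w, crop_h, crop_w),  # bottom-right
        (top_c, left_c, crop_h, crop_w),                   # centre
    ]


print(five_crop_boxes(128, 128, 112, 112))
try:
    five_crop_boxes(128, 128, 224, 224)
except ValueError as err:
    print(err)
```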
Have you tried evaluating the InfoNCE model pretrained on Kinetics? My results show no obvious improvement.
Hi, I'm trying to replicate your results for the alternation stage, using the two initialization models you provided (both trained for roughly 400 epochs). I have two questions.
1) According to your paper, "At the alternation stage, on UCF101 the model is trained for two cycles, where each cycle includes 200 epochs, i.e. RGB and Flow networks are each trained for 100 epochs". Does that mean I need to run main_coclr.py four times, each time for 100 epochs, initialized from the two latest models produced by the previous run?
2) If so, what lr do you use in each of the four 100-epoch runs of the alternation stage? I also checked the CoCLR pretrained models you provided; at epoch 182 and epoch 109 the lr appears to be 1e-4. Does that mean I need to train the second cycle with a larger lr, e.g. 1e-2, and decay it to 1e-4?
Best Regards,
Yuqi
I have tested the CoCLR RGB model provided in the repo, and the results I get are actually better than the ones reported in the paper. However, when I try to reproduce the CoCLR RGB model by training from scratch, the improvement over the InfoNCE model is only around 10%. I did not manage to reproduce the jump in accuracy from 45% to 70% for the RGB model after one cycle (FlowMining).
By the way, when performing late fusion of RGB and Flow, the results again match the ones in the paper (due to their complementary nature).
Is there something I am missing while training CoCLR-RGB?
Hi, thanks for your great work, but I am wondering why you use color jitter to augment the inputs when testing (L463 in eval/main_classifier.py)?
model.pretrain does not contain a CoCLR implementation 😂
At L490 and L494, the variables 'vname' and 'prob_last' are referenced before assignment.
Hi,
I'm having trouble accessing the link for downloading the ucf101 flow lmdb data: http://thor.robots.ox.ac.uk/~vgg/data/CoCLR/ucf101_flow_lmdb.tar
It returns an 'ERR_CONNECTION_REFUSED' error.
When I try:
CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 --seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 --test CoCLR-ucf101-rgb-128-s3d-ep182.pth --ten_crop
Output:
...
TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
...
File "../dataset/lmdb_dataset.py", line 182, in __getitem__
frame_index = self.frame_sampler(vlen)
File "../dataset/lmdb_dataset.py", line 122, in frame_sampler
seq_idx = seq_idx.flatten(0)
TypeError: order must be str, not int
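This message matches NumPy rather than PyTorch: numpy.ndarray.flatten takes a memory-layout order string ('C', 'F', 'A' or 'K') as its first argument, whereas torch.Tensor.flatten(0) takes a start dimension. So seq_idx is presumably a NumPy array at this point, and flatten(0) is a PyTorch-style call. A minimal sketch, assuming seq_idx is a 2-D NumPy index array (the values below are made up), of two equivalent NumPy spellings:

```python
import numpy as np

# Hypothetical 2-D index array, e.g. 2 clips x 4 frame indices.
seq_idx = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])

# seq_idx.flatten(0) fails in recent NumPy versions (as in the traceback above),
# because the positional argument is `order`, not a dimension.
flat = seq_idx.flatten()          # row-major ('C') copy
flat_view = seq_idx.reshape(-1)   # same values, a view when possible

print(flat.tolist())
```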
Hi Tengda,
I am currently trying to replicate your CoCLR result as one of the baselines in our work, using the code you provide. However, I have run into some reproduction issues during training. I understand that the code is not ready yet, but it would be much appreciated if you could help us with replication. Thank you so much!
I found that the top-1 MoCo accuracy is quite low (only 4-5% on UCF101) with lr 1e-3, the Adam optimizer, 1e-5 weight decay, a MoCo queue size of 2048 and a batch size of 128. Could you provide a detailed training command for reference?
The augmentation is not really clip-wise consistent, since the value passed in is False. I wonder whether this is the final version. Could you provide the version of the augmentation you actually used?
The data loader code has not been released yet, so I don't know how the input is prepared before being passed to TwoCropTransform and OneCropTransform. Could you please share the data loader code so that we can replicate your results more faithfully?
Best Regards,
Hualin
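When judging MoCo accuracy numbers, it can help to compare them against chance. In a MoCo-style InfoNCE setup, each query is scored against one positive key plus K queued negatives, so chance-level top-k accuracy is k/(K+1). Assuming that accuracy definition (the repo's exact metric, for example with multiple positives in CoCLR, may differ), chance_topk below is a hypothetical helper that evaluates it for a queue of 2048:

```python
def chance_topk(queue_size: int, k: int = 1) -> float:
    """Chance-level top-k accuracy (in %) for 1 positive vs. `queue_size` negatives."""
    num_candidates = queue_size + 1
    return 100.0 * min(k, num_candidates) / num_candidates


print(f"top-1 chance: {chance_topk(2048, 1):.3f}%")  # 0.049%
print(f"top-5 chance: {chance_topk(2048, 5):.3f}%")  # 0.244%
```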
Can you give example commands for the linear probe and fine-tuning?
Hi Tengda,
Could you please show the file structure of 'train_split.csv' for K400 dataset? When I run lmdb_dataset.py, the csv file I downloaded from the DeepMind website is not compatible with the code.
Many Thanks
Hi Tengda,
Nice work, and thanks for sharing the code! I have a question about the results in Table 1 of the CoCLR paper. I notice that supervised training with RGB input on the S3D-G architecture on UCF101 yields 77.0% top-1 accuracy. I have run similar supervised training experiments on a 2d3d network (the MemDPC one) without initializing from any weights (such as ImageNet 2D weights), but I encounter serious overfitting and only reach 40+% top-1 accuracy on UCF101, so this result seems unusually high to me. Do you initialize from other weights, or train from scratch? If you train from scratch, did you encounter any overfitting with the S3D-G architecture?
By the way, overfitting of 3D ResNets on small video datasets is reported in this paper: https://openaccess.thecvf.com/content_cvpr_2018/html/Hara_Can_Spatiotemporal_3D_CVPR_2018_paper.html
Looking forward to your reply.
Best Regards,
Hualin
Hi tengda,
I want to reproduce the InfoNCE result on UCF101. However, when I ran the code from the "InfoNCE pretrain on UCF101-RGB" section (without changing anything), the program hung while creating the data loader.
Could you help me to figure out this problem? Thanks very much!
Below is the console output:
...
module.encoder_k.0.Mixed_5c.branch3.1.bn.bias False
module.encoder_k.2.weight False
module.encoder_k.2.bias False
module.encoder_k.4.weight False
module.encoder_k.4.bias False
=================================
TransformController: [<utils.augmentation.TwoClipTransform object at 0x7f8e866e6850>, <utils.augmentation.OneClipTransform object at 0x7f8e866e6990>] with weights: [0.5, 0.5]
Loading data for "train" mode
Loading LMDB from /data1/Deep Learning/Dataset/UCF101/ucf101_frame.lmdb, split:1
Frame Dataset from "/data1/Deep Learning/Code/CoCLR/dataset/../process_data/data/ucf101" has #class 101
filter out too short videos ...
Creating data loaders for "train" mode
I have now completed CoCLR training, i.e. up to and including Cycle 2.
Now I'm trying a downstream task.
For the linear probe:
CUDA_VISIBLE_DEVICES=0 main_classifier.py --pretrain {cycle2_rgb_pretrained.pth.tar}
This gave me a new pretrained model.
Then, to obtain the action recognition test result:
CUDA_VISIBLE_DEVICES=2,3 main_classifier.py --test {new_pretrained_epoch9.pth.tar} --ten_crop
but I got an error:
Traceback (most recent call last):
File "main_classifier.py", line 822, in <module>
main(args)
File "main_classifier.py", line 204, in main
test_10crop(test_dataset, model, ce_loss, transform_test_cuda, device, epoch, args)
File "main_classifier.py", line 482, in test_10crop
for idx, (input_seq, target) in tqdm(enumerate(data_loader), total=len(data_loader)):
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/tqdm/std.py", line 1158, in __iter__
for obj in iterable:
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1065, in _next_data
return self._process_data(data)
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
data.reraise()
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
PIL.UnidentifiedImageError: Caught UnidentifiedImageError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
data = fetcher.fetch(index)
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "../dataset/lmdb_dataset.py", line 182, in __getitem__
seq = [pil_from_raw_rgb(raw[i]) for i in frame_index]
File "../dataset/lmdb_dataset.py", line 182, in <listcomp>
seq = [pil_from_raw_rgb(raw[i]) for i in frame_index]
File "../dataset/lmdb_dataset.py", line 39, in pil_from_raw_rgb
return Image.open(BytesIO(raw)).convert('RGB')
File "/home/cvip-lab/anaconda3/envs/junmin/lib/python3.6/site-packages/PIL/Image.py", line 2944, in open
"cannot identify image file %r" % (filename if filename else fp)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6f3d880fc0>
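PIL raises UnidentifiedImageError when the bytes handed to Image.open do not match any image format it recognizes, for example an empty or truncated record, or data that is not an encoded image at all. One quick, library-free debugging check is to look at the first bytes of the stored frame: JPEG data begins with FF D8 FF and PNG data with the 8-byte PNG signature. sniff_image_format is a hypothetical helper for this check, not part of lmdb_dataset.py.

```python
def sniff_image_format(raw: bytes) -> str:
    """Guess the encoding of a raw frame from its leading 'magic' bytes."""
    if raw.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if not raw:
        return "empty"
    return "unknown"


print(sniff_image_format(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # jpeg
print(sniff_image_format(b""))                                  # empty
```

In the failing list comprehension, printing this result for the offending raw[i] before calling Image.open would show whether the record holds an image at all.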
In the updated code, a "self-acc" part has been newly added.
What is self-acc?
Hi, I am trying to download your train/val split csv file here: https://github.com/TengdaHan/CoCLR/tree/main/process_data/data/k400
But it returns 403 Forbidden. Did you perhaps put it in internal-only storage?
First of all, thank you so much for sharing source code.
With the newly updated code, I'm going to run the code on UCF101 and HMDB51.
I have extracted the dataset frames and converted them to .lmdb (I got ucf101_frame.lmdb and ucf101_frame.lmdb-order).
So I tried to evaluate with the pre-trained model you uploaded.
However, I got this error:
FileNotFoundError: [Errno 2] No such file or directory: 'my path/dataset/ucf101/video_source.json'
Can you tell me what 'video_source.json' is and how to get it?
Thank you!
Hi Tengda,
Thanks for the detailed instructions for this code. I am a newbie in this field, have a very simple question regarding Table 2, and am in desperate need of your help. Thanks very much in advance!
Question: From what I understand, self-supervised learning can be used to learn essential video representations. So I would guess that, with weights learned by self-supervised methods, training the S3D network on UCF-101 yields better results than training from random initialization. From Table 2, I suppose 90.6 is the former and 96.8 is the latter. Could you explain a bit why there is such a gap?
Line 282 (commit dec4738) calls selec_resnet. Is it defined in other files? Judging by the parameters it is called with, is it different from select_backbone above? If it is a spelling mistake, please correct it; otherwise, could you upload its code?
Thanks.
Hi, thanks for releasing the code. Could you list the package requirements (e.g. PyTorch and torchvision versions) that were used to obtain the results?
Also, could you give more information about how to set up the datasets, specifically Kinetics-400 for pretraining? Do we need to compute optical flow separately and then create lmdb files for both RGB and flow?
Hi, I'm new to the Kinetics-400 dataset. Could you provide a tutorial or instructions on how to generate the lmdb for Kinetics-400? I found some useful information in the non-local repository, but I'm not sure it's the proper way. Thanks!
Hi tengda,
When I called lmdb_dataset.py in main_classifier.py , I got the following error:
Frame Dataset from "/root/coclr/dataset/../process_data/data/ucf101" has #class 101
Traceback (most recent call last):
File "main_classifier.py", line 818, in <module>
main(args)
File "main_classifier.py", line 200, in main
test_retrieval(model, ce_loss, transform_test_cuda, device, epoch, args)
File "main_classifier.py", line 574, in test_retrieval
train_dataset = d_class(mode='train',
File "/root/coclr/eval/../dataset/lmdb_dataset.py", line 174, in __init__
super(UCF101LMDB, self).__init__(**kwargs)
File "/root/coclr/eval/../dataset/lmdb_dataset.py", line 96, in __init__
self.get_video_id = dict(zip([i.decode() for i in self.db_order],
File "/root/coclr/eval/../dataset/lmdb_dataset.py", line 96, in <listcomp>
self.get_video_id = dict(zip([i.decode() for i in self.db_order],
AttributeError: 'str' object has no attribute 'decode'
Thank you for your reply.
Hi tengda,
When I test lmdb_dataset.py using your provided lmdb, I got the following error:
Loading LMDB from */UCF101/ucf101_rgb_lmdb/ucf101_frame.lmdb, split:1
Frame Dataset from "*/Code/CoCLR/dataset/../process_data/data/ucf101" has #class 101
filter out too short videos ...
Traceback (most recent call last):
File "*/Code/CoCLR/dataset/lmdb_dataset.py", line 999, in <module>
x = dataset[0]
File "*/Code/CoCLR/dataset/lmdb_dataset.py", line 146, in __getitem__
raw = msgpack.loads(txn.get(self.get_video_id[vname].encode('ascii')))
File "msgpack/_unpacker.pyx", line 195, in msgpack._cmsgpack.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
The test script is:
if __name__ == "__main__":
    dataset = UCF101LMDB_2CLIP(db_path=os.path.join(lmdb_root, 'UCF101/ucf101_rgb_lmdb/ucf101_frame.lmdb'))
    x = dataset[0]
    print(x)
However, when I use my custom lmdb, no error occurs.
The msgpack version I used is 1.0.1.
Could you help me to figure it out? Thanks!
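The byte 0xff at position 0 is a strong hint: JPEG data begins with 0xFF 0xD8, and 0xFF can never appear in valid UTF-8. Since msgpack 1.0, unpackb/loads default to raw=False and decode msgpack str values as UTF-8, so frames that were packed as str (which older msgpack versions did for bytes unless use_bin_type=True) fail to decode, while raw=True returns them as bytes. That this is the cause for the provided LMDB is an inference, not something confirmed in this thread. The stdlib-only sketch below (try_utf8 is a hypothetical helper) reproduces the decoding failure on a JPEG header:

```python
jpeg_header = b"\xff\xd8\xff\xe0"  # first bytes of a typical JPEG file


def try_utf8(raw: bytes) -> str:
    """Mimic what raw=False does to msgpack str values: decode them as UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"UnicodeDecodeError: {err.reason} at position {err.start}"


print(try_utf8(jpeg_header))  # UnicodeDecodeError: invalid start byte at position 0
```

If this is indeed the cause, passing raw=True to msgpack.loads keeps the frames as bytes; a custom LMDB written with msgpack 1.0.1 would store them with the bin type, which would explain why it loads without error.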
Quite useful work! Do you plan to release your trained model? Thanks!
Hi Tengda, thanks for these detailed answers. I looked into all of them, but there seem to be no detailed instructions for using main_classifier.py to train from scratch. The problem is this: I trained on UCF101 RGB from scratch, and after 500 epochs the reported validation accuracy is 46.1%, while on the test set it is only 3.41% (center crop only, top-1). The exact commands are below:
-- Training
CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --train_what all --epoch 500 --batch_size 24 --lr 1e-3 --wd 1e-3 --dropout 0.9 --schedule [60, 80]
-- Testing
CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --test epochxxx.pth --ten_crop
Could you take a quick look and help me figure out which setting I got wrong? My computational resources are limited, but even so it is hard to understand why there is such a large gap between validation and test accuracy. My sincere appreciation.
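One thing worth checking in the training command is --schedule [60, 80]. The usage text above shows [--schedule [SCHEDULE [SCHEDULE ...]]], which matches how argparse renders an option declared with nargs='*'; such an option expects space-separated values, e.g. --schedule 60 80, while the shell splits [60, 80] into the two tokens "[60," and "80]". The sketch below assumes the option is declared with nargs='*' and type=int (the type is a guess); whether this is related to the accuracy gap is not established.

```python
import argparse

# Assumed declaration; the real one in main_classifier.py may differ (e.g. in type or default).
parser = argparse.ArgumentParser()
parser.add_argument("--schedule", nargs="*", type=int, default=[])

ok = parser.parse_args(["--schedule", "60", "80"])
print(ok.schedule)  # [60, 80]

# What the shell passes for `--schedule [60, 80]`: the tokens "[60," and "80]".
try:
    parser.parse_args(["--schedule", "[60,", "80]"])
except SystemExit:
    print("argparse rejected '[60,' as an int")
```

Note also that argparse accepts unambiguous prefixes by default, so --epoch 500 in the same command is read as --epochs 500.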
Hi, I find that there is no CoCLR implementation in model.pretrain.
Hi, Tengda,
Thanks for your excellent work and detailed instructions for open source codes. Here is my question about making lmdb datasets on Kinetics-400.
I noticed that you also built K400 lmdb datasets with your code, not only UCF101 and HMDB51. How much disk space do I need to extract all K400 RGB frames in JPEG format as you did? And how big is the resulting K400_rgb_lmdb dataset?
Looking forward to your reply. Thank you!
When I ran main_nce.py, I got the error below. k_label is indeed undefined in main_nce.py, yet the UberNCE model's forward() requires it.
File "main_nce.py", line 265, in main_worker
_, train_acc = train_one_epoch(train_loader, model, criterion, optimizer, lr_scheduler, transform_train_cuda, epoch, args)
File "main_nce.py", line 313, in train_one_epoch
output, target = model(input_seq)
File "/data/home/jiaxzhuang/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/data/home/jiaxzhuang/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/data/home/jiaxzhuang/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'k_label'
Hi Tengda, thanks for sharing your code. I found that CoCLR has no data preparation instructions. Could you please provide some details about preprocessing the raw data? I found similar instructions in DPC and MemDPC; are they applicable to CoCLR?
Thank you for letting me know about the issue I posted last time.
While studying your code, another question came up.
I want to run 'main_coclr.py'.
For this, I'm extracting the UCF101 frames and optical flow with the extraction code from your MemDPC GitHub repository.
Looking at the code, the flow data must be converted to .lmdb. Can I do this with 'convert_video_to_lmdb.py', just as for the frame data?
For example (in convert_video_to_lmdb.py under dataset/):
make_dataset_lmdb(dataset_path='mypath/UCF101/flow',
filename='mypath/UCF101/ucf101_tvl1.lmdb.lmdb')
I have only one GPU.
To train, I ran the following in the terminal:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 main_coclr.py
but I got an error:
subprocess.CalledProcessError: Command '['/home/junmin/anaconda3/envs/python36/bin/python', '-u', 'main_coclr.py', '--local_rank=0']' returned non-zero exit status 1.
Is there any way to train with a single GPU?
Hi Han,
Could you share the end-to-end fine-tuned UCF-101 RGB model that was pretrained on UCF-101 and reached 81.4% accuracy?
Thanks,
Hello, I'm trying to run main_coclr.py on the UCF101 dataset, but the split files (e.g. train_split1.csv) are not provided. Could you upload these files, or the script that generates them from the official txt split files? Thanks a lot.
Hello,
could you walk me through the two-stream fusion process?
Thanks
Hi, thank you for your work!
When I trained the models in model.pretrain.py, it said that I have to use DistributedDataParallel (DDP). Does this mean training has to run on multiple servers with multiple GPUs? My training failed on one machine with 2 GPUs (1080 Ti).
Thank you for your attention!
Hi Tengda,
Thanks for sharing the code! I have a question about your linear probe setting.
I used your released UCF101 pretrained RGB model to train an fc layer, running the command with your code's default settings:
python3 main_classifier.py --train_what last --pretrain /../CoCLR-ucf101-rgb-128-s3d-ep182.tar
This produced a new checkpoint, which I then loaded to evaluate the S3D backbone:
python3 main_classifier.py --test /../epoch9.pth.tar --ten_crop
Finally, I got Acc@1 67.9 on UCF-101 test split 1, which is lower than your reported result of 70.2.
Is there something wrong with my training and testing procedure? Could you please share your config and the steps you used to train the fc layer for the linear probe?
Thanks, looking forward to your reply.
How can I get the two-stream feature? Can the RGB pretrained model and the Flow model be used to extract two-stream features? What command should I run?
In your paper (Tables 1 and 2) you used RGB+Flow for evaluation on UCF101, but I couldn't find this two-stream evaluation code anywhere in this repo. Am I missing anything? I would be grateful if you could point me to the relevant locations.
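As a rough illustration of two-stream late fusion: run the RGB and Flow classifiers separately on the same test clip, convert each stream's logits to class probabilities, average them, and take the argmax. Equal-weight averaging of softmax scores is an assumption here; the paper's exact fusion rule and any weighting are not stated in this thread, and the logits below are made up.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits: Sequence[float], flow_logits: Sequence[float]) -> List[float]:
    """Equal-weight average of per-stream class probabilities (assumed fusion rule)."""
    return [(p + q) / 2.0 for p, q in zip(softmax(rgb_logits), softmax(flow_logits))]


# Hypothetical 3-class logits for one clip from each stream.
fused = late_fusion([2.0, 1.0, 0.1], [0.5, 2.5, 0.2])
pred = max(range(len(fused)), key=fused.__getitem__)
print(pred)  # 1: the Flow stream's confidence outweighs the RGB stream here
```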
Hi, I'm curious about the config you used to fully fine-tune the pre-trained model.
What bs/lr/wd do you use? Do you set the backbone lr to 1/10 of the fc-layer lr, or keep them the same?
Thanks!
Best Regards,
Yuqi
First of all, thank you so much for sharing source code.
With the newly updated code, I'm going to run the code for my own dataset.
Can you tell me how to generate the corresponding 'video_source.json' for a new dataset directory? Could you share the source code?
Thank you so much!