Saccade-VSSL

Overview

This repository implements 'Self-supervised Video Representation Learning via Capturing Semantic Changes Indicated by Saccades (TCSVT2023)'.

Qiuxia Lai, Ailing Zeng, Ye Wang, Lihong Cao, Yu Li, Qiang Xu.

Requirements

  • python 3.7.12
  • torch 1.13.0+cu116
  • torchvision 0.14.0+cu116
  • cudatoolkit 11.6.0
  • tensorboard
  • tensorboardX
  • accimage (official github)
  • faiss-gpu (official github)
  • Pillow
  • opencv
  • scipy
  • tqdm
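
A quick way to confirm that the packages above are importable is sketched below. It is only an illustration and not part of the repository: the helper name `find_missing` is hypothetical, and the list uses the usual import names (Pillow as `PIL`, opencv as `cv2`, faiss-gpu as `faiss`). It checks only whether each module can be located, not the versions.

```python
import importlib.util

# Import names for the packages listed above (e.g. Pillow imports as PIL,
# opencv as cv2, faiss-gpu as faiss).
REQUIRED_MODULES = [
    "torch", "torchvision", "tensorboard", "tensorboardX", "accimage",
    "faiss", "PIL", "cv2", "scipy", "tqdm",
]


def find_missing(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```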

Weight Download

Our models are pre-trained on UCF101 split 1 with RGB data only. Linear probe and fine-tuning are performed on UCF101 and HMDB51.

  • Pre-trained models can be downloaded from this link.
  • Linear probe models can be downloaded from this link.
  • Fine-tuned models can be downloaded from this link.

Dataset Preparation

The datasets are arranged as follows, where <base_path> is defined in config.py.

Note that all the split folders are available at data/<dataset_name>/.

<base_path>/DataSets/
    |---UCF101/
    |   |---jpegs_256/
    |   |   |---<video1>/
    |   |   |   |---XXX.jpg
    |   |   |   |--- ...
    |   |   |---<video2>/
    |   |   |--- ...
    |   |---split/
    |   |   |---ClassInd.txt
    |   |   |---trainlist01.txt
    |   |   |---testlist01.txt
    |   |   |--- ...
    |   |---scan_idx_7/
    |   |   |---v_ApplyEyeMakeup_g01_c01.txt
    |   |   |---v_ApplyEyeMakeup_g01_c02.txt
    |   |   |---v_ApplyEyeMakeup_g01_c03.txt
    |   |   |--- ...
    |
    |---HMDB51/
    |   |---jpegs_256/
    |   |   |---<video1>/
    |   |   |   |---XXX.jpg
    |   |   |   |--- ...
    |   |   |---<video2>/
    |   |   |--- ...
    |   |---split/
    |   |   |---ClassInd.txt
    |   |   |---trainlist01.txt
    |   |   |---testlist01.txt
    |   |   |--- ...
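
Before launching training, it may help to check that the top-level folders from the tree above are in place. The following is a minimal sketch, not code from the repository: `missing_dirs` and `EXPECTED` are hypothetical names, and `/path/to/base` is a placeholder for the <base_path> set in config.py. It only checks that the folders exist, not their contents.

```python
from pathlib import Path

# Sub-folders each dataset is expected to contain, per the tree above.
EXPECTED = {
    "UCF101": ["jpegs_256", "split", "scan_idx_7"],
    "HMDB51": ["jpegs_256", "split"],
}


def missing_dirs(base_path):
    """Return the expected dataset folders that do not exist under base_path."""
    root = Path(base_path) / "DataSets"
    return [
        str(Path(dataset) / sub)
        for dataset, subs in EXPECTED.items()
        for sub in subs
        if not (root / dataset / sub).is_dir()
    ]


if __name__ == "__main__":
    # "/path/to/base" is a placeholder; use the <base_path> set in config.py.
    for entry in missing_dirs("/path/to/base"):
        print("missing:", entry)
```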
            

UCF101

UCF101 with scanpath data can be downloaded from this BaiduDisk link. Our files are obtained from this repo. Official files can be downloaded from this link.

  • Download all three parts of the zip file and unzip them together to get the folder jpegs_256. Put it into <base_path>/DataSets/UCF101.
  • Download the scan_idx_7.zip file, unzip it, and put it into <base_path>/DataSets/UCF101.

The scanpath data is generated using G-Eymol (TPAMI 2019).

HMDB51

HMDB51 can be downloaded from this BaiduDisk link. Our files are obtained from this repo. Official files can be downloaded from this link.

  • Download the zip file and unzip it to get the folder jpegs_256. Put it into <base_path>/DataSets/HMDB51.

Pre-training

# R3D
python train_ssl_full.py --fs_warmup_epoch 241 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 16 --save_freq 60 --lr_ratio 0.001 --epochs 300 --learning_rate 0.1 --lr_decay_epochs 90 180 240 --gpu-id 2 \
      --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12

# R21D
python train_ssl_full.py --fs_warmup_epoch 301 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 8 --save_freq 60 --lr_ratio 0.001 --epochs 360 --learning_rate 0.1 --lr_decay_epochs 90 180 240 --gpu-id 3 \
      --model r21d --pro_p 4 --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12      

Video retrieval

# R3D
python train_ssl_full.py --fs_warmup_epoch 241 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 16 --save_freq 60 --lr_ratio 0.001 --epochs 300 --learning_rate 0.0008 --lr_decay_epochs 90 180 240 --gpu-id 1 \
      --evaluate --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12 --model_name <model_name> --resume <your_best_ckpt>.pth
# R21D
python train_ssl_full.py --fs_warmup_epoch 301 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 8 --save_freq 60 --lr_ratio 0.001 --epochs 360 --learning_rate 0.0008 --lr_decay_epochs 90 180 240 --gpu-id 0 \
      --evaluate --model r21d --pro_p 4 --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12 --model_name <model_name> --resume <your_best_ckpt>.pth

Downstream Tasks

Linear probe

# ucf101
python downstream.py --finetune False --dropout 0.7 --seed 42 --model_postfix _dp_0_7_s42 --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
      --focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 0
# hmdb51
python downstream.py --dataset hmdb51 --finetune False --dropout 0.7 --seed 42 --model_postfix _dp_0_7_s42_hmdb --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
      --focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 1

Finetuning

python downstream.py --finetune True --dropout 0.7 --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
      --final_bn False --final_norm False --focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 0
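
The flags --lr, --lr_decay_epochs and --lr_decay_rate in the downstream commands above look like a standard multi-step decay schedule. The sketch below shows what that schedule would produce under that assumption. It is not taken from downstream.py: the function `step_lr` is hypothetical, and whether the script decays exactly at each listed epoch or one epoch later is also assumed.

```python
def step_lr(base_lr, decay_epochs, decay_rate, epoch):
    """Learning rate at `epoch` under multi-step decay (an assumed schedule)."""
    # Count how many decay milestones have been reached so far.
    num_decays = sum(1 for milestone in decay_epochs if epoch >= milestone)
    return base_lr * decay_rate ** num_decays


if __name__ == "__main__":
    # Values taken from the downstream commands above.
    for epoch in (0, 60, 120, 160):
        print(epoch, step_lr(0.1, [60, 120, 160], 0.1, epoch))
```

Under this reading, the learning rate starts at 0.1 and is multiplied by 0.1 at each of the three listed epochs.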

Citation

If you find this repository useful, please consider citing the following reference.

@ARTICLE{lai2023self,
    title={Self-supervised video representation learning via capturing semantic changes indicated by saccades},
    author={Qiuxia Lai and Ailing Zeng and Ye Wang and Lihong Cao and Yu Li and Qiang Xu},
    journal={IEEE Trans. on Circuits and Systems for Video Technology},
    year={2023}
}

Contact

Qiuxia Lai: ashleylqxatgmail.com | qxlaiatcuc.edu.cn
