
ZeroShotVideoClassification

Introduction

Official code of the CVPR 2020 paper

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

available on arXiv.

Summary

Learn a video representation that generalizes to unseen actions. Semantic information is used as supervision: the visual representation is mapped into the Word2Vec embedding space, where semantically related words lie close to each other in the Euclidean sense.
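
A minimal sketch of this idea, using hypothetical names (the actual implementation lives in main.py and the auxiliary/ modules): the video backbone outputs a 300-d vector that is regressed onto the Word2Vec embedding of its ground-truth class, and at test time a clip is assigned to the unseen class whose embedding is nearest in Euclidean distance.

import torch
import torch.nn.functional as F

# Hypothetical sketch; 'video_features' are the 300-d outputs of the video backbone
# (e.g. r2plus1d_18 with its classifier replaced by a 300-d linear layer) and
# 'class_embeddings' is an [n_classes, 300] tensor of Word2Vec class embeddings.

def embedding_loss(video_features, class_embeddings, labels):
    # Regress each visual embedding onto the Word2Vec embedding of its true class.
    targets = class_embeddings[labels]                            # [batch, 300]
    return F.mse_loss(video_features, targets)

def zero_shot_predict(video_features, unseen_class_embeddings):
    # Assign every clip to the unseen class with the closest embedding.
    dists = torch.cdist(video_features, unseen_class_embeddings)  # [batch, n_unseen]
    return dists.argmin(dim=1)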

Checkpoints

The trained models, used to produce the numbers in the paper, can be downloaded here.
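
A minimal sketch of loading a downloaded checkpoint, assuming a standard PyTorch checkpoint file; the file name, internal layout, and the use of torchvision's r2plus1d_18 below are placeholders, so inspect the downloaded file and main.py for the actual format:

import torch
from torchvision.models.video import r2plus1d_18

# Hypothetical sketch: file name and checkpoint layout are assumptions.
model = r2plus1d_18(num_classes=300)                       # 300-d semantic output (Word2Vec dimension)
checkpoint = torch.load('e2e_r2plus1d_kinetics2both.pth',  # placeholder file name
                        map_location='cpu')
state_dict = checkpoint.get('state_dict', checkpoint)      # some checkpoints wrap the weights
model.load_state_dict(state_dict, strict=False)            # strict=False in case of extra keys
model.eval()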

Install

Requirements

Run install.sh to install the less common libraries (faiss, tensorboardX, joblib) and the latest version of PyTorch compatible with CUDA 9.2 inside the Docker container.

Retrieve external assets

Get the word2vec model

sudo chmod +x assets/download_word2vec.sh
./assets/download_word2vec.sh
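
Once downloaded, the Word2Vec model is loaded with gensim (as in auxiliary/auxiliary_word2vec.py). A minimal sketch, where the file path is an assumption that depends on where download_word2vec.sh stores the binary:

from gensim.models import KeyedVectors

# Path is an assumption; adjust to where download_word2vec.sh puts the model.
wv_model = KeyedVectors.load_word2vec_format(
    'assets/GoogleNews-vectors-negative300.bin', binary=True)

def class_to_embedding(class_name):
    # Embed a class name by averaging the vectors of its in-vocabulary words,
    # mirroring what classes2embedding/one_class2embed do in the repository.
    words = [w for w in class_name.lower().split() if w in wv_model]
    return wv_model[words].mean(0)                 # 300-d class embedding

print(class_to_embedding('playing guitar').shape)  # (300,)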

Get C3D pretrained model

wget http://imagelab.ing.unimore.it/files/c3d_pytorch/c3d.pickle -P /workplace/

Running

The script run.sh shows an example of the parameters used to start training the end-to-end (e2e) model.

Training

get_dataset

If you want to train your model on Kinetics, you need to adapt the function get_kinetics() in auxiliary/auxiliary_dataset.py to the format in which Kinetics is stored on your machine. The current version is just a placeholder and will NOT work right away.
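
A hypothetical adaptation is sketched below. The return format (video paths, integer labels, class names) and the annotation layout are assumptions; mirror whatever the other dataset loaders in auxiliary_dataset.py expect.

import os
import pandas as pd

def get_kinetics(root='/datasets/kinetics700'):            # root path is a placeholder
    # Hypothetical sketch: parse the official annotation CSV so that classes are
    # human-readable names (e.g. 'playing guitar'), not YouTube video ids.
    ann = pd.read_csv(os.path.join(root, 'annotations/train.csv'))  # columns assumed: label, youtube_id
    classes = sorted(ann['label'].unique().tolist())
    class_to_idx = {c: i for i, c in enumerate(classes)}
    fnames, labels = [], []
    for _, row in ann.iterrows():
        path = os.path.join(root, 'train', row['label'], row['youtube_id'] + '.mp4')
        if os.path.exists(path):                           # skip clips that failed to download
            fnames.append(path)
            labels.append(class_to_idx[row['label']])
    return fnames, labels, classes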

train on Kinetics, test on [UCF101, HMDB51]. End2End mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --nopretrained

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. End2End mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --nopretrained

train on Kinetics, test on [UCF101, HMDB51]. Baseline mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --fixed  

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. Baseline mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --fixed  

train on Kinetics, test on [UCF101, HMDB51]. End2End mode pretrained on SUN

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --weights [path_to_SUN_pretraining]

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. End2End mode pretrained on SUN

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --weights [path_to_SUN_pretraining]

ZeroShotVideoClassification's People

Contributors

bbrattoli


ZeroShotVideoClassification's Issues

KeyError: "word '---e1gyo84' not in vocabulary"

Sorry to bother you, but I have encountered the following problem:
python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network c3d --dataset kinetics2both --save_path /home/m/Desktop/ZeroShotVideoClassification-master/result --nopretrained
Total batch size: 22
UCF101: total number of videos 13320, classes 101
HMDB51: total number of videos 6766, classes 51
Traceback (most recent call last):
File "main.py", line 66, in
dataloaders = dataset.get_datasets(opt)
File "/home/m/Desktop/ZeroShotVideoClassification-master/dataset.py", line 14, in get_datasets
get_datasets = get_both_datasets(opt)
File "/home/m/Desktop/ZeroShotVideoClassification-master/dataset.py", line 109, in get_both_datasets
train_class_embedding = classes2embedding('kinetics', train_classes, wv_model)
File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 20, in classes2embedding
embedding = [one_class2embed(class_name, wv_model)[0] for class_name in class_name_inputs]
File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 20, in
embedding = [one_class2embed(class_name, wv_model)[0] for class_name in class_name_inputs]
File "/home/m/Desktop/ZeroShotVideoClassification-master/auxiliary/auxiliary_word2vec.py", line 119, in one_class2embed_kinetics
return wv_model[name_vec].mean(0), name_vec
File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 355, in getitem
return vstack([self.get_vector(entity) for entity in entities])
File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 355, in
return vstack([self.get_vector(entity) for entity in entities])
File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 471, in get_vector
return self.word_vec(word)
File "/home/m/Anaconda/envs/pytorch/lib/python3.6/site-packages/gensim/models/keyedvectors.py", line 468, in word_vec
raise KeyError("word '%s' not in vocabulary" % word)
KeyError: "word '---e1gyo84' not in vocabulary"

I look forward to your reply. Thank you very much.

Did you ever try the same training using BERT or a similar model instead of simple Word2Vec?

Hi, I follow your work and this is great work, very simple and effective :)
I am wondering whether you tried, or know of, similar training with BERT or a similar transformer model. I am trying something like that, but the loss remains fairly steady and the model is not learning anything. The same framework works fine with Word2Vec. Do you know why this may happen? Any intuitive thoughts?
@bbrattoli

Number of V100s used for training

Hello

Thank you for the work :)

May I ask the number of V100s you used for training the model?

Trying to estimate the total batch size you used (I understand that it's 22 per V100 GPU).

Pretrained models

Thanks for your great work.
Do you have plans to release pretrained model weights? This would be a significant addition to the ZSL field.

Thanks,

About KINETICS class numbers

Hi,
Thanks for your great work.
I ran the code and found that there are only 662 classes with a 0.05 overlap threshold.
This is different from the class number (664) reported in the paper.
Could you help me fix this problem?

KINETICS: total number of videos 558278, classes 700
After filtering) KINETICS: total number of videos 521647, classes 662
