Introduction

Official code of the CVPR 2020 paper

Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

available on arxiv.

Summary

Learn a video representation that can generalize to unseen actions. Semantic information is used as supervision. In particular, the visual representation is mapped into the Word2Vec embedding space, where semantically related words are closer to each other in a Euclidean sense.
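
As an illustration of what this mapping allows at test time, here is a minimal sketch of nearest-neighbor classification in the embedding space. It assumes the network has already produced an embedding for a video; the function name nearest_class, the class names, and the three-dimensional vectors are all made up for illustration and are not taken from the repository.

```python
import math


def nearest_class(video_embedding, class_embeddings):
    """Return the class whose word embedding is closest (Euclidean) to the video embedding."""
    best_name, best_dist = None, math.inf
    for name, vec in class_embeddings.items():
        dist = math.dist(video_embedding, vec)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Toy class embeddings (illustrative values, not real Word2Vec vectors).
class_embeddings = {
    "running": [0.9, 0.1, 0.0],
    "swimming": [0.1, 0.9, 0.0],
    "cooking": [0.0, 0.1, 0.9],
}

# Hypothetical output of the video network for a clip of an unseen action.
video_embedding = [0.8, 0.2, 0.1]
predicted = nearest_class(video_embedding, class_embeddings)
print(predicted)
```

Because the classes are represented only by their word embeddings, a class never seen during training can be added by inserting its embedding into the dictionary.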

Checkpoints

The trained models, used to produce the numbers in the paper, can be downloaded here.

Install

Requirements

Run install.sh to install the less common libraries (faiss, tensorboardx, joblib) and the latest version of PyTorch compatible with the CUDA 9.2 installed in the Docker image.

Retrieve external assets

Get the word2vec model

sudo chmod +x assets/download_word2vec.sh
./assets/download_word2vec.sh

Get C3D pretrained model

wget http://imagelab.ing.unimore.it/files/c3d_pytorch/c3d.pickle -P /workplace/

Running

The script run.sh shows example parameters for training the end-to-end (e2e) model.

Training

get_dataset

To train your model on Kinetics, adapt the function get_kinetics() in auxiliary/auxuliary_dataset.py to the format in which Kinetics is stored on your machine. The current version is only a placeholder and will NOT work as is.
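
As a starting point for that adaptation, the sketch below collects video paths and class names from a directory tree. Everything in it is an assumption: the layout root/<class_name>/<video file>, the helper name list_videos_by_class, the extension list, and the (paths, labels) return value. The format that get_kinetics() must actually return is not described here, so check it against the repository code before reusing any of this.

```python
import os

# Assumed set of video file extensions; adjust to match your copy of Kinetics.
VIDEO_EXTENSIONS = (".mp4", ".avi", ".mkv")


def list_videos_by_class(root):
    """Collect video paths and class names from a root/<class_name>/<video> layout."""
    paths, labels = [], []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files at the top level
        for file_name in sorted(os.listdir(class_dir)):
            if file_name.lower().endswith(VIDEO_EXTENSIONS):
                paths.append(os.path.join(class_dir, file_name))
                labels.append(class_name)
    return paths, labels
```

Calling list_videos_by_class with the path to your Kinetics training folder would then give two parallel lists, one entry per video.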

train on Kinetics, test on [UCF101, HMDB51]. End2End mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --nopretrained

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. End2End mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --nopretrained

train on Kinetics, test on [UCF101, HMDB51]. Baseline mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --fixed  

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. Baseline mode

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --fixed  

train on Kinetics, test on [UCF101, HMDB51]. End2End mode pretrained on SUN

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2both --save_path PATH_TO_RESULT_FOLDER --weights [path_to_SUN_pretraining]

train on Kinetics, test on [UCF101, HMDB51, ActivityNet]. End2End mode pretrained on SUN

python3 main.py --n_epochs 150 --bs 22 --lr 1e-3 --network r2plus1d_18 --dataset kinetics2others --save_path PATH_TO_RESULT_FOLDER --weights [path_to_SUN_pretraining]
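
The six commands above differ only in the dataset name and the final flag. The sketch below assembles them programmatically, which can help when launching several runs. The helper build_command and its mode and weights parameters are hypothetical; it only reproduces the flag combinations listed above and makes no claim about other options main.py may accept.

```python
def build_command(dataset, save_path, mode="e2e", weights=None):
    """Assemble the main.py argument list for one of the six configurations above."""
    if dataset not in ("kinetics2both", "kinetics2others"):
        raise ValueError("dataset must be 'kinetics2both' or 'kinetics2others'")
    if mode not in ("e2e", "baseline"):
        raise ValueError("mode must be 'e2e' or 'baseline'")
    if mode == "baseline" and weights is not None:
        raise ValueError("the commands above do not combine --fixed with --weights")

    cmd = [
        "python3", "main.py",
        "--n_epochs", "150", "--bs", "22", "--lr", "1e-3",
        "--network", "r2plus1d_18",
        "--dataset", dataset,
        "--save_path", save_path,
    ]
    if mode == "baseline":
        cmd.append("--fixed")              # Baseline mode
    elif weights is not None:
        cmd += ["--weights", weights]      # End2End mode pretrained on SUN
    else:
        cmd.append("--nopretrained")       # End2End mode
    return cmd


print(" ".join(build_command("kinetics2both", "PATH_TO_RESULT_FOLDER")))
```

The returned list can be passed directly to subprocess.run if you prefer launching from Python rather than the shell.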

Contributors

bbrattoli
