

LateTemporalModeling3DCNN

Official PyTorch implementation of Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition.

This is the repository that implements late temporal modeling on top of 3D CNN architectures, focusing mainly on BERT for this purpose.

Installation

#For the installation, you need to install conda. The environment may also contain unnecessary packages, but we want to provide the complete environment that we are using.

#Create the environment with the command
conda env create -f LateTemporalModeling3D.yml

#Then you can activate the environment with the command
conda activate LateTemporalModeling3D

Then, please download the necessary files from the link below and copy them into the main directory. https://1drv.ms/u/s!AqKP51Rjkz1Gaifd54VbdRBn6qM?e=7OxYLa

Dataset Format

In order to perform training and validation, the lists of training and validation samples should be created as txt files in the datasets/settings folder.

As an example, the settings for hmdb51 are included. The file names specify train/val, the modality, and the split of the dataset. In the txt files, each line gives the folder of images, the number of frames in that folder, and the index of the class, respectively.
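For illustration, a hypothetical excerpt of such a settings txt file (the clip folder names and frame counts below are invented; only the three-column format of folder, frame count, and class index matters):

brush_hair/clip_0001 94 0
cartwheel/clip_0137 121 1
catch/clip_0042 76 2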

The image folders should be created in datasets/ as hmdb51_frames and ucf101_frames for the hmdb and ucf datasets. If you want to use this code with a different dataset, you need to create a .py file like the hmdb51.py and ucf101.py that exist in datasets/. You can simply copy these .py files and change the class name of the dataset. Then the init file in datasets/ should be modified accordingly.

The naming format of the rgb and flow images is "img_%05d.jpg", "flow_x_%05d", "flow_y_%05d" for hmdb51, and "img_%05d.jpg", "flow_x_%05d.jpg", "flow_y_%05d.jpg" for ucf101.

The name patterns of the datasets can be modified, but the tests of the datasets should then be modified as well. These patterns are also specified in the variable called extension in the test files.
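As a rough sketch, the resulting on-disk layout for hmdb51 would then look like this (the clip folder name is hypothetical):

datasets/hmdb51_frames/brush_hair/clip_0001/img_00001.jpg
datasets/hmdb51_frames/brush_hair/clip_0001/img_00002.jpg
datasets/hmdb51_frames/brush_hair/clip_0001/flow_x_00001
datasets/hmdb51_frames/brush_hair/clip_0001/flow_y_00001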

Training of the dataset

There are two separate training files, called two_stream2.py and two_stream_bert2.py. They are almost identical. Select the first for SGD training and the second for AdamW training.

Models

For the models listed below, use two_stream2.py

  • rgb_resneXt3D64f101
  • flow_resneXt3D64f101
  • rgb_slowfast64f_50
  • rgb_I3D64f
  • flow_I3D64f
  • rgb_r2plus1d_32f_34

For the models listed below, use two_stream_bert2.py

  • rgb_resneXt3D64f101_bert10_FRAB

  • flow_resneXt3D64f101_bert10_FRAB

  • rgb_resneXt3D64f101_bert10_FRMB

  • flow_resneXt3D64f101_bert10_FRMB

  • rgb_resneXt3D64f101_FRMB_adamw

  • rgb_resneXt3D64f101_adamw

  • rgb_resneXt3D64f101_FRMB_NLB_concatenation

  • rgb_resneXt3D64f101_FRMB_lstm

  • rgb_resneXt3D64f101_concatenation

  • rgb_slowfast64f_50_bert10_FRAB_late

  • rgb_slowfast64f_50_bert10_FRAB_early

  • rgb_slowfast64f_50_bert10_FRMB_early

  • rgb_slowfast64f_50_bert10_FRMB_late

  • rgb_I3D64f_bert2

  • flow_I3D64f_bert2

  • rgb_I3D64f_bert2_FRMB

  • flow_I3D64f_bert2_FRMB

  • rgb_I3D64f_bert2_FRAB

  • flow_I3D64f_bert2_FRAB

  • rgb_r2plus1d_32f_34_bert10

  • rgb_r2plus1d_64f_34_bert10

Training Commands

python two_stream_bert2.py --split=1 --arch=rgb_resneXt3D64f101_bert10_FRMB --workers=2 --batch-size=8 --iter-size=16 --print-freq=400 --dataset=hmdb51 --lr=1e-5

python two_stream2.py --split=1 --arch=rgb_resneXt3D64f101 --workers=2 --batch-size=8 --iter-size=16 --print-freq=400 --dataset=hmdb51 --lr=1e-2

For multi-GPU training, comment out the two lines below:

os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"

To continue training from the best model, add -c. To evaluate the single-clip, single-crop performance of the best model, add -e.
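For example, the commands below continue training and evaluate the best model, respectively (the assumption here is that -c / -e are simply appended to the original training command):

python two_stream_bert2.py --split=1 --arch=rgb_resneXt3D64f101_bert10_FRMB --workers=2 --batch-size=8 --iter-size=16 --print-freq=400 --dataset=hmdb51 --lr=1e-5 -c

python two_stream_bert2.py --split=1 --arch=rgb_resneXt3D64f101_bert10_FRMB --workers=2 --batch-size=8 --iter-size=16 --print-freq=400 --dataset=hmdb51 --lr=1e-5 -e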

Test of the dataset

For testing, there are three separate files:

  • spatial_demo3D.py, which is for multiple-clip tests
  • spatial_demo_bert.py, which is for single-clip tests
  • combined_demo.py, which is for two-stream tests

First, set your current directory to where the test files exist, which is: scripts/eval/

Then enter the example commands below:

python spatial_demo3D.py --arch=rgb_resneXt3D64f101_bert10_FRMB --split=2

python spatial_demo_bert.py --arch=flow_resneXt3D64f101_bert10_FRMB --split=2

python combined_demo.py --arch_rgb=rgb_resneXt3D64f101_bert10_FRMB --arch_flow=flow_resneXt3D64f101_bert10_FRMB --split=2

If your training was performed with multiple GPUs, manually set multiGPUTrain to True. By default, the tests are performed with ten crops. For single-crop tests, manually set ten_crop_enabled to False.
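As a minimal sketch, the two switches mentioned above would be set near the top of the chosen test script roughly as follows (variable names taken from the description above; their exact location in the scripts may differ):

multiGPUTrain = True       # checkpoint was trained with multiple GPUs
ten_crop_enabled = False   # single-crop test instead of the default ten crops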

Citation

If you use this toolbox or benchmark in your research, please cite this work:

@inproceedings{kalfaoglu2020late,
  title={Late temporal modeling in 3d cnn architectures with bert for action recognition},
  author={Kalfaoglu, M Esat and Kalkan, Sinan and Alatan, A Aydin},
  booktitle={European Conference on Computer Vision},
  pages={731--747},
  year={2020},
  organization={Springer}
}

Related Projects

Toward Good Practices: PyTorch implementation of popular two-stream frameworks for video action recognition

ResNeXt101: Video Classification Using 3D ResNet

SlowFast: PySlowFast

R2+1D-IG65: IG-65M PyTorch

I3D: I3D models trained on Kinetics


Contributors

artest08, eddiemg, fcakyon, hongbo-miao


latetemporalmodeling3dcnn's Issues

Paper and code inconsistent?

Hi, I have been reading your paper and code over the past few days, and I found that the code and the paper are inconsistent. One big part of the paper is the removal of the Temporal Global Average Pooling in Figure 1. For example, in your rgb_I3D.py code, in the model rgb_I3D64f_bert2, the input dimension to the 3D CNN is batch x 3 x 64 x 224 x 224 and the output is batch x 1024 x 8 x 7 x 7. Then you apply another 3D pooling to get batch x 1024 x 8 x 1 x 1. In my understanding, the temporal pooling is already done in the 3D CNN. In Figure 1 of your paper, you remove the Temporal Global Average Pooling and the output of the 3D CNN still has f1, f2, ..., fN, but in your code there are no such N frame features. Can you help me understand your code and paper?
Thanks a lot.
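For readers following the shapes in this question, here is a minimal PyTorch sketch that reproduces the quoted shapes; the (1, 7, 7) pooling kernel is an assumption used only for illustration (it pools over space but keeps the 8 temporal positions):

import torch
import torch.nn as nn

features = torch.randn(2, 1024, 8, 7, 7)            # batch x channels x time x H x W, as quoted above
spatial_pool = nn.AvgPool3d(kernel_size=(1, 7, 7))  # averages over H and W only
pooled = spatial_pool(features)                     # -> 2 x 1024 x 8 x 1 x 1
tokens = pooled.flatten(2).transpose(1, 2)          # -> 2 x 8 x 1024, i.e. 8 temporal feature vectors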

ResolvePackageNotFound error while installing conda packages

While running conda env create -f LateTemporalModeling3D.yml, I get the following error:

ResolvePackageNotFound:
  - fontconfig==2.13.0=h9420a91_0
  - pcre==8.43=he6710b0_0
  - scikit-learn==0.20.1=py36h4989274_0
  - mkl_fft==1.0.4=py36h4414c95_1
  - libgcc-ng==9.1.0=hdf63c60_0
  - libstdcxx-ng==8.2.0=hdf63c60_1
  - wrapt==1.11.2=py36h7b6447c_0
  - tornado==6.0.3=py36h7b6447c_0
  - icu==58.2=h9c2bf20_1
  - psutil==5.6.7=py36h7b6447c_0
  - zeromq==4.3.1=he6710b0_3
  - wurlitzer==2.0.0=py36_0
  - ncurses==6.1=he6710b0_1
  - lazy-object-proxy==1.4.3=py36h7b6447c_0
  - gmp==6.1.2=h6c8ec71_1
  - pyqt==5.9.2=py36h05f1152_2
  - glib==2.63.1=h5a9c865_0
  - dbus==1.13.12=h746ee38_0
  - tk==8.6.8=hbc83047_0
  - xz==5.2.4=h14c3975_4
  - pyrsistent==0.15.6=py36h7b6447c_0
  - libuuid==1.0.3=h1bed415_2
  - libtiff==4.0.9=he85c1e1_1
  - libsodium==1.0.16=h1bed415_0
  - typed-ast==1.4.0=py36h7b6447c_0
  - ptyprocess==0.6.0=py36_0
  - freetype==2.9.1=h8a8886c_1
  - mkl_random==1.0.1=py36h4414c95_1
  - qt==5.9.6=h8703b6f_2
  - libffi==3.2.1=hd88cf55_4
  - zlib==1.2.11=h7b6447c_3
  - libedit==3.1.20181209=hc058e9b_0
  - libgfortran-ng==7.2.0=hdf63c60_3
  - libpng==1.6.37=hbc83047_0
  - expat==2.2.6=he6710b0_0
  - readline==7.0=h7b6447c_5
  - ujson==1.35=py36h14c3975_0
  - gstreamer==1.14.0=hb453b48_1
  - pyzmq==18.1.0=py36he6710b0_0
  - python==3.6.5=hc3d631a_2
  - mistune==0.8.4=py36h7b6447c_0
  - openssl==1.0.2t=h7b6447c_1
  - cryptography==2.3.1=py36hc365091_0
  - cffi==1.13.2=py36h2e261b9_0
  - jpeg==9b=h024ee3a_2
  - markupsafe==1.1.1=py36h7b6447c_0
  - libxml2==2.9.9=hea5a465_1
  - secretstorage==3.1.1=py36_0
  - sqlite==3.30.1=h7b6447c_0
  - libspatialindex==1.9.3=he6710b0_0
  - sip==4.19.8=py36hf484d3e_0
  - yaml==0.1.7=had09818_2
  - pywavelets==1.0.2=py36hdd07704_0
  - libxcb==1.13=h1bed415_1
  - gst-plugins-base==1.14.0=hbbd80ab_1

roi-align==0.0.2 cannot be installed

An error occurred when I executed the following command: conda env create -f LateTemporalModeling3D.yml

ERROR: Could not find a version that satisfies the requirement roi-align==0.0.2 (from -r /data/LateTemporalModeling3DCNN/condaenv.u7mj_9s2.requirements.txt (line 58)) (from versions: none)
ERROR: No matching distribution found for roi-align==0.0.2 (from -r /data/LateTemporalModeling3DCNN/condaenv.u7mj_9s2.requirements.txt (line 58))

CondaValueError: pip returned an error

Thanks for your help!

Potential typo

In two_stream_bert2.py, I believe there is a typo in an import statement (line 33):
from weights.model_path import rgb_3d_model_path_selection
should be
from utils.model_path import rgb_3d_model_path_selection

Frame extraction and train-val split

Hello. First of all, congratulations on your work.

I want to reproduce your experiments, but I am stuck on the data preprocessing phase. The original dataset comes as ".avi" videos; how do you extract the RGB and optical flow frames? How many frames? And how do you convert from the original file-per-class split files to the structure you have in the "datasets/settings/hmdb51/" folder?

Disclaimer: I am only just getting started with action recognition research; this might be standard procedure for any other implementation using the HMDB51 dataset, but I am not yet familiar with it.

I appreciate any help you can provide.

Two actions in a video !!

Hi. This is great work. I'm just a beginner in this area. I wonder, if a video contains two or more actions, can the model automatically separate them and give us a correct prediction, or must we do preprocessing? If preprocessing is needed, any suggestions for that?
Thank you so much !!!

For reproducing results in ucf101

Hi! I am training your awesome model on ucf101. Could you share the detailed training parameters for R(2+1)D BERT (32f) and R(2+1)D BERT (64f)?

update readme

  • add arxiv url of the paper
  • add bibtex citation
  • add model url table (optional)

cuda runtime error

I tried your eval script (spatial_demo_bert.py); it worked at first, but after a while a CUDA runtime error occurred. The first two or three times the error went away after creating a new environment, but now it produces the error no matter what I do.

I have an NVIDIA GeForce RTX 2070 SUPER. I am working on Windows.

Thank you in advance

For training code

I have seen this code:

input_vectors = x
# per-vector L2 norm along the last (feature) dimension
norm = input_vectors.norm(p=2, dim=-1, keepdim=True)
# rescale each feature vector to unit length
input_vectors = input_vectors.div(norm)

Is the function of this code the same as Batch Normalization?
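For reference, a minimal sketch of what the quoted snippet computes: plain L2 normalization of each feature vector along its last dimension. torch.nn.functional.normalize is shown only as a comparison point and is not taken from the repository:

import torch
import torch.nn.functional as F

x = torch.randn(4, 512)                     # a batch of feature vectors
norm = x.norm(p=2, dim=-1, keepdim=True)    # per-vector L2 norm
manual = x.div(norm)                        # rescale each vector to unit length
builtin = F.normalize(x, p=2, dim=-1)       # built-in equivalent (adds an eps for zero vectors)
print(torch.allclose(manual, builtin))      # True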

[Error] ModuleNotFoundError: No module named 'weights.model_path'

I get the following error when I try to run the code. I tried to fix it, but I can't understand the problem.

Traceback (most recent call last):
  File "two_stream_bert2.py", line 33, in <module>
    from weights.model_path import rgb_3d_model_path_selection
ModuleNotFoundError: No module named 'weights.model_path'

pretrained model of R2+1D-BERT

Hi! I am working on accident detection from CCTV cameras. It would be helpful for my research and would also save a lot of computation time if you could provide the pretrained model of R2+1D-BERT. Thank you!

ModuleNotFoundError: No module named 'weights.model_path'

Hi,
Thank you for your great work!
I'm currently trying to run the training for the baseline code using "two_stream_bert2.py".
I get the following error:
ModuleNotFoundError: No module named 'weights.model_path'

The weights folder contains 5 files which I downloaded as per the instructions. However, the "model_path.py" file is apparently missing from this folder. Therefore, the following import fails:
"from weights.model_path import rgb_3d_model_path_selection".

Could you please point me to where I could get this file? Or is there an alternative way to fix this issue?

Thanks

Frame-level classification using BERT

Thank you for this great work.

I am working on a similar problem: I want to apply BERT to frame features extracted using I3D, but I want to perform frame-level classification rather than video classification. I was wondering how I can adapt your implementation, since you use the classification token, which is defined at the video level and not the frame level.

I also had a question regarding the training of the model: why don't you include the loss from the predictions of the masked frames?

Any help would be appreciated 😄 !
