tempagg's Introduction

Temporal Aggregate Representations for Long-Range Video Understanding

This repository provides the official PyTorch implementation of our papers:

F. Sener, D. Singhania and A. Yao, "Temporal Aggregate Representations for Long-Range Video Understanding", ECCV 2020 [paper]

F. Sener, D. Chatterjee and A. Yao, "Technical Report: Temporal Aggregate Representations", arXiv:2106.03152, 2021 [paper]

If you use the code/models hosted in this repository, please cite the following papers:

@inproceedings{sener2020temporal,
  title={Temporal aggregate representations for long-range video understanding},
  author={Sener, Fadime and Singhania, Dipika and Yao, Angela},
  booktitle={European Conference on Computer Vision},
  pages={154--171},
  year={2020},
  organization={Springer}
}
@article{sener2021technical,
  title={Technical Report: Temporal Aggregate Representations},
  author={Sener, Fadime and Chatterjee, Dibyadip and Yao, Angela},
  journal={arXiv preprint arXiv:2106.03152},
  year={2021}
}

Dependencies

  • Python3
  • PyTorch
  • Numpy, Pandas, PIL
  • lmdb, tqdm

Overview

This repository provides code to train, validate and test our models on the EPIC-KITCHENS-55 and EPIC-KITCHENS-100 datasets for the tasks of action anticipation and action recognition.

Features

Follow the instructions in the RU-LSTM repository to download the RGB, Flow and Obj features and the train/val/test splits, and place them in the data/ek55 or data/ek100 folder, depending on the dataset.
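
Before running any of the scripts, it can help to confirm that the features landed in the expected folder. The following is a minimal sketch under stated assumptions: only the folder names data/ek55 and data/ek100 come from the text above; the helper name find_data_dir is hypothetical and not part of this repository, and the check does not inspect the files themselves.

```python
from pathlib import Path

# Folder names taken from the text above; everything else is illustrative.
DATA_DIRS = {"ek55": Path("data/ek55"), "ek100": Path("data/ek100")}


def find_data_dir(dataset: str, repo_root: Path = Path(".")) -> Path:
    """Return the feature folder for `dataset`, or raise if it is missing or empty."""
    if dataset not in DATA_DIRS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(DATA_DIRS)}")
    data_dir = repo_root / DATA_DIRS[dataset]
    if not data_dir.is_dir():
        raise FileNotFoundError(f"{data_dir} does not exist; download the features first")
    if not any(data_dir.iterdir()):
        raise FileNotFoundError(f"{data_dir} is empty; download the features first")
    return data_dir


if __name__ == "__main__":
    for name in DATA_DIRS:
        try:
            print(name, "->", find_data_dir(name))
        except FileNotFoundError as err:
            print(name, "->", err)
```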

For ROI features, we take the union of the hand-object interaction bbox annotations provided by the authors of EPIC-KITCHENS-100 (link) as input and extract RGB features with TSN, as explained here.

Pretrained Models

Pretrained models are available only for the EPIC-KITCHENS-100 dataset, trained on its train split. They are provided in the folders models_anticipation and models_recognition.

Validation

To validate our model, run the following:

EPIC-KITCHENS-55

Action Anticipation
  • RGB: python main_anticipation.py --mode validate --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality rgb --video_feat_dim 1024
  • Flow: python main_anticipation.py --mode validate --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality flow --video_feat_dim 1024
  • Obj: python main_anticipation.py --mode validate --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality obj --video_feat_dim 352
  • ROI: python main_anticipation.py --mode validate --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality roi --video_feat_dim 1024
  • Late Fusion: python main_anticipation.py --mode validate --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality late_fusion
Action Recognition
  • RGB: python main_recognition.py --mode validate --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality rgb --video_feat_dim 1024
  • Flow: python main_recognition.py --mode validate --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality flow --video_feat_dim 1024
  • Obj: python main_recognition.py --mode validate --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality obj --video_feat_dim 352
  • ROI: python main_recognition.py --mode validate --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality roi --video_feat_dim 1024
  • Late Fusion: python main_recognition.py --mode validate --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality late_fusion

EPIC-KITCHENS-100

Action Anticipation
  • RGB: python main_anticipation.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality rgb --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Flow: python main_anticipation.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality flow --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Obj: python main_anticipation.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality obj --video_feat_dim 352 --num_class 3806 --verb_class 97 --noun_class 300
  • ROI: python main_anticipation.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality roi --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Late Fusion: python main_anticipation.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality late_fusion --num_class 3806 --verb_class 97 --noun_class 300
Action Recognition
  • RGB: python main_recognition.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality rgb --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Flow: python main_recognition.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality flow --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Obj: python main_recognition.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality obj --video_feat_dim 352 --num_class 3806 --verb_class 97 --noun_class 300
  • ROI: python main_recognition.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality roi --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Late Fusion: python main_recognition.py --mode validate --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality late_fusion --num_class 3806 --verb_class 97 --noun_class 300
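
The validation commands above follow a regular pattern: the feature dimension is 1024 for rgb, flow and roi and 352 for obj, late fusion takes no --video_feat_dim, and EPIC-KITCHENS-100 runs add --ek100 plus the three class counts. As a convenience, that pattern can be sketched as a small helper. The function build_command is hypothetical and not part of this repository; every flag and value it emits is copied from the commands listed above.

```python
# Values copied from the commands listed above.
FEAT_DIM = {"rgb": 1024, "flow": 1024, "obj": 352, "roi": 1024}
EK100_CLASSES = ["--num_class", "3806", "--verb_class", "97", "--noun_class", "300"]


def build_command(task, mode, dataset, modality):
    """Assemble the argument list for one run.

    task: "anticipation" or "recognition"
    dataset: "ek55" or "ek100"
    modality: "rgb", "flow", "obj", "roi" or "late_fusion"
    """
    if task not in ("anticipation", "recognition"):
        raise ValueError(f"unknown task: {task!r}")
    if dataset not in ("ek55", "ek100"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    if modality != "late_fusion" and modality not in FEAT_DIM:
        raise ValueError(f"unknown modality: {modality!r}")

    cmd = ["python", f"main_{task}.py", "--mode", mode]
    if dataset == "ek100":
        cmd.append("--ek100")
    # The commands above write the EK100 model path with a trailing slash.
    models_dir = f"models_{task}/{dataset}" + ("/" if dataset == "ek100" else "")
    cmd += ["--path_to_data", f"data/{dataset}",
            "--path_to_models", models_dir,
            "--modality", modality]
    if modality != "late_fusion":
        cmd += ["--video_feat_dim", str(FEAT_DIM[modality])]
    if dataset == "ek100":
        cmd += EK100_CLASSES
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command("anticipation", "validate", "ek100", "obj")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.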

Here are the validation results on EPIC-KITCHENS-100 as provided in our paper.

  • Anticipation: [image: ant]

  • Recognition: [image: rec]

Testing and submitting the results to the server

To test your model on the EPIC-KITCHENS-100 test split, run the following:

Action Anticipation
  • mkdir -p jsons/anticipation
  • python main_anticipation.py --mode test --json_directory jsons/anticipation --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality late_fusion --num_class 3806 --verb_class 97 --noun_class 300
Action Recognition
  • mkdir -p jsons/recognition
  • python main_recognition.py --mode test --json_directory jsons/recognition --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality late_fusion --num_class 3806 --verb_class 97 --noun_class 300
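
The test commands above use --modality late_fusion. As a rough illustration of the idea only, weighted late fusion can be sketched as a weighted sum of per-modality class scores. Whether the repository combines scores in exactly this way is an assumption; the function late_fusion and the toy scores are hypothetical. The weights are the default values of weight_rgb, weight_flow, weight_obj and weight_roi that appear in the argument dumps quoted in the issues on this page.

```python
# Default weights as printed in the argument dumps quoted in the issues.
WEIGHTS = {"rgb": 0.4, "flow": 0.1, "obj": 0.25, "roi": 0.25}


def late_fusion(scores_by_modality, weights):
    """Return the weighted sum of per-class scores across modalities."""
    modalities = list(scores_by_modality)
    n_classes = len(scores_by_modality[modalities[0]])
    fused = [0.0] * n_classes
    for modality in modalities:
        scores = scores_by_modality[modality]
        if len(scores) != n_classes:
            raise ValueError(f"{modality} has {len(scores)} scores, expected {n_classes}")
        for class_idx, score in enumerate(scores):
            fused[class_idx] += weights[modality] * score
    return fused


# Toy 3-class scores for one sample, for illustration only.
toy_scores = {
    "rgb": [0.2, 0.5, 0.3],
    "flow": [0.6, 0.3, 0.1],
    "obj": [0.1, 0.7, 0.2],
    "roi": [0.3, 0.3, 0.4],
}

fused = late_fusion(toy_scores, WEIGHTS)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Fusing scores rather than features keeps each modality's model independent, so a single-modality model can be retrained without touching the others.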

Custom Training

To train the model, run the following:

EPIC-KITCHENS-55

Action Anticipation
  • RGB: python main_anticipation.py --mode train --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality rgb --video_feat_dim 1024
  • Flow: python main_anticipation.py --mode train --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality flow --video_feat_dim 1024
  • Obj: python main_anticipation.py --mode train --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality obj --video_feat_dim 352
  • ROI: python main_anticipation.py --mode train --path_to_data data/ek55 --path_to_models models_anticipation/ek55 --modality roi --video_feat_dim 1024
Action Recognition
  • RGB: python main_recognition.py --mode train --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality rgb --video_feat_dim 1024
  • Flow: python main_recognition.py --mode train --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality flow --video_feat_dim 1024
  • Obj: python main_recognition.py --mode train --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality obj --video_feat_dim 352
  • ROI: python main_recognition.py --mode train --path_to_data data/ek55 --path_to_models models_recognition/ek55 --modality roi --video_feat_dim 1024

EPIC-KITCHENS-100

Action Anticipation
  • RGB: python main_anticipation.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality rgb --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Flow: python main_anticipation.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality flow --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Obj: python main_anticipation.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality obj --video_feat_dim 352 --num_class 3806 --verb_class 97 --noun_class 300
  • ROI: python main_anticipation.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_anticipation/ek100/ --modality roi --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
Action Recognition
  • RGB: python main_recognition.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality rgb --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Flow: python main_recognition.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality flow --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300
  • Obj: python main_recognition.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality obj --video_feat_dim 352 --num_class 3806 --verb_class 97 --noun_class 300
  • ROI: python main_recognition.py --mode train --ek100 --path_to_data data/ek100 --path_to_models models_recognition/ek100/ --modality roi --video_feat_dim 1024 --num_class 3806 --verb_class 97 --noun_class 300

Please refer to the papers for more technical details.

Acknowledgements

This code is based on RU-LSTM; we are grateful to the collaborators and maintainers of that repository.

tempagg's People

Contributors

dibschat

tempagg's Issues

Model Zoo Request

Thanks for your great work! I want to build a multimodal fusion model based on it. Could you please send me a copy of your trained model for each modality for the action recognition task on EPIC-KITCHENS-100, if possible? I would really appreciate your help. Thank you very much!

epic-kitchens_55 error

Hi,

I'm getting this error when running EPIC-KITCHENS-55. Have you encountered this before? Thanks

Save file name anti_mod_rgb_span_6_s1_5_s2_3_s3_2_recent_2_r1_1.6_r2_1.2_r3_0.8_r4_0.4_bs_10_drop_0.3_lr_0.0001_dimLa_512_dimLi_512_epoc_15_vb_nn
Printing Arguments
Namespace(add_noun_loss=True, add_verb_loss=True, alpha=1, batch_size=10, best_model='best', debug_on=False, display_every=10, dropout_rate=0.3, ek100=False, epochs=15, img_tmpl='frame_{:010d}.jpg', json_directory='tempAgg_ant_rec//models_anticipation/', latent_dim=512, linear_dim=512, lr=0.0001, modality='rgb', mode='train', noun_class=352, noun_loss_weight=1.0, num_class=2513, num_workers=0, past_attention=True, path_to_data='/content/drive/MyDrive/Individual_Project/Models/RULSTM/rulstm-master/RULSTM/data/ek55', path_to_models='models_anticipation/ek55', recent_dim=2, recent_sec1=1.6, recent_sec2=1.2, recent_sec3=0.8, recent_sec4=0.4, resume=False, scale=True, scale_factor=-0.5, schedule_epoch=10, schedule_on=1, span_dim1=5, span_dim2=3, span_dim3=2, spanning_sec=6, task='action_anticipation', topK=1, trainval=False, verb_class=125, verb_loss_weight=1.0, verb_noun_scores=True, video_feat_dim=1024, weight_flow=0.1, weight_obj=0.25, weight_rgb=0.4, weight_roi=0.25)
Populating Dataset: 100% 23493/23493 [00:33<00:00, 694.22it/s]
Populating Dataset: 100% 4979/4979 [00:07<00:00, 689.38it/s]
Add verb losses
Add noun losses
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [4,0,0] Assertion t >= 0 && t < n_classes failed.
/pytorch/aten/src/ATen/native/cuda/Loss.cu:455: nll_loss_backward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [7,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "main_anticipation.py", line 674, in <module>
    main()
  File "main_anticipation.py", line 531, in main
    start_epoch, start_best_perf, schedule_on)
  File "main_anticipation.py", line 400, in train_validation
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered
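
The assertion t >= 0 && t < n_classes in the log above comes from the NLL loss kernel and fires when a target label lies outside the range [0, n_classes). One way to narrow this down is to check the labels against the class counts before training. The following is a minimal sketch, not a diagnosis of this particular issue: the helper out_of_range_labels and the toy labels are hypothetical, and the class counts are the ones printed in the argument dump above (num_class=2513, verb_class=125, noun_class=352).

```python
def out_of_range_labels(labels, n_classes):
    """Return (index, label) pairs whose label is outside [0, n_classes)."""
    return [(i, t) for i, t in enumerate(labels) if not 0 <= t < n_classes]


# Class counts from the argument dump above (EPIC-KITCHENS-55 run).
N_CLASSES = {"action": 2513, "verb": 125, "noun": 352}

# Toy labels for illustration only; real labels would come from the
# annotation files used to populate the dataset.
toy_labels = {
    "action": [0, 17, 2512],
    "verb": [3, 125, 40],
    "noun": [351, -1, 7],
}

for name, labels in toy_labels.items():
    bad = out_of_range_labels(labels, N_CLASSES[name])
    print(name, "ok" if not bad else f"out of range: {bad}")
```

If every label is in range, the mismatch may lie elsewhere, for example in the size of the model's output layer. Running the failing step on the CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1, usually gives a more precise error location.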

TSM-based features for EPIC-100

Hello, thanks for your hard work! It really provides great convenience for reimplementation.

I would also like to know whether the TSM-based features for EPIC-100 will be made available for download, as those for Epic-Kitchens are. Thanks a lot.

Generate JSONs for the validation set of ek100

Hi, thank you so much for the awesome work! After getting the results, I tried to generate the JSON files for the validation set of ek100. However, no files appear in the anticipation folder. Could you help me with this? Thank you so much!
command line:
python main_anticipation.py --mode validate_json --json_directory ./jsons/anticipation --ek100 --path_to_data ./RULSTM/data/ek100 --path_to_models ./models_anticipation/ek100/ --modality rgb --num_class 3806 --verb_class 97 --noun_class 300

result:
Save file name anti_mod_rgb_span_6_s1_5_s2_3_s3_2_recent_2_r1_1.6_r2_1.2_r3_0.8_r4_0.4_bs_10_drop_0.3_lr_0.0001_dimLa_512_dimLi_512_epoc_15_vb_nn
Printing Arguments
Namespace(add_noun_loss=True, add_verb_loss=True, alpha=1, batch_size=10, best_model='best', debug_on=False, display_every=10, dropout_rate=0.3, ek100=True, epochs=15, img_tmpl='frame_{:010d}.jpg', json_directory='./jsons/anticipation', latent_dim=512, linear_dim=512, lr=0.0001, modality='rgb', mode='validate_json', noun_class=300, noun_loss_weight=1.0, num_class=3806, num_workers=0, past_attention=True, path_to_data='./RULSTM/data/ek100', path_to_models='./models_anticipation/ek100/', recent_dim=2, recent_sec1=1.6, recent_sec2=1.2, recent_sec3=0.8, recent_sec4=0.4, resume=False, scale=True, scale_factor=-0.5, schedule_epoch=10, schedule_on=1, span_dim1=5, span_dim2=3, span_dim3=2, spanning_sec=6, task='action_anticipation', topK=1, trainval=False, verb_class=97, verb_loss_weight=1.0, verb_noun_scores=True, video_feat_dim=352, weight_flow=0.1, weight_obj=0.25, weight_rgb=0.4, weight_roi=0.25)

However, there are no files in the folder.
Thanks
