
MUSES

This repo holds the code and the models for MUSES, introduced in the paper:
Multi-shot Temporal Event Localization: a Benchmark
Xiaolong Liu, Yao Hu, Song Bai, Fei Ding, Xiang Bai, Philip H.S. Torr
CVPR 2021.

MUSES is a large-scale video dataset designed to spur research on a new task called multi-shot temporal event localization. We present a baseline approach (denoted as MUSES-Net) that achieves SOTA performance on MUSES. It also achieves an mAP of 56.9% on THUMOS14 at IoU=0.5.

The code largely borrows from SSN and P-GCN. Thanks for their great work!

Find more resources (e.g. the annotation file and source videos) on our project page.

Updates

[2022.3.19] Added support for the MUSES dataset. The proposals, models, and source videos of the MUSES dataset are released. Stay tuned for MUSES v2, which includes videos from more countries.
[2021.6.19] Code and the annotation file of MUSES are released. Please find the annotation file on our project page.

Usage Guide

Prerequisites

The code is based on PyTorch. The following environment is required.

Other minor Python modules can be installed by running

pip install -r requirements.txt

The code relies on CUDA extensions. Build them with the following command:

python setup.py develop

After installing all dependencies, run python demo.py for a quick test.

Data Preparation

We support experimenting with THUMOS14 and MUSES. The video features, the proposals and the reference models are provided on OneDrive.

Features and Proposals

  • THUMOS14: The features and the proposals are the same as those used by PGCN. Extract the archive thumos_i3d_features.tar and put the features in the data/thumos14 folder. The proposal files are already included in the repository. We expect the following structure in this folder.

    - data
      - thumos14
        - I3D_RGB
        - I3D_Flow
    
  • MUSES: Extract the archives of features and proposal files.

    # The archive does not have a directory structure
    # We need to create one
    mkdir -p data/muses/muses_i3d_features
    tar -xf muses_i3d_features.tar -C data/muses/muses_i3d_features
    tar -xf muses_proposals.tar -C data/muses

    We expect the following structure in this folder.

    - data
      - muses
        - muses_i3d_features
        - muses_test_proposal_list.txt
        - ...
    

You can also specify the path to the features/proposals in the config files data/cfgs/*.yml.
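
Before launching a run, it can help to confirm that the folders listed above are in place. The following is a minimal sketch, assuming the default layout shown in this section; the helper name missing_paths and the EXPECTED table are hypothetical and not part of this repository.

```python
from pathlib import Path

# Paths copied from the directory listings above.
# The helper itself is hypothetical and not shipped with the repository.
EXPECTED = {
    "thumos14": ["data/thumos14/I3D_RGB", "data/thumos14/I3D_Flow"],
    "muses": [
        "data/muses/muses_i3d_features",
        "data/muses/muses_test_proposal_list.txt",
    ],
}


def missing_paths(dataset, root="."):
    """Return the expected paths for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED[dataset] if not (root / p).exists()]


if __name__ == "__main__":
    for name in EXPECTED:
        missing = missing_paths(name)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

If the paths are overridden in data/cfgs/*.yml, the EXPECTED table would need to be adjusted to match.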

Reference Models

Put the reference_models folder in the root directory of this code:

 - reference_models
   - muses.pth.tar
   - thumos14_flow.pth.tar
   - thumos14_rgb.pth.tar

Testing Trained Models

You can test the reference models by running a single script

bash scripts/test_reference_models.sh DATASET

Here DATASET should be thumos14 or muses.

Using these models, you should get the following performance

MUSES

IoU       0.3    0.4    0.5    0.6    0.7    Average
mAP (%)   26.5   23.1   19.7   14.8   9.5    18.7

Note: We re-train the network on MUSES and the performance is higher than that reported in the paper.

THUMOS14

mAP (%) at each IoU threshold:

Modality   0.3     0.4     0.5     0.6     0.7     Average
RGB        60.14   54.93   46.38   34.96   21.69   43.62
Flow       64.64   60.29   53.93   42.84   29.70   50.28
R+F        68.93   63.99   56.85   46.25   30.97   53.40
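
The thresholds 0.3 to 0.7 in the tables above are temporal IoU values between a predicted segment and a ground-truth segment. As a rough illustration, the sketch below uses the standard interval IoU definition; the function names are hypothetical, and the evaluation code in eval.py may handle details such as matching each ground truth only once.

```python
def temporal_iou(seg_a, seg_b):
    """Intersection over union of two (start, end) intervals, e.g. in seconds."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Overlap length is zero when the intervals do not intersect.
    inter = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, gt, threshold=0.5):
    """A prediction counts as a hit at `threshold` if its IoU reaches it."""
    return temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    pred, gt = (10.0, 20.0), (12.0, 22.0)
    # Overlap is 8 s and union is 12 s, so the IoU is 2/3.
    print(temporal_iou(pred, gt))
    print(is_hit(pred, gt, 0.5), is_hit(pred, gt, 0.7))
```

In this example the prediction is a hit at thresholds 0.5 and 0.6 but not at 0.7, which is why mAP drops as the threshold rises.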

The testing process consists of two steps, detailed below.

  1. Extract detection scores for all the proposals by running
python test_net.py DATASET CHECKPOINT_PATH RESULT_PICKLE --cfg CFG_PATH

Here, RESULT_PICKLE is the path where we save the detection scores. CFG_PATH is the path of config file, e.g. data/cfgs/thumos14_flow.yml.

  2. Evaluate the detection performance by running
python eval.py DATASET RESULT_PICKLE --cfg CFG_PATH
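
The two steps above can be chained from Python. This is a minimal sketch that only assembles the two documented commands; the helper names are hypothetical, and the result path outputs/scores/thumos14_flow is an illustrative choice rather than a required location.

```python
import shlex
import subprocess


def build_test_commands(dataset, checkpoint, result_pickle, cfg):
    """Assemble the two commands documented above (scoring, then evaluation)."""
    score_cmd = ["python", "test_net.py", dataset, checkpoint, result_pickle, "--cfg", cfg]
    eval_cmd = ["python", "eval.py", dataset, result_pickle, "--cfg", cfg]
    return [score_cmd, eval_cmd]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    cmds = build_test_commands(
        "thumos14",
        "reference_models/thumos14_flow.pth.tar",
        "outputs/scores/thumos14_flow",  # illustrative output path
        "data/cfgs/thumos14_flow.yml",
    )
    run_all(cmds)  # dry run: prints the commands without executing them
```

Passing dry_run=False runs both steps in order from the repository root.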

On THUMOS14, we need to fuse the detection scores from the RGB and Flow modalities. This can be done by running

python eval.py DATASET RESULT_PICKLE_RGB RESULT_PICKLE_FLOW --score_weights 1 1.2 --cfg CFG_PATH_RGB
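
To give an intuition for --score_weights, here is a minimal sketch of two-stream fusion. It assumes that fusion is a per-proposal weighted sum and that the two weights apply to RGB and Flow in the order the pickles are given; both points are assumptions, and eval.py may organize or normalize the scores differently. The function name fuse_scores is hypothetical.

```python
def fuse_scores(rgb_scores, flow_scores, weights=(1.0, 1.2)):
    """Weighted sum of per-proposal scores from two modalities (assumed scheme)."""
    if len(rgb_scores) != len(flow_scores):
        raise ValueError("both modalities must score the same proposals")
    w_rgb, w_flow = weights
    return [w_rgb * r + w_flow * f for r, f in zip(rgb_scores, flow_scores)]


if __name__ == "__main__":
    rgb = [0.5, 0.2]   # toy scores for two proposals
    flow = [0.5, 1.0]
    print(fuse_scores(rgb, flow))
```

With a weight above 1 on Flow, proposals that the Flow model scores highly gain more in the fused ranking, consistent with Flow being the stronger single modality in the table above.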

Training

Train your own models with the following command

python train_net.py DATASET --cfg CFG_PATH --snapshot_pref SNAPSHOT_PREF --epochs MAX_EPOCHS

SNAPSHOT_PREF: the path where trained models and logs are saved, e.g. outputs/snapshpts/thumos14_rgb/.

We provide a script that finishes all steps, including training, testing, and two-stream fusion. Run the script with the following command

bash scripts/do_all.sh DATASET

Note: The results may vary between runs and differ from those of the reference models. We encourage using the average mAP as the primary metric, as it is more stable than mAP@0.5.
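
The Average column in the result tables is consistent with a plain mean of the mAP values over the five IoU thresholds 0.3 to 0.7. The sketch below reproduces it from the reported numbers; the function name average_map is hypothetical.

```python
def average_map(map_per_threshold):
    """Mean of the mAP values over all listed IoU thresholds."""
    values = list(map_per_threshold.values())
    return sum(values) / len(values)


if __name__ == "__main__":
    # Per-threshold mAP (%) copied from the tables in this README.
    muses = {0.3: 26.5, 0.4: 23.1, 0.5: 19.7, 0.6: 14.8, 0.7: 9.5}
    thumos14_fused = {0.3: 68.93, 0.4: 63.99, 0.5: 56.85, 0.6: 46.25, 0.7: 30.97}
    print(f"MUSES average mAP: {average_map(muses):.1f}")
    print(f"THUMOS14 R+F average mAP: {average_map(thumos14_fused):.2f}")
```

Averaging over several thresholds smooths out the run-to-run variation of any single threshold.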

Citation

Please cite the following paper if you find MUSES useful in your research

@InProceedings{Liu_2021_CVPR,
    author    = {Liu, Xiaolong and Hu, Yao and Bai, Song and Ding, Fei and Bai, Xiang and Torr, Philip H. S.},
    title     = {Multi-Shot Temporal Event Localization: A Benchmark},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12596-12606}
}

Related Projects

  • TadTR: Efficient temporal action detection (localization) with Transformer.

Contact

For questions and suggestions, file an issue or contact Xiaolong Liu at "liuxl at hust dot edu dot cn".

muses's Issues

Inference on single video

Hello! I am getting started with research in this area. Is there any code for running inference on a single video?

MUSES proposals and cfgs

Fantastic work! I'd like to know is there a rough timeline for the release of MUSES proposals and cfgs?

Code for ActivityNet1.3

Are you planning to release code for ActivityNet1.3 dataset ? If yes, by when can we expect it?

When will this repository support MUSES?

Thanks for your awesome work!
I'd like to know when will this repository support MUSES. Or train the model in an end-to-end manner (including I3D backbone!)
Thanks again~

Error on running eval.py

Exact command used:
python eval.py thumos14 outputs/scores/thumos14_flow -j8 --cfg data/cfgs/thumos14_flow.yml --nms_threshold 0.4|tee -a outputs/eval.log

train code

Can the code be released on time in May

Are you going to release the source video dataset?

First thanks for your work.

As you mentioned MUSES is a large-scale video dataset, but a few months passed, it seems you don't have plan to release the source video dataset.

You said you provide the annotation info, but if you don't provide the video dataset, how can the annotations be used? How can we visualize the segmentation results on the original videos?

question for proposal generation

Hi, I'm trying to apply your Muses model to the Finegym dataset.

When I want to train your model using another dataset, I realized that I need a list of 100 candidate proposals.

I am referring to the supplementary of the paper for the details of proposal extraction. I am trying to perform proposal generation. However, I noticed that the multi-stage cnn used to generate proposals on the Muses dataset generates two final outputs for linear classification, whereas the final outputs for linear classification presented in the MUSES paper are three.

Therefore, if I want to train a binary classifier as suggested in the MUSES paper, how can I prepare the label values?
Also, what do the three final outputs mean?

I would be grateful if you could answer my questions.
