
CoLA: Weakly-Supervised Temporal Action Localization

PyTorch Implementation of paper accepted by CVPR'21:

CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning

Can Zhang, Meng Cao, Dongming Yang, Jie Chen and Yuexian Zou*.

[pdf][ArXiv]

Updates

  • [14 Feb 2022]

    • We have released the features and codebase of our CoLA on the ActivityNet v1.2 dataset here.
  • [21 July 2021]

    • We have released the codebase and models of our CoLA.
    • Note that we have fine-tuned some hyper-parameter settings, so the experimental results are better (+2.1% mAP@0.5, +0.8% mAP@AVG) than in the original paper! Details are as follows:
    | CoLA (mAP@tIoU, %) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | AVG |
    |---|---|---|---|---|---|---|---|---|
    | original paper | 66.2 | 59.5 | 51.5 | 41.9 | 32.2 | 22.0 | 13.1 | 40.9 |
    | this codebase | 66.1 | 60.0 | 52.1 | 43.1 | 34.3 | 23.5 | 13.1 | 41.7 |
    | gain (Δ) | -0.1 | +0.5 | +0.6 | +1.2 | +2.1 | +1.5 | 0.0 | +0.8 |
    • [Results Reproducible] You can get the above results without changing any line of our code.

Content

Dependencies

Please make sure Python>=3.6 is installed (Anaconda3 is recommended).

Required packages are listed in requirements.txt. You can install them by running:

pip install -r requirements.txt

Code and Data Preparation

  1. Get the code. Clone this repo with git:

    • For THUMOS'14 experiments:

      git clone https://github.com/zhang-can/CoLA
      
    • For ActivityNet experiments:

      git clone -b anet12 https://github.com/zhang-can/CoLA
      
  2. Prepare the features.

    • Here, we provide the two-stream I3D features for THUMOS'14. You can download them from Google Drive or Weiyun.
    • (ActivityNet v1.2 features are available here.)
    • Unzip the downloaded features into the data folder. Make sure the data structure is as below.
    ├── data
    │   └── THUMOS14
    │       ├── gt.json
    │       ├── split_train.txt
    │       ├── split_test.txt
    │       └── features
    │           ├── ...
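    As a quick sanity check (my own addition, not part of the official setup), the snippet below simply verifies that the expected files from the layout above are in place before you start training:

    import os

    # Hypothetical sanity check for the THUMOS'14 data layout described above.
    root = os.path.join('data', 'THUMOS14')
    expected = ['gt.json', 'split_train.txt', 'split_test.txt', 'features']
    for name in expected:
        path = os.path.join(root, name)
        print(f"{path}: {'OK' if os.path.exists(path) else 'MISSING'}")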

Training

You can use the following command to train CoLA:

python main_cola.py train

After training, you will get the results listed in this table.

Testing

You can evaluate a trained model by running:

python main_cola.py test MODEL_PATH

Here, MODEL_PATH denotes the path of the trained model.

This script will report the localization performance in terms of mean average precision (mAP) at different tIoU thresholds.
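For reference, the tIoU between a predicted segment and a ground-truth segment is the usual intersection-over-union computed on the time axis; a detection counts as correct at threshold t if its tIoU with a matching ground-truth instance is at least t. A minimal sketch of that metric (my own illustration, not the evaluation code shipped with this repo):

def temporal_iou(pred, gt):
    """tIoU of two segments given as (start_sec, end_sec) tuples."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# Example: segments (2s, 8s) and (3s, 9s) overlap for 5s out of 7s total.
print(temporal_iou((2.0, 8.0), (3.0, 9.0)))  # ~0.714

The mAP@AVG figure in the table above is the mean over the thresholds 0.1:0.1:0.7.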

You can download our trained model from Google Drive or Weiyun.

Other Info

References

This repository is inspired by the following baseline implementations for the WS-TAL task.

Citation

Please [★star] this repo and [cite] the following paper if you find our CoLA useful for your research:

@InProceedings{zhang2021cola,
    author    = {Zhang, Can and Cao, Meng and Yang, Dongming and Chen, Jie and Zou, Yuexian},
    title     = {CoLA: Weakly-Supervised Temporal Action Localization With Snippet Contrastive Learning},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {16010-16019}
}

Contact

For any questions, please feel free to open an issue or contact:

Can Zhang: [email protected]


cola's Issues

RuntimeError: Expected a 'cpu' device type for generator but found 'cuda'

Hello, how can I solve this problem?

Traceback (most recent call last):
  File "main_cola.py", line 207, in <module>
    main()
  File "main_cola.py", line 86, in main
    loader_iter = iter(train_loader)
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 944, in __init__
    self._reset(loader, first_iter=True)
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 975, in _reset
    self._try_put_index()
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1209, in _try_put_index
    index = self._next_index()
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "/home/linux01/.local/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 124, in __iter__
    yield from torch.randperm(n, generator=torch.Generator(device='cuda')
RuntimeError: Expected a 'cpu' device type for generator but found 'cuda'
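
One common workaround reported for this kind of error (not an official fix from the authors) is to keep the DataLoader's index-shuffling generator on the CPU; a hedged sketch, assuming a standard torch.utils.data.DataLoader is constructed in main_cola.py:

import torch
from torch.utils.data import DataLoader

# Workaround sketch: pass an explicit CPU generator so RandomSampler does not
# pick up a CUDA generator for index shuffling. The dataset name and batch
# size below are placeholders, not the repo's actual values.
cpu_gen = torch.Generator(device='cpu')
train_loader = DataLoader(
    train_dataset,        # hypothetical dataset object
    batch_size=16,        # illustrative value only
    shuffle=True,
    num_workers=4,
    generator=cpu_gen,    # keeps torch.randperm in the sampler on the CPU
)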

Number of Training Epoch and Time

Thanks for your great work!

I notice that the numbers of training epochs are 6k and 8k for the two datasets, respectively. That's quite a large number. I wonder why it takes so many epochs to train? Do you have an ablation study on the effect of the number of training epochs?

Besides, could you please provide your training times?

Thanks!

Discrepancy between feature_fps and original_fps

Hi, thanks for releasing your work.
I found myself having trouble relating the original videos to the extracted features.
In your code, cfg.FEATS_FPS = 25, yet the original videos seem to have an fps of 30.

From the paper, I can see that one snippet consists of 16 frames,
and I understand that this is where the t_factor formula in utils.py comes from:
-> t_factor = (16 * v_len) / (scale * num_segments * sampling_frames)

BUT when I run the code, for example on test_video_000004, the video has 1,011 frames while the extracted feature has only 52 segments (RGB feature size of 52 x 1024)...

Can you please explain what is going on between the feature extractor and your model?
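
A rough back-of-the-envelope check (my own assumption, not an authoritative answer): if the 30-fps video is resampled to the 25-fps feature rate before I3D extraction and each snippet covers 16 consecutive frames, the segment count roughly matches.

# Illustrative arithmetic only; the exact extraction pipeline may differ.
num_video_frames = 1011    # frames at the original 30 fps
original_fps = 30
feature_fps = 25           # cfg.FEATS_FPS in this repo
frames_per_snippet = 16

duration_sec = num_video_frames / original_fps        # ~33.7 s
resampled_frames = duration_sec * feature_fps         # ~842.5 frames at 25 fps
num_snippets = int(resampled_frames // frames_per_snippet)
print(num_snippets)        # 52, matching the 52 x 1024 RGB feature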

Reproduce the results of Activitynet1.2

@zhang-can Thanks for sharing the training code for ActivityNet v1.2. However, when I try to reproduce the training results on ActivityNet v1.2 (I use the features from https://github.com/sujoyp/wtalc-pytorch/tree/master since the features in the CoLA repo require verification to download), I only get mIoU(avg) around 3 with the configuration of branch anet1.2. It seems that the SniCo loss does not decrease at all. I also find that the numbers of hard background and hard action snippets are sometimes less than the number we set (the non-zero elements in aness_region_inner and aness_region_outer are fewer than k_hard). Is that normal?

By the way, I can reproduce the results on THUMOS'14 easily, so the problem only occurs on ActivityNet v1.2. Would you please check the released code, or provide a checkpoint and training log, to help us get things right?

Question about a part of the code

Hello,

Thanks for your great work.

I have a question:

Why do you use zeros as labels in the loss function and not the original labels?

I am talking about this part of the NCE function:

labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()
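
For context, this zero-label pattern is the standard InfoNCE convention (as in MoCo-style contrastive learning): the positive similarity is placed in column 0 of the logits, so cross-entropy with an all-zero target pulls each anchor toward its positive and away from the negatives. A minimal sketch of that convention with made-up shapes (not the repo's exact code):

import torch
import torch.nn.functional as F

# q: anchor embeddings, k_pos: their positives, k_neg: a bank of negatives.
q = F.normalize(torch.randn(8, 128), dim=1)            # (N, C)
k_pos = F.normalize(torch.randn(8, 128), dim=1)        # (N, C)
k_neg = F.normalize(torch.randn(8, 128, 32), dim=1)    # (N, C, K)

l_pos = torch.einsum('nc,nc->n', q, k_pos).unsqueeze(-1)  # (N, 1) positive logit
l_neg = torch.einsum('nc,nck->nk', q, k_neg)              # (N, K) negative logits
logits = torch.cat([l_pos, l_neg], dim=1) / 0.07          # temperature-scaled

# Column 0 always holds the positive pair, so the target "class" is 0 for every row.
labels = torch.zeros(logits.shape[0], dtype=torch.long)
loss = F.cross_entropy(logits, labels)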
