mcg-nju / rtd-action

[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

License: Apache License 2.0

Python 99.06% Shell 0.94%

transformer temporal-action-localization temporal-action-proposals

rtd-action's Introduction

RTD-Net (ICCV 2021)

This repo holds the code for the paper "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted at ICCV 2021.

News

[2022.4.4] We release the code, checkpoint, and features for ActivityNet-1.3.
[2021.8.17] We release the code, checkpoint, and features for THUMOS14.

Overview

This paper presents a simple, end-to-end learnable framework (RTD-Net) for direct action proposal generation, built by re-purposing a Transformer-like architecture. Thanks to the parallel decoding of multiple proposals with explicit context modeling, RTD-Net outperforms the previous state-of-the-art methods on the temporal action proposal generation task on THUMOS14 and also yields superior performance for action detection on this dataset. In addition, because it needs no NMS post-processing, our detection pipeline is more efficient than previous methods.

Dependencies

Python 3.7 or higher
PyTorch 1.6 or higher
Torchvision
Numpy 1.19.2

Data Preparation

To reproduce the results on THUMOS14 without further changes:

Download the data from GoogleDrive.
Place I3D_features and TEM_scores into the folder data.
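
A quick sanity check of the folder layout can catch path mistakes before training starts. The sketch below assumes the two folders sit directly under data, as described in the steps above; the helper name missing_data_dirs is hypothetical and is not part of the repo.

```python
from pathlib import Path

# Folder names taken from the steps above; the helper itself is illustrative.
REQUIRED_DIRS = ("I3D_features", "TEM_scores")


def missing_data_dirs(data_root="data"):
    """Return the required sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


# An empty list means both folders were found.
print(missing_data_dirs())
```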

Checkpoint

Dataset	AR@50	AR@100	AR@200	AR@500	checkpoint
THUMOS14	41.52	49.33	56.41	62.91	link
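
For readers unfamiliar with the metric, AR@N is the recall achieved with N proposals, averaged over a set of temporal IoU (tIoU) thresholds. The sketch below is illustrative only and is not the repo's evaluation code: it scores the top N proposals of a single video, the threshold set 0.5 to 1.0 in steps of 0.05 is an assumption based on common THUMOS14 practice, and the function names are hypothetical.

```python
def temporal_iou(a, b):
    """tIoU between two segments given as (start, end) in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(proposals, ground_truths, top_n, thresholds=None):
    """Recall of the top_n proposals, averaged over tIoU thresholds.

    proposals: list of (start, end, score)
    ground_truths: list of (start, end)
    """
    if thresholds is None:
        # Assumed threshold set: 0.5, 0.55, ..., 1.0
        thresholds = [round(0.5 + 0.05 * i, 2) for i in range(11)]
    if not ground_truths:
        return 0.0
    # Keep the top_n highest-scoring proposals
    top = sorted(proposals, key=lambda p: p[2], reverse=True)[:top_n]
    # Best tIoU achieved for each ground-truth segment
    best = [
        max((temporal_iou(gt, p[:2]) for p in top), default=0.0)
        for gt in ground_truths
    ]
    # Fraction of ground truths recalled at each threshold
    recalls = [sum(b >= t for b in best) / len(best) for t in thresholds]
    return sum(recalls) / len(recalls)
```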

Training

Use train.sh to train RTD-Net.


# First stage

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11323 --use_env main.py --window_size 100 --batch_size 32 --stage 1 --num_queries 32 --point_prob_normalize

# Second stage for relaxation mechanism

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11324 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-5 --stage 2 --epochs 10 --lr_drop 5 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth

# Third stage for completeness head

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --load outputs/checkpoint_best_sum_ar.pth

Testing

Use test.sh to run inference.

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=11325 --use_env main.py --window_size 100 --batch_size 32 --lr 1e-4 --stage 3 --epochs 20 --num_queries 32 --point_prob_normalize --eval --resume outputs/checkpoint_best_sum_ar.pth

References

We especially thank the contributors of the BSN, G-TAD and DETR for providing helpful code.

Citations

If you think our work is helpful, please feel free to cite our paper.

@InProceedings{Tan_2021_RTD,
    author    = {Tan, Jing and Tang, Jiaqi and Wang, Limin and Wu, Gangshan},
    title     = {Relaxed Transformer Decoders for Direct Action Proposal Generation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {13526-13535}
}

Contact

For any questions, please file an issue or contact:

Jing Tan: [email protected]
Jiaqi Tang: [email protected]


rtd-action's Issues

about anet code

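Hello, last year I reproduced the RTD code on the THUMOS14 dataset, but when I run the ANet code in the same environment I get an environment-related error: nccl278 error. I just noticed that in util/misc.py of the ANet code, the call torch.distributed.init_process_group(
backend=args.dist_backend,
init_method=args.dist_url,
world_size=args.world_size,
rank=args.rank,
) has a comma after its last argument (rank=args.rank,), whereas the THUMOS14 code does not. Is this extra comma a mistake?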
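
On the syntax question itself: Python allows a trailing comma after the last argument of a call, so the two spellings pass identical arguments. The small check below uses a hypothetical stand-in function rather than torch.distributed.init_process_group, so it runs without a GPU or NCCL, and the argument values are placeholders. It suggests the comma is unlikely to be what triggers the NCCL error.

```python
def init_process_group_stub(backend, init_method, world_size, rank):
    """Hypothetical stand-in that just echoes the arguments it received."""
    return (backend, init_method, world_size, rank)


# Call written with a trailing comma after the last argument
with_comma = init_process_group_stub(
    backend="nccl",
    init_method="env://",
    world_size=2,
    rank=0,
)

# Same call without the trailing comma
without_comma = init_process_group_stub(
    backend="nccl",
    init_method="env://",
    world_size=2,
    rank=0
)

# Both calls receive exactly the same arguments
print(with_comma == without_comma)
```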

how to change the stride of THUMOS features

Hi,

Thanks for making the code public. I am interested in changing the temporal resolution of the current model, but there are too many hyperparameters to tune. Which one actually changes the resolution? If I change the window size from 100 to 70, evaluation fails.

label issue

I am confused about the labels. In your code, the ground-truth label is 0, which represents an action. Does this mean the label output by the network is 0 for action, rather than 0 for background? If the predicted label is 0, does it match the ground truth?

about thumos14 annotation

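Hello, I noticed that in the thumos14_anno__action.json and thumos14_anno_action_class_idx.json files you provide, the duration_second and fps of video_test_0001459 are inconsistent with those in your thumos14_test_groundtruth.csv. Comparing against G-TAD, the values in thumos14_test_groundtruth.csv appear to be the correct ones. Is this an error you overlooked? I have only checked this one video so far, but I found that some THUMOS14 videos are not 30 fps, whereas thumos14_anno__action.json and thumos14_anno_action_class_idx.json set the frame rate of every video to 30 fps.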
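
To illustrate why a hard-coded 30 fps matters: converting a frame index to seconds with the wrong frame rate shifts every timestamp proportionally. The numbers below are made up for illustration and are not taken from the annotation files.

```python
def frame_to_seconds(frame_index, fps):
    """Convert a frame index to a timestamp in seconds."""
    return frame_index / fps


frame = 3000          # illustrative frame index
true_fps = 25.0       # illustrative actual frame rate of the video
assumed_fps = 30.0    # frame rate hard-coded in the annotation

t_true = frame_to_seconds(frame, true_fps)        # 120.0 s
t_assumed = frame_to_seconds(frame, assumed_fps)  # 100.0 s

# The same frame lands 20 seconds apart under the two frame rates
print(t_true - t_assumed)
```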

How can I get a multi-scale proposal

Another question.
In my experiments, the model generates proposals at different positions by sliding the window, but the proposal lengths are basically the same. Is this due to the positional embedding setting or to something else? I added the start and end score convolutions to the overall model and trained them jointly. @tony2016uestc

One question about the boundary attentive module

Hi~
Are the input features and start/end scores pretrained in advance?

about RTD in Anet

Hello, I see that your paper combines the proposals generated by RTD on ActivityNet-1.3 with UntrimmedNet to get the detection results. Can you provide the video classification results or the UntrimmedNet model trained on ActivityNet-1.3?

WARNING:torch.distributed.elastic.agent.server.api:Received 2 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4797 closing signal SIGINT

[2024-02-22 19:20:17,149][datasets.builder][WARNING] - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-70dc00f935d3701b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 138.50it/s]
[2024-02-22 19:20:22,163][torch.nn.parallel.distributed][INFO] - Reducer buckets have been rebuilt in this iteration.

My model got stuck here. After I interrupted it, it showed the output below. Is this problem caused by the code or by the environment?

WARNING:torch.distributed.elastic.agent.server.api:Received 2 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4797 closing signal SIGINT
Traceback (most recent call last):
File "main.py", line 84, in <module>
main()
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
_run_hydra(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
_run_app(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
run_and_report(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
return func()
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
lambda: hydra.run(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 119, in run
ret = run_job(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "main.py", line 77, in main
train(model, train_dataloader, validation_dataloader, test_dataloader, accelerator,
File "/root/Biot5/biot5/utils/train_utils.py", line 289, in train
accelerator.backward(loss / args.optim.grad_acc)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/accelerate/accelerator.py", line 1966, in backward
loss.backward(**kwargs)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
KeyboardInterrupt
Traceback (most recent call last):
File "/root/anaconda3/envs/biot5/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
result = agent.run()
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
result = f(*args, **kwargs)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
result = self._invoke_run(role)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 864, in _invoke_run
time.sleep(monitor_interval)
File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 4763 got signal: 2

a problem

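Hello, training the ANet code takes too long on my GPU: the whole pipeline needs about 5 days, which makes trial and error too costly for me.
I would like to ask the following. ActivityNet-1.2 is a subset of ActivityNet-1.3. If I train RTD on ActivityNet-1.2 by extracting, from the ActivityNet-1.3 video features, the features of the videos that belong to ActivityNet-1.2, would that be acceptable? At present the TSN features are only available for ActivityNet-1.3, not for ActivityNet-1.2. I also compared the annotation files of ActivityNet-1.2 and ActivityNet-1.3: every ActivityNet-1.2 video is present in ActivityNet-1.3 with the same duration and the same number of annotations, except for three videos that have one additional annotation in ActivityNet-1.3.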
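
If the ActivityNet-1.2 videos are indeed all present in ActivityNet-1.3 as described, selecting the matching features amounts to a lookup by video ID. The sketch below assumes the features are held in a dict keyed by video ID; the actual file formats may differ, and the function name and sample IDs are hypothetical.

```python
def select_subset_features(features_13, video_ids_12):
    """Pick the v1.3 features whose video IDs appear in the v1.2 ID list.

    Returns the selected features and a sorted list of IDs that were
    requested but not found, so gaps are visible rather than silent.
    """
    subset = {v: features_13[v] for v in video_ids_12 if v in features_13}
    missing = sorted(v for v in video_ids_12 if v not in features_13)
    return subset, missing


# Toy data with made-up IDs
features_13 = {"v_a": [0.1, 0.2], "v_b": [0.3, 0.4], "v_c": [0.5, 0.6]}
subset, missing = select_subset_features(features_13, ["v_a", "v_c", "v_x"])
print(sorted(subset), missing)
```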

from datasets import build_dataset ;from models import build_model

from datasets import build_dataset
from models import build_model
I don't see these two files in these two folders. Where is the source code for the two functions build_dataset and build_model?

temporal action detection

Could you explain how to perform temporal action detection with the generated proposals? What was modified in the model? It would be great if you could share the code!

Results

I followed the steps on GitHub to train and test, but the results I get differ from those in the paper. Is there something I have overlooked?

Looking forward to your reply!

ActivityNet

Can you provide the code of ActivityNet dataset?

A question about reproducibility~

Thank you very much for open-sourcing the code.

I have a question about training: why does each training run still differ after the random seed is set? This makes comparisons subject to random factors.
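
One common reason is that seeding a single random number generator does not cover all of them, and some GPU kernels are nondeterministic even with fixed seeds. The sketch below is a generic helper, not code from this repo; the name seed_everything is hypothetical, and NumPy and PyTorch are seeded only when they are installed. Even with these settings, bitwise-identical runs are not guaranteed on every GPU and PyTorch version.

```python
import random


def seed_everything(seed):
    """Seed the common RNGs; NumPy and PyTorch are seeded only if installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Ask cuDNN for deterministic kernels and disable autotuning,
        # which can otherwise pick different algorithms between runs.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```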

about activitynet feature

Can you provide the ActivityNet-1.3 features rescaled to 100 via linear interpolation? The link you provide is not accessible, and the other link is hard to download from.
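
Until such features are available, rescaling a variable-length feature sequence to a fixed length of 100 can be done locally. The sketch below is a pure-Python illustration under assumptions: features are a list of T per-snippet vectors, and interpolation is endpoint-aligned (first and last snippets are preserved). The exact convention the authors used is not stated here, so results may not match their files; the function name is hypothetical.

```python
def rescale_features(features, target_len=100):
    """Linearly interpolate a (T, C) feature sequence to target_len steps."""
    src_len = len(features)
    if src_len == 0:
        raise ValueError("empty feature sequence")
    if src_len == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first vector
        return [list(features[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        # Position of output step i on the source time axis
        pos = i * (src_len - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        w = pos - lo
        # Weighted blend of the two neighbouring source vectors
        out.append([(1 - w) * a + w * b
                    for a, b in zip(features[lo], features[hi])])
    return out


# A 2-step sequence stretched to 3 steps gains the midpoint
print(rescale_features([[0.0], [10.0]], target_len=3))
```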

ANet

I want to know which feature files you used.

I used the anet_i3d_feature_25fps/flow(rgb)-resize-step16 features and tested with your checkpoint_best_auc.pth, but these are the results I get.

Hello, may I ask: in the paper, training is carried out in two stages, but in the code it is carried out in three stages. Why? How do they correspond to each other?

On the time of the code release

Hi @tony2016uestc ,
Thanks for your wonderful work. I plan to follow your work, and try out Action Proposal Generation. But, it is hard for me to implement your approach on my own.
When would you release your code? I am going to use your code to accelerate my reproduction.
Thanks again :)

How do we get TEM_scores?

Hi, thanks for your wonderful work. If I apply this work to my own data, how do I obtain the TEM_scores, such as the start and end scores?

mAP evaluation

Hi, thanks for your interesting work! I'm wondering when would you share the evaluation code for calculating the Mean Average Precision (mAP)? Thanks so much!

about TEM

Hi, can I get the BSN code for THUMOS14?

a question for training

Can the GPU handle the training?

torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0

I got this error when running the code and really have no idea what went wrong.
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/1/error.json
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 161242) of binary: /home/10601006/apps/anaconda3/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:

when will you publish the code?

Good job!! I like it.
When will you publish the code?
Thanks

I have a problem

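I found that the sliding-window size in RTD is constrained. During validation, each batch must contain at least one ground truth, which limits the sliding-window size. But validation is run like testing, so there is no guarantee that at least one ground truth corresponds to each batch. Should the code be modified so that the loss is computed differently when the sliding windows in a batch have no matching ground truth? When I previously ran G-TAD on another dataset, I also hit a case where training was fine but the validation loss became NaN. The cause was likewise that no ground truth matched, which produced a zero denominator. I handled that special case by setting the denominator to 1.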
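
The workaround described above, clamping the denominator to at least 1, can be sketched as follows. This is a plain-Python illustration, not the repo's loss code; the function name is hypothetical. With tensors, dividing a zero sum by zero yields NaN, while in plain Python it raises ZeroDivisionError; the clamp avoids both.

```python
def normalized_loss(per_match_losses, num_ground_truths):
    """Sum of per-match losses divided by the number of ground truths.

    The denominator is clamped to at least 1, so a window with no
    ground truth returns 0.0 instead of dividing by zero.
    """
    denom = max(num_ground_truths, 1)
    return sum(per_match_losses) / denom


print(normalized_loss([1.0, 3.0], 2))  # regular case
print(normalized_loss([], 0))          # window without ground truth
```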

RTD code for ActivityNet-1.3

Can the RTD-Action code for ActivityNet-1.3 be adapted from the RTD-Action code for THUMOS14?

TEM_scores

What do the action, start, and end values in the TEM_scores files mean?

Thank you for your reply.