mcg-nju / rtd-action

[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

License: Apache License 2.0

Python 99.06% Shell 0.94%

transformer temporal-action-localization temporal-action-proposals

rtd-action's Issues

How do we get TEM_scores?

Hi, thanks for your wonderful work. If I apply this work to my own data, how do I get the TEM_scores, such as the start score and end score?

RTD code for ActivityNet-1.3

Can the RTD-Action code for ActivityNet-1.3 be adapted from the RTD-Action code for THUMOS14?

How can I get a multi-scale proposal

Asking another question.
In my experiments, the model generates proposals at different positions by sliding the window, but the proposals all have roughly the same length. Is this caused by the positional embedding setting, or by something else? I added the start and end score convolutions to the overall model so that they are trained together. @tony2016uestc

torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0

I got this error when running the code, and I really don't know what went wrong.
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_0p4sbyi9/none_egcn9ob1/attempt_0/1/error.json
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
QStandardPaths: XDG_RUNTIME_DIR points to non-existing path '/run/user/1065', please create it with 0700 permissions.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
qt.qpa.screen: QXcbConnection: Could not connect to display localhost:11.0
Could not connect to any X display.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 161242) of binary: /home/10601006/apps/anaconda3/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:

a question for training

Is a GPU sufficient for training?

A question about reproducibility~

Thank you very much for your open source.

I have a training question: why is each training run still different after setting the random seed? This means comparisons between runs are affected by random factors.
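As a hedged illustration of what "setting the random seed" usually has to cover, here is a minimal sketch. It is not the authors' answer and not taken from the RTD code; `seed_everything` is a hypothetical helper, and NumPy and PyTorch are treated as optional so the snippet also runs without them.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed every RNG that is available."""
    random.seed(seed)
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # cuDNN autotuning can select different kernels from run to run.
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
    except ImportError:
        pass


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)  # True
```

Even with all of this in place, some CUDA operations are nondeterministic by design, and data-loader workers and multi-GPU training can add further variation, so small run-to-run differences may remain.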

from datasets import build_dataset ;from models import build_model

from datasets import build_dataset
from models import build_model
I don't see these two files in these two folders. Where is the source code for the two functions build_dataset and build_model?

label issue

I am confused about the labels. In your code, the ground-truth label is 0, which represents an action. Can I assume that a label of 0 output by the network means action rather than background? If the predicted label is 0, does it match the ground truth?

I have a problem

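I found that the size of RTD's sliding window is constrained. During validation, each batch must contain at least one ground truth, which requires limiting the sliding window size. But validation is run as a test, so there is no guarantee that at least one ground truth corresponds to each window. Does the code need to be modified so that the loss is computed differently when the sliding windows in a batch have no matching ground truth? Previously, when I used GTAD on other datasets, I also hit a problem where training was normal but the validation loss became NaN. The cause was the same: no ground truth matched, so a denominator became 0. I handled this special case by setting the denominator to 1.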
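The workaround described above (forcing the denominator to at least 1 when a batch has no ground truth) can be sketched as follows. This is an illustration in plain Python, not the RTD or GTAD loss code; `normalized_loss` and its arguments are hypothetical names.

```python
def normalized_loss(per_proposal_losses, num_gt):
    """Sum the losses and divide by the ground-truth count, clamped to >= 1."""
    # Without the clamp, num_gt == 0 raises ZeroDivisionError here;
    # with tensors the same 0 / 0 division typically yields NaN instead.
    denom = max(num_gt, 1)
    return sum(per_proposal_losses) / denom


print(normalized_loss([0.5, 1.5], 2))  # 1.0
print(normalized_loss([0.0, 0.0], 0))  # 0.0
```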

about anet code

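Hello, last year I reproduced the RTD code on the THUMOS14 dataset, but when I run the ANet code in the same environment I get an environment-related error, nccl278 error. I just noticed that in the file util/misc.py of the ANet code, the call torch.distributed.init_process_group(
backend=args.dist_backend,
init_method=args.dist_url,
world_size=args.world_size,
rank=args.rank,
) has a comma after its last argument (rank=args.rank,), whereas the THUMOS14 code does not. Is this extra comma a mistake?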
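On the comma itself: Python allows a trailing comma after the last argument of a call, and it does not change what the call does. The sketch below demonstrates this with `fake_init_process_group`, a hypothetical stand-in that only echoes its arguments (it is not the real torch.distributed function), and with made-up argument values.

```python
def fake_init_process_group(backend, init_method, world_size, rank):
    """Hypothetical stand-in: return the arguments it was called with."""
    return (backend, init_method, world_size, rank)


without_comma = fake_init_process_group(
    backend="nccl",
    init_method="env://",
    world_size=2,
    rank=0
)
with_comma = fake_init_process_group(
    backend="nccl",
    init_method="env://",
    world_size=2,
    rank=0,
)
print(without_comma == with_comma)  # True
```

Since both forms produce the same call, the trailing comma is unlikely to explain an NCCL error.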

ActivityNet

Can you provide the code of ActivityNet dataset?

how to change the stride of THUMOS features

Hi,

Thanks for making the code public. I am interested in changing the temporal resolution of the current model, but there are too many hyperparameters to tune. Which one actually changes the resolution? When I change the window size from 100 to 70, evaluation fails.

TEM_scores

What do the action, start, and end values in the TEM_scores files mean?

Thank you for your reply.

when will you publish the code?

Good job!! i like it.
when will you publish the code?
thanks

On the time of the code release

Hi @tony2016uestc ,
Thanks for your wonderful work. I plan to follow your work, and try out Action Proposal Generation. But, it is hard for me to implement your approach on my own.
When would you release your code? I am going to use your code to accelerate my reproduction.
Thanks again :)

a problem

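Hello authors, training the ANet code takes too long on my GPU. The whole pipeline takes about 5 days, which makes trial and error too costly for me.
I would like to ask: ActivityNet-1.2 is a subset of ActivityNet-1.3. If I train RTD on ActivityNet-1.2 by extracting, from the ActivityNet-1.3 video features, the features of the videos that belong to ActivityNet-1.2, would that be acceptable? At present, TSN features are available only for ActivityNet-1.3, not for ActivityNet-1.2. I also compared the annotation files of ActivityNet-1.2 and ActivityNet-1.3. Every ActivityNet-1.2 video exists in ActivityNet-1.3, the video durations are identical, and the number of annotations per video matches, except for three videos for which ActivityNet-1.3 has one additional annotation.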
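The subset extraction proposed above could look roughly like this. It is a sketch on toy data: `select_subset`, the video IDs, and the feature values are all hypothetical, and real features would be loaded from whatever files the TSN extraction produced.

```python
def select_subset(features_by_video, subset_ids):
    """Keep only features whose video ID is in the subset; report missing IDs."""
    selected = {
        vid: features_by_video[vid]
        for vid in subset_ids
        if vid in features_by_video
    }
    missing = sorted(set(subset_ids) - set(features_by_video))
    return selected, missing


# Toy stand-ins for ActivityNet-1.3 features and ActivityNet-1.2 video IDs.
anet13_features = {"v_a": [0.1, 0.2], "v_b": [0.3, 0.4], "v_c": [0.5, 0.6]}
anet12_ids = ["v_a", "v_c", "v_x"]

subset, missing = select_subset(anet13_features, anet12_ids)
print(sorted(subset))  # ['v_a', 'v_c']
print(missing)         # ['v_x']
```

Reporting the missing IDs makes it easy to confirm the claim that every ActivityNet-1.2 video is present in the ActivityNet-1.3 features.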

about TEM

Hi, can I get the code of BSN in thumos14?

One question about the the boundary attentive module

Hi~
Are the input features and start/end scores pretrained in advance?

about RTD in Anet

Hello, I see that your paper combines the proposals generated by RTD on activitynet1.3 with untrimmednet to get the detection result. Can you provide the video classification results or model of untrimmednet training on activitynet1.3?

WARNING:torch.distributed.elastic.agent.server.api:Received 2 death signal, shutting down workers WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4797 closing signal SIGINT

[2024-02-22 19:20:17,149][datasets.builder][WARNING] - Found cached dataset json (/root/.cache/huggingface/datasets/json/default-70dc00f935d3701b/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 138.50it/s]
[2024-02-22 19:20:22,163][torch.nn.parallel.distributed][INFO] - Reducer buckets have been rebuilt in this iteration.

My model gets stuck here. After I interrupt it, it shows the output below. I want to ask whether this error is caused by the code or by the environment.

WARNING:torch.distributed.elastic.agent.server.api:Received 2 death signal, shutting down workers
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4797 closing signal SIGINT
Traceback (most recent call last):
  File "main.py", line 84, in <module>
    main()
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 119, in run
    ret = run_job(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "main.py", line 77, in main
    train(model, train_dataloader, validation_dataloader, test_dataloader, accelerator,
  File "/root/Biot5/biot5/utils/train_utils.py", line 289, in train
    accelerator.backward(loss / args.optim.grad_acc)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/accelerate/accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
KeyboardInterrupt
Traceback (most recent call last):
  File "/root/anaconda3/envs/biot5/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 241, in launch_agent
    result = agent.run()
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/metrics/api.py", line 129, in wrapper
    result = f(*args, **kwargs)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 723, in run
    result = self._invoke_run(role)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/agent/server/api.py", line 864, in _invoke_run
    time.sleep(monitor_interval)
  File "/root/anaconda3/envs/biot5/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 62, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 4763 got signal: 2

about thumos14 annotation

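Hello, I noticed that in the thumos14_anno__action.json and thumos14_anno_action_class_idx.json files you provide, the duration_second and fps of video_test_0001459 are inconsistent with those in the thumos14_test_groundtruth.csv you provide. Comparing against GTAD, the values in thumos14_test_groundtruth.csv should be the correct ones. Is this a mistake you overlooked? So far I have only checked this one video, because I found that some THUMOS14 videos do not have a frame rate of 30 fps, whereas in thumos14_anno__action.json and thumos14_anno_action_class_idx.json you set the frame rate of every video to 30 fps.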
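To show why a wrong frame rate matters, here is a small arithmetic sketch of converting a frame index to a timestamp. The frame index and the 25 fps value are made-up examples, not values read from the THUMOS14 files; `frame_to_seconds` is a hypothetical helper.

```python
def frame_to_seconds(frame_index, fps):
    """Convert a frame index to a timestamp in seconds."""
    return frame_index / fps


frame = 3000
print(frame_to_seconds(frame, 30.0))  # 100.0
print(frame_to_seconds(frame, 25.0))  # 120.0
```

In this toy case, treating a 25 fps video as 30 fps shifts the timestamp by 20 seconds, so a frame rate that is wrong in the annotation file would misplace every boundary derived from frame indices.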

Hello, may I ask: in the paper, training is carried out in two stages, but in the code it is carried out in three stages. Why? How do the stages correspond to each other?

mAP evaluation

Hi, thanks for your interesting work! I'm wondering when would you share the evaluation code for calculating the Mean Average Precision (mAP)? Thanks so much!
