happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)

License: MIT License

Python 96.99% C++ 2.51% Shell 0.50%

action-localization computer-vision deeplearning eccv2022 video-analysis vision-transformer

actionformer_release's Issues

External Classifier scores are larger than 1

Thanks for your great work!
When I check the scores of the external classifier on the THUMOS dataset, I find that both the maximum score and the sum of all scores for a single video can be larger than 1.
For example:

>>> import pickle as pkl
>>> scores= pkl.load(open('thumos14_cls_scores.pkl','rb'))
>>> k="video_test_0000004"; scores[k];sum(scores[k])
array([0.02432016, 0.03033004, 0.00686024, 0.00082171, 0.00296591,
       0.40734518, 0.3418732 , 0.11635409, 0.21866749, 0.00644728,
       0.0069871 , 0.03885338, 0.073525  , 0.01213155, 0.01179111,
       0.0214216 , 0.10836662, 0.09010944, 0.01251512, 0.09871857],
      dtype=float32)
1.6304047908633947
>>> k="video_test_0000786"
>>> k="video_test_0000786";scores[k];sum(scores[k])
array([1.4641240e-01, 3.0841224e-04, 5.1868934e-05, 1.9699870e-05,
       1.0326537e-05, 1.2187198e+00, 6.0507798e-01, 1.5942456e-05,
       1.3059583e-03, 1.2300874e-04, 4.0310311e-05, 2.0225532e-03,
       2.5760273e-03, 4.8151556e-05, 1.1477998e-04, 6.2134140e-04,
       3.6271741e-03, 8.6841742e-03, 3.3115208e-05, 8.4242632e-04],
      dtype=float32)
1.9906554949593556

These scores look as if softmax has already been applied, yet they do not sum to 1. It would be great if you could explain how they are scaled. Thanks a lot!
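A quick way to confirm the observation is to compare the per-video sum against what a softmax would produce. The sketch below hard-codes the first vector printed above; `softmax` is a helper written here for illustration and is not taken from the repository.

```python
import math

# Scores printed above for video_test_0000004 (20 THUMOS14 classes).
scores = [
    0.02432016, 0.03033004, 0.00686024, 0.00082171, 0.00296591,
    0.40734518, 0.3418732, 0.11635409, 0.21866749, 0.00644728,
    0.0069871, 0.03885338, 0.073525, 0.01213155, 0.01179111,
    0.0214216, 0.10836662, 0.09010944, 0.01251512, 0.09871857,
]


def softmax(logits):
    """Numerically stable softmax; the output always sums to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


score_sum = sum(scores)
softmax_sum = sum(softmax(scores))

print(f"sum of released scores: {score_sum:.4f}")
print(f"sum after a softmax:    {softmax_sum:.4f}")
```

A sum well above 1 means the released vector is not a single softmax distribution. How the scores were actually scaled is not stated in this thread.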

About submission

Thanks for your work. Could you tell me more details about the "score" field in the submission format?

ActivityNet 1-class vs. 200-class problem

Hello, thanks for the great contribution. For ActivityNet 1.3, is the typical benchmark the one-class problem? I saw that the ActivityNet config sets "num_classes: 1", and I am trying to understand whether that is standard compared with the 200-class problem. I can't find confirmation in other literature, so I may be misunderstanding the codebase.

Thanks!!

How to get the action labels?

Currently, when I run inference or evaluation on a video with the pretrained ActivityNet model, the action label is always 0 (time segments are produced as expected). My guess is that the pretrained model was trained with num_classes=1.

Also, valid_one_epoch uses an external score file to get the action labels. How was the score file generated, and how can I get action labels without such a file? Is there a parameter in the config file that needs to be changed?
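For readers unfamiliar with score fusion, here is a minimal sketch of one common way to attach labels to class-agnostic proposals using video-level classifier scores. It is an assumption for illustration and not necessarily what `valid_one_epoch` does: the helper `fuse_scores`, the `top_k` parameter, and the multiply rule are all hypothetical.

```python
def fuse_scores(proposals, video_cls_scores, top_k=2):
    """Assign labels to class-agnostic proposals (hypothetical helper).

    proposals: list of (start, end, confidence) tuples from a 1-class model.
    video_cls_scores: per-class scores for the whole video.
    Returns one (start, end, label, fused_score) tuple per proposal
    and per top_k class.
    """
    # Pick the top_k highest-scoring classes for this video.
    ranked = sorted(
        range(len(video_cls_scores)),
        key=lambda c: video_cls_scores[c],
        reverse=True,
    )[:top_k]
    fused = []
    for start, end, conf in proposals:
        for label in ranked:
            # Assumed fusion rule: proposal confidence times class score.
            fused.append((start, end, label, conf * video_cls_scores[label]))
    return fused


proposals = [(1.0, 4.0, 0.9), (10.0, 12.5, 0.5)]
video_cls_scores = [0.1, 0.7, 0.2]
results = fuse_scores(proposals, video_cls_scores)
for row in results:
    print(row)
```

Under this scheme the localization model only needs one "action" class, and the labels come entirely from the external classifier.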

EPIC Kitchens action?

Have you trained models for action on EPIC Kitchens?

About compilation of NMS

When compiling the NMS extension, I get an error: "RuntimeError: Error compiling objects for extension".
My environment:

Python 3.6.10
CUDA 11.3
PyTorch 1.10

Can you help me?
Thanks.

Hello, I can't open the Google Drive link you shared for the ActivityNet 1.3 pre-trained model. Is there a Baidu Cloud link or some other way to access it?

About window size

I see that you tested different window sizes for local self-attention. When the window is "Full", what should n_mha_win_size be set to?

How can I use Kinetics I3D to get the video features

I am trying to use I3D to extract features in .npy format. However, the output is 4D, while the features in your preprocessed dataset are 2D. What should I do to get the right dimensions?
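One common way to go from a 4D activation to a 2D [T, C] feature is to average over the spatial axes. The sketch below assumes the 4D output is laid out as (T, H, W, C); that layout is a guess, since the thread does not state it, and `spatial_average` is a hypothetical helper rather than the authors' extraction code.

```python
def spatial_average(features):
    """Collapse a (T, H, W, C) nested list to (T, C) by averaging over H and W."""
    pooled = []
    for frame in features:  # frame has shape (H, W, C)
        # Flatten the H x W grid into a list of C-dimensional vectors.
        cells = [cell for row in frame for cell in row]
        channels = len(cells[0])
        pooled.append(
            [sum(cell[c] for cell in cells) / len(cells) for c in range(channels)]
        )
    return pooled


# Toy input: T=2 time steps, H=W=2 spatial cells, C=3 channels.
features = [
    [[[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
     [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]],
    [[[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]],
     [[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]]],
]
pooled = spatial_average(features)
print(len(pooled), len(pooled[0]))
```

With a NumPy array `x` in the same layout, the equivalent is `x.mean(axis=(1, 2))`. If the extractor emits a different axis order, the axes to average over change accordingly.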

About Visualization

Hello, this is very good work! If I want to use DETAD for analysis (false negatives, sensitivity analysis, etc.) as in your paper, can you provide the code that generates the JSON file with your model's results?

PointGenerator

Hello, I want to ask: what is the PointGenerator in the code?

About visualization

Great work! Sorry to bother you.
I want to ask how to reproduce your visualization of the localization regression results.
I would be very grateful for an answer!

About the denominator of classification loss

Thanks for your great work! I would like to ask a question about the denominator of classification loss.

As shown in Eq. 7 of the paper, the classification loss (a sigmoid focal loss) is divided by the length of the valid input sequence. In your implementation, however, the classification loss is divided by the number of positive samples, the same as the regression loss. Can you explain the reason? Thanks.

actionformer_release/libs/modeling/meta_archs.py

Line 571 in c893d8a

cls_loss /= self.loss_normalizer
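To make the difference concrete, the sketch below sums a sigmoid focal loss over a toy sequence and normalizes it both ways. The logits, targets, and the helper `sigmoid_focal_loss` are made up for illustration; how `self.loss_normalizer` is actually computed in the repository is not shown in this thread.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (standard RetinaNet form)."""
    p = 1.0 / (1.0 + math.exp(-logit))
    ce = -(target * math.log(p) + (1 - target) * math.log(1 - p))
    p_t = p * target + (1 - p) * (1 - target)
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    return alpha_t * (1 - p_t) ** gamma * ce


# Toy sequence: 6 valid time steps, 2 of them positive.
logits = [2.0, -1.5, -3.0, 0.5, -2.0, -2.5]
targets = [1, 0, 0, 1, 0, 0]

total = sum(sigmoid_focal_loss(x, y) for x, y in zip(logits, targets))
num_pos = max(sum(targets), 1)  # guard against a batch with no positives
seq_len = len(targets)

loss_per_positive = total / num_pos   # normalization the issue observes in the code
loss_per_timestep = total / seq_len   # normalization the issue reads from Eq. 7

print(loss_per_positive, loss_per_timestep)
```

For a fixed batch the two versions differ only by the factor `seq_len / num_pos`, so the choice mainly changes how the classification term is weighted against the regression term.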

Figure 3. Visualization of our results.

Hello, how was "Figure 3. Visualization of our results." in the paper produced? Is there any visualization code? Thank you.

multi-class nms

After training on my own dataset, I found the results after multi-class NMS confusing. For example, segment 1 is [1s:4s] with label 1 and segment 2 is [2s:5s] with label 2, so the label of the overlapping span [2s:4s] is ambiguous.
The IoU of the two segments is very high, but they have different labels, so NMS does not filter either of them. In object detection this is a normal situation, but for video I don't know how to deal with it. Maybe multi-class NMS is not appropriate here?
Can you give me some advice? Thanks a lot!
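The behaviour described here can be reproduced with a small greedy NMS over 1D segments. This is a plain-Python sketch, not the repository's C++ extension; `iou_1d`, `nms_1d`, and the threshold of 0.4 are illustrative choices.

```python
def iou_1d(a, b):
    """Temporal IoU of two segments whose first two fields are (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(dets, iou_threshold, per_class=True):
    """Greedy hard NMS. dets: list of (start, end, score, label)."""
    kept = []
    for det in sorted(dets, key=lambda d: d[2], reverse=True):
        suppressed = any(
            # With per_class=True, only same-label detections compete.
            (not per_class or k[3] == det[3]) and iou_1d(det, k) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


# The two segments from the question: [1s, 4s] label 1 and [2s, 5s] label 2.
dets = [(1.0, 4.0, 0.9, 1), (2.0, 5.0, 0.8, 2)]
multiclass = nms_1d(dets, iou_threshold=0.4, per_class=True)
agnostic = nms_1d(dets, iou_threshold=0.4, per_class=False)
print(len(multiclass), len(agnostic))
```

The two segments have a temporal IoU of 0.5, so class-aware NMS keeps both while class-agnostic NMS keeps only the higher-scoring one. Which behaviour is right depends on whether the dataset allows different actions to overlap in time.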

Where is the non-deterministic implementation?

If I run the code directly, I get the following error:

RuntimeError: index_add_cuda_ does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

Making the ground truth (.json) consistent with features extracted by an action model (e.g., I3D)

When sampling frames from a video, how do I generate a ground-truth JSON file that is consistent with the features extracted by an action model (e.g., I3D)?
I look forward to your reply. Do you have sample code for inference? Thanks very much!

Generating evaluation for ActivityNet dataset without score fusion

Hi @happyharrycn,

Thank you for the great work.

I am trying to run the inference code for the ActivityNet dataset. Currently the pretrained model relies on score fusion, so the labels come from the cuhk_val_simp_share.json file. I want to generate numbers without score fusion, and I think the pretrained model currently doesn't support that configuration.

I tried the following changes in ./configs/anet_tsp.yaml:

@@ -51,9 +51,9 @@ test_cfg: {
   max_seg_num: 100,
   min_score: 0.001,
   # score fusion
-  multiclass_nms: False,
+  multiclass_nms: True,
   nms_sigma : 0.75,
-  ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
+  #ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
   duration_thresh: 0.1,
 }

With the above changes I got 0 mAP, probably because num_classes is currently configured as 1.
I was able to reproduce the numbers for the THUMOS dataset, where score fusion is disabled and num_classes is set to 20, the number of action classes in that dataset.

About THUMOS14 in this repo

I have seen THUMOS14 annotations for temporal action localization in many different formats, such as .json and .csv. I am now building a new dataset for temporal action localization. Could you tell me whether any other methods use the same data format as yours? Thanks!

In label_points_single_video, why do you select the min_len index to build the regression target?

About performance

Great work!
I would like to know what specific changes you made that improve THUMOS14 performance over previous methods.

Does it only support the gt_segment that just has one action for one moment?

Thank you for your good project!

I don't fully understand lines 510-515 of ./libs/modeling/meta_archs.py, but the code seems to keep only one action per moment in gt_reg. In my experiments, however, I found that the EPIC-Kitchens-100 dataset has many videos with more than one action at the same time, as shown in its original paper:

So, if this is a limitation of the code, how can I modify it to allow multiple actions per time step?

Why disabling score fusion get even better results?

My understanding is that enabling score fusion uses the external classification scores, which should lead to better results.
However, when I disable score fusion, i.e.:

# when using external scores, our model is generating "proposals"
# multiclass_nms: False,
# ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
multiclass_nms: True,

The eval result is:

|tIoU = 0.30: mAP = 81.87 (%)
|tIoU = 0.40: mAP = 77.58 (%)
|tIoU = 0.50: mAP = 71.65 (%)
|tIoU = 0.60: mAP = 57.75 (%)
|tIoU = 0.70: mAP = 43.07 (%)
Avearge mAP: 66.38 (%)

While when enabling the score fusion, i.e.,:

# when using external scores, our model is generating "proposals"
multiclass_nms: False,
ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
# multiclass_nms: True,

I got a worse result:

|tIoU = 0.30: mAP = 74.82 (%)
|tIoU = 0.40: mAP = 71.34 (%)
|tIoU = 0.50: mAP = 65.70 (%)
|tIoU = 0.60: mAP = 55.28 (%)
|tIoU = 0.70: mAP = 41.97 (%)
Avearge mAP: 61.82 (%)

BTW, I did NOT retrain the model after editing the YAML file, because I think this setting should only affect testing.

Training and Evaluating On Our Own Dataset

Hi! Thanks for your great work. I look forward to instructions for training and evaluating on our own dataset.

In the anet.py dataset, why do you need to subtract 0.5 * num_frames?

segments = torch.from_numpy(
    (video_item['segments'] * video_item['fps'] - 0.5 * num_frames) / feat_stride
)

convert time stamp into temporal feature

If the purpose here is to convert time stamps (in seconds) into temporal feature grid coordinates, why is it acceptable to get small negative values and non-integer values? Is the resulting segment the (start, end) index along the temporal dimension of the features? And what is the meaning of - 0.5 * self.num_frames?
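The arithmetic in the snippet can be traced with plain numbers. The values fps=30, num_frames=16 and feat_stride=4 below are illustrative assumptions, not read from any config, and `seconds_to_grid` is a hypothetical wrapper around the same expression.

```python
def seconds_to_grid(t_sec, fps, num_frames, feat_stride):
    """Map a timestamp in seconds to a (fractional) feature-grid coordinate."""
    return (t_sec * fps - 0.5 * num_frames) / feat_stride


fps, num_frames, feat_stride = 30.0, 16, 4  # assumed values for illustration

start = seconds_to_grid(2.0, fps, num_frames, feat_stride)
end = seconds_to_grid(5.0, fps, num_frames, feat_stride)
early = seconds_to_grid(0.1, fps, num_frames, feat_stride)

print(start, end, early)
```

One possible reading, offered as an assumption: if feature i is computed from a clip of num_frames frames starting at frame i * feat_stride, the clip is centered at frame i * feat_stride + 0.5 * num_frames, so subtracting half a window aligns a timestamp with the clip center. Under that reading, fractional coordinates are simply sub-grid positions, and a timestamp earlier than the first clip center maps to a small negative value.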

Thank you for your reply

Feature dimension

For the ActivityNet dataset, you use a TSP model to extract features. I want to know how you get features of dimension [T, C]. As far as I know, the output feature dimension of TSP is [feature size].

some question about mask

Hi! Thanks for your great work. I wonder what out_mask and qx_mask are used for?

Can this method be used for multi-label temporal localization?

Thanks for your great work!
Are the datasets used here all single-label datasets, with just one action per time step?
Can this model be used on multi-label datasets (e.g., MultiTHUMOS, Charades) by replacing the loss functions, or just by replacing NMS with multi-class NMS?

video features

Sorry to bother you. Could you tell me some details about negative samples for the video features?

About training on my own dataset

First, thanks for your work. I'm trying to train the model on my own dataset, and I found a file named thumos14_cls_scores.pkl under the THUMOS dataset. How can I generate this pkl file for my own dataset?
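Judging only from the interactive session in the first issue above, the pickle appears to be a dict that maps a video id to a vector with one score per class. The sketch below writes and reads back a file with that layout. The video ids and scores are made up, `build_score_file` is a hypothetical helper, and plain lists stand in for the float32 NumPy arrays seen in the original file; how the real scores were produced is not stated here.

```python
import pickle


def build_score_file(video_scores):
    """Serialize {video_id: [score per class]} to bytes (hypothetical helper)."""
    return pickle.dumps(video_scores)


# Made-up video-level scores for a 3-class dataset.
video_scores = {
    "my_video_0001": [0.05, 0.80, 0.15],
    "my_video_0002": [0.60, 0.10, 0.30],
}

blob = build_score_file(video_scores)  # write these bytes to a .pkl file
restored = pickle.loads(blob)
print(restored["my_video_0001"])
```

The scores themselves would have to come from a separate video-level classifier trained on the same label set.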

The meaning of label '4'

Thanks for your nice work! In the released code I notice that one action label '4' is masked in the Thumos14 dataset. I wonder why the action category should be masked. What's the purpose of this step ?

unmatched function parameter order in hard nms

actionformer_release/libs/utils/nms.py

Lines 11 to 12 in 4e3a02b

 ctx, segs, cls_idxs, scores,
 iou_threshold, min_score, max_num

actionformer_release/libs/utils/nms.py

Lines 170 to 171 in 4e3a02b

 segs, scores, cls_idxs, iou_threshold,
 min_score, max_seg_num

cls_idxs and scores appear in the opposite order in these two places.
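The effect of such a mismatch is easy to see with a stand-in function. `nms_forward` below is hypothetical and only mimics the first parameter order (without `ctx`); it is not the repository's implementation, and whether the real call sites trigger the swap is not established in this thread.

```python
def nms_forward(segs, cls_idxs, scores, iou_threshold, min_score, max_num):
    """Hypothetical stand-in using the first order: cls_idxs before scores."""
    return {"cls_idxs": cls_idxs, "scores": scores}


segs = [(1.0, 4.0), (2.0, 5.0)]
scores = [0.9, 0.8]
cls_idxs = [1, 2]

# Positional call written in the second order (scores before cls_idxs):
swapped = nms_forward(segs, scores, cls_idxs, 0.5, 0.001, 100)

# Keyword call: the order at the call site no longer matters.
safe = nms_forward(
    segs=segs, scores=scores, cls_idxs=cls_idxs,
    iou_threshold=0.5, min_score=0.001, max_num=100,
)

print(swapped["scores"], safe["scores"])
```

The first quoted signature takes `ctx`, which suggests a `torch.autograd.Function`. Its `apply` method takes positional arguments only, so there the remedy would be to align the order rather than switch to keywords.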

Baidu Yun: what is the extraction code? The one I entered is reported as wrong

With BiFPN

Hi! Thanks for your great work. I added BiFPN to the network, but the performance dropped (tested on my own dataset). Have you ever tested adding a pyramid structure similar to BiFPN?

ActivityNet I3D features

Thanks for sharing the code of this awesome work.
I have a question about the I3D features on ActivityNet. Are the features from CMCS, or did you extract them yourselves? I found that the features from CMCS are incomplete: features for around 400 videos in the validation set are missing, which might affect the performance evaluation.
Thanks!

Data post-processing

Hello, how do I use postprocessing to produce predictions from the results?

How are the feature dimensions of the model aligned with the video duration?

For example, the duration of video_validation_0000051 is 169.79 and the fps is 30. How to get 1269 in its i3d feature dimension (1269,2048)?
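A rough sanity check is to count how many sliding-window clips fit in the video. The sketch assumes 16-frame clips taken every 4 frames; both numbers are assumptions for illustration, as is the helper `num_clip_features`.

```python
def num_clip_features(total_frames, num_frames=16, feat_stride=4):
    """Number of sliding-window clips that fit in a video (assumed scheme)."""
    if total_frames < num_frames:
        return 0
    return (total_frames - num_frames) // feat_stride + 1


duration_sec, fps = 169.79, 30
total_frames = int(duration_sec * fps)  # about 5093 frames
count = num_clip_features(total_frames)

print(total_frames, count)
```

Under these assumptions the count comes out at 1270, within one of the 1269 rows in the feature file. The exact number depends on how many frames were actually decoded and on how the extractor handles the last partial window.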

pretrained model

Great work! Could you provide the pretrained model? Thanks!

Computing mAP

Hi,

I was wondering if you could explain why you compute the mAP only for the first 200 frames of the test video.
Don't we need to compute it for all frames of all test videos?

Thanks

How to use multiple GPUs?

I changed the config file to "devices": [0,1,2,3] and added this code:
USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda:0" if USE_CUDA else "cpu")
but I got this error:
Traceback (most recent call last):
File "/data1/qianxiong/Actionformer/train.py", line 186, in <module>
main(args)
File "/data1/qianxiong/Actionformer/train.py", line 128, in main
train_one_epoch(
File "/data1/qianxiong/Actionformer/libs/utils/train_utils.py", line 277, in train_one_epoch
losses = model(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 334, in forward
batched_inputs, batched_masks = self.preprocessing(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 423, in preprocessing
batched_inputs = batched_inputs.to(self.device)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 330, in device
return list(set(p.device for p in self.parameters()))[0]
IndexError: list index out of range

Process finished with exit code 1
Please tell me how to train your code on multiple GPUs.

How can I get the predicted onset and predicted offset

I want to get the predicted results and compare them with the ground truth. Can anyone give me some guidance?

How is ext_score_file (thumos14_cls_scores.pkl) computed?

external classification scores #6

About environment

Could you tell me the environment in which the code runs, such as the PyTorch and CUDA versions? After you modified some of the code, running it in my original environment produces an error.

How to visualize the results like in the paper

nms_1d_cpu

Hi, it seems that ./libs/utils/nms.py imports a module named 'nms_1d_cpu'.
Where can I find this module? nms.py seems to come from mmcv, but I cannot find any file named nms_1d_cpu in the mmcv GitHub repository.

ModuleNotFoundError: No module named 'nms_1d_cpu'

Hey,

I was trying to run your code but I am getting the above error. Could you please help me out with it?

loss

Hello, you use a GIoU loss, but in this task shouldn't C and (A∪B) be equal?
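The question can be checked numerically for 1D segments. The sketch below uses the standard GIoU definition, with C the smallest segment enclosing A and B; `giou_1d` is written for illustration and is not the repository's loss function.

```python
def giou_1d(a, b):
    """Return (iou, giou) for two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    hull = max(a[1], b[1]) - min(a[0], b[0])  # smallest enclosing segment C
    iou = inter / union
    giou = iou - (hull - union) / hull
    return iou, giou


# Overlapping segments: C coincides with the union, so GIoU equals IoU.
overlap_iou, overlap_giou = giou_1d((1.0, 4.0), (2.0, 5.0))

# Disjoint segments: C also covers the gap, so GIoU drops below IoU.
disjoint_iou, disjoint_giou = giou_1d((1.0, 2.0), (4.0, 5.0))

print(overlap_iou, overlap_giou, disjoint_iou, disjoint_giou)
```

So in 1D, C equals A∪B exactly when the two segments overlap or touch, and the GIoU penalty only matters for disjoint segments. Whether predicted and ground-truth segments can ever be disjoint in this model is not answered in this thread.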

multi GPU training

How can I train this on multiple GPUs?

external classification scores

Thanks for your great work. I've been trying to reproduce this code recently, and I'd like to ask how the external classification scores are obtained. Do they come from I3D results?
My English is not very good. I would like to ask whether the external classification scores come from I3D classification results on the untrimmed videos. Thank you and your team for your work!
