happyharrycn / actionformer_release
Code release for ActionFormer (ECCV 2022)
License: MIT License
Thanks for your great work!
When I checked the scores of the external classifier on the THUMOS dataset, I found that both the maximum score and the sum of all scores for a single video can be larger than 1.
For example:
>>> import pickle as pkl
>>> scores= pkl.load(open('thumos14_cls_scores.pkl','rb'))
>>> k="video_test_0000004"; scores[k];sum(scores[k])
array([0.02432016, 0.03033004, 0.00686024, 0.00082171, 0.00296591,
0.40734518, 0.3418732 , 0.11635409, 0.21866749, 0.00644728,
0.0069871 , 0.03885338, 0.073525 , 0.01213155, 0.01179111,
0.0214216 , 0.10836662, 0.09010944, 0.01251512, 0.09871857],
dtype=float32)
1.6304047908633947
>>> k="video_test_0000786"
>>> k="video_test_0000786";scores[k];sum(scores[k])
array([1.4641240e-01, 3.0841224e-04, 5.1868934e-05, 1.9699870e-05,
1.0326537e-05, 1.2187198e+00, 6.0507798e-01, 1.5942456e-05,
1.3059583e-03, 1.2300874e-04, 4.0310311e-05, 2.0225532e-03,
2.5760273e-03, 4.8151556e-05, 1.1477998e-04, 6.2134140e-04,
3.6271741e-03, 8.6841742e-03, 3.3115208e-05, 8.4242632e-04],
dtype=float32)
1.9906554949593556
These scores look as if they have already been through softmax, yet they sum to more than 1. It would be great if you could explain how they are scaled. Thanks a lot!
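A minimal sketch of the check described in this question. The values are copied from the first printout above. `renormalize` is a hypothetical helper, not part of the repository; it only shows how the vector could be rescaled if a probability-like reading is wanted.

```python
# Scores for video_test_0000004, copied from the printout above.
scores = [
    0.02432016, 0.03033004, 0.00686024, 0.00082171, 0.00296591,
    0.40734518, 0.3418732, 0.11635409, 0.21866749, 0.00644728,
    0.0069871, 0.03885338, 0.073525, 0.01213155, 0.01179111,
    0.0214216, 0.10836662, 0.09010944, 0.01251512, 0.09871857,
]


def renormalize(values):
    """Hypothetical helper: rescale non-negative scores so they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [v / total for v in values]


total = sum(scores)          # a true softmax output would give 1.0 here
probs = renormalize(scores)  # same ranking, but now sums to 1

print(f"raw sum = {total:.4f}")
print(f"renormalized sum = {sum(probs):.4f}")
```

Rescaling keeps the class ranking unchanged, so it would not alter which classes are picked as the top ones for a video.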
Thanks for your work. Could you give more details about the “score” field in the submission format?
Hello, thanks for the great contribution. For ActivityNet 1.3, is the typical benchmark the one-class problem? I saw that the ActivityNet config sets "num_classes: 1", and I am trying to understand whether that is standard compared with the 200-class problem. I can't find confirmation in other literature, so I may be misunderstanding the codebase.
Thanks!!
Currently, when I run inference or evaluation on a video with the pretrained ActivityNet model, the action label is always 0 (the time segments are produced as expected). My guess is that the pretrained model was trained with num_classes=1.
Also, valid_one_epoch uses an external score file to get the action labels. How was the score file generated, and how can I get action labels without such a file? Does any parameter in the config file need to be changed?
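For readers unfamiliar with score fusion, here is a minimal sketch of one common way to combine class-agnostic segments with video-level classification scores. It is written under my own assumptions and is not the repository's exact code: the function name, the `top_k` parameter, and the toy inputs are all hypothetical.

```python
def fuse_with_external_scores(segments, cls_scores, top_k=2):
    """Label class-agnostic segments using video-level class scores.

    segments:   list of (start, end, score) proposals for one video
    cls_scores: list of per-class video-level scores for the same video
    Returns a list of (start, end, label, fused_score), best first.
    """
    # Pick the top_k highest-scoring classes for this video.
    top_classes = sorted(
        range(len(cls_scores)), key=lambda c: cls_scores[c], reverse=True
    )[:top_k]

    fused = []
    for start, end, score in segments:
        for label in top_classes:
            # Fused confidence: proposal score times video-level class score.
            fused.append((start, end, label, score * cls_scores[label]))

    fused.sort(key=lambda item: item[3], reverse=True)
    return fused


# Toy inputs (hypothetical): two proposals and a 3-class score vector.
proposals = [(1.0, 4.0, 0.9), (10.0, 12.5, 0.6)]
video_scores = [0.05, 0.70, 0.25]
results = fuse_with_external_scores(proposals, video_scores)
for item in results:
    print(item)
```

In this scheme the localization model only has to say where an action happens, which is consistent with a single-class setting such as num_classes=1; the labels come entirely from the external scores.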
Have you trained models for action on EPIC Kitchens?
When compiling the NMS extension, I get the error "RuntimeError: Error compiling objects for extension".
I am using:
Python 3.6.10
CUDA 11.3
Pytorch 1.10
Can you help me?
Thanks.
You tested different window sizes for local self-attention. When the window is "Full", what should n_mha_win_size be set to?
I am trying to extract I3D features in .npy format. However, the output is 4D, whereas the features in your preprocessed dataset are 2D. What should I do to get the right dimensions?
Hello, great work! If I want to use DETAD for visualization (FN, sensitivity analysis, etc.) as in your paper, could you provide the code that generates the JSON file of your model's results?
Hello, I want to ask, what is the PointGenerator in the code?
Great work! Sorry to bother you.
I would like to ask how to produce your visualizations of the localization regression results.
I would be very grateful for an answer!
Thanks for your great work! I would like to ask a question about the denominator of classification loss.
As shown in Eq. 7 of the paper, the classification loss (sigmoid focal loss) is divided by the length of the valid input sequence. However, I found that in your implementation the classification loss is divided by the number of positive samples, the same as the regression loss. Could you explain the reason? Thanks.
After training on my own dataset, I found the result after multi-class NMS confusing. For example, segment1 is [1s:4s] with label 1 and segment2 is [2s:5s] with label 2, so the label of the overlapping span [2s:4s] is ambiguous.
The IoU of the two segments is very high, but they carry different labels, so NMS does not filter either of them. In object detection this is a normal situation, but in video processing I don't know how to deal with it. Maybe multi-class NMS is not appropriate?
Can you give me a little advice? Thanks a lot!
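One option for the situation described above is to run NMS across all classes at once, so that a high-scoring segment suppresses overlapping segments regardless of label. The sketch below is a plain greedy NMS written for illustration. It is not the repository's NMS implementation, and `class_agnostic_nms` and its threshold are hypothetical names and values.

```python
def temporal_iou(a, b):
    """IoU of two 1D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def class_agnostic_nms(detections, iou_thresh=0.5):
    """Greedy NMS that ignores labels.

    detections: list of (start, end, label, score).
    A detection is dropped if it overlaps an already kept,
    higher-scoring detection by more than iou_thresh.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[3], reverse=True):
        if all(temporal_iou(det[:2], k[:2]) <= iou_thresh for k in kept):
            kept.append(det)
    return kept


# The two overlapping segments from the question, plus one far away.
dets = [(1.0, 4.0, 1, 0.8), (2.0, 5.0, 2, 0.6), (10.0, 12.0, 2, 0.7)]
kept = class_agnostic_nms(dets, iou_thresh=0.4)
print(kept)
```

Whether this is appropriate depends on the data: if actions of different classes can truly overlap in time, per-class NMS is the safer choice and both segments should be kept.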
If I run the code directly, I get the following error:
RuntimeError: index_add_cuda_ does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.
When sampling images from a video, how do I generate a ground-truth JSON that is consistent with the features extracted by the action model (e.g., I3D)?
I look forward to your reply. Do you have sample code for inference? Thanks very much!
Hi @happyharrycn,
Thank you for the great work.
I am trying to run the inference code for the ActivityNet dataset. Currently the pretrained model relies on score fusion, so the labels come from the cuhk_val_simp_share.json
file. I want to try generating numbers without score fusion, and I think the pretrained model currently doesn't support that configuration.
I tried editing ./configs/anet_tsp.yaml
as follows:
@@ -51,9 +51,9 @@ test_cfg: {
max_seg_num: 100,
min_score: 0.001,
# score fusion
- multiclass_nms: False,
+ multiclass_nms: True,
nms_sigma : 0.75,
- ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
+ #ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
duration_thresh: 0.1,
}
With the above changes I got 0 mAP, probably because num_classes
is currently set to 1.
Also, I could reproduce the numbers for the THUMOS dataset, where score fusion is disabled and num_classes
is set to 20, the number of action classes in the dataset.
I have seen THUMOS14 annotations for temporal action localization in many different formats, such as .json and .csv. I am now building a new dataset for temporal action localization. Could you tell me whether any other methods use the same data format as yours? Thanks!
Great work!
I would like to know what specific changes you made to improve performance on THUMOS14 over previous results.
Thank you for your good project!
In lines 510-515 of ./libs/modeling/meta_archs.py, I can't fully understand the code, but it seems to keep only one action per time step in gt_reg. In my experiments, however, I found that the EPIC-Kitchens-100 dataset has many videos with more than one action at the same time, as shown in its original paper.
So, if this is an error in the code, how can it be modified to allow multiple actions per time step?
My understanding is that enabling score fusion will use the external classification score, which should lead to better results.
However, when I disable score fusion, i.e.:
# when using external scores, our model is generating "proposals"
# multiclass_nms: False,
# ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
multiclass_nms: True,
The eval result is:
|tIoU = 0.30: mAP = 81.87 (%)
|tIoU = 0.40: mAP = 77.58 (%)
|tIoU = 0.50: mAP = 71.65 (%)
|tIoU = 0.60: mAP = 57.75 (%)
|tIoU = 0.70: mAP = 43.07 (%)
Avearge mAP: 66.38 (%)
Whereas when I enable score fusion, i.e.:
# when using external scores, our model is generating "proposals"
multiclass_nms: False,
ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
# multiclass_nms: True,
I got a worse result:
|tIoU = 0.30: mAP = 74.82 (%)
|tIoU = 0.40: mAP = 71.34 (%)
|tIoU = 0.50: mAP = 65.70 (%)
|tIoU = 0.60: mAP = 55.28 (%)
|tIoU = 0.70: mAP = 41.97 (%)
Avearge mAP: 61.82 (%)
BTW, I did NOT retrain the model after revising the YAML, because I believe this change only affects testing.
Hi! Thanks for your great work. I look forward to instructions for training and evaluating on our own dataset.
segments = torch.from_numpy(
(video_item['segments'] * video_item['fps'] - 0.5 * num_frames) / feat_stride
)
Here, if the purpose is to convert timestamps (in seconds) into temporal feature grid coordinates, I don't understand why it is OK to have small negative values and float values. Is the resulting segment the (start, end) index of the feature along the temporal dimension? And what is the meaning of - 0.5 * self.num_frames?
Thank you for your reply
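One reading of the formula above, offered as an assumption rather than the authors' explanation: if each feature is computed from a clip of num_frames frames and consecutive clips start feat_stride frames apart, then feature i is centered on frame i * feat_stride + 0.5 * num_frames. Inverting that relation gives the expression in the snippet, which would explain why timestamps near the start of a video map to small negative, fractional coordinates. The values fps=30, num_frames=16 and feat_stride=4 below are illustrative only.

```python
def seconds_to_feature_grid(t_sec, fps, num_frames, feat_stride):
    """Map a timestamp in seconds to a (fractional) feature-grid coordinate."""
    return (t_sec * fps - 0.5 * num_frames) / feat_stride


def feature_grid_to_seconds(idx, fps, num_frames, feat_stride):
    """Inverse mapping: the time at the center of the clip behind feature idx."""
    return (idx * feat_stride + 0.5 * num_frames) / fps


# Illustrative settings (assumed, not taken from the question).
fps, num_frames, feat_stride = 30.0, 16, 4

# t = 0 s lies half a clip before the center of the first feature.
start_idx = seconds_to_feature_grid(0.0, fps, num_frames, feat_stride)
mid_idx = seconds_to_feature_grid(2.0, fps, num_frames, feat_stride)
round_trip = feature_grid_to_seconds(mid_idx, fps, num_frames, feat_stride)

print(start_idx, mid_idx, round_trip)
```

Under this reading the coordinates are continuous positions on the feature axis rather than array indices, so fractional and slightly negative values are expected.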
For the ActivityNet dataset, you use the TSP model to extract features. I want to know how you obtain features of dimension [T, C]. As far as I know, the output feature dimension of TSP is [feature size].
Hi! thanks for your great work. I wonder what the out_mask and qx_mask are used for?
Thanks for your great work!
Are the datasets used here all single-label datasets, with just one action per time step?
Can this model be used on multi-label datasets (e.g., MultiTHUMOS, Charades) by replacing the loss functions, or simply by replacing NMS with multi-class NMS?
Sorry to bother you. Could you give some details about the negative samples for the video features?
Thanks for your work. I'm trying to train the model on my own dataset and found a file named thumos14_cls_scores.pkl under the THUMOS dataset. How can I generate this pkl file for my own dataset?
Thanks for your nice work! In the released code I notice that action label '4' is masked in the THUMOS14 dataset. Why is this action category masked? What is the purpose of this step?
actionformer_release/libs/utils/nms.py
Lines 11 to 12 in 4e3a02b
actionformer_release/libs/utils/nms.py
Lines 170 to 171 in 4e3a02b
cls_idxs, scores are in opposite order
Hi! Thanks for your great work. I added BiFPN to the network, but the performance dropped when tested on my own dataset. Have you ever tested adding a pyramid structure similar to BiFPN?
Thanks for sharing the code of this awesome work.
I have questions about the I3D features on ActivityNet. Are the features from CMCS or extracted on your own? I found that their features from CMCS are incomplete. The features of around 400 videos on the validation set are missing, which might affect the performance evaluation.
Thanks!
Hello, how do I use postprocessing to produce predictions from the results?
For example, the duration of video_validation_0000051 is 169.79 and the fps is 30. How to get 1269 in its i3d feature dimension (1269,2048)?
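A rough sketch of how such a feature count can arise from sliding-window extraction. The clip length of 16 frames and the stride of 4 frames are my assumptions, not values stated in the question, and `num_clips` is a hypothetical helper.

```python
def num_clips(total_frames, num_frames, feat_stride):
    """Number of full clips of num_frames frames taken every feat_stride frames."""
    if total_frames < num_frames:
        return 0
    return (total_frames - num_frames) // feat_stride + 1


# Duration and fps from the question; truncate to whole frames.
total_frames = int(169.79 * 30)
clips = num_clips(total_frames, num_frames=16, feat_stride=4)

print(total_frames, clips)
```

With these assumed settings the count comes out within one clip of 1269. The exact figure depends on how many frames the decoder actually returns and on how the extractor handles the last partial window.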
Great work! Could you provide the pretrained model? Thanks!
Hi,
I was wondering if you could explain why you compute the mAP only for the first 200 frames of each test video.
Don't we need to compute it for all frames of all test videos?
Thanks
I changed the config file to "devices": [0,1,2,3] and added the code
USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda:0" if USE_CUDA else "cpu")
but I got the error:
Traceback (most recent call last):
File "/data1/qianxiong/Actionformer/train.py", line 186, in <module>
main(args)
File "/data1/qianxiong/Actionformer/train.py", line 128, in main
train_one_epoch(
File "/data1/qianxiong/Actionformer/libs/utils/train_utils.py", line 277, in train_one_epoch
losses = model(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 334, in forward
batched_inputs, batched_masks = self.preprocessing(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 423, in preprocessing
batched_inputs = batched_inputs.to(self.device)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 330, in device
return list(set(p.device for p in self.parameters()))[0]
IndexError: list index out of range
Process finished with exit code 1
Could you tell me how to train your code on multiple GPUs?
I want to get the predicted results and compare them with the ground truth. Can anyone give me some guidance?
external classification scores #6
Hi, it seems that ./libs/utils/nms.py imports a module named 'nms_1d_cpu'.
Where can I find this module? nms.py appears to come from mmcv, but I cannot find any file named nms_1d_cpu in the mmcv GitHub repository.
Hey,
I was trying to run your code but I am getting the above error. Could you please help me out with it?
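Hello, you use the GIoU loss, but in this task shouldn't C and (A ∪ B) be equal?
How can I train this on multiple GPU devices?
Thanks for your great work. I've been trying to reproduce this code recently, and I'd like to ask how to obtain the external classification scores. Do they come from I3D results?
My English is not very good. I would like to ask whether the external classification scores come from I3D classification results on the untrimmed videos. Thank you and your team for your work!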
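The GIoU question can be checked numerically. In 1D, the smallest enclosing interval C equals A ∪ B whenever the two segments overlap, so the GIoU penalty is zero and GIoU reduces to IoU; the two differ only when the segments are disjoint and C also covers the gap. A minimal sketch with a hypothetical helper `giou_1d`:

```python
def giou_1d(a, b):
    """Generalized IoU of two 1D segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    hull = max(a[1], b[1]) - min(a[0], b[0])  # length of enclosing interval C
    if union <= 0 or hull <= 0:
        raise ValueError("segments must have positive length")
    iou = inter / union
    # Penalty term: the share of C that is not covered by A or B.
    return iou - (hull - union) / hull


overlapping = giou_1d((0.0, 4.0), (2.0, 6.0))  # C == A ∪ B, penalty is 0
disjoint = giou_1d((0.0, 2.0), (4.0, 6.0))     # C includes the gap [2, 4]

print(overlapping, disjoint)
```

If the predicted and ground-truth segments are both guaranteed to contain the same time point, they always overlap and the penalty term vanishes, which matches the intuition in the question.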