
actionformer_release's People

Contributors

fmu2, happyharrycn, tzzcl


actionformer_release's Issues

External Classifier scores are larger than 1

Thanks for your great work!
When I checked the scores of the external classifier on the THUMOS dataset, I found that the max score and the sum of all scores for a single video are larger than 1.
For example:

>>> import pickle as pkl
>>> scores= pkl.load(open('thumos14_cls_scores.pkl','rb'))
>>> k="video_test_0000004"; scores[k];sum(scores[k])
array([0.02432016, 0.03033004, 0.00686024, 0.00082171, 0.00296591,
       0.40734518, 0.3418732 , 0.11635409, 0.21866749, 0.00644728,
       0.0069871 , 0.03885338, 0.073525  , 0.01213155, 0.01179111,
       0.0214216 , 0.10836662, 0.09010944, 0.01251512, 0.09871857],
      dtype=float32)
1.6304047908633947
>>> k="video_test_0000786"
>>> k="video_test_0000786";scores[k];sum(scores[k])
array([1.4641240e-01, 3.0841224e-04, 5.1868934e-05, 1.9699870e-05,
       1.0326537e-05, 1.2187198e+00, 6.0507798e-01, 1.5942456e-05,
       1.3059583e-03, 1.2300874e-04, 4.0310311e-05, 2.0225532e-03,
       2.5760273e-03, 4.8151556e-05, 1.1477998e-04, 6.2134140e-04,
       3.6271741e-03, 8.6841742e-03, 3.3115208e-05, 8.4242632e-04],
      dtype=float32)
1.9906554949593556

These scores seem to already be post-softmax. It would be great if you could explain how you scaled them. Thanks a lot!
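A minimal sketch of one possible workaround, assuming per-video renormalization is acceptable (the file and keys are the ones from the transcript above; whether the released scores are meant to sum to 1 is exactly what this issue asks):

import pickle as pkl

# Load the released scores: a dict mapping video_id -> per-class score array.
with open('thumos14_cls_scores.pkl', 'rb') as f:
    scores = pkl.load(f)

# Hypothetical workaround: renormalize each video's scores to sum to 1.
normed = {vid: s / s.sum() for vid, s in scores.items()}
print(normed['video_test_0000004'].sum())  # ~1.0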

about submission

Thanks for your work. Could you tell me more details about the “score” field in the submission format?

ActivityNet 1-class vs. 200-class problem

Hello - thanks for the great contribution. I was wondering: for ActivityNet 1.3, is the typical benchmark the one-class problem? I saw that the ActivityNet config sets "num_classes: 1", and I am trying to understand whether that is standard versus the 200-class problem (I can't seem to find confirmation in other literature), or whether I am misunderstanding the codebase.

Thanks!!

How to get the action labels?

Currently, when I infer or evaluate a video with the pretrained ActivityNet model, the action label is always 0 (time segments are available as expected). I guess the reason is that the pretrained model was trained with num_classes=1.

Also, valid_one_epoch uses an external score file to get the action labels. How was that score file generated, and how can one get the action labels without such a file? Is there any parameter in the config file that needs to be changed?
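For context, a minimal sketch of how score fusion could assign labels to class-agnostic proposals, assuming the external file maps each video to a per-class score vector; the function name and top-k fusion here are illustrative, not necessarily what valid_one_epoch does:

import numpy as np

def fuse_labels(proposal_scores, ext_cls_scores, topk=2):
    # proposal_scores: (N,) actionness scores from a num_classes=1 model
    # ext_cls_scores:  (C,) video-level classification scores
    top_cls = np.argsort(ext_cls_scores)[::-1][:topk]           # top-k video classes
    fused = proposal_scores[:, None] * ext_cls_scores[top_cls]  # (N, topk)
    labels = np.broadcast_to(top_cls, fused.shape)
    return fused.ravel(), labels.ravel()

# example: 3 proposals, 4 classes
fused, labels = fuse_labels(np.array([0.9, 0.5, 0.2]),
                            np.array([0.1, 0.7, 0.15, 0.05]))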

About compilation of NMS

When compiling the NMS extension, an error occurs: "RuntimeError: Error compiling objects for extension".
I use:

Python 3.6.10
CUDA 11.3
PyTorch 1.10

Can you help me?
Thanks.

About window size

I noticed that you tested different window sizes for local self-attention. When the window is "Full", what should n_mha_win_size be set to?

About Visualization

Hello, this is very good work! If I want to use DETAD for visualization (false negatives, sensitivity analysis, etc.) as in your paper, could you provide the code that generates the JSON file of your model's results?
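For reference, DETAD consumes ActivityNet-challenge-style result files; a sketch of dumping predictions in that layout (the field names follow the public challenge format, but the label/score/segment values here are made up):

import json

results = {
    'version': 'VERSION 1.3',
    'results': {
        'video_test_0000004': [
            {'label': 'Diving', 'score': 0.92, 'segment': [12.3, 17.8]},
        ],
    },
    'external_data': {},
}
with open('results.json', 'w') as f:
    json.dump(results, f)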

PointGenerator

Hello, I want to ask: what is the PointGenerator in the code?
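For anyone else wondering: a point generator for 1D detection typically emits one candidate time point per feature location at each pyramid level, together with that level's stride and regression range. A sketch with illustrative names (not the repo's exact API):

import torch

def generate_points(feat_lens, strides, regression_ranges):
    # feat_lens: sequence length per pyramid level, e.g. [256, 128, 64]
    # strides: temporal stride per level, e.g. [1, 2, 4]
    # regression_ranges: (min, max) offset each level is responsible for
    points = []
    for T, s, (lo, hi) in zip(feat_lens, strides, regression_ranges):
        t = (torch.arange(T, dtype=torch.float32) + 0.5) * s  # center of each cell
        meta = torch.tensor([lo, hi, s], dtype=torch.float32).expand(T, 3)
        points.append(torch.cat([t[:, None], meta], dim=1))   # (T, 4) per level
    return points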

About visualization

Great work! Sorry to bother you.
I want to ask how you produced the visualization of the regression results for localization.
I would be very grateful for an answer!

About the denominator of classification loss

Thanks for your great work! I would like to ask a question about the denominator of classification loss.

As shown in Eq. 7 in the paper, the classification loss (using sigmoid focal loss) is divided by the length of the valid input sequence. However, I found that in your implementation the classification loss is divided by the number of positive samples, the same as the regression loss. Can you explain the reason? Thanks.

cls_loss /= self.loss_normalizer
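From reading the code, the normalizer appears to be a moving average of the number of positive samples shared by both losses, rather than the raw per-batch count. A sketch of that update (the momentum value is illustrative):

def update_normalizer(prev_norm, num_pos, momentum=0.9):
    # EMA of the positive-sample count; both the focal classification loss
    # and the regression loss are divided by this smoothed count
    return momentum * prev_norm + (1.0 - momentum) * max(num_pos, 1)

norm = update_normalizer(prev_norm=100.0, num_pos=37)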

multi-class nms

After training on my own dataset, I found the result after multi-class NMS confusing. For example, segment1 is [1s:4s] with label 1 and segment2 is [2s:5s] with label 2, so the label of the overlapping region [2s:4s] is ambiguous.
The two segments' IoU is very high, but they have different labels, so NMS does not filter either of them out. In object detection this is a normal situation, but in video processing I don't know how to deal with it. Maybe multi-class NMS is not appropriate?
Can you give me some advice? Thanks a lot!
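For readers hitting the same thing: with class-aware (multi-class) NMS, segments with different labels never suppress each other, which is exactly the behavior described above; class-agnostic NMS would keep only the higher-scoring one. A minimal sketch:

import numpy as np

def iou_1d(a, b):
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / ((a[1] - a[0]) + (b[1] - b[0]) - inter)

def nms_1d(segs, scores, labels, thresh=0.4, class_aware=True):
    order, keep = np.argsort(scores)[::-1], []
    for i in order:
        suppressed = any(
            (not class_aware or labels[i] == labels[j])
            and iou_1d(segs[i], segs[j]) > thresh
            for j in keep)
        if not suppressed:
            keep.append(i)
    return keep

segs, scores, labels = [(1.0, 4.0), (2.0, 5.0)], np.array([0.9, 0.8]), [1, 2]
print(nms_1d(segs, scores, labels, class_aware=True))   # both kept (labels differ)
print(nms_1d(segs, scores, labels, class_aware=False))  # only the first kept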

Where is the non-deterministic implementation?

If I run the code directly, I get the following error:

RuntimeError: index_add_cuda_ does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.
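One hedged workaround, assuming PyTorch >= 1.11: keep requesting deterministic algorithms but only warn (instead of raising) when an op such as index_add_ lacks a deterministic CUDA kernel:

import torch

# turn the hard error into a warning for ops without deterministic CUDA kernels
torch.use_deterministic_algorithms(True, warn_only=True)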

Generating evaluation for ActivityNet dataset without score fusion

Hi @happyharrycn,

Thank you for the great work.

I am trying to run the inference code for the ActivityNet dataset. Currently the pretrained model uses score fusion, so the labels come from the cuhk_val_simp_share.json file. I want to try generating numbers without score fusion, and I think the pretrained model currently doesn't support that configuration.

I tried making the following changes in ./configs/anet_tsp.yaml:

@@ -51,9 +51,9 @@ test_cfg: {
   max_seg_num: 100,
   min_score: 0.001,
   # score fusion
-  multiclass_nms: False,
+  multiclass_nms: True,
   nms_sigma : 0.75,
-  ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
+  #ext_score_file: ./data/anet_1.3/annotations/cuhk_val_simp_share.json,
   duration_thresh: 0.1,
 }

With the above changes I got 0 mAP, probably because num_classes is currently configured as 1.
Also, I could reproduce the numbers for the THUMOS dataset, where score fusion is disabled and num_classes is set to 20, the number of action classes in that dataset.

About THUMOS14 in this repo

I have seen many different formats of THUMOS14 annotations for temporal action localization, such as .json and .csv. I am now building a new dataset for temporal action localization; could you tell me whether any other methods use the same data format as yours? Thanks!

About performance

Great work!
I would like to know what specific changes you made to achieve better performance on THUMOS14 than previous methods.

Does it only support gt_segments with one action per moment?

Thank you for your good project!
[screenshot of lines 510-515 in ./libs/modeling/meta_archs.py]
In lines 510-515 of ./libs/modeling/meta_archs.py, I can't fully understand the code, but it seems to keep only one action per moment in gt_reg. However, in experiments I found that the EPIC-Kitchens-100 dataset has many videos with more than one action at the same time, as shown in its original paper:
[screenshot of overlapping-action statistics from the EPIC-Kitchens-100 paper]
So, if this is indeed a limitation of the code, how should it be modified to allow multiple actions per timestep?
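Not an official fix, but conceptually the classification target at each time step could be multi-hot rather than single-label, which sigmoid focal loss already supports; the regression side would additionally need per-instance handling, which this sketch does not cover:

import torch

def multihot_targets(num_points, num_classes, actions):
    # actions: list of (start_idx, end_idx, class_id), possibly overlapping
    tgt = torch.zeros(num_points, num_classes)
    for s, e, c in actions:
        tgt[s:e + 1, c] = 1.0  # overlapping actions set multiple classes per step
    return tgt

tgt = multihot_targets(10, 3, [(2, 6, 0), (4, 8, 2)])  # two overlapping actions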

Why does disabling score fusion give even better results?

My understanding is that enabling score fusion uses the external classification scores, which should lead to better results.
However, when I disable score fusion, i.e.:

# when using external scores, our model is generating "proposals"
# multiclass_nms: False,
# ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
multiclass_nms: True,

The eval result is:

|tIoU = 0.30: mAP = 81.87 (%)
|tIoU = 0.40: mAP = 77.58 (%)
|tIoU = 0.50: mAP = 71.65 (%)
|tIoU = 0.60: mAP = 57.75 (%)
|tIoU = 0.70: mAP = 43.07 (%)
Average mAP: 66.38 (%)

While when enabling score fusion, i.e.:

# when using external scores, our model is generating "proposals"
multiclass_nms: False,
ext_score_file: ./data/thumos/annotations/thumos14_cls_scores.pkl,
# comment out L47-48 and uncomment L50 to disable score fusion
# multiclass_nms: True,

I got a worse result:

|tIoU = 0.30: mAP = 74.82 (%)
|tIoU = 0.40: mAP = 71.34 (%)
|tIoU = 0.50: mAP = 65.70 (%)
|tIoU = 0.60: mAP = 55.28 (%)
|tIoU = 0.70: mAP = 41.97 (%)
Average mAP: 61.82 (%)

BTW, I did NOT retrain the model after revising the yaml, because I think this only affects testing.

convert time stamp into temporal feature

[screenshot of the timestamp-to-feature-grid conversion code]
Here, I don't understand: if the purpose is to convert timestamps (in seconds) into temporal feature grids, why is it OK to have small negative values and fractional values? Is the resulting segment the feature index (start, end) along the temporal dimension? What is the meaning of - 0.5 * self.num_frames?

Thank you for your reply
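For reference, a sketch of the conversion the question seems to be about: each feature is computed from a clip of num_frames frames, so subtracting 0.5 * self.num_frames shifts the timestamp to the center of its clip before dividing by the feature stride. Small negative or fractional indices then just mean "slightly before the center of the first feature"; whether this matches the authors' intent is the question:

def seconds_to_feature_grid(t_sec, fps, feat_stride, num_frames):
    # map a timestamp in seconds to a (fractional) index on the feature grid
    return (t_sec * fps - 0.5 * num_frames) / feat_stride

# e.g. 16-frame clips extracted every 4 frames at 30 fps
print(seconds_to_feature_grid(0.0, fps=30, feat_stride=4, num_frames=16))  # -2.0
print(seconds_to_feature_grid(2.5, fps=30, feat_stride=4, num_frames=16))  # 16.75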

Feature dimension

For the ActivityNet dataset, you use the TSP model to extract features. I want to know how you get features of dimension [T, C]. As far as I know, the output feature dimension of TSP is [feature_size].
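For what it's worth, a clip-level encoder like TSP can yield a [T, C] sequence by sliding a fixed-length clip window along the video and stacking the per-clip vectors; a sketch with illustrative window settings:

import torch

def extract_sequence_features(frames, encoder, clip_len=16, stride=16):
    # frames: (num_frames, 3, H, W); encoder maps one clip -> a (C,) vector
    feats = []
    for start in range(0, frames.shape[0] - clip_len + 1, stride):
        feats.append(encoder(frames[start:start + clip_len]))
    return torch.stack(feats)  # (T, C), with T set by video length and stride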

Can this method be used for multi-label temporal localization?

Thanks for your great work!
Are the datasets used here all single-label, with just one action per time step?
Can this model be used on multi-label datasets (e.g., MultiTHUMOS, Charades) by replacing the loss functions, or just by replacing NMS with multi-class NMS?

video features

Sorry to bother you. Could you tell me some details about negative samples for the video features?

About training on my own dataset

Thanks for your work. I'm trying to train the model on my own dataset and found a file named thumos14_cls_scores.pkl under the THUMOS data directory. How can I generate this pkl file for my own dataset?
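Based on the structure shown in the first issue above (a dict mapping video IDs to per-class score arrays), a sketch of writing such a file for your own dataset; how you obtain the video-level class probabilities is up to your own classifier:

import pickle as pkl
import numpy as np

# hypothetical video-level classification scores for your own dataset:
# {video_id: np.float32 array of shape (num_classes,)}
video_probs = {
    'video_0001': np.array([0.1, 0.7, 0.2], dtype=np.float32),
    'video_0002': np.array([0.8, 0.1, 0.1], dtype=np.float32),
}

with open('my_dataset_cls_scores.pkl', 'wb') as f:
    pkl.dump(video_probs, f)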

The meaning of label '4'

Thanks for your nice work! In the released code I noticed that the action label '4' is masked in the THUMOS14 dataset. I wonder why this action category should be masked. What's the purpose of this step?

With BiFPN

Hi! Thanks for your great work. I added BiFPN to the network, but the performance dropped (tested on my own dataset). Have you ever tried adding a pyramid structure similar to BiFPN?

ActivityNet I3D features

Thanks for sharing the code of the awesome work.
I have questions about the I3D features on ActivityNet. Are the features from CMCS, or extracted on your own? I found that the features from CMCS are incomplete: the features of around 400 videos in the validation set are missing, which might affect the performance evaluation.
Thanks!

Computing mAP

Hi,

I was wondering if you could explain why you compute the mAP only for the first 200 frames of the test video?
Don't we need to compute it for all frames of all test videos?

Thanks

How to use multiple GPUs?

I changed the config file like this: "devices": [0,1,2,3], and added the code:
USE_CUDA = torch.cuda.is_available()
device = torch.device("cuda:0" if USE_CUDA else "cpu")
but I got the error:
Traceback (most recent call last):
File "/data1/qianxiong/Actionformer/train.py", line 186, in
main(args)
File "/data1/qianxiong/Actionformer/train.py", line 128, in main
train_one_epoch(
File "/data1/qianxiong/Actionformer/libs/utils/train_utils.py", line 277, in train_one_epoch
losses = model(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
raise exception
IndexError: Caught IndexError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 334, in forward
batched_inputs, batched_masks = self.preprocessing(video_list)
File "/data1/qianxiong/anaconda3/envs/Actionformer/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 423, in preprocessing
batched_inputs = batched_inputs.to(self.device)
File "/data1/qianxiong/Actionformer/libs/modeling/meta_archs.py", line 330, in device
return list(set(p.device for p in self.parameters()))[0]
IndexError: list index out of range

Process finished with exit code 1
Could you please tell me how to train your code with multiple GPUs?
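For context, the crash happens in a device property that evaluates list(set(p.device for p in self.parameters()))[0]; under nn.DataParallel, replica modules can expose an empty parameters() iterator, so the list is empty. One hedged workaround is to derive the device from the inputs instead of the parameters (a sketch; it assumes each item in video_list carries a 'feats' tensor):

# inside preprocessing (sketch): take the device from the data, not the model,
# so DataParallel replicas (whose .parameters() can be empty) still work
device = video_list[0]['feats'].device
batched_inputs = batched_inputs.to(device)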

About environment

Could you tell me the environment in which the code runs, such as the versions of PyTorch and CUDA? I found that after you modified some code, running it in the original environment raises an error.
[screenshot of the error]

nms_1d_cpu

Hi, it seems that ./libs/utils/nms.py imports a module named 'nms_1d_cpu'.
Where can I find this module? nms.py seems to be adapted from mmcv, but I cannot find any file named nms_1d_cpu in the mmcv GitHub repository.
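nms_1d_cpu does not come from mmcv; it appears to be the C++ extension that the repo's setup step compiles (if I recall the README correctly, by running python setup.py install --user inside ./libs/utils). A sketch of importing it with a pure-Python fallback (see the 1D NMS sketch in the "multi-class nms" issue above):

# prefer the compiled extension; fall back to pure-Python NMS if absent
try:
    import nms_1d_cpu  # built via `python setup.py install --user` in ./libs/utils
    HAS_EXT = True
except ImportError:
    HAS_EXT = False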

loss

Hello, you used GIoU loss, but in this task, shouldn't C and (A∪B) be equal?
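To the question above: for 1D segments, the smallest enclosing interval C equals A∪B only when the two segments overlap; once they are disjoint, C also covers the gap between them, so the GIoU penalty term (C - A∪B)/C becomes non-zero. A small sketch:

def giou_1d(a, b):
    # GIoU for 1D segments a = (start, end), b = (start, end)
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    c = max(a[1], b[1]) - min(a[0], b[0])  # smallest enclosing interval
    return inter / union - (c - union) / c

print(giou_1d((1, 4), (2, 5)))  # overlapping: C == A∪B, so GIoU == IoU == 0.5
print(giou_1d((1, 2), (4, 5)))  # disjoint: C=4 > A∪B=2, so GIoU = 0 - 2/4 = -0.5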

external classification scores

Thanks for your great work. I've been trying to reproduce this code recently, and I'd like to ask how to get the external classification scores. Do they come from I3D classification results on the untrimmed videos? Thank you and your team for your work!
