pilhyeon / wtal-uncertainty-modeling
Official PyTorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)
License: MIT License
Thanks for your great work! But the pre-trained model you provide cannot achieve the results in the paper. I saw the same reproduced results in a closed issue. Could you please check whether you uploaded the latest pre-trained model, or whether there is some other mistake? Thanks~
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Hi, excellent work!
You mentioned that you would make the ActivityNet features and model public. Can you do so now? It is very difficult to reproduce the reported results without the desired settings, as someone already posted in this repository (I tried your suggested settings for ActivityNet, but did not get the reported results).
My email is: [email protected]! Or you can send me the link here if you wish.
Thanks for your nice work. Can you provide details about the training GPU and training time?
Hello, Pilhyeon.
When will you release the features for ActivityNet 1.2 and 1.3? I have been waiting for them for about four months.
Or you can send me the features by email: [email protected].
Thanks.
Hello guys. Thank you for your excellent work. I read the updated paper and found that the result is better than before. Can you share the code that matches the best result?
I am waiting for your reply.
Good work! While reading the paper, a question came up: how are the feature magnitudes in Figure 2 of the paper (x-axis) defined? Is that the normalized video feature (shape: [B, T, F]) or something else? I am confused about the x-axis of Figure 2 and how the histogram is plotted. Thanks!
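In case it helps while waiting for a reply: in similar WTAL codebases the magnitude is typically the per-segment L2 norm of the (unnormalized) feature vector, which is what I assume below — a minimal sketch, not the authors' code:

```python
import numpy as np

def segment_magnitudes(features):
    """L2 norm of each segment-level feature vector.

    features: array of shape [B, T, F] (batch, temporal segments, feature dim).
    Returns magnitudes of shape [B, T]; the histogram in Figure 2 would then
    be plotted over these per-segment values.
    """
    return np.linalg.norm(features, axis=-1)

# Toy example: 1 video, 4 segments, 3-dim features.
feats = np.array([[[3.0, 4.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0],
                   [6.0, 8.0, 0.0]]])
mags = segment_magnitudes(feats)
print(mags)  # [[ 5.  1.  2. 10.]]
```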
Hi @Pilhyeon:
Thanks for your contribution. I cannot download the features you provide; when I open the Google Drive link, these are all the files:
I am not sure these are the features used in your repo, since I do not know how to use the files in the Google Drive link. Could you please check this or explain it?
I am wondering how to determine which frames are background frames and which are action frames when drawing the histogram.
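Not the author, but a common way to do this (my assumption here, not confirmed by the repo) is to use the ground-truth temporal annotations: a segment counts as action if its center timestamp falls inside any annotated interval, and as background otherwise. A minimal sketch:

```python
def label_segments(num_segments, fps, gt_intervals):
    """Mark each segment as action (True) or background (False)
    based on ground-truth temporal intervals (in seconds).

    num_segments: number of feature segments in the video.
    fps: segments per second of video.
    gt_intervals: list of (start_sec, end_sec) action annotations.
    """
    labels = []
    for t in range(num_segments):
        center = (t + 0.5) / fps  # segment center in seconds
        is_action = any(s <= center <= e for s, e in gt_intervals)
        labels.append(is_action)
    return labels

# Toy example: 6 segments at 1 segment/sec, one action from 2 s to 4 s.
print(label_segments(6, 1.0, [(2.0, 4.0)]))
# [False, False, True, True, False, False]
```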
Thanks for your code. I used the code you provide and downloaded the THUMOS'14 features, but I get the following results; the mAP is about 1% lower than the results in your paper:
Experiment | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | average mAP
---|---|---|---|---|---|---
Paper-I3D | 46.9 | 39.2 | 30.7 | 20.8 | 12.5 | 30.0
Reproduce-I3D | 46.0 | 38.4 | 29.7 | 20.5 | 12.1 | 29.3
Could there be any problems?
Hello,
In fact, the performance is improved by hyper-parameter tuning without any model change.
Specifically, alpha: 0.0002 -> 0.0005, r_act: 8 -> 9, r_bkg: 6 -> 4
You can also find them in options.py.
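For reference, a minimal sketch of what those updated defaults could look like in options.py (the actual flag names and surrounding options in the repo may differ — this only illustrates the three values mentioned above):

```python
import argparse

# Hypothetical excerpt in the spirit of options.py, showing only the
# three hyperparameters changed in the reply (old values in comments).
parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=float, default=0.0005)  # was 0.0002
parser.add_argument('--r_act', type=int, default=9)         # was 8
parser.add_argument('--r_bkg', type=int, default=4)         # was 6

args = parser.parse_args([])  # empty argv: use the defaults
print(args.alpha, args.r_act, args.r_bkg)  # 0.0005 9 4
```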
In addition, I updated the best model file, with which you can see the improved result.
Thanks!
Thanks for your reply. I tested the best model that you updated and changed the parameters as you said, but I cannot reproduce the result. The result I get is as follows:
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Do you know the reason?
Originally posted by @xumh-9 in #9 (comment)
I have run the code many times (about 100 times), with the same environment as requirements.txt, and even tried 3 different machines, but the best result I could get is about 40, much worse than 41.8.
I tried changing the random seed while keeping all the hyper-parameters, but it did not work. I am sure that I used the same environment and the latest code.
I hope the author could reply to this question, since there are also other people who cannot reproduce the result: #10 #5 #11
Their issues were closed by the author with no response to their questions. I think the author should check the code carefully again, especially the hyperparameters.
If anyone else has achieved 41.8 mAP, please tell me. I hope the author does not close this issue, since the problem is not solved. I also hope the author @Pilhyeon could run this code again on a different machine and then publish the result and code, or provide the training log as proof.
Thank you for your excellent work.
Could you provide the PyTorch dataset file (like thumos_features.py)?
I am doing experiments on a reorganized ActivityNet 1.3 dataset, so I hope to get more details for a fair comparison.
Thank you!
If possible, you can also send it to my email: [email protected]
Hi @Pilhyeon, thanks for your great work! I am following your BMUE and have some trouble reproducing the results on the ActivityNet dataset.
I have tried some experiments on ActivityNet v1.2. I downloaded the I3D features provided by this link and adapted them to the BMUE format. The following are some of my results. All experiments run for 6k epochs; the results are shown in the form "(average_mAP, Test_acc)":
According to Sec 4.1 in your arXiv paper, T = 50, so I set num_segments to 50 and ran the following experiments:
Besides, I also tried changing "act_thresh_magnitudes", "NMS thresh", "alpha", "_lambda & gamma in get_proposal_oic()", etc. The results do not seem to get better: the test accuracy is around 0.4 and the average_mAP is very low. It is hard for me to find the best settings. Could you please share your parameter settings for the ActivityNet 1.2 & 1.3 datasets, or give me some advice on which parameters to change?
Looking forward to your reply, and I would be glad to cite your excellent work. Thanks!
It is a multi-label classification problem. @Pilhyeon
Can you explain the workflow of proposal generation? @Pilhyeon
WTAL-Uncertainty-Modeling/utils.py, Line 19 in ea630d4:
_lambda=0.25, gamma=0.2, feature_fps=24, scale=24
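Not the author, but for anyone else wondering about the proposal workflow: in most WTAL codebases, proposals are obtained by thresholding the class activation scores and grouping consecutive above-threshold segments. The sketch below shows only that grouping step; the repo's get_proposal_oic additionally scores each proposal by outer-inner contrast using the _lambda and gamma values above, which this illustration omits:

```python
def generate_proposals(scores, thresh=0.5):
    """Group consecutive segments whose activation score exceeds
    `thresh` into (start, end) index pairs (end exclusive).
    A simplified illustration, not the repo's exact code.
    """
    proposals, start = [], None
    for t, s in enumerate(scores):
        if s >= thresh and start is None:
            start = t                      # proposal opens here
        elif s < thresh and start is not None:
            proposals.append((start, t))   # proposal closes
            start = None
    if start is not None:                  # score stayed high until the end
        proposals.append((start, len(scores)))
    return proposals

print(generate_proposals([0.1, 0.7, 0.8, 0.2, 0.9], thresh=0.5))
# [(1, 3), (4, 5)]
```

Each (start, end) pair would then be mapped back to seconds via feature_fps and scale before NMS.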
Hi @Pilhyeon
I noticed that your features (932 MB) are smaller than the features provided by other papers and repos, e.g.:
Could you please explain this? Is the size of the features related to the removal of some videos (270, 1292, 1496)?
Hi @Pilhyeon, I have read all of your code, but there is one thing I cannot understand: why is there a dropout?
It seems like you do not select among all magnitude values; instead you discard most of them (0.7 in your code) and generate pseudo action features and pseudo background features from the remaining magnitudes (0.3).
Is this a regularization method?
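To make the question concrete, here is one plausible reading of that step, sketched from scratch (the ratio 0.3 and the function shape are my assumptions, not the repo's actual code): randomly keep a fraction of the segments, then pick pseudo action/background indices by magnitude among only the kept ones, so different segments are selected at each training step, like dropout:

```python
import numpy as np

def sample_pseudo_indices(magnitudes, k_act, k_bkg, keep_ratio=0.3, rng=None):
    """Randomly keep `keep_ratio` of the segments, then take the
    top-k_act magnitudes among them as pseudo-action indices and the
    bottom-k_bkg as pseudo-background indices. Illustrative sketch only.
    """
    if rng is None:
        rng = np.random.default_rng()
    T = len(magnitudes)
    n_keep = max(int(T * keep_ratio), k_act + k_bkg)  # keep enough to sample
    kept = rng.choice(T, size=n_keep, replace=False)  # the "dropout" step
    order = kept[np.argsort(magnitudes[kept])]        # kept idx, low -> high
    return order[-k_act:], order[:k_bkg]              # action, background

mags = np.array([0.1, 2.0, 0.3, 5.0, 0.2, 4.0, 0.05, 3.0])
act_idx, bkg_idx = sample_pseudo_indices(mags, k_act=2, k_bkg=2,
                                         rng=np.random.default_rng(0))
```

The randomness would indeed act as a regularizer: the pseudo labels vary across iterations instead of always coming from the same extreme segments.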
Hi @Pilhyeon
Thanks for your contribution. I tried again and could reproduce your result! It is really amazing work!
I read your paper carefully, but there are still some details I cannot understand. Could you please answer them if you have time?
1. What does "Density" mean?
2. Is the "main pipeline" the whole model, while "separated features" use both the "main pipeline" and "Uncertainty modeling" as the final model?
3. Does the "softmax score" used in Table 3 of the ablation study mean using only the "main pipeline" in Figure 3 to obtain the result?
4. The "softmax score" is obtained from the original features, which means they are not separated and have unconstrained magnitudes, so is the description in the figure below wrong? Should it be "For the **first**, as the original......"?
Thanks again for your contribution and patience, hope you can reply to me!
Hi, I read your paper, and congrats on your work.
However, the inference part is unclear to me: at inference time, is it possible to use this framework for the online action detection task? (I.e., suppose I have an input video stream; is the model able to predict frame-level labels as the frames arrive, at real-time speed?)
Thank you!
Hi @Pilhyeon
Thank you for your excellent work! I cloned your code and ran it, but I found that the GPU utilization of the program is very low (only 6%). Is this a normal phenomenon? My GPU is an NVIDIA TITAN Xp 12GB.
Hi, how can I get WUM_result_numpy?
@Pilhyeon
These two lines should not add 1 when computing areas:
WTAL-Uncertainty-Modeling/utils.py
Line 101 in ea630d4
WTAL-Uncertainty-Modeling/utils.py
Line 111 in ea630d4
(See mmaction2 for comparison.)
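For context on the +1 question: `end - start + 1` is the convention for inclusive, discrete frame indices, while `end - start` is the usual choice for continuous timestamps. A small sketch of temporal IoU under both conventions (illustrative only, not the repo's code):

```python
def temporal_iou(a, b, inclusive=False):
    """IoU of two temporal intervals given as (start, end).

    inclusive=True computes lengths as end - start + 1 (discrete,
    inclusive frame indices); inclusive=False computes end - start,
    the usual convention for continuous timestamps.
    """
    extra = 1 if inclusive else 0
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + extra)
    union = (a[1] - a[0] + extra) + (b[1] - b[0] + extra) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((0.0, 2.0), (1.0, 3.0)))  # 0.3333333333333333
```

Mixing the two conventions between prediction and ground-truth areas would bias the IoU slightly, which is presumably the concern here.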
Hi, I am reproducing your work, following the hyperparameters from the paper.
I use the feature extractor from the repo you recommend, select 16 frames per segment, and take the output of the Logits layer after the average pooling layer, so I get a 1024-d vector as the feature. But I cannot reproduce the results in your paper.
Also, I found a difference between the code and the paper: when calculating loss_act in the BMUE loss, you use the abs function instead of a max with 0.
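To make the abs-vs-max difference concrete, here is a hedged sketch in plain Python (not the repo's actual loss code; `m` stands for the pre-defined maximum feature magnitude in the paper). The two agree when the magnitude is below the margin, but abs additionally penalizes magnitudes that overshoot it:

```python
def act_term_abs(m, mag):
    """|m - mag|: penalizes action magnitudes both below AND above m."""
    return abs(m - mag)

def act_term_max(m, mag):
    """max(0, m - mag): no penalty once the magnitude reaches m."""
    return max(0.0, m - mag)

m = 100.0
for mag in (80.0, 120.0):
    print(mag, act_term_abs(m, mag), act_term_max(m, mag))
# 80.0 20.0 20.0   -- identical below the margin
# 120.0 20.0 0.0   -- abs keeps pulling magnitudes back down toward m
```

So the abs variant additionally constrains action magnitudes to stay near m rather than grow without bound, which may explain the choice in the code.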