pilhyeon / wtal-uncertainty-modeling
Official PyTorch Implementation of 'Weakly-supervised Temporal Action Localization by Uncertainty Modeling' (AAAI-21)
License: MIT License
Thanks for your great work! But the pre-trained model you provide cannot achieve the results in the paper. I saw the same reproduced results in a closed issue. Could you please check whether you uploaded the latest pre-trained model, or whether there is some other mistake? Thanks~
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Hi, excellent work!
You mentioned that you would make the ActivityNet features and model public. Can you do so now? It is very difficult to reproduce the reported results without the desired settings, as someone already posted in this repository (I tried your suggested settings for ActivityNet, but did not get the reported results).
My email is: [email protected]! Or you can send me the link here if you wish.
Thanks for your nice work. Can you provide details about the training GPU and training time?
Hello, Pilhyeon.
When will you release the features for ActivityNet 1.2 and 1.3? I have been waiting for them for about four months.
Or you can send me the features by email: [email protected].
Thanks.
Hello guys. Thank you for your excellent work. I read the updated paper and found that the result is better than before. Can you share the code that matches the best result?
I am waiting for your reply.
Good work! While reading the paper, a question came up: how are the feature magnitudes in Figure 2 of the paper (x-axis) defined? Is that the normalized video feature (shape: [B, T, F]) or something else? I am confused about the x-axis of Figure 2 and how the histogram is plotted. Thanks!
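In case it helps while waiting for a reply: in similar WTAL codebases the magnitude is typically the per-segment L2 norm of the (unnormalized) feature vector, which is what I assume below — a minimal sketch, not the authors' code:

```python
import numpy as np

def segment_magnitudes(features):
    """L2 norm of each segment-level feature vector.

    features: array of shape [B, T, F] (batch, temporal segments, feature dim).
    Returns magnitudes of shape [B, T]; the histogram in Figure 2 would then
    be plotted over these per-segment values.
    """
    return np.linalg.norm(features, axis=-1)

# Toy example: 1 video, 4 segments, 3-dim features.
feats = np.array([[[3.0, 4.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0],
                   [6.0, 8.0, 0.0]]])
mags = segment_magnitudes(feats)
print(mags)  # [[ 5.  1.  2. 10.]]
```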
Hi @Pilhyeon:
Thanks for your contribution. I cannot download the features you provide; when I open the Google Drive link, these are all the files:
I am not sure these are the features used in your repo, since I do not know how to use the files in the Google Drive link. Could you please check this or explain it?
I am wondering how to determine which frames are background frames and which are action frames when drawing the histogram.
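Not the author, but a common way to do this (my assumption here, not confirmed by the repo) is to use the ground-truth temporal annotations: a segment counts as action if its center timestamp falls inside any annotated interval, and as background otherwise. A minimal sketch:

```python
def label_segments(num_segments, fps, gt_intervals):
    """Mark each segment as action (True) or background (False)
    based on ground-truth temporal intervals (in seconds).

    num_segments: number of feature segments in the video.
    fps: segments per second of video.
    gt_intervals: list of (start_sec, end_sec) action annotations.
    """
    labels = []
    for t in range(num_segments):
        center = (t + 0.5) / fps  # segment center in seconds
        is_action = any(s <= center <= e for s, e in gt_intervals)
        labels.append(is_action)
    return labels

# Toy example: 6 segments at 1 segment/sec, one action from 2 s to 4 s.
print(label_segments(6, 1.0, [(2.0, 4.0)]))
# [False, False, True, True, False, False]
```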
Thanks for your code. I used the code you provide and downloaded the THUMOS'14 features, but I get the following results; the mAP is about 1% lower than the results in your paper:
Experiment | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | average mAP
---|---|---|---|---|---|---
Paper-I3D | 46.9 | 39.2 | 30.7 | 20.8 | 12.5 | 30.0
Reproduce-I3D | 46.0 | 38.4 | 29.7 | 20.5 | 12.1 | 29.3
Could there be any problems?
Hello,
In fact, the performance is improved by hyper-parameter tuning without any model change.
Specifically, alpha: 0.0002 -> 0.0005, r_act: 8 -> 9, r_bkg: 6 -> 4
You can also find them in options.py.
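For reference, a minimal sketch of what those updated defaults could look like in options.py (the actual flag names and surrounding options in the repo may differ — this only illustrates the three values mentioned above):

```python
import argparse

# Hypothetical excerpt in the spirit of options.py, showing only the
# three hyperparameters changed in the reply (old values in comments).
parser = argparse.ArgumentParser()
parser.add_argument('--alpha', type=float, default=0.0005)  # was 0.0002
parser.add_argument('--r_act', type=int, default=9)         # was 8
parser.add_argument('--r_bkg', type=int, default=4)         # was 6

args = parser.parse_args([])  # empty argv: use the defaults
print(args.alpha, args.r_act, args.r_bkg)  # 0.0005 9 4
```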
In addition, I updated the best model file, with which you can see the improved result.
Thanks!
Thanks for your reply. I tested the best model that you updated and changed the parameters as you said, but I cannot reproduce the result. The result I get is as follows:
Step: 0
Test_acc: 0.8905
average_mAP: 0.4038
mAP@0.1: 0.6551
mAP@0.2: 0.5836
mAP@0.3: 0.5052
mAP@0.4: 0.4158
mAP@0.5: 0.3245
mAP@0.6: 0.2266
mAP@0.7: 0.1161
Do you know the reason?
Originally posted by @xumh-9 in #9 (comment)
I have run the code many times (about 100 times), with the same environment as requirements.txt, and even tried 3 different machines, but the best result I could get is about 40, much worse than 41.8.
I tried changing the random seed while keeping all the hyper-parameters, but it did not work. I am sure that I used the same environment and the latest code.
I hope the author could reply to this question, since there are also other people who cannot reproduce the result: #10 #5 #11
Their issues were closed by the author with no response to their questions. I think the author should check the code carefully again, especially the hyperparameters.
If anyone else has achieved 41.8 mAP, please tell me. I hope the author does not close this issue, since the problem is not solved. I also hope the author @Pilhyeon could run this code again on a different machine and then publish the result and code, or provide the training log as proof.
Thank you for your excellent work.
Could you provide the PyTorch dataset file (like thumos_features.py)?
I am doing experiments on a reorganized ActivityNet 1.3 dataset, so I hope to get more details for a fair comparison.
Thank you!
If possible, you can also send it to my email: [email protected]
Hi @Pilhyeon, thanks for your great work! I am following your BMUE and have some trouble reproducing the results on the ActivityNet dataset.
I have tried some experiments on ActivityNet v1.2. I downloaded the I3D features provided by this link and adapted them to the BMUE format. The following are some of my results. All experiments run for 6k epochs; the results are shown in the form "(average_mAP, Test_acc)":
According to Sec 4.1 in your arXiv paper, T = 50, so I set num_segments to 50 and ran the following experiments:
Besides, I also tried changing "act_thresh_magnitudes", "NMS thresh", "alpha", "_lambda & gamma in get_proposal_oic()", etc. The results do not seem to get better: the test accuracy is around 0.4 and the average_mAP is very low. It is hard for me to find the best settings. Could you please share your parameter settings for the ActivityNet 1.2 & 1.3 datasets, or give me some advice on which parameters to change?
Looking forward to your reply, and I would be glad to cite your excellent work. Thanks!
It is a multi-label classification problem. @Pilhyeon
Can you explain the workflow of proposal generation? @Pilhyeon
WTAL-Uncertainty-Modeling/utils.py, Line 19 in ea630d4:
_lambda=0.25, gamma=0.2, feature_fps=24, scale=24
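Not the author, but for anyone else wondering about the proposal workflow: in most WTAL codebases, proposals are obtained by thresholding the class activation scores and grouping consecutive above-threshold segments. The sketch below shows only that grouping step; the repo's get_proposal_oic additionally scores each proposal by outer-inner contrast using the _lambda and gamma values above, which this illustration omits:

```python
def generate_proposals(scores, thresh=0.5):
    """Group consecutive segments whose activation score exceeds
    `thresh` into (start, end) index pairs (end exclusive).
    A simplified illustration, not the repo's exact code.
    """
    proposals, start = [], None
    for t, s in enumerate(scores):
        if s >= thresh and start is None:
            start = t                      # proposal opens here
        elif s < thresh and start is not None:
            proposals.append((start, t))   # proposal closes
            start = None
    if start is not None:                  # score stayed high until the end
        proposals.append((start, len(scores)))
    return proposals

print(generate_proposals([0.1, 0.7, 0.8, 0.2, 0.9], thresh=0.5))
# [(1, 3), (4, 5)]
```

Each (start, end) pair would then be mapped back to seconds via feature_fps and scale before NMS.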
Hi @Pilhyeon
I noticed that your features (932 MB) are smaller than the features provided by other papers and repos, e.g.:
Could you please explain this? Is the size of the features related to the removal of some videos (270, 1292, 1496)?
Hi @Pilhyeon, I have read all of your code, but there is one thing I cannot understand: why is there a dropout?
It seems like you do not select among all magnitude values; instead you discard most of them (0.7 in your code) and generate pseudo action features and pseudo background features from the remaining magnitudes (0.3).
Is this a regularization method?
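To make the question concrete, here is one plausible reading of that step, sketched from scratch (the ratio 0.3 and the function shape are my assumptions, not the repo's actual code): randomly keep a fraction of the segments, then pick pseudo action/background indices by magnitude among only the kept ones, so different segments are selected at each training step, like dropout:

```python
import numpy as np

def sample_pseudo_indices(magnitudes, k_act, k_bkg, keep_ratio=0.3, rng=None):
    """Randomly keep `keep_ratio` of the segments, then take the
    top-k_act magnitudes among them as pseudo-action indices and the
    bottom-k_bkg as pseudo-background indices. Illustrative sketch only.
    """
    if rng is None:
        rng = np.random.default_rng()
    T = len(magnitudes)
    n_keep = max(int(T * keep_ratio), k_act + k_bkg)  # keep enough to sample
    kept = rng.choice(T, size=n_keep, replace=False)  # the "dropout" step
    order = kept[np.argsort(magnitudes[kept])]        # kept idx, low -> high
    return order[-k_act:], order[:k_bkg]              # action, background

mags = np.array([0.1, 2.0, 0.3, 5.0, 0.2, 4.0, 0.05, 3.0])
act_idx, bkg_idx = sample_pseudo_indices(mags, k_act=2, k_bkg=2,
                                         rng=np.random.default_rng(0))
```

The randomness would indeed act as a regularizer: the pseudo labels vary across iterations instead of always coming from the same extreme segments.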
Hi @Pilhyeon
Thanks for your contribution. I tried again and could reproduce your result! It is really amazing work!
I read your paper carefully, but there are still some details I cannot understand. Could you please answer them if you have time?
1. What does "Density" mean?
2. Is the "main pipeline" the whole model, while "separated features" use both the "main pipeline" and "Uncertainty modeling" as the final model?
3. Does the "softmax score" used in Table 3 of the ablation study mean using only the "main pipeline" in Figure 3 to obtain the result?
4. The "softmax score" is obtained from the original features, which means they are not separated and have unconstrained magnitudes, so is the description in the figure below wrong? Should it be "For the **first**, as the original......"?
Thanks again for your contribution and patience, hope you can reply to me!
Hi, I read your paper, and congrats on your work.
However, the inference part is unclear to me: at inference time, is it possible to use this framework for the online action detection task? (I.e., suppose I have an input video stream; is the model able to predict frame-level labels as the frames arrive, at real-time speed?)
Thank you!
Hi @Pilhyeon
Thank you for your excellent work! I cloned your code and ran it, but I found that the GPU utilization of the program is very low (only 6%). Is this a normal phenomenon? My GPU is an NVIDIA TITAN Xp 12GB.
Hi, how can I get WUM_result_numpy?
@Pilhyeon
These two lines should not add 1 when computing areas:
WTAL-Uncertainty-Modeling/utils.py
Line 101 in ea630d4
WTAL-Uncertainty-Modeling/utils.py
Line 111 in ea630d4
(See mmaction2 for comparison.)
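For context on the +1 question: `end - start + 1` is the convention for inclusive, discrete frame indices, while `end - start` is the usual choice for continuous timestamps. A small sketch of temporal IoU under both conventions (illustrative only, not the repo's code):

```python
def temporal_iou(a, b, inclusive=False):
    """IoU of two temporal intervals given as (start, end).

    inclusive=True computes lengths as end - start + 1 (discrete,
    inclusive frame indices); inclusive=False computes end - start,
    the usual convention for continuous timestamps.
    """
    extra = 1 if inclusive else 0
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + extra)
    union = (a[1] - a[0] + extra) + (b[1] - b[0] + extra) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((0.0, 2.0), (1.0, 3.0)))  # 0.3333333333333333
```

Mixing the two conventions between prediction and ground-truth areas would bias the IoU slightly, which is presumably the concern here.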
Hi, I am reproducing your work, following the hyperparameters from the paper.
I use the feature extractor from the repo you recommend, select 16 frames per segment, and take the output of the Logits layer after the average pooling layer, so I get a 1024-d vector as the feature. But I cannot reproduce the results in your paper.
Also, I found a difference between the code and the paper: when calculating loss_act in the BMUE loss, you use the abs function instead of a max with 0.
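To make the abs-vs-max difference concrete, here is a hedged sketch in plain Python (not the repo's actual loss code; `m` stands for the pre-defined maximum feature magnitude in the paper). The two agree when the magnitude is below the margin, but abs additionally penalizes magnitudes that overshoot it:

```python
def act_term_abs(m, mag):
    """|m - mag|: penalizes action magnitudes both below AND above m."""
    return abs(m - mag)

def act_term_max(m, mag):
    """max(0, m - mag): no penalty once the magnitude reaches m."""
    return max(0.0, m - mag)

m = 100.0
for mag in (80.0, 120.0):
    print(mag, act_term_abs(m, mag), act_term_max(m, mag))
# 80.0 20.0 20.0   -- identical below the margin
# 120.0 20.0 0.0   -- abs keeps pulling magnitudes back down toward m
```

So the abs variant additionally constrains action magnitudes to stay near m rather than grow without bound, which may explain the choice in the code.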