TadTR's Introduction

Hi there 👋

I am Xiaolong. I received my PhD from Huazhong University of Science and Technology (HUST) in 2022, where I was supervised by Professor Xiang Bai. My research interest lies in computer vision, with a special focus on video action recognition.

Email: 1) brucelio at outlook dot com (Preferred) 2) liuxl at hust dot edu dot cn (I have graduated, so this email account will be deactivated.)

Homepage: https://xlliu7.github.io/

Google Scholar: https://scholar.google.com/citations?user=XDypsogAAAAJ

TadTR's People

Contributors

xlliu7

TadTR's Issues

No training/inference code or weights

Hi! I'm really interested in using this work for action detection - is there any way I could get access to your training scripts and pretrained weights?

Reproducibility of ActivityNet

Hi, first of all, thanks for your great work.
I am trying to reproduce your results on ActivityNet. I followed the procedure in your paper, using TSP features and adding some code to the Dataset module. I can run the whole pipeline on ActivityNet, but I just cannot match the results reported in the paper; for me, they all drop by about 3-4%.
I am wondering whether you are planning to open-source the training code for ActivityNet?

code releasing date

Thanks for your great work. When will the training & inference code be released? Can you give an approximate date? Thanks!

Code bugs in calculating losses?

I noticed that there are mismatched key names in weight_dict, which effectively causes the loss computation to skip loss_segments and loss_actionness in this line:

TadTR/engine.py

Lines 45 to 46 in 3af0abc

losses = sum(loss_dict[k] * weight_dict[k]
for k in loss_dict.keys() if k in weight_dict)

Looking at the weight_dict, loss_seg is used rather than loss_segments

TadTR/models/tadtr.py

Lines 498 to 501 in 3af0abc

weight_dict = {
'loss_ce': args.cls_loss_coef,
'loss_seg': args.seg_loss_coef,
'loss_iou': args.iou_loss_coef}

losses['loss_segments'] = loss_segment.sum() / num_segments

For actionness, the loss is assigned to the key loss_iou instead of loss_actionness, which effectively replaces the segment IoU loss with the actionness loss.

losses['loss_iou'] = loss_actionness

Are these bugs? Could you confirm it? Thanks.
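
A minimal, self-contained illustration of how such a key mismatch silently drops loss terms (the coefficients and values below are hypothetical, not the repository's; see the engine.py and tadtr.py excerpts quoted above). Renaming the keys consistently on both sides would be one possible fix:

# Hypothetical coefficients for illustration only.
loss_dict = {'loss_ce': 1.0, 'loss_segments': 2.0, 'loss_iou': 0.5}
weight_dict = {'loss_ce': 2.0, 'loss_seg': 5.0, 'loss_iou': 2.0}

# The weighted sum in engine.py ignores any loss whose key is missing from
# weight_dict, so 'loss_segments' contributes nothing here.
total = sum(loss_dict[k] * weight_dict[k] for k in loss_dict if k in weight_dict)
print(total)  # 3.0: only loss_ce and loss_iou are counted

# One possible fix: use the same key names on both sides.
weight_dict_fixed = {'loss_ce': 2.0, 'loss_segments': 5.0, 'loss_actionness': 2.0}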

Different lengths of Thumos14 I3D Features

Hi, Xiaolong. I'm very interested in your work. As you mentioned in another issue, you use the I3D features from P-GCN for the Thumos14 experiment. I find that some features for the same video have different lengths, so I can't concatenate them directly, and the difference is always 1. Have you ever met this situation? If so, how did you deal with it? Thanks!
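
A minimal sketch of one common workaround, assuming the length mismatch is between the RGB and flow streams and that each stream is stored as a .npy array of shape (T, C) (both assumptions, not something stated in the repository): truncate both streams to the shorter length before concatenating.

import numpy as np

# Hypothetical file names and layout; adapt to the actual P-GCN feature format.
rgb = np.load('video_test_0000004_rgb.npy')    # e.g. shape (T, 1024)
flow = np.load('video_test_0000004_flow.npy')  # e.g. shape (T - 1, 1024)

# With an off-by-one difference, truncating to the shorter length only drops
# the final snippet of the longer stream.
t = min(rgb.shape[0], flow.shape[0])
feat = np.concatenate([rgb[:t], flow[:t]], axis=1)  # shape (t, 2048)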

Actionness Regression not working

The claimed improvement from actionness regression does not seem to materialize based on my implementation using this code repository. The results with and without actionness regression are very similar.

Upon inspecting the implementation, I noticed a potential issue:

TadTR/models/tadtr.py

Lines 314 to 325 in 983ae14

src_segments = outputs['pred_segments'].view((-1, 2))
target_segments = torch.cat([t['segments'] for t in targets], dim=0)
losses = {}
iou_mat = segment_ops.segment_iou(
segment_ops.segment_cw_to_t1t2(src_segments),
segment_ops.segment_cw_to_t1t2(target_segments))
gt_iou = iou_mat.max(dim=1)[0]
pred_actionness = outputs['pred_actionness']
loss_actionness = F.l1_loss(pred_actionness.view(-1), gt_iou.view(-1).detach())

On line 315, all target segments in the batch are concatenated, and on line 323 the maximum IoU between a predicted segment and all target segments is taken as the actionness ground truth. However, the IoUs are computed across videos, likely producing a maximum IoU between a predicted segment in video A and a ground truth segment in video B.

Even after correcting this issue, there was still no performance improvement from the actionness regression in my runs (in fact, performance drops a lot). Upon inspection, this is because the actionness regression suffers from a serious label imbalance problem, as most target IoUs are zero.
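
For reference, a hedged sketch of how the IoU targets could be computed per video instead of across the concatenated batch (segment_iou and cw_to_t1t2 stand in for segment_ops.segment_iou and segment_ops.segment_cw_to_t1t2; this is not the repository's code and it does not address the label imbalance):

import torch

def actionness_targets_per_video(pred_segments, targets, segment_iou, cw_to_t1t2):
    # pred_segments: (B, Q, 2) predicted (center, width) segments.
    # targets: list of B dicts, each with a 'segments' tensor of shape (N_i, 2).
    gt_iou = []
    for b, t in enumerate(targets):
        if len(t['segments']) == 0:
            # No ground truth in this video: the target actionness is zero.
            gt_iou.append(pred_segments.new_zeros(pred_segments.shape[1]))
            continue
        iou_mat = segment_iou(cw_to_t1t2(pred_segments[b]),
                              cw_to_t1t2(t['segments']))
        # Best IoU against the ground truth of the *same* video only.
        gt_iou.append(iou_mat.max(dim=1)[0])
    return torch.cat(gt_iou).detach()  # shape (B * Q,)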

Requesting the source code

May I ask when the source code will be released?

How to generate th14_i3d2s_ft_info.json?

Hello, thank you for your good work!
I want to know how to generate th14_i3d2s_ft_info.json for the Thumos14 video features, and how to compute "feature_length", "feature_second" and "feature_fps" for each video.
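
A rough sketch under assumed semantics (feature_length = number of feature snippets, feature_second = duration covered by the snippets, feature_fps = feature vectors per second; the 2-second stride is only a guess suggested by the "i3d2s" name, so verify these against the repository before relying on them):

import json
import numpy as np

# Hypothetical mapping: video id -> path to its (T, C) feature array.
feature_files = {'video_test_0000004': 'features/video_test_0000004.npy'}
snippet_stride_sec = 2.0  # assumed stride between consecutive feature vectors

info = {}
for vid, path in feature_files.items():
    feat = np.load(path)
    feature_length = int(feat.shape[0])                    # number of snippets
    feature_second = feature_length * snippet_stride_sec   # duration covered (s)
    feature_fps = feature_length / feature_second          # snippets per second
    info[vid] = {'feature_length': feature_length,
                 'feature_second': feature_second,
                 'feature_fps': feature_fps}

with open('th14_i3d2s_ft_info.json', 'w') as f:
    json.dump(info, f)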

About the ActivityNet features

Hello, can you provide the TSN features of ActivityNet-1.3 after linear interpolation?

About th14_i3d2s_ft_info.json

Hello, thank you for your work!
I want to know how feature_length can be read directly from the video feature files, because I am trying this code on my own dataset.
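
If the features are stored as one array per video (an assumption; adjust for your own format), feature_length can be read straight from the array shape:

import numpy as np

feat = np.load('my_dataset/features/video_0001.npy')  # hypothetical path, shape (T, C)
feature_length = int(feat.shape[0])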

Inference on single video

Hello! I am planning to start working on this direction. Is there any code for running inference on a single video?

Modification of focal loss to work with mix-up augmentation?

I'm trying to train on relatively small datasets, and mix-up is one way to reduce overfitting, but it seems like focal loss is not designed to work with probabilistic labels. It seems that this line

target_classes_onehot.scatter_(2, target_classes.unsqueeze(-1), 1)
is specifically designed for hard binary (one-hot) labels.

Do you have any idea how to modify the focal loss for labels with probabilities?
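
One common way to handle this (a hedged sketch, not the repository's implementation) is to generalize the sigmoid focal loss to probabilistic targets, replacing the scatter_-built one-hot tensor with the mixed label probabilities:

import torch
import torch.nn.functional as F

def soft_focal_loss(logits, soft_targets, alpha=0.25, gamma=2.0):
    # logits: (N, Q, C) raw class scores.
    # soft_targets: (N, Q, C) label probabilities in [0, 1], e.g. from mix-up.
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, soft_targets, reduction='none')
    # p_t measures agreement with the soft target; (1 - p_t)^gamma down-weights
    # easy examples exactly as in the hard-label focal loss.
    p_t = prob * soft_targets + (1 - prob) * (1 - soft_targets)
    loss = ce * ((1 - p_t) ** gamma)
    alpha_t = alpha * soft_targets + (1 - alpha) * (1 - soft_targets)
    return (alpha_t * loss).mean()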

The network weights

Dear researchers,

Thank you for your work.

The links to your network weights don't work, which prevents us from reproducing your work.

best regards,

One question about the loss backward of temporal_deform_attn

Thanks for open-sourcing this good work.

But I met a problem.

File "models/ops/temporal_deform_attn/functions/temporal_deform_attn_func.py", line 40, in backward
    value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, grad_output, ctx.seq2col_step)
RuntimeError: Not implemented

I wonder if it is convenient for you to answer.

E2E-TAD code

Do you have a planned date for releasing the code of E2E-TAD?

Non-deterministic results

Hello,

thank you for sharing the code. I checked the code and all of the seeds are set.
I further added torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False to make the code produce the same results across runs. However, the results still differ between runs.
Do you have any idea why?

Thanks in advance.
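
For reference, a checklist-style sketch of determinism settings beyond the cudnn flags (hedged, not the repository's setup; even with all of these, custom CUDA extensions such as the temporal deformable attention op may have no deterministic kernels, which could explain the remaining variation):

import os
import random
import numpy as np
import torch

# Must be set before CUDA work starts; required by some cuBLAS ops.
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# warn_only requires PyTorch >= 1.11; it reports (rather than errors on)
# ops that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
# DataLoader workers also need seeding, e.g. via worker_init_fn and a seeded
# torch.Generator passed to the DataLoader.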

Missing datasets

Dear @xlliu7, thank you for your valuable contribution to the community.
I know that cleaning the code and supporting all the datasets requires a lot of work.
However, I would greatly appreciate it if you were to release the rest of the code for reproducing results in HACS and ActivityNet.

Could you kindly let me know the time horizon for this?

Best,
Mattia

Request code for ActivityNet

Hello, thanks for the nice work.
I have sent you an email requesting the code for ActivityNet.
Could you share it?

Best wishes.
