TadTR's Introduction

Hi there 👋

I am Xiaolong. I received my PhD from Huazhong University of Science and Technology (HUST) in 2022, where I was supervised by Professor Xiang Bai. My research interest lies in computer vision, with a special focus on video action recognition.

Email: 1) brucelio at outlook dot com (Preferred) 2) liuxl at hust dot edu dot cn (I have graduated, so this email account will be deactivated.)

Homepage: https://xlliu7.github.io/

Google Scholar: https://scholar.google.com/citations?user=XDypsogAAAAJ

TadTR's People

Contributors

xlliu7

TadTR's Issues

No training/inference code or weights

Hi! I'm really interested in using this work for action detection - is there any way I could get access to your training scripts and pretrained weights?

Reproducibility of ActivityNet

Hi, first of all, thanks for your great work.
I am trying to reproduce your results on ActivityNet. I followed the procedure in your paper, using TSP features and adding some code to the Dataset module. I can run the whole pipeline on ActivityNet, but I just cannot match the results reported in the paper; for me, they all drop by about 3-4%.
I am wondering whether you are planning to open-source the training code for ActivityNet?

code releasing date

Thanks for your great work. When will the training & inference code be released? Can you give an approximate date? Thanks!

Code bugs in calculating losses?

I noticed that there are mismatched key names in weight_dict, which effectively causes the loss computation to skip loss_segments and loss_actionness in this line:

TadTR/engine.py

Lines 45 to 46 in 3af0abc

losses = sum(loss_dict[k] * weight_dict[k]
for k in loss_dict.keys() if k in weight_dict)

Looking at the weight_dict, loss_seg is used rather than loss_segments

TadTR/models/tadtr.py

Lines 498 to 501 in 3af0abc

weight_dict = {
'loss_ce': args.cls_loss_coef,
'loss_seg': args.seg_loss_coef,
'loss_iou': args.iou_loss_coef}

losses['loss_segments'] = loss_segment.sum() / num_segments

For actionness, the loss is assigned to the key loss_iou instead of loss_actionness, which effectively replaces the segment IoU loss with the actionness loss.

losses['loss_iou'] = loss_actionness

Are these bugs? Could you confirm it? Thanks.
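
A minimal, self-contained illustration of how such a key mismatch silently drops loss terms (the coefficients and values below are hypothetical, not the repository's; see the engine.py and tadtr.py excerpts quoted above). Renaming the keys consistently on both sides would be one possible fix:

# Hypothetical coefficients for illustration only.
loss_dict = {'loss_ce': 1.0, 'loss_segments': 2.0, 'loss_iou': 0.5}
weight_dict = {'loss_ce': 2.0, 'loss_seg': 5.0, 'loss_iou': 2.0}

# The weighted sum in engine.py ignores any loss whose key is missing from
# weight_dict, so 'loss_segments' contributes nothing here.
total = sum(loss_dict[k] * weight_dict[k] for k in loss_dict if k in weight_dict)
print(total)  # 3.0: only loss_ce and loss_iou are counted

# One possible fix: use the same key names on both sides.
weight_dict_fixed = {'loss_ce': 2.0, 'loss_segments': 5.0, 'loss_actionness': 2.0}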

Different lengths of Thumos14 I3D Features

Hi, Xiaolong. I'm very interested in your work. As you mentioned in another issue, you use the I3D features from P-GCN for the Thumos14 experiment. I find that some features for the same video have different lengths, so I can't concatenate them directly, and the difference is always 1. Have you ever met this situation? If so, how did you deal with it? Thanks!
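
A minimal sketch of one common workaround, assuming the length mismatch is between the RGB and flow streams and that each stream is stored as a .npy array of shape (T, C) (both assumptions, not something stated in the repository): truncate both streams to the shorter length before concatenating.

import numpy as np

# Hypothetical file names and layout; adapt to the actual P-GCN feature format.
rgb = np.load('video_test_0000004_rgb.npy')    # e.g. shape (T, 1024)
flow = np.load('video_test_0000004_flow.npy')  # e.g. shape (T - 1, 1024)

# With an off-by-one difference, truncating to the shorter length only drops
# the final snippet of the longer stream.
t = min(rgb.shape[0], flow.shape[0])
feat = np.concatenate([rgb[:t], flow[:t]], axis=1)  # shape (t, 2048)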

Actionness Regression not working

The claimed improvement from actionness regression does not seem to materialize based on my implementation using this code repository. The results with and without actionness regression are very similar.

Upon inspecting the implementation, I noticed a potential issue:

TadTR/models/tadtr.py

Lines 314 to 325 in 983ae14

src_segments = outputs['pred_segments'].view((-1, 2))
target_segments = torch.cat([t['segments'] for t in targets], dim=0)
losses = {}
iou_mat = segment_ops.segment_iou(
segment_ops.segment_cw_to_t1t2(src_segments),
segment_ops.segment_cw_to_t1t2(target_segments))
gt_iou = iou_mat.max(dim=1)[0]
pred_actionness = outputs['pred_actionness']
loss_actionness = F.l1_loss(pred_actionness.view(-1), gt_iou.view(-1).detach())

On line 315, all target segments in the batch are concatenated, and on line 323 the maximum IoU between a predicted segment and all target segments is taken as the actionness ground truth. However, the IoUs are computed across videos, likely producing a maximum IoU between a predicted segment in video A and a ground truth segment in video B.

Even after correcting this issue, there was still no performance improvement from the actionness regression in my runs (in fact, performance drops a lot). Upon inspection, this is because the actionness regression suffers from a serious label imbalance problem, as most target IoUs are zero.
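
For reference, a hedged sketch of how the IoU targets could be computed per video instead of across the concatenated batch (segment_iou and cw_to_t1t2 stand in for segment_ops.segment_iou and segment_ops.segment_cw_to_t1t2; this is not the repository's code and it does not address the label imbalance):

import torch

def actionness_targets_per_video(pred_segments, targets, segment_iou, cw_to_t1t2):
    # pred_segments: (B, Q, 2) predicted (center, width) segments.
    # targets: list of B dicts, each with a 'segments' tensor of shape (N_i, 2).
    gt_iou = []
    for b, t in enumerate(targets):
        if len(t['segments']) == 0:
            # No ground truth in this video: the target actionness is zero.
            gt_iou.append(pred_segments.new_zeros(pred_segments.shape[1]))
            continue
        iou_mat = segment_iou(cw_to_t1t2(pred_segments[b]),
                              cw_to_t1t2(t['segments']))
        # Best IoU against the ground truth of the *same* video only.
        gt_iou.append(iou_mat.max(dim=1)[0])
    return torch.cat(gt_iou).detach()  # shape (B * Q,)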

Requesting the source code

May I ask when the source code will be released?

How to generate th14_i3d2s_ft_info.json?

Hello, thank you for your good work!
I want to know how to generate th14_i3d2s_ft_info.json for the Thumos14 video features, and how to compute "feature_length", "feature_second" and "feature_fps" for each video.
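
A rough sketch under assumed semantics (feature_length = number of feature snippets, feature_second = duration covered by the snippets, feature_fps = feature vectors per second; the 2-second stride is only a guess suggested by the "i3d2s" name, so verify these against the repository before relying on them):

import json
import numpy as np

# Hypothetical mapping: video id -> path to its (T, C) feature array.
feature_files = {'video_test_0000004': 'features/video_test_0000004.npy'}
snippet_stride_sec = 2.0  # assumed stride between consecutive feature vectors

info = {}
for vid, path in feature_files.items():
    feat = np.load(path)
    feature_length = int(feat.shape[0])                    # number of snippets
    feature_second = feature_length * snippet_stride_sec   # duration covered (s)
    feature_fps = feature_length / feature_second          # snippets per second
    info[vid] = {'feature_length': feature_length,
                 'feature_second': feature_second,
                 'feature_fps': feature_fps}

with open('th14_i3d2s_ft_info.json', 'w') as f:
    json.dump(info, f)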

About the ActivityNet features

Hello, can you provide the TSN features of ActivityNet-1.3 after linear interpolation?

About th14_i3d2s_ft_info.json

Hello, thank you for your work!
I want to know how feature_length can be read directly from the video feature files, because I am trying this code on my own dataset.
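
If the features are stored as one array per video (an assumption; adjust for your own format), feature_length can be read straight from the array shape:

import numpy as np

feat = np.load('my_dataset/features/video_0001.npy')  # hypothetical path, shape (T, C)
feature_length = int(feat.shape[0])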

Inference on single video

Hello! I am planning to start working on this direction. Is there any code for running inference on a single video?

Modification of focal loss to work with mix-up augmentation?

I'm trying to train on relatively small datasets, and mix-up is one way to reduce overfitting, but it seems like focal loss is not designed to work with probabilistic labels. It seems that this line

target_classes_onehot.scatter_(2, target_classes.unsqueeze(-1), 1)
is specifically designed for hard binary (one-hot) labels.

Do you have any idea how to modify the focal loss for labels with probabilities?
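
One common way to handle this (a hedged sketch, not the repository's implementation) is to generalize the sigmoid focal loss to probabilistic targets, replacing the scatter_-built one-hot tensor with the mixed label probabilities:

import torch
import torch.nn.functional as F

def soft_focal_loss(logits, soft_targets, alpha=0.25, gamma=2.0):
    # logits: (N, Q, C) raw class scores.
    # soft_targets: (N, Q, C) label probabilities in [0, 1], e.g. from mix-up.
    prob = logits.sigmoid()
    ce = F.binary_cross_entropy_with_logits(logits, soft_targets, reduction='none')
    # p_t measures agreement with the soft target; (1 - p_t)^gamma down-weights
    # easy examples exactly as in the hard-label focal loss.
    p_t = prob * soft_targets + (1 - prob) * (1 - soft_targets)
    loss = ce * ((1 - p_t) ** gamma)
    alpha_t = alpha * soft_targets + (1 - alpha) * (1 - soft_targets)
    return (alpha_t * loss).mean()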

The network weights

Dear researchers,

Thank you for your work.

The links to your network weights don't work, which prevents us from reproducing your work.

best regards,

One question about the loss backward of temporal_deform_attn

Thanks for open-sourcing this good work.

But I met a problem.

File "models/ops/temporal_deform_attn/functions/temporal_deform_attn_func.py", line 40, in backward
    value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, grad_output, ctx.seq2col_step)
RuntimeError: Not implemented

I wonder if it is convenient for you to answer.

E2E-TAD code

Do you have a planned date for releasing the code of E2E-TAD?

Non-deterministic results

Hello,

thank you for sharing the code. I checked the code and all of the seeds are set.
I further added torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False to make the code produce the same results across runs. However, the results still differ between runs.
Do you have any idea why?

Thanks in advance.
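
For reference, a checklist-style sketch of determinism settings beyond the cudnn flags (hedged, not the repository's setup; even with all of these, custom CUDA extensions such as the temporal deformable attention op may have no deterministic kernels, which could explain the remaining variation):

import os
import random
import numpy as np
import torch

# Must be set before CUDA work starts; required by some cuBLAS ops.
os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
# warn_only requires PyTorch >= 1.11; it reports (rather than errors on)
# ops that have no deterministic implementation.
torch.use_deterministic_algorithms(True, warn_only=True)
# DataLoader workers also need seeding, e.g. via worker_init_fn and a seeded
# torch.Generator passed to the DataLoader.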

Missing datasets

Dear @xlliu7, thank you for your valuable contribution to the community.
I know that cleaning the code and supporting all the datasets requires a lot of work.
However, I would greatly appreciate it if you were to release the rest of the code for reproducing results in HACS and ActivityNet.

Could you kindly let me know the time horizon for this?

Best,
Mattia

Request code for ActivityNet

Hello, thanks for the nice work.
I have sent you an email requesting the code for ActivityNet.
Could you share it?

Best wishes.
