
cmhungsteve / ta3n

257 stars · 41 forks · 1.71 MB

[ICCV 2019 (Oral)] Temporal Attentive Alignment for Large-Scale Video Domain Adaptation (PyTorch)

Home Page: https://arxiv.org/abs/1907.12743

License: MIT License

Python 93.19% Shell 6.81%
action-recognition cvpr2019 domain-adaptation domain-discrepancy iccv2019 pytorch temporal-dynamics video video-classification video-da-datasets

ta3n's Introduction

Hi there 👋

My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multi-Modal AI. I received my Ph.D. from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on biometric research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and before that on Edge-AI research as a Senior AI Engineer at MediaTek.

My research interests are mainly in Multi-Modal AI, including Vision-Language, Video Understanding, Cross-Modal Learning, Efficient Tuning, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.

[Update] I released a comprehensive paper list for Vision Transformer & Attention to facilitate related research. Feel free to check it out (I would appreciate it if you ★star it)!

[Personal Website][LinkedIn][Twitter][Google Scholar][Resume]

Min-Hung (Steve)'s GitHub stats

ta3n's People

Contributors

cmhungsteve · dependabot[bot]


ta3n's Issues

About the shell script

Hi @cmhungsteve ,

Thanks for sharing the code. I wonder whether the script_train_val file is correct. When I try to run it, I hit errors such as: error: argument --pred_normalize: expected one argument. It seems that this version of the script is missing some arguments.

Best.
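
For context, this argparse error means a flag that was declared to take a value was passed without one. A minimal sketch of the failure mode (only the option name is taken from the error message; the type and default are assumptions):

  import argparse

  parser = argparse.ArgumentParser(prog='main.py')
  # an option declared to consume exactly one value
  parser.add_argument('--pred_normalize', type=str, default='N')

  parser.parse_args(['--pred_normalize', 'Y'])   # OK: value supplied
  parser.parse_args(['--pred_normalize'])        # exits with:
  # main.py: error: argument --pred_normalize: expected one argument

So the shell script is likely invoking main.py with --pred_normalize but no value after it, or the value was lost during editing.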

Question about the training details?

Nice work!
Thanks for sharing your code!

I am interested in this task and would like to know some training details:

  • Which GPUs are you using, and how many?
  • Using your settings, I am curious about the details of your UCF -> HMDB and HMDB -> UCF experiments.

How long do these experiments take to train (hours/days)?

Question about attention

Hello, I would like to ask about the difference between use_attn and use_attn_frame. Is use_attn for the aggregated features and use_attn_frame for single frames? Thanks in advance.
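
For readers hitting the same question, here is a conceptual sketch of the distinction (the shapes and helper below are illustrative assumptions, not the repo's actual API): frame-level attention re-weights individual frame features before temporal aggregation, while the other variant re-weights the aggregated temporal (relation) features.

  import torch

  def attend(features, scores):
      # re-weight items by softmax-normalized attention scores
      weights = torch.softmax(scores, dim=1)       # (batch, num_items)
      return features * weights.unsqueeze(-1)      # broadcast over feature dim

  batch, num_frames, feat_dim = 8, 5, 256
  frame_feats = torch.randn(batch, num_frames, feat_dim)

  # use_attn_frame (assumed meaning): attention on each frame, before aggregation
  frame_scores = torch.randn(batch, num_frames)
  frame_feats = attend(frame_feats, frame_scores)

  # use_attn (assumed meaning): attention on aggregated temporal features,
  # e.g. the n-frame relation features, after temporal modeling
  relation_feats = torch.randn(batch, num_frames - 1, feat_dim)
  relation_scores = torch.randn(batch, num_frames - 1)
  video_feat = attend(relation_feats, relation_scores).sum(dim=1)  # (batch, feat_dim)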

argument_modality: invalid choice

Hi, I used the provided pre-extracted features and manually edited the paths in the data list files. I am sure my paths are correct, but I get the following error when running script_train_val.sh:

main.py: error: argument modality: invalid choice: 'data/classInd_ucf_olympic.txt' (choose from 'RGB', 'Flow', 'RGBDiff', 'RGBDiff2', 'RGBDiffplus') .

Do you have any insight into where I went wrong? Thank you in advance.
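
One likely cause, for context: main.py takes several positional arguments, and if the edited script supplies them in the wrong order (or adds/omits one), the class-list path slides into the slot argparse reserves for modality. A minimal illustration (the exact positional layout of main.py is an assumption; the choices are copied from the error):

  import argparse

  parser = argparse.ArgumentParser(prog='main.py')
  parser.add_argument('num_class')   # assumed first positional
  parser.add_argument('modality',
                      choices=['RGB', 'Flow', 'RGBDiff', 'RGBDiff2', 'RGBDiffplus'])

  # simulating a call where an extra positional shifts everything by one slot:
  parser.parse_args(['12', 'data/classInd_ucf_olympic.txt', 'RGB'])
  # main.py: error: argument modality: invalid choice: 'data/classInd_ucf_olympic.txt'

Checking the order of positional arguments in script_train_val.sh against the parser definition in opts.py usually resolves this.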

T-SNE visualization

I want to visualize the data after training the network. Could you give me an idea of how to visualize the source and target data after training, to compare how the domains are aligned? I read in your paper that you use t-SNE for the visualization, but I have no idea how to apply t-SNE once training has finished. I managed to run t-SNE on the source and target data before training, but after training I have trouble producing the visualization.
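
A common recipe for this, as a sketch: extract one video-level feature vector per clip from the trained model, stack source and target features into a single matrix, fit scikit-learn's TSNE on the combined matrix once, and color the 2-D points by domain. The random features below are placeholders for the real extracted ones:

  import numpy as np
  from sklearn.manifold import TSNE
  import matplotlib.pyplot as plt

  # placeholders: (N, D) video-level features from the *trained* model
  feats_source = np.random.randn(200, 256)
  feats_target = np.random.randn(150, 256)

  feats = np.concatenate([feats_source, feats_target], axis=0)
  emb = TSNE(n_components=2, perplexity=30, init='pca').fit_transform(feats)

  n_src = len(feats_source)
  plt.scatter(emb[:n_src, 0], emb[:n_src, 1], s=5, label='source')
  plt.scatter(emb[n_src:, 0], emb[n_src:, 1], s=5, label='target')
  plt.legend()
  plt.savefig('tsne_after_training.png')

The key point is fitting a single t-SNE on the concatenated source+target features so both domains share one embedding; fitting two separate t-SNEs would make the plots incomparable.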

output feature

Hello, thanks for your great work. I want to ask whether the code can output the processed features, and how to visualize the processed and unprocessed features like the figure in your paper.
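
One generic way to dump intermediate ("processed") features from a trained PyTorch model without modifying its code is a forward hook. A self-contained sketch with a toy model standing in for TA3N (with the real model, pick the layer you want from model.named_modules()):

  import torch
  import torch.nn as nn

  # toy stand-in; replace with the trained TA3N model
  model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 12))

  captured = {}
  def hook(module, inputs, output):
      captured['feat'] = output.detach().cpu()

  handle = model[1].register_forward_hook(hook)  # hook the layer producing the feature
  _ = model(torch.randn(8, 256))                 # one forward pass fills `captured`
  handle.remove()

  torch.save(captured['feat'], 'processed_feats.pt')  # e.g. for the t-SNE recipe above
  print(captured['feat'].shape)                       # torch.Size([8, 128])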

Some questions about the paper

Why does formula 7 use a minus sign instead of a plus sign? Why not give more weight to the samples that are hard to distinguish?

I am looking forward to your answer. Thank you very much!
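
For readers without the paper open: the formula in question appears to be the domain-attention weight, which (in paraphrased notation, so treat this as my reading rather than a quote) is

  w^i = 1 - H(\hat{y}^i_d), \qquad H(p) = -\sum_k p_k \log p_k

where \hat{y}^i_d is the domain prediction for clip i. With the minus sign, clips with low domain entropy (where the domain discriminator is confident, i.e. the feature exhibits a clear domain shift) receive large weights, so alignment focuses on the clips that contribute most to the domain discrepancy; a plus sign would instead emphasize the clips the discriminator already finds ambiguous.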

question about Target Entropy

Hello Sir,

I have a question about the add_loss_DA configuration. I have read the TA3N paper, but the target entropy is not mentioned; may I ask where I can find more information about it? Thank you in advance.
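
For context: "target entropy" here most likely refers to the standard entropy-minimization regularizer on unlabeled target predictions (see Grandvalet & Bengio, "Semi-supervised Learning by Entropy Minimization", 2005). A sketch of the generic technique (not necessarily the exact form behind add_loss_DA):

  import torch
  import torch.nn.functional as F

  def entropy_loss(logits):
      # mean Shannon entropy of the softmax predictions; minimizing it
      # pushes the classifier toward confident predictions on target data
      p = F.softmax(logits, dim=1)
      log_p = F.log_softmax(logits, dim=1)
      return -(p * log_p).sum(dim=1).mean()

  target_logits = torch.randn(8, 12)   # class predictions on unlabeled target clips
  loss = entropy_loss(target_logits)   # added to the total loss with a weight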

Cannot run code using default save_attention setting

Thank you for the wonderful code.

However, when trying to reproduce the results with the default settings in opts.py, I am getting errors. Running the code produces the following:

Traceback (most recent call last):
  File "main.py", line 836, in <module>
    main()
  File "main.py", line 240, in main
    loss_c, attn_epoch_source, attn_epoch_target = train(num_class, source_loader, target_loader, model, criterion, criterion_domain, optimizer, epoch, train_file, train_short_file, alpha, beta, gamma, mu)
  File "main.py", line 668, in train
    return losses_c.avg, attn_epoch_source.mean(0), attn_epoch_target.mean(0)
RuntimeError: invalid argument 2: invalid dimension 0 at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/TH/generic/THTensorMath.c:3897

After checking, I suspected the save_attention argument was the cause, and I changed the default from -1 to 0. Although the code now runs, I am only getting 73.89% accuracy. Could you explain why save_attention was set to -1, and what setting it to different values means? Thank you!
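
For context, the crash pattern is consistent with calling .mean(0) on an accumulator tensor that was never filled; presumably attention is only accumulated when save_attention >= 0 (an inference from the fix described above, not a confirmed reading of the code). A defensive sketch:

  import torch

  attn_epoch_source = torch.Tensor()   # accumulator left empty when save_attention == -1

  # only reduce when something was accumulated
  if attn_epoch_source.numel() > 0:
      avg_attn = attn_epoch_source.mean(0)
  else:
      avg_attn = None                  # nothing to average this epoch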

Code for obtaining the baseline results.

Hi @cmhungsteve

Thanks for the great work. I wanted to run the code to obtain the baseline results mentioned in the paper (DANN + [TempPooling/TempRelation]). Can you please let me know whether you can make that code public?

Thanks
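
While the baseline code is unreleased, the DANN component of those baselines is the standard gradient-reversal layer; a minimal PyTorch sketch of the technique (generic, not the authors' implementation):

  import torch
  from torch.autograd import Function

  class GradReverse(Function):
      # identity in the forward pass; flips and scales the gradient in backward,
      # so the feature extractor learns to fool the domain classifier (DANN)
      @staticmethod
      def forward(ctx, x, beta):
          ctx.beta = beta
          return x.view_as(x)

      @staticmethod
      def backward(ctx, grad_output):
          return grad_output.neg() * ctx.beta, None

  def grad_reverse(x, beta=1.0):
      return GradReverse.apply(x, beta)

  # usage: pooled video features -> gradient reversal -> domain classifier
  feat = torch.randn(8, 256, requires_grad=True)   # e.g. after temporal pooling
  domain_logits = torch.nn.Linear(256, 2)(grad_reverse(feat, beta=1.0))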

How to use the C3D

Hi, I want to know how to get /models/c3d.pickle. Could this code extract features using C3D networks?

Reproducing the results in the paper

Hi @cmhungsteve,

Thank you for releasing the wonderful code. I am trying to reproduce the UCF-HMDB_{full} results from the ICCV19 paper, i.e., source only, target only, TA^2N, and TA^3N in Table 3 and Table 4. But I cannot reproduce the TA^2N and TA^3N results. Can you share the hyperparameter settings for TA^2N and TA^3N? If you could share the shell script for those runs, it would be really helpful.

Thanks!
