cshizhe / hgr_v2t
Code accompanying the paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning".
License: MIT License
I found that some files are missing in the data downloaded from BaiduNetdisk. There are 6 files in MSRVTT/annotation/RET (int2word.npy, ref_captions.json, sent2rolegraph.augment.json, sent2srl.json and word2int.json), but some of them are missing from the other datasets. For example, there are only 2 files in MSVD/annotation/RET (ref_captions.json, sent2rolegraph.augment.json).
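A quick way to see which annotation files each dataset is missing is a small check like the sketch below. The expected file list is only an assumption taken from the MSRVTT listing above; other datasets may legitimately ship a subset.

```python
import os

# Files observed under MSRVTT/annotation/RET (assumed expected set, not an official spec).
EXPECTED = ['int2word.npy', 'word2int.json', 'sent2srl.json',
            'ref_captions.json', 'sent2rolegraph.augment.json']

for dataset in ['MSRVTT', 'MSVD', 'VATEX']:
    ret_dir = os.path.join(dataset, 'annotation', 'RET')
    present = set(os.listdir(ret_dir)) if os.path.isdir(ret_dir) else set()
    missing = [f for f in EXPECTED if f not in present]
    print(dataset, 'missing:', missing or 'none')
```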
Hi~ thanks for your nice work~
I want to caption a self-captured video. Could you please give some detailed instructions on how to adapt the pretrained model provided in the code to this task? For example, the feature extraction method, the feature data format, and how to visualize the final result? Thanks a lot!
OSError: Unable to open file (unable to open file: name = 'data/VATEX/ordered_feature/SA/resnet152.pth/trn_ft.hdf5'
Could you tell me how to use the I3D features instead?
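For reference, the ordered visual features are stored as HDF5 files, so a quick sanity check is simply opening the file and listing its contents. A minimal sketch, reusing the path from the error message above (adjust it to your local layout; the assumption that keys are video names may not hold exactly):

```python
import h5py

# Path copied from the error message above; change it to your own feature file.
path = 'data/VATEX/ordered_feature/SA/resnet152.pth/trn_ft.hdf5'

with h5py.File(path, 'r') as f:
    keys = list(f.keys())                       # e.g. one entry per video
    print(len(keys), 'entries, first key:', keys[0])
    print('feature shape of first entry:', f[keys[0]].shape)
```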
Hello, thanks for your great work! I'm very interested in visualizing the examples. How can I visualize the retrieved videos? Could you please upload the code?
Hi, I'm very interested in your work, and I want to use other datasets like Charades with your model. But several required files, like the annotations, don't exist for other datasets. What should I do to obtain these annotations, and how can I build the role graphs? Could you provide the tools mentioned in your paper? Thank you very much for any reply.
Hi Shizhe,
Thanks for your great work! I noticed that the training script needs to load a pretrained model:
--resume_file $resdir/../../word_embeds.glove42b.th
Is this used to initialize the text embedding module?
Besides, I cannot find this file in "MSRVTT/results/RET.released/" and can only find "MSRVTT/results/RET/word_embeds.glove32b.th". Is there any difference between word_embeds.glove42b.th and word_embeds.glove32b.th? Could you please share "word_embeds.glove42b.th"?
The pretrained model "bert-base-srl-2019.06.17.tar.gz" does not seem to be compatible with the latest version of allennlp.
If there is no verb in a sentence, how should it be handled?
Thanks for your great work!
I have a question: how long does it take to train your model on each of the three datasets?
Also, the BaiduNetdisk link is empty.
allennlp.common.checks.ConfigurationError: srl not in acceptable choices for dataset_reader.type
My config code for the predictor is:

    archive = load_archive('bert-base-srl-2019.06.17')
    predictor = Predictor.from_archive(archive, 'video-text classifier')
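For comparison, the usual way to load this SRL model with an older allennlp release (roughly the 0.9.x line, which still registers the 'srl' dataset reader) looks like the sketch below. It assumes that the version mismatch, together with the predictor name, is what triggers the ConfigurationError; newer allennlp releases moved SRL into the separate allennlp-models package.

```python
# Sketch for allennlp ~0.9.0; not guaranteed against other versions.
from allennlp.models.archival import load_archive
from allennlp.predictors.predictor import Predictor

archive = load_archive('bert-base-srl-2019.06.17.tar.gz')
# 'semantic-role-labeling' is the registered SRL predictor name,
# not 'video-text classifier'.
predictor = Predictor.from_archive(archive, 'semantic-role-labeling')
print(predictor.predict(sentence='a woman talks about a futuristic bicycle design'))
```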
Can you provide the datasets somewhere else, such as Google Drive or Dropbox? Downloading from Baidu requires an account, and I am not from China and do not have a Chinese phone number.
Thank you.
Recently I read your paper "Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning". I saw that you used the Youtube2Text dataset in your paper. However, I could not find the video features and sentence features of the Youtube2Text dataset in the Baidu cloud link. Could you please provide a download link for the Youtube2Text data? Thank you very much!
I have a question about this data loading function: why is only one caption obtained per video?
def __getitem__(self, idx):
    out = {}
    if self.is_train:
        # training: each example is a (video, caption) pair
        video_idx, cap_idx = self.pair_idxs[idx]
        video_name = self.video_names[video_idx]
        mp_feature = self.mp_features[video_idx]
        sent = self.captions[cap_idx]
        cap_ids, cap_len = self.process_sent(sent, self.max_words_embedding)
        out['captions_ids'] = cap_ids
        out['captions_lens'] = cap_len
    else:
        # evaluation: only the video side is needed here
        video_name = self.video_names[idx]
        mp_feature = self.mp_features[idx]
    out['names'] = video_name
    out['mp_fts'] = mp_feature
    return out
Hi,
How are the (MP) features used for global matching extracted? Are they obtained by spatio-temporal average pooling of the features from a ResNet-152 pretrained on ImageNet?
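In case it helps, below is a minimal sketch of what such a global feature could look like under the assumption stated in the question: ImageNet-pretrained ResNet-152 frame features, mean-pooled over sampled frames. This mirrors the description above, not necessarily the authors' exact pipeline.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# ImageNet-pretrained ResNet-152 with the classification head removed.
resnet = models.resnet152(pretrained=True)
resnet.fc = torch.nn.Identity()
resnet.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def mp_feature(frames):
    """frames: list of PIL images sampled from one video -> 2048-d video feature."""
    batch = torch.stack([preprocess(f) for f in frames])  # (T, 3, 224, 224)
    frame_fts = resnet(batch)                             # (T, 2048) after global avg pool
    return frame_fts.mean(dim=0)                          # temporal mean pooling
```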
Hi, Shizhe, thanks for your great work. I downloaded the MSR-VTT dataset you provided, and I have a question. I found that not every video corresponds to 20 captions; some videos correspond to fewer than 20. I would like to ask whether you specifically selected these captions, and how they were chosen?
Thank you for your great code! After running it on my server several times, I was surprised to find that I cannot reproduce the results in the paper. The best recall sum I got on MSRVTT is 170.1, while the paper reports 172.4, and I did not modify anything in your code...
Could you please share the best parameters for your code, or suggest a solution to my problem?
Hi, cshizhe
I find that the number_of_features/video_duration ratio differs between videos. Can you tell me the temporal sampling interval of the visual features?
Thanks
Hello, could you provide the original videos of the MSR-VTT dataset?
Hi, Shizhe, thanks for the wonderful work!
For a new dataset, how can I get the word2int.json, int2word.npy and word.embedding.glove42.th files? I assume that you used a GloVe model to initialize the word embedding weights. Could you provide instructions for generating them?
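Not an official recipe, but one plausible way to build these files for a new dataset is to collect the training vocabulary, save the two mappings, and initialise an embedding matrix from GloVe vectors. The file names, the 300-d glove.42B format, and the random init for out-of-vocabulary words are assumptions based on the question above.

```python
import json
import numpy as np
import torch

def build_vocab_and_embeddings(captions, glove_txt_path, dim=300):
    # word2int / int2word built from the training captions (assumed layout).
    words = sorted({w for sent in captions for w in sent.lower().split()})
    word2int = {w: i for i, w in enumerate(words)}
    json.dump(word2int, open('word2int.json', 'w'))
    np.save('int2word.npy', np.array(words))

    # Embedding matrix initialised from GloVe (e.g. glove.42B.300d.txt);
    # words missing from GloVe keep a small random init.
    embeds = np.random.uniform(-0.1, 0.1, (len(words), dim)).astype(np.float32)
    with open(glove_txt_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            if parts[0] in word2int:
                embeds[word2int[parts[0]]] = np.asarray(parts[1:], dtype=np.float32)
    torch.save(torch.from_numpy(embeds), 'word_embeds.glove42b.th')
```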
Hi, cshizhe, thanks for your great work.
When testing performance on the MSRVTT dataset, I found that the performance in different test runs is the same, but the sent_scores, verb_scores and noun_scores are different. I don't know why.
Here are some outputs from different test runs:
.......
tensor(-197.5491, device='cuda:0') tensor(4066.6943, device='cuda:0') tensor(4957.7461, device='cuda:0')
tensor(-172.1141, device='cuda:0') tensor(4193.5151, device='cuda:0') tensor(5157.7603, device='cuda:0')
tensor(-68.0737, device='cuda:0') tensor(1171.2622, device='cuda:0') tensor(1342.9297, device='cuda:0')
tensor(82.5919, device='cuda:0') tensor(4531.4185, device='cuda:0') tensor(5212.8369, device='cuda:0')
tensor(-43.9712, device='cuda:0') tensor(4319.0312, device='cuda:0') tensor(5150.5146, device='cuda:0')
tensor(1.5257, device='cuda:0') tensor(4386.4746, device='cuda:0') tensor(5333.5151, device='cuda:0')
tensor(-22.8292, device='cuda:0') tensor(1247.3308, device='cuda:0') tensor(1393.1257, device='cuda:0')
tensor(23.0804, device='cuda:0') tensor(1473.0065, device='cuda:0') tensor(1647.1292, device='cuda:0')
tensor(-31.6811, device='cuda:0') tensor(1406.5350, device='cuda:0') tensor(1616.0713, device='cuda:0')
tensor(-41.8293, device='cuda:0') tensor(1422.7487, device='cuda:0') tensor(1656.0972, device='cuda:0')
tensor(-10.5121, device='cuda:0') tensor(397.1695, device='cuda:0') tensor(444.0505, device='cuda:0')
ir1,ir5,ir10,imedr,imeanr,imAP,cr1,cr5,cr10,cmedr,cmeanr,cmAP,rsum
ir5-rsum,epoch.28.th,22.89,51.07,63.17,5.00,40.16,36.14,22.30,51.10,62.90,5.00,39.20,35.62,273.43
Output from a different run:
........
tensor(-89.9776, device='cuda:0') tensor(4095.6599, device='cuda:0') tensor(5116.2510, device='cuda:0')
tensor(-145.8661, device='cuda:0') tensor(4161.9165, device='cuda:0') tensor(5351.6670, device='cuda:0')
tensor(-40.3292, device='cuda:0') tensor(1177.1305, device='cuda:0') tensor(1314.6021, device='cuda:0')
tensor(-58.3337, device='cuda:0') tensor(4536.5352, device='cuda:0') tensor(4928.3350, device='cuda:0')
tensor(35.2728, device='cuda:0') tensor(4343.3838, device='cuda:0') tensor(5280.2969, device='cuda:0')
tensor(2.8130, device='cuda:0') tensor(4361.0112, device='cuda:0') tensor(5508.0010, device='cuda:0')
tensor(37.5651, device='cuda:0') tensor(1243.3253, device='cuda:0') tensor(1373.3599, device='cuda:0')
tensor(-25.2279, device='cuda:0') tensor(1490.6547, device='cuda:0') tensor(1566.4670, device='cuda:0')
tensor(7.1009, device='cuda:0') tensor(1408.7480, device='cuda:0') tensor(1670.6154, device='cuda:0')
tensor(-34.9750, device='cuda:0') tensor(1403.9734, device='cuda:0') tensor(1701.3884, device='cuda:0')
tensor(-7.8403, device='cuda:0') tensor(396.0836, device='cuda:0') tensor(424.8773, device='cuda:0')
ir1,ir5,ir10,imedr,imeanr,imAP,cr1,cr5,cr10,cmedr,cmeanr,cmAP,rsum
ir5-rsum,epoch.28.th,22.89,51.07,63.17,5.00,40.16,36.14,22.30,51.10,62.90,5.00,39.20,35.62,273.43
Hi, when I generate my own role graphs, something goes wrong. With the predictor model https://s3-us-west-2.amazonaws.com/allennlp/models/bert-base-srl-2019.06.17.tar.gz that you use in semantic_role_labeling.py, I get predictor output like {'verbs': [{'verb': 'talks', 'description': 'a woman talks about a futuristic bicycle design', 'tags': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']}], 'words': ['a', 'woman', 'talks', 'about', 'a', 'futuristic', 'bicycle', 'design']}, where all tags are O. So is there something wrong with the model? I tried other models such as https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz, which is used by the semantic role labeling demo at https://demo.allennlp.org/semantic-role-labeling/MjMyODEwNg==, and it works correctly. Its output is {'verbs': [{'verb': 'is', 'description': 'someone [V: is] blowing a little boys face with a leaf blower', 'tags': ['O', 'B-V', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']}, {'verb': 'blowing', 'description': '[ARG0: someone] is [V: blowing] [ARG1: a little boys face] [ARGM-MNR: with a leaf blower]', 'tags': ['B-ARG0', 'O', 'B-V', 'B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'B-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR']}, {'verb': 'face', 'description': 'someone is blowing [ARG0: a little boys] [V: face] with a leaf blower', 'tags': ['O', 'O', 'O', 'B-ARG0', 'I-ARG0', 'I-ARG0', 'B-V', 'O', 'O', 'O', 'O']}], 'words': ['someone', 'is', 'blowing', 'a', 'little', 'boys', 'face', 'with', 'a', 'leaf', 'blower']}.
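For what it's worth, once a predictor returns non-trivial tags (as with the 2020 model above), turning one verb frame's BIO tags into (role, phrase) pairs is straightforward. A small sketch, independent of the repo's actual graph-construction code:

```python
def bio_to_roles(words, tags):
    """Collect (role, phrase) pairs from one verb frame's BIO tags."""
    roles, cur_role, cur_span = [], None, []
    for word, tag in zip(words, tags):
        if tag.startswith('B-'):
            if cur_role:
                roles.append((cur_role, ' '.join(cur_span)))
            cur_role, cur_span = tag[2:], [word]
        elif tag.startswith('I-') and cur_role == tag[2:]:
            cur_span.append(word)
        else:
            if cur_role:
                roles.append((cur_role, ' '.join(cur_span)))
            cur_role, cur_span = None, []
    if cur_role:
        roles.append((cur_role, ' '.join(cur_span)))
    return roles

words = ['someone', 'is', 'blowing', 'a', 'little', 'boys', 'face',
         'with', 'a', 'leaf', 'blower']
tags = ['B-ARG0', 'O', 'B-V', 'B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1',
        'B-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR']
print(bio_to_roles(words, tags))
# [('ARG0', 'someone'), ('V', 'blowing'), ('ARG1', 'a little boys face'),
#  ('ARGM-MNR', 'with a leaf blower')]
```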
Hi cshizhe.
In your paper, the video-to-text retrieval results of all methods on TGIF are much lower than the results reported in the PVSE paper.
Because there is no explanation of these results, I can't understand the discrepancy.
Can you explain it?
I could train your code on TGIF and get the result myself, but I think an explanation from you would be more certain.
Thank you in advance.
The MSVD dataset has no word2int.json and int2word.npy files. Could you give me a new link?
Hi,
I clicked the BaiduNetdisk URL, but the following message appears:
"The content shared by this link cannot be accessed because it may involve copyright infringement, pornography, reactionary or vulgar information!"
The Baidu link for the annotations and pretrained features is gone.
I find that the VATEX dataset you used in HGR is VATEX v1.0, which does not provide annotations for the testing set.
You then randomly split the validation set into two equal parts, with 1,500 videos as the validation set and the other 1,500 videos as the testing set.
I want to follow your dataset partitioning, but I cannot find any split information in this repo.
Could you please provide the 'csv' or 'json' files of the VATEX dataset that contain the partition information?
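In case the exact split files never surface, one reproducible way to divide the 3,000 validation videos into two halves is to fix a random seed, as in the sketch below. This is purely illustrative: the seed, ordering, and output file name are assumptions and will not match the authors' split exactly.

```python
import json
import random

def split_vatex_val(val_video_ids, seed=0):
    """val_video_ids: list of the 3,000 VATEX v1.0 validation video ids."""
    ids = sorted(val_video_ids)
    random.Random(seed).shuffle(ids)           # deterministic shuffle for reproducibility
    split = {'val': ids[:1500], 'test': ids[1500:]}
    json.dump(split, open('vatex_val_test_split.json', 'w'))
    return split
```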