
hgr_v2t's Introduction

Hi there 👋


hgr_v2t's People

Contributors

cshizhe


hgr_v2t's Issues

About data file

I found that some files are missing from the data downloaded from BaiduNetdisk. There are 6 files in MSRVTT/annotation/RET (int2word.npy, ref_cpation.json, sent2rolegraoh.augment.json, sent2srl.json and word2int.json), but some of them are not present for the other datasets. For example, there are only 2 files in MSVD/annotation/RET (ref_cpation.json, sent2rolegraoh.augment.json).

captioning my own video

Hi~ thanks for your nice work!
I want to caption a self-captured video. Could you please give some detailed instructions on how to adapt the pretrained model provided in the code to this task? For example, the feature extraction method, the feature data format, and how to visualize the final result? Thanks a lot!

VATEX has no Resnet152 feature

OSError: Unable to open file (unable to open file: name = 'data/VATEX/ordered_feature/SA/resnet152.pth/trn_ft.hdf5'

Could you tell me how to use the I3D features instead?

about visualizing examples

Hello, thanks for your great work. I'm very interested in visualizing the examples. How can I visualize the retrieved videos? Could you please upload the code?

About other dataset

Hi, I'm very interested in your work, and I want to use other datasets, like Charades, with your model. But it requires several files, such as the annotations, that other datasets don't have. What should I do to get these annotations, and how can I generate the role graphs? Could you provide the tools mentioned in your paper? Thank you very much for any reply.

Can not find "word_embeds.glove42b.th"

Hi Shizhe,

Thanks for your great work! I noticed in the training script, it needs to load a pre-train model:

--resume_file $resdir/../../word_embeds.glove42b.th

Is this meant to initialize the text embedding module?

Besides, I cannot find this file under "MSRVTT/results/RET.released/"; I can only find "MSRVTT/results/RET/word_embeds.glove32b.th". Is there any difference between word_embeds.glove42b.th and word_embeds.glove32b.th? Could you please share "word_embeds.glove42b.th"?

About training time

Thanks for your great work!

I have a question: how long does it take to train your model on each of the 3 datasets?

Also, the BaiduNetdisk folder is empty.

When I run semantic_role_labeling.py, I get an error.

allennlp.common.checks.ConfigurationError: srl not in acceptable choices for dataset_reader.type

My predictor configuration is:

    archive = load_archive('bert-base-srl-2019.06.17')
    predictor = Predictor.from_archive(archive, 'video-text classifier')

Download dataset without Baidu account

Could you provide the datasets somewhere else, such as Google Drive or Dropbox? Downloading from Baidu requires an account, and I'm not from China and don't have a Chinese phone number.
Thank you.

Youtube2Text data set

I recently read your paper, Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning, and saw that you used the Youtube2Text dataset. However, I could not find the video features and sentence features of the Youtube2Text dataset in the Baidu cloud link. Could you please provide a download link for the Youtube2Text dataset? Thank you very much!

A question about mpdata.py and rolesgraph.py in reader folder

I have a doubt about this __getitem__ function: why is only one caption obtained per video?

    def __getitem__(self, idx):
        out = {}
        if self.is_train:
            video_idx, cap_idx = self.pair_idxs[idx]
            video_name = self.video_names[video_idx]
            mp_feature = self.mp_features[video_idx]
            sent = self.captions[cap_idx]
            cap_ids, cap_len = self.process_sent(sent, self.max_words_embedding)
            out['captions_ids'] = cap_ids
            out['captions_lens'] = cap_len
        else:
            video_name = self.video_names[idx]
            mp_feature = self.mp_features[idx]

        out['names'] = video_name
        out['mp_fts'] = mp_feature

        return out
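A likely reason the function returns a single caption is that, during training, each (video, caption) pair is treated as one sample, and pair_idxs enumerates all such pairs, so every caption of a video is still seen over an epoch. A minimal sketch of how such an index could be built (names are illustrative, not the repo's actual code):

```python
# Sketch: expand videos with multiple captions into (video, caption) index pairs,
# so each caption becomes its own training sample.
def build_pair_idxs(captions_per_video):
    """captions_per_video: list where entry v holds the caption ids of video v."""
    pair_idxs = []
    for video_idx, cap_ids in enumerate(captions_per_video):
        for cap_idx in cap_ids:
            pair_idxs.append((video_idx, cap_idx))
    return pair_idxs

# Video 0 has captions 0 and 1; video 1 has caption 2.
print(build_pair_idxs([[0, 1], [2]]))  # → [(0, 0), (0, 1), (1, 2)]
```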

Regarding MP features used for global matching

Hi,

How are the features (MP) used for global matching extracted? Are they obtained by spatio-temporal average pooling of the features from a ResNet-152 pretrained on ImageNet?
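If the MP features are indeed average-pooled frame features, the operation simply reduces a (num_frames, feat_dim) array to a single feat_dim vector. A sketch with random stand-in data (the real frame_fts would come from a CNN such as ResNet-152):

```python
import numpy as np

# Sketch: temporal average pooling of per-frame CNN features.
num_frames, feat_dim = 20, 2048
frame_fts = np.random.rand(num_frames, feat_dim).astype(np.float32)

mp_ft = frame_fts.mean(axis=0)  # mean over the temporal axis
print(mp_ft.shape)  # (2048,)
```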

Questions about the MSR-VTT dataset

Hi, Shizhe, thanks for your great work. I downloaded the MSR-VTT dataset you provided and have a question: I found that not every video corresponds to 20 captions; some videos have fewer than 20. Did you specifically select these captions, and if so, how were they chosen?

About the recurrence of paper results

Thank you for your great code! After running it on my server several times, I am surprised to find that I cannot reproduce the result in the paper. The best final recall sum I got on MSRVTT is 170.1, while the paper reports 172.4, and I did not modify anything in your code.
Could you please share the best hyperparameters, or suggest a solution to my problem?

about the frame rate

Hi, cshizhe

I noticed that number_of_features / video_duration differs across videos. Could you tell me the temporal interval used when extracting the visual features?

Thanks

How to get the word embedding weights for a new Dataset?

Hi, Shizhe, thanks for the wonderful work!

For a new dataset, how can I generate word2int.json, int2word.npy, and word.embedding.glove42.th?
I assume you used a GloVe model for word embedding weight initialization.
Could you provide instructions for this?
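A common recipe for this (a sketch under assumed conventions, not necessarily the authors' exact pipeline) is to build the vocabulary maps from the training captions and copy GloVe vectors into an embedding matrix, leaving out-of-vocabulary words randomly initialized. The `glove` dict below stands in for vectors loaded from a GloVe text file:

```python
import numpy as np

# Sketch: build word2int/int2word and a GloVe-initialized embedding matrix.
captions = ["a man plays guitar", "a woman plays piano"]
words = sorted({w for c in captions for w in c.split()})
word2int = {w: i for i, w in enumerate(words)}          # would be saved as word2int.json
int2word = np.array(words)                              # would be saved as int2word.npy

dim = 4
glove = {"man": np.ones(dim), "plays": np.full(dim, 2.0)}  # fake GloVe lookup table
embeds = np.random.normal(0, 0.1, (len(words), dim))       # random init for OOV words
for w, i in word2int.items():
    if w in glove:
        embeds[i] = glove[w]                               # copy pretrained vector

print(embeds[word2int["plays"]])  # row copied from the fake GloVe vector (all 2.0)
```

The resulting `embeds` matrix would then be converted to a torch tensor and saved (e.g. with `torch.save`) to play the role of the `.th` weight file.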

different time get different scores

Hi, cshizhe, thanks for your great work.
When testing performance on the MSRVTT dataset, I found that the retrieval metrics are identical across different test runs, but the sent_scores, verb_scores, and noun_scores differ. I don't know why.

Here are the outputs from one test run:
.......
tensor(-197.5491, device='cuda:0') tensor(4066.6943, device='cuda:0') tensor(4957.7461, device='cuda:0')
tensor(-172.1141, device='cuda:0') tensor(4193.5151, device='cuda:0') tensor(5157.7603, device='cuda:0')
tensor(-68.0737, device='cuda:0') tensor(1171.2622, device='cuda:0') tensor(1342.9297, device='cuda:0')
tensor(82.5919, device='cuda:0') tensor(4531.4185, device='cuda:0') tensor(5212.8369, device='cuda:0')
tensor(-43.9712, device='cuda:0') tensor(4319.0312, device='cuda:0') tensor(5150.5146, device='cuda:0')
tensor(1.5257, device='cuda:0') tensor(4386.4746, device='cuda:0') tensor(5333.5151, device='cuda:0')
tensor(-22.8292, device='cuda:0') tensor(1247.3308, device='cuda:0') tensor(1393.1257, device='cuda:0')
tensor(23.0804, device='cuda:0') tensor(1473.0065, device='cuda:0') tensor(1647.1292, device='cuda:0')
tensor(-31.6811, device='cuda:0') tensor(1406.5350, device='cuda:0') tensor(1616.0713, device='cuda:0')
tensor(-41.8293, device='cuda:0') tensor(1422.7487, device='cuda:0') tensor(1656.0972, device='cuda:0')
tensor(-10.5121, device='cuda:0') tensor(397.1695, device='cuda:0') tensor(444.0505, device='cuda:0')
ir1,ir5,ir10,imedr,imeanr,imAP,cr1,cr5,cr10,cmedr,cmeanr,cmAP,rsum
ir5-rsum,epoch.28.th,22.89,51.07,63.17,5.00,40.16,36.14,22.30,51.10,62.90,5.00,39.20,35.62,273.43

And from a run at a different time:
........
tensor(-89.9776, device='cuda:0') tensor(4095.6599, device='cuda:0') tensor(5116.2510, device='cuda:0')
tensor(-145.8661, device='cuda:0') tensor(4161.9165, device='cuda:0') tensor(5351.6670, device='cuda:0')
tensor(-40.3292, device='cuda:0') tensor(1177.1305, device='cuda:0') tensor(1314.6021, device='cuda:0')
tensor(-58.3337, device='cuda:0') tensor(4536.5352, device='cuda:0') tensor(4928.3350, device='cuda:0')
tensor(35.2728, device='cuda:0') tensor(4343.3838, device='cuda:0') tensor(5280.2969, device='cuda:0')
tensor(2.8130, device='cuda:0') tensor(4361.0112, device='cuda:0') tensor(5508.0010, device='cuda:0')
tensor(37.5651, device='cuda:0') tensor(1243.3253, device='cuda:0') tensor(1373.3599, device='cuda:0')
tensor(-25.2279, device='cuda:0') tensor(1490.6547, device='cuda:0') tensor(1566.4670, device='cuda:0')
tensor(7.1009, device='cuda:0') tensor(1408.7480, device='cuda:0') tensor(1670.6154, device='cuda:0')
tensor(-34.9750, device='cuda:0') tensor(1403.9734, device='cuda:0') tensor(1701.3884, device='cuda:0')
tensor(-7.8403, device='cuda:0') tensor(396.0836, device='cuda:0') tensor(424.8773, device='cuda:0')
ir1,ir5,ir10,imedr,imeanr,imAP,cr1,cr5,cr10,cmedr,cmeanr,cmAP,rsum
ir5-rsum,epoch.28.th,22.89,51.07,63.17,5.00,40.16,36.14,22.30,51.10,62.90,5.00,39.20,35.62,273.43
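One plausible explanation (an assumption, not confirmed by the repo): retrieval metrics such as R@K and medR depend only on the ranking of scores per query, so any run-to-run variation that changes score values without changing their relative order leaves the metrics identical. A small numpy illustration of this invariance:

```python
import numpy as np

def recall_at_1(scores):
    # Row i holds the similarity of query i against all candidates;
    # the correct candidate for query i is assumed to be candidate i.
    top1 = (-scores).argsort(axis=1)[:, 0]
    return float(np.mean(top1 == np.arange(scores.shape[0])))

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 5))
shifted = scores * 3.0 - 7.0  # strictly increasing transform: ranking unchanged

print(recall_at_1(scores) == recall_at_1(shifted))  # True
```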

About semantic_role_labeling.py

Hi, when I generate my own role graphs, something goes wrong. With the predictor model https://s3-us-west-2.amazonaws.com/allennlp/models/bert-base-srl-2019.06.17.tar.gz that you provided in semantic_role_labeling.py, I get predictor output like the following, where all tags are O:

    {'verbs': [{'verb': 'talks', 'description': 'a woman talks about a futuristic bicycle design', 'tags': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']}], 'words': ['a', 'woman', 'talks', 'about', 'a', 'futuristic', 'bicycle', 'design']}

So is something wrong with the model? I tried other models, such as https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz, which is used in the semantic role labeling demo at https://demo.allennlp.org/semantic-role-labeling/MjMyODEwNg==, and it works correctly; the output is:

    {'verbs': [{'verb': 'is', 'description': 'someone [V: is] blowing a little boys face with a leaf blower', 'tags': ['O', 'B-V', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']}, {'verb': 'blowing', 'description': '[ARG0: someone] is [V: blowing] [ARG1: a little boys face] [ARGM-MNR: with a leaf blower]', 'tags': ['B-ARG0', 'O', 'B-V', 'B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'B-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR', 'I-ARGM-MNR']}, {'verb': 'face', 'description': 'someone is blowing [ARG0: a little boys] [V: face] with a leaf blower', 'tags': ['O', 'O', 'O', 'B-ARG0', 'I-ARG0', 'I-ARG0', 'B-V', 'O', 'O', 'O', 'O']}], 'words': ['someone', 'is', 'blowing', 'a', 'little', 'boys', 'face', 'with', 'a', 'leaf', 'blower']}
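For reference, the 'tags' lists above use BIO encoding. A minimal parser (a sketch independent of the repo's actual role-graph code) that turns one SRL frame into a {role: phrase} mapping, which is the kind of structure a role graph would be built from:

```python
# Sketch: collect BIO-tagged spans from one AllenNLP SRL frame into {role: phrase}.
def parse_frame(words, tags):
    spans, role, buf = {}, None, []
    for w, t in zip(words, tags):
        if t.startswith("B-"):
            if role:                       # close any span still open
                spans[role] = " ".join(buf)
            role, buf = t[2:], [w]         # open a new span for this role
        elif t.startswith("I-") and role:
            buf.append(w)                  # continue the current span
        else:                              # "O" closes any open span
            if role:
                spans[role] = " ".join(buf)
            role, buf = None, []
    if role:
        spans[role] = " ".join(buf)
    return spans

words = ["someone", "is", "blowing", "a", "little", "boys", "face"]
tags = ["B-ARG0", "O", "B-V", "B-ARG1", "I-ARG1", "I-ARG1", "I-ARG1"]
print(parse_frame(words, tags))
# → {'ARG0': 'someone', 'V': 'blowing', 'ARG1': 'a little boys face'}
```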

Question about inconsistent results with the other papers

Hi cshizhe.
In your paper, the video-to-text retrieval results of all methods on TGIF are much lower than the results reported in the PVSE paper.
Because there is no explanation of this, I can't understand the discrepancy between the results.
Can you explain it?
I could train your code on TGIF and obtain the result myself, but I think a confirmation from you would be more certain.

Thank you in advance.

About the dataset

Hi,
I clicked the BaiduNetdisk URL, but it shows the following message:

此链接分享内容可能因为涉及侵权、色情、反动、低俗等信息,无法访问! (The content shared by this link is inaccessible because it may involve infringing, pornographic, reactionary, vulgar, or similar information.)

Can you provide the split information of Vatex dataset?

I find that the VATEX dataset you used in HGR is VATEX v1.0, which does not provide annotations for the testing set.
You then randomly split the validation set into two equal parts, with 1,500 videos as the validation set and the other 1,500 videos as the testing set.
I want to follow your dataset partitioning, but I cannot find any split information in this repo.
Could you please provide the csv or json files of the VATEX dataset that contain the partition information?
