
video-caption.pytorch's People

Contributors

anirudh257, blues5, sundrops, xiadingz


video-caption.pytorch's Issues

ptbtokenizer.py

How do you solve this problem?
UnicodeEncodeError: 'gbk' codec can't encode character '\ufeff' in position 5464: illegal multibyte sequence
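Not confirmed for this repo, but this error usually means a file is being opened with Windows' default 'gbk' codec. A minimal sketch of the usual fix, assuming the failure is where ptbtokenizer.py writes its sentences to a temporary file (the helper below is hypothetical):

# Hypothetical sketch: force UTF-8 instead of the platform default codec
# and strip the BOM character '\ufeff' that 'gbk' cannot encode.
import io

def write_sentences(path, sentences):
    with io.open(path, 'w', encoding='utf-8') as f:
        for s in sentences:
            f.write(s.replace(u'\ufeff', u'') + u'\n')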

As there are issues compiling FFmpeg due to yasm updates, using OpenCV sounds reasonable. The following block needs to be modified and incorporated.

import argparse
import os

import cv2

print(cv2.__version__)


def extractImages(pathIn, pathOut):
    vidcap = cv2.VideoCapture(pathIn)
    count = 0
    success, image = vidcap.read()
    while success:
        print('Read a new frame: ', success)
        # save frame as a JPEG file (os.path.join avoids the "\f" escape bug)
        cv2.imwrite(os.path.join(pathOut, "frame%d.jpg" % count), image)
        count += 1
        success, image = vidcap.read()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--pathIn", help="path to video")
    parser.add_argument("--pathOut", help="path to images")
    args = parser.parse_args()
    print(args)
    extractImages(args.pathIn, args.pathOut)
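Usage of the fixed script would then look like (script name and paths assumed):

python extract_frames.py --pathIn data/train-video/video0.mp4 --pathOut data/frames/video0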

can't download data

Is the download link broken? I can't open it; is there another way to download the data?

error in eval.py : BrokenPipeError: [Errno 32] Broken pipe

vocab size is 183
number of train videos: 8
number of val videos: 1
number of test videos: 1
['data/feats/resnet152/data/train-video']
load feats from ['data/feats/resnet152/data/train-video']
max sequence length in data is 28
/home/chamim/SkripsiP3/lib/python3.6/site-packages/torch/nn/modules/rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
/home/chamim/SkripsiP3/lib/python3.6/site-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
init COCO-EVAL scorer
eval.py:34: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
json.load(open(opt["input_json"]))['sentences'])
/home/chamim/SkripsiP3/lib/python3.6/site-packages/torch/nn/functional.py:1339: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
File "eval.py", line 122, in
main(opt)
File "eval.py", line 91, in main
test(model, crit, dataset, dataset.get_vocab(), opt)
File "eval.py", line 57, in test
valid_score = scorer.score(gts, samples, samples.keys())
File "/home/chamim/SkripsiP3/skripsi/video captioning/misc/cocoeval.py", line 88, in score
score, scores = scorer.compute_score(gts, res)
File "coco-caption/pycocoevalcap/meteor/meteor.py", line 41, in compute_score
self.meteor_p.stdin.flush()
BrokenPipeError: [Errno 32] Broken pipe

Please help me :( When I run an evaluation with eval.py, BrokenPipeError: [Errno 32] Broken pipe occurs. Is there anything I need to fix? Thank you.
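The broken pipe means the METEOR Java subprocess died at startup, so the real error is on its side (commonly a missing Java runtime). A sketch for surfacing it: launch the jar the way meteor.py does and read its stderr (treat the exact command as an assumption):

# Sketch: start the METEOR jar manually; a missing or broken Java install
# will show up on stderr here instead of as a BrokenPipeError.
import subprocess

cmd = ['java', '-jar', '-Xmx2G', 'meteor-1.5.jar',
       '-', '-', '-stdio', '-l', 'en', '-norm']
p = subprocess.Popen(cmd, cwd='coco-caption/pycocoevalcap/meteor',
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate(input=b'')
print(p.returncode, err.decode())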

About beam search

Though I can see you have implemented this function in your other repository 'video-caption-openNMT.pytorch', it's hard to comprehend. Would you please make it available in this repository as well? Thanks a lot.

number of train caption is < 10000

The MSR-VTT dataset has 10,000 videos and 20 captions per video, but this implementation only considers one video-caption pair per video in the train phase, so there are at most 10,000 training examples in total (see the sketch below).
Has anyone else seen this?
Has anyone changed the code?
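A hedged sketch of a workaround, assuming the captions['videoX']['final_captions'] structure quoted in the KeyError issue further down: index the dataset by (video, caption) pairs instead of by video, so every caption becomes a training example.

# Hypothetical sketch: enumerate (video_id, caption_idx) pairs once, then
# have the dataset's __getitem__ pick that specific caption.
captions = {'video0': {'final_captions': ['a man is cooking', 'someone cooks food']}}  # example data

pairs = [(vid, ci)
         for vid, entry in captions.items()
         for ci in range(len(entry['final_captions']))]

def get_pair(ix):
    vid, ci = pairs[ix]
    return vid, captions[vid]['final_captions'][ci]

print(get_pair(1))  # ('video0', 'someone cooks food')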

How to use the c3d's features

I want to know how to use the features extracted from C3D. Do I just use the output.json as train.py's input?
Can someone give some tips? Thanks!

Most likely an error in S2VTModel

Hi Ding,

Thanks a ton for this project! This might be an issue, I am just not sure. With

self.rnn1.flatten_parameters()
self.rnn2.flatten_parameters()

in 'train' mode in S2VTModel, I am getting the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.
To suppress this error, I am commenting out the snippet that flattens the parameters. The model seems to be converging, with loss values:
model_0, loss: 22.717190; model_50, loss: 15.616700; model_100, loss: 12.238667; model_150, loss: 11.222753.

I just wanted your opinion: am I making a mistake by commenting out self.rnn1.flatten_parameters(); self.rnn2.flatten_parameters()? (I am using one GPU.)
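For what it's worth, flatten_parameters() only compacts the RNN weights into one contiguous block for cuDNN, so skipping it should cost a little speed rather than correctness. A hedged alternative sketch: call it once after moving the model to the GPU instead of inside forward(), where it can conflict with tensors autograd saved for backward.

# Sketch (model/rnn1/rnn2 refer to the S2VTModel instance; not standalone code):
model = model.cuda()
model.rnn1.flatten_parameters()
model.rnn2.flatten_parameters()
# ...then remove the flatten_parameters() calls from S2VTModel.forward()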

pretrainedmodels problem

When I run the feature-preparation code:
python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40 --gpu 1,2,3
I get the error:
No module named pretrainedmodels.
Where can I get it?
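pretrainedmodels is Cadene's pretrained-models.pytorch package, not part of torchvision (which also answers the next issue); it installs from PyPI with pip install pretrainedmodels. A quick check:

# Verify the package the repo imports is available.
import pretrainedmodels

print(pretrainedmodels.model_names)  # includes 'resnet152'
model = pretrainedmodels.__dict__['resnet152'](num_classes=1000, pretrained='imagenet')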

pretrainedmodels

import pretrainedmodels raises an error. Do you mean torchvision.models?

When I use train.py, this error is raised: TypeError: gru() received an invalid combination of arguments

TypeError: gru() received an invalid combination of arguments - got (Tensor, Tensor, list, bool, int, float, bool, int, bool), but expected one of:

  • (Tensor data, Tensor batch_sizes, Tensor hx, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional)
    didn't match because some of the arguments have invalid types: (Tensor, Tensor, !list!, !bool!, !int!, !float!, !bool!, !int!, bool)
  • (Tensor input, Tensor hx, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first)
    didn't match because some of the arguments have invalid types: (Tensor, Tensor, !list!, bool, int, float, bool, !int!, bool)

While running train.py got ValueError: not enough values to unpack (expected 3, got 2)

Traceback (most recent call last):
File "train.py", line 133, in
main(opt)
File "train.py", line 120, in main
train(dataloader, model, crit, optimizer, exp_lr_scheduler, opt, rl_crit)
File "train.py", line 40, in train
seq_probs, _ = model(fc_feats, labels, 'train')
File "/home/pg2018/cse/18071003/.conda/envs/env_name/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/pg2018/cse/18071003/video-captioning/models/S2VTModel.py", line 34, in forward
batch_size, n_frames, _ = vid_feats.shape
ValueError: not enough values to unpack (expected 3, got 2)
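The model expects vid_feats of shape (batch, n_frames, dim); a 2-D tensor usually means the saved per-video feature arrays have no frame dimension (e.g. mean-pooled to (2048,)), so the batch stacks to only two dims. A sketch for checking (feature directory assumed):

# Sketch: inspect the saved feature shapes before training.
import glob
import numpy as np

for path in sorted(glob.glob('data/feats/resnet152/*.npy'))[:5]:
    print(path, np.load(path).shape)  # expect (n_frames, 2048), not (2048,)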

ValueError: could not convert string to float: 'Error: specify Meteor stats'

I used Python 3.5 and PyTorch 0.4 on Ubuntu, and I got this error:

init COCO-EVAL scorer
Traceback (most recent call last):
  File "eval.py", line 122, in <module>
    main(opt)
  File "eval.py", line 91, in main
    test(model, crit, dataset, dataset.get_vocab(), opt)
  File "eval.py", line 57, in test
    valid_score = scorer.score(gts, samples, samples.keys())
  File "/host/xxx/video_caption/video_caption_pytorch/misc/cocoeval.py", line 88, in score
    score, scores = scorer.compute_score(gts, res)
  File "coco-caption/pycocoevalcap/meteor/meteor.py", line 44, in compute_score
    score = float(self.meteor_p.stdout.readline().decode().strip())
ValueError: could not convert string to float: 'Error: specify Meteor stats'

How to deal with it?

S2VTAttModel doesn't converge

Hello DingXia! I'm trying to reproduce your results and train S2VTAttModel with the training data you linked in the README, using train.py params --gpu 0,1,2,3 --epochs 9001 --batch_size 450 --checkpoint_path data/save --feats_dir data/feats/resnet152/train-video --dim_vid 2048 --model S2VTAttModel
But after 4 days of training, the training loss is still too high: iter 12 (epoch 5463), train_loss = 23.042719. Am I doing something wrong?

I met a DataParallel bug, please help me

RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)

How do I deal with this error? Thanks.
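This usually means the input batch lives on a different GPU than the weights. With nn.DataParallel the inputs must sit on device_ids[0]; a sketch of the usual fix (model/fc_feats/labels refer to train.py's variables):

# Sketch: keep inputs on DataParallel's first device.
import torch.nn as nn

model = nn.DataParallel(model, device_ids=[0, 1]).cuda()
fc_feats = fc_feats.cuda(0)  # device_ids[0], not .cuda(1)
labels = labels.cuda(0)

Alternatively, restrict visibility with the CUDA_VISIBLE_DEVICES environment variable so device 0 is the only one PyTorch sees.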

I used 100 videos to check whether I can run this project, but during training I got the error below. I can't find what is wrong. Please help me. Thanks.


save opt details to ../data/save/opt_info.json
vocab size is 85
number of train videos: 100
number of val videos: 0
number of test videos: 0
load feats from ['../data/feats/resnet152']
max sequence length in data is 10
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
/usr/local/lib/python3.6/dist-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:100: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "train.py", line 133, in <module>
main(opt)
File "train.py", line 120, in main
train(dataloader, model, crit, optimizer, exp_lr_scheduler, opt, rl_crit)
File "train.py", line 32, in train
for data in loader:
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 346, in __next__
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/drive/My Drive/data/video-caption.pytorch/dataloader.py", line 63, in __getitem__
captions = self.captions['video%i'%(ix)]['final_captions']
KeyError: 'video57'
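The dataloader takes its indices from the splits in info.json but looks captions up in caption.json, so a 'videoNN' present in one file and missing from the other yields exactly this KeyError. A sketch of a consistency check (the JSON layout here is an assumption, modeled on the 'video%i' % ix lookup quoted above):

# Sketch: every id in the train split should have captions.
import json

info = json.load(open('data/info.json'))
caps = json.load(open('data/caption.json'))
for ix in info['videos']['train']:  # split layout assumed
    if 'video%i' % ix not in caps:
        print('missing captions for video%i' % ix)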

size mismatch

D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\modules\rnn.py:51: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
D:\Anaconda\envs\vp12\lib\site-packages\torch\optim\lr_scheduler.py:82: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
Traceback (most recent call last):
File "train.py", line 133, in <module>
main(opt)
File "train.py", line 120, in main
train(dataloader, model, crit, optimizer, exp_lr_scheduler, opt, rl_crit)
File "train.py", line 40, in train
seq_probs, _ = model(fc_feats, labels, 'train')
File "D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\video-caption\code12pytorch\video-caption.pytorch-master\models\S2VTAttModel.py", line 28, in forward
encoder_outputs, encoder_hidden = self.encoder(vid_feats)
File "D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\video-caption\code12pytorch\video-caption.pytorch-master\models\EncoderRNN.py", line 53, in forward
vid_feats = self.vid2hid(vid_feats.view(-1, dim_vid))
File "D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\modules\module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\modules\linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "D:\Anaconda\envs\vp12\lib\site-packages\torch\nn\functional.py", line 1369, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [12000 x 2048], m2: [4096 x 512] at C:/w/1/s/tmp_conda_3.7_055457/conda/conda-bld/pytorch_1565416617654/work/aten/src\THC/generic/THCTensorMathBlas.cu:273
How do I resolve this size mismatch? I can't find a place to set the parameters of a convolutional layer.
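There is no convolutional layer involved here: the mismatch is in EncoderRNN's first linear layer (vid2hid). m1 is the input (batch × frames = 12000 rows of 2048-dim ResNet features), while m2 shows the layer was built for a 4096-dim input, i.e. the model was constructed with dim_vid = 4096. Passing the real feature width on the command line should fix it, e.g. (other flags as in your setup):

python train.py --gpu 0 --feats_dir data/feats/resnet152 --dim_vid 2048 --model S2VTAttModel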

Source Data download

Hi, the dataset doesn't seem to be accessible through your shared link. How can I download it? Thanks 😬

invalid argument 0: Sizes of tensors must match except in dimension 0. Got 27 and 41 in dimension

Hi, I've hit a bug that I haven't been able to fix. I want to use this code to train a model on the MSVD dataset. I can train the model on MSR-VTT, so I prepared caption.json and info.json based on the default (MSR-VTT-style) settings, but when I train on MSVD I get this problem:

Traceback (most recent call last):
File "/home/tuyunbin/video-caption.pytorch/train.py", line 139, in <module>
main(opt)
File "/home/tuyunbin/video-caption.pytorch/train.py", line 124, in main
train(dataloader, model, crit, optimizer, exp_lr_scheduler, opt, rl_crit)
File "/home/tuyunbin/video-caption.pytorch/train.py", line 33, in train
for data in loader:
File "/home/tuyunbin/anaconda3/envs/caffe2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 560, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/home/tuyunbin/anaconda3/envs/caffe2/lib/python2.7/site-packages/torch/utils/data/_utils/collate.py", line 63, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
File "/home/tuyunbin/anaconda3/envs/caffe2/lib/python2.7/site-packages/torch/utils/data/_utils/collate.py", line 63, in <dictcomp>
return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
File "/home/tuyunbin/anaconda3/envs/caffe2/lib/python2.7/site-packages/torch/utils/data/_utils/collate.py", line 43, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 31 and 44 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:711

When I set batch_size = 1, train.py runs. But when I increase the value, the error occurs.
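default_collate can only torch.stack equal-sized tensors, and MSVD videos have different lengths, so the per-video feature tensors end up with different n_frames (31 vs 44 here), unlike fixed-step MSR-VTT features. A hedged sketch of one fix: resample every video's features to a fixed temporal length before batching (prepro_feats.py's --n_frame_steps option serves the same purpose at extraction time):

# Sketch: force a fixed number of frames so torch.stack in default_collate works.
import numpy as np

def sample_frames(feats, n_frames=40):
    # feats: (T, dim) with video-dependent T; returns (n_frames, dim)
    idx = np.linspace(0, len(feats) - 1, num=n_frames).astype(int)
    return feats[idx]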

'DecoderRNN' object has no attribute 'sample_beam'

When I run python eval.py with beam_size=2, I get the error 'DecoderRNN' object has no attribute 'sample_beam':

init COCO-EVAL scorer
Traceback (most recent call last):
File "eval.py", line 122, in <module>
main(opt)
File "eval.py", line 91, in main
test(model, crit, dataset, dataset.get_vocab(), opt)
File "eval.py", line 48, in test
fc_feats, mode='inference', opt=opt)
File "/u01/isi/maxq/envs/scan_envs/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/u01/isi/wangjunyan/video_captioning/video-caption.pytorch/models/S2VTAttModel.py", line 29, in forward
seq_prob, seq_preds = self.decoder(encoder_outputs, encoder_hidden, target_variable, mode, opt)
File "/u01/isi/maxq/envs/scan_envs/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/u01/isi/wangjunyan/video_captioning/video-caption.pytorch/models/DecoderRNN.py", line 109, in forward
return self.sample_beam(encoder_outputs, decoder_hidden, opt)
File "/u01/isi/maxq/envs/scan_envs/lib/python2.7/site-packages/torch/nn/modules/module.py", line 532, in __getattr__
type(self).__name__, name))
AttributeError: 'DecoderRNN' object has no attribute 'sample_beam'
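DecoderRNN's forward (line 109 in the traceback) dispatches to self.sample_beam(...), apparently whenever beam_size > 1, but no sample_beam method exists in this repository; per the 'About beam search' issue above, beam search was only implemented in video-caption-openNMT.pytorch. Until it is ported, evaluation here works only with beam_size=1.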

A bug: TypeError: gru() received an invalid combination of arguments - got (Tensor, Tensor, list, bool, int, float, bool, int, bool), but expected one of:

Problem
My Python version is 2.7 (but this bug also exists when using Python 3+), and my PyTorch version is 1.1.0:
TypeError: gru() received an invalid combination of arguments - got (Tensor, Tensor, list, bool, int, float, bool, int, bool), but expected one of:
(Tensor data, Tensor batch_sizes, Tensor hx, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional)
didn't match because some of the arguments have invalid types: (Tensor, Tensor, !list!, !bool!, !int!, !float!, !bool!, !int!, bool)
(Tensor input, Tensor hx, tuple of Tensors params, bool has_biases, int num_layers, float dropout, bool train, bool bidirectional, bool batch_first)
didn't match because some of the arguments have invalid types: (Tensor, Tensor, !list!, bool, int, float, bool, !int!, bool)
Solution:
change bidirectional=opt["bidirectional"] to bidirectional=bool(opt["bidirectional"])
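In context, the fix lands where EncoderRNN/DecoderRNN construct their recurrent cell. A standalone sketch of the type problem and the cast:

# torch 1.x's gru() rejects an int where it expects a bool:
import torch.nn as nn

opt = {"bidirectional": 0}  # option parsers often yield 0/1 ints
rnn = nn.GRU(2048, 512, 1, batch_first=True,
             bidirectional=bool(opt["bidirectional"]))  # the cast is the fix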

Question about the scores?

I have run your code but got a higher score, and I suspect there may be some mistakes in my settings. Could you help me? Thank you.

For example: with vgg19 + s2vt without attention, I got:
"CIDEr": 0.381709195850067,
"Bleu_4": 0.35092030557193526,
"Bleu_3": 0.46800626456106637,
"Bleu_2": 0.6047642387263332,
"Bleu_1": 0.7574938986755618,
"ROUGE_L": 0.5712265574740849,
"METEOR": 0.25508078041867904
for the best.
But actually, I didn't change anything important in your code.
I split the train dataset downloaded from the README into 6513/497/2990 for train/val/test.
And the training loss is here:
model_0, loss: 57.772758
model_10, loss: 44.913509
model_20, loss: 40.874763
model_30, loss: 40.119427
model_40, loss: 37.268291
model_50, loss: 33.424942
model_60, loss: 35.766853
model_70, loss: 34.876366
model_80, loss: 31.450918
model_90, loss: 29.820242
model_100, loss: 29.936274
model_110, loss: 30.059401
model_120, loss: 30.751385
model_130, loss: 28.711311
model_140, loss: 29.971272
model_150, loss: 30.382835
model_160, loss: 28.844414
model_170, loss: 26.373568
model_180, loss: 28.996819
model_190, loss: 27.722120
model_200, loss: 28.414360
model_210, loss: 25.155075
model_220, loss: 27.731709
model_230, loss: 28.479822
model_240, loss: 26.850664
model_250, loss: 26.169445
model_260, loss: 27.791225
model_270, loss: 25.879797
model_280, loss: 24.860294
model_290, loss: 24.067417
model_300, loss: 23.089293
model_310, loss: 24.369297
model_320, loss: 24.594177
model_330, loss: 24.342461
model_340, loss: 24.752075
model_350, loss: 25.322969
model_360, loss: 25.452364
model_370, loss: 22.378075
model_380, loss: 24.766953
model_390, loss: 22.536497
model_400, loss: 21.342590
I only trained the model for 400 epochs, because I found that the model around epoch 100 performs better;
with "model_100", I got the best score shown above.
I am new to this and don't know what is wrong...
I'd appreciate your help.

c3d feats extraction example

Hello! Thank you for your work! Are C3D features currently not supported? I saw some traces in the sources, but I'm not sure. Could you please give any hints? Do these features have to be extracted like the ResNet features and then simply concatenated? (A sketch follows below.)
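If you do extract C3D features yourself, a hypothetical sketch of joining them with the ResNet features per video (paths and shapes are assumptions; the combined width must then be passed as --dim_vid):

# Hypothetical sketch: concatenate per-frame ResNet and C3D features.
import numpy as np

res = np.load('data/feats/resnet152/video0.npy')  # assumed (n_frames, 2048)
c3d = np.load('data/feats/c3d/video0.npy')        # assumed (n_frames, 4096)
joint = np.concatenate([res, c3d], axis=1)        # (n_frames, 6144)
np.save('data/feats/joint/video0.npy', joint)     # then train with --dim_vid 6144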

Question about LSTM?

I tried the code with '--rnn_type lstm'; training with the S2VTModel works normally, but when I try to evaluate the results, I get the following error:

('vocab size is ', 16860)
('number of train videos: ', 6513)
('number of val videos: ', 497)
('number of test videos: ', 2990)
load feats from [u'data/feats/resnet152']
('max sequence length in data is', 28)
/usr/local/lib/python2.7/dist-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
Traceback (most recent call last):
File "eval.py", line 148, in
main(opt, i)
File "eval.py", line 75, in main
dataset = VideoDataset(opt, "val")
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for S2VTAttModel:
While copying the parameter named "encoder.rnn.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "encoder.rnn.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "encoder.rnn.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "encoder.rnn.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "decoder.rnn.weight_hh_l0", whose dimensions in the model are torch.Size([1536, 512]) and whose dimensions in the checkpoint are torch.Size([2048, 512]).
While copying the parameter named "decoder.rnn.weight_ih_l0", whose dimensions in the model are torch.Size([1536, 1024]) and whose dimensions in the checkpoint are torch.Size([2048, 1024]).
While copying the parameter named "decoder.rnn.bias_ih_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).
While copying the parameter named "decoder.rnn.bias_hh_l0", whose dimensions in the model are torch.Size([1536]) and whose dimensions in the checkpoint are torch.Size([2048]).

I would like to find out why this happens. Could you help me? Thanks.
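The shapes explain it: the checkpoint's 2048 rows are 4 × dim_hidden (512), i.e. an LSTM's four gates, while the model being loaded expects 1536 = 3 × 512, a GRU's three gates. So the checkpoint was trained with --rnn_type lstm, but eval.py rebuilt the model with the default GRU; passing --rnn_type lstm to eval.py as well (or restoring the option from the saved opt_info.json) should make the state_dict load.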

Size mismatch at EncoderRNN.py

I did all the prerequisites, i.e. downloaded the dataset, extracted the features, and built the vocab, before starting training.
When I started training, a runtime error occurred.

File "drive/DeepLearning/PytorchModel/Pytorch/train.py", line 138, in
main(opt)
File "drive/DeepLearning/PytorchModel/Pytorch/train.py", line 121, in main
train(dataloader, model, crit, optimizer, exp_lr_scheduler, opt, rl_crit)
File "drive/DeepLearning/PytorchModel/Pytorch/train.py", line 40, in train
seq_probs, _ = model(fc_feats, labels, 'train')
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/content/drive/DeepLearning/PytorchModel/Pytorch/models/S2VTAttModel.py", line 28, in forward
encoder_outputs, encoder_hidden = self.encoder(vid_feats)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/content/drive/DeepLearning/PytorchModel/Pytorch/models/EncoderRNN.py", line 53, in forward
vid_feats = self.vid2hid(vid_feats.view(-1, dim_vid))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/linear.py", line 55, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 992, in linear
return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [4000 x 2048], m2: [4096 x 512] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:249
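This looks like the same root cause as the 'size mismatch' issue above: m1's 2048 columns are the ResNet feature width, while m2 shows vid2hid was built for a 4096-dim input, i.e. the model was constructed with dim_vid = 4096. Passing --dim_vid 2048 to train.py should fix it.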

problem with model S2VTModel

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This error occurs when I use S2VTModel, but everything works well with S2VTAttModel. I started with S2VTModel and want to fix this bug, but failed. Do you have any idea?
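This matches the 'Most likely an error in S2VTModel' issue above: the self.rnn1.flatten_parameters() / self.rnn2.flatten_parameters() calls in S2VTModel's forward trigger the inplace-modification error, and commenting them out reportedly lets the model converge. S2VTAttModel doesn't make those calls, which would explain why it works.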

train.py

if not sc_flag:
    seq_probs = model(fc_feats, labels)
    loss = loss_fn(seq_probs, labels[:, 1:], masks[:, 1:])

Check this.

Subprocess.call(...)FileNotFoundError

Code (prepro_feats.py, lines 31-39):

video_to_frames_command = ["ffmpeg",
                           '-y',                       # (optional) overwrite output file if it exists
                           '-i', video,                # input file
                           '-vf', "scale=10000:7000",  # scale filter
                           '-qscale:v', "2",           # quality for JPEG
                           '{0}/%06d.jpg'.format(dst)]
subprocess.call(video_to_frames_command,
                stdout=ffmpeg_log, stderr=ffmpeg_log)
0%| | 0/7010 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:/Pycharm/prog/prepro_feats.py", line 120, in
extract_feats(params, model, load_image_fn)
File "D:/Pycharm/prog/prepro_feats.py", line 59, in extract_feats
extract_frames(video, dst)
File "D:/Pycharm/prog/prepro_feats.py", line 39, in extract_frames
stdout=ffmpeg_log, stderr=ffmpeg_log)
FileNotFoundError: [WinError 2]
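[WinError 2] means Windows cannot find an 'ffmpeg' executable on PATH. A sketch of a guard before building the command (the fallback path is only an example):

# Sketch: resolve ffmpeg explicitly instead of relying on PATH.
import shutil

ffmpeg = shutil.which('ffmpeg')
if ffmpeg is None:
    ffmpeg = r'C:\ffmpeg\bin\ffmpeg.exe'  # example install location; adjust
# then pass `ffmpeg` as the first element of video_to_frames_command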

How to evaluate Meteor value?

Hey

I have trained the model on my own dataset (videos + captions), and I got three files: model_5.pth, model_score.txt, and opt_info.json.

How can I evaluate METEOR, BLEU, etc. for my training data? Also, train.py only trains the model; how can I divide the data into validation and train sets as we do in Keras?

Thanks
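A sketch of the evaluation call, assuming eval.py accepts the saved options file and checkpoint via --recover_opt and --saved_model (those flag names are an assumption; check eval.py's option parser). The BLEU/METEOR/CIDEr/ROUGE numbers come from the bundled coco-caption scorers:

python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_5.pth --batch_size 100 --gpu 0

As for the split: this repo appears to bake the train/val/test split into info.json during preprocessing rather than taking a Keras-style validation_split argument.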

Question about the split?

I wonder whether there is a standard way to split the data. I only split the data into train and test sets at a 9:1 ratio, and got higher CIDEr, ROUGE_L, and METEOR scores.
What should I do to run the experiment correctly?

Please help me with this cuda error :: Using anaconda

(base) C:\Users\harne\Downloads\video-caption.pytorch-master> python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152 --n_frame_steps 40 --gpu 4,5
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
File "prepro_feats.py", line 106, in <module>
model = model.cuda()
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 311, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 208, in _apply
module._apply(fn)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 208, in _apply
module._apply(fn)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 230, in _apply
param_applied = fn(param)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 311, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 179, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50
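Error 38 means no CUDA-capable device is visible at all, and --gpu 4,5 additionally indexes GPUs most machines don't have. A quick check (then rerun prepro_feats.py with --gpu 0):

# Sketch: confirm PyTorch can see a GPU at all.
import torch

print(torch.cuda.is_available(), torch.cuda.device_count())
# False / 0 here means the PyTorch build or NVIDIA driver lacks CUDA support.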

preprocess videos and labels

May I ask whether the first step should be run twice, once for the training videos and once for the test videos?

Compiling PyTorch from source

I guess providing these steps in the README might help new users:

run the Anaconda installation, let Anaconda modify your .bashrc file
source .bashrc
module load apps cuda/8.0
to check if cuda is installed properly - ls /usr/local/cuda-8.0/lib64
export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root directory]
conda install numpy pyyaml mkl setuptools cmake cffi
conda install -c soumith magma-cuda80
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch/
CUDA_HOME="/usr/local/cuda-8.0" python setup.py install

Is it necessary to use a vid2hid layer before the rnn cell?

self.vid2hid = nn.Linear(dim_vid, dim_hidden)

As the title says: why do we need another linear transformation layer for the video features when the RNN will do it inside the cell?

If it is to save on the number of parameters, would it be better to specify the RNN input dimension with a separate variable? For instance:

self.vid2hid = nn.Linear(dim_vid, dim_rnn_input)
...
self.rnn = self.rnn_cell(dim_rnn_input, dim_hidden, n_layers, batch_first=True,
                         bidirectional=bidirectional, dropout=self.rnn_dropout_p)
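One concrete argument for keeping a projection is parameter count: feeding 2048-dim features straight into a GRU with dim_hidden = 512 costs 3 × 512 × 2048 ≈ 3.1M input-to-hidden weights, whereas vid2hid (2048 × 512 ≈ 1.0M) followed by a GRU on 512-dim inputs (3 × 512 × 512 ≈ 0.8M) totals ≈ 1.8M, and it also leaves room for a nonlinearity and dropout between the features and the RNN.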
