vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

License: MIT License

Python 100.00%
video-captioning tensorflow s2vt sequence-to-sequence multimodal-deep-learning seq2seq

video-captioning's Introduction

Automated Video Captioning using S2VT

Introduction

This repository contains my implementation of a video captioning system. This system takes as input a video and generates a caption describing the event in the video.

I took inspiration from Sequence to Sequence -- Video to Text, a video captioning approach proposed by researchers at The University of Texas at Austin.

Requirements

To run my code and reproduce the results, the following packages need to be installed first. I used Python 2.7 throughout this project.

Packages:

  • TensorFlow
  • Caffe
  • NumPy
  • cv2
  • imageio
  • scikit-image

S2VT - Architecture and working

Attached below is the architecture diagram of S2VT as given in their paper.

Arch_S2VT

The diagram below shows how the system works while generating a caption for a given video.

S2VT_Working
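In outline (my paraphrase of the paper; this repository targets an older TensorFlow release, so the sketch below is written against TF 2.x Keras APIs and is not the repo's code): a stack of two LSTMs first reads the sequence of VGG16 frame features while the word channel is zero-padded (encoding stage), then reads zero-padded frame inputs while emitting the caption one word at a time (decoding stage).

    import tensorflow as tf  # sketch assumes TF 2.x eager mode

    n_frames, feat_dim, hidden, vocab_size, max_words = 80, 4096, 256, 5000, 20

    cell1 = tf.keras.layers.LSTMCell(hidden)     # upper LSTM: reads frame features
    cell2 = tf.keras.layers.LSTMCell(hidden)     # lower LSTM: emits caption words
    embed = tf.keras.layers.Embedding(vocab_size, hidden)
    project = tf.keras.layers.Dense(vocab_size)  # hidden state -> vocabulary logits

    state1 = cell1.get_initial_state(batch_size=1, dtype=tf.float32)
    state2 = cell2.get_initial_state(batch_size=1, dtype=tf.float32)

    # Encoding stage: feed all frame features; the word channel is zero-padded.
    pad_word = tf.zeros([1, hidden])
    for _ in range(n_frames):
        feat = tf.random.normal([1, feat_dim])   # stand-in for one VGG16 fc7 feature
        h1, state1 = cell1(feat, state1)
        h2, state2 = cell2(tf.concat([pad_word, h1], axis=1), state2)

    # Decoding stage: the frame channel is zero-padded; emit one word per step.
    pad_frame = tf.zeros([1, feat_dim])
    word = tf.constant([1])                      # assumed <BOS> token id
    caption = []
    for _ in range(max_words):
        h1, state1 = cell1(pad_frame, state1)
        h2, state2 = cell2(tf.concat([embed(word), h1], axis=1), state2)
        word = tf.argmax(project(h2), axis=1)    # greedy decoding
        caption.append(int(word[0]))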

Running instructions

  1. Install all the packages mentioned in the 'Requirements' section so the project runs smoothly.
  2. Using Vid2Url_Full.txt, download the dataset clips from YouTube and store them in <YOUTUBE_CLIPS_DIR> (a parsing sketch follows this list).
    • Example Vid2Url entry - {'vid1547': 'm1NR0uNNs5Y_104_110'}
    • YouTube video identifier - m1NR0uNNs5Y
    • Start time - 104 seconds, end time - 110 seconds
    • Download the frames between 104 and 110 seconds of https://www.youtube.com/watch?v=m1NR0uNNs5Y
    • The relevant frames for video id 'vid1547' have now been downloaded
  3. Pass the downloaded video paths and a batch size (depending on hardware constraints) to extract_feats() in Extract_Feats.py to extract VGG16 features for the downloaded clips, and store the features in <VIDEO_DIR>.
  4. Change the paths on lines 13 to 16 of utils.py to point to directories in your workspace.
  5. Run training_vidcap.py with the number of epochs as a command-line argument, e.g. python training_vidcap.py 10
  6. Pass the checkpoint files saved in Step 5 to test_videocap.py to run the trained model on the validation set.
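Here is the parsing sketch for step 2, assuming each Vid2Url_Full.txt entry encodes '<youtube_id>_<start>_<end>' as in the example above (the downloader tool and <YOUTUBE_CLIPS_DIR> are up to you):

    vid2url = {'vid1547': 'm1NR0uNNs5Y_104_110'}  # one parsed entry from Vid2Url_Full.txt

    for vid_id, clip in vid2url.items():
        # Split from the right, since YouTube ids may themselves contain underscores.
        youtube_id, start, end = clip.rsplit('_', 2)
        url = 'https://www.youtube.com/watch?v=' + youtube_id
        print('%s: download %s and keep seconds %s-%s' % (vid_id, url, start, end))
        # Download the video with a tool of your choice, trim it to
        # [start, end] seconds, and save the clip under <YOUTUBE_CLIPS_DIR>.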

Sample results

Attached below are a few screenshots of captions generated for videos from the validation set.

Result1

Result2

Dataset

Even though S2VT was trained on MSVD, M-VAD and MPII-MD, I have trained my system only on MSVD, which can be downloaded here.

Demo

A demo of my system can be found here.

Acknowledgements

video-captioning's People

Contributors

vijayvee

video-captioning's Issues

np.rand?

Hi,
For this part in utils.py:

    def fetch_data_batch(batch_size):
        curr_batch_vids = np.random.rand(video_files,batch_size)

I get this error:

    TypeError: 'dict_keys' object cannot be interpreted as an integer

What should I do?
Any idea?
Thanks!
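A hedged guess at the fix, since the surrounding utils.py code is not shown here: np.random.rand takes integer dimensions, which is why passing a dict_keys view raises this TypeError. If the intent is to sample batch_size random video ids, np.random.choice over a materialized list does that:

    import numpy as np

    # Stand-in mapping; in utils.py, video_files is presumably the
    # dict_keys view of a {video_id: ...} dictionary.
    video_dict = {'vid1547': 'm1NR0uNNs5Y_104_110', 'vid1548': 'xyz_0_10'}
    video_files = video_dict.keys()

    batch_size = 2
    # dict_keys is not a 1-D array-like in Python 3, so convert it to a list.
    curr_batch_vids = np.random.choice(list(video_files), batch_size)
    print(curr_batch_vids)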

IOError Please help

I downloaded your model and extracted the features, but I still get an error when I run test_videocap.py. Please help.

    IOError: Cannot parse file S2VT_Dyn_10_0.0001_300_46000.ckpt.meta: 1:1 : Message type "tensorflow.MetaGraphDef" has no field named "version".

ImportError: DLL load failed: The specified module could not be found.

After running Extract_Feats.py, I get the error below; please help me resolve it.

    ImportError                               Traceback (most recent call last)
    <ipython-input> in <module>
          6 import numpy as np
          7 sys.path.insert(0,'/home/vijay/deep-learning/caffe/python')
    ----> 8 import caffe
          9 import skimage.transform
         10 def extract_feats(filenames,batch_size):

    ~\Anaconda3\envs\caffe\lib\site-packages\caffe\__init__.py in <module>
    ----> 1 from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
          2 from ._caffe import init_log, log, set_mode_cpu, set_mode_gpu, set_device, Layer, get_solver, layer_type_list, set_random_seed, solver_count, set_solver_count, solver_rank, set_solver_rank, set_multiprocess, has_nccl
          3 from ._caffe import __version__
          4 from .proto.caffe_pb2 import TRAIN, TEST
          5 from .classifier import Classifier

    ~\Anaconda3\envs\caffe\lib\site-packages\caffe\pycaffe.py in <module>
         11 import numpy as np
         12
    ---> 13 from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
         14     RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
         15 import caffe.io

    ImportError: DLL load failed: The specified module could not be found.

architecture and memory of GPU used for training

Hi, I am using your S2VT code to train on my own dataset. May I ask what GPU you used for training, and do you remember how long training took? Thank you very much.

Aborted Error

When I run Extract_Feats.py, I get the following error; please help me resolve it.

I0906 07:10:07.085180 26398 layer_factory.hpp:77] Creating layer input
I0906 07:10:07.085196 26398 net.cpp:84] Creating Layer input
I0906 07:10:07.085204 26398 net.cpp:380] input -> data
I0906 07:10:07.085224 26398 net.cpp:122] Setting up input
I0906 07:10:07.085232 26398 net.cpp:129] Top shape: 10 3 224 224 (1505280)
I0906 07:10:07.085237 26398 net.cpp:137] Memory required for data: 6021120
I0906 07:10:07.085242 26398 layer_factory.hpp:77] Creating layer conv1_1
I0906 07:10:07.085253 26398 net.cpp:84] Creating Layer conv1_1
I0906 07:10:07.085258 26398 net.cpp:406] conv1_1 <- data
I0906 07:10:07.085263 26398 net.cpp:380] conv1_1 -> conv1_1
F0906 07:10:07.096971 26398 cudnn_conv_layer.cpp:52] Check failed: error == cudaSuccess (30 vs. 0) unknown error
*** Check failure stack trace: ***
Aborted (core dumped)
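One way to narrow this down (a debugging suggestion on my part, not a fix from the author): the failed check is inside cudnn_conv_layer.cpp, and CUDA error 30 ('unknown error') usually points at a broken CUDA/driver installation rather than at this repository. Forcing Caffe into CPU mode before loading VGG16 tells you whether everything else works:

    import caffe

    # Bypass the cuDNN convolution layers entirely; if extraction now runs
    # (slowly), the CUDA/cuDNN installation is the culprit.
    caffe.set_mode_cpu()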

feature extraction

Hello! I would like to ask: are there no ready-made feature files extracted in this project?

If so, why does the following error occur?

[screenshot of the error attached to the issue]

I would appreciate it if the author could give me the complete feature-extraction steps.

not the output in demo

Hello,
First of all, thanks for the repo; it's been a great help.
I tried using it with the pretrained model checkpoint provided, but the outputs are gibberish rather than useful sentences. I am unable to figure out what the issue is. Please help.

computer dead when running extract_feats()

I wrote a main function myself to run extract_feats() and passed in the required video files. But after it prints "VGG Network loaded", my computer becomes very slow and nothing more is printed (my machine runs on CPU).

So I don't know whether this is normal or whether there is something wrong with my usage.

[screenshot from 2019-01-10 attached]

Here is my main():

    import os

    if __name__ == "__main__":
        L = []
        root = '/home/zzy/Downloads/videos'
        # Collect the path of every file under the videos directory.
        for root, dirs, files in os.walk(root):
            for file in files:
                path = root + '/' + file
                L.append(path)
        extract_feats(L, 2)

I changed the code that makes my computer freeze, to see more clearly:

    for num, frame in enumerate(vid):
        print num
        frame = skimage.transform.resize(frame, [224, 224])
        if len(frame.shape) < 3:
            frame = np.repeat(frame, 3).reshape([224, 224, 3])
        curr_frames.append(frame)

When num reaches about four thousand, my computer freezes.
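A plausible explanation (my analysis, not confirmed by the author): each resized frame is a float64 array of shape (224, 224, 3), about 1.2 MB, and the loop keeps every frame of the video in memory before the later downsampling to 80 frames ever runs, so a few thousand frames exhaust RAM. A memory-friendlier sketch, assuming imageio's ffmpeg reader, picks the 80 frame indices first and resizes only those frames:

    import imageio
    import numpy as np
    import skimage.transform

    def sample_frames(video_path, n_frames=80):
        """Resize only the n_frames evenly spaced frames S2VT needs, instead
        of holding every decoded frame of the video in memory at once."""
        vid = imageio.get_reader(video_path, 'ffmpeg')
        # count_frames() exists in recent imageio; older releases expose
        # get_meta_data()['nframes'] instead.
        total = vid.count_frames()
        keep = set(np.linspace(0, total - 1, n_frames).astype(int))
        frames = []
        for num, frame in enumerate(vid):
            if num not in keep:
                continue
            frame = skimage.transform.resize(frame, (224, 224))
            if frame.ndim < 3:
                frame = np.repeat(frame, 3).reshape(224, 224, 3)
            # float32 halves the footprint of skimage's default float64 output.
            frames.append(frame.astype(np.float32))
        return np.array(frames)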

is there any pretrained model

There is no use committing a project to Git if you are not documenting it properly; no one will understand or use it. Please share the sequence of execution.

RuntimeError: The ffmpeg plugin does not work on Python 2.x

Hi! With reference to this repo, I'm a student studying S2VT.

Using the text files, I extracted a portion of a video, e.g. 'vid1547',
and tried to run extract_feats('portion_of_video.mp4', batch_size),
but I get: RuntimeError: The ffmpeg plugin does not work on Python 2.x.
Your repo targets Python 2.x, so how can I fix this problem?

In the extract_feats code:

    for file in filenames:
        # maybe the error is in imageio.get_reader
        vid = imageio.get_reader(file,'ffmpeg')
        curr_frames = []
        for frame in vid:
            frame = skimage.transform.resize(frame,[224,224])
            if len(frame.shape)<3:
                frame = np.repeat(frame,3).reshape([224,224,3])
            curr_frames.append(frame)
        curr_frames = np.array(curr_frames)
        print "Shape of frames: {0}".format(curr_frames.shape)
        idx = map(int,np.linspace(0,len(curr_frames)-1,80))
        curr_frames = curr_frames[idx,:,:,:]
        print "Captured 80 frames: {0}".format(curr_frames.shape)

The error message is here:

VGG Network loaded
Traceback (most recent call last):
  File "/home/ivcl/Desktop/git/video-captioning/s2vt_sample.py", line 30, in <module>
    extract_feats(video_path+'test_m.mp4',4)
  File "/home/ivcl/Desktop/git/video-captioning/Extract_Feats.py", line 34, in extract_feats
    vid = imageio.get_reader(file,'mkv')
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/format.py", line 164, in get_reader
    return self.Reader(self, request)
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/format.py", line 214, in __init__
    self._open(**self.request.kwargs.copy())
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/plugins/ffmpeg.py", line 261, in _open
    self._ffmpeg_api = _get_ffmpeg_api()
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/plugins/ffmpeg.py", line 61, in _get_ffmpeg_api
    raise RuntimeError("The ffmpeg plugin does not work on Python 2.x")
RuntimeError: The ffmpeg plugin does not work on Python 2.x
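One workaround, offered as an assumption rather than a repository-endorsed fix: newer imageio releases delegate to the Python 3-only imageio-ffmpeg package, which is where this RuntimeError comes from, so pinning imageio to an older release that still supported Python 2.7 avoids it.

    # First, in the shell (the exact version is an assumption; 2.4.1
    # predates the Python 3-only imageio-ffmpeg split):
    #   pip install "imageio==2.4.1"

    import imageio

    # Older imageio can download its own ffmpeg binary on demand.
    imageio.plugins.ffmpeg.download()
    vid = imageio.get_reader('portion_of_video.mp4', 'ffmpeg')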

Extract_Feats.py

@vijayvee
Hello Vijay,

I would like to understand the path defined in "sys.path.insert(0,'/home/vijay/deep-learning/caffe/python')".

Which path is given here? Should I give my own Python path?
Also, where do you call this file?

Thanks
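For what it's worth (my reading of the code, not an official answer): that path points to the python/ directory of the author's local Caffe build, so that the subsequent import caffe resolves. You would replace it with the corresponding directory of your own Caffe installation:

    import sys

    # Point this at the 'python' directory of *your* Caffe build,
    # i.e. the directory that contains the 'caffe' package (placeholder path).
    sys.path.insert(0, '/path/to/your/caffe/python')
    import caffe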

where can I download S2VT?

Hello! I ran into a problem like this:

 File "/home/gsy/video-captioning/utils.py", line 8, in <module>
    from VideoCap import S2VT
ImportError: No module named VideoCap

I think maybe your code doesn't include S2VT. I have looked through a lot of the code, but it is still missing... Could you give me some guidance? Thanks!

cap no

Hey!
Nice implementation!

  1. Since there are multiple captions for a single video, did you use all the captions for training, or pick a random caption from the set in every epoch?
  2. Second, did you feed the whole caption during training, or did you take the first word and the video features, predict the next word, and so on?

Caffe version problem

Hello,
I am facing a problem while executing the code due to a Caffe version mismatch. Can you please tell me which version of Caffe you used to implement the feature-extraction code?

MSVD Dataset

Is there a way to download all the video files of the MSVD dataset directly from some URL?

Channel Reversal for VGG

Hi, looking at the feature-extraction part, I cannot see whether you reversed the channels from RGB to BGR. Did you feed the VGG network RGB frames? Please confirm.
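For reference, this is the standard preprocessing for Caffe's reference VGG-16 release (whether this repository applies it is exactly the question above): channels swapped to BGR, pixel values scaled to [0, 255], and the BGR ImageNet mean subtracted, e.g. via caffe.io.Transformer:

    import numpy as np
    import caffe

    # Shape of the VGG16 input blob assumed to be (batch, 3, 224, 224).
    transformer = caffe.io.Transformer({'data': (10, 3, 224, 224)})
    transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
    transformer.set_raw_scale('data', 255.0)         # [0,1] floats -> [0,255]
    # Per-channel ImageNet mean, in BGR order, as published with VGG16.
    transformer.set_mean('data', np.array([103.939, 116.779, 123.68]))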

drop_out problem

    Traceback (most recent call last):
    dropout_prob: np.float32(0.5),
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
    return array(a, dtype, copy=False, order=order)
    ValueError: setting an array element with a sequence.

Can I know how to resolve the above error? It occurs when running the program in Python 3.

If we use Python 2 instead, we get an imageio version conflict stating that imageio requires Python >= 3.5 but the running Python is 2.7.

curr_frames = curr_frames[idx,:,:,:] IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I am trying to execute Extract_Feats.py and hitting the following error.

    curr_frames = curr_frames[idx,:,:,:]
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

How do I proceed?

I am using Python 3.6, and it seems necessary to convert the map object to a list manually to proceed.
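That diagnosis matches Python 3 semantics: map() returns a lazy iterator there, and NumPy will not accept it as an index array. A minimal fix (my suggestion, not a committed patch):

    import numpy as np

    curr_frames = np.zeros((1000, 224, 224, 3))  # stand-in for the decoded frames

    # Python 3: materialize the map object before indexing with it.
    idx = list(map(int, np.linspace(0, len(curr_frames) - 1, 80)))
    curr_frames = curr_frames[idx, :, :, :]

    # Equivalent, avoiding map() entirely:
    # idx = np.linspace(0, len(curr_frames) - 1, 80).astype(int)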
