vijayvee / video-captioning

This repository contains the code for a video captioning system inspired by Sequence to Sequence -- Video to Text. This system takes as input a video and generates a caption in English describing the video.

License: MIT License

Python 100.00%
video-captioning tensorflow s2vt sequence-to-sequence multimodal-deep-learning seq2seq

video-captioning's Introduction

Automated Video Captioning using S2VT

Introduction

This repository contains my implementation of a video captioning system. This system takes as input a video and generates a caption describing the event in the video.

I took inspiration from Sequence to Sequence -- Video to Text, a video captioning approach proposed by researchers at The University of Texas at Austin.

Requirements

To run my code and reproduce the results, the following packages need to be installed first. I used Python 2.7 throughout this project.

Packages:

  • TensorFlow
  • Caffe
  • NumPy
  • cv2
  • imageio
  • scikit-image

S2VT - Architecture and working

Attached below is the architecture diagram of S2VT as given in their paper.

Arch_S2VT

The diagram below shows how the system works while generating a caption for a given video.

S2VT_Working
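In outline (my paraphrase of the paper; this repository targets an older TensorFlow release, so the sketch below is written against TF 2.x Keras APIs and is not the repo's code): a stack of two LSTMs first reads the sequence of VGG16 frame features while the word channel is zero-padded (encoding stage), then reads zero-padded frame inputs while emitting the caption one word at a time (decoding stage).

    import tensorflow as tf  # sketch assumes TF 2.x eager mode

    n_frames, feat_dim, hidden, vocab_size, max_words = 80, 4096, 256, 5000, 20

    cell1 = tf.keras.layers.LSTMCell(hidden)     # upper LSTM: reads frame features
    cell2 = tf.keras.layers.LSTMCell(hidden)     # lower LSTM: emits caption words
    embed = tf.keras.layers.Embedding(vocab_size, hidden)
    project = tf.keras.layers.Dense(vocab_size)  # hidden state -> vocabulary logits

    state1 = cell1.get_initial_state(batch_size=1, dtype=tf.float32)
    state2 = cell2.get_initial_state(batch_size=1, dtype=tf.float32)

    # Encoding stage: feed all frame features; the word channel is zero-padded.
    pad_word = tf.zeros([1, hidden])
    for _ in range(n_frames):
        feat = tf.random.normal([1, feat_dim])   # stand-in for one VGG16 fc7 feature
        h1, state1 = cell1(feat, state1)
        h2, state2 = cell2(tf.concat([pad_word, h1], axis=1), state2)

    # Decoding stage: the frame channel is zero-padded; emit one word per step.
    pad_frame = tf.zeros([1, feat_dim])
    word = tf.constant([1])                      # assumed <BOS> token id
    caption = []
    for _ in range(max_words):
        h1, state1 = cell1(pad_frame, state1)
        h2, state2 = cell2(tf.concat([embed(word), h1], axis=1), state2)
        word = tf.argmax(project(h2), axis=1)    # greedy decoding
        caption.append(int(word[0]))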

Running instructions

  1. Install all the packages mentioned in the 'Requirements' section so the project runs smoothly.
  2. Using Vid2Url_Full.txt, download the dataset clips from YouTube and store them in <YOUTUBE_CLIPS_DIR> (a parsing sketch follows this list).
    • Example Vid2Url entry - {'vid1547': 'm1NR0uNNs5Y_104_110'}
    • YouTube video identifier - m1NR0uNNs5Y
    • Start time - 104 seconds, end time - 110 seconds
    • Download the frames between 104 and 110 seconds of https://www.youtube.com/watch?v=m1NR0uNNs5Y
    • The relevant frames for video id 'vid1547' have now been downloaded
  3. Pass the downloaded video paths and a batch size (depending on hardware constraints) to extract_feats() in Extract_Feats.py to extract VGG16 features for the downloaded clips, and store the features in <VIDEO_DIR>.
  4. Change the paths on lines 13 to 16 of utils.py to point to directories in your workspace.
  5. Run training_vidcap.py with the number of epochs as a command-line argument, e.g. python training_vidcap.py 10
  6. Pass the checkpoint files saved in Step 5 to test_videocap.py to run the trained model on the validation set.
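Here is the parsing sketch for step 2, assuming each Vid2Url_Full.txt entry encodes '<youtube_id>_<start>_<end>' as in the example above (the downloader tool and <YOUTUBE_CLIPS_DIR> are up to you):

    vid2url = {'vid1547': 'm1NR0uNNs5Y_104_110'}  # one parsed entry from Vid2Url_Full.txt

    for vid_id, clip in vid2url.items():
        # Split from the right, since YouTube ids may themselves contain underscores.
        youtube_id, start, end = clip.rsplit('_', 2)
        url = 'https://www.youtube.com/watch?v=' + youtube_id
        print('%s: download %s and keep seconds %s-%s' % (vid_id, url, start, end))
        # Download the video with a tool of your choice, trim it to
        # [start, end] seconds, and save the clip under <YOUTUBE_CLIPS_DIR>.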

Sample results

Attached below are a few screenshots of captions generated for videos from the validation set.

Result1

Result2

Dataset

Even though S2VT was trained on MSVD, M-VAD and MPII-MD, I have trained my system only on MSVD, which can be downloaded here.

Demo

A demo of my system can be found here.

Acknowledgements

video-captioning's People

Contributors

vijayvee

video-captioning's Issues

np.rand?

Hi,
For this part in utils.py:

    def fetch_data_batch(batch_size):
        curr_batch_vids = np.random.rand(video_files,batch_size)

I get this error:

    TypeError: 'dict_keys' object cannot be interpreted as an integer

What should I do?
Any idea?
Thanks!
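A hedged guess at the fix, since the surrounding utils.py code is not shown here: np.random.rand takes integer dimensions, which is why passing a dict_keys view raises this TypeError. If the intent is to sample batch_size random video ids, np.random.choice over a materialized list does that:

    import numpy as np

    # Stand-in mapping; in utils.py, video_files is presumably the
    # dict_keys view of a {video_id: ...} dictionary.
    video_dict = {'vid1547': 'm1NR0uNNs5Y_104_110', 'vid1548': 'xyz_0_10'}
    video_files = video_dict.keys()

    batch_size = 2
    # dict_keys is not a 1-D array-like in Python 3, so convert it to a list.
    curr_batch_vids = np.random.choice(list(video_files), batch_size)
    print(curr_batch_vids)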

IOError Please help

I downloaded your model and extracted the features, but I still get an error when I run test_videocap.py. Please help.

    IOError: Cannot parse file S2VT_Dyn_10_0.0001_300_46000.ckpt.meta: 1:1 : Message type "tensorflow.MetaGraphDef" has no field named "version".

ImportError: DLL load failed: The specified module could not be found.

After running Extract_Feats.py, I get the error below; please help me resolve it.

    ImportError                               Traceback (most recent call last)
    <ipython-input> in <module>
          6 import numpy as np
          7 sys.path.insert(0,'/home/vijay/deep-learning/caffe/python')
    ----> 8 import caffe
          9 import skimage.transform
         10 def extract_feats(filenames,batch_size):

    ~\Anaconda3\envs\caffe\lib\site-packages\caffe\__init__.py in <module>
    ----> 1 from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
          2 from ._caffe import init_log, log, set_mode_cpu, set_mode_gpu, set_device, Layer, get_solver, layer_type_list, set_random_seed, solver_count, set_solver_count, solver_rank, set_solver_rank, set_multiprocess, has_nccl
          3 from ._caffe import __version__
          4 from .proto.caffe_pb2 import TRAIN, TEST
          5 from .classifier import Classifier

    ~\Anaconda3\envs\caffe\lib\site-packages\caffe\pycaffe.py in <module>
         11 import numpy as np
         12
    ---> 13 from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
         14     RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
         15 import caffe.io

    ImportError: DLL load failed: The specified module could not be found.

architecture and memory of GPU used for training

Hi, I am using your S2VT code to train on my own dataset. May I ask what GPU you used for training, and do you remember how long training took? Thank you very much.

Aborted Error

When I run Extract_Feats.py, I get the following error; please help me resolve it.

I0906 07:10:07.085180 26398 layer_factory.hpp:77] Creating layer input
I0906 07:10:07.085196 26398 net.cpp:84] Creating Layer input
I0906 07:10:07.085204 26398 net.cpp:380] input -> data
I0906 07:10:07.085224 26398 net.cpp:122] Setting up input
I0906 07:10:07.085232 26398 net.cpp:129] Top shape: 10 3 224 224 (1505280)
I0906 07:10:07.085237 26398 net.cpp:137] Memory required for data: 6021120
I0906 07:10:07.085242 26398 layer_factory.hpp:77] Creating layer conv1_1
I0906 07:10:07.085253 26398 net.cpp:84] Creating Layer conv1_1
I0906 07:10:07.085258 26398 net.cpp:406] conv1_1 <- data
I0906 07:10:07.085263 26398 net.cpp:380] conv1_1 -> conv1_1
F0906 07:10:07.096971 26398 cudnn_conv_layer.cpp:52] Check failed: error == cudaSuccess (30 vs. 0) unknown error
*** Check failure stack trace: ***
Aborted (core dumped)
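One way to narrow this down (a debugging suggestion on my part, not a fix from the author): the failed check is inside cudnn_conv_layer.cpp, and CUDA error 30 ('unknown error') usually points at a broken CUDA/driver installation rather than at this repository. Forcing Caffe into CPU mode before loading VGG16 tells you whether everything else works:

    import caffe

    # Bypass the cuDNN convolution layers entirely; if extraction now runs
    # (slowly), the CUDA/cuDNN installation is the culprit.
    caffe.set_mode_cpu()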

feature extraction

Hello! I would like to ask: are there no ready-made feature files extracted in this project?

If so, why does the following error occur?

[screenshot of the error attached to the issue]

I would appreciate it if the author could give me the complete feature-extraction steps.

not the output in demo

Hello,
First of all, thanks for the repo; it's been a great help.
I tried using it with the pretrained model checkpoint provided, but the outputs are gibberish rather than useful sentences. I am unable to figure out what the issue is. Please help.

computer dead when running extract_feats()

I wrote a main function myself to run extract_feats() and passed in the required video files. But after it prints "VGG Network loaded", my computer becomes very slow and nothing more is printed (my machine runs on CPU).

So I don't know whether this is normal or whether there is something wrong with my usage.

[screenshot from 2019-01-10 attached]

Here is my main():

    import os

    if __name__ == "__main__":
        L = []
        root = '/home/zzy/Downloads/videos'
        # Collect the path of every file under the videos directory.
        for root, dirs, files in os.walk(root):
            for file in files:
                path = root + '/' + file
                L.append(path)
        extract_feats(L, 2)

I changed the code that makes my computer freeze, to see more clearly:

    for num, frame in enumerate(vid):
        print num
        frame = skimage.transform.resize(frame, [224, 224])
        if len(frame.shape) < 3:
            frame = np.repeat(frame, 3).reshape([224, 224, 3])
        curr_frames.append(frame)

When num reaches about four thousand, my computer freezes.
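A plausible explanation (my analysis, not confirmed by the author): each resized frame is a float64 array of shape (224, 224, 3), about 1.2 MB, and the loop keeps every frame of the video in memory before the later downsampling to 80 frames ever runs, so a few thousand frames exhaust RAM. A memory-friendlier sketch, assuming imageio's ffmpeg reader, picks the 80 frame indices first and resizes only those frames:

    import imageio
    import numpy as np
    import skimage.transform

    def sample_frames(video_path, n_frames=80):
        """Resize only the n_frames evenly spaced frames S2VT needs, instead
        of holding every decoded frame of the video in memory at once."""
        vid = imageio.get_reader(video_path, 'ffmpeg')
        # count_frames() exists in recent imageio; older releases expose
        # get_meta_data()['nframes'] instead.
        total = vid.count_frames()
        keep = set(np.linspace(0, total - 1, n_frames).astype(int))
        frames = []
        for num, frame in enumerate(vid):
            if num not in keep:
                continue
            frame = skimage.transform.resize(frame, (224, 224))
            if frame.ndim < 3:
                frame = np.repeat(frame, 3).reshape(224, 224, 3)
            # float32 halves the footprint of skimage's default float64 output.
            frames.append(frame.astype(np.float32))
        return np.array(frames)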

is there any pretrained model

There is no use committing a project to Git if you are not documenting it properly; no one will understand or use it. Please share the sequence of execution.

RuntimeError: The ffmpeg plugin does not work on Python 2.x

Hi! With reference to this repo, I'm a student studying S2VT.

Using the text files, I extracted a portion of a video, e.g. 'vid1547',
and tried to run extract_feats('portion_of_video.mp4', batch_size),
but I get: RuntimeError: The ffmpeg plugin does not work on Python 2.x.
Your repo targets Python 2.x, so how can I fix this problem?

In the extract_feats code:

    for file in filenames:
        # maybe the error is in imageio.get_reader
        vid = imageio.get_reader(file,'ffmpeg')
        curr_frames = []
        for frame in vid:
            frame = skimage.transform.resize(frame,[224,224])
            if len(frame.shape)<3:
                frame = np.repeat(frame,3).reshape([224,224,3])
            curr_frames.append(frame)
        curr_frames = np.array(curr_frames)
        print "Shape of frames: {0}".format(curr_frames.shape)
        idx = map(int,np.linspace(0,len(curr_frames)-1,80))
        curr_frames = curr_frames[idx,:,:,:]
        print "Captured 80 frames: {0}".format(curr_frames.shape)

The error message is here:

VGG Network loaded
Traceback (most recent call last):
  File "/home/ivcl/Desktop/git/video-captioning/s2vt_sample.py", line 30, in <module>
    extract_feats(video_path+'test_m.mp4',4)
  File "/home/ivcl/Desktop/git/video-captioning/Extract_Feats.py", line 34, in extract_feats
    vid = imageio.get_reader(file,'mkv')
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/functions.py", line 186, in get_reader
    return format.get_reader(request)
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/format.py", line 164, in get_reader
    return self.Reader(self, request)
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/core/format.py", line 214, in __init__
    self._open(**self.request.kwargs.copy())
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/plugins/ffmpeg.py", line 261, in _open
    self._ffmpeg_api = _get_ffmpeg_api()
  File "/home/ivcl/anaconda2/envs/ai/lib/python2.7/site-packages/imageio/plugins/ffmpeg.py", line 61, in _get_ffmpeg_api
    raise RuntimeError("The ffmpeg plugin does not work on Python 2.x")
RuntimeError: The ffmpeg plugin does not work on Python 2.x
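One workaround, offered as an assumption rather than a repository-endorsed fix: newer imageio releases delegate to the Python 3-only imageio-ffmpeg package, which is where this RuntimeError comes from, so pinning imageio to an older release that still supported Python 2.7 avoids it.

    # First, in the shell (the exact version is an assumption; 2.4.1
    # predates the Python 3-only imageio-ffmpeg split):
    #   pip install "imageio==2.4.1"

    import imageio

    # Older imageio can download its own ffmpeg binary on demand.
    imageio.plugins.ffmpeg.download()
    vid = imageio.get_reader('portion_of_video.mp4', 'ffmpeg')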

Extract_Feats.py

@vijayvee
Hello Vijay,

I would like to understand the path defined in "sys.path.insert(0,'/home/vijay/deep-learning/caffe/python')".

Which path is given here? Should I give my own Python path?
Also, where do you call this file?

Thanks
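For what it's worth (my reading of the code, not an official answer): that path points to the python/ directory of the author's local Caffe build, so that the subsequent import caffe resolves. You would replace it with the corresponding directory of your own Caffe installation:

    import sys

    # Point this at the 'python' directory of *your* Caffe build,
    # i.e. the directory that contains the 'caffe' package (placeholder path).
    sys.path.insert(0, '/path/to/your/caffe/python')
    import caffe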

where can I download S2VT?

Hello! I ran into a problem like this:

 File "/home/gsy/video-captioning/utils.py", line 8, in <module>
    from VideoCap import S2VT
ImportError: No module named VideoCap

I think maybe your code doesn't include S2VT. I have looked through a lot of the code, but it is still missing... Could you give me some guidance? Thanks!

cap no

Hey!
Nice implementation!

  1. Since there are multiple captions for a single video, did you use all the captions for training, or pick a random caption from the set in every epoch?
  2. Second, did you feed the whole caption during training, or did you take the first word and the video features, predict the next word, and so on?

Caffe version problem

Hello,
I am facing a problem while executing the code due to a Caffe version mismatch. Can you please tell me which version of Caffe you used to implement the feature-extraction code?

MSVD Dataset

Is there a way to download all the video files of the MSVD dataset directly from some URL?

Channel Reversal for VGG

Hi, looking at the feature-extraction part, I cannot see whether you reversed the channels from RGB to BGR. Did you feed the VGG network RGB frames? Please confirm.
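For reference, this is the standard preprocessing for Caffe's reference VGG-16 release (whether this repository applies it is exactly the question above): channels swapped to BGR, pixel values scaled to [0, 255], and the BGR ImageNet mean subtracted, e.g. via caffe.io.Transformer:

    import numpy as np
    import caffe

    # Shape of the VGG16 input blob assumed to be (batch, 3, 224, 224).
    transformer = caffe.io.Transformer({'data': (10, 3, 224, 224)})
    transformer.set_transpose('data', (2, 0, 1))     # HxWxC -> CxHxW
    transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR
    transformer.set_raw_scale('data', 255.0)         # [0,1] floats -> [0,255]
    # Per-channel ImageNet mean, in BGR order, as published with VGG16.
    transformer.set_mean('data', np.array([103.939, 116.779, 123.68]))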

drop_out problem

    Traceback (most recent call last):
    dropout_prob: np.float32(0.5),
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
    return array(a, dtype, copy=False, order=order)
    ValueError: setting an array element with a sequence.

Can I know how to resolve the above error? It occurs when running the program in Python 3.

If we use Python 2 instead, we get an imageio version conflict stating that imageio requires Python >= 3.5 but the running Python is 2.7.

curr_frames = curr_frames[idx,:,:,:] IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I am trying to execute Extract_Feats.py and hitting the following error.

    curr_frames = curr_frames[idx,:,:,:]
    IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

How do I proceed?

I am using Python 3.6, and it seems necessary to convert the map object to a list manually to proceed.
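That diagnosis matches Python 3 semantics: map() returns a lazy iterator there, and NumPy will not accept it as an index array. A minimal fix (my suggestion, not a committed patch):

    import numpy as np

    curr_frames = np.zeros((1000, 224, 224, 3))  # stand-in for the decoded frames

    # Python 3: materialize the map object before indexing with it.
    idx = list(map(int, np.linspace(0, len(curr_frames) - 1, 80)))
    curr_frames = curr_frames[idx, :, :, :]

    # Equivalent, avoiding map() entirely:
    # idx = np.linspace(0, len(curr_frames) - 1, 80).astype(int)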
