

ABiViRNet: Attention Bidirectional Video Recurrent Net for video captioning

This repository contains the code for building a system similar to the one from the work Video Description using Bidirectional Recurrent Neural Networks, presented at the International Conference on Artificial Neural Networks (ICANN'16). With this module, you can replicate our experiments and easily deploy new models. ABiViRNet is built upon our fork of the Keras framework and has been tested with the Theano and TensorFlow backends.

Features:

  • Attention model over the input sequence of frames
  • Peeked decoder LSTM: the previously generated word is an input to the current LSTM timestep
  • MLPs for initializing the LSTM hidden and memory states
  • Beam search decoding
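Beam search keeps the `beam_size` highest-scoring partial captions at each step instead of committing greedily to the single most likely word. The following is a minimal, framework-independent sketch, not ABiViRNet's own decoder: the `step` callable is a hypothetical stand-in for the trained decoder LSTM (it returns log-probabilities for the next word given the words generated so far), and `toy_step` is an invented distribution used only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical decoder interface: tokens so far -> {next_token: log_probability}.
StepFn = Callable[[Sequence[int]], Dict[int, float]]
Hypothesis = Tuple[List[int], float]


def beam_search(step: StepFn, bos: int, eos: int, beam_size: int = 3,
                max_len: int = 20) -> List[Hypothesis]:
    """Return hypotheses sorted by total log-probability, best first."""
    beams: List[Hypothesis] = [([bos], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis with every candidate next token.
        candidates = [(tokens + [tok], score + logp)
                      for tokens, score in beams
                      for tok, logp in step(tokens).items()]
        # Keep the beam_size best expansions; those ending in eos are done.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses still open when max_len is reached
    return sorted(finished, key=lambda c: c[1], reverse=True)


def toy_step(prefix: Sequence[int]) -> Dict[int, float]:
    """Invented distribution: token 0 = <eos>, 1 and 2 = words, 3 = <bos>."""
    if len(prefix) >= 3:
        return {0: math.log(0.9), 1: math.log(0.1)}
    return {1: math.log(0.6), 2: math.log(0.3), 0: math.log(0.1)}


best_tokens, best_score = beam_search(toy_step, bos=3, eos=0, beam_size=2,
                                      max_len=5)[0]
print(best_tokens, round(math.exp(best_score), 3))
```

In this variant a hypothesis that emits `eos` leaves the beam, so the beam can shrink over time; other implementations refill the beam or normalize scores by caption length.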

Architecture

[Figure: ICANN'16 model architecture]
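The data flow of the architecture can be approximated with a small NumPy sketch: a bidirectional recurrent encoder turns the per-frame features into annotation vectors, a small network initializes the decoder state, and soft attention weights the annotations into a context vector for one decoder step. This is a simplification under stated assumptions, not the Keras code in this repository: a plain tanh RNN replaces the LSTMs, a single tanh layer applied to the mean annotation stands in for the initialization MLPs (only the hidden state is shown), attention is assumed to be additive (Bahdanau-style), and all function names, sizes and random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def rnn_pass(x, W_in, W_rec, b):
    """Run a plain tanh RNN (stand-in for an LSTM) over x of shape (T, d_in)."""
    h = np.zeros(W_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(W_in @ x_t + W_rec @ h + b)
        states.append(h)
    return np.stack(states)  # (T, d_h)


def bidirectional_encode(x, fwd_params, bwd_params):
    """Concatenate forward and backward states per frame: (T, 2 * d_h)."""
    h_fwd = rnn_pass(x, *fwd_params)
    # Run over the reversed sequence, then flip back so row t refers to frame t.
    h_bwd = rnn_pass(x[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)


def soft_attention(annotations, dec_state, W_a, U_a, v_a):
    """Additive attention (assumed): softmax-weighted sum of annotations."""
    scores = np.tanh(annotations @ W_a.T + dec_state @ U_a.T) @ v_a  # (T,)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ annotations, weights  # context (2 * d_h,), weights (T,)


def rnn_params(d_in, d_h):
    """Hypothetical small random weights for one RNN direction."""
    return (rng.normal(scale=0.1, size=(d_h, d_in)),
            rng.normal(scale=0.1, size=(d_h, d_h)),
            np.zeros(d_h))


# Toy sizes: 6 frames of 8-dim features, 4 units per direction,
# 5-dim decoder state, 3-dim attention space.
n_frames, d_in, d_h, d_s, d_a = 6, 8, 4, 5, 3
frames = rng.normal(size=(n_frames, d_in))

annotations = bidirectional_encode(frames, rnn_params(d_in, d_h),
                                   rnn_params(d_in, d_h))

# Decoder initial hidden state from the mean annotation (assumed; one tanh
# layer stands in for the initialization MLP).
W_init = rng.normal(scale=0.1, size=(d_s, 2 * d_h))
h0 = np.tanh(W_init @ annotations.mean(axis=0))

context, alpha = soft_attention(annotations, h0,
                                rng.normal(size=(d_a, 2 * d_h)),
                                rng.normal(size=(d_a, d_s)),
                                rng.normal(size=d_a))
print(annotations.shape, context.shape, alpha.round(3))
```

In the full model the context vector would be fed, together with the embedding of the previously generated word, into the decoder LSTM at every timestep.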

Requirements

ABiViRNet requires, among other libraries, our fork of the Keras framework, running on either the Theano or the TensorFlow backend.

Instructions:

Assuming you have a dataset and features extracted from the video frames:

  1. Prepare data:

python data_engine/subsample_frames_features.py

python data_engine/generate_features_lists.py

python data_engine/generate_descriptions_lists.py

See data_engine/README.md for detailed information.

  2. Prepare the inputs/outputs of your model in data_engine/prepare_data.py

  3. Set a model configuration in config.py

  4. Train!

python main.py
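Step 1 starts with data_engine/subsample_frames_features.py, whose exact behavior is described in data_engine/README.md rather than here. As an assumption, the sketch below shows one common way to subsample frame features: uniform temporal sampling of a fixed number of frames per video. `subsample_uniform` is a hypothetical helper, not a function from this repository.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def subsample_uniform(frames: Sequence[Frame], n_out: int) -> List[Frame]:
    """Pick n_out frames at evenly spaced positions (hypothetical helper).

    If the video has fewer than n_out frames, some frames are repeated so
    that every video yields the same number of feature vectors.
    """
    if n_out <= 0:
        raise ValueError("n_out must be positive")
    if len(frames) == 0:
        raise ValueError("frames must not be empty")
    n = len(frames)
    # Index of the first frame in each of n_out equal-length segments.
    return [frames[(i * n) // n_out] for i in range(n_out)]


# Example: 10 per-frame feature vectors (here just their indices) -> 5.
print(subsample_uniform(list(range(10)), 5))
```

Fixing the number of frames per video keeps the encoder input length constant, which simplifies batching; the actual script may use a different sampling strategy.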

Citation

If you use this code for any purpose, please, do not forget to cite the following paper:

Peris, Á., Bolaños, M., Radeva, P., & Casacuberta, F. (2016, September). Video description using bidirectional recurrent neural networks. In International Conference on Artificial Neural Networks (pp. 3-11). Springer International Publishing.

Bibtex version:

@inproceedings{peris2016video,
  title={Video description using bidirectional recurrent neural networks},
  author={Peris, {\'A}lvaro and Bolanos, Marc and Radeva, Petia and Casacuberta, Francisco},
  booktitle={International Conference on Artificial Neural Networks},
  pages={3--11},
  year={2016},
  organization={Springer}
}

About

A joint effort of the Computer Vision at the University of Barcelona (CVUB) group at Universitat de Barcelona-CVC and the PRHLT Research Center at Universitat Politècnica de València.

Contact

Álvaro Peris (web page): [email protected]

Marc Bolaños (web page): [email protected]


Contributors

lvapeab, marcbs
