lvapeab / interactive-keras-captioning

Interactive multimedia captioning with Keras

Home Page: http://casmacat.prhlt.upv.es/interactive-seq2seq/

Python 99.51% Shell 0.49%
keras sequence-to-sequence transformer attention-mechanism rnn lstm image-captioning video-captioning interactive-machine-learning theano tensorflow gpu attention-is-all-you-need nmt

Interactive Keras Captioning

Interactive multimedia captioning with Keras (Theano and TensorFlow). Given an input image or video, the system generates a description of its content.

Documentation: https://interactive-keras-captioning.readthedocs.io

Recurrent neural network model with attention

[Figure: recurrent neural network model with attention]

Transformer model

[Figure: Transformer model]

Interactive captioning

Interactive-predictive pattern recognition is a collaborative human-machine framework for obtaining high-quality predictions while minimizing the human effort spent during the process.

It consists of an iterative prediction-correction process: each time the user corrects a hypothesis, the system reacts by offering an alternative hypothesis that takes the user feedback into account.

For further reading about this framework, please refer to Interactive Neural Machine Translation, Online Learning for Effort Reduction in Interactive Neural Machine Translation and Active Learning for Interactive Neural Machine Translation of Data Streams.
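As a rough illustration of the prediction-correction loop described above, the following sketch simulates a prefix-based interactive session. It is not the code used by this library: the function names (`interactive_session`, `toy_predict`), the simulated user who always fixes the first wrong word, and the canned caption are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def interactive_session(
    predict: Callable[[List[str]], List[str]],
    reference: List[str],
    max_iterations: int = 50,
) -> Tuple[List[str], int]:
    """Simulate a prefix-based prediction-correction loop.

    `predict(prefix)` must return a full hypothesis that starts with `prefix`.
    The simulated user corrects the first wrong word at each iteration.
    Returns the final hypothesis and the number of corrections made.
    """
    prefix: List[str] = []
    corrections = 0
    hypothesis = predict(prefix)
    for _ in range(max_iterations):
        if hypothesis == reference:
            break
        # Locate the first position where the hypothesis is wrong.
        pos = 0
        limit = min(len(hypothesis), len(reference))
        while pos < limit and hypothesis[pos] == reference[pos]:
            pos += 1
        corrections += 1
        if pos >= len(reference):
            # The hypothesis is correct but too long: the user cuts it.
            hypothesis = list(reference)
            break
        # The validated prefix now includes the user's corrected word.
        prefix = reference[: pos + 1]
        hypothesis = predict(prefix)
    return hypothesis, corrections


def toy_predict(prefix: List[str]) -> List[str]:
    """Hypothetical stand-in for a captioning model (assumption)."""
    canned = ["a", "man", "is", "playing", "a", "guitar"]
    return prefix + canned[len(prefix):]


reference = ["a", "woman", "is", "playing", "a", "piano"]
final, effort = interactive_session(toy_predict, reference)
print(" ".join(final), "| corrections:", effort)
```

In this toy run the user types two words out of six; the remaining words are accepted from the system's hypotheses, which is the effort reduction the framework aims for.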

Features (in addition to the full Keras cosmos):

Installation

Assuming that you have pip installed, run:

git clone https://github.com/lvapeab/interactive-keras-captioning
cd interactive-keras-captioning
pip install -r requirements.txt

This installs the packages required to run the library.

Requirements

Interactive Keras Captioning requires the following libraries:

For accelerating the training and decoding on CUDA GPUs, you can optionally install:

Usage

Preprocessing

The instructions for data preprocessing (images or videos) are here.

Training

  1. Set a training configuration in the config.py script. Each parameter is commented. You can also specify parameters when calling the main.py script, using the syntax Key=Value.

  2. Train!:

python main.py
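To make the Key=Value syntax from step 1 concrete, here is a minimal sketch of how such command-line overrides could be merged into a parameter dictionary. This is an assumption about the general mechanism, not the actual parsing code in main.py; the helper `apply_overrides` and the parameter names `BATCH_SIZE` and `MODEL_TYPE` are hypothetical and chosen only for illustration.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(params: Dict[str, Any], args: List[str]) -> Dict[str, Any]:
    """Return a copy of `params` updated with Key=Value strings from `args`."""
    updated = dict(params)
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected Key=Value, got: {arg!r}")
        if key not in updated:
            raise KeyError(f"Unknown parameter: {key}")
        try:
            # Interpret numbers, booleans, lists, etc. as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else is kept as a plain string.
            value = raw
        updated[key] = value
    return updated


# Hypothetical defaults, standing in for the contents of config.py.
defaults = {"BATCH_SIZE": 32, "MODEL_TYPE": "AttentionRNN"}
overridden = apply_overrides(defaults, ["BATCH_SIZE=64", "MODEL_TYPE=Transformer"])
print(overridden)
```

Rejecting unknown keys is a design choice of this sketch: it turns a mistyped parameter name into an immediate error instead of a silently ignored setting.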

Decoding

Once the model is trained, we can caption new images or videos using the caption.py script. In short, to evaluate the test set of the MSVD dataset with an ensemble of two models, we should run something like:

 python caption.py \
             --models trained_models/epoch_1 \
                      trained_models/epoch_2 \
             --dataset datasets/Dataset_MSVD.pkl \
             --splits test
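For intuition about what decoding with an ensemble of two models can mean, the sketch below averages the next-word probability distributions produced by several models at one decoding step. Averaging is one common way to combine models; whether caption.py combines them in exactly this way is an assumption, and the function name and the probability values are made up for the example.

```python
from typing import List, Sequence


def ensemble_next_word(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Average next-word probability distributions from several models."""
    if not distributions:
        raise ValueError("Need at least one distribution")
    vocab_size = len(distributions[0])
    if any(len(d) != vocab_size for d in distributions):
        raise ValueError("All distributions must cover the same vocabulary")
    n_models = len(distributions)
    return [
        sum(d[i] for d in distributions) / n_models
        for i in range(vocab_size)
    ]


# Made-up distributions over a three-word vocabulary, one per model.
model_a = [0.5, 0.25, 0.25]
model_b = [0.125, 0.75, 0.125]
averaged = ensemble_next_word([model_a, model_b])
best_word_index = max(range(len(averaged)), key=averaged.__getitem__)
print(averaged, best_word_index)
```

Here the two models disagree on the most likely word; after averaging, the second word wins because the second model is much more confident about it.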

Acknowledgement

This library is strongly based on NMT-Keras. Much of the library has been developed together with Marc Bolaños (web page) for other sequence-to-sequence problems.

To see other projects following the same philosophy and style as Interactive Keras Captioning, take a look at:

NMT-Keras: Neural Machine Translation.

ABiViRNet: Video description.

TMA: Egocentric captioning based on temporally-linked sequences.

VIBIKNet: Visual question answering.

Sentence SelectioNN: Sentence classification and selection.

DeepQuest: State-of-the-art models for multi-level Quality Estimation.

Warning!

There is a known issue with the Theano backend. When running main.py with this backend, it will show the following message:

[...]
raise theano.gof.InconsistencyError("Trying to reintroduce a removed node")
InconsistencyError: Trying to reintroduce a removed node

This is not a critical error: the model keeps working and the message is safe to ignore. However, if you want to suppress the message, use the Theano flag optimizer_excluding=scanOp_pushout_output.
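Theano reads its flags from the THEANO_FLAGS environment variable as a comma-separated list of key=value pairs. One possible way to add the flag above without losing flags that are already set is sketched below; the helper name `with_theano_flag` and the `device=cuda` value are assumptions for the example.

```python
import os
from typing import Dict, Optional


def with_theano_flag(flag: str, env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with `flag` appended to THEANO_FLAGS."""
    base = dict(os.environ if env is None else env)
    existing = base.get("THEANO_FLAGS", "")
    flags = [f for f in existing.split(",") if f]
    if flag not in flags:
        flags.append(flag)
    base["THEANO_FLAGS"] = ",".join(flags)
    return base


# Example with an explicit, made-up starting environment.
example_env = with_theano_flag(
    "optimizer_excluding=scanOp_pushout_output",
    {"THEANO_FLAGS": "device=cuda"},
)
print(example_env["THEANO_FLAGS"])

# To launch training with the flag set, the environment could be passed on:
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "main.py"],
#       env=with_theano_flag("optimizer_excluding=scanOp_pushout_output"),
#   )
```

The sketch does not merge with an optimizer_excluding entry that is already present in THEANO_FLAGS; in that case the existing entry would need to be edited by hand.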

Contact

Álvaro Peris (web page): [email protected]
