Giter VIP home page Giter VIP logo

img2txt's Introduction

img2txt

End-to-end deep learning model to generate a summary of the content of an image in a sentence.

Overview

Requirements

How to run img2txt

Overview

For a quick overview, please see the slides for the 5-min demo of this project.

Acknowledgement

The model architecture is based on

"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge." Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. IEEE transactions on pattern analysis and machine intelligence (2016). http://arxiv.org/abs/1609.06647

and the following code in the TensorFlow model zoo is frequently used as a reference,

https://github.com/tensorflow/models/tree/master/im2txt

but the code is written from scratch using TensorFlow APIs, except Inception models whose codes are obtained in the current master version (325609e) of https://github.com/tensorflow/models/tree/master/slim.

Requirements

Library

img2txt is developed on the following environment.

  • Ubuntu 16.04.2 LTS
  • Python 3.6
  • NumPy
  • TensorFlow 1.2
  • Pillow
  • NLTK (NLTK data needed for tokenization; only nltk_data/tokenizers/punkt/PY3/english.pickle needed.)

And its web UI requires the following libraries.

  • Flask
  • Bokeh (for word embedding visualization)

Datasets

img2txt.dataset contains convenient wrappers for various public caption datasets including MS COCO, Flickr 8k/30k, and PASCAL. Put each downloaded dataset in a separate directory, which will be used during the training of the model.

Using pre-trained convnet models

Inception (v3, v4)

VGG16

  • Copy Keras' pretrained model ~/.keras/models/vgg16_weights_tf_dim_ordering_tf_kernels.h5 to in img2txt/pretrained/vgg16_weights.h5.

How to run img2txt

Training

Please see https://github.com/chan-y-park/img2txt/blob/master/img2txt_api_example.ipynb for a step-by-step guide.

Inference

After training the model, put saved files in img2txt/inference, more specifically

  • the checkpoint files as img2txt/inference/img2txt.*,
  • the configuration file as img2txt/inference/config.json, and
  • the vocabulary file as img2txt/inference/vocabulary.json. The run img2txt/web_app.wsgi, open a web browser, and go to http://localhost:9999 to use the web UI for inference. web_ui_screenshot

Performance

When trained on MS COCO training dataset for 500k weight updates, where each update is a training on a minibatch of 32 image-caption pairs, the model gets 25.9 BLEU-4 score and 86.4 CIDEr score, which are evaluated using 4k random selections from MS COCO validation dataset and MS COCO Caption Evaluation API (https://github.com/tylin/coco-caption).

img2txt's People

Contributors

chan-y-park avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.