
# Image caption generation by Chainer

This code tries to reproduce Google's image captioning model from CVPR 2015, "Show and Tell: A Neural Image Caption Generator" (http://arxiv.org/abs/1411.4555).

The training data is MSCOCO. I used GoogleNet to extract image features in advance (the images were preprocessed before training) and then trained a language model to generate captions.
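
In the Show and Tell paper, the CNN image feature is fed to the LSTM first, and the caption is then predicted one word at a time, with each step receiving the previous word. Below is a minimal sketch of how a single caption might be split into per-step inputs and targets for that kind of training. The token ids, the toy vocabulary, and the `<bos>`/`<eos>` markers are assumptions for illustration, not necessarily what this repository uses.

```python
# Minimal sketch (assumption): teacher-forced input/target layout for one caption.
# The vocabulary and the <bos>/<eos>/<unk> ids below are hypothetical.


def make_training_pairs(caption_ids, bos_id, eos_id):
    """Return (inputs, targets) for one caption.

    The image feature is fed to the LSTM before these steps (not shown).
    At step t the model receives inputs[t] and should predict targets[t].
    """
    inputs = [bos_id] + caption_ids
    targets = caption_ids + [eos_id]
    return inputs, targets


vocab = {"<bos>": 0, "<eos>": 1, "<unk>": 2, "a": 3, "dog": 4, "runs": 5}
ids = [vocab[w] for w in "a dog runs".split()]
inputs, targets = make_training_pairs(ids, vocab["<bos>"], vocab["<eos>"])
print(inputs)   # [0, 3, 4, 5]
print(targets)  # [3, 4, 5, 1]
```

Shifting the sequence by one position this way means the model always learns to predict the next word from the words before it, ending with `<eos>`.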

I have made a pre-trained model available. It achieves a CIDEr score of 0.66 on the MSCOCO validation set. The first step toward a better score would be beam search (not implemented yet). I also think the CNN needs to be fine-tuned.
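
Beam search is named above as the first step toward a better score but is not implemented here. The following is a minimal, framework-independent sketch of the technique in plain Python, not this repository's code. `step_fn` is a hypothetical stand-in for one decoding step of the caption model, and the toy probability table is made up purely to show that a beam of width 2 can find a likelier caption than greedy decoding (width 1).

```python
import heapq
import math


def beam_search(step_fn, bos_id, eos_id, beam_width=3, max_len=20):
    """Return the (log_prob, sequence) pair with the highest total log-probability.

    step_fn(seq) must return {next_token_id: log_prob} for the token that
    follows the tuple `seq`. In a captioning model this would be one LSTM
    step conditioned on the image feature (hypothetical here).
    """
    beams = [(0.0, (bos_id,))]  # live hypotheses: (cumulative log-prob, tokens)
    finished = []               # hypotheses that already emitted eos_id
    for _ in range(max_len):
        candidates = [
            (score + logp, seq + (tok,))
            for score, seq in beams
            for tok, logp in step_fn(seq).items()
        ]
        # Keep only the beam_width best expansions across all live beams.
        beams = []
        for score, seq in heapq.nlargest(beam_width, candidates, key=lambda c: c[0]):
            if seq[-1] == eos_id:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off by max_len without eos_id
    return max(finished, key=lambda c: c[0])


# Toy next-word distributions keyed by the previous token (made up for the demo):
# 0=<bos>, 1=<eos>, 2="a", 3="the", 4="cat".
TABLE = {0: {2: 0.6, 3: 0.4}, 2: {4: 0.4, 1: 0.6}, 3: {4: 0.95, 1: 0.05}, 4: {1: 1.0}}


def toy_step(seq):
    return {tok: math.log(p) for tok, p in TABLE[seq[-1]].items()}


print(beam_search(toy_step, 0, 1, beam_width=1)[1])  # (0, 2, 1): greedy choice
print(beam_search(toy_step, 0, 1, beam_width=2)[1])  # (0, 3, 4, 1): likelier overall
```

This sketch ranks hypotheses by the raw sum of log-probabilities, which tends to favor shorter captions; many implementations add some form of length normalization, which is left out here.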

More information, including some sample captions, is in my blog post: http://t-satoshi.blogspot.com/2015/12/image-caption-generation-by-cnn-and-lstm.html

## Requirements

Chainer 1.5 (http://chainer.org) and a few other packages.
If you are new to this, I suggest installing Anaconda first and then installing Chainer.

## I just want to generate captions!

OK. First, download the models and the other preprocessed files. Then you can generate captions.

```bash
bash download.sh
cd codes
python generate_caption.py -i ../images/test_image.jpg
```

This generates a caption for `../images/test_image.jpg`. To caption your own image, pass its path with the `-i` option.
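
To caption a whole folder, you can call the script once per image. Below is a minimal Python 3 sketch that relies only on the `-i` option documented above. The helper names `caption_commands` and `caption_directory` and the set of accepted file extensions are hypothetical, and it assumes you run it from inside `codes/`.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the image formats you want to caption; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def caption_commands(paths):
    """Return one generate_caption.py command per image path, in sorted order.

    Only the -i option documented in this README is used.
    """
    images = sorted(Path(p) for p in paths if Path(p).suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, "generate_caption.py", "-i", str(p)] for p in images]


def caption_directory(image_dir):
    """Caption every image in image_dir; run this from inside codes/."""
    for cmd in caption_commands(Path(image_dir).iterdir()):
        # check=True stops at the first image whose captioning fails
        subprocess.run(cmd, check=True)
```

For example, `caption_directory("../images")` captions every matching file in the bundled images folder. Because each call starts a fresh process, whatever the script loads at startup (such as the model) is loaded again for every image; that keeps the wrapper simple but makes it slow for large folders.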

## I want to train the model myself

I have already extracted the GoogleNet features and pickled them, so you can use them for training.

```bash
cd codes
python train_caption_model.py
python train_caption_model.py -g 0  # use the GPU; change 0 to your GPU id
```

The log and the trained model are saved to a directory (`experiment1` by default).
To change it, use the `-d` option.

```bash
python train_caption_model.py -d ./yourdirectory
```
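
If you run several experiments, it can help to script the two documented options, `-g` and `-d`. Below is a minimal Python 3 sketch; the helper names `train_command` and `run_training` are hypothetical, and the code only assembles the command lines shown above.

```python
import subprocess
import sys


def train_command(gpu_id=None, out_dir=None):
    """Build a train_caption_model.py command using the -g and -d options.

    gpu_id=None leaves out -g (presumably CPU training); out_dir=None keeps
    the default output directory (experiment1).
    """
    cmd = [sys.executable, "train_caption_model.py"]
    if gpu_id is not None:  # "is not None" so that GPU 0 is not skipped
        cmd += ["-g", str(gpu_id)]
    if out_dir is not None:
        cmd += ["-d", out_dir]
    return cmd


def run_training(gpu_id=None, out_dir=None):
    """Start training; call this from inside the codes/ directory."""
    subprocess.run(train_command(gpu_id, out_dir), check=True)
```

For instance, `run_training(0, "./experiment2")` trains on GPU 0 and writes to `./experiment2`. The explicit `is not None` check matters because GPU id `0` is falsy in Python, so a plain `if gpu_id:` would silently drop it.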

## I want to train on other data

Sorry, the current implementation does not support this. You need to preprocess the data yourself; you may be able to read and adapt the code for that.
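
As a rough idea of the caption side of that preprocessing, here is a plain-Python sketch that tokenizes captions, builds a vocabulary with a minimum word count, and maps captions to ids. The tokenizer, the `min_count` threshold, and the reserved `<bos>`/`<eos>`/`<unk>` tokens are assumptions. The format of this repository's pickled files is not documented here, so the output would still need converting to whatever `train_caption_model.py` expects, and image features would have to be extracted separately.

```python
import re
from collections import Counter


def tokenize(caption):
    """Lowercase and keep runs of letters/digits (a simple assumption)."""
    return re.findall(r"[a-z0-9]+", caption.lower())


def build_vocab(captions, min_count=2):
    """Map words seen at least min_count times to ids; ids 0-2 are reserved."""
    counts = Counter(w for c in captions for w in tokenize(c))
    vocab = {"<bos>": 0, "<eos>": 1, "<unk>": 2}
    for word in sorted(w for w, n in counts.items() if n >= min_count):
        vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Convert a caption into ids, replacing rare words with <unk>."""
    return [vocab.get(w, vocab["<unk>"]) for w in tokenize(caption)]


captions = ["A dog runs on the grass.", "A dog sleeps.", "The cat sleeps on a sofa."]
vocab = build_vocab(captions, min_count=2)
print(encode("A cat sleeps on the sofa", vocab))  # [3, 2, 6, 5, 7, 2]
```

Words below the threshold ("cat" and "sofa" here) map to `<unk>`, which keeps the output layer small at the cost of losing rare words.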

## I want to fine-tune the CNN part

Sorry, the current implementation does not support this. You may be able to read and modify the code to add it.
