NIC-2015-Pytorch

This project is a PyTorch implementation of the 2015 Neural Image Caption (NIC) paper by Vinyals et al. [PDF]. The implementation is inspired by the Udacity image captioning project [Repo link].

  • Backend : PyTorch, torchvision
  • Dataset : MS COCO 2014 Dataset [Link]

Model Architecture

File Description

  • data_load.py : Dataloader class and functions for data augmentation.
  • model.py : Model class consisting of model definitions and functions.
  • vocabulary.py : Class consisting of vocabulary functions.
  • training.ipynb : Jupyter notebook for training, with hyperparameters such as learning rate, batch size, embedding size, and hidden state size.
  • inference.ipynb : Jupyter notebook to sample the captions generated by the encoder-decoder model.
  • vocabulary and architecture experiments.ipynb : Jupyter notebook to understand the vocabulary generation process and to experiment with the CNN-RNN architecture in order to check that the model.py implementation is correct.
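
To make the vocabulary generation step more concrete, here is a minimal sketch of a word-to-index mapping built from captions. It is not the code in vocabulary.py: the class name `SimpleVocabulary`, the special tokens `<start>`, `<end>` and `<unk>`, the whitespace tokenization, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


class SimpleVocabulary:
    """Hypothetical word-to-index mapping; not the class in vocabulary.py."""

    def __init__(self, captions, min_count=2):
        # The special tokens below are an assumption of this sketch.
        self.start_word = "<start>"
        self.end_word = "<end>"
        self.unk_word = "<unk>"
        self.word2idx = {}
        self.idx2word = {}
        for token in (self.start_word, self.end_word, self.unk_word):
            self._add(token)
        # Count lowercase, whitespace-separated tokens across all captions.
        counts = Counter(w for c in captions for w in c.lower().split())
        # Keep words seen at least min_count times; sorting makes the
        # index assignment reproducible.
        for word in sorted(w for w, n in counts.items() if n >= min_count):
            self._add(word)

    def _add(self, word):
        if word not in self.word2idx:
            idx = len(self.word2idx)
            self.word2idx[word] = idx
            self.idx2word[idx] = word

    def encode(self, caption):
        """Map a caption to indices wrapped in the start and end tokens."""
        unk = self.word2idx[self.unk_word]
        body = [self.word2idx.get(w, unk) for w in caption.lower().split()]
        return [self.word2idx[self.start_word], *body, self.word2idx[self.end_word]]

    def __len__(self):
        return len(self.word2idx)
```

Rare words fall back to the unknown token, which keeps the decoder's output layer small. For example, `SimpleVocabulary(["a dog runs", "a dog sleeps"]).encode("a cat runs")` maps "cat" and "runs" to the same unknown index because neither reaches the count threshold.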

Dataset setup instructions


Please follow these instructions to set up the MS COCO 2014 dataset for training. Remember, the training images are 13 GB, and the validation and test images are 6 GB each. Before downloading, ensure good bandwidth and enough storage on the server (at least 25 GB for the image archives alone, plus room to extract them).
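
Before starting the download, free disk space can be checked with a short script. This is only a sketch using the standard library; the function name `has_enough_space` is hypothetical, and the 25 GB figure in the usage note is an assumption obtained by adding the three image archive sizes listed in the download steps (13 GB + 6 GB + 6 GB).

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_enough_space(path, required_gb):
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB
```

Calling `has_enough_space(".", 25)` from the directory that will hold cocoapi gives a quick yes or no. Extracting the archives needs additional space on top of that, so a larger value may be more realistic.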
  1. Clone this repo: https://github.com/cocodataset/cocoapi
     git clone https://github.com/cocodataset/cocoapi.git
  2. Set up the COCO API (also described in the readme here)
     cd cocoapi/PythonAPI
     make
     cd ..
  3. Download some specific data from here: http://cocodataset.org/#download (described below)
  • Under Annotations, download:

    • 2014 Train/Val annotations [241MB] (extract captions_train2014.json and captions_val2014.json, and place at locations cocoapi/annotations/captions_train2014.json and cocoapi/annotations/captions_val2014.json, respectively)
    • 2014 Testing Image info [1MB] (extract image_info_test2014.json and place at location cocoapi/annotations/image_info_test2014.json)
  • Under Images, download:

    • 2014 Train images [83K/13GB] (extract the train2014 folder and place at location cocoapi/images/train2014/)
    • 2014 Val images [41K/6GB] (extract the val2014 folder and place at location cocoapi/images/val2014/)
    • 2014 Test images [41K/6GB] (extract the test2014 folder and place at location cocoapi/images/test2014/)
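
Once everything is extracted, the layout described above can be verified with a small check. This is a sketch that relies only on the paths named in the steps above; the function name `missing_coco_paths` is hypothetical and not part of this repo or of the COCO API.

```python
from pathlib import Path

# Relative paths taken from the setup steps above.
EXPECTED_FILES = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "annotations/image_info_test2014.json",
]
EXPECTED_DIRS = ["images/train2014", "images/val2014", "images/test2014"]


def missing_coco_paths(cocoapi_root):
    """Return the expected files and folders that are absent under cocoapi_root."""
    root = Path(cocoapi_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing
```

An empty list from `missing_coco_paths("cocoapi")` suggests the annotations and image folders are where the steps above place them. The check only looks for the paths and does not validate their contents.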

Contributors

  • pshwetank
