Automatic Image Captioning

I completed this project as part of Udacity's Computer Vision Nanodegree program.

The goal of this project is to create a neural network architecture to automatically generate captions from images. We use the Microsoft Common Objects in COntext (MS COCO) dataset to train the network, and then test the network on novel images.

Working with the COCO API

Set up the COCO API

Clone this repo: https://github.com/cocodataset/cocoapi

git clone https://github.com/cocodataset/cocoapi.git

Set up the COCO API (also described in the cocoapi README):

cd cocoapi/PythonAPI  
make  
cd ..

On my local development machine, I placed the cocoapi folder and the folder for this project at the same level:

DEV_ROOT
  |
  |- cocoapi
      |- images
      |- annotations
      |- PythonAPI
      |- (... etc.)
  |- CVND-AutomaticImageCaptioning

Note that you will need to create cocoapi/images and cocoapi/annotations yourself. The notebooks in CVND-AutomaticImageCaptioning reference the contents of both folders.
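The two folders can be created by hand, or with a few lines of Python. The following is a minimal sketch, assuming the layout shown above; the function name `ensure_coco_dirs` and its `dev_root` parameter are hypothetical and not part of the COCO API or this project's notebooks.

```python
from pathlib import Path


def ensure_coco_dirs(dev_root):
    """Create cocoapi/images and cocoapi/annotations under dev_root if missing.

    dev_root is assumed to be the folder that holds both cocoapi and
    CVND-AutomaticImageCaptioning (DEV_ROOT in the layout above).
    """
    cocoapi = Path(dev_root) / "cocoapi"
    dirs = {
        "images": cocoapi / "images",
        "annotations": cocoapi / "annotations",
    }
    for path in dirs.values():
        # exist_ok=True makes the call safe to repeat;
        # parents=True also creates cocoapi if it has not been cloned yet.
        path.mkdir(parents=True, exist_ok=True)
    return dirs
```

When called from a notebook inside CVND-AutomaticImageCaptioning, `ensure_coco_dirs(Path.cwd().parent)` should resolve to the sibling cocoapi folder, provided the notebook's working directory is the project folder.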

Downloading the MS COCO dataset

Download the following data from http://cocodataset.org/#download:

Under Annotations, download:
- 2014 Train/Val annotations [241MB] (extract captions_train2014.json and captions_val2014.json, and place at locations cocoapi/annotations/captions_train2014.json and cocoapi/annotations/captions_val2014.json, respectively)
- 2014 Testing Image info [1MB] (extract image_info_test2014.json and place at location cocoapi/annotations/image_info_test2014.json)

Under Images, download:
- 2014 Train images [83K/13GB] (extract the train2014 folder and place at location cocoapi/images/train2014/)
- 2014 Val images [41K/6GB] (extract the val2014 folder and place at location cocoapi/images/val2014/)
- 2014 Test images [41K/6GB] (extract the test2014 folder and place at location cocoapi/images/test2014/)
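Because the downloads are large and each archive has to be extracted to a specific place, it can help to check the layout before opening the notebooks. The sketch below only tests whether the paths listed above exist; the names `EXPECTED` and `missing_coco_items` are hypothetical helpers, and the check does not validate file contents or image counts.

```python
from pathlib import Path

# Paths relative to the cocoapi folder, taken from the download list above.
EXPECTED = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "annotations/image_info_test2014.json",
    "images/train2014",
    "images/val2014",
    "images/test2014",
]


def missing_coco_items(cocoapi_dir):
    """Return the expected relative paths that do not exist under cocoapi_dir."""
    root = Path(cocoapi_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_coco_items("../cocoapi")`, run from the project folder, suggests that every file and folder is where the instructions above place it.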

The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order (0_Dataset.ipynb, 1_Preliminaries.ipynb, 2_Training.ipynb, 3_Inference.ipynb).
