Attend2u

This project hosts the code for our CVPR 2017 paper and TPAMI 2018 paper.

  • Cesc Chunseong Park, Byeongchang Kim and Gunhee Kim. Attend to You: Personalized Image Captioning with Context Sequence Memory Networks. In CVPR, 2017. (Spotlight) [arxiv]
  • Cesc Chunseong Park, Byeongchang Kim and Gunhee Kim. Towards Personalized Image Captioning via Multimodal Memory Networks. In IEEE TPAMI, 2018. [pdf]

We address personalization in image captioning, which has not been discussed in previous research. Given a query image, we aim to generate a descriptive sentence that accounts for prior knowledge such as the user's active vocabulary in previous documents. As applications of personalized image captioning, we tackle two post automation tasks, hashtag prediction and post generation, on our newly collected Instagram dataset of 1.1M posts from 6.3K users. We propose a novel captioning model named Context Sequence Memory Network (CSMN).

Reference

If you use this code or dataset as part of any published research, please cite one of the following papers.

@inproceedings{attend2u:2017:CVPR,
    author    = {Park, Cesc Chunseong and Kim, Byeongchang and Kim, Gunhee},
    title     = "{Attend to You: Personalized Image Captioning with Context Sequence Memory Networks}",
    booktitle = {CVPR},
    year      = 2017
}
@article{attend2u:2018:TPAMI,
    author    = {Park, Cesc Chunseong and Kim, Byeongchang and Kim, Gunhee},
    title     = "{Towards Personalized Image Captioning via Multimodal Memory Networks}",
    journal   = {IEEE TPAMI},
    year      = 2018
}

Running Code

Get our code

git clone https://github.com/cesc-park/attend2u

Prerequisites

  1. Install Python modules
pip install -r requirements.txt
  2. Download the pre-trained ResNet checkpoint
cd ${project_root}/scripts
./download_pretrained_resnet_101.sh
  3. Download our version of the YFCC100M dataset

You can download our personalized image captioning split of the YFCC100M dataset.

Download the data from the links below and save it to ${project_root}/data_yfcc.

[Download json (YFCC100M)] [Download images (YFCC100M)]

cd ${project_root}/data_yfcc
tar -xvf yfcc_json.tar.gz
tar -xvf yfcc_images.tar.gz
  4. Generate the formatted dataset and extract ResNet-101 pool5 features
cd ${project_root}/scripts
./extract_yfcc_features.sh

Training

Run the training script. You can train the model with multiple GPUs.

python -m train --num_gpus 4 --batch_size 200 --data_dir ./data_yfcc/caption_dataset

Evaluation

Run the evaluation script. You can evaluate the model with multiple GPUs.

python -m eval --num_gpus 2 --batch_size 500 --data_dir ./data_yfcc/caption_dataset

InstaPIC-1.1M Dataset

Temporarily not supported.

YFCC100M Dataset

YFCC100M (Yahoo Flickr Creative Commons 100 Million Dataset) consists of 100 million images and videos uploaded by Flickr users between 2004 and 2014, along with their corresponding metadata, including titles, descriptions, camera types and usertags. We applied a series of filters to build a personalized image captioning split of YFCC100M. We regard the titles and descriptions as captions and the usertags as hashtags.
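The mapping from metadata to captions and hashtags can be illustrated with a minimal sketch. The record keys `title`, `description` and `usertags` and the function name `to_samples` are hypothetical and do not reflect the actual YFCC100M schema or our preprocessing code. Whether titles and descriptions are merged or kept separate is not specified here, so the sketch simply keeps them as separate caption candidates.

```python
def to_samples(record):
    """Build caption and hashtag lists from one metadata record.

    Assumption: `record` is a dict with the hypothetical keys
    "title", "description" (strings) and "usertags" (list of strings).
    """
    # Titles and descriptions are both regarded as captions.
    candidates = (record.get("title"), record.get("description"))
    captions = [text.strip() for text in candidates if text and text.strip()]

    # Usertags are regarded as hashtags.
    hashtags = [tag.strip() for tag in record.get("usertags", []) if tag.strip()]

    return {"captions": captions, "hashtags": hashtags}


example = {
    "title": "Golden Gate at dusk",
    "description": "  ",
    "usertags": ["bridge", "sanfrancisco"],
}
print(to_samples(example))
```

Empty or whitespace-only fields are dropped in this sketch; the filtering used for the released split may differ.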

Key statistics of our personalized image captioning split of the YFCC100M dataset are outlined below. For #posts/user and #words/post, we report the average, with the median in parentheses. In total, the dataset contains 867,922 unique posts and 11,093 unique users.

Dataset   #posts    #users   #posts/user   #words/post
caption   462,036   6,197    74.6 (40)     6.30 (5)
hashtag   434,936   5,495    79.2 (49)     7.46 (6)
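As a rough illustration of how such statistics can be computed, the sketch below summarizes a toy split. The input layout (a dict mapping each user ID to a list of tokenized posts) and the name `split_statistics` are assumptions made for this example; they are not the format of the released JSON files.

```python
from statistics import mean, median


def split_statistics(posts_by_user):
    """Summarize a split given {user_id: [post, ...]}, each post a list of words."""
    posts_per_user = [len(posts) for posts in posts_by_user.values()]
    words_per_post = [
        len(words) for posts in posts_by_user.values() for words in posts
    ]
    return {
        "num_posts": sum(posts_per_user),
        "num_users": len(posts_by_user),
        # (average, median) pairs, matching the "average (median)" cells.
        "posts_per_user": (mean(posts_per_user), median(posts_per_user)),
        "words_per_post": (mean(words_per_post), median(words_per_post)),
    }


# Toy data, not taken from the dataset.
toy = {
    "user_a": [["sunset", "at", "the", "beach"], ["my", "dog"]],
    "user_b": [["city", "lights"]],
}
stats = split_statistics(toy)
print(stats)
```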

If you download and uncompress the dataset correctly, the directory structure will look like this:

${project_root}/data_yfcc
├── json
│   ├── yfcc-caption-train.json
│   ├── yfcc-caption-test.json
│   ├── yfcc-hashtag-train.json
│   ├── yfcc-hashtag-test1.json
│   └── yfcc-hashtag-test2.json
└── images
    ├── {user1_id}_{post1_id}
    ├── {user1_id}_{post2_id}
    ├── {user2_id}_{post1_id}
    └── ...

We provide one type of test set for image captioning and two types of test set for hashtag prediction.
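To check that the download was extracted as expected, a small script along the following lines may help. It is a sketch rather than part of the released code: the helper name `missing_dataset_entries` is hypothetical, and only the JSON file names and the `json` and `images` directory names come from the layout described in this section.

```python
from pathlib import Path

# JSON file names as listed in the expected directory layout.
EXPECTED_JSON = [
    "yfcc-caption-train.json",
    "yfcc-caption-test.json",
    "yfcc-hashtag-train.json",
    "yfcc-hashtag-test1.json",
    "yfcc-hashtag-test2.json",
]


def missing_dataset_entries(data_root):
    """Return the expected entries under data_root that do not exist."""
    root = Path(data_root)
    missing = [
        f"json/{name}"
        for name in EXPECTED_JSON
        if not (root / "json" / name).is_file()
    ]
    # Only the presence of the images directory is checked, not its contents.
    if not (root / "images").is_dir():
        missing.append("images")
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the project root.
    for entry in missing_dataset_entries("data_yfcc"):
        print("missing:", entry)
```

An empty result means all five JSON files and the images directory were found; it does not verify that every image was extracted.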

Examples

[Figure: post generation examples]

[Figure: hashtag generation examples]

[Figure: hashtag and post generation examples with query images and multiple predictions by different users]

[Figure: slightly wrong but interesting post generation examples]

[Figure: slightly wrong but interesting hashtag generation examples]

Acknowledgement

We implemented our model using the TensorFlow package. Thanks to the TensorFlow developers. :)

We also thank Instagram for their API and Instagram users for their valuable posts.

Additionally, we thank coco-caption developers for providing caption evaluation tools.

We also appreciate Juyong Kim, Yunseok Jang and Jongwook Choi for helpful comments and discussions.

We are further thankful to Hyunjae Woo for help with preprocessing the YFCC100M dataset and to Amelie Schmidt-Colberg for carefully correcting our English writing.

Authors

Cesc Chunseong Park, Byeongchang Kim and Gunhee Kim

Vision and Learning Lab @ Computer Science and Engineering, Seoul National University, Seoul, Korea

License

MIT license
