
Introduction

This repository contains the code for our paper of the same name; see the Paper section below for the citation.

This project was done for Stanford's CS 224N and CS 230.

Our model architecture is inspired by the winning entry of the 2017 VQA Challenge, which follows the VQA system described in "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" and "Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge".
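For reference, here is a minimal PyTorch sketch of the question-guided ("top-down") attention step those papers describe. The layer sizes and the ReLU fusion are illustrative simplifications, not our exact implementation (the original system uses gated activations):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopDownAttention(nn.Module):
        """Question-guided attention over bottom-up image region features.
        Dimensions and the ReLU fusion are illustrative, not our exact setup."""
        def __init__(self, v_dim=2048, q_dim=1024, hid_dim=512):
            super(TopDownAttention, self).__init__()
            self.v_proj = nn.Linear(v_dim, hid_dim)
            self.q_proj = nn.Linear(q_dim, hid_dim)
            self.score = nn.Linear(hid_dim, 1)

        def forward(self, v, q):
            # v: [batch, regions, v_dim] region features
            # q: [batch, q_dim] question encoding (e.g., last GRU state)
            joint = F.relu(self.v_proj(v) + self.q_proj(q).unsqueeze(1))
            alpha = F.softmax(self.score(joint), dim=1)  # weights over regions
            return (alpha * v).sum(1), alpha             # attended feature + weights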

License

MIT

Our Architectures

(Figure: model architecture diagrams)

This project builds on code provided here: we used its preprocessing pipeline and base implementation, then performed an extensive architecture and hyperparameter search on top of it.

Results

Model            Validation Accuracy   Training Time
Reported Model   63.15                 12-18 hours (Tesla K40)
Our A3x2 Model   64.78                 4 hours (AWS g3.8xlarge, 2x M60)

The accuracy was calculated using the VQA evaluation metric.
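That metric scores a predicted answer against the 10 human-annotated answers for each question. A sketch of its core rule (the official evaluator additionally averages over all 10-choose-9 annotator subsets and normalizes answer strings, which this omits):

    def vqa_accuracy(predicted, human_answers):
        # An answer counts as fully correct if at least 3 of the
        # 10 annotators gave it; partial credit below that.
        matches = sum(1 for a in human_answers if a == predicted)
        return min(matches / 3.0, 1.0)

    # e.g. vqa_accuracy("2", ["2"] * 4 + ["3"] * 6)  ->  1.0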


Implementation Details

Check out our paper for the full implementation details and hyperparameter search. ArXiv link coming soon.

Hyperparameter Search

(Figure: hyperparameter search results)

Dual Attention Visualization

(Figure: dual attention heatmaps)
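A minimal sketch of how such heatmaps can be produced, by painting each region's attention weight back onto the image plane. The box format and overlay style here are assumptions for illustration, not our exact plotting code:

    import numpy as np
    import matplotlib.pyplot as plt

    def attention_heatmap(image, boxes, weights):
        # boxes: (x1, y1, x2, y2) proposals behind the bottom-up features
        # weights: one attention weight per box (assumed format)
        h, w = image.shape[:2]
        mask = np.zeros((h, w), dtype=np.float32)
        for (x1, y1, x2, y2), a in zip(boxes.astype(int), weights):
            mask[y1:y2, x1:x2] = np.maximum(mask[y1:y2, x1:x2], a)
        plt.imshow(image)
        plt.imshow(mask, alpha=0.5, cmap="jet")
        plt.axis("off")
        plt.show()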

Usage

Prerequisites

Make sure you are on a machine with an NVIDIA GPU and Python 2.7+, with about 70 GB of free disk space.

  1. Install PyTorch with CUDA and Python 2.7.
  2. Install h5py.
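A quick way to confirm the environment is set up (a sanity-check sketch, not part of the repository):

    from __future__ import print_function  # Python 2.7 compatibility
    import sys
    import torch
    import h5py

    print("Python:", sys.version.split()[0])             # expect 2.7+
    print("CUDA available:", torch.cuda.is_available())  # expect True
    print("h5py:", h5py.__version__)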

Data Setup

All data should be downloaded to a data/ directory in the root directory of this repository.

The easiest way to download the data is to run the provided script tools/download.sh from the repository root. If the script does not work, examine it and adapt the steps it outlines to your needs. Then run tools/process.sh from the repository root to convert the data to the correct format.
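Once processing finishes, the features land in data/ as HDF5 files that can be inspected with h5py. The filename below is an assumption, so check data/ for the actual output of tools/process.sh:

    from __future__ import print_function  # Python 2.7 compatibility
    import h5py

    # "data/train36.h5" is a guess at the processed-feature filename;
    # substitute whatever tools/process.sh actually wrote to data/.
    with h5py.File("data/train36.h5", "r") as f:
        for name, dset in f.items():
            print(name, dset.shape, dset.dtype)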

Training

Simply run python main.py to start training. The default model is the best-performing A3x2; other model variations can be selected with the models flag. The training and validation scores are printed every epoch, and the best model is saved under the saved_models directory. The default flags should reproduce the result in the table above.
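For example, the default run is simply:

    python main.py    # trains A3x2; best checkpoint goes to saved_models/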

Pre-Trained Models

Certain pre-trained models are available upon request.

Paper

Please use the citation found at:

http://dblp.uni-trier.de/rec/bibtex/journals/corr/abs-1803-07724

