Factor Graph Attention

  • A general multimodal attention approach inspired by probabilistic graphical models.
  • Achieves state-of-the-art performance (MRR) on the visual dialog task.

This repository is the official implementation of Factor Graph Attention, which appeared in CVPR'19.

Requirements

The model can easily run on a single GPU :)

To install requirements:

conda env create -f fga.yml

then activate the environment:

 conda activate fga
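
The environment created from fga.yml is assumed to include PyTorch; a quick sanity check (a minimal sketch, not part of the repo) that the single GPU is visible:

import torch

# Verify a CUDA device is visible before training.
print(torch.__version__)
print(torch.cuda.is_available())          # expect True on a GPU machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the single training GPU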

Preprocessed data:

Add the following files under the data dir:

visdial_params.json

visdial_data.h5
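
A minimal sketch for inspecting these two files (h5py is assumed to be available; the exact dataset keys inside visdial_data.h5 are not guaranteed):

import json
import h5py

# Peek at the preprocessed VisDial inputs; dataset names vary by version.
with open("data/visdial_params.json") as f:
    params = json.load(f)
print(sorted(params.keys()))

with h5py.File("data/visdial_data.h5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))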

Pretrained features:

  • VGG: grid image features based on the VGG model pretrained on ImageNet (faster). Note, this h5 database has slightly different dataset keys, therefore the code needs to be adapted accordingly (see the inspection sketch below).
  • F-RCNN: features from an object detector with a ResNeXt-101 backbone, 37 proposals, fine-tuned on Visual Genome. Achieves SOTA. The file includes box and class information.

Note: You can use CurlWget to easily download the features directly to your server.

See the original paper for performance differences. I recommend using the F-RCNN features, mainly because they are fine-tuned on the relevant Visual Genome dataset.
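
Since the two feature files expose different dataset keys, listing them first makes adapting the loader straightforward. A minimal sketch (the file path below matches the --image_data argument in the training command and is an assumption):

import h5py

# List every dataset in the feature file (e.g. features / boxes / classes
# for F-RCNN), so the data-loading code can be adapted to the key names.
with h5py.File("data/frcnn_features_new", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))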

Training

To train the model in the paper, run this command:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0

Evaluation

To evaluate on the val split, provide a path via the --model-pathname arg. The path should contain a model file, best_model_mrr.pth.tar.

Call example:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --only_val T \
             --model-pathname "models/baseline" \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0

If you wish to create a test submission file (which can be submitted to the challenge servers at EvalAI), replace the only_val arg with the submission arg, i.e.:

python train.py --batch-size  128 \
             --image_data "data/frcnn_features_new" \
             --test-batch-size 64 \
             --epochs 10 \
             --lr 1e-3 \
             --opt 0 \
             --submission T \
             --model-pathname "models/baseline" \
             --folder-prefix "baseline" \
             --mode "FGA" \
             --initialization "he" \
             --lstm-initialization "he" \
             --log-interval 3000 \
             --test-after-every 1 \
             --word-embed-dim 200 \
             --hidden-ans-dim 512 \
             --hidden-hist-dim 128 \
             --hidden-cap-dim 128 \
             --hidden-ques-dim 512 \
             --seed 0

Pre-trained Models

You can download pretrained models here:

Results

Evaluation is done on VisDial v1.0.

Short description:

VisDial v1.0 contains one dialog of 10 question-answer pairs per image (seeded by the image caption) for ~130k images from COCO-trainval and Flickr, totalling ~1.3 million question-answer pairs.

Our model achieves the following performance on the validation set, and similar results on test-std/test-challenge.

Model name   R@1   MRR
FGA          53%   66
5×FGA        56%   69

Note: the paper results may vary slightly from the results of this repo, since it is a refactored version. For the legacy version, please contact via email.
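
For reference, both metrics are computed from the rank the model assigns to the ground-truth answer among the 100 candidate answers. A minimal, repo-independent sketch:

import torch

def mrr_and_recall(ranks: torch.Tensor, k: int = 1):
    # ranks: 1-based rank of the ground-truth answer per question.
    mrr = (1.0 / ranks.float()).mean().item()
    recall_at_k = (ranks <= k).float().mean().item()
    return mrr, recall_at_k

# Example: ground-truth answers ranked 1, 2, and 4 by the model.
print(mrr_and_recall(torch.tensor([1, 2, 4]), k=1))  # (~0.583, ~0.333)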

Contributing

Please cite Factor Graph Attention if you use this work in your research:

@inproceedings{schwartz2019factor,
  title={Factor graph attention},
  author={Schwartz, Idan and Yu, Seunghak and Hazan, Tamir and Schwing, Alexander G},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2039--2048},
  year={2019}
}
