vqa-mfb.pytorch's Introduction

Multi-modal Factorized Bilinear Pooling (MFB) for VQA

This is an unofficial PyTorch implementation of Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering and Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering.
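
For readers unfamiliar with the pooling operation named in the first paper, the following is a minimal sketch of a single MFB block as the paper describes it: project both modalities into a shared space, multiply element-wise, sum-pool over non-overlapping windows of size k, then apply power and L2 normalization. It is not taken from this repository's code. The class name, argument names, and default sizes are assumptions chosen for illustration, and it is written against a recent PyTorch release rather than the 0.2 version listed under Requirements.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MFB(nn.Module):
    """Illustrative MFB block; names and defaults are assumptions."""

    def __init__(self, img_dim, ques_dim, out_dim=1000, k=5, dropout=0.1):
        super().__init__()
        self.out_dim = out_dim
        self.k = k
        # Each modality is projected to out_dim * k factors.
        self.proj_img = nn.Linear(img_dim, out_dim * k)
        self.proj_ques = nn.Linear(ques_dim, out_dim * k)
        self.dropout = nn.Dropout(dropout)

    def forward(self, img_feat, ques_feat):
        # Element-wise product of the two projections: (batch, out_dim * k)
        joint = self.proj_img(img_feat) * self.proj_ques(ques_feat)
        joint = self.dropout(joint)
        # Sum pooling over non-overlapping windows of size k: (batch, out_dim)
        z = joint.view(-1, self.out_dim, self.k).sum(dim=2)
        # Power normalization (signed square root), then L2 normalization.
        z = torch.sign(z) * torch.sqrt(torch.abs(z))
        return F.normalize(z, p=2, dim=1)
```

With a batch of 2048-d image features and 1024-d question features, `MFB(2048, 1024)(img, ques)` returns one unit-norm fused vector of size `out_dim` per example.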

Figure 1: The MFB+CoAtt Network architecture for VQA.

The MFB-baseline and MFH-baseline results can be replicated. The MFH-coatt-glove result could not be replicated, possibly because of an implementation detail that was missed.

The author helped me a lot while I was trying to replicate the results. Many thanks.

The official implementation, based on pycaffe, is available here.

Requirements

Python 2.7, PyTorch 0.2, torchvision 0.1.9, tensorboardX

Result

Datasets\Models   MFB      MFH      MFH+CoAtt+GloVe (FRCN img features)
VQA-1.0           58.75%   59.15%   68.78%
  • MFB and MFH refer to MFB-baseline and MFH-baseline, respectively.
  • MFB and MFH are trained on the train set and evaluated on the val set, using ResNet-152 pool5 features. MFH+CoAtt+GloVe is trained on the train+val sets and evaluated on the test-dev set.
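
The percentages above are presumably computed with the standard VQA accuracy metric, under which a predicted answer counts as fully correct once at least three human annotators gave it. A simplified sketch of that per-question score follows. The function name is hypothetical, and the official evaluation script additionally normalizes answer strings and averages over annotator subsets, which is omitted here.

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified per-question VQA accuracy: min(#matching annotators / 3, 1)."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)


# Two of ten annotators agree with the prediction, so the score is 2/3.
partial = vqa_accuracy("yes", ["yes"] * 2 + ["no"] * 8)

# Four of ten annotators agree, so the score is capped at 1.0.
full = vqa_accuracy("red", ["red"] * 4 + ["blue"] * 6)
```

The reported figure for a split would then be the mean of this score over all questions in that split.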

Figure 2: MFB-baseline result

Figure 3: MFH-baseline result

Training from Scratch

$ python train_*.py

  • Most of the hyper-parameters and configurations are defined, with comments, in the config.py file.
  • A pretrained GloVe word embedding model (loaded through the spaCy library) is required to train the mfb/h-coatt-glove models. Installation instructions for spaCy and the GloVe model can be found here.
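
As a rough illustration of how GloVe vectors can be obtained through spaCy, the sketch below turns a question into a fixed-length list of word vectors. It is not this repository's code: the function name and `max_len` value are hypothetical, and the spaCy pipeline is passed in as `nlp` because the README does not name the model to load. It relies only on calling the pipeline on a string and reading `token.vector` from each token.

```python
def embed_question(nlp, question, max_len=15):
    """Return max_len word vectors for a question, zero-padded at the end.

    `nlp` is assumed to be a loaded spaCy pipeline that ships word vectors.
    """
    doc = nlp(question)
    # Keep at most max_len tokens, one vector per token.
    vectors = [list(token.vector) for token in doc][:max_len]
    dim = len(vectors[0]) if vectors else 0
    # Pad short questions with zero vectors of the same dimension.
    padding = [[0.0] * dim for _ in range(max_len - len(vectors))]
    return vectors + padding
```

Truncating and zero-padding to a fixed length keeps every question the same shape, which makes batching straightforward.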

Citation

If you find this implementation helpful, please consider citing:

@article{yu2017mfb,
  title={Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Fan, Jianping and Tao, Dacheng},
  journal={IEEE International Conference on Computer Vision (ICCV)},
  year={2017}
}

@article{yu2017beyond,
  title={Beyond Bilinear: Generalized Multi-modal Factorized High-order Pooling for Visual Question Answering},
  author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Fan, Jianping and Tao, Dacheng},
  journal={arXiv preprint arXiv:1708.03619},
  year={2017}
}
