
meta-module-network's Introduction

Meta-Module-Network

Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"

Data Downloading

Download all the question files, scene graph files, and bottom-up features from the web server; this can take up to 300 GB of disk space.

  bash get_data.sh

This script will download the questions/ folder. "trainval_all_programs.json" is used for bootstrapping and "trainval_unbiased_programs.json" is used for fine-tuning in the paper; "trainval_unbiased_programs.json" and "testdev_pred_programs.json" are both generated by the program generator model.

Meta Module Network Implementation

To understand the implementation of MMN in more detail, please refer to the README.

Description of different files

  • sceneGraphs/trainval_bounding_box.json: the scene graph provided by the original GQA dataset
      {
        imageId:
        {
          bounding_box_id:
          {
            x: number,
            y: number,
            w: number,
            h: number,
            relations: [{object: "bounding_box_id", name: "relation_name"} ... ],
            name: object_class,
            attributes: [attr1, attr2, ... ]
          },
          bounding_box_id:
          {
            ...
          },
        }
      }
    
  • questions/: the question-program pairs and their associated images. Each file is a list of entries of the form below (a minimal loading sketch follows this list):
    [
      [
        "ImageId",
        "Question",
        "Programs": [f1, f2, ..., fn],
        "QuestionId",
        "Answer"
      ]
    ]
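
As a quick sanity check, the snippet below loads both files and prints one record from each. This is a minimal sketch: the paths and the five-field question layout are taken from the descriptions above, so adjust them to wherever get_data.sh actually placed the data.

  import json

  # Paths follow the descriptions above -- adjust if your layout differs.
  with open("sceneGraphs/trainval_bounding_box.json") as f:
      scene_graphs = json.load(f)   # {imageId: {bounding_box_id: {...}}}

  with open("questions/trainval_all_programs.json") as f:
      entries = json.load(f)        # [[ImageId, Question, Programs, QuestionId, Answer], ...]

  # Peek at one scene graph: each bounding box carries coordinates, a class
  # name, attributes, and relations to other bounding boxes.
  image_id, boxes = next(iter(scene_graphs.items()))
  box_id, box = next(iter(boxes.items()))
  print(image_id, box_id, box["name"], box["attributes"], box["relations"])

  # Peek at one question-program entry, assuming the five-field layout above.
  image_id, question, programs, question_id, answer = entries[0]
  print(question_id, question, "->", programs, "=>", answer)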
    

Data Preprocessing [Optional]:

If you want to know how the programs and training data are generated, follow these steps:

Preprocessing Question-Program Pairs:

Download the questions from the original GQA website and put them in the parent folder '../gqa-questions/'. The following steps convert the questions into the program format (a hypothetical example of the result follows the numbered steps below):

  1. Preprocess the trainval_all questions into trainval_all_programs.json:
     python preprocess.py trainval_all
  2. Preprocess the "balanced" questions into their program forms:
     python preprocess.py create_balanced_programs
  3. Convert the programs in trainval_all_programs.json into the "input" form:
     python preprocess.py create_all_inputs
  4. Convert the programs in *balanced.json into the "input" form:
     python preprocess.py create_inputs
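
For intuition, here is a purely hypothetical example of what an entry looks like after the questions have been converted into program form. The image/question IDs, the function names, and the dependency notation below are illustrative assumptions; the real function vocabulary and encoding are defined by preprocess.py.

  # Hypothetical entry (illustrative only; see preprocess.py for the real
  # function vocabulary and dependency encoding).
  example = [
      "2370799",                                   # ImageId (made up)
      "Is there a cup to the left of the plate?",  # Question (made up)
      [                                            # Programs: one function call per reasoning step
          "select(plate)",                         # step 1: find the plate
          "relate(left, [1])",                     # step 2: objects to the left of step 1's output
          "filter(cup, [2])",                      # step 3: keep the cups among them
          "exist([3])",                            # step 4: produce a yes/no answer
      ],
      "03158921",                                  # QuestionId (made up)
      "yes",                                       # Answer
  ]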

Using the NL2Program Model to Predict Test-Dev Programs from Input Questions:

  1. Train the sequence-to-sequence model:
     python generate_program.py --do_preprocess
  2. Evaluate the NL2Program model:
     python generate_program.py --do_testdev
  3. Prepare the generated programs for the modular transformer:
     python generate_program.py --do_trainval_unbiased

Meta Module Network Training and Evaluation

  • Prepare the inputs for the modular transformer:
      python preprocess.py create_pred_inputs
    
  • Start the bootstrap training of the modular transformer, or download the pre-trained models directly from Google Drive. The bootstrap process can take quite a long time, so please be patient if you are training on your own:
     python run_experiments.py --do_train_all --model TreeSparsePostv2 --id TreeSparsePost2Full --stacking 2 --batch_size 1024
    
  • Start the fine-tuning on the balanced split:
      python run_experiments.py --do_finetune --id FinetuneTreeSparseStack2RemovalFullValSeed6999 --model TreeSparsePostv2 --load_from models/TreeSparsePost2Full --seed 6999 --stacking 2
    
  • Test the model on the testdev split:
      python run_experiments.py --do_testdev_pred --id FinetuneTreeSparseStack2RemovalValSeed6777 --load_from [MODEL_NAME]  --model TreeSparsePostv2 --stacking 2
    

Citation

If you find this paper useful, please cite it as follows:

  @inproceedings{chen2019meta,
    title={Meta Module Network for Compositional Visual Reasoning},
    author={Chen, Wenhu and Gan, Zhe and Li, Linjie and Cheng, Yu and Wang, William and Liu, Jingjing},
    booktitle={Proceedings of WACV},
    year={2021}
  }


meta-module-network's Issues

Question about coordinate projection in TreeTransformerSparsePostv2

This layer in the TreeTransformerSparsePostv2 class in modular.py:
self.coordinate_proj = nn.Linear(coordinate_dim, hidden_dim)

apparently expects a 6-dimensional bounding-box coordinate input, and the pretrained model is trained accordingly. See also args.additional_dim in run_experiments.py, which is set to 6 by default. Could you please explain why your bboxes have 6 dimensions, or whether I'm misinterpreting this? (I can't download your provided data to check what content, if any, is in dims 5 and 6.)
Thanks!
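
For context, one common convention (an assumption here, not confirmed for this repo) is to encode each box as its normalized corner coordinates plus its normalized width and height, which gives exactly 6 dimensions. A minimal sketch under that assumption:

  import torch
  import torch.nn as nn

  def encode_box(x, y, w, h, img_w, img_h):
      # Assumed 6-d encoding: normalized corners plus normalized width/height.
      x1, y1 = x / img_w, y / img_h
      x2, y2 = (x + w) / img_w, (y + h) / img_h
      return torch.tensor([x1, y1, x2, y2, w / img_w, h / img_h])

  coordinate_dim, hidden_dim = 6, 768        # hidden_dim value is illustrative
  coordinate_proj = nn.Linear(coordinate_dim, hidden_dim)

  box_feat = encode_box(x=30, y=40, w=100, h=60, img_w=640, img_h=480)
  box_embedding = coordinate_proj(box_feat)  # shape: (hidden_dim,)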

Code for teacher model

Hello, thanks for sharing. I could not find the training code for the symbolic executor (the teacher model). Could you please give some references or open-source the code for it? Thanks.

missing en_emb.npy

Hi, thanks for your quick reply. The file en_emb.npy is also missing; it is required at line 218 of generate_program.py. Could you provide this file as well?

Complete code release

Hi, do you plan to update the code?
I can't find the implementations of three core parts of the model: the visual encoder, the program generator, and the meta modules. For example, where is the Networks.py referenced in generate_program.py?

GQA_hypernym missing.

Hi, Constants.py uses GQA_hypernym.json at line 84, but I could not find this file. How can I resolve this?

Pretrained models

Thanks for open-sourcing the code!
I think your work on preprocessing GQA programs is very valuable.
It is a crucial step toward practical visual reasoning on natural images, and I have decided to build on this work.
For the convenience of followers, could you please provide your pretrained models (especially the program generator)?

Failing to download gqa_features.zip continually

Hi, I'm a student in South Korea.
I want to reproduce your experimental results, but I keep getting errors while downloading gqa_features.zip.

This is the wget command in get_data.sh:
wget https://convaisharables.blob.core.windows.net/meta-module-network/gqa_visual_features.zip

I have already read closed issue #3; the same thing happens to me, and the download stops with a "peer connection reset".
How can I download gqa_features.zip? Or could you tell me how to extract the features myself?

where does gqa_visual_features.zip come from?

Hi, I notice that you use a customized version of the GQA visual features. Where do they come from?
I cannot download the file because of: Read error at byte 38950928384/174855185539 (Connection reset by peer). Giving up.

Coarse-to-fine Parser

Hi,

Thank you for your awesome work.

From generate_program.py, I understand that you are using a standard Seq2Seq parser. Could you let me know whether you have open-sourced the Coarse2Fine parser implementation, so that I can replicate the parsing results described in the paper on my own data?

Thanks

PS: I have downloaded the Google Drive folder that contains the model files and data.

Reproducing results from scratch not working

I want to train this model with new visual features from a different object detector.

I've now trained this model from scratch using bootstrapping with all of the training data, as described. I'm using the program files provided in this repo. The only (major) difference is that I'm using different visual features (their quality should not be worse than the bottom-up features). After training for a few epochs (bootstrapping followed by fine-tuning), it's obvious that I'm nowhere close to the numbers in the paper, reaching only ~40% accuracy on both the balanced val set and the test-dev set.

Any ideas what the problem could be? Is there any non-obvious tuning done with respect to your visual features?

Implement details of MMN w/o BS

The GQA test-dev accuracy of MMN without bootstrapping is 58.4% in your paper; however, I only reached 57.7% with your provided visual features under the default setup.
My primary hyperparameters are as follows:

  • model=TreeSparsePostv2
  • pre_layers=3
  • stacking=2
  • lr_default=1e-4
  • batch_size=256

Could you please share more details about this experiment?

Code for CLEVR experiments?

Hi, I wanted to ask, is there any chance that the code for the preliminary CLEVR experiments mentioned in the paper could be released? Thanks for your time!
