
vqa.pytorch's Introduction

Visual Question Answering in pytorch

/!\ New version of pytorch for VQA available here: https://github.com/Cadene/block.bootstrap.pytorch

This repo was made by Remi Cadene (LIP6) and Hedi Ben-Younes (LIP6-Heuritech), two PhD students working on VQA at UPMC-LIP6, and their professors Matthieu Cord (LIP6) and Nicolas Thome (LIP6-CNAM). We developed this code as part of a research paper called MUTAN: Multimodal Tucker Fusion for VQA, which is (as far as we know) the current state of the art on the VQA 1.0 dataset.

The goal of this repo is twofold:

  • to make it easier to reproduce our results,
  • to provide an efficient and modular code base to the community for further research on other VQA datasets.

If you have any questions about our code or model, don't hesitate to contact us or to submit an issue. Pull requests are welcome!

News:

  • 16th January 2018: a pretrained VQA2 model and web demo
  • 18th July 2017: VQA2, VisualGenome, FBResnet152 (for pytorch) added (v2.0 commit msg)
  • 16th July 2017: paper accepted at ICCV 2017
  • 30th May 2017: poster accepted at CVPR 2017 (VQA Workshop)


Introduction

What is the task about?

The task is about training models in an end-to-end fashion on a multimodal dataset made of triplets:

  • an image with no other information than the raw pixels,
  • a question about the visual content of the associated image,
  • a short answer to the question (one or a few words).

As you can see in the illustration below, two different triplets (sharing the same image) from the VQA dataset are represented. The models need to learn rich multimodal representations to be able to give the right answers.
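To make the data layout concrete, here is a hypothetical triplet (the field names and values are illustrative, not the repo's exact schema):

# A hypothetical VQA triplet, just to make the data layout concrete.
# Field names and values are illustrative, not the repo's exact schema.
sample = {
    'image': 'COCO_train2014_000000000092.jpg',   # raw pixels only, no extra metadata
    'question': 'What color is the umbrella?',
    'answer': 'red',                              # one or a few words
}
print(sample['question'], '->', sample['answer'])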

The VQA task is still under active research. However, once it is solved, it could be very useful to improve human-to-machine interfaces (especially for the blind).

Quick insight about our method

The VQA community has developed an approach based on four learnable components:

  • a question model, which can be an LSTM, a GRU, or pretrained Skipthoughts,
  • an image model, which can be a pretrained VGG16 or ResNet-152,
  • a fusion scheme, which can be an element-wise sum, concatenation, MCB, MLB, or Mutan,
  • optionally, an attention scheme, which may have several "glimpses".

One of our claims is that the multimodal fusion between the image and question representations is a critical component. Our proposed model therefore uses a Tucker decomposition of the correlation tensor to model richer multimodal interactions and provide better answers (a minimal sketch of this fusion idea follows the list below). Our best model is based on:

  • a pretrained Skipthoughts for the question model,
  • features from a pretrained Resnet-152 (with images of size 3x448x448) for the image model,
  • our proposed Mutan (based on a Tucker Decomposition) for the fusion scheme,
  • an attention scheme with two "glimpses".
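The sketch below illustrates what a rank-R Tucker-style (Mutan-like) bilinear fusion can look like in PyTorch. It is a minimal, hypothetical sketch: the class name, dimensions and rank are illustrative defaults, not necessarily the exact values or code used in this repo.

import torch
import torch.nn as nn

class MutanFusionSketch(nn.Module):
    """Minimal sketch of a Mutan-style (rank-R Tucker) bilinear fusion.

    Illustrative only: dimensions and rank R are assumptions, not the
    repo's exact hyperparameters.
    """
    def __init__(self, dim_q=2400, dim_v=2048, dim_h=310, dim_mm=510, R=5):
        super().__init__()
        self.linear_q = nn.Linear(dim_q, dim_h)  # project question features
        self.linear_v = nn.Linear(dim_v, dim_h)  # project visual features
        # R rank-one projections whose element-wise products are summed,
        # approximating a full (and intractable) bilinear interaction
        self.list_q = nn.ModuleList(nn.Linear(dim_h, dim_mm) for _ in range(R))
        self.list_v = nn.ModuleList(nn.Linear(dim_h, dim_mm) for _ in range(R))

    def forward(self, q, v):
        q = torch.tanh(self.linear_q(q))
        v = torch.tanh(self.linear_v(v))
        x_mm = sum(hq(q) * hv(v) for hq, hv in zip(self.list_q, self.list_v))
        return torch.tanh(x_mm)

fusion = MutanFusionSketch()
out = fusion(torch.randn(8, 2400), torch.randn(8, 2048))
print(out.shape)  # torch.Size([8, 510])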

Installation

Requirements

First install Python 3 (we don't provide support for Python 2). We advise you to install Python 3 and pytorch with Anaconda:

conda create --name vqa python=3
source activate vqa
conda install pytorch torchvision cuda80 -c soumith

Then clone the repo (with the --recursive flag for submodules) and install the complementary requirements:

cd $HOME
git clone --recursive https://github.com/Cadene/vqa.pytorch.git 
cd vqa.pytorch
pip install -r requirements.txt

Submodules

Our code relies on external dependencies that are pulled in as git submodules (see vqa/external): VQA, skip-thoughts.torch and pretrained-models.pytorch.

Data

Data will be automatically downloaded and preprocessed when needed. Links to the data are stored in vqa/datasets/vqa.py, vqa/datasets/coco.py and vqa/datasets/vgenome.py.

Reproducing results on VQA 1.0

Features

As we first developed on Lua/Torch7, we used the features of a ResNet-152 pretrained with Torch7. We ported this pretrained resnet152 to pytorch in the v2.0 release. We will provide all the extracted features soon. Meanwhile, you can download the coco features as follows (a quick way to inspect the downloaded files is sketched after the list of ResNet variants below):

mkdir -p data/coco/extract/arch,fbresnet152torch
cd data/coco/extract/arch,fbresnet152torch
wget https://data.lip6.fr/coco/trainset.hdf5
wget https://data.lip6.fr/coco/trainset.txt
wget https://data.lip6.fr/coco/valset.hdf5
wget https://data.lip6.fr/coco/valset.txt
wget https://data.lip6.fr/coco/testset.hdf5
wget https://data.lip6.fr/coco/testset.txt

/!\ There are currently 3 versions of ResNet152:

  • fbresnet152torch, which is the Torch7 model,
  • fbresnet152, which is the port of the Torch7 model to pytorch,
  • resnet152, which is the pretrained model from torchvision (we've got lower results with it).
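Once downloaded, the files can be quickly sanity-checked. This is a minimal sketch assuming the HDF5 files expose datasets named 'att' and/or 'noatt' and that each .txt file lists one image name per line in the same order as the HDF5 rows; adjust the keys and paths if your files differ.

# Quick sanity check of the downloaded features (assumptions noted above).
import h5py

path = 'data/coco/extract/arch,fbresnet152torch'
with h5py.File(path + '/trainset.hdf5', 'r') as f:
    print(list(f.keys()))          # e.g. ['att', 'noatt']
    if 'att' in f:
        print(f['att'].shape)      # expected (N, 2048, 14, 14) for 'att' mode

with open(path + '/trainset.txt') as f:
    names = [line.strip() for line in f]
print(len(names), 'image names; first:', names[0])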

Pretrained VQA models

We currently provide three models trained with our old Torch7 code and ported to Pytorch:

  • MutanNoAtt trained on the VQA 1.0 trainset,
  • MLBAtt trained on the VQA 1.0 trainvalset and VisualGenome,
  • MutanAtt trained on the VQA 1.0 trainvalset and VisualGenome.
mkdir -p logs/vqa
cd logs/vqa
wget http://webia.lip6.fr/~cadene/Downloads/vqa.pytorch/logs/vqa/mutan_noatt_train.zip 
wget http://webia.lip6.fr/~cadene/Downloads/vqa.pytorch/logs/vqa/mlb_att_trainval.zip 
wget http://webia.lip6.fr/~cadene/Downloads/vqa.pytorch/logs/vqa/mutan_att_trainval.zip 

Even though we provide the results files associated with our pretrained models, you can evaluate them once again on the valset, testset and testdevset using a single command:

python train.py -e --path_opt options/vqa/mutan_noatt_train.yaml --resume ckpt
python train.py -e --path_opt options/vqa/mlb_att_trainval.yaml --resume ckpt
python train.py -e --path_opt options/vqa/mutan_att_trainval.yaml --resume ckpt

To obtain test and testdev results on VQA 1.0, you will need to zip your results json file (name it results.zip) and submit it to the evaluation server.
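A minimal sketch of the packaging step, assuming a hypothetical path for the results json produced in your logs/ directory; check the evaluation server instructions for the exact filename expected inside the archive.

# Package a results json as results.zip for the evaluation server.
# The json path and the archive-internal name are hypothetical examples.
import zipfile

results_json = 'logs/vqa/mutan_att_trainval/OpenEnded_mscoco_test2015_model_results.json'
with zipfile.ZipFile('results.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write(results_json, arcname='results.json')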

Reproducing results on VQA 2.0

Features 2.0

You must download the coco dataset (and visual genome if needed) and then extract the features with a convolutional neural network.

Pretrained VQA models 2.0

We currently provide two models trained with our current pytorch code on VQA 2.0:

  • MutanAtt trained on the trainset with the fbresnet152 features,
  • MutanAtt trained on the trainvalset with the fbresnet152 features.
cd $VQAPYTORCH
mkdir -p logs/vqa2
cd logs/vqa2
wget http://data.lip6.fr/cadene/vqa.pytorch/vqa2/mutan_att_train.zip 
wget http://data.lip6.fr/cadene/vqa.pytorch/vqa2/mutan_att_trainval.zip 

Documentation

Architecture

.
├── options        # default options dir containing yaml files
├── logs           # experiments dir containing directories of logs (one by experiment)
├── data           # datasets directories
|   ├── coco       # images and features
|   ├── vqa        # raw, interim and processed data
|   ├── vgenome    # raw, interim, processed data + images and features
|   └── ...
├── vqa            # vqa package dir
|   ├── datasets   # datasets classes & functions dir (vqa, coco, vgenome, images, features, etc.)
|   ├── external   # submodules dir (VQA, skip-thoughts.torch, pretrained-models.pytorch)
|   ├── lib        # misc classes & func dir (engine, logger, dataloader, etc.)
|   └── models     # models classes & func dir (att, fusion, notatt, seq2vec, convnets)
|
├── train.py       # train & eval models
├── eval_res.py    # eval results files with OpenEnded metric
├── extract.py     # extract features from coco with CNNs
└── visu.py        # visualize logs and monitor training

Options

There are three kinds of options:

  • options from the yaml options files stored in the options directory which are used as default (path to directory, logs, model, features, etc.)
  • options from the ArgumentParser in the train.py file which are set to None and can overwrite default options (learning rate, batch size, etc.)
  • options from the ArgumentParser in the train.py file which are set to default values (print frequency, number of threads, resume model, evaluate model, etc.)

You can easily add new options in your custom yaml file if needed. Also, if you want to grid search a parameter, you can add an ArgumentParser option and modify the dictionary in train.py:L80.
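The following is a conceptual sketch (not the repo's exact code) of how these three sources combine: yaml defaults are loaded first, then any ArgumentParser option whose default is None and that the user actually sets overrides them. The yaml keys shown are assumptions based on the default options files.

# Conceptual sketch of option merging: yaml defaults, then CLI overrides.
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument('--path_opt', default='options/vqa/default.yaml')
parser.add_argument('--learning_rate', type=float, default=None)  # overrides yaml when set
parser.add_argument('--batch_size', type=int, default=None)       # overrides yaml when set
parser.add_argument('--print_freq', type=int, default=10)         # plain default, no yaml entry
args = parser.parse_args()

with open(args.path_opt) as f:
    options = yaml.safe_load(f)                 # 1. defaults from the yaml file

if args.learning_rate is not None:              # 2. None-defaulted CLI options win over yaml
    options['optim']['lr'] = args.learning_rate
if args.batch_size is not None:
    options['optim']['batch_size'] = args.batch_size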

Datasets

We currently provide four datasets:

  • COCOImages, currently used to extract features; it comes with three splits: trainset, valset and testset
  • VisualGenomeImages, currently used to extract features; it comes with one split: trainset
  • VQA 1.0, which comes with four splits: trainset, valset, testset (including test-std and test-dev) and trainvalset (the concatenation of trainset and valset; see the sketch below)
  • VQA 2.0, which has the same splits but is twice as big (it uses the same images as VQA 1.0)
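The trainvalset idea can be illustrated with torch.utils.data.ConcatDataset; the dummy datasets below are stand-ins for the real VQA splits, not the repo's actual classes.

# Illustrative sketch of the "trainvalset": concatenating the train and val splits.
import torch
from torch.utils.data import TensorDataset, ConcatDataset

trainset = TensorDataset(torch.arange(10))     # dummy stand-in for the train split
valset = TensorDataset(torch.arange(10, 15))   # dummy stand-in for the val split
trainvalset = ConcatDataset([trainset, valset])
print(len(trainvalset))  # 15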

We plan to add:

Models

We currently provide four models:

  • MLBNoAtt: a strong baseline (BayesianGRU + Element-wise product)
  • MLBAtt: the previous state-of-the-art which adds an attention strategy
  • MutanNoAtt: our proof of concept (BayesianGRU + Mutan Fusion)
  • MutanAtt: the current state-of-the-art

We plan to add several other strategies in the future.

Quick examples

Extract features from COCO

The needed images will be automatically downloaded to dir_data and the features will be extracted with a resnet152 by default.

There are three options for mode:

  • att: features will be of size 2048x14x14,
  • noatt: features will be of size 2048,
  • both: the default option (see the pooling sketch after this list for how the att and noatt features relate).
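As a rough illustration (this mirrors the idea, not necessarily the exact extraction code), the noatt vector can be seen as a global average of the att grid over its 14x14 spatial positions:

# Sketch of how the two feature types relate: a global average pool
# over the 14x14 spatial grid turns an 'att' map into a 'noatt' vector.
import torch

att_feat = torch.randn(1, 2048, 14, 14)   # one image in 'att' mode
noatt_feat = att_feat.mean(dim=(2, 3))    # -> (1, 2048), as in 'noatt' mode
print(noatt_feat.shape)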

Beware, you will need some space on your SSD:

  • 32GB for the images,
  • 125GB for the train features,
  • 123GB for the test features,
  • 61GB for the val features.
python extract.py -h
python extract.py --dir_data data/coco --data_split train
python extract.py --dir_data data/coco --data_split val
python extract.py --dir_data data/coco --data_split test

Note: By default our code will share computations over all available GPUs. If you want to select only one or a few, use the following prefix:

CUDA_VISIBLE_DEVICES=0 python extract.py
CUDA_VISIBLE_DEVICES=1,2 python extract.py

Extract features from VisualGenome

Same here, but only train is available:

python extract.py --dataset vgenome --dir_data data/vgenome --data_split train

Train models on VQA 1.0

Display the help message, display the selected options, or run with the defaults. The needed data will be automatically downloaded and processed using the options in options/vqa/default.yaml.

python train.py -h
python train.py --help_opt
python train.py

Run a MutanNoAtt model with default options.

python train.py --path_opt options/vqa/mutan_noatt_train.yaml --dir_logs logs/vqa/mutan_noatt_train

Run a MutanAtt model on the trainset and evaluate it on the valset after each epoch.

python train.py --vqa_trainsplit train --path_opt options/vqa/mutan_att_trainval.yaml 

Run a MutanAtt model on the trainset and valset (by default) and run through the testset after each epoch (this produces a results file that you can submit to the evaluation server).

python train.py --vqa_trainsplit trainval --path_opt options/vqa/mutan_att_trainval.yaml

Train models on VQA 2.0

See options of vqa2/mutan_att_trainval:

python train.py --path_opt options/vqa2/mutan_att_trainval.yaml

Train models on VQA (1.0 or 2.0) + VisualGenome

See options of vqa2/mutan_att_trainval_vg:

python train.py --path_opt options/vqa2/mutan_att_trainval_vg.yaml

Monitor training

Create a visualization of an experiment using plotly to monitor the training, just like the picture below (click the image to access the html/js file):

Note that you have to wait until the first open-ended accuracy has finished processing; the html file will then be created and will pop up in your default browser. The html is refreshed every 60 seconds. However, you currently need to press F5 in your browser to see the changes.

python visu.py --dir_logs logs/vqa/mutan_noatt

Create a visualization of multiple experiments to compare or monitor them, like the picture below (click the image to access the html/js file):

python visu.py --dir_logs logs/vqa/mutan_noatt,logs/vqa/mutan_att

Restart training

Restart the model from the last checkpoint.

python train.py --path_opt options/vqa/mutan_noatt.yaml --dir_logs logs/vqa/mutan_noatt --resume ckpt

Restart the model from the best checkpoint.

python train.py --path_opt options/vqa/mutan_noatt.yaml --dir_logs logs/vqa/mutan_noatt --resume best

Evaluate models on VQA

Evaluate the model from the best checkpoint. If your model was trained on the training set only (vqa_trainsplit=train), it will be evaluated on the valset and will run through the testset. If it was trained on the trainset + valset (vqa_trainsplit=trainval), it will not be evaluated on the valset.

python train.py --vqa_trainsplit train --path_opt options/vqa/mutan_att.yaml --dir_logs logs/vqa/mutan_att --resume best -e

Web demo

You must set your local IP address and port in demo_server.py line 169 and your global IP address and port in demo_web/js/custom.js line 51. The port associated with the global IP address must redirect to your local IP address.

Launch your API:

CUDA_VISIBLE_DEVICES=0 python demo_server.py

Open demo_web/index.html in your browser to access the API through a human-friendly interface.

Citation

Please cite the arXiv paper if you use Mutan in your work:

@article{benyounescadene2017mutan,
  author = {Hedi Ben-Younes and 
    R{\'{e}}mi Cad{\`{e}}ne and
    Nicolas Thome and
    Matthieu Cord},
  title = {MUTAN: Multimodal Tucker Fusion for Visual Question Answering},
  journal = {ICCV},
  year = {2017},
  url = {http://arxiv.org/abs/1705.06676}
}

Acknowledgment

Special thanks to the authors of MLB for providing some Torch7 code, MCB for providing some Caffe code, and our professors and friends from LIP6 for the perfect working atmosphere.

vqa.pytorch's People

Contributors

backpropper, cadene, jancen0, justinshenk, ogrisel, ssnl


vqa.pytorch's Issues

Extracting features from Visual Genome gives error

Hello,

When extracting features from vgenome I get the following error.

OSError: cannot identify image file 'data/vgenome/raw/images/2335991.jpg'

It seems like this is a corrupt image, and there are quite a few of these in the vgenome dataset. Are these invalid images handled manually, or is there built-in support for them?

Thanks in advance.

Hi, Cadene, why don't you update the feature extractor when training?

Hi, Cadene,

Thank you very much for your code, it's very helpful. But it seems that you use resnet as a separate feature extractor instead of updating it while training the VQA model. Could I ask the reason for that? Intuitively, it might give better results to train the feature extractor and the VQA model together, since the training set is not small.

Thank you in advance.

Batch first in LSTM

The LSTM documentation specifies that we should feed the RNN inputs of shape (seq_len, batch, input_size); however, it seems to me that we are feeding inputs of shape (batch, seq_len, input_size).

Therefore I believe that the parameter batch_first should be set to True.
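For reference, a minimal example of an LSTM built with batch_first=True, which accepts (batch, seq_len, input_size) inputs; the sizes here are illustrative, not the repo's actual hyperparameters.

import torch
import torch.nn as nn

# With batch_first=True the LSTM expects inputs of shape (batch, seq_len, input_size).
lstm = nn.LSTM(input_size=620, hidden_size=2400, num_layers=1, batch_first=True)
out, (h, c) = lstm(torch.randn(8, 26, 620))   # (batch=8, seq_len=26, emb=620)
print(out.shape)                              # torch.Size([8, 26, 2400])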

Slow data loading?

First of all, thank you for open-sourcing your code! It is very useful for my research.

I notice that data loading is quite slow, i.e., out of the total time for a batch (10s, 13s, 5s), data loading takes about 80% of the time (8s, 11s, 4s) for most batches. Consequently I can only get in about 4 epochs per day. I use 4 workers and the default options in the repository. Do you have any recommendations on how to speed that part up?

Thanks,
David

How to process the multiple choice answer

Hi,
I am confused about how to use the multiple-choice answers in the multiple-choice task when training and evaluating the model.

Can we process the multiple-choice answers the same way as in the open-ended task?

Thanks.

iteration over a 0-d tensor when trying to run the demo

When trying to run the demo, model(visual, question) fails, yielding:

...
~/phd/vqa.pytorch/vqa/external/skip-thoughts.torch/pytorch/skipthoughts.py in _process_lengths(self, input)
    132     def _process_lengths(self, input):
    133         max_length = input.size(1)
--> 134         lengths = list(max_length - input.data.eq(0).sum(1).squeeze())
    135         return lengths
    136 

~/python3_env/lib/python3.5/site-packages/torch/tensor.py in __iter__(self)
    358         # map will interleave them.)
    359         if self.dim() == 0:
--> 360             raise TypeError('iteration over a 0-d tensor')
    361         return iter(imap(lambda i: self[i], range(self.size(0))))
    362 

TypeError: iteration over a 0-d tensor

If I change line 134 to:
lengths = [max_length - input.data.eq(0).sum(1).squeeze()]
I don't get the error anymore, but I only get nonsensical results with the pretrained model for different images; I don't know if this is related or not.

Using torch 0.4.

LSTM available?

Can I directly use the LSTM or TwoLstm options in the seq2vec part?
When I use these two, I get the error: RuntimeError: size mismatch at /pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:247

What should be the size of the input_question in engine.py?

Hi, thank you so much for providing your code!
I want to check the output shapes at every stage of the model. I plan to do that by passing in some random tensors of the specific shape required by the model, as I really cannot download and extract the entire VQA dataset just for testing.

Can anyone please let me know the shape of input_question in the engine.py file (with and without attention) before it is passed as input to the model, and how to create a random tensor of that specific shape? I tried using

word_to_ix = {"hello": 0, "world": 1}
lookup_tensor = torch.tensor([word_to_ix["world"]], dtype=torch.long).cuda()

input_img = torch.randn(1,2048,14,14)
input_img = x1.long()
input_img = Variable(torch.LongTensor(input_img)).cuda()

out = model(input_img,lookup_tensor)

But I get this error

tensor([1], device='cuda:0')
Traceback (most recent call last):
  File "test.py", line 83, in <module>
    out = model(x1,lookup_tensor)
  File "/home/sarvani/anaconda3/envs/MMTOD_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sarvani/Desktop/SaiCharan/misc/vqa.pytorch/vqa/models/att.py", line 160, in forward
    x_q_vec = self.seq2vec(input_q)
  File "/home/sarvani/anaconda3/envs/MMTOD_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/sarvani/Desktop/SaiCharan/misc/vqa.pytorch/vqa/models/seq2vec.py", line 62, in forward
    lengths = process_lengths(input)
  File "/home/sarvani/Desktop/SaiCharan/misc/vqa.pytorch/vqa/models/seq2vec.py", line 12, in process_lengths
    max_length = input.size(1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Any help would be appreciated. Thanks!

loss decreasing is very slow

I tried to use a single LSTM and a classifier to train a question-only model, but the loss decreases very slowly and the val acc1 is under 30 even after 40 epochs.

FileNotFoundError: [Errno 2] No such file or directory: 'data/vqa2/raw/annotations/mscoco_train2014_annotations.json'

After installing the dependencies and starting the server, a script tries to download resources, resulting in 404 errors.

--2019-12-11 20:14:09-- http://visualqa.org/data/mscoco/vqa/v2_Questions_Train_mscoco.zip
Resolving visualqa.org (visualqa.org)... 185.199.109.153, 185.199.110.153, 185.199.108.153, ...
Connecting to visualqa.org (visualqa.org)|185.199.109.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://visualqa.org/data/mscoco/vqa/v2_Questions_Train_mscoco.zip [following]
--2019-12-11 20:14:09-- https://visualqa.org/data/mscoco/vqa/v2_Questions_Train_mscoco.zip
Connecting to visualqa.org (visualqa.org)|185.199.109.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-12-11 20:14:09 ERROR 404: Not Found.

--2019-12-11 20:14:09-- http://visualqa.org/data/mscoco/vqa/v2_Questions_Val_mscoco.zip
Resolving visualqa.org (visualqa.org)... 185.199.110.153, 185.199.108.153, 185.199.111.153, ...
Connecting to visualqa.org (visualqa.org)|185.199.110.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://visualqa.org/data/mscoco/vqa/v2_Questions_Val_mscoco.zip [following]
--2019-12-11 20:14:09-- https://visualqa.org/data/mscoco/vqa/v2_Questions_Val_mscoco.zip
Connecting to visualqa.org (visualqa.org)|185.199.110.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-12-11 20:14:10 ERROR 404: Not Found.

--2019-12-11 20:14:10-- http://visualqa.org/data/mscoco/vqa/v2_Questions_Test_mscoco.zip
Resolving visualqa.org (visualqa.org)... 185.199.108.153, 185.199.111.153, 185.199.109.153, ...
Connecting to visualqa.org (visualqa.org)|185.199.108.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://visualqa.org/data/mscoco/vqa/v2_Questions_Test_mscoco.zip [following]
--2019-12-11 20:14:10-- https://visualqa.org/data/mscoco/vqa/v2_Questions_Test_mscoco.zip
Connecting to visualqa.org (visualqa.org)|185.199.108.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-12-11 20:14:10 ERROR 404: Not Found.

--2019-12-11 20:14:10-- http://visualqa.org/data/mscoco/vqa/v2_Annotations_Train_mscoco.zip
Resolving visualqa.org (visualqa.org)... 185.199.111.153, 185.199.109.153, 185.199.110.153, ...
Connecting to visualqa.org (visualqa.org)|185.199.111.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://visualqa.org/data/mscoco/vqa/v2_Annotations_Train_mscoco.zip [following]
--2019-12-11 20:14:10-- https://visualqa.org/data/mscoco/vqa/v2_Annotations_Train_mscoco.zip
Connecting to visualqa.org (visualqa.org)|185.199.111.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-12-11 20:14:10 ERROR 404: Not Found.

--2019-12-11 20:14:10-- http://visualqa.org/data/mscoco/vqa/v2_Annotations_Val_mscoco.zip
Resolving visualqa.org (visualqa.org)... 185.199.109.153, 185.199.110.153, 185.199.108.153, ...
Connecting to visualqa.org (visualqa.org)|185.199.109.153|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://visualqa.org/data/mscoco/vqa/v2_Annotations_Val_mscoco.zip [following]
--2019-12-11 20:14:11-- https://visualqa.org/data/mscoco/vqa/v2_Annotations_Val_mscoco.zip
Connecting to visualqa.org (visualqa.org)|185.199.109.153|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-12-11 20:14:11 ERROR 404: Not Found.

unzip: cannot find or open data/vqa2/raw/zip/v2_Questions_Train_mscoco.zip, data/vqa2/raw/zip/v2_Questions_Train_mscoco.zip.zip or data/vqa2/raw/zip/v2_Questions_Train_mscoco.zip.ZIP.
unzip: cannot find or open data/vqa2/raw/zip/v2_Questions_Val_mscoco.zip, data/vqa2/raw/zip/v2_Questions_Val_mscoco.zip.zip or data/vqa2/raw/zip/v2_Questions_Val_mscoco.zip.ZIP.
unzip: cannot find or open data/vqa2/raw/zip/v2_Questions_Test_mscoco.zip, data/vqa2/raw/zip/v2_Questions_Test_mscoco.zip.zip or data/vqa2/raw/zip/v2_Questions_Test_mscoco.zip.ZIP.
unzip: cannot find or open data/vqa2/raw/zip/v2_Annotations_Train_mscoco.zip, data/vqa2/raw/zip/v2_Annotations_Train_mscoco.zip.zip or data/vqa2/raw/zip/v2_Annotations_Train_mscoco.zip.ZIP.
unzip: cannot find or open data/vqa2/raw/zip/v2_Annotations_Val_mscoco.zip, data/vqa2/raw/zip/v2_Annotations_Val_mscoco.zip.zip or data/vqa2/raw/zip/v2_Annotations_Val_mscoco.zip.ZIP.
mv: rename data/vqa2/raw/annotations/v2_mscoco_train2014_annotations.json to data/vqa2/raw/annotations/mscoco_train2014_annotations.json: No such file or directory
mv: rename data/vqa2/raw/annotations/v2_mscoco_val2014_annotations.json to data/vqa2/raw/annotations/mscoco_val2014_annotations.json: No such file or directory
mv: rename data/vqa2/raw/annotations/v2_OpenEnded_mscoco_train2014_questions.json to data/vqa2/raw/annotations/OpenEnded_mscoco_train2014_questions.json: No such file or directory
mv: rename data/vqa2/raw/annotations/v2_OpenEnded_mscoco_val2014_questions.json to data/vqa2/raw/annotations/OpenEnded_mscoco_val2014_questions.json: No such file or directory
mv: rename data/vqa2/raw/annotations/v2_OpenEnded_mscoco_test2015_questions.json to data/vqa2/raw/annotations/OpenEnded_mscoco_test2015_questions.json: No such file or directory
mv: rename data/vqa2/raw/annotations/v2_OpenEnded_mscoco_test-dev2015_questions.json to data/vqa2/raw/annotations/OpenEnded_mscoco_test-dev2015_questions.json: No such file or directory
Loading annotations and questions...
Traceback (most recent call last):
  File "demo_server.py", line 174, in <module>
    main()
  File "demo_server.py", line 144, in main
    options['vqa'])
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/vqa.py", line 258, in factory
    dataset_vqa = VQA2(data_split, opt, dataset_img)
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/vqa.py", line 148, in __init__
    super(VQA2, self).__init__(data_split, opt, dataset_img)
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/vqa.py", line 21, in __init__
    super(AbstractVQA, self).__init__(data_split, opt, dataset_img)
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/utils.py", line 20, in __init__
    self._interim()
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/vqa.py", line 179, in _interim
    vqa2_interim(self.opt['dir'])
  File "/Users/xxxxxx/vqa.pytorch/vqa/datasets/vqa2_interim.py", line 52, in vqa_interim
    annotations_train = json.load(open(os.path.join(dir_vqa, 'raw', 'annotations', 'mscoco_train2014_annotations.json'), 'r'))
FileNotFoundError: [Errno 2] No such file or directory: 'data/vqa2/raw/annotations/mscoco_train2014_annotations.json'

I would be very happy if you could take a look at this issue, since I'm very eager to try out your pre-trained models. Thanks in advance!

extract.py gives error for COCOImages

extract.py calls self._raw(). But since self.split_name is not defined until the parent constructor is complete, it raises an AttributeError.
Some suggestions: you could either pass the split_name to the _raw function, or define it before calling the parent constructor in COCOImages.

Results on test split

Hi, I have finished training and selected the model with the best performance on the val set. How can I get the results on the test split, which can be submitted to the VQA challenge website to get the test-dev results?

Training MUTAN+Att using the pytorch code achieves low accuracy

Hi, thank you so much for your code.
Right now, I am trying to replicate your ICCV results with the pytorch implementation.
Here is the setting:
{'batch_size': None,
'dir_logs': None,
'epochs': None,
'evaluate': False,
'help_opt': False,
'learning_rate': None,
'path_opt': 'options/vqa/mutan_att_trainval.yaml',
'print_freq': 10,
'resume': '',
'save_all_from': None,
'save_model': True,
'st_dropout': None,
'st_fixed_emb': None,
'st_type': None,
'start_epoch': 0,
'vqa_trainsplit': 'train',
'workers': 16}

options

{'coco': {'arch': 'fbresnet152torch', 'dir': 'data/coco', 'mode': 'att'},
'logs': {'dir_logs': 'logs/vqa/mutan_att_trainval'},
'model': {'arch': 'MutanAtt',
'attention': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 310,
'dim_mm': 510,
'dropout_hv': 0,
'dropout_mm': 0.5,
'dropout_q': 0.5,
'dropout_v': 0.5,
'nb_glimpses': 2},
'classif': {'dropout': 0.5},
'dim_q': 2400,
'dim_v': 2048,
'fusion': {'R': 5,
'activation_q': 'tanh',
'activation_v': 'tanh',
'dim_hq': 310,
'dim_hv': 620,
'dim_mm': 510,
'dropout_hq': 0,
'dropout_hv': 0,
'dropout_q': 0.5,
'dropout_v': 0.5},
'seq2vec': {'arch': 'skipthoughts',
'dir_st': 'data/skip-thoughts',
'dropout': 0.25,
'fixed_emb': False,
'type': 'BayesianUniSkip'}},
'optim': {'batch_size': 128, 'epochs': 100, 'lr': 0.0001},
'vqa': {'dataset': 'VQA',
'dir': 'data/vqa',
'maxlength': 26,
'minwcount': 0,
'nans': 2000,
'nlp': 'mcb',
'pad': 'right',
'samplingans': True,
'trainsplit': 'train'}}
Warning: 399/930911 words are not in dictionary, thus set UNK
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Model has 37840812 parameters

Here is the result after 100 epochs:
Epoch: [99][1740/1760] Time 0.403 (0.412) Data 0.000 (0.007) Loss 0.8993 (0.9064) Acc@1 71.094 (73.912) Acc@5 94.531 (94.830)
Epoch: [99][1750/1760] Time 0.387 (0.412) Data 0.000 (0.007) Loss 0.8277 (0.9061) Acc@1 71.875 (73.915) Acc@5 95.312 (94.833)
Val: [900/950] Time 0.138 (0.188) Loss 3.1201 (2.8397) Acc@1 49.219 (52.236) Acc@5 75.000 (78.115)
Val: [910/950] Time 0.189 (0.187) Loss 2.4805 (2.8372) Acc@1 58.594 (52.240) Acc@5 80.469 (78.139)
Val: [920/950] Time 0.210 (0.187) Loss 2.8639 (2.8388) Acc@1 53.125 (52.226) Acc@5 77.344 (78.137)
Val: [930/950] Time 0.179 (0.187) Loss 2.1427 (2.8388) Acc@1 59.375 (52.227) Acc@5 82.031 (78.137)
Val: [940/950] Time 0.151 (0.187) Loss 3.1772 (2.8367) Acc@1 50.781 (52.263) Acc@5 72.656 (78.163)

  • Acc@1 52.266 Acc@5 52.266

No learning rate decay?

Hi,

I saw you commented out the adjust_learning_rate function. Does this mean lr decay isn't essential to the performance of VQA training?

Cheers,
Jake

Extracting image features from Visual Genome fails with multi-gpu

Hello,

I'm trying to use multiple GPUs to speed up feature extraction. I get this error with the following command:
CUDA_VISIBLE_DEVICES=0,2 python extract.py --dataset vgenome --dir_data data/vgenome --data_split train --mode att

Warning: shape_att=(108249, 2048, 14, 14)
Traceback (most recent call last):
  File "extract.py", line 157, in <module>
    main()
  File "extract.py", line 87, in main
    extract(data_loader, model, path_file, args.mode)
  File "extract.py", line 121, in extract
    output_att = model(input_var)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 46, in parallel_apply
    raise output
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 25, in _worker
    output = module(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/vqa/vqa.pytorch/vqa/models/convnets.py", line 62, in <lambda>
    model.forward = lambda x: forward_resnet(convnet, x)
  File "/home/aosman/vqa/vqa.pytorch/vqa/models/convnets.py", line 26, in forward_resnet
    x = self.conv1(x)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "/home/aosman/miniconda2/envs/vqa/lib/python3.6/site-packages/torch/nn/functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: tensors are on different GPUs

Using CUDA_VISIBLE_DEVICES with only 1 GPU works fine. I've looked into the code and it seems you had multi-GPU extraction in mind. Do you have a clue why this happens?

Training one model on multiple GPUs simultaneously not working (possible deadlock?)

Hello again,

I was training MUTAN_att on VQA2+VG and I tried to run another process to train a second model (a modified MUTAN); both processes now seem to be stuck. I checked iotop and it seems disk reads have also stopped. I verified that the modified MUTAN works when trained alone.

I suspect that the dataloader processes freak out when another training process is launched. Is this possible? I assumed that the new training process would spawn its own dataloader processes.

BONUS: I can't seem to kill all these processes in a graceful manner; CTRL-C doesn't work. Only kill -9 PID works, but that seems to create zombie processes!

Any help is appreciated!

Feature extraction: CUDA out of memory

Hi,

I am trying to extract features for the coco dir; however, it always gives me RuntimeError: CUDA out of memory.
I have tried using a single GPU by setting CUDA_VISIBLE_DEVICES=0, but it still gives me the same RuntimeError even when the GPU is free.

What should I do?

Regards,
Vedika

Why do I get this error on the demo server?

Uncaught TypeError: Cannot read property 'ans' of undefined
complete
i
fireWith
y
ajax
formBasic
(anonymous function)
dispatch
r.handle

I entered a new puppy picture and the question was 'what is this?'

Information regarding training time

Can you please provide some information regarding the training time (with the hardware specifications) and the size of the dataset used?

Thanks!

Why don't you apply a softmax function before the final prediction?

Hi, thank you for the great work; I have a little question. As a classification task, we usually apply a softmax function to convert the output of a model into a probability vector, each entry of which represents the probability of the input belonging to the corresponding category. However, it seems that in your code the output of the Mutan model (the output of the second multimodal fusion followed by only a linear transformation, without a softmax) is fed directly into the loss function. Is there any special consideration?

x = self.linear_classif(x)
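For context, PyTorch's nn.CrossEntropyLoss applies a log-softmax internally, so feeding raw logits from the final linear layer into the loss is standard practice; an explicit softmax is only needed when probabilities are required at inference time. A minimal illustration with hypothetical sizes:

import torch
import torch.nn as nn

# nn.CrossEntropyLoss combines log-softmax and negative log-likelihood,
# so it takes raw logits directly; sizes below are illustrative only.
logits = torch.randn(4, 2000)                 # batch of 4, 2000 answer classes
targets = torch.randint(0, 2000, (4,))
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())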

Left padding when encoding questions

In vqa_processed.py, encode_question fails (index out of range) when the desired padding is 'left' and the question length is larger than the maximum length.

A possible fix is to replace the else branch:

else:   # ['pad'] == 'left'
    new_k = k + maxlength - len(ex['question_words_UNK'])
    ex['question_wids'][new_k] = word_to_wid[w]
ex['seq_length'] = len(ex['question_words_UNK'])

With

else:   # ['pad'] == 'left'
    if maxlength < len(ex['question_words_UNK']):
        ex['question_wids'][k] = word_to_wid[w]
    else:
        new_k = k + maxlength - len(ex['question_words_UNK'])
        ex['question_wids'][new_k] = word_to_wid[w]

Hi Cadene, is the Tucker decomposition part in fusion.py?

Hi Cadene,

I think the decomposition part is in fusion.py, but I haven't got it all clear yet. Would you mind helping me out? I want to code a Tensorflow version based on your pytorch code. I also find the BayesianUniSkip (skipthoughts) part challenging, because I didn't find a Tensorflow version of it either. Thanks for your kind help.

How to train on VQA 2?

I trained on VQA 1 successfully. When I switch to VQA 2, I run into the error below:

Traceback (most recent call last):
  File "train.py", line 373, in <module>
    main()
  File "train.py", line 120, in main
    options['vgenome'])
  File "/home/pgao/vqa_real_train/vqa.pytorch/vqa/datasets/vqa.py", line 253, in factory
    dataset_img = coco.factory(data_split, opt_coco)
  File "/home/pgao/vqa_real_train/vqa.pytorch/vqa/datasets/coco.py", line 93, in factory
    trainset = factory('train', opt, transform)
  File "/home/pgao/vqa_real_train/vqa.pytorch/vqa/datasets/coco.py", line 102, in factory
    return FeaturesDataset(data_split, opt)
  File "/home/pgao/vqa_real_train/vqa.pytorch/vqa/datasets/features.py", line 20, in __init__
    'File not found in {}, you must extract the features first with extract.py'.format(self.path_hdf5)

VQA 1 and VQA 2 use the same image features, right?

Syntax error in line 80

runfile('F:/project/vqa.pytorch-master/demo_server.py', wdir='F:/project/vqa.pytorch-master')
File "F:\project\vqa.pytorch-master\demo_server.py", line 80
visual_data = visual_data.cuda(async=True)
^
SyntaxError: invalid syntax

I do not have a demo server

Warning: 565/930911 words are not in dictionary, thus set UNK
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning fusion.py: no visual embedding before fusion
Warning fusion.py: no question embedding before fusion
Warning train.py: no optim checkpoint found at 'logs/vqa/mutan_noatt_train/best_optim.pth.tar'
This is an error.

I didn't have enough hard disk capacity, so I deleted the raw data. Does this matter?

'OpenEnded_mscoco_val2014_model_accuracy.json' is not created

Hi:
The 'OpenEnded_mscoco_val2014_model_accuracy.json' file is needed at line 26 of visu.py, but I cannot find the function that generates it in your train.py. When trainsplit = train, it does not generate 'OpenEnded_mscoco_val2014_model_accuracy.json'; only 'OpenEnded_mscoco_val2014_model_result.json' is created. I want to know what is wrong with it.

Thanks

Reproduce the problem using fusion scheme of concatenation

Hi Cadene,

Thank you so much for sharing the codes.

I am trying to reproduce the vqa problem using two basic fusion schemes: element-wise sum and concatenation.

What I tried is to set
x_mm = torch.cat((x_q, x_v), 0) in fusion.py for the concat case, compared to
x_mm = torch.mul(x_q, x_v) in MLBFusion.

Am I right?

Thanks in advance.

model specifications not coherent with the MLB paper

The model configuration is not the same as described in the paper. There is a softmax layer missing at the end of the model. The paper concatenates the attention * vision features for all the glimpses and then passes them through a single linear layer. You use a non-linearity both before and after the fusion.
