
vilbert-multi-task's Introduction

12-in-1: Multi-Task Vision and Language Representation Learning

Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning:

@InProceedings{Lu_2020_CVPR,
author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
title = {12-in-1: Multi-Task Vision and Language Representation Learning},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks:

@inproceedings{lu2019vilbert,
  title={Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={13--23},
  year={2019}
}

Repository Setup

  1. Create a fresh conda environment, and install all dependencies.
conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone --recursive https://github.com/facebookresearch/vilbert-multi-task.git
cd vilbert-multi-task
pip install -r requirements.txt
  2. Install PyTorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  3. Install apex, following https://github.com/NVIDIA/apex

  4. Install this codebase as a package in this environment.

python setup.py develop

Data Setup

Check README.md under data for more details.

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --objective 1 --file_path <path_to_extracted_cc_features>

Download link

Multi-task Training

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <pretrained_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

Download link

Fine-tune from Multi-task trained model

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <multi_task_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

License

vilbert-multi-task is licensed under the MIT license, available in the LICENSE file.

vilbert-multi-task's People

Contributors

arjunmajum, dependabot[bot], jiasenlu, kdexd, vedanuj


vilbert-multi-task's Issues

Hardware requirements: CUDA out of memory error while training

Hello,

Following the multi-task training instructions, I used python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model_0 in order to train the base model for the VQA task.

I am using a single GPU with 4 GB of memory (the nvidia-smi screenshot of its characteristics is not reproduced here).

When I start training, I am running into the following issue: RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 3.94 GiB total capacity; 3.28 GiB already allocated; 48.31 MiB free; 3.39 GiB reserved in total by PyTorch). I also tried to reduce the batch_size for TASK1 in the vilbert_tasks.yml file to use less memory, but the error persists.

Could anyone provide the minimum hardware requirements to train the model for a specific task (e.g., VQA)?
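A minimal mitigation sketch, assuming the TASK1 entry in vilbert_tasks.yml exposes a batch_size key (as the attempt above suggests) and that the "16-bits training" option visible in the logs corresponds to an fp16 flag on the script (the flag name is an assumption):

TASK1:
  batch_size: 16

python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --task_specific_tokens --fp16 --save_name multi_task_model_0

Even with these changes, the model weights, optimizer state, and activations together may simply not fit in 4 GB, so a GPU with substantially more memory may be the realistic requirement.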

Here is a full description of the error:

07/17/2020 08:39:34 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/aloui/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/17/2020 08:39:34 - INFO - vilbert.task_utils -   Loading VQA Dataset with batch size 128
07/17/2020 08:39:34 - INFO - vilbert.datasets.vqa_dataset -   Loading from datasets/VQA/cache/VQA_trainval_23_cleaned.pkl
07/17/2020 08:41:46 - INFO - vilbert.datasets.vqa_dataset -   Loading from datasets/VQA/cache/VQA_minval_23_cleaned.pkl
07/17/2020 08:41:46 - INFO - vilbert.utils -   logging file at: save/VQA_bert_base_6layer_6conect-multi_task_model_0/logs
07/17/2020 08:41:46 - INFO - vilbert.utils -   loading weights file save/multi_task_model.bin
559 559
***** Running training *****
  Num Iters:  {'TASK1': 4236}
  Batch size:  {'TASK1': 128}
  Num steps: 84720
Epoch:   0%|                                                |0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_tasks.py", line 673, in <module>
    main()
  File "train_tasks.py", line 533, in main
    task_losses,
  File "/aloui/vilbert-multi-task/vilbert/task_utils.py", line 321, in ForwardModelsTrain
    task_tokens,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1662, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1387, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1065, in forward
    use_co_attention_mask,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 895, in forward
    layer_output1 = self.v_output(intermediate_output1, attention_output1)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 676, in forward
    hidden_states = self.dropout(hidden_states)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/dropout.py", line 54, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/functional.py", line 807, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 3.94 GiB total capacity; 3.28 GiB already allocated; 46.81 MiB free; 3.39 GiB reserved in total by PyTorch)

Citation Year

I think I found a minor typo in the citation date.
For "12-in-1: Multi-Task Vision and Language Representation Learning", Google Scholar says the year is 2020; however, the GitHub page says 2019.

Thanks!

problem with .lmdb file for pretrain

The code that generates the .lmdb file only writes the keys
["image_id", "image_h", "image_w", "num_boxes", "boxes", "features"],
but the pretraining code expects the keys
["image_feature_wp", "image_target_wp", "image_location_wp", "num_boxes", "image_h", "image_w", "image_id", "caption"].
Can anyone help?
Thanks a lot
@vedanuj
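A minimal bridging sketch (not the repo's own converter), assuming hypothetical names: record is one entry written by the feature-extraction step with the first set of keys, and captions is a dict you build yourself mapping image_id to its caption; the placeholder class-probability shape is also an assumption.

import numpy as np

def to_pretrain_item(record, captions):
    # Rename/augment the extraction keys into the keys the pretraining reader expects.
    num_boxes = int(record["num_boxes"])
    return {
        "image_id": record["image_id"],
        "image_h": record["image_h"],
        "image_w": record["image_w"],
        "num_boxes": num_boxes,
        "image_feature_wp": np.asarray(record["features"], dtype=np.float32),
        "image_location_wp": np.asarray(record["boxes"], dtype=np.float32),
        # no per-box class distributions were saved, so use a zero placeholder
        # (the 1601-way shape is a guess based on common Visual Genome detectors)
        "image_target_wp": np.zeros((num_boxes, 1601), dtype=np.float32),
        "caption": captions[record["image_id"]],
    }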

Provided Vilbert model Not working with the data

Hi there,
I tried to run a demo using the code from this repo.
I prepared the data following the instructions in the data folder,
i.e. I downloaded datasets.tar.gz and the resnext152 COCO features (via wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_trainval_resnext152_faster_rcnn_genome.lmdb/).

I tried to evaluate the VQA performance using
CUDA_VISIBLE_DEVICES=1 python eval_tasks.py --bert_model bert-base-uncased --from_pretrained multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --task_specific_tokens

The multi_task_model.bin was downloaded from
https://dl.fbaipublicfiles.com/vilbert-multi-task/multi_task_model.bin

But I got an error (screenshot not reproduced here).

I am wondering if someone else is facing the same issue, or if there is something wrong with my setup.

Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS

Hi,

Following the visual feature extraction instructions, I'm running into the following issue:

Traceback (most recent call last):
  File "../vilbert-multi-task/script/extract_features.py", line 233, in <module>
    feature_extractor = FeatureExtractor()
  File "../vilbert-multi-task/script/extract_features.py", line 30, in __init__
    self.detection_model = self._build_detection_model()
  File "../vilbert-multi-task/script/extract_features.py", line 76, in _build_detection_model
    cfg.merge_from_file(self.args.config_file)
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 473, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS'

From this issue, it seems that the config provided here might not be consistent with the current maskrcnn-benchmark code. Is there any particular version I should install?

Trying to run demo

I am trying to run the demo file, but I cannot find the config file (save/resnext_models/e2e_faster_rcnn_X-152-32x8d-FPN_1x_MLP_2048_FPN_512_train.yaml) or the model file (save/resnext_models/model_final.pth). Can anyone help me? Thank you so much.

You need to install libcap development headers to build this module

Any idea on how to install libcap? Thanks

    ERROR: Command errored out with exit status 1:
     command: /h/johnchen/anaconda3/envs/vilbert-mt/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-eyi3jvqh/python-prctl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-eyi3jvqh/python-prctl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-eyi3jvqh/python-prctl/pip-egg-info
         cwd: /tmp/pip-install-eyi3jvqh/python-prctl/
    Complete output (1 lines):
    You need to install libcap development headers to build this module
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
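A minimal sketch of the usual fix, assuming a Debian/Ubuntu host where you (or an admin) can install system packages; the python-prctl dependency that fails above needs the libcap development headers:

sudo apt-get install build-essential libcap-dev
pip install python-prctl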

NVCC Error while installing VQA-maskrcnn-benchmark

I am trying to install vqa-maskrcnn-benchmark with the following command:

git clone https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark.git
cd vqa-maskrcnn-benchmark
python setup.py build develop

I got an error saying:
unable to execute '/usr/local/cuda/bin/nvcc': No such file or directory
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
Can anyone please suggest how to fix this?
Thank you! :)
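A minimal sketch of the usual cause, assuming the CUDA toolkit is actually installed but not on the expected path (adjust the version and location to your system):

export CUDA_HOME=/usr/local/cuda-10.0
export PATH=$CUDA_HOME/bin:$PATH
python setup.py build develop

If no CUDA toolkit with nvcc is installed at all, it has to be installed first; the conda cudatoolkit package alone does not ship nvcc.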

i got "ConnectionResetError"

Hi,
I ran this code for the VCR task (training) and got an error like this:

(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/13/2020 13:52:59 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/13/2020 13:53:00 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/13/2020 13:53:00 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 16
03/13/2020 13:53:11 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 16
03/13/2020 13:53:25 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/13/2020 13:53:25 - INFO - vilbert.utils -   loading weights file save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights of VILBertForVLTasks not initialized from pretrained model: ['bert.embeddings.task_embeddings.weight', 'vil_prediction.logit_fc.0.weight', 'vil_prediction.logit_fc.0.bias', 'vil_prediction.logit_fc.2.weight', 'vil_prediction.logit_fc.2.bias', 'vil_prediction.logit_fc.3.weight', 'vil_prediction.logit_fc.3.bias', 'vil_prediction_gqa.logit_fc.0.weight', 'vil_prediction_gqa.logit_fc.0.bias', 'vil_prediction_gqa.logit_fc.2.weight', 'vil_prediction_gqa.logit_fc.2.bias', 'vil_prediction_gqa.logit_fc.3.weight', 'vil_prediction_gqa.logit_fc.3.bias', 'vil_binary_prediction.logit_fc.0.weight', 'vil_binary_prediction.logit_fc.0.bias', 'vil_binary_prediction.logit_fc.2.weight', 'vil_binary_prediction.logit_fc.2.bias', 'vil_binary_prediction.logit_fc.3.weight', 'vil_binary_prediction.logit_fc.3.bias', 'vil_tri_prediction.weight', 'vil_tri_prediction.bias']
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights from pretrained model not used in VILBertForVLTasks: ['vil_prediction.main.0.bias', 'vil_prediction.main.0.weight_g', 'vil_prediction.main.0.weight_v', 'vil_prediction.main.3.bias', 'vil_prediction.main.3.weight_g', 'vil_prediction.main.3.weight_v']
559 559
***** Running training *****
  Num Iters:  {'TASK5': 2039, 'TASK6': 2039}
  Batch size:  {'TASK5': 16, 'TASK6': 16}
  Num steps: 40780
Epoch:   0%|                                             | 0/20 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 529, in main
    task_losses,
  File "/home/ailab/vilbert-multi-task/vilbert/task_utils.py", line 200, in ForwardModelsTrain
    if task_cfg[task_id]["process"] in ["dialog"]:
KeyError: 'process'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py", line 21, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
    fd = df.detach()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
    response = connection.recv_bytes(256)        # reject large message
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

What should I do? Help T^T

msgpack.exceptions.ExtraData: unpack(b) received extra data.

Hi,

When I tried to run

python train_concap.py --bert_model bert-base-uncased \
                       --config_file config/bert_base_6layer_6conect.json \
                       --train_batch_size 512 --objective 1 \
                       --file_path ../cc_dataset/dataset \
                       --output_dir ../vilbert_files/models

I got the error message below:

Process _Worker-1:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/parallel.py", line 169, in run
    for dp in self.ds:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 641, in __iter__
    for dp in self._inf_iter:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 370, in __iter__
    for dp in self.ds:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 297, in __iter__
    ret = self.func(copy(dp))  # shallow copy the list
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/serialize.py", line 84, in <lambda>
    return MapData(df, lambda dp: loads(dp[1]))
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/utils/serialize.py", line 43, in loads_msgpack
    max_str_len=MAX_MSGPACK_LEN)
  File "/usr/local/python3/lib/python3.6/site-packages/msgpack_numpy.py", line 255, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Do you have any idea what is happening?
Thanks!

i had "AttributeError: 'NoneType' object has no attribute 'named_parameters'" error!

Hi,
I want to train on the VCR dataset.

I got the error below. How do I fix this? Help T^T

(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/12/2020 23:24:28 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/12/2020 23:24:29 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/12/2020 23:24:29 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 64
03/12/2020 23:24:40 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 64
03/12/2020 23:24:53 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/12/2020 23:24:53 - ERROR - vilbert.utils -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 398, in main
    for key, value in dict(model.named_parameters()).items():
AttributeError: 'NoneType' object has no attribute 'named_parameters'
(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/12/2020 23:34:06 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/12/2020 23:34:08 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/12/2020 23:34:08 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 64
03/12/2020 23:34:19 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 64
03/12/2020 23:34:34 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/12/2020 23:34:34 - ERROR - vilbert.utils -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 398, in main
    for key, value in dict(model.named_parameters()).items():
AttributeError: 'NoneType' object has no attribute 'named_parameters'

ModuleNotFoundError: No module named 'tools.refer.refer'

!python train_tasks.py --bert_model bert-base-uncased --from_pretrained data/pretrained_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

output:
2020-07-21 14:05:52.998850: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "train_tasks.py", line 33, in <module>
    from vilbert.task_utils import (
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/task_utils.py", line 19, in <module>
    from vilbert.datasets import DatasetMapTrain, DatasetMapEval
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/datasets/__init__.py", line 15, in <module>
    from .refer_expression_dataset import ReferExpressionDataset
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/datasets/refer_expression_dataset.py", line 16, in <module>
    from tools.refer.refer import REFER
ModuleNotFoundError: No module named 'tools.refer.refer'

'datasets' folder?

For the whole datasets folder, is there a script to generate it, or are we supposed to create it ourselves? Also, where is the script that generates the several .pkl files that are supposed to be in the datasets folder, and where is the info about the cleaned dataset? Thanks.

Transfer learning with vilbert

From my understanding, we get visuo-linguistic embeddings using VilBert (and LXMERT and VL-Bert for that matter too). Is it possible to simply use these as a layer/feature extractor backbone for visual/linguistic tasks? For instance, if we wanted to add a linear classifier (or LSTM) on top of the VilBert embeddings, are there any provided pretrained weights?

Thanks
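A generic sketch of that pattern (freeze a pretrained joint encoder, train only a small head on top); treating VILBertForVLTasks as the backbone this way is an assumption rather than a documented API, so the backbone below is just any module that returns a pooled embedding:

import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, backbone, embed_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, *inputs):
        with torch.no_grad():
            pooled = self.backbone(*inputs)  # pooled visiolinguistic embedding
        return self.head(pooled)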

Could you please provide access to the required data files?

Hi! Thank you for releasing this great project! However, I notice that the data files (including the lmdb feature files as well as other metadata) needed to run pre-training and multi-task fine-tuning are not accessible. Could you please add accessible links to them? A readme explaining how to generate them would also be fine. Thank you very much!

Wrong num_train_optimization_steps with gradient accumulation

Hey @vedanuj, it looks like there is a bug in the number of training steps when using gradient accumulation (L343):

num_train_optimization_steps = int(
    train_dataset.num_dataset
    / args.train_batch_size
    / args.gradient_accumulation_steps
) * (args.num_train_epochs - args.start_epoch)

as args.train_batch_size was already divided by args.gradient_accumulation_steps in L288:

args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps

PS. This issue is also present in vilbert_beta (@jiasenlu)
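A small numeric illustration of the discrepancy being reported, with made-up values; whether the first or the second count is the intended one depends on whether the scheduler is stepped once per optimizer update or once per dataloader iteration, which is exactly what the report calls into question:

dataset_size, requested_batch, accum_steps, epochs = 1000, 100, 4, 1

train_batch_size = requested_batch // accum_steps                                  # the L288-style division: 100 -> 25
steps_as_computed = int(dataset_size / train_batch_size / accum_steps) * epochs    # the L343-style formula: 10
iterations_per_epoch = (dataset_size // train_batch_size) * epochs                 # dataloader iterations: 40

print(steps_as_computed, iterations_per_epoch)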

requirements.txt update

1. pytorch-transformers==1.0.0 does not provide an implementation of the RoBERTa model; support starts from 1.1.0. With pytorch-transformers==1.0.0, I get errors when running the training script.
2. tensorboardX==1.2, tensorflow==1.15.2, tensorpack==0.9.4: for some reason, I get the following error when running the scripts. I am wondering if there are any conflicts:
Traceback (most recent call last):
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py", line 48, in <module>
    main(ptvsdArgs)
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mou/Desktop/evqa/vilbert-multi-task/train_evqa.py", line 33, in <module>
    from vilbert.task_utils import (
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/task_utils.py", line 20, in <module>
    from vilbert.datasets import DatasetMapTrain, DatasetMapEval
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/datasets/__init__.py", line 6, in <module>
    from .concept_cap_dataset import (
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/datasets/concept_cap_dataset.py", line 15, in <module>
    import tensorpack.dataflow as td
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorpack/__init__.py", line 5, in <module>
    from tensorpack.libinfo import __version__, __git_version__, _HAS_TF
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorpack/libinfo.py", line 53, in <module>
    import tensorflow as tf  # noqa
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow/__init__.py", line 99, in <module>
    from tensorflow_core import *
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/__init__.py", line 36, in <module>
    from tensorflow._api.v1 import compat
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/_api/v1/compat/__init__.py", line 24, in <module>
    from tensorflow._api.v1.compat import v2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/_api/v1/compat/v2/__init__.py", line 322, in <module>
    from tensorboard.summary._tf import summary
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/summary/__init__.py", line 25, in <module>
    from tensorboard.summary import v1
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/summary/v1.py", line 24, in <module>
    from tensorboard.plugins.audio import summary as _audio_summary
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/plugins/audio/summary.py", line 36, in <module>
    from tensorboard.plugins.audio import metadata
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/plugins/audio/metadata.py", line 21, in <module>
    from tensorboard.compat.proto import summary_pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/summary_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 23, in <module>
    serialized_pb=_b('\n+tensorboard/compat/proto/tensor_shape.proto\x12\x0btensorboard\"{\n\x10TensorShapeProto\x12.\n\x03\x64im\x18\x02 \x03(\x0b\x32!.tensorboard.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tBq\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01Z=github.com/tensorflow/tensorflow/tensorflow/go/core/framework\xf8\x01\x01\x62\x06proto3')
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 878, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard/compat/proto/tensor_shape.proto":
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.unknown_rank: "tensorboard.TensorShapeProto.unknown_rank" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.size: "tensorboard.TensorShapeProto.Dim.size" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.name: "tensorboard.TensorShapeProto.Dim.name" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim: "tensorboard.TensorShapeProto.Dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto: "tensorboard.TensorShapeProto" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.Dim" seems to be defined in "tensorboardX/src/tensor_shape.proto", which is not imported by "tensorboard/compat/proto/tensor_shape.proto". To use it here, please add the necessary import.

extract_features failed to run

Hi, when I run the extract_features script I get an error (screenshot not reproduced here). Could there be a problem with detectron_model.pth or detectron_config.yaml?

How are the images embedded?

The paper does not really go into detail on how the embeddings are initialized. From my understanding, in a text embedding we assign each token an id (generally the row number), and each row has a corresponding vector for the token. How does this work for images? Does each row represent some region in the image?

I saw from the code:

class BertImageEmbeddings(nn.Module):
    """Construct the embeddings from image, spatial location (omit now) and token_type embeddings.
    """

    def __init__(self, config):
        super(BertImageEmbeddings, self).__init__()
        self.image_embeddings = nn.Linear(config.v_feature_size, config.v_hidden_size)
        self.image_location_embeddings = nn.Linear(5, config.v_hidden_size)
        self.LayerNorm = BertLayerNorm(config.v_hidden_size, eps=1e-12)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids, input_loc):
        img_embeddings = self.image_embeddings(input_ids)
        loc_embeddings = self.image_location_embeddings(input_loc)
        # TODO: we want to make the padding_idx == 0, however, with custom initilization, it seems it will have a bias.
        # Let's do masking for now
        embeddings = self.LayerNorm(img_embeddings + loc_embeddings)
        # embeddings = self.LayerNorm(img_embeddings+loc_embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

Looking at one of the configs, bert_base_8layer_8conect.json, it looks like the visual embedding layer has shape (2048, 1024), since v_feature_size is 2048 and v_hidden_size is 1024.

In the bert model:

v_embedding_output = self.v_embeddings(input_imgs, image_loc)

It looks like we pass the extracted features (highest-probability boxes from maskrcnn) and their locations to the embedding, as seen here.

But I don't understand what it's actually doing. If we pass in a list of features, how is it looking them up? Okay, it isn't exactly the same as an embedding layer: this is just a linear layer that multiplies the input by a weight matrix and sums the scores. In this "embedding", is the linear layer mapping similar features to similar outputs? For example, if there are similar input features (like eyes on dogs), the projected points should be close in the output feature space, whereas features from a horse's teeth would project to a point far from the eyes (with regard to the task, obviously).
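A minimal standalone sketch of what that layer does, with made-up tensors (this is not the repo's code path, and the meaning of the 5 location values is an assumption):

import torch
import torch.nn as nn

num_regions, v_feature_size, v_hidden_size = 36, 2048, 1024
region_feats = torch.randn(1, num_regions, v_feature_size)   # detector features, one row per region
region_locs = torch.rand(1, num_regions, 5)                   # e.g. normalized box coords plus area

image_proj = nn.Linear(v_feature_size, v_hidden_size)
loc_proj = nn.Linear(5, v_hidden_size)

# There is no lookup table: each region's feature vector is linearly projected
# into the transformer's hidden space, so similar detector features land near
# each other simply because the projection is a fixed linear map.
v_embeddings = image_proj(region_feats) + loc_proj(region_locs)
print(v_embeddings.shape)   # torch.Size([1, 36, 1024])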

msgpack.exceptions.ExtraData: unpack(b) received extra data.

When running preprocess_sequential_train_segment.py, I got this error. Any ideas?

Process _Worker-1:
Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/anaconda3//lib/python3.7/site-packages/tensorpack/dataflow/parallel.py", line 285, in run
    for dp in self.ds:
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/dataflow/common.py", line 297, in __iter__
    ret = self.func(copy(dp))  # shallow copy the list
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/dataflow/serialize.py", line 84, in <lambda>
    return MapData(df, lambda dp: loads(dp[1]))
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/utils/serialize.py", line 43, in loads_msgpack
    max_str_len=MAX_MSGPACK_LEN)
  File "/home/anaconda3/lib/python3.7/site-packages/msgpack_numpy.py", line 255, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Referred Object Comprehension code (tools/refer) is in python 2

Hello,

I am trying to reproduce train and test performance using this repository; however, one of the external packages in tools/refer seems to be written in Python 2.

Am I looking at the wrong version, or is there any workaround apart from converting the print statements (which seem to be the culprit for now)?

Please let me know.

Clarification of features ("fc6") and bbox from extract_features.py

Hi,

I intend to extract features from a simulated environment. Object detection models like Detectron and YOLOv3 perform poorly on my dataset, so I plan to use metadata from the simulator to obtain the object detection boxes. For this, I wanted to clarify my understanding of features and bbox.

My understanding: bbox in "extract_features.py" (bbox=out["proposals"]) holds the object detection model's output box coordinates in the format (xmin, ymin, xmax, ymax), i.e. the corner coordinates of each box, with n such boxes per image (n=100); features holds the ResNet101 pre-trained model's output for the regions inside those boxes, with n such feature vectors per image (n=100, the same as the number of boxes). Is that correct?

Thank you in anticipation!!
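A minimal packaging sketch for that understanding, using hypothetical helper and key names (the dict layout mirrors the info dicts discussed later on this page for Detectron2 and is an assumption, not a verified schema):

import numpy as np

def pack_simulator_regions(image_id, width, height, boxes_xyxy, region_feats):
    # boxes_xyxy: (n, 4) array of (xmin, ymin, xmax, ymax) pixel coordinates
    # region_feats: (n, 2048) array of per-region feature vectors
    boxes_xyxy = np.asarray(boxes_xyxy, dtype=np.float32)
    region_feats = np.asarray(region_feats, dtype=np.float32)
    assert boxes_xyxy.shape[0] == region_feats.shape[0]
    return {
        "image_id": image_id,
        "image_width": width,
        "image_height": height,
        "num_boxes": boxes_xyxy.shape[0],
        "bbox": boxes_xyxy,
        "features": region_feats,
    }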

The use of python-prctl

Hello,

Could you please tell me where python-prctl is used? I need to run the code on a machine where I don't have sudo access, and most ways to install python-prctl properly seem to depend on sudo (it needs build-essential and libcap-dev; other ways of installing also seem to end up needing sudo at some point). I was able to install it on my laptop, but that machine doesn't have a GPU, so I run into other issues there.

Best regards

Details about object representation for GuessWhat?!

Hello,

I was checking your code and paper and I was a little bit confused by your approach to GuessWhat?!. I can see from your dataset reader that you compute Intersection over Union in order to find a match between the gold bounding boxes and the ones predicted by FastRCNN (https://github.com/facebookresearch/vilbert-multi-task/blob/master/vilbert/datasets/guesswhat_pointing_dataset.py#L276). However, it seems that you concatenate the two bounding boxes later on. How does it work exactly? Could you please provide more details and the rationale behind this approach? Unfortunately, I can't find an explanation for it either in the code or in the paper.

Thank you in advance for your help.

Usage of co_attention_mask

I'm trying to extract representations from pre-trained VILBert, by building a method within class VILBertForVLTasks that returns sequence_output_t, sequence_output_v, pooled_output_t, pooled_output_v (line 1652 of vilbert/vilbert.py). I want the text representations (sequence_output_t) to be independent of the visual input, which I figured I would need to do using the co_attention_mask input.

The default co_attention_mask is None; do I need to set it to a tensor of ones of the appropriate size in order to mask the text and visual inputs from each other? I tried following the usage of co_attention_mask in BertModel, BertEncoder, etc., but it's not clear exactly where this masking is applied, or where the variable use_co_attention_mask is set to True.

Json concap

What does the json file of captions look like?

How to use VilBert pretrained for Caption-Based Image Retrieval

I would like to know if the pre-trained model given by this link (https://dl.fbaipublicfiles.com/vilbert-multi-task/pretrained_model.bin) can be used for Caption-Based Image Retrieval.

My first guess is that I can load the model using (not sure if the configuration file is the proper one):

config = BertConfig.from_json_file('config/bert_base_6layer_6conect.json')
model = VILBertForVLTasks.from_pretrained('pretrained_model.bin', config=config) 

Afterwards, digging into the code, I saw that running the inner bert model should give the sequence outputs for the text and for the image.

I have several questions:

  • As explained on page 4 of the paper, how can I extract the outputs hIMG and hCLS from these sequences? Can I assume they are the first element of each corresponding sequence?
  • Since the training aims to predict whether these two representations are aligned, can we expect the image embedding (hIMG) and the text embedding (hCLS) to have a large cosine similarity (or be related under some other distance metric)?
  • Would the model fail if no text or no image is provided? I would like to use it to extract one feature or the other without providing both inputs.
  • Does the model expect a complete input image and handle the object detection internally? Or does it expect the meaningful regions to be extracted as a preprocessing step? If detection is expected to happen inside the model, what is the image_loc parameter supposed to be?

I hope I made myself clear

Thank you very much

Apex version

Hi, I have problems when using fp16 training with apex, which is caused by API changes in apex. Can you provide the specific git commit of apex and the build command you used?

Use the "extract_features" script with Detectron2?

NOTE: I have made some changes towards the bottom; could someone take a look and let me know if it looks about right?

I have been trying to get this to work with detectron2 as I have a fine tuned model on some custom data.

In particular, I do not know how to implement this portion using detectron2:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="fc6", conf_thresh=0
    ):
        batch_size = len(output[0]["proposals"])
        n_boxes_per_image = [len(boxes) for boxes in output[0]["proposals"]]
        score_list = output[0]["scores"].split(n_boxes_per_image)
        score_list = [torch.nn.functional.softmax(x, -1) for x in score_list]
        feats = output[0][feature_name].split(n_boxes_per_image)
        cur_device = score_list[0].device

I have tried implementing part of it, but I am stuck on what the scores are. What do they represent? Is it the full softmax vector?

this is what I have done so far:

images = ImageList.from_tensors(lst[:1], size_divisibility=32).to("cuda")  # preprocessed input tensor
#setup config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pnumonia)
#Just run these lines if you have the trained model im memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
#build model
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()#make sure its in eval mode

#run model
features = model.backbone(images.tensor.float())
proposals, _ = model.proposal_generator(images, features)
instances = model.roi_heads._forward_box(features, proposals)
mask_features = [features[f] for f in model.roi_heads.in_features]
mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
###########
batch_size = len(proposals)
n_boxes_per_image = [len(boxes) for boxes in proposals]

EDIT

I have changed the extract-features methods to run using the detectron2 model. I believe this is correct; could anyone take a quick look at it?
_process_feature_extraction:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="p6", conf_thresh=0
    ):
        feat_list = []
        info_list = []
        batch_size = len(output['instances'])
        #print(batch_size)
        for i in range(batch_size):
            feat_list.append(output['features'][feature_name][i])
            info_list.append(
                    {
                        "bbox": output['instances'][i].pred_boxes.to('cpu').tensor.numpy() / im_scales[i],
                        "num_boxes": len(output['instances'][i]),
                        "objects": output['instances'][i].pred_classes.to('cpu').numpy(),
                        "image_width": im_infos[i]["width"],
                        "image_height": im_infos[i]["height"],
                        "cls_prob": output['instances'][i].scores.to('cpu').numpy(),
                    }
                )

        return feat_list, info_list


get_detectron_features:

    def get_detectron_features(self, image_paths):
        img_tensor, im_scales, im_infos = [], [], []

        for image_path in image_paths:
            im, im_scale, im_info = self._image_transform(image_path)
            img_tensor.append(im)
            im_scales.append(im_scale)
            im_infos.append(im_info)

        # Image dimensions should be divisible by 32, to allow convolutions
        # in detector to work
        current_img_list = ImageList.from_tensors(img_tensor, size_divisibility=32)
        current_img_list = current_img_list.to("cuda")
        #print(current_img_list.tensor)
        #print(np.shape(current_img_list.tensor))

        with torch.no_grad():
            #run model
            features = self.detection_model.backbone(current_img_list.tensor)#outputs features
            proposals, _ = self.detection_model.proposal_generator(current_img_list, features)
            instances, scores = self.detection_model.roi_heads._forward_box(features, proposals)
            mask_features = [features[f] for f in self.detection_model.roi_heads.in_features]
            mask_features = self.detection_model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
            output = {'features':features, 'proposals':proposals, 'instances':instances, 'mask_features': mask_features}

        feat_list = self._process_feature_extraction(
            output,
            im_scales,
            im_infos,
            self.feature_name,
            self.confidence_threshold,
        )

        return feat_list

_build_detection_model:

    def _build_detection_model(self):
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
        #cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
        cfg.SOLVER.IMS_PER_BATCH = 1
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pnumonia)
        #Just run these lines if you have the trained model im memory
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
        model = build_model(cfg)
        DetectionCheckpointer(model).load("output/model_final.pth")
        cfg.freeze()

        model.to("cuda")
        model.eval()
        return model

inference for image retrieval ?

My question is: how do I do inference for caption-based image retrieval, given a set of images and a question/caption? Can you provide an example in demo.ipynb?

i want to fine tune for VCR task

Hi!
I want to fine-tune for the VCR task.
I have to use the Conceptual Captions pre-trained 6-layer model for the 'from_pretrained' parameter, right?

Thank you!

Where is the '<IMG>' token ?

hi

I analyzed your code, but I can't see the IMG token anywhere.

Where is the IMG token? I would be grateful if you could tell me where to put it.

thank you:)

No caption_train.json file

I have followed the data instructions to prepare the data. However, I am not able to find the caption_train.json file. Can you please tell me where I can find it?

Issue with generating lmdb

Running script/convert_to_lmdb.py stops at around 90% with no error, but apparently it hasn't finished, since running train_tasks.py then gives TypeError: a bytes-like object is required, not 'NoneType' here. Does anyone know what the problem could be? Thanks~

errors when running fine tuning cmd

I downloaded your dataset and ran:
python train_tasks.py --bert_model bert-base-uncased --from_pretrained ./multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

It gives this error:
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 529, in main
    task_losses,
  File "/dccstor/extrastore/vilbert-multi-task/vilbert/task_utils.py", line 197, in ForwardModelsTrain
    batch
ValueError: too many values to unpack (expected 9)

Non-existent config key error?

Hi, I got KeyError: 'Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS' when running the extract_features.py file. I was wondering if you know the possible reason (I downloaded the model and config file using the given links)? Thanks for the work, btw.

Caught StopIteration in replica 0 on device 0.

When I use multiple GPUs, an error happens; the details are below. How can I solve this problem?

  File "vilbert-multi-task/train_cls.py", line 535, in <module>
    main()
  File "vilbert-multi-task/train_cls.py", line 407, in main
    task_losses,
  File "..../vilbert-multi-task/vilbert/task_utils.py", line 327, in ForwardModelsTrain
    task_tokens,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "..../vilbert-multi-task/vilbert/vilbert.py", line 1662, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "...../vilbert-multi-task/vilbert/vilbert.py", line 1351, in forward
    dtype=next(self.parameters()).dtype
StopIteration

Training Loop Issue: ForwardModelsTrain()

Doesn't anybody find the training loop problematic or twisted?
In ForwardModelsTrain(), there is a loop over the dataloader to use the batches one after another. But there is a "return loss, batch_score" within the loop, meaning that the function only uses the first batch it pulls before exiting. After the loss and score of that batch are returned, the parent function does the backpropagation and a few other tasks, as it is supposed to.

My question is: should the loop over the dataloader be moved outside ForwardModelsTrain(), replacing the line for step in range(median_num_iter): ?
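A standalone toy (not the repo's code) showing why the answer hinges on what ForwardModelsTrain is handed, a dataloader object or a persistent iterator:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
data = [torch.randn(8, 4) for _ in range(3)]   # stand-in for three batches

def forward_once(source):
    # mirrors the pattern above: "return" inside the loop, so each call
    # consumes exactly one batch from source and then exits
    for batch in source:
        return model(batch).pow(2).mean()

# Case 1: passing the batch container itself -- a fresh iterator is created on
# every call, so every call recomputes the loss on the *first* batch.
losses_fresh = [forward_once(data) for _ in range(3)]

# Case 2: passing a persistent iterator -- successive calls advance through
# the batches, so all of them get used even though the loop sits inside.
it = iter(data)
losses_persistent = [forward_once(it) for _ in range(3)]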
