
vilbert-multi-task's Introduction

12-in-1: Multi-Task Vision and Language Representation Learning

Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning:

@InProceedings{Lu_2020_CVPR,
author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
title = {12-in-1: Multi-Task Vision and Language Representation Learning},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks:

@inproceedings{lu2019vilbert,
  title={Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={13--23},
  year={2019}
}

Repository Setup

  1. Create a fresh conda environment, and install all dependencies.
conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone --recursive https://github.com/facebookresearch/vilbert-multi-task.git
cd vilbert-multi-task
pip install -r requirements.txt
  2. Install PyTorch
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  3. Install apex, following https://github.com/NVIDIA/apex

  4. Install this codebase as a package in this environment.

python setup.py develop

Data Setup

Check README.md under data for more details.

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --objective 1 --file_path <path_to_extracted_cc_features>

Download link

Multi-task Training

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <pretrained_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

Download link

Fine-tune from Multi-task trained model

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <multi_task_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

License

vilbert-multi-task is licensed under the MIT license, available in the LICENSE file.

vilbert-multi-task's People

Contributors

arjunmajum, dependabot[bot], jiasenlu, kdexd, vedanuj


vilbert-multi-task's Issues

Hardware requirements: CUDA out of memory error while training

Hello,

Following the multi-task training instructions, I used python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model_0 in order to train the base model for the VQA task.

I am using a single GPU with 4 GB of memory (the nvidia-smi screenshot of its characteristics is not reproduced here).

When I start training, I am running into the following issue: RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 3.94 GiB total capacity; 3.28 GiB already allocated; 48.31 MiB free; 3.39 GiB reserved in total by PyTorch). I also tried to reduce the batch_size for TASK1 in the vilbert_tasks.yml file to use less memory, but the error persists.

Could anyone provide the minimum hardware requirements to train the model for a specific task (e.g., VQA)?
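A minimal mitigation sketch, assuming the TASK1 entry in vilbert_tasks.yml exposes a batch_size key (as the attempt above suggests) and that the "16-bits training" option visible in the logs corresponds to an fp16 flag on the script (the flag name is an assumption):

TASK1:
  batch_size: 16

python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --task_specific_tokens --fp16 --save_name multi_task_model_0

Even with these changes, the model weights, optimizer state, and activations together may simply not fit in 4 GB, so a GPU with substantially more memory may be the realistic requirement.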

Here is a full description of the error:

07/17/2020 08:39:34 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/aloui/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
07/17/2020 08:39:34 - INFO - vilbert.task_utils -   Loading VQA Dataset with batch size 128
07/17/2020 08:39:34 - INFO - vilbert.datasets.vqa_dataset -   Loading from datasets/VQA/cache/VQA_trainval_23_cleaned.pkl
07/17/2020 08:41:46 - INFO - vilbert.datasets.vqa_dataset -   Loading from datasets/VQA/cache/VQA_minval_23_cleaned.pkl
07/17/2020 08:41:46 - INFO - vilbert.utils -   logging file at: save/VQA_bert_base_6layer_6conect-multi_task_model_0/logs
07/17/2020 08:41:46 - INFO - vilbert.utils -   loading weights file save/multi_task_model.bin
559 559
***** Running training *****
  Num Iters:  {'TASK1': 4236}
  Batch size:  {'TASK1': 128}
  Num steps: 84720
Epoch:   0%|                                                |0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_tasks.py", line 673, in <module>
    main()
  File "train_tasks.py", line 533, in main
    task_losses,
  File "/aloui/vilbert-multi-task/vilbert/task_utils.py", line 321, in ForwardModelsTrain
    task_tokens,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1662, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1387, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 1065, in forward
    use_co_attention_mask,
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 895, in forward
    layer_output1 = self.v_output(intermediate_output1, attention_output1)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/aloui/vilbert-multi-task/vilbert/vilbert.py", line 676, in forward
    hidden_states = self.dropout(hidden_states)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/modules/dropout.py", line 54, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "/home/aloui/miniconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/nn/functional.py", line 807, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 3.94 GiB total capacity; 3.28 GiB already allocated; 46.81 MiB free; 3.39 GiB reserved in total by PyTorch)

Citation Year

I think I found a minor typo in the citation date.
For "12-in-1: Multi-Task Vision and Language Representation Learning", Google Scholar says the year is 2020; however, the GitHub page says 2019.

Thanks!

problem with .lmdb file for pretrain

The code that generates the .lmdb file only writes the keys
["image_id", "image_h", "image_w", "num_boxes", "boxes", "features"],
but the pretraining code expects the keys
["image_feature_wp", "image_target_wp", "image_location_wp", "num_boxes", "image_h", "image_w", "image_id", "caption"].
Can anyone help?
Thanks a lot
@vedanuj
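A minimal bridging sketch (not the repo's own converter), assuming hypothetical names: record is one entry written by the feature-extraction step with the first set of keys, and captions is a dict you build yourself mapping image_id to its caption; the placeholder class-probability shape is also an assumption.

import numpy as np

def to_pretrain_item(record, captions):
    # Rename/augment the extraction keys into the keys the pretraining reader expects.
    num_boxes = int(record["num_boxes"])
    return {
        "image_id": record["image_id"],
        "image_h": record["image_h"],
        "image_w": record["image_w"],
        "num_boxes": num_boxes,
        "image_feature_wp": np.asarray(record["features"], dtype=np.float32),
        "image_location_wp": np.asarray(record["boxes"], dtype=np.float32),
        # no per-box class distributions were saved, so use a zero placeholder
        # (the 1601-way shape is a guess based on common Visual Genome detectors)
        "image_target_wp": np.zeros((num_boxes, 1601), dtype=np.float32),
        "caption": captions[record["image_id"]],
    }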

Provided Vilbert model Not working with the data

Hi there,
I tried to run a demo using the code from this repo.
I prepared the data following the instructions in the data folder,
i.e. I downloaded datasets.tar.gz and the resnext152 COCO features (via wget https://dl.fbaipublicfiles.com/vilbert-multi-task/datasets/coco/features_100/COCO_trainval_resnext152_faster_rcnn_genome.lmdb/data.mdb && mv data.mdb COCO_trainval_resnext152_faster_rcnn_genome.lmdb/).

I tried to evaluate the VQA performance using
CUDA_VISIBLE_DEVICES=1 python eval_tasks.py --bert_model bert-base-uncased --from_pretrained multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --task_specific_tokens

The multi_task_model.bin was downloaded from
https://dl.fbaipublicfiles.com/vilbert-multi-task/multi_task_model.bin

But I got an error (screenshot not reproduced here).

I am wondering if someone else is facing the same issue, or if there is something wrong with my setup.

Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS

Hi,

Following the visual feature extraction instructions, I'm running into the following issue:

Traceback (most recent call last):
  File "../vilbert-multi-task/script/extract_features.py", line 233, in <module>
    feature_extractor = FeatureExtractor()
  File "../vilbert-multi-task/script/extract_features.py", line 30, in __init__
    self.detection_model = self._build_detection_model()
  File "../vilbert-multi-task/script/extract_features.py", line 76, in _build_detection_model
    cfg.merge_from_file(self.args.config_file)
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 213, in merge_from_file
    self.merge_from_other_cfg(cfg)
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/home/gamaga/anaconda3/envs/multimodalqa/lib/python3.8/site-packages/yacs/config.py", line 473, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS'

From this issue, it seems that the config provided here might not be consistent with the current maskrcnn-benchmark code. Is there any particular version I should install?

Trying to run demo

I am trying to run the demo file, but I cannot find the config file (save/resnext_models/e2e_faster_rcnn_X-152-32x8d-FPN_1x_MLP_2048_FPN_512_train.yaml) or the model file (save/resnext_models/model_final.pth). Can anyone help me? Thank you so much.

You need to install libcap development headers to build this module

Any idea on how to install libcap? Thanks

    ERROR: Command errored out with exit status 1:
     command: /h/johnchen/anaconda3/envs/vilbert-mt/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-eyi3jvqh/python-prctl/setup.py'"'"'; __file__='"'"'/tmp/pip-install-eyi3jvqh/python-prctl/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-eyi3jvqh/python-prctl/pip-egg-info
         cwd: /tmp/pip-install-eyi3jvqh/python-prctl/
    Complete output (1 lines):
    You need to install libcap development headers to build this module
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
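A minimal sketch of the usual fix, assuming a Debian/Ubuntu host where you (or an admin) can install system packages; the python-prctl dependency that fails above needs the libcap development headers:

sudo apt-get install build-essential libcap-dev
pip install python-prctl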

NVCC Error while installing VQA-maskrcnn-benchmark

I am trying to install vqa-maskrcnn-benchmark with the following command:

git clone https://gitlab.com/vedanuj/vqa-maskrcnn-benchmark.git
cd vqa-maskrcnn-benchmark
python setup.py build develop

I got an error saying:
unable to execute '/usr/local/cuda/bin/nvcc': No such file or directory
error: command '/usr/local/cuda/bin/nvcc' failed with exit status 1
Can anyone please suggest how to fix this?
Thank you! :)
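A minimal sketch of the usual cause, assuming the CUDA toolkit is actually installed but not on the expected path (adjust the version and location to your system):

export CUDA_HOME=/usr/local/cuda-10.0
export PATH=$CUDA_HOME/bin:$PATH
python setup.py build develop

If no CUDA toolkit with nvcc is installed at all, it has to be installed first; the conda cudatoolkit package alone does not ship nvcc.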

i got "ConnectionResetError"

Hi,
I ran this code for the VCR task (training) and got an error like this:

(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/13/2020 13:52:59 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/13/2020 13:53:00 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/13/2020 13:53:00 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 16
03/13/2020 13:53:11 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 16
03/13/2020 13:53:25 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/13/2020 13:53:25 - INFO - vilbert.utils -   loading weights file save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights of VILBertForVLTasks not initialized from pretrained model: ['bert.embeddings.task_embeddings.weight', 'vil_prediction.logit_fc.0.weight', 'vil_prediction.logit_fc.0.bias', 'vil_prediction.logit_fc.2.weight', 'vil_prediction.logit_fc.2.bias', 'vil_prediction.logit_fc.3.weight', 'vil_prediction.logit_fc.3.bias', 'vil_prediction_gqa.logit_fc.0.weight', 'vil_prediction_gqa.logit_fc.0.bias', 'vil_prediction_gqa.logit_fc.2.weight', 'vil_prediction_gqa.logit_fc.2.bias', 'vil_prediction_gqa.logit_fc.3.weight', 'vil_prediction_gqa.logit_fc.3.bias', 'vil_binary_prediction.logit_fc.0.weight', 'vil_binary_prediction.logit_fc.0.bias', 'vil_binary_prediction.logit_fc.2.weight', 'vil_binary_prediction.logit_fc.2.bias', 'vil_binary_prediction.logit_fc.3.weight', 'vil_binary_prediction.logit_fc.3.bias', 'vil_tri_prediction.weight', 'vil_tri_prediction.bias']
03/13/2020 13:53:29 - INFO - vilbert.utils -   Weights from pretrained model not used in VILBertForVLTasks: ['vil_prediction.main.0.bias', 'vil_prediction.main.0.weight_g', 'vil_prediction.main.0.weight_v', 'vil_prediction.main.3.bias', 'vil_prediction.main.3.weight_g', 'vil_prediction.main.3.weight_v']
559 559
***** Running training *****
  Num Iters:  {'TASK5': 2039, 'TASK6': 2039}
  Batch size:  {'TASK5': 16, 'TASK6': 16}
  Num steps: 40780
Epoch:   0%|                                             | 0/20 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 529, in main
    task_losses,
  File "/home/ailab/vilbert-multi-task/vilbert/task_utils.py", line 200, in ForwardModelsTrain
    if task_cfg[task_id]["process"] in ["dialog"]:
KeyError: 'process'
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/utils/data/_utils/pin_memory.py", line 21, in _pin_memory_loop
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 276, in rebuild_storage_fd
    fd = df.detach()
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
    response = connection.recv_bytes(256)        # reject large message
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ailab/anaconda3/envs/vilbert-mt/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

What should I do? Help T^T

msgpack.exceptions.ExtraData: unpack(b) received extra data.

Hi,

When I tried to run

python train_concap.py --bert_model bert-base-uncased \
                       --config_file config/bert_base_6layer_6conect.json \
                       --train_batch_size 512 --objective 1 \
                       --file_path ../cc_dataset/dataset \
                       --output_dir ../vilbert_files/models

I got the error message below:

Process _Worker-1:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/parallel.py", line 169, in run
    for dp in self.ds:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 641, in __iter__
    for dp in self._inf_iter:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 370, in __iter__
    for dp in self.ds:
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/common.py", line 297, in __iter__
    ret = self.func(copy(dp))  # shallow copy the list
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/dataflow/serialize.py", line 84, in <lambda>
    return MapData(df, lambda dp: loads(dp[1]))
  File "/usr/local/python3/lib/python3.6/site-packages/tensorpack/utils/serialize.py", line 43, in loads_msgpack
    max_str_len=MAX_MSGPACK_LEN)
  File "/usr/local/python3/lib/python3.6/site-packages/msgpack_numpy.py", line 255, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 208, in msgpack._unpacker.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Do you have any idea what is happening?
Thanks!

i had "AttributeError: 'NoneType' object has no attribute 'named_parameters'" error!

Hi,
I want to train on the VCR dataset.

I got the error below. How do I fix this? Help T^T

(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/12/2020 23:24:28 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/12/2020 23:24:29 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/12/2020 23:24:29 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 64
03/12/2020 23:24:40 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 64
03/12/2020 23:24:53 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/12/2020 23:24:53 - ERROR - vilbert.utils -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 398, in main
    for key, value in dict(model.named_parameters()).items():
AttributeError: 'NoneType' object has no attribute 'named_parameters'
(vilbert-mt) ailab@ailab:~/vilbert-multi-task$ python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --tasks 5-6 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model
Failed to import tensorflow.
03/12/2020 23:34:06 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
03/12/2020 23:34:08 - INFO - pytorch_transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.cache/torch/pytorch_transformers/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/12/2020 23:34:08 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 64
03/12/2020 23:34:19 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 64
03/12/2020 23:34:34 - INFO - vilbert.utils -   logging file at: save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-multi_task_model/logs
03/12/2020 23:34:34 - ERROR - vilbert.utils -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, roberta-base, roberta-large, roberta-large-mnli). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 398, in main
    for key, value in dict(model.named_parameters()).items():
AttributeError: 'NoneType' object has no attribute 'named_parameters'

ModuleNotFoundError: No module named 'tools.refer.refer'

!python train_tasks.py --bert_model bert-base-uncased --from_pretrained data/pretrained_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

output:
2020-07-21 14:05:52.998850: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "train_tasks.py", line 33, in <module>
    from vilbert.task_utils import (
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/task_utils.py", line 19, in <module>
    from vilbert.datasets import DatasetMapTrain, DatasetMapEval
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/datasets/__init__.py", line 15, in <module>
    from .refer_expression_dataset import ReferExpressionDataset
  File "/content/drive/My Drive/vilbert-multi-task-master/vilbert/datasets/refer_expression_dataset.py", line 16, in <module>
    from tools.refer.refer import REFER
ModuleNotFoundError: No module named 'tools.refer.refer'

'datasets' folder?

For the whole datasets folder, is there a script to generate it, or are we supposed to create it ourselves? Also, where is the script that generates the several .pkl files that are supposed to be in the datasets folder, and where is the info about the cleaned dataset? Thanks.

Transfer learning with vilbert

From my understanding, we get visuo-linguistic embeddings using VilBert (and LXMERT and VL-Bert for that matter too). Is it possible to simply use these as a layer/feature extractor backbone for visual/linguistic tasks? For instance, if we wanted to add a linear classifier (or LSTM) on top of the VilBert embeddings, are there any provided pretrained weights?

Thanks
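A generic sketch of that pattern (freeze a pretrained joint encoder, train only a small head on top); treating VILBertForVLTasks as the backbone this way is an assumption rather than a documented API, so the backbone below is just any module that returns a pooled embedding:

import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    def __init__(self, backbone, embed_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, *inputs):
        with torch.no_grad():
            pooled = self.backbone(*inputs)  # pooled visiolinguistic embedding
        return self.head(pooled)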

Could you please provide access to the required data files?

Hi! Thank you for releasing this great project! However, I notice that the data files (including the lmdb feature files as well as other metadata) needed to run pre-training and multi-task fine-tuning are not accessible. Could you please add accessible links to them? A readme explaining how to generate them would also be fine. Thank you very much!

Wrong num_train_optimization_steps with gradient accumulation

Hey @vedanuj, it looks like there is a bug in the number of training steps when using gradient accumulation (L343):

num_train_optimization_steps = int(
    train_dataset.num_dataset
    / args.train_batch_size
    / args.gradient_accumulation_steps
) * (args.num_train_epochs - args.start_epoch)

as args.train_batch_size was already divided by args.gradient_accumulation_steps in L288:

args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps

PS. This issue is also present in vilbert_beta (@jiasenlu)
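A small numeric illustration of the discrepancy being reported, with made-up values; whether the first or the second count is the intended one depends on whether the scheduler is stepped once per optimizer update or once per dataloader iteration, which is exactly what the report calls into question:

dataset_size, requested_batch, accum_steps, epochs = 1000, 100, 4, 1

train_batch_size = requested_batch // accum_steps                                  # the L288-style division: 100 -> 25
steps_as_computed = int(dataset_size / train_batch_size / accum_steps) * epochs    # the L343-style formula: 10
iterations_per_epoch = (dataset_size // train_batch_size) * epochs                 # dataloader iterations: 40

print(steps_as_computed, iterations_per_epoch)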

requirements.txt update

1. pytorch-transformers==1.0.0 does not provide an implementation of the RoBERTa model; support starts from 1.1.0. With pytorch-transformers==1.0.0, I get errors when running the training script.
2. tensorboardX==1.2, tensorflow==1.15.2, tensorpack==0.9.4: for some reason, I get the following error when running the scripts. I am wondering if there are any conflicts:
Traceback (most recent call last):
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/ptvsd_launcher.py", line 48, in <module>
    main(ptvsdArgs)
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/mou/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/old_ptvsd/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/mou/Desktop/evqa/vilbert-multi-task/train_evqa.py", line 33, in <module>
    from vilbert.task_utils import (
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/task_utils.py", line 20, in <module>
    from vilbert.datasets import DatasetMapTrain, DatasetMapEval
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/datasets/__init__.py", line 6, in <module>
    from .concept_cap_dataset import (
  File "/home/mou/Desktop/evqa/vilbert-multi-task/vilbert/datasets/concept_cap_dataset.py", line 15, in <module>
    import tensorpack.dataflow as td
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorpack/__init__.py", line 5, in <module>
    from tensorpack.libinfo import __version__, __git_version__, _HAS_TF
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorpack/libinfo.py", line 53, in <module>
    import tensorflow as tf  # noqa
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow/__init__.py", line 99, in <module>
    from tensorflow_core import *
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/__init__.py", line 36, in <module>
    from tensorflow._api.v1 import compat
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/_api/v1/compat/__init__.py", line 24, in <module>
    from tensorflow._api.v1.compat import v2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorflow_core/_api/v1/compat/v2/__init__.py", line 322, in <module>
    from tensorboard.summary._tf import summary
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/summary/__init__.py", line 25, in <module>
    from tensorboard.summary import v1
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/summary/v1.py", line 24, in <module>
    from tensorboard.plugins.audio import summary as _audio_summary
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/plugins/audio/summary.py", line 36, in <module>
    from tensorboard.plugins.audio import metadata
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/plugins/audio/metadata.py", line 21, in <module>
    from tensorboard.compat.proto import summary_pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/summary_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/tensor_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import resource_handle_pb2 as tensorboard_dot_compat_dot_proto_dot_resource__handle__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/resource_handle_pb2.py", line 16, in <module>
    from tensorboard.compat.proto import tensor_shape_pb2 as tensorboard_dot_compat_dot_proto_dot_tensor__shape__pb2
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/tensorboard/compat/proto/tensor_shape_pb2.py", line 23, in <module>
    serialized_pb=_b('\n+tensorboard/compat/proto/tensor_shape.proto\x12\x0btensorboard\"{\n\x10TensorShapeProto\x12.\n\x03\x64im\x18\x02 \x03(\x0b\x32!.tensorboard.TensorShapeProto.Dim\x12\x14\n\x0cunknown_rank\x18\x03 \x01(\x08\x1a!\n\x03\x44im\x12\x0c\n\x04size\x18\x01 \x01(\x03\x12\x0c\n\x04name\x18\x02 \x01(\tBq\n\x18org.tensorflow.frameworkB\x11TensorShapeProtosP\x01Z=github.com/tensorflow/tensorflow/tensorflow/go/core/framework\xf8\x01\x01\x62\x06proto3')
  File "/home/mou/miniconda3/envs/evqa/lib/python3.6/site-packages/google/protobuf/descriptor.py", line 878, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "tensorboard/compat/proto/tensor_shape.proto":
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.unknown_rank: "tensorboard.TensorShapeProto.unknown_rank" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.size: "tensorboard.TensorShapeProto.Dim.size" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim.name: "tensorboard.TensorShapeProto.Dim.name" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.Dim: "tensorboard.TensorShapeProto.Dim" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto: "tensorboard.TensorShapeProto" is already defined in file "tensorboardX/src/tensor_shape.proto".
  tensorboard.TensorShapeProto.dim: "tensorboard.TensorShapeProto.Dim" seems to be defined in "tensorboardX/src/tensor_shape.proto", which is not imported by "tensorboard/compat/proto/tensor_shape.proto". To use it here, please add the necessary import.

extract_features failed to run

Hi, when I run the extract_features script I get an error (screenshot not reproduced here). Could there be a problem with detectron_model.pth or detectron_config.yaml?

How are the images embedded?

The paper does not really go into detail on how the embeddings are initialized. From my understanding, in a text embedding we assign each token an id (generally the row number), and each row has a corresponding vector for the token. How does this work for images? Does each row represent some region in the image?

I saw from the code:

class BertImageEmbeddings(nn.Module):
    """Construct the embeddings from image, spatial location (omit now) and token_type embeddings.
    """

    def __init__(self, config):
        super(BertImageEmbeddings, self).__init__()
        self.image_embeddings = nn.Linear(config.v_feature_size, config.v_hidden_size)
        self.image_location_embeddings = nn.Linear(5, config.v_hidden_size)
        self.LayerNorm = BertLayerNorm(config.v_hidden_size, eps=1e-12)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, input_ids, input_loc):
        img_embeddings = self.image_embeddings(input_ids)
        loc_embeddings = self.image_location_embeddings(input_loc)
        # TODO: we want to make the padding_idx == 0, however, with custom initilization, it seems it will have a bias.
        # Let's do masking for now
        embeddings = self.LayerNorm(img_embeddings + loc_embeddings)
        # embeddings = self.LayerNorm(img_embeddings+loc_embeddings)
        embeddings = self.dropout(embeddings)
        return embeddings

Looking at one of the configs, bert_base_8layer_8conect.json, it looks like the visual embedding layer has shape (2048, 1024), since v_feature_size is 2048 and v_hidden_size is 1024.

In the bert model:

v_embedding_output = self.v_embeddings(input_imgs, image_loc)

It looks like we pass the extracted features (highest-probability boxes from maskrcnn) and their locations to the embedding, as seen here.

But I don't understand what it's actually doing. If we pass in a list of features, how is it looking them up? Okay, it isn't exactly the same as an embedding layer: this is just a linear layer that multiplies the input by a weight matrix and sums the scores. In this "embedding", is the linear layer mapping similar features to similar outputs? For example, if there are similar input features (like eyes on dogs), the projected points should be close in the output feature space, whereas features from a horse's teeth would project to a point far from the eyes (with regard to the task, obviously).
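A minimal standalone sketch of what that layer does, with made-up tensors (this is not the repo's code path, and the meaning of the 5 location values is an assumption):

import torch
import torch.nn as nn

num_regions, v_feature_size, v_hidden_size = 36, 2048, 1024
region_feats = torch.randn(1, num_regions, v_feature_size)   # detector features, one row per region
region_locs = torch.rand(1, num_regions, 5)                   # e.g. normalized box coords plus area

image_proj = nn.Linear(v_feature_size, v_hidden_size)
loc_proj = nn.Linear(5, v_hidden_size)

# There is no lookup table: each region's feature vector is linearly projected
# into the transformer's hidden space, so similar detector features land near
# each other simply because the projection is a fixed linear map.
v_embeddings = image_proj(region_feats) + loc_proj(region_locs)
print(v_embeddings.shape)   # torch.Size([1, 36, 1024])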

msgpack.exceptions.ExtraData: unpack(b) received extra data.

When running preprocess_sequential_train_segment.py, I got this error. Any ideas?

Process _Worker-1:
Traceback (most recent call last):
  File "/home/anaconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/anaconda3//lib/python3.7/site-packages/tensorpack/dataflow/parallel.py", line 285, in run
    for dp in self.ds:
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/dataflow/common.py", line 297, in __iter__
    ret = self.func(copy(dp))  # shallow copy the list
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/dataflow/serialize.py", line 84, in <lambda>
    return MapData(df, lambda dp: loads(dp[1]))
  File "/home/anaconda3/lib/python3.7/site-packages/tensorpack/utils/serialize.py", line 43, in loads_msgpack
    max_str_len=MAX_MSGPACK_LEN)
  File "/home/anaconda3/lib/python3.7/site-packages/msgpack_numpy.py", line 255, in unpackb
    return _unpackb(packed, **kwargs)
  File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Referred Object Comprehension code (tools/refer) is in python 2

Hello,

I am trying to reproduce train and test performance using this repository; however, one of the external packages in tools/refer seems to be written in Python 2.

Am I looking at the wrong version, or is there any workaround apart from converting the print statements (which seem to be the culprit for now)?

Please let me know.

Clarification of features ("fc6") and bbox from extract_features.py

Hi,

I intend to extract features from a simulated environment. Object detection models like Detectron and YOLOv3 perform poorly on my dataset, so I plan to use metadata from the simulator to obtain the object detection boxes. For this, I wanted to clarify my understanding of features and bbox.

My understanding: bbox in "extract_features.py" (bbox=out["proposals"]) holds the object detection model's output box coordinates in the format (xmin, ymin, xmax, ymax), i.e. the corner coordinates of each box, with n such boxes per image (n=100); features holds the ResNet101 pre-trained model's output for the regions inside those boxes, with n such feature vectors per image (n=100, the same as the number of boxes). Is that correct?

Thank you in anticipation!!
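A minimal packaging sketch for that understanding, using hypothetical helper and key names (the dict layout mirrors the info dicts discussed later on this page for Detectron2 and is an assumption, not a verified schema):

import numpy as np

def pack_simulator_regions(image_id, width, height, boxes_xyxy, region_feats):
    # boxes_xyxy: (n, 4) array of (xmin, ymin, xmax, ymax) pixel coordinates
    # region_feats: (n, 2048) array of per-region feature vectors
    boxes_xyxy = np.asarray(boxes_xyxy, dtype=np.float32)
    region_feats = np.asarray(region_feats, dtype=np.float32)
    assert boxes_xyxy.shape[0] == region_feats.shape[0]
    return {
        "image_id": image_id,
        "image_width": width,
        "image_height": height,
        "num_boxes": boxes_xyxy.shape[0],
        "bbox": boxes_xyxy,
        "features": region_feats,
    }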

The use of python-prctl

Hello,

Could you please tell me where python-prctl is used? I need to run the code on a machine where I don't have sudo access, and most ways to install python-prctl properly seem to depend on sudo (it needs build-essential and libcap-dev; other ways of installing also seem to end up needing sudo at some point). I was able to install it on my laptop, but that machine doesn't have a GPU, so I run into other issues there.

Best regards

Details about object representation for GuessWhat?!

Hello,

I was checking your code and paper and I was a little bit confused by your approach to GuessWhat?!. I can see from your dataset reader that you compute Intersection over Union in order to find a match between the gold bounding boxes and the ones predicted by FastRCNN (https://github.com/facebookresearch/vilbert-multi-task/blob/master/vilbert/datasets/guesswhat_pointing_dataset.py#L276). However, it seems that you concatenate the two bounding boxes later on. How does it work exactly? Could you please provide more details and the rationale behind this approach? Unfortunately, I can't find an explanation for it either in the code or in the paper.

Thank you in advance for your help.

Usage of co_attention_mask

I'm trying to extract representations from pre-trained VILBert, by building a method within class VILBertForVLTasks that returns sequence_output_t, sequence_output_v, pooled_output_t, pooled_output_v (line 1652 of vilbert/vilbert.py). I want the text representations (sequence_output_t) to be independent of the visual input, which I figured I would need to do using the co_attention_mask input.

The default co_attention_mask is None; do I need to set it to a tensor of ones of the appropriate size in order to mask the text and visual inputs from each other? I tried following the usage of co_attention_mask in BertModel, BertEncoder, etc., but it's not clear exactly where this masking is applied, or where the variable use_co_attention_mask is set to True.

Json concap

What does the json file of captions look like?

How to use VilBert pretrained for Caption-Based Image Retrieval

I would like to know if the pre-trained model given by this link (https://dl.fbaipublicfiles.com/vilbert-multi-task/pretrained_model.bin) can be used for Caption-Based Image Retrieval.

My first guess is that I can load the model using (not sure if the configuration file is the proper one):

config = BertConfig.from_json_file('config/bert_base_6layer_6conect.json')
model = VILBertForVLTasks.from_pretrained('pretrained_model.bin', config=config) 

Afterwards, digging into the code, I saw that running the inner bert model should give the sequence outputs for the text and for the image.

I have several questions:

  • As explained on page 4 of the paper, how can I extract the outputs hIMG and hCLS from these sequences? Can I assume they are the first element of each corresponding sequence?
  • Since the training aims to predict whether these two representations are aligned, can we expect the image embedding (hIMG) and the text embedding (hCLS) to have a large cosine similarity (or be related under some other distance metric)?
  • Would the model fail if no text or no image is provided? I would like to use it to extract one feature or the other without providing both inputs.
  • Does the model expect a complete input image and handle the object detection internally? Or does it expect the meaningful regions to be extracted as a preprocessing step? If detection is expected to happen inside the model, what is the image_loc parameter supposed to be?

I hope I made myself clear

Thank you very much

Apex version

Hi, I have problems when using fp16 training with apex, which is caused by API changes in apex. Can you provide the specific git commit of apex and the build command you used?

Use the "extract_features" script with Detectron2?

NOTE: I have made some changes towards the bottom; could someone take a look and let me know if it looks about right?

I have been trying to get this to work with detectron2 as I have a fine tuned model on some custom data.

In particular, I do not know how to implement this portion using detectron2:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="fc6", conf_thresh=0
    ):
        batch_size = len(output[0]["proposals"])
        n_boxes_per_image = [len(boxes) for boxes in output[0]["proposals"]]
        score_list = output[0]["scores"].split(n_boxes_per_image)
        score_list = [torch.nn.functional.softmax(x, -1) for x in score_list]
        feats = output[0][feature_name].split(n_boxes_per_image)
        cur_device = score_list[0].device

I have tried implementing part of it, but I am stuck on what the scores are. What do they represent? Is it the full softmax vector?

this is what I have done so far:

images = ImageList.from_tensors(lst[:1], size_divisibility=32).to("cuda")  # preprocessed input tensor
#setup config
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
cfg.SOLVER.IMS_PER_BATCH = 1
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pnumonia)
#Just run these lines if you have the trained model im memory
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
#build model
model = build_model(cfg)
DetectionCheckpointer(model).load("output/model_final.pth")
model.eval()#make sure its in eval mode

#run model
features = model.backbone(images.tensor.float())
proposals, _ = model.proposal_generator(images, features)
instances = model.roi_heads._forward_box(features, proposals)
mask_features = [features[f] for f in model.roi_heads.in_features]
mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
###########
batch_size = len(proposals)
n_boxes_per_image = [len(boxes) for boxes in proposals]

EDIT

I have changed the extract-features methods to run using the detectron2 model. I believe this is correct; could anyone take a quick look at it?
_process_feature_extraction:

    def _process_feature_extraction(
        self, output, im_scales, im_infos, feature_name="p6", conf_thresh=0
    ):
        feat_list = []
        info_list = []
        batch_size = len(output['instances'])
        #print(batch_size)
        for i in range(batch_size):
            feat_list.append(output['features'][feature_name][i])
            info_list.append(
                    {
                        "bbox": output['instances'][i].pred_boxes.to('cpu').tensor.numpy() / im_scales[i],
                        "num_boxes": len(output['instances'][i]),
                        "objects": output['instances'][i].pred_classes.to('cpu').numpy(),
                        "image_width": im_infos[i]["width"],
                        "image_height": im_infos[i]["height"],
                        "cls_prob": output['instances'][i].scores.to('cpu').numpy(),
                    }
                )

        return feat_list, info_list


get_detectron_features:

    def get_detectron_features(self, image_paths):
        img_tensor, im_scales, im_infos = [], [], []

        for image_path in image_paths:
            im, im_scale, im_info = self._image_transform(image_path)
            img_tensor.append(im)
            im_scales.append(im_scale)
            im_infos.append(im_info)

        # Image dimensions should be divisible by 32, to allow convolutions
        # in detector to work
        current_img_list = ImageList.from_tensors(img_tensor, size_divisibility=32)
        current_img_list = current_img_list.to("cuda")
        #print(current_img_list.tensor)
        #print(np.shape(current_img_list.tensor))

        with torch.no_grad():
            #run model
            features = self.detection_model.backbone(current_img_list.tensor)#outputs features
            proposals, _ = self.detection_model.proposal_generator(current_img_list, features)
            instances, scores = self.detection_model.roi_heads._forward_box(features, proposals)
            mask_features = [features[f] for f in self.detection_model.roi_heads.in_features]
            mask_features = self.detection_model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
            output = {'features':features, 'proposals':proposals, 'instances':instances, 'mask_features': mask_features}

        feat_list = self._process_feature_extraction(
            output,
            im_scales,
            im_infos,
            self.feature_name,
            self.confidence_threshold,
        )

        return feat_list

_build_detection_model:

    def _build_detection_model(self):
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
        #cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")
        cfg.SOLVER.IMS_PER_BATCH = 1
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (pnumonia)
        #Just run these lines if you have the trained model im memory
        cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set the testing threshold for this model
        model = build_model(cfg)
        DetectionCheckpointer(model).load("output/model_final.pth")
        cfg.freeze()

        model.to("cuda")
        model.eval()
        return model

inference for image retrieval ?

My question is: how do I do inference for caption-based image retrieval, given a set of images and a question/caption? Can you provide an example in demo.ipynb?

i want to fine tune for VCR task

Hi!
I want to fine-tune for the VCR task.
I have to use the Conceptual Captions pre-trained 6-layer model for the 'from_pretrained' parameter, right?

Thank you!

Where is the '<IMG>' token ?

hi

I analyzed your code, but I can't see the IMG token anywhere.

Where is the IMG token? I would be grateful if you could tell me where to put it.

thank you:)

No caption_train.json file

I have followed the data instructions to prepare the data. However, I am not able to find the caption_train.json file. Can you please tell me where I can find it?

Issue with generating lmdb

Running script/convert_to_lmdb.py stops at around 90% with no error, but apparently it hasn't finished, since running train_tasks.py then gives TypeError: a bytes-like object is required, not 'NoneType' here. Does anyone know what the problem could be? Thanks~

errors when running fine tuning cmd

I downloaded your dataset and ran:
python train_tasks.py --bert_model bert-base-uncased --from_pretrained ./multi_task_model.bin --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

It gives this error:
Traceback (most recent call last):
  File "train_tasks.py", line 670, in <module>
    main()
  File "train_tasks.py", line 529, in main
    task_losses,
  File "/dccstor/extrastore/vilbert-multi-task/vilbert/task_utils.py", line 197, in ForwardModelsTrain
    batch
ValueError: too many values to unpack (expected 9)

Non-existent config key error?

Hi, I got KeyError: 'Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS' when running the extract_features.py file. I was wondering if you know the possible reason (I downloaded the model and config file using the given links)? Thanks for the work, btw.

Caught StopIteration in replica 0 on device 0.

When I use multiple GPUs, an error happens; the details are below. How can I solve this problem?

  File "vilbert-multi-task/train_cls.py", line 535, in <module>
    main()
  File "vilbert-multi-task/train_cls.py", line 407, in main
    task_losses,
  File "..../vilbert-multi-task/vilbert/task_utils.py", line 327, in ForwardModelsTrain
    task_tokens,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "..../vilbert-multi-task/vilbert/vilbert.py", line 1662, in forward
    output_all_attention_masks=output_all_attention_masks,
  File "/opt/conda/envs/python3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "...../vilbert-multi-task/vilbert/vilbert.py", line 1351, in forward
    dtype=next(self.parameters()).dtype
StopIteration

Training Loop Issue: ForwardModelsTrain()

Doesn't anybody find the training loop problematic or twisted?
In ForwardModelsTrain(), there is a loop over the dataloader to use the batches one after another. But there is a "return loss, batch_score" within the loop, meaning that the function only uses the first batch it pulls before exiting. After the loss and score of that batch are returned, the parent function does the backpropagation and a few other tasks, as it is supposed to.

My question is: should the loop over the dataloader be moved outside ForwardModelsTrain(), replacing the line for step in range(median_num_iter): ?
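A standalone toy (not the repo's code) showing why the answer hinges on what ForwardModelsTrain is handed, a dataloader object or a persistent iterator:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
data = [torch.randn(8, 4) for _ in range(3)]   # stand-in for three batches

def forward_once(source):
    # mirrors the pattern above: "return" inside the loop, so each call
    # consumes exactly one batch from source and then exits
    for batch in source:
        return model(batch).pow(2).mean()

# Case 1: passing the batch container itself -- a fresh iterator is created on
# every call, so every call recomputes the loss on the *first* batch.
losses_fresh = [forward_once(data) for _ in range(3)]

# Case 2: passing a persistent iterator -- successive calls advance through
# the batches, so all of them get used even though the loop sits inside.
it = iter(data)
losses_persistent = [forward_once(it) for _ in range(3)]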
