I got this error:
(oscar) ailab@ailab:~/oscar/Oscar/oscar$ python run_vqa.py -j 4 --img_feature_dim 2054 --max_img_seq_length 50 --data_label_type mask --img_feature_type faster_r-cnn --data_dir /media/ailab/jaeyun/oscar/datasets/vqa/2k/ --model_type bert --model_name_or_path /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/ --task_name vqa_text --do_train --do_lower_case --max_seq_length 128 --per_gpu_eval_batch_size 1 --per_gpu_train_batch_size 1 --learning_rate 5e-05 --num_train_epochs 25 --output_dir results --label_file /media/ailab/jaeyun/oscar/datasets/vqa/cache/trainval_ans2label.pkl --save_epoch 1 --seed 88 --evaluate_during_training --logging_steps 4000 --drop_out 0.3 --weight_decay 0.05 --warmup_steps 0 --loss_type bce --img_feat_format pt --classifier linear --cls_hidden_scale 3 --txt_data_dir /media/ailab/jaeyun/oscar/datasets/vqa/2k/
07/06/2020 12:17:14 - WARNING - __main__ - Process rank: -1, device: cuda, n_gpu: 2, distributed training: False, 16-bits training: False
07/06/2020 12:17:14 - INFO - __main__ - Task Name: vqa_text, #Labels: 3129
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - loading configuration file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/config.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - Model config {
"attention_probs_dropout_prob": 0.1,
"finetuning_task": "vqa_text",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"img_feature_dim": 2054,
"img_feature_type": "faster_r-cnn",
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"num_labels": 3129,
"output_attentions": false,
"output_hidden_states": false,
"torchscript": false,
"type_vocab_size": 2,
"vocab_size": 30522
}
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - Model name '/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc). Assuming '/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/' is a path or url to a directory containing tokenizer files.
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/added_tokens.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/special_tokens_map.json
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.tokenization_utils - loading file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/vocab.txt
07/06/2020 12:17:14 - INFO - transformers.pytorch_transformers.modeling_utils - loading weights file /media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/pytorch_model.bin
07/06/2020 12:17:15 - INFO - oscar.modeling.modeling_bert - BertImgModel Image Dimension: 2054
07/06/2020 12:17:16 - INFO - transformers.pytorch_transformers.modeling_utils - Weights of ImageBertForSequenceClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
07/06/2020 12:17:16 - INFO - transformers.pytorch_transformers.modeling_utils - Weights from pretrained model not used in ImageBertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
07/06/2020 12:17:17 - INFO - __main__ - Training/evaluation parameters Namespace(adam_epsilon=1e-08, adjust_dp=False, adjust_loss=False, adjust_loss_epoch=-1, cache_dir='', classifier='linear', cls_hidden_scale=3, code_level='top', code_voc=512, config_name='', data_dir='/media/ailab/jaeyun/oscar/datasets/vqa/2k/', data_label_type='mask', device=device(type='cuda'), do_eval=False, do_lower_case=True, do_test=False, do_test_dev=False, do_train=True, do_train_val=False, drop_out=0.3, eval_all_checkpoints=False, evaluate_during_training=True, fp16=False, fp16_opt_level='O1', gradient_accumulation_steps=1, hard_label=False, img_feat_dir=None, img_feat_format='pt', img_feature_dim=2054, img_feature_type='faster_r-cnn', label2ans_file=None, label_file='/media/ailab/jaeyun/oscar/datasets/vqa/cache/trainval_ans2label.pkl', learning_rate=5e-05, load_fast=False, local_rank=-1, logging_steps=4000, loss_type='bce', max_grad_norm=1.0, max_img_seq_length=50, max_seq_length=128, max_steps=-1, model_name_or_path='/media/ailab/jaeyun/oscar/models/base-vg-labels/ep_107_1192087/', model_type='bert', n_gpu=2, no_cuda=False, num_train_epochs=25.0, output_dir='results', output_mode='classification', overwrite_cache=False, overwrite_output_dir=False, per_gpu_eval_batch_size=1, per_gpu_train_batch_size=1, philly=False, save_after_epoch=-1, save_epoch=1, save_steps=-1, scheduler='linear', seed=88, server_ip='', server_port='', task_name='vqa_text', tokenizer_name='', txt_data_dir='/media/ailab/jaeyun/oscar/datasets/vqa/2k/', use_vg=False, use_vg_dev=False, warmup_steps=0, weight_decay=0.05, workers=4)
07/06/2020 12:17:18 - INFO - __main__ - Info: loading val features using 0.13 secs
07/06/2020 12:17:18 - INFO - __main__ - val Data Examples: 10402
07/06/2020 12:17:33 - INFO - __main__ - Info: loading train features using 15.48 secs
07/06/2020 12:17:37 - INFO - __main__ - train Data Examples: 634516
07/06/2020 12:17:37 - INFO - __main__ - ***** Running training *****
07/06/2020 12:17:37 - INFO - __main__ - Num examples = 634516
07/06/2020 12:17:37 - INFO - __main__ - Num Epochs = 25
07/06/2020 12:17:37 - INFO - __main__ - Instantaneous batch size per GPU = 1
07/06/2020 12:17:37 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 2
07/06/2020 12:17:37 - INFO - __main__ - Gradient Accumulation steps = 1
07/06/2020 12:17:37 - INFO - __main__ - Total optimization steps = 7931450
Traceback (most recent call last):
  File "run_vqa.py", line 1222, in <module>
    main()
  File "run_vqa.py", line 1145, in main
    global_step, tr_loss = train(args, train_dataset, eval_dataset, model, tokenizer)
  File "run_vqa.py", line 554, in train
    for step, batch in enumerate(train_dataloader):
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 278, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 682, in __init__
    w.start()
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/ailab/anaconda3/envs/oscar/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
I think it is caused by a lack of GPU memory.
My GPUs are two GTX 1080 Tis.
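For what it's worth, the traceback ends in `os.fork()`, which the DataLoader uses to start its `-j 4` worker processes; each fork duplicates the parent process's address space, so Errno 12 here is allocated from host RAM rather than GPU memory. A minimal sketch (with a toy stand-in for the real VQA dataset built in `run_vqa.py`) of how dropping `num_workers` to 0 avoids the fork entirely:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the cached VQA features; the real dataset is
# constructed inside run_vqa.py from the .pt feature files.
dataset = TensorDataset(torch.zeros(8, 4), torch.zeros(8))

# With num_workers > 0, the DataLoader calls os.fork() once per worker,
# and each fork needs host RAM. num_workers=0 loads batches in the main
# process, so no worker processes are spawned at all.
loader = DataLoader(dataset, batch_size=2, num_workers=0)

for features, labels in loader:
    pass  # batches are produced without any forked workers
```

If the run then proceeds past this point, the error was host-memory pressure from the workers, not the GPUs.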
Which GPU do you use?
Thank you!