Describe Model I am using (UniLM, MiniLM, LayoutLM ...): BEIT-3


[BEIT-3] error happens when I evaluate BEiT-3 finetuned model on VQAv2

matsutaku44 commented on August 30, 2024


Comments (4)

matsutaku44 commented on August 30, 2024

I was able to generate submit_vqav2_test.json (a list of question_id and answer pairs).

I added the following line to run_beit3_finetuning.py (at line 141):
parser.add_argument("--local-rank", type=int)
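
For readers wondering why this argument is needed: as far as I know, recent PyTorch releases of torch.distributed.launch pass the rank to the script as --local-rank, while older ones passed --local_rank. Below is a minimal standalone sketch of an argument that accepts both spellings and falls back to the LOCAL_RANK environment variable. The build_parser helper is hypothetical and not part of run_beit3_finetuning.py; it also assumes the script does not already define one of these option strings.

```python
import argparse
import os


def build_parser():
    """Hypothetical helper: a parser that tolerates both local-rank spellings."""
    parser = argparse.ArgumentParser(add_help=False)
    # Both option strings write to the same destination, args.local_rank.
    # The default comes from LOCAL_RANK (set by torchrun-style launchers), else 0.
    parser.add_argument(
        "--local-rank", "--local_rank",
        dest="local_rank",
        type=int,
        default=int(os.environ.get("LOCAL_RANK", 0)),
    )
    return parser


if __name__ == "__main__":
    # parse_known_args ignores the other flags the real script defines.
    args, _unknown = build_parser().parse_known_args(
        ["--local-rank=1", "--task", "vqav2"]
    )
    print(args.local_rank)
```

Keeping the environment-variable fallback means the same script also works when the launcher passes no rank argument at all.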

Then I ran the following command. (It seems you should not omit "-m torch.distributed.launch --nproc_per_node=2".)

python -m torch.distributed.launch --nproc_per_node=2 run_beit3_finetuning.py \
        --model beit3_base_patch16_480 \
        --input_size 480 \
        --task vqav2 \
        --batch_size 16 \
        --sentencepiece_model ../../../../new_mensa/data/VQAv2/BEIT3/beit3.spm \
        --finetune ../../../../new_mensa/data/VQAv2/BEIT3/beit3_base_indomain_patch16_224.pth \
        --data_path ../../../../new_mensa/data/VQAv2 \
        --output_dir ./prediction_saveHere \
        --eval \
        --dist_eval

With this, submit_vqav2_test.json is generated:

. . . 

Test:  [9310/9330]  eta: 0:00:05    time: 0.2790  data: 0.0002  max mem: 4665
Test:  [9320/9330]  eta: 0:00:02    time: 0.2789  data: 0.0002  max mem: 4665
Test:  [9329/9330]  eta: 0:00:00    time: 0.2674  data: 0.0001  max mem: 4665
Test: Total time: 0:43:23 (0.2790 s / it)
Infer 447793 examples into ./prediction_saveHere/submit_vqav2_test.json

I don't know why this produces the JSON file, but it works, so I am closing this issue.


Sv3n01 commented on August 30, 2024

Changing
"python -m torch.distributed.launch --nproc_per_node=2 run_beit3_finetuning.py" to
"python -m run_beit3_finetuning"
solved it for me in Google Colab.


matsutaku44 commented on August 30, 2024

@Sv3n01 Thank you for replying!
I am trying this change now.


matsutaku44 commented on August 30, 2024

I removed "torch.distributed.launch --nproc_per_node=2" and ran it again.
The evaluation then seemed to start. Thank you very much!
However, a different error occurs at the end.

I ran this command:

python run_beit3_finetuning.py \
        --model beit3_base_patch16_480 \
        --input_size 480 \
        --task vqav2 \
        --batch_size 16 \
        --sentencepiece_model ../../../../new_mensa/data/VQAv2/BEIT3/beit3.spm \
        --finetune ../../../../new_mensa/data/VQAv2/BEIT3/beit3_base_indomain_patch16_224.pth \
        --data_path ../../../../new_mensa/data/VQAv2 \
        --output_dir ./prediction_saveHere \
        --eval \
        --dist_eval

The error

. . .

Test:  [18640/18659]  eta: 0:00:05    time: 0.2775  data: 0.0002  max mem: 3774
Test:  [18650/18659]  eta: 0:00:02    time: 0.2774  data: 0.0002  max mem: 3774
Test:  [18658/18659]  eta: 0:00:00    time: 0.2658  data: 0.0000  max mem: 3774
Test: Total time: 1:26:48 (0.2792 s / it)
Traceback (most recent call last):
  File "run_beit3_finetuning.py", line 448, in <module>
    main(opts, ds_init)
  File "run_beit3_finetuning.py", line 365, in main
    utils.dump_predictions(args, result, "vqav2_test")
  File "/home/matsuzaki.takumi/workspace/vqa/unilm/beit3/utils.py", line 845, in dump_predictions
    torch.distributed.barrier()
  File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
    return func(*args, **kwargs)
  File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3672, in barrier
    opts.device = _get_pg_default_device(group)
  File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 649, in _get_pg_default_device
    group = group or _get_default_group()
  File "/home/matsuzaki.takumi/.conda/envs/beit3-3.8/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1008, in _get_default_group
    raise ValueError(
ValueError: Default process group has not been initialized, please make sure to call init_process_group.

I am trying to solve this problem now.
Do you have a solution? Any advice would be appreciated.
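
The traceback shows torch.distributed.barrier() being called in a run that was started without a distributed launcher, so no default process group exists. One possible workaround, shown here only as a sketch and not as the repository's own fix, is to call the barrier only when a process group has been initialized. The helper name barrier_if_distributed is hypothetical; is_available(), is_initialized(), and barrier() are existing torch.distributed functions. The guarded import is there only so the sketch also runs on a machine without PyTorch.

```python
try:
    import torch.distributed as dist
except ImportError:
    # Assumption for this sketch only: without torch, behave as single-process.
    dist = None


def barrier_if_distributed():
    """Synchronize ranks only when a default process group exists.

    Returns True if barrier() was called, False if it was skipped.
    """
    if dist is not None and dist.is_available() and dist.is_initialized():
        dist.barrier()
        return True
    return False


if __name__ == "__main__":
    # In a plain "python script.py" run no process group is initialized,
    # so the barrier is skipped instead of raising ValueError.
    print("barrier executed:", barrier_if_distributed())
```

The alternative that worked in this thread is to keep the distributed launcher in the command, which initializes the process group before the barrier is reached.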
