
Comments (6)

RogersSteve avatar RogersSteve commented on September 14, 2024

Hello, I have the same problem. Do you know how to fix it now?

from geochat.

Hoteryoung avatar Hoteryoung commented on September 14, 2024

Are you sure that your mismatched size is the same as mine?
I remember closing the issue because it was caused by an incorrect change I had made somewhere in the original code.
But I really can't remember the details.


RogersSteve avatar RogersSteve commented on September 14, 2024

I fixed this problem today. Thanks for your answer.
The reason I ran into it is that I used the wrong model for fine-tuning.
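
For anyone hitting a similar size mismatch, one way to narrow it down is to compare the parameter shapes the model expects with the shapes stored in the checkpoint being loaded. The following is a minimal sketch, not the fix used in this thread: `find_shape_mismatches` is a hypothetical helper, and the parameter names and shapes are made up for illustration.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    expected: Dict[str, Shape], loaded: Dict[str, Shape]
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, expected_shape, loaded_shape) for shared keys whose shapes differ."""
    return [
        (name, expected[name], loaded[name])
        for name in sorted(expected.keys() & loaded.keys())
        if expected[name] != loaded[name]
    ]


# Hypothetical shapes, for illustration only (not taken from any real checkpoint).
expected_shapes = {"proj.weight": (8, 4), "proj.bias": (8,)}
loaded_shapes = {"proj.weight": (8, 6), "proj.bias": (8,)}

for name, want, got in find_shape_mismatches(expected_shapes, loaded_shapes):
    print(f"{name}: model expects {want}, checkpoint has {got}")
```

With PyTorch, each shape dictionary can be built from a state dict with `{k: tuple(v.shape) for k, v in state_dict.items()}`. If many parameters differ, the checkpoint probably belongs to a different base model than the one being fine-tuned.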


kartikey9254 avatar kartikey9254 commented on September 14, 2024

Can you help me do the same? When I use the GeoChat 7B model, it reports a size mismatch, and when I use llava-v1.5-7b, it says files were not found in the directory. I have double-checked on Hugging Face that every file is present. I am unable to train on my data because of this. If my question is unclear, could you please explain how I can train on my own data?
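
A "files not found" error can occur even when the files exist on Hugging Face, for example if the local path passed to the training script does not point to where they were downloaded. The sketch below checks a local directory for a list of file names. It is only an illustration: `missing_files` is a hypothetical helper, the directory path is a placeholder, and the file list is an assumption (`config.json` is the configuration file Hugging Face model loading reads, and `mm_projector.bin` is the file passed to `--pretrain_mm_mlp_adapter` in the training command quoted in this thread).

```python
from pathlib import Path
from typing import Iterable, List


def missing_files(model_dir: str, required: Iterable[str]) -> List[str]:
    """Return the names in `required` that are not regular files inside `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# Placeholder directory and file list; adjust both to your own setup.
print(missing_files("./llava-v1.5-7b", ["config.json", "mm_projector.bin"]))
```

An empty list means every listed file was found. If the path is built from a shell variable such as `$CODE_DIR`, it is also worth confirming that the variable is set in the environment where the job actually runs.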


Amazingren avatar Amazingren commented on September 14, 2024

Hi @Hoteryoung,

After running the command you provided above:

srun --jobid $SLURM_JOBID \
    bash -c "python -m torch.distributed.run \
        --nproc_per_node $GPUS_PER_NODE \
        --nnodes $SLURM_NNODES \
        --node_rank $SLURM_PROCID \
        --master_addr $MASTER_ADDR \
        --master_port $MASTER_PORT \
        geochat/train/train_mem.py \
            --lora_enable True \
            --model_name_or_path $CODE_DIR/llava-v1.5-7b/ \
            --version $PROMPT_VERSION \
            --data_path $DATASET_DIR/GeoChat_Instruct.json \
            --image_folder $DATASET_DIR/share/softwares/kartik/GeoChat_finetuning/final_images_llava/  \
            --vision_tower openai/clip-vit-large-patch14-336/ \
            --mm_projector_type mlp2x_gelu \
            --pretrain_mm_mlp_adapter $CODE_DIR/llava-v1.5-7b/mm_projector.bin \
            --mm_vision_select_layer -2 \
            --mm_use_im_start_end False \
            --mm_use_im_patch_token False \
            --image_aspect_ratio pad \
            --bf16 True \
            --output_dir $OUTPUT_DIR \
            --num_train_epochs 1 \
            --per_device_train_batch_size 32 \
            --per_device_eval_batch_size 4 \
            --gradient_accumulation_steps 1 \
            --evaluation_strategy 'no' \
            --save_strategy 'epoch' \
            --save_steps 10000 \
            --save_total_limit 1 \
            --learning_rate 2e-4 \
            --weight_decay 0. \
            --warmup_ratio 0.03 \
            --lr_scheduler_type 'cosine' \
            --logging_steps 1 \
            --tf32 True \
            --model_max_length 2048 \
            --gradient_checkpointing True \
            --lazy_preprocess True \
            --dataloader_num_workers 16 \
            --report_to wandb \
            --deepspeed ./scripts/zero2.json"

Do we need any further fine-tuning? (I read in the paper and in this repo that there is an additional grounding fine-tuning stage.) I am now a bit confused about the procedure. I would really appreciate any suggestions you have on this.

Best,


Hoteryoung avatar Hoteryoung commented on September 14, 2024

I did no further fine-tuning because the paper lacks the necessary details.
In my tests, the VQA and scene classification metrics are close to the results reported in the paper. However, there is an obvious gap in the region grounding metric. Note that I did not test the referring task.
