mbzuai-oryx / geochat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Home Page: https://mbzuai-oryx.github.io/GeoChat
I ran into the following error:
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:11<00:11, 11.91s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:17<00:00,  7.90s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:17<00:00,  8.51s/it]
Traceback (most recent call last):
File "/xxxxx/Documents/code/GeoChat/geochat/train/train_mem.py", line 13, in <module>
train()
File "/xxxxx/Documents/code/GeoChat/geochat/train/train.py", line 828, in train
model = GeoChatLlamaForCausalLM.from_pretrained(
File "/xxxxx/anaconda3/envs/geochat/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/xxxxx/anaconda3/envs/geochat/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3310, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GeoChatLlamaForCausalLM:
size mismatch for model.vision_tower.vision_tower.vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([577, 1024]) from checkpoint, the shape in current model is torch.Size([1297, 1024]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
The script for finetuning:
srun --jobid $SLURM_JOBID \
bash -c "python -m torch.distributed.run \
--nproc_per_node $GPUS_PER_NODE \
--nnodes $SLURM_NNODES \
--node_rank $SLURM_PROCID \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
geochat/train/train_mem.py \
--lora_enable True \
--model_name_or_path $CODE_DIR/llava-v1.5-7b/ \
--version $PROMPT_VERSION \
--data_path $DATASET_DIR/GeoChat_Instruct.json \
--image_folder $DATASET_DIR/share/softwares/kartik/GeoChat_finetuning/final_images_llava/ \
--vision_tower openai/clip-vit-large-patch14-336/ \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter $CODE_DIR/llava-v1.5-7b/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir $OUTPUT_DIR \
--num_train_epochs 1 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy 'no' \
--save_strategy 'epoch' \
--save_steps 10000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type 'cosine' \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--dataloader_num_workers 16 \
--report_to wandb \
--deepspeed ./scripts/zero2.json"
Please note that I use the latest commit.
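For context, here is my reading of where the 577-vs-1297 mismatch comes from (a sketch of the idea, not the repository's actual clip_interpolate_embeddings): GeoChat enlarges the CLIP ViT-L/14-336 input from 336x336 to 504x504, so the 24x24 patch grid becomes 36x36 and the position embedding must grow from 24*24+1 = 577 rows to 36*36+1 = 1297 rows, e.g. by bicubic interpolation of the patch grid:

import torch
import torch.nn.functional as F

def interpolate_clip_pos_embedding(pos_embedding, image_size=504, patch_size=14):
    # pos_embedding: (577, 1024) = 1 CLS row + 24*24 patch rows
    cls_row, grid = pos_embedding[:1], pos_embedding[1:]
    old_side = int(grid.shape[0] ** 0.5)   # 24
    new_side = image_size // patch_size    # 36
    # (576, 1024) -> (1, 1024, 24, 24) so F.interpolate can resize the grid
    grid = grid.reshape(old_side, old_side, -1).permute(2, 0, 1).unsqueeze(0)
    grid = F.interpolate(grid, size=(new_side, new_side), mode="bicubic", align_corners=False)
    # back to (1296, 1024), then prepend the CLS row -> (1297, 1024)
    grid = grid.squeeze(0).permute(1, 2, 0).reshape(new_side * new_side, -1)
    return torch.cat([cls_row, grid], dim=0)

The checkpoint being loaded still carries the original 577-row table while the freshly built model already expects the 1297-row one, hence the size mismatch inside from_pretrained.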
I am using finetune_lora.sh with zero3_offload.json to train (context below) and get the following error.
Traceback (most recent call last):
  File "/deep/u/emily712/GeoChat/geochat/train/train_mem.py", line 13, in <module>
    train()
  File "/deep/u/emily712/GeoChat/geochat/train/train.py", line 886, in train
    model.get_model().initialize_vision_modules(
  File "/deep/u/emily712/GeoChat/geochat/model/geochat_arch.py", line 62, in initialize_vision_modules
    vision_tower.load_model()
  File "/deep/u/emily712/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 103, in load_model
    self.clip_interpolate_embeddings(image_size=504, patch_size=14)
  File "/deep/u/emily712/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 30, in clip_interpolate_embeddings
    n, seq_length, hidden_dim = pos_embedding.shape
ValueError: not enough values to unpack (expected 3, got 2)
Further examination shows that the issue is that the CLIP weights are not loaded at the time of positional interpolation. When I load CLIP via CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336") within deepspeed, none of the model weights are loaded (i.e., they are tensors of size zero). Running
from transformers import CLIPVisionModel
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
state_dict = vision_tower.vision_model.embeddings.position_embedding.state_dict()
pos_embedding = state_dict['weight']
print("pos embedding shape: ", pos_embedding.shape)
under deepspeed, inside the CLIPVisionTower.load_model() method, prints torch.Size([0]), whereas running the same lines in a program without deepspeed, or in a Python shell, yields torch.Size([577, 1024]), which is the correct size.
Expected behavior
pos_embedding should have shape torch.Size([577, 1024]).
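A plausible workaround (my own sketch, untested against the repo): under ZeRO-3 / zero3_offload, DeepSpeed partitions parameters and exposes them as zero-size tensors until they are gathered, which is why pos_embedding shows up as torch.Size([0]). Gathering the weight around the interpolation should restore the full view:

import deepspeed

pos_emb = vision_tower.vision_model.embeddings.position_embedding.weight
# Temporarily materialize the partitioned parameter on this rank;
# modifier_rank=0 lets rank 0 modify it and broadcasts the result.
with deepspeed.zero.GatheredParameters([pos_emb], modifier_rank=0):
    print(pos_emb.shape)  # torch.Size([577, 1024]) once gathered
    # ... run the positional-embedding interpolation here ...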
ds_report output
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/deep/group/aicc-bootcamp/packages/miniconda3/envs/vllava/lib/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/deep/group/aicc-bootcamp/packages/miniconda3/envs/vllava/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.13.1, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 251.77 GB
Hi, I use the script to evaluate on the grounding task, but the prediction jsonl file contains obviously wrong answers. For example, the first row is:
{"question_id": "fast_6217", "image_id": "train_5007_0017", "answer": "{<89><47><97><55>|<58>}{<50><24><54><28>|<58>}{<48><16><52><20>|<58>}", "ground_truth": [[[584.0, 337.0], [619.0, 313.0], [601.0, 282.0], [565.0, 304.0]], [[553.0, 287.0], [592.0, 262.0], [573.0, 229.0], [534.0, 254.0]], [[517.0, 237.0], [555.0, 214.0], [534.0, 181.0], [498.0, 204.0]]], "question": "3 airplanes at the right", "type": "ref", "dataset": "FAST", "obj_ids": [1, 2, 3], "size_group": "small"}
The difference between the answer and the ground truth is too large. Is this normal?
Thanks!
I see that both the eval code and the demo code seem to accept just one image.
https://github.com/mbzuai-oryx/GeoChat/blob/main/geochat/conversation.py
https://github.com/mbzuai-oryx/GeoChat/blob/main/geochat/eval/batch_geochat_grounding.py
Is there any method to accept two images and one question and get a single response (for example, finding the differences between two images)?
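As far as I can tell this is not supported out of the box. One crude workaround (a sketch of an idea, not a repo feature) is to paste the two images side by side and ask about the combined image through the existing single-image API:

from PIL import Image

def hstack(path_a, path_b):
    # Paste two images side by side on a shared canvas.
    a, b = Image.open(path_a), Image.open(path_b)
    canvas = Image.new("RGB", (a.width + b.width, max(a.height, b.height)))
    canvas.paste(a, (0, 0))
    canvas.paste(b, (a.width, 0))
    return canvas

hstack("before.png", "after.png").save("pair.png")  # then ask about pair.png

The model was not trained on such composites, so answers about cross-image differences may be unreliable.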
When I use ZeRO-2 for training, I meet the following issue. Does anyone face the same issue, and has anyone been able to resolve it?
wandb: Currently logged in as: huduu. Use 'wandb login --relogin' to force relogin.
wandb: Network error (ReadTimeout), entering retry loop.
wandb: ERROR Run initialization has timed out after 90.0 sec.
wandb: ERROR Please refer to the documentation for additional information: https://docs.wandb.ai/guides/track/tracking-faq#initstarterror-error-communicating-with-wandb-process-
Traceback (most recent call last):
File "/media/kou/Data2/ty/GeoChat/geochat/train/train.py", line 960, in
train()
File "/media/kou/Data2/ty/GeoChat/geochat/train/train.py", line 938, in train
trainer.train()
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/trainer.py", line 1752, in _inner_training_loop
self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
return self.call_event("on_train_begin", args, state, control)
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/trainer_callback.py", line 397, in call_event
result = getattr(callback, event)(
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/integrations.py", line 760, in on_train_begin
self.setup(args, state, model, **kwargs)
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/transformers/integrations.py", line 734, in setup
self._wandb.init(
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1185, in init
wandb._sentry.reraise(e)
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/wandb/analytics/sentry.py", line 155, in reraise
raise exc.with_traceback(sys.exc_info()[2])
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 1171, in init
return wi.init()
File "/home/junjie/.conda/envs/ty_GeoChat/lib/python3.10/site-packages/wandb/sdk/wandb_init.py", line 776, in init
raise error
wandb.errors.CommError: Run initialization has timed out after 90.0 sec.
Please refer to the documentation for additional information: https://docs.wandb.ai/guides/track/tracking-faq#initstarterror-error-communicating-with-wandb-process-
I'm trying to reproduce it for scene categorization, but it reports an error in batch_geochat_scene.py.
Here is the reported error:
Traceback (most recent call last):
File "/opt/data/private/RS-MLLM/GeoChat/./geochat/eval/batch_geochat_scene.py", line 149, in
eval_model(args)
File "/opt/data/private/RS-MLLM/GeoChat/./geochat/eval/batch_geochat_scene.py", line 102, in eval_model
image_tensor_batch = image_processor.preprocess(image_folder,crop_size ={'height': 504, 'width': 504},size = {'shortest_edge': 504}, return_tensors='pt')['pixel_values']
AttributeError: 'NoneType' object has no attribute 'preprocess'
I checked image_processor, and it is assigned None in builder.py.
Thank you for your excellent work. I encountered some problems when trying to use the geochat_demo.py file. Although I tried changing the conda environment many times and switching computers and servers, my problem is still not solved, so I'm looking for your help here. My problems are:
In the "Install" section you wrote "Clone this repository and navigate to LLaVA folder". What is the "LLaVA folder"? Do I need to create a new folder myself? If so, what is the directory structure of this folder relative to the other files?
When I configure the environment according to your instructions and execute "python geochat_demo.py --model-path ./models", I encounter the following error:
(geochat) root@Y9000K:/mnt/d/GeoChat# python geochat_demo.py --model-path ./models
/root/anaconda3/envs/geochat/lib/python3.8/site-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Parallel'>: No known documentation group for module 'gradio.mix'
warnings.warn(f"Could not get documentation group for {cls}: {exc}")
/root/anaconda3/envs/geochat/lib/python3.8/site-packages/gradio_client/documentation.py:103: UserWarning: Could not get documentation group for <class 'gradio.mix.Series'>: No known documentation group for module 'gradio.mix'
warnings.warn(f"Could not get documentation group for {cls}: {exc}")
Initializing Chat
Loading checkpoint shards:  50%|█████     | 1/2 [02:06<02:06, 126.92s/it]
Traceback (most recent call last):
File "geochat_demo.py", line 53, in
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, args.load_8bit, args.load_4bit, device=args.device)
File "/mnt/d/GeoChat/geochat/model/builder.py", line 125, in load_pretrained_model
model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/root/anaconda3/envs/geochat/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/root/anaconda3/envs/geochat/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/root/anaconda3/envs/geochat/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3260, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/anaconda3/envs/geochat/lib/python3.8/site-packages/transformers/modeling_utils.py", line 717, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/root/anaconda3/envs/geochat/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([577, 1024]) in "weight" (which has shape torch.Size([1297, 1024])), this look incorrect.
If I do not use the GeoChat-7B model but directly use "facebook/opt-350m", the interface loads, but nothing happens when I click the "send" button.
Sorry to bother you; I haven't been able to solve this problem even though I have tried many times. I hope I can get your help.
Hi, thanks for your great work!
I had some issues when launching the demo, as no image_processor was loaded by default (the same bug as a comment mentioned under the YouTube demo video, IIRC).
I found a workaround by renaming the model downloaded from HF to "llava" (instead of geochat-7B) and by adding two lines of code to clip_encoder.py, around line 86:
else:
    self.cfg_only = CLIPVisionConfig.from_pretrained(self.vision_tower_name)
    self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)  ## EDIT
    self.vision_tower.requires_grad_(False)  ## EDIT
    self.clip_interpolate_embeddings(image_size=504, patch_size=14)
There may be a simpler fix, but this worked for me and I could play with the demo.
code please T^T
Hi authors,
Nice work.
When running the evaluation code, I find that the output is a JSON file.
My question: how are the metrics in Tables 7, 8, and 9 calculated?
Would you be willing to provide the code for computing the metrics?
Thank you,
ywsun
python geochat/eval/batch_geochat_grounding.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/
python geochat/eval/batch_geochat_referring.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/
When I use the model for visual grounding, it responds with "answer": "{<91><46><94><50>|<40>}". What do these numbers mean?
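For what it's worth, my reading of the format (an assumption based on the paper's box representation, not official code) is {<x_min><y_min><x_max><y_max>|<angle>}: a rotated bounding box whose corners are normalized to [0, 100] of the image width/height, plus a quantized rotation angle. A small decoder sketch:

import re

def decode_geochat_boxes(answer, img_w, img_h):
    # Parse every "{<x1><y1><x2><y2>|<a>}" group and rescale corners to pixels.
    boxes = []
    for x1, y1, x2, y2, angle in re.findall(
            r"\{<(\d+)><(\d+)><(\d+)><(\d+)>\|<(\d+)>\}", answer):
        boxes.append({
            "box_px": (int(x1) / 100 * img_w, int(y1) / 100 * img_h,
                       int(x2) / 100 * img_w, int(y2) / 100 * img_h),
            "angle": int(angle),
        })
    return boxes

print(decode_geochat_boxes("{<91><46><94><50>|<40>}", img_w=1024, img_h=1024))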
Hi, authors, thanks for your great work!
What is the minimum memory required during the training process? Is it possible to use 4x 4090 GPUs?
Hello, when I want to use LoRA for fine-tuning, no matter how I lower the parameters, the following error is reported. I am using 8x A40:
(geochat) root@2170f15b1d25:/home/GeoChat/scripts# bash finetune_lora.sh
[2024-03-20 09:49:44,405] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:46,776] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-20 09:49:46,776] [INFO] [runner.py:555:main] cmd = /root/miniconda3/envs/geochat/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=21205 --enable_each_rank_log=None /home/GeoChat/geochat/train/train_mem.py --deepspeed /home/GeoChat/scripts/zero2.json --lora_enable True --model_name_or_path /home/LLaVA/llava-v1.5-7b --version v1 --data_path /home/LLaVA-HR/NEWrailwaytrain.json --image_folder /home/LLaVA/data --vision_tower /home/LLaVA/clip-vit-large-patch14-336 --mm_projector_type mlp2x_gelu --pretrain_mm_mlp_adapter /home/LLaVA/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad --bf16 True --output_dir /home/GeoChat/checkpoints_dir --num_train_epochs 1 --per_device_train_batch_size 16 --per_device_eval_batch_size 4 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy epoch --save_steps 7000 --save_total_limit 1 --learning_rate 2e-4 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --lazy_preprocess True --dataloader_num_workers 4 --report_to wandb
[2024-03-20 09:49:48,460] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.17.1-1+cuda12.1
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.17.1-1
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.17.1-1
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.17.1-1+cuda12.1
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2
[2024-03-20 09:49:50,889] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.17.1-1
[2024-03-20 09:49:50,890] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2024-03-20 09:49:50,890] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2024-03-20 09:49:50,890] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2024-03-20 09:49:50,890] [INFO] [launch.py:163:main] dist_world_size=8
[2024-03-20 09:49:50,890] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2024-03-20 09:49:54,389] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,650] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,717] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,717] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,717] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,718] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,741] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:54,752] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-20 09:49:55,040] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,040] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,284] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,284] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,336] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,336] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,336] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-03-20 09:49:55,340] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,340] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,349] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,349] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,361] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,361] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,362] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,362] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-20 09:49:55,411] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-20 09:49:55,411] [INFO] [comm.py:594:init_distributed] cdb=None
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
You are using a model of type llava to instantiate a model of type geochat. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 2/2 [02:24<00:00, 72.26s/it]
Adding LoRA adapters...
Formatting inputs...Skip in lazy mode
[2024-03-20 09:57:14,398] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7453
[2024-03-20 09:57:14,401] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7454
[2024-03-20 09:57:14,401] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7455
[2024-03-20 09:57:15,060] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7456
[2024-03-20 09:57:15,062] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7457
[2024-03-20 09:57:15,064] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7458
[2024-03-20 09:57:15,885] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7459
[2024-03-20 09:57:15,887] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7460
[2024-03-20 09:57:15,888] [ERROR] [launch.py:321:sigkill_handler] ['/root/miniconda3/envs/geochat/bin/python', '-u', '/home/GeoChat/geochat/train/train_mem.py', '--local_rank=7', '--deepspeed', '/home/GeoChat/scripts/zero2.json', '--lora_enable', 'True', '--model_name_or_path', '/home/LLaVA/llava-v1.5-7b', '--version', 'v1', '--data_path', '/home/LLaVA-HR/NEWrailwaytrain.json', '--image_folder', '/home/LLaVA/data', '--vision_tower', '/home/LLaVA/clip-vit-large-patch14-336', '--mm_projector_type', 'mlp2x_gelu', '--pretrain_mm_mlp_adapter', '/home/LLaVA/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--bf16', 'True', '--output_dir', '/home/GeoChat/checkpoints_dir', '--num_train_epochs', '1', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'epoch', '--save_steps', '7000', '--save_total_limit', '1', '--learning_rate', '2e-4', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--lazy_preprocess', 'True', '--dataloader_num_workers', '4', '--report_to', 'wandb'] exits with return code = -7
Thanks for the amazing work. I am trying to explore GeoChat on various tasks and I am stuck on the demo. I would appreciate it a lot if you could help me debug the error.
There is no error message in the terminal, and I have followed the instructions in the README to set everything up (updated to the latest and correct versions of transformers, etc.).
Hello, you provide a demo built with Gradio in your project. Now I want to load the weights directly through transformers and use GeoChat, but loading with AutoModelForCausalLM failed. After analysis, I found that you may have written a custom GeoChatMetaForCausalLM class.
Can you provide a simple example of using GeoChat with transformers?
Thank you very much!
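A minimal sketch of what tends to work for LLaVA-style forks (an assumption, not documented usage): AutoModelForCausalLM does not know the custom "geochat" model type, so use the repo's own loader, the one visible in the geochat_demo.py traceback earlier in this thread:

from geochat.model.builder import load_pretrained_model
from geochat.mm_utils import get_model_name_from_path  # assumed to mirror LLaVA's helper

model_path = "MBZUAI/geochat-7B"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)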
Hello, I can't unzip the data; it always shows data corruption. Can you fix it? Thank you very much.
When will the code and dataset be made public? I want to study them against the paper. Thank you so much.
The normalization and quantization of the width, height, center x, and center y are described in the paper.
However, the angle is not.
How did you normalize and quantize the angle?
Hello! I use finetune_lora.sh and set the following:
--model_name_or_path liuhaotian/llava-v1.5-7b \
--vision_tower openai/clip-vit-large-patch14-336 \
and got these:
File "/home/wucz/remote-sensing/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 97, in init
self.clip_interpolate_embeddings(image_size=504, patch_size=14)
File "/home/wucz/remote-sensing/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 34, in clip_interpolate_embeddings
n, seq_length, hidden_dim = pos_embedding.shape
ValueError: not enough values to unpack (expected 3, got 2)
pos_embedding = state_dict['weight']
print(pos_embedding.shape)  # torch.Size([0])
pos_embedding = pos_embedding.unsqueeze(0)
print(pos_embedding.shape)  # torch.Size([1, 0])
n, seq_length, hidden_dim = pos_embedding.shape
Which setting did I get wrong that prevents the model weights from being read?
I'm trying to reproduce its scene categorization, but there is a name in the code that is not defined.
Below is the reported error:
Traceback (most recent call last):
File "/opt/data/private/RS-MLLM/GeoChat/./geochat/eval/batch_geochat_scene.py", line 139, in
eval_model(args)
File "/opt/data/private/RS-MLLM/GeoChat/./geochat/eval/batch_geochat_scene.py", line 49, in eval_model
questions = get_chunk(questions, args.num_chunks, args.chunk_idx)
NameError: name 'get_chunk' is not defined
It's an interesting study, but hard to reproduce without code.
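A likely fix (my own sketch; these helpers match the chunking convention used by LLaVA-style eval scripts, which batch_geochat_scene.py appears to assume) is to define the missing functions at the top of the script:

import math

def split_list(lst, n):
    # Split lst into n roughly equal-sized chunks.
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

def get_chunk(lst, n, k):
    # Return the k-th of n chunks.
    return split_list(lst, n)[k]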
Greetings,
Thank you very much for open-sourcing this outstanding work!
I run the demo locally using the instructions in LoRA.md, but I'm experiencing a bug: when I select an image and enter a question, I get an error:
AttributeError: 'NoneType' object has no attribute 'image_mean'
Hi,
Given that the LoveDA dataset was used to train your model on classes like buildings and roads, can your model still be used for commercial purposes? If not, what other options are available? Thanks in advance!
I have followed the instructions of finetune_lora.sh and got the trained model.
This is my finetune_lora.sh:
#!/bin/bash
################## VICUNA ##################
PROMPT_VERSION=v1
MODEL_VERSION="vicuna-v1.5-7b"
gpu_ids=0,1,2,3
################## VICUNA ##################
deepspeed --master_port=$((RANDOM + 10000)) --include localhost:$gpu_ids geochat/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--model_name_or_path pretrained_weights/llavav1.5-7b \
--version $PROMPT_VERSION \
--data_path ~/datasets/GeoChat_Instruct.json \
--image_folder ~/datasets/GeoChat_finetuning/final_images_llava \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter pretrained_weights/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir /nfs/geochat_output/checkpoints_dir \
--num_train_epochs 1 \
--per_device_train_batch_size 6 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 1000 \
--save_total_limit 5 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--dataloader_num_workers 0 \
--report_to wandb
Here is the saved LoRA fine-tuned model:
(base) ➜ checkpoints_dir tree
.
├── adapter_config.json
├── adapter_model.bin
├── checkpoint-3217
│   ├── adapter_config.json
│   ├── adapter_model.bin
│   ├── global_step3217
│   │   ├── bf16_zero_pp_rank_0_mp_rank_00_optim_states.pt
│   │   ├── bf16_zero_pp_rank_1_mp_rank_00_optim_states.pt
│   │   ├── bf16_zero_pp_rank_2_mp_rank_00_optim_states.pt
│   │   ├── bf16_zero_pp_rank_3_mp_rank_00_optim_states.pt
│   │   └── mp_rank_00_model_states.pt
│   ├── latest
│   ├── README.md
│   ├── rng_state_0.pth
│   ├── rng_state_1.pth
│   ├── rng_state_2.pth
│   ├── rng_state_3.pth
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   ├── tokenizer.model
│   ├── trainer_state.json
│   ├── training_args.bin
│   └── zero_to_fp32.py
├── config.json
├── non_lora_trainables.bin
├── README.md
└── trainer_state.json
I don't know how to load this model; I didn't find it in the README.md. Can anyone help me? Thank you!
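A loading sketch, assuming GeoChat's builder follows LLaVA's LoRA convention (a model_name containing "lora" plus a model_base triggers loading the adapter on top of the base weights); this is my guess, not documented behavior:

from geochat.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/nfs/geochat_output/checkpoints_dir",  # LoRA output dir from the script above
    model_base="pretrained_weights/llavav1.5-7b",      # base model used for fine-tuning
    model_name="geochat-lora",                         # "lora" in the name selects the adapter path
)

Alternatively, merging the adapter into a standalone checkpoint first (as a later post in this thread does with scripts/merge_lora_weights.py) avoids passing model_base at inference time.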
I try to run the demo code but get a CUDA error from
streamer = chat.stream_answer(conv=chat_state,
img_list=img_list,
temperature=temperature,
max_new_tokens=500,
max_length=2000)
This is the error:
...
File "venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 385, in forward
hidden_states, attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "venv/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 324, in forward
attn_output = torch.bmm(attn_probs, value_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
I assume that this error is related to the CUDA and/or torch version. These are the relevant packages and versions I installed (torch 2.0.1 with CUDA 11.7):
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
torch 2.0.1
torchvision 0.15.2
transformers 4.31.0
Can you share the versions you are using? I tested three different versions but always got errors.
Thank you for your excellent work. Could you please disclose how the metrics for each task are calculated?
Below are my code and results for evaluating region-caption performance using the geochat-7B weights, but the results are quite different from Table 10 in the paper. Where is the problem? Thank you.
Hello, I'm wondering about the minimum GPU memory required for training. Could you provide some information on this?
Thank you for your excellent work.
However, when I tested MiniGPT-v2 on LR, the results I got were not consistent with those in your paper, being 10% higher.
May I ask how you performed the test with MiniGPT-v2?
The data downloaded from Hugging Face is corrupted. After decompression there are only 109332 images, which does not match GeoChat_Instruct.json.
I use finetune_lora.sh to train, with the following context:
deepspeed --master_port=$((RANDOM + 10000)) --include localhost:0,1 geochat/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--model_name_or_path /checkpoints/llava-v1.5-7b \
--version $PROMPT_VERSION \
--data_path /geochat/GeoChat_Instruct.json \
--image_folder /Dataset/geochat/final_images_llava \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter /checkpoints/llava-v1.5-7b/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir ./out_checkpoints/geochat \
--num_train_epochs 1 \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 7000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--dataloader_num_workers 16
and get the following error:
[ File "/project/GeoChat/geochat/model/geochat_arch.py", line 96, in encode_images
image_features = self.encode_images(images)
File "/project/GeoChat/geochat/model/geochat_arch.py", line 96, in encode_images
image_features = self.get_model().get_vision_tower()(images)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
image_features = self.get_model().get_vision_tower()(images)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
...
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 866, in forward
hidden_states = self.embeddings(pixel_values)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
hidden_states = self.embeddings(pixel_values)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 200, in forward
return forward_call(*args, **kwargs)
File "/miniconda-3/envs/geochat/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 200, in forward
embeddings = embeddings + self.position_embedding(self.position_ids)
RuntimeError: The size of tensor a (577) must match the size of tensor b (1297) at non-singleton dimension 1
embeddings = embeddings + self.position_embedding(self.position_ids)
RuntimeError: The size of tensor a (577) must match the size of tensor b (1297) at non-singleton dimension 1]
Will the referring and grounding question JSONL files for evaluation be released?
Initializing Chat
Traceback (most recent call last):
  File "/disk_sda/**/llava_project/GeoChat/geochat_demo.py", line 53, in <module>
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, args.load_8bit, args.load_4bit, device=args.device)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/builder.py", line 124, in load_pretrained_model
    model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
  File "/home/**/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "/home/**/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/language_model/geochat_llama.py", line 46, in __init__
    self.model = GeoChatLlamaModel(config)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/language_model/geochat_llama.py", line 38, in __init__
    super(GeoChatLlamaModel, self).__init__(config)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/geochat_arch.py", line 33, in __init__
    self.vision_tower = build_vision_tower(config, delay_load=True)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/multimodal_encoder/builder.py", line 9, in build_vision_tower
    return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 88, in __init__
    self.clip_interpolate_embeddings(image_size=504, patch_size=14)
  File "/disk_sda/**/llava_project/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 25, in clip_interpolate_embeddings
    state_dict = self.vision_tower.vision_model.embeddings.position_embedding.state_dict()
  File "/home/**/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CLIPVisionTower' object has no attribute 'vision_tower'. Did you mean: 'vision_tower_name'?
Traceback (most recent call last):
File "/content/GeoChat/geochat/eval/batch_geochat_scene.py", line 139, in
eval_model(args)
File "/content/GeoChat/geochat/eval/batch_geochat_scene.py", line 49, in eval_model
questions = get_chunk(questions, args.num_chunks, args.chunk_idx)
NameError: name 'get_chunk' is not defined
Dear @salman-h-khan,
Thanks for your fantastic work on GeoChat; I am really interested in it, and the checkpoint you provided works for me.
However, when I tried to reproduce it as a beginner with LLMs, it turned out to be a bit confusing to conduct all the training/fine-tuning steps.
Could you please point out where I went wrong? I set finetune_lora.sh as follows and ran it:
################## VICUNA ##################
PROMPT_VERSION=v1
MODEL_VERSION="vicuna-v1.5-7b"
gpu_ids=0,1,2,3
################## VICUNA ##################
deepspeed --master_port=$((RANDOM + 10000)) --include localhost:$gpu_ids geochat/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--model_name_or_path /data/.../geochat/llava-v1.5-7b \
--version $PROMPT_VERSION \
--data_path /data/.../geochat/GeoChat_Instruct.json \
--image_folder /data/.../geochat/final_images_llava \
--vision_tower openai/clip-vit-large-patch14-336 \
--mm_projector_type mlp2x_gelu \
--pretrain_mm_mlp_adapter /data/.../geochat/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--image_aspect_ratio pad \
--bf16 True \
--output_dir /data/.../outckpts/geochat_reproduce \
--num_train_epochs 1 \
--per_device_train_batch_size 18 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "no" \
--save_strategy "epoch" \
--save_steps 7000 \
--save_total_limit 1 \
--learning_rate 2e-4 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--dataloader_num_workers 16 \
--report_to wandb
Then I got the output folder geochat_reproduce with the following files:
python scripts/merge_lora_weights.py \
--model-path /data/.../geochat/outckpts/geochat_reproduce \
--model-base /data/.../geochat/llava-v1.5-7b \
--save-model-path /data/.../geochat/outckpts/merged
python geochat_demo.py \
--model-path /data/.../geochat/outckpts/merged
It then produces lots of errors, as follows:
To this end, I would like to ask if there are some mistakes in my reproduction or if some other steps are missing.
It would be super nice to receive some guidance from you.
Best regards and have a nice day,
Thank you very much for your work!
I discovered that the quantity of the open-source training data does not match the paper: using a global batch size of 144, I trained for 2144 iterations, while the paper indicates 2400.
I evaluated the VQA and scene classification tasks on the model fine-tuned with GeoChat_Instruct, and the results are pretty close to the metrics reported in the paper; however, the region captioning result is rather far from the paper's.
The official evaluation result:
My result:
Note that:
I wonder whether I did something wrong or whether the metric gap is caused by the stage-2 fine-tuning?
How do I increase the number of epochs during training?
"num_train_epoch" seems to be invalid.
Hello!
The Gradio interface linked in the README (https://3fa767b988e4136cd8.gradio.live/) is not running.
from datasets import load_dataset
dataset = load_dataset("MBZUAI/GeoChat_Instruct", split="train", streaming=True)
print(next(iter(dataset)))
root@donggeun-selfsup-747b74575d-sj9n6:/nas/k8s/dev/mlops/donggeun/tools/hf_dataset# python3 test.py
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 121, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/nas/k8s/dev/mlops/donggeun/tools/hf_dataset/test.py", line 4, in <module>
print(next(iter(dataset)))
File "/usr/local/lib/python3.10/dist-packages/datasets/iterable_dataset.py", line 1384, in __iter__
for key, example in ex_iterable:
File "/usr/local/lib/python3.10/dist-packages/datasets/iterable_dataset.py", line 282, in __iter__
for key, pa_table in self.generate_tables_fn(**self.kwargs):
File "/usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json/json.py", line 153, in _generate_tables
pa_table = pa.Table.from_pydict(mapping)
File "pyarrow/table.pxi", line 1813, in pyarrow.lib._Tabular.from_pydict
File "pyarrow/table.pxi", line 5339, in pyarrow.lib._from_pydict
File "pyarrow/array.pxi", line 374, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 344, in pyarrow.lib.array
File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'int' object
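A possible workaround (a sketch; the exact filename inside the dataset repo is my assumption): GeoChat_Instruct.json is a LLaVA-style list of records with nested "conversations", which the datasets JSON builder sometimes fails to type-infer, hence the pyarrow errors. Downloading the raw file and parsing it with the standard json module sidesteps schema inference:

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="MBZUAI/GeoChat_Instruct",
    filename="GeoChat_Instruct.json",  # hypothetical filename within the repo
    repo_type="dataset",
)
with open(path) as f:
    records = json.load(f)
print(len(records), records[0].keys())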
Hi, thanks for your great work on GeoChat, which makes great progress for the RS community. We are looking forward to the open-sourced data and codebase. What is the schedule for the data release?
My question is as in the title:
Do we have to first train the model in order to run it accurately?
In finetune_lora.sh, the argument --mm_use_im_start_end is set to False.
However, based on the paper (see figure below), it should be True.
Furthermore, when I change this argument to True, the following error occurs:
Traceback (most recent call last):
File "/xxx/Documents/code/geochat/geochat/train/train_mem.py", line 13, in <module>
train()
File "/xxx/Documents/code/geochat/geochat/train/train.py", line 952, in train
model.initialize_vision_tokenizer(model_args, tokenizer=tokenizer)
File "/xxx/Documents/code/geochat/geochat/model/geochat_arch.py", line 343, in initialize_vision_tokenizer
embed_tokens_weight = mm_projector_weights["model.embed_tokens.weight"]
KeyError: 'model.embed_tokens.weight'
[2024-03-20 16:15:45,873] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
config.json: 100%|██████████| 4.76k/4.76k [00:00<00:00, 11.9MB/s]
Traceback (most recent call last):
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 125, in <module>
eval_model(args)
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 32, in eval_model
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name)
File "/workspace/GeoChat/geochat/model/builder.py", line 124, in load_pretrained_model
model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 46, in __init__
self.model = GeoChatLlamaModel(config)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 38, in __init__
super(GeoChatLlamaModel, self).__init__(config)
File "/workspace/GeoChat/geochat/model/geochat_arch.py", line 33, in __init__
self.vision_tower = build_vision_tower(config, delay_load=True)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/builder.py", line 9, in build_vision_tower
return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 88, in __init__
self.clip_interpolate_embeddings(image_size=504, patch_size=14)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 25, in clip_interpolate_embeddings
state_dict = self.vision_tower.vision_model.embeddings.position_embedding.state_dict()
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CLIPVisionTower' object has no attribute 'vision_tower'. Did you mean: 'vision_tower_name'?
root@dfbbfcc8de85:/workspace/GeoChat# sh /workspace/GeoChat/scripts/LR.sh
[2024-03-20 16:25:28,364] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 125, in <module>
eval_model(args)
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 32, in eval_model
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name)
File "/workspace/GeoChat/geochat/model/builder.py", line 124, in load_pretrained_model
model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 46, in __init__
self.model = GeoChatLlamaModel(config)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 38, in __init__
super(GeoChatLlamaModel, self).__init__(config)
File "/workspace/GeoChat/geochat/model/geochat_arch.py", line 33, in __init__
self.vision_tower = build_vision_tower(config, delay_load=True)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/builder.py", line 9, in build_vision_tower
return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 88, in __init__
self.clip_interpolate_embeddings(image_size=504, patch_size=14)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 25, in clip_interpolate_embeddings
state_dict = self.vision_tower_name.vision_model.embeddings.position_embedding.state_dict()
AttributeError: 'str' object has no attribute 'vision_model'
root@dfbbfcc8de85:/workspace/GeoChat# sh /workspace/GeoChat/scripts/LR.sh
[2024-03-20 16:26:28,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 125, in <module>
eval_model(args)
File "/workspace/GeoChat/geochat/eval/batch_geochat_vqa.py", line 32, in eval_model
tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name)
File "/workspace/GeoChat/geochat/model/builder.py", line 124, in load_pretrained_model
model = AutoModelForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 46, in __init__
self.model = GeoChatLlamaModel(config)
File "/workspace/GeoChat/geochat/model/language_model/geochat_llama.py", line 38, in __init__
super(GeoChatLlamaModel, self).__init__(config)
File "/workspace/GeoChat/geochat/model/geochat_arch.py", line 33, in __init__
self.vision_tower = build_vision_tower(config, delay_load=True)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/builder.py", line 9, in build_vision_tower
return CLIPVisionTower(vision_tower, args=vision_tower_cfg, **kwargs)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 88, in __init__
self.clip_interpolate_embeddings(image_size=504, patch_size=14)
File "/workspace/GeoChat/geochat/model/multimodal_encoder/clip_encoder.py", line 25, in clip_interpolate_embeddings
state_dict = self.vision_tower.vision_model.embeddings.position_embedding.state_dict()
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'CLIPVisionTower' object has no attribute 'vision_tower'. Did you mean: 'vision_tower_name'?
The code:
python geochat/eval/batch_geochat_vqa.py \
--model-path /workspace/GeoChat/geochat-7B \
--question-file /workspace/GeoChat/eva/LR/LR_split_test_questions.json \
--answers-file /workspace/GeoChat/eva/LR/result/LR_Geochat.jsonl \
--image-folder /workspace/GeoChat/eva/LR/Imaages_LR/