
Comments (3)

Uxito-Ada avatar Uxito-Ada commented on September 25, 2024

Hi @YongZhuIntel ,

I reproduced it and got the same error, which means the XPU memory on the platform has been exhausted.

In addition, profiling shows that the chatglm3-6b model in BF16 takes ~11G+ once the trainer is started. During the subsequent forward/backward passes, memory consumption grows gradually, so it easily exceeds the 16G limit on Arc.

Also, note that multi-instance training is data-parallel: it loads the whole model on each card and therefore does not save any memory.
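The ~11G figure lines up with back-of-envelope arithmetic for the weights alone (a rough sketch; real usage also includes activations, gradients, and optimizer state):

```python
# Rough weight-memory estimate for a 6B-parameter model.
# Illustrative numbers only; activations, gradients, and optimizer
# state add substantially on top of this.
PARAMS = 6e9

def weight_memory_gb(bytes_per_param: float) -> float:
    """Memory for the raw weights alone, in GiB."""
    return PARAMS * bytes_per_param / 1024**3

bf16 = weight_memory_gb(2)    # BF16: 2 bytes per parameter
nf4 = weight_memory_gb(0.5)   # NF4: 4 bits per parameter

print(f"BF16 weights: ~{bf16:.1f} GiB")  # close to the profiled ~11G+
print(f"NF4 weights:  ~{nf4:.1f} GiB")
```

Since data parallelism replicates the full model, each card pays this cost in full, which is why quantizing the base model helps so much on 16G Arc cards.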

Two suggestions:

Firstly, you could try QLoRA, which quantizes the base model into NF4 and so requires less memory than BF16. Since the base model is frozen, this does not harm tuning accuracy. Moreover, we have already validated chatglm with QLoRA. This is the most recommended option.

Secondly, hyperparameters can be tuned to decrease memory consumption. With the configurations below, I can run more than 100 steps on 2 cards; other configurations can be tried as well:

# in alpaca_lora_finetuning.py
lora_r: int = 2,
lora_alpha: int = 4,
lora_dropout: float = 0.85,

# in .sh script
......
      python ./alpaca_lora_finetuning.py \
      --micro_batch_size 1 \
      --batch_size 2 \
......
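As a side note on the suggested values: in alpaca-lora style scripts, `batch_size` and `micro_batch_size` are typically linked through gradient accumulation, so a smaller micro batch trades extra accumulation steps for lower peak memory. A sketch of the usual relationship (the exact script internals may differ):

```python
# Typical relationship in alpaca-lora style finetuning scripts
# (assumption: the effective batch is assembled from micro-batches
# via gradient accumulation).
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    assert batch_size % micro_batch_size == 0, "batch_size must divide evenly"
    return batch_size // micro_batch_size

# With the suggested values, each optimizer step accumulates 2 micro-batches,
# so only one sample's activations are held in memory at a time.
print(accumulation_steps(batch_size=2, micro_batch_size=1))
```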

from bigdl.

YongZhuIntel avatar YongZhuIntel commented on September 25, 2024

@Uxito-Ada Thanks for your help. I have successfully run qlora_finetune_chatglm3_6b on 1 card, but when trying to run it on 2 cards, I got an error at step 100.

2 cards script:

export MASTER_ADDR=127.0.0.1
export OMP_NUM_THREADS=6
export FI_PROVIDER=tcp
export CCL_ATL_TRANSPORT=ofi
export TORCH_LLM_ALLREDUCE=0
mpirun -n 2 \
    python ./alpaca_qlora_finetuning.py \
    --base_model "/home/intel/models/chatglm3-6b" \
    --data_path "yahma/alpaca-cleaned" \
    --lora_target_modules '[query_key_value,dense,dense_h_to_4h,dense_4h_to_h]' \
    --output_dir "./ipex-llm-qlora-alpaca"

error message:

OSError: [Errno 39] Directory not empty: './ipex-llm-qlora-alpaca/tmp-checkpoint-100' -> './ipex-llm-qlora-alpaca/checkpoint-100'
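This looks like both ranks trying to rotate the same checkpoint directory, so one rank finds `checkpoint-100` already created and non-empty (an assumption about the cause). The OS-level error itself is easy to reproduce with the standard library:

```python
import errno
import os
import tempfile

# Recreate the situation: a temporary checkpoint dir being renamed onto a
# destination that another process has already populated.
root = tempfile.mkdtemp()
src = os.path.join(root, "tmp-checkpoint-100")
dst = os.path.join(root, "checkpoint-100")
os.makedirs(src)
os.makedirs(dst)
open(os.path.join(dst, "stale.bin"), "w").close()  # destination is non-empty

err = None
try:
    os.rename(src, dst)  # fails: cannot replace a non-empty directory
except OSError as e:
    err = e.errno

print(err == errno.ENOTEMPTY)  # Errno 39: Directory not empty
```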

Also, #11099 says this issue was fixed in transformers 4.39.1.
But after I installed transformers 4.39.1:

pip install transformers==4.39.1
pip install accelerate==0.28.0

I got a new error:

Traceback (most recent call last):
  File "/home/intel/miniconda3/envs/llm_ipex2.1.10_python3.11_finetune/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 759, in convert_to_tensors
    tensor = as_tensor(value)
             ^^^^^^^^^^^^^^^^
  File "/home/intel/miniconda3/envs/llm_ipex2.1.10_python3.11_finetune/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 721, in as_tensor
    return torch.tensor(value)
           ^^^^^^^^^^^^^^^^^^^
ValueError: expected sequence of length 256 at dim 1 (got 255)
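For context, this ValueError means the collator handed `torch.tensor` a ragged batch: one example has 255 token ids while the rest have 256, so they cannot be stacked into one tensor. It is likely a padding/collation behavior difference between transformers versions (an assumption). A pure-Python sketch of the shape problem and the usual fix (token ids and pad id 0 are illustrative):

```python
# A batch of token-id lists that is not rectangular cannot be turned
# into a tensor: one row is 255 tokens, the others 256.
batch = [[1] * 256, [1] * 255]

def pad_batch(seqs, pad_id=0):
    """Right-pad every sequence to the longest length in the batch."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]

lengths = {len(s) for s in pad_batch(batch)}
print(lengths)  # every row now has the same length
```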

Is there something else that needs to be installed?

error log:
qlora_finetune_chatglm3_6b_arc_2_card_def_tmp.log

from bigdl.

Uxito-Ada avatar Uxito-Ada commented on September 25, 2024

Hi @YongZhuIntel ,

I reproduced your error, and the dependencies below solve it:

pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.1
pip install accelerate==0.23.0
