Comments (3)
Hi @YongZhuIntel,
I reproduced it and got the same error, which means the XPU memory on the platform has been exhausted.
Profiling also shows that the chatglm3-6b model in BF16 occupies roughly 11 GB as soon as the trainer starts (~6B parameters × 2 bytes per parameter for the weights alone). Memory consumption then grows gradually across the forward/backward passes, so it easily exceeds the 16 GB limit on Arc.
Also note that multi-instance training here is data parallelism: each card loads the whole model, so adding cards does not save any per-card memory.
Two suggestions:
First, you could try QLoRA, which quantizes the base model to NF4 and therefore needs much less memory than BF16. Since the base model is frozen, this does not hurt tuning accuracy, and we have already validated chatglm with QLoRA. This is the most recommended option; see the loading sketch after the configurations below.
Second, hyperparameters can be tuned to lower memory consumption. With the configurations below, I can run more than 100 steps on 2 cards, and other combinations are worth trying as well:
# in alpaca_lora_finetuning.py: a smaller rank and alpha shrink the LoRA adapters
lora_r: int = 2,
lora_alpha: int = 4,
lora_dropout: float = 0.85,

# in the .sh script: micro_batch_size 1 minimizes per-step activation memory
......
python ./alpaca_lora_finetuning.py \
    --micro_batch_size 1 \
    --batch_size 2 \
    ......
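For reference, here is a minimal sketch of loading the base model in NF4 for QLoRA, modeled on the ipex-llm QLoRA examples; treat module paths and argument names as assumptions to check against your installed release, and the model path as a placeholder for your local checkpoint:

import torch
from peft import LoraConfig
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training

# NF4 stores the frozen base weights in ~4 bits per parameter,
# roughly a quarter of the BF16 footprint for chatglm3-6b.
model = AutoModelForCausalLM.from_pretrained(
    "/home/intel/models/chatglm3-6b",  # assumed local chatglm3-6b path
    load_in_low_bit="nf4",
    optimize_model=False,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
model = model.to("xpu")
model = prepare_model_for_kbit_training(model)

# Only the small LoRA adapters are trained on top of the frozen NF4 base.
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],
    bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)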
@Uxito-Ada Thanks for your help. I have successfully run qlora_finetune_chatglm3_6b on 1 card, but when trying to run it on 2 cards, I got an error at step 100.
2-card script:
export MASTER_ADDR=127.0.0.1
export OMP_NUM_THREADS=6
export FI_PROVIDER=tcp
export CCL_ATL_TRANSPORT=ofi
export TORCH_LLM_ALLREDUCE=0
mpirun -n 2 \
    python ./alpaca_qlora_finetuning.py \
        --base_model "/home/intel/models/chatglm3-6b" \
        --data_path "yahma/alpaca-cleaned" \
        --lora_target_modules '[query_key_value,dense,dense_h_to_4h,dense_4h_to_h]' \
        --output_dir "./ipex-llm-qlora-alpaca"
error message:
OSError: [Errno 39] Directory not empty: './ipex-llm-qlora-alpaca/tmp-checkpoint-100' -> './ipex-llm-qlora-alpaca/checkpoint-100'
#11099 says this issue is fixed in transformers 4.39.1 (the trainer renames a temporary checkpoint directory onto a destination that already exists, which os.rename rejects).
But after installing transformers 4.39.1:
pip install transformers==4.39.1
pip install accelerate==0.28.0
I got a new error:
Traceback (most recent call last):
File "/home/intel/miniconda3/envs/llm_ipex2.1.10_python3.11_finetune/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 759, in convert_to_tensors
tensor = as_tensor(value)
^^^^^^^^^^^^^^^^
File "/home/intel/miniconda3/envs/llm_ipex2.1.10_python3.11_finetune/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 721, in as_tensor
return torch.tensor(value)
^^^^^^^^^^^^^^^^^^^
ValueError: expected sequence of length 256 at dim 1 (got 255)
Is there something else that needs to be installed?
error log:
qlora_finetune_chatglm3_6b_arc_2_card_def_tmp.log
Hi @YongZhuIntel,
I reproduced your error, and the following dependencies resolve it:
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install transformers==4.36.1
pip install accelerate==0.23.0
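For background on that ValueError: torch.tensor can only stack a batch whose rows all have the same length, and the traceback shows one sequence of 255 tokens in a batch expected to be 256, i.e. a ragged batch produced by collation behavior that changed across transformers versions. The pinned versions above restore the behavior the example expects. As a general illustration only (a hypothetical tokenize helper, not the example's own code), padding every sample to a fixed cutoff length avoids ragged batches:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/home/intel/models/chatglm3-6b", trust_remote_code=True  # assumed local path
)

def tokenize(prompt, cutoff_len=256):
    # Pad and truncate to exactly cutoff_len tokens so every row of the
    # batch has equal length and torch.tensor can build a 2-D tensor.
    result = tokenizer(
        prompt,
        max_length=cutoff_len,
        padding="max_length",
        truncation=True,
        return_tensors=None,
    )
    result["labels"] = result["input_ids"].copy()
    return result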