Comments (4)
You can use an fp16 model to save more memory: convert the model to fp16 before moving it to XPU, like:
model = model.half().to('xpu')
This can save about 700 MB for 1024 input tokens and about 1400 MB for 2048 input tokens.
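The two reported numbers scale linearly (doubling the tokens doubles the savings), which gives about 0.68 MB saved per input token. The sketch below extrapolates from those two data points only. It assumes the linear trend also holds for other lengths, which the comment does not confirm, and the helper names are hypothetical.

```python
# Savings reported above (MB), keyed by input length in tokens.
REPORTED_SAVINGS_MB = {1024: 700, 2048: 1400}


def per_token_savings_mb(reported=REPORTED_SAVINGS_MB):
    """Average MB saved per input token across the reported data points."""
    return sum(mb / tokens for tokens, mb in reported.items()) / len(reported)


def estimate_savings_mb(num_tokens, reported=REPORTED_SAVINGS_MB):
    """Assumption: savings grow proportionally with the input length."""
    return num_tokens * per_token_savings_mb(reported)


for n in (512, 1024, 2048, 4096):
    print(f"{n:5d} tokens -> ~{estimate_savings_mb(n):.0f} MB saved")
```

Treat anything outside 1024 to 2048 tokens as a rough estimate and measure on your own hardware.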
from bigdl.
If you can change the modeling file, you can empty the XPU cache after line 698 of https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/ea66ced17780ca3db39bc9f8aa601d8463db3da5/modeling_baichuan.py#L698, like:
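# Free cached XPU memory after the prefill pass, which processes the whole prompt at once.
# Decode steps process one token at a time (hidden_states.size(1) == 1), so they are skipped.
if hidden_states.device.type == "xpu" and hidden_states.size(1) != 1:
    torch.xpu.empty_cache()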
#10317 is working on emptying the cache, but it is still being tested. This can decrease the peak memory by about 600 MB.
Another source of unused memory is sin_cached and cos_cached in https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat/blob/main/modeling_baichuan.py#L109, which take about 153 MB on CPU. You can remove them.
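If you would rather not edit modeling_baichuan.py, the two caches could in principle be deleted from the loaded model instead. This is a minimal sketch, assuming (as the comment states) that the optimized forward path never reads sin_cached or cos_cached; if it does, the next forward call will raise AttributeError. drop_rotary_caches is a hypothetical helper, not part of BigDL or the Baichuan code. It only relies on the torch-style modules() method, which yields a module and all of its submodules.

```python
import gc


def drop_rotary_caches(model, names=("sin_cached", "cos_cached")):
    """Hypothetical helper: delete the named cache attributes from every
    submodule that defines them and return how many were removed.

    Assumption: nothing in the forward pass reads these attributes afterwards.
    """
    removed = 0
    for module in model.modules():  # the model itself plus all submodules
        for name in names:
            if hasattr(module, name):
                delattr(module, name)  # drop the reference to the cached tensor
                removed += 1
    gc.collect()  # release the tensors promptly if nothing else references them
    return removed
```

Usage would be a single call such as drop_rotary_caches(model) after loading; check that generation still works before relying on it.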
Thanks Qiu,Xin!
Related Issues
- New instructions about "Run Distributed QLoRA Fine-Tuning on Kubernetes" in MPI-Operator v1alpha1 and with kubectl method
- BIGDL-LM Acceleration for chatglm3-6b
- No output when using Baichuan2-7B-Chat with 2k input and int4 on XPU
- Failed to run Llama2-7B on Intel GPU
- fail to run model when load low bits instead of load original for qwen
- Failed to run Llama 2 inference on Flex 140
- HuatuoGPT-7B need to optimize performance about First token latency (ms) and After token latency (ms/token)
- HuatuoGPT-7B will self Q & A with history by TextIteratorStreamer
- Error when executing "from bigdl.llm.langchain.llms import TransformersLLM"
- Running minicpm failed
- QWEN2 Model generate failed
- Qwen1.5-7B wrong outputs with 1024 prompts
- Installation of BigDL-LLM and missing file in wheel
- Can not load Yuan2-2B GGUF FP16 model
- text-generation-webui server.py - modifying extensions
- ChatGLM3 can not stop with stop words
- issue about Qwen-7b on Arc A770
- Can Bigdl-LLM support Qwen-14B or Qwen-72B based multi-card of Arc A770?
- When chatting with a model through the webui, it reports that xpu is not installed; even with cpu selected, the xpu problem still occurs