Comments (22)
Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?
from bigdl.
The current machine being used is an A770, and the GPU memory should be sufficient. I hope you can provide me with some guidance.
Could you provide more details?
- Are you running baichuan1 or baichuan2?
- What sequence lengths (in and out) are you using when you hit this memory issue?
> Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?

If you have multiple GPUs, you can use xpu:0 or xpu:1 to specify the device.
> Could you provide more details? Are you running baichuan1 or baichuan2? What sequence lengths (in and out) are you using?

I am using baichuan2, and the sequence length should be the default. The model is downloaded from the following page:
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
> If you have multiple GPUs, you can use xpu:0 or xpu:1 to specify the device.

Can you please give me a sample of where and how to put the model on a different xpu?
> Can you please give me a sample of where and how to put the model on a different xpu?
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59
For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
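To make the idea concrete, here is a minimal sketch of the device-string convention being discussed. The helper below is hypothetical (not part of BigDL); the actual move requires `intel_extension_for_pytorch` and a device whose index really exists, which is exactly what goes wrong with the "device_id is out of range" error.

```python
def resolve_xpu_device(requested_index: int, num_xpus: int) -> str:
    """Return a device string like 'xpu:1' (illustrative helper, not a BigDL API).

    Falls back when the requested index is out of range, mirroring the
    'device_id is out of range' failure mode.
    """
    if num_xpus <= 0:
        return "cpu"  # no Intel GPU detected at all
    if requested_index >= num_xpus:
        return "xpu:0"  # requested index does not exist; clamp to the first GPU
    return f"xpu:{requested_index}"

# With only one detected GPU, asking for index 1 falls back to xpu:0:
print(resolve_xpu_device(1, 1))  # prints "xpu:0"
```

In the linked example, both the model and the input ids would then be moved to the same resolved device, e.g. `model.to(device)` and `input_ids.to(device)`; putting them on different devices will fail at the first matmul.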
> I am using baichuan2, and the sequence length should be the default. The model is downloaded from https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.
> I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.

Does model.chat use BigDL?
`model = model.to('xpu:1')` is not working.
> Does model.chat use BigDL?

Yes, I do use BigDL.
> For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.

I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is a problem that the device_id is out of range, while xpu:0 is the original state. What can I do next?
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is a problem that the device_id is out of range, while xpu:0 is the original state. What can I do next?
So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine.
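As a sketch of how to read the sycl-ls output: each line reports one backend:device pair, and the same physical GPU usually shows up once under OpenCL and once under Level Zero, so only the distinct Level-Zero GPU entries tell you how many `xpu:N` indices exist. The helper and the sample output below are illustrative (the exact line format is an assumption based on typical oneAPI setups, not a BigDL API):

```python
def count_level_zero_gpus(sycl_ls_output: str) -> int:
    # Count lines reporting a Level-Zero GPU device in `sycl-ls` output;
    # OpenCL entries for the same card are deliberately ignored.
    return sum(
        1
        for line in sycl_ls_output.splitlines()
        if "level_zero" in line.lower() and "gpu" in line.lower()
    )

# Hypothetical single-A770 machine: two lines, but only one physical GPU.
sample = """\
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) Arc(TM) A770 Graphics
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics
"""
print(count_level_zero_gpus(sample))  # prints 1, so only xpu:0 is valid
```

On a machine like this, `xpu:1` would raise the "device_id is out of range" error seen above, because only one Level-Zero GPU index exists.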
> After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine.

These are the results of sycl-ls:
It seems only one GPU is detected... Are the other GPUs properly set up?
I am not sure why only one GPU is detected; I can see gpu 2 in this figure?
You mean gpu:2 here? Those two lines refer to the same GPU; there is only one.
> Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?

Then what does this figure mean? It seems to show a 32G GPU.
It looks like your driver (released 2023.7) is a little old. Please update your driver to the latest version and try again.
You can download it from https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
> I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.

Hi, I have tested it on my side using BigDL and model.chat from HF, and it worked fine. But I am a bit curious about the log output `Thread` in your screenshot, which seems strange. Here is the script I used:
```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# import intel_extension_for_pytorch as ipex
from transformers.generation.utils import GenerationConfig

model = AutoModelForCausalLM.from_pretrained(
    r"D:\llm-models\Baichuan2-13B-Chat",
    optimize_model=True,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
    use_cache=True,
    cpu_embedding=False,
).eval()
tokenizer = AutoTokenizer.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", trust_remote_code=True)
model.to("xpu")
model.generation_config = GenerationConfig.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", revision="v2.0")

# Prompt: explain "reviewing the old to learn the new"
messages = [{"role": "user", "content": "解释一下“温故而知新”"}]
response = model.chat(tokenizer, messages)
print(response)
```
> Then what does this figure mean? It seems to show a 32G GPU.

The GPU memory the Arc A770 can actually use is only 16G, as shown in your device screenshot.
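As a rough back-of-envelope check (illustrative arithmetic, not measured numbers): a 13B-parameter model stored in sym_int4 needs about half a byte per weight, which fits comfortably in 16G, while unquantized fp16 weights alone would not.

```python
params = 13e9  # approximate parameter count of Baichuan2-13B

# Weight storage only; KV cache and activations add more on top of this.
int4_gb = params * 0.5 / 1024**3  # ~0.5 bytes per weight in 4-bit
fp16_gb = params * 2.0 / 1024**3  # 2 bytes per weight in fp16

print(f"int4 weights: ~{int4_gb:.1f} GB, fp16 weights: ~{fp16_gb:.1f} GB")
```

This is why the low-bit load works on a 16G A770 while a full-precision load of the same model would run out of memory.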