Comments (22)
Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?
from bigdl.
The current machine being used is an A770, and the GPU memory should be sufficient. I hope you can provide me with some guidance.
Could you provide more details?
- Are you running baichuan1 or baichuan2?
- What sequence lengths (in and out) are you using when you hit this memory issue?
> Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?

If you have multiple GPUs, you can use xpu:0 or xpu:1 to specify the device.
> Could you provide more details? Are you running baichuan1 or baichuan2? What sequence lengths (in and out) are you using?

I am using baichuan2, and the sequence length should be the default. The model is downloaded from the following page:
https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main
> If you have multiple GPUs, you can use xpu:0 or xpu:1 to specify the device.

Can you please give me a sample of where and how to put the model on a different xpu?
> Can you please give me a sample of where and how to put the model on a different xpu?
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L50
https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py#L59
For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.
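To make the idea concrete, here is a minimal sketch of the device-string convention being discussed. The helper below is hypothetical (not part of BigDL); the actual move requires `intel_extension_for_pytorch` and a device whose index really exists, which is exactly what goes wrong with the "device_id is out of range" error.

```python
def resolve_xpu_device(requested_index: int, num_xpus: int) -> str:
    """Return a device string like 'xpu:1' (illustrative helper, not a BigDL API).

    Falls back when the requested index is out of range, mirroring the
    'device_id is out of range' failure mode.
    """
    if num_xpus <= 0:
        return "cpu"  # no Intel GPU detected at all
    if requested_index >= num_xpus:
        return "xpu:0"  # requested index does not exist; clamp to the first GPU
    return f"xpu:{requested_index}"

# With only one detected GPU, asking for index 1 falls back to xpu:0:
print(resolve_xpu_device(1, 1))  # prints "xpu:0"
```

In the linked example, both the model and the input ids would then be moved to the same resolved device, e.g. `model.to(device)` and `input_ids.to(device)`; putting them on different devices will fail at the first matmul.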
> I am using baichuan2, and the sequence length should be the default. The model is downloaded from https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/tree/main

Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one
> Which script are you using, and what is the default? Is it this one? https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark/all-in-one

I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.
> I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.

Does model.chat use BigDL?
`model = model.to('xpu:1')` is not working.
> Does model.chat use BigDL?

Yes, I do use BigDL.
> For example, you can modify these lines to use xpu:0 or xpu:1 if you wish.

I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is a problem that the device_id is out of range, while xpu:0 is the original state. What can I do next?
> I tried xpu:0 and xpu:1, two different situations. With xpu:1 there is a problem that the device_id is out of range, while xpu:0 is the original state. What can I do next?
So are there multiple GPU cards on your machine? After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine.
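As a sketch of how to read the sycl-ls output: each line reports one backend:device pair, and the same physical GPU usually shows up once under OpenCL and once under Level Zero, so only the distinct Level-Zero GPU entries tell you how many `xpu:N` indices exist. The helper and the sample output below are illustrative (the exact line format is an assumption based on typical oneAPI setups, not a BigDL API):

```python
def count_level_zero_gpus(sycl_ls_output: str) -> int:
    # Count lines reporting a Level-Zero GPU device in `sycl-ls` output;
    # OpenCL entries for the same card are deliberately ignored.
    return sum(
        1
        for line in sycl_ls_output.splitlines()
        if "level_zero" in line.lower() and "gpu" in line.lower()
    )

# Hypothetical single-A770 machine: two lines, but only one physical GPU.
sample = """\
[opencl:gpu:1] Intel(R) OpenCL HD Graphics, Intel(R) Arc(TM) A770 Graphics
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics
"""
print(count_level_zero_gpus(sample))  # prints 1, so only xpu:0 is valid
```

On a machine like this, `xpu:1` would raise the "device_id is out of range" error seen above, because only one Level-Zero GPU index exists.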
> After sourcing oneAPI, you can use sycl-ls to check the GPU cards on your machine.

These are the results of sycl-ls:
It seems only one GPU is detected... Are the other GPUs properly set up?
I am not sure why only one GPU is detected; I can see gpu 2 in this figure?
You mean gpu:2 here? Those two lines refer to the same GPU; there is only one.
> Update, new question here: when I run inference with the following GPU, how can I put the input ids on another GPU?

Then what does this figure mean? It seems to show a 32G GPU.
It looks like your driver (released 2023.7) is a little old. Please update your driver to the latest version and try again.
You can download it from https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html
> I just downloaded the baichuan2-13b model from HF and ran model.chat. That is what I mean by default.

Hi, I have tested it on my side using BigDL and model.chat from HF, and it worked fine. But I am a bit curious about the log output `Thread` in your screenshot, which seems strange. Here is the script I used:
```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
# import intel_extension_for_pytorch as ipex
from transformers.generation.utils import GenerationConfig

model = AutoModelForCausalLM.from_pretrained(
    r"D:\llm-models\Baichuan2-13B-Chat",
    optimize_model=True,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
    use_cache=True,
    cpu_embedding=False,
).eval()
tokenizer = AutoTokenizer.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", trust_remote_code=True)
model.to("xpu")
model.generation_config = GenerationConfig.from_pretrained(r"D:\llm-models\Baichuan2-13B-Chat", revision="v2.0")

# Prompt: explain "reviewing the old to learn the new"
messages = [{"role": "user", "content": "解释一下“温故而知新”"}]
response = model.chat(tokenizer, messages)
print(response)
```
> Then what does this figure mean? It seems to show a 32G GPU.

The GPU memory the Arc A770 can actually use is only 16G, as shown in your device screenshot.
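As a rough back-of-envelope check (illustrative arithmetic, not measured numbers): a 13B-parameter model stored in sym_int4 needs about half a byte per weight, which fits comfortably in 16G, while unquantized fp16 weights alone would not.

```python
params = 13e9  # approximate parameter count of Baichuan2-13B

# Weight storage only; KV cache and activations add more on top of this.
int4_gb = params * 0.5 / 1024**3  # ~0.5 bytes per weight in 4-bit
fp16_gb = params * 2.0 / 1024**3  # 2 bytes per weight in fp16

print(f"int4 weights: ~{int4_gb:.1f} GB, fp16 weights: ~{fp16_gb:.1f} GB")
```

This is why the low-bit load works on a 16G A770 while a full-precision load of the same model would run out of memory.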