Comments (14)
Hi @K-Alex13, if you have downloaded the model from https://huggingface.co/Qwen/Qwen1.5-14B-Chat/tree/main, please just replace 'Qwen/Qwen1.5-14B-Chat' with your local model folder path here (https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/run_qwen_14b_arc_2_card.sh#L38).
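For concreteness, a one-line way to make that substitution in the script; the local path below is purely hypothetical, so adjust it to wherever you downloaded the files:

```bash
# Sketch: point the example script at a local copy of the model.
# /home/user/models/Qwen1.5-14B-Chat is a hypothetical path; adjust it to yours.
sed -i "s#Qwen/Qwen1.5-14B-Chat#/home/user/models/Qwen1.5-14B-Chat#g" \
    run_qwen_14b_arc_2_card.sh
```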
Yes, I already used this method; the error comes up after the process you mentioned.
And the missing files mentioned in the error are also not among the Qwen/Qwen1.5-14B-Chat files.
> And the missing files mentioned in the error are also not among the Qwen/Qwen1.5-14B-Chat files.

If `model.safetensors.index.json` is not in your local folder, such an error message would still occur. You may need to check whether all model files are available and complete in your local model folder.
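As a sketch of that check (the folder path is hypothetical and the exact set of config/tokenizer files can differ per model, but comparing the shards against `model.safetensors.index.json` is the important part):

```bash
# Hypothetical local folder; adjust to your own download location.
MODEL_DIR=/home/user/models/Qwen1.5-14B-Chat

# Basic presence check for files listed on the Hugging Face page.
for f in config.json tokenizer_config.json model.safetensors.index.json; do
    [ -f "${MODEL_DIR}/${f}" ] || echo "missing: ${f}"
done

# Confirm every weight shard listed in the index is actually present.
python -c "
import json, os, sys
d = sys.argv[1]
idx = json.load(open(os.path.join(d, 'model.safetensors.index.json')))
missing = [s for s in sorted(set(idx['weight_map'].values()))
           if not os.path.isfile(os.path.join(d, s))]
print('missing shards:', missing or 'none')
" "${MODEL_DIR}"
```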
What is the function of low-bit here? I think it is 4-bit initialization, so the GPU memory needed will be less than 16G, and I do not know whether this uses two GPUs here. Or can you please tell me how to check the GPU usage during inference?
Why did GPU 0 not output the inference results while GPU 1 did?
> What is the function of low-bit here? I think it is 4-bit initialization, so the GPU memory needed will be less than 16G, and I do not know whether this uses two GPUs here. Or can you please tell me how to check the GPU usage during inference?

- As we introduced in the README, you could specify other low-bit optimizations (such as `fp8`) through `--low-bit`.
- If you want to monitor GPU usage, you could use a tool named `xpu-smi`. Use `sudo apt install xpu-smi` to install it, then you could use `sudo xpu-smi stats -d 0` to check the memory usage of GPU 0.

> Why did GPU 0 not output the inference results while GPU 1 did?

- Both GPUs did inference, but we only print the inference result of `RANK 0` here. In the log, `[0]` corresponds to the output messages of `RANK 0`, while `[1]` is `RANK 1`.
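If you want to watch both cards while the example runs, a small loop built from the commands above may help; the device indices 0 and 1 and the exact field names in the `xpu-smi stats` output are assumptions here:

```bash
# Poll memory usage of both Arc GPUs once per second during inference.
# Assumes xpu-smi is installed and the two cards are devices 0 and 1.
while true; do
    for dev in 0 1; do
        echo "=== GPU ${dev} ==="
        sudo xpu-smi stats -d "${dev}" | grep -i "memory"
    done
    sleep 1
done
```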
According to your screenshot, maybe you could try `sudo apt install libmetee` and `sudo apt install libmetee-dev`.
How do I use them?
> How do I use them?

Sorry, but I have no idea what 'them' refers to. The ME TEE Library (libmetee/libmetee-dev) is a C library for accessing CSE/CSME/GSC firmware, which the `xpu-smi` tool seems to need. Could you use `xpu-smi` now?
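One quick sanity check after installing those libraries, just to see whether `xpu-smi` can talk to the cards at all (I believe the `discovery` subcommand lists the available devices, but the exact subcommand may vary with the tool version):

```bash
# Should list the available GPUs (expected: device IDs 0 and 1 for two Arc cards).
sudo xpu-smi discovery
```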
I installed the packages you mentioned above and tried to use xpu-smi; the same error comes up.
By the way, I want to know whether this is a method that uses two GPUs as one bigger GPU for inference, or whether it just puts the model on two different GPUs separately and runs inference separately?
> I installed the packages you mentioned above and tried to use xpu-smi; the same error comes up.

Maybe you could try these steps?

sudo apt-get autoremove libmetee-dev
sudo apt-get autoremove libmetee
sudo apt-get install libmetee
sudo apt-get install libmetee-dev
sudo apt-get install xpu-smi
> By the way, I want to know whether this is a method that uses two GPUs as one bigger GPU for inference, or whether it just puts the model on two different GPUs separately and runs inference separately?

The model is split and placed on the two GPUs, so each GPU needs less memory for inference. In this way, you could treat the two GPUs as one bigger GPU.
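As with the monitoring loop above, a rough way to confirm the split (device indices and field names are assumed): while the example is running, each card should report roughly half of the memory that the full model would need on a single GPU.

```bash
# Point-in-time memory check on both cards during inference.
sudo xpu-smi stats -d 0 | grep -i "memory used"
sudo xpu-smi stats -d 1 | grep -i "memory used"
```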