V100 GPU: very slow inference when loading the quantized model Yi-34B-Chat-4bits (yi, 7 comments, open)

zxdposter commented on September 15, 2024

On a V100 GPU, loading the quantized model Yi-34B-Chat-4bits gives very slow inference.

from yi.

Comments (7)

ChinesePainting commented on September 15, 2024

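This may be because running pip install -r requirements.txt directly left torch unusable.
Check whether torch works, or whether the message "CUDA extension not installed." appears when you start the model.
I fixed it by setting up a fresh environment:
1. Remove the torch line from requirements.txt.
2. Find the PyTorch version that matches your CUDA version; mine is CUDA 11.8, for example. From what I saw, gptq supports PyTorch 2.1.0 at the lowest.
3. Here are all of my installation commands:
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
(https://pytorch.org/get-started/previous-versions/)
pip install -r requirements.txt
(with the torch line already removed from the txt)
pip install auto-gptq==0.5.1 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
(https://github.com/AutoGPTQ/AutoGPTQ/blob/main/docs/INSTALLATION.md)
pip install --upgrade transformers optimum
(it reported that I did not have the optimum library, so I upgraded the two together to keep them compatible)
After that, it was much faster than before.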
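One way to carry out the "check whether torch works" step is a short diagnostic script. This is only a sketch: the function name check_torch_cuda and the report keys are made up for illustration, and it only tells you whether PyTorch can be imported and sees a CUDA device. It does not detect whether the AutoGPTQ CUDA extension was built.

```python
def check_torch_cuda():
    """Report whether PyTorch imports and whether it can see a CUDA device."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_build": None,       # CUDA version torch was built against
        "cuda_available": False,
        "device_name": None,
    }
    try:
        import torch
    except Exception:
        # torch is missing or its installation is broken
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_torch_cuda().items():
        print(f"{key}: {value}")
```

If cuda_build is None or cuda_available is False, the installed torch is likely a CPU-only or mismatched build, which would fit the reinstall steps described above.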

lyan62 commented on September 15, 2024

@zxdposter Hi, how many GPUs does 34B inference need? Does it require multiple GPUs?

ffhelly commented on September 15, 2024

Same here. 8x 4090, and it is still very slow.

devillaws commented on September 15, 2024

Hi, which gptq version are you using? I could not find an autogptq build for pytorch 2.1.2 on the official site.

zxdposter commented on September 15, 2024

@lyan62 It needs roughly 20-30 GB of GPU memory.
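That range is consistent with back-of-the-envelope arithmetic: 34 billion parameters at 4 bits each come to about 17 GB for the weights alone, before the KV cache, activations, and quantization metadata are counted. A rough sketch, assuming exactly 34e9 parameters and ignoring all overhead (the function name is hypothetical):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the raw weights only, in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Assumption: 34e9 parameters, 4-bit weights, no overhead of any kind.
    print(f"4-bit weights: {weight_memory_gb(34e9, 4):.1f} GB")
    # Same parameter count at 16 bits, for comparison.
    print(f"16-bit weights: {weight_memory_gb(34e9, 16):.1f} GB")
```

Actual usage depends on context length and batch size, so treat the result as a lower bound rather than a requirement.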

GoodDayUp commented on September 15, 2024

It really is too slow. Is there a good way to fix it?

zxdposter commented on September 15, 2024

@ChinesePainting Thanks for sharing the solution. I will try it later.
