
Comments (9)

l15y commented on August 23, 2024

You can ignore this error; you're not running it on a GPU anyway.

from wenda.

wangfh5 commented on August 23, 2024

But when I open the WebUI, type a question, and send it, it just says "An error occurred, reloading the model".

I am planning to run it on a GPU...

l15y commented on August 23, 2024

Show me the exact error message.

wangfh5 commented on August 23, 2024

127.0.0.1: system: Based on the following passages, answer the user's question in Chinese. If no answer can be derived from them, ignore the passages and answer the user's question in Chinese.
  user: 你好
Error: Library cublasLt is not initialized Library cublasLt is not initialized

l15y commented on August 23, 2024

Is CUDA installed correctly?
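As a quick way to check this (a hedged sketch; exact library file names vary by platform and CUDA version), Python's standard library can report whether the CUDA runtime libraries are even findable on the system:

```python
import ctypes.util

# Probe for the CUDA runtime and cuBLASLt libraries on the loader's search
# path. None means the library cannot be located, which is consistent with
# errors like "Library cublasLt is not initialized".
found = {name: ctypes.util.find_library(name) for name in ("cudart", "cublasLt")}
for name, path in found.items():
    print(name, "->", path if path else "not found")
```

If both come back `not found`, CUDA is either not installed or its directory is missing from the loader's search path (PATH on Windows, LD_LIBRARY_PATH on Linux).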

wangfh5 commented on August 23, 2024

I had indeed forgotten to install CUDA, but after installing it I still get the same error. After installing CUDA and removing the PhysX entry from the environment variables, the line "Symbol cudaLaunchKernel not found in C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll" is gone. Now the output is:

127.0.0.1: 你好
Error: Library cudart is not initialized Library cudart is not initialized
glm model path model\chatglm-6b-int4
rwkv model path ..\RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv model parameters cuda fp16i8 *18+
logging True
knowledge base type s
chunk_size 200
chunk_count 3
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860
D:\免安装软件\wenda\WPy64-38100\python-3.8.10.amd64\lib\site-packages\jieba\analyse\tfidf.py:47: ResourceWarning: unclosed file <_io.BufferedReader name='D:\\免安装软件\\wenda\\WPy64-38100\\python-3.8.10.amd64\\lib\\site-packages\\jieba\\analyse\\idf.txt'>
  content = open(new_idf_path, 'rb').read().decode('utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Knowledge base loaded
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
C:\Users\Fohong Wang/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence \?
  ["\?", "?"],
No compiled kernel found.
Compiling kernels : C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
gcc: error: C:\Users\Fohong: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so: No such file or directory
gcc: fatal error: no input files
compilation terminated.
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
gcc: error: C:\Users\Fohong: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so: No such file or directory
gcc: fatal error: no input files
compilation terminated.
Kernels compiled : C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
Model loaded
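The gcc errors in the log above come from the space in the user-profile path: the compile command is joined into one whitespace-separated string, so "C:\Users\Fohong Wang" is split into two tokens. A minimal sketch of the failure mode (the path below is shortened and illustrative):

```python
# Naive whitespace splitting, as happens when an unquoted command string is
# handed to the shell: a path containing a space breaks into two tokens.
cmd = r'gcc -O3 -fPIC -std=c99 C:\Users\Fohong Wang\.cache\kernels.c'
args = cmd.split()
print(args)
# gcc receives "C:\Users\Fohong" and "Wang\.cache\kernels.c" as two separate,
# nonexistent input files, matching the "No such file or directory" errors.
```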

l15y commented on August 23, 2024

Try restarting first.

wangfh5 commented on August 23, 2024

I do restart every time after changing environment variables. But by now I really think the problem is that "Compiling gcc" step.

  • I copied the chatglm-6B model from WSL2 (which I had downloaded myself with git clone from Hugging Face; the ChatGLM repo maintainers asked whether I might have downloaded some file incorrectly) into the lazy-install package, and got the same sentencepiece_processor.cc(1101) error as under WSL2.
  • Then I copied the chatglm-6B-int4 model from the lazy-install package into WSL2, ran it, and could finally chat. The output is:
glm model path model/chatglm-6b-int4
rwkv model path model/RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv model parameters cuda fp16
logging True
knowledge base type s
LLM model type glm6b
chunk_size 400
chunk_count 3
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/jieba/analyse/tfidf.py:47: ResourceWarning: unclosed file <_io.BufferedReader name='/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/jieba/analyse/idf.txt'>
  content = open(new_idf_path, 'rb').read().decode('utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Knowledge base loaded
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/home/wangfh5/.cache/huggingface/modules/transformers_modules/local/modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence '\?'
  ["\?", "?"],
No compiled kernel found.
Compiling kernels : /home/wangfh5/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /home/wangfh5/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.c -shared -o /home/wangfh5/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.so
Kernels compiled : /home/wangfh5/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.so
Load kernel : /home/wangfh5/.cache/huggingface/modules/transformers_modules/local/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 16
Using quantization cache
Applying quantization to glm layers
Model loaded
127.0.0.1: 你好
Hello 👋! I am the AI assistant ChatGLM-6B. Nice to meet you; feel free to ask me anything.

As you can see, the "Compiling gcc" step succeeds here.

  • On the Windows side, even after copying the compile command out and wrapping the file paths in double quotes, gcc still fails to build it. I then tried compiling it under WSL2 instead, which produced the screenshot at the top of the thread: the Windows popup asking to "reinstall using the original installation media".
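For a manual build, an alternative to quoting is to avoid shell string splitting entirely. A workaround sketch (paths are illustrative; this assumes invoking gcc yourself rather than through the model loader): build the command as an argument list, where each list element reaches gcc as exactly one argument, so the space in "Fohong Wang" never splits the path. The list can then be passed to subprocess.run.

```python
# Build the gcc invocation as an argument list instead of one joined string.
# Each element stays a single argument regardless of embedded spaces.
src = r"C:\Users\Fohong Wang\.cache\huggingface\quantization_kernels.c"
out = r"C:\Users\Fohong Wang\.cache\huggingface\quantization_kernels.so"
cmd = ["gcc", "-O3", "-fPIC", "-std=c99", src, "-shared", "-o", out]
print(cmd)  # the path with the space remains one token
```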

RenBuShengTian commented on August 23, 2024

I downloaded the model from the lazy-install package and ran into the same problem.
Judging from the error, the machine is probably missing a gcc toolchain, so the kernel responsible for quantizing tensors fails to compile.
Since the model in the lazy-install package is already int4-quantized, there is no need to repeat the quantization locally, so this error does not seem to affect inference with the int4 model.
After I installed MinGW and added it to the PATH environment variable, the program compiled the shared library correctly and the error disappeared.
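Whether that fix took effect can be checked before launching. A small sketch (assuming Python; the check itself is generic) that tests whether a gcc toolchain is visible on PATH, which is what the quantization-kernel compile step shells out to:

```python
import shutil

# The kernel compile step invokes gcc via the shell; if MinGW's bin directory
# is not on PATH, which() returns None and the compile will fail.
gcc_path = shutil.which("gcc")
if gcc_path:
    print("gcc:", gcc_path)
else:
    print("gcc not found; install MinGW and add its bin directory to PATH")
```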
