wenda-llm / wenda Goto Github PK

View Code? Open in Web Editor NEW

6.2K 82.0 807.0 32.77 MB

闻达：一个LLM调用平台。目标为针对特定环境的高效内容生成，同时考虑个人和中小企业的计算资源局限性，以及知识安全和私密性问题

License: GNU Affero General Public License v3.0

Python 5.41% HTML 2.34% CSS 6.82% Batchfile 0.06% Shell 0.02% JavaScript 85.35%

chatglm-6b chatrwkv rwkv

wenda's Introduction

闻达：一个大规模语言模型调用平台

本项目设计目标为实现针对特定环境的高效内容生成，同时考虑个人和中小企业的计算资源局限性，以及知识安全和私密性问题。为达目标，平台化集成了以下能力：

知识库：支持对接本地离线向量库、本地搜索引擎、在线搜索引擎等。
多种大语言模型：目前支持离线部署模型有chatGLM-6B\chatGLM2-6B、chatRWKV、llama系列(不推荐中文用户)、moss(不推荐)、baichuan(需配合lora使用，否则效果差)、Aquila-7B、InternLM，在线API访问openai api和chatGLM-130b api。
Auto脚本：通过开发插件形式的JavaScript脚本，为平台附件功能，实现包括但不限于自定义对话流程、访问外部API、在线切换LoRA模型。
其他实用化所需能力：对话历史管理、内网部署、多用户同时使用等。

交流QQ群：LLM使用和综合讨论群162451840；知识库使用讨论群241773574(已满，请去QQ频道讨论)；Auto开发交流群744842245；QQ频道

闻达：一个大规模语言模型调用平台
基于本项目的二次开发
- wenda-webui
- 接入Word文档软件

安装部署

各模型功能说明

功能	多用户并行	流式输出	CPU	GPU	量化	外挂LoRa
chatGLM-6B/chatGLM2-6B	√	√	需安装编译器	√	预先量化和在线量化	√
RWKV torch	√	√	√	√	预先量化和在线量化
RWKV.cpp	√	√	可用指令集加速		预先量化
Baichuan-7B	√	√	√	√		√
Baichuan-7B (GPTQ)	√	√		√	预先量化
Aquila-7B		官方未实现	√	√
replit			√	√
chatglm130b api	√
openai api	√	√
llama.cpp	√	√	可用指令集加速		预先量化
llama torch	√	√	√	√	预先量化和在线量化
InternLM	√	√	√	√	在线量化

懒人包

百度云

https://pan.baidu.com/s/1idvot-XhEvLLKCbjDQuhyg?pwd=wdai

夸克

链接：https://pan.quark.cn/s/c4cb08de666e 提取码：4b4R

介绍

默认参数在6G显存设备上运行良好。最新版懒人版已集成一键更新功能，建议使用前更新。

使用步骤（以glm6b模型为例）：

下载懒人版主体和模型，模型可以用内置脚本从HF下载，也可以从网盘下载。
如果没有安装CUDA11.8，从网盘下载并安装。
双击运行运行GLM6B.bat。
如果需要生成离线知识库，参考知识库。

自行安装

PS:一定要看example.config.yml，里面对各功能有更详细的说明！！！

1.安装库

通用依赖：pip install -r requirements/requirements.txt 根据使用的知识库进行相应配置

2.下载模型

根据需要，下载对应模型。

建议使用chatRWKV的RWKV-4-Raven-7B-v11，或chatGLM-6B。

3.参数设置

把example.config.yml重命名为config.yml，根据里面的参数说明，填写你的模型下载位置等信息

Auto

auto功能通过JavaScript脚本实现，使用油猴脚本或直接放到autos目录的方式注入至程序，为闻达附加各种自动化功能。

Auto 开发函数列表

函数（皆为异步调用）	功能	说明
send(s,keyword = "",show=true)	发送信息至LLM，返回字符串为模型返回值	s：输入模型文本；keyword:聊天界面显示文本；show：是否在聊天界面显示
add_conversation(role, content)	添加会话信息	role：'AI'、'user'；content：字符串
save_history()	保存会话历史	对话完成后会自动保存，但手动添加的对话须手动保存
find(s, step = 1)	从知识库查找	返回json数组
find_dynamic(s,step=1,paraJson)	从动态知识库查找；参考闻达笔记Auto	paraJson：{libraryStategy:"sogowx:3",maxItmes:2}
zsk(b=true)	开关知识库
lsdh(b=true)	开关历史对话	打开知识库时应关闭历史
speak(s)	使用TTS引擎朗读文本。	调用系统引擎
copy(s)	使用浏览器`clipboard-write`复制文本	需要相关权限

Auto 开发涉及代码段

在左侧功能栏添加内容：

func.push({
    name: "名称",
    question: async () => {
        let answer=await send(app.question)
        alert(answer)
    },
})

在下方选项卡添加内容：

app.plugins.push({ icon: 'note-edit-outline', url: "/static/wdnote/index.html" })

在指定RTST知识库查找:

find_in_memory = async (s, step, memory_name) => {
   response = await fetch("/api/find_rtst_in_memory", {
      method: 'post',
      body: JSON.stringify({
         prompt: s,
         step: step,
         memory_name: memory_name
      }),
      headers: {
         'Content-Type': 'application/json'
      }
   })
   let json = await response.json()
   console.table(json)
   app.zhishiku = json
   return json
}

上传至指定RTST知识库:

upload_rtst_zhishiku = async (title, txt,memory_name) => {
   response = await fetch("/api/upload_rtst_zhishiku", {
      method: 'post',
      body: JSON.stringify({
         title: title,
         txt: txt,
         memory_name: memory_name
      }),
      headers: { 'Content-Type': 'application/json' }
   })
   alert(await response.text())
}

保存指定RTST知识库:

save_rtst = async (memory_name) => {
   response = await fetch("/api/save_rtst_zhishiku", {
      method: 'post',
      body: JSON.stringify({
         memory_name: memory_name
      }),
      headers: { 'Content-Type': 'application/json' }
   })
   alert(await response.text())
}

访问SD_agent:

response = await fetch("/api/sd_agent", {
   method: 'post',
   body: JSON.stringify({
         prompt: `((masterpiece, best quality)), photorealistic,` + Q,
         steps: 20,
         // sampler_name: "DPM++ SDE Karras",
         negative_prompt: `paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans`
   }),
   headers: {
         'Content-Type': 'application/json'
   }
})
try {
   let json = await response.json()
   add_conversation("AI", '![](data:image/png;base64,' + json.images[0] + ")")
} catch (error) {
   alert("连接SD API失败，请确认已开启agents库，并将SD API地址设置为127.0.0.1:786")
}

部分内置 Auto 使用说明

文件名	功能
0-write_article.js	写论文：根据题目或提纲写论文
0-zsk.js	知识库增强和管理
face-recognition.js	纯浏览器端人脸检测：通过识别嘴巴开合，控制语音输入。因浏览器限制，仅本地或TLS下可用
QQ.js	QQ机器人:配置过程见文件开头注释
block_programming.js	猫猫也会的图块化编程:通过拖动图块实现简单Auto功能
1-draw_use_SD_api.js	通过agents模块（见example.config.yml`<Library>`）调用Stable Diffusion接口绘图

以上功能主要用于展示auto用法，进一步能力有待广大用户进一步发掘。

auto例程

知识库

知识库原理是在搜索后，生成一些提示信息插入到对话里面，知识库的数据就被模型知道了。rtst模式计算语义并在本地数据库中匹配；fess模式（相当于本地搜索引擎）、bing模式均调用搜索引擎搜索获取答案。

为防止爆显存和受限于模型理解能力，插入的数据不能太长，所以有字数和条数限制，这一问题可通过知识库增强Auto解决。

正常使用中，勾选右上角知识库即开启知识库。

有以下几种方案：

rtst模式，sentence_transformers+faiss进行索引，支持预先构建索引和运行中构建。
bing模式，cn.bing搜索，仅国内可用
bingsite模式，cn.bing站内搜索，仅国内可用
fess模式，本地部署的fess搜索，并进行关键词提取

rtst模式

sentence_transformers+faiss进行索引、匹配，并连同上下文返回。目前支持txt和pdf格式。

支持预先构建索引和运行中构建，其中，预先构建索引强制使用cuda，运行中构建根据config.yml(复制example.config.yml)中rtst段的device(embedding运行设备)决定，对于显存小于12G的用户建议使用CPU。

Windows预先构建索引运行：plugins/buils_rtst_default_index.bat。

Linux直接使用wenda环境执行 python plugins/gen_data_st.py

需下载模型置于model文件夹，并将txt格式语料置于txt文件夹。

使用微调模型提高知识库回答准确性

闻达用户“帛凡”，训练并提供的权重合并模型和lora权重文件，详细信息见https://huggingface.co/fb700/chatglm-fitness-RLHF ，使用该模型或者lora权重文件，对比hatglm-6b、chatglm2-6b、百川等模型，在闻达知识库平台中，总结能力可获得显著提升。

模型

GanymedeNil/text2vec-large-chinese 不再推荐，不支持英文且显存占用高
moka-ai/m3e-base 推荐

fess模式

在本机使用默认端口安装fess后可直接运行。否则需修改config.yml(复制example.config.yml)中fess_host的127.0.0.1:8080为相应值。FESS安装教程

知识库调试

清洗知识库文件

安装 utool 工具，uTools 是一个极简、插件化的桌面软件，可以安装各种使用 nodejs 开发的插件。您可以使用插件对闻达的知识库进行数据清洗。请自行安装以下推荐插件：

插件“解散文件夹”，用于将子目录的文件移动到根目录，并删除所有子目录。
插件“重复文件查找”，用于删除目录中的重复文件，原理是对比文件 md5。
插件“文件批量重命名”，用于使用正则匹配和修改文件名，并将分类后的文件名进行知识库的分区操作。

模型配置

chatGLM-6B/chatGLM2-6B

运行：run_GLM6B.bat。

模型位置等参数：修改config.yml(复制example.config.yml)。

默认参数在GTX1660Ti（6G显存）上运行良好。

chatRWKV

支持torch和cpp两种后端实现，运行：run_rwkv.bat。

模型位置等参数：见config.yml(复制example.config.yml)。

torch

可使用内置脚本对模型量化，运行：cov_torch_rwkv.bat。此操作可以加快启动速度。

在安装vc后支持一键启动CUDA加速，运行：run_rwkv_with_vc.bat。强烈建议安装！！！

cpp

可使用内置脚本对torch版模型转换和量化。运行：cov_ggml_rwkv.bat。

设置strategy诸如"Q8_0->8"即支持量化在cpu运行，速度较慢，没有显卡或者没有nvidia显卡的用户使用。

注意：默认windows版本文件为AVX2，默认Liunx版本文件是在debian sid编译的，其他linux发行版本未知。

可以查看：saharNooby/rwkv.cpp，下载其他版本，或者自行编译。

Aquila-7B

运行pip install FlagAI。注意FlagAI依赖很多旧版本的包，需要自己编译，所以如果想基于python3.11运行或者想在一个环境同时跑其他模型，建议去下懒人包
运行：run_Aquila.bat。

模型位置等参数：见config.yml(复制example.config.yml)。注意模型要在这里下：https://model.baai.ac.cn/model-detail/100101

基于本项目的二次开发

wenda-webui

项目调用闻达的 api 接口实现类似于 new bing 的功能。技术栈：vue3 + element-plus + ts

wenda's People

Contributors

Stargazers

Watchers

Forkers

shmctchina zhshch wanddy baifengbai qqr1 cymx333 nuo-o yeshuixing yawudede nanqiai iiheng 91fc hbyido mingyangbj jackrain jujulike jasonpro22 zhongpei daodaoliang fxd2000 iamleon121 gralchemoz iicedream navmeshsword meiwupangzi zhugemengyu diannaojiang eliauktm techventurebuilder soon14 ai-jie01 userzhongjieli titonichen awesome-archive fword solomonleon wupeng-engineer bobqiu give-me-yasoo bigbox smithhua great1001 jaredshuai learning-group1 369863654 linjie830914 yuanmeng1120 zhangnn520 18106574249 thlo7777 good-man-1998 jangocheng myandroid007 philipxiaoxi chalous piapiajing bofan-tunning fengwk ningpengtao-coder bwfyyy marykt qiwy junjiemao hhy5277 liu673 vicleos leaderyangzi hang-9 wangfh5 mrszyl quanmeta klonggan sharpss felixchina2000 thereluctantheroes liashdiash adian98 torney joyle tiaonmmn tian64873493 eos21 kaitiren xcz1997 liwenju0 juniorpotato arlindacor amaryllisupe angusvencent billfelix fisher1248 better319 lukerlv dongqf123 vickzhang staneh bluemain mosterwei13 jokejason tykuyh

wenda's Issues

UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()User

自定义语料库出现死循环

我检查了一下，txt_out 下面只有一个文件，但你看截图，出现有三个同名文件，我确实训练过几次，难道在哪里记录了每次的训练结果？

提问后，它开始无限重复回答相同内容，然后我中断了。

每次都重启模型？？

127.0.0.1:你好
错误 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat' 'ChatGLMForConditionalGeneration' object has no attribute 'stream_chat'
glm模型地址 C:\ChatGLM_SERVER\model\chatglm-6b
embeddings模型地址 model\simcse-chinese-roberta-wwm-ext
vectorstore保存地址 xw
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 12%|███████▏ | 1/8 [00:01<00:08, 1.26s/it]D:\AI_GLM\Wenda02\WPy64-31090\python-3.10.9.amd64\lib\site-packages\langchain\chains\conversational_retrieval\base.py:191: UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on http://127.0.0.1:17860
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.19s/it]

用git，懒人包调用rwkv都报这个错误

Exception in thread Thread-1:
Traceback (most recent call last):
File "E:\LLM\wenda\WPy64-38100\python-3.8.10.amd64\lib\threading.py", line 932, in _bootstrap_inner
self.run()
File "E:\LLM\wenda\WPy64-38100\python-3.8.10.amd64\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "rwkvAPI.py", line 142, in load_model
from rwkv.model import RWKV # pip install rwkv
ModuleNotFoundError: No module named 'rwkv'

疑似不支持chatglm-int4系列模型

错误 expected scalar type Half but found Float expected scalar type Half but found Float
没有11G以上显存，无法确认原生fp16是否正常运行
windows10
cuda12
pytorch2.0.0
python3.10

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

系统: WSL2

准备了一个空虚拟环境, 选用传统索引方式, 安装依赖requirements-sy.txt;
运行run_data_processing.sh, 无问题;
运行run_GLM6B.sh, 提示没有安装torch, 于是安装;
运行run_GLM6B.sh, 报错RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

完整的报错输出如下. 这个错误我一直遇到, 不管是用传统索引还是用语义分割.

glm模型地址 model/chatglm-6b
rwkv模型地址 model/RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16
日志记录 True
知识库类型 s
LLM模型类型 glm6b
chunk_size 400
chunk_count 3
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
<frozen importlib._bootstrap>:1049: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/jieba/analyse/tfidf.py:47: ResourceWarning: unclosed file <_io.BufferedReader name='/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/jieba/analyse/idf.txt'>
  content = open(new_idf_path, 'rb').read().decode('utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
知识库加载完成
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Exception in thread Thread-1 (load_model):
Traceback (most recent call last):
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/home/wangfh5/AIGC_projects/wenda/wenda.py", line 117, in load_model
    LLM.load_model()
  File "/home/wangfh5/AIGC_projects/wenda/plugins/llm_glm6b.py", line 25, in load_model
    tokenizer = AutoTokenizer.from_pretrained(settings.glm_path, local_files_only=True, trust_remote_code=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 658, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1804, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1959, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/.cache/huggingface/modules/transformers_modules/local/tokenization_chatglm.py", line 205, in __init__
    self.sp_tokenizer = SPTokenizer(vocab_file, num_image_tokens=num_image_tokens)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/.cache/huggingface/modules/transformers_modules/local/tokenization_chatglm.py", line 61, in __init__
    self.text_tokenizer = TextTokenizer(vocab_file)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/.cache/huggingface/modules/transformers_modules/local/tokenization_chatglm.py", line 22, in __init__
    self.sp.Load(model_path)
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

建议后续增加接入Stable Diffusion的API接口对接功能

辛苦作者，建议后续增加接入Stable Diffusion的API接口对接功能，类似Text generation webUI这个项目

想要加载ChatGLM-6B应该怎么做？

不是int4的模型，是FP16（无量化）的那个模型。

如何实现类 ChatPDF 功能

能阅读pdf吗

请问ChatPDF功能和自己复制一段文字进输入框的区别是什么呀，我似乎用的不对

发现给一段超长文本txt，比如全本三国演义，似乎是随机选一段加在我的问题前面？这和我自己加的区别是什么呀~
我把输入文字上限改高了一下，发现4000字就快占满4090的24G显存，请问这正常吗？
感谢大佬的项目，安装很方便。

4月9日版那个警告仍然存在，如何去掉？

4月9日版那个警告仍然存在，加载过程如下：
glm模型地址 ..\ChatGLM-6B\model
rwkv模型地址 ..\RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
日志记录 True
知识库类型 x
embeddings模型地址 model\simcse-chinese-roberta-wwm-ext
vectorstore保存地址 xw
chunk_size 200
chunk_count 3
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860/
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
知识库加载完成
C:\Users\ASUS/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence ?
["?", "？"],
Loading checkpoint shards: 12%|███████▏ | 1/8 [00:00<00:05, 1.33it/s]D:\wenda\WPy64-38100\python-3.8.10.amd64\lib\site-packages\langchain\chains\conversational_retrieval\base.py:191: UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
知识库加载完成
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:05<00:00, 1.58it/s]
模型加载完成

4月5日，仅chatGLM，知识库为sentence-transformers方式发生错误，正在重新加载模型

4月5日，仅chatGLM，知识库为sentence-transformers方式出现错误
报错如下，发生错误，正在重新加载模型

lm模型地址 model\chatglm-6b-int4
embeddings模型地址 model\simcse-chinese-roberta-wwm-ext
vectorstore保存地址 xw
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
D:\zwj\AI\wenda\45sentence-transformers\WPy64-31090\python-3.10.9.amd64\lib\site-packages\langchain\chains\conversational_retrieval\base.py:191: UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on http://127.0.0.1:17860
Symbol cudaLaunchKernel not found in C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll
No compiled kernel found.
Compiling kernels : C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
'gcc' 不是内部或外部命令，也不是可运行的程序
或批处理文件。
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Kernels compiled : C:\Users\zwj.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
127.0.0.1:你好
错误 Library cublasLt is not initialized Library cublasLt is not initialized
glm模型地址 model\chatglm-6b-int4
embeddings模型地址 model\simcse-chinese-roberta-wwm-ext
vectorstore保存地址 xw
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

有没有考虑视频教程鸭

有没有考虑一下做个视频教程鸭，对我这种小白也友好还能顺带宣传🥰

懒人包解压出错

解压提示“文件已损坏或不是压缩包”，重新下载过很多次都这样，重命名也试过，各种解压软件都试了，求解答呀

666

怎么联系你？

懒人包win7不能正常用

"Could not load dynamic library 'cudart64_110.dll';
文件是有的，还需要装啥？

模型加载问题

我使用的是[ChatGLM-6B]的官方模型，因为是6G显卡，所以加载模型的时候是通过修改参数（quantize(4)）让6G显卡跑起来的。
请问一下这个项目应该在哪里修改模型参数？
或者官方量化后的4-bit模型在哪里下载？我一直没找到。
先谢谢了

老大局域网使用，如何配置啊？

使用的懒人包，目前只能127.0.0.0上使用，局域网访问该服务器，如何设定？

新版为什么两个文件，后缀002是什么文件

建议加入对多显卡的支持

给公司做知识库的，建议能够对4个及以上的显卡做兼容，傻瓜式一键开启和配置。

谁来给个step by step的安装使用说明

pip安装之后，开始迷茫了，不知道接下来做啥啊啊啊。。。。
难道需要去问问gpt咩。。。

这一句报错，因为huggingface没登录，怎么改成本地已经下载的呀

embeddings = HuggingFaceEmbeddings(model_name=model_name)
这一句报错，因为huggingface没登录，怎么改成本地已经下载的呀

发生异常: RepositoryNotFoundError
401 Client Error. (Request ID: Root=1-642c3974-240560a346fb2eb565ed790b)

Repository Not Found for url: https://huggingface.co/api/models/sentence-transformers/text2vec-base-chinese.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/sentence-transformers/text2vec-base-chinese

The above exception was the direct cause of the following exception:

File "E:\AI模型\wenda-main\GLM6BAPI.py", line 152, in
embeddings = HuggingFaceEmbeddings(model_name=model_name)
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-642c3974-240560a346fb2eb565ed790b)

错误！

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Exception in thread Thread-1 (load_model):
Traceback (most recent call last):
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\threading.py", line 953, in run
self._target(*self.args, **self.kwargs)
File "X:\wenda.7z\GLM6BAPI.py", line 144, in load_model
model = AutoModel.from_pretrained(glm_path, local_files_only=True, trust_remote_code=True)
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\transformers\models\auto\auto_factory.py", line 459, in from_pretrained
return model_class.from_pretrained(
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\transformers\modeling_utils.py", line 2362, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\1/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 940, in init
self.quantize(self.config.quantization_bit, self.config.quantization_embeddings, use_quantization_cache=True, empty_init=True)
File "C:\Users\1/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1262, in quantize
from .quantization import quantize, QuantizedEmbedding, QuantizedLinear, load_cpu_kernel
File "C:\Users\1/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 13, in
from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels_init.py", line 1, in
from . import library
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels\library_init.py", line 1, in
from . import nvrtc
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels\library\nvrtc.py", line 5, in
nvrtc = Lib("nvrtc")
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels\library\base.py", line 45, in init
lib_path = windows_find_lib(self.__name)
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels\library\base.py", line 39, in windows_find_lib
return lookup_dll(lib_name)
File "X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\cpm_kernels\library\base.py", line 16, in lookup_dll
for name in os.listdir(path):
NotADirectoryError: [WinError 267] 目录名称无效。: 'C:\Users\1\AppData\Local\Programs\Python\Python311\python.exe'
X:\wenda.7z\WPy64-31090\python-3.10.9.amd64\lib\site-packages\langchain\chains\conversational_retrieval\base.py:191: UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on http://127.0.0.1:17860

如何清空知识库？

run_data出错。

webui能够正常使用。

glm模型地址 G:\ChatGLM\WENDAglm\wenda\model\chatglm-6b
rwkv模型地址 ..\RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
日志记录 True
chunk_size 100
chunk_count 3
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\69030\AppData\Local\Temp\jieba.cache
Loading model cost 0.516 seconds.
Prefix dict has been built successfully.
Traceback (most recent call last):
File "G:\ChatGLM\WENDAglm\wenda\gen_data.py", line 23, in
data = f.read()
File "G:\ChatGLM\WENDAglm\wenda\WPy64-31090\python-3.10.9.amd64\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
File "G:\ChatGLM\WENDAglm\wenda\WPy64-31090\python-3.10.9.amd64\lib\encodings\utf_16.py", line 61, in _buffer_decode
codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 88-89: illegal encoding

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "G:\ChatGLM\WENDAglm\wenda\gen_data.py", line 26, in
data = f.read()
File "G:\ChatGLM\WENDAglm\wenda\WPy64-31090\python-3.10.9.amd64\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 3: invalid continuation byte
请按任意键继续. . .

Bug report:UnicodeDecodeError when running "run_data_processing.bat"

感谢作者的付出，赛博菩萨。
一、问题背景：使用的新版的集成包，准备了一个txt文档，且下载好了“simcse-chinese-roberta-wwm-ext”放入models文件夹，想让其构建知识库。
二、问题描述：按照项目运行说明，运行"run_data_processing.bat"文件，出现如下编码格式报错信息：

embeddings模型地址 model\simcse-chinese-roberta-wwm-ext
vectorstore保存地址 xw
Traceback (most recent call last):
File "D:\pythonProject\wenda\gen_data.py", line 20, in
data = f.read()
File "D:\pythonProject\wenda\WPy64-31090\python-3.10.9.amd64\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
File "D:\pythonProject\wenda\WPy64-31090\python-3.10.9.amd64\lib\encodings\utf_16.py", line 61, in _buffer_decode
codecs.utf_16_ex_decode(input, errors, 0, final)
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 192-193: illegal UTF-16 surrogate

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\pythonProject\wenda\gen_data.py", line 23, in
data = f.read()
File "D:\pythonProject\wenda\WPy64-31090\python-3.10.9.amd64\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 81: invalid continuation byte
请按任意键继续. . .

刚刚更新了一下，网页打不开了

Error: 500 Internal Server Error
Sorry, the requested URL 'http://127.0.0.1:17860/' caused an error:

Internal Server Error
Exception:
AttributeError("'LocalResponse' object has no attribute 'setHeader'")
Traceback:
Traceback (most recent call last):
File "C:\python\lib\site-packages\bottle.py", line 876, in _handle
return route.call(**args)
File "C:\python\lib\site-packages\bottle.py", line 1759, in wrapper
rv = callback(*a, **ka)
File "E:\wenda\wenda.py", line 19, in index
response.setHeader( "Pragma", "no-cache" );
AttributeError: 'LocalResponse' object has no attribute 'setHeader'

4月9日的懒人包，知识库类型无论是选s，还是选x，个人数据都无法加载进去。

如题

求帮助

Traceback (most recent call last):
File "C:\Users\adria\AI\chatglm-6b\wenda-main\GLM6BAPI.py", line 149, in
import zhishiku
File "C:\Users\adria\AI\chatglm-6b\wenda-main\zhishiku.py", line 5, in
ix = storage.open_index()
File "C:\Users\adria\anaconda3\envs\wendaGLM\lib\site-packages\whoosh\filedb\filestore.py", line 176, in open_index
return indexclass(self, schema=schema, indexname=indexname)
File "C:\Users\adria\anaconda3\envs\wendaGLM\lib\site-packages\whoosh\index.py", line 421, in init
TOC.read(self.storage, self.indexname, schema=self._schema)
File "C:\Users\adria\anaconda3\envs\wendaGLM\lib\site-packages\whoosh\index.py", line 618, in read
raise EmptyIndexError("Index %r does not exist in %r"
whoosh.index.EmptyIndexError: Index 'MAIN' does not exist in FileStorage('sy')

不知道是不是环境出问题了？配置好好几次都是一样的错误

【bug？】重启项目后，显示如下错误？

设备配置为：32g 内存，3080 16g 显卡，系统为win11 企业版 22h2
运行包为：懒人版
重启运行后第一遍无任何问题，ctrl+c中断退出后，再次运行，出现如下错误：
D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\langchain\chains\conversational_retrieval\base.py:191: UserWarning: ChatVectorDBChain is deprecated - please use from langchain.chains import ConversationalRetrievalChain
warnings.warn(
:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on http://127.0.0.1:17860
Exception in thread Thread-1 (load_model):
Traceback (most recent call last):
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "D:\pub\wenda\GLM6BAPI.py", line 144, in load_model
model = AutoModel.from_pretrained(glm_path, local_files_only=True, trust_remote_code=True)
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\transformers\models\auto\auto_factory.py", line 459, in from_pretrained
return model_class.from_pretrained(
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\transformers\modeling_utils.py", line 2362, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\adogs/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 927, in init self.lm_head = skip_init(
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
return module_cls(*args, **kwargs).to_empty(device=final_device)
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
return self._apply(lambda t: torch.empty_like(t, device=device))
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch\nn\modules\module.py", line 1024, in
return self.apply(lambda t: torch.empty_like(t, device=device))
File "D:\pub\wenda\WPy64-31090\python-3.10.9.amd64\lib\site-packages\torch_refs_init.py", line 4254, in empty_like
return torch.empty_strided(
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1069285376 bytes.

模型似乎加载失败

您好，因为我使用的是Ubuntu的系统，所以我将启动代码改为 PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:32 && python GLM6BAPI.py

webui可以正常使用，但是等待模型加载完（我看到nvidia-smi中有显存的使用）后，我提问发现跟模型未加载完的情况一样，
glm的回复都是三个.

请问我这个模型是否加载了？或者是我哪里设置的有问题

日志：

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
<frozen importlib._bootstrap>:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
serving on http://10.0.16.61:8502
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
/home/final/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b/fdb7a601d8f8279806124542e11549bdd76f62f6/modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence '\?'
  ["\?", "？"],
<frozen importlib._bootstrap>:914: ImportWarning: _ImportRedirect.find_spec() not found; falling back to find_module()
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:14<00:00,  1.75s/it]

其中：
我将现在代码修改为这个，因为我并没有找到代码中提到的 sentence-transformers/text2vec-base-chinese

model_name = "sentence-transformers/simcse-chinese-roberta-wwm-ext"

懒人包加载模型错误

Describe the bug
懒人包加载模型错误,

To Reproduce

双击运行数据训练, 成功;
双击运行模型, 失败

输出内容如下:

glm模型地址 model\chatglm-6b-int4
rwkv模型地址 ..\RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
日志记录 True
知识库类型 s
chunk_size 200
chunk_count 3
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860
D:\免安装软件\wenda\WPy64-38100\python-3.8.10.amd64\lib\site-packages\jieba\analyse\tfidf.py:47: ResourceWarning: unclosed file <_io.BufferedReader name='D:\\免安装软件\\wenda\\WPy64-38100\\python-3.8.10.amd64\\lib\\site-packages\\jieba\\analyse\\idf.txt'>
  content = open(new_idf_path, 'rb').read().decode('utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
知识库加载完成
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
C:\Users\Fohong Wang/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence \?
  ["\?", "？"],
Symbol cudaLaunchKernel not found in C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common\cudart64_65.dll
No compiled kernel found.
Compiling kernels : C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
gcc: error: C:\Users\Fohong: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so: No such file or directory
gcc: fatal error: no input files
compilation terminated.
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
gcc: error: C:\Users\Fohong: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c: No such file or directory
gcc: error: Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so: No such file or directory
gcc: fatal error: no input files
compilation terminated.
Kernels compiled : C:\Users\Fohong Wang\.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
模型加载完成

这里报错似乎是因为我的用户名有空格, 路径又没有加""导致命令错误, 于是我自己手动把这两个文件编译好了, 但再次运行还是同样的错误.

如此之外, cmd 还弹出了如下警告.

run_data_processing.bat 少了一对引号

原文件：
%PYTHON% plugins/gen_data_%zsk_type%.py

plugins 路径这里应该加上引号：

%PYTHON% "plugins/gen_data_%zsk_type%.py"

局域网远程无法访问

RT
局域网远程访问ip+17860端口访问不了webui，想问问怎么做局域网部署

希望增加对返回代码的缩进的显示以及高亮提示

前端显示的代码没有缩进以及高亮

但是看后端输出的其实是有的：

能否在colab中运行

我尝试使用colab运行此项目

from google.colab import drive
drive.mount('/content/drive')

%cd /content/drive/MyDrive/
!git clone https://github.com/l15y/wenda

# 安装依赖
%cd /content/drive/MyDrive/wenda
!pip install -r requirements.txt
%mkdir /content/drive/MyDrive/wenda/model
%cd /content/drive/MyDrive/wenda/model
!git lfs install
!git clone https://huggingface.co/THUDM/chatglm-6b-int4

%cd /content/drive/MyDrive/wenda
!chmod +x run_GLM6B.sh
!./run_GLM6B.sh

但是报错了

/content/drive/MyDrive/wenda
glm模型地址 model/chatglm-6b-int4
rwkv模型地址 model/RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
日志记录 True
chunk_size 200
chunk_count 1
Traceback (most recent call last):
  File "/content/drive/MyDrive/wenda/GLM6BAPI.py", line 141, in <module>
    import zhishiku
  File "/content/drive/MyDrive/wenda/zhishiku.py", line 5, in <module>
    ix = storage.open_index()
  File "/usr/local/lib/python3.9/dist-packages/whoosh/filedb/filestore.py", line 176, in open_index
    return indexclass(self, schema=schema, indexname=indexname)
  File "/usr/local/lib/python3.9/dist-packages/whoosh/index.py", line 421, in __init__
    TOC.read(self.storage, self.indexname, schema=self._schema)
  File "/usr/local/lib/python3.9/dist-packages/whoosh/index.py", line 618, in read
    raise EmptyIndexError("Index %r does not exist in %r"
whoosh.index.EmptyIndexError: Index 'MAIN' does not exist in FileStorage('sy')
2023-04-08 11:35:16.730034: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-04-08 11:35:17.027958: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-08 11:35:18.109132: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/content/drive/MyDrive/wenda/GLM6BAPI.py", line 130, in load_model
    from transformers import AutoModel, AutoTokenizer
  File "/usr/local/lib/python3.9/dist-packages/transformers/__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "/usr/local/lib/python3.9/dist-packages/transformers/dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "/usr/local/lib/python3.9/dist-packages/transformers/utils/__init__.py", line 34, in <module>
    from .generic import (
  File "/usr/local/lib/python3.9/dist-packages/transformers/utils/generic.py", line 33, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/__init__.py", line 51, in <module>
    from ._api.v2 import compat
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/__init__.py", line 37, in <module>
    from . import v1
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v1/__init__.py", line 31, in <module>
    from . import compat
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v1/compat/__init__.py", line 38, in <module>
    from . import v2
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v1/compat/v2/__init__.py", line 28, in <module>
    from tensorflow._api.v2.compat.v2 import __internal__
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v2/__init__.py", line 33, in <module>
    from . import compat
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v2/compat/__init__.py", line 38, in <module>
    from . import v2
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/_api/v2/compat/v2/compat/v2/__init__.py", line 332, in <module>
    from tensorboard.summary._tf import summary
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/summary/__init__.py", line 22, in <module>
    from tensorboard.summary import v1  # noqa: F401
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/summary/v1.py", line 23, in <module>
    from tensorboard.plugins.histogram import summary as _histogram_summary
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/plugins/histogram/summary.py", line 35, in <module>
    from tensorboard.plugins.histogram import summary_v2
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/plugins/histogram/summary_v2.py", line 35, in <module>
    from tensorboard.util import tensor_util
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/util/tensor_util.py", line 20, in <module>
    from tensorboard.compat.tensorflow_stub import dtypes, compat, tensor_shape
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/compat/tensorflow_stub/__init__.py", line 22, in <module>
    from .dtypes import as_dtype  # noqa
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/compat/tensorflow_stub/dtypes.py", line 19, in <module>
    from . import pywrap_tensorflow
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py", line 22, in <module>
    from .io import gfile
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/compat/tensorflow_stub/io/__init__.py", line 17, in <module>
    from . import gfile  # noqa
  File "/usr/local/lib/python3.9/dist-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 40, in <module>
    import fsspec
  File "/usr/local/lib/python3.9/dist-packages/fsspec/__init__.py", line 3, in <module>
    from . import _version, caching
  File "/usr/local/lib/python3.9/dist-packages/fsspec/caching.py", line 9, in <module>
    from concurrent.futures import ThreadPoolExecutor
  File "<frozen importlib._bootstrap>", line 1055, in _handle_fromlist
  File "/usr/lib/python3.9/concurrent/futures/__init__.py", line 49, in __getattr__
    from .thread import ThreadPoolExecutor as te
  File "/usr/lib/python3.9/concurrent/futures/thread.py", line 37, in <module>
    threading._register_atexit(_python_exit)
  File "/usr/lib/python3.9/threading.py", line 1414, in _register_atexit
    raise RuntimeError("can't register atexit after shutdown")
RuntimeError: can't register atexit after shutdown

请问有考虑过多平台部署吗，例如部署在linux GPU服务器上

有意pr

下载懒人包，需要requirement文件安装环境吗

下载懒人包，直接压缩，打开.bat运行就可以吗？

刚刚更新了一下，用4月10日的包，自定义语料仍不起作用，请大佬看看是怎么回事。

加载过程如下：

WebUI 端口号 17860
glm模型地址 ..\ChatGLM-6B\model
rwkv模型地址 ..\RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
rwkv LoRA 微调启用: ""
日志记录 True
知识库类型 s
LLM模型类型 glm6b
chunk_size 200
chunk_count 3
serving on 0.0.0.0:17860 view at http://127.0.0.1:17860
D:\wenda4.10\WPy64-38100\python-3.8.10.amd64\lib\site-packages\jieba\analyse\tfidf.py:47: ResourceWarning: unclosed file <_io.BufferedReader name='D:\wenda4.10\WPy64-38100\python-3.8.10.amd64\lib\site-packages\jieba\analyse\idf.txt'>
content = open(new_idf_path, 'rb').read().decode('utf-8')
ResourceWarning: Enable tracemalloc to get the object allocation traceback
知识库加载完成
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
C:\Users\ASUS/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py:1229: DeprecationWarning: invalid escape sequence ?
["?", "？"],
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.11s/it]
glm int4量化中，如果已经是量化模型或不需要量化，不要开启
模型加载完成

不错，看描述很赞，关注

看描述不错，
希望能加一些如何使用自有资源的描述说明
比如本地一些文档如何喂入的说明。。。
加油噢，关注了

内网访问好像无效，需要做什么配置？

现在默认启动只能本机运行，我想内网的手机也能访问，是否有网络请求配置相关的参数设置？

gen_data.py中的TXT文件数据格式是什么样的？可以放两条示例数据吗

设置 chunk_size 数值没有效果

Describe the bug
设置 chunk_size 数值没有效果

修改 settings.bat 当中的参数 set chunk_size=200 到更高的数值，实际运行时，观察控制台输出的参考文本(红色)，其截取的长度比修改前没有变化，长度都是约1000字节上下，大概300个中文字左右。
To Reproduce
修改 settings.bat 当中的参数 set chunk_size= 到不同的数值可以复现此问题，例如 400, 800，2000 等。

run_data_processing 时提示找不到库simcse-chinese-roberta-wwm-ext.

glm模型地址 model/chatglm-6b rwkv模型地址 model/RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth rwkv模型参数 cuda fp16 日志记录 True 知识库类型 x embeddings模型地址 model/simcse-chinese-roberta-wwm-ext vectorstore保存地址 xw LLM模型类型 glm6b chunk_size 400 chunk_count 3 开始读取数据 Traceback (most recent call last): File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status response.raise_for_status() File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/model/simcse-chinese-roberta-wwm-ext The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/home/wangfh5/AIGC_projects/wenda/plugins/gen_data_x.py", line 39, in embeddings = HuggingFaceEmbeddings(model_name=settings.embeddings_path) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/langchain/embeddings/huggingface.py", line 39, in init self.client = sentence_transformers.SentenceTransformer(self.model_name) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 87, in init snapshot_download(model_name_or_path, File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/sentence_transformers/util.py", line 442, in snapshot_download model_info = _api.model_info(repo_id=repo_id, revision=revision, token=token) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/huggingface_hub/hf_api.py", line 1624, in model_info hf_raise_for_status(r) File "/home/wangfh5/anaconda3/envs/wenda/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status raise RepositoryNotFoundError(message, response) from e huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6432b193-63f16cd13cfc24fd7ce973a3) Repository Not Found for url: https://huggingface.co/api/models/model/simcse-chinese-roberta-wwm-ext.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

使用自定义语料库报错

[Document(page_content='药品注册管理办法\n\n（2020年1月22日国家市场监督管理总局令第27号公布自2020年7月1日起施行）\n\n第一章总则', metadata={'source': 'txt_out\药品注册管理办法2020.txt'}), Document(page_content='第一百零九条国家药品监督管理局依法向社会公布药品注册审批事项清单及法律依据、审批要求和办理时限，向申请人公开药品注册进度，向社会公开批准上市药品的审评结论和依据以及监督检查发现的违法违规行为，接受社会监督。\n\n批准上市药品的说明书应当向社会公开并及时更新。其中，疫苗还应当公开标签内容并及时更新。', metadata={'source': 'txt_out\药品注册管理办法2020.txt'}), Document(page_content='申请人取得药品注册证书后，为药品上市许可持有人（以下简称持有人）。\n\n第四条药品注册按照中药、化学药和生物制品等进行分类注册管理。\n\n中药注册按照中药创新药、中药改良型新药、古代经典名方中药复方制剂、同名同方药等进行分类。\n\n化学药注册按照化学药创新药、化学药改良型新药、仿制药等进行分类。\n\n生物制品注册按照生物制品创新药、生物制品改良型新药、已上市生物制品（含生物类似药）等进行分类。', metadata={'source': 'txt_out\药品注册管理办法2020.txt'})]
127.0.0.1:system:结合以下文段, 用中文回答用户问题。如果无法从中得到答案，忽略文段内容并用中文回答用户问题。

药品注册管理办法
（2020年1月22日国家市场监督管理总局令第27号公布自2020年7月1日起施行）
第一章总则

第一百零九条国家药品监督管理局依法向社会公布药品注册审批事项清单及法律依据、审批要求和办理时限，向申请人公开药品注册进度，向社会公开批准上市药品的审评结论和依据以及监督检查发现的违法违规行为，接受社会监督。
批准上市药品的说明书应当向社会公开并及时更新。其中，疫苗还应当公开标签内容并及时更新。

申请人取得药品注册证书后，为药品上市许可持有人（以下简称持有人）。
第四条药品注册按照中药、化学药和生物制品等进行分类注册管理。
中药注册按照中药创新药、中药改良型新药、古代经典名方中药复方制剂、同名同方药等进行分类。
化学药注册按照化学药创新药、化学药改良型新药、仿制药等进行分类。
生物制品注册按照生物制品创新药、生物制品改良型新药、已上市生物制品（含生物类似药）等进行分类。

user:最新药品注册管理办法发布日期

错误 local variable 'ctx' referenced before assignment local variable 'ctx' referenced before assignment
附件是我导入的语料：

药品注册管理办法2020.txt

用懒人包能启动，但一直加载模型失败

后台打印如下 127.0.0.1:who are you? 错误 Library cublasLt is not initialized Library cublasLt is not initialized

请问如何解决？

linux-3090使用glm-gpu模型问题

1.glm60B已经在3090跑起来可以用了
2.在wenda里面也进行了路径替换
3.创建了txt目录，输了多行数据
4.执行脚本适合报错如下
5.求救作者我是哪里出问题了
glm模型地址 /home/hycan/HDD1/opt/zck/nlp/ChatGLM-6B/model
rwkv模型地址 model/RWKV-4-Raven-7B-v7-ChnEng-20230404-ctx2048.pth
rwkv模型参数 cuda fp16i8 *18+
日志记录 True
chunk_size 200
chunk_count 1
Traceback (most recent call last):
File "GLM6BAPI.py", line 140, in
import zhishiku
File "/home/hycan/HDD1/opt/zck/nlp/wenda/zhishiku.py", line 5, in
ix = storage.open_index()
File "/usr/local/anaconda3/lib/python3.8/site-packages/whoosh/filedb/filestore.py", line 176, in open_index
return indexclass(self, schema=schema, indexname=indexname)
File "/usr/local/anaconda3/lib/python3.8/site-packages/whoosh/index.py", line 421, in init
TOC.read(self.storage, self.indexname, schema=self._schema)
File "/usr/local/anaconda3/lib/python3.8/site-packages/whoosh/index.py", line 618, in read
raise EmptyIndexError("Index %r does not exist in %r"
whoosh.index.EmptyIndexError: Index 'MAIN' does not exist in FileStorage('sy')
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:08<00:00, 1.06s/it]
模型加载完成

支持多显卡么，怎么配置

目前有2张卡，请问怎么配置

[help] 运行“运行GLM6B.bat”报错

windows10
cuda12
pytorch2.0.0
python3.10
已下载[simcse-chinese-roberta-wwm-ext并放在sentence-transformers\simcse-chinese-roberta-wwm-ext。并完整安装requirements.txt
修改了GLM6BAPI.py的138行model_path="D:/chatglm-6B-int4"（chatglm的绝对路径）
修改了model_name = "D:/Python/Python310/Lib/site-packages/sentence_transformers/simcse-chinese-roberta-wwm-ext"（simcse-chinese-roberta-wwm-ext的绝对路径）
运行“PS D:\Desktop\wenda-main> .\运行GLM6B.bat”
报错
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
File "D:\Desktop\wenda-main\GLM6BAPI.py", line 162, in
vectorstore = FAISS.load_local('xw', embeddings=embeddings)
File "D:\Python\Python310\lib\site-packages\langchain\vectorstores\faiss.py", line 406, in load_local
index = faiss.read_index(str(path / "index.faiss"))
File "D:\Python\Python310\lib\site-packages\faiss\swigfaiss.py", line 9651, in read_index
return _swigfaiss.read_index(*args)
RuntimeError: Error in __cdecl faiss::FileIOReader::FileIOReader(const char *) at D:\a\faiss-wheels\faiss-wheels\faiss\faiss\impl\io.cpp:68: Error: 'f' failed: could not open xw\index.faiss for reading: No such file or directory
No compiled kernel found.
Compiling kernels : C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
'gcc' 不是内部或外部命令，也不是可运行的程序
或批处理文件。
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels.c -shared -o C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Kernels compiled : C:\Users\zx.cache\huggingface\modules\transformers_modules\local\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
Exception ignored in: <module 'threading' from 'D:\Python\Python310\lib\threading.py'>
Traceback (most recent call last):
File "D:\Python\Python310\lib\threading.py", line 1567, in _shutdown
lock.acquire()
KeyboardInterrupt: