
VisualGLM-6B

🤗 HF Repo • ⚒️ SwissArmyTransformer (sat) • 🐦 Twitter

• 📃 [CogView@NeurIPS 21] [GitHub] • 📃 [GLM@ACL 22] [GitHub]

👋 Join us on Slack and WeChat

News

[2023.10] Check out CogVLM ( https://github.com/THUDM/CogVLM ), Zhipu AI's new-generation multi-modal dialogue model. It adopts a new visual-expert architecture and ranks first on 10 authoritative classic multi-modal benchmarks. The English CogVLM-17B model is already open-sourced, and a Chinese model based on GLM will be open-sourced soon.

Introduction

VisualGLM-6B is an open-source, multi-modal dialogue language model that supports images, Chinese, and English. The language model is based on ChatGLM-6B with 6.2 billion parameters; the image part builds a bridge between the visual model and the language model through the training of BLIP2-Qformer, bringing the total model to 7.8 billion parameters.

VisualGLM-6B was pre-trained on 30M high-quality Chinese image-text pairs from the CogView dataset and 300M filtered English image-text pairs, with Chinese and English weighted equally. This training regime aligns visual information well with the semantic space of ChatGLM. In the subsequent fine-tuning stage, the model was trained on long visual question answering data to generate answers that match human preferences.

VisualGLM-6B is trained with the SwissArmyTransformer (abbreviated sat) library, a toolkit that supports flexible modification and training of Transformers and parameter-efficient fine-tuning methods such as LoRA and P-tuning. This project provides a Hugging Face interface that matches common user habits, as well as a sat-based interface.

Combined with model quantization, the model can be deployed locally on consumer-grade GPUs (as little as 6.3 GB of GPU memory at the INT4 quantization level).
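
For example, a minimal sketch of a 4-bit quantized load through the Hugging Face interface (only 4/8-bit quantization is supported; see the Model Quantization section below for details):

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
# quantize(4) corresponds to the INT4 level mentioned above; only the ChatGLM part is quantized
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).quantize(4).half().cuda()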


The open-source VisualGLM-6B model aims to advance large-model technology together with the open-source community. We sincerely ask developers and users to abide by the open-source license, and not to use the open-source model, code, or derivatives of this project for any purpose that may harm the country or society, or for any service that has not undergone safety evaluation and registration. At present, this project has not officially developed any application based on VisualGLM-6B, including websites, Android apps, Apple iOS apps, or Windows apps.

Since VisualGLM-6B is still at version v1, it is known to have quite a few limitations, such as factuality / hallucination issues in image descriptions, insufficient capture of image details, and some limitations inherited from the language model. Although we strive to ensure the compliance and accuracy of the data at every training stage, the small size of the VisualGLM-6B model and its probabilistic, stochastic nature mean that the accuracy of its outputs cannot be guaranteed, and the model is easily misled (see the Limitations section for details). Later versions of VisualGLM will work on these problems. This project does not assume the risks or liabilities of data security or public-opinion issues caused by the open-source model and code, or any risks arising from the model being misled, misused, disseminated, or improperly exploited.

Examples

VisualGLM-6B can answer questions about image descriptions and related knowledge. Titanic example

It can also draw on common sense or offer interesting observations. Click to expand/collapse more examples

Taxi-ironing example Mona Lisa dog example

Related Projects

  • XrayGLM is a project fine-tuned from VisualGLM-6B on an X-ray diagnosis dataset for X-ray diagnostic question answering; it can answer medical questions based on X-ray images.
Click to view examples

Example

  • StarGLM is a project fine-tuned from ChatGLM/VisualGLM-6B on astronomical datasets; it can answer questions about variable-star light curves.
Click to view examples

Example

Usage

Model Inference

Install dependencies with pip:

pip install -i https://pypi.org/simple -r requirements.txt
# In mainland China, use the Aliyun mirror; TUNA and other mirrors have recently had syncing problems. The command is:
pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt

By default this also installs the deepspeed library (needed for training with sat). The library is not necessary for model inference, and installing it fails in some Windows environments. To skip the deepspeed installation, change the commands to:

pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements_wo_ds.txt
pip install -i https://mirrors.aliyun.com/pypi/simple/ --no-deps "SwissArmyTransformer>=0.4.4"

To call the model with the Hugging Face transformers library (the dependencies above still need to be installed!), use the following code (where the image path is a local path):

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
image_path = "your image path"
response, history = model.chat(tokenizer, image_path, "描述这张图片。", history=[])
print(response)
response, history = model.chat(tokenizer, image_path, "这张图片可能是在什么场所拍摄的?", history=history)
print(response)

The code above lets transformers automatically download the model implementation and weights. The full model implementation is available on the Hugging Face Hub. If downloading the weights from the Hugging Face Hub is slow, you can download the weight files manually from here and load the model from a local directory; see "loading the model locally" for details. For quantization, CPU inference, Mac MPS backend acceleration, and other topics related to the transformers-based model, please refer to ChatGLM-6B's low-cost deployment guide.
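
A minimal sketch of loading from a local directory (the path below is a placeholder for the folder containing the manually downloaded model files):

from transformers import AutoTokenizer, AutoModel
local_path = "/path/to/visualglm-6b"  # placeholder: directory with the downloaded model files
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
model = AutoModel.from_pretrained(local_path, trust_remote_code=True).half().cuda()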

Calling the model with the SwissArmyTransformer library is similar; the environment variable SAT_HOME can be used to choose where the model is downloaded. In this repository's directory:

import argparse
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
from model import chat, VisualGLMModel
model, model_args = VisualGLMModel.from_pretrained('visualglm-6b', args=argparse.Namespace(fp16=True, skip_init=True))
from sat.model.mixins import CachedAutoregressiveMixin
model.add_mixin('auto-regressive', CachedAutoregressiveMixin())
image_path = "your image path or URL"
response, history, cache_image = chat(image_path, model, tokenizer, "描述这张图片。", history=[])
print(response)
response, history, cache_image = chat(None, model, tokenizer, "这张图片可能是在什么场所拍摄的?", history=history, image=cache_image)
print(response)

The sat library also makes parameter-efficient fine-tuning easy.

Model Fine-tuning

Multi-modal tasks are wide-ranging and diverse, and pre-training cannot cover everything. Here we provide a small-sample fine-tuning example that uses 20 annotated images to strengthen the model's ability to answer questions about the "background" of an image.

Unzip fewshot-data.zip and then run:

bash finetune/finetune_visualglm.sh

Three fine-tuning methods are currently supported:

  • LoRA: in the example, rank-10 LoRA is added to layers 0 and 14 of the ChatGLM model; the --layer_range and --lora_rank arguments can be adjusted to the specific scenario and amount of data.
  • QLoRA: if resources are limited, consider bash finetune/finetune_visualglm_qlora.sh. QLoRA quantizes ChatGLM's linear layers to 4 bits, so fine-tuning requires only 9.8 GB of GPU memory.
  • P-tuning: --use_lora can be replaced with --use_ptuning, but this is not recommended unless the model's application scenario is very fixed.

After training, you can run inference with:

python cli_demo.py --from_pretrained your_checkpoint_path --prompt_zh 这张图片的背景里有什么内容?
Comparison of results before and after fine-tuning

Before fine-tuning:

Welcome to the VisualGLM-6B model. Enter an image URL or local path to load an image, keep typing to chat, type clear to start over, and stop to terminate the program
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/2p.png
User: What is in the background of this image?
VisualGLM-6B: People are always troubled by life's trivialities and cannot break free.
User: clear
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/meme.png
User: What is in the background of this image?
VisualGLM-6B: The background of this image is a picture of a dog lying on a wooden floor.
User: clear
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/passport.png
User: What is in the background of this image?
VisualGLM-6B: The background of this image is a computer keyboard and two passports.
User: clear
Enter an image path or URL (press Enter for text-only dialogue):

After fine-tuning:

Welcome to the VisualGLM-6B model. Enter an image URL or local path to load an image, keep typing to chat, type clear to start over, and stop to terminate the program
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/2p.png
User: What is in the background of this image?
VisualGLM-6B: The background of this image is a rainy day,
User: Describe this image.
VisualGLM-6B: This image depicts a rainy scene, with rain pattering down on the ground. People in raincoats or under umbrellas walk in the rain, as if looking for something.
User: clear
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/meme.png
User: What is in the background of this image?
VisualGLM-6B: The background of this image is a wooden floor.
User: What color is the floor?
VisualGLM-6B: The background of this image is blue.
User: Describe this image.
VisualGLM-6B: This image depicts a brown puppy lying on a blue wooden floor.
User: clear
Enter an image path or URL (press Enter for text-only dialogue): fewshot-data/passport.png
User: What is in the background of this image?
VisualGLM-6B: The background of this image is a computer keyboard.
User: Describe this image.
VisualGLM-6B: This image features a computer keyboard and passports.
User: How many passports are there?
VisualGLM-6B: Two passports.
User: clear
Enter an image path or URL (press Enter for text-only dialogue):

If you want to merge the LoRA parameters into the original weights, call merge_lora(), for example:

from finetune_visualglm import FineTuneVisualGLMModel
import argparse

model, args = FineTuneVisualGLMModel.from_pretrained(
    'checkpoints/finetune-visualglm-6b-05-19-07-36',
    args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True,
        device='cuda',
    ))
model.get_mixin('lora').merge_lora()
args.layer_range = []
args.save = 'merge_lora'
args.mode = 'inference'
from sat.training.model_io import save_checkpoint
save_checkpoint(1, model, None, None, args)
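
After saving, the merged checkpoint should be loadable for inference like any other checkpoint, e.g. with python cli_demo.py --from_pretrained merge_lora (pointing at the saved directory, analogous to the inference command shown above).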

Fine-tuning requires the deepspeed library. Currently this workflow only supports Linux; more example documentation and instructions for Windows will be provided soon.

Deployment Tools

Command-line Demo

python cli_demo.py 

The program automatically downloads the sat model and starts an interactive dialogue in the command line. Type an instruction and press Enter to generate a reply; type clear to clear the dialogue history, and stop to terminate the program.

The cli_demo program provides the following hyperparameters to control the generation process and quantization precision:

usage: cli_demo.py [-h] [--max_length MAX_LENGTH] [--top_p TOP_P] [--top_k TOP_K] [--temperature TEMPERATURE] [--english] [--quant {8,4}]

optional arguments:
  -h, --help            show this help message and exit
  --max_length MAX_LENGTH
                        max length of the total sequence
  --top_p TOP_P         top p for nucleus sampling
  --top_k TOP_K         top k for top k sampling
  --temperature TEMPERATURE
                        temperature for sampling
  --english             only output English
  --quant {8,4}         quantization bits

Note that during training the prompt for English QA pairs is Q: A:, while for Chinese it is 问: 答:. The web demo uses the Chinese prompt, so English replies will be somewhat worse and mixed with Chinese; if you need English replies, use the --english option of cli_demo.py.

We also provide a typewriter-effect command-line tool inherited from ChatGLM-6B; this tool uses the Hugging Face model:

python cli_demo_hf.py

Model-parallel multi-GPU deployment is also supported (this requires updating sat to the latest version; if you downloaded a checkpoint earlier, you also need to delete it manually and download it again):

torchrun --nnode 1 --nproc-per-node 2 cli_demo_mp.py

Web Demo

web_demo

We provide a Gradio-based web demo. First install Gradio: pip install gradio. Then clone this repository, enter it, and run web_demo.py:

git clone https://github.com/THUDM/VisualGLM-6B
cd VisualGLM-6B
python web_demo.py

The program automatically downloads the sat model, runs a web server, and prints its address. Open the printed address in a browser to use it.

We also provide a typewriter-effect web tool inherited from ChatGLM-6B. It uses the Hugging Face model and, once started, runs on port 8080:

python web_demo_hf.py

Both web demos accept the command-line argument --share to generate a public Gradio link, and accept --quant 4 or --quant 8 to reduce GPU memory usage with 4-bit or 8-bit quantization, respectively.

API Deployment

First install the additional dependencies with pip install fastapi uvicorn, then run api.py from the repository:

python api.py

The program automatically downloads the sat model and serves on local port 8080 by default; it is called via POST. Below is an example request with curl; in general you can also issue the POST from code.

echo "{\"image\":\"$(base64 path/to/example.jpg)\",\"text\":\"描述这张图片\",\"history\":[]}" > temp.json
curl -X POST -H "Content-Type: application/json" -d @temp.json http://127.0.0.1:8080

The returned value is:

  {
    "response":"这张图片展现了一只可爱的卡通羊驼,它站在一个透明的背景上。这只羊驼长着一张毛茸茸的耳朵和一双大大的眼睛,它的身体是白色的,带有棕色斑点。",
    "history":[('描述这张图片', '这张图片展现了一只可爱的卡通羊驼,它站在一个透明的背景上。这只羊驼长着一张毛茸茸的耳朵和一双大大的眼睛,它的身体是白色的,带有棕色斑点。')],
    "status":200,
    "time":"2023-05-16 20:20:10"
  }
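
For reference, a minimal sketch of the same request issued from Python with the requests library (the image path is a placeholder; the payload fields mirror the curl example above):

import base64
import requests

with open("path/to/example.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

payload = {"image": image_b64, "text": "描述这张图片", "history": []}
resp = requests.post("http://127.0.0.1:8080", json=payload)
print(resp.json()["response"])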

We also provide api_hf.py, which uses the Hugging Face model; its usage is identical to the sat-model API:

python api_hf.py

Model Quantization

In the Hugging Face implementation, the model is loaded in FP16 precision by default, and running the code above requires about 15 GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized mode, as follows:

# Modify as needed; only 4/8-bit quantization is currently supported. The line below quantizes only ChatGLM, since quantizing the ViT introduces large errors
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).quantize(8).half().cuda()

In the sat implementation, you first need to pass an argument that changes the loading location to cpu, and then quantize. The method is as follows; see cli_demo.py for details:

from sat.quantization.kernels import quantize
quantize(model, args.quant).cuda()
# inference then requires only 7 GB of GPU memory
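
Putting it together, a minimal sketch of quantized sat inference that follows the snippet above (passing device='cpu' here is an assumption that mirrors the CPU-first loading described above):

import argparse
from sat.quantization.kernels import quantize
from model import VisualGLMModel

# load on the CPU first, then quantize and move to the GPU
model, model_args = VisualGLMModel.from_pretrained(
    'visualglm-6b',
    args=argparse.Namespace(fp16=True, skip_init=True, device='cpu'))
model = quantize(model, 8).cuda()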

Limitations

This project is at version V1; the parameter counts and compute of both the visual and the language model are relatively small. We summarize the main directions for improvement as follows:

  • Image-description factuality / model hallucination. When generating long image descriptions, as the text gets farther from the image, the language model dominates, and there is some chance of generating content that is not actually in the image based on the context.
  • Attribute mismatch. In scenes with multiple objects, some attributes of certain objects are often incorrectly attached to other objects.
  • Resolution. This project uses a resolution of 224×224, the most common size for vision models; however, finer-grained understanding requires larger resolution and more computation.
  • For data-related reasons, the model currently lacks Chinese OCR ability (it has some English OCR ability); we will add this capability in a later version.

License

The code in this repository is open-sourced under the Apache-2.0 license; use of the VisualGLM-6B model weights must follow the Model License.

Citation & Acknowledgements

If you find our work helpful, please consider citing the following papers:

@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}
@article{ding2021cogview,
  title={Cogview: Mastering text-to-image generation via transformers},
  author={Ding, Ming and Yang, Zhuoyi and Hong, Wenyi and Zheng, Wendi and Zhou, Chang and Yin, Da and Lin, Junyang and Zou, Xu and Shao, Zhou and Yang, Hongxia and others},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  pages={19822--19835},
  year={2021}
}

The dataset used in VisualGLM-6B's instruction fine-tuning stage includes some English image-text data from the MiniGPT-4 and LLaVA projects, as well as many classic cross-modal datasets; we sincerely thank them for their contributions.


visualglm-6b's Issues

Issue with API mode: unexpected keyword argument 'mems'

I'm trying to run the API mode. I copied the model data from Hugging Face and added the following to api.py:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("data", trust_remote_code=True)
model = AutoModel.from_pretrained("data", trust_remote_code=True).half().cuda()
model = model.eval()

app = FastAPI()

*all HF model files are in local ./data/

After starting the server, I send it a request with curl:

curl -X POST -H "Content-Type: application/json" -d @temp.json http://127.0.0.1:8080

here's the error I got when trying to submit a sample request:

INFO:     127.0.0.1:35234 - "POST / HTTP/1.1" 500 Internal Server Error
Internal Server Errorroot@291f83eb6f53:/VisualGLM-6B# ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/dist-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/VisualGLM-6B/api.py", line 36, in visual_glm
    answer, history, _ = chat(None, model, tokenizer, input_text, history=history, image=input_image, \
  File "/VisualGLM-6B/model/chat.py", line 141, in chat
    output = filling_sequence(
  File "/usr/local/lib/python3.10/dist-packages/sat/generation/autoregressive_sampling.py", line 108, in filling_sequence
    logits, *output_per_layers = model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: ChatGLMForConditionalGenerationWithImage.forward() got an unexpected keyword argument 'mems'

What did I do wrong? How can I get the API up and running?

Thanks

Can you list the specific hardware requirements?

I tried running web_demo_hf.py on Colab; it exited right at "Loading checkpoint shards: 0% 0/5 [00:00<?, ?it/s]^C". Watching the system memory I saw a spike, so it probably exceeded the default 12.7 GB. I'd like to know the minimum hardware required to run this model, e.g. in the form below, so that I can find a suitable machine to deploy on. Thanks.

vCPU:
RAM:
GPU RAM:

The following error occurs when loading the model for inference on CPU

Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\gradio\routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\gradio\blocks.py", line 1302, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\gradio\blocks.py", line 1039, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\chatglm\lib\site-packages\gradio\utils.py", line 491, in async_iteration
return next(iterator)
File "D:\Code\VisualGLM-6B-main\web_demo_hf.py", line 56, in predict
chatbot.append((parse_text(input), ""))
AttributeError: 'NoneType' object has no attribute 'append'
Please help; it seems CPU deployment still has problems.

Given a math exam paper as input, the output is a pile of random numbers

Input image: https://picdl.sunbangyan.cn/2023/05/18/sue5dn.jpg
Input text: 描述这张图片。 (Describe this image.)
Output: 20,36,48,57,69,76,87,95,104,113,122,131,140,149,158,167,176,185,194,203,212,221,230,239,248,257,266,275,284,293,302,311,320,329,338,347,356,365,374,383,392,401,410,419,428,437,446,455,464,473,482,491,500,509,518,527,536,545,554,563,572,581,590,599,608,617,626,635,644,653,662,671,680,689,698,707,716,725,734,743,752,761,770,779,788,797,806,815,824,833,842,851,860,869,878.

Tuning runs into RuntimeError: Error building extension 'fused_adam'

I used the default command for tuning and got the following error:

2023-05-22 16:51:07,239] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py39_cu117/fused_adam...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/torch20/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++17 -c /root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
FAILED: multi_tensor_adam.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/torch20/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70 -std=c++17 -c /root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
nvcc fatal : Value 'c++17' is not defined for option 'std'
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/torch20/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++14 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/root/anaconda3/envs/torch20/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/nfs_data/VisualGLM-6B-main/finetune_visualglm.py", line 188, in
training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 98, in training_main
model, optimizer = setup_model_untrainable_params_and_optimizer(args, model)
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 161, in setup_model_untrainable_params_and_optimizer
model, optimizer, _, _ = deepspeed.initialize(
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/init.py", line 165, in initialize
engine = DeepSpeedEngine(args=args,
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 308, in init
self._configure_optimizer(optimizer, model_parameters)
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1162, in _configure_optimizer
basic_optimizer = self._configure_basic_optimizer(model_parameters)
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1224, in _configure_basic_optimizer
optimizer = FusedAdam(
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/adam/fused_adam.py", line 71, in init
fused_adam_cuda = FusedAdamBuilder().load()
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 445, in load
return self.jit_load(verbose)
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 480, in jit_load
op_module = load(name=self.name,
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/torch20/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'fused_adam'
VM-3-158-ubuntu:1785083:1800122 [0] NCCL INFO [Service thread] Connection closed by localRank 0
VM-3-158-ubuntu:1785083:1785083 [0] NCCL INFO comm 0x8abbc410 rank 0 nranks 1 cudaDev 0 busId 80 - Abort COMPLETE
VM-3-158-ubuntu:1785083:1800126 [0] NCCL INFO [Service thread] Connection closed by localRank 0
VM-3-158-ubuntu:1785083:1785083 [0] NCCL INFO comm 0x8abc35b0 rank 0 nranks 1 cudaDev 0 busId 80 - Abort COMPLETE
[2023-05-22 16:51:50,540] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 1785083
[2023-05-22 16:51:50,540] [ERROR] [launch.py:434:sigkill_handler] ['/root/anaconda3/envs/torch20/bin/python', '-u', 'finetune_visualglm.py', '--local_rank=0', '--experiment-name', 'finetune-visualglm-6b', '--model-parallel-size', '1', '--mode', 'finetune', '--train-iters', '300', '--resume-dataloader', '--max_source_length', '64', '--max_target_length', '256', '--lora_rank', '10', '--pre_seq_len', '4', '--train-data', './fewshot-data/dataset.json', '--valid-data', './fewshot-data/dataset.json', '--distributed-backend', 'nccl', '--lr-decay-style', 'cosine', '--warmup', '.02', '--checkpoint-activations', '--save-interval', '300', '--eval-interval', '10000', '--save', './checkpoints', '--split', '1', '--eval-iters', '10', '--eval-batch-size', '8', '--zero-stage', '1', '--lr', '0.0001', '--batch-size', '20', '--skip-init', '--fp16', '--use_lora'] exits with return code = 1

Dependencies cannot be installed on Windows

ERROR: Command errored out with exit status 1: command: 'c:\program files\python\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-iozsj5rb\deepspeed_cd9c08b77eaf40568b910542cdc41a19\setup.py'"'"'; file='"'"'C:\Users\Administrator\AppData\Local\Temp\pip-install-iozsj5rb\deepspeed_cd9c08b77eaf40568b910542cdc41a19\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\Administrator\AppData\Local\Temp\pip-pip-egg-info-sdfey28f' cwd: C:\Users\Administrator\AppData\Local\Temp\pip-install-iozsj5rb\deepspeed_cd9c08b77eaf40568b910542cdc41a19\ Complete output (14 lines): test.c LINK : fatal error LNK1181: cannot open input file 'aio.lib' Traceback (most recent call last): File "", line 1, in File "C:\Users\Administrator\AppData\Local\Temp\pip-install-iozsj5rb\deepspeed_cd9c08b77eaf40568b910542cdc41a19\setup.py", line 162, in abort(f"Unable to pre-compile {op_name}") File "C:\Users\Administrator\AppData\Local\Temp\pip-install-iozsj5rb\deepspeed_cd9c08b77eaf40568b910542cdc41a19\setup.py", line 51, in abort assert False, msg AssertionError: Unable to pre-compile async_io DS_BUILD_OPS=1

Does the model use left-padding or right-padding?

Hi, does VisualGLM-6B use left-padding or right-padding during training? When I use VisualGLM-6B to train a reward model, I hit an "assert divergence_ind > 0" error; when printing I found divergence_ind[0]. I looked it up on the deepspeed side and it says this is caused by the model's padding, so I'd like to ask whether VisualGLM-6B uses left-padding or right-padding.

image

Looking forward to your reply.

transformers version compatible with requirements.txt

My last issue was ambiguous. Sorry about that. Basically, I get this traceback for several transformers versions that satisfy transformers>=2.27.1. So, to avoid triggering this error, which transformers version did you actually use? Or are there other conflicts among package versions? Or is something else the original trigger of this error?

Traceback (most recent call last):
  File "web_demo_hf.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained("./vglm-6b", trust_remote_code=True)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 663, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\dynamic_module_utils.py", line 177, in get_class_in_module
    module = importlib.import_module(module_path)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.'

Can VisualGLM-6B be used directly to train a reward model?

Hi, I want to use VisualGLM-6B to train a reward model. The input data is currently pure text. I adapted the code myself based on deepspeed_chat and found that the computation keeps failing; the log is as follows:
File "/opt/conda/envs/rlhf_tw_test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/rlhf_tw_test/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1024x130344 and 4096x1)

Which version of ChatGLM is used?

Which version of ChatGLM does VisualGLM use as its base? Is it the earliest released v0.1.0 version?

I see that ChatGLM only recently released a v1.1 checkpoint.

Also, when might you consider releasing all of the training data?

cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

[2023-05-19 14:50:31,777] [INFO] [RANK 0] > successfully loaded /home/tony/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
欢迎使用 VisualGLM-6B 模型,输入图像URL或本地路径读图,继续输入内容对话,clear 重新开始,stop 终止程序
请输入图像路径或URL(回车进入纯文本对话): https://img.caixin.com/2023-05-13/168394947268597_480_320.jpg
cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

$ nvidia-smi
Fri May 19 14:52:34 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 L... On | 00000000:01:00.0 Off | N/A |
| N/A 42C P8 7W / 150W| 1MiB / 16376MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

The web app has no GPU-memory cleanup mechanism

I started a web app with web_demo.py and uploaded different images to ask questions. GPU memory usage only grows and never shrinks, which can eventually trigger a GPU OOM.

Finetune error

Traceback (most recent call last):
File "finetune_visualglm.py", line 170, in
args = get_args(args_list)
File "/root/miniconda3/lib/python3.8/site-packages/sat/arguments.py", line 417, in get_args
initialize_distributed(args)
File "/root/miniconda3/lib/python3.8/site-packages/sat/arguments.py", line 500, in initialize_distributed
deepspeed.init_distributed(
TypeError: init_distributed() got an unexpected keyword argument 'world_size'

Requesting a pre-quantized model

It's not only that GPU memory is insufficient; system RAM is also insufficient, so I cannot load the model first and then quantize it.
Is there a tutorial for 8-bit or 4-bit quantization? Thanks.

Loading the model locally reports that model_config.json cannot be found

As the title says. I'm using a virtual environment with the required dependencies installed. I downloaded the model from Hugging Face and placed it in the local folder /data/models/THUDM/visualglm-6b, then modified cli_demo.py:

def main():
    ...
    # load model
    model, model_args = VisualGLMModel.from_pretrained(
        "/data/models/THUDM/visualglm-6b",
        args=argparse.Namespace(
        fp16=True,
        skip_init=True,
        use_gpu_initialization=True if (torch.cuda.is_available() and args.quant is None) else False,
        device='cuda' if (torch.cuda.is_available() and args.quant is None) else 'cpu',
        local_files_only=1,
    ))
    ...
    tokenizer = AutoTokenizer.from_pretrained("/data/models/THUDM/visualglm-6b", local_files_only=1, trust_remote_code=True)
    ...


Running cli_demo.py reports the following error:

Traceback (most recent call last):
  File "/data/VisualGLM-6B/cli_demo.py", line 100, in <module>
    main()
  File "/data/VisualGLM-6B/cli_demo.py", line 25, in main
    model, model_args = VisualGLMModel.from_pretrained(
  File "/data/VisualGLM-6B/.venv/lib/python3.9/site-packages/sat/model/base_model.py", line 212, in from_pretrained
    args = update_args_with_file(args, path=os.path.join(model_path, 'model_config.json'))
  File "/data/VisualGLM-6B/.venv/lib/python3.9/site-packages/sat/arguments.py", line 423, in update_args_with_file
    with open(path, 'r', encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/models/THUDM/visualglm-6b/model_config.json'


The directory structure of the model folder is shown in the attached screenshot.

Inference on the typewriter-effect page is very slow

web_demo_hf.py is much slower at inference than web_demo.py.
I then ran the same code for the Hugging Face model directly in Jupyter; without the web page the inference speed is fine, but on the web page it is very slow. I really don't know why; I'm not very familiar with Gradio.

Accepting multiple images as input

The InstructBLIP paper points out that, even without training or fine-tuning on video, splitting a video into frames and feeding the concatenated frames directly into the Q-Former gives some understanding ability on VideoQA test sets. Has VisualGLM run any similar experiments?

web_demo_hf.py报错:RuntimeError: GET was unable to find an engine to execute this computation

After uploading an image, running it reports an error; the error message is as follows:
Traceback (most recent call last):
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/gradio/routes.py", line 412, in run_predict
output = await app.get_blocks().process_api(
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/gradio/blocks.py", line 1035, in call_function
prediction = await anyio.to_thread.run_sync(
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/gradio/utils.py", line 491, in async_iteration
return next(iterator)
File "/mnt/amj/VisualGLM-6B/web_demo_hf.py", line 63, in predict
for response, history in model.stream_chat(tokenizer, image_path, input, history, max_length=max_length, top_p=top_p,
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/modeling_chatglm.py", line 1439, in stream_chat
for outputs in self.stream_generate(**inputs, **gen_kwargs):
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/modeling_chatglm.py", line 1291, in stream_generate
outputs = self(
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/modeling_chatglm.py", line 1462, in forward
image_embeds = self.image_encoder(images)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/visual.py", line 69, in forward
enc = self.vit(image)[0]
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/visualglm-6b/visual.py", line 28, in forward
return super().forward(input_ids=input_ids, position_ids=None, attention_mask=attention_mask, image=image)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
return self.transformer(*args, **kwargs)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/sat/model/official/vit_model.py", line 55, in word_embedding_forward
embeddings = self.proj(images)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/mnt/amj/conda/envs/lora/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

Torch not compiled with CUDA enabled

Hi, I followed the README and tried two different environments; both report this error. What is the problem and how should I handle it?
Error:
Traceback (most recent call last):
File "", line 1, in
File "D:\soft\python\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "D:\soft\python\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "D:\soft\python\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "D:\soft\python\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "D:\soft\python\lib\site-packages\torch\nn\modules\module.py", line 905, in
return self.apply(lambda t: t.cuda(device))
File "D:\soft\python\lib\site-packages\torch\cuda_init
.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")

transformers version?

I have 2 conflict problems and I found their corresponding solutions. They ask me to upgrade/downgrade transformers to either 2.26.1 or 2.27.1. This is a problem because whichever one I choose, the other traceback comes up.

For this traceback, people say I should go for 2.27.1

Traceback (most recent call last):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\gradio\blocks.py", line 898, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\gradio\utils.py", line 549, in async_iteration
    return next(iterator)
  File "web_demo_hf.py", line 63, in predict
    for response, history in model.stream_chat(tokenizer, image_path, input, history, max_length=max_length, top_p=top_p,
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\autograd\grad_mode.py", line 43, in generator_context
    response = gen.send(None)
  File "C:\Users\xxx/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1439, in stream_chat
    for outputs in self.stream_generate(**inputs, **gen_kwargs):
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\autograd\grad_mode.py", line 43, in generator_context
    response = gen.send(None)
  File "C:\Users\xxx/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1291, in stream_generate
    outputs = self(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\xxx/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1469, in forward    return super().forward(
  File "C:\Users\xxx/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1095, in forward    transformer_outputs = self.transformer(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\xxx/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 871, in forward
    logger.warning_once("Specify both input_ids and inputs_embeds at the same time, will use inputs_embeds")
AttributeError: 'Logger' object has no attribute 'warning_once'

And for this one, people say I should go for 2.26.1

Traceback (most recent call last):
  File "web_demo_hf.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained("./vglm-6b", trust_remote_code=True)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 663, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module.replace(".py", ""))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\transformers\dynamic_module_utils.py", line 177, in get_class_in_module
    module = importlib.import_module(module_path)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.'

Running the original finetune script reports errors in three places

(visualGLM) root@iZbp1ewp3ew1qt4u8bdh0iZ:~/ai/VisualGLM-6B# bash finetune/finetune_visualglm.sh
finetune/finetune_visualglm.sh: line 5: $'\r': command not found
finetune/finetune_visualglm.sh: line 14: $'\r': command not found
finetune/finetune_visualglm.sh: line 19: $'\r': command not found
finetune/finetune_visualglm.sh: line 22: $'\r': command not found
finetune/finetune_visualglm.sh: line 23: $'\r': command not found
finetune/finetune_visualglm.sh: line 50: $'\r': command not found
finetune/finetune_visualglm.sh: line 51: $'\r': command not found
finetune/finetune_visualglm.sh: line 52: $'\r': command not found
--use_lorat \20 \ 8 \\s \ \dataset.json hostfile_single
[2023-05-23 17:22:18,395] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-05-23 17:22:18,412] [INFO] [runner.py:541:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --m --use_lorat 20 e 8 ns l /dataset.json--enable_each_rank_log=None finetune_visualglm.py
[2023-05-23 17:22:21,237] [INFO] [launch.py:222:main] 0 NCCL_IB_DISABLE=0
[2023-05-23 17:22:21,237] [INFO] [launch.py:222:main] 0 NCCL_DEBUG=info
[2023-05-23 17:22:21,237] [INFO] [launch.py:222:main] 0 NCCL_NET_GDR_LEVEL=2
[2023-05-23 17:22:21,237] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0]}
[2023-05-23 17:22:21,237] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-05-23 17:22:21,237] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-05-23 17:22:21,237] [INFO] [launch.py:247:main] dist_world_size=1
[2023-05-23 17:22:21,237] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0
usage: finetune_visualglm.py [-h] [--num-layers NUM_LAYERS] [--hidden-size HIDDEN_SIZE] [--num-attention-heads NUM_ATTENTION_HEADS]
[--vocab-size VOCAB_SIZE] [--max-sequence-length MAX_SEQUENCE_LENGTH] [--layernorm-order {post,pre,sandwich}]
[--inner-hidden-size INNER_HIDDEN_SIZE] [--hidden-size-per-attention-head HIDDEN_SIZE_PER_ATTENTION_HEAD]
[--model-parallel-size MODEL_PARALLEL_SIZE] [--skip-init] [--use-gpu-initialization]
[--layernorm-epsilon LAYERNORM_EPSILON] [--hidden-dropout HIDDEN_DROPOUT] [--attention-dropout ATTENTION_DROPOUT]
[--make-vocab-size-divisible-by MAKE_VOCAB_SIZE_DIVISIBLE_BY] [--experiment-name EXPERIMENT_NAME]
[--train-iters TRAIN_ITERS] [--batch-size BATCH_SIZE] [--lr LR] [--mode {pretrain,finetune,inference}] [--seed SEED]
[--zero-stage {0,1,2}] [--checkpoint-activations] [--checkpoint-num-layers CHECKPOINT_NUM_LAYERS] [--fp16] [--bf16]
[--gradient-accumulation-steps GRADIENT_ACCUMULATION_STEPS] [--epochs EPOCHS] [--log-interval LOG_INTERVAL]
[--summary-dir SUMMARY_DIR] [--save-args] [--lr-decay-iters LR_DECAY_ITERS]
[--lr-decay-style {constant,linear,cosine,exponential}] [--lr-decay-ratio LR_DECAY_RATIO] [--warmup WARMUP]
[--weight-decay WEIGHT_DECAY] [--save SAVE] [--load LOAD] [--save-interval SAVE_INTERVAL] [--no-save-rng]
[--no-load-rng] [--resume-dataloader] [--distributed-backend DISTRIBUTED_BACKEND] [--local_rank LOCAL_RANK]
[--exit-interval EXIT_INTERVAL] [--eval-batch-size EVAL_BATCH_SIZE] [--eval-iters EVAL_ITERS]
[--eval-interval EVAL_INTERVAL] [--strict-eval] [--train-data TRAIN_DATA [TRAIN_DATA ...]]
[--train-data-weights TRAIN_DATA_WEIGHTS [TRAIN_DATA_WEIGHTS ...]] [--iterable-dataset] [--valid-data [VALID_DATA ...]]
[--test-data [TEST_DATA ...]] [--split SPLIT] [--num-workers NUM_WORKERS] [--block-size BLOCK_SIZE]
[--tokenizer-type TOKENIZER_TYPE] [--temperature TEMPERATURE] [--top_p TOP_P] [--top_k TOP_K] [--num-beams NUM_BEAMS]
[--length-penalty LENGTH_PENALTY] [--no-repeat-ngram-size NO_REPEAT_NGRAM_SIZE] [--min-tgt-length MIN_TGT_LENGTH]
[--out-seq-length OUT_SEQ_LENGTH] [--input-source INPUT_SOURCE] [--output-path OUTPUT_PATH] [--with-id]
[--max-inference-batch-size MAX_INFERENCE_BATCH_SIZE] [--device DEVICE] [--deepspeed]
[--deepspeed_config DEEPSPEED_CONFIG] [--deepscale] [--deepscale_config DEEPSCALE_CONFIG] [--deepspeed_mpi]
--use_lorasualglm.py: error: unrecognized arguments:
[2023-05-23 17:22:26,242] [INFO] [launch.py:428:sigkill_handler] Killing subprocess 128438
[2023-05-23 17:22:26,243] [ERROR] [launch.py:434:sigkill_handler] ['/usr/bin/python3', '-u', 'finetune_visualglm.py', '--local_rank=0', '\r', '--experiment-name', 'finetune-visualglm-6b\r', '\r', '--model-parallel-size', '1\r', '\r', '--mode', 'finetune', '\r', '--train-iters', '300', '\r', '--resume-dataloader', '\r', '--max_source_length', '64', '\r', '--max_target_length', '256', '\r', '--lora_rank', '10\r', '--pre_seq_len', '4\r', '\r', '--train-data', './fewshot-data/dataset.json\r', '\r', '--valid-data', './fewshot-data/dataset.json\r', '\r', '--distributed-backend', 'nccl', '\r', '--lr-decay-style', 'cosine', '\r', '--warmup', '.02', '\r', '--checkpoint-activations', '\r', '--save-interval', '300', '\r', '--eval-interval', '10000', '\r', '--save', './checkpoints', '\r', '--split', '1', '\r', '--eval-iters', '10', '\r', '--eval-batch-size', '8', '\r', '--zero-stage', '1', '\r', '--lr', '0.0001', '\r', '--batch-size', '20', '\r', '--skip-init', '\r', '--fp16', '\r', '--use_lora\r', '\r\r\r'] exits with return code = 2
finetune/finetune_visualglm.sh: line 56: $'\r': command not found
: invalid optione_visualglm.sh: line 57: set: +
set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]

ERROR: Unknown arg use_final_layernorm

This is my code:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()

It fails to run and returns "ValueError: Unknown arg use_final_layernorm."

What's the problem?

If I want to train a multi-modal language model based on chatglm-6b, how should the GLM token part be handled?

I recently want to imitate Microsoft's LLaMA-style setup and train a multi-modal language model, i.e. concatenate image token vectors and text embedding vectors into one sequence and feed it to chatglm. The fine-tuning code I've seen online all feeds the model token-encoded input_ids. From a rough read of this repository's code, this project seems to feed images and text into the model as vectors. For chatglm, how do I feed embedding vectors into the model? As I understand it, predicted tokens are also produced as token ids, which are then concatenated to the input to predict the next token. With vector input, do I look up the embedding of the predicted id and concatenate it with the input vectors?

Error when running finetune

NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2 deepspeed --master_port 16666 --hostfile hostfile_single finetune_visualglm.py --experiment-name finetune-visualglm-6b --model-parallel-size 1 --mode finetune --train-iters 300 --resume-dataloader --max_source_length 64 --max_target_length 256 --lora_rank 10 --pre_seq_len 4 --train-data ./fewshot-data/dataset.json --valid-data ./fewshot-data/dataset.json --distributed-backend nccl --lr-decay-style cosine --warmup .02 --checkpoint-activations --save-interval 300 --eval-interval 10000 --save ./checkpoints --split 1 --eval-iters 10 --eval-batch-size 8 --zero-stage 1 --lr 0.0001 --batch-size 20 --skip-init --fp16 --use_lora
finetune/finetune_visualglm.sh: line 56: deepspeed: command not found

I have already tried upgrading deepspeed, but the error remains.
Current deepspeed version: 0.9.2

The sat package is missing in the Linux environment and pip cannot install it

$ python web_demo_hf.py
Traceback (most recent call last):
File "web_demo_hf.py", line 6, in
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
File "/home/good/anaconda3/envs/visualglm/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 459, in from_pretrained
model_class = get_class_from_dynamic_module(
File "/home/good/anaconda3/envs/visualglm/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 425, in get_class_from_dynamic_module
final_module = get_cached_module_file(
File "/home/good/anaconda3/envs/visualglm/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 305, in get_cached_module_file
get_cached_module_file(
File "/home/good/anaconda3/envs/visualglm/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 267, in get_cached_module_file
modules_needed = check_imports(resolved_module_file)
File "/home/good/anaconda3/envs/visualglm/lib/python3.8/site-packages/transformers/dynamic_module_utils.py", line 150, in check_imports
raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: sat. Run pip install sat

Finetune error: RuntimeError: FIND was unable to find an engine to execute this computation

I downloaded the latest version of VisualGLM-6B and used the following commands to set up the development environment:

conda create -n glm python=3.9
conda activate glm
git clone https://github.com/THUDM/VisualGLM-6B.git
cd VisualGLM-6B
pip install -i https://mirrors.aliyun.com/pypi/simple/ -r requirements.txt
# edit finetune/finetune_visualglm.sh to set NUM_GPUS_PER_WORKER=2 which is the number of GPU in my server
unzip fewshot-data.zip
bash finetune/finetune_visualglm.sh

It reported errors as below:

Traceback (most recent call last):
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 188, in <module>
    training_main(args, model_cls=model, forward_step_function=forward_step, create_dataset_function=create_dataset_function, collate_fn=data_collator)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 130, in training_main
    iteration, skipped = train(model, optimizer,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 274, in train
    lm_loss, skipped_iter, metrics = train_step(train_data_iterator,
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/training/deepspeed_training.py", line 348, in train_step
    forward_ret = forward_step(data_iterator, model, args, timers, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/finetune_visualglm.py", line 84, in forward_step
    logits = model(input_ids=tokens, image=image, pre_image=pre_image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1724, in forward
    loss = self.module(*inputs, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/chatglm_model.py", line 192, in forward
    return super().forward(input_ids=input_ids, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/visualglm.py", line 20, in word_embedding_forward
    image_emb = self.model(**kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 65, in forward
    enc = self.vit(image)[0]
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/VisualGLM-6B/model/blip2.py", line 29, in forward
    return super().forward(input_ids=input_ids, position_ids=None, attention_mask=attention_mask, image=image)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/base_model.py", line 144, in forward
    return self.transformer(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/transformer.py", line 451, in forward
    hidden_states = self.hooks['word_embedding_forward'](input_ids, output_cross_layer=output_cross_layer, **kw_args)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/sat/model/official/vit_model.py", line 55, in word_embedding_forward
    embeddings = self.proj(images)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/media/zjkj/2t/yantao/software/anaconda3/envs/vglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: FIND was unable to find an engine to execute this computation

Please note that my PyTorch version is 2.0. Does VisualGLM-6B have a problem with PyTorch 2.0?

What exactly does self.dtype refer to in modeling_chatglm.py?

Hi, I've recently been using VisualGLM to train a reward model. While modifying and reading the code I noticed this line in modeling_chatglm.py: torch_image = torch_image.to(self.dtype).to(self.device). What exactly does self.dtype refer to? I couldn't find its definition in the code.

Could Apple's MPS be supported? Running on a Mac M2 currently errors out

image
❯ python web_demo.py

[2023-05-21 21:29:01,122] [INFO] DeepSpeed/CUDA is not installed, fallback to Pytorch checkpointing.
[2023-05-21 21:29:01,599] [WARNING] Failed to load cpm_kernels:Unknown platform: darwin
[2023-05-21 21:29:01,601] [INFO] building VisualGLMModel model ...
59203
[2023-05-21 21:29:01,625] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-05-21 21:29:01,627] [INFO] [RANK 0] You are using model-only mode.
For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.


/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2023-05-21 21:29:13,787] [INFO] [RANK 0]  > number of parameters on model parallel rank 0: 7810582016
[2023-05-21 21:29:14,203] [INFO] [RANK 0] Torch not compiled with CUDA enabled
[2023-05-21 21:29:14,203] [INFO] [RANK 0] global rank 0 is loading checkpoint /Users/z/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
[2023-05-21 21:29:28,809] [INFO] [RANK 0] > successfully loaded /Users/z/.sat_models/visualglm-6b/1/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/Users/z/git/VisualGLM-6B/web_demo.py", line 128, in <module>
    main(args)
  File "/Users/z/git/VisualGLM-6B/web_demo.py", line 81, in main
    model, tokenizer = get_infer_setting(gpu_device=0, quant=args.quant)
  File "/Users/z/git/VisualGLM-6B/model/infer_util.py", line 27, in get_infer_setting
    model = model.cuda()
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/nn/modules/module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/Users/z/git/VisualGLM-6B/.direnv/python-3.10.11/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
