
cogvlm2's People

Contributors

dm-thu, jiawei243, mactavish91, sleepychord, zrzrzrzrzrzrzr


cogvlm2's Issues

Quantized Models

Feature request / 功能建议

The README mentions the existence of an INT4 version of the model. I'm curious when it is planned to be released?

Motivation / 动机

Having a quantized model will increase inference efficiency.

Your contribution / 您的贡献

n/a

How to speed up inference?

Feature request / 功能建议

Speed up inference.

Motivation / 动机

Speed up inference.

Your contribution / 您的贡献

Multi-GPU inference Error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:5 and cuda:6!

System Info / 系統信息

system version: Ubuntu 20.04 LTS
cuda version: 11.8
python version: 3.10.12
torch version: 2.3.0+cu118
xformers version: 0.0.26.post1+cu118

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Bug Info

.../huggingface/modules/transformers_modules/cogvlm2-llama3-chat-19B/visual.py", line 83, in forward
output = mlp_input + mlp_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:5 and cuda:6!

The demo code is below:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.nn.parallel import DistributedDataParallel as DDP
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0, 1, 2, 3, 4, 5, 6, 7"
max_memory_mapping = {0: "20GB", 1: "20GB", 2: "20GB", 3: "20GB", 4: "20GB", 5: "20GB", 6: "20GB", 7: "20GB"}

# MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"
MODEL_PATH = "./cogvlm2-llama3-chat-19B"

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    device_map='auto',
    max_memory=max_memory_mapping,
    load_in_8bit=False,
    torch_dtype=TORCH_TYPE,
    trust_remote_code=True,
).to(DEVICE).eval()

text_only_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"

while True:
    image_path = input("image path >>>>> ")
    if image_path == '':
        print('You did not enter image path, the following will be a plain text conversation.')
        image = None
        text_only_first_query = True
    else:
        image = Image.open(image_path).convert('RGB')

    history = []

    while True:
        query = input("Human:")
        if query == "clear":
            break

        if image is None:
            if text_only_first_query:
                query = text_only_template.format(query)
                text_only_first_query = False
            else:
                old_prompt = ''
                for _, (old_query, response) in enumerate(history):
                    old_prompt += old_query + " " + response + "\n"
                query = old_prompt + "USER: {} ASSISTANT:".format(query)
        if image is None:
            input_by_model = model.build_conversation_input_ids(
                tokenizer,
                query=query,
                history=history,
                template_version='chat'
            )
        else:
            input_by_model = model.build_conversation_input_ids(
                tokenizer,
                query=query,
                history=history,
                images=[image],
                template_version='chat'
            )
        inputs = {
            'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
            'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
            'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
            'images': [[input_by_model['images'][0].to(DEVICE).to(TORCH_TYPE)]] if image is not None else None,
        }
        gen_kwargs = {
            "max_new_tokens": 2048,
            "pad_token_id": 128002,
        }
        print(inputs)
        with torch.no_grad():
            outputs = model.generate(**inputs, **gen_kwargs)
            outputs = outputs[:, inputs['input_ids'].shape[1]:]
            response = tokenizer.decode(outputs[0])
            response = response.split("<|end_of_text|>")[0]
            print("\nCogVLM2:", response)
        history.append((query, response))

Expected behavior / 期待表现

A working multi-GPU demo in a future version of the repo!
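For reference, the repo's cli_demo_multi_gpus.py avoids this by letting accelerate place whole layers on single GPUs, instead of calling .to(DEVICE) on a model that was already dispatched with device_map='auto'. A minimal sketch of that pattern, assuming a locally downloaded checkpoint directory and placeholder memory limits:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

    MODEL_PATH = "./cogvlm2-llama3-chat-19B"  # must be a local directory for load_checkpoint_and_dispatch
    TORCH_TYPE = torch.bfloat16

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

    # Build the model skeleton on the meta device so no GPU memory is used yet.
    with init_empty_weights():
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_PATH, torch_dtype=TORCH_TYPE, trust_remote_code=True
        )

    # Keep each decoder layer on a single GPU; splitting a layer across devices
    # is what produces the "cuda:5 and cuda:6" mismatch above.
    device_map = infer_auto_device_map(
        model,
        max_memory={i: "20GiB" for i in range(torch.cuda.device_count())},
        no_split_module_classes=["CogVLMDecoderLayer"],
    )
    model = load_checkpoint_and_dispatch(
        model, MODEL_PATH, device_map=device_map, dtype=TORCH_TYPE
    ).eval()
    # Inputs are then moved to the device of the first layer, e.g. model.device.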

How to resolve the error TypeError: zeros_like(): argument 'input' (position 1) must be Tensor, not NoneType? It occurs at line 94 of modeling_cogvlm.py.

System Info / 系統信息

Running with langchain-chatchat
CUDA version: 12.2
transformers version: 4.37.2
Python version: 3.11.7
OS: Ubuntu 20.04.6

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

After configuring the model path, I run it directly.
The model starts, but when I ask it a question (stream mode) it returns one token and then throws this error:

|   File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-19b-chinese/modeling_cogvlm.py", line 94, in get_expert_mask
|     vision_token_mask = torch.zeros_like(token_type_ids, dtype=torch.bool)
|                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| TypeError: zeros_like(): argument 'input' (position 1) must be Tensor, not NoneType

Expected behavior / 期待表现

I would like to be able to run the model with langchain-chatchat or fastchat.
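For context, get_expert_mask needs token_type_ids to tell vision tokens from text tokens, so wrappers that call model.generate with only input_ids end up passing None. A minimal sketch (based on the demo code quoted earlier on this page; variable names are illustrative) of building the inputs through build_conversation_input_ids so token_type_ids is always present:

    input_by_model = model.build_conversation_input_ids(
        tokenizer,
        query=query,
        history=history,
        images=[image] if image is not None else None,
        template_version='chat',
    )
    inputs = {
        'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
        'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),  # must not be None
        'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
        'images': [[input_by_model['images'][0].to(DEVICE).to(TORCH_TYPE)]] if image is not None else None,
    }
    outputs = model.generate(**inputs, max_new_tokens=2048)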

Can setting stop words be supported?

Feature request / 功能建议

The current basic_demo/openai_api_demo.py generates model outputs via model.generate, but model.generate does not seem to have anything like a stop_words_ids parameter to make the output stop early(?). How can this be implemented?

Also, is there any way to deploy CogVLM2 with accelerated inference (something like vLLM or LMDeploy, though neither of those frameworks seems to support the CogVLM series yet)?

Thanks!
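For reference, a stop-word check can usually be layered on top of model.generate with transformers' StoppingCriteria; a minimal sketch, with illustrative stop strings (this is not the demo's own code):

    from transformers import StoppingCriteria, StoppingCriteriaList

    class StopOnWords(StoppingCriteria):
        # Stop as soon as any stop string appears in the newly generated text.
        def __init__(self, tokenizer, stop_words, prompt_len):
            self.tokenizer = tokenizer
            self.stop_words = stop_words
            self.prompt_len = prompt_len

        def __call__(self, input_ids, scores, **kwargs):
            new_text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
            return any(w in new_text for w in self.stop_words)

    prompt_len = inputs['input_ids'].shape[1]
    gen_kwargs = {
        "max_new_tokens": 2048,
        "pad_token_id": 128002,
        "stopping_criteria": StoppingCriteriaList(
            [StopOnWords(tokenizer, ["USER:", "</cot>"], prompt_len)]
        ),
    }
    outputs = model.generate(**inputs, **gen_kwargs)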

Motivation / 动机

I want the model to output CoT, but I don't want extra output inflating inference time, so I'd like to set keywords that stop generation early.

Your contribution / 您的贡献

Nothing for now.

Running cli_demo.py fails with "not found in your environment: triton"

System Info / 系統信息

cuda 12.3
transformers 4.41.1
python 3.11.9
win10 4070
conda create -n my_CogVLM2-3.11.9 python=3.11.9 conda activate my_CogVLM2-3.11.9

After running pip install -r requirements.txt, pip reports:

MarkupSafe-2.1.5 aiofiles-23.2.1 annotated-types-0.7.0 anyio-3.7.1 asyncer-0.0.2 bidict-0.23.1 bitsandbytes-0.43.1 certifi-2024.2.2 chainlit-1.1.202 charset-normalizer-3.3.2 chevron-0.14.0 click-8.1.7 colorama-0.4.6 dataclasses_json-0.5.14 deprecated-1.2.14 distro-1.9.0 einops-0.8.0 fastapi-0.110.3 fastapi-socketio-0.0.10 filelock-3.14.0 filetype-1.2.0 fsspec-2024.5.0 googleapis-common-protos-1.63.0 grpcio-1.64.0 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 huggingface-hub-0.23.1 idna-3.7 importlib-metadata-7.0.0 intel-openmp-2021.4.0 jinja2-3.1.4 lazify-0.4.0 literalai-0.0.601 loguru-0.7.2 marshmallow-3.21.2 mkl-2021.4.0 mpmath-1.3.0 mypy-extensions-1.0.0 nest-asyncio-1.6.0 networkx-3.3 numpy-1.26.4 openai-1.30.3 opentelemetry-api-1.24.0 opentelemetry-exporter-otlp-1.24.0 opentelemetry-exporter-otlp-proto-common-1.24.0 opentelemetry-exporter-otlp-proto-grpc-1.24.0 opentelemetry-exporter-otlp-proto-http-1.24.0 opentelemetry-instrumentation-0.45b0 opentelemetry-proto-1.24.0 opentelemetry-sdk-1.24.0 opentelemetry-semantic-conventions-0.45b0 packaging-23.2 pillow-10.3.0 protobuf-4.25.3 pydantic-2.7.1 pydantic-core-2.18.2 pyjwt-2.8.0 python-dotenv-1.0.1 python-engineio-4.9.1 python-multipart-0.0.9 python-socketio-5.11.2 pyyaml-6.0.1 regex-2024.5.15 requests-2.32.2 safetensors-0.4.3 simple-websocket-1.0.0 sniffio-1.3.1 sse-starlette-2.1.0 starlette-0.37.2 sympy-1.12 syncer-2.0.3 tbb-2021.12.0 timm-1.0.3 tokenizers-0.19.1 tomli-2.0.1 torch-2.3.0 torchvision-0.18.0 tqdm-4.66.4 transformers-4.41.1 typing-extensions-4.12.0 typing-inspect-0.9.0 uptrace-1.24.0 urllib3-2.2.1 uvicorn-0.25.0 watchfiles-0.20.0 win32-setctime-1.1.0 wrapt-1.16.0 wsproto-1.2.0 xformers-0.0.26.post1 zipp-3.18.2

Following the docs and running python cli_demo.py gives the following error:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "F:\workspace\ai\CogVLM2\basic_demo\cli_demo.py", line 21, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "F:\workSoftInstallPath\anaconda3\envs\my_cogvlm2-3.11.9\Lib\site-packages\transformers\models\auto\auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "F:\workSoftInstallPath\anaconda3\envs\my_cogvlm2-3.11.9\Lib\site-packages\transformers\dynamic_module_utils.py", line 498, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "F:\workSoftInstallPath\anaconda3\envs\my_cogvlm2-3.11.9\Lib\site-packages\transformers\dynamic_module_utils.py", line 361, in get_cached_module_file
    get_cached_module_file(
  File "F:\workSoftInstallPath\anaconda3\envs\my_cogvlm2-3.11.9\Lib\site-packages\transformers\dynamic_module_utils.py", line 323, in get_cached_module_file
    modules_needed = check_imports(resolved_module_file)
  File "F:\workSoftInstallPath\anaconda3\envs\my_cogvlm2-3.11.9\Lib\site-packages\transformers\dynamic_module_utils.py", line 181, in check_imports
    raise ImportError(
ImportError: This modeling file requires the following packages that were not found in your environment: triton. Run pip install triton
Running pip install triton separately always reports:
ERROR: Could not find a version that satisfies the requirement triton (from versions: none) ERROR: No matching distribution found for triton

Switching Python versions did not help either.

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

conda create -n my_CogVLM2-3.11.9 python=3.11.9
conda activate my_CogVLM2-3.11.9
pip install -r requirements.txt
python cli_demo.py

Expected behavior / 期待表现

Runs normally.

Can basic_demo/cli_demo_multi_gpus.py lower the per-GPU memory requirement?

I see that BF16 / FP16 inference needs 42GB.
But the docs say that cli_demo_multi_gpu.py uses the infer_auto_device_map function to automatically assign different layers of the model to different GPUs, and that you need to set the max_memory parameter to specify the maximum memory per GPU, e.g. two GPUs with 23GiB each.
Does this mean multiple cards can reduce the per-card GPU memory requirement?
I currently have four 16GB cards; with {0: '15GiB', 1: '15GiB', 2: '15GiB', 3: '15GiB'} I still get CUDA out of memory. Is my understanding wrong or is my setting wrong?

Calling the API errors with: raise KeyError(key) from None, KeyError: 'HOME'

System Info / 系統信息

python3.10.11

Who can help? / 谁可以帮助到您?

The model loads normally; the error happens when calling it.
The web demo call logs: 2024-05-22 22:54:37 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 22:54:44 - Translation file for zh-CN not found. Using default translation en-US.

Calling through the API reports: KeyError: 'HOME'
2024-05-23 11:40:55.606 | DEBUG | main:generate_stream_cogvlm:301 - ==== request ====
Do you think this is a spring or winter photo?
INFO: 127.0.0.1:58760 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 408, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\uvicorn\middleware\proxy_headers.py", line 84, in call
return await self.app(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\fastapi\applications.py", line 1054, in call
await super().call(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\applications.py", line 123, in call
await self.middleware_stack(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\middleware\errors.py", line 186, in call
raise exc
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\middleware\errors.py", line 164, in call
await self.app(scope, receive, _send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\middleware\cors.py", line 85, in call
await self.app(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\middleware\exceptions.py", line 65, in call
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\routing.py", line 756, in call
await self.middleware_stack(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\routing.py", line 776, in app
await route.handle(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\routing.py", line 297, in handle
await self.app(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\starlette\routing.py", line 72, in app
response = await func(request)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\fastapi\routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "D:\app\CogVLM2\basic_demo\openai_api_demo.py", line 162, in create_chat_completion
response = generate_cogvlm(model, tokenizer, gen_params)
File "D:\app\CogVLM2\basic_demo\openai_api_demo.py", line 228, in generate_cogvlm
for response in generate_stream_cogvlm(model, tokenizer, params):
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\utils_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "D:\app\CogVLM2\basic_demo\openai_api_demo.py", line 334, in generate_stream_cogvlm
model.generate(**inputs, **gen_kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\transformers\generation\utils.py", line 1736, in generate
result = self._sample(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\transformers\generation\utils.py", line 2375, in _sample outputs = self(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 620, in forward
outputs = self.model(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 402, in forward
return self.llm_forward(
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 486, in llm_forward
layer_outputs = decoder_layer(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 261, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 204, in forward
query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 469, in forward
q = apply_rotary_emb_func(
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 329, in apply_rotary_emb
return ApplyRotaryEmb.apply(
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\torch\autograd\function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 255, in forward
out = apply_rotary(
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 212, in apply_rotary
rotary_kernel[grid](
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\triton\runtime\jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "", line 41, in rotary_kernel
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\triton\compiler.py", line 1230, in compile
so_cache_manager = CacheManager(so_cache_key)
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\triton\compiler.py", line 1102, in init
self.cache_dir = os.environ.get('TRITON_CACHE_DIR', default_cache_dir())
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\site-packages\triton\compiler.py", line 1093, in default_cache_dir
return os.path.join(os.environ["HOME"], ".triton", "cache")
File "C:\ProgramData\anaconda3\envs\cogvlm3\lib\os.py", line 680, in getitem
raise KeyError(key) from None
KeyError: 'HOME'

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Start the service and call the API; the error above occurs.

Expected behavior / 期待表现

Does anyone know how to solve this?
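For reference, the traceback shows triton's default_cache_dir() reading os.environ["HOME"], which does not exist on Windows. A commonly suggested workaround (an assumption, not an official fix) is to define HOME or TRITON_CACHE_DIR before the demo imports the model:

    import os

    # Windows sets USERPROFILE rather than HOME; triton's default_cache_dir() needs HOME.
    os.environ.setdefault("HOME", os.environ.get("USERPROFILE", os.getcwd()))
    # Or point the triton cache somewhere explicit (TRITON_CACHE_DIR is checked first, per the traceback).
    os.environ.setdefault("TRITON_CACHE_DIR", os.path.join(os.getcwd(), ".triton_cache"))

    # ...then start openai_api_demo.py as usual (or add these lines at its top).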

ERROR: Cannot install -r basic_demo/requirements.txt (line 7) and uvicorn>=0.29.0 because these package versions have conflicting dependencies.

System Info / 系統信息

New environment created with conda, python=3.11, CUDA=12.1; the above error is reported.

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

After git clone, I created a python=3.11 conda environment; pip install -r requirements.txt produced the following error:

ERROR: Cannot install -r basic_demo/requirements.txt (line 7) and uvicorn>=0.29.0 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested uvicorn>=0.29.0
chainlit 1.1.101 depends on uvicorn<0.26.0 and >=0.25.0
The user requested uvicorn>=0.29.0
chainlit 1.1.0 depends on uvicorn<0.26.0 and >=0.25.0
The user requested uvicorn>=0.29.0
chainlit 1.0.506 depends on uvicorn<0.26.0 and >=0.25.0

Expected behavior / 期待表现

That this issue gets resolved.

int4 cli_demo run

System Info / 系統信息

python:3.11
xformers :0.0.26.post1
transformers: 4.41.0

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=TORCH_TYPE,
    trust_remote_code=True,
    load_in_4bit=True,
    low_cpu_mem_usage=True
).to(DEVICE).eval()

Error:
.to is not supported for 4-bit or 8-bit bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

CUDA_VISIBLE_DEVICES=0 python cli_demo.py

Expected behavior / 期待表现

How to run the INT4 version.
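For reference, the error message means the .to(DEVICE) call has to be dropped when loading with bitsandbytes: the quantized weights are already dispatched to the right device by from_pretrained. A minimal sketch (quantization settings are illustrative):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=TORCH_TYPE,
        trust_remote_code=True,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        low_cpu_mem_usage=True,
    ).eval()  # note: no .to(DEVICE) -- bitsandbytes has already placed the weights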

The cogvlm2-llama3-chinese-chat-19B model cannot run in a CPU-only environment

System Info / 系統信息

transformers-4.41.1
cuda-cpu
python3.11
WindowsServer2016Standard
I tried to deploy the model on a machine with no GPU, and it does not seem to work. After some effort the model loads correctly, but when I ask a question it calls xformers, which obviously cannot run without a GPU, even though xformers is installed.

My environment runs qwen-vl-chat / Bunny-Llama-3-8B-V normally, and even Llama-3-70B runs (just slowly), so in theory this should be a usable environment.

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

It should be reproducible on any machine without a GPU.

Expected behavior / 期待表现

It should be able to run.

PDF document parsing

Feature request / 功能建议

The model's image recognition is already quite good; could recognition of PDF documents be added?

Motivation / 动机

Extend the model's capabilities.

Your contribution / 您的贡献

Hello, is only single-GPU inference supported? It doesn't seem to work with two GPUs.

System Info / 系統信息

GPU:V100 32GB * 2
CUDA: 12.2

Who can help? / 谁可以帮助到您?

(WeCom screenshot attached: 企业微信截图_20240523163630)

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

GPU:V100 32GB * 2
CUDA: 12.2

Expected behavior / 期待表现

How should multi-GPU inference be done?

report or any docs

Hi,
thanks for your awesome work!
Any report or docs would be helpful.

Thanks

CogVLM2 cannot run on an Ascend NPU server because triton cannot be installed

System Info / 系統信息

Architecture: ARM
NPU: Ascend 910B
python: 3.10.12

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Because triton cannot be installed on the Ascend server, it will not run.

Expected behavior / 期待表现

Hope running on NPU servers can be supported.

Pretraining code

Feature request / 功能建议

Do you plan to release the pretraining code? I'd like to try a slightly smaller visual encoder or swap in a different LLM backbone and see how it performs.

Motivation / 动机

Train with a different LLM backbone and/or visual encoder.

Your contribution / 您的贡献

N/A

How strong is CogVLM2's coordinate grounding ability?

Does CogVLM2 still have the strong coordinate grounding ability that CogVLM and CogAgent had? I haven't seen any technical documentation or usage examples about it.

web_demo THUDM/cogvlm2-llama3-chinese-chat-19B

System Info / 系統信息

python:3.11
xformers :0.0.26.post1
transformers: 4.41.0

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Code

MODEL_PATH = "THUDM/cogvlm2-llama3-chinese-chat-19B"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[
0] >= 8 else torch.float16
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=TORCH_TYPE, trust_remote_code=True, load_in_4bit=True,low_cpu_mem_usage=True).eval()

Error

Traceback (most recent call last):
File "/data/users/hongge/miniconda3/envs/cog/bin/chainlit", line 8, in
sys.exit(cli())
^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/click/core.py", line 1157, in call
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/chainlit/cli/init.py", line 154, in chainlit_run
run_chainlit(target)
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/chainlit/cli/init.py", line 56, in run_chainlit
load_module(config.run.module_name)
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/chainlit/config.py", line 409, in load_module
spec.loader.exec_module(module)
File "", line 940, in exec_module
File "", line 241, in _call_with_frames_removed
File "/mnt/nfs/dev-uclai-alginfra-5/data/users/hongge/workspace/github/CogVLM2/basic_demo/web_demo.py", line 19, in
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/users/hongge/miniconda3/envs/cog/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 909, in from_pretrained
raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.THUDM.cogvlm2-llama3-chinese-chat-19B.c95db649866bb0c0fcc383e95e87e1b01d7d956a.configuration_cogvlm.CogVLMConfig'> to build an AutoTokenizer.

Expected behavior / 期待表现

CogVLM2/basic_demo/web_demo.py
Why can't the Chinese model be used?

error with multi_gpus inference

System Info / 系統信息

Hi, I have four 3090 GPUs.

device_map = infer_auto_device_map(
    model=model,
    max_memory={i: "12GiB" for i in range(torch.cuda.device_count())},
    # set 23GiB for each GPU, depends on your GPU memory, you can adjust this value
    no_split_module_classes=["CogVLMDecoderLayer"]
)

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

File "/root/.cache/huggingface/modules/transformers_modules/THUDM/cogvlm2-llama3-chinese-chat-19B/8a603a29c162ac4e3e3fa710b3f18fde056e97d0/visual.py", line 83, in forward
output = mlp_input + mlp_output
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:3!

Expected behavior / 期待表现

How to solve this problem?
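For reference, the crash site (visual.py line 83) is inside a vision transformer block, which suggests the block itself got split across two GPUs. One hedged adjustment is to also list the vision block class in no_split_module_classes (the class name below is an assumption about visual.py; check it in the downloaded model files):

    device_map = infer_auto_device_map(
        model=model,
        max_memory={i: "12GiB" for i in range(torch.cuda.device_count())},
        # keep both the LLM decoder layers and the ViT blocks intact on a single GPU each
        no_split_module_classes=["CogVLMDecoderLayer", "TransformerLayer"],
    )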

Finetuning example loss does not decrease over time

System Info / 系統信息

CUDA 12.1, Torch 2.3

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

I've been trying to run the finetuning example for CogVLM2 (on a dual A100 setup with DeepSpeed). The loss fluctuates wildly around 2.0 and doesn't seem to converge or decrease over time. At first I tried with my own dataset, but since that didn't work I also tested the code with the provided labels_en, both multi-conversation and single-conversation, and it behaves mostly the same (the en version fluctuates around 1.8, up and down).

Tried to decrease/increase lr, no luck. Any suggestions?

BTW, thanks for releasing this great model!

Expected behavior / 期待表现

Loss decreases.

(loss curve screenshot attached)

How to run inference without xformers

System Info / 系統信息

System: Ubuntu 20.04
GPU: V100

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Because the GPU does not support xformers, I changed the vision Attention section of visual.py in the model files to the following code:

 
        # out = xops.memory_efficient_attention(
        #     q, k, v, scale=self.scale,
        # )
        out = self.attention(q, k, v)
        output = self.dense(out.view(B, L, -1))
        output = self.output_dropout(output)
        return output

But the actual model output is completely wrong; the description has nothing to do with the image.
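One likely cause of the garbage output is the tensor layout: xops.memory_efficient_attention takes q/k/v as (B, L, num_heads, head_dim), while most plain attention implementations expect (B, num_heads, L, head_dim). A hedged sketch of a replacement using torch.nn.functional.scaled_dot_product_attention (assuming the (B, L, heads, head_dim) layout here, and PyTorch >= 2.1 for the scale argument):

    import torch.nn.functional as F

    # q, k, v: (B, L, num_heads, head_dim) -- the layout the xformers call used.
    # scaled_dot_product_attention wants (B, num_heads, L, head_dim), so transpose
    # in and back out, and pass the same scale the xformers call passed.
    out = F.scaled_dot_product_attention(
        q.transpose(1, 2),
        k.transpose(1, 2),
        v.transpose(1, 2),
        scale=self.scale,
    ).transpose(1, 2)
    output = self.dense(out.reshape(B, L, -1))  # reshape, since the transpose makes .view fail
    output = self.output_dropout(output)
    return output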

Expected behavior / 期待表现

The model output is completely unrelated to the image.

Use learned image-text embedding

Feature request / 功能建议

Hi, is it possible to use the image embedding separately to do image retrieval based on a query?

Motivation / 动机

Want to do RAG on images.

Your contribution / 您的贡献

Not sure if it's possible yet.

Use in security surveillance scenarios

Feature request / 功能建议

Motivation / 动机

Hi there, how accurate is the current version of this multimodal model at recognizing elements and events in security or real-world open scenarios? Are there any metrics that directly or indirectly reflect this capability, and do you have any suggestions for applying it? Thanks.

Your contribution / 您的贡献

[FEATURE] GGUF variant?

Feature request / 功能建议

Please create a GGUF variant, since this is the de facto standard for running models locally.

Motivation / 动机

GGUF will make the model more easily accessible and follow a standard that fits running it with model platforms such as Ollama. All Llama derivatives should be able to work on Ollama without much effort.

Your contribution / 您的贡献

See how this one is built up:
https://ollama.com/library/llava

ValueError: `checkpoint` should be the path to a file containing a whole state dict

System Info / 系統信息

NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0

/mnt/data/whang/miniconda3/envs/cogvlm2/bin/python

Python 3.11.9

git pull the latest repo.

Who can help? / 谁可以帮助到您?

Running python cli_demo_multi_gpus.py errors as follows:

(Many similar warnings and other messages omitted...)
/mnt/data/whang/miniconda3/envs/cogvlm2/lib/python3.11/site-packages/torch/nn/modules/module.py:2047: UserWarning: for model.vision.transformer.layers.60.post_attention_layernorm.weight: copying from a non-meta parameter in the checkpoint to a meta parameter in the current model, which is a no-op. (Did you mean to pass `assign=True` to assign items in the state dictionary to their corresponding key in the module instead of copying them in place?)
  warnings.warn(f'for {key}: copying from a non-meta parameter in the checkpoint to a meta '
(The same UserWarning repeats for every remaining vision transformer layer as well as model.vision.linear_proj.*, model.vision.conv.*, and lm_head.weight.)
Loading checkpoint shards: 100%|█████████████████████| 8/8 [00:05<00:00,  1.37it/s]
Traceback (most recent call last):
  File "/mnt/data/whang/CogVLM2/basic_demo/cli_demo_multi_gpus.py", line 51, in <module>
    model = load_checkpoint_and_dispatch(model, MODEL_PATH, device_map=device_map, dtype=TORCH_TYPE)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/data/whang/miniconda3/envs/cogvlm2/lib/python3.11/site-packages/accelerate/big_modeling.py", line 607, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/mnt/data/whang/miniconda3/envs/cogvlm2/lib/python3.11/site-packages/accelerate/utils/modeling.py", line 1647, in load_checkpoint_in_model
    raise ValueError(
ValueError: `checkpoint` should be the path to a file containing a whole state dict, or the index of a sharded checkpoint, or a folder containing a sharded checkpoint or the whole state dict, but got THUDM/cogvlm2-llama3-chat-19B.

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

git clone https://github.com/THUDM/CogVLM2
cd CogVLM2
conda create -n cogvlm2 python=3.11 -y
conda activate cogvlm2
cd basic_demo
python -m pip install -r requirements.txt
python -m pip install accelerate
python cli_demo_multi_gpus.py

Expected behavior / 期待表现

No errors, of course, and normal operation.
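For reference, the ValueError comes from load_checkpoint_and_dispatch being given the hub id THUDM/cogvlm2-llama3-chat-19B instead of a local folder. A minimal sketch (hedged) that materializes the checkpoint locally first:

    from huggingface_hub import snapshot_download
    from accelerate import load_checkpoint_and_dispatch

    # Download (or reuse from the HF cache) the repo as a local directory,
    # then dispatch from that path instead of the hub id.
    local_path = snapshot_download("THUDM/cogvlm2-llama3-chat-19B")
    model = load_checkpoint_and_dispatch(model, local_path, device_map=device_map, dtype=TORCH_TYPE)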

cli demo batch inference

Feature request / 功能建议

Can accelerated inference be supported?
Is there batch-inference code?

Motivation / 动机

The results are very good; can it be made a bit faster?

Your contribution / 您的贡献

Nothing for now.

The provided CogVLM2/basic_demo/openai_api_demo.py has no streaming output

System Info / 系統信息

cuda12.1 torch2.3.0

Who can help? / 谁可以帮助到您?

@zr

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

1. Run openai_api_demo.py directly to start the service.
2. Send a request using the example request.
3. The full text is generated first and only then returned piece by piece all at once; there is no sense of tokens streaming out one by one.

Expected behavior / 期待表现

Hope this can be fixed.
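For reference, token-by-token streaming with transformers is usually built on TextIteratorStreamer plus a background generation thread; a minimal sketch (variable names are illustrative, not the demo's exact code):

    from threading import Thread
    from transformers import TextIteratorStreamer

    def stream_response(model, tokenizer, inputs):
        # Decode only the newly generated tokens, skipping the prompt.
        streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
        gen_kwargs = dict(**inputs, max_new_tokens=2048, streamer=streamer)
        # generate() blocks, so run it in a thread and yield pieces as they arrive.
        Thread(target=model.generate, kwargs=gen_kwargs).start()
        for new_text in streamer:
            yield new_text  # e.g. wrap each piece in an SSE chunk for the OpenAI-style endpoint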

CUDA version

Is CUDA 11.8 supported? After installing requirements.txt an error occurs.
(error screenshot attached)

When using openai_api_demo, if there is no image at the start, it errors with: IndexError: list index out of range

System Info / 系統信息

CUDA and PyTorch versions: pytorch/pytorch:2.1.2-cuda12.1-cudnn8-devel

Who can help? / 谁可以帮助到您?

@zRzRzRzRzRzRzR

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Log

INFO:     192.168.18.83:56502 - "POST /v1/chat/completions HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/opt/xiaoshi-vl-chat/openai_api_demo.py", line 164, in create_chat_completion
    response = generate_cogvlm(model, tokenizer, gen_params)
  File "/opt/xiaoshi-vl-chat/openai_api_demo.py", line 216, in generate_cogvlm
    for response in generate_stream_cogvlm(model, tokenizer, params):
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/opt/xiaoshi-vl-chat/openai_api_demo.py", line 285, in generate_stream_cogvlm
    input_by_model = model.build_conversation_input_ids(tokenizer, query=query, history=history, images=[image_list[-1]])
IndexError: list index out of range

Expected behavior / 期待表现

When added to dify as an OpenAI-API-compatible model, there is an initial validation request with no image data; it would be good to handle this case.
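For reference, a hedged guard around the failing call (the call site is quoted from the traceback above; the surrounding code is assumed):

    # dify's initial validation request carries no image, so image_list is empty;
    # only pass images when the request actually contained one.
    if image_list:
        input_by_model = model.build_conversation_input_ids(
            tokenizer, query=query, history=history, images=[image_list[-1]]
        )
    else:
        input_by_model = model.build_conversation_input_ids(
            tokenizer, query=query, history=history
        )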

basic_demo: ERROR: Cannot install -r requirements.txt (line 7) and fastapi>=0.111.0 because these package versions have conflicting dependencies.

System Info / 系統信息

ubuntu22.04

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

(CogVLM2_basic_demo) root@com2:~/CogVLM2-main/basic_demo# pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
Collecting chainlit>=1.0.506 (from -r requirements.txt (line 7))
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/1b/5c/2e334ba1de3170354232bce9f0344594cfbececc6f992da8412f64d484f5/chainlit-1.1.0-py3-none-any.whl (4.4 MB)
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/3d/7c/5c94714d7b277f55048df8433c9cd0be3eed7eed82ca89a629d1b3b59c02/chainlit-1.0.506-py3-none-any.whl (4.4 MB)
ERROR: Cannot install -r requirements.txt (line 7) and fastapi>=0.111.0 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested fastapi>=0.111.0
chainlit 1.1.101 depends on fastapi<0.111.0 and >=0.110.1
The user requested fastapi>=0.111.0
chainlit 1.1.0 depends on fastapi<0.111.0 and >=0.110.1
The user requested fastapi>=0.111.0
chainlit 1.0.506 depends on fastapi<0.111.0 and >=0.110.1

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

Expected behavior / 期待表现

Which versions should be used?

Does CogVLM2 support grounding?

I tried grounding on a few images, but the answer is always "I cannot provide the exact coordinates of the objects in the image as the image does not provide a scale or reference points for measuring". Is grounding not supported, or is there a problem with my prompt?

Support text input only

Congratulations. After experiencing your online demo and reading your official docs, I want to know whether it supports text-only input. Is the language model available for chatting?

Please release a GGUF version soon!

Feature request / 功能建议

We suggest releasing a GGUF version as soon as possible.

Motivation / 动机

Easier local deployment.

Your contribution / 您的贡献

CogVLM2: the cogvlm2-llama3-chinese-chat-19B model errors at runtime when loaded in 4-bit on Windows 11 x64.

System Info / 系統信息

Environment:
Windows 11 x64, Python 3.11.9, CUDA 12.1; the key dependencies Torch/torchvision/xformers/transformers/chainlit were installed exactly per the official requirements.txt. Later, following system prompts, I additionally installed einops-0.8.0, triton-2.1.0, accelerate-0.30.1, psutil-5.9.8. System environment paths:
CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
CUDA_VISIBLE_DEVICES=0

To load with 4-bit quantization, I modified the model-loading parameters in web_demo.py as follows. Original script:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

MODEL_PATH = "THUDM/cogvlm2-llama3-chat-19B"
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=TORCH_TYPE, trust_remote_code=True).to(DEVICE).eval()

修改后脚本:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer, BitsAndBytesConfig

fp4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="fp4", bnb_4bit_compute_dtype=torch.float32)

MODEL_PATH = "checkpoints/cogvlm2-llama3-chinese-chat-19B"
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float32
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=TORCH_TYPE, trust_remote_code=True, quantization_config=fp4_config, device_map="auto").eval()

No other changes were made.

Command executed: chainlit run web_demo_me.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
2024-05-22 13:44:03 - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set max_memory in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 8/8 [00:55<00:00, 6.90s/it]
2024-05-22 13:45:00 - Your app is available at http://localhost:8000
2024-05-22 13:45:02 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 13:45:02 - Translated markdown file for zh-CN not found. Defaulting to chainlit.md.
2024-05-22 13:45:15 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 13:45:15 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 13:45:15 - Translated markdown file for zh-CN not found. Defaulting to chainlit.md.
main.c
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin/../include\cuda.h(20247): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
C:\Users\ADMINI1\AppData\Local\Temp\tmptho3klk7\main.c(10): fatal error C1083: Cannot open include file: 'Python.h': No such file or directory
Exception in thread Thread-2 (generate):
Traceback (most recent call last):
File "D:\AITest\CogVLM2\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "D:\AITest\CogVLM2\Python311\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\transformers\generation\utils.py", line 1736, in generate
result = self._sample(
^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\transformers\generation\utils.py", line 2375, in _sample
outputs = self(
^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 620, in forward
outputs = self.model(
^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 402, in forward
return self.llm_forward(
^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 486, in llm_forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 261, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 204, in forward
query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\accelerate\hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 469, in forward
q = apply_rotary_emb_func(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 329, in apply_rotary_emb
return ApplyRotaryEmb.apply(
^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\torch\autograd\function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 255, in forward
out = apply_rotary(
^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\util.py", line 212, in apply_rotary
rotary_kernel[grid](
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\jit.py", line 160, in
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\jit.py", line 341, in run
device = driver.get_current_device()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\driver.py", line 22, in getattr
self._initialize_obj()
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\driver.py", line 19, in _initialize_obj
self._obj = self._init_fn()
^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\driver.py", line 8, in _create_driver
return actives0
^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\backends\nvidia\driver.py", line 411, in init
self.utils = CudaUtils() # TODO: make static
^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\backends\nvidia\driver.py", line 55, in init
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\backends\nvidia\driver.py", line 32, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dir, include_dir, libraries)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\triton\runtime\build.py", line 65, in _build
ret = subprocess.check_call(cc_cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe', 'C:\Users\ADMINI
1\AppData\Local\Temp\tmptho3klk7\main.c', '-O3', '-shared', '-lcuda', '-LD:\AITest\CogVLM2\Python311\Lib\site-packages\triton\backends\nvidia\include', '-LC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include', '-LC:\Users\ADMINI1\AppData\Local\Temp\tmptho3klk7', '-LD:\AITest\CogVLM2\Python311\Include', '-ID:\AITest\CogVLM2\Python311\Lib\site-packages\triton\backends\nvidia\lib', '-ID:\AITest\CogVLM2\Python311\libs', '-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64', '-o', 'C:\Users\ADMINI1\AppData\Local\Temp\tmptho3klk7\cuda_utils.cp311-win_amd64.pyd']' returned non-zero exit status 2.
2024-05-22 13:46:32 -
Traceback (most recent call last):
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\chainlit\utils.py", line 39, in wrapper
return await user_function(**params_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\web_demo_me.py", line 218, in main
conv = await request(conv, settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\web_demo_me.py", line 166, in request
async for response in get_response(query, history, gen_kwargs, images):
File "D:\AITest\CogVLM2\web_demo_me.py", line 57, in get_response
for next_text in streamer:
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\transformers\generation\streamers.py", line 223, in next
value = self.text_queue.get(timeout=self.timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\Python311\Lib\queue.py", line 179, in get
raise Empty
_queue.Empty
2024-05-22 13:47:33 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 13:47:57 - Translation file for zh-CN not found. Using default translation en-US.
2024-05-22 13:48:10 - can only concatenate str (not "NoneType") to str
Traceback (most recent call last):
File "D:\AITest\CogVLM2\Python311\Lib\site-packages\chainlit\utils.py", line 39, in wrapper
return await user_function(**params_values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\web_demo_me.py", line 218, in main
conv = await request(conv, settings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AITest\CogVLM2\web_demo_me.py", line 166, in request
async for response in get_response(query, history, gen_kwargs, images):
File "D:\AITest\CogVLM2\web_demo_me.py", line 36, in get_response
input_by_model = model.build_conversation_input_ids(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 763, in build_conversation_input_ids
text = _history_to_prompt(template_version, history, query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Administrator.cache\huggingface\modules\transformers_modules\cogvlm2-llama3-chinese-chat-19B\modeling_cogvlm.py", line 563, in _history_to_prompt
prompt += 'Question: ' + old_query + " {} ".format(answer_format) + response + "\n"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
2024-05-22 13:48:44 - Translation file for zh-CN not found. Using default translation en-US.

The quantized model seems to load correctly and the web UI starts normally. After submitting a prompt, the errors above appear in the terminal: Triton apparently tries to compile a local CUDA helper and fails. GPU memory usage is around 16 GB, which matches the official figures, and the process does not crash. A second prompt can still be submitted, but it then raises the TypeError shown above, again without interrupting the process.
The web UI looks like this:
[screenshot: 2024-05-22_134946]

Could anyone spare a moment to analyze this and offer some pointers? Many thanks!
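Regarding the TypeError, a minimal defensive sketch based on an assumption about the cause: the first, failed generation apparently left a (question, None) pair in the conversation history, and _history_to_prompt then concatenates None into the prompt. Filtering incomplete turns out before rebuilding the prompt avoids the crash (clean_history is an illustrative name):

# Drop turns whose answer is still None before building the next prompt.
clean_history = [(q, a) for q, a in (history or []) if a is not None]
input_by_model = model.build_conversation_input_ids(
    tokenizer=tokenizer,
    query=query,
    history=clean_history,
    images=[image],
)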

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

The modified script is:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer, BitsAndBytesConfig

fp4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="fp4", bnb_4bit_compute_dtype=torch.float32)

MODEL_PATH = "checkpoints/cogvlm2-llama3-chinese-chat-19B"
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float32
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=TORCH_TYPE, trust_remote_code=True, quantization_config=fp4_config, device_map="auto").eval()

After replacing the lines above, the issue should be reproducible.

Expected behavior / 期待表现

It would be great to have an official script for 4-bit quantized inference, thanks.
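In the meantime, here is a minimal 4-bit loading sketch using the standard transformers/bitsandbytes API; the nf4 quant type and bfloat16 compute dtype are common choices rather than an official recommendation, and the Triton compile failure above (missing Python.h) is a separate Windows toolchain issue that no quantization config will fix:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_PATH = "checkpoints/cogvlm2-llama3-chinese-chat-19B"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption: nf4 rather than fp4
    bnb_4bit_compute_dtype=torch.bfloat16,  # use torch.float16 on pre-Ampere GPUs
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
).eval()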

Wondering whether CogVLM2 supports SFT for multi-image QA within a single sample

Feature request / 功能建议

Hi, CogVLM2 team.

Thank you for your brilliant work and this neat and easy-to-follow codebase.
This morning I read through this repo quickly, and I have a related question. My current challenge is to input 6 images simultaneously in a single QA turn and ask questions that need to retrieve information from all six images (e.g. "What are the important objects in these six images, and why?").
To achieve this, it seems I only need to prepare the dataset and modify the following code lines (

def __getitem__(self, idx):
    img_name = os.path.join(self.image_dir, self.filenames[idx])
    label_name = os.path.join(self.label_dir, self.filenames[idx].replace('.jpg', '.json'))
    image = Image.open(img_name).convert('RGB')
    with open(label_name, 'r') as f:
        label_data = json.load(f)
    num_rounds = len(label_data["conversations"]) // 2
    sampled_round_id = random.randint(0, num_rounds - 1)
    history = [(label_data["conversations"][(sampled_round_id - 1) * 2]["content"],
                label_data["conversations"][(sampled_round_id - 1) * 2 + 1]["content"])] if (
        sampled_round_id > 0 and random.random() > 0.5) else None
    query = label_data["conversations"][sampled_round_id * 2]["content"]
    response = label_data["conversations"][sampled_round_id * 2 + 1]["content"]
    input_data = self.model.build_conversation_input_ids(
        tokenizer=self.tokenizer,
        query=query,
        history=history,
        images=[image],
        answer=response
    )
), feed the images in as a Python list, and choose a suitable max_input_tokens. Since I don't currently know enough about CogVLM2's vision encoder, I'm asking the questions above.
Could you please confirm whether my understanding is correct? In addition, if max_input_tokens is fixed at 8192, can we feed 6 images at once? Thank you again for your great work, and I look forward to your reply.

Best regards,
Xuefen
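A sketch of a multi-image variant of the __getitem__ above, under the explicit assumption that build_conversation_input_ids will accept a list of several images; that assumption, and whether six images' vision tokens fit in the 8192-token budget, needs confirmation from the authors. The six_image_names field is hypothetical:

def __getitem__(self, idx):
    label_name = os.path.join(self.label_dir, self.filenames[idx].replace('.jpg', '.json'))
    with open(label_name, 'r') as f:
        label_data = json.load(f)
    # Hypothetical layout: each label file lists the six related images.
    images = [
        Image.open(os.path.join(self.image_dir, name)).convert('RGB')
        for name in label_data["six_image_names"]
    ]
    query = label_data["conversations"][0]["content"]
    response = label_data["conversations"][1]["content"]
    # Assumption: passing several images here is accepted; verify against
    # build_conversation_input_ids in modeling_cogvlm.py before training.
    return self.model.build_conversation_input_ids(
        tokenizer=self.tokenizer,
        query=query,
        history=None,
        images=images,
        answer=response,
    )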

Motivation / 动机

Support multi-image SFT

Your contribution / 您的贡献

None

system message support

Feature request / 功能建议

Great work! Do you plan to support system messages in the future?

Motivation / 动机

N/A

Your contribution / 您的贡献

N/A

Will the cogvlm2-grounding version be able to output multiple targets? Really looking forward to it.

Feature request / 功能建议

From earlier issues it sounds like a cogvlm2-grounding version is planned; will it be able to output multiple targets?

Motivation / 动机

An image can contain several targets with the same attribute, but cogvlm-grounding only outputs one target. The later CogCoM can output multiple targets; will the cogvlm2-grounding version also support multiple target outputs?
