
Comments (3)

zRzRzRzRzRzRzR commented on September 13, 2024

It's probably because a single module already exceeds 16 GB; we only tested on 24 GB cards.

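A quick way to check that claim is to build the model on the meta device (no GPU memory is allocated) and print the weight footprint of each submodule; if any single module that the device map is not allowed to split is larger than roughly 16 GiB, accelerate cannot place it on a 16 GB card. This is only a sketch, assuming the same local checkpoint path and bfloat16 weights as the code further down, and uses only standard torch/transformers/accelerate calls.

import torch
from transformers import AutoModelForCausalLM
from accelerate import init_empty_weights

MODEL_PATH = "/mnt/data/spdi-code/paddlechat/cogvlm2-llama3-chat-19B"  # assumed: same path as the code below

# Build the module tree on the meta device; no weights are loaded into RAM or VRAM.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )

# Print the bf16 weight footprint of modules up to two levels deep
# (parent entries include their children's weights).
for name, module in model.named_modules():
    depth = name.count(".")
    if name and depth <= 1:
        gib = sum(p.numel() * p.element_size() for p in module.parameters()) / 1024**3
        print(f"{'  ' * depth}{name}: {gib:.2f} GiB")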

whysirier commented on September 13, 2024

It's probably because a single module already exceeds 16 GB; we only tested on 24 GB cards.

Hello, I tried that and got a different error. Here is my code:
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, infer_auto_device_map

MODEL_PATH = "/mnt/data/spdi-code/paddlechat/cogvlm2-llama3-chat-19B"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8 else torch.float16

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True
)

# Build the model structure on the meta device without allocating weights.
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        trust_remote_code=True,
    )

num_gpus = torch.cuda.device_count()
max_memory_per_gpu = "16GiB"
if num_gpus > 2:
    max_memory_per_gpu = f"{round(42 / num_gpus)}GiB"  # "GiB" suffix needed so accelerate parses the limit

# Split the model across GPUs, keeping each decoder layer on a single device.
device_map = infer_auto_device_map(
    model=model,
    max_memory={i: max_memory_per_gpu for i in range(num_gpus)},
    no_split_module_classes=["CogVLMDecoderLayer"]
)
model = load_checkpoint_and_dispatch(model, MODEL_PATH, device_map=device_map, dtype=TORCH_TYPE)
model = model.eval()

text_only_template = "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {} ASSISTANT:"

query = text_only_template.format('您好')
history = []
input_by_model = model.build_conversation_input_ids(
    tokenizer,
    query=query,
    history=history,
    template_version='chat'
)

inputs = {
    'input_ids': input_by_model['input_ids'].unsqueeze(0).to(DEVICE),
    'token_type_ids': input_by_model['token_type_ids'].unsqueeze(0).to(DEVICE),
    'attention_mask': input_by_model['attention_mask'].unsqueeze(0).to(DEVICE),
    'image': None
}
gen_kwargs = {
    "max_new_tokens": 2048,
    "pad_token_id": 128002,
}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(outputs[0])
    # The stop-token string was stripped by the page renderer; "<|end_of_text|>" assumed here.
    response = response.split("<|end_of_text|>")[0]
    print("\nCogVLM2:", response)
history.append((query, response))

[Screenshot of the error attached: 企业微信截图_20240527094115]


zRzRzRzRzRzRzR commented on September 13, 2024

Do you still get this problem with the latest code? Try using our splitting approach directly.

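If the latest demo code still fails, it can also help to print the device map that accelerate computes before dispatching, to see whether every layer fits under the per-GPU budget or whether something spills to "cpu"/"disk". A minimal sketch, reusing model and num_gpus from the code above; the "15GiB" headroom value (leaving room for activations and the KV cache on 16 GB cards) is an assumption, not a setting from the CogVLM2 repo.

from collections import Counter
from accelerate import infer_auto_device_map

# Assumed headroom: cap each card slightly below 16 GiB so activations still fit.
max_memory = {i: "15GiB" for i in range(num_gpus)}

device_map = infer_auto_device_map(
    model,
    max_memory=max_memory,
    no_split_module_classes=["CogVLMDecoderLayer"],
)

# Summarize how many modules land on each device; "cpu" or "disk" entries
# mean the GPU budget was too small for the requested split.
print(Counter(device_map.values()))
for name, device in device_map.items():
    if device in ("cpu", "disk"):
        print("does not fit on GPU:", name, "->", device)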
