Comments (19)
Could you try again? We just updated the code.
from qwen.
With the new code, GPU memory usage: 17150MiB / 32510MiB
from qwen.
The thing is, the official team hasn't released a quantized model either... I opened #18 separately and hope the team sees it...
from qwen.
I tried your demo and it won't run; it blows up GPU memory. Is quantization the only option? torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.85 GiB already allocated; 1.26 GiB free; 20.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Hi, the OOM is probably because the model loads in fp32 precision by default. Could you pull our latest code and load the model in fp16? Like this:
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
from qwen.
> The thing is, the official team hasn't released a quantized model either... I opened #18 separately and hope the team sees it...
Wang Peng just addressed the precision issue; enabling fp16 is one option. Quantization is covered in the README; see the quantization section, you only need to add a quantization_config.
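For reference, a minimal 8-bit loading sketch using bitsandbytes (my assumption; the README's quantization section is authoritative for the exact configuration):

# Sketch: 8-bit quantized loading via bitsandbytes (requires `pip install bitsandbytes`).
# The exact arguments may differ from the README; treat this as illustrative only.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
).eval()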
from qwen.
I re-downloaded the model and it still doesn't work. Is quantization really the only way?...
Traceback (most recent call last):
File "/home/johnzyx/working/pythonprojects/LLaMA-Efficient-Tuning/src/zyx_QwenDemo.py", line 14, in
response, history = model.chat(tokenizer, "你好", history=None)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 905, in chat
outputs = self.generate(
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 951, in generate
return super().generate(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1615, in generate
return self.sample(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2737, in sample
outputs = self(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 842, in forward
lm_logits = self.lm_head(hidden_states)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 160, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 286, in pre_forward
set_module_tensor_to_device(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 298, in set_module_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.83 GiB already allocated; 1.18 GiB free; 20.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
from qwen.
Try adding this when initializing the model: torch_dtype=torch.float16
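Spelled out as a full loading call, that suggestion would look something like this (a sketch; torch_dtype is the generic transformers argument, as opposed to the model-specific fp16 flag above):

# Sketch: load the model in half precision via the standard transformers argument.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()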
from qwen.
If I add the parameter fp16=True, as above:
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
it also throws an error:
Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
Traceback (most recent call last):
File "/home/johnzyx/working/pythonprojects/LLaMA-Efficient-Tuning/src/zyx_QwenDemo.py", line 14, in
response, history = model.chat(tokenizer, "你好", history=None)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 905, in chat
outputs = self.generate(
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 951, in generate
return super().generate(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1615, in generate
return self.sample(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2750, in sample
next_token_scores = logits_processor(input_ids, next_token_logits)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in call
scores = processor(input_ids, scores)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/qwen_generation_utils.py", line 349, in call
scores[i, self.eos_token_id] = float(230)
RuntimeError: value cannot be converted to type at::Half without overflow
from qwen.
Did you modify config.json? I can reproduce your error by modifying config.json. Download a fresh copy of the code from HF.
from qwen.
17250MiB / 32510MiB
from qwen.
Same error here. I already downloaded the latest config.json from huggingface,
and still get RuntimeError: value cannot be converted to type at::Half without overflow
from qwen.
Same error as above, also on 24G of VRAM. I've tried every suggested method; it's either OOM or overflow.
from qwen.
Here is my config.json,
please take a look:
{
"activation": "swiglu",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"QWenLMHeadModel"
],
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"attn_pdrop": 0.0,
"bf16": false,
"bias_dropout_fusion": true,
"bos_token_id": 151643,
"embd_pdrop": 0.1,
"eos_token_id": 151643,
"ffn_hidden_size": 22016,
"fp16": false,
"initializer_range": 0.02,
"kv_channels": 128,
"layer_norm_epsilon": 1e-05,
"model_type": "qwen",
"n_embd": 4096,
"n_head": 32,
"n_layer": 32,
"n_positions": 6144,
"no_bias": true,
"onnx_safe": null,
"padded_vocab_size": 151936,
"params_dtype": "torch.bfloat16",
"pos_emb": "rotary",
"resid_pdrop": 0.1,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"transformers_version": "4.31.0",
"use_cache": true,
"use_flash_attn": true,
"vocab_size": 151936,
"use_dynamic_ntk": false,
"use_logn_attn": false
}
from qwen.
Changing bf16 to true will probably make it run. I just tested: specifying torch_dtype=torch.float16 has no effect, the loaded weights are still bf16. Oddly, the V100 doesn't support bf16, so I'm not sure how it runs on my machine.
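To check whether a GPU has native bf16 support, a quick sanity check (assuming a reasonably recent PyTorch):

# Sketch: query native bf16 support; True on Ampere and newer (e.g. A100, 4090).
# Pre-Ampere cards like the V100 may still execute bf16 ops via slower software
# conversion, which could explain bf16 models running there at all.
import torch
print(torch.cuda.is_bf16_supported())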
from qwen.
> Did you modify config.json? I can reproduce your error by modifying config.json. Download a fresh copy of the code from HF.
Could you share your working config.json?
from qwen.
This model seems to only run in bf16. So either set fp16 to false and bf16 to true in the config and pass nothing extra at initialization, or set both to false and pass torch_dtype=torch.bfloat16 at initialization (see the sketch after the config below). I tried both approaches; both run, and memory usage stays under 20G either way.
{
"activation": "swiglu",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"QWenLMHeadModel"
],
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"attn_pdrop": 0.0,
"bf16": true,
"bias_dropout_fusion": true,
"bos_token_id": 151643,
"embd_pdrop": 0.1,
"eos_token_id": 151643,
"ffn_hidden_size": 22016,
"fp16": false,
"initializer_range": 0.02,
"kv_channels": 128,
"layer_norm_epsilon": 1e-05,
"model_type": "qwen",
"n_embd": 4096,
"n_head": 32,
"n_layer": 32,
"n_positions": 6144,
"no_bias": true,
"onnx_safe": null,
"padded_vocab_size": 151936,
"params_dtype": "torch.bfloat16",
"pos_emb": "rotary",
"resid_pdrop": 0.1,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"transformers_version": "4.31.0",
"use_cache": true,
"use_flash_attn": true,
"vocab_size": 151936,
"use_dynamic_ntk": false,
"use_logn_attn": false
}
This config should run.
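And the second approach, for a config with both "bf16": false and "fp16": false, would look something like this (a sketch):

# Sketch: both precision flags false in config.json, dtype passed at load time instead.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()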
from qwen.
Pulled the latest repo.
GPU: 4090 24G
use fp32: OOM
use fp16:
'''
scores[i, self.eos_token_id] = float(2**30)
RuntimeError: value cannot be converted to type at::Half without overflow
'''
use bf16: works fine, 17031MiB / 23.99GiB
from qwen.
Confirmed: only bf16=True or quantization works; fp32 is out.
from qwen.
> If I add the parameter fp16=True, as above ... RuntimeError: value cannot be converted to type at::Half without overflow
Thanks everyone for the feedback. This bug was caused by float(2**30) exceeding the fp16 range; the latest code fixes it. Please try again.
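The overflow is easy to reproduce in isolation, and clamping to the dtype's own maximum is one plausible shape for the fix (a sketch, not necessarily the actual patch):

# Sketch: reproduce the overflow, then a hypothetical fix.
# fp16's largest finite value is 65504, so assigning 2**30 into a fp16 tensor fails.
import torch

scores = torch.zeros(4, dtype=torch.float16)
try:
    scores[0] = float(2**30)
except RuntimeError as e:
    print(e)  # value cannot be converted to type at::Half without overflow

# Hypothetical fix: use the dtype's maximum instead of a hard-coded constant.
scores[0] = torch.finfo(scores.dtype).max  # 65504.0 for fp16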
from qwen.