
Comments (19)

JustinLin610 commented on May 21, 2024

Could you try again? We just updated the code.

Louis-y-nlp commented on May 21, 2024

With the new code, GPU memory usage is 17150MiB / 32510MiB.

hutianyu2006 commented on May 21, 2024

The key problem is that the team hasn't released a quantized model either... I opened #18 separately, hoping the team sees it...

logicwong commented on May 21, 2024

> Using your demo, it won't run; it blows up the GPU memory. Is quantization the only option? torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.85 GiB already allocated; 1.26 GiB free; 20.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Hi, the OOM may be because the model is loaded in fp32 precision by default. Try pulling our latest code and loading the model in fp16, like this:

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()

JustinLin610 commented on May 21, 2024

> The key problem is that the team hasn't released a quantized model either... I opened #18 separately, hoping the team sees it...

Wang Peng just raised the precision issue; enabling fp16 is one option. Quantization is covered in the README: see the quantization section, you only need to add quantization_config. A sketch of that route is below.
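A minimal sketch of that quantization route, assuming the standard transformers BitsAndBytesConfig path (the README's exact flags may differ):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit setup; load_in_8bit=True works the same way.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # keep compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True,
    quantization_config=quantization_config,
).eval()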


JohnZhuYX commented on May 21, 2024

I re-downloaded the model, but it still doesn't work. Is quantization really the only option?.....
Traceback (most recent call last):
File "/home/johnzyx/working/pythonprojects/LLaMA-Efficient-Tuning/src/zyx_QwenDemo.py", line 14, in <module>
response, history = model.chat(tokenizer, "你好", history=None)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 905, in chat
outputs = self.generate(
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 951, in generate
return super().generate(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1615, in generate
return self.sample(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2737, in sample
outputs = self(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 842, in forward
lm_logits = self.lm_head(hidden_states)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 160, in new_forward
args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 286, in pre_forward
set_module_tensor_to_device(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 298, in set_module_tensor_to_device
new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB (GPU 0; 23.65 GiB total capacity; 20.83 GiB already allocated; 1.18 GiB free; 20.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF


Louis-y-nlp commented on May 21, 2024

Try adding this when initializing the model: torch_dtype=torch.float16
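For example, a minimal sketch of that suggestion (the same loading call as above, with an explicit dtype):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16,  # request half precision at load time
).eval()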

JohnZhuYX avatar JohnZhuYX commented on May 21, 2024

如果加参数fp=16,像上面的
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", device_map="auto", trust_remote_code=True, fp16=True).eval()
也会报错:
Warning: import flash_attn fail, please install FlashAttention https://github.com/Dao-AILab/flash-attention
Traceback (most recent call last):
File "/home/johnzyx/working/pythonprojects/LLaMA-Efficient-Tuning/src/zyx_QwenDemo.py", line 14, in
response, history = model.chat(tokenizer, "你好", history=None)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 905, in chat
outputs = self.generate(
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/modeling_qwen.py", line 951, in generate
return super().generate(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 1615, in generate
return self.sample(
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/utils.py", line 2750, in sample
next_token_scores = logits_processor(input_ids, next_token_logits)
File "/home/johnzyx/environment/anaconda-env/python3.10/lib/python3.10/site-packages/transformers/generation/logits_process.py", line 97, in call
scores = processor(input_ids, scores)
File "/home/johnzyx/.cache/huggingface/modules/transformers_modules/Qwen-7B-Chat/qwen_generation_utils.py", line 349, in call
scores[i, self.eos_token_id] = float(2**30)
RuntimeError: value cannot be converted to type at::Half without overflow


Louis-y-nlp commented on May 21, 2024

Did you modify the config.json file? I can reproduce your error by modifying config.json. Re-download the latest files from HF.
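One way to force a clean re-download, a sketch using huggingface_hub (assuming the Qwen/Qwen-7B-Chat repo id; adjust to the model you use):

from huggingface_hub import snapshot_download

# force_download=True re-fetches files even if they are already in the local cache
snapshot_download("Qwen/Qwen-7B-Chat", force_download=True)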

Louis-y-nlp commented on May 21, 2024

17250MiB / 32510MiB

jackaihfia2334 commented on May 21, 2024

Same error here. I already downloaded the latest config.json from huggingface, but still get:
RuntimeError: value cannot be converted to type at::Half without overflow

trexliu commented on May 21, 2024

Same error as above, also with 24G of VRAM. I've tried every approach; it's either OOM or overflow.

JohnZhuYX commented on May 21, 2024

Here is my config.json, please take a look:
{
"activation": "swiglu",
"apply_residual_connection_post_layernorm": false,
"architectures": [
"QWenLMHeadModel"
],
"auto_map": {
"AutoConfig": "configuration_qwen.QWenConfig",
"AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
},
"attn_pdrop": 0.0,
"bf16": false,
"bias_dropout_fusion": true,
"bos_token_id": 151643,
"embd_pdrop": 0.1,
"eos_token_id": 151643,
"ffn_hidden_size": 22016,
"fp16": false,
"initializer_range": 0.02,
"kv_channels": 128,
"layer_norm_epsilon": 1e-05,
"model_type": "qwen",
"n_embd": 4096,
"n_head": 32,
"n_layer": 32,
"n_positions": 6144,
"no_bias": true,
"onnx_safe": null,
"padded_vocab_size": 151936,
"params_dtype": "torch.bfloat16",
"pos_emb": "rotary",
"resid_pdrop": 0.1,
"rotary_emb_base": 10000,
"rotary_pct": 1.0,
"scale_attn_weights": true,
"seq_length": 2048,
"tie_word_embeddings": false,
"tokenizer_type": "QWenTokenizer",
"transformers_version": "4.31.0",
"use_cache": true,
"use_flash_attn": true,
"vocab_size": 151936,
"use_dynamic_ntk": false,
"use_logn_attn": false
}


Louis-y-nlp commented on May 21, 2024

If you change bf16 to true, it will probably run. I just tested: specifying torch_dtype=torch.float16 has no effect, the loaded weights are still bf16 (see the check below). Oddly, the V100 doesn't support bf16, so I'm not sure how it is running on my end.
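A quick way to check which dtype the weights actually loaded in:

# Inspect the dtype of the first parameter after from_pretrained
print(next(model.parameters()).dtype)  # e.g. torch.bfloat16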

jackaihfia2334 commented on May 21, 2024

> Did you modify the config.json file? I can reproduce your error by modifying config.json. Re-download the latest files from HF.

Could you share your working config.json?

Louis-y-nlp commented on May 21, 2024

It seems this model only runs in bf16. So either set fp16 to false and bf16 to true in the config and pass nothing extra at initialization, or set both to false and add torch_dtype=torch.bfloat16 at initialization. I tried both; each runs, with GPU memory usage under 20G. The config below covers the first option; a sketch of the second follows after it.

{
  "activation": "swiglu",
  "apply_residual_connection_post_layernorm": false,
  "architectures": [
    "QWenLMHeadModel"
  ],  
  "auto_map": {
    "AutoConfig": "configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "modeling_qwen.QWenLMHeadModel"
  },  
  "attn_pdrop": 0.0,
  "bf16": true,
  "bias_dropout_fusion": true,
  "bos_token_id": 151643,
  "embd_pdrop": 0.1,
  "eos_token_id": 151643,
  "ffn_hidden_size": 22016,
  "fp16": false,
  "initializer_range": 0.02,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-05,
  "model_type": "qwen",
  "n_embd": 4096,
  "n_head": 32, 
  "n_layer": 32, 
  "n_positions": 6144,
  "no_bias": true,
  "onnx_safe": null,
  "padded_vocab_size": 151936,
  "params_dtype": "torch.bfloat16",
  "pos_emb": "rotary",
  "resid_pdrop": 0.1,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 2048,
  "tie_word_embeddings": false,
  "tokenizer_type": "QWenTokenizer",
  "transformers_version": "4.31.0",
  "use_cache": true,
  "use_flash_attn": true,
  "vocab_size": 151936,
  "use_dynamic_ntk": false,
  "use_logn_attn": false
}

This should run.
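And for the second option (both precision flags false in config.json), a minimal sketch of requesting bf16 at load time:

import torch
from transformers import AutoModelForCausalLM

# With "bf16": false and "fp16": false in config.json, ask for bf16 explicitly:
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).eval()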

sevenold commented on May 21, 2024

Pulled the latest repo.

GPU: 4090 24G
use fp32: OOM
use fp16:
'''
scores[i, self.eos_token_id] = float(2**30)
RuntimeError: value cannot be converted to type at::Half without overflow
'''
use bf16: works fine, 17031MiB / 23.99GiB

JohnZhuYX commented on May 21, 2024

Confirmed: only bf16=True or quantization works; fp32 is a no-go.

logicwong commented on May 21, 2024

> If I add the fp16 parameter as above ... RuntimeError: value cannot be converted to type at::Half without overflow

Thanks everyone for the feedback. The bug is that float(2**30) exceeds the fp16 range; the latest code fixes it. Please try again. The snippet below reproduces the overflow.
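For illustration, a minimal reproduction of that overflow (fp16's largest finite value is 65504, far below 2**30 ≈ 1.07e9):

import torch

# Writing a value beyond fp16's range into a half-precision tensor fails:
scores = torch.zeros(4, dtype=torch.float16)
print(torch.finfo(torch.float16).max)  # 65504.0
scores[0] = float(2**30)  # RuntimeError: value cannot be converted to type at::Half without overflow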
