Comments (13)
@ivy-lv11 pls take a look at this
from bigdl.
transformers: 4.38.1/4.37.0
torch: 2.2.0+cpu
ipex: 2.2.0
If BF16 output is wrong, you can verify stock pytorch first (without BigDL).
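To sanity-check BF16 numerics with no framework at all, one can emulate bfloat16 rounding directly, since bfloat16 is just a float32 with the low 16 mantissa bits rounded away. A minimal stdlib sketch (the helper name is ours, not a BigDL or PyTorch API):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (round-to-nearest-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# bfloat16 keeps only ~3 significant decimal digits:
print(to_bf16(3.14159265))  # 3.140625
```

Comparing such reference values against what the stock PyTorch BF16 path produces helps separate precision loss from an actual bug.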
Environment:
- transformers: 4.38.1/4.37.0
- torch: 2.2.0+cpu
Language: Chinese
When using the 2048-token prompt (2048-token prompts are truncated to 1024 tokens) with the original transformers and PyTorch, and with low_bit removed, the output looks normal.
prompt: 患者
1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些药物反应等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性白血病、恶性肿瘤、类白血
prompt: 红楼梦
他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐跟着二仙,转过山坡,来到一座楼前,只见门额上写着“薄命司”三个字。和尚说:“这里就是咱们要办的事了。”\n士隐随着和尚进了楼,只见里面摆着许多签筒,签筒里装着各色签子。和尚说:“你抽一支签,看看你的命运如何。”士隐随手拿起一支签,签上写着:“甄士隐梦幻识通灵,贾雨村风尘怀闺秀。”
What is the torch version? torch==2.2.0?
Yes.
Removing load_in_low_bit and optimize_model runs FP32. If FP32 gives normal outputs, the issue may be related to INT4, which can be compared against llama.cpp etc., and BF16 can be compared against native PyTorch BF16 support.
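For the INT4 side of that comparison, the kind of error a sym_int4 format introduces can be illustrated with a toy symmetric 4-bit round-trip (a sketch of the general scheme, not BigDL's actual kernel):

```python
def quantize_sym_int4(values):
    """Symmetric 4-bit quantization: scale so the largest magnitude maps to 7."""
    scale = max(abs(v) for v in values) / 7.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.7, 0.33, 0.04]
q, scale = quantize_sym_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# round-trip error is bounded by half the quantization step
assert max_err <= scale / 2 + 1e-9
```

Bounded per-weight error like this degrades output quality gradually; completely garbled generations usually point at a logic bug rather than quantization noise, which is why the FP32/BF16 comparison is informative.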
Using transformers with BF16 via the pytorch_autocast_bf16 API in the all-in-one benchmark, the output also looks normal.
他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐随着二仙,转过山坡,来到一座楼前,只见一位仙姑端坐在楼上,旁边有一个丫鬟捧着茶盘。仙姑见了士隐,笑道:“甄士隐,你来了。”士隐忙施礼,问道:“仙姑如何认得我?”仙姑说:“你忘了,我在警幻仙子处见过你,还赠过你《好了歌》呢。”士隐这才想起,忙问仙姑:“仙姑为何赠我《好了歌
1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些化学物质或药物中毒等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性炎症、慢性粒细胞白血
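A pytorch_autocast_bf16-style run boils down to executing the model under torch.autocast on CPU; a minimal sketch of that mechanism (a plain matmul stands in for the model, assuming stock PyTorch):

```python
import torch

x = torch.randn(4, 8)
w = torch.randn(8, 4)

# Inside the autocast region, matmuls are executed in bfloat16,
# which is the numeric path this benchmark mode exercises.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = x @ w

assert y.dtype == torch.bfloat16
```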
After disabling the overriding of the qwen2 attention forward in convert.py (Qwen1.5 uses the qwen2 model type), normal answers can be generated on SPR:
两旁是一副对联:\n假作真时真亦假,无为有处有还无。\n二人进了里面,见是一座楼阁,楼内挂着“薄命司”的牌子。士隐抬头一看,见里面挂着许多签,签上写着名字,旁边注着诗句和判词。他见签上有个“甄英莲”的名字,就抽出来看,上面写着:\n娇嫩花朵偏遭风雨,聪明女儿薄命终身。\n原是仙家遗种,却落在草莽人家。生于富贵,却死于贫贱。这是她的命,无可奈何。士隐看了,叹了一口气,把签放下。又见一个签上写着“贾
Need to check what is wrong in qwen2_attention_forward_origin.
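The convert.py override replaces each attention module's forward with a bound replacement function; disabling it for debugging just means removing the instance-level attribute so the class's original forward resolves again. A toy sketch of that mechanism (the class and function names are illustrative, not BigDL's):

```python
class ToyAttention:
    def forward(self):
        return "original forward"

def optimized_forward(self):
    return "overridden forward"

attn = ToyAttention()
# What a convert step does: bind a replacement forward onto the instance.
attn.forward = optimized_forward.__get__(attn, ToyAttention)
assert attn.forward() == "overridden forward"

# Disabling the override: drop the instance attribute so attribute
# lookup falls back to the class and the stock forward runs again.
del attn.__dict__["forward"]
assert attn.forward() == "original forward"
```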
Test BigDL-LLM 2.5.0b20240311
Environment:
- bigdl-llm version: 2.5.0b20240311
- transformers version: 4.37.0
- torch version: 2.1.0a0+cxx11.abi
On Arc the output looks normal:
1)正常生理情况下,中性粒细胞比例偏高,提示有感染或炎症;2)单核细胞比例偏高,提示有慢性炎症、结核病、白血病等。
However, when running on CPU the output still looks abnormal.
临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断
It is found that the CPU uses a different attention module from the GPU, Qwen2SdpaAttention, which applies scaled dot-product attention to Q/K/V; if it is converted as if it were Qwen2Attention, it will never give the right output.
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
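For reference when debugging the override: Qwen2SdpaAttention changes how the attention math is executed (via torch's fused scaled_dot_product_attention), not what it computes, which is softmax(QK^T/√d)·V per head. A dependency-free sketch of that computation (single head, no mask or RoPE):

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for one head; q, k, v are lists of row vectors."""
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k]
              for qr in q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * vr[j] for w, vr in zip(wrow, v)) for j in range(len(v[0]))]
            for wrow in weights]

# A query aligned with the first key attends almost entirely to the first value.
out = sdpa([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Any replacement forward must reproduce exactly this result for the SDPA variant as well; a forward written against Qwen2Attention's calling conventions can silently diverge.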
Model architecture
GPU
Uses Qwen2Attention:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2Attention(
(q_proj): Linear(in_features=4096, out_features=4096, bias=True)
(k_proj): Linear(in_features=4096, out_features=4096, bias=True)
(v_proj): Linear(in_features=4096, out_features=4096, bias=True)
(o_proj): Linear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
(up_proj): Linear(in_features=4096, out_features=11008, bias=False)
(down_proj): Linear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=151936, bias=False)
)
CPU
Uses Qwen2SdpaAttention:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
Fixed in #10395 and #10409; new CPU performance data is here.