Comments (13)

Uxito-Ada commented on June 26, 2024

@ivy-lv11 pls take a look at this

from bigdl.

Uxito-Ada commented on June 26, 2024

transformers: 4.38.1/4.37.0
torch: 2.2.0+cpu
ipex: 2.2.0

jason-dai commented on June 26, 2024

If the BF16 output is wrong, you can first verify it with stock PyTorch (without BigDL).

ivy-lv11 commented on June 26, 2024

Environment:

  • transformers: 4.38.1/4.37.0;
  • torch: 2.2.0+cpu;

Chinese

When using the 2048 prompt (2048 prompts are truncated to 1024) with the original transformers and PyTorch, and with low_bit removed, the output looks normal.
prompt: 患者 ("patient")

1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些药物反应等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性白血病、恶性肿瘤、类白血

prompt: 红楼梦 (Dream of the Red Chamber)

他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐跟着二仙,转过山坡,来到一座楼前,只见门额上写着“薄命司”三个字。和尚说:“这里就是咱们要办的事了。”\n士隐随着和尚进了楼,只见里面摆着许多签筒,签筒里装着各色签子。和尚说:“你抽一支签,看看你的命运如何。”士隐随手拿起一支签,签上写着:“甄士隐梦幻识通灵,贾雨村风尘怀闺秀。”

Uxito-Ada commented on June 26, 2024

What is the torch version? Is it torch==2.2.0?

ivy-lv11 commented on June 26, 2024

Yes.

Uxito-Ada commented on June 26, 2024

Removing load_in_low_bit and optimize_model makes the model run in FP32. If FP32 gives normal output, the issue may be related to INT4, which can be compared against llama.cpp and similar tools. BF16 can be compared against native PyTorch BF16 support.
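A rough sketch of that baseline comparison, assuming the model is loaded with stock transformers and PyTorch only. `generate_text` and `common_prefix_len` are hypothetical helper names, and the model path is whatever local checkpoint you are testing; none of this is BigDL code.

```python
def generate_text(model_path, prompt, dtype_name="bfloat16", max_new_tokens=128):
    """Greedy generation with stock transformers/PyTorch (no BigDL involved)."""
    # Imported lazily so the comparison helper below works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    dtype = getattr(torch, dtype_name)  # e.g. torch.float32 or torch.bfloat16
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype)
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


def common_prefix_len(a, b):
    """Number of leading characters two generated strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n
```

Calling `generate_text` once with `dtype_name="float32"` and once with `dtype_name="bfloat16"`, then passing both strings to `common_prefix_len`, shows where the two runs start to diverge. Small divergence late in the text is expected with BF16; garbage from the first token points elsewhere.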

ivy-lv11 commented on June 26, 2024

Using transformers with BF16 through the pytorch_autocast_bf16 API in the all-in-one benchmark, the output also looks normal.

prompt: 红楼梦

他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐随着二仙,转过山坡,来到一座楼前,只见一位仙姑端坐在楼上,旁边有一个丫鬟捧着茶盘。仙姑见了士隐,笑道:“甄士隐,你来了。”士隐忙施礼,问道:“仙姑如何认得我?”仙姑说:“你忘了,我在警幻仙子处见过你,还赠过你《好了歌》呢。”士隐这才想起,忙问仙姑:“仙姑为何赠我《好了歌

prompt: 患者

1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些化学物质或药物中毒等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性炎症、慢性粒细胞白血

Uxito-Ada commented on June 26, 2024

After disabling the override of the qwen2 attention forward in convert.py (Qwen1.5 uses the model type qwen2), a normal answer can be generated on SPR:

两旁是一副对联:\n假作真时真亦假,无为有处有还无。\n二人进了里面,见是一座楼阁,楼内挂着“薄命司”的牌子。士隐抬头一看,见里面挂着许多签,签上写着名字,旁边注着诗句和判词。他见签上有个“甄英莲”的名字,就抽出来看,上面写着:\n娇嫩花朵偏遭风雨,聪明女儿薄命终身。\n原是仙家遗种,却落在草莽人家。生于富贵,却死于贫贱。这是她的命,无可奈何。士隐看了,叹了一口气,把签放下。又见一个签上写着“贾

We need to check what is wrong in qwen2_attention_forward_origin.

ivy-lv11 commented on June 26, 2024

Tested with BigDL-LLM 2.5.0b20240311.

Environment:

  • bigdl-llm version: 2.5.0b20240311
  • transformers version: 4.37.0
  • torch version: 2.1.0a0+cxx11.abi

On Arc the output looks normal:

1)正常生理情况下,中性粒细胞比例偏高,提示有感染或炎症;2)单核细胞比例偏高,提示有慢性炎症、结核病、白血病等。

However, when running on CPU the output still looks abnormal.

临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断
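One way to flag this kind of degenerate output automatically is to measure how many distinct character n-grams the text contains. This is only a heuristic sketch: the function name, the n-gram size, and any threshold you pick are assumptions, not part of the benchmark.

```python
def distinct_ngram_ratio(text, n=4):
    """Share of character n-grams that are unique; near 0 means heavy repetition."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 1.0  # too short to judge, treat as non-repetitive
    return len(set(grams)) / len(grams)
```

A looping output like the one above scores close to 0, while ordinary prose scores much closer to 1, so a low ratio is a cheap signal that a run needs a closer look.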

Uxito-Ada commented on June 26, 2024

It turns out that the CPU uses a different attention module from the GPU: Qwen2SdpaAttention, which applies scaled dot-product attention to q, k, and v. If the model is converted with the Qwen2Attention forward, it will never give the right output.

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 4096)
    (layers): ModuleList(
      (0-31): 32 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
          (up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
          (down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
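For reference, scaled dot-product attention is softmax(QK^T / sqrt(d)) V. The following is a minimal pure-Python sketch of that textbook formula for a single head with a causal mask. It is not the transformers or BigDL implementation, and the function names are illustrative only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v, causal=True):
    """q, k, v: lists of row vectors (seq_len x head_dim) for one head."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        scores = []
        for j, kj in enumerate(k):
            if causal and j > i:
                # Future positions are masked out before the softmax.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(qi, kj))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of value rows, one output column at a time.
        out.append(
            [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        )
    return out
```

Any replacement forward has to reproduce this computation, including the mask and the 1/sqrt(d) scaling, for the module it is patched into.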

ivy-lv11 commented on June 26, 2024

Model architecture

GPU

Uses Qwen2Attention

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 4096)
    (layers): ModuleList(
      (0-31): 32 x Qwen2DecoderLayer(
        (self_attn): Qwen2Attention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=True)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=151936, bias=False)
)

CPU

Uses Qwen2SdpaAttention

Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151936, 4096)
    (layers): ModuleList(
      (0-31): 32 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
          (o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
          (up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
          (down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm()
        (post_attention_layernorm): Qwen2RMSNorm()
      )
    )
    (norm): Qwen2RMSNorm()
  )
  (lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
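The two dumps can also be compared mechanically instead of by eye. The sketch below parses the text that print(model) produces and reports submodules whose class differs. The helper names and the regular expression are assumptions written for dumps shaped like the ones above.

```python
import re

# Matches lines such as "(q_proj): Linear(..." or "(0-31): 32 x Qwen2DecoderLayer(".
MODULE_LINE = re.compile(r"\((?P<name>[\w\-]+)\): (?:\d+ x )?(?P<cls>\w+)\(")


def module_types(model_repr):
    """Map each submodule name in a printed model to its class name."""
    types = {}
    for line in model_repr.splitlines():
        match = MODULE_LINE.search(line)
        if match:
            types[match.group("name")] = match.group("cls")
    return types


def diff_module_types(repr_a, repr_b):
    """Return {name: (class_in_a, class_in_b)} for submodules whose class differs."""
    a, b = module_types(repr_a), module_types(repr_b)
    return {
        name: (a[name], b[name])
        for name in a
        if name in b and a[name] != b[name]
    }
```

Run on the GPU and CPU dumps above, this should report self_attn (Qwen2Attention vs Qwen2SdpaAttention) along with the Linear vs LowBitLinear layers. Keys are bare submodule names rather than full paths, which is enough here because the repeated decoder layers are collapsed into one entry.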

Uxito-Ada commented on June 26, 2024

Fixed in #10395 and #10409; new CPU performance data is here.
