Comments (13)
@ivy-lv11 pls take a look at this
from bigdl.
transformers: 4.38.1/4.37.0
torch: 2.2.0+cpu
ipex: 2.2.0
If BF16 output is wrong, you can verify stock pytorch first (without BigDL).
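To sanity-check BF16 numerics with no framework at all, one can emulate bfloat16 rounding directly, since bfloat16 is just a float32 with the low 16 mantissa bits rounded away. A minimal stdlib sketch (the helper name is ours, not a BigDL or PyTorch API):

```python
import struct

def to_bf16(x: float) -> float:
    """Round a float to bfloat16 precision (round-to-nearest-even on the low 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# bfloat16 keeps only ~3 significant decimal digits:
print(to_bf16(3.14159265))  # 3.140625
```

Comparing such reference values against what the stock PyTorch BF16 path produces helps separate precision loss from an actual bug.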
Environment:
- transformers: 4.38.1/4.37.0
- torch: 2.2.0+cpu
Language: Chinese
When using the 2048-token prompt (2048-token prompts are truncated to 1024 tokens) with the original transformers and PyTorch, and with low_bit removed, the output looks normal.
prompt: 患者
1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些药物反应等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性白血病、恶性肿瘤、类白血
prompt: 红楼梦
他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐跟着二仙,转过山坡,来到一座楼前,只见门额上写着“薄命司”三个字。和尚说:“这里就是咱们要办的事了。”\n士隐随着和尚进了楼,只见里面摆着许多签筒,签筒里装着各色签子。和尚说:“你抽一支签,看看你的命运如何。”士隐随手拿起一支签,签上写着:“甄士隐梦幻识通灵,贾雨村风尘怀闺秀。”
What is the torch version? torch==2.2.0?
Yes.
Removing load_in_low_bit and optimize_model runs FP32. If FP32 gives normal outputs, the issue may be related to INT4, which can be compared against llama.cpp etc., and BF16 can be compared against native PyTorch BF16 support.
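For the INT4 side of that comparison, the kind of error a sym_int4 format introduces can be illustrated with a toy symmetric 4-bit round-trip (a sketch of the general scheme, not BigDL's actual kernel):

```python
def quantize_sym_int4(values):
    """Symmetric 4-bit quantization: scale so the largest magnitude maps to 7."""
    scale = max(abs(v) for v in values) / 7.0
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [0.12, -0.7, 0.33, 0.04]
q, scale = quantize_sym_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# round-trip error is bounded by half the quantization step
assert max_err <= scale / 2 + 1e-9
```

Bounded per-weight error like this degrades output quality gradually; completely garbled generations usually point at a logic bug rather than quantization noise, which is why the FP32/BF16 comparison is informative.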
Using transformers with BF16 via the pytorch_autocast_bf16 API in the all-in-one benchmark, the output also looks normal.
他们进了园门,但见异彩纷呈,楼阁参差,真是仙境。士隐随着二仙,转过山坡,来到一座楼前,只见一位仙姑端坐在楼上,旁边有一个丫鬟捧着茶盘。仙姑见了士隐,笑道:“甄士隐,你来了。”士隐忙施礼,问道:“仙姑如何认得我?”仙姑说:“你忘了,我在警幻仙子处见过你,还赠过你《好了歌》呢。”士隐这才想起,忙问仙姑:“仙姑为何赠我《好了歌
1)中性粒细胞比例增高常见于急性感染、严重组织损伤、白血病、恶性肿瘤、类白血病反应、骨髓增殖性疾病等;2)嗜酸性粒细胞比例增高见于寄生虫感染、过敏反应、皮肤病、慢性粒细胞白血病、嗜酸粒细胞增多症等;3)淋巴细胞比例增高见于病毒感染、结缔组织病、免疫缺陷病、血液系统疾病、某些化学物质或药物中毒等;4)单核细胞比例增高见于某些感染、血液系统疾病、急性炎症、慢性粒细胞白血
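A pytorch_autocast_bf16-style run boils down to executing the model under torch.autocast on CPU; a minimal sketch of that mechanism (a plain matmul stands in for the model, assuming stock PyTorch):

```python
import torch

x = torch.randn(4, 8)
w = torch.randn(8, 4)

# Inside the autocast region, matmuls are executed in bfloat16,
# which is the numeric path this benchmark mode exercises.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = x @ w

assert y.dtype == torch.bfloat16
```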
After disabling the overriding of the qwen2 attention forward in convert.py (Qwen1.5 uses the qwen2 model type), normal answers can be generated on SPR:
两旁是一副对联:\n假作真时真亦假,无为有处有还无。\n二人进了里面,见是一座楼阁,楼内挂着“薄命司”的牌子。士隐抬头一看,见里面挂着许多签,签上写着名字,旁边注着诗句和判词。他见签上有个“甄英莲”的名字,就抽出来看,上面写着:\n娇嫩花朵偏遭风雨,聪明女儿薄命终身。\n原是仙家遗种,却落在草莽人家。生于富贵,却死于贫贱。这是她的命,无可奈何。士隐看了,叹了一口气,把签放下。又见一个签上写着“贾
Need to check what is wrong in qwen2_attention_forward_origin.
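The convert.py override replaces each attention module's forward with a bound replacement function; disabling it for debugging just means removing the instance-level attribute so the class's original forward resolves again. A toy sketch of that mechanism (the class and function names are illustrative, not BigDL's):

```python
class ToyAttention:
    def forward(self):
        return "original forward"

def optimized_forward(self):
    return "overridden forward"

attn = ToyAttention()
# What a convert step does: bind a replacement forward onto the instance.
attn.forward = optimized_forward.__get__(attn, ToyAttention)
assert attn.forward() == "overridden forward"

# Disabling the override: drop the instance attribute so attribute
# lookup falls back to the class and the stock forward runs again.
del attn.__dict__["forward"]
assert attn.forward() == "original forward"
```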
Test BigDL-LLM 2.5.0b20240311
Environment:
- bigdl-llm version: 2.5.0b20240311
- transformers version: 4.37.0
- torch version: 2.1.0a0+cxx11.abi
On Arc the output looks normal:
1)正常生理情况下,中性粒细胞比例偏高,提示有感染或炎症;2)单核细胞比例偏高,提示有慢性炎症、结核病、白血病等。
However, when running on CPU the output still looks abnormal.
临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断临床实验室实验室诊断
It is found that the CPU uses a different attention module from the GPU, Qwen2SdpaAttention, which applies scaled dot-product attention to Q/K/V; if it is converted as if it were Qwen2Attention, it will never give the right output.
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
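For reference when debugging the override: Qwen2SdpaAttention changes how the attention math is executed (via torch's fused scaled_dot_product_attention), not what it computes, which is softmax(QK^T/√d)·V per head. A dependency-free sketch of that computation (single head, no mask or RoPE):

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def sdpa(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for one head; q, k, v are lists of row vectors."""
    d = len(q[0])
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d) for kr in k]
              for qr in q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * vr[j] for w, vr in zip(wrow, v)) for j in range(len(v[0]))]
            for wrow in weights]

# A query aligned with the first key attends almost entirely to the first value.
out = sdpa([[10.0, 0.0]], [[10.0, 0.0], [0.0, 10.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Any replacement forward must reproduce exactly this result for the SDPA variant as well; a forward written against Qwen2Attention's calling conventions can silently diverge.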
Model architecture
GPU
Uses Qwen2Attention:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2Attention(
(q_proj): Linear(in_features=4096, out_features=4096, bias=True)
(k_proj): Linear(in_features=4096, out_features=4096, bias=True)
(v_proj): Linear(in_features=4096, out_features=4096, bias=True)
(o_proj): Linear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
(up_proj): Linear(in_features=4096, out_features=11008, bias=False)
(down_proj): Linear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): Linear(in_features=4096, out_features=151936, bias=False)
)
CPU
Uses Qwen2SdpaAttention:
Qwen2ForCausalLM(
(model): Qwen2Model(
(embed_tokens): Embedding(151936, 4096)
(layers): ModuleList(
(0-31): 32 x Qwen2DecoderLayer(
(self_attn): Qwen2SdpaAttention(
(q_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(k_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(v_proj): LowBitLinear(in_features=4096, out_features=4096, bias=True)
(o_proj): LowBitLinear(in_features=4096, out_features=4096, bias=False)
(rotary_emb): Qwen2RotaryEmbedding()
)
(mlp): Qwen2MLP(
(gate_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(up_proj): LowBitLinear(in_features=4096, out_features=11008, bias=False)
(down_proj): LowBitLinear(in_features=11008, out_features=4096, bias=False)
(act_fn): SiLU()
)
(input_layernorm): Qwen2RMSNorm()
(post_attention_layernorm): Qwen2RMSNorm()
)
)
(norm): Qwen2RMSNorm()
)
(lm_head): LowBitLinear(in_features=4096, out_features=151936, bias=False)
)
Fixed in #10395 and #10409; new CPU performance data is here.