Comments (3)
Hi all. I noticed that BigDL uses BigDL-Nano and ggml to accelerate int8/int4 computation. How can I invoke these APIs in LLMs such as LLaMA? Specifically, I want to accelerate the linear layers in the Hugging Face version of LLaMA (PyTorch-based).
We are not using Nano or ggml in bigdl-llm; see the examples at https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model
from bigdl.
Thanks, got it. By the way, is there a README on using BigDL's optimized int4/int8 quantized computation library, e.g. a use case of quantized matmul in the linear layers of models like LLaMA?
If you are using Hugging Face Transformers to load your LLaMA model, you can refer to the llama2 example here: https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2
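A minimal sketch of that path, assuming bigdl-llm is installed and using a placeholder checkpoint name (the linked example script may differ in detail):

```python
# Hedged sketch of the HF-Transformers path (placeholder model path;
# requires bigdl-llm and the model weights, so not runnable as-is here).
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
# load_in_4bit=True replaces the linear layers with 4-bit (sym_int4)
# equivalents at load time; the rest of the generate() workflow is
# ordinary Hugging Face Transformers code.
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = LlamaTokenizer.from_pretrained(model_path)
```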
If you are using customized code to load the LLaMA model, you can use optimize_model to optimize it. optimize_model can be applied to arbitrary PyTorch models to apply low-bit optimizations. Refer to this example for how to use it: https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/CPU/PyTorch-Models/Model/llama2/generate.py#L49
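A hedged sketch of this second path (placeholder checkpoint name; assumes bigdl-llm is installed, so illustrative rather than runnable here):

```python
# Hedged sketch: optimize a model loaded with plain HuggingFace/PyTorch code.
from transformers import LlamaForCausalLM
from bigdl.llm import optimize_model

# Placeholder checkpoint; any torch.nn.Module-based model works the same way.
model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# optimize_model defaults to sym_int4; other precisions can be selected
# via the low_bit argument, e.g. optimize_model(model, low_bit="sym_int8").
model = optimize_model(model)
```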
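For intuition about what a low-bit (sym_int4-style) optimization does to a linear layer's weights, here is a library-free illustration of symmetric 4-bit group quantization. This is a teaching sketch only, not bigdl-llm's actual kernel: real implementations use larger group sizes, pack two 4-bit codes per byte, and run optimized matmul routines.

```python
# Illustrative only: symmetric 4-bit ("sym_int4"-style) group quantization
# of linear-layer weights. Each group of weights shares one float scale,
# and each weight is stored as an integer code in [-7, 7].

GROUP = 4  # elements sharing one scale (real group sizes are larger, e.g. 64)

def quantize_sym_int4(weights):
    """Quantize a flat list of floats to int4 codes plus per-group scales."""
    qs, scales = [], []
    for i in range(0, len(weights), GROUP):
        group = weights[i:i + GROUP]
        amax = max(abs(w) for w in group) or 1.0  # avoid divide-by-zero
        scale = amax / 7.0  # symmetric int4 code range is [-7, 7]
        scales.append(scale)
        qs.extend(max(-7, min(7, round(w / scale))) for w in group)
    return qs, scales

def dequantize(qs, scales):
    """Recover approximate float weights from codes and scales."""
    return [q * scales[i // GROUP] for i, q in enumerate(qs)]

w = [0.12, -0.53, 0.07, 0.91, -0.02, 0.44, -0.66, 0.30]
q, s = quantize_sym_int4(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q)        # int4 codes, each in [-7, 7]
print(max_err)  # small per-weight reconstruction error
```

The 4-bit codes plus one scale per group are what get stored and fed to the quantized matmul, which is where the memory and speed savings of int4 inference come from.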