Comments (7)
How many CPU cores are assigned to the virtual machine? The CPU has 12 cores and 24 threads; do you start the program with -t 12? And which model do you use?
from alpaca.cpp.
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 42 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 22
On-line CPU(s) list: 0-21
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
CPU family: 6
Model: 62
Thread(s) per core: 1
Core(s) per socket: 11
Socket(s): 2
Stepping: 4
I changed the code to run on 18 threads, because it always defaults to 4; even with 18 threads it is slow.
from alpaca.cpp.
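For reference, a rough way to derive a thread count from the lscpu figures quoted above (2 sockets, 11 cores per socket, 1 thread per core). Using the physical core count as the thread count is a common heuristic for this kind of inference workload, not a rule taken from alpaca.cpp itself:

```python
# Sketch: pick a thread count from the lscpu values in the comment above.
# These figures are copied from that output, not detected at runtime.
sockets = 2
cores_per_socket = 11
threads_per_core = 1

physical_cores = sockets * cores_per_socket        # 22
logical_cpus = physical_cores * threads_per_core   # 22; no hyper-threading here

# Heuristic: use the physical core count, never more than the logical CPUs.
suggested_threads = min(physical_cores, logical_cpus)
print(suggested_threads)  # 22
```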
My CPU has 16 threads but only 8 physical cores; with 8 threads it's faster on my system. Do you use the 7B, 13B, or 30B model, and how much time per token?
You can get it like this:
./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."
...
634.18 ms per token
Adjust the model filename/path and the thread count. On my system, text generation with the 30B model is not fast either.
from alpaca.cpp.
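The "ms per token" figure can be pulled out of the program's timing output programmatically. A small sketch, assuming the log line format shown above:

```python
import re

# Sketch: extract per-token latency from a timing line like the one quoted
# above. The sample string mimics that format; real output may differ.
sample_output = "main: predict time = 372913.59 ms / 634.18 ms per token"

match = re.search(r"([\d.]+) ms per token", sample_output)
ms_per_token = float(match.group(1)) if match else None
print(ms_per_token)  # 634.18
```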
I'm using the 7B version. Here is the same prompt you used (./chat -m ggml-alpaca-7b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."):
main: mem per token = 14434244 bytes
main: load time = 4293.24 ms
main: sample time = 172.82 ms
main: predict time = 372913.59 ms / 2155.57 ms per token
main: total time = 386418.75 ms
from alpaca.cpp.
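As a sanity check on the timings above, the predict time and per-token time imply roughly how many tokens were generated and the resulting tokens-per-second rate:

```python
# Numbers taken from the timing output quoted above.
predict_ms = 372913.59
ms_per_token = 2155.57

tokens = predict_ms / ms_per_token         # ~173 tokens generated
tokens_per_second = 1000.0 / ms_per_token  # ~0.46 tokens/s
print(round(tokens), round(tokens_per_second, 2))  # 173 0.46
```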
It is memory bound.
model.type | model.size | quantization | blas | context.size | eta | perplexity | efficiency |
---|---|---|---|---|---|---|---|
llama | 7B | q4_0 | 1 | 2048 | 0.96 | 5.56 | 0.19 |
llama | 7B | q4_0 | 1 | 1024 | 0.67 | 5.71 | 0.26 |
alpaca | 7B | q4_0 | 1 | 2048 | 0.95 | 5.77 | 0.18 |
alpaca | 7B | q4_0 | 1 | 1024 | 0.67 | 5.93 | 0.25 |
llama | 7B | q4_0 | 1 | 512 | 0.53 | 6.46 | 0.29 |
alpaca | 7B | q4_0 | 1 | 512 | 0.53 | 6.65 | 0.28 |
YMMV with alpacas.
from alpaca.cpp.
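A back-of-the-envelope sketch of why generation is memory-bound: each generated token streams essentially all of the model weights from RAM once, so model size divided by sustained memory bandwidth gives a floor on the time per token. The 4 GB model size and 20 GB/s bandwidth below are illustrative assumptions, not measured values:

```python
# Sketch: lower bound on per-token latency from memory bandwidth alone.
# Both figures below are assumptions for illustration.
model_bytes = 4.0e9       # ~4 GB for a 7B model at 4-bit quantization
bandwidth_bytes_s = 20e9  # assumed sustained DRAM bandwidth

min_seconds_per_token = model_bytes / bandwidth_bytes_s
print(min_seconds_per_token * 1000)  # 200.0 ms per token, best case
```

Compute speed barely matters here: faster cores cannot help once every weight must cross the memory bus for every token.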
@reddiamond1234, have you tried compiling from source? See #88 .
Went from 4868.26 ms per token
to 890.21 ms per token
for me when testing ./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."
(7B model)
from alpaca.cpp.
The problem was with my CPU: it has no AVX, which makes it slow.
from alpaca.cpp.
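A quick way to confirm whether a CPU advertises AVX on Linux is to inspect the flags line from /proc/cpuinfo. The flags string below is an illustrative sample, not this user's actual CPU:

```python
# Sketch: check a /proc/cpuinfo-style flags line for AVX/AVX2 support.
# Without these, ggml falls back to a much slower scalar code path.
sample_flags = "fpu vme sse sse2 ssse3 sse4_1 sse4_2 avx avx2 fma"

flags = set(sample_flags.split())
has_avx = "avx" in flags
has_avx2 = "avx2" in flags
print(has_avx, has_avx2)  # True True
```

On a real system, read the string with `open("/proc/cpuinfo")` and take the line starting with `flags`.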