Hi, my alpaca is running very slowly and I don't know why. I am running it on an Ubuntu VM,

Running very slow about alpaca.cpp HOT 7 CLOSED

reddiamond1234 commented on June 10, 2024

Running very slow

from alpaca.cpp.

Comments (7)

supportend commented on June 10, 2024

How many CPU cores are assigned to the virtual machine? The CPU has 12 cores and 24 threads; do you start the program with -t 12? And which model do you use?

reddiamond1234 commented on June 10, 2024

> How many CPU cores are assigned to the virtual machine? The CPU has 12 cores and 24 threads; do you start the program with -t 12? And which model do you use?

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 42 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 22
On-line CPU(s) list: 0-21
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
CPU family: 6
Model: 62
Thread(s) per core: 1
Core(s) per socket: 11
Socket(s): 2
Stepping: 4

I changed the code to run on 18 threads because it always defaults to 4. Even with 18 threads it is slow.
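
For picking a thread count, the number of physical cores can be read off the lscpu output above as sockets times cores per socket. A minimal Python sketch, assuming the `Key: value` layout shown above; the helper name `physical_cores` is made up for illustration and is not part of alpaca.cpp:

```python
def physical_cores(lscpu_text: str) -> int:
    """Return sockets * cores-per-socket parsed from lscpu-style text."""
    fields = {}
    for line in lscpu_text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Socket(s)"]) * int(fields["Core(s) per socket"])


# Lines copied from the lscpu output above.
sample = """\
CPU(s): 22
Thread(s) per core: 1
Core(s) per socket: 11
Socket(s): 2
"""
print(physical_cores(sample))  # 2 sockets * 11 cores = 22
```

Here the result equals `CPU(s): 22` because the VM reports one thread per core, so a value up to 22 could be tried with `-t`.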

supportend commented on June 10, 2024

My CPU has 16 threads but only 8 physical cores; with 8 threads it's faster on my system. Do you use the 7B, 13B or 30B model, and how much time per token do you get?
You can get it like this:

./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."
...
634.18 ms per token

Adjust the model filename/path and the thread count. On my system, text generation with the 30B model is not fast either.

reddiamond1234 commented on June 10, 2024

> My CPU has 16 threads but only 8 physical cores; with 8 threads it's faster on my system. Do you use the 7B, 13B or 30B model, and how much time per token do you get? You can get it like this:
> ./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."
> ...
> 634.18 ms per token
> Adjust the model filename/path and the thread count. On my system, text generation with the 30B model is not fast either.

I'm using the 7B version. Here is the output for the same prompt you used (./chat -m ggml-alpaca-7b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."):

main: mem per token = 14434244 bytes
main: load time = 4293.24 ms
main: sample time = 172.82 ms
main: predict time = 372913.59 ms / 2155.57 ms per token
main: total time = 386418.75 ms
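
To compare runs like this one, the `predict time` line can be turned into a token count and a tokens-per-second figure. A rough Python sketch, assuming the line keeps the format shown above; `parse_predict_line` is a hypothetical helper, not something the program provides:

```python
import re


def parse_predict_line(line: str):
    """Return (tokens, tokens_per_second) from a 'predict time' log line."""
    m = re.search(
        r"predict time\s*=\s*([\d.]+) ms\s*/\s*([\d.]+) ms per token", line
    )
    if m is None:
        raise ValueError("not a predict-time line")
    total_ms = float(m.group(1))
    ms_per_token = float(m.group(2))
    # Total time divided by per-token time gives the number of tokens.
    tokens = round(total_ms / ms_per_token)
    tokens_per_second = 1000.0 / ms_per_token
    return tokens, tokens_per_second


tokens, tps = parse_predict_line(
    "main: predict time = 372913.59 ms / 2155.57 ms per token"
)
print(tokens, round(tps, 2))  # 173 0.46
```

For the run above this works out to about 173 tokens at under half a token per second, versus roughly 1.6 tokens per second for the 634.18 ms per token quoted earlier for the 30B model.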

gjmulder commented on June 10, 2024

It is memory bound.

model.type	model.size	quantization	blas	context.size	eta	perplexity	efficiency
llama	7B	q4_0	1	2048	0.96	5.56	0.19
llama	7B	q4_0	1	1024	0.67	5.71	0.26
alpaca	7B	q4_0	1	2048	0.95	5.77	0.18
alpaca	7B	q4_0	1	1024	0.67	5.93	0.25
llama	7B	q4_0	1	512	0.53	6.46	0.29
alpaca	7B	q4_0	1	512	0.53	6.65	0.28

YMMV with alpacas.

MaskyS commented on June 10, 2024

@reddiamond1234, have you tried compiling from source? See #88.

Went from 4868.26 ms per token to 890.21 ms per token for me when testing ./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long." (7B model)

reddiamond1234 commented on June 10, 2024

> @reddiamond1234, have you tried compiling from source? See #88.
>
> Went from 4868.26 ms per token to 890.21 ms per token for me when testing ./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long." (7B model)

The problem was with my CPU: it has no AVX support, which makes it slow.
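
For anyone hitting the same thing, a quick way to check is to look at the `flags` line that Linux exposes in `/proc/cpuinfo`. A minimal Python sketch; the function names and the list of extensions checked (`avx`, `avx2`, `fma`, `f16c`) are assumptions chosen for illustration, not a list taken from the alpaca.cpp source:

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_simd(cpuinfo_text: str, wanted=("avx", "avx2", "fma", "f16c")):
    """Return the wanted extensions that the CPU (or VM) does not report."""
    present = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in present]


# Made-up sample of a guest that exposes SSE but no AVX.
sample = "processor : 0\nflags : fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(missing_simd(sample))  # ['avx', 'avx2', 'fma', 'f16c']

# On a real Linux system, pass the file contents instead:
#   missing_simd(open("/proc/cpuinfo").read())
```

Inside a VM the result reflects what the hypervisor exposes to the guest, which may be less than what the host CPU supports.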
