Comments (7)

supportend commented on June 10, 2024

How many CPU cores are assigned to the virtual machine? The CPU has 12 cores and 24 threads; do you start the program with -t 12? And which model do you use?

from alpaca.cpp.

reddiamond1234 commented on June 10, 2024

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 42 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 22
On-line CPU(s) list: 0-21
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz
CPU family: 6
Model: 62
Thread(s) per core: 1
Core(s) per socket: 11
Socket(s): 2
Stepping: 4

I changed the code to run on 18 threads, because it always defaults to 4. Even with 18 threads it is slow.

supportend commented on June 10, 2024

My CPU has 16 threads but only 8 physical cores, and on my system it's faster with 8 threads. Do you use the 7B, 13B or 30B model, and how much time does it take per token?
You can get it like this:

./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."
...
634.18 ms per token

Adjust the model filename/path and the thread count. On my system, text generation with the 30B model is not fast either.
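
As a rough illustration of picking a value for -t, the sketch below derives the physical core count from `lscpu` output by multiplying "Core(s) per socket" by "Socket(s)". The helper name `physical_cores` is hypothetical, and the sample text is taken from the `lscpu` output posted earlier in this thread. Inside a virtual machine, `lscpu` reports the virtual topology, which may not match the host, so treat the result as a starting point rather than a guaranteed optimum.

```python
def physical_cores(lscpu_text: str) -> int:
    """Return sockets * cores-per-socket parsed from `lscpu` output.

    Hypothetical helper: assumes the usual "Key: value" layout of lscpu.
    """
    fields = {}
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Core(s) per socket"]) * int(fields["Socket(s)"])


# Sample values copied from the lscpu output posted in this thread.
sample = """\
CPU(s): 22
Thread(s) per core: 1
Core(s) per socket: 11
Socket(s): 2
"""

print(physical_cores(sample))  # 22
```

On a real machine the text could come from `subprocess.run(["lscpu"], capture_output=True, text=True).stdout`, and the result can then be passed to -t.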

reddiamond1234 commented on June 10, 2024

I'm using the 7B version. Here is the result for the same prompt you used (./chat -m ggml-alpaca-7b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long."):

main: mem per token = 14434244 bytes
main: load time = 4293.24 ms
main: sample time = 172.82 ms
main: predict time = 372913.59 ms / 2155.57 ms per token
main: total time = 386418.75 ms
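
To put the numbers above in perspective, here is a small sketch that parses the "predict time" line and derives the number of generated tokens and the throughput. The function name `parse_predict_line` is hypothetical, and the regular expression assumes the line format shown in this comment.

```python
import re


def parse_predict_line(line: str):
    """Extract total predict time and per-token time, both in milliseconds."""
    match = re.search(
        r"predict time\s*=\s*([\d.]+) ms / ([\d.]+) ms per token", line
    )
    if match is None:
        raise ValueError("not a predict-time line")
    return float(match.group(1)), float(match.group(2))


# Line copied from the timing output above.
line = "main: predict time = 372913.59 ms / 2155.57 ms per token"
total_ms, per_token_ms = parse_predict_line(line)

tokens = round(total_ms / per_token_ms)   # total time divided by time per token
tokens_per_second = 1000.0 / per_token_ms

print(tokens)                      # 173
print(f"{tokens_per_second:.2f}")  # 0.46
```

So this run produced about 173 tokens at under half a token per second, compared with roughly 1.6 tokens per second for the 634.18 ms per token reported above for the 30B model.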

gjmulder commented on June 10, 2024

It is memory bound.

model.type  model.size  quantization  blas  context.size  eta   perplexity  efficiency
llama       7B          q4_0          1     2048          0.96  5.56        0.19
llama       7B          q4_0          1     1024          0.67  5.71        0.26
alpaca      7B          q4_0          1     2048          0.95  5.77        0.18
alpaca      7B          q4_0          1     1024          0.67  5.93        0.25
llama       7B          q4_0          1     512           0.53  6.46        0.29
alpaca      7B          q4_0          1     512           0.53  6.65        0.28

YMMV with alpacas.

MaskyS commented on June 10, 2024

@reddiamond1234, have you tried compiling from source? See #88.

Went from 4868.26 ms per token to 890.21 ms per token for me when testing ./chat -m ggml-alpaca-30b-q4.bin --color -t 8 --temp 0.8 -p "Write a text about Linux, 50 words long." (7B model)

reddiamond1234 commented on June 10, 2024

The problem was with my CPU: it has no AVX support, which makes inference slow.
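
For anyone who wants to check this on their own machine, the sketch below looks for SIMD feature flags in the text of /proc/cpuinfo. It assumes a Linux x86 system, where that file contains a "flags" line. The function name `simd_flags` is hypothetical, and the list of flags checked (avx, avx2, fma, f16c) is an assumption about which extensions matter for performance here, not something confirmed in this thread.

```python
def simd_flags(cpuinfo_text: str) -> dict:
    """Report which of a few SIMD-related CPU flags appear in cpuinfo text.

    Assumption: Linux x86 layout, where a line starts with "flags" followed
    by a colon and a space-separated list of feature names.
    """
    wanted = ("avx", "avx2", "fma", "f16c")
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return {name: name in flags for name in wanted}


# Made-up sample for illustration: a CPU (or VM) that exposes no AVX.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(simd_flags(sample))
```

On a real system the input would be `open("/proc/cpuinfo").read()`. In a virtual machine the flags depend on the CPU model the hypervisor exposes, so a host CPU with AVX can still appear without it to the guest.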
