Comments (26)

nsarrazin commented on July 17, 2024

I'll have a look at this tonight, thanks for reporting!

panicsteve commented on July 17, 2024

In the case where it's still happening, can you run the following:

docker compose up -d
docker compose exec api bash
llama -m weights/your_model.bin

see if it loads & starts outputting stuff. If it doesn't please send me the traceback.

7B model works for me, but 13B is not happy:

root@8730b2d5f0fd:/usr/src/app# llama -m weights/ggml-alpaca-13B-q4_0.bin
main: seed = 1679613493
llama_model_load: loading model from 'weights/ggml-alpaca-13B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from 'weights/ggml-alpaca-13B-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model 'weights/ggml-alpaca-13B-q4_0.bin'

(MacBook Pro M1 Max, 32 GB RAM)

qn1213 commented on July 17, 2024

7B

run command: llama -m weights/ggml-alpaca-7B-q4_0.bin
main: seed = 1679625493
llama_model_load: loading model from 'weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'weights/ggml-alpaca-7B-q4_0.bin'
llama_model_load: ..............................Killed

13B

run command: llama -m weights/ggml-alpaca-13B-q4_0.bin
main: seed = 1679625532
llama_model_load: loading model from 'weights/ggml-alpaca-13B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from 'weights/ggml-alpaca-13B-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model 'weights/ggml-alpaca-13B-q4_0.bin'

I'm using Docker on a Mac mini 2018 (i5 series).

nsarrazin commented on July 17, 2024

Indeed, low RAM is often the issue, but there's no need for a graphics card with this repo. I updated the RAM requirements in the README, so make sure you have enough free!
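
A quick way to tell whether a "Killed" like the ones above is an out-of-memory kill is to compare the ggml ctx size the loader prints (about 4.5 GB for 7B and 8.5 GB for 13B in the logs in this thread) against the memory the api container can actually see. A rough sketch, assuming the standard docker compose setup from this repo:

docker compose exec api bash
head -3 /proc/meminfo              # MemTotal / MemFree / MemAvailable as seen by the container
free -h                            # same numbers, human-readable (needs procps in the image)
dmesg | grep -i "out of memory"    # OOM-killer traces, if the kernel log is readable here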

maxstanger commented on July 17, 2024

Same issue, but I chose 30B.

ThellraAK commented on July 17, 2024

Don't know what hardware you guys are on, but it took 10+ minutes for me to get a response on an i5-6500 using the smallest model.

nsarrazin commented on July 17, 2024

Can you check the logs of the api container and see if it's maybe converting the model and hanging because of that?

Also, are you all on Windows with WSL?

Wellmare commented on July 17, 2024

I have an i5-7400, but the important thing is that I used alpaca with llama directly, without the UI, and it responded in 10 seconds on the same model.

Wellmare commented on July 17, 2024

I use WSL.

Wellmare commented on July 17, 2024

Where can I see the logs?
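
With the stock docker compose setup, the api container's logs can be followed from the host; the service name api is the same one used in the docker compose exec api bash command earlier in this thread:

docker compose logs -f api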

maxstanger commented on July 17, 2024

I use WSL 2, Windows 11, Intel i7-11800.

ThellraAK commented on July 17, 2024

I'm on an Ubuntu cloud image running on Proxmox with the CPU type set to host.

Trying to figure out how to run just the API on a different node to see if there is a difference.

File weights/ggml-alpaca-7B-q4_0.bin already converted
INFO: 172.18.0.2:56482 - "GET /models HTTP/1.1" 200 OK
INFO: 172.18.0.2:56498 - "GET /chats HTTP/1.1" 200 OK
INFO: 172.18.0.3:46222 - "GET /chat/1db1f49e-ca45-4a13-8aee-f6a82bde7b76 HTTP/1.1" 200 OK
INFO: 172.18.0.3:46224 - "GET /chat/8a32fda3-cc0e-46d0-afed-e7c079a588e4 HTTP/1.1" 200 OK
INFO: 172.18.0.3:46236 - "GET /chat/1db1f49e-ca45-4a13-8aee-f6a82bde7b76 HTTP/1.1" 200 OK
INFO: 172.18.0.3:50102 - "GET /models HTTP/1.1" 200 OK
INFO: 172.18.0.3:47664 - "GET /chat/8a32fda3-cc0e-46d0-afed-e7c079a588e4 HTTP/1.1" 200 OK
INFO: 172.18.0.3:47680 - "GET /chat/1db1f49e-ca45-4a13-8aee-f6a82bde7b76 HTTP/1.1" 200 OK
INFO: 172.18.0.3:55744 - "GET /chat/1db1f49e-ca45-4a13-8aee-f6a82bde7b76 HTTP/1.1" 200 OK
INFO: 172.18.0.3:48880 - "GET /chat/8a32fda3-cc0e-46d0-afed-e7c079a588e4 HTTP/1.1" 200 OK
INFO: 172.18.0.2:35214 - "GET /models HTTP/1.1" 200 OK
INFO: 172.18.0.2:35224 - "GET /chats HTTP/1.1" 200 OK
INFO: 172.18.0.2:48390 - "POST /chat?temp=0.1&top_k=50&max_length=256&top_p=0.95&model=ggml-alpaca-13B-q4_0.bin&repeat_last_n=64&repeat_penalty=1.3&preprompt=Below+is+an+instruction+that+describes+a+task.+Write+a+response+that+appropriately+completes+the+request.+The+response+must+be+accurate%2C+concise+and+evidence-based+whenever+possible.+A+complete+answer+is+always+ended+by+%5Bend+of+text%5D. HTTP/1.1" 200 OK
INFO: 172.18.0.2:48390 - "GET /chats HTTP/1.1" 200 OK
INFO: 172.18.0.2:48402 - "GET /chat/25b1b3b5-226c-4143-9491-07a8f7acfd51 HTTP/1.1" 200 OK
INFO: 172.18.0.3:54912 - "GET /chat/8a32fda3-cc0e-46d0-afed-e7c079a588e4 HTTP/1.1" 200 OK
INFO: 172.18.0.3:54920 - "GET /chat/1db1f49e-ca45-4a13-8aee-f6a82bde7b76 HTTP/1.1" 200 OK
INFO: 172.18.0.3:54926 - "GET /chat/8a32fda3-cc0e-46d0-afed-e7c079a588e4 HTTP/1.1" 200 OK
INFO: 172.18.0.3:46472 - "GET /chat/25b1b3b5-226c-4143-9491-07a8f7acfd51/question?prompt=[PROMPT] HTTP/1.1" 200 OK
INFO: 172.18.0.3:44188 - "GET /models HTTP/1.1" 200 OK

ThellraAK commented on July 17, 2024

Yeah, using https://github.com/ggerganov/llama.cpp with

./main -m ./models/ggml-alpaca-7b-q4.bin --color -f ./prompts/alpaca.txt -ins

is outrageously faster than serge.

ThellraAK commented on July 17, 2024

Thank you for making this.

If there's anything I can do to help, please let me know.

adeleglise commented on July 17, 2024

Same here, running on macOS, on a MacBook Pro M2, with Podman.

INFO:     10.89.0.8:45776 - "GET /chat/637636b2-7e17-4508-a5a9-be04d1c6e894/question?prompt=What+is+a+bridge%3F HTTP/1.1" 200 OK
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 436, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 227, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 662, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 230, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 219, in stream_response
    async for data in self.body_iterator:
  File "/usr/src/app/main.py", line 161, in event_generator
    async for output in generate(
  File "/usr/src/app/utils/generate.py", line 65, in generate
    raise ValueError(error_output.decode("utf-8"))
ValueError: main: seed = 1679580662
llama_model_load: loading model from '/usr/src/app/weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
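
The load log here stops right after the ~4.5 GB ggml context allocation, which looks more like the container running out of memory than a bad model file. On macOS, Podman runs containers inside a VM, so one thing worth checking (an assumption about the cause, not a confirmed diagnosis) is the podman machine's memory cap:

podman machine inspect              # check the Memory value under Resources (in MiB)
podman machine stop
podman machine set --memory 12288   # e.g. raise the VM to 12 GiB (needs a reasonably recent podman)
podman machine start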

ygean commented on July 17, 2024

Keeping an eye on this; I met the same issue on an Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz.

OzzyKampha commented on July 17, 2024

Same problem here, on WSL.

nsarrazin commented on July 17, 2024

Hey everyone! Can you try to grab the latest main, rebuild the docker container and tell me if it's still happening?

In the case where it's still happening, can you run the following:

docker compose up -d
docker compose exec api bash
llama -m weights/your_model.bin --n_parts 1

see if it loads & starts outputting stuff. If it doesn't please send me the traceback.
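
Background on why --n_parts 1 helps (reconstructed from how llama.cpp of this era behaves, not stated explicitly in this thread): the loader guesses the number of weight-file parts from the embedding size (13B implies 2 parts, 30B implies 4), but the quantized alpaca weights used here are shipped as a single merged .bin, so reading it as multiple parts produces the "tensor 'tok_embeddings.weight' has wrong size" error. A minimal check and run:

docker compose exec api bash
ls -lh weights/                                       # one .bin per model, no .bin.1 / .bin.2 part files
llama -m weights/ggml-alpaca-13B-q4_0.bin --n_parts 1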

NikitaGolovko commented on July 17, 2024

Similar issues here as with @panicsteve.

7B model works well for me.
13B and 30B are failing (with different errors):

/usr/src/app# llama -m weights/ggml-alpaca-13B-q4_0.bin
main: seed = 1679620095
llama_model_load: loading model from 'weights/ggml-alpaca-13B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/2 from 'weights/ggml-alpaca-13B-q4_0.bin'
llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model 'weights/ggml-alpaca-13B-q4_0.bin'
/usr/src/app# llama -m weights/ggml-alpaca-30B-q4_0.bin
main: seed = 1679622447
llama_model_load: loading model from 'weights/ggml-alpaca-30B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 6656
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 17920
llama_model_load: n_parts = 4
llama_model_load: ggml ctx size = 20951.50 MB
Segmentation fault

I'm using Docker on WSL.
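
For the 30B segfault above: the loader asks for roughly 20.5 GB (ggml ctx size = 20951.50 MB), and WSL2 by default only exposes a portion of the host's RAM (about half on recent builds) to the VM, so checking how much memory Docker actually sees is a cheap first step; the .wslconfig note a couple of comments below covers raising the limit. Run these from the WSL shell on the host:

docker info | grep -i "total memory"   # memory visible to the Docker engine inside WSL2
free -h                                # same check from inside the WSL distro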

maxstanger commented on July 17, 2024

WSL 2:
root@f951b7cb75a6:/usr/src/app# llama -m weights/ggml-alpaca-30B-q4_0.bin
main: seed = 1679648468
llama_model_load: loading model from 'weights/ggml-alpaca-30B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 6656
llama_model_load: n_mult = 256
llama_model_load: n_head = 52
llama_model_load: n_layer = 60
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 17920
llama_model_load: n_parts = 4
llama_model_load: ggml ctx size = 20951.50 MB
Segmentation fault

QuaxelBrod commented on July 17, 2024

Stumbled across this issue, too. It could be that you are running out of memory...
On Windows/WSL2 it seems your Docker containers are restricted in how much memory they can use. Here https://learn.microsoft.com/en-us/windows/wsl/wsl-config you can find information on increasing memory and CPU for your WSL2 containers.
Not sure if that helps anyone, but it fixed the issue with output like #27 (comment) for me.
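
For illustration, the %UserProfile%\.wslconfig described at that link looks roughly like this (the values are placeholders to adapt to your machine); after saving it, run wsl --shutdown from Windows and restart Docker so the new limits apply:

[wsl2]
# let the WSL2 VM (and Docker inside it) use up to 16 GB and 8 cores
memory=16GB
processors=8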

Qualzz commented on July 17, 2024

llama_model_load: llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file

For 30B

aaroneden commented on July 17, 2024

Hey everyone! Can you try to grab the latest main, rebuild the docker container and tell me if it's still happening?

In the case where it's still happening, can you run the following:

docker compose up -d
docker compose exec api bash
llama -m weights/your_model.bin

see if it loads & starts outputting stuff. If it doesn't please send me the traceback.

root@69f29a3e3123:/usr/src/app# llama -m weights/ggml-alpaca-7B-q4_0.bin
main: seed = 1679694795
llama_model_load: loading model from 'weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'weights/ggml-alpaca-7B-q4_0.bin'
llama_model_load: ...................................Killed

timdingman-scale commented on July 17, 2024
main: seed = 1679712774
llama_model_load: loading model from 'weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'weights/ggml-alpaca-7B-q4_0.bin'
llama_model_load: ...................Killed

qn1213 commented on July 17, 2024

Most of the time, the error is caused by low RAM or VRAM.
Using a research graphics card seems to be the answer.
(screenshots attached: KakaoTalk_20230325_011412467, KakaoTalk_20230325_003939677)

tadasgedgaudas commented on July 17, 2024

I'm getting this:

root@personal-gpt-tasks-546548ffbb-69c85:/app# llama -m /mnt/data/weights/ggml-alpaca-7B-q4_0.bin --n_parts 1
main: seed = 1679775129
llama_model_load: loading model from '/mnt/data/weights/ggml-alpaca-7B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
Illegal instruction (core dumped)
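
An "Illegal instruction" at this point usually means the llama binary was compiled with CPU extensions (AVX2, for example) that the machine running the container does not support; that is an assumption about this particular report, but it is cheap to check which extensions the CPU actually exposes:

grep -o -E 'avx2|avx512|f16c|fma' /proc/cpuinfo | sort -u   # instruction-set extensions the CPU reports

If avx2 is missing from the output, rebuilding the image on the target machine (so llama.cpp compiles for that CPU) is the usual workaround.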
