$ curl -X POST -H 'Content-Type: application/json' -d '{ "messages": [{"role": "user", "content": "How are you?"}], "model": "starchat-beta.ggmlv3.q4_0.bin", "stream": false}' http://starchat-beta:8000/v1/chat/completions
Internal Server Error
INFO: Started server process [1]
INFO: Waiting for application startup.
Downloading (…)beta.ggmlv3.q4_0.bin: 100%|██████████| 10.7G/10.7G [03:09<00:00, 56.6MB/s]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
error loading model: unexpectedly reached end of file
llama_load_model_from_file: failed to load model
INFO: <ip>:40350 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 227, in app
solved_result = await solve_dependencies(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 622, in solve_dependencies
solved = await call(**sub_values)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/get_llm.py", line 45, in get_llm
return AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/ctransformers/hub.py", line 157, in from_pretrained
return LLM(
^^^^
File "/usr/local/lib/python3.11/site-packages/ctransformers/llm.py", line 214, in __init__
raise RuntimeError(
RuntimeError: Failed to create LLM 'llama' from './models/starchat-beta.ggmlv3.q4_0.bin'.
Any ideas whether I'm doing something wrong (e.g. the request structure is incorrect), or whether there is a legitimate issue with the image/deployment?
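For what it's worth, the `error loading model: unexpectedly reached end of file` line usually means the GGML file on disk is shorter than its header claims, i.e. a truncated or interrupted download rather than a bad request. A minimal sketch of a sanity check (the path and the ~10.7 GB figure are assumptions taken from the log above, not verified values):

```python
import os

def looks_truncated(path: str, expected_bytes: int, tolerance: float = 0.01) -> bool:
    """Return True if the file on disk is more than `tolerance` smaller
    than the size we expected the download to produce."""
    actual = os.path.getsize(path)
    return actual < expected_bytes * (1 - tolerance)

# Hypothetical check against the ~10.7 GB size reported by the download log:
# looks_truncated("./models/starchat-beta.ggmlv3.q4_0.bin", 10_700_000_000)
```

If the check comes back true, re-downloading the model (or verifying its published checksum) would be the next step before suspecting the request format.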