

ialacol's People

Contributors

chenhunghan, damianoneill, dependabot[bot], donbale, ktaletsk


ialacol's Issues

Error when trying to use Falcon-7B

I get the following error when the Falcon-7B pod starts. It looks like the model file was deleted: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML/commit/7343df6eea4cfef162077380d075b49fdc9364ee

Deploy

helm install falcon-7b ialacol/ialacol -f examples/values/falcon-7b.yaml

Pod logs:

INFO:     Started server process [1]
INFO:     Waiting for application startup.
Error downloading model: 404 Client Error. (Request ID: Root=1-64d11005-11d0783d650eb9652ee63f84)

Entry Not Found for url: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML/resolve/main/wizard-falcon-7b.ggmlv3.q4_1.bin.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
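A possible workaround, until the example values are updated, is to point DEFAULT_MODEL_FILE at a file that still exists in the TheBloke/WizardLM-Uncensored-Falcon-7B-GGML repo. This is only a sketch; the filename below is a placeholder to replace with one listed on the repo's "Files" tab:

# <existing-file.ggmlv3.bin> is a placeholder; use a filename that actually exists in the repo
helm upgrade --install falcon-7b ialacol/ialacol \
  -f examples/values/falcon-7b.yaml \
  --set-string deployment.env.DEFAULT_MODEL_FILE=<existing-file.ggmlv3.bin>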

Error when trying to use Starchat Beta

I get the following error when requesting anything from the Starchat Beta model.

Deploy

helm install starchat-beta ialacol/ialacol

Request and error

$ curl -X POST -H 'Content-Type: application/json' -d '{ "messages": [{"role": "user", "content": "How are you?"}], "model": "starchat-beta.ggmlv3.q4_0.bin", "stream": false}' http://starchat-beta:8000/v1/chat/completions
Internal Server Error

Pod logs:

INFO:     Started server process [1]
INFO:     Waiting for application startup.
Downloading (…)beta.ggmlv3.q4_0.bin: 100%|██████████| 10.7G/10.7G [03:09<00:00, 56.6MB/s]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
error loading model: unexpectedly reached end of file
llama_load_model_from_file: failed to load model
INFO:     <ip>:40350 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 227, in app
    solved_result = await solve_dependencies(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 622, in solve_dependencies
    solved = await call(**sub_values)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/get_llm.py", line 45, in get_llm
    return AutoModelForCausalLM.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/ctransformers/hub.py", line 157, in from_pretrained
    return LLM(
           ^^^^
  File "/usr/local/lib/python3.11/site-packages/ctransformers/llm.py", line 214, in __init__
    raise RuntimeError(
RuntimeError: Failed to create LLM 'llama' from './models/starchat-beta.ggmlv3.q4_0.bin'.

Any ideas whether I'm doing something wrong (e.g. the request structure is incorrect), or whether there is a legitimate issue with the image/deployment?
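One hedged guess: "unexpectedly reached end of file" often means the file on disk is shorter than expected, e.g. a truncated download, so a quick check is to compare the on-disk size against the ~10.7G shown by the download progress bar. The pod name and model path below are assumptions (the traceback suggests models live under ./models relative to /app):

# pod name is illustrative; adjust to your release
kubectl exec -it <starchat-beta-pod> -- ls -lh /app/models/starchat-beta.ggmlv3.q4_0.bin
# if the size is well short of ~10.7G, remove the partial file so it re-downloads on restart
kubectl exec -it <starchat-beta-pod> -- rm /app/models/starchat-beta.ggmlv3.q4_0.bin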

accelerator for project

Hi @chenhunghan, very cool project. I was looking for something like this when I saw that Falcon was out. Any recommendations on appropriate accelerators for running the 7B and 40B models? I'm going to try your project out, and I'm wondering if you've already experimented with appropriate node pool specifications for running the deployment.

Happy to add some Terraform templates to the repo for different providers if you want to collaborate.

Mixed streaming output and thread count for GPTQ models (bug)

Hi @chenhunghan
When streaming two separate queries to the LLM server at once, tokens from query 1 appear in query 2's response.
For example, if I execute two queries at the same time via two separate python3 scripts:
Script A: uses the streams.py example to ask "What is photosynthesis"
Script B: uses the streams.py example to ask "What is an airplane"

Sometimes the results for "What is an airplane" from Script B show up in the result stream for Script A, and vice versa. What could be the issue? Could it be the THREADS value? Right now I set it to "1" according to the guide, which says to use "1" for GPTQ models. I am using a very large AWS EC2 instance with many vCPUs; should I bump up the threads, or what is the solution to this?
Thank you @chenhunghan

Downloading models fail with timeouts, retry is not enabled.

When downloading a number of models (from Hugging Face), I get the following error every time, seemingly at exactly the same point for each model that I have tried, even very small ones.
ERROR: Error downloading model: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.

I think the code needs to be modified to include retries.

It would also be helpful if the documentation described how we can download models manually from within the pod.
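Until retries are added, a manual download from within the pod can work around the timeouts. This is only a sketch, assuming curl is available in the image and that models are stored under /app/models (inferred from the tracebacks elsewhere on this page); the pod name is illustrative, and the repo/file reuse the Llama 2 example from another issue:

kubectl exec -it <ialacol-pod> -- sh -c \
  'curl -L --retry 5 --retry-delay 10 \
     -o /app/models/llama-2-7b-chat.ggmlv3.q4_0.bin \
     https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin'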

Helm install fails

gengwg@gengwg-mbp:~$ helm repo add ialacol https://chenhunghan.github.io/ialacol
"ialacol" has been added to your repositories
gengwg@gengwg-mbp:~$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ialacol" chart repository
...Successfully got an update from the "harbor" chart repository
Update Complete. ⎈Happy Helming!⎈
gengwg@gengwg-mbp:~$ helm install llama-2-7b-chat ialacol/ialacol
Error: INSTALLATION FAILED: template: ialacol/templates/deployment.yaml:29:29: executing "ialacol/templates/deployment.yaml" at <.Values.deployment.env.DEFAULT_MODEL_HG_REPO_ID>: nil pointer evaluating interface {}.DEFAULT_MODEL_HG_REPO_ID
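The error suggests the chart requires deployment.env.DEFAULT_MODEL_HG_REPO_ID to be set, and the bare helm install above provides no values. A minimal sketch of an install that sets the model explicitly, reusing the repo id and filename from the Llama 2 values snippet further down this page:

helm install llama-2-7b-chat ialacol/ialacol \
  --set-string deployment.env.DEFAULT_MODEL_HG_REPO_ID=TheBloke/Llama-2-7B-Chat-GGML \
  --set-string deployment.env.DEFAULT_MODEL_FILE=llama-2-7b-chat.ggmlv3.q4_0.bin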

Deployment fails to respond with errors

I have been able to deploy this to Okteto; the volume was created and the pod successfully loads a model.
However, when I port-forward into it (which kubectl says worked) and connect to localhost:8000, I only get {"detail":"Not Found"} in my browser, and the log for the pod is as follows:
2023-08-24 17:14:36.55 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: Application startup complete.
2023-08-24 17:14:36.55 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: Uvicorn running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
2023-08-24 17:15:28.35 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 10.8.26.12:32790 - "GET / HTTP/1.1" 404 Not Found
2023-08-24 17:19:14.65 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 127.0.0.1:58060 - "GET / HTTP/1.1" 404 Not Found
2023-08-24 17:19:14.91 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 127.0.0.1:58060 - "GET /favicon.ico HTTP/1.1" 404 Not Found
2023-08-24 17:19:21.12 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 127.0.0.1:58064 - "GET / HTTP/1.1" 404 Not Found
2023-08-24 17:19:22.11 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 127.0.0.1:58064 - "GET / HTTP/1.1" 404 Not Found
2023-08-24 17:42:47.20 UTC llamacpp-64bdc45bc5-fk6h9 llamacpp INFO: 127.0.0.1:58222 - "GET / HTTP/1.1" 404 Not Found

How can I fix this, or is this a known bug in the image? It seems like the web server is misconfigured.
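The 404s above are all for GET /, and the API may simply not route the root path. If so, the server might still be healthy; a better smoke test is the chat completions endpoint used in other issues on this page (the model filename below is illustrative):

curl -X POST -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "Hello"}], "model": "llama-2-7b-chat.ggmlv3.q4_0.bin", "stream": false}' \
  http://localhost:8000/v1/chat/completions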

Support GPTQ model

To add support for GPTQ models, we will need CI to build and push an image with a GPTQ tag.

Issue with GPU-accelerated LLAMA 2

I cannot get GPU-accelerated LLAMA 2 working. These are the pod logs when the error happens:

INFO:     Started server process [1]
INFO:     Waiting for application startup.
Downloading (…)chat.ggmlv3.q4_0.bin: 100%|██████████| 3.79G/3.79G [00:09<00:00, 393MB/s]
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
WARNING: failed to allocate 0.08 MB of pinned memory: CUDA driver version is insufficient for CUDA runtime version
CUDA error 35 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:4236: CUDA driver version is insufficient for CUDA runtime version

I am using g5.2xlarge EC2 instance with NVIDIA A10G GPU.
This is the output of nvidia-smi from within the pod:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   26C    P8    15W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Snippet of my Helm values

deployment:
  image: ghcr.io/chenhunghan/ialacol-cuda11:latest
  env:
    DEFAULT_MODEL_HG_REPO_ID: TheBloke/Llama-2-7B-Chat-GGML
    DEFAULT_MODEL_FILE: llama-2-7b-chat.ggmlv3.q4_0.bin
    GPU_LAYERS: 20

Any ideas where the mismatch is coming from?

Storage Class value named differently in PVC templates and documentation examples

There is a mismatch between docs and templates that I noticed:

The templates expect .Values.model.persistence.storageClassName, but the example values all use .Values.model.persistence.storageClass.

As a result, inattentive users might not notice the difference and end up with the default storage class instead of the intended one.
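Until the examples and templates agree, setting the key the templates actually read avoids silently falling back to the default storage class. A sketch, with an illustrative storage class name:

helm upgrade --install falcon-7b ialacol/ialacol \
  -f examples/values/falcon-7b.yaml \
  --set-string model.persistence.storageClassName=<intended-storage-class>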

Images with :cuda and :metal tags

In order to add support for CUDA 11 and Metal, we might want to add :cuda and :metal tags to the Docker images. We might need to update the CI for building the images.

Quickly get started with ialacol.

[Screenshot: 2023-07-21 at 9:20:00 PM]

Hi, so I deployed the pod for openllama-7b on a kind cluster, on a device that does not have any GPU. Will this model be able to run on a system with 16 GB of RAM? I have port-forwarded the pod, but I am not getting any responses with curl. Any suggestions on what might be the issue?

What I did:
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
helm install openllama-7b ialacol/ialacol

kubectl port-forward svc/openllama-7b 8000:8000

curl -X POST \
  -H 'Content-Type: application/json' \
  -d '{ "messages": [{"role": "user", "content": "How are you?"}], "model": "open_llama_7b-q4_0-ggjt.bin"}' \
  http://localhost:8000/v1/chat/completions
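When the curl request just hangs, the pod logs usually show whether the model is still downloading or failed to load; a 7B q4 model should fit in 16 GB of RAM on CPU, but the first response can take a while. A sketch, assuming the chart names the Deployment after the Helm release:

kubectl get pods
# follow the logs while the model downloads and the first request is processed
kubectl logs deploy/openllama-7b -f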

Auto detecting threads

Hi @chenhunghan,

Nice project, it works well so far. Thanks for making it publicly available!

However, I think the thread auto-detection feature does not work correctly: if I do not set the environment variable, it runs on 8 threads even though I have, for example, 16. After setting the env variable to 16, all threads are utilized.

Also, one question: if I send requests in parallel, is it normal that ialacol exits?
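As a workaround until auto-detection is fixed, the thread count can be pinned explicitly via the THREADS environment variable mentioned in the GPTQ issue above. A sketch, with the release name and count as placeholders:

# <release-name> is illustrative; --reuse-values keeps the rest of your configuration
helm upgrade <release-name> ialacol/ialacol --reuse-values \
  --set-string deployment.env.THREADS=16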
