serge-chat / serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
Home Page: https://serge.chat
License: Apache License 2.0
While trying to build on an aarch64 VPS, it errors out:
=> ERROR [serge-api builder 4/4] RUN cd llama.cpp && make && mv main llama 5.4s
=> [serge-web 6/6] COPY . . 0.1s
------
> [serge-api builder 4/4] RUN cd llama.cpp && make && mv main llama:
#0 0.422 I llama.cpp build info:
#0 0.423 I UNAME_S: Linux
#0 0.423 I UNAME_P: unknown
#0 0.423 I UNAME_M: aarch64
#0 0.423 I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
#0 0.423 I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
#0 0.423 I LDFLAGS:
#0 0.423 I CC: cc (GCC) 10.2.0
#0 0.423 I CXX: g++ (GCC) 10.2.0
#0 0.423
#0 0.424 cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
#0 4.559 Assembler messages:
#0 4.559 Error: unknown architectural extension `sb+ssbs'
#0 4.559 Error: unrecognized option -march=armv8.2-a+crypto+fp16+rcpc+dotprod+sb+ssbs
#0 5.212 make: *** [Makefile:221: ggml.o] Error 1
------
failed to solve: process "/bin/sh -c cd llama.cpp && make && mv main llama" did not complete successfully: exit code: 2
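A possible workaround (an assumption based on the assembler error, not a confirmed fix): `-mcpu=native` makes GCC pass through whatever extensions the host CPU reports (here `sb+ssbs`), which the container's older binutils doesn't recognize. Pinning a conservative baseline before building may get past it:

```shell
# Hypothetical workaround: replace -mcpu=native with a fixed baseline the
# container's assembler understands. Run inside the llama.cpp checkout
# before `make`; adjust armv8-a upward to match what your VPS supports.
sed -i 's/-mcpu=native/-mcpu=armv8-a/g' Makefile
make && mv main llama
```

Upgrading the builder image to one with a newer binutils would be the cleaner alternative.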
Error: EMFILE: too many open files, watch '/usr/src/app/web/vite.config.ts' is reported, even though both the host system and the Docker container have an open-files limit of 1048576.
Deploy YAML on K8 cluster:
https://github.com/nsarrazin/serge/wiki/Integrating-Serge-in-your-orchestration#kubernetes-example
Docker image: ghcr.io/nsarrazin/serge:release
Containerd
OS Linux Ubuntu 22.04 LTS
CPU: 48 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2 Sockets)
> [email protected] dev
> vite dev --host 0.0.0.0 --port 8008
INFO: Started server process [12]
INFO: Waiting for application startup.
INFO: main initializing database connection
▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
tsconfig.json:2:12:
2 │ "extends": "./.svelte-kit/tsconfig.json",
╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
node:internal/errors:490
ErrorCaptureStackTrace(err);
^
Error: EMFILE: too many open files, watch '/usr/src/app/web/vite.config.ts'
at FSWatcher.<computed> (node:internal/fs/watchers:247:19)
at Object.watch (node:fs:2350:36)
at createFsWatchInstance (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50313:17)
at setFsWatchListener (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50360:15)
at NodeFsHandler._watchWithNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50515:14)
at NodeFsHandler._handleFile (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50579:23)
at NodeFsHandler._addToNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50821:21)
at async file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:51817:21
at async Promise.all (index 1)
Emitted 'error' event on FSWatcher instance at:
at FSWatcher._handleError (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:52013:10)
at NodeFsHandler._addToNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50829:18)
at async file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:51817:21
at async Promise.all (index 1) {
errno: -24,
syscall: 'watch',
code: 'EMFILE',
path: '/usr/src/app/web/vite.config.ts',
filename: '/usr/src/app/web/vite.config.ts'
}
Node.js v19.8.1
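A likely cause (hedged; inferred from the `watch` syscall in the stack, not from project docs): Vite's file watcher is exhausting the kernel's inotify limits, which are separate from the 1048576 open-file-descriptor cap mentioned above. Raising them on the Linux host may resolve the crash:

```shell
# inotify limits are distinct from ulimit -n; an EMFILE from fs.watch
# usually means one of these is exhausted. Values below are reasonable
# guesses, not project recommendations.
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
```

To make the change persistent, the same keys can go in `/etc/sysctl.conf`.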
### Confirmations
- [X] I'm running the latest version of the main branch.
- [X] I checked existing issues to see if this has already been described.
I've got it to work locally on my M2 with the help of the earlier issue.
But I'm stuck on the WebSocket: when I ask something, it loads endlessly.
WebSocket connection to 'ws://localhost:8008/' failed: There was a bad response from the server.
I've tried Chrome and Safari.
Docker build failing on new docker-compose.yaml
[+] Building 0.3s (2/4)
=> [internal] load build definition from Dockerfile
[+] Building 0.3s (4/4) FINISHED
=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 69B
=> [internal] load .dockerignore
=> => transferring context: 69B
=> ERROR [internal] load metadata for docker.io/library/ubuntu:22.04 gcc:11
[internal] load metadata for docker.io/library/ubuntu:22.04:
[internal] load metadata for docker.io/library/gcc:11:
failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: rpc error: code = Unknown desc = error getting credentials - err: exit status 1, out: ``
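One common cause of `error getting credentials` during the metadata load is a broken Docker credential helper referenced from the client config. As a sketch (the `credsStore` key being the culprit is an assumption, so back up `~/.docker/config.json` before touching it), the config can be cleaned like this:

```python
import json

def strip_cred_helpers(config_text: str) -> str:
    """Remove credsStore/credHelpers entries from a Docker client config;
    a broken credential helper is one known cause of this buildkit error."""
    cfg = json.loads(config_text)
    cfg.pop("credsStore", None)
    cfg.pop("credHelpers", None)
    return json.dumps(cfg, indent=2)

# Example on an in-memory config (apply to ~/.docker/config.json yourself):
print(strip_cred_helpers('{"auths": {}, "credsStore": "desktop"}'))
```

Running `docker logout` first, then retrying the build, is the lower-risk version of the same idea.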
Clean install.
Wipe out old dangling images, containers, networks etc.
main (sha 4047fbe)
OS: macOS Ventura, Apple Silicon (M2 Max)
sudo docker version
Client:
Cloud integration: v1.0.31
Version: 20.10.23
API version: 1.41
Go version: go1.18.10
Git commit: 7155243
Built: Thu Jan 19 17:35:19 2023
OS/Arch: darwin/arm64
Context: default
Experimental: true
Server: Docker Desktop 4.17.0 (99724)
Engine:
Version: 20.10.23
API version: 1.41 (minimum version 1.12)
Go version: go1.18.10
Git commit: 6051f14
Built: Thu Jan 19 17:31:28 2023
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.18
GitCommit: 2456e983eb9e37e47538f59ea18f2043c9a73640
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
When I run docker compose exec api python3 /usr/src/app/utils/download.py tokenizer 30B, the following error appears in a certain point of the installation:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/usr/lib/python3.10/http/client.py", line 465, in read
s = self.fp.read(amt)
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.10/ssl.py", line 1130, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
yield from self.raw.stream(chunk_size, decode_content=True)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 628, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 566, in read
with self._error_catcher():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/utils/download.py", line 53, in <module>
download_models(args.model)
File "/usr/src/app/utils/download.py", line 35, in download_models
huggingface_hub.hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1326, in hf_hub_download
http_get(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 538, in http_get
for chunk in r.iter_content(chunk_size=10 * 1024 * 1024):
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 822, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
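The traceback shows a single read timeout aborting the whole download. A generic retry-with-backoff helper (a sketch; `download.py`'s actual structure may differ, and the `huggingface_hub.hf_hub_download` call would be wrapped rather than replaced) could make the script resilient to flaky connections:

```python
import time

def retry(fn, attempts=5, base_delay=1.0,
          exceptions=(TimeoutError, ConnectionError)):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like `retry(lambda: huggingface_hub.hf_hub_download(...))`; hf_hub_download also resumes partial downloads, so retrying is usually cheap.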
The steps I followed:
and it crashed.
docker --version
Docker version 23.0.1, build a5ee5b1dfc
OS:
sw_vers
ProductName: macOS
ProductVersion: 13.2.1
BuildVersion: 22D68
M2 chipset
docker compose up -d
[+] Building 35.3s (13/30)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 71B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:22.04 2.6s
=> [internal] load metadata for docker.io/library/gcc:12 2.6s
=> [internal] load build context 0.0s
=> => transferring context: 149.39kB 0.0s
=> [deployment 1/21] FROM docker.io/library/ubuntu:22.04@sha256:67211c1 3.1s
=> => resolve docker.io/library/ubuntu:22.04@sha256:67211c14fa74f070d27c 0.0s
=> => sha256:cd741b12a7eaa64357041c2d3f4590c898313a7f8 27.35MB / 27.35MB 2.6s
=> => sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea11 1.13kB / 1.13kB 0.0s
=> => sha256:537da24818633b45fcb65e5285a68c3ec1f3db25f5ae547 424B / 424B 0.0s
=> => sha256:bab8ce5c00ca3ef91e0d3eb4c6e6d6ec7cffa9574c4 2.32kB / 2.32kB 0.0s
=> => extracting sha256:cd741b12a7eaa64357041c2d3f4590c898313a7f8f65cd15 0.4s
=> [llama_builder 1/4] FROM docker.io/library/gcc:12@sha256:b12d1e7c37e 30.3s
=> => resolve docker.io/library/gcc:12@sha256:b12d1e7c37e101fd76848570b8 0.0s
=> => sha256:ba265c6e20b2489ecfef524fad8f28916c9d92a9e63 9.19kB / 9.19kB 0.0s
=> => sha256:7971239fe1d69763272ccc0b2527efa95547d37c536 5.15MB / 5.15MB 2.1s
=> => sha256:b2eeecc98d6bc3812474852a39ce0a97be52fc7b961 2.22kB / 2.22kB 0.0s
=> => sha256:8022b074731d9ecee7f4fba79b993920973811dda 53.70MB / 53.70MB 5.1s
=> => sha256:b12d1e7c37e101fd76848570b81352fe9546dd1caad 1.43kB / 1.43kB 0.0s
=> => sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f 10.87MB / 10.87MB 4.3s
=> => sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998 54.68MB / 54.68MB 15.2s
=> => sha256:895a945a1f9ba441c2748501c4d46569edfbc2 189.73MB / 189.73MB 25.7s
=> => sha256:cd267d572e2202b3070cca7993eb424a4084c7844 16.13kB / 16.13kB 5.5s
=> => extracting sha256:8022b074731d9ecee7f4fba79b993920973811dda168bbc0 0.7s
=> => sha256:5f1a14b7155767f4a80c696309effd494189de 125.97MB / 125.97MB 23.5s
=> => extracting sha256:7971239fe1d69763272ccc0b2527efa95547d37c53630ed0 0.1s
=> => extracting sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f4f8e8ec 0.1s
=> => sha256:d29d4e33051b1fab13de7c854ee4fdac99d73675 10.02kB / 10.02kB 15.9s
=> => extracting sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998a56d563d 0.7s
=> => sha256:f54184d767dfe3575b7a0f3411dec9c55dad00dc2e 1.89kB / 1.89kB 16.2s
=> => extracting sha256:895a945a1f9ba441c2748501c4d46569edfbc2bfbdb9b47d 2.2s
=> => extracting sha256:cd267d572e2202b3070cca7993eb424a4084c7844e7725d4 0.0s
=> => extracting sha256:5f1a14b7155767f4a80c696309effd494189dec7c5e06eba 1.9s
=> => extracting sha256:d29d4e33051b1fab13de7c854ee4fdac99d736756e704e57 0.0s
=> => extracting sha256:f54184d767dfe3575b7a0f3411dec9c55dad00dc2ea8d1e5 0.0s
=> [deployment 2/21] WORKDIR /usr/src/app 0.1s
=> [deployment 3/21] RUN apt update 5.0s
=> CANCELED [deployment 4/21] RUN apt-get install -y python3-pip curl 24.4s
=> [llama_builder 2/4] WORKDIR /tmp 0.2s
=> [llama_builder 3/4] RUN git clone https://github.com/ggerganov/llama. 1.3s
=> ERROR [llama_builder 4/4] RUN cd llama.cpp && make && mv main 0.5s
------
> [llama_builder 4/4] RUN cd llama.cpp && make && mv main llama:
#0 0.265 I llama.cpp build info:
#0 0.265 I UNAME_S: Linux
#0 0.265 I UNAME_P: unknown
#0 0.265 I UNAME_M: aarch64
#0 0.265 I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
#0 0.265 I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
#0 0.265 I LDFLAGS:
#0 0.265 I CC: cc (GCC) 12.2.0
#0 0.265 I CXX: g++ (GCC) 12.2.0
#0 0.265
#0 0.265 cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
#0 0.484 In file included from ggml.c:137:
#0 0.484 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h: In function 'ggml_vec_dot_f16':
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
#0 0.485 29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:799:37: note: called from here
#0 0.485 799 | #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
#0 0.485 | ^~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:823:41: note: in expansion of macro 'GGML_F16x8_FMA'
#0 0.485 823 | #define GGML_F16_VEC_FMA GGML_F16x8_FMA
#0 0.485 | ^~~~~~~~~~~~~~
#0 0.485 ggml.c:1321:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
#0 0.485 1321 | sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
#0 0.485 | ^~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
#0 0.485 29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:799:37: note: called from here
#0 0.485 799 | #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
#0 0.485 | ^~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:823:41: note: in expansion of macro 'GGML_F16x8_FMA'
#0 0.485 823 | #define GGML_F16_VEC_FMA GGML_F16x8_FMA
#0 0.485 | ^~~~~~~~~~~~~~
#0 0.485 ggml.c:1321:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
#0 0.485 1321 | sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
#0 0.485 | ^~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:805:22: note: called from here
#0 0.485 805 | x[2*i] = vaddq_f16(x[2*i], x[2*i+1]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:808:22: note: called from here
#0 0.485 808 | x[4*i] = vaddq_f16(x[4*i], x[4*i+2]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:811:22: note: called from here
#0 0.485 811 | x[8*i] = vaddq_f16(x[8*i], x[8*i+4]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.501 make: *** [Makefile:221: ggml.o] Error 1
------
failed to solve: executor failed running [/bin/sh -c cd llama.cpp && make && mv main llama]: exit code: 2
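One way to test whether `-mcpu=native` is the culprit (an assumption based on the `target specific option mismatch` errors, which suggest the detected CPU features don't match what the fp16 intrinsics need, e.g. under QEMU emulation): override the flags with a generic build, which make's command-line variables allow:

```shell
cd llama.cpp
# Command-line variables override the Makefile's own CFLAGS assignments;
# dropping -mcpu=native avoids the fp16 intrinsic paths that fail to inline.
make CFLAGS="-I. -O3 -DNDEBUG -std=c11 -fPIC -pthread" \
     CXXFLAGS="-I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread"
mv main llama
```

If that builds, the fix belongs in the Dockerfile rather than in llama.cpp itself.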
Hello everyone, I have a problem deploying the container.
I have an Ubuntu server with a Docker infrastructure. When I try to install the project, the installation crashes at step 17/21, during the npm install in the web folder (timeout error), and I can't explain why.
If anyone has an idea, I'm all ears.
Thanks
docker compose up -d
Ubuntu Server
Intel Xeon
64 GB RAM
I'm using it under Windows 11 with Alpaca 7B.
OK, it's great overall, but I have a native cpp version (chat.exe) and it runs 2 times faster than your Docker version.
Also, how do I use the API? I saw in Docker something like 127.0.0.1:35272 - "GET /chat/5fe89704-c7ca-4a67-9ec2-f267689b0ffe/question?prompt=No%2C+it%27s+actually+14 HTTP/1.1" 200 OK
But where can I find proper API documentation?
It would be great if (especially) the API could scale (to more than 4 CPU cores) or even run multiple instances in parallel.
This would greatly enhance the usability on systems with plenty of power, or even on clusters.
Hi, I noticed there was interest in using LangChain with Alpaca, and you've already done a lot of the work needed for streaming, so I wondered if I could build on it and turn it into an LLM class. Here is my progress so far, just in a Gist right now:
https://gist.github.com/lukestanley/6517823485f88a40a09979c1a19561ce
I mention it in this existing LangChain issue:
langchain-ai/langchain#1777
Obviously feel free to do what you like with my small contribution.
In the README file, I think you mean to say that models should be downloaded into the api/weights folder, not the non-existent models folder.
Currently you need to run both the API & web server because they're behind nginx, and if the web container is not started then nginx cannot resolve the web hostname and fails.
It would be nice to find a way to optionally run just the API server, for integration with other services.
Allow running large language models on graphics cards with large VRAM.
It seems that the GitHub container registry tied to this repo is not readable by an anonymous source (i.e. docker pull). Example error below:
Error:
Error response from daemon: Head "https://ghcr.io/v2/nsarrazin/serge/manifests/release": denied: denied
A missing parameter in deploy.sh can lead to a failure to reach the web service on Kubernetes. I have found that adding the following to deploy.sh fixes the issue:
cd api && uvicorn main:app --host 0.0.0.0 --port 9124 --root-path /api/ &
Here is the error log if needed.
INFO: main initializing models
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:9124 (Press CTRL+C to quit)
INFO: main models are ready
11:26:38 AM [vite] http proxy error at /chats:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
11:26:38 AM [vite] http proxy error at /chat/420689cd-99de-477e-8ea0-b0ec82f51830:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
SyntaxError: Unexpected token 'I', "Internal S"... is not valid JSON
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async Module.respond_with_error (/node_modules/@sveltejs/kit/src/runtime/server/page/respond_with_error.js:52:17)
at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:12)
at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:240:20)
at async file:///usr/src/app/web/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:505:22
11:27:03 AM [vite] http proxy error at /chats:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
11:27:03 AM [vite] http proxy error at /models:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
This should not have any impact on the deployment on Docker.
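The `ECONNREFUSED ::1:9124` lines suggest Node is resolving `localhost` to IPv6 while uvicorn listens on IPv4 only, which is why binding with `--host 0.0.0.0` helps. A complementary sketch on the dev-proxy side (the exact shape of Serge's `vite.config.ts` proxy is an assumption):

```javascript
// vite.config sketch: point the proxy at 127.0.0.1 explicitly so Node
// cannot resolve the target to ::1, which uvicorn is not listening on.
export default {
  server: {
    proxy: {
      "/chats": { target: "http://127.0.0.1:9124", changeOrigin: true },
    },
  },
};
```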
kubectl run serge-dev --image=ghcr.io/nsarrazin/serge:release --port='8008' --port='9124' --expose=true
service/serge-dev created pod/serge-dev created
Go to Firefox and enter the service IP exposing the web service.
OS: Rocky Linux 8.7
Kubernetes Version: Kubernetes 1.25.6
Browser: Firefox 111.0.1
Hello,
can someone please help me set up the Docker image on my Mac with an M2 chip?
bitnami/containers#14169
This solution does not seem to work.
It would be great if a solution came up.
Thank you
I made the changes to the Makefile and ggml.c to add compilation support for AVX-512 instructions on the CPU in llama.
I tested quickly, and it does seem faster.
Here is code : https://github.com/Ameobea/alpaca.cpp/tree/llama-avx512-support
Thank you for creating Serge. It is wonderful to have the option of a self-hosted AI.
Deployment is easy if you are technical, but not so easy for ordinary people. If you support Serge on Cloudron, people will be able to deploy it with one click.
Most of the work is already completed, as you have a Docker image. Try the demo on the Cloudron page to see how easy it is:
https://cloudron.io
A thread was started to try and support Serge. Please introduce yourself and see if it can be completed soon:
https://forum.cloudron.io/topic/8872/serge-llama-made-easy-self-hosted-ai-chat
PS Why choose Discord when there are Free Software alternatives like Element?
Also, the 30B-q4 model only gives the "Loading" response and never talks to us...
Hello,
First of all, thank you for making this! I have a question about the copyright side of the project, though. If I understand correctly, Meta only released these models for academic use, but these are converted versions.
1.) As an academic student, can I use these freely? Is it safe and legal to upload, modify, and fine-tune these models?
2.) If I'm not an academic student, can I still use them?
Thank you!
Pretty low-hanging fruit with the wrapper we have: it would be great to create a custom LangChain LLM wrapper for llama.cpp.
Then we could use it in the API and do all sorts of cool things with Serge.
I get this error:
MongoDB requires ARMv8.2-A
I'm running it on a Raspberry Pi 4. Everything works except the MongoDB container, which exits as soon as I start it.
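For context: MongoDB 5.0+ requires ARMv8.2-A, and the Pi 4's Cortex-A72 cores are ARMv8-A, so the official image aborts at startup. A possible workaround (the service name below is an assumption; match it to Serge's docker-compose.yml) is pinning the last 4.x image, whose arm64 builds still start on ARMv8-A:

```yaml
services:
  mongodb:
    image: mongo:4.4   # last major version that starts on ARMv8-A (Pi 4)
```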
Currently it's a bit of a mess with little to no structure.
I'll be working on making things a bit more structured and extendable.
Love the project and I would love to contribute! My first thought is building a quick CI/CD pipeline with GitHub Actions so that any merge into a "release" branch triggers a build of a Docker image and uploads it to the GitHub package registry. This way people can use their own docker-compose.yml (or the provided one in the repo) and pull the images without building them. Would I be able to get this set up?
Edit: Grammar
README:
The old weights will be renamed to *.bin.old and the new weights will be named *.bin.
Not sure if this feature is possible, but I'd like the ability to specify (preferably in my .env file) models to leave pre-loaded in memory. It shouldn't be the default choice, but it would allow bandwidth-constrained servers to run faster, as well as reducing overall latency when running as an API.
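A sketch of what the requested behaviour could look like inside the API (entirely hypothetical code, not Serge's; the class and its semantics are invented for illustration): an LRU cache of loaded models where names listed in the .env file stay pinned in memory:

```python
from collections import OrderedDict

class ModelCache:
    """Keep models pinned in memory by name; evict unpinned ones LRU-style."""

    def __init__(self, max_loaded=2, pinned=None):
        self.max_loaded = max_loaded
        self.pinned = pinned or set()         # e.g. parsed from .env
        self._loaded = OrderedDict()          # name -> loaded model object

    def get(self, name, loader):
        """Return a loaded model, loading it via loader(name) on a miss."""
        if name in self._loaded:
            self._loaded.move_to_end(name)    # mark as most recently used
            return self._loaded[name]
        model = loader(name)
        self._loaded[name] = model
        self._evict()
        return model

    def _evict(self):
        while len(self._loaded) > self.max_loaded:
            for victim in self._loaded:       # oldest-first iteration
                if victim not in self.pinned:
                    del self._loaded[victim]
                    break
            else:
                break  # everything remaining is pinned; allow overflow
```

Memory-mapped weights (as llama.cpp later gained) would achieve a similar latency win without an explicit cache.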
Thanks for making this, and I look forward to seeing your plans for the API refactor! 😃
Would be cool to have in the API.
While trying to figure out why prompts were hanging, I checked the logs and found this:
6-be6b-40cb-95cb-69c96b0f9d05" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0" "-"
web_1 |
web_1 | > [email protected] dev
web_1 | > vite dev --host 0.0.0.0 --port 9123
web_1 |
web_1 | ▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
web_1 |
web_1 | tsconfig.json:2:12:
web_1 | 2 │ "extends": "./.svelte-kit/tsconfig.json",
web_1 | ╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
web_1 |
web_1 |
web_1 | Forced re-optimization of dependencies
web_1 |
web_1 | VITE v4.2.0 ready in 1285 ms
web_1 |
web_1 | ➜ Local: http://localhost:9123/
web_1 | ➜ Network: http://172.19.0.4:9123/
web_1 | 5:37:04 PM [vite-plugin-svelte] ssr compile in progress ...
web_1 |
Hi!
Sorry to make this an issue, but I'm running into it! I've followed the README and am trying to get it running but I run into quite a few errors. Maybe I'm just missing a dependency or something like that, but I haven't quite figured it out for myself yet and am wondering if others might be running into the same thing? I've tried this on two fairly clean Ubuntu 22.04 machines with the same results.
After the initial docker stuff does its pulls, I run into these lines of output:
Status: Downloaded newer image for gcc:10.2
---> 987c8580a041
Step 2/12 : WORKDIR /tmp
---> Running in 6eb681888247
Removing intermediate container 6eb681888247
---> 0999a4b386ae
Step 3/12 : RUN git clone https://github.com/ggerganov/llama.cpp.git --branch master-d5850c5
---> Running in 7f705b3f31b9
Cloning into 'llama.cpp'...
Note: checking out 'd5850c53ca179b9674b98f35d359763416a3cc11'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
Removing intermediate container 7f705b3f31b9
---> 1a27e05ce64f
Step 4/12 : RUN cd llama.cpp && make && mv main llama
---> Running in eae1a4f90a3a
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC: cc (GCC) 10.2.0
I CXX: g++ (GCC) 10.2.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3 -c ggml.c -o ggml.o
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
ggml.c: In function 'ggml_vec_dot_f16':
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
make: *** [Makefile:221: ggml.o] Error 1
The command '/bin/sh -c cd llama.cpp && make && mv main llama' returned a non-zero code: 2
ERROR: Service 'api' failed to build : Build failed
If you ask a question and switch conversations while the model is answering you, the answer will follow you around and continue streaming into the new conversation.
If you refresh the page afterwards, it's gone and the answer appears in the right chat, so this is just a rendering bug. Not a big deal, but it would be nice to fix.
Currently, the compiled llama.cpp
binary we use only supports Alpaca. The source had to be modified to accept a model as a single file (Alpaca 13B is a single file, as opposed to the 2-part model expected for LLaMA 13B). But doing so breaks compatibility with other LLaMA-based models.
Relevant changes here.
https://github.com/nsarrazin/serge/blob/a837ea48e017289a21a9574b0fe862f541874a14/api/Dockerfile.api#L18-L20
We could make this more generic, but maybe it needs to be handled in llama.cpp
instead? Not sure yet.
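To make the single-file patch concrete, here is a hedged sketch of the lookup involved: as I recall, early llama.cpp mapped the embedding dimension (`n_embd`) to a fixed number of weight files, and the Serge patch effectively forces that count to 1. The `expected_parts` function and `single_file_override` flag are hypothetical names for illustration, not the actual patch.

```python
# Hedged sketch of how early llama.cpp chose the weight-file count:
# it mapped n_embd for each model size to a fixed number of parts.
# Alpaca 13B ships as ONE file, so the 5120 -> 2 entry below is exactly
# what a single-file patch has to override.
N_PARTS = {4096: 1, 5120: 2, 6656: 4, 8192: 8}  # 7B, 13B, 30B, 65B

def expected_parts(n_embd: int, single_file_override: bool = False) -> int:
    """Return how many weight files the loader should expect."""
    if single_file_override:
        return 1  # the patched behavior: always treat the model as one file
    return N_PARTS[n_embd]

print(expected_parts(5120))        # stock behavior: 2 parts for 13B
print(expected_parts(5120, True))  # patched: 1 part (Alpaca 13B)
```

A more generic fix would likely infer the part count from the file itself rather than hard-coding either behavior, which is why handling it upstream in llama.cpp may make more sense.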
Add the ability to upload a file (a PDF, for example) on the front end, with the following procedure:
Great work. I'm super impressed by this project!
https://github.com/Beomi/KoAlpaca
I loaded that model to use it, but I get an error.
root@4bcef8bd0b49:/usr/src/app# llama -m weights/koAlpaca_65B.bin
main: seed = 1679636956
llama_model_load: loading model from 'weights/koAlpaca_65B.bin' - please wait ...
llama_model_load: invalid model file 'weights/koAlpaca_65B.bin' (bad magic)
llama_init_from_file: failed to load model
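A "bad magic" error means the first 4 bytes of the weights file don't match the format this build of llama.cpp expects; files converted for a different revision fail exactly like this. As a hedged sketch (assuming the original ggml magic `0x67676d6c`, which later formats changed), you can check the header yourself; `has_ggml_magic` is a hypothetical helper, not part of Serge:

```python
import struct

# The original ggml format started with the little-endian magic
# 0x67676d6c ("ggml"); a mismatch here is what llama.cpp reports
# as "bad magic". Later formats (ggmf, ggjt) use different values.
GGML_MAGIC = 0x67676D6C

def has_ggml_magic(path: str) -> bool:
    """Return True if the file begins with the original ggml magic."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic == GGML_MAGIC
```

If the check fails, the model likely needs to be re-converted with the conversion script matching this llama.cpp revision.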
After placing ggml-alpaca-30b-q4.bin in the api/weights folder and rebuilding the entire app, it still does not appear as a selectable model in the settings.
Is there a way to delete stored chats?
Down the road, I believe OAuth support would be awesome for those using self-hosted authentication applications, Authentik being one of the most configurable.
The "start new chat" button is unclickable after pulling the latest updates made to the repo.
Start docker
go to http://localhost:8008/
docker v 4.17.1
Windows 11 Pro
No response
Trying to get this running, but when I visit port 8008 I just get a "500 Internal Error" page. Are you able to help?
Logs from the containers:
[root@box serge]# docker compose up
[+] Running 5/5
⠿ Network serge_default Created 0.1s
⠿ Container serge-web-1 Created 1.2s
⠿ Container serge-nginx-1 Created 0.1s
⠿ Container serge-mongodb-1 Created 0.1s
⠿ Container serge-api-1 Created 0.0s
Attaching to serge-api-1, serge-mongodb-1, serge-nginx-1, serge-web-1
serge-nginx-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
serge-nginx-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
serge-nginx-1 | 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
serge-nginx-1 | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
serge-nginx-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: using the "epoll" event method
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: nginx/1.23.3
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: built by gcc 12.2.1 20220924 (Alpine 12.2.1_git20220924-r4)
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: OS: Linux 6.2.2-1.el8.elrepo.x86_64
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker processes
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 29
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 30
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 31
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 32
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 33
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 34
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 35
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 36
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 37
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 38
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 39
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 40
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 41
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 42
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 43
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 44
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.927+00:00"},"s":"I", "c":"NETWORK", "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"-","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","namespace":"config.tenantMigrationDonors"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","namespace":"config.tenantMigrationRecipients"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"ShardSplitDonorService","namespace":"config.tenantSplitDonors"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"CONTROL", "id":5945603, "ctx":"main","msg":"Multi threading initialized"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"600d9ce93974"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.4","gitVersion":"44ff59461c1353638a71e710f385a566bcd2f547","openSSLVersion":"OpenSSL 3.0.2 15 Mar 2022","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2204","distarch":"x86_64","target_arch":"x86_64"}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"22.04"}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22270, "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22297, "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22315, "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=6935M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,remove=true,path=journal,compressor=snappy),builtin_extension_config=(zstd=(compression_level=6)),file_manager=(close_idle_time=600,close_scan_interval=10,close_handle_minimum=2000),statistics_log=(wait=0),json_output=(error,message),verbose=[recovery_progress:1,checkpoint_progress:1,compact_progress:1,backup:0,checkpoint:0,compact:0,evict:0,history_store:0,recovery:0,rts:0,salvage:0,tiered:0,timestamp:0,transaction:0,verify:0,log:0],"}}
serge-api-1 | INFO: Will watch for changes in these directories: ['/usr/src/app']
serge-api-1 | INFO: Uvicorn running on http://0.0.0.0:9124 (Press CTRL+C to quit)
serge-api-1 | INFO: Started reloader process [1] using WatchFiles
serge-web-1 |
serge-web-1 | > [email protected] dev
serge-web-1 | > vite dev --host 0.0.0.0 --port 9123
serge-web-1 |
serge-api-1 | Process SpawnProcess-1:
serge-api-1 | Traceback (most recent call last):
serge-api-1 | File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
serge-api-1 | self.run()
serge-api-1 | File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
serge-api-1 | self._target(*self._args, **self._kwargs)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
serge-api-1 | target(sockets=sockets)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 59, in run
serge-api-1 | return asyncio.run(self.serve(sockets=sockets))
serge-api-1 | File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
serge-api-1 | return loop.run_until_complete(main)
serge-api-1 | File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 66, in serve
serge-api-1 | config.load()
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 471, in load
serge-api-1 | self.loaded_app = import_from_string(self.app)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 24, in import_from_string
serge-api-1 | raise exc from None
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
serge-api-1 | module = importlib.import_module(module_str)
serge-api-1 | File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
serge-api-1 | return _bootstrap._gcd_import(name[level:], package, level)
serge-api-1 | File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
serge-api-1 | File "<frozen importlib._bootstrap>", line 991, in _find_and_load
serge-api-1 | File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
serge-api-1 | File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
serge-api-1 | File "<frozen importlib._bootstrap_external>", line 848, in exec_module
serge-api-1 | File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
serge-api-1 | File "/usr/src/app/main.py", line 4, in <module>
serge-api-1 | from typing import Annotated
serge-api-1 | ImportError: cannot import name 'Annotated' from 'typing' (/usr/lib/python3.8/typing.py)
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.483+00:00"},"s":"I", "c":"STORAGE", "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":554}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.483+00:00"},"s":"I", "c":"RECOVERY", "id":23987, "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":22120, "ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":22178, "ctx":"initandlisten","msg":"/sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":5123300, "ctx":"initandlisten","msg":"vm.max_map_count is too low","attr":{"currentValue":65530,"recommendedMinimum":1677720,"maxConns":838860},"tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"NETWORK", "id":4915702, "ctx":"initandlisten","msg":"Updated wire specification","attr":{"oldSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true},"newSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":17,"maxWireVersion":17},"outgoing":{"minWireVersion":17,"maxWireVersion":17},"isInternalClient":true}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"REPL", "id":5853300, "ctx":"initandlisten","msg":"current featureCompatibilityVersion value","attr":{"featureCompatibilityVersion":"6.0","context":"startup"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"STORAGE", "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"CONTROL", "id":20536, "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.493+00:00"},"s":"I", "c":"FTDC", "id":20625, "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/data/db/diagnostic.data"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.500+00:00"},"s":"I", "c":"REPL", "id":6015317, "ctx":"initandlisten","msg":"Setting new configuration state","attr":{"newState":"ConfigReplicationDisabled","oldState":"ConfigPreStart"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.500+00:00"},"s":"I", "c":"STORAGE", "id":22262, "ctx":"initandlisten","msg":"Timestamp monitor starting"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27017.sock"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}
serge-web-1 |
serge-web-1 | Forced re-optimization of dependencies
serge-web-1 |
serge-web-1 | VITE v4.2.0 ready in 556 ms
serge-web-1 |
serge-web-1 | ➜ Local: http://localhost:9123/
serge-web-1 | ➜ Network: http://172.23.0.3:9123/
serge-web-1 | TypeError: fetch failed
serge-web-1 | at fetch (/usr/src/app/node_modules/undici/index.js:109:13)
serge-web-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
serge-web-1 | at async Object.eval [as fetch] (/node_modules/@sveltejs/kit/src/runtime/server/fetch.js:27:10)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:195:18)
serge-web-1 | at async load (+layout.ts:11:12)
serge-web-1 | at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
serge-nginx-1 | 10.150.1.5 - - [23/Mar/2023:13:23:40 +0000] "GET / HTTP/1.1" 500 1029 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "-"
serge-web-1 | TypeError: fetch failed
serge-web-1 | at fetch (/usr/src/app/node_modules/undici/index.js:109:13)
serge-web-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
serge-web-1 | at async Object.eval [as fetch] (/node_modules/@sveltejs/kit/src/runtime/server/fetch.js:27:10)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:195:18)
serge-web-1 | at async load (+layout.ts:11:12)
serge-web-1 | at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
serge-web-1 | at async Module.respond_with_error (/node_modules/@sveltejs/kit/src/runtime/server/page/respond_with_error.js:52:17)
serge-web-1 | at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:12)
serge-web-1 | at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:240:20)
serge-web-1 | at async file:///usr/src/app/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:505:22
serge-nginx-1 | 10.150.1.5 - - [23/Mar/2023:13:23:40 +0000] "GET /favicon.ico HTTP/1.1" 500 1019 "http://192.168.1.110:8008/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "-"
It's stuck in a restart loop.
I just followed the setup steps, and when I tried running
docker compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B
I got an error saying the container is restarting, so I checked Docker, and there it shows what I pasted into the relevant log output.
Docker version 20.10.23, build 7155243
Windows 11
Ryzen 7 5800x
No response
2023-03-25 14:23:57 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:23:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:00 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:02 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:03 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:06 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:10 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:17 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:31 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:30 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:32 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:33 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:35 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:37 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:39 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:43 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:51 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:26:04 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:26:31 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:27:23 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:24 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:45 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:46 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:48 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:50 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:51 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:54 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
failed to solve: executor failed running [/bin/sh -c apt-get install -y python3-pip]: exit code: 100
execute command "docker compose up -d"
install docker
install dependencies
execute the commands for the instructions
Docker version 20.10.17, build 100c70180f
Ubuntu
C:\Windows\System32>pip install serge
ERROR: Could not find a version that satisfies the requirement serge (from versions: none)
ERROR: No matching distribution found for serge
C:\Windows\System32>git clone https://github.com/nsarrazin/serge.git && cd serge
Cloning into 'serge'...
remote: Enumerating objects: 434, done.
remote: Counting objects: 100% (109/109), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 434 (delta 80), reused 74 (delta 70), pack-reused 325
Receiving objects: 100% (434/434), 95.56 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (247/247), done.
C:\Windows\System32\serge>
C:\Windows\System32\serge>cp .env.sample .env
'cp' is not recognized as an internal or external command,
operable program or batch file.
C:\Windows\System32\serge>
C:\Windows\System32\serge>docker compose up -d
'docker' is not recognized as an internal or external command,
operable program or batch file.
C:\Windows\System32\serge>docker compose exec api python3 /usr/src/app/utils/download.py tokenizer 7B
'docker' is not recognized as an internal or external command,
operable program or batch file.
Currently the generated answer from the model only gets sent to the client after it is done generating.
It would drastically improve UX if it could instead stream the answer as it is generated, reducing perceived latency.
This will require implementing Server-sent events in the API.
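As a minimal sketch of what that would involve: the SSE wire format (an optional `event:` line, a `data:` line, and a blank line per frame) is defined by the spec, but the generator below and the FastAPI wiring in the trailing comment are assumptions for illustration, not Serge's actual API.

```python
from typing import Optional

def sse_event(data: str, event: Optional[str] = None) -> str:
    """Encode one SSE frame: optional event name, data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield each generated token as its own SSE frame, then an end marker."""
    for tok in tokens:
        yield sse_event(tok, event="token")
    yield sse_event("[DONE]", event="end")

# In FastAPI, this generator would be wrapped roughly like:
#   return StreamingResponse(stream_tokens(gen), media_type="text/event-stream")
```

The client side would then consume this with an `EventSource` (or a fetch-based reader) and append each `token` event to the current message.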
Title describes it.
It probably has something to do with the client closing the connection, which causes an issue in the API server so it doesn't save the output to the conversation.
This would be an amazing feature, and not necessarily very complicated.
"The ReAct pattern (for Reason+Act) is described in this paper. It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request that those actions are run, then feed their results back into the LLM."
Here is an example implementation: https://til.simonwillison.net/llms/python-react-pattern
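To show how small the core loop can be, here is a hedged sketch following the linked write-up: the "LLM" is a stub (in Serge it would be a llama.cpp call), and `calculate`, `react`, and the `Action:`/`Observation:` prompt convention are illustrative, not Serge's code.

```python
import re

def calculate(expr: str) -> str:
    # Hypothetical action: evaluate simple arithmetic only.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
        raise ValueError("unsafe expression")
    return str(eval(expr))

ACTIONS = {"calculate": calculate}

def stub_llm(prompt: str) -> str:
    # Stand-in for the model: first turn requests an action,
    # second turn (after seeing an Observation) answers.
    if "Observation:" in prompt:
        return "Answer: 4"
    return "Action: calculate: 2 + 2"

def react(question: str, llm=stub_llm, max_turns: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_turns):
        reply = llm(prompt)
        m = re.search(r"Action: (\w+): (.+)", reply)
        if not m:
            # No action requested: treat the reply as the final answer.
            return reply[len("Answer: "):] if reply.startswith("Answer: ") else reply
        action, arg = m.groups()
        observation = ACTIONS[action](arg.strip())
        prompt += f"\n{reply}\nObservation: {observation}"
    return "max turns exceeded"

print(react("What is 2 + 2?"))  # → 4
```

The real work would be prompt engineering so the model reliably emits the `Action:` format, plus a whitelist of safe actions.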
Hello,
I've installed Serge on a MacBook Pro with 16GB Memory. I'm trying to use the 13B Model, but get the following error for any message I send:
A server error occurred. See below: main: seed = 1679852543 llama_model_load: loading model from '/usr/src/app/weights/ggml-alpaca-13B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/1 from '/usr/src/app/weights/ggml-alpaca-13B-q4_0.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 7759.39 MB / num tensors = 363
system_info: n_threads = 4 / 5 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
(the message is interrupted and doesn't display anything else).
I've tried running the 7B model and it works fine.
Any idea what could be the issue here? Not enough Memory causing a crash perhaps?
Thank you
Since we're mounting node_modules, it works if the host has Vite installed. But in a more isolated scenario, where node_modules live in a volume or are just part of the image, Vite should already be installed in the image.
Context: I don't use npm at all (I don't have it installed), so running the container didn't work (Vite not found). It worked after I logged into the container and ran npm install vite.
Hi,
So I got this error.
I tried converting the model (7B) using the provided convert.py, but it just doesn't do anything: no error message, no other output, no converted file.
Am I missing something obvious?
Thanks!