serge-chat / serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
Home Page: https://serge.chat
License: Apache License 2.0
While trying to build on an aarch64 VPS, it errors out:
=> ERROR [serge-api builder 4/4] RUN cd llama.cpp && make && mv main llama 5.4s
=> [serge-web 6/6] COPY . . 0.1s
------
> [serge-api builder 4/4] RUN cd llama.cpp && make && mv main llama:
#0 0.422 I llama.cpp build info:
#0 0.423 I UNAME_S: Linux
#0 0.423 I UNAME_P: unknown
#0 0.423 I UNAME_M: aarch64
#0 0.423 I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
#0 0.423 I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
#0 0.423 I LDFLAGS:
#0 0.423 I CC: cc (GCC) 10.2.0
#0 0.423 I CXX: g++ (GCC) 10.2.0
#0 0.423
#0 0.424 cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
#0 4.559 Assembler messages:
#0 4.559 Error: unknown architectural extension `sb+ssbs'
#0 4.559 Error: unrecognized option -march=armv8.2-a+crypto+fp16+rcpc+dotprod+sb+ssbs
#0 5.212 make: *** [Makefile:221: ggml.o] Error 1
------
failed to solve: process "/bin/sh -c cd llama.cpp && make && mv main llama" did not complete successfully: exit code: 2
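A possible workaround (an assumption based on the assembler error, not a confirmed fix): `-mcpu=native` makes GCC pass through whatever extensions the host CPU reports (here `sb+ssbs`), which the container's older binutils doesn't recognize. Pinning a conservative baseline before building may get past it:

```shell
# Hypothetical workaround: replace -mcpu=native with a fixed baseline the
# container's assembler understands. Run inside the llama.cpp checkout
# before `make`; adjust armv8-a upward to match what your VPS supports.
sed -i 's/-mcpu=native/-mcpu=armv8-a/g' Makefile
make && mv main llama
```

Upgrading the builder image to one with a newer binutils would be the cleaner alternative.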
Error: EMFILE: too many open files, watch '/usr/src/app/web/vite.config.ts' is reported, even though both the host system and the Docker container have an open-files limit of 1048576.
Deploy YAML on K8 cluster:
https://github.com/nsarrazin/serge/wiki/Integrating-Serge-in-your-orchestration#kubernetes-example
Docker image: ghcr.io/nsarrazin/serge:release
Containerd
OS Linux Ubuntu 22.04 LTS
CPU: 48 x Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (2 Sockets)
> [email protected] dev
> vite dev --host 0.0.0.0 --port 8008
INFO: Started server process [12]
INFO: Waiting for application startup.
INFO: main initializing database connection
▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
tsconfig.json:2:12:
2 │ "extends": "./.svelte-kit/tsconfig.json",
╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
node:internal/errors:490
ErrorCaptureStackTrace(err);
^
Error: EMFILE: too many open files, watch '/usr/src/app/web/vite.config.ts'
at FSWatcher.<computed> (node:internal/fs/watchers:247:19)
at Object.watch (node:fs:2350:36)
at createFsWatchInstance (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50313:17)
at setFsWatchListener (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50360:15)
at NodeFsHandler._watchWithNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50515:14)
at NodeFsHandler._handleFile (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50579:23)
at NodeFsHandler._addToNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50821:21)
at async file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:51817:21
at async Promise.all (index 1)
Emitted 'error' event on FSWatcher instance at:
at FSWatcher._handleError (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:52013:10)
at NodeFsHandler._addToNodeFs (file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:50829:18)
at async file:///usr/src/app/web/node_modules/vite/dist/node/chunks/dep-c167897e.js:51817:21
at async Promise.all (index 1) {
errno: -24,
syscall: 'watch',
code: 'EMFILE',
path: '/usr/src/app/web/vite.config.ts',
filename: '/usr/src/app/web/vite.config.ts'
}
Node.js v19.8.1
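A likely cause (hedged; inferred from the `watch` syscall in the stack, not from project docs): Vite's file watcher is exhausting the kernel's inotify limits, which are separate from the 1048576 open-file-descriptor cap mentioned above. Raising them on the Linux host may resolve the crash:

```shell
# inotify limits are distinct from ulimit -n; an EMFILE from fs.watch
# usually means one of these is exhausted. Values below are reasonable
# guesses, not project recommendations.
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
```

To make the change persistent, the same keys can go in `/etc/sysctl.conf`.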
### Confirmations
- [X] I'm running the latest version of the main branch.
- [X] I checked existing issues to see if this has already been described.
I've got it to work locally on my M2 with the help of the earlier issue.
But I'm stuck on the WebSocket: when I ask something, it loads endlessly.
WebSocket connection to 'ws://localhost:8008/' failed: There was a bad response from the server.
I've tried Chrome and Safari.
Docker build failing on new docker-compose.yaml
[+] Building 0.3s (2/4)
=> [internal] load build definition from Dockerfile
[+] Building 0.3s (4/4) FINISHED
=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 69B
=> [internal] load .dockerignore
=> => transferring context: 69B
=> ERROR [internal] load metadata for docker.io/library/ubuntu:22.04 gcc:11
[internal] load metadata for docker.io/library/ubuntu:22.04:
[internal] load metadata for docker.io/library/gcc:11:
failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: rpc error: code = Unknown desc = error getting credentials - err: exit status 1, out: ``
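One common cause of `error getting credentials` during the metadata load is a broken Docker credential helper referenced from the client config. As a sketch (the `credsStore` key being the culprit is an assumption, so back up `~/.docker/config.json` before touching it), the config can be cleaned like this:

```python
import json

def strip_cred_helpers(config_text: str) -> str:
    """Remove credsStore/credHelpers entries from a Docker client config;
    a broken credential helper is one known cause of this buildkit error."""
    cfg = json.loads(config_text)
    cfg.pop("credsStore", None)
    cfg.pop("credHelpers", None)
    return json.dumps(cfg, indent=2)

# Example on an in-memory config (apply to ~/.docker/config.json yourself):
print(strip_cred_helpers('{"auths": {}, "credsStore": "desktop"}'))
```

Running `docker logout` first, then retrying the build, is the lower-risk version of the same idea.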
Clean install.
Wipe out old dangling images, containers, networks etc.
main (sha 4047fbe)
OS: macOS Ventura, Apple Silicon (M2 Max)
sudo docker version
Client:
Cloud integration: v1.0.31
Version: 20.10.23
API version: 1.41
Go version: go1.18.10
Git commit: 7155243
Built: Thu Jan 19 17:35:19 2023
OS/Arch: darwin/arm64
Context: default
Experimental: true
Server: Docker Desktop 4.17.0 (99724)
Engine:
Version: 20.10.23
API version: 1.41 (minimum version 1.12)
Go version: go1.18.10
Git commit: 6051f14
Built: Thu Jan 19 17:31:28 2023
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.6.18
GitCommit: 2456e983eb9e37e47538f59ea18f2043c9a73640
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
When I run docker compose exec api python3 /usr/src/app/utils/download.py tokenizer 30B, the following error appears in a certain point of the installation:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 444, in _error_catcher
yield
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 567, in read
data = self._fp_read(amt) if not fp_closed else b""
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 533, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "/usr/lib/python3.10/http/client.py", line 465, in read
s = self.fp.read(amt)
File "/usr/lib/python3.10/socket.py", line 705, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.10/ssl.py", line 1130, in read
return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
yield from self.raw.stream(chunk_size, decode_content=True)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 628, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 566, in read
with self._error_catcher():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 449, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/src/app/utils/download.py", line 53, in <module>
download_models(args.model)
File "/usr/src/app/utils/download.py", line 35, in download_models
huggingface_hub.hf_hub_download(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1326, in hf_hub_download
http_get(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 538, in http_get
for chunk in r.iter_content(chunk_size=10 * 1024 * 1024):
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 822, in generate
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs.huggingface.co', port=443): Read timed out.
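The traceback shows a single read timeout aborting the whole download. A generic retry-with-backoff helper (a sketch; `download.py`'s actual structure may differ, and the `huggingface_hub.hf_hub_download` call would be wrapped rather than replaced) could make the script resilient to flaky connections:

```python
import time

def retry(fn, attempts=5, base_delay=1.0,
          exceptions=(TimeoutError, ConnectionError)):
    """Call fn(), retrying with exponential backoff on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like `retry(lambda: huggingface_hub.hf_hub_download(...))`; hf_hub_download also resumes partial downloads, so retrying is usually cheap.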
The steps I followed:
and it crashed.
docker --version
Docker version 23.0.1, build a5ee5b1dfc
OS:
sw_vers
ProductName: macOS
ProductVersion: 13.2.1
BuildVersion: 22D68
M2 chipset
docker compose up -d
[+] Building 35.3s (13/30)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.52kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 71B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:22.04 2.6s
=> [internal] load metadata for docker.io/library/gcc:12 2.6s
=> [internal] load build context 0.0s
=> => transferring context: 149.39kB 0.0s
=> [deployment 1/21] FROM docker.io/library/ubuntu:22.04@sha256:67211c1 3.1s
=> => resolve docker.io/library/ubuntu:22.04@sha256:67211c14fa74f070d27c 0.0s
=> => sha256:cd741b12a7eaa64357041c2d3f4590c898313a7f8 27.35MB / 27.35MB 2.6s
=> => sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea11 1.13kB / 1.13kB 0.0s
=> => sha256:537da24818633b45fcb65e5285a68c3ec1f3db25f5ae547 424B / 424B 0.0s
=> => sha256:bab8ce5c00ca3ef91e0d3eb4c6e6d6ec7cffa9574c4 2.32kB / 2.32kB 0.0s
=> => extracting sha256:cd741b12a7eaa64357041c2d3f4590c898313a7f8f65cd15 0.4s
=> [llama_builder 1/4] FROM docker.io/library/gcc:12@sha256:b12d1e7c37e 30.3s
=> => resolve docker.io/library/gcc:12@sha256:b12d1e7c37e101fd76848570b8 0.0s
=> => sha256:ba265c6e20b2489ecfef524fad8f28916c9d92a9e63 9.19kB / 9.19kB 0.0s
=> => sha256:7971239fe1d69763272ccc0b2527efa95547d37c536 5.15MB / 5.15MB 2.1s
=> => sha256:b2eeecc98d6bc3812474852a39ce0a97be52fc7b961 2.22kB / 2.22kB 0.0s
=> => sha256:8022b074731d9ecee7f4fba79b993920973811dda 53.70MB / 53.70MB 5.1s
=> => sha256:b12d1e7c37e101fd76848570b81352fe9546dd1caad 1.43kB / 1.43kB 0.0s
=> => sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f 10.87MB / 10.87MB 4.3s
=> => sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998 54.68MB / 54.68MB 15.2s
=> => sha256:895a945a1f9ba441c2748501c4d46569edfbc2 189.73MB / 189.73MB 25.7s
=> => sha256:cd267d572e2202b3070cca7993eb424a4084c7844 16.13kB / 16.13kB 5.5s
=> => extracting sha256:8022b074731d9ecee7f4fba79b993920973811dda168bbc0 0.7s
=> => sha256:5f1a14b7155767f4a80c696309effd494189de 125.97MB / 125.97MB 23.5s
=> => extracting sha256:7971239fe1d69763272ccc0b2527efa95547d37c53630ed0 0.1s
=> => extracting sha256:26c861b53509d61c37240d2f80efb3a351d2f1d7f4f8e8ec 0.1s
=> => sha256:d29d4e33051b1fab13de7c854ee4fdac99d73675 10.02kB / 10.02kB 15.9s
=> => extracting sha256:1714880ecc1c021a5f708f4369f91d3c2c53b998a56d563d 0.7s
=> => sha256:f54184d767dfe3575b7a0f3411dec9c55dad00dc2e 1.89kB / 1.89kB 16.2s
=> => extracting sha256:895a945a1f9ba441c2748501c4d46569edfbc2bfbdb9b47d 2.2s
=> => extracting sha256:cd267d572e2202b3070cca7993eb424a4084c7844e7725d4 0.0s
=> => extracting sha256:5f1a14b7155767f4a80c696309effd494189dec7c5e06eba 1.9s
=> => extracting sha256:d29d4e33051b1fab13de7c854ee4fdac99d736756e704e57 0.0s
=> => extracting sha256:f54184d767dfe3575b7a0f3411dec9c55dad00dc2ea8d1e5 0.0s
=> [deployment 2/21] WORKDIR /usr/src/app 0.1s
=> [deployment 3/21] RUN apt update 5.0s
=> CANCELED [deployment 4/21] RUN apt-get install -y python3-pip curl 24.4s
=> [llama_builder 2/4] WORKDIR /tmp 0.2s
=> [llama_builder 3/4] RUN git clone https://github.com/ggerganov/llama. 1.3s
=> ERROR [llama_builder 4/4] RUN cd llama.cpp && make && mv main 0.5s
------
> [llama_builder 4/4] RUN cd llama.cpp && make && mv main llama:
#0 0.265 I llama.cpp build info:
#0 0.265 I UNAME_S: Linux
#0 0.265 I UNAME_P: unknown
#0 0.265 I UNAME_M: aarch64
#0 0.265 I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native
#0 0.265 I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -mcpu=native
#0 0.265 I LDFLAGS:
#0 0.265 I CC: cc (GCC) 12.2.0
#0 0.265 I CXX: g++ (GCC) 12.2.0
#0 0.265
#0 0.265 cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mcpu=native -c ggml.c -o ggml.o
#0 0.484 In file included from ggml.c:137:
#0 0.484 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h: In function 'ggml_vec_dot_f16':
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
#0 0.485 29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:799:37: note: called from here
#0 0.485 799 | #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
#0 0.485 | ^~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:823:41: note: in expansion of macro 'GGML_F16x8_FMA'
#0 0.485 823 | #define GGML_F16_VEC_FMA GGML_F16x8_FMA
#0 0.485 | ^~~~~~~~~~~~~~
#0 0.485 ggml.c:1321:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
#0 0.485 1321 | sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
#0 0.485 | ^~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:29182:1: error: inlining failed in call to 'always_inline' 'vfmaq_f16': target specific option mismatch
#0 0.485 29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:799:37: note: called from here
#0 0.485 799 | #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
#0 0.485 | ^~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:823:41: note: in expansion of macro 'GGML_F16x8_FMA'
#0 0.485 823 | #define GGML_F16_VEC_FMA GGML_F16x8_FMA
#0 0.485 | ^~~~~~~~~~~~~~
#0 0.485 ggml.c:1321:22: note: in expansion of macro 'GGML_F16_VEC_FMA'
#0 0.485 1321 | sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
#0 0.485 | ^~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:805:22: note: called from here
#0 0.485 805 | x[2*i] = vaddq_f16(x[2*i], x[2*i+1]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:808:22: note: called from here
#0 0.485 808 | x[4*i] = vaddq_f16(x[4*i], x[4*i+2]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.485 /usr/local/lib/gcc/aarch64-linux-gnu/12.2.0/include/arm_neon.h:28760:1: error: inlining failed in call to 'always_inline' 'vaddq_f16': target specific option mismatch
#0 0.485 28760 | vaddq_f16 (float16x8_t __a, float16x8_t __b)
#0 0.485 | ^~~~~~~~~
#0 0.485 ggml.c:811:22: note: called from here
#0 0.485 811 | x[8*i] = vaddq_f16(x[8*i], x[8*i+4]); \
#0 0.485 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:826:41: note: in expansion of macro 'GGML_F16x8_REDUCE'
#0 0.485 826 | #define GGML_F16_VEC_REDUCE GGML_F16x8_REDUCE
#0 0.485 | ^~~~~~~~~~~~~~~~~
#0 0.485 ggml.c:1326:5: note: in expansion of macro 'GGML_F16_VEC_REDUCE'
#0 0.485 1326 | GGML_F16_VEC_REDUCE(sumf, sum);
#0 0.485 | ^~~~~~~~~~~~~~~~~~~
#0 0.501 make: *** [Makefile:221: ggml.o] Error 1
------
failed to solve: executor failed running [/bin/sh -c cd llama.cpp && make && mv main llama]: exit code: 2
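One way to test whether `-mcpu=native` is the culprit (an assumption based on the `target specific option mismatch` errors, which suggest the detected CPU features don't match what the fp16 intrinsics need, e.g. under QEMU emulation): override the flags with a generic build, which make's command-line variables allow:

```shell
cd llama.cpp
# Command-line variables override the Makefile's own CFLAGS assignments;
# dropping -mcpu=native avoids the fp16 intrinsic paths that fail to inline.
make CFLAGS="-I. -O3 -DNDEBUG -std=c11 -fPIC -pthread" \
     CXXFLAGS="-I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread"
mv main llama
```

If that builds, the fix belongs in the Dockerfile rather than in llama.cpp itself.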
Hello everyone, I have a problem deploying the container.
I have an Ubuntu server with a Docker infrastructure. When I try to install the project, the installation crashes at step 17/21, during the npm install in the web folder (timeout error), and I can't explain why.
If anyone has an idea, I'm all ears.
Thanks
docker compose up -d
Ubuntu Server
Intel Xeon
64 GB RAM
I'm using it under Windows 11 with Alpaca 7B.
OK, it's great overall, but I have a native cpp version (chat.exe) and it runs 2 times faster than your Docker version.
Also, how do I use the API? I saw in Docker something like 127.0.0.1:35272 - "GET /chat/5fe89704-c7ca-4a67-9ec2-f267689b0ffe/question?prompt=No%2C+it%27s+actually+14 HTTP/1.1" 200 OK
But where can I find proper API documentation?
It would be great if (especially) the API could scale (to more than 4 CPU cores) or even run multiple instances in parallel.
This would greatly enhance the usability on systems with plenty of power, or even on clusters.
Hi, I noticed there was interest in using LangChain with Alpaca, and you've already done a lot of the work needed for streaming, so I wondered if I could build on it and turn it into an LLM class. Here is my progress so far, just in a Gist right now:
https://gist.github.com/lukestanley/6517823485f88a40a09979c1a19561ce
I mention it in this existing LangChain issue:
langchain-ai/langchain#1777
Obviously feel free to do what you like with my small contribution.
In the README file, I think you mean to say that models should be downloaded into the api/weights folder, not the non-existent models folder.
Currently you need to run both the API & web server because they're behind nginx, and if the web container is not started then nginx cannot resolve the web hostname and fails.
It would be nice to find a way to optionally run just the API server, for integration with other services.
Allow running large language models on graphics cards with large VRAM.
It seems that the GitHub container registry tied to this repo is not readable by an anonymous source (i.e. docker pull). Example error below:
Error:
Error response from daemon: Head "https://ghcr.io/v2/nsarrazin/serge/manifests/release": denied: denied
A missing parameter in deploy.sh can lead to a failure to reach the web service on Kubernetes. I have found that adding the following to deploy.sh fixes the issue:
cd api && uvicorn main:app --host 0.0.0.0 --port 9124 --root-path /api/ &
Here is the error log if needed.
INFO: main initializing models
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:9124 (Press CTRL+C to quit)
INFO: main models are ready
11:26:38 AM [vite] http proxy error at /chats:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
11:26:38 AM [vite] http proxy error at /chat/420689cd-99de-477e-8ea0-b0ec82f51830:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
SyntaxError: Unexpected token 'I', "Internal S"... is not valid JSON
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async Module.respond_with_error (/node_modules/@sveltejs/kit/src/runtime/server/page/respond_with_error.js:52:17)
at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:12)
at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:240:20)
at async file:///usr/src/app/web/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:505:22
11:27:03 AM [vite] http proxy error at /chats:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
11:27:03 AM [vite] http proxy error at /models:
Error: connect ECONNREFUSED ::1:9124
at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1532:16)
SyntaxError: Unexpected end of JSON input
at JSON.parse (<anonymous>)
at Proxy.eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:286:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async load (+layout.ts:12:17)
at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
This should not have any impact on the deployment on Docker.
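The `ECONNREFUSED ::1:9124` lines suggest Node is resolving `localhost` to IPv6 while uvicorn listens on IPv4 only, which is why binding with `--host 0.0.0.0` helps. A complementary sketch on the dev-proxy side (the exact shape of Serge's `vite.config.ts` proxy is an assumption):

```javascript
// vite.config sketch: point the proxy at 127.0.0.1 explicitly so Node
// cannot resolve the target to ::1, which uvicorn is not listening on.
export default {
  server: {
    proxy: {
      "/chats": { target: "http://127.0.0.1:9124", changeOrigin: true },
    },
  },
};
```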
kubectl run serge-dev --image=ghcr.io/nsarrazin/serge:release --port='8008' --port='9124' --expose=true
service/serge-dev created pod/serge-dev created
Go to Firefox and enter the service IP exposing the web service.
OS: Rocky Linux 8.7
Kubernetes Version: Kubernetes 1.25.6
Browser: Firefox 111.0.1
Hello,
can someone please help me set up the Docker image on my Mac with an M2 chip?
bitnami/containers#14169
This solution does not seem to work.
It would be great if a solution came up.
Thank you
I made the changes to the Makefile and ggml.c to add compilation support for AVX-512 instructions on the CPU in llama.
I tested quickly, and it does seem faster.
Here is code : https://github.com/Ameobea/alpaca.cpp/tree/llama-avx512-support
Thank you for creating Serge. It is wonderful to have the option of a self-hosted AI.
Deployment is easy if you are technical, but not so easy for ordinary people. If you support Serge on Cloudron, people will be able to deploy it with one click.
Most of the work is already completed, as you have a Docker image. Try the demo on the Cloudron page to see how easy it is:
https://cloudron.io
A thread was started to try and support Serge. Please introduce yourself and see if it can be completed soon:
https://forum.cloudron.io/topic/8872/serge-llama-made-easy-self-hosted-ai-chat
PS Why choose Discord when there are Free Software alternatives like Element?
Also, the 30B-q4 model only gives the "Loading" response and never talks to us...
Hello,
First of all, thank you for making this! I have a question about the copyright side of the project, though. If I understand correctly, Meta only released these models for academic use, but these are converted versions.
1.) As an academic student, can I use these freely? Is it safe and legal to upload, modify, and fine-tune these models?
2.) If I'm not an academic student, can I still use them?
Thank you!
Pretty low-hanging fruit with the wrapper we have: it would be great to create a custom LangChain LLM wrapper for llama.cpp.
Then we could use it in the API and do all sorts of cool things with Serge.
I get this error:
MongoDB requires ARMv8.2-A
I'm running it on a Raspberry Pi 4. Everything works except the MongoDB container, which exits as soon as I start it.
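For context: MongoDB 5.0+ requires ARMv8.2-A, and the Pi 4's Cortex-A72 cores are ARMv8-A, so the official image aborts at startup. A possible workaround (the service name below is an assumption; match it to Serge's docker-compose.yml) is pinning the last 4.x image, whose arm64 builds still start on ARMv8-A:

```yaml
services:
  mongodb:
    image: mongo:4.4   # last major version that starts on ARMv8-A (Pi 4)
```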
Currently it's a bit of a mess with little to no structure.
I'll be working on making things a bit more structured and extendable.
Love the project and I would love to contribute! My first thought is building a quick CI/CD pipeline with GitHub Actions so that any merge into a "release" branch triggers a build of a Docker image and uploads it to the GitHub package registry. This way people can use their own docker-compose.yml (or the provided one in the repo) and pull the images without building them. Would I be able to get this set up?
Edit: Grammar
README:
The old weights will be renamed to *.bin.old and the new weights will be named *.bin.
Not sure if this feature is possible, but I'd like the ability to specify (preferably in my .env file) models to leave pre-loaded in memory. It shouldn't be the default choice, but it would allow bandwidth-constrained servers to run faster, as well as reducing overall latency when running as an API.
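A sketch of what the requested behaviour could look like inside the API (entirely hypothetical code, not Serge's; the class and its semantics are invented for illustration): an LRU cache of loaded models where names listed in the .env file stay pinned in memory:

```python
from collections import OrderedDict

class ModelCache:
    """Keep models pinned in memory by name; evict unpinned ones LRU-style."""

    def __init__(self, max_loaded=2, pinned=None):
        self.max_loaded = max_loaded
        self.pinned = pinned or set()         # e.g. parsed from .env
        self._loaded = OrderedDict()          # name -> loaded model object

    def get(self, name, loader):
        """Return a loaded model, loading it via loader(name) on a miss."""
        if name in self._loaded:
            self._loaded.move_to_end(name)    # mark as most recently used
            return self._loaded[name]
        model = loader(name)
        self._loaded[name] = model
        self._evict()
        return model

    def _evict(self):
        while len(self._loaded) > self.max_loaded:
            for victim in self._loaded:       # oldest-first iteration
                if victim not in self.pinned:
                    del self._loaded[victim]
                    break
            else:
                break  # everything remaining is pinned; allow overflow
```

Memory-mapped weights (as llama.cpp later gained) would achieve a similar latency win without an explicit cache.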
Thanks for making this, and I look forward to seeing your plans for the API refactor! 😃
Would be cool to have in the API.
While trying to figure out why prompts were hanging, I checked the logs and found this:
6-be6b-40cb-95cb-69c96b0f9d05" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0" "-"
web_1 |
web_1 | > [email protected] dev
web_1 | > vite dev --host 0.0.0.0 --port 9123
web_1 |
web_1 | ▲ [WARNING] Cannot find base config file "./.svelte-kit/tsconfig.json" [tsconfig.json]
web_1 |
web_1 | tsconfig.json:2:12:
web_1 | 2 │ "extends": "./.svelte-kit/tsconfig.json",
web_1 | ╵ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
web_1 |
web_1 |
web_1 | Forced re-optimization of dependencies
web_1 |
web_1 | VITE v4.2.0 ready in 1285 ms
web_1 |
web_1 | ➜ Local: http://localhost:9123/
web_1 | ➜ Network: http://172.19.0.4:9123/
web_1 | 5:37:04 PM [vite-plugin-svelte] ssr compile in progress ...
web_1 |
Hi!
Sorry to make this an issue, but I'm running into it! I've followed the README and am trying to get it running but I run into quite a few errors. Maybe I'm just missing a dependency or something like that, but I haven't quite figured it out for myself yet and am wondering if others might be running into the same thing? I've tried this on two fairly clean Ubuntu 22.04 machines with the same results.
After the initial docker stuff does its pulls, I run into these lines of output:
Status: Downloaded newer image for gcc:10.2
---> 987c8580a041
Step 2/12 : WORKDIR /tmp
---> Running in 6eb681888247
Removing intermediate container 6eb681888247
---> 0999a4b386ae
Step 3/12 : RUN git clone https://github.com/ggerganov/llama.cpp.git --branch master-d5850c5
---> Running in 7f705b3f31b9
Cloning into 'llama.cpp'...
Note: checking out 'd5850c53ca179b9674b98f35d359763416a3cc11'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
Removing intermediate container 7f705b3f31b9
---> 1a27e05ce64f
Step 4/12 : RUN cd llama.cpp && make && mv main llama
---> Running in eae1a4f90a3a
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:
I CC: cc (GCC) 10.2.0
I CXX: g++ (GCC) 10.2.0
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3 -c ggml.c -o ggml.o
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
ggml.c: In function 'ggml_vec_dot_f16':
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1318:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1318 | ax[j] = GGML_F16_VEC_LOAD(x + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:113,
from ggml.c:155:
/usr/local/lib/gcc/x86_64-linux-gnu/10.2.0/include/f16cintrin.h:52:1: error: inlining failed in call to 'always_inline' '_mm256_cvtph_ps': target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
| ^~~~~~~~~~~~~~~
ggml.c:915:33: note: called from here
915 | #define GGML_F32Cx8_LOAD(x) _mm256_cvtph_ps(_mm_loadu_si128((__m128i *)(x)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:925:37: note: in expansion of macro 'GGML_F32Cx8_LOAD'
925 | #define GGML_F16_VEC_LOAD(p, i) GGML_F32Cx8_LOAD(p)
| ^~~~~~~~~~~~~~~~
ggml.c:1319:21: note: in expansion of macro 'GGML_F16_VEC_LOAD'
1319 | ay[j] = GGML_F16_VEC_LOAD(y + i + j*GGML_F16_EPR, j);
| ^~~~~~~~~~~~~~~~~
make: *** [Makefile:221: ggml.o] Error 1
The command '/bin/sh -c cd llama.cpp && make && mv main llama' returned a non-zero code: 2
ERROR: Service 'api' failed to build : Build failed
If you ask a question and switch conversations while the model is answering you, the answer will follow you around and continue streaming into the new conversation.
If you refresh the page afterwards, it's gone and the answer appears in the right chat, so this is just a rendering bug. Not a big deal, but it would be nice to fix.
Currently, the compiled llama.cpp
binary we use only supports Alpaca. The source had to be modified to accept a model as a single file (Alpaca 13B is a single file, as opposed to the 2-part model expected for LLaMA 13B). But doing so breaks compatibility with other LLaMA-based models.
Relevant changes here.
https://github.com/nsarrazin/serge/blob/a837ea48e017289a21a9574b0fe862f541874a14/api/Dockerfile.api#L18-L20
We could make this more generic, but maybe it needs to be handled in llama.cpp
instead? Not sure yet.
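To make the single-file patch concrete, here is a hedged sketch of the lookup involved: as I recall, early llama.cpp mapped the embedding dimension (`n_embd`) to a fixed number of weight files, and the Serge patch effectively forces that count to 1. The `expected_parts` function and `single_file_override` flag are hypothetical names for illustration, not the actual patch.

```python
# Hedged sketch of how early llama.cpp chose the weight-file count:
# it mapped n_embd for each model size to a fixed number of parts.
# Alpaca 13B ships as ONE file, so the 5120 -> 2 entry below is exactly
# what a single-file patch has to override.
N_PARTS = {4096: 1, 5120: 2, 6656: 4, 8192: 8}  # 7B, 13B, 30B, 65B

def expected_parts(n_embd: int, single_file_override: bool = False) -> int:
    """Return how many weight files the loader should expect."""
    if single_file_override:
        return 1  # the patched behavior: always treat the model as one file
    return N_PARTS[n_embd]

print(expected_parts(5120))        # stock behavior: 2 parts for 13B
print(expected_parts(5120, True))  # patched: 1 part (Alpaca 13B)
```

A more generic fix would likely infer the part count from the file itself rather than hard-coding either behavior, which is why handling it upstream in llama.cpp may make more sense.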
Add the ability to upload a file (a PDF, for example) on the front end, with the following procedure:
Great work. I'm super impressed by this project!
https://github.com/Beomi/KoAlpaca
I loaded that model to use it, but I get an error.
root@4bcef8bd0b49:/usr/src/app# llama -m weights/koAlpaca_65B.bin
main: seed = 1679636956
llama_model_load: loading model from 'weights/koAlpaca_65B.bin' - please wait ...
llama_model_load: invalid model file 'weights/koAlpaca_65B.bin' (bad magic)
llama_init_from_file: failed to load model
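A "bad magic" error means the first 4 bytes of the weights file don't match the format this build of llama.cpp expects; files converted for a different revision fail exactly like this. As a hedged sketch (assuming the original ggml magic `0x67676d6c`, which later formats changed), you can check the header yourself; `has_ggml_magic` is a hypothetical helper, not part of Serge:

```python
import struct

# The original ggml format started with the little-endian magic
# 0x67676d6c ("ggml"); a mismatch here is what llama.cpp reports
# as "bad magic". Later formats (ggmf, ggjt) use different values.
GGML_MAGIC = 0x67676D6C

def has_ggml_magic(path: str) -> bool:
    """Return True if the file begins with the original ggml magic."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic == GGML_MAGIC
```

If the check fails, the model likely needs to be re-converted with the conversion script matching this llama.cpp revision.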
After placing ggml-alpaca-30b-q4.bin in the api/weights folder and rebuilding the entire app, it still does not appear as a selectable model in the settings.
Is there a way to delete stored chats?
Down the road, I believe OAuth support would be awesome for those using self-hosted authentication applications, Authentik being one of the most configurable.
The "start new chat" button is unclickable after pulling the latest updates made to the repo.
Start docker
go to http://localhost:8008/
docker v 4.17.1
Windows 11 Pro
No response
Trying to get this running, but when I visit port 8008 I just get a "500 Internal Error" page. Are you able to help?
Logs from the containers:
[root@box serge]# docker compose up
[+] Running 5/5
⠿ Network serge_default Created 0.1s
⠿ Container serge-web-1 Created 1.2s
⠿ Container serge-nginx-1 Created 0.1s
⠿ Container serge-mongodb-1 Created 0.1s
⠿ Container serge-api-1 Created 0.0s
Attaching to serge-api-1, serge-mongodb-1, serge-nginx-1, serge-web-1
serge-nginx-1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
serge-nginx-1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
serge-nginx-1 | 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
serge-nginx-1 | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
serge-nginx-1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
serge-nginx-1 | /docker-entrypoint.sh: Configuration complete; ready for start up
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: using the "epoll" event method
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: nginx/1.23.3
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: built by gcc 12.2.1 20220924 (Alpine 12.2.1_git20220924-r4)
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: OS: Linux 6.2.2-1.el8.elrepo.x86_64
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker processes
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 29
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 30
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 31
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 32
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 33
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 34
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 35
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 36
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 37
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 38
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 39
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 40
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 41
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 42
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 43
serge-nginx-1 | 2023/03/23 13:23:34 [notice] 1#1: start worker process 44
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.927+00:00"},"s":"I", "c":"NETWORK", "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"-","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","namespace":"config.tenantMigrationDonors"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","namespace":"config.tenantMigrationRecipients"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"REPL", "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"ShardSplitDonorService","namespace":"config.tenantSplitDonors"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.928+00:00"},"s":"I", "c":"CONTROL", "id":5945603, "ctx":"main","msg":"Multi threading initialized"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"600d9ce93974"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.4","gitVersion":"44ff59461c1353638a71e710f385a566bcd2f547","openSSLVersion":"OpenSSL 3.0.2 15 Mar 2022","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu2204","distarch":"x86_64","target_arch":"x86_64"}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"22.04"}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22270, "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22297, "ctx":"initandlisten","msg":"Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:34.929+00:00"},"s":"I", "c":"STORAGE", "id":22315, "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=6935M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,remove=true,path=journal,compressor=snappy),builtin_extension_config=(zstd=(compression_level=6)),file_manager=(close_idle_time=600,close_scan_interval=10,close_handle_minimum=2000),statistics_log=(wait=0),json_output=(error,message),verbose=[recovery_progress:1,checkpoint_progress:1,compact_progress:1,backup:0,checkpoint:0,compact:0,evict:0,history_store:0,recovery:0,rts:0,salvage:0,tiered:0,timestamp:0,transaction:0,verify:0,log:0],"}}
serge-api-1 | INFO: Will watch for changes in these directories: ['/usr/src/app']
serge-api-1 | INFO: Uvicorn running on http://0.0.0.0:9124 (Press CTRL+C to quit)
serge-api-1 | INFO: Started reloader process [1] using WatchFiles
serge-web-1 |
serge-web-1 | > [email protected] dev
serge-web-1 | > vite dev --host 0.0.0.0 --port 9123
serge-web-1 |
serge-api-1 | Process SpawnProcess-1:
serge-api-1 | Traceback (most recent call last):
serge-api-1 | File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
serge-api-1 | self.run()
serge-api-1 | File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
serge-api-1 | self._target(*self._args, **self._kwargs)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
serge-api-1 | target(sockets=sockets)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 59, in run
serge-api-1 | return asyncio.run(self.serve(sockets=sockets))
serge-api-1 | File "/usr/lib/python3.8/asyncio/runners.py", line 44, in run
serge-api-1 | return loop.run_until_complete(main)
serge-api-1 | File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/server.py", line 66, in serve
serge-api-1 | config.load()
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/config.py", line 471, in load
serge-api-1 | self.loaded_app = import_from_string(self.app)
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 24, in import_from_string
serge-api-1 | raise exc from None
serge-api-1 | File "/usr/local/lib/python3.8/dist-packages/uvicorn/importer.py", line 21, in import_from_string
serge-api-1 | module = importlib.import_module(module_str)
serge-api-1 | File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
serge-api-1 | return _bootstrap._gcd_import(name[level:], package, level)
serge-api-1 | File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
serge-api-1 | File "<frozen importlib._bootstrap>", line 991, in _find_and_load
serge-api-1 | File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
serge-api-1 | File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
serge-api-1 | File "<frozen importlib._bootstrap_external>", line 848, in exec_module
serge-api-1 | File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
serge-api-1 | File "/usr/src/app/main.py", line 4, in <module>
serge-api-1 | from typing import Annotated
serge-api-1 | ImportError: cannot import name 'Annotated' from 'typing' (/usr/lib/python3.8/typing.py)
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.483+00:00"},"s":"I", "c":"STORAGE", "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":554}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.483+00:00"},"s":"I", "c":"RECOVERY", "id":23987, "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":22120, "ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":22178, "ctx":"initandlisten","msg":"/sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'","tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.490+00:00"},"s":"W", "c":"CONTROL", "id":5123300, "ctx":"initandlisten","msg":"vm.max_map_count is too low","attr":{"currentValue":65530,"recommendedMinimum":1677720,"maxConns":838860},"tags":["startupWarnings"]}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"NETWORK", "id":4915702, "ctx":"initandlisten","msg":"Updated wire specification","attr":{"oldSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":17},"outgoing":{"minWireVersion":6,"maxWireVersion":17},"isInternalClient":true},"newSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":17},"incomingInternalClient":{"minWireVersion":17,"maxWireVersion":17},"outgoing":{"minWireVersion":17,"maxWireVersion":17},"isInternalClient":true}}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"REPL", "id":5853300, "ctx":"initandlisten","msg":"current featureCompatibilityVersion value","attr":{"featureCompatibilityVersion":"6.0","context":"startup"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"STORAGE", "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.492+00:00"},"s":"I", "c":"CONTROL", "id":20536, "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.493+00:00"},"s":"I", "c":"FTDC", "id":20625, "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/data/db/diagnostic.data"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.500+00:00"},"s":"I", "c":"REPL", "id":6015317, "ctx":"initandlisten","msg":"Setting new configuration state","attr":{"newState":"ConfigReplicationDisabled","oldState":"ConfigPreStart"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.500+00:00"},"s":"I", "c":"STORAGE", "id":22262, "ctx":"initandlisten","msg":"Timestamp monitor starting"}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27017.sock"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23015, "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0"}}
serge-mongodb-1 | {"t":{"$date":"2023-03-23T13:23:35.501+00:00"},"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}
serge-web-1 |
serge-web-1 | Forced re-optimization of dependencies
serge-web-1 |
serge-web-1 | VITE v4.2.0 ready in 556 ms
serge-web-1 |
serge-web-1 | ➜ Local: http://localhost:9123/
serge-web-1 | ➜ Network: http://172.23.0.3:9123/
serge-web-1 | TypeError: fetch failed
serge-web-1 | at fetch (/usr/src/app/node_modules/undici/index.js:109:13)
serge-web-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
serge-web-1 | at async Object.eval [as fetch] (/node_modules/@sveltejs/kit/src/runtime/server/fetch.js:27:10)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:195:18)
serge-web-1 | at async load (+layout.ts:11:12)
serge-web-1 | at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/index.js:169:13)
serge-nginx-1 | 10.150.1.5 - - [23/Mar/2023:13:23:40 +0000] "GET / HTTP/1.1" 500 1029 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "-"
serge-web-1 | TypeError: fetch failed
serge-web-1 | at fetch (/usr/src/app/node_modules/undici/index.js:109:13)
serge-web-1 | at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
serge-web-1 | at async Object.eval [as fetch] (/node_modules/@sveltejs/kit/src/runtime/server/fetch.js:27:10)
serge-web-1 | at async eval (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:195:18)
serge-web-1 | at async load (+layout.ts:11:12)
serge-web-1 | at async Module.load_data (/node_modules/@sveltejs/kit/src/runtime/server/page/load_data.js:162:17)
serge-web-1 | at async Module.respond_with_error (/node_modules/@sveltejs/kit/src/runtime/server/page/respond_with_error.js:52:17)
serge-web-1 | at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:12)
serge-web-1 | at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:240:20)
serge-web-1 | at async file:///usr/src/app/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:505:22
serge-nginx-1 | 10.150.1.5 - - [23/Mar/2023:13:23:40 +0000] "GET /favicon.ico HTTP/1.1" 500 1019 "http://192.168.1.110:8008/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36" "-"
It's stuck in a restart loop.
I just followed the setup steps, and when I tried running
docker compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B
I got an error saying the container is restarting, so I checked Docker, and there it shows what I pasted into the relevant log output.
Docker version 20.10.23, build 7155243
Windows 11
Ryzen 7 5800x
No response
2023-03-25 14:23:57 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:23:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:00 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:02 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:03 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:06 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:10 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:17 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:31 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:24:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:30 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:32 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:33 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:35 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:37 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:39 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:43 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:25:51 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:26:04 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:26:31 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:27:23 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:24 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:45 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:46 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:48 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:50 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:51 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:54 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
2023-03-25 14:28:58 serge-serge-1 | /bin/sh: 1: /usr/src/app/deploy.sh: not found
failed to solve: executor failed running [/bin/sh -c apt-get install -y python3-pip]: exit code: 100
execute command "docker compose up -d"
install docker
install dependencies
execute the commands for the instructions
Docker version 20.10.17, build 100c70180f
Ubuntu
C:\Windows\System32>pip install serge
ERROR: Could not find a version that satisfies the requirement serge (from versions: none)
ERROR: No matching distribution found for serge
C:\Windows\System32>git clone https://github.com/nsarrazin/serge.git && cd serge
Cloning into 'serge'...
remote: Enumerating objects: 434, done.
remote: Counting objects: 100% (109/109), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 434 (delta 80), reused 74 (delta 70), pack-reused 325
Receiving objects: 100% (434/434), 95.56 KiB | 1.99 MiB/s, done.
Resolving deltas: 100% (247/247), done.
C:\Windows\System32\serge>
C:\Windows\System32\serge>cp .env.sample .env
'cp' is not recognized as an internal or external command,
operable program or batch file.
C:\Windows\System32\serge>
C:\Windows\System32\serge>docker compose up -d
'docker' is not recognized as an internal or external command,
operable program or batch file.
C:\Windows\System32\serge>docker compose exec api python3 /usr/src/app/utils/download.py tokenizer 7B
'docker' is not recognized as an internal or external command,
operable program or batch file.
Currently the generated answer from the model only gets sent to the client after it is done generating.
It would drastically improve UX if it could instead stream the answer as it is generated, reducing perceived latency.
This will require implementing Server-sent events in the API.
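As a minimal sketch of what that would involve: the SSE wire format (an optional `event:` line, a `data:` line, and a blank line per frame) is defined by the spec, but the generator below and the FastAPI wiring in the trailing comment are assumptions for illustration, not Serge's actual API.

```python
from typing import Optional

def sse_event(data: str, event: Optional[str] = None) -> str:
    """Encode one SSE frame: optional event name, data line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {data}")
    return "\n".join(lines) + "\n\n"

def stream_tokens(tokens):
    """Yield each generated token as its own SSE frame, then an end marker."""
    for tok in tokens:
        yield sse_event(tok, event="token")
    yield sse_event("[DONE]", event="end")

# In FastAPI, this generator would be wrapped roughly like:
#   return StreamingResponse(stream_tokens(gen), media_type="text/event-stream")
```

The client side would then consume this with an `EventSource` (or a fetch-based reader) and append each `token` event to the current message.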
Title describes it.
It probably has something to do with the client closing the connection, which causes an issue in the API server so it doesn't save the output to the conversation.
This would be an amazing feature, and not necessarily very complicated.
"The ReAct pattern (for Reason+Act) is described in this paper. It's a pattern where you implement additional actions that an LLM can take - searching Wikipedia or running calculations for example - and then teach it how to request that those actions are run, then feed their results back into the LLM."
Here is an example implementation: https://til.simonwillison.net/llms/python-react-pattern
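To show how small the core loop can be, here is a hedged sketch following the linked write-up: the "LLM" is a stub (in Serge it would be a llama.cpp call), and `calculate`, `react`, and the `Action:`/`Observation:` prompt convention are illustrative, not Serge's code.

```python
import re

def calculate(expr: str) -> str:
    # Hypothetical action: evaluate simple arithmetic only.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expr):
        raise ValueError("unsafe expression")
    return str(eval(expr))

ACTIONS = {"calculate": calculate}

def stub_llm(prompt: str) -> str:
    # Stand-in for the model: first turn requests an action,
    # second turn (after seeing an Observation) answers.
    if "Observation:" in prompt:
        return "Answer: 4"
    return "Action: calculate: 2 + 2"

def react(question: str, llm=stub_llm, max_turns: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_turns):
        reply = llm(prompt)
        m = re.search(r"Action: (\w+): (.+)", reply)
        if not m:
            # No action requested: treat the reply as the final answer.
            return reply[len("Answer: "):] if reply.startswith("Answer: ") else reply
        action, arg = m.groups()
        observation = ACTIONS[action](arg.strip())
        prompt += f"\n{reply}\nObservation: {observation}"
    return "max turns exceeded"

print(react("What is 2 + 2?"))  # → 4
```

The real work would be prompt engineering so the model reliably emits the `Action:` format, plus a whitelist of safe actions.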
Hello,
I've installed Serge on a MacBook Pro with 16GB Memory. I'm trying to use the 13B Model, but get the following error for any message I send:
A server error occurred. See below: main: seed = 1679852543 llama_model_load: loading model from '/usr/src/app/weights/ggml-alpaca-13B-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size = 800.00 MB, n_mem = 20480
llama_model_load: loading model part 1/1 from '/usr/src/app/weights/ggml-alpaca-13B-q4_0.bin'
llama_model_load: ............................................. done
llama_model_load: model size = 7759.39 MB / num tensors = 363
system_info: n_threads = 4 / 5 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
(the message is interrupted and doesn't display anything else).
I've tried running the 7B model and it works fine.
Any idea what could be the issue here? Not enough Memory causing a crash perhaps?
Thank you
Since we're mounting node_modules, it works if the host has Vite installed. But in a more isolated scenario, where node_modules live in a volume or are just part of the image, Vite should already be installed in the image.
Context: I don't use npm at all (I don't have it installed), so running the container didn't work (Vite not found). It worked after I logged into the container and ran npm install vite.
Hi,
So I got this error.
I tried converting the model (7B) using the provided convert.py, but it just doesn't do anything: no error message, no other output, no converted file.
Am I missing something obvious?
Thanks!