mintplex-labs / anything-llm
12.5K stars · 107 watchers · 1.3K forks · 36.33 MB

The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.

Home Page: https://useanything.com

License: MIT License

Languages: Shell 0.08%, JavaScript 95.48%, HTML 0.25%, CSS 3.69%, Dockerfile 0.43%, HCL 0.07%
Topics: rag, lmstudio, localai, vector-database, ollama, local-llm, chromadb, desktop-app, llama3, llamacpp

anything-llm's People

Contributors

ajosegun, antoniociolino, efocht, eltociear, franzbischoff, frasergr, gabrielkoo, hakeemsyd, imjaglan, isayandev, itfranck, ivanskodje, jflaflamme, jwaltz, lunamidori5, m14t, melroy89, mplawner, preeteshjain, pritchey, santini-cyber, shadowalyxia, shatfield4, sherifbutt, skidvis, sykuang, timothyasp, timothycarambat, tlandenberger, zhanshuyou

anything-llm's Issues

The readme needs work so that installation succeeds by following the instructions.

This looks like a great project, and I am eager to get it going!

I tried installing this on Ubuntu running in WSL by following the instructions.

~/anything-llm$ yarn setup
00h00m00s 0/0: : ERROR: [Errno 2] No such file or directory: 'setup'

Then I tried the exact same workflow after cloning the repo on Mac, and it worked... Why?

The next hurdle was the fact that nothing in the readme mentions you need to install Docker Desktop on Mac. This may be self-explanatory to many, but I just followed the instructions and kept getting an error that the Docker daemon was nowhere to be found. Everything else was installed from the command line, so it is not as obvious as you would think that Docker Desktop has to be installed first, and it is not mentioned anywhere.

Now all seems to be working after figuring that out, except that I get an error that localhost could not send the chat. That is tracked in another issue.

Mobile Styles

AnythingLLM does not work on smaller screens or mobile at all. Will need to fix that for those who deploy hosted versions and want to use it on the go.

Misleading error message when creating a workspace

Problem:

Creating a workspace gave me this error:
[screenshot of the error]

This is misleading; the logs point to a problem with SQLite.

anything-llm    | SQLITE_CANTOPEN: unable to open database file [Error: SQLITE_CANTOPEN: unable to open database file] {
anything-llm    |   errno: 14,
anything-llm    |   code: 'SQLITE_CANTOPEN'
anything-llm    | }
anything-llm    | fetch failed TypeError: fetch failed
anything-llm    |     at Object.fetch (node:internal/deps/undici/undici:11457:11)
anything-llm    |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
anything-llm    |     at async ChromaClient.heartbeat (/app/server/node_modules/chromadb/dist/main/ChromaClient.js:66:26)
anything-llm    |     at async Object.connect (/app/server/utils/vectorDbProviders/chroma/index.js:22:21)
anything-llm    |     at async Object.totalIndicies (/app/server/utils/vectorDbProviders/chroma/index.js:34:24)
anything-llm    |     at async /app/server/endpoints/system.js:84:27 {
anything-llm    |   cause: Error: connect ECONNREFUSED 127.0.0.1:8000
anything-llm    |       at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16) {
anything-llm    |     errno: -111,
anything-llm    |     code: 'ECONNREFUSED',
anything-llm    |     syscall: 'connect',
anything-llm    |     address: '127.0.0.1',
anything-llm    |     port: 8000
anything-llm    |   }
anything-llm    | }

The TypeError went away after switching to Pinecone, but the SQLITE_CANTOPEN error persists.

Solution:

Improve error handling in the UI so that users no longer get misleading errors, and make the SQLITE_CANTOPEN error message more descriptive (which file is the problem?).
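A minimal sketch of a more descriptive open path (not the project's actual code; the storage directory and filename are assumptions):

const path = require("path");
const sqlite3 = require("sqlite3");

function openDatabase(storageDir) {
  const dbPath = path.resolve(storageDir, "anythingllm.db"); // hypothetical filename
  return new Promise((resolve, reject) => {
    const db = new sqlite3.Database(dbPath, (err) => {
      if (err) {
        // SQLITE_CANTOPEN usually means the file or its parent directory is
        // missing or not writable by the container user; say which file it was.
        reject(new Error(`${err.code}: unable to open database file at ${dbPath}`));
      } else {
        resolve(db);
      }
    });
  });
}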

requirements.txt issue on Ubuntu

Hey @timothycarambat, heard about this project yesterday and love the concept!

I am running Ubuntu on WSL2 and am able to get the frontend and server up, but when I try to run the install in the /collector folder, I get the following:

"error: PyObjC requires macOS to build"

I'm wondering whether everything in that requirements.txt is needed for this to run on Linux. If so, I'm out of luck.

Define temperature as environment variable

Would it be a reasonable feature to allow temperature to be a user-defined environment variable instead of hardcoding it as 0.7? I ask because, when using the app as an internal knowledgebase, I think a lower value or even a temperature of 0 would be beneficial.

I'd be happy to create a PR for this.
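A minimal sketch of the change (OPEN_AI_TEMPERATURE is a hypothetical variable name; openai is the v3 OpenAIApi client the server already uses, per the OpenAI/NodeJS/3.2.1 User-Agent in the 400-error issue below):

async function chatCompletion(openai, messages) {
  // Fall back to the currently hardcoded 0.7 when the variable is unset.
  const temperature = Number(process.env.OPEN_AI_TEMPERATURE ?? 0.7);
  const { data } = await openai.createChatCompletion({
    model: process.env.OPEN_MODEL_PREF || "gpt-3.5-turbo",
    temperature, // 0 gives the most deterministic answers for a knowledgebase
    messages,
  });
  return data.choices[0].message.content;
}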

Error "Could not send chat". (Mac 16 Intel)

Hi! Before writing this, I looked at all the errors that may be related, completely reinstalled the repository twice, and tried the suggestions from various threads, but the error remains. No API field is red or anything like that; I wrote everything needed in the .env files, but the error persists.
Mac, Intel Core i5, 16 GB, macOS Ventura 13.4

Coding Support

First off, this project looks extraordinarily useful!

If a user wanted help debugging code - or prototyping apps - what would be the best way to handle this? Push to a GitHub repo for the LLM to index, or perhaps flatten code into a PDF?

Keep up the good work!

Selecting multiple docs to embed force-restarts the server, embedding only one doc

On macOS 13.4 (M1), I have encountered an issue where selecting multiple longer documents crashes the server, which then auto-restarts. This occurs when using Pinecone without any "advanced" config on top. There is no error message, just a notice that the server has restarted. No more info available right now.

Other languages error

When trying to add links with non-English letters (like the lovely ø or æ, for instance), the log gives this error:
addDocumentToNamespace PineconeClient: Error calling upsert: PineconeError: not supported value type Failed to vectorize website

English articles go through as they should.
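Not a confirmed root cause, but Pinecone metadata values must be strings, numbers, booleans, or lists of strings, so a hedged sketch of a pre-upsert filter (sanitizeMetadata is a hypothetical helper, not project code) would be:

// Drop metadata values Pinecone cannot store; a null/undefined field from a
// partially parsed non-English page would otherwise fail the whole upsert.
function sanitizeMetadata(metadata = {}) {
  const clean = {};
  for (const [key, value] of Object.entries(metadata)) {
    const isScalar = ["string", "number", "boolean"].includes(typeof value);
    const isStringList =
      Array.isArray(value) && value.every((v) => typeof v === "string");
    if (isScalar || isStringList) clean[key] = value;
  }
  return clean;
}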

ImportError: cannot import name 'Token' from 'prompt_toolkit.token'

After getting yarn setup to run successfully (after a lot of troubleshooting: installing missing dependencies and other things), here is the error I got when proceeding to run python main.py from the collector folder:

(my-venv) PS C:\Users\rahim\OneDrive\Desktop\TEST\anything-llm\collector> python main.py
Traceback (most recent call last):
  File "C:\Users\rahim\OneDrive\Desktop\TEST\anything-llm\collector\main.py", line 2, in <module>
    from whaaaaat import prompt, Separator
  File "C:\Users\rahim\OneDrive\Desktop\TEST\anything-llm\my-venv\lib\site-packages\whaaaaat\__init__.py", line 6, in <module>
    from prompt_toolkit.token import Token
ImportError: cannot import name 'Token' from 'prompt_toolkit.token' (C:\Users\rahim\OneDrive\Desktop\TEST\anything-llm\my-venv\lib\site-packages\prompt_toolkit\token.py)

When I opened up the token.py script mentioned in the error, this is what it looks like for me:

"""
"""

from __future__ import annotations

__all__ = [
    "ZeroWidthEscape",
]

ZeroWidthEscape = "[ZeroWidthEscape]"

I'm stumped at this stage and can't troubleshoot any further on my own. I'm running Windows 10. Please help; any ideas to resolve this?

API keys won't go green.

My .env.development looks like this:

SERVER_PORT=3001
OPEN_AI_KEY=sk-.....
OPEN_MODEL_PREF='gpt-3.5-turbo'
CACHE_VECTORS="true"

# Enable all below if you are using vector database: Chroma.
# VECTOR_DB="chroma"
# CHROMA_ENDPOINT='http://localhost:8000'

# Enable all below if you are using vector database: Pinecone.
VECTOR_DB="pinecone"
PINECONE_ENVIRONMENT=us-wes...
PINECONE_API_KEY=b5......
PINECONE_INDEX=default

# Enable all below if you are using vector database: LanceDB.
# VECTOR_DB="lancedb"

# CLOUD DEPLOYMENT VARIABLES ONLY
# AUTH_TOKEN="hunter2" # This is the password to your application if remote hosting.
# JWT_SECRET="my-random-string-for-seeding" # Only needed if AUTH_TOKEN is set. Please generate random string at least 12 chars long.
# STORAGE_DIR= # absolute filesystem path with no trailing slash

with all keys correct and cross-checked. Am I doing something obviously wrong?

When I send a chat message, I get the response: Could not send chat.
I check the API keys and they are RED, with the warning "Ensure all fields are green before attempting to use AnythingLLM or it may not function as expected!" shown in red.
I've tried everything I could think of. Could anyone help?

Chat errors out

Installed successfully on Windows Docker Desktop.
Chroma Local Install is being used as Vector DB.
Docker build was successful.

Workspace created successfully.
Document loaded successfully.

However, when I try to chat, a message pops up as below:

localhost:3001 says
Could not send chat

No other error or notification appears, and I get no response in chat.

Kindly help.

Solution for the Docker error "no such file or directory" (Mac)

Hi! When executing the command docker-compose up -d --build, this error occurred:
"zsh: no such file or directory: /Applications/Docker.app/Contents/Resources/bin/docker-composesource"

Calling the binary by its full path fixed it for me:
/Applications/Docker.app/Contents/Resources/bin/docker-compose up -d --build

I hope this helps anyone with a similar problem.

docker build failing on macOS (Apple M1, macOS Ventura)

[+] Building 87.7s (6/21)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.47kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 299B 0.0s
=> [internal] load metadata for docker.io/library/ubuntu:jammy-20230522 3.4s
=> [base 1/7] FROM docker.io/library/ubuntu:jammy-20230522@sha256:ac58ff7fe25edc58bdf0067ca99df00014dbd032e2246d30a722fa348fd799a5 3.2s
=> => resolve docker.io/library/ubuntu:jammy-20230522@sha256:ac58ff7fe25edc58bdf0067ca99df00014dbd032e2246d30a722fa348fd799a5 0.0s
=> => sha256:77bdd217935d10f0e753ed84118e9b11d3ab0a66a82bdf322087354ccd833733 424B / 424B 0.0s
=> => sha256:2767693332e5523a2734b82f57d1a91510c92237912a96fec46352785e120b3f 2.32kB / 2.32kB 0.0s
=> => sha256:952b15bbc7fb957dead5972b258558130aeda588416c0a7a861e916fc08b36d7 27.35MB / 27.35MB 2.4s
=> => sha256:ac58ff7fe25edc58bdf0067ca99df00014dbd032e2246d30a722fa348fd799a5 1.13kB / 1.13kB 0.0s
=> => extracting sha256:952b15bbc7fb957dead5972b258558130aeda588416c0a7a861e916fc08b36d7 0.6s
=> [internal] load build context 0.0s
=> => transferring context: 487.47kB 0.0s
=> ERROR [base 2/7] RUN DEBIAN_FRONTEND=noninteractive apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-r 80.9s
...

#5 76.47 Selecting previously unselected package yarn.
#5 76.49 (Reading database ... 30669 files and directories currently installed.)
#5 76.49 Preparing to unpack yarn_1.22.19_all.deb ...
#5 76.49 Unpacking yarn (1.22.19-1) ...
#5 76.59 Setting up yarn (1.22.19-1) ...
#5 76.61 % Total % Received % Xferd Average Speed Time Time Time Current
#5 76.61 Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 26.9M 100 26.9M 0 0 7256k 0 0:00:03 0:00:03 --:--:-- 12.7M
#5 80.53 dpkg: error processing archive pandoc-3.1.3-1-amd64.deb (--install):
#5 80.53 package architecture (amd64) does not match system (arm64)
#5 80.54 Errors were encountered while processing:
#5 80.54 pandoc-3.1.3-1-amd64.deb

failed to solve: rpc error: code = Unknown desc = executor failed running [/bin/sh -c DEBIAN_FRONTEND=noninteractive apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends curl libgfortran5 python3 python3-pip tzdata netcat libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils && curl -fsSL https://deb.nodesource.com/setup_18.x | bash - && apt-get install -yq --no-install-recommends nodejs && curl -LO https://github.com/yarnpkg/yarn/releases/download/v1.22.19/yarn_1.22.19_all.deb && dpkg -i yarn_1.22.19_all.deb && rm yarn_1.22.19_all.deb && curl -LO https://github.com/jgm/pandoc/releases/download/3.1.3/pandoc-3.1.3-1-amd64.deb && dpkg -i pandoc-3.1.3-1-amd64.deb && rm pandoc-3.1.3-1-amd64.deb && rm -rf /var/lib/apt/lists/* /usr/share/icons && dpkg-reconfigure -f noninteractive tzdata && python3 -m pip install --no-cache-dir virtualenv]: exit code: 1

OpenAI 400 Error on long(ish) chat history

I'm getting this error after two dozen or so queries (trimmed for brevity and privacy):

Error: Request failed with status code 400
    at createError (/home/anything-llm/server/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/home/anything-llm/server/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/home/anything-llm/server/node_modules/axios/lib/adapters/http.js:322:11)
    at IncomingMessage.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1359:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Request failed with status code 400 Error: Request failed with status code 400
    at createError (/home/anything-llm/server/node_modules/axios/lib/core/createError.js:16:15)
    at settle (/home/anything-llm/server/node_modules/axios/lib/core/settle.js:17:12)
    at IncomingMessage.handleStreamEnd (/home/anything-llm/server/node_modules/axios/lib/adapters/http.js:322:11)
    at IncomingMessage.emit (node:events:525:35)
    at endReadableNT (node:internal/streams/readable:1359:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  config: {
    transitional: {
      silentJSONParsing: true,
      forcedJSONParsing: true,
      clarifyTimeoutError: false
    },
    adapter: [Function: httpAdapter],
    transformRequest: [ [Function: transformRequest] ],
    transformResponse: [ [Function: transformResponse] ],
    timeout: 0,
    xsrfCookieName: 'XSRF-TOKEN',
    xsrfHeaderName: 'X-XSRF-TOKEN',
    maxContentLength: -1,
    maxBodyLength: -1,
    validateStatus: [Function: validateStatus],
    headers: {
      Accept: 'application/json, text/plain, */*',
      'Content-Type': 'application/json',
      'User-Agent': 'OpenAI/NodeJS/3.2.1',
      Authorization: 'Bearer sk-XXX',
      'Content-Length': 20889
    },
    method: 'post',
    data: `{"model":"gpt-3.5-turbo","temperature":0.7,"n":1,"messages":[{"role":"system","content":""},{"role":"user","content":"hello"},{"role":"assistant","content":"Hello! How can I assist you today?"},{"role":"user","content":"how do i add documents to my workspace?"},{"role":"user","content":"how`... 10881 more characters,
    url: 'https://api.openai.com/v1/chat/completions'
  },
  request: <ref *1> ClientRequest {
    _events: [Object: null prototype] {
      abort: [Function (anonymous)],
      aborted: [Function (anonymous)],
      connect: [Function (anonymous)],
      error: [Function (anonymous)],
      socket: [Function (anonymous)],
      timeout: [Function (anonymous)],
      finish: [Function: requestOnFinish]
    },
    _eventsCount: 7,
    _maxListeners: undefined,
    outputData: [],
    outputSize: 0,
    writable: true,
    destroyed: false,
    _last: true,
    chunkedEncoding: false,
    shouldKeepAlive: false,
    maxRequestsOnConnectionReached: false,
    _defaultKeepAlive: true,
    useChunkedEncodingByDefault: true,
    sendDate: false,
    _removedConnection: false,
    _removedContLen: false,
    _removedTE: false,
    strictContentLength: false,
    _contentLength: 20889,
    _hasBody: true,
    _trailer: '',
    finished: true,
    _headerSent: true,
    _closed: false,
    socket: TLSSocket {
      _tlsOptions: [Object],
      _secureEstablished: true,
      _securePending: false,
      _newSessionPending: false,
      _controlReleased: true,
      secureConnecting: false,
      _SNICallback: null,
      servername: 'api.openai.com',
      alpnProtocol: false,
      authorized: true,
      authorizationError: null,
      encrypted: true,
      _events: [Object: null prototype],
      _eventsCount: 10,
      connecting: false,
      _hadError: false,
      _parent: null,
      _host: 'api.openai.com',
      _closeAfterHandlingError: false,
      _readableState: [ReadableState],
      _maxListeners: undefined,
      _writableState: [WritableState],
      allowHalfOpen: false,
      _sockname: null,
      _pendingData: null,
      _pendingEncoding: '',
      server: undefined,
      _server: null,
      ssl: [TLSWrap],
      _requestCert: true,
      _rejectUnauthorized: true,
      parser: null,
      _httpMessage: [Circular *1],
      [Symbol(res)]: [TLSWrap],
      [Symbol(verified)]: true,
      [Symbol(pendingSession)]: null,
      [Symbol(async_id_symbol)]: 7105,
      [Symbol(kHandle)]: [TLSWrap],
      [Symbol(lastWriteQueueSize)]: 0,
      [Symbol(timeout)]: null,
      [Symbol(kBuffer)]: null,
      [Symbol(kBufferCb)]: null,
      [Symbol(kBufferGen)]: null,
      [Symbol(kCapture)]: false,
      [Symbol(kSetNoDelay)]: false,
      [Symbol(kSetKeepAlive)]: true,
      [Symbol(kSetKeepAliveInitialDelay)]: 60,
      [Symbol(kBytesRead)]: 0,
      [Symbol(kBytesWritten)]: 0,
      [Symbol(connect-options)]: [Object]
    },
    _header: 'POST /v1/chat/completions HTTP/1.1\r\n' +
      'Accept: application/json, text/plain, */*\r\n' +
      'Content-Type: application/json\r\n' +
      'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
      'Authorization: Bearer sk-XXX\r\n' +
      'Content-Length: 20889\r\n' +
      'Host: api.openai.com\r\n' +
      'Connection: close\r\n' +
      '\r\n',
    _keepAliveTimeout: 0,
    _onPendingData: [Function: nop],
    agent: Agent {
      _events: [Object: null prototype],
      _eventsCount: 2,
      _maxListeners: undefined,
      defaultPort: 443,
      protocol: 'https:',
      options: [Object: null prototype],
      requests: [Object: null prototype] {},
      sockets: [Object: null prototype],
      freeSockets: [Object: null prototype] {},
      keepAliveMsecs: 1000,
      keepAlive: false,
      maxSockets: Infinity,
      maxFreeSockets: 256,
      scheduling: 'lifo',
      maxTotalSockets: Infinity,
      totalSocketCount: 1,
      maxCachedSessions: 100,
      _sessionCache: [Object],
      [Symbol(kCapture)]: false
    },
    socketPath: undefined,
    method: 'POST',
    maxHeaderSize: undefined,
    insecureHTTPParser: undefined,
    joinDuplicateHeaders: undefined,
    path: '/v1/chat/completions',
    _ended: true,
    res: IncomingMessage {
      _readableState: [ReadableState],
      _events: [Object: null prototype],
      _eventsCount: 4,
      _maxListeners: undefined,
      socket: [TLSSocket],
      httpVersionMajor: 1,
      httpVersionMinor: 1,
      httpVersion: '1.1',
      complete: true,
      rawHeaders: [Array],
      rawTrailers: [],
      joinDuplicateHeaders: undefined,
      aborted: false,
      upgrade: false,
      url: '',
      method: null,
      statusCode: 400,
      statusMessage: 'Bad Request',
      client: [TLSSocket],
      _consuming: false,
      _dumped: false,
      req: [Circular *1],
      responseUrl: 'https://api.openai.com/v1/chat/completions',
      redirects: [],
      [Symbol(kCapture)]: false,
      [Symbol(kHeaders)]: [Object],
      [Symbol(kHeadersCount)]: 40,
      [Symbol(kTrailers)]: null,
      [Symbol(kTrailersCount)]: 0
    },
    aborted: false,
    timeoutCb: null,
    upgradeOrConnect: false,
    parser: null,
    maxHeadersCount: null,
    reusedSocket: false,
    host: 'api.openai.com',
    protocol: 'https:',
    _redirectable: Writable {
      _writableState: [WritableState],
      _events: [Object: null prototype],
      _eventsCount: 3,
      _maxListeners: undefined,
      _options: [Object],
      _ended: true,
      _ending: true,
      _redirectCount: 0,
      _redirects: [],
      _requestBodyLength: 20889,
      _requestBodyBuffers: [],
      _onNativeResponse: [Function (anonymous)],
      _currentRequest: [Circular *1],
      _currentUrl: 'https://api.openai.com/v1/chat/completions',
      [Symbol(kCapture)]: false
    },
    [Symbol(kCapture)]: false,
    [Symbol(kBytesWritten)]: 0,
    [Symbol(kEndCalled)]: true,
    [Symbol(kNeedDrain)]: false,
    [Symbol(corked)]: 0,
    [Symbol(kOutHeaders)]: [Object: null prototype] {
      accept: [Array],
      'content-type': [Array],
      'user-agent': [Array],
      authorization: [Array],
      'content-length': [Array],
      host: [Array]
    },
    [Symbol(errored)]: null,
    [Symbol(kUniqueHeaders)]: null
  },
  response: {
    status: 400,
    statusText: 'Bad Request',
    headers: {
      date: 'Sat, 17 Jun 2023 18:59:52 GMT',
      'content-type': 'application/json',
      'content-length': '281',
      connection: 'close',
      'access-control-allow-origin': '*',
      'openai-organization': 'the-sentry',
      'openai-processing-ms': '310',
      'openai-version': '2020-10-01',
      'strict-transport-security': 'max-age=15724800; includeSubDomains',
      'x-ratelimit-limit-requests': '3500',
      'x-ratelimit-limit-tokens': '90000',
      'x-ratelimit-remaining-requests': '3499',
      'x-ratelimit-remaining-tokens': '85903',
      'x-ratelimit-reset-requests': '17ms',
      'x-ratelimit-reset-tokens': '2.73s',
      'x-request-id': '20f2dbbb737e663d3a0829b29a84f5da',
      'cf-cache-status': 'DYNAMIC',
      server: 'cloudflare',
      'cf-ray': '7d8d883afccb3ea6-CPT',
      'alt-svc': 'h3=":443"; ma=86400'
    },
    config: {
      transitional: [Object],
      adapter: [Function: httpAdapter],
      transformRequest: [Array],
      transformResponse: [Array],
      timeout: 0,
      xsrfCookieName: 'XSRF-TOKEN',
      xsrfHeaderName: 'X-XSRF-TOKEN',
      maxContentLength: -1,
      maxBodyLength: -1,
      validateStatus: [Function: validateStatus],
      headers: [Object],
      method: 'post',
      data: `{"model":"gpt-3.5-turbo","temperature":0.7,"n":1,"messages":[{"role":"system","content":""},{"role":"user","content":"how`... 10881 more characters,
      url: 'https://api.openai.com/v1/chat/completions'
    },
    request: <ref *1> ClientRequest {
      _events: [Object: null prototype],
      _eventsCount: 7,
      _maxListeners: undefined,
      outputData: [],
      outputSize: 0,
      writable: true,
      destroyed: false,
      _last: true,
      chunkedEncoding: false,
      shouldKeepAlive: false,
      maxRequestsOnConnectionReached: false,
      _defaultKeepAlive: true,
      useChunkedEncodingByDefault: true,
      sendDate: false,
      _removedConnection: false,
      _removedContLen: false,
      _removedTE: false,
      strictContentLength: false,
      _contentLength: 20889,
      _hasBody: true,
      _trailer: '',
      finished: true,
      _headerSent: true,
      _closed: false,
      socket: [TLSSocket],
      _header: 'POST /v1/chat/completions HTTP/1.1\r\n' +
        'Accept: application/json, text/plain, */*\r\n' +
        'Content-Type: application/json\r\n' +
        'User-Agent: OpenAI/NodeJS/3.2.1\r\n' +
        'Authorization: Bearer sk-XXX\r\n' +
        'Content-Length: 20889\r\n' +
        'Host: api.openai.com\r\n' +
        'Connection: close\r\n' +
        '\r\n',
      _keepAliveTimeout: 0,
      _onPendingData: [Function: nop],
      agent: [Agent],
      socketPath: undefined,
      method: 'POST',
      maxHeaderSize: undefined,
      insecureHTTPParser: undefined,
      joinDuplicateHeaders: undefined,
      path: '/v1/chat/completions',
      _ended: true,
      res: [IncomingMessage],
      aborted: false,
      timeoutCb: null,
      upgradeOrConnect: false,
      parser: null,
      maxHeadersCount: null,
      reusedSocket: false,
      host: 'api.openai.com',
      protocol: 'https:',
      _redirectable: [Writable],
      [Symbol(kCapture)]: false,
      [Symbol(kBytesWritten)]: 0,
      [Symbol(kEndCalled)]: true,
      [Symbol(kNeedDrain)]: false,
      [Symbol(corked)]: 0,
      [Symbol(kOutHeaders)]: [Object: null prototype],
      [Symbol(errored)]: null,
      [Symbol(kUniqueHeaders)]: null
    },
    data: { error: [Object] }
  },
  isAxiosError: true,
  toJSON: [Function: toJSON]
}

The workspace history seems pretty long:

SELECT SUM(CAST(json_extract(metadata, '$.token_count_estimate') AS INTEGER)) as total_token_count
FROM workspace_documents where workspaceId = 1;
1177372

According to the error, the actual payload is probably in the 15,000-character range. In any case, it works again once I delete the chats from the workspace.
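For what it's worth, this looks like the accumulated chat history exceeding the model's context window, so a hedged sketch of the likely fix (trimHistory is a hypothetical helper; ~4 characters per token is a crude estimate):

// Keep only as many recent messages as fit a rough token budget,
// walking backwards so the newest turns survive.
function trimHistory(messages, maxTokens = 3000) {
  let budget = maxTokens * 4; // approximate characters
  const kept = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    budget -= messages[i].content.length;
    if (budget < 0) break;
    kept.unshift(messages[i]);
  }
  return kept;
}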

Error with adding documents on newly initialized project

The server and frontend are set up and working, but when trying to add documents to a workspace (clicking the + folder icon), I get this error:

Uncaught TypeError: Cannot read properties of undefined (reading 'items') at toggleSelection

Which is from this line:

const folderItems = directories.items.find((item) => item.name === parent).items;

And I'm not sure how to solve it
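A minimal defensive sketch, assuming directories (or the matching parent folder) can legitimately be undefined on a freshly initialized project with no documents yet:

// Optional chaining returns an empty list instead of throwing.
const folderItems =
  directories?.items?.find((item) => item.name === parent)?.items ?? [];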

Debian glibc issue

Hi,

First of all, great tool! Makes it easy to use an existing knowledge base and then dynamically ask about it using natural language. Wanted to build something like this myself, but you just saved me some time. Thank you!

But after fixing the requirements file as per #5, I encountered an issue with libssl, which required manually building from source because Debian is not a rolling distro. That fixed the libssl issue, but then I hit a version mismatch of glibc, which is apparently required by nodejs.

uname -a: Linux hostname 5.10.0-22-amd64 #1 SMP Debian 5.10.178-3 (2023-04-22) x86_64 GNU/Linux

yarn run v1.22.19
$ cd server && yarn dev
$ NODE_ENV=development nodemon --ignore documents --ignore vector-cache --trace-warnings index.js
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node --trace-warnings index.js`
Example app listening on port 3001
/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /home/<username>/git/anything-llm/server/node_modules/vectordb/x86_64-unknown-linux-gnu.node) Error: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.35' not found (required by /home/<username>/git/anything-llm/server/node_modules/vectordb/x86_64-unknown-linux-gnu.node)
    at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at getPlatformLibrary (/home/<username>/git/anything-llm/server/node_modules/vectordb/native.js:23:16)
    at Object.<anonymous> (/home/<username>/git/anything-llm/server/node_modules/vectordb/native.js:33:21)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32) {
  code: 'ERR_DLOPEN_FAILED'
}

This error is raised when the frontend is launched and opened in a browser. I haven't been able to fix the glibc issue, as opposed to libssl; I am afraid that doing so would mess up my Debian install.

Thanks

macOS: yarn dev:frontend issues with vite.config.js, failed to load config

I have NPM and Yarn installed. yarn dev:server runs without issue, but yarn dev:frontend results in:

❯ yarn dev:frontend
yarn run v1.22.19
$ cd frontend && yarn start
warning package.json: No license field
$ vite --open
failed to load config from /Users//Developer/anything-llm/frontend/vite.config.js
error when starting dev server:
Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'vite' imported from /Users//Developer/anything-llm/frontend/vite.config.js.timestamp-1686430326577-c946bfcc5297d.mjs
at new NodeError (node:internal/errors:405:5)
at packageResolve (node:internal/modules/esm/resolve:781:9)
at moduleResolve (node:internal/modules/esm/resolve:830:20)
at defaultResolve (node:internal/modules/esm/resolve:1035:11)
at DefaultModuleLoader.resolve (node:internal/modules/esm/loader:269:12)
at DefaultModuleLoader.getModuleJob (node:internal/modules/esm/loader:153:32)
at ModuleWrap. (node:internal/modules/esm/module_job:76:33)
at link (node:internal/modules/esm/module_job:75:36)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

I have tried removing the directory and re-cloning, as well as everything I could find by "googling" around. Any ideas?

Port 5000 is in use by default in macOS Monterey

In macOS Monterey, port 5000 is used by default (AirPlay Receiver). This causes the dev server not to launch and the frontend to receive 403 errors (from the Monterey AirPlay service).

My suggestion would be to set the default/fallback port of the server process to 3001 instead of 5000.
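A minimal sketch of the suggested change in the Express server (exact variable names in index.js are assumptions):

const express = require("express");
const app = express();

// Fall back to 3001 instead of 5000 so macOS Monterey's AirPlay Receiver can
// keep port 5000; SERVER_PORT from .env still wins when set.
const PORT = process.env.SERVER_PORT || 3001;
app.listen(PORT, () => console.log(`Example app listening on port ${PORT}`));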

Text input field character limit

The input box currently has a character limit of 240. Why? I realize this might be related to potential difficulties when running the vector search, though. Would swapping to ST instead of OpenAI potentially help, if the limit is related to the search?

element-ui.js | Uncaught TypeError: Cannot read properties of undefined (reading 'prototype')

Seems to be running a bit smoother now. Nothing is green in settings... Not sure how to switch to convo mode without unattaching docs. No console errors. Is this prototype error actually worth sharing?
element-ui.js:1 Uncaught TypeError: Cannot read properties of undefined (reading 'prototype') at Module.<anonymous> (element-ui.js:1:127637) at n (element-ui.js:1:406) at Object.<anonymous> (element-ui.js:1:93343) at n (element-ui.js:1:406) at element-ui.js:1:1211 at element-ui.js:1:1221 at element-ui.js:1:234 at element-ui.js:1:243

Watch.py YouTube timeout

It works for Substack and single-page URLs, but when I try YouTube, after adding a Cloud API key I see:
python3 main.py
? What kind of data would you like to add to convert into long-term memory? YouTube Channel
Paste in the URL of a YouTube channel: https://www.youtube.com/@eyeonai3425
Need to map username to channelId - this can take a while sometimes.
Traceback (most recent call last):
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 403, in _make_request
    self._validate_conn(conn)
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1053, in _validate_conn
    conn.connect()
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
socket.timeout: _ssl.c:1106: The handshake operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 406, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/home/sepad/.local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 357, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='yt.lemnoslife.com', port=443): Read timed out. (read timeout=20)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sepad/Documents/Apps/AnythingLLM/anything-llm-master/collector/main.py", line 81, in <module>
    main()
  File "/home/sepad/Documents/Apps/AnythingLLM/anything-llm-master/collector/main.py", line 65, in main
    youtube()
  File "/home/sepad/Documents/Apps/AnythingLLM/anything-llm-master/collector/scripts/youtube.py", line 13, in youtube
    channel_id = get_channel_id(channel_link)
  File "/home/sepad/Documents/Apps/AnythingLLM/anything-llm-master/collector/scripts/yt_utils.py", line 18, in get_channel_id
    response = requests.get(f"https://yt.lemnoslife.com/channels?handle={handle}", timeout=20)
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/sepad/.local/lib/python3.9/site-packages/requests/adapters.py", line 532, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='yt.lemnoslife.com', port=443): Read timed out. (read timeout=20)
Do I need to increase the timeout delay, or is this because I'm on Linux?

Thanks!
Kaboom.ski

Unable to run collector; not able to install whaaaaat (macOS 12.2.1)

Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "/private/var/folders/f2/k2pm6hv14vdb975wm5msdk540000gn/T/pip-install-8mictb59/whaaaaat_da2d91bcef8244be854586bbc3776280/setup.py", line 10, in <module>
        long_description = pypandoc.convert('README.md', format='md', to='rst')
    AttributeError: module 'pypandoc' has no attribute 'convert'
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

[lancedb] SQLITE_BUSY when embedding many documents

When embedding a larger number of docs (more than 100 docs, over 100k tokens total), the DB seems to lock itself during the process (using LanceDB as vector db):

INSERT INTO document_vectors (docId, vectorId) VALUES ('f03e0ec4-1307-4928-b982-91a8a2e34563', '5a791ec8-f1be-4276-8928-5e129accc0d6')
{ result: 0 }
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: SQLITE_BUSY: database is locked
--> in Statement#run([
  '4f09f8e7-b1a8-4c0b-82e0-bbde80473612',
  '05-26-2023-db4e845d-9c45-4d80-9581-73c53844132c.json',
  'custom-documents/05-26-2023-db4e845d-9c45-4d80-9581-73c53844132c.json',
  2,
  '{"id":"db4e845d-9c45-4d80-9581-73c53844132c","url":"file:///home/<username>/git/anything-llm/collector/hotdir/processed/05-26-2023.md","title":"05-26-2023.md","description":"a custom file uploaded by the user.","published":"2023-06-12 09:26:29","wordCount":12758,"token_count_estimate":2849}'
], [Function (anonymous)])
    at /home/<username>/git/anything-llm/server/node_modules/sqlite/build/Statement.js:80:23
    at new Promise (<anonymous>)
    at Statement.run (/home/<username>/git/anything-llm/server/node_modules/sqlite/build/Statement.js:78:16)
    at Object.addDocuments (/home/<username>/git/anything-llm/server/models/documents.js:89:12)
    at async /home/<username>/git/anything-llm/server/endpoints/workspaces.js:35:7 {
  errno: 5,
  code: 'SQLITE_BUSY',
  __augmented: true
}

Node.js v18.16.0
[nodemon] app crashed - waiting for file changes before starting...

I cannot tell whether this is caused by the larger number of docs or by the token count (as in, whether the issue would occur even with few docs but a high token count).
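A hedged sketch of one mitigation, assuming the promise-based sqlite wrapper shown in the stack trace (insertVectors is a hypothetical helper): run the document_vectors INSERTs sequentially inside a single transaction rather than firing them concurrently, which is what usually triggers SQLITE_BUSY.

async function insertVectors(db, pairs) {
  await db.run("BEGIN TRANSACTION");
  try {
    for (const { docId, vectorId } of pairs) {
      // Sequential awaits keep a single writer on the connection.
      await db.run(
        "INSERT INTO document_vectors (docId, vectorId) VALUES (?, ?)",
        [docId, vectorId]
      );
    }
    await db.run("COMMIT");
  } catch (err) {
    await db.run("ROLLBACK");
    throw err;
  }
}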

Refer Directly to Docs

This is a SOLID project.

Would it be possible to leverage a specific prompt that would refer to a document(s) directly?

Say I need to pull A from page X and B from page Y; being able to point directly to where the information can be found would be helpful.

Lengthy response and codeblock appearance

Lengthy ChatGPT responses appear on top of previous responses.
Example:
[screenshot]

A ChatGPT response with a markdown code block shows three backticks (```), but no rendered code block.

Example response for code:
[screenshot]

Default prompting kills off creativity/hypothetical scenarios

Hey there,

Based on my several experiments using both gpt-3.5-turbo-16k and gpt-4, I have observed that the responses returned by the API are incredibly rigid; both models continuously refuse to be creative. I do writing as a side hustle and wanted to use semantic search as a tool for finding connections in a vast universe I had already written, but here the models either respond when they "find a match" or simply say that they don't know. That makes it impossible to ask about hypothetical scenarios or developmental queries (e.g. "expand character XYZ based on the available details") grounded in the context given by your documents. I am pretty sure prompting is the issue here, so it would likely help if there were a user-friendly way to manipulate the prompt. I might take a look at it later, though I don't have much experience with web dev.

Thanks!

How can I get started (development environment)

I'm sorry for the question, but how can I start the server if I'm not deploying through Docker? The docs describe how to upload documents and what to do next, but I did not understand how to get the app running.

upd: it was not obvious to me that the installation continues at the links below.

upd 2: but there is also no way to start the server in development mode.

RequiredError: Required parameter collectionId was null or undefined when calling count.

This happens when I try to add a new document from the frontend. The Chroma version is the latest commit from their GitHub repo.

chromadb        | Rebuilding hnsw to ensure architecture compatibility
anything-llm    | Example app listening on port 3001
clickhouse_1    | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1    | Merging configuration file '/etc/clickhouse-server/config.d/backup_disk.xml'.
clickhouse_1    | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1    | Logging trace to /var/log/clickhouse-server/clickhouse-server.log
clickhouse_1    | Logging errors to /var/log/clickhouse-server/clickhouse-server.err.log
clickhouse_1    | Processing configuration file '/etc/clickhouse-server/config.xml'.
clickhouse_1    | Merging configuration file '/etc/clickhouse-server/config.d/backup_disk.xml'.
clickhouse_1    | Merging configuration file '/etc/clickhouse-server/config.d/docker_related_config.xml'.
clickhouse_1    | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.xml'.
clickhouse_1    | Processing configuration file '/etc/clickhouse-server/users.xml'.
clickhouse_1    | Merging configuration file '/etc/clickhouse-server/users.d/chroma.xml'.
clickhouse_1    | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/users.xml'.
chromadb        | Collecting hnswlib
chromadb        |   Downloading hnswlib-0.7.0.tar.gz (33 kB)
chromadb        |   Installing build dependencies: started
chromadb        |   Installing build dependencies: finished with status 'done'
chromadb        |   Getting requirements to build wheel: started
chromadb        |   Getting requirements to build wheel: finished with status 'done'
chromadb        |   Preparing metadata (pyproject.toml): started
chromadb        |   Preparing metadata (pyproject.toml): finished with status 'done'
chromadb        | Collecting numpy
chromadb        |   Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
chromadb        |      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 2.1 MB/s eta 0:00:00
chromadb        | Building wheels for collected packages: hnswlib
chromadb        |   Building wheel for hnswlib (pyproject.toml): started
chromadb        |   Building wheel for hnswlib (pyproject.toml): finished with status 'done'
chromadb        |   Created wheel for hnswlib: filename=hnswlib-0.7.0-cp310-cp310-linux_x86_64.whl size=2148123 sha256=82c011330d7581e4637330c3d816a83b7bd3b2f6c931434218cc723001d57037
chromadb        |   Stored in directory: /tmp/pip-ephem-wheel-cache-794boqr0/wheels/8a/ae/ec/235a682e0041fbaeee389843670581ec6c66872db856dfa9a4
chromadb        | Successfully built hnswlib
chromadb        | Installing collected packages: numpy, hnswlib
chromadb        |   Attempting uninstall: numpy
chromadb        |     Found existing installation: numpy 1.24.3
chromadb        |     Uninstalling numpy-1.24.3:
chromadb        |       Successfully uninstalled numpy-1.24.3
chromadb        |   Attempting uninstall: hnswlib
chromadb        |     Found existing installation: hnswlib 0.7.0
chromadb        |     Uninstalling hnswlib-0.7.0:
chromadb        |       Successfully uninstalled hnswlib-0.7.0
chromadb        | Successfully installed hnswlib-0.7.0 numpy-1.24.3
chromadb        | WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
chromadb        | 
chromadb        | [notice] A new release of pip is available: 23.0.1 -> 23.1.2
chromadb        | [notice] To update, run: pip install --upgrade pip
chromadb        | 2023-06-14 10:35:19 INFO     chromadb.telemetry.posthog Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
chromadb        | 2023-06-14 10:35:19 INFO     chromadb        Running Chroma using direct local API.
chromadb        | 2023-06-14 10:35:19 INFO     chromadb        Using Clickhouse for database
chromadb        | 2023-06-14 10:35:20 INFO     uvicorn.error   Started server process [50]
chromadb        | 2023-06-14 10:35:20 INFO     uvicorn.error   Waiting for application startup.
chromadb        | 2023-06-14 10:35:20 INFO     uvicorn.error   Application startup complete.
chromadb        | 2023-06-14 10:35:20 INFO     uvicorn.error   Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
anything-llm    | SELECT * FROM workspaces WHERE slug = 'reports'
anything-llm    | SELECT * FROM workspace_documents WHERE workspaceId = 1 
anything-llm    | SELECT * FROM workspaces WHERE slug = 'reports'
anything-llm    | SELECT * FROM workspace_documents WHERE workspaceId = 1 
anything-llm    | Adding new vectorized document into namespace reports
anything-llm    | Chunks created from document: 1
chromadb        | 2023-06-14 10:49:58 INFO     uvicorn.access  172.24.0.3:47680 - "GET /api/v1/heartbeat HTTP/1.1" 200
chromadb        | 2023-06-14 10:49:58 INFO     chromadb.db.clickhouse collection with name reports already exists, returning existing collection
chromadb        | 2023-06-14 10:49:58 WARNING  chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
chromadb        | 2023-06-14 10:50:02 INFO     uvicorn.access  172.24.0.3:47686 - "POST /api/v1/collections HTTP/1.1" 200
anything-llm    | Inserting vectorized chunks into Chroma collection.
anything-llm    | addDocumentToNamespace Required parameter collectionId was null or undefined when calling add.
anything-llm    | Failed to vectorize custom-documents/october2019-final-pg1-f5df906b-935d-4a4e-8b14-c1245edac3be.json
anything-llm    | SELECT * FROM workspaces WHERE slug = 'reports'
anything-llm    | SELECT * FROM workspace_documents WHERE workspaceId = 1 
anything-llm    | SELECT * FROM workspaces  
chromadb        | 2023-06-14 10:50:03 INFO     uvicorn.access  172.24.0.3:47680 - "GET /api/v1/heartbeat HTTP/1.1" 200
anything-llm    | SELECT * FROM workspaces WHERE slug = 'reports'
anything-llm    | SELECT * FROM workspace_documents WHERE workspaceId = 1 
chromadb        | 2023-06-14 10:50:03 WARNING  chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
anything-llm    | SELECT * FROM workspaces WHERE slug = 'reports'
anything-llm    | SELECT * FROM workspace_documents WHERE workspaceId = 1 
anything-llm    | SELECT * FROM workspace_chats WHERE workspaceId = 1 AND include = true  ORDER BY id ASC
chromadb        | 2023-06-14 10:50:03 INFO     uvicorn.access  172.24.0.3:47686 - "GET /api/v1/collections HTTP/1.1" 200
chromadb        | 2023-06-14 10:50:03 WARNING  chromadb.api.models.Collection No embedding_function provided, using default embedding function: SentenceTransformerEmbeddingFunction
chromadb        | 2023-06-14 10:50:03 INFO     uvicorn.access  172.24.0.3:47680 - "GET /api/v1/collections/reports HTTP/1.1" 200
anything-llm    | Required parameter collectionId was null or undefined when calling count. RequiredError: Required parameter collectionId was null or undefined when calling count.
anything-llm    |     at Object.count (/app/server/node_modules/chromadb/dist/main/generated/api.js:150:23)
anything-llm    |     at Object.count (/app/server/node_modules/chromadb/dist/main/generated/api.js:730:91)
anything-llm    |     at ApiApi.count (/app/server/node_modules/chromadb/dist/main/generated/api.js:1188:58)
anything-llm    |     at Collection.count (/app/server/node_modules/chromadb/dist/main/Collection.js:164:41)
anything-llm    |     at Object.totalIndicies (/app/server/utils/vectorDbProviders/chroma/index.js:42:40)
anything-llm    |     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
anything-llm    |     at async /app/server/endpoints/system.js:74:27 {
anything-llm    |   field: 'collectionId'
anything-llm    | }
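A hedged sketch of a defensive check in totalIndicies(), the function named in the stack trace (this guards the symptom; the root cause is likely a Chroma client/server version mismatch):

// If the Chroma client returns a collection without an id, report zero
// vectors instead of letting collection.count() throw RequiredError.
const collection = await client.getCollection(namespace).catch(() => null);
if (!collection || !collection.id) return 0;
return await collection.count();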

Input length, input text colors

It appears any text entered gets truncated after 240 characters (?)
[screenshot]

Edit:

Looks like the textarea maxlength is limited to 240 characters.
[screenshot]

Also, unless there's a dark-mode setting I'm missing, my text in the same input textbox is white on a white background, so I can't see what I'm typing.
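A hedged sketch of both fixes in the chat prompt component (component name, props, and classes are assumptions, not the project's code):

import React, { useState } from "react";

export default function PromptInput() {
  const [message, setMessage] = useState("");
  // Raise maxLength (or drop the attribute entirely) to lift the 240-char cap,
  // and pin an explicit text color so white-on-white input can't happen.
  return (
    <textarea
      value={message}
      onChange={(e) => setMessage(e.target.value)}
      maxLength={4096}
      className="text-gray-900 bg-white"
    />
  );
}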

Can't build Docker image on Mac M1 Max, keep getting the following error

ARCH=arm64 docker-compose up -d --build
WARN[0000] The "CLOUD_BUILD" variable is not set. Defaulting to a blank string.
[+] Building 0.2s (3/3) FINISHED
=> [anything-llm internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 3.99kB 0.0s
=> [anything-llm internal] load .dockerignore 0.0s
=> => transferring context: 299B 0.0s
=> ERROR [anything-llm internal] load metadata for docker.io/library/ubu 0.1s

[anything-llm internal] load metadata for docker.io/library/ubuntu:jammy-20230522:


failed to solve: ubuntu:jammy-20230522: error getting credentials - err: docker-credential-desktop resolves to executable in current directory (./docker-credential-desktop), out: ``

Segmentation fault (core dumped) under Ubuntu 22.04

I am experiencing a segfault when submitting a second request in a new workspace with 117 embedded docs (~169k tokens).

..[$] <()> yarn dev:server
yarn run v1.22.19
$ cd server && yarn dev
$ NODE_ENV=development nodemon --ignore documents --ignore vector-cache --trace-warnings index.js
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node --trace-warnings index.js`
Example app listening on port 3001
SELECT * FROM workspaces WHERE slug = 'obsidian'
SELECT * FROM workspaces  
SELECT * FROM workspace_documents WHERE workspaceId = 1 
SELECT * FROM workspaces WHERE slug = 'obsidian'
SELECT * FROM workspace_documents WHERE workspaceId = 1 
SELECT * FROM workspace_chats WHERE workspaceId = 1 AND include = true  ORDER BY id ASC
SELECT * FROM workspaces WHERE slug = 'obsidian'
SELECT * FROM workspace_documents WHERE workspaceId = 1 
SELECT * FROM workspaces WHERE slug = 'obsidian'
SELECT * FROM workspace_documents WHERE workspaceId = 1 
Segmentation fault (core dumped)
[nodemon] app crashed - waiting for file changes before starting...

browser console:

TypeError: NetworkError when attempting to fetch resource. [workspace.js:51:17](http://localhost:3002/src/models/workspace.js)
    chatResult workspace.js:51
    (Async: promise callback)
    sendChat workspace.js:50
    fetchReply index.jsx:49
    ChatContainer index.jsx:66
    React 15
    handleSubmit index.jsx:32
    captureEnter index.jsx:22
    React 23
    <anonymous> main.jsx:9

I noticed that "resaving" a workspace fixes the issue, but only until a second request is sent. Sending one works, but the second one causes the segfault. At that point, restarting the server first and the frontend second and then resaving the workspace in the frontend, once again, fixes the issue temporarily, in the same manner. The core that was dumped is 337MB and appears to contain sensitive info, so not sharing now.

Workspace.js Line: 51 | Error parsing the JSON data in file

POST
http://localhost:3001/workspace/kaboomski-studios/chat
Status
500
Internal Server Error
VersionHTTP/1.1
Transferred335 B (21 B size)
Referrer Policystrict-origin-when-cross-origin

Access-Control-Allow-Origin
http://localhost:3000
Connection
keep-alive
Content-Length
21
Content-Type
text/plain; charset=utf-8
Date
Sun, 11 Jun 2023 07:15:34 GMT
ETag
W/"15-/6VXivhc2MKdLfIkLcUE47K6aH0"
Keep-Alive
timeout=5
Vary
Origin
X-Powered-By
Express

Accept
/
Accept-Encoding
gzip, deflate
Accept-Language
en-US,en;q=0.5
Authorization
null
Connection
keep-alive
Content-Length
69
Content-Type
text/plain;charset=UTF-8
Host
localhost:3001
Origin
http://localhost:3000
Referer
http://localhost:3000/
Sec-Fetch-Dest
empty
Sec-Fetch-Mode
cors
Sec-Fetch-Site
same-site
User-Agent
Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0
SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data workspace.js:51:17
chatResult workspace.js:51
(Async: promise callback)
sendChat workspace.js:50
fetchReply index.jsx:49
ChatContainer index.jsx:66
React 15
handleSubmit index.jsx:16
React 15

[Error] libgfortran.so.5: No such file or directory

With a fresh repo, I receive the following error from the server upon first visiting localhost:3001 after running yarn dev:server and yarn dev:frontend. This is on Ubuntu 22.04 running in WSL:

$ NODE_ENV=development nodemon --ignore documents --ignore vector-cache --trace-warnings index.js
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node --trace-warnings index.js`
Example app listening on port 3001
libgfortran.so.5: cannot open shared object file: No such file or directory Error: libgfortran.so.5: cannot open shared object file: No such file or directory
    at Module._extensions..node (node:internal/modules/cjs/loader:1338:18)
    at Module.load (node:internal/modules/cjs/loader:1117:32)
    at Module._load (node:internal/modules/cjs/loader:958:12)
    at Module.require (node:internal/modules/cjs/loader:1141:19)
    at require (node:internal/modules/cjs/helpers:110:18)
    at getPlatformLibrary (/home/jwaltz/git_projects/anything-llm/server/node_modules/vectordb/native.js:23:16)
    at Object.<anonymous> (/home/jwaltz/git_projects/anything-llm/server/node_modules/vectordb/native.js:33:21)
    at Module._compile (node:internal/modules/cjs/loader:1254:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)
    at Module.load (node:internal/modules/cjs/loader:1117:32) {
  code: 'ERR_DLOPEN_FAILED'
}

The call stack indicates this is coming from vectordb (lance), but I have lance commented out in .env.development. I've noticed lance has caused some issues with requiring certain libc files and has especially complicated the docker implementation.

What should I do to fix this? I guess I could try installing all of the missing deps required by vectordb on my OS, but I feel like that shouldn't be my first option. Is it possible to avoid these requirements by not calling vectordb if lance isn't the preferred vector database according to the .env file(s)?
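A hedged sketch of that idea (the lance and pinecone provider paths are assumptions; the chroma path appears in stack traces elsewhere in these issues): select the provider from VECTOR_DB and only then require() it, so vectordb's native module never loads unless LanceDB is chosen.

function getVectorDbProvider() {
  switch (process.env.VECTOR_DB) {
    case "lancedb":
      // Only this branch touches vectordb and its libgfortran/glibc needs.
      return require("./utils/vectorDbProviders/lance");
    case "chroma":
      return require("./utils/vectorDbProviders/chroma");
    default:
      return require("./utils/vectorDbProviders/pinecone");
  }
}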

Dockerfile cleanup and enforce Unix line endings

The Docker entrypoint and CMD scripts are a bit redundant. The CMD dual_boot.sh script should just move into the entrypoint script, and we can remove the CMD altogether.

We should also enforce Unix line endings via .gitattributes, as some Windows users have had problems building the image related to a line-endings mismatch.
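A minimal sketch of such a .gitattributes (patterns are suggestions, not the final file):

# Normalize text files to LF so scripts run inside the Linux image
# regardless of the OS they were checked out on.
* text=auto
*.sh text eol=lf
Dockerfile text eol=lf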

Conversation mode

The readme says there are two chat modes: "Two chat modes conversation and query. Conversation retains previous questions and amendments. Query is simple QA against your documents."

How do I use conversation mode? I seem to be stuck in QA. (Or is that not implemented yet?)

Authentication password workflow

When AUTH_TOKEN is present, the app should prompt the user to log in when in production mode; this method is disabled in development or when AUTH_TOKEN is unset in the env.
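A hedged sketch of that behavior as Express middleware (requireAuth and the header format are assumptions, not the project's implementation):

function requireAuth(req, res, next) {
  const token = process.env.AUTH_TOKEN;
  // Auth is a no-op in development or when AUTH_TOKEN is unset.
  if (!token || process.env.NODE_ENV !== "production") return next();
  if (req.header("Authorization") === `Bearer ${token}`) return next();
  return res.status(401).json({ error: "Invalid auth token" });
}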

[LanceDB] Segfault when running docker-compose up -d --build

On my Intel Mac I get the segfault below when running docker-compose up -d --build:

[screenshot]
 > [build-stage 2/2] RUN cd ./frontend/ && yarn build && yarn cache clean:
#0 0.520 yarn run v1.22.19
#0 0.554 $ vite build
#0 0.646 Segmentation fault
#0 0.659 error Command failed with exit code 139.
#0 0.659 info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
------
failed to solve: process "/bin/sh -c cd ./frontend/ && yarn build && yarn cache clean" did not complete successfully: exit code: 139

Failed getting project name. TypeError: fetch failed [PineconeError: Failed getting project name. TypeError: fetch failed]

I'm not sure exactly how to resolve this; I've tried a few things. Even the styling doesn't seem to load properly, but I was able to create a workspace.

Console error on server:
yarn dev:server
yarn run v1.22.19
$ cd server && yarn dev
$ NODE_ENV=development nodemon --ignore documents --ignore vector-cache --trace-warnings index.js
[nodemon] 2.0.22
[nodemon] to restart at any time, enter `rs`
[nodemon] watching path(s): *.*
[nodemon] watching extensions: js,mjs,json
[nodemon] starting `node --trace-warnings index.js`
Example app listening on port 3001
SELECT * FROM workspaces
Failed getting project name. TypeError: fetch failed [PineconeError: Failed getting project name. TypeError: fetch failed]
Failed getting project name. TypeError: fetch failed [PineconeError: Failed getting project name. TypeError: fetch failed]
SELECT * FROM workspaces

Browser Error:
GET http://localhost:3001/system/system-vectors
Status: 500 Internal Server Error
Version: HTTP/1.1
Transferred: 335 B (21 B size)
Referrer Policy: strict-origin-when-cross-origin

I've tried a few workarounds, but this seems to be where I keep getting stuck, on Linux Mint.

Thanks for all your hard work!
Kaboom.ski

Error: spawn xdg-open ENOENT

using:

1st terminal:
yarn setup
yarn prod:backend

2nd terminal:
cd frontend
yarn install
cd ..
yarn prod:frontend

The error is from yarn prod:frontend:
VITE v4.3.9 ready in 1111 ms

➜ Local: http://localhost:3000/
➜ press h to show help
node:events:491
throw er; // Unhandled 'error' event
^

Error: spawn xdg-open ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:283:19)
at onErrorNT (node:internal/child_process:476:16)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
Emitted 'error' event on ChildProcess instance at:
at ChildProcess._handle.onexit (node:internal/child_process:289:12)
at onErrorNT (node:internal/child_process:476:16)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
errno: -2,
code: 'ENOENT',
syscall: 'spawn xdg-open',
path: 'xdg-open',
spawnargs: [ 'http://localhost:3000/' ]
}

Node.js v18.16.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

commit:
https://github.com/Mintplex-Labs/anything-llm/tree/8199fcc077e4a70caf9a8c529dcb81467e2a2574
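The crash comes from vite --open spawning xdg-open, which does not exist on headless Linux/WSL. A hedged sketch of a workaround in frontend/vite.config.js (alternatively, drop the --open flag from the package.json script):

import { defineConfig } from "vite";

export default defineConfig({
  server: {
    open: false, // don't spawn xdg-open; browse to http://localhost:3000 manually
  },
});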
