Comments (7)
An update: the LLM is reloaded into VRAM on every request only when I use a third-party embedding provider (I use Ollama with snow-flake-arctic-22m). There are no issues with the original AnythingLLM embedder.
from anything-llm.
@Smocvin Are you using Ollama for your embedder or LLM or both? The issue only mentions you using Ollama as your LLM
When I use LLM -> Ollama and embedder "snow-flake-arctic-22m" -> Ollama, the LLM is repeatedly loaded into and unloaded from VRAM.
When I use LLM -> Ollama and the original AnythingLLM embedder, the LLM stays in VRAM and is not reloaded.
It is related to this issue and would be solved by this PR:
- Ollama model unloads after 5 minutes #1585
The model is loaded into VRAM because we mlock it for future call performance. The timeout will resolve that matter; however, if you use Ollama for both embedding and the LLM and both models cannot fit in VRAM together, it will kick one out to make room for the one it needs.
That is what is going on here. The linked PR will fix that, so I am moving the discussion there.
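For reference, Ollama's HTTP API accepts a `keep_alive` field on requests, which controls how long a model stays loaded after the call; this is the timeout being discussed. A minimal Python sketch, assuming a local Ollama server at the default address `http://localhost:11434`. The model name `llama3` and the helper names `build_generate_request` and `send` are placeholders for illustration and are not taken from AnythingLLM's code:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_generate_request(model, prompt, keep_alive="5m", base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    body = {
        "model": model,            # placeholder model name, e.g. "llama3"
        "prompt": prompt,
        "stream": False,           # ask for a single JSON response
        "keep_alive": keep_alive,  # e.g. "5m", "1h", 0 (unload now), -1 (keep loaded)
    }
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request to a running Ollama server and return the decoded JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_generate_request("llama3", "Hello", keep_alive="30m"))` would ask Ollama to keep that model loaded for 30 minutes after the reply. A longer `keep_alive` does not stop a model from being evicted when another model needs the space, which is the situation described above.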
I see the problem. The issue is not with AnythingLLM; it's with Ollama. Ollama cannot hold two models simultaneously, even though I am using small models that should fit in the VRAM.
Solution to Ollama VRAM Issue
I found the solution to the issue:
Open the file /etc/systemd/system/ollama.service.
In the [Service] section, add the following line:
Environment="OLLAMA_MAX_LOADED_MODELS=3"
Save the file and close it.
Next, execute the following commands in the terminal:
systemctl daemon-reload
systemctl restart ollama
This configuration change allows Ollama to hold up to three models simultaneously, resolving the VRAM reloading issue.
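One way to check that the change took effect is to ask Ollama which models are currently loaded. The sketch below assumes the server listens on the default `http://localhost:11434` and uses the `/api/ps` endpoint (the same information `ollama ps` prints); the helper names `summarize_loaded` and `fetch_loaded` are illustrative:

```python
import json
import urllib.request


def summarize_loaded(payload):
    """Return (name, size_vram in bytes) for each model in an /api/ps response."""
    return [(m["name"], m.get("size_vram", 0)) for m in payload.get("models", [])]


def fetch_loaded(base_url="http://localhost:11434"):
    """Query a running Ollama server for its currently loaded models."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as response:
        return summarize_loaded(json.loads(response.read().decode("utf-8")))
```

After sending one chat message and embedding one document, `fetch_loaded()` should list both the LLM and the embedding model if Ollama is keeping both in memory.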
@Smocvin TIL! I did not know that was even an ENV key for Ollama 😆