Comments (19)
Yes, the context length is -1 because Ollama doesn't provide the context size, and neither does OpenAI etc.
But I just added other changes so 4096 is the default, to ease the UX.
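Since the inference server doesn't report a context size, the UI receives a -1 sentinel; a minimal sketch of the fallback logic described above (the function name and structure are illustrative, not h2oGPT's actual code; only the -1 sentinel and 4096 default come from this thread):

```python
def resolve_max_seq_len(reported: int, default: int = 4096) -> int:
    """Map the -1 'unknown context length' sentinel to a usable default.

    Servers like Ollama or OpenAI don't report a context size, so the
    UI may see -1; loading with a non-positive length fails, hence the
    fallback to a safe default.
    """
    return reported if reported > 0 else default

print(resolve_max_seq_len(-1))    # 4096
print(resolve_max_seq_len(2048))  # 2048
```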
from h2ogpt.
FYI I made ollama work for listing models. It doesn't have an OpenAI API for that, but I used their native version. Check out the new load button and see this description: https://github.com/h2oai/h2ogpt/blob/main/docs/README_ui.md#models-tab
github is slow to update the picture, but it has a new button on top now for loading models from the server.
So at least now you don't have to manually select the model or context length to get something going.
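Ollama's native API lists installed models at GET /api/tags; a hedged sketch of parsing that response body (the payload shape follows Ollama's API docs; the helper name and sample payload are mine):

```python
import json

def list_ollama_models(api_tags_json: str) -> list:
    """Extract model names from an Ollama GET /api/tags response body."""
    payload = json.loads(api_tags_json)
    return [m["name"] for m in payload.get("models", [])]

# Example response body, trimmed to the field used above.
sample = '{"models": [{"name": "llama2:latest"}, {"name": "phind-codellama:34b-v2-q6_K"}]}'
print(list_ollama_models(sample))  # ['llama2:latest', 'phind-codellama:34b-v2-q6_K']
```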
from h2ogpt.
Can you give me the ollama command you are using? Nominally you might have to set up the "url" part in some way to tell it which port, key, etc., as described here:
https://github.com/h2oai/h2ogpt/blob/main/src/gen.py#L550-L606
from h2ogpt.
It's http://yourIP:11434, or it could also be /v1
But those don't work...
Is there anything else we'd need to add other than the server URL? It would also help if there was some feedback as to whether there was a successful connection to the API. Like, there's not even an "OK" or "accept" button to say "take these settings now, I've finished editing". You just tab back over to chat and try it?
I actually can't tell if h2o is built to connect to other LLM apps like Ollama, as it seems like h2o wants to download and run its own model rather than hook into something else.
from h2ogpt.
I'm sure it's very easy, just never ran ollama. In those docs I shared, one would do something like:
vllm_chat:https://IP/v1
You can't just put the url because we support a variety of inference servers and we can't distinguish otherwise.
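Because a bare URL can't tell the loader which backend protocol to speak, the server field carries a type prefix; a minimal sketch of how such a prefixed string could be split (the prefix names come from this thread; the parsing itself is an assumption, not h2oGPT's actual code):

```python
def split_inference_server(spec: str) -> tuple:
    """Split 'vllm_chat:https://IP/v1' into ('vllm_chat', 'https://IP/v1').

    Split on the first ':' only, so the '://' inside the URL survives.
    """
    server_type, _, url = spec.partition(":")
    return server_type, url

print(split_inference_server("vllm_chat:http://localhost:11434/v1"))
# ('vllm_chat', 'http://localhost:11434/v1')
```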
from h2ogpt.
I actually can't tell if h2o is built to connect to other LLM apps like Ollama, as it seems like h2o wants to download and run its own model rather than hook into something else.
It definitely supports many inference servers, including OpenAI types; we use them all the time. It's easiest to specify --inference_server
on the CLI for stable use, or yes, you can specify it in the UI if you specify the base model (i.e. what the endpoint says the model name is) and the inference server string.
from h2ogpt.
So the model name in the model settings tab in h2o has to match the name of the model I've loaded in Ollama?
What if I'm trying out several models? Do I have to keep changing the name to match?
I am not using the CLI but the GUI. What is the "inference server string", please? There's nothing like that on the models tab.
Basically I just launch to get to the GUI.
conda activate h2ogpt
cd C:\AI\Text\h2ogpt
python generate.py
Then I assumed I could just put in the Ollama API URL.
Does it go in "Choose/Enter Base Model" (as that seems to take a name or a URL), or in "Choose/Enter Server"?
Would you guys consider putting something in the docs or doing a YT video? Your YT channel is super bare, but maybe your target market is giant enterprises and not enthusiasts.
Currently you have the most configurable RAG options of any local LLM app I can find, and I've been looking for a LONG time at many options.
You could really play to that strength, but you seem to have no one on socials or YouTube telling everyone about it.
Edit:
As far as I can see from chats, Discords, and Reddit, AnythingLLM is your biggest competitor as far as end users (individuals) are concerned. The difference is h2o is a lot harder to install, and your RAG options are better (I think).
But AnythingLLM is easier to set up and connect to Ollama - they even have the Ollama option pre-prepared in the GUI.
This will show you what I mean; from about 6 min in it shows entering the API URL. AnythingLLM auto-detects the model that Ollama has loaded.
https://www.youtube.com/watch?v=IJYC6zf86lU
from h2ogpt.
I've shared the answer to your question, but I'm not sure you've tried it. It's not hard at all.
base model: Needs to be the name of the model running in ollama
server: vllm_chat:https://IP/v1 or vllm:https://IP/v1
prompt_type: Set to correct prompt type
E.g.
base model: h2oai/h2ogpt-4096-llama2-70b-chat
server: vllm:192.168.1.64:5000
prompt_type: llama2
Use "vllm_chat" to let ollama fully handle prompting; then prompt_type should be "plain".
Click load button at top.
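The rule above (a vllm_chat endpoint lets the server apply its own chat template, so the client should send raw text) can be sketched as a small helper; the names here are illustrative, not h2oGPT's actual API:

```python
def effective_prompt_type(server: str, prompt_type: str) -> str:
    """With a chat-style endpoint (vllm_chat:...), the inference server
    handles prompting itself, so the client-side prompt type is 'plain';
    otherwise the user-chosen template (e.g. 'llama2') applies."""
    if server.startswith("vllm_chat:"):
        return "plain"
    return prompt_type

print(effective_prompt_type("vllm_chat:http://localhost:11434/v1", "llama2"))  # plain
print(effective_prompt_type("vllm:192.168.1.64:5000", "llama2"))               # llama2
```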
from h2ogpt.
Will keep this issue open to improve, e.g. when enter server, get model list and populate the base models with that.
from h2ogpt.
I've shared the answer to your question, but I'm not sure you've tried it. It's not hard at all.
I mentioned that some of the difficulty comes from semi-ambiguous naming of the inputs/buttons.
base model: Needs to be the name of the model running in ollama
server: vllm_chat:https://IP/v1 or vllm:https://IP/v1
prompt_type: Set to correct prompt type
Apps like AnythingLLM auto-detect this. I pointed out that if someone has a range of local models and an exact name match is a requirement, it's a point of friction.
E.g.
base model: h2oai/h2ogpt-4096-llama2-70b-chat
server: vllm:192.168.1.64:5000
prompt_type: llama2
use "vllm_chat" to let ollama fully handle prompting, then prompt_type should be "plain"
So it doesn't work at all without vllm_chat? I hadn't seen that in the docs anywhere until you mentioned it in this thread (buried deep? please consider surfacing this to help other local-LLM users). I entered the API URL without vllm: and nothing worked, so I did try it. I also tried:
Click load button at top.
But I still couldn't get it to respond in chat.
When clicking load I get:
Error
Incorrect path_or_model_id: 'phind-codellama:34b-v2-q6_K'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Your screenshot isn't showing a working connection to Ollama? Ollama doesn't use port 5000.
I have tried my machine IP, doesn't work
vllm_chat:192.168.0.217:11434/v1
The Ollama API URL, doesn't work
vllm_chat: 127.0.0.1:11434/v1
Taking off the /v1 on either doesn't work. Every time, a red box pops up with an error. So it's not as easy as you say?
I've done what you said, I've tried to troubleshoot this myself.
What are you using, if you're not using Ollama?
If you're not using Ollama, can you maybe please try it (I can see from your port that you're not) and THEN tell me if I'm lazy and not trying.
As for changing the title of the thread: since Ollama still isn't connecting with h2o, I think it should still be "way to use h2o with Ollama", not just the UI issue.
Example screenshots: as I said, I tried many variations of the server with/without /v1, using the Ollama IP, using my IP, v4 and v6 IPs, etc. Nothing connects.
By contrast
Installed AnythingLLM, plugged in http://127.0.0.1:11434 and boom, it found Phind already running, took 1 second.
Like I said, h2o has better RAG options, so I would prefer to use it if possible.
from h2ogpt.
Ok I'll try ollama and report back.
from h2ogpt.
Yes, it was super easy to use ollama in h2oGPT. Just noted from their instructions that the port is 11434 by default: https://github.com/ollama/ollama/blob/main/docs/openai.md#openai-python-library
ollama run llama2
then separately in h2oGPT run:
python generate.py
then in UI go to models tab and set:
base model: llama2
server: vllm_chat:http://localhost:11434/v1/
prompt_type: plain
max_seq_len: 4096
Then click the load button. The max_seq_len is the "context length" field in the right sidebar.
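Several of the failed attempts earlier in the thread used bare host:port strings; a hedged sketch of normalizing user input into the form that worked here (scheme plus /v1 suffix; the helper is purely illustrative, not h2oGPT code):

```python
def normalize_openai_base(url: str) -> str:
    """Coerce 'localhost:11434' or '127.0.0.1:11434' into
    'http://localhost:11434/v1', the OpenAI-compatible base-URL form
    that worked in this thread."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url          # default to plain http for local servers
    if not url.rstrip("/").endswith("/v1"):
        url = url.rstrip("/") + "/v1"  # OpenAI-compatible routes live under /v1
    return url

print(normalize_openai_base("localhost:11434"))  # http://localhost:11434/v1
```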
Then use:
from h2ogpt.
It's not buried. We explain everything about inference servers in docs/README_InferenceServers.md, which is linked from the main README.md. I added more docs for it and added ollama as supported in the main README.md.
from h2ogpt.
ollama does not have a correct OpenAI compatible endpoint for listing models.
from h2ogpt.
ollama does not have a correct OpenAI compatible endpoint for listing models.
Ok, my bad then; I was going on what I saw/read. Maybe I have misinterpreted this:
https://ollama.com/blog/openai-compatibility
https://www.youtube.com/watch?v=38jlvmBdBrU
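The OpenAI-compatibility blog linked above documents a /v1/chat/completions endpoint; a minimal sketch of the JSON request body such an endpoint expects (the model name "llama2" is an assumption, i.e. whatever Ollama is currently serving):

```python
import json

def chat_completion_body(model: str, user_message: str) -> str:
    """Build the JSON body for an OpenAI-compatible POST /v1/chat/completions."""
    return json.dumps({
        "model": model,  # must match the model name Ollama is running
        "messages": [{"role": "user", "content": user_message}],
    })

print(chat_completion_body("llama2", "Hello"))
```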
Doing exactly what you said, though, does not work for me.
Possibly I need a git pull? Would the ability to do this have been added in the past month?
Well, I did a git pull and now I have an error.
line 47, in <module>
from langchain_mistralai import ChatMistralAI
ModuleNotFoundError: No module named 'langchain_mistralai'
Maybe the issue is also the URL, so instead of 127.etc or 192.myipaddress.etc
I have to use "localhost"? As that's a key difference in your latest reply.
Which is also odd, because other apps that hook into Ollama don't use localhost; they use the regular IP/URL.
I have to solve the new error from pulling the newest before I can try it again.
from h2ogpt.
OK, it works.
Thanks for taking the time to troubleshoot this. I appreciate it greatly.
I am not sure what solved it exactly:
1) server format of localhost as opposed to explicit IP
2) git pull to update h2o itself
or
3) and this is a funny one: make sure the context length is not the default of -1.
I did both of the above (1 and 2) and it still didn't work; I got the red popup error.
Then I noticed the context length...
The default context length given to me by h2o was -1.
As soon as I changed it to 4096, the model loaded.
I never even looked at the context length, because it should run with just about anything: 256, 2048, 4096. But -1? I would not think to myself "hey, I'd better check whether it's got a negative context length on a fresh install!"
Anyway, a gotcha for new people to watch out for, if anyone finds this issue in the future.
All solved, thanks kindly again.
from h2ogpt.
This is further improved to handle some corner cases. And this is fixed: #1452
from h2ogpt.
langchain_mistralai was added 2 days ago in requirements_optional_langchain.txt
from h2ogpt.
This is great
Today is a good day
Thank you!
from h2ogpt.