Comments (19)
Yes, the context length is -1 because Ollama doesn't provide the context size, and neither does OpenAI etc.
But I just added other changes so 4096 is the default, to ease the UX.
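Since the inference server doesn't report a context size, the UI receives a -1 sentinel; a minimal sketch of the fallback logic described above (the function name and structure are illustrative, not h2oGPT's actual code; only the -1 sentinel and 4096 default come from this thread):

```python
def resolve_max_seq_len(reported: int, default: int = 4096) -> int:
    """Map the -1 'unknown context length' sentinel to a usable default.

    Servers like Ollama or OpenAI don't report a context size, so the
    UI may see -1; loading with a non-positive length fails, hence the
    fallback to a safe default.
    """
    return reported if reported > 0 else default

print(resolve_max_seq_len(-1))    # 4096
print(resolve_max_seq_len(2048))  # 2048
```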
from h2ogpt.
FYI I made ollama work for listing models. It doesn't have an OpenAI API for that, but I used their native version. Check out the new load button and see this description: https://github.com/h2oai/h2ogpt/blob/main/docs/README_ui.md#models-tab
github is slow to update the picture, but it has a new button on top now for loading models from the server.
So at least now you don't have to manually select the model or context length to get something going.
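Ollama's native API lists installed models at GET /api/tags; a hedged sketch of parsing that response body (the payload shape follows Ollama's API docs; the helper name and sample payload are mine):

```python
import json

def list_ollama_models(api_tags_json: str) -> list:
    """Extract model names from an Ollama GET /api/tags response body."""
    payload = json.loads(api_tags_json)
    return [m["name"] for m in payload.get("models", [])]

# Example response body, trimmed to the field used above.
sample = '{"models": [{"name": "llama2:latest"}, {"name": "phind-codellama:34b-v2-q6_K"}]}'
print(list_ollama_models(sample))  # ['llama2:latest', 'phind-codellama:34b-v2-q6_K']
```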
from h2ogpt.
Can you give me the ollama command you are using? Nominally you might have to set up the "url" part in some way to tell it which port, key, etc., as described here:
https://github.com/h2oai/h2ogpt/blob/main/src/gen.py#L550-L606
from h2ogpt.
It's http://yourIP:11434, or it could also be /v1
But those don't work...
Is there anything else we'd need to add other than the server URL? It would also help if there was some feedback as to whether there was a successful connection to the API. Like, there's not even an "OK" or "accept" button to say "take these settings now, I've finished editing". You just tab back over to chat and try it?
I actually can't tell if h2o is built to connect to other LLM apps like Ollama, as it seems like h2o wants to download and run its own model rather than hook into something else.
from h2ogpt.
I'm sure it's very easy, just never ran ollama. In those docs I shared, one would do something like:
vllm_chat:https://IP/v1
You can't just put the url because we support a variety of inference servers and we can't distinguish otherwise.
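Because a bare URL can't tell the loader which backend protocol to speak, the server field carries a type prefix; a minimal sketch of how such a prefixed string could be split (the prefix names come from this thread; the parsing itself is an assumption, not h2oGPT's actual code):

```python
def split_inference_server(spec: str) -> tuple:
    """Split 'vllm_chat:https://IP/v1' into ('vllm_chat', 'https://IP/v1').

    Split on the first ':' only, so the '://' inside the URL survives.
    """
    server_type, _, url = spec.partition(":")
    return server_type, url

print(split_inference_server("vllm_chat:http://localhost:11434/v1"))
# ('vllm_chat', 'http://localhost:11434/v1')
```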
from h2ogpt.
I actually can't tell if h2o is built to connect to other LLM apps like Ollama, as it seems like h2o wants to download and run its own model rather than hook into something else.
It definitely supports many inference servers, including OpenAI types; we use them all the time. It's easiest to specify --inference_server
on the CLI for stable use, or yes, you can specify it in the UI if you specify the base model (i.e. what the endpoint says the model name is) and the inference server string.
from h2ogpt.
So the model name in the model settings tab in h2o has to match the name of the model I've loaded in Ollama?
What if I'm trying out several models? Do I have to keep changing the name to match?
I am not using the CLI but the GUI. What is the "inference server string", please? There's nothing like that on the models tab.
Basically I just launch to get to the GUI.
conda activate h2ogpt
cd C:\AI\Text\h2ogpt
python generate.py
Then I assumed I could just put in the Ollama API URL.
Does it go in "Choose/Enter Base Model" (as that seems to take a name or a URL), or in "Choose/Enter Server"?
Would you guys consider putting something in the docs or doing a YT video? Your YT channel is super bare, but maybe your target market is giant enterprises and not enthusiasts.
Currently you have the most configurable RAG options of any local LLM app I can find, and I've been looking for a LONG time at many options.
You could really play to that strength, but you seem to have no one on socials or YouTube telling everyone about it.
Edit:
As far as I can see from chats, Discords, and Reddit, AnythingLLM is your biggest competitor as far as end users (individuals) are concerned. The difference is h2o is a lot harder to install, and your RAG options are better (I think).
But AnythingLLM is easier to set up and connect to Ollama - they even have the Ollama option pre-prepared in the GUI.
This will show you what I mean; from about 6 min in it shows entering the API URL. AnythingLLM auto-detects the model that Ollama has loaded.
https://www.youtube.com/watch?v=IJYC6zf86lU
from h2ogpt.
I've shared the answer to your question, but I'm not sure you've tried it. It's not hard at all.
base model: Needs to be the name of the model running in ollama
server: vllm_chat:https://IP/v1 or vllm:https://IP/v1
prompt_type: Set to correct prompt type
E.g.
base model: h2oai/h2ogpt-4096-llama2-70b-chat
server: vllm:192.168.1.64:5000
prompt_type: llama2
Use "vllm_chat" to let ollama fully handle prompting; then prompt_type should be "plain".
Click load button at top.
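The rule above (a vllm_chat endpoint lets the server apply its own chat template, so the client should send raw text) can be sketched as a small helper; the names here are illustrative, not h2oGPT's actual API:

```python
def effective_prompt_type(server: str, prompt_type: str) -> str:
    """With a chat-style endpoint (vllm_chat:...), the inference server
    handles prompting itself, so the client-side prompt type is 'plain';
    otherwise the user-chosen template (e.g. 'llama2') applies."""
    if server.startswith("vllm_chat:"):
        return "plain"
    return prompt_type

print(effective_prompt_type("vllm_chat:http://localhost:11434/v1", "llama2"))  # plain
print(effective_prompt_type("vllm:192.168.1.64:5000", "llama2"))               # llama2
```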
from h2ogpt.
Will keep this issue open to improve, e.g. when enter server, get model list and populate the base models with that.
from h2ogpt.
I've shared the answer to your question, but I'm not sure you've tried it. It's not hard at all.
I mentioned that some of the difficulty comes from semi-ambiguous naming of the inputs/buttons.
base model: Needs to be the name of the model running in ollama
server: vllm_chat:https://IP/v1 or vllm:https://IP/v1
prompt_type: Set to correct prompt type
Apps like AnythingLLM auto-detect this. I pointed out that if someone has a range of local models and an exact name match is a requirement, it's a point of friction.
E.g.
base model: h2oai/h2ogpt-4096-llama2-70b-chat
server: vllm:192.168.1.64:5000
prompt_type: llama2
use "vllm_chat" to let ollama fully handle prompting, then prompt_type should be "plain"
So it doesn't work at all without vllm_chat? I hadn't seen that in the docs anywhere until you mentioned it in this thread (buried deep? please consider surfacing this to help other local-LLM users). I entered the API URL without vllm: and nothing worked, so I did try it. I also tried:
Click load button at top.
But I still couldn't get it to respond in chat.
When clicking load I get:
Error
Incorrect path_or_model_id: 'phind-codellama:34b-v2-q6_K'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Your screenshot isn't showing a working connection to Ollama? Ollama doesn't use port 5000.
I have tried my machine IP, doesn't work
vllm_chat:192.168.0.217:11434/v1
The Ollama API URL, doesn't work
vllm_chat: 127.0.0.1:11434/v1
Taking off the /v1 on either doesn't work. Every time, a red box pops up with an error. So it's not as easy as you say?
I've done what you said, I've tried to troubleshoot this myself.
What are you using, if you're not using Ollama?
If you're not using Ollama, can you maybe please try it (I can see from your port that you're not) and THEN tell me if I'm lazy and not trying.
As for changing the title of the thread: since Ollama still isn't connecting with h2o, I think it should still be "way to use h2o with Ollama", not just the UI issue.
Example screenshots: as I said, I tried many variations of the server with/without /v1, using the Ollama IP, using my IP, v4 and v6 IPs, etc. Nothing connects.
By contrast
Installed AnythingLLM, plugged in http://127.0.0.1:11434 and boom, it found Phind already running, took 1 second.
Like I said, h2o has better RAG options, so I would prefer to use it if possible.
from h2ogpt.
Ok I'll try ollama and report back.
from h2ogpt.
Yes, it was super easy to use ollama in h2oGPT. Just noted from their instructions that the port is 11434 by default: https://github.com/ollama/ollama/blob/main/docs/openai.md#openai-python-library
ollama run llama2
then separately in h2oGPT run:
python generate.py
then in UI go to models tab and set:
base model: llama2
server: vllm_chat:http://localhost:11434/v1/
prompt_type: plain
max_seq_len: 4096
Then click the load button. The max_seq_len is the "context length" field in the right sidebar.
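Several of the failed attempts earlier in the thread used bare host:port strings; a hedged sketch of normalizing user input into the form that worked here (scheme plus /v1 suffix; the helper is purely illustrative, not h2oGPT code):

```python
def normalize_openai_base(url: str) -> str:
    """Coerce 'localhost:11434' or '127.0.0.1:11434' into
    'http://localhost:11434/v1', the OpenAI-compatible base-URL form
    that worked in this thread."""
    if not url.startswith(("http://", "https://")):
        url = "http://" + url          # default to plain http for local servers
    if not url.rstrip("/").endswith("/v1"):
        url = url.rstrip("/") + "/v1"  # OpenAI-compatible routes live under /v1
    return url

print(normalize_openai_base("localhost:11434"))  # http://localhost:11434/v1
```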
Then use:
from h2ogpt.
It's not buried. We explain everything about inference servers in docs/README_InferenceServers.md, which is linked from the main README.md. I added more docs for it and added ollama as supported in the main README.md.
from h2ogpt.
ollama does not have a correct OpenAI compatible endpoint for listing models.
from h2ogpt.
ollama does not have a correct OpenAI compatible endpoint for listing models.
Ok, my bad then; I was going on what I saw/read. Maybe I have misinterpreted this:
https://ollama.com/blog/openai-compatibility
https://www.youtube.com/watch?v=38jlvmBdBrU
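The OpenAI-compatibility blog linked above documents a /v1/chat/completions endpoint; a minimal sketch of the JSON request body such an endpoint expects (the model name "llama2" is an assumption, i.e. whatever Ollama is currently serving):

```python
import json

def chat_completion_body(model: str, user_message: str) -> str:
    """Build the JSON body for an OpenAI-compatible POST /v1/chat/completions."""
    return json.dumps({
        "model": model,  # must match the model name Ollama is running
        "messages": [{"role": "user", "content": user_message}],
    })

print(chat_completion_body("llama2", "Hello"))
```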
Doing exactly what you said, though, does not work for me.
Possibly I need a git pull? Would the ability to do this have been added in the past month?
Well, I did a git pull and now I have an error.
line 47, in <module>
from langchain_mistralai import ChatMistralAI
ModuleNotFoundError: No module named 'langchain_mistralai'
Maybe the issue is also the URL, so instead of 127.etc or 192.myipaddress.etc
I have to use "localhost"? As that's a key difference in your latest reply.
Which is also odd, because other apps that hook into Ollama don't use localhost; they use the regular IP/URL.
I have to solve the new error from pulling the newest before I can try it again.
from h2ogpt.
OK, it works.
Thanks for taking the time to troubleshoot this. I appreciate it greatly.
I am not sure what solved it exactly:
1) server format of localhost as opposed to explicit IP
2) git pull to update h2o itself
or
3) and this is a funny one: make sure the context length is not the default of -1.
I did both of the above (1 and 2) and it still didn't work; I got the red popup error.
Then I noticed the context length...
The default context length given to me by h2o was -1.
As soon as I changed it to 4096, the model loaded.
I never even looked at the context length, because it should run with just about anything: 256, 2048, 4096. But -1? I would not think to myself "hey, I'd better check whether it's got a negative context length on a fresh install!"
Anyway, a gotcha for new people to watch out for, if anyone finds this issue in the future.
All solved, thanks kindly again.
from h2ogpt.
This is further improved to handle some corner cases. And this is fixed: #1452
from h2ogpt.
langchain_mistralai was added 2 days ago in requirements_optional_langchain.txt
from h2ogpt.
This is great
Today is a good day
Thank you!
from h2ogpt.