
Infinoid commented on July 30, 2024

> Understood, thanks for the clarification. I will need to investigate this further.

No problem. Good luck with it!

> If you don't mind, can you tell me which version of Ollama you're running? (ollama -v)
> The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0

Good to know. I've only started using it very recently, so I was unfamiliar with the history.

% podman exec ollama ollama -v
ollama version is 0.2.8

from hollama.

Infinoid commented on July 30, 2024

Here are some other things I checked:

  • It doesn't seem to matter which model is used; their outputs are all jumbled in the same way.
  • I am able to send two prompts to my ollama service at the same time using curl, and the responses look just fine.
  • In Chrome DevTools, I can see two /api/generate requests running in parallel, and their responses both look fine.

In both curl and Chrome DevTools, if I ignore all of the JSON and just look at the "response" fields, each story is readable. So I think the problem is that the UI applies responses to the wrong session tab.
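Ollama's /api/generate endpoint streams newline-delimited JSON by default: one object per line, with the generated text split across the "response" fields and a final object that has "done": true. The TypeScript sketch below reproduces the check described above by joining only the "response" fields of a single stream. The function name joinResponses is hypothetical (it is not part of hollama or Ollama), and the sample stream is hand-written and trimmed to the fields used here.

```typescript
// Only the fields this sketch reads from each streamed /api/generate line.
interface GenerateChunk {
  response: string;
  done: boolean;
}

// Rebuild the text of ONE completion from its raw NDJSON stream.
// Blank lines are skipped; every other line must be a complete JSON object.
function joinResponses(ndjson: string): string {
  return ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as GenerateChunk)
    .map((chunk) => chunk.response)
    .join("");
}

// Illustrative, hand-written stream (not captured from a real run).
const sample = [
  '{"model":"gemma2:27b","response":"Once upon","done":false}',
  '{"model":"gemma2:27b","response":" a time","done":false}',
  '{"model":"gemma2:27b","response":"","done":true}',
].join("\n");

console.log(joinResponses(sample)); // Once upon a time
```

Running each captured stream through this separately is a quick way to confirm that the server output is intact and that any mix-up happens when the client applies chunks to sessions.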

fmaclen commented on July 30, 2024

@Infinoid thanks for the bug report.

I'm currently working on a PR to fix another issue that I think would also address this one (though maybe partially).

In fact, can you try it out?
https://api-chat-endpoint.hollama.pages.dev/

The implementation in that environment is not 100% finished but I just did a quick test with 2 completions running simultaneously in 2 tabs and they didn't get jumbled 🤞

What I did notice is that Ollama's GPU usage remained high after the completions on both sessions were over, and it coincidentally went down after I closed the tabs.

I also noticed that the completion on the 2nd tab wasn't saved to localStorage when I closed it, which would indicate it probably got overwritten by the 1st completion process.

Infinoid commented on July 30, 2024

I should mention that this bug is much more visible when using a slow model, for instance a large model that doesn't fit into GPU memory. If it takes 15 or more seconds to generate the full response, it should be easy to reproduce this issue.

Infinoid commented on July 30, 2024

> In fact, can you try it out? https://api-chat-endpoint.hollama.pages.dev/

I tried it, and it behaves a bit differently.

I used gemma2:27b for this one, because it's nice and slow.

I asked one session to write a story about a frog. Then I asked a second session to write a story about a stick insect.

It's still running... but what I see right now is that the first session doesn't have a response, and the second session looks like it started in the middle of a frog story. I don't see two interleaved responses, but one of the responses is truncated and appears in the wrong tab, while the other response isn't visible at all.

[image]

Infinoid commented on July 30, 2024

Upon completion, I see two responses in the second session tab:

  1. a truncated frog story
  2. a truncated frog story, with a full stick insect story immediately after it.

Here's the point where the first story ended and the second story started:

After all, there were countless worlds to explore, countless stories waiting to be told, and Ferdinand, the Emerald Prophet frog, was just getting started.Bartholomew "Bart" Branchington had always prided himself on his camouflage.

Searching through both session tabs, I only see my second (stick insect) prompt; my first (frog) prompt has vanished.
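Both symptoms (one prompt's text showing up in another session, and a prompt vanishing) are consistent with streamed chunks being written to whichever session is currently selected rather than to the session that sent the request. That is a guess, not a reading of hollama's code. The sketch below, with hypothetical names (Session, SessionStore, streamInto), contrasts the two patterns: the safer one captures the originating session's id when the request starts and routes every chunk by that id.

```typescript
interface Session {
  id: string;
  prompt: string;
  completion: string;
}

// Hypothetical in-memory stand-in for the app's session state.
class SessionStore {
  private sessions = new Map<string, Session>();
  activeId = "";

  create(id: string, prompt: string): void {
    this.sessions.set(id, { id, prompt, completion: "" });
    this.activeId = id; // a new session becomes the selected one
  }

  get(id: string): Session | undefined {
    return this.sessions.get(id);
  }

  // Fragile pattern: each chunk goes to whatever session is selected *now*.
  appendToActive(chunk: string): void {
    const s = this.sessions.get(this.activeId);
    if (s) s.completion += chunk;
  }

  // Safer pattern: each chunk goes to the session that started the request.
  appendTo(id: string, chunk: string): void {
    const s = this.sessions.get(id);
    if (s) s.completion += chunk;
  }
}

// Bind the session id once, when the completion starts; the returned
// callback is what a streaming reader would call for every chunk.
function streamInto(store: SessionStore, sessionId: string): (chunk: string) => void {
  return (chunk) => store.appendTo(sessionId, chunk);
}

// Simulate two interleaved streams while the user creates a second session.
const store = new SessionStore();
store.create("frog", "Write a story about a frog");
const frogSink = streamInto(store, "frog");
frogSink("Ferdinand the frog");
store.create("stick", "Write a story about a stick insect");
const stickSink = streamInto(store, "stick");
frogSink(" hopped away.");
stickSink("Bart the stick insect");

console.log(store.get("frog")?.completion); // Ferdinand the frog hopped away.
console.log(store.get("stick")?.completion); // Bart the stick insect
```

With appendToActive instead, the frog chunk that arrives after the second session is created would land in the stick insect session, which matches the mix-up described above.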

fmaclen commented on July 30, 2024

Understood, thanks for the clarification. I will need to investigate this further.

If you don't mind, can you tell me which version of Ollama you're running? (ollama -v)
The reason I'm asking about the Ollama version is because parallel inference is a relatively recent feature: https://github.com/ollama/ollama/releases/tag/v0.2.0
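Since the question is whether the running Ollama is at least v0.2.0, here is a small TypeScript sketch of that comparison. It assumes plain MAJOR.MINOR.PATCH version strings, accepts the full "ollama version is 0.2.8" output of ollama -v, and ignores pre-release suffixes. The helper names parseVersion and atLeast are hypothetical.

```typescript
// Extract MAJOR.MINOR.PATCH from strings such as "0.2.8", "v0.2.0",
// or the full `ollama -v` output "ollama version is 0.2.8".
// Pre-release or build suffixes are ignored.
function parseVersion(text: string): [number, number, number] {
  const m = /(\d+)\.(\d+)\.(\d+)/.exec(text);
  if (!m) throw new Error(`no version found in: ${text}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True if `version` is equal to or newer than `minimum` (numeric, per component).
function atLeast(version: string, minimum: string): boolean {
  const a = parseVersion(version);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

// Parallel requests arrived in Ollama v0.2.0, per the release notes linked above.
const PARALLEL_MIN = "0.2.0";
console.log(atLeast("ollama version is 0.2.8", PARALLEL_MIN)); // true
console.log(atLeast("0.1.9", PARALLEL_MIN)); // false (illustrative older version)
```

Comparing numerically per component matters: a plain string comparison would wrongly rank "0.10.0" below "0.2.0".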

fmaclen commented on July 30, 2024

I'm no longer able to replicate the "jumbled responses" issue in https://hollama.fernando.is now that #125 is merged.

Here's a video of 2 simultaneous completions using gemma2:27b with the prompts:

Could you write an isekai very short story where the protagonist is a STICK INSECT?
Could you write an isekai very short story where the protagonist is a FROG?
parallel.completions.mp4

The completion was sped up 8x during video editing.

That being said, there is another bug present (visible in the video) in which the session that finishes last overrides the first one during saveSession(): #127
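The overwrite tracked in #127 looks like a classic lost-update race: if each completion holds its own copy of the full session list and writes the whole list back when it finishes, the last writer silently erases the other's result. The sketch below is not hollama's actual saveSession(); it uses a plain Map as a stand-in for localStorage, and the key name and helpers (load, saveSnapshot, saveOne) are hypothetical. It contrasts a whole-list write with a read-merge-write keyed by session id.

```typescript
interface Session {
  id: string;
  completion: string;
}

// Stand-in for localStorage: one key holding the serialized session list.
const storage = new Map<string, string>();
const KEY = "sessions"; // hypothetical key name

function load(): Session[] {
  return JSON.parse(storage.get(KEY) ?? "[]") as Session[];
}

// Lost-update pattern: write back a snapshot taken when the completion started.
function saveSnapshot(snapshot: Session[]): void {
  storage.set(KEY, JSON.stringify(snapshot));
}

// Safer pattern: re-read the latest list at save time and replace only this session.
function saveOne(session: Session): void {
  const others = load().filter((s) => s.id !== session.id);
  storage.set(KEY, JSON.stringify([...others, session]));
}

// Two completions start from the same (empty) snapshot...
const snapA: Session[] = load();
const snapB: Session[] = load();
snapA.push({ id: "a", completion: "frog story" });
snapB.push({ id: "b", completion: "stick insect story" });

saveSnapshot(snapA);
saveSnapshot(snapB); // finishes last and overwrites A
console.log(load().map((s) => s.id)); // only "b" survives

storage.clear();
saveOne({ id: "a", completion: "frog story" });
saveOne({ id: "b", completion: "stick insect story" });
console.log(load().map((s) => s.id)); // both "a" and "b" survive
```

Within one browser tab the read-merge-write in saveOne runs synchronously, so two saves cannot interleave; it is not meant as a solution for concurrent writes from separate browser tabs.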

Infinoid commented on July 30, 2024

To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two. So when I said "session tab", that's what I was referring to.

I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?

fmaclen commented on July 30, 2024

> To clarify, I was running in a single browser tab. I clicked the "New session" button and was switching between those two.

Understood, thanks for the clarification. I was worried I overlooked a detail in your initial report, that's why I didn't close the issue 😅

Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).

Fixing that use-case is not trivial and will likely involve a large refactor which I'm hesitant to do at this point.
That being said, the app should:

  • Disable the "New session" button while a completion is in progress.
  • Or abort the current completion if "New session" is clicked.
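A minimal sketch of both options, assuming only one completion is in flight at a time. AbortController is the standard Web API (also available in Node) for cancelling a fetch: passing its signal in the fetch options makes the request reject with an AbortError once abort() is called. The CompletionGuard class and its method names are hypothetical, not hollama code.

```typescript
// Tracks the single in-flight completion so the UI can either block
// navigation (option 1) or cancel the request (option 2).
class CompletionGuard {
  private controller: AbortController | null = null;

  // Option 1: bind the "New session" button's disabled state to this.
  get busy(): boolean {
    return this.controller !== null;
  }

  // Call when a completion starts; pass the returned signal to fetch().
  start(): AbortSignal {
    if (this.controller) throw new Error("a completion is already running");
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Call when the stream ends normally.
  finish(): void {
    this.controller = null;
  }

  // Option 2: call before creating or switching sessions.
  abort(): void {
    this.controller?.abort();
    this.controller = null;
  }
}

// Usage sketch (no network call is made here):
const guard = new CompletionGuard();
const signal = guard.start();
console.log(guard.busy); // true: "New session" would be disabled
guard.abort(); // user chose to start a new session anyway
console.log(signal.aborted); // true: a fetch using this signal would reject
console.log(guard.busy); // false
```

The same busy flag could also disable the links to other existing sessions while a completion is streaming.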

> I don't know how having 2 separate browser tabs affects this. That should be two separate instances of the hollama application, right?

No, it's the same instance, unless you open the 2nd tab in Incognito mode.

Infinoid commented on July 30, 2024

> Indeed, starting a new session (in the same tab) while another one is running totally causes jumbled/broken completions (and other issues too).

> Fixing that use-case is not trivial and will likely involve a large refactor which I'm hesitant to do at this point. That being said, the app should:

>   • Disable the "New session" button while a completion is in progress.
>   • Or abort the current completion if "New session" is clicked.

If two sessions already exist, you can flip between them and submit new queries in both; that's where the confusion happens. So maybe make those other sessions unclickable too.

I think this approach is workable for now. As long as the user doesn't expect it to work, they won't complain when it doesn't. :)

Thanks for confirming that the issue is real, glad I'm not going crazy!
