
Comments (7)

salmon-coder commented on May 19, 2024

If you're willing to manually retype the conversation history, then you can get your question answered, like so:

[Screenshot from 2023-03-18 11-11-05]
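A minimal sketch of that manual workaround, assuming the whole transcript still fits in the context window; `build_prompt` is a hypothetical helper, not part of alpaca.cpp:

```python
# Instead of sending only the new question, prepend the prior turns so the
# model sees them as one prompt. This is the "retype the history" trick done
# in code rather than by hand.

def build_prompt(history, question):
    """Join prior (question, answer) turns with the new question."""
    lines = []
    for q, a in history:
        lines.append(f"User: {q}")
        lines.append(f"Assistant: {a}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")  # leave the reply slot open for the model
    return "\n".join(lines)

history = [("What is the capital of France?", "Paris.")]
prompt = build_prompt(history, "What is its population?")
```

The resulting string would then be passed to the model as a single prompt, exactly as if the user had retyped the earlier exchange.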

from alpaca.cpp.

athu16 commented on May 19, 2024

If you're willing to manually retype the conversation history, then you can get your question answered, like so:

Thanks! I guess that'll do for now.
Hoping that it is integrated within the program itself... I don't think the original llama.cpp repo has this issue.


salmon-coder commented on May 19, 2024

After playing around with it some more, I'm somewhat more confused -- but I no longer think that the model doesn't have 'conversational memory'.

Also, the chat.cpp file is identical in this repo and the one it was forked from, which suggests the chat logic is the same.

[Screenshot from 2023-03-18 17-28-39]

Yet even if it can sometimes 'remember' previous conversation, it does so only very intermittently, so IMO your original report is basically correct: there is a lot of engineering work we can do here to improve the model's conversational memory.


salmon-coder commented on May 19, 2024

I am working on a version that more explicitly conveys the idea to Llama that there is a single-threaded conversation and its job is only to respond to the user. Curious whether anybody else has made any kind of significant progress with this.
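One way to convey that framing is a fixed instruction header ahead of the transcript. The wording and helper below are hypothetical illustrations, not code from this repo or from salmon-coder's work in progress:

```python
# A fixed header tells the model it is one participant in a single
# conversation and should only answer the most recent user message,
# rather than continuing the text in an open-ended way.

HEADER = (
    "Below is a conversation between a user and an assistant. "
    "The assistant replies only to the user's most recent message "
    "and never writes the user's lines.\n\n"
)

def framed_prompt(transcript, question):
    """Wrap the running transcript and new question in the framing header."""
    return HEADER + transcript + f"\nUser: {question}\nAssistant:"
```

Whether this kind of framing actually improves the model's turn-taking would need empirical testing against the plain concatenation approach.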


abrahambone commented on May 19, 2024

I have also seen a few cases of indisputable conversational memory across 2 or 3 separate questions, but it's been very rare. No time to work on this myself, unfortunately, but I look forward to seeing what folks come up with to make it a properly conversational tool.


kha84 commented on May 19, 2024

I guess the biggest problem will be that "emulated" conversational memory, i.e. adding the whole of (or just a summary of) your previous conversation as part of your prompt, will quickly hit the limit on the number of tokens this model can take as input.
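A common mitigation is to drop the oldest turns until the prompt fits a token budget. This is a rough sketch assuming a 2048-token context (the size used by the original LLaMA weights); the whitespace split is a crude stand-in for the model's real tokenizer:

```python
def fit_history(turns, new_question, n_ctx=2048, reserve=256):
    """Drop oldest (question, answer) turns until the prompt fits the window.

    `reserve` leaves room in the window for the model's reply.
    """
    def n_tokens(text):
        # Crude approximation; a real implementation would use the
        # model's tokenizer to count tokens exactly.
        return len(text.split())

    budget = n_ctx - reserve - n_tokens(new_question)
    kept = list(turns)
    while kept and sum(n_tokens(q) + n_tokens(a) for q, a in kept) > budget:
        kept.pop(0)  # discard the oldest exchange first
    return kept

kept = fit_history(
    [("old question", "a " * 3000), ("recent question", "recent answer")],
    "next?",
)
```

Summarizing old turns instead of dropping them (as the comment suggests) stretches the budget further, at the cost of an extra model call per summary.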

This video explains it quite nicely - https://www.youtube.com/watch?v=VW5LBavIfY4&feature=youtu.be


dan-dean commented on May 19, 2024

I am working on a version that more explicitly conveys the idea to Llama that there is a single-threaded conversation and its job is only to respond to the user. Curious whether anybody else has made any kind of significant progress with this.

https://github.com/deep-diver/Alpaca-LoRA-Serve

Implements a functional context system and has a demo running on a cloud instance, which shows promise. My local testing shows that alpaca.cpp doesn't appear to remember history, which makes me confused about the -c and --ctx_size params for alpaca.cpp, because they clearly don't work.
Their (LoRA-Serve) implementation is targeted towards GPUs with the VRAM capacity to run these models, unlike the CPU-based alpaca.cpp. Seeing it refactored for CPU applications would be nice.
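For what it's worth, in llama.cpp-style programs the -c / --ctx_size parameter sets an upper bound on how many token positions the model can attend to; it does not by itself make the driver program feed earlier turns back in. A simplified model of that bound, using a plain list where real implementations keep per-layer key/value tensors:

```python
from collections import deque

class ContextWindow:
    """Toy model of a fixed-size context: at most n_ctx token positions
    are visible; older tokens are evicted as new ones arrive."""

    def __init__(self, n_ctx=512):
        self.tokens = deque(maxlen=n_ctx)  # oldest entries fall off the left

    def feed(self, token_ids):
        self.tokens.extend(token_ids)

    def visible(self):
        """Token positions the model can still attend to."""
        return list(self.tokens)

ctx = ContextWindow(n_ctx=4)
ctx.feed([1, 2, 3])
ctx.feed([4, 5, 6])  # token 1 and 2 are now outside the window
```

So even with a large -c value, the program still has to keep the conversation inside the window; whether alpaca.cpp's chat loop actually does that is exactly what this thread is questioning.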

