
Comments (7)

salmon-coder commented on May 19, 2024

If you're willing to manually retype the conversation history, then you can get your question answered, like so:

[Screenshot from 2023-03-18 11-11-05]
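A minimal sketch of that manual workaround, assuming the whole transcript still fits in the context window; `build_prompt` is a hypothetical helper, not part of alpaca.cpp:

```python
# Instead of sending only the new question, prepend the prior turns so the
# model sees them as one prompt. This is the "retype the history" trick done
# in code rather than by hand.

def build_prompt(history, question):
    """Join prior (question, answer) turns with the new question."""
    lines = []
    for q, a in history:
        lines.append(f"User: {q}")
        lines.append(f"Assistant: {a}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")  # leave the reply slot open for the model
    return "\n".join(lines)

history = [("What is the capital of France?", "Paris.")]
prompt = build_prompt(history, "What is its population?")
```

The resulting string would then be passed to the model as a single prompt, exactly as if the user had retyped the earlier exchange.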

from alpaca.cpp.

athu16 commented on May 19, 2024

If you're willing to manually retype the conversation history, then you can get your question answered, like so:

Thanks! I guess that'll do for now.
Hoping that it is integrated within the program itself... I don't think the original llama.cpp repo has this issue.


salmon-coder commented on May 19, 2024

After playing around with it some more, I'm somewhat more confused -- but I no longer think that the model doesn't have 'conversational memory'.

Also, the chat.cpp file is identical in this repo and the one it was forked from, which suggests the chat logic is the same.

[Screenshot from 2023-03-18 17-28-39]

Yet even if it can sometimes 'remember' previous conversation, it does so only very intermittently, so IMO your original report is basically correct: there is a lot of engineering work we can do here to improve the model's conversational memory.


salmon-coder commented on May 19, 2024

I am working on a version that more explicitly conveys the idea to Llama that there is a single-threaded conversation and its job is only to respond to the user. Curious whether anybody else has made any kind of significant progress with this.
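One way to convey that framing is a fixed instruction header ahead of the transcript. The wording and helper below are hypothetical illustrations, not code from this repo or from salmon-coder's work in progress:

```python
# A fixed header tells the model it is one participant in a single
# conversation and should only answer the most recent user message,
# rather than continuing the text in an open-ended way.

HEADER = (
    "Below is a conversation between a user and an assistant. "
    "The assistant replies only to the user's most recent message "
    "and never writes the user's lines.\n\n"
)

def framed_prompt(transcript, question):
    """Wrap the running transcript and new question in the framing header."""
    return HEADER + transcript + f"\nUser: {question}\nAssistant:"
```

Whether this kind of framing actually improves the model's turn-taking would need empirical testing against the plain concatenation approach.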


abrahambone commented on May 19, 2024

I have also seen a few cases of indisputable conversational memory across 2 or 3 separate questions, but it's been very rare. No time to work on this myself, unfortunately, but I look forward to seeing what folks come up with to make it a properly conversational tool.


kha84 commented on May 19, 2024

I guess the biggest problem will be that "emulated" conversational memory, i.e. adding the whole of (or just a summary of) your previous conversation as part of your prompt, will quickly hit the limit on the number of tokens this model can take as input.
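A common mitigation is to drop the oldest turns until the prompt fits a token budget. This is a rough sketch assuming a 2048-token context (the size used by the original LLaMA weights); the whitespace split is a crude stand-in for the model's real tokenizer:

```python
def fit_history(turns, new_question, n_ctx=2048, reserve=256):
    """Drop oldest (question, answer) turns until the prompt fits the window.

    `reserve` leaves room in the window for the model's reply.
    """
    def n_tokens(text):
        # Crude approximation; a real implementation would use the
        # model's tokenizer to count tokens exactly.
        return len(text.split())

    budget = n_ctx - reserve - n_tokens(new_question)
    kept = list(turns)
    while kept and sum(n_tokens(q) + n_tokens(a) for q, a in kept) > budget:
        kept.pop(0)  # discard the oldest exchange first
    return kept

kept = fit_history(
    [("old question", "a " * 3000), ("recent question", "recent answer")],
    "next?",
)
```

Summarizing old turns instead of dropping them (as the comment suggests) stretches the budget further, at the cost of an extra model call per summary.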

This video explains it quite nicely - https://www.youtube.com/watch?v=VW5LBavIfY4&feature=youtu.be


dan-dean commented on May 19, 2024

I am working on a version that more explicitly conveys the idea to Llama that there is a single-threaded conversation and its job is only to respond to the user. Curious whether anybody else has made any kind of significant progress with this.

https://github.com/deep-diver/Alpaca-LoRA-Serve

Implements a functional context system and has a demo running on a cloud instance, which shows promise. My local testing shows that alpaca.cpp doesn't appear to remember history, which makes me confused about the -c and --ctx_size params for alpaca.cpp, because they clearly don't work.
Their (LoRA-Serve) implementation is targeted towards GPUs with the VRAM capacity to run these models, unlike the CPU-based alpaca.cpp. Seeing it refactored for CPU applications would be nice.
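For what it's worth, in llama.cpp-style programs the -c / --ctx_size parameter sets an upper bound on how many token positions the model can attend to; it does not by itself make the driver program feed earlier turns back in. A simplified model of that bound, using a plain list where real implementations keep per-layer key/value tensors:

```python
from collections import deque

class ContextWindow:
    """Toy model of a fixed-size context: at most n_ctx token positions
    are visible; older tokens are evicted as new ones arrive."""

    def __init__(self, n_ctx=512):
        self.tokens = deque(maxlen=n_ctx)  # oldest entries fall off the left

    def feed(self, token_ids):
        self.tokens.extend(token_ids)

    def visible(self):
        """Token positions the model can still attend to."""
        return list(self.tokens)

ctx = ContextWindow(n_ctx=4)
ctx.feed([1, 2, 3])
ctx.feed([4, 5, 6])  # token 1 and 2 are now outside the window
```

So even with a large -c value, the program still has to keep the conversation inside the window; whether alpaca.cpp's chat loop actually does that is exactly what this thread is questioning.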

