
woheller69 commented on July 28, 2024

It seems that the model always has to re-evaluate its own previous answer as part of the prompt.
In the following examples my own new prompt was quite short each time, yet the number of tokens in the prompt eval is always the number of tokens in the previous answer plus a few more (e.g. 88 prompt tokens after a 59-token answer, 163 after a 154-token one):

llama_print_timings: load time = 2561.54 ms
llama_print_timings: sample time = 157.69 ms / 59 runs ( 2.67 ms per token, 374.15 tokens per second)
llama_print_timings: prompt eval time = 143124.12 ms / 356 tokens ( 402.03 ms per token, 2.49 tokens per second)
llama_print_timings: eval time = 28492.60 ms / 58 runs ( 491.25 ms per token, 2.04 tokens per second)
llama_print_timings: total time = 62822.10 ms / 414 tokens
Inference terminated
Llama.generate: prefix-match hit

llama_print_timings: load time = 2561.54 ms
llama_print_timings: sample time = 429.62 ms / 154 runs ( 2.79 ms per token, 358.46 tokens per second)
llama_print_timings: prompt eval time = 8277.50 ms / 88 tokens ( 94.06 ms per token, 10.63 tokens per second)
llama_print_timings: eval time = 75161.81 ms / 153 runs ( 491.25 ms per token, 2.04 tokens per second)
llama_print_timings: total time = 86088.57 ms / 241 tokens
Inference terminated
Llama.generate: prefix-match hit

llama_print_timings: load time = 2561.54 ms
llama_print_timings: sample time = 195.41 ms / 68 runs ( 2.87 ms per token, 347.98 tokens per second)
llama_print_timings: prompt eval time = 15788.26 ms / 163 tokens ( 96.86 ms per token, 10.32 tokens per second)
llama_print_timings: eval time = 33127.41 ms / 67 runs ( 494.44 ms per token, 2.02 tokens per second)
llama_print_timings: total time = 50100.81 ms / 230 tokens

Shouldn't the model already have the tokenization of its previous answer in the cache? Or could it be that the applied chat template differs slightly from what the model actually produced, so the prefix match no longer recognises it?
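
To make that suspicion concrete, here is a minimal sketch (not llama-cpp-agent's actual code; the model path and tag strings are made-up placeholders) showing how a single extra character inserted by a template changes the token sequence and defeats llama.cpp's prefix matching from that point on:

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", verbose=False)  # hypothetical path

answer = "Paris is the capital of France."

# What actually went through the model during the previous turn:
as_cached = "<|user|> Capital of France? <|assistant|>" + answer
# What the chat template rebuilds for the next turn (one extra space):
as_templated = "<|user|> Capital of France? <|assistant|> " + answer

a = llm.tokenize(as_cached.encode("utf-8"))
b = llm.tokenize(as_templated.encode("utf-8"))

# The prefix match only survives up to the first differing token;
# everything after it (i.e. the whole previous answer) is re-evaluated.
shared = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y),
              min(len(a), len(b)))
print(f"shared prefix: {shared} tokens, re-evaluated: {len(b) - shared}")
```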


woheller69 commented on July 28, 2024

Maybe related to this?
abetlen/llama-cpp-python#893 (comment)
My guess is that the chat template differs from the one used in the model's response (maybe just a \n or whatever), and it therefore does not recognise the history anymore.
So every answer has to be re-processed when it is inserted via the template in a multi-turn conversation.
Can't we just apply the template to new requests only, and keep a record of the history exactly as the model already knows it? For the next request we would send that exact history plus the new request wrapped in the template (see the sketch below).
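
A rough sketch of that idea, using llama-cpp-python's raw completion API instead of a chat template for the whole history (the tag strings and wrap_user_turn helper are made up for illustration; this is not llama-cpp-agent's real API):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", verbose=False)  # hypothetical path

history = ""  # exact byte-for-byte record of what the model has seen

def wrap_user_turn(text: str) -> str:
    # Hypothetical template applied only to the *new* user turn.
    return f"<|user|>\n{text}\n<|assistant|>\n"

def ask(question: str) -> str:
    global history
    prompt = history + wrap_user_turn(question)
    out = llm.create_completion(prompt, max_tokens=256, stop=["<|user|>"])
    answer = out["choices"][0]["text"]
    # Append prompt and answer verbatim, never re-templated, so the next
    # call's prompt is an exact extension of this one and llama.cpp's
    # prefix match can reuse the entire KV cache.
    history = prompt + answer
    return answer

print(ask("What is the capital of France?"))
print(ask("And its population?"))
```

Since the history string only ever grows by exactly the text that was sent to and received from the model, the cached prefix should stay valid and only the newly wrapped turn would need a prompt eval.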

