Comments (8)

pseudotensor avatar pseudotensor commented on June 7, 2024

add_chat_history_to_context is True by default.

In what case are you having an issue with history? CLI mode?
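For reference, a minimal CLI invocation that leaves that default in effect (a sketch; the model choice is illustrative, borrowed from the full command later in this thread):

```bash
# add_chat_history_to_context defaults to True, so passing it explicitly is optional
python generate.py --cli=True \
  --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --add_chat_history_to_context=True
```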

planetMatrix avatar planetMatrix commented on June 7, 2024

Yes, CLI mode.

pseudotensor avatar pseudotensor commented on June 7, 2024

Try again; I fixed some excessive limits. Also consider making max_seq_len larger, e.g. 2048, and max_new_tokens smaller, e.g. 512.
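For example (a sketch using those suggested values; the base model is the one from the full command later in this thread):

```bash
python generate.py --cli=True \
  --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --max_seq_len=2048 --max_new_tokens=512
```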

pseudotensor avatar pseudotensor commented on June 7, 2024

[screenshot]

planetMatrix avatar planetMatrix commented on June 7, 2024

Perfect! This is working marvelously.

However, is there any way to enable it when --langchain_mode=UserData?

Many thanks for your unwavering support; I really appreciate it.

pseudotensor avatar pseudotensor commented on June 7, 2024

Is it not working with langchain mode enabled?

planetMatrix avatar planetMatrix commented on June 7, 2024

No, unfortunately it is not.

pseudotensor avatar pseudotensor commented on June 7, 2024

Your choice of CLI options is too limiting. The second round of chat would already be quite large in token count, especially if you allow 1024 new tokens when 1024 is also the maximum sequence length.

Also, your choice of llamacpp_dict is very limiting in terms of speed; it is about 10x slower than the defaults.

Also, you don't need to pass load_4bit (it is unused for llama.cpp models) or add_chat_history_to_context (the default is True).
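As a rough worked example of the context budget (illustrative arithmetic, ignoring the few tokens consumed by special tokens and prompt templates): with max_seq_len=1024 and max_new_tokens=1024, 1024 - 1024 = 0 tokens are left for the system prompt, chat history, and retrieved document chunks, so history gets dropped. With max_seq_len=2048 and max_new_tokens=128 as below, 2048 - 128 = 1920 tokens remain.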

A better version that is fast and can handle the long history with documents is below (Zephyr is not good for chat history + RAG):

```bash
CUDA_VISIBLE_DEVICES=0 python generate.py --cli=True --langchain_mode=UserData --score_model=None \
  --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf \
  --prompt_type=mistral --max_seq_len=2048 --max_new_tokens=128 --top_k_docs=3 \
  --metadata_in_context=False --chunk_size=128 --add_disk_models_to_ui=False \
  --pre_load_embedding_model=True --embedding_gpu_id=cpu --cut_distance=10000 \
  --hf_embedding_model=BAAI/bge-base-en-v1.5
```

[screenshot]

But as you can see, it still can't really tell the difference between content that arrived in a prior user prompt via documents and document prompting versus what you yourself typed as the user prompt, because we add additional prompting when doing document Q&A, not just your question.
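Schematically (a hypothetical template for illustration only, not h2ogpt's exact wording), each document Q&A turn the model actually sees looks roughly like:

```
<instructions: answer using only the document sources below>
<retrieved chunk 1>
<retrieved chunk 2>
<retrieved chunk 3>
<docQA query prefix> + <your question>
```

so the document text and the docQA instructions are merged into the same user turn as your question, and in later turns the model cannot cleanly separate the two.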
