Comments (8)
add_chat_history_to_context is True by default.
In what case are you having issues with history? CLI mode?
from h2ogpt.
yes cli
Try again, I fixed some excessive limits. But also consider making max_seq_len larger to 2048 and max_new_tokens smaller like 512.
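The reasoning behind those numbers is simple context arithmetic: everything in the prompt (system prompt, chat history, documents, and the new question) must fit in max_seq_len minus max_new_tokens. A minimal sketch (numbers are illustrative):

```python
def prompt_budget(max_seq_len: int, max_new_tokens: int) -> int:
    """Tokens left over for system prompt + chat history + user question."""
    return max_seq_len - max_new_tokens

# With a 1024-token context and 1024 new tokens allowed, nothing is left
# for the prompt, so the second chat round overflows immediately.
print(prompt_budget(1024, 1024))  # 0

# With max_seq_len=2048 and max_new_tokens=512, 1536 tokens remain for
# history plus the question.
print(prompt_budget(2048, 512))   # 1536
```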
Perfect! This is working marvelously.
However, is there any way to enable it when --langchain_mode=UserData?
Many thanks for your unwavering support, really appreciate it.
Is it not working with langchain mode enabled?
No, it is not, unfortunately.
Your choice of CLI options is too limiting. The 2nd round of chat would involve something quite large in terms of token count, especially if you allow 1024 new tokens while the max sequence length is only 1024.
Also, your choice of llamacpp_dict is very limiting in terms of speed; it's about 10x slower than the defaults.
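For reference, llama.cpp settings are passed through --llamacpp_dict as a Python-dict string. A minimal sketch, assuming the n_gpu_layers and n_batch keys are forwarded to llama-cpp-python (exact keys and the placeholder model URL are illustrative, not verified for your version):

```shell
# Hypothetical tuning sketch; key names assumed from llama-cpp-python.
# n_gpu_layers=100 offloads all layers to the GPU and n_batch=512 speeds up
# prompt processing; overriding these with small values is what makes
# generation slow.
python generate.py --cli=True \
    --base_model=<your-gguf-url> \
    --llamacpp_dict="{'n_gpu_layers': 100, 'n_batch': 512}"
```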
Also, you don't need to pass load_4bit (unused for llama.cpp models) or add_chat_history_to_context (default is True).
A better version that is fast and can handle long history with documents is below (zephyr is not good for chat history + RAG):
CUDA_VISIBLE_DEVICES=0 python generate.py --cli=True --langchain_mode=UserData --score_model=None --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf --prompt_type=mistral --max_seq_len=2048 --max_new_tokens=128 --top_k_docs=3 --metadata_in_context=False --chunk_size=128 --add_disk_models_to_ui=False --pre_load_embedding_model=True --embedding_gpu_id=cpu --cut_distance=10000 --hf_embedding_model=BAAI/bge-base-en-v1.5
But as you see, it still can't really tell the difference between what entered the prior user prompt via documents and document prompting vs. what you typed as the user prompt, because we add additional prompting when doing docQA, not just your question.