Comments (8)
add_chat_history_to_context is True by default.
In what case are you having issues with history? CLI mode?
from h2ogpt.
yes cli
Try again, I fixed some excessive limits. But also consider making max_seq_len larger to 2048 and max_new_tokens smaller like 512.
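The reasoning behind those numbers is simple context arithmetic: everything in the prompt (system prompt, chat history, documents, and the new question) must fit in max_seq_len minus max_new_tokens. A minimal sketch (numbers are illustrative):

```python
def prompt_budget(max_seq_len: int, max_new_tokens: int) -> int:
    """Tokens left over for system prompt + chat history + user question."""
    return max_seq_len - max_new_tokens

# With a 1024-token context and 1024 new tokens allowed, nothing is left
# for the prompt, so the second chat round overflows immediately.
print(prompt_budget(1024, 1024))  # 0

# With max_seq_len=2048 and max_new_tokens=512, 1536 tokens remain for
# history plus the question.
print(prompt_budget(2048, 512))   # 1536
```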
Perfect! This is working marvelously.
However, is there any way to enable it when --langchain_mode=UserData?
Many thanks for your unwavering support, really appreciate it.
Is it not working with langchain mode enabled?
No, it is not, unfortunately.
Your choice of CLI options is too limiting. The 2nd round of chat would involve something quite large in terms of token count, especially if you allow 1024 new tokens while the max sequence length is only 1024.
Also, your choice of llamacpp_dict is very limiting in terms of speed; it's about 10x slower than the defaults.
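For reference, llama.cpp settings are passed through --llamacpp_dict as a Python-dict string. A minimal sketch, assuming the n_gpu_layers and n_batch keys are forwarded to llama-cpp-python (exact keys and the placeholder model URL are illustrative, not verified for your version):

```shell
# Hypothetical tuning sketch; key names assumed from llama-cpp-python.
# n_gpu_layers=100 offloads all layers to the GPU and n_batch=512 speeds up
# prompt processing; overriding these with small values is what makes
# generation slow.
python generate.py --cli=True \
    --base_model=<your-gguf-url> \
    --llamacpp_dict="{'n_gpu_layers': 100, 'n_batch': 512}"
```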
Also, you don't need to pass load_4bit (unused for llama.cpp models) or add_chat_history_to_context (default is True).
A better version that is fast and can handle long history with documents is below (zephyr is not good for chat history + RAG):
CUDA_VISIBLE_DEVICES=0 python generate.py --cli=True --langchain_mode=UserData --score_model=None --base_model=https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf --prompt_type=mistral --max_seq_len=2048 --max_new_tokens=128 --top_k_docs=3 --metadata_in_context=False --chunk_size=128 --add_disk_models_to_ui=False --pre_load_embedding_model=True --embedding_gpu_id=cpu --cut_distance=10000 --hf_embedding_model=BAAI/bge-base-en-v1.5
But as you see, it still can't really tell the difference between what entered the prior user prompt via documents and document prompting vs. what you typed as the user prompt, because we add additional prompting when doing docQA, not just your question.