Comments (45)
Indexing files:
src/khoj/routers/indexer.py
from myai.abn.khoj.
Initial data: how can it be seeded in code instead of entering it manually through the frontend?
http://localhost:42110/server/admin/
Accepted files:
src/khoj/interface/web/chat.html
API to answer chat:
// Generate backend API URL to execute query
let url = `/api/chat?q=${encodeURIComponent(query)}&n=${resultsCount}&client=web&stream=true&conversation_id=${conversationID}&region=${region}&city=${city}&country=${countryName}&timezone=${timezone}`;
// Call specified ABN API
let response = await fetch(url);
let rawResponse = "";
let references = null;
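The same chat endpoint can be exercised from Python. A minimal sketch, assuming the server runs at localhost:42110 and the /api/chat route takes the same query parameters as the web client above (the auth-header shape is my assumption):

```python
from urllib.parse import urlencode

def build_chat_url(base, query, conversation_id, results_count=5,
                   region="", city="", country="", timezone=""):
    """Build the /api/chat URL with the same parameters the web client sends."""
    params = {
        "q": query,
        "n": results_count,
        "client": "web",
        "stream": "true",
        "conversation_id": conversation_id,
        "region": region,
        "city": city,
        "country": country,
        "timezone": timezone,
    }
    return f"{base}/api/chat?{urlencode(params)}"

# Usage against a live server (needs `requests` and a real token):
#   import requests
#   url = build_chat_url("http://localhost:42110", "what is RAG?", "1")
#   response = requests.get(url, headers={"Authorization": "Bearer <khoj-token>"},
#                           stream=True)
```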
Maybe the loading indicator goes here:
Database:
src/khoj/database/models/init.py
Init: maybe change here:
src/khoj/utils/initialization.py
How and when are the models downloaded?
Seems that it will download from HuggingFace at runtime.
Compute Embeddings, Load Pre-computed embeddings:
src/khoj/search_type/text_search.py
src/khoj/processor/conversation/prompts.py
many prompts
Go to the OpenAI settings in the server admin settings to add an OpenAI processor conversation config. This is where you set your API key and server API base URL. The API base URL is optional - it's only relevant if you're using another OpenAI-compatible proxy server.
Go over to configure your chat model options. Set the chat-model field to a supported chat model of your choice. For example, you can specify gpt-4-turbo-preview if you're using OpenAI.
Make sure to set the model-type field to OpenAI.
The tokenizer and max-prompt-size fields are optional. Set them only if you're sure of the tokenizer or token limit for the model you're using. Contact us if you're unsure what to do here.
Configure Offline Chat
No need to set up a conversation processor config!
Go over to configure your chat model options. Set the chat-model field to a supported chat model of your choice. For example, we recommend NousResearch/Hermes-2-Pro-Mistral-7B-GGUF, but any GGUF model on HuggingFace should work.
Make sure to set the model-type to Offline. Do not set openai config.
The tokenizer and max-prompt-size fields are optional. Set them only when using a non-standard model (i.e., not a Mistral, GPT, or Llama 2 model) and you know its token limit.
Successfully configured Khoj with OpenAI:
src/khoj/database/models/init.py
src/khoj/migrations/migrate_processor_config_openai.py
The base URL should not include /chat, because Khoj appends that automatically; adding it yields a duplicated path segment.
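A quick guard for this when accepting user-supplied base URLs; a sketch of my own normalization rule, not Khoj's code:

```python
def normalize_api_base(url: str) -> str:
    """Strip a trailing /chat (and trailing slash) so Khoj can append /chat itself."""
    url = url.rstrip("/")
    if url.endswith("/chat"):
        url = url[: -len("/chat")]
    return url
```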
BadRequestError: Error code: 400 - {'error': {'message': '`response_format` does not support streaming', 'type': 'invalid_request_error'}}
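One way to sidestep that 400 is to drop streaming whenever a structured response_format is requested; a sketch of the workaround (mine, not Khoj's code):

```python
def choose_stream(wants_stream, response_format=None):
    """Disable streaming when response_format is set, since some
    OpenAI-compatible proxies reject the combination (per the 400 above)."""
    if response_format is not None:
        return False
    return wants_stream
```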
PROMPTS:
src/khoj/processor/conversation/prompts.py
src/khoj/configure.py
https://docs.khoj.dev/get-started/setup/
The tokenizer and max-prompt-size fields are optional. Set them only if you're sure of the tokenizer or token limit for the model you're using. Contact us if you're unsure what to do here.
src/khoj/processor/conversation/utils.py
def truncate_messages(
    messages: list[ChatMessage],
    max_prompt_size,
    model_name: str,
    loaded_model: Optional[Llama] = None,
    tokenizer_name=None,
) -> list[ChatMessage]:
    """Truncate messages to fit within max prompt size supported by model"""
    default_tokenizer = "hf-internal-testing/llama-tokenizer"
    try:
        if loaded_model:
            encoder = loaded_model.tokenizer()
        elif model_name.startswith("gpt-"):
            encoder = tiktoken.encoding_for_model(model_name)
        elif tokenizer_name:
            if tokenizer_name in state.pretrained_tokenizers:
                encoder = state.pretrained_tokenizers[tokenizer_name]
            else:
                encoder = AutoTokenizer.from_pretrained(tokenizer_name)
                state.pretrained_tokenizers[tokenizer_name] = encoder
        else:
            encoder = download_model(model_name).tokenizer()
    except:
        if default_tokenizer in state.pretrained_tokenizers:
            encoder = state.pretrained_tokenizers[default_tokenizer]
        else:
            encoder = AutoTokenizer.from_pretrained(default_tokenizer)
            state.pretrained_tokenizers[default_tokenizer] = encoder
        logger.warning(
            f"Fallback to default chat model tokenizer: {tokenizer_name}.\nConfigure tokenizer for unsupported model: {model_name} in Khoj settings to improve context stuffing."
        )
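The truncation policy itself can be sketched without the tokenizer machinery. This shows one plausible policy (drop the oldest messages until the conversation fits the budget); Khoj's actual logic in utils.py may differ, and `count_tokens` is a hypothetical stand-in for `encoder.encode`:

```python
def truncate_oldest_first(messages, max_prompt_size, count_tokens):
    """Drop messages from the start (oldest) until the total token count
    fits the budget. Always keeps at least the final message."""
    kept = list(messages)
    while len(kept) > 1 and sum(count_tokens(m) for m in kept) > max_prompt_size:
        kept.pop(0)
    return kept
```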
Let's try this:
google-bert/bert-base-uncased
https://huggingface.co/docs/transformers/en/main_classes/tokenizer
Oh, in the code: default_tokenizer = "hf-internal-testing/llama-tokenizer"
http://localhost:42110/server/admin/database/agent/1/change/
Upload Files: src/khoj/interface/web/chat.html
Everything (uploads, etc.) is implemented as API endpoints, so I can just exercise the APIs directly.
src/khoj/routers/api_chat.py
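A sketch of pushing a file through the indexing API with requests. The /api/v1/index/update route, the `files` field name, and the client query parameter are assumptions to verify against src/khoj/routers/indexer.py:

```python
def build_index_request(host, token):
    """Build URL and auth headers for the (assumed) indexing endpoint."""
    url = f"{host.rstrip('/')}/api/v1/index/update?client=script"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers

# Usage against a live server (needs `requests`, a running Khoj, and a real token):
#   url, headers = build_index_request("http://localhost:42110", "kk-...")
#   with open("notes.pdf", "rb") as f:
#       requests.post(url, headers=headers,
#                     files=[("files", ("notes.pdf", f, "application/pdf"))])
```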
Sync/index data:
Simply edit this config file and let Khoj Desktop do the job.
{
"files": [
{
"path": "/home/thanhson/Downloads/RFP#2024-Amgen-01 Biding App Upgrade.pdf"
}
],
"folders": [],
"khojToken": "kk-yHlnpZ4zKsw-ocgn9_WxUPRkgl4Fa3cECmNACl4XmVA",
"hostURL": "https://app.khoj.dev",
"lastSync": []
}
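Adding a file to that sync list can be scripted instead of editing the JSON by hand. A sketch; the config file's location varies by platform, so its path is passed in explicitly:

```python
import json
from pathlib import Path

def add_file_to_sync(config_path, file_path):
    """Append a file entry to the Khoj Desktop sync config, skipping duplicates."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    entries = config.setdefault("files", [])
    if not any(e.get("path") == file_path for e in entries):
        entries.append({"path": file_path})
    config_path.write_text(json.dumps(config, indent=2))
    return config
```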
The backup seems to work, but where is it stored?
@indexer.
@auth_router.
@web_client.
@subscription_router.
@notion_router.
@api_chat.
@api_agents.
Maybe change this has_documents flag so the app initializes with seed documents:
has_documents
Embeddings:
src/khoj/processor/embeddings.py
Text Search:
src/khoj/search_type/text_search.py
I created my own embeddings and search at ABNScripts
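The core of the text search above is cosine similarity between query and document embeddings. A dependency-free sketch of that ranking step, assuming the embeddings are already computed (Khoj itself computes them with sentence-transformers models in embeddings.py):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_emb, doc_embs, k=3):
    """Rank document embeddings by similarity to the query; return (index, score)."""
    scored = [(i, cosine_similarity(query_emb, e)) for i, e in enumerate(doc_embs)]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]
```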
Prompts, nice:
/home/thanhson/Workspace/myai.abn.khoj/src/khoj/processor/conversation/prompts.py
from langchain.prompts import PromptTemplate
Personality
--
personality = PromptTemplate.from_template(
"""
You are ABNCopilot, a smart, inquisitive and helpful personal assistant.
Use your general knowledge and past conversation with the user as context to inform your responses.
You were created by AbnAsia.org. with the following capabilities:
- You CAN REMEMBER ALL NOTES and PERSONAL INFORMATION FOREVER that the user ever shares with you.
- Users can share files and other information with you using the Khoj Desktop, Obsidian or Emacs app. They can also drag and drop their files into the chat window.
- You CAN generate images, look-up real-time information from the internet, set reminders and answer questions based on the user's notes.
- Say "I don't know" or "I don't understand" if you don't know what to say or if you don't know the answer to a question.
- Make sure to use the specific LaTeX math mode delimiters for your response. LaTeX math mode specific delimiters as following:
  - inline math mode: \\( and \\)
  - display math mode: insert linebreak after opening $$, \\[ and before closing $$, \\]
- Ask crisp follow-up questions to get additional context, when the answer cannot be inferred from the provided notes or past conversations.
- Sometimes the user will share personal information that needs to be remembered, like an account ID or a residential address. These can be acknowledged with a simple "Got it" or "Okay".
- Provide inline references to quotes from the user's notes or any web pages you refer to in your responses in markdown format. For example, "The farmer had ten sheep. 1". ALWAYS CITE YOUR SOURCES AND PROVIDE REFERENCES. Add them inline to directly support your claim.
Note: More information about you, the company or ABN apps for download can be found at https://abnasia.org.
Today is {current_date} in UTC.
""".strip()
)
custom_personality = PromptTemplate.from_template(
"""
You are {name}, an Ai agent from ABN Asia.
Use your general knowledge and past conversation with the user as context to inform your responses.
https://docs.khoj.dev/get-started/setup
Configure Chat Model: Setup which chat model you'd want to use. Khoj supports local and online chat models.
MULTIPLE CHAT MODELS
Add a ServerChatSettings with the Default and Summarizer fields set to your preferred chat model via the admin panel. Otherwise Khoj defaults to using the first chat model in your ChatModelOptions for all non-chat response generation tasks.
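That fallback can be pictured as a small resolver. A sketch of the described behavior only, not Khoj's actual code (the field names are assumptions):

```python
def resolve_chat_model(server_settings, chat_model_options):
    """Pick the chat model for non-chat tasks: the ServerChatSettings default
    if configured, else the first entry in ChatModelOptions."""
    if server_settings and server_settings.get("default"):
        return server_settings["default"]
    if chat_model_options:
        return chat_model_options[0]
    return None
```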
Related Issues (20)
- Where to change this?
- Pre-initialize some documents by industry.
- Assign users by industry.
- Fix this
- Change the Agent Description text here.
- Agent: link the knowledge base there too
- Merging commit failed
- Preconfigure Khoj with a set of data
- Script to index a file and push to Khoj's database.
- Install something to view the Postgres database.
- Install LiteLLM and hide GROQ behind that
- How does this query RAG? I need to know that to improve my AI worker bot.
- Khoj error with LiteLLM and GROQ
- Khoj Database Problem
- Extract the transformers here to use in my AI bots
- Ollama with Khoj
- Why this database authentication failure?
- This autokills: https://chat.abnasia.org/
- Try to run the compiled Khoj to see if the LLAMA problems are there or not.