Advanced Retrieval-Augmented Generation (RAG) System

This project implements an Advanced RAG system designed to work on a regular PC using all free resources, leveraging various APIs and tools to achieve this.

Models Used

Embedding Model: Utilized nomic-embed-text-v1 using HuggingFace API.
Reranker: Utilized rerank-english-v2.0 using Cohere API.
Language Model (LLM): Leveraged Groq API with llama3-70b-8192.

System Overview

The RAG system consists of the following components:

Chunking and Embedding:

Text data is chunked into manageable pieces. Each chunk is embedded using a model from HuggingFace. Embeddings are stored in a vector database (ChromaDB).

Retrieval and Reranking:

Relevant chunks are retrieved from ChromaDB based on the query. Retrieved chunks are reranked using the Cohere API to ensure the most relevant chunks are prioritized.

Response Generation:

The top-ranked chunks are passed to the Llama model (via Groq API) to generate a coherent and relevant response.

How to start

Clone the repository

git clone https://github.com/AnasAber/RAG_in_CPU.git

Install the dependencies

pip install -r requirements.txt

Set up the setup.py file

py setup.py install

Set up the environment variables

export GROQ_API_KEY="your_groq_api_key"
export COHERE_API_KEY="your_cohere_api_key"
export HUGGINGFACE_API_KEY="your_hugging

Run the app.py file

python app.py

The reason why I'm using a virtual environment is to avoid any conflicts with the dependencies (I had to manually change things in configuration files), and to make sure that the project runs smoothly.

This project's RAG uses semantic search using ChromaDB, I'll work on doing a combination of Hybrid Search and a HyDE following the best practices of RAG mentioned in the following paper: link

If you encounter an error just hit me up, make a pull request, or report an issue, and I'll happily respond.

Disadvantages

For cohere API, it's free for testing and unlimited, but not for production use as it's paid

Next goals

See if there's a fast and good alternative to cohere api
Evaluating the performance of this RAG pipeline
Implement a combination of Hybrid Search and HyDE
Add Repacking after Reranking, and before giving the prompt back to the model

anasaber / rag_in_cpu Goto Github PK

rag_in_cpu's Introduction

Advanced Retrieval-Augmented Generation (RAG) System

Models Used

System Overview

Chunking and Embedding:

Retrieval and Reranking:

Response Generation:

How to start

Disadvantages

Next goals

rag_in_cpu's People

Contributors

Stargazers

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent