This is a test implementation of RAG (retrieval-augmented generation) that uses a PDF for document retrieval with the Gemini API. It uses ChromaDB as the vector database to store embeddings of the uploaded PDF. This initial version does not cover edge cases.
Create a `.env` file with your `API_KEY` for Google's Gemini.
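As a rough illustration of how the key could be read at startup, here is a minimal standard-library sketch. It assumes the `.env` file holds a single line of the form `API_KEY=...`; the helper names `load_env_file` and `get_api_key` are hypothetical and are not taken from this repo, which may well use a package such as `python-dotenv` instead.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("API_KEY") or load_env_file(path).get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY not found; add it to the .env file")
    return key
```

Keeping the key in `.env` rather than in the source means it stays out of version control, provided `.env` is listed in `.gitignore`.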
$ pip install -r requirements.txt
$ python gemini.py
The query can be changed in the `gemini.py` file.
(`utils.py` contains the core logic.)
Text extraction is done with the `pypdf` library, which lets us extract the contents of a PDF as a single string.
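The extraction step might look roughly like the sketch below. The function names `pages_to_text` and `load_pdf` are hypothetical (the actual code lives in `utils.py` and may differ); the only `pypdf` calls used are `PdfReader(path)`, its `pages` attribute, and `page.extract_text()`.

```python
def pages_to_text(reader) -> str:
    """Join the text of every page in a pypdf-style reader into one string."""
    # extract_text() can yield an empty string for image-only pages,
    # so fall back to "" to keep the join safe.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def load_pdf(path: str) -> str:
    """Read a PDF from disk and return its full text."""
    # Imported lazily so pages_to_text can be used without pypdf installed.
    from pypdf import PdfReader

    return pages_to_text(PdfReader(path))
```

Note that scanned PDFs without a text layer will come back empty, since `pypdf` does not perform OCR.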
Since LLMs are restricted by their context length, we divide the text into small chunks. For simplicity, the chunks are made by splitting the text paragraph by paragraph.
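A minimal sketch of paragraph-based chunking, assuming a paragraph break is one or more blank lines (the exact delimiter used in `utils.py` may differ, and `split_into_chunks` is a hypothetical name):

```python
import re


def split_into_chunks(text: str) -> list[str]:
    """Split text into paragraph-sized chunks, dropping empty ones."""
    # A newline, optional whitespace, then another newline marks a break.
    paragraphs = re.split(r"\n\s*\n", text)
    return [p.strip() for p in paragraphs if p.strip()]
```

Splitting on paragraphs keeps each chunk semantically coherent, but it gives no upper bound on chunk size; a very long paragraph would still need a further split by length.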
The embeddings are stored in ChromaDB under a chosen `collection_name`, which can be used later to load this collection.
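Storing the chunks could be sketched as follows. `store_chunks` is a hypothetical helper, and the positional string ids are an assumption. The function takes an already constructed Chroma client, for example `chromadb.PersistentClient(path="./chroma_db")`, where the path is only a placeholder. No embedding function is passed here, so Chroma would fall back to its default one; the repo may instead configure a Gemini-based embedding function.

```python
def store_chunks(client, chunks, collection_name):
    """Add text chunks to a Chroma collection, creating it if needed."""
    collection = client.get_or_create_collection(name=collection_name)
    # Chroma needs a unique string id per document; using the chunk's
    # position is an assumption made for this sketch.
    ids = [str(i) for i in range(len(chunks))]
    collection.add(documents=list(chunks), ids=ids)
    return collection
```

With a persistent client, the collection is written to disk at the given path, so the PDF does not need to be re-embedded on every run.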
To load the collection again, use its `collection_name` and the path of the stored database.
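Loading and querying a stored collection might look like the sketch below. `load_collection` and `retrieve` are hypothetical names, and `n_results=3` is an arbitrary default. The client is assumed to be created with the same path that was used when storing, for example `chromadb.PersistentClient(path="./chroma_db")` with a placeholder path.

```python
def load_collection(client, collection_name):
    """Fetch an existing collection by name from a Chroma client."""
    return client.get_collection(name=collection_name)


def retrieve(collection, query, n_results=3):
    """Return the stored chunks most similar to the query text."""
    results = collection.query(query_texts=[query], n_results=n_results)
    # query() returns one list of documents per query text; only one
    # query is sent here, so take the first list.
    return results["documents"][0]
```

If the collection was created with a custom embedding function, the same function should be supplied when loading it, otherwise the query would be embedded differently from the stored chunks.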