
ChainPDF

A Query-Answer chatbot for PDFs using local Large language and Embeddings models. Please read this readme fully before using.


Table of Contents

  • Features
  • Installation
  • GPU
  • Usage
  • How it works
  • Code Example
  • Notebook Example
  • Contributing

Features

  • Conversational Capabilities with PDFs:

Uses LangChain's conversational retrieval chain to retrieve information from PDFs and answer user queries, while keeping a memory of the conversation and of the sources behind each answer.

  • Fully Local:

Apart from downloading the necessary libraries, this chatbot can be used offline, ensuring privacy and accessibility even without an internet connection.

  • Fully Customizable:

Easily customize various parameters, prompt templates, models, and more to tailor the chatbot's behavior to your specific needs.

  • GPU Support:

Leverage your GPU to accelerate the text-to-embedding process and text generation (applicable when using a local LLM backend).

  • Streamlit GUI:

An example Streamlit GUI is provided for a user-friendly interface.

  • Multiple LLM Backend Choices:

Choose between the textgen webui API, the OpenAI API (or any OpenAI-compatible server), or llama.cpp used directly as your language model backend.

  • Multiple Embeddings Models Choices:

Select either Sentence Transformer or Instructor embedding models for different use cases (see the sketch after this list).
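
As a rough illustration of how these backend and embedding choices map onto LangChain wrappers, here is a minimal sketch; the URLs, model names, and file paths below are assumptions, not project defaults:

# Illustrative only: the backend and embedding choices listed above, via LangChain wrappers.
from langchain.llms import TextGen, OpenAI, LlamaCpp
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings

# LLM backends
textgen_llm = TextGen(model_url="http://localhost:5000")           # textgen webui API
openai_llm = OpenAI(openai_api_base="http://localhost:8000/v1",    # OpenAI-compatible server
                    openai_api_key="not-needed")
local_llm = LlamaCpp(model_path="models/model.gguf")               # llama.cpp used directly

# Embedding models
sentence_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
instructor_embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")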

Installation

On Windows:

  • Requirements:
    • Python 3.10+: Ensure you have Python 3.10 or newer installed.
    • Git: Install Git. Remember to select "Add to PATH" during installation for both Python and Git.
  1. Clone the Repository:
    git clone https://github.com/sebaxzero/juridia
    
  2. Navigate to the Project Directory:
    cd juridia
    
  3. Create a Virtual Environment (Optional but recommended):
    • You can create a Python virtual environment using the following command:
      python -m venv "venv"
      
    • Activate the created venv using:
      call "venv\Scripts\activate.bat"
      
  • Optionally, you can use the provided create_conda_env.bat file to create a Miniconda environment, which also installs the requirements, so you can skip step 4. This takes some time and requires a CUDA-compatible GPU.
  4. Install Requirements:
    pip install -r requirements.txt
    

OpenAI API and llama.cpp

These libraries are not included in the requirements.txt.

  • To install the OpenAI library, use the following command:
    pip install openai
    
  • To install llama.cpp, use the following command:
    pip install llama-cpp-python
    

NOTE: This will enable llama.cpp to be used on the CPU. For GPU acceleration, see the next section.

GPU

For GPU usage, I recommend using the provided create_conda_env.bat file. It creates a Miniconda virtual environment and installs torch compiled with CUDA alongside the requirements, which accelerates the text-to-embedding process (this can take hours on the CPU with large amounts of data).
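
A quick way to confirm that the CUDA build of torch is actually in use (an illustrative check, not part of the project):

# Illustrative check: confirm PyTorch was installed with CUDA support.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))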

To install llama.cpp with GPU acceleration, use the following command:

SET LLAMA_CLBLAST=1 && SET CMAKE_ARGS="-DLLAMA_CLBLAST=on" && SET FORCE_CMAKE=1 && python -m pip install llama-cpp-python

If you installed TextGen Webui with its one-click installers, you can reuse its conda environment to run this code, so there is no need to download another copy of the (large) CUDA libraries. To do this, open cmd_windows.bat and follow steps 1, 2, and 4.

Usage

To run the application, activate the environment created by your chosen installation method (for example, cmd_windows.bat if you installed using the TextGen Webui environment, or init_conda_env.bat for the provided conda environment), then use the following command:

streamlit run st_interface_en.py

How it works

ChainPDF uses an embedding model to create embeddings from the provided documents and stores them in a vector store (see example). When a user asks a question, the chatbot passes the question to a retriever, which retrieves relevant data from the vector store. The retrieved information is then passed to the QA chain, which obtains a response from the selected LLM backend alongside the retrieved pieces of information.
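
Below is a minimal sketch of that pipeline using LangChain primitives. The model names, file paths, and chunking parameters are illustrative assumptions, not the project's defaults; the actual implementation is wrapped by the Chatbot class shown in the next section.

# Minimal, illustrative sketch of the embed -> store -> retrieve -> answer pipeline.
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import LlamaCpp
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

# 1. Load the PDF and split it into chunks.
pages = PyPDFLoader("Documents/example.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)

# 2. Embed the chunks and store them in a vector store.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)

# 3. Build a conversational retrieval chain on top of a local llama.cpp model.
llm = LlamaCpp(model_path="models/model.gguf")  # path is an assumption
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key="answer")
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    memory=memory,
    return_source_documents=True,
)

# 4. Ask a question; the chain returns the answer and the retrieved source chunks.
result = chain({"question": "What is this document about?"})
print(result["answer"])
print(result["source_documents"])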

Code Example

This is a simplified example of the code:

from src.main import Chatbot

# Create a chatbot using the TextGen backend, retrieving k=5 chunks per query.
chatbot = Chatbot(llm='TextGen', k=5, name='example')
while True:
    prompt: str = input('query: ')
    answer, source_documents = chatbot.query(prompt=prompt)
    print("answer:", "\n", answer, "\n\n")

This code reads any documents (.pdf or .txt) placed in the ./Documents directory and saves the generated vector store in ./Sessions/example/Index. For each query, it retrieves 5 relevant chunks of information and passes them to the chain, which sends the request to the TextGen webui API to obtain an answer.

Notebook Example

'PDF langchain example.ipynb' (also available to open in Colab) is a simplified version of the code for a better understanding of how it works; it is not meant for actual use.

Contributing

Contributions are welcome! Feel free to contribute to this project and make it even better.
