Big_Data_Project

Overview

This is a Query Augmented Generation application that improves SQL query generation using Llama3-8B. The application connects to a PostgreSQL database and lets users query its contents through a natural language interface. The UI is built with the Streamlit framework.
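To make the idea concrete, here is a minimal sketch of how a schema description and a user question could be combined into a single prompt for the model. The function name `build_sql_prompt`, the prompt wording, and the schema format are all assumptions for illustration; the prompt actually used by the application is not shown in this README.

```python
def build_sql_prompt(schema, question):
    """Combine a schema description and a user question into one prompt string.

    `schema` maps table names to lists of column names (hypothetical format).
    """
    lines = ["You translate questions into SQL queries.", "Schema:"]
    for table, columns in schema.items():
        # One line per table, e.g. "- orders(id, total)".
        lines.append(f"- {table}({', '.join(columns)})")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL query.")
    return "\n".join(lines)
```

Giving the model the table and column names alongside the question is what lets it refer to real identifiers instead of guessing them.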

Setup Instructions

Follow these steps to set up and run the application:

Prerequisites

  1. Python 3.x installed on your system.
  2. pip package manager installed.
  3. Hadoop and Spark installed on your system (an installation guide for Windows is available at "https://medium.com/@deepaksrawat1906/a-step-by-step-guide-to-installing-pyspark-on-windows-3589f0139a30").
  4. PostgreSQL database instance with necessary access credentials

Installation

  1. Clone the repository:
git clone https://github.com/ADP2000/Big_Data_Project
cd Big_Data_Project
  2. Install dependencies:
pip install -r requirements.txt

Configuration

Set up environment variables:

  1. Create a .env file in the root directory.

  2. Add the following variable to .env:

GROQ_API_KEY = API_KEY_GROQ

Replace API_KEY_GROQ with your Groq API key, which you can obtain from the Groq cloud service.
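If you want to check that the .env file is read correctly, the following is a minimal standard-library sketch. The helpers `load_env_file` and `require_groq_key` are hypothetical and not part of the repository; the application itself may load the file differently (for example with a dotenv library).

```python
from pathlib import Path


def load_env_file(path):
    """Parse KEY = VALUE lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_groq_key(env):
    """Return the Groq API key, or raise if it is missing or still the placeholder."""
    key = env.get("GROQ_API_KEY", "")
    if not key or key == "API_KEY_GROQ":
        raise RuntimeError("GROQ_API_KEY is missing or still set to the placeholder")
    return key
```

Calling `require_groq_key(load_env_file(".env"))` from the project root fails early with a clear message if the placeholder was never replaced.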

Running the Application

  1. Run the Streamlit app:
streamlit run app.py
  2. Your default web browser will open with the application running. If not, visit http://localhost:8501 in your browser.

Usage

  • Upon running the application, you will see a form to enter your PostgreSQL database connection details (DB NAME, DB USER, DB PASSWORD, DB HOST, DB PORT).
  • After submitting valid database connection details, you can interact with the natural language interface to query the database.
  • Example queries you can try:
    • "What tables are in this database, and what are their attributes?"
    • "Compute the average number of rows per table in the database."
    • "Count the total number of rows in the database."
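The connection details entered in the form map naturally onto a PostgreSQL connection URL. The sketch below shows one way to assemble it; `build_postgres_url` is a hypothetical helper, and it is an assumption that the application builds its connection this way.

```python
from urllib.parse import quote_plus


def build_postgres_url(db_name, user, password, host, port):
    """Build a postgresql:// URL from the five form fields."""
    # Percent-encode the credentials so characters such as "@" or ":"
    # in a password do not break the URL structure.
    safe_user = quote_plus(user)
    safe_password = quote_plus(password)
    return f"postgresql://{safe_user}:{safe_password}@{host}:{int(port)}/{db_name}"
```

A URL of this shape can be passed to SQLAlchemy's `create_engine`. Encoding the password matters in practice: without it, a password containing "@" would be misread as the start of the host name.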

Additional Notes

  • Ensure your PostgreSQL database is accessible from the network where you run this application.
  • This application uses Streamlit for the web interface, SQLAlchemy for database connectivity, and Spark for SQL querying.
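To verify network accessibility before starting the app, a quick TCP check like the one below can help. This is a minimal sketch using only the standard library; `can_reach` is a hypothetical helper, and a successful check only shows that the port accepts connections, not that the credentials are valid.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_reach("localhost", 5432)` checks the default PostgreSQL port on the local machine.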
