hazemabdelkawy / sunnahgpt

SunnahGPT is a natural language processing (NLP) project aimed at scraping hadith data from the popular website sunnah.com and applying OpenAI's GPT-3.5 model to generate textual embeddings for each hadith

Home Page: https://hazemabdelkawy.github.io/SunnahGPT/

Languages: Python 0.01%, Jupyter Notebook 0.12%, HTML 99.87%
Topics: deep-learning, embeddings, gpt-3, hadith, islamic, natural-language-generation, natural-language-processing, natural-language-understanding, quran, sunnah

sunnahgpt's Introduction

Abstract

SunnahGPT is a natural language processing (NLP) project that scrapes hadith data from sunnah.com and applies OpenAI's GPT-3.5 model to generate a text embedding for each hadith. The project is designed to give researchers and developers a comprehensive, structured dataset of hadiths, complete with the Arabic text, English translations, reference information, and GPT-3.5 embeddings. We believe that SunnahGPT is a useful step forward for NLP and Islamic studies, giving researchers a practical tool for analyzing and understanding hadith literature.

Introduction

Hadith literature is an essential source of Islamic jurisprudence and theology, offering insights into the teachings of the Prophet Muhammad (PBUH) and the early Islamic community. Despite its importance, hadith literature remains challenging to study and analyze, particularly given the sheer volume of texts and the complexity of the Arabic language. In recent years, natural language processing (NLP) has emerged as a powerful tool for analyzing text data, offering researchers new ways to uncover insights and patterns in large datasets.

SunnahGPT uses NLP techniques to extract and analyze hadith data from sunnah.com. It applies OpenAI's GPT-3.5 model to generate a text embedding for each hadith, and it provides a structured dataset of hadiths with the Arabic text, English translations, reference information, and the embeddings.

We believe that this project has significant potential for advancing research in NLP and Islamic studies and for facilitating new insights into the teachings of the Prophet Muhammad (PBUH) and the early Islamic community.

SunnahGPT Embeddings for NLP projects

This is a Python script to scrape hadiths from sunnah.com and save the extracted data in JSON files. The script uses BeautifulSoup to parse HTML content and OpenAI's text-embedding API to embed the hadiths' text.
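As a rough illustration of the parse-and-save step described above, the sketch below runs BeautifulSoup over an inline HTML string and serializes the result to JSON. The markup, the class names (`hadith`, `english`, `arabic`, `reference`), and the `parse_hadiths` helper are assumptions made for this example; they are not taken from sunnah.com or from the repository's `scraper.py`.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical markup: these class names are illustrative assumptions,
# not the real structure of sunnah.com pages.
SAMPLE_HTML = """
<div class="hadith">
  <div class="english">English text of the hadith</div>
  <div class="arabic">نص عربي</div>
  <div class="reference">Book 1, Hadith 1</div>
</div>
"""

def parse_hadiths(html):
    """Extract one dict per hadith block from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="hadith"):
        records.append({
            "english": block.find("div", class_="english").get_text(strip=True),
            "arabic": block.find("div", class_="arabic").get_text(strip=True),
            "reference": block.find("div", class_="reference").get_text(strip=True),
        })
    return records

records = parse_hadiths(SAMPLE_HTML)
# ensure_ascii=False keeps the Arabic text readable in the JSON output.
json_text = json.dumps(records, ensure_ascii=False, indent=2)
```

Writing `json_text` to a file opened with `encoding="utf-8"` would then give one JSON file per scraped page or collection, depending on how the records are grouped.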

Project Structure

The project consists of the following files:

  • main.py: The main file to run the program.
  • scraper.py: A file containing the HadithScraper class to scrape hadiths from sunnah.com.
  • config.py: A file containing the OpenAI API key.
  • README.md: A file containing the project description and usage instructions.
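For orientation, a `config.py` along these lines would be one way to hold the key. This is only a sketch: the variable name `OPENAI_API_KEY`, the environment-variable lookup, and the `require_api_key` helper are assumptions, since this README does not show the file's actual contents.

```python
import os

# Hypothetical layout for config.py. The name OPENAI_API_KEY is an
# assumption; the repository's actual variable name is not shown here.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")

def require_api_key(key=OPENAI_API_KEY):
    """Return the key, or raise a clear error if it is missing."""
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running main.py")
    return key
```

Reading the key from the environment rather than typing it into the file keeps it out of version control; pasting the key directly into `config.py`, as the usage steps describe, works as well.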

Requirements

The project uses the following Python modules:

  • requests: To fetch the content of web pages.
  • BeautifulSoup (installed as beautifulsoup4): To parse and extract data from HTML.
  • json: To convert data to JSON format (part of the Python standard library, no installation needed).
  • time: To set time intervals between requests (part of the Python standard library, no installation needed).
  • openai: To use OpenAI's text-embedding API.

You can install the third-party dependencies by running the following command in your terminal or command prompt:

user@machine:~$ pip install requests beautifulsoup4 openai
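The `requests` and `time` entries in the list above are typically combined to space out page downloads. The following is a minimal sketch of that pattern, not the repository's code: the function names, the one-second default delay, and the 30-second timeout are assumptions chosen for illustration.

```python
import time

def fetch_pages(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in order, pausing between consecutive requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # wait before every request except the first
        pages[url] = fetch(url)
    return pages

def fetch_with_requests(url):
    """Download one page with requests (imported lazily; needs network access)."""
    import requests
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP error codes
    return response.text

# Example call (performs a real HTTP request, so it is left commented out):
# pages = fetch_pages(["https://sunnah.com"], fetch_with_requests)
```

Passing `fetch` and `sleep` as parameters keeps the loop easy to exercise without network access or real waiting.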

Usage

To use the project, follow these steps:

  • Clone or download the project from GitHub.
  • Open config.py and enter your OpenAI API key.
  • In main.py, specify the main URL of sunnah.com and the directory where the extracted data will be saved.
  • Run the following command:
user@machine:~$ python main.py
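The embedding step that the script performs can be sketched roughly as follows. This assumes version 1.0 or later of the `openai` package; the model name `text-embedding-ada-002`, the choice of the `english` field as input, and both function names are assumptions, because this README does not state which model or which language the project embeds.

```python
def embed_with_openai(text, api_key, model="text-embedding-ada-002"):
    """Request one embedding vector (needs network access and openai>=1.0)."""
    from openai import OpenAI  # imported lazily so the rest runs offline
    client = OpenAI(api_key=api_key)
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding

def add_embeddings(records, embed, text_field="english"):
    """Return copies of the records with an 'embedding' list attached."""
    return [{**rec, "embedding": list(embed(rec[text_field]))} for rec in records]

# Example call (performs a real API request, so it is left commented out):
# enriched = add_embeddings(records, lambda t: embed_with_openai(t, api_key="..."))
```

Because `add_embeddings` takes the embedding function as an argument, the same code could embed the Arabic text instead by passing `text_field="arabic"`.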

License

This project is licensed under the MIT License.

sunnahgpt's People

Contributors

haidark, hazemabdelkawy

sunnahgpt's Issues

Embedding Language

First of all, splendid work. Are the embeddings computed from the original Arabic text or from the corresponding English translations?

Alhumdulillah

May Allah bless you for your efforts and help us all be guided to the straight path 🙏
