Al-Quran Data Repository

This repository collects data related to the Al-Quran: text, translations, explanations (tafseer), metadata, and fonts. The data is gathered from several sources and stored as JSON (except for the fonts) so that it is easy to use from any programming language.

Metadata

Metadata is taken from Tanzil. Several kinds of metadata are available:

  • Surah, the chapter of the Quran. There are 114 surahs in the Quran, each divided into ayahs (verses).
  • Juz, the division of the Quran into 30 parts.
  • Hizb, a division of the Quran in which each juz contains two hizb, giving 60 hizb in total. Each hizb is sometimes further split into four quarters, so the Quran has 240 hizb quarters.
  • Ruku, essentially a paragraph of the Quran. It denotes a group of thematically related verses, so that reciters can identify when to make ruku in Salah without breaking an ongoing topic in the Quranic text.
  • Page, the division of the Quran into pages as laid out in the Medina Mushaf.
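
The arithmetic behind the juz and hizb divisions can be sketched in a few lines of Python. This is only an illustration of the counts stated above: the function name and the idea of a sequential, 1-based quarter index are assumptions made for the example, and the actual surah and ayah where each division starts must still come from the metadata files.

```python
JUZ_COUNT = 30
HIZB_PER_JUZ = 2
QUARTERS_PER_HIZB = 4

HIZB_COUNT = JUZ_COUNT * HIZB_PER_JUZ                # 60 hizb
HIZB_QUARTER_COUNT = HIZB_COUNT * QUARTERS_PER_HIZB  # 240 hizb quarters


def locate_hizb_quarter(index: int) -> tuple[int, int, int]:
    """Map a 1-based hizb quarter index to (juz, hizb, quarter within hizb).

    Hypothetical helper: it assumes quarters are numbered sequentially
    from 1 to 240 in reading order.
    """
    if not 1 <= index <= HIZB_QUARTER_COUNT:
        raise ValueError(f"index must be in 1..{HIZB_QUARTER_COUNT}")
    hizb = (index - 1) // QUARTERS_PER_HIZB + 1
    quarter = (index - 1) % QUARTERS_PER_HIZB + 1
    juz = (hizb - 1) // HIZB_PER_JUZ + 1
    return juz, hizb, quarter
```

For example, `locate_hizb_quarter(9)` places the ninth quarter in juz 2, hizb 3, as the first quarter of that hizb.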

Text

Several types of text are available in this database:

  • simple, Quran text in the modern Arabic writing style (Imla'ei script) that is commonly used today.
  • simple-plain, like simple but without the special marks for Ikhfa and Idgham.
  • simple-min, like simple but with a minimal number of diacritics and symbols. Suitable for embedding in other texts.
  • simple-clean, like simple but without any diacritics or symbols. Suitable for searching.
  • uthmani, Quran text in the traditional Uthmani script used in the Medina Mushaf.
  • uthmani-min, like uthmani but with a minimal number of diacritics and symbols.
  • indonesia, the Quran text used as the standard for writing the Quran in Indonesia.
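
To illustrate why a diacritic-free variant such as simple-clean is convenient for searching, here is a minimal Python sketch that strips Unicode combining marks before comparing strings. It is not how Tanzil produces simple-clean; the helper names are hypothetical, and the approach only removes combining marks, not other symbols.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Remove Unicode combining marks (Arabic harakat have a nonzero
    combining class, so they are dropped; base letters are kept)."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


def contains_ignoring_marks(haystack: str, needle: str) -> bool:
    """Substring search that ignores diacritics on both sides."""
    return strip_marks(needle) in strip_marks(haystack)
```

With a text that already has no diacritics, a plain substring search on the user's (likewise undecorated) query is enough, which is the case simple-clean is meant for.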

All simple and uthmani texts are taken from Tanzil. The only modification is the removal of the basmalah from the beginning of each surah (except surah Al-Fatiha), since the basmalah is not part of the ayah.
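
The basmalah removal described above could look roughly like the following Python sketch. It is an assumption about the approach, not the script actually used for this repository. The function name is hypothetical, and the basmalah is passed in as a parameter because its exact spelling differs between text types.

```python
def strip_basmalah(surah_number: int, first_ayah: str, basmalah: str) -> str:
    """Return the first ayah of a surah without a leading basmalah.

    Surah 1 (Al-Fatiha) is left untouched. Any surah whose first ayah
    does not start with the given basmalah is also returned unchanged.
    """
    if surah_number == 1:
        return first_ayah
    if first_ayah.startswith(basmalah):
        # Drop the prefix and the whitespace that separated it from the ayah.
        return first_ayah[len(basmalah):].lstrip()
    return first_ayah
```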

Unfortunately, Tanzil's official download link does not provide text with pause marks, even when that option is selected. To download the complete text with pause marks and sajdah signs, use the unofficial API like this:

curl 'http://tanzil.net/tanzil/php/get-aya.php' \
	-H 'Referer: http://tanzil.net/' \
	--data-raw 'type=$TYPE&startAya=0&endAya=6236&version=1.5'

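To use the above command, replace $TYPE with one of the simple or uthmani variants. The payload is single-quoted, so the shell does not expand $TYPE as a variable; edit the string directly.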
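
The same request can be sketched in Python with only the standard library. The URL, header, and form fields are copied from the curl command above. The function names are hypothetical, and the sketch assumes that the variant names listed earlier (for example uthmani) are valid values for type. Because the API is unofficial and its response format is not documented here, the sketch returns the raw bytes without parsing them.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "http://tanzil.net/tanzil/php/get-aya.php"


def build_request(text_type: str) -> Request:
    """Build the POST request that mirrors the curl command."""
    payload = urlencode({
        "type": text_type,   # assumed: one of the simple or uthmani variants
        "startAya": 0,
        "endAya": 6236,
        "version": "1.5",
    }).encode("ascii")
    return Request(API_URL, data=payload,
                   headers={"Referer": "http://tanzil.net/"})


def fetch_text(text_type: str) -> bytes:
    """Send the request and return the raw, unparsed response body."""
    with urlopen(build_request(text_type)) as response:
        return response.read()
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without contacting the server.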

The indonesia text is taken from the Quran Kemenag plugin for Microsoft Word, which is available on its official site. To extract the database yourself, install the plugin and then open the SQLite database in the installation directory. If you are not using Windows, you can extract the SQLite database by running innoextract on the plugin installer.
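
Once the SQLite file has been located, a reasonable first step is to list its tables. The schema of the plugin database is not described here, so this Python sketch makes no assumption about table names; the helper name is hypothetical.

```python
import sqlite3


def list_tables(connection: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted by name."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

To use it, open the extracted file with `sqlite3.connect(path)`, where `path` is wherever the database ended up on your system, and pass the connection to `list_tables`.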

Translation

There are 152 translations taken from several sources, mainly Tanzil and QuranEnc. In addition, one of the Indonesian translations is taken from the King Fahd Quran Complex.

Translations taken from Tanzil are unmodified apart from the format change from CSV to JSON. Translations taken from QuranEnc are modified so that all footnotes use a consistent numbering format, because the originals use several variations that differ from one translation to another.
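
A format change of this kind can be sketched as follows. The column order (surah, ayah, text), the delimiter, and the nested JSON layout are all assumptions made for illustration; they are not the documented layout of the Tanzil files or of this repository.

```python
import csv
import io
import json


def csv_to_json(csv_text: str, delimiter: str = ",") -> str:
    """Convert rows of (surah, ayah, text) into nested JSON.

    Assumed output layout: {"<surah>": {"<ayah>": "<text>"}}.
    """
    result: dict[str, dict[str, str]] = {}
    for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        surah, ayah, text = row
        result.setdefault(surah, {})[ayah] = text
    # ensure_ascii=False keeps non-Latin scripts readable in the output
    return json.dumps(result, ensure_ascii=False, indent=2)
```

The translated text itself is copied through unchanged, which matches the stated goal of changing only the format.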

Several translations from Tanzil and QuranEnc are incomplete but are still included in this repository, because they remain useful and can be corrected later. From Tanzil, the incomplete translations are:

  • sq-mehdiu by Feti Mehdiu (Albanian), missing translation for Al-Anbiya (21):56 and Al-Mursalat (77):14
  • cs-hrbek by Preklad I. Hrbek (Czech), missing Ar-Rum (30):18, Al-Waqi'a (56):2 and Al-Waqi'a (56):13
  • ku-asan by Burhan Muhammad-Amin (Kurdish), missing Al-Kautsar (108):3

From QuranEnc, the Kinyarwanda translation (rw-assoc) is incomplete; it is missing translations for:

  • Fussilat (41):27
  • Ash-Shura (42):18
  • Az-Zukhruf (43):14
  • Az-Zukhruf (43):68
  • Al-Fath (48):2
  • Adh-Dhariyat (51):6
  • Ar-Rahman (55):14
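
Gaps like the ones listed above can be found with a simple completeness check. The sketch below assumes a translation has been loaded into a dictionary of the form {surah: {ayah: text}} with integer keys, and that the number of ayahs per surah is supplied separately (for example from the surah metadata). Both the layout and the function name are hypothetical.

```python
def find_missing(translation: dict[int, dict[int, str]],
                 ayah_counts: dict[int, int]) -> list[tuple[int, int]]:
    """Return (surah, ayah) pairs whose translation is absent or blank."""
    missing = []
    for surah, count in sorted(ayah_counts.items()):
        ayahs = translation.get(surah, {})
        for ayah in range(1, count + 1):
            if not ayahs.get(ayah, "").strip():
                missing.append((surah, ayah))
    return missing
```

An ayah counts as missing both when its key is absent and when its text is empty or whitespace only.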

Word by Word

Word-by-word data is taken from Quran WBW. It covers 77,429 words translated into nine languages:

  • Bengali
  • English
  • German
  • Hindi
  • Indonesian
  • Ingush
  • Russian
  • Turkish
  • Urdu

Of those nine, the Ingush, Russian, and Turkish translations are still incomplete, and some words appear to be mistranslated because of incorrect positioning, so use them with care.
