
quran-align

A tool for producing word-precise segmentation of recorded Qur'anic recitation. Designed to work with EveryAyah-style audio input.

Each word in the Qur'an is assigned a precise start and end timestamp within the recorded audio of the ayah. You can use this data to highlight the word currently being spoken during playback, to repeat a certain word or phrase, to compare against other audio, to analyze a qari's speaking cadence, and so on.

Data

If you just want the data, you may not need to actually run this tool: I've generated word-by-word timing files for many quraa' already. Visit the Releases tab to download them.

These data files are licensed under a Creative Commons Attribution 4.0 International License. Please consider emailing me if you use this data, so I can let you know when new & revised timing data is available.

Data Format

Data files are JSON, of the following format:

[
    {
        "surah": 1,
        "ayah": 1,
        "segments": [
            [word_start_index, word_end_index, start_msec, end_msec],
            ...
        ],
        "stats": {
            "insertions": 123,
            "deletions": 456,
            "transpositions": 789
        }
    },
    ...
]

Where...

  • word_start_index is the 0-based index of the first word contained in the segment.
  • word_end_index is the 0-based index of the word after the last word contained in the segment (i.e. an exclusive end index).
  • start_msec/end_msec are the segment's start and end timestamps, in milliseconds, within the input audio file.
  • stats contains statistics from the matching routine that aligns the recognized words with the reference text.

Here, a "word" is defined by splitting the text of the Qur'an by spaces (specifically, quran-uthmani.txt from Tanzil.net - without me_quran tanween differentiation). Within the code, you may notice that the language model used for recognition treats muqata'at as sequences of words (ا ل م instead of الم) - but they will always appear as a single word in the alignment output.

Data Quality

Between the subjective nature of deciding exactly where one word ends and the next begins, ambiguity surrounding repeated phrases, and, most significantly, the lack of human-reviewed reference data, it is hard to measure the accuracy of this system. However, I was able to compare these results with those from the creators of ElMohafez, who use a different, independently developed methodology from my own.

Using this data as a reference, I found that word timestamps fell an average of <73 msec away from the reference data on a per-span basis, with standard deviations averaging 139 msec across all 6 recordings. 98.5-99.9% of words were individually segmented. These results exclude certain cases, most significantly where the qari repeated or skipped a phrase (generally <1% of all words).

As our two independent implementations produce very similar results, it's reasonable to conclude that the data is largely correct, or that both implementations made the same mistakes.
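
For illustration, per-span deviation figures like those above can be computed with a sketch like the following, assuming you have two lists of (start_msec, end_msec) spans for the same words - one from this tool and one from a reference source (the ElMohafez data itself is not distributed here):

from statistics import mean, stdev

def span_deviation(spans_a, spans_b):
    # Mean absolute deviation and standard deviation, in msec, between two
    # per-word span lists [(start_msec, end_msec), ...] covering the same words.
    diffs = []
    for (a_start, a_end), (b_start, b_end) in zip(spans_a, spans_b):
        diffs.append(abs(a_start - b_start))
        diffs.append(abs(a_end - b_end))
    return mean(diffs), stdev(diffs)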

Data Completeness

In rare cases, it was not possible to automatically differentiate two adjacent words. In all cases, the segment's start and end word indices indicate the range of words the segment contains, so a single segment may cover more than one word. Some words may be omitted from the result sequence entirely if their bounds could not be determined directly or inferred from adjacent words.

Methodology

  1. A CMU Sphinx speaker-specific acoustic model is trained using the verse-by-verse recitation recording and a Qur'an-specific language model.
  2. PocketSphinx full-utterance recognition is run on each ayah's audio, provided with a filtered LM dictionary containing only words from that ayah to improve recognition rates and runtime performance.
  3. Recognized words are matched to the reference text of each ayah, accounting for insertions, deletions, transpositions, etc. (a minimal sketch of this matching step follows this list).
  4. Raw audio data and a derived MFCC feature stream are used to refine alignment of words to syllable boundaries within the ayah audio.
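
The matching in step 3 is essentially a sequence alignment problem. The following Python sketch is not the actual implementation (which lives in the C++ align tool); it only illustrates a standard edit-distance alignment between recognizer output and reference words, with transposition handling omitted for brevity.

def align_words(recognized, reference):
    # Align two word sequences with Levenshtein-style dynamic programming.
    # Returns a list of (recognized_word_or_None, reference_word_or_None)
    # pairs; a None on the right marks an extra recognized word, a None on
    # the left marks a reference word the recognizer missed.
    n, m = len(recognized), len(reference)
    # cost[i][j] = minimum edits to align recognized[:i] with reference[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = 0 if recognized[i - 1] == reference[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + substitute,  # match/substitute
                             cost[i - 1][j] + 1,               # extra recognized word
                             cost[i][j - 1] + 1)               # missed reference word
    # Backtrack to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        diag = cost[i - 1][j - 1] + (0 if recognized[i - 1] == reference[j - 1] else 1) if i > 0 and j > 0 else None
        if diag is not None and cost[i][j] == diag:
            pairs.append((recognized[i - 1], reference[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((recognized[i - 1], None))
            i -= 1
        else:
            pairs.append((None, reference[j - 1]))
            j -= 1
    return list(reversed(pairs))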

Usage

Unfortunately, a key component - the script that generates the speech model training inputs and supporting data files - is currently in an unpublishable state. With that exercise left to the reader, the align tool's help output explains its full usage. You may need to override CMUSPHINX_ROOT in the Makefile. Note that input WAV files must be generated by FFmpeg, because I hard-coded an offset to the audio data rather than writing a RIFF parser.
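
As an illustration only: the exact sample rate and format the align tool expects are not documented here; 16 kHz, 16-bit, mono PCM is a common choice for PocketSphinx models, but treat these parameters as assumptions and match them to your acoustic model. A batch conversion of EveryAyah-style MP3s via FFmpeg might look like this:

import glob
import os
import subprocess

# Assumed layout: EveryAyah-style files named like 001001.mp3 (surah 1, ayah 1).
for mp3_path in glob.glob("audio/*.mp3"):
    wav_path = os.path.splitext(mp3_path)[0] + ".wav"
    # Sample rate, channel count, and sample format are assumptions here.
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp3_path,
         "-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", wav_path],
        check=True,
    )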

Requirements

  • A UNIX machine (Windows Bash/WSL works)
  • PocketSphinx, SphinxTrain, and cmuclmtk from CMU Sphinx
  • EveryAyah-style audio recordings of the recitation
  • A C++11-compatible compiler
  • Python

quran-align's People

Contributors

cpfair


quran-align's Issues

How to use this project and test it?

Hi @cpfair,
Please, could you guide me on how to use this project and test it on some of my own data? I tried to install the requirements, but it didn't succeed.
With thanks.

How to use this tool?

Please guide me in detail on how to use this tool. Where can I download its requirements?

Slice all the data and support

Salaam brother,

If this works, I'm willing to provide free hosting for all of the sliced data, so people can download it and also use this tool to create data that has not already been sliced.

Let me know if you are interested in working together to provide this service for the community.

https://quranicaudio.com/ has horizontal scrolling

Assalamu alaikum.

Whenever I select a hafiz and go to his section, horizontal scrolling appears. I found the solution; please fix it.

The header's negative margin-right is taking it out of the normal flow. The header's style is:

.YuRhw1tjUohnA3r6u0TTU {
    background-size: cover;
    min-height: 350px;
    background: #009faa;
    padding-top: 8rem;
    color: #ffffff;
    padding-bottom: 5rem;
    background-color: #2CA4AB;
    margin-left: -15px;
    margin-right: -15px;
    margin-bottom: 20px;
}

margin-right: -15px is causing the problem. If you want to keep it, just add width: 100%; that will, inshallah, solve the problem.

Hope it helps.

Need timing file

AOA brother!
I want timing files for some specific reciters; can you help me with this? The list of reciters is:

  • Ibrahim_Akhdar
  • minshawy_mujawwad
  • Ibrahim_Walk
  • ayman_sowaid
  • maher_alMuaiqly
  • muhammad_jibreel
  • alafasy
  • hani_rifai

Release data has an error

It seems the file Abdurrahmaan_As-Sudais_192kbps.json produces erroneous data. Could you re-upload it?
I want timing data per ayah, not verse by verse; could you help me generate that?

Audio not working in web version.

Audio is not working in the web version. It may work on other devices, but it is not working for me. Please check and fix it if possible. Thank you for your awesome work.
