tlap-rs's Introduction

tlap

What is tlap?

tlap (transliterate language for an accessibility purpose) is a program that takes live or pre-recorded audio and transcribes it into a subtitle file in the SRT format.
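
For reference, an SRT file is a sequence of numbered cues. Each cue has a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the subtitle text, and a blank line. The Rust sketch below shows how a single cue could be formatted; `srt_timestamp` and `srt_cue` are hypothetical names for illustration, not functions from tlap's source.

```rust
use std::time::Duration;

/// Formats a duration as an SRT timestamp: `HH:MM:SS,mmm`.
fn srt_timestamp(d: Duration) -> String {
    let total_ms = d.as_millis();
    let hours = total_ms / 3_600_000;
    let minutes = (total_ms / 60_000) % 60;
    let seconds = (total_ms / 1_000) % 60;
    let millis = total_ms % 1_000;
    format!("{:02}:{:02}:{:02},{:03}", hours, minutes, seconds, millis)
}

/// Formats one SRT cue: index line, time range line, text, then a blank line.
fn srt_cue(index: usize, start: Duration, end: Duration, text: &str) -> String {
    format!(
        "{}\n{} --> {}\n{}\n\n",
        index,
        srt_timestamp(start),
        srt_timestamp(end),
        text
    )
}
```

Appending the cues for consecutive chunks, with the index starting at 1, yields a complete SRT file.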

Why is it called 'tlap'?

'tlap' is an in-joke from the Fediverse based on a user of a similar name, 'tlapka'.

Why did you make it?

My university records lectures but does not include subtitles. This makes it difficult for those who are hard of hearing to follow along and impedes the process of summarising what the lecture covers.

What audio formats are supported?

Only 16-bit single-channel (mono) Wave is supported. This program will output to 16-bit mono Wave format if real-time subtitling has been specified.
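
To reject unsupported input early, a program could read the `fmt ` chunk of the WAVE header and check the channel count and bit depth. The following is a minimal std-only sketch, assuming a standard RIFF/WAVE layout; `read_wav_format`, `is_supported` and `WavFormat` are illustrative names, not part of tlap. In practice a WAV crate (for example `hound`) would likely be simpler.

```rust
/// Fields of a WAVE "fmt " chunk relevant to the input requirements.
#[derive(Debug, PartialEq)]
struct WavFormat {
    audio_format: u16, // 1 = integer PCM
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

/// Reads a little-endian u16 at `at`, or None if out of bounds.
fn u16_le(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Reads a little-endian u32 at `at`, or None if out of bounds.
fn u32_le(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Walks the RIFF chunks of a WAVE file and returns its "fmt " chunk, if any.
fn read_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.get(0..4)? != b"RIFF" || bytes.get(8..12)? != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32_le(bytes, pos + 4)? as usize;
        let body = pos + 8;
        if id == b"fmt " {
            return Some(WavFormat {
                audio_format: u16_le(bytes, body)?,
                channels: u16_le(bytes, body + 2)?,
                sample_rate: u32_le(bytes, body + 4)?,
                bits_per_sample: u16_le(bytes, body + 14)?,
            });
        }
        // Skip this chunk; chunk bodies are padded to an even length.
        pos = body + size + (size % 2);
    }
    None
}

/// True for 16-bit, single-channel PCM.
fn is_supported(fmt: &WavFormat) -> bool {
    fmt.audio_format == 1 && fmt.channels == 1 && fmt.bits_per_sample == 16
}
```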

Where can I get the source code for Coqui?

You may find the source code here.

Are the bundled Coqui binaries modified?

No, the binaries used were obtained from the project's releases page.

Prerequisites

Packages

The following package names are for Fedora 38. On other distributions, install the packages that offer equivalent functionality.

rust cargo glibc-devel alsa-lib-devel

libstt

You will need to download the libraries from Coqui's releases page and set LIBRARY_PATH in 'perform_stt.sh' to the directory containing them. On Linux, I would suggest /usr/local/lib; on Windows, I have bundled the binaries, and their bundled location is a suggested place to run them from.

Model (mandatory) and scorer (optional)

Coqui (formerly DeepSpeech) works by using a pre-trained model to determine what has been said in a given audio sample. To improve speed and accuracy, it may also use a scorer, a vocabulary bank of sorts, to pick the word that best fits.

tlap-rs's People

Contributors

bricky149

tlap-rs's Issues

Naively splitting samples every so often results in misunderstood words

Right now, tlap splits audio every 64,000 samples (four seconds) and transcribes each chunk into a subtitle. The problem with this is that it often splits the audio while a word is being spoken, resulting in two inaccurate words rather than a single, more accurate one.
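
At an assumed sample rate of 16 kHz (implied by 64,000 samples lasting four seconds), the fixed-size split described above, together with each window's start and end times, can be sketched as follows. This only illustrates the behaviour; it is not tlap's actual code, and `fixed_chunks` and `offset_to_time` are hypothetical names.

```rust
use std::time::Duration;

/// Assumed sample rate: 64,000 samples per four seconds implies 16 kHz.
const SAMPLE_RATE: u32 = 16_000;
/// Window size described in the issue.
const CHUNK_SAMPLES: usize = 64_000;

/// Converts a sample offset into a timestamp at SAMPLE_RATE.
fn offset_to_time(offset: usize) -> Duration {
    Duration::from_millis(offset as u64 * 1_000 / SAMPLE_RATE as u64)
}

/// Naive fixed-size split: returns (start, end, samples) for each window,
/// regardless of whether a word straddles a window boundary.
fn fixed_chunks(samples: &[i16]) -> Vec<(Duration, Duration, &[i16])> {
    samples
        .chunks(CHUNK_SAMPLES)
        .enumerate()
        .map(|(i, chunk)| {
            let start = i * CHUNK_SAMPLES;
            (offset_to_time(start), offset_to_time(start + chunk.len()), chunk)
        })
        .collect()
}
```

Because the boundaries fall every four seconds no matter what is being said, a word spoken across the boundary is transcribed as two fragments.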

The solution would be to detect periods of silence and split within them, rather than at fixed intervals. This would improve transcription accuracy, especially with live input, as it may change how often Coqui needs to be run, and it would open up the possibility of adding punctuation in future.
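
One simple way to find a split point inside a pause is an energy threshold: measure the root-mean-square (RMS) amplitude of short frames and cut in the first frame that is quiet enough. The sketch below assumes 16 kHz, 16-bit mono samples and a 20 ms frame; `split_on_silence`, the threshold, and the minimum and maximum chunk lengths are illustrative, not values or code from tlap, and a dedicated voice-activity detector may be more robust.

```rust
/// Frame length used to measure loudness: 20 ms at an assumed 16 kHz.
const FRAME: usize = 320;

/// Root-mean-square amplitude of a frame of 16-bit samples.
fn rms(frame: &[i16]) -> f64 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = frame.iter().map(|&s| (s as f64) * (s as f64)).sum();
    (sum_sq / frame.len() as f64).sqrt()
}

/// Returns split points as sample offsets. Each chunk ends in the middle of
/// the first frame after `min_len` samples whose RMS is below `threshold`,
/// or is cut at `max_len` samples if no such frame is found.
fn split_on_silence(samples: &[i16], min_len: usize, max_len: usize, threshold: f64) -> Vec<usize> {
    assert!(min_len < max_len, "min_len must be shorter than max_len");
    let mut splits = Vec::new();
    let mut start = 0;
    while samples.len() - start > max_len {
        let mut cut = start + max_len; // fallback: the naive fixed-length cut
        let mut pos = start + min_len;
        while pos + FRAME <= start + max_len {
            if rms(&samples[pos..pos + FRAME]) < threshold {
                cut = pos + FRAME / 2; // cut in the middle of the quiet frame
                break;
            }
            pos += FRAME;
        }
        splits.push(cut);
        start = cut;
    }
    splits
}
```

Keeping a maximum length preserves the current four-second upper bound when the speaker never pauses, while the minimum length avoids producing very short subtitles.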

Slow input file reads may cause failed model mutex locks

Summary:
When tlap is run for long enough with the "rt" argument, live subtitles stop working.

Steps to reproduce:

  1. Run tlap with the "rt" argument
  2. Leave it running for hours

Expected result:
Should transcribe input as normal.

Actual result:
Reading the file back takes longer than four seconds, causing thread runaway and making subsequent model mutex locks fail.
