aTrain

Accessible Transcription of Interviews

aTrain is a tool for automatically transcribing speech recordings utilizing state-of-the-art machine learning models without uploading any data. It was developed by researchers at the Business Analytics and Data Science-Center at the University of Graz and tested by researchers from the Know-Center Graz.

Big news! The paper introducing aTrain has been published in the Journal of Behavioral and Experimental Finance. If you use aTrain in your research, please cite the published paper: Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews.

Windows (10 and 11) users can install aTrain via the Microsoft app store (https://apps.microsoft.com/store/detail/atrain/9N15Q44SZNS2) or by downloading the installer from the BANDAS-Center website.

For Linux, follow the instructions in our Wiki.

aTrain offers the following benefits:

Fast and accurate 🚀
aTrain provides user-friendly access to the faster-whisper implementation of OpenAI's Whisper model, ensuring best-in-class transcription quality (see Wollin-Giering et al. 2023) paired with higher speeds on your local computer. With the highest-quality model, transcription takes only around three times the audio length on current mobile CPUs typically found in mid-range business notebooks (e.g., Core i5 12th Gen, Ryzen Series 6000).

Speaker detection 🗣️
aTrain has a speaker detection mode based on pyannote.audio and can analyze each text segment to determine which speaker it belongs to.

Privacy preservation and GDPR compliance 🔒
aTrain processes the provided speech recordings completely offline on your own device and does not send recordings or transcriptions to the internet. This helps researchers meet data privacy requirements arising from ethical guidelines and comply with legal requirements such as the GDPR.

Multi-language support 🌍
aTrain can process speech recordings in any of the following 57 languages: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.

MAXQDA and ATLAS.ti compatible output 📄
aTrain provides transcription files that are seamlessly importable into the most popular tools for qualitative analysis, ATLAS.ti and MAXQDA. This allows you to directly play audio for the corresponding text segment by clicking on its timestamp. Go to the tutorial.

NVIDIA GPU support 🖥️
aTrain can run either on the CPU or on an NVIDIA GPU (CUDA toolkit installation required). A CUDA-enabled NVIDIA GPU significantly improves the speed of transcription and speaker detection, reducing transcription time to 20% of the audio length on current entry-level gaming notebooks.
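For planning purposes, the CPU and GPU figures quoted above can be turned into a rough time estimate. The following is only a sketch and not part of aTrain: the function name and constants are hypothetical, and the factors are the approximate values from the text (about three times the audio length on a mobile CPU, about 20% on an entry-level CUDA GPU). Actual speed depends on your hardware and the selected model.

```python
# Approximate factors taken from the feature descriptions above.
# These are planning figures, not guarantees.
CPU_FACTOR = 3.0  # roughly three times the audio length on a current mobile CPU
GPU_FACTOR = 0.2  # roughly 20% of the audio length on an entry-level CUDA GPU


def estimate_processing_minutes(audio_minutes: float, use_gpu: bool) -> float:
    """Return a rough transcription time estimate in minutes (hypothetical helper)."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    factor = GPU_FACTOR if use_gpu else CPU_FACTOR
    return audio_minutes * factor


if __name__ == "__main__":
    for use_gpu in (False, True):
        device = "GPU" if use_gpu else "CPU"
        minutes = estimate_processing_minutes(60, use_gpu)
        print(f"60-minute interview on {device}: roughly {minutes:.0f} minutes")
```

Under these assumptions, a one-hour interview would take on the order of three hours on the CPU and a little over ten minutes on a CUDA GPU.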

[Screenshots: Screenshot 1, Screenshot 2]

Benchmarks

To test the processing time of aTrain, we transcribed an audiobook ("The Snow Queen" by Hans Christian Andersen, with a duration of 1 hour, 13 minutes, and 38 seconds) on three different computers (see table 1). The figure below shows the processing time of each transcription relative to the length of the speech recording. In terms of this relative processing time (RPT), a transcription is considered 'real time' when the recording length and the processing time are equal. Consequently, faster transcriptions lead to an RPT below 1 and slower transcriptions to an RPT above 1.
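The RPT described above is simply the processing time divided by the recording length. A minimal sketch of that calculation follows; the helper names are hypothetical, and the processing time used in the example is an illustrative value, not a measured benchmark result.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration given as hours, minutes, and seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def relative_processing_time(processing_seconds: float, recording_seconds: float) -> float:
    """RPT = processing time / recording length; 1.0 means 'real time'."""
    if recording_seconds <= 0:
        raise ValueError("recording_seconds must be positive")
    return processing_seconds / recording_seconds


if __name__ == "__main__":
    # Length of the benchmark audiobook: 1 h 13 min 38 s.
    recording = to_seconds(hours=1, minutes=13, seconds=38)
    # Hypothetical processing time of 20 minutes, for illustration only.
    processing = to_seconds(minutes=20)
    rpt = relative_processing_time(processing, recording)
    print(f"Recording: {recording} s, processing: {processing} s, RPT: {rpt:.2f}")
```

An RPT below 1 in this sketch means the transcription finished faster than the recording would take to play back.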

[Figure: Benchmark results] [Table 1: Used hardware]

System requirements

Windows is fully supported.

Debian is supported via manual installation (see the Wiki instructions).

macOS is currently not supported.

Installation for users 😎

Simply access the installer from the Microsoft app store
https://apps.microsoft.com/store/detail/atrain/9N15Q44SZNS2

Installation for developers ⚙️

You need Python 3.10 or later.
If you need help installing it, see this resource:
https://www.python.org/downloads/release/python-31011/
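Before continuing, you may want to confirm that the interpreter you are about to use meets the minimum version. The snippet below is a small sketch using only the standard library; the helper name `meets_minimum` is hypothetical and not part of aTrain.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 10)


def meets_minimum(version=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True if the (major, minor) part of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; version 3.10 or later is required.")
```

Run it with the same `python` command you plan to use for creating the virtual environment, since several Python versions can be installed side by side.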

Set up a virtual environment

python -m venv venv

Activate the virtual environment

.\venv\Scripts\activate

Install aTrain

pip install aTrain@git+https://github.com/JuergenFleiss/aTrain.git --extra-index-url https://download.pytorch.org/whl/cu118

Download ffmpeg and all required models from Whisper and pyannote.audio with a console script.
Note: The user version in the Microsoft store already includes these assets.

aTrain init

Run the app with the console script

aTrain start

How to build a standalone executable 📦

We use pyinstaller to freeze the code of aTrain and create a standalone executable.
If you want to create your own code package, follow these steps:

Clone and install aTrain in editable mode

git clone https://github.com/JuergenFleiss/aTrain.git
cd aTrain
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118


Download ffmpeg and all required models from Whisper and pyannote.audio with a console script

aTrain init

Install pyinstaller

pip install pyinstaller

Build the executable using the instructions provided in the file "build.spec"

pyinstaller build.spec

Congratulations! You just built a standalone executable for aTrain.

To open this version of aTrain, go to the output folder (./dist/aTrain) and open the executable (e.g. aTrain.exe for Windows).

If you want to go a step further and create an MSIX installer for aTrain, you can use Advanced Installer Express.
For information on how to use Advanced Installer Express, refer to their documentation.

Attribution

The GIFs and icons in aTrain are from Tenor and Flaticon.

Contributors

arminhaberl, juergenfleiss, bandas-center, eltociear
