Whispering Tiger (Live Translate/Transcribe)

Whispering Tiger is a free, open-source tool that listens to any audio stream, or watches any in-game image, on your machine and outputs the transcription or translation to a web browser over WebSockets or to other applications over OSC (for example, streaming overlays or VRChat).
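
To illustrate what "output over OSC" means in the VRChat case, here is a minimal sketch of an OSC message built by hand with the Python standard library. It is not Whispering Tiger's own code. It assumes VRChat's chatbox endpoint (`/chatbox/input` on UDP port 9000 of the local machine); the function names `osc_string`, `build_chatbox_message` and `send_to_vrchat` are hypothetical and exist only for this example.

```python
import socket


def osc_string(value: str) -> bytes:
    """Encode an OSC string: bytes, a null terminator, then null padding to a multiple of 4."""
    # The OSC 1.0 spec defines strings as ASCII; UTF-8 is used here on the
    # assumption that the receiver accepts it.
    raw = value.encode("utf-8") + b"\x00"
    padding = (-len(raw)) % 4
    return raw + b"\x00" * padding


def build_chatbox_message(text: str, send_now: bool = True, notify: bool = False) -> bytes:
    """Build an OSC packet with one string and two boolean arguments.

    OSC booleans are carried entirely in the type-tag string ("T" or "F")
    and add no bytes to the argument section.
    """
    type_tags = ",s" + ("T" if send_now else "F") + ("T" if notify else "F")
    return osc_string("/chatbox/input") + osc_string(type_tags) + osc_string(text)


def send_to_vrchat(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the packet over UDP. Host and port are assumptions, not project settings."""
    packet = build_chatbox_message(text)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Calling `send_to_vrchat("Hello")` sends a single UDP datagram and does not wait for a reply, so it succeeds even when nothing is listening on the port.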

Features

  • Runs 100% locally on your machine. (Once A.I. Models are downloaded, no further internet connection is required)
  • Speech recognition, translation and transcription
    • OpenAI's Whisper project (supports ~98 languages)
  • Text translation
    • LID [Language Identification] (Supports 200 languages)
    • NLLB-200 (single model, supports 200 languages, high accuracy)
    • M2M-100 (single model, supports 100 languages, high accuracy)
  • OCR [Optical Character Recognition] (to capture game images and translate in-game text)
    • EasyOCR (Supports 80+ languages)
  • TTS [Text-to-Speech] (Read out transcriptions/translations)
    • Silero
  • VAD [Voice Activity Detection]
    • Silero-VAD
  • LLM [Large Language Model] (text continuation, automatic answer generation, etc.; proof of concept)

Quickstart

For a quick and easy start, download the latest Whispering Tiger UI from here: https://github.com/Sharrnah/whispering-ui

This is a native UI application that keeps your Whispering Tiger version up to date and makes the settings easier to manage.

Release Downloads

Standalone releases with all dependencies included.

Go to the GitHub Releases page and download from the link in the release description, or find the latest release here.

(Because of the 2 GB limit, release files are not hosted directly on GitHub.)

  • Install CUDA for GPU acceleration (recommended).
  • Extract the files to a drive with enough free space.
    • (After downloading the medium Whisper model and the medium NLLB-200 translation model, the installation can take up to 20 GB.)
  • Run the application only through the *.bat files. To use your own command-line flags, copy an existing start-*.bat file and edit its parameters in any text editor.
    • start-transcribe-mic.bat tries to use your default microphone and is a good starting point.
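
Before extracting, it can help to confirm that the target drive has room for the roughly 20 GB mentioned above. The following is a minimal sketch using only the Python standard library. The helper name `has_enough_space` and the default threshold are assumptions for this example, not part of Whispering Tiger.

```python
import shutil

# Upper bound quoted above for the medium Whisper model plus the
# medium NLLB-200 translation model. Adjust for other model sizes.
REQUIRED_GB = 20


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the drive containing `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    target = "."  # hypothetical: replace with the folder you plan to extract into
    free_gib = shutil.disk_usage(target).free / 1024**3
    verdict = "enough" if has_enough_space(target) else "not enough"
    print(f"{free_gib:.1f} GiB free at {target!r}: {verdict} for about {REQUIRED_GB} GB")
```

The check counts in GiB (1024³ bytes), which is slightly stricter than decimal GB and so errs on the safe side.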
