
xiaoyi8383 / vrctextboxstt


This project is forked from i5ucc/vrctextboxstt


A speech-to-text application that uses OpenAI's whisper to transcribe audio and send the result to VRChat's textbox system and/or KillFrenzyAvatarText over OSC. Also supports OBS via a browser source!

License: GNU General Public License v3.0

Python 97.77% HTML 1.16% Batchfile 1.06%

vrctextboxstt's Introduction

TextboxSTT

This program is meant to be entirely free (as in money), open source, and independent of cloud-based transcription services such as Microsoft Azure. Transcription runs on your own hardware, which respects your privacy and improves latency and reliability, at the cost of using some of your own machine's performance. For that reason, I will not be implementing any cloud-based transcription or translation.

Features

  • Sending transcription to either
    • VRChat's in-game Textbox, allowing for use with any avatar.
    • KillFrenzyAvatarText (KAT), which needs to be integrated into an avatar.
      • You can use Frosty704's Billboard to add a speech bubble to your avatar.
      • Support for up to 80 emotes!
      • Automatic Detection of KAT on an avatar. It will use KAT if available, otherwise fall back to VRChat Textbox.
  • Fast and efficient. VRCTextboxSTT uses ctranslate2 as the runtime for transcription and translation, which keeps both fast and light on resources.
  • Uses the SteamVR binding system: press to transcribe, hold to clear/cancel (A/X by default).
  • Customizable
    • You can bind the button to start transcription to any action that SteamVR allows you to set.
    • You can bind it to any key on your keyboard.
    • Many Timing settings for transcription delays and button presses.
  • Optional Live Transcription
  • Optional Automatic launch with SteamVR.
  • Text to Text for quick typing.
  • Optional SteamVR Overlay for seeing your transcription without having to look at your own textbox in-game.
  • Optional OBS Browser Source.
  • Simple API. The latest transcription is available at the "/transcript" endpoint. (Requires the OBS source to be turned on.)
  • Audio feedback for each step in the transcription.
  • Multi-language support. whisper supports around 100 different languages; here, with a few limitations.
  • Translate into and from different languages. (Powered by M2M100)
  • Word replacements and emote replacements with regular expressions (regex).
  • Free to use under the GPL-3.0 license.
  • Completely free of Subscription/Cloud Services, by running locally on your hardware.
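
To give a feel for what "sending a transcription to VRChat's Textbox over OSC" involves, here is a minimal sketch that uses only the Python standard library. It is not TextboxSTT's own code. It assumes VRChat's documented defaults (OSC input on UDP port 9000 of the local machine, and a `/chatbox/input` address that takes a string followed by two booleans), and the helper names `osc_string`, `build_chatbox_message` and `send_to_textbox` are made up for this example.

```python
import socket

VRCHAT_HOST = "127.0.0.1"  # assumption: VRChat runs on this machine
VRCHAT_PORT = 9000         # assumption: VRChat's default OSC receive port


def osc_string(value: str) -> bytes:
    """Encode an OSC string: UTF-8 bytes, null-terminated, padded to 4 bytes."""
    raw = value.encode("utf-8")
    padded_length = (len(raw) // 4 + 1) * 4
    return raw.ljust(padded_length, b"\x00")


def build_chatbox_message(text: str, send_now: bool = True,
                          notify_sound: bool = False) -> bytes:
    """Build a /chatbox/input message with one string and two booleans."""
    # OSC booleans live entirely in the type tag ("T" or "F"), with no payload.
    type_tags = ",s" + ("T" if send_now else "F") + ("T" if notify_sound else "F")
    return osc_string("/chatbox/input") + osc_string(type_tags) + osc_string(text)


def send_to_textbox(text: str) -> None:
    """Send the text to VRChat as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_chatbox_message(text), (VRCHAT_HOST, VRCHAT_PORT))
```

Calling `send_to_textbox("Hello from OSC")` with VRChat running and OSC enabled should show the text in the Textbox. An OSC library would normally handle the encoding; it is written out by hand here only to keep the sketch dependency-free.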

Limitations

  • Limited character availability
    • VRChat's Textbox is limited to showing 144 characters at a time.
    • KillFrenzyAvatarText supports ASCII characters and a certain set of Japanese hiragana, and is limited to showing 128 characters at a time.
  • Visibility
    • VRChat's Textbox is only visible to friends by default; consider telling people they can change that in VRChat's settings.
    • VRChat's Textbox is not visible in Streamer Mode.
    • KillFrenzyAvatarText is only visible on avatars that are shown and is PC only, as it uses a custom shader setup.
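
The character limits above mean that a long transcription has to be cut into pieces before it can be displayed. The sketch below shows one simple way to do that with Python's `textwrap` module. It illustrates the constraint only and is not necessarily how TextboxSTT handles long text. The function name `split_for_display` is hypothetical.

```python
import textwrap

VRCHAT_TEXTBOX_LIMIT = 144  # characters, from the limitation listed above
KAT_LIMIT = 128             # characters, from the limitation listed above


def split_for_display(text: str, limit: int = VRCHAT_TEXTBOX_LIMIT) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Breaks happen at whitespace where possible; a single word longer
    than `limit` is cut in the middle.
    """
    return textwrap.wrap(text, width=limit)
```

For example, `split_for_display(transcript, KAT_LIMIT)` returns a list of pieces that each fit KillFrenzyAvatarText's 128-character display.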

Requirements

With default settings, this program has the following requirements:

  • CPU version:
    • ~2GB of storage space.
    • ~400MB of available RAM.
  • GPU version:
    • CUDA enabled GPU (NVIDIA ONLY), otherwise it will fall back to using CPU.
    • ~5GB of storage space.
    • ~1GB of available RAM.
    • ~500MB of available VRAM.
  • SteamVR (if run in VR; no Oculus/Meta support as of now).

Demo

Frosty704 using VRCTextboxSTT and KillFrenzyAvatarText with their Billboard project. More on that in their repository.

How to use

Run from Releases

  • Download one of the Releases.
  • Unpack the .7z file with software of your choice.
  • Run TextboxSTT.exe.

Run from source

  • Clone this repository, for example with git: git clone https://github.com/I5UCC/VRCTextboxSTT.git
  • Using Python 3.10, install all of the dependencies from the requirements.txt file: python -m pip install -r requirements.txt
  • Run the program with python TextboxSTT.py in the project's directory.

Usage in VRChat

  • Activate OSC in VRChat.
  • Run the program.
  • The program will use the default microphone set in Windows.
  • If you have a lot of background noise, adjust the "energy_threshold" option in the configuration (or press the ⟳ button next to it) until it works well.
  • Press A on the left controller on Index, X on Oculus, or F1 on your keyboard.
  • Holding any of those for 1.5s clears the chatbox or cancels the action.
  • If the program doesn't work as it's supposed to, try the troubleshooting steps in the next section.
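
The press/hold behaviour described above can be pictured as a simple check on how long the button was held. The following sketch is only an illustration under that assumption; the program may well detect the hold while the button is still down. The function name `classify_button` and the returned labels are hypothetical.

```python
HOLD_SECONDS = 1.5  # hold duration from the usage notes above


def classify_button(pressed_at: float, released_at: float,
                    hold_seconds: float = HOLD_SECONDS) -> str:
    """Map the duration of a button press to an action.

    Timestamps are in seconds (for example from time.monotonic()).
    """
    duration = released_at - pressed_at
    if duration < 0:
        raise ValueError("released_at must not be earlier than pressed_at")
    # A long hold clears the chatbox or cancels; a short press transcribes.
    return "clear_or_cancel" if duration >= hold_seconds else "transcribe"
```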

OSC Troubleshoot

If you have problems with this program, try the following:

1 - Close VRChat.
2a - Press "Reset OSC Settings" in the Settings of TextboxSTT.
2b - Open 'Run' in Windows (Windows Key + R).
     Type in %APPDATA%\..\LocalLow\VRChat\VRChat\OSC
     Delete the folders that start with 'usr_*'.
3 - Start VRChat again and it should work.
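
For those who prefer a script over clicking through Explorer, step 2b can be sketched in Python as below. This is an assumption-laden example, not a tool shipped with TextboxSTT: the function names are hypothetical, `default_osc_dir` only makes sense on Windows, and the script permanently deletes the matching folders, so close VRChat first.

```python
import os
import shutil
from pathlib import Path


def default_osc_dir() -> Path:
    # Resolves %APPDATA%\..\LocalLow\VRChat\VRChat\OSC (Windows only).
    return Path(os.environ["APPDATA"]).parent / "LocalLow" / "VRChat" / "VRChat" / "OSC"


def reset_osc_config(osc_dir: Path) -> list[str]:
    """Delete every folder in osc_dir whose name starts with 'usr_'.

    Returns the names of the removed folders. Files are left untouched.
    """
    removed = []
    for entry in sorted(osc_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith("usr_"):
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Running `reset_osc_config(default_osc_dir())` performs the same cleanup as the manual steps.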

Configuration

You can edit the configuration either manually, by editing the config.json file, or in the program itself, by clicking "Settings" in the bottom right. Hover over any option to get a brief explanation of what it does.
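
As an illustration of the manual route, the sketch below loads config.json, changes one option and writes the file back. It assumes that "energy_threshold" is a top-level key in config.json, which is a guess about the file layout, and the value 300 is just a placeholder. The helper name `update_config` is hypothetical.

```python
import json
from pathlib import Path


def update_config(path: Path, **changes) -> dict:
    """Load a JSON config file, apply the given changes, and save it again."""
    config = json.loads(path.read_text(encoding="utf-8"))
    config.update(changes)  # assumption: the options are top-level keys
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


if __name__ == "__main__":
    # Placeholder value; pick one that suits your microphone and room.
    update_config(Path("config.json"), energy_threshold=300)
```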

You can edit word replacements by clicking the "Edit Word Replacements" button.
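
To show what a regex-based word replacement does to a transcription, here is a small sketch using Python's `re` module. The patterns, the case-insensitive matching and the function name `apply_replacements` are assumptions made for the example; the program's own replacement format may differ.

```python
import re


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Apply each regex pattern in order, ignoring case."""
    for pattern, replacement in replacements.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


# Hypothetical replacement table: regex pattern -> replacement text.
EXAMPLE_REPLACEMENTS = {
    r"\bvr chat\b": "VRChat",
    r"\bbrb\b": "be right back",
}
```

The `\b` word boundaries keep a pattern such as "brb" from matching inside longer words.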

You can edit the emote settings by clicking the "Edit Emotes" button.

Modifying SteamVR binding

You can set the boolean "sttlisten" in the binding UI of SteamVR. Anyone who has set up OpenVR-Advanced-Settings might be familiar with that. You can set it to any action that supports a boolean input. By default it is the left A button (the X button on Oculus/Meta).

If you want to use a chord, you have to create empty actions for the buttons you want to use for that chord, and they will then show up in the chord menu.

Automatic launch with SteamVR

On first launch, the program registers itself as an overlay app in SteamVR, just like other well-known programs such as XSOverlay or OVRAdvancedSettings, and can be set to launch on startup.

After setting the option to ON, the program will launch when SteamVR starts. If it doesn't show up, manually register the app.vrmanifest file by double-clicking it and running it with SteamVR.

Backlog

  • Add a quick entry box for quick messaging.
  • Create a Settings UI for easy config editing.
  • Enable Integration with KillFrenzyAvatarText.
  • Transcribe continuously until the user stops talking.
  • Add an emote feature
  • Demo Gif/Video (Stole from Frosty, thanks lol)
  • Add an OBS browser source
  • Use whisper.cpp/faster-whisper for transcription, for better performance.
  • Allow use of finetuned models.
  • Allow translation into and from different languages. M2M100 using ctranslate2
  • Remove the need for building the program, enable OTA updates. (Currently in progress)
  • Implement Text To Speech silero-tts (Currently in progress)
  • Documentation of features in wiki

Donate

You can always leave a GitHub Star 🟊 (it's free) or buy me a coffee:

Buy Me a Coffee at ko-fi.com

Credit
