TextboxSTT

A speech-to-text application that uses OpenAI's whisper to transcribe audio and send the result to VRChat's textbox system and/or KillFrenzyAvatarText over OSC. It also supports OBS via a Browser Source!

This program is meant to be entirely free (as in money), open source, and independent of cloud-based transcription services such as Microsoft Azure. Transcription runs on your own hardware, which respects your privacy and improves latency and reliability, at the cost of some performance. For that reason, I will not be implementing any cloud-based transcription or translation.

Discord Support Server

🢃 Download Latest Release

Features

Sending transcription to either
- VRChat's in-game Textbox, which works with any avatar
- KillFrenzyAvatarText (KAT), which needs to be integrated into an avatar.
  - You can use Frosty704's Billboard to add a speech bubble to your avatar.
  - Support for up to 80 emotes!
  - Automatic Detection of KAT on an avatar. It will use KAT if available, otherwise fall back to VRChat Textbox.
Fast and Efficient. VRCTextboxSTT uses ctranslate2 as the runtime for transcription and translation, which makes it incredibly efficient and fast.
Uses the SteamVR binding system: press to transcribe, hold to clear/cancel (A/X by default).
Customizable
- You can bind the button to start transcription to any action that SteamVR allows you to set.
- You can bind it to any key on your keyboard.
- Many Timing settings for transcription delays and button presses.
Optional Live Transcription
Optional Automatic launch with SteamVR.
Text to Text for quick typing.
Optional SteamVR Overlay for seeing your transcription without having to look at your own textbox in-game.
Optional OBS Browser Source.
Simple API. The latest transcription is available at the "/transcript" endpoint. (Requires the OBS Source to be turned on.)
Audio feedback for each step in the transcription.
Multi-language support. whisper supports around 100 different languages, which are available here with a few limitations.
Translate into and from different languages. (Powered by M2M100)
Word Replacements and Emote Replacements with regex (regular expressions).
Free to use under the GPL-3.0 license.
Completely free of Subscription/Cloud Services, by running locally on your hardware.
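
The "/transcript" endpoint mentioned in the feature list could be polled from another script roughly as sketched below. This is only an illustration: the host and port are placeholders (the README does not state them, so use whatever the OBS Source is configured with), and the sketch assumes the endpoint answers a plain GET with a text body.

```python
from urllib.request import urlopen


def transcript_url(host: str = "127.0.0.1", port: int = 5000) -> str:
    """Build the URL of the "/transcript" endpoint.

    The default host and port are hypothetical placeholders.
    """
    return f"http://{host}:{port}/transcript"


def fetch_transcript(url: str, timeout: float = 2.0) -> str:
    """Return the response body as text (assumes a plain GET is enough)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs TextboxSTT running with the OBS Source turned on):
# print(fetch_transcript(transcript_url()))
```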

Limitations

Limited character availability
- VRChat's Textbox is limited to showing 144 characters at a time.
- KillFrenzyAvatarText supports ASCII characters and a certain set of Japanese hiragana.
  It is limited to showing 128 characters at a time.
Visibility
- VRChat's Textbox is only visible to friends by default. Consider telling people that they can change this in VRChat's settings.
- VRChat's Textbox is not visible in Streamer Mode.
- KillFrenzyAvatarText is only visible on shown avatars and is PC only, as it uses a custom shader setup.
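
To make the character limits concrete, here is a minimal sketch of how a long transcription could be split into pieces that each fit one display. It is not taken from the program's source; the function name and the choice of `textwrap` are assumptions made for illustration.

```python
import textwrap

VRCHAT_TEXTBOX_LIMIT = 144  # characters shown at a time by VRChat's Textbox
KAT_LIMIT = 128             # characters shown at a time by KillFrenzyAvatarText


def split_for_display(text: str, limit: int = VRCHAT_TEXTBOX_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks break on whitespace where possible; a single word longer than
    the limit is cut so that no chunk exceeds it.
    """
    return textwrap.wrap(text, width=limit)
```

Calling `split_for_display(text, KAT_LIMIT)` gives the same behaviour with the smaller KAT limit.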

Requirements

With default settings, this program has the following requirements:

CPU version:
- ~2GB of storage space.
- ~400MB of available RAM.
GPU version:
- CUDA-enabled GPU (NVIDIA only); otherwise it will fall back to using the CPU.
- ~5GB of storage space.
- ~1GB of available RAM.
- ~500MB of available VRAM.
SteamVR (if run in VR; there is no Oculus/Meta support as of now).

Demo

Frosty704 using VRCTextboxSTT and KillFrenzyAvatarText with their Billboard project. More on that in their repository.

How to use

Run from Releases

Download one of the Releases.
Unpack the .7z file with software of your choice.
Run TextboxSTT.exe.

Run from source

Clone this repository, for example with git: git clone https://github.com/I5UCC/VRCTextboxSTT.git
Using Python 3.10, install all of the dependencies from the requirements.txt file: python -m pip install -r requirements.txt
Run the program with python TextboxSTT.py in the project's directory.

Usage in VRChat

Activate OSC in VRChat:
Run the program.
The program will use the default microphone set in Windows.
If you have a lot of background noise, adjust the "energy_threshold" option in the configuration (or press the ⟳ button next to it) until it works well.
Press A on the left controller on Index, X on Oculus, or F1 on your keyboard.
Holding any of those for 1.5s clears the chatbox or cancels the action.
If the program doesn't work as it's supposed to, try the troubleshooting steps in the next section.
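
For readers curious what "sending text to VRChat over OSC" involves, the sketch below builds a chatbox packet by hand with only the standard library. It assumes VRChat's documented `/chatbox/input` address (a string followed by two booleans: send immediately, play the notification sound) and VRChat's default OSC input port 9000. It is not taken from TextboxSTT, which may use an OSC library instead, and the function names are made up for this example.

```python
import socket


def osc_string(value: str) -> bytes:
    """Encode an OSC string: null-terminated and padded to a multiple of 4 bytes."""
    raw = value.encode("utf-8")
    return raw + b"\x00" * (4 - len(raw) % 4)


def chatbox_message(text: str, send_now: bool = True, notify: bool = False) -> bytes:
    """Build an OSC packet for /chatbox/input with one string and two booleans."""
    # OSC booleans carry no payload; they exist only as the type tags T and F.
    tags = ",s" + ("T" if send_now else "F") + ("T" if notify else "F")
    return osc_string("/chatbox/input") + osc_string(tags) + osc_string(text)


def send_to_vrchat(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the packet over UDP (port 9000 is assumed to be VRChat's OSC input port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(chatbox_message(text), (host, port))
```

With OSC activated in VRChat, `send_to_vrchat("Hello")` should make the text appear in the chatbox, subject to the 144-character limit described under Limitations.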

OSC Troubleshoot

If you have problems with this program, try this to fix it:

1 - Close VRChat.
2a - Press "Reset OSC Settings" in the Settings of TextboxSTT, or
2b - Open 'Run' in Windows (Windows Key + R), type in %APPDATA%\..\LocalLow\VRChat\VRChat\OSC and delete the folders that start with 'usr_'.
3 - Start VRChat again and it should work.
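
The manual cleanup in step 2b could also be scripted. The following is a hedged sketch, not the code behind the "Reset OSC Settings" button: the helper names are invented here, and it defaults to a dry run that only lists what would be deleted.

```python
import os
import shutil
from pathlib import Path


def default_osc_dir() -> Path:
    # Mirrors %APPDATA%\..\LocalLow\VRChat\VRChat\OSC from step 2b (Windows only).
    return Path(os.environ["APPDATA"]).parent / "LocalLow" / "VRChat" / "VRChat" / "OSC"


def reset_osc_configs(osc_dir: Path, dry_run: bool = True) -> list[Path]:
    """Return the folders starting with 'usr_'; delete them unless dry_run is True."""
    targets = sorted(p for p in osc_dir.glob("usr_*") if p.is_dir())
    if not dry_run:
        for folder in targets:
            shutil.rmtree(folder)
    return targets
```

Run `reset_osc_configs(default_osc_dir())` first to review the list, then repeat with `dry_run=False` while VRChat is closed.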

Configuration

You can either edit the configuration manually in the config.json file, or change the settings in the program itself by clicking "Settings" in the bottom right.
You can hover over any of the options to get a brief explanation on what that option does.
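
If you prefer to edit config.json from a script, something like the sketch below could work. The layout of config.json is not described in this README, so the helper searches nested objects for the option name instead of assuming where it lives; `set_option`, `update_config`, and the example value are all hypothetical.

```python
import json
from pathlib import Path


def set_option(node, key: str, value) -> bool:
    """Set the first occurrence of `key` in a nested dict; return True if found.

    Only JSON objects (dicts) are searched; lists are not traversed.
    """
    if isinstance(node, dict):
        if key in node:
            node[key] = value
            return True
        return any(set_option(child, key, value) for child in node.values())
    return False


def update_config(path: Path, key: str, value) -> bool:
    """Load the JSON file, update `key`, and write it back only if it was found."""
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = set_option(config, key, value)
    if changed:
        path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return changed
```

For example, `update_config(Path("config.json"), "energy_threshold", 200)` would change that option wherever it appears; the value 200 is only a placeholder.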

You can edit word replacements by clicking the "Edit Word Replacements" button.
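
As a rough idea of what a regex-based word replacement does, here is a minimal sketch. The rule format (a mapping of pattern to replacement) and the case-insensitive matching are assumptions for illustration, not the program's actual storage format or behaviour.

```python
import re


def apply_replacements(text: str, rules: dict[str, str]) -> str:
    """Apply each regex pattern in order, replacing every match in the text."""
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


# Hypothetical rule: fix a name that transcription tends to split into two words.
EXAMPLE_RULES = {r"\bvr ?chat\b": "VRChat"}
```

With these rules, `apply_replacements("i love vr chat", EXAMPLE_RULES)` returns "i love VRChat".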

You can edit the emote settings by clicking the "Edit Emotes" button.

Modifying SteamVR binding

You can set the boolean "sttlisten" in the Binding UI of SteamVR. Anyone who has set up OpenVR-Advanced-Settings might be familiar with that. You can set it to any action that supports a boolean input. By default it is the left A button (X button on Oculus/Meta respectively).

If you want to use a chord, you have to create empty actions for the buttons you want to use in that chord, and they will then show up in the chord menu.

Automatic launch with SteamVR

On first launch, the program registers as an overlay app in SteamVR, just like other well-known programs such as XSOverlay or OVRAdvancedSettings, and can be set to launch on startup.

After setting the option to ON, the program will launch on SteamVR startup. If it doesn't show up, manually register the 'app.vrmanifest' file by double-clicking it and running it with SteamVR.

Backlog

~~Add a quick entry box for quick messaging.~~
~~Create a Settings UI for easy config editing.~~
~~Enable Integration with KillFrenzyAvatarText.~~
~~Transcribe continuously until the user stops talking.~~
~~Add an emote feature~~
~~Demo Gif/Video (Stole from Frosty, thanks lol)~~
~~Add an OBS Browser Source~~
~~Use whisper.cpp/faster-whisper for transcription, for better performance.~~
~~Allow use of finetuned models.~~
~~Allow translation into and from different languages. M2M100 using ctranslate2~~
Remove the need for building the program and enable OTA updates. (Currently in progress)
Implement Text To Speech with silero-tts. (Currently in progress)
Documentation of features in wiki

Donate

You can always leave a GitHub Star 🟊 (it's free) or buy me a coffee:

Credit

OpenAI for their amazing work with anything really.
guillaumekln/faster-whisper and ctranslate2, whose work makes this project much more efficient and faster than it otherwise would be.
ValveSoftware/openvr and cmbruns/pyopenvr
Uberi/speech_recognition and jleb/pyaudio
killfrenzy96 for KillFrenzyAvatarText and KatOSC
Frosty704's Billboard for making this project more useful.
