vrcwizard / tts-voice-wizard

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)

Home Page: https://TTSVoiceWizard.com

License: MIT License

C# 100.00%

tts speech-to-text speech-recognition vrchat osc discord free voice vtuber chatbox

tts-voice-wizard's Introduction

TTS-Voice-Wizard

A Voice for Everyone

Features

Use TTS Voice Wizard's accessibility features to improve your VRChat experience (it works outside of VRChat too!)

🎙️ Convert your speech to text and back to speech using various Speech Recognition and Text-to-Speech methods

💬 You can send what you say as OSC messages to VRChat to be displayed on your avatar using KillFrenzyAvatarText/Frosty's Billboard or VRChat's Chatbox

🌐 The app can translate your speech from one language to over 50 other supported languages

🔊 There are 100+ different voices with various customization options so you can pick a voice that best suits you

🎵 Display the current song you are listening to on Spotify or via your browser

🔋 Display tracker and controller battery life in conjunction with XSOverlay

❤️ Use in conjunction with Pulsoid or HRtoVRChat_OSC to display your heart rate in VRChat's Chatbox

🗣️ Control VRChat avatar parameters with voice commands

🫵 Display customizable and interactive counters for the number of times a VRChat contact receiver has been touched

📹 For a more detailed look at the app's features, check out the Demonstration Video (although it's a bit outdated)

GETTING STARTED

Download TTS Voice Wizard

Please read the Quickstart Guide or watch the Tutorial Video for instructions on how to set up the basic features of TTS Voice Wizard.

Alternative Installation Method

Thanks to babo4d, TTS Voice Wizard can also be installed via Scoop with the following commands:

scoop bucket add extras
scoop install extras/tts-voice-wizard

Requirements

Windows 10/11 (compatibility not guaranteed with older versions of Windows)

GitHub Wiki Table of Contents

Quick Start Guide

Additional Guides

Need Help / Have Questions / Wanna make suggestions?

Join the Discord Server

What comes next?

Check the Trello page to see what features will be coming out next!

Support the Project

Leave me a GitHub Star ⭐ (it's free) or

🔑 Unlock VoiceWizardPro Benefits!

Subscribe to Ko-Fi or Patreon and experience a world of powerful features that will transform your TTS and translation experience:

✨ Instant Access to Premium Voices: Enjoy hundreds of voices from leading cloud services, including:
- Microsoft Azure
- Amazon Polly
- Google Cloud
- IBM Watson
🌍 Multilingual Magic: Translate your voice into 70+ supported languages and talk to your friends from all over the world
🎤 Crystal-Clear Transcriptions: Gain access to speech recognition through DeepGram's Nova-2 model, the fastest and most accurate speech-to-text API.

Your subscription not only enhances your capabilities but also supports future development:

💪 Empower Ongoing Development: Your contribution assists in server upkeep, covers character costs from premium APIs, and fuels future software innovations.

Ready to elevate your TTS game? Dive into VoiceWizardPro now! For detailed insights, explore our VoiceWizardPro Docs.

Unlock the power of VoiceWizardPro today! 🚀

Partners

TalaTora - TalaTora the talented tiger VTuber, check them out on Twitch

Credits

@RavenNovaa - New logo art by @RavenNovaa on Twitter, check them out :)
CoreOSC - Sending / receiving OSC data (modified for sending UTF-8)
octokit.net - GitHub Update Notifier
AutoUpdater.NET - Auto Updater for .NET applications
Sharptalk - .NET wrapper for the FonixTalk TTS engine.
tiktok-tts - TikTok voices made possible by Weilbyte's TikTok TTS endpoint
KillFrenzyAvatarText (KAT) - A text display system designed to be used on VRChat Avatars
Frosty's Billboard - Frosty's billboard is a container for the KAT allowing you to have a speech bubble in your hand
WindowsMediaController - Windows Media Package
WhisperNet - C# wrapper for whisper.cpp
ZAPSPLAT - Sound effects obtained from https://www.zapsplat.com

tts-voice-wizard's Issues

TTS dies every 5 minutes

As soon as I hop on VRChat, TTS dies every 5 minutes. I need help.

Wiki update: install with scoop (package manager)

Scoop is a command-line installer for windows that is well-suited to managing portable apps like this one. Scoop users can install the app from my scoop bucket.

You can add a note to the wiki about installing with scoop, which only requires a couple of commands:

scoop bucket add xrtools "https://github.com/babo4d/scoop-xrtools"
scoop install tts-voice-wizard

The scoop installation also creates a start menu shortcut, recommends that the user install VB-CABLE, and provides instructions on how to install the .NET desktop runtime dependency with scoop:

> scoop info tts-voice-wizard
...
...
Suggestions : extras/windowsdesktop-runtime-lts
Notes       : Some features require a virtual audio cable like VB-CABLE <https://vb-audio.com/Cable/>
              Requires .NET Desktop Runtime 6.0. To install with scoop:
                scoop install sudo
                sudo scoop install windowsdesktop-runtime-lts

I have submitted the manifest to the scoop Extras repository (ScoopInstaller/Extras#11918) which would further simplify installation and improve discoverability.

I am getting [OBSText File Error: Could not find a part of the path 'C:\Users\User\Desktop\TTSVoiceWizard-v1.5.8.1-x64\Output\TextOut\OBSText.txt'.. Try moving folder location.]. The folder was in Downloads, but no matter where I move TTS Voice Wizard, the error still appears. I also run it as administrator, and it happened before I updated as well. I can click "open file location" and that works; the app just won't update or overwrite the .txt file.
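In .NET, "Could not find a part of the path" usually means that one of the folders in the path is missing, rather than the file itself. As a rough illustration only (the app is written in C#, and the function name below is hypothetical), a writer can create the missing parent folders before it writes the text file:

```python
from pathlib import Path


def write_obs_text(output_file: Path, text: str) -> None:
    """Write text for an OBS text source, creating missing parent folders first."""
    # parents=True builds every missing folder in the chain (e.g. Output/TextOut);
    # exist_ok=True makes the call a no-op when the folders are already there.
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(text, encoding="utf-8")
```

Whether the app's output folders were actually missing in this report is an assumption; checking that Output\TextOut exists next to the executable would confirm or rule it out.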

Remembering the last used presets

Hi, there!
I have a small request.

Every time I use the app, I have to change the preset again. For example, "Text to Speech" -> "Presets" always goes back to "Non Selected".
Is it possible to remember the last used preset?
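One way to picture the request is a small settings file that is written whenever the preset changes and read back on startup. The sketch below is only an illustration in Python, not how the app stores its settings; the file layout and the "last_preset" key are assumptions.

```python
import json
from pathlib import Path


def save_last_preset(settings_file: Path, preset_name: str) -> None:
    """Store the chosen preset name, keeping any other saved settings intact."""
    settings = {}
    if settings_file.exists():
        settings = json.loads(settings_file.read_text(encoding="utf-8"))
    settings["last_preset"] = preset_name  # hypothetical key name
    settings_file.write_text(json.dumps(settings, indent=2), encoding="utf-8")


def load_last_preset(settings_file: Path, default: str = "") -> str:
    """Return the saved preset name, or the default when nothing was saved yet."""
    if not settings_file.exists():
        return default
    settings = json.loads(settings_file.read_text(encoding="utf-8"))
    return settings.get("last_preset", default)
```

On startup the app would call load_last_preset and select that entry in the Presets list if it still exists.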

Thanks.

[Feature Request] Show the typing dots when STTTS is listening or while typing in the text box

Binary Reduction in source repo

Extremely cool project but noticed a few bits of concern in the git repo worth addressing.

There are a number of prebuilt blobs in the git project, and a large amount of content that could be better served as git submodules with their own sub-builds wired into the main build script.
Cleaning this up could make it much easier to modify the program to suit individual needs, encourage future contributions, and of course support building and running on more platforms than Win32.

Is cleaning up the source repo on the roadmap?

Can't find my language in spoken language list

I speak Persian, so I checked Whisper and saw that it supports my language,

but I can't find it in the list.

How can I change this to Persian?

[SUGGESTION] Listen to custom port to start/stop TTSVoiceWizard output

As the title suggests, it would be awesome if there was an option to have TTSVoiceWizard listen to a custom port and allow it to be controlled by other OSC applications. Things like starting/stopping the chatbox output and KAT output separately with endpoints would be cool. Thank you for your work VRCWizard!
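To make the suggestion concrete, here is a minimal Python sketch of what handling such control messages could look like: it decodes a single OSC 1.0 message and flips an on/off flag for the matching output. The addresses under /example/ are hypothetical and are not endpoints the app exposes; OSC bundles and other argument types are left out.

```python
import struct


def _read_string(data: bytes, offset: int) -> tuple:
    """Read a null-terminated OSC string; return (text, next 4-byte-aligned offset)."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("utf-8")
    # OSC strings are padded with nulls to a multiple of 4 bytes.
    return text, (end + 4) & ~3


def parse_osc_message(data: bytes) -> tuple:
    """Decode one OSC 1.0 message into (address, [arguments])."""
    address, offset = _read_string(data, 0)
    type_tags, offset = _read_string(data, offset)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in type_tags[1:]:
        if tag == "s":
            value, offset = _read_string(data, offset)
        elif tag == "i":
            (value,) = struct.unpack_from(">i", data, offset)
            offset += 4
        elif tag == "f":
            (value,) = struct.unpack_from(">f", data, offset)
            offset += 4
        elif tag in "TF":
            value = tag == "T"  # booleans carry no payload bytes
        else:
            raise ValueError(f"unsupported OSC type tag: {tag}")
        args.append(value)
    return address, args


# Hypothetical control addresses, for illustration only.
outputs = {"/example/chatbox/enabled": True, "/example/kat/enabled": True}


def handle_packet(data: bytes) -> None:
    """Toggle an output when a known address arrives with a boolean argument."""
    address, args = parse_osc_message(data)
    if address in outputs and args and isinstance(args[0], bool):
        outputs[address] = args[0]
```

A listener would bind a UDP socket to the user-chosen port and pass each received datagram to handle_packet.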

TTS OBS plugin using Text Source

I would like to make a suggestion here. This is about OBS.
I am currently using Closed Captioning via Google Speech Recognition 0.0.8.
This plug-in converts my voice into text in real time.
This text is entered as an OBS text source and disappears after a while.
I want to convert this real-time text into AI voice and send it to the broadcast using this text source.
Currently, in your wizard program, you have to input text directly from the program to the keyboard to convert it into voice.
Can you create an OBS plug-in that takes the text source of OBS and converts it into voice in real time?

Missing Spotify Feature in compact mode in test version

Just wanted to make this known just in case it isn't.

Compact mode doesn't display the current duration when "Output Current Song Periodically" is enabled.

Also, the output isn't compact upon switching songs.

Additionally, add support for the hour mark if possible (only if song length >1hr, otherwise hide hour mark), as when I added local files that are long mixes, it doesn't display the hour mark.
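The hour-mark behavior requested above can be sketched as follows. This is an illustrative Python version, not the app's code: the hour field is shown for both position and length only when the track is at least one hour long.

```python
def format_time(seconds: int, show_hours: bool) -> str:
    """Format a number of seconds as m:ss, or h:mm:ss when show_hours is set."""
    total = max(0, int(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if show_hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    # Without an hour field, fold any hours back into the minutes.
    return f"{hours * 60 + minutes}:{secs:02d}"


def format_progress(position_s: int, duration_s: int) -> str:
    """Return 'position / duration', adding the hour mark only for long tracks."""
    show_hours = duration_s >= 3600
    return f"{format_time(position_s, show_hours)} / {format_time(duration_s, show_hours)}"
```

For example, a position of 75 seconds in a 240-second song gives "1:15 / 4:00", while the same position in a 4000-second mix gives "0:01:15 / 1:06:40".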

Invalid text input, Invalid MP3 file

[10:39:56 AM]: [Your text input may have been invalid]
[10:39:56 AM]: [Error Playing Audio: Invalid MP3 file - no MP3 Frames Detected]

I've tried reinstalling it and it worked yesterday.
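An error like "no MP3 Frames Detected" is consistent with the service returning something other than audio (for example an error message) in place of the MP3 data, although that is an assumption about this report. A quick sanity check before playback, sketched in Python, is to look at the first bytes of the response:

```python
def looks_like_mp3(data: bytes) -> bool:
    """Cheap check that a byte buffer starts like an MP3 file.

    Only the start of the buffer is inspected; real decoders scan further,
    so treat a False result as "probably not audio", not as proof.
    """
    # Files with an ID3v2 tag begin with the ASCII characters "ID3".
    if data[:3] == b"ID3":
        return True
    # Otherwise an MPEG audio frame starts with 11 set bits (frame sync).
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
```

If the check fails, logging the first few bytes of the response as text often shows the real error from the TTS service.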

Music Update problem with youtube

When playing music through YouTube, the app shows the first song you're listening to, along with the current minute and second if you have that enabled. When the text updates, it keeps showing the same thing indefinitely until I restart the app. Please get this fixed ASAP.

API request failed

I tried several other regions, generated multiple resources, and tried several other versions (0.7.9.5, 0.7.9.7, 0.8.0.1).

All fields on Speech to TTS are not saved on app closing

I would like all changed fields in the Speech to TTS menu to be saved when the app closes. Thank you <3

Whisper instability

Hi, STT (Whisper) is the biggest use-case for me. I think it's probably the most important feature for now until I can use it reliably.
Hopefully, it's the same for everyone as it's the starting point for using TTSVoiceWizard.

Anyway, here is what I found using the latest v1.5.0 from the GitHub main branch.

In the Log View, I see the new "Whisper Debug: ..." output. When STT mode is on, it always shows one of the following, seemingly at random. I think it's clear what each one means.
(A) "Listening" (listening and there is no sound input)
(B) "Listening, Voice" (listening and sound input is detected)
(C) "Listening, Transcribing" (processing recorded voice)

But the problem is that they do not accurately represent what's really happening, and the behavior is a bit random.

Here are my observations. (I always launch it from VS Debug but I think the behaviors are the same from .exe)

When STT is first activated, it always starts at (B) even though there is no voice input, and it stays stuck at (B) until I speak several times. (Yes, I waited long enough for the ggml model to load.) When it unfreezes from (B), it outputs several of the strings I spoke, bunched together.
After the initial hiccups, it becomes more responsive. However, sometimes (A), (B), (C) cycle through on their own without any sound input.
I then wait until it stabilizes at (A) and start speaking again. Sometimes it goes to (B) immediately, and sometimes it doesn't. Then it starts cycling through (A), (B), (C) on its own again.

Here is another observation/question.
I see the following logs in VS Console Output.

It seems to recreate the same threads infinitely.
Can you please tell me what these threads are for?
Perhaps the instability is related to these threads constantly being recreated?

Many thanks.

[Bug] Spotify outputting periodically when it shouldn't

There appears to be a bug where spotify periodically outputs when the setting to do so is disabled, except without the current time.

This happens when "Send Text to VRChat with KAT" is disabled.

Can't find some TTS voices

I tried several other TTS voices in this program, but some are not working. The most important to me is "Acapela Elan TTS Digalo Nikolai"; even though it's SAPI5, it's just not on the list.

Hi bro, can you add more APIs, like OpenAI, for translation and TTS?

Hi bro, can you add more APIs, like OpenAI for translation and TTS at the same time, or Edge TTS? Just a thought, hope you see it.
I like your work a lot, by the way! I'm a Chinese user, and applying and paying for the APIs is not friendly here, so maybe you can add more APIs to choose from.
Respect~

FR: Support downloading an MP3 voice file when using TTS

Hi! Thanks for sharing.
Could you support downloading an MP3 voice file when using TTS?

Output TTS to file

Is there any way to convert the audio into MP3?

[Suggestion] Websocket server controls or extra endpoint variables

Would be really nice if there were toggles or some way to control what parts of the program are influenced by websocket commands. It could be handled similarly to the OSC endpoint variables or just toggles in the UI itself. For example, there could be a way to control whether the speech output, chatbox etc is influenced by websocket commands or not. Also having another OSC endpoint for TTS output would be a nice addition alongside the current chatbox and KAT endpoints. Thanks again Wizard for catering to my weird needs, still really appreciate you adding the OSC to TTS function. It's been really helpful for integrating my own niche chatbox projects alongside this and I hope others will take advantage of it too <3

Other .NET components not optional

Updated to the 1.0.3 version today, I think i was on 0.9.9 before. Anyway I got the .NET error again like you warned in the 1.0.2 release notes. However, it would not start after installing the package from that error message. I installed the x86 package for console apps and after that it also still would not start. After I installed the Hosting Bundle it started normally. I've never used Fonixtalk/Moonbase but apparently it must be checking for that dependency before it will even launch.

[BUG] STT (whisper)

Hi, there! Using STT (whisper), I found a couple of bugs. (I'm running the app on Visual Studio 2022 Debug)

Clicking "Speech to Text to Speech" on/off several times causes an exception.
Closing the app (pressing "" icon) with "Speech to Text to Speech" enabled will not kill the app. A dangling thread remains, preventing the app from exiting. You will see this behavior when the app is launched from Visual Studio Debug.
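Both symptoms above are typical of a background worker that is started without a guard and never told to stop. The following Python sketch shows the general pattern under those assumptions; the class and method names are hypothetical and the loop body is a placeholder, so it says nothing about how the app's Whisper thread is actually written.

```python
import threading


class CaptureWorker:
    """Background loop that can be started repeatedly and stopped cleanly."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self._thread = None

    def start(self):
        # Ignore repeated "on" clicks while a worker is still alive.
        if self._thread is not None and self._thread.is_alive():
            return
        self._stop_requested.clear()
        # daemon=True keeps a forgotten worker from blocking process exit.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # wait() returns True as soon as stop() sets the event.
        while not self._stop_requested.wait(timeout=0.05):
            pass  # placeholder for "read audio, transcribe, publish text"

    def stop(self, timeout=2.0):
        self._stop_requested.set()
        if self._thread is not None:
            self._thread.join(timeout)

    @property
    def running(self):
        return self._thread is not None and self._thread.is_alive()
```

Calling stop() from the window's close handler gives the worker a chance to finish before the process exits.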

I think that's all for now and many thanks for the wonderful app.

Auto Clear TTS Text Box Field not functioning after latest update

Version: v0.9.4.6
OS: Win10

After the latest update, the text box does not clear itself after TTS.
I've checked that the toggle is on and tried with it toggled off as well. I also tried extracting the zip file again, with the same results.

[TTS Glitch] TTS Says Stuff Twice

Hi, I'm having an issue with the TTS system that I've never experienced before. When I press Enter to make the program do the TTS process, the output is doubled. Let me try to explain it as best I can: instead of hearing the TTS say the text I inputted only once, I hear it twice, the second time with a slight delay after the first. It's a little weird and I don't know why this is happening.

Clip of this happening: https://streamable.com/rsh7fc

Any help with this would be appreciated- (this is my first time ever making a post on github btw lol)

Web Captioner Speech to Text hook, always skips the first word.

Hi, I've been trying the Web Captioner hook. No matter how long or short the recognized speech is, TTS Voice Wizard ignores the first word.

On a related note, it sometimes repeats the sentences as individual words.

The webhook is also set to 4-5 word batches, so short sentences should be sent instead of individual words.

[Feature Request] STT (Whisper)

Hi, I have a couple of feature requests that would make working with STT easier.

I like to change my chatting language quickly to talk to people who speak different languages.
Right now, in order to change the language, I have to restart STT (click on "Speech to Text to Speech").
This has a couple of problems. Clicking on "Speech to Text to Speech" quickly can cause crashes, and restarting STT takes a bit of time to load ggml and is a bit unstable when it starts.
Therefore, it would be nice to change the language without restarting STT. Ggml should be loaded only once, or only when it's changed, I think.
I would like to temporarily "pause" capturing voice without stopping STT. Restarting STT can be problematic, as I mentioned above. Is it possible to add a toggle button right next to "Speech to Text to Speech" to temporarily stop capturing voice? This way, I can keep STT from going crazy, and STT has time to clear its buffer when not capturing.
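The two requests can be sketched together: keep the loaded model in memory and reload it only when the model file changes, treat the language as a per-call setting, and add a pause flag that skips audio without tearing anything down. This Python sketch uses a hypothetical load_model callable and is not based on the app's Whisper integration.

```python
class SttSession:
    """Keeps one loaded model and lets language and pause change on the fly."""

    def __init__(self, load_model):
        # load_model is a hypothetical callable: model path -> callable(audio, language)
        self._load_model = load_model
        self._model = None
        self._model_path = None
        self.language = "en"
        self.paused = False

    def set_model(self, path):
        # Reload only when the path actually changes; loading is the slow part.
        if path != self._model_path:
            self._model = self._load_model(path)
            self._model_path = path

    def process(self, audio):
        """Transcribe one audio chunk, or return None while paused or not loaded."""
        if self.paused or self._model is None:
            return None
        return self._model(audio, self.language)
```

Changing session.language between chunks then takes effect on the next call, with no restart and no second model load.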

Please let me know if you have a better idea.

Cheers!

Found a way to get Quest 1 / 2 battery life (requires ADB)

I found a way to pull the Quest 2 battery life, but it requires ADB (Android Debug Bridge) to detect it.

To get the battery life, the Quest 2 has to be connected through ADB, either wired or wirelessly on port 5555.

The Oculus app mainly uses ADB to pull the battery life of the Quest 2 and its controllers, all directly from the headset, and ADB reports it.

So there might be a way to get Quest and Quest 2 supported. I already forked the current project today, and I'll see if I can find a way to integrate it and push changes with Quest 1/2 integration, unless someone else gets to it before me.
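As a rough sketch of the approach described above, the headset's battery level can be read with the standard Android command "adb shell dumpsys battery" and parsed from its "level:" line. The Python below assumes adb is installed and the headset is already connected and authorized; the function names are hypothetical, and controller battery levels are not covered.

```python
import re
import subprocess


def parse_battery_level(dumpsys_output: str):
    """Extract the 'level: N' value from `dumpsys battery` output, or None."""
    match = re.search(r"^\s*level:\s*(\d+)\s*$", dumpsys_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_headset_battery(adb_path: str = "adb"):
    """Ask the connected device for its battery level via adb (assumed on PATH)."""
    result = subprocess.run(
        [adb_path, "shell", "dumpsys", "battery"],
        capture_output=True,
        text=True,
        timeout=5,
    )
    return parse_battery_level(result.stdout)
```

For a wireless connection, "adb connect <headset-ip>:5555" would need to succeed first.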

TTSVoiceWizard is looking for assets not in its own directory

Greetings! Just upgraded to version 1.5.1.7 and tried to use Text-to-Speech in System Speech mode, the program crashes after the second use. Crash dumps didn't appear in the program directory. Using Event Viewer, it turned out that TTSVoiceWizard crashes with an error System.IO.DirectoryNotFoundException, trying to find assets not in its own directory, but in the directory of the voice used. Here are the general details straight from the Event Viewer:

Event 1026, .NET Runtime

Application: TTSVoiceWizard.exe
CoreCLR Version: 6.0.1823.26907
.NET Version: 6.0.18
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.DirectoryNotFoundException: Could not find a part of the path 'C:\Program Files (x86)\Speech2Go Voice Package\x64\Assets\sounds\TTSButton.wav'.
   at Microsoft.Win32.SafeHandles.SafeFileHandle.CreateFile(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.OSFileStreamStrategy..ctor(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.Strategies.FileStreamHelpers.ChooseStrategy(FileStream fileStream, String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options, Int64 preallocationSize)
   at System.IO.File.OpenRead(String path)
   at NAudio.Wave.WaveFileReader..ctor(String waveFile)
   at OSCVRCWiz.Resources.Audio.AudioDevices.PlaySoundAsync(String soundName) in {Filtered}\OSCVRCWiz\Resources\Audio\AudioDevices.cs:line 483
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

Excluded some part of the path for privacy reasons.
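The trace suggests, though this is an inference from the path in the exception, that the sound file is looked up relative to the current working directory, which appears to have changed to the voice package's folder. The usual remedy is to build asset paths from the application's own directory (in .NET, AppContext.BaseDirectory). A Python sketch of the same idea, with hypothetical function names:

```python
import sys
from pathlib import Path


def app_directory() -> Path:
    """Folder containing the running script or executable (not the working directory)."""
    return Path(sys.argv[0]).resolve().parent


def resolve_asset(app_dir: Path, *parts: str) -> Path:
    """Build an asset path anchored at the app folder, independent of os.getcwd()."""
    return app_dir.joinpath(*parts)
```

With this, resolve_asset(app_directory(), "Assets", "sounds", "TTSButton.wav") keeps pointing at the app's folder even if another component changes the working directory.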

Adding Our Own Custom Voice

Hi.

Would you add the option so that we can add our own .pth RVC voice file to it to use as basis for our STTTS?

Thank you.

Issue about Whisper Model recognition Chinese

I have tested almost all Whisper models of all sizes, and the same thing happens with each.
When the recognition input language is set to Chinese and there is no voice input, the Whisper model continuously outputs spam. If it is set to English, the issue does not occur.
I have not seen the same issue with Japanese, and I have not tried other languages.
I then tried switching the model to Vosk, and the issue did not occur.

When I set the input to a microphone without any signal input, as can be seen in the screenshot, the Whisper models output garbage.

This issue occurs more often when the mic is set to the one I use every day and I am not saying anything.

I don't know what causes this problem. I tried to use text replacement to delete these spam messages, but the model often randomly combines common words and adds spaces at random, which makes text replacement basically unusable.

I checked Whisper Model's github, as well as some technology shares using Whisper Model directly.
But there seems to be no such problem.
It seems that Whisper Model can be set to recognize multiple languages at the same time.
Is it possible to manually add startup commands, or provide an option to select multiple languages?
I wonder if this problem will be eliminated when multiple languages are recognized at the same time.
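Whisper models are known to produce text for silent or near-silent audio, and a common mitigation is to skip transcription when a chunk is too quiet. The Python sketch below shows a simple energy gate on 16-bit PCM samples; the threshold is an arbitrary illustrative value, and this is a general technique rather than something the app is confirmed to do.

```python
import math


def rms(samples) -> float:
    """Root-mean-square amplitude of a sequence of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(samples, threshold: float = 500.0) -> bool:
    """Return True only when the chunk is loud enough to be worth transcribing.

    The default threshold is a guess; it needs tuning per microphone.
    """
    return rms(samples) >= threshold
```

Chunks that fail the check are dropped before they reach the model, so silence never gets a chance to be transcribed as spam.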

Old version was good, now it's getting worse

In the old version there was Optimus Prime in English; now it's Portuguese and Mexican? WTH. And now there are fewer voices. It's bad. I'll delete it and never use it again.

Suggestion: Audio volume percentage level for TTS

While the TTS is great to use, there is a lack of being able to control the volume in the TTS Voice Wizard itself. The GLADOS one comes out at a very high volume. Having a simple volume slider for the TTS would solve this issue
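A volume slider of this kind usually comes down to multiplying each audio sample by a gain factor and clamping the result so it cannot overflow. A minimal Python sketch for 16-bit PCM samples, with a hypothetical function name:

```python
def apply_volume(samples, percent: float):
    """Scale 16-bit PCM samples by a percentage, clamping to the valid range."""
    gain = max(0.0, percent) / 100.0
    # Clamp to the signed 16-bit range so loud voices clip instead of wrapping.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

Setting the slider to 50 halves every sample, which would tame a voice that comes out much louder than the others.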

Have you considered using DeepL's free windows integration?

DeepL supports capturing from applications, translating directly in applications, and piping text back and forth. It also supports screen capture and OCR.

As someone who uses it to communicate with a massive Japanese team on a daily basis, I find it by far the best and most accurate translator. And it's completely free for the kind of use this application would generate.
Its paid options are for team translations, documents, and CAT tools, none of which you need.

I'm planning on using your tool to give myself Japanese closed captions in game (I'm less interested in the TTS portion).
The Azure requirement is a huge turn-off for many, and to be honest its translations aren't even all that good in comparison, especially for Japanese.
This would create a completely free option for users, built on what is considered the best tool for working with Japanese users.