koljab / RealtimeTTS
Converts text to speech in realtime
It's heartbreaking that Coqui has shut down, and I've been looking for alternative projects.
I'm not sure whether https://github.com/myshell-ai/MeloTTS has potential, but I've heard the project produces high-quality output.
Could you please take a look when you have time?
Hi,
I want to know if there's any way to get an iterator over chunks, so that I can do something like this:
chunks = stream.<some_way_to_get_chunk_iterator>

def gen():
    for chunk in chunks:
        yield chunk

then in FastAPI return: StreamingResponse(gen(), media_type="audio/wav")
Thanks
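A possible workaround while no iterator API exists (a sketch, assuming the documented on_audio_chunk callback of play()/play_async() delivers raw audio bytes): bridge the callback into a generator with a queue. The helper name chunk_iterator and the wiring are my own, not part of RealtimeTTS.

```python
# Sketch: turn a callback-style playback call into a chunk iterator that
# FastAPI's StreamingResponse can consume.
import queue
import threading

_SENTINEL = object()  # marks the end of the audio stream

def chunk_iterator(start_playback):
    """Run `start_playback(on_chunk)` in a thread; yield chunks as they arrive."""
    q = queue.Queue()

    def worker():
        start_playback(q.put)  # the engine pushes chunks into the queue
        q.put(_SENTINEL)       # playback finished

    threading.Thread(target=worker, daemon=True).start()
    while (chunk := q.get()) is not _SENTINEL:
        yield chunk
```

Usage sketch (names assumed): wrap your playback call, e.g. `start = lambda on_chunk: stream.play(on_audio_chunk=on_chunk, muted=True)`, and return `StreamingResponse(chunk_iterator(start), media_type="audio/wav")` — noting the chunks are raw PCM unless you prepend a wav header yourself.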
The front page says:
to clone a voice submit the filename of a wave file containing the source voice as cloning_reference_wav to the CoquiEngine constructor.
This code works; where do I put the cloning_reference_wav? Thanks
if __name__ == '__main__':
    from RealtimeTTS import TextToAudioStream, CoquiEngine
    import logging
    logging.basicConfig(level=logging.INFO)
    engine = CoquiEngine(level=logging.INFO)
    stream = TextToAudioStream(engine)
    print("Starting to play stream")
    stream.feed("Everything is going perfectly")
    stream.play()  # pause(), resume(), stop()
    engine.shutdown()
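To answer the placement question with a hedged sketch: per the README quote above, cloning_reference_wav is a constructor argument of CoquiEngine, not an argument to feed() or play(). The helper below and the "my_voice.wav" filename are illustrative only.

```python
# cloning_reference_wav belongs in the CoquiEngine constructor.
# "my_voice.wav" is a placeholder for your own reference recording.
def coqui_cloning_kwargs(reference_wav):
    # returned as kwargs so the constructor call site stays explicit:
    #   engine = CoquiEngine(**coqui_cloning_kwargs("my_voice.wav"))
    return {"cloning_reference_wav": reference_wav}
```

In the example above that would mean constructing the engine as `engine = CoquiEngine(level=logging.INFO, cloning_reference_wav="my_voice.wav")`.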
Out of pure curiosity: would asynchronous programming be helpful for this project?
Hey,
I'll take a stab myself, but I wanted to suggest creating a Docker image to help with some of these dependency issues, and thought it best to record it here for both the TTS and STT projects.
The example "tests/write_to_file.py" is not producing any files.
I've tried it with SystemEngine() and with CoquiEngine(), and used a bare file name, a relative path, and a full path.
The result is the same as .play(): .play(file_name) outputs to the speakers, and no "system_output.wav" appears anywhere on the C:\ drive.
stream.load_engine(system_engine)
stream.feed(dummy_generator())
# works as a .play() without parameters->output to speakers
stream.play(output_wavfile=stream.engine.engine_name + "_output.wav")
In my case it looks like:

def speakSys():
    text_gen = dummy_generator()
    sys_engine = SystemEngine()
    stream = TextToAudioStream(sys_engine)
    stream.feed(text_gen)
    # last attempt: absolute path, since a bare file name and ".\\" did not work
    output_wavfile = "C:\\dev_free\\w1\\" + stream.engine.engine_name + "_output.wav"
    print(f"Writing to {output_wavfile} ...")
    # note: output_wavfile must be passed as a keyword argument --
    # stream.play(output_wavfile) would bind it to fast_sentence_fragment instead
    stream.play(output_wavfile=output_wavfile)
I've managed to get the RealtimeTTS library to work. I'm wondering if there's any way to save (keep appending) audio chunks to a file as they come in, so that I can play it back later. I want the saved audio to match exactly what the stream plays via the play_async() function.
Thanks!
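Two options seem plausible: play()/play_async() already accept an output_wavfile parameter (see the signature below), and for chunk-by-chunk control an on_audio_chunk recorder can append to a wav file as playback happens. A sketch, where the sample rate, width, and channel count are assumptions — query your engine's stream configuration rather than hard-coding them:

```python
# Sketch: record audio chunks to a wav file exactly as they are played.
import wave

class ChunkRecorder:
    def __init__(self, path, rate=24000, channels=1, sampwidth=2):
        # assumed format: 16-bit mono PCM at 24 kHz -- verify against your engine
        self._wf = wave.open(path, "wb")
        self._wf.setnchannels(channels)
        self._wf.setsampwidth(sampwidth)
        self._wf.setframerate(rate)

    def on_audio_chunk(self, chunk):
        self._wf.writeframes(chunk)  # append each chunk as it arrives

    def close(self):
        self._wf.close()
```

Usage sketch: `rec = ChunkRecorder("out.wav")`, then `stream.play_async(on_audio_chunk=rec.on_audio_chunk)`, and call `rec.close()` once playback finishes.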
play(self,
     fast_sentence_fragment: bool = True,
     buffer_threshold_seconds: float = 0.0,
     minimum_sentence_length: int = 10,
     minimum_first_fragment_length: int = 10,
     log_synthesized_text=False,
     reset_generated_text: bool = True,
     output_wavfile: str = None,
     on_sentence_synthesized=None,
     on_audio_chunk=None,
     tokenizer: str = "nltk",
     language: str = "en",
     context_size: int = 12,
     muted: bool = False,
):
I need to use the on_audio_chunk data as input to a realtime service. Can I just consume the callback data without having the system play the audio?
thanks!
Thanks for the upgrade to RealtimeTTS v0.3.42, but when using the Coqui engine with play_async on Ubuntu Server (Linux), I can't get the callback data.
stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(tokenizer="stanza", language="zh", on_audio_chunk=on_audio_chunk_callback, muted=True)
error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
self.player.start()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
self.audio_stream.open_stream()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
self.stream = self.pyaudio_instance.open(
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)
Thanks again for the great repo :)
It would be great to have something like on_sentence_START_synthesized, so the sentence can be printed before its speech starts.
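Until such a hook exists, a hedged workaround (assuming the on_sentence_synthesized callback in play()'s signature receives the sentence text): since a sentence's audio is synthesized shortly before it is spoken, printing there approximates the requested on-start hook. The helper below is my own, not part of RealtimeTTS.

```python
# Hedged workaround: announce each sentence as soon as its synthesis
# completes, i.e. roughly when that sentence is about to be spoken.
# `sink` is injectable so the callback can be tested without audio output.
def make_sentence_announcer(sink=print):
    spoken = []

    def on_sentence_synthesized(sentence):
        spoken.append(sentence)
        sink(f"[about to speak] {sentence}")

    on_sentence_synthesized.spoken = spoken  # inspectable history
    return on_sentence_synthesized
```

Usage sketch: `stream.play(on_sentence_synthesized=make_sentence_announcer())`.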
Hey! I've tried running the CoquiEngine using your example, but I'm receiving this error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The traceback points at line 107 of RealtimeTTS/engines/coqui_engine.py. Could this be caused by a different torch or Python version?
I believe the elevenlabs dependency version needs to be pinned, or the import refactored so their engine isn't loaded unless it's actually used.
I'm getting the following error on Python 3.12:
Traceback (most recent call last):
File "tts.py", line 1, in <module>
from RealtimeTTS import CoquiEngine, TextToAudioStream
File ".venv/lib/python3.12/site-packages/RealtimeTTS/__init__.py", line 1, in <module>
from .text_to_stream import TextToAudioStream
File ".venv/lib/python3.12/site-packages/RealtimeTTS/text_to_stream.py", line 1, in <module>
from .engines import BaseEngine
File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/__init__.py", line 4, in <module>
from .elevenlabs_engine import ElevenlabsEngine
File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 2, in <module>
from elevenlabs import voices, generate, stream
ImportError: cannot import name 'generate' from 'elevenlabs' (venv path here)
I'm trying to run a simple TextToAudioStream in an Ubuntu Lightsail container on AWS like this to stream to browser js:
stream = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": text}],
    stream=True,
)
tts_stream = TextToAudioStream(
    AzureEngine(
        speech_key='',
        service_region='westeurope',
        voice='zh-CN-XiaochenMultilingualNeural',
        rate='5',
    ),
    log_characters=True,
)
tts_stream.feed(stream).play(on_audio_chunk=handle_audio_chunk, muted=True)
This works in my local environment with muted=True, but on the server I get the error below even though I don't need audio playback and have set muted=True.
Is there a way around this, or some way to get it working in a VPS environment?
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
...
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM sysdefault
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
Thank you for your work, I think it's cool. I find that speech generated from short text works great, but when I generate speech for longer text, it starts out fast and gets slower and slower, with occasional repeated sentences. What parameters are appropriate for generating speech from long text? Thank you.
This code:
stream = TextToAudioStream(engine)
stream.feed("Hello")
stream.play_async()
time.sleep(0.1)
stream.feed("friend")
if stream.is_playing():
    stream.play_async()
... only plays "Hello" and not "friend". However, if I comment out time.sleep() it plays both. Also, if I sleep for 2+ seconds it plays both words.
Is this expected?
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
print(f"type:{type(stream)}")
print(f"stream:{stream}")
stream.play()
error in play() with engine system: [Errno -9996] Invalid output device (no default output device)
How can such problems be solved, and what causes them?
Hi,
For my usage I am feeding the engine the sentence word by word.
Using the SystemEngine I got a somewhat coherent sentence (the words were clear but the sentence was too fast), but when using the CoquiEngine the words became very unclear and I experienced pauses.
I tried raising buffer_threshold_seconds to 7, but with no apparent improvement.
Any suggestions on how I can improve the output?
When feeding the engine complete sentences I got pretty good results. I am also using voice cloning, but this phenomenon persists with the default voice too.
Thank you!
I continue to get this error and have yet to figure out why. Any ideas?
Exception in thread Thread-2 (play):
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/text_to_stream.py", line 231, in play
self.engine.synthesize(self.char_iter)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 121, in synthesize
self.stream(self.audio_stream)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 177, in stream
for chunk in audio_stream:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/elevenlabs/api/tts.py", line 134, in generate_stream_input
if data["audio"]:
~~~~^^^^^^^^^
KeyError: 'audio'
https://github.com/yl4579/StyleTTS2 seems very promising; I was wondering if it's supported by this library.
(Windows 11, VSCode, py_10.11) almost fresh VSCode (only torch, numpy).
Resolved manually as described below:
from RealtimeTTS import TextToAudioStream, SystemEngine
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
(.venv) C:\dev_free\w1>python th_cuda.py
C:\dev_free\w1\.venv\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
...
C:\dev_free\w1\.venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
WARNING:root:engine system failed to synthesize sentence "This is a sentence." with error: [WinError 2] The system cannot find the file specified
Traceback: Traceback (most recent call last):
File "C:\dev_free\w1\.venv\lib\site-packages\RealtimeTTS\text_to_stream.py", line 279, in synthesize_worker
success = self.engine.synthesize(sentence)
pip install ffmpeg
pip install ffprobe
https://stackoverflow.com/questions/74651215/couldnt-find-ffmpeg-or-avconv-python
pip install ffmpeg-downloader
ffdl install --add-path
#--> restart VSCode (as after all previous steps)
When I have a test with RealtimeTTS with the code below:
from RealtimeTTS import TextToAudioStream, CoquiEngine
engine = CoquiEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()
it crashes and gives this error:
engine = CoquiEngine() # replace with your TTS engine
File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in __call__
instance = super().__call__(*args, **kwargs)
File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\coqui_engine.py", line 96, in __init__
self.synthesize_process.start()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
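The RuntimeError comes from multiprocessing's spawn start method re-importing the main module while CoquiEngine's constructor is still starting its worker process. The fix is the standard `if __name__ == '__main__':` guard. Here is a runnable illustration with plain multiprocessing (the worker is a stand-in, not RealtimeTTS code); for RealtimeTTS, construct CoquiEngine and call play inside such a guard in the same way.

```python
import multiprocessing as mp

def worker(q):
    # stand-in for the TTS worker process that CoquiEngine spawns
    q.put("synthesized")

def main():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()          # safe here: only reached when run as a script
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    # without this guard, spawn's re-import of the main module would try
    # to start the process again and raise the RuntimeError quoted above
    print(main())
```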
OSError: libespeak.so.1: cannot open shared object file: No such file or directory
trying to run:
from RealtimeTTS import TextToAudioStream, SystemEngine
def dummy_generator():
yield "This is a sentence. And here's another! Yet, "
yield "there's more. This ends now."
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
ubuntu 22
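A likely fix (an assumption based on the error, not verified on this exact setup): SystemEngine uses pyttsx3, which on Linux loads the espeak shared library, so installing the distro packages should provide libespeak.so.1 on Ubuntu 22.04.

```shell
# install the espeak runtime that pyttsx3/SystemEngine links against
sudo apt-get update
sudo apt-get install -y espeak libespeak1
```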
Hi, I just tried using your API in a Jupyter notebook with the Coqui engine. It seems to expect a female.wav file to be present? Here's the error I'm getting:
ERROR:root:Error initializing main faster_whisper transcription model: Error opening 'female.wav': System error.
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 135, in _synthesize_worker
gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 95, in get_conditioning_latents
gpt_cond_latent, speaker_embedding = tts.get_conditioning_latents(audio_path=filename)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 343, in get_conditioning_latents
audio = load_audio(file_path, load_sr)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 73, in load_audio
audio, lsr = torchaudio.load(audiopath)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 203, in load
return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile.py", line 26, in load
return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'female.wav': System error.
Process Process-1:
Hi, I encountered the following error, could someone help me?
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
from RealtimeTTS import TextToAudioStream, CoquiEngine
engine = CoquiEngine()
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
/home/yipyewmun/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
Using model: xtts
Process Process-1:
Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 149, in _synthesize_worker
gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 92, in get_conditioning_latents
speaker_embedding = (torch.tensor(latents["speaker_embedding"]).unsqueeze(0).unsqueeze(-1))
KeyError: 'speaker_embedding'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 152, in _synthesize_worker
logging.exception(f"Error initializing main faster_whisper transcription model: {e}")
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 2113, in exception
error(msg, *args, exc_info=exc_info, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 2105, in error
root.error(msg, *args, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 1506, in error
self._log(ERROR, msg, args, **kwargs)
TypeError: Log._log() got an unexpected keyword argument 'exc_info'
(venv) ➜ RealtimeTTS git:(main) ✗ python3 simple_test.py
Traceback (most recent call last):
File "/Users/lout/Documents/projects/explore_ai/RealtimeTTS/simple_test.py", line 12, in <module>
engine = SystemEngine() # replace with your TTS engine
^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/base_engine.py", line 10, in __call__
instance = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 36, in __init__
self.set_voice(voice)
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 105, in set_voice
installed_voices = self.engine.getProperty('voices')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/engine.py", line 146, in getProperty
return self.proxy.getProperty(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/driver.py", line 173, in getProperty
return self._driver.getProperty(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in getProperty
return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in <listcomp>
return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 64, in _toVoice
attr['VoiceAge'])
~~~~^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience_mapping.py", line 18, in __getitem__objectForKey_
return container_unwrap(res, KeyError, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience.py", line 134, in container_unwrap
raise exc_type(*exc_args)
KeyError: 'VoiceAge'
Code:
from RealtimeTTS import TextToAudioStream, SystemEngine
def dummy_generator():
yield "This is a sentence. And here's another! Yet, "
yield "there's more. This ends now."
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
Also, I had to install pyobjc (pip3 install pyobjc==9.0.1) for this sample to work:
import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()
Hello, I'm super interested in your project and have been messing around with AI of various types for a few months; however, I am relatively new to Python and coding in general.
This project stands out to me, and I'm forcing myself to learn Python so I can utilize it fully, since I am mute and unable to speak. I want to use this to join in and talk to my friends over Discord and have more of a presence.
Short version: is it possible to share the simple UI (or is it already there and I'm just missing it?) so that I can learn from it and expand on it?
Sorry, wasn't sure where else to ask this. Kindest regards.
Hello,
Since v0.3.0, cloning from a wav file seems not to work.
When given a json file, it finds it and uses the voice accordingly, but when given a wav file it falls back to the coqui_default_voice.
Thank you very much for your work.
Hi, I'm doing a project now, and I really need to be able to specify the output device for the streamed result (I want to output to a virtual microphone).
It would be cool if it would be possible to specify the output device as it is done in RealtimeSTT with the input device.
Thanks for your work )
Does adding Google gTTS support make sense?
Hello Author,
Really great job with your four speech-assistant-related projects and the effort to bring latency down as much as possible!
I was wondering, have you seen this paper? I think it claims the same; I still don't know which is faster, yours or theirs.
https://arxiv.org/abs/2309.11210
Thanks in advance!
I installed the Coqui engine for my project by following the GitHub docs, and I got the following error:
File "<string>", line 1, in <module>
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 131, in _main
prepare(preparation_data)
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\david\PycharmProjects\ai\e.py", line 3, in <module>
engine = CoquiEngine() # replace with your TTS engine
^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in __call__
instance = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 190, in __init__
self.create_worker_process()
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 248, in create_worker_process
self.synthesize_process.start()
File "C:\Users\david\anaconda3\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
To fix this issue, refer to the "Safe importing of main module"
section in https://docs.python.org/3/library/multiprocessing.html
Hi,
Is it possible to call this via frontend JS and then stream audio directly to browser for playback? If so, what should the approach be?
Thank You!
Hello, thanks for your work first.
Can I use this in a Flask Python webapp?
I'm going to send a request to the Flask app from JS to get audio streaming.
Is this possible as well?
Hope to hear from you soon.
I have run this
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()
Traceback (most recent call last):
File ".\tts-realtime.py", line 8, in <module>
import RealtimeTTS
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\__init__.py", line 1, in <module>
from .text_to_stream import TextToAudioStream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\text_to_stream.py", line 1, in <module>
from .engines import BaseEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\__init__.py", line 4, in <module>
from .elevenlabs_engine import ElevenlabsEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\elevenlabs_engine.py", line 2, in <module>
from elevenlabs import voices, generate, stream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\__init__.py", line 3, in <module>
from .types import (
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\__init__.py", line 4, in <module>
from .add_project_response_model import AddProjectResponseModel
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\add_project_response_model.py", line 7, in <module>
from .project_response import ProjectResponse
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\project_response.py", line 15, in <module>
class ProjectResponse(pydantic.BaseModel):
File "pydantic\main.py", line 205, in pydantic.main.ModelMetaclass.__new__
File "pydantic\fields.py", line 491, in pydantic.fields.ModelField.infer
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 636, in pydantic.fields.ModelField._type_analysis
File "pydantic\fields.py", line 781, in pydantic.fields.ModelField._create_sub_type
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 641, in pydantic.fields.ModelField._type_analysis
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\typing.py", line 774, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
Hello!
This seems like a very cool project! :) I'm trying to test it out for a little game thing I'm doing, but I get an error message when trying to install on Windows 11 (from inside a PyCharm venv).
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
Does this seem correct to you?
Would I really need to install MSVC for python RealtimeTTS ?
Thank you for making your work public!
Cheers!
Fred
The install fails at PyAudio. The following are various errors from subsequent attempts to resolve the issue.
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
Building wheel for PyAudio (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for PyAudio (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-12.6-x86_64-cpython-39
creating build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
copying src/pyaudio/__init__.py -> build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
running build_ext
building 'pyaudio._portaudio' extension
creating build/temp.macosx-12.6-x86_64-cpython-39
creating build/temp.macosx-12.6-x86_64-cpython-39/src
creating build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/device_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/device_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/host_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/host_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/init.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/init.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/mac_core_stream_info.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/mac_core_stream_info.o
In file included from src/pyaudio/mac_core_stream_info.c:3:
src/pyaudio/mac_core_stream_info.h:13:10: fatal error: 'pa_mac_core.h' file not found
#include "pa_mac_core.h"
^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for PyAudio
Building wheel for TTS (pyproject.toml) ... done
Created wheel for TTS: filename=TTS-0.22.0-cp39-cp39-macosx_12_0_x86_64.whl size=903439 sha256=ea358468699ac39beab3575f19324aa69622c73a77c6e429933673579e3aee0d
Stored in directory: /Users/karibu/Library/Caches/pip/wheels/e9/94/e7/52e526c3ef9c07ac0b67a7dce87f81b6fb83ffd2d1754224e3
Successfully built TTS
Failed to build PyAudio
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
I ran tests/chinese_test.py but got an error. Does anyone know how to solve it?
Traceback: Traceback (most recent call last):
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/text_to_stream.py", line 265, in synthesize_worker
success = self.engine.synthesize(sentence)
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/engines/system_engine.py", line 72, in synthesize
with wave.open(self.file_path, 'rb') as wf:
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 509, in open
return Wave_read(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 163, in __init__
self.initfp(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 130, in initfp
raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
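The "file does not start with RIFF id" error means the file handed to wave.open isn't a WAV file at all; on macOS the system synthesizer may save audio in another container (AIFF, for instance, whose magic bytes are "FORM" rather than "RIFF"). As a quick diagnostic, you could inspect the first four bytes of the temp file (the helper name here is made up for illustration):

```python
import wave

def header_id(path: str) -> bytes:
    """Return the first 4 bytes of a file: b'RIFF' for WAV, b'FORM' for AIFF."""
    with open(path, "rb") as f:
        return f.read(4)

# Demonstration with a tiny generated WAV (one frame of silence):
with wave.open("demo.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(22050)
    wf.writeframes(b"\x00\x00")

print(header_id("demo.wav"))  # → b'RIFF'
```

If the file the engine wrote reports b'FORM' instead, the synthesizer produced AIFF and wave.open will raise exactly this error.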
I am trying to pass a variable that I can use inside the _synthesize_worker loop in coqui_engine.py, but I can't find the right way to do it.
So we have
stream.feed(content)
which, in coqui_engine.py's _synthesize_worker, becomes
text = data['text']
But I want to be able to pass an id as well, something like
tts_id_file = '95986845'
stream.tts_id(tts_id_file)
and then access it inside the loop in _synthesize_worker.
I'm using this instead of the output filename because I do custom logic with the chunks, and I want to name each chunk batch with the unique id.
I know it's a bit unusual; I've been trying for a few hours with no luck ;/
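One workaround that avoids patching _synthesize_worker at all: since play_async accepts an on_audio_chunk callback (as shown in a snippet elsewhere on this page), you can bind your id into the callback with a closure. The names make_chunk_handler and tts_id below are illustrative, not part of the library:

```python
import os
import tempfile

def make_chunk_handler(tts_id: str, out_dir: str):
    """Return a chunk callback that names each chunk file with the bound id."""
    counter = {"n": 0}

    def on_audio_chunk(chunk: bytes) -> None:
        # e.g. 95986845_00000.pcm, 95986845_00001.pcm, ...
        path = os.path.join(out_dir, f"{tts_id}_{counter['n']:05d}.pcm")
        with open(path, "wb") as f:
            f.write(chunk)
        counter["n"] += 1

    return on_audio_chunk

# Usage sketch (signature as shown in the play_async example on this page):
# stream.play_async(on_audio_chunk=make_chunk_handler('95986845', 'chunks'), muted=True)

out_dir = tempfile.mkdtemp()
handler = make_chunk_handler("95986845", out_dir)
handler(b"\x00\x01")
handler(b"\x02\x03")
print(sorted(os.listdir(out_dir)))  # → ['95986845_00000.pcm', '95986845_00001.pcm']
```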
Hi! I'm interested: is there any way to get the audio stream output (chunks as they are generated) as a numpy array?
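I can't speak for an official API, but if you receive raw chunks through an on_audio_chunk callback (as shown elsewhere on this page), converting the PCM bytes to a NumPy array is a one-liner; the assumed format is signed 16-bit little-endian mono, so adjust the dtype if your engine differs:

```python
import numpy as np

def chunk_to_array(chunk: bytes) -> np.ndarray:
    """Interpret a raw PCM chunk as an array of 16-bit samples."""
    return np.frombuffer(chunk, dtype=np.int16)

# Example: two samples, 1 and -1, packed little-endian.
chunk = b"\x01\x00\xff\xff"
arr = chunk_to_array(chunk)
print(arr)  # → [ 1 -1]
```

Note that np.frombuffer returns a read-only view over the bytes; call .copy() if you need to modify the samples.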
I've just been doing a lot of work with the Coqui TTS engine, and I thought it wanted 24000 Hz sample files. It turns out that by default it wants 22050 Hz. Both work, but if you look in the config.json that comes downloaded with the models, it states a preference for 22050 Hz as the input file (and yes, mono, 16-bit, etc.).
I was just taking an interest in your RealtimeTTS, thinking of pulling it into my project, and spotted https://github.com/KoljaB/RealtimeTTS#coquiengine; figured you may want to update it.
Thanks
stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(tokenizer="stanza",language="zh",on_audio_chunk=on_audio_chunk_callback,muted=True)
Thanks for the upgrade to RealtimeTTS v0.3.42, but when I use the Coqui engine with play_async in a Linux (Ubuntu Server) environment, I can't get the callback data.
error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
self.player.start()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
self.audio_stream.open_stream()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
self.stream = self.pyaudio_instance.open(
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)
With the stream.play_async() logic, how can we just start creating the chunks without initiating playback of the audio?
I already added logic in
for i, chunk in enumerate(chunks):
to save the chunks to a folder, but I can't find a simple way to suppress the playback without breaking things.
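Since play_async accepts muted=True together with an on_audio_chunk callback (see the snippet earlier on this page), one approach is to let the stream run silently and collect the chunks into a WAV file yourself. A minimal sketch with the stdlib wave module; the 22050 Hz / 16-bit mono parameters are assumptions for Coqui's output format:

```python
import wave

class ChunkRecorder:
    """Accumulate raw PCM chunks and write them out as a single WAV file."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def on_audio_chunk(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def save(self, path: str, rate: int = 22050) -> None:
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)    # mono
            wf.setsampwidth(2)    # 16-bit samples
            wf.setframerate(rate)
            wf.writeframes(b"".join(self.chunks))

# Usage sketch: stream.play_async(on_audio_chunk=rec.on_audio_chunk, muted=True)
rec = ChunkRecorder()
rec.on_audio_chunk(b"\x00\x00" * 100)  # 100 frames of silence
rec.save("chunks.wav")
with wave.open("chunks.wav", "rb") as wf:
    print(wf.getnframes())  # → 100
```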
Hi, first of all thanks for your project; it's very cool to get results almost in real time.
I use CoquiEngine.
I am using FastAPI, and I have a problem: when I want to stop a stream and then restart it, perhaps with different text, I can't.
After I call stream.stop(),
the subsequent stream.play_async()
doesn't work, and I have to restart the server to get everything working again.
To make this clearer, I recorded a small video and attached simple server code.
Here is a simple server that demonstrates the problem:
server.py
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from RealtimeTTS import TextToAudioStream, CoquiEngine

app = FastAPI()

class SynthesisRequest(BaseModel):
    text: str

# Stopping works correctly, but when we start a new stream with new text
# we can't do anything with it: the stream stops playing and never works again;
# only restarting the server fixes it.
@app.get("/tts_stop")
async def tts_stop():
    stream.stop()

# This works fine, but after we call stream.stop() it crashes and no longer works.
@app.post("/tts_to_audio")
async def tts_to_audio(request: SynthesisRequest):
    stream.feed(request.text)
    stream.play_async()
    return {"message": "stream"}

if __name__ == "__main__":
    engine = CoquiEngine()
    stream = TextToAudioStream(engine)
    uvicorn.run(app, port=8010)
Awesome library. Can you integrate MetaVoice-1B?
https://github.com/metavoiceio/metavoice-src
I made my own logic to build up the sentences.
How can I send each call of
stream.feed(content)
straight into synthesis? I don't want the stream to wait for the next batch or sentence; it should synthesize every time feed is called.
I have set
buffer_threshold_seconds = 0
fast_sentence_fragment = True
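If you already build the sentences yourself, one way to make every feed synthesize immediately is to yield one complete sentence at a time from a generator, so the stream never has to wait for more text to find a boundary. A rough sketch; the regex split is a simplification, not the library's tokenizer:

```python
import re

def sentences(text: str):
    """Yield one complete sentence at a time (naive split on ., ! or ?)."""
    for match in re.finditer(r"[^.!?]+[.!?]", text):
        yield match.group().strip()

# Usage sketch: stream.feed(sentences(content)) followed by stream.play_async()
print(list(sentences("First one. Second one! Third?")))
# → ['First one.', 'Second one!', 'Third?']
```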
I created a Python virtual environment and installed the library, but I am getting this error. Any idea how to solve it?
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
ModuleNotFoundError: No module named 'RealtimeTTS'
I am using Ubuntu LTS 22.04.
Thank you so much.