koljab / RealtimeTTS
Converts text to speech in realtime
It's heartbreaking that Coqui has shut down, and I've been looking for alternative projects.
I'm not sure whether https://github.com/myshell-ai/MeloTTS has potential, but I've heard the project produces high-quality output.
Could you please take a look when you have time?
Hi,
I want to know if there's any way to get an iterator over chunks, so that I can do something like this:
chunks = stream.<some_way_to_get_chunk_iterator>

def gen():
    for chunk in chunks:
        yield chunk

then in FastAPI return: StreamingResponse(gen(), media_type="audio/wav")
Thanks
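A possible workaround while no iterator API exists (a sketch, assuming the documented on_audio_chunk callback of play()/play_async() delivers raw audio bytes): bridge the callback into a generator with a queue. The helper name chunk_iterator and the wiring are my own, not part of RealtimeTTS.

```python
# Sketch: turn a callback-style playback call into a chunk iterator that
# FastAPI's StreamingResponse can consume.
import queue
import threading

_SENTINEL = object()  # marks the end of the audio stream

def chunk_iterator(start_playback):
    """Run `start_playback(on_chunk)` in a thread; yield chunks as they arrive."""
    q = queue.Queue()

    def worker():
        start_playback(q.put)  # the engine pushes chunks into the queue
        q.put(_SENTINEL)       # playback finished

    threading.Thread(target=worker, daemon=True).start()
    while (chunk := q.get()) is not _SENTINEL:
        yield chunk
```

Usage sketch (names assumed): wrap your playback call, e.g. `start = lambda on_chunk: stream.play(on_audio_chunk=on_chunk, muted=True)`, and return `StreamingResponse(chunk_iterator(start), media_type="audio/wav")` — noting the chunks are raw PCM unless you prepend a wav header yourself.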
The front page says:
to clone a voice submit the filename of a wave file containing the source voice as cloning_reference_wav to the CoquiEngine constructor.
This code works; where do I put the cloning_reference_wav? Thanks
if __name__ == '__main__':
    from RealtimeTTS import TextToAudioStream, CoquiEngine
    import logging
    logging.basicConfig(level=logging.INFO)
    engine = CoquiEngine(level=logging.INFO)
    stream = TextToAudioStream(engine)
    print("Starting to play stream")
    stream.feed("Everything is going perfectly")
    stream.play()  # pause(), resume(), stop()
    engine.shutdown()
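To answer the placement question with a hedged sketch: per the README quote above, cloning_reference_wav is a constructor argument of CoquiEngine, not an argument to feed() or play(). The helper below and the "my_voice.wav" filename are illustrative only.

```python
# cloning_reference_wav belongs in the CoquiEngine constructor.
# "my_voice.wav" is a placeholder for your own reference recording.
def coqui_cloning_kwargs(reference_wav):
    # returned as kwargs so the constructor call site stays explicit:
    #   engine = CoquiEngine(**coqui_cloning_kwargs("my_voice.wav"))
    return {"cloning_reference_wav": reference_wav}
```

In the example above that would mean constructing the engine as `engine = CoquiEngine(level=logging.INFO, cloning_reference_wav="my_voice.wav")`.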
Out of pure curiosity: would asynchronous programming be helpful for this project?
Hey,
I'll take a stab myself, but I wanted to suggest creating a Docker image to help with some of these dependency issues, and thought it best to record it here for both the TTS and STT projects.
The example "tests/write_to_file.py" is not producing any files.
I've tried it with SystemEngine() and with CoquiEngine(), and used a bare file name, a relative path, and a full path.
The result is the same as .play(): .play(file_name) outputs to the speakers, and no "system_output.wav" appears anywhere on the C:\ drive.
stream.load_engine(system_engine)
stream.feed(dummy_generator())
# works as a .play() without parameters->output to speakers
stream.play(output_wavfile=stream.engine.engine_name + "_output.wav")
In my case it looks like:

def speakSys():
    text_gen = dummy_generator()
    sys_engine = SystemEngine()
    stream = TextToAudioStream(sys_engine)
    stream.feed(text_gen)
    # last attempt: absolute path, since a bare file name and ".\\" did not work
    output_wavfile = "C:\\dev_free\\w1\\" + stream.engine.engine_name + "_output.wav"
    print(f"Writing to {output_wavfile} ...")
    # note: output_wavfile must be passed as a keyword argument --
    # stream.play(output_wavfile) would bind it to fast_sentence_fragment instead
    stream.play(output_wavfile=output_wavfile)
I've managed to get the RealtimeTTS library to work. I'm wondering if there's any way to save (keep appending) audio chunks to a file as they come in, so that I can play it back later. I want the saved audio to match exactly what the stream plays via the play_async() function.
Thanks!
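Two options seem plausible: play()/play_async() already accept an output_wavfile parameter (see the signature below), and for chunk-by-chunk control an on_audio_chunk recorder can append to a wav file as playback happens. A sketch, where the sample rate, width, and channel count are assumptions — query your engine's stream configuration rather than hard-coding them:

```python
# Sketch: record audio chunks to a wav file exactly as they are played.
import wave

class ChunkRecorder:
    def __init__(self, path, rate=24000, channels=1, sampwidth=2):
        # assumed format: 16-bit mono PCM at 24 kHz -- verify against your engine
        self._wf = wave.open(path, "wb")
        self._wf.setnchannels(channels)
        self._wf.setsampwidth(sampwidth)
        self._wf.setframerate(rate)

    def on_audio_chunk(self, chunk):
        self._wf.writeframes(chunk)  # append each chunk as it arrives

    def close(self):
        self._wf.close()
```

Usage sketch: `rec = ChunkRecorder("out.wav")`, then `stream.play_async(on_audio_chunk=rec.on_audio_chunk)`, and call `rec.close()` once playback finishes.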
play(self,
     fast_sentence_fragment: bool = True,
     buffer_threshold_seconds: float = 0.0,
     minimum_sentence_length: int = 10,
     minimum_first_fragment_length: int = 10,
     log_synthesized_text=False,
     reset_generated_text: bool = True,
     output_wavfile: str = None,
     on_sentence_synthesized=None,
     on_audio_chunk=None,
     tokenizer: str = "nltk",
     language: str = "en",
     context_size: int = 12,
     muted: bool = False,
):
I need to use the on_audio_chunk data as input to a realtime service. Can I just consume the callback data without having the system play the audio?
thanks!
Thanks for the upgrade to RealtimeTTS v0.3.42, but when using the Coqui engine with play_async on Ubuntu Server (Linux), I can't get the callback data.
stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(tokenizer="stanza", language="zh", on_audio_chunk=on_audio_chunk_callback, muted=True)
error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
self.player.start()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
self.audio_stream.open_stream()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
self.stream = self.pyaudio_instance.open(
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)
Thanks again for the great repo :)
It would be great to have something like on_sentence_START_synthesized, so the sentence can be printed before its speech starts.
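Until such a hook exists, a hedged workaround (assuming the on_sentence_synthesized callback in play()'s signature receives the sentence text): since a sentence's audio is synthesized shortly before it is spoken, printing there approximates the requested on-start hook. The helper below is my own, not part of RealtimeTTS.

```python
# Hedged workaround: announce each sentence as soon as its synthesis
# completes, i.e. roughly when that sentence is about to be spoken.
# `sink` is injectable so the callback can be tested without audio output.
def make_sentence_announcer(sink=print):
    spoken = []

    def on_sentence_synthesized(sentence):
        spoken.append(sentence)
        sink(f"[about to speak] {sentence}")

    on_sentence_synthesized.spoken = spoken  # inspectable history
    return on_sentence_synthesized
```

Usage sketch: `stream.play(on_sentence_synthesized=make_sentence_announcer())`.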
Hey! I've tried running the CoquiEngine using your example, but I'm receiving this error:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
The traceback points at line 107 of RealtimeTTS/engines/coqui_engine.py. Could this be caused by a different torch or Python version?
I believe the elevenlabs dependency version needs to be pinned, or the import refactored so their engine isn't loaded unless it's actually used.
I'm getting the following error on Python 3.12:
Traceback (most recent call last):
File "tts.py", line 1, in <module>
from RealtimeTTS import CoquiEngine, TextToAudioStream
File ".venv/lib/python3.12/site-packages/RealtimeTTS/__init__.py", line 1, in <module>
from .text_to_stream import TextToAudioStream
File ".venv/lib/python3.12/site-packages/RealtimeTTS/text_to_stream.py", line 1, in <module>
from .engines import BaseEngine
File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/__init__.py", line 4, in <module>
from .elevenlabs_engine import ElevenlabsEngine
File ".venv/lib/python3.12/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 2, in <module>
from elevenlabs import voices, generate, stream
ImportError: cannot import name 'generate' from 'elevenlabs' (venv path here)
I'm trying to run a simple TextToAudioStream in an Ubuntu Lightsail container on AWS like this to stream to browser js:
stream = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": text}],
    stream=True,
)
tts_stream = TextToAudioStream(
    AzureEngine(
        speech_key='',
        service_region='westeurope',
        voice='zh-CN-XiaochenMultilingualNeural',
        rate='5',
    ),
    log_characters=True,
)
tts_stream.feed(stream).play(on_audio_chunk=handle_audio_chunk, muted=True)
This works in my local environment with muted=True, but on the server I get the error below even though I don't need audio playback and have set muted=True.
Is there a way around this, or some way to get it working in a VPS environment?
ALSA lib confmisc.c:855:(parse_card) cannot find card '0'
ALSA lib conf.c:5178:(_snd_config_evaluate) function snd_func_card_inum returned error: No such file or directory
ALSA lib confmisc.c:422:(snd_func_concat) error evaluating strings
...
ALSA lib conf.c:5701:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM sysdefault
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
ALSA lib pcm.c:2664:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
Thank you for your work, I think it's cool. I find that speech generated from short text works great, but when I generate speech for longer text, it starts out fast and gets slower and slower, with occasional repeated sentences. What parameters are appropriate for generating speech from long text? Thank you.
This code:
stream = TextToAudioStream(engine)
stream.feed("Hello")
stream.play_async()
time.sleep(0.1)
stream.feed("friend")
if stream.is_playing():
    stream.play_async()
... only plays "Hello" and not "friend". However, if I comment out time.sleep() it plays both. Also, if I sleep for 2+ seconds it plays both words.
Is this expected?
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
print(f"type:{type(stream)}")
print(f"stream:{stream}")
stream.play()
error in play() with engine system: [Errno -9996] Invalid output device (no default output device)
How can such problems be solved, and what causes them?
Hi,
For my usage I am feeding the engine the sentence word by word.
Using the SystemEngine I got a somewhat coherent sentence (the words were clear but the sentence was too fast), but when using the CoquiEngine the words became very unclear and I experienced pauses.
I tried raising buffer_threshold_seconds to 7, but with no apparent improvement.
Any suggestions on how I can improve the output?
When feeding the engine complete sentences I got pretty good results. I am also using voice cloning, but this phenomenon persists with the default voice too.
Thank you!
I continue to get this error and have yet to figure out why. Any ideas?
Exception in thread Thread-2 (play):
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 975, in run
self._target(*self._args, **self._kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/text_to_stream.py", line 231, in play
self.engine.synthesize(self.char_iter)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 121, in synthesize
self.stream(self.audio_stream)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/RealtimeTTS/engines/elevenlabs_engine.py", line 177, in stream
for chunk in audio_stream:
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/elevenlabs/api/tts.py", line 134, in generate_stream_input
if data["audio"]:
~~~~^^^^^^^^^
KeyError: 'audio'
https://github.com/yl4579/StyleTTS2 seems very promising; I was wondering if it's supported by this library.
(Windows 11, VSCode, py_10.11) almost fresh VSCode (only torch, numpy).
Resolved manually as described below:
from RealtimeTTS import TextToAudioStream, SystemEngine
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
(.venv) C:\dev_free\w1>python th_cuda.py
C:\dev_free\w1\.venv\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
...
C:\dev_free\w1\.venv\lib\site-packages\pydub\utils.py:198: RuntimeWarning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
warn("Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work", RuntimeWarning)
WARNING:root:engine system failed to synthesize sentence "This is a sentence." with error: [WinError 2] The system cannot find the file specified
Traceback: Traceback (most recent call last):
File "C:\dev_free\w1\.venv\lib\site-packages\RealtimeTTS\text_to_stream.py", line 279, in synthesize_worker
success = self.engine.synthesize(sentence)
pip install ffmpeg
pip install ffprobe
https://stackoverflow.com/questions/74651215/couldnt-find-ffmpeg-or-avconv-python
pip install ffmpeg-downloader
ffdl install --add-path
#--> restart VSCode (as after all previous steps)
When I have a test with RealtimeTTS with the code below:
from RealtimeTTS import TextToAudioStream, CoquiEngine
engine = CoquiEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()
it crashes and gives this error:
engine = CoquiEngine() # replace with your TTS engine
File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in __call__
instance = super().__call__(*args, **kwargs)
File "D:\Python\xxxx\venv\lib\site-packages\RealtimeTTS\engines\coqui_engine.py", line 96, in __init__
self.synthesize_process.start()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\xxxx\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
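The RuntimeError comes from multiprocessing's spawn start method re-importing the main module while CoquiEngine's constructor is still starting its worker process. The fix is the standard `if __name__ == '__main__':` guard. Here is a runnable illustration with plain multiprocessing (the worker is a stand-in, not RealtimeTTS code); for RealtimeTTS, construct CoquiEngine and call play inside such a guard in the same way.

```python
import multiprocessing as mp

def worker(q):
    # stand-in for the TTS worker process that CoquiEngine spawns
    q.put("synthesized")

def main():
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()          # safe here: only reached when run as a script
    result = q.get()
    p.join()
    return result

if __name__ == "__main__":
    # without this guard, spawn's re-import of the main module would try
    # to start the process again and raise the RuntimeError quoted above
    print(main())
```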
OSError: libespeak.so.1: cannot open shared object file: No such file or directory
trying to run:
from RealtimeTTS import TextToAudioStream, SystemEngine
def dummy_generator():
yield "This is a sentence. And here's another! Yet, "
yield "there's more. This ends now."
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
ubuntu 22
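A likely fix (an assumption based on the error, not verified on this exact setup): SystemEngine uses pyttsx3, which on Linux loads the espeak shared library, so installing the distro packages should provide libespeak.so.1 on Ubuntu 22.04.

```shell
# install the espeak runtime that pyttsx3/SystemEngine links against
sudo apt-get update
sudo apt-get install -y espeak libespeak1
```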
Hi, I just tried using your API in a Jupyter notebook with the Coqui engine. It seems to expect a female.wav file to be present? Here's the error I'm getting:
ERROR:root:Error initializing main faster_whisper transcription model: Error opening 'female.wav': System error.
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 135, in _synthesize_worker
gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/RealtimeTTS/engines/coqui_engine.py", line 95, in get_conditioning_latents
gpt_cond_latent, speaker_embedding = tts.get_conditioning_latents(audio_path=filename)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 343, in get_conditioning_latents
audio = load_audio(file_path, load_sr)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/TTS/tts/models/xtts.py", line 73, in load_audio
audio, lsr = torchaudio.load(audiopath)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/utils.py", line 203, in load
return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile.py", line 26, in load
return soundfile_backend.load(uri, frame_offset, num_frames, normalize, channels_first, format)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/torchaudio/_backend/soundfile_backend.py", line 221, in load
with soundfile.SoundFile(filepath, "r") as file_:
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 658, in __init__
self._file = self._open(file, mode_int, closefd)
File "/home/ubuntu/miniconda3/envs/whisper/lib/python3.9/site-packages/soundfile.py", line 1216, in _open
raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'female.wav': System error.
Process Process-1:
Hi, I encountered the following error, could someone help me?
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
from RealtimeTTS import TextToAudioStream, CoquiEngine
engine = CoquiEngine()
tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
/home/yipyewmun/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
Using model: xtts
Process Process-1:
Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 149, in _synthesize_worker
gpt_cond_latent, speaker_embedding = get_conditioning_latents(cloning_reference_wav)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 92, in get_conditioning_latents
speaker_embedding = (torch.tensor(latents["speaker_embedding"]).unsqueeze(0).unsqueeze(-1))
KeyError: 'speaker_embedding'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/site-packages/RealtimeTTS/engines/coqui_engine.py", line 152, in _synthesize_worker
logging.exception(f"Error initializing main faster_whisper transcription model: {e}")
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 2113, in exception
error(msg, *args, exc_info=exc_info, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 2105, in error
root.error(msg, *args, **kwargs)
File "/home/yipyewmun/miniforge3/envs/realtimeTTS/lib/python3.10/logging/__init__.py", line 1506, in error
self._log(ERROR, msg, args, **kwargs)
TypeError: Log._log() got an unexpected keyword argument 'exc_info'
(venv) ➜ RealtimeTTS git:(main) ✗ python3 simple_test.py
Traceback (most recent call last):
File "/Users/lout/Documents/projects/explore_ai/RealtimeTTS/simple_test.py", line 12, in <module>
engine = SystemEngine() # replace with your TTS engine
^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/base_engine.py", line 10, in __call__
instance = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 36, in __init__
self.set_voice(voice)
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/RealtimeTTS/engines/system_engine.py", line 105, in set_voice
installed_voices = self.engine.getProperty('voices')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/engine.py", line 146, in getProperty
return self.proxy.getProperty(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/driver.py", line 173, in getProperty
return self._driver.getProperty(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in getProperty
return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 69, in <listcomp>
return [self._toVoice(NSSpeechSynthesizer.attributesForVoice_(v))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/pyttsx3/drivers/nsss.py", line 64, in _toVoice
attr['VoiceAge'])
~~~~^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience_mapping.py", line 18, in __getitem__objectForKey_
return container_unwrap(res, KeyError, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lout/Documents/projects/explore_ai/venv/lib/python3.11/site-packages/objc/_convenience.py", line 134, in container_unwrap
raise exc_type(*exc_args)
KeyError: 'VoiceAge'
Code:
from RealtimeTTS import TextToAudioStream, SystemEngine
def dummy_generator():
yield "This is a sentence. And here's another! Yet, "
yield "there's more. This ends now."
TextToAudioStream(SystemEngine()).feed(dummy_generator()).play()
Also, I had to install pyobjc (pip3 install pyobjc==9.0.1) for this sample to work:
import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()
Hello, I'm super interested in your project and have been messing around with AI of various types for a few months; however, I am relatively new to Python and coding in general.
This project stands out to me, and I'm forcing myself to learn Python so I can utilize it fully, since I am mute and unable to speak. I want to use this to join in and talk to my friends over Discord and have more of a presence.
Short version: is it possible to share the simple UI (or is it already there and I'm just missing it?) so that I can learn from it and expand on it?
Sorry, wasn't sure where else to ask this. Kindest regards.
Hello,
Since v0.3.0, cloning from a wav file seems not to work.
When given a json file, it finds it and uses the voice accordingly, but when given a wav file it falls back to the coqui_default_voice.
Thank you very much for your work.
Hi, I'm doing a project now, and I really need to be able to specify the output device for the streamed result (I want to output to a virtual microphone).
It would be cool if it would be possible to specify the output device as it is done in RealtimeSTT with the input device.
Thanks for your work )
Does adding Google gTTS support make sense?
Hello Author,
Really great job with your four speech-assistant-related projects and the effort to bring latency down as much as possible!
I was wondering, have you seen this paper? I think it claims the same; I still don't know which is faster, yours or theirs.
https://arxiv.org/abs/2309.11210
Thanks in advance!
I installed the Coqui engine for my project by following the GitHub docs, and I got the following error:
File "<string>", line 1, in <module>
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 131, in _main
prepare(preparation_data)
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\david\PycharmProjects\ai\e.py", line 3, in <module>
engine = CoquiEngine() # replace with your TTS engine
^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\base_engine.py", line 11, in __call__
instance = super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 190, in __init__
self.create_worker_process()
File "C:\Users\david\AppData\Roaming\Python\Python311\site-packages\RealtimeTTS\engines\coqui_engine.py", line 248, in create_worker_process
self.synthesize_process.start()
File "C:\Users\david\anaconda3\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\context.py", line 336, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "C:\Users\david\anaconda3\Lib\multiprocessing\spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
To fix this issue, refer to the "Safe importing of main module"
section in https://docs.python.org/3/library/multiprocessing.html
Hi,
Is it possible to call this via frontend JS and then stream audio directly to browser for playback? If so, what should the approach be?
Thank You!
Hello, thanks for your work first.
Can I use this in a Flask Python webapp?
I'm going to send a request to the Flask app from JS to get audio streaming.
Is this possible as well?
Hope to hear from you soon.
I have run this
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
engine = SystemEngine() # replace with your TTS engine
stream = TextToAudioStream(engine)
stream.feed("Hello world! How are you today?")
stream.play_async()
Traceback (most recent call last):
File ".\tts-realtime.py", line 8, in <module>
import RealtimeTTS
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\__init__.py", line 1, in <module>
from .text_to_stream import TextToAudioStream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\text_to_stream.py", line 1, in <module>
from .engines import BaseEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\__init__.py", line 4, in <module>
from .elevenlabs_engine import ElevenlabsEngine
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\RealtimeTTS\engines\elevenlabs_engine.py", line 2, in <module>
from elevenlabs import voices, generate, stream
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\__init__.py", line 3, in <module>
from .types import (
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\__init__.py", line 4, in <module>
from .add_project_response_model import AddProjectResponseModel
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\add_project_response_model.py", line 7, in <module>
from .project_response import ProjectResponse
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\site-packages\elevenlabs\types\project_response.py", line 15, in <module>
class ProjectResponse(pydantic.BaseModel):
File "pydantic\main.py", line 205, in pydantic.main.ModelMetaclass.__new__
File "pydantic\fields.py", line 491, in pydantic.fields.ModelField.infer
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 636, in pydantic.fields.ModelField._type_analysis
File "pydantic\fields.py", line 781, in pydantic.fields.ModelField._create_sub_type
File "pydantic\fields.py", line 421, in pydantic.fields.ModelField.__init__
File "pydantic\fields.py", line 537, in pydantic.fields.ModelField.prepare
File "pydantic\fields.py", line 641, in pydantic.fields.ModelField._type_analysis
File "C:\Users\12345\AppData\Local\Programs\Python\Python38\lib\typing.py", line 774, in __subclasscheck__
return issubclass(cls, self.__origin__)
TypeError: issubclass() arg 1 must be a class
Hello!
This seems like a very cool project! :) I'm trying to test it out for a little game thing I'm doing, but I get an error message when trying to install on Windows 11 (from inside a PyCharm venv).
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
Does this seem correct to you?
Would I really need to install MSVC for python RealtimeTTS ?
Thank you for making your work public!
Cheers!
Fred
The install fails at PyAudio. The following are various errors from subsequent attempts to resolve the issue.
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
Building wheel for PyAudio (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for PyAudio (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [22 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-12.6-x86_64-cpython-39
creating build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
copying src/pyaudio/__init__.py -> build/lib.macosx-12.6-x86_64-cpython-39/pyaudio
running build_ext
building 'pyaudio._portaudio' extension
creating build/temp.macosx-12.6-x86_64-cpython-39
creating build/temp.macosx-12.6-x86_64-cpython-39/src
creating build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/device_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/device_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/host_api.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/host_api.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/init.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/init.o
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DOPENSSL_NO_SSL3 -DMACOS=1 -I/usr/local/include -I/usr/include -I/opt/homebrew/include -I/Users/karibu/myenv/include -I/Users/.pyenv/versions/3.9.18/include/python3.9 -c src/pyaudio/mac_core_stream_info.c -o build/temp.macosx-12.6-x86_64-cpython-39/src/pyaudio/mac_core_stream_info.o
In file included from src/pyaudio/mac_core_stream_info.c:3:
src/pyaudio/mac_core_stream_info.h:13:10: fatal error: 'pa_mac_core.h' file not found
#include "pa_mac_core.h"
^~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for PyAudio
Building wheel for TTS (pyproject.toml) ... done
Created wheel for TTS: filename=TTS-0.22.0-cp39-cp39-macosx_12_0_x86_64.whl size=903439 sha256=ea358468699ac39beab3575f19324aa69622c73a77c6e429933673579e3aee0d
Stored in directory: /Users/karibu/Library/Caches/pip/wheels/e9/94/e7/52e526c3ef9c07ac0b67a7dce87f81b6fb83ffd2d1754224e3
Successfully built TTS
Failed to build PyAudio
ERROR: Could not build wheels for PyAudio, which is required to install pyproject.toml-based projects
I ran tests/chinese_test.py but got an error. Does anyone know how to solve it?
Traceback: Traceback (most recent call last):
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/text_to_stream.py", line 265, in synthesize_worker
success = self.engine.synthesize(sentence)
File "/Users/zhujunming/Desktop/AIQQ/tts/RealtimeTTS/RealtimeTTS/engines/system_engine.py", line 72, in synthesize
with wave.open(self.file_path, 'rb') as wf:
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 509, in open
return Wave_read(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 163, in __init__
self.initfp(f)
File "/usr/local/Cellar/[email protected]/3.10.13_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/wave.py", line 130, in initfp
raise Error('file does not start with RIFF id')
wave.Error: file does not start with RIFF id
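The "file does not start with RIFF id" error means the file handed to wave.open isn't a WAV file at all; on macOS the system synthesizer may save audio in another container (AIFF, for instance, whose magic bytes are "FORM" rather than "RIFF"). As a quick diagnostic, you could inspect the first four bytes of the temp file (the helper name here is made up for illustration):

```python
import wave

def header_id(path: str) -> bytes:
    """Return the first 4 bytes of a file: b'RIFF' for WAV, b'FORM' for AIFF."""
    with open(path, "rb") as f:
        return f.read(4)

# Demonstration with a tiny generated WAV (one frame of silence):
with wave.open("demo.wav", "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(22050)
    wf.writeframes(b"\x00\x00")

print(header_id("demo.wav"))  # → b'RIFF'
```

If the file the engine wrote reports b'FORM' instead, the synthesizer produced AIFF and wave.open will raise exactly this error.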
I am trying to pass a variable that I can use inside the _synthesize_worker loop in coqui_engine.py, but I can't find the right way to do it.
So we have
stream.feed(content)
which, in coqui_engine.py's _synthesize_worker, becomes
text = data['text']
But I want to be able to pass an id as well, something like
tts_id_file = '95986845'
stream.tts_id(tts_id_file)
and then access it inside the loop in _synthesize_worker.
I'm using this instead of the output filename because I do custom logic with the chunks, and I want to name each chunk batch with the unique id.
I know it's a bit unusual; I've been trying for a few hours with no luck ;/
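One workaround that avoids patching _synthesize_worker at all: since play_async accepts an on_audio_chunk callback (as shown in a snippet elsewhere on this page), you can bind your id into the callback with a closure. The names make_chunk_handler and tts_id below are illustrative, not part of the library:

```python
import os
import tempfile

def make_chunk_handler(tts_id: str, out_dir: str):
    """Return a chunk callback that names each chunk file with the bound id."""
    counter = {"n": 0}

    def on_audio_chunk(chunk: bytes) -> None:
        # e.g. 95986845_00000.pcm, 95986845_00001.pcm, ...
        path = os.path.join(out_dir, f"{tts_id}_{counter['n']:05d}.pcm")
        with open(path, "wb") as f:
            f.write(chunk)
        counter["n"] += 1

    return on_audio_chunk

# Usage sketch (signature as shown in the play_async example on this page):
# stream.play_async(on_audio_chunk=make_chunk_handler('95986845', 'chunks'), muted=True)

out_dir = tempfile.mkdtemp()
handler = make_chunk_handler("95986845", out_dir)
handler(b"\x00\x01")
handler(b"\x02\x03")
print(sorted(os.listdir(out_dir)))  # → ['95986845_00000.pcm', '95986845_00001.pcm']
```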
Hi! I'm interested: is there any way to get the audio stream output (chunks as they are generated) as a numpy array?
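I can't speak for an official API, but if you receive raw chunks through an on_audio_chunk callback (as shown elsewhere on this page), converting the PCM bytes to a NumPy array is a one-liner; the assumed format is signed 16-bit little-endian mono, so adjust the dtype if your engine differs:

```python
import numpy as np

def chunk_to_array(chunk: bytes) -> np.ndarray:
    """Interpret a raw PCM chunk as an array of 16-bit samples."""
    return np.frombuffer(chunk, dtype=np.int16)

# Example: two samples, 1 and -1, packed little-endian.
chunk = b"\x01\x00\xff\xff"
arr = chunk_to_array(chunk)
print(arr)  # → [ 1 -1]
```

Note that np.frombuffer returns a read-only view over the bytes; call .copy() if you need to modify the samples.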
I've just been doing a lot of work with the Coqui TTS engine, and I thought it wanted 24000 Hz sample files. It turns out that by default it wants 22050 Hz. Both work, but if you look in the config.json that comes downloaded with the models, it states a preference for 22050 Hz as the input file (and yes, mono, 16-bit, etc.).
I was just taking an interest in your RealtimeTTS, thinking of pulling it into my project, and spotted https://github.com/KoljaB/RealtimeTTS#coquiengine; figured you may want to update it.
Thanks
stream = TextToAudioStream(engine, log_characters=True).feed(translation_stream)
stream.play_async(tokenizer="stanza",language="zh",on_audio_chunk=on_audio_chunk_callback,muted=True)
Thanks for the upgrade to RealtimeTTS v0.3.42, but when I use the Coqui engine with play_async in a Linux (Ubuntu Server) environment, I can't get the callback data.
error in play() with engine coqui: [Errno -9996] Invalid output device (no default output device)
Traceback: Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/text_to_stream.py", line 254, in play
self.player.start()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 269, in start
self.audio_stream.open_stream()
File "/usr/local/lib/python3.10/dist-packages/RealtimeTTS/stream_player.py", line 68, in open_stream
self.stream = self.pyaudio_instance.open(
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 639, in open
stream = PyAudio.Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pyaudio/__init__.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9996] Invalid output device (no default output device)
With the stream.play_async() logic, how can we just start creating the chunks without initiating playback of the audio?
I already added logic in
for i, chunk in enumerate(chunks):
to save the chunks to a folder, but I can't find a simple way to suppress the playback without breaking things.
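Since play_async accepts muted=True together with an on_audio_chunk callback (see the snippet earlier on this page), one approach is to let the stream run silently and collect the chunks into a WAV file yourself. A minimal sketch with the stdlib wave module; the 22050 Hz / 16-bit mono parameters are assumptions for Coqui's output format:

```python
import wave

class ChunkRecorder:
    """Accumulate raw PCM chunks and write them out as a single WAV file."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def on_audio_chunk(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def save(self, path: str, rate: int = 22050) -> None:
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)    # mono
            wf.setsampwidth(2)    # 16-bit samples
            wf.setframerate(rate)
            wf.writeframes(b"".join(self.chunks))

# Usage sketch: stream.play_async(on_audio_chunk=rec.on_audio_chunk, muted=True)
rec = ChunkRecorder()
rec.on_audio_chunk(b"\x00\x00" * 100)  # 100 frames of silence
rec.save("chunks.wav")
with wave.open("chunks.wav", "rb") as wf:
    print(wf.getnframes())  # → 100
```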
Hi, first of all thanks for your project; it's very cool to get results almost in real time.
I use CoquiEngine.
I am using FastAPI, and I have a problem: when I want to stop a stream and then restart it, perhaps with different text, I can't.
After I call stream.stop(),
the subsequent stream.play_async()
doesn't work, and I have to restart the server to get everything working again.
To make this clearer, I recorded a small video and attached simple server code.
Here is a simple server that demonstrates the problem:
server.py
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from RealtimeTTS import TextToAudioStream, CoquiEngine

app = FastAPI()

class SynthesisRequest(BaseModel):
    text: str

# Stopping works correctly, but when we start a new stream with new text
# we can't do anything with it: the stream stops playing and never works again;
# only restarting the server fixes it.
@app.get("/tts_stop")
async def tts_stop():
    stream.stop()

# This works fine, but after we call stream.stop() it crashes and no longer works.
@app.post("/tts_to_audio")
async def tts_to_audio(request: SynthesisRequest):
    stream.feed(request.text)
    stream.play_async()
    return {"message": "stream"}

if __name__ == "__main__":
    engine = CoquiEngine()
    stream = TextToAudioStream(engine)
    uvicorn.run(app, port=8010)
Awesome library. Can you integrate MetaVoice-1B?
https://github.com/metavoiceio/metavoice-src
I made my own logic to build up the sentences.
How can I send each call of
stream.feed(content)
straight into synthesis? I don't want the stream to wait for the next batch or sentence; it should synthesize every time feed is called.
I have set
buffer_threshold_seconds = 0
fast_sentence_fragment = True
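If you already build the sentences yourself, one way to make every feed synthesize immediately is to yield one complete sentence at a time from a generator, so the stream never has to wait for more text to find a boundary. A rough sketch; the regex split is a simplification, not the library's tokenizer:

```python
import re

def sentences(text: str):
    """Yield one complete sentence at a time (naive split on ., ! or ?)."""
    for match in re.finditer(r"[^.!?]+[.!?]", text):
        yield match.group().strip()

# Usage sketch: stream.feed(sentences(content)) followed by stream.play_async()
print(list(sentences("First one. Second one! Third?")))
# → ['First one.', 'Second one!', 'Third?']
```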
I created a Python virtual environment and installed the library, but I am getting this error. Any idea how to solve it?
from RealtimeTTS import TextToAudioStream, SystemEngine, AzureEngine, ElevenlabsEngine
ModuleNotFoundError: No module named 'RealtimeTTS'
I am using Ubuntu LTS 22.04.
Thank you so much.