The resemble.ai system has markup

[Feature request] prosody rate, style emotions, expressiveness, aggressiveness, pace, etc. about tts HOT 4 CLOSED

coqui-ai commented on May 21, 2024

[Feature request] prosody rate, style emotions, expressiveness, aggressiveness, pace, etc.

Comments (4)

AndrewBarfield commented on May 21, 2024

For numerics and acronyms, we can simply preprocess the string before synthesizing using search and replace or regex.

This is not a showstopper.
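
As a rough illustration of that kind of preprocessing, here is a minimal sketch using only Python's `re` module. The `ACRONYMS` table, `number_to_words`, and `normalize` are hypothetical names made up for this example and are not part of the TTS library. The acronym list and the number range covered (0 to 999999) are assumptions you would adjust for your own text.

```python
import re

# Hypothetical acronym table (an assumption for this sketch); extend as needed.
ACRONYMS = {"MPH": "miles per hour"}

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
         "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer between 0 and 999999 in English."""
    if n < 0 or n > 999_999:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("-" + _ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return _ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def normalize(text: str) -> str:
    """Rewrite number ranges, known acronyms, and integers as plain words."""
    # 1. Number ranges first, so the hyphen is read as "to": "400-750" -> "400 to 750".
    text = re.sub(r"\b(\d+)\s*-\s*(\d+)\b", r"\1 to \2", text)
    # 2. All-caps tokens found in the table are expanded; unknown ones are left alone.
    text = re.sub(r"\b[A-Z]{2,}\b",
                  lambda m: ACRONYMS.get(m.group(0), m.group(0)), text)
    # 3. Remaining integers of up to six digits are spelled out.
    text = re.sub(r"\b\d{1,6}\b",
                  lambda m: number_to_words(int(m.group(0))), text)
    return text


print(normalize("The range is 400-750 MPH."))
```

The normalized string can then be passed to whichever synthesis call you already use. Ranges are rewritten before single numbers so that the hyphen is not left for the model to guess at.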

AndrewBarfield commented on May 21, 2024

I've been thinking about the same thing, especially speech rate.

I've also come across some text that isn't read correctly, like number ranges (e.g., 400-750) and acronyms (e.g., MPH). This could be interpreted correctly via markup configuration.

erogol commented on May 21, 2024

This level of detail is not possible with coqui TTS yet due to the limits of the open datasets.

Depending on which model you use, it might struggle with the acronyms and numbers too.

These are limitations due to the use of a publicly available dataset. Most commercial systems use specially created TTS datasets.

erogol commented on May 21, 2024

That's true. Some of the models we release use phonemes and a text front end to do the work. You might like to try them.

The only model that uses only characters is tts_models/en/ljspeech/tacotron2-DDC; the rest are more robust to such variations.

Hopefully we'll update this model soon to use a more advanced front end.
