netease-youdao / EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
License: Apache License 2.0
Suggestion: accept the text, speaker_id, and prompt parameters via GET or POST, and return a file URL or raw audio data.
For efficiency you could add a caching layer: take the MD5 of the request parameters and use it as the file name — caching the generated files this way works quite well.
Hoping the maintainers can weigh in.
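The caching idea above can be sketched in a few lines. This is a minimal illustration, not part of EmotiVoice; the CACHE_DIR location and function names are made up for the example:

```python
import hashlib
import os

CACHE_DIR = "tts_cache"  # illustrative location, not part of EmotiVoice

def cache_key(text, speaker_id, prompt):
    # MD5 of the request parameters, used as the cached file name
    raw = f"{text}|{speaker_id}|{prompt}".encode("utf-8")
    return hashlib.md5(raw).hexdigest()

def cached_path(text, speaker_id, prompt):
    # Returns the wav path if this exact request was synthesized before,
    # otherwise None (the caller would then run TTS and write the file).
    path = os.path.join(CACHE_DIR, cache_key(text, speaker_id, prompt) + ".wav")
    return path if os.path.exists(path) else None
```

An API handler would call cached_path first and only invoke the model on a miss, which makes repeated requests for the same text/speaker/prompt essentially free.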
It's a great project! Is there any plan to support an API interface?
Can this approach do voice cloning?
If not, are there any ideas for modifying it to support that?
I'd like to know roughly how much GPU memory is needed for inference.
EmotiVoice/inference_am_vocoder_joint.py", line 66, in main
style_encoder.load_state_dict(model_ckpt)
File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for StyleEncoder:
Unexpected key(s) in state_dict: "bert.embeddings.position_ids".
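A common workaround for this error: the checkpoint was saved with an older transformers version that still registered the position_ids buffer, so the saved state_dict carries a key the current StyleEncoder no longer expects. You can either drop the stale key before loading, or load non-strictly. A minimal sketch (the helper name is illustrative):

```python
def strip_stale_keys(state_dict, stale=("bert.embeddings.position_ids",)):
    # Drop checkpoint entries the current model no longer expects; the
    # remaining keys load normally. Equivalent in spirit to
    # style_encoder.load_state_dict(model_ckpt, strict=False),
    # but explicit about which key is being discarded.
    return {k: v for k, v in state_dict.items() if k not in stale}
```

Usage would then be style_encoder.load_state_dict(strip_stale_keys(model_ckpt)).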
The pretrained models cannot be downloaded, is there another link?
To improve audio quality, how can I increase the sampling rate?
I tried modifying sampling_rate = 16_000 in config.py, but when I change the value to 24_000 the output audio plays back much too fast.
So my question is: how can I raise the sampling rate while keeping the playback speed normal?
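Since the released model and vocoder were trained at 16 kHz, changing the config value only changes how the same samples are interpreted (hence the speed-up). If the goal is just a 24 kHz file, one workaround is to synthesize at 16 kHz and resample the waveform afterwards, which raises the output rate without changing duration. A naive pure-Python sketch, assuming wav is a sequence of float samples:

```python
def resample_linear(wav, sr_in=16_000, sr_out=24_000):
    # Naive linear-interpolation resampler: keeps duration correct when
    # raising the rate after 16 kHz synthesis. For production quality,
    # prefer a polyphase resampler (librosa.resample, torchaudio, sox).
    n_out = len(wav) * sr_out // sr_in
    out = []
    for i in range(n_out):
        pos = i * sr_in / sr_out            # fractional index into wav
        j = int(pos)
        frac = pos - j
        nxt = wav[min(j + 1, len(wav) - 1)]  # clamp at the last sample
        out.append(wav[j] + (nxt - wav[j]) * frac)
    return out
```

Getting genuinely higher-fidelity 24 kHz output, rather than an upsampled 16 kHz signal, would require retraining or swapping in a vocoder trained at 24 kHz.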
Currently https://huggingface.co seems very hard to access from mainland China. Please support domestic developers and testers by providing a convenient in-country download link for this project's models and other large files. Thanks in advance.
Line 28:
with open(file_path, encoding = "UTF-8") as f:
python inference_am_vocoder_joint.py \
  --logdir prompt_tts_open_source_joint \
  --config_folder config/joint \
  --checkpoint g_00140000 \
  --test_file $TEXT
With speaker 1028 everything generates fine, no problems; with speaker 3095 nothing can be generated.
The following line works:
1028|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望
The following line does not work:
3095|普通|<sos/eos> n i3 sp1 k e3 sp0 i3 sp1 b a3 sp1 zh e4 sp1 d ang4 sp0 z uo4 sp1 sh iii4 sp1 x ie2 sp0 p o4 sp3 b u4 sp0 g uo4 sp3 n i3 sp1 ie3 sp1 ing1 sp0 g ai1 sp1 q ing1 sp0 ch u3 sp3 x ian4 sp0 sh iii2 sp1 j iou4 sp0 sh iii4 sp1 zh e4 sp0 iang4 sp3 m ei2 sp0 iou3 sp1 sh en2 sp0 m e5 sp1 sh iii4 sp0 sh iii4 sp1 j ve2 sp0 d uei4 sp1 d e5 sp1 g ong1 sp0 p ing2 sp3 s uei1 sp0 r an2 sp1 b ing4 sp1 b u4 sp0 x iang3 sp1 b iao3 sp0 d a2 sp1 sh en2 sp0 m e5 sp3 k e3 sp1 n i3 sp1 ie3 sp1 q ing1 sp0 ch u3 sp1 n i3 sp1 v3 sp1 uo3 sp1 zh iii1 sp0 j ian1 sp1 d e5 sp1 ch a1 sp0 j v4 sp3 uo3 sp0 m en5 sp3 j i1 sp0 b en3 sp1 m ei2 sp0 sh en2 sp0 m e5 sp1 x i1 sp0 uang4 <sos/eos>|你可以把这当做是胁迫,不过,你也应该清楚,现实就是这样,没有什么事是绝对的公平,虽然并不想表达什么,可你也清楚你与我之间的差距,我们,基本没什么希望
The generated phoneme text does not include the speaker, emotion, and original content, so direct inference then splits the line and ends in an IndexError.
Either provide a script that generates audio directly from the txt, or have the two-step flow generate everything needed — the logic of the two stages should not be inconsistent.
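For anyone hitting the same IndexError: the working samples in this thread use four '|'-separated fields per inference line. A tiny helper (the function name is illustrative, not EmotiVoice's API) that builds a line in that shape:

```python
def make_inference_line(speaker, emotion, phonemes, text):
    # speaker|emotion|<sos/eos> phones <sos/eos>|original text — the
    # 4-field format used by the working 1028 sample in this thread.
    return f"{speaker}|{emotion}|<sos/eos> {phonemes} <sos/eos>|{text}"
```

If the frontend script emits only the phoneme field, wrapping its output with a helper like this restores the fields the inference-time split expects.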
This is a great project! Is there any plan to support streaming TTS?
Hello, and awesome work!
I would like support for other languages like Spanish.
For example:
哎,今天天气好
爱,是不是哟
when?
In the sample file data/inference/text, the first part of each line specifies the speaker. How can I view all of the available speakers?
Are these 12 speakers already the complete set?
Is there any technical documentation?
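One way to answer this yourself, assuming the speaker list is the file that config/joint/config.py reads via speaker2id_path (one speaker name per line, as the config snippet quoted elsewhere in this thread suggests):

```python
def load_speakers(path):
    # Mirrors the config.py reading logic: one speaker name per line,
    # UTF-8 encoded; blank lines are skipped.
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

Printing load_speakers(<your speaker2id_path>) should enumerate every speaker ID the checkpoint accepts.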
Hi,
Can I put the checkpoint files (checkpoint_163431, g_00140000, do_00140000) on Hugging Face so they are more easily accessible than via Google Drive?
Thank you 😀
When starting with 'streamlit run demo_page.py', you may encounter the following error: "UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence".
To resolve this issue, it is recommended to change the encoding when opening the file. You can do this by modifying your code as follows:
the file path: EmotiVoice/config/joint/config.py
#### Speaker ####
with open(speaker2id_path, encoding='utf-8') as f:
speakers = [t.strip() for t in f.readlines()]
speaker_n_labels = len(speakers)
Suggested change:
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
How to obtain the pretrained model
Thanks for sharing this amazing work! Will Japanese be supported in the future?
Speaker - Maria_Kasper
Text - "Emoti Voice is a powerful and modern open-source text-to-speech engine. Emoti Voice speaks both English and Chinese, and with over two thousand different voices. The most prominent feature is emotional synthesis, allowing you to create speech with a wide range of emotions, including happy, excited, sad, angry and others"
Emotion Prompts Tried - Happy / Sad / Excited / Angry / Whisper / Shout
Generated Audios - https://drive.google.com/drive/folders/1JqWnVFSiu5DMyZhGt7XyGXhrlB6eCvPR?usp=sharing
Generated Using the Demo UI
Can someone please help if I am missing something here?
While trying it out, I noticed that speech generated with the numerically numbered speakers all has a similar background noise. Is there a way to fix this?
Could someone add me to the group chat? My join request isn't getting through.
As the title says.
In the phoneme sequence fed to inference, what are sp0 and sp1? Are they pause markers? How are they obtained at inference time?
When I run inference, I keep hitting this problem:
config.py", line 40, in Config
emotions = [t.strip() for t in f.readlines()]
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 2: illegal multibyte sequence
What could be the cause, and how can I fix it? Thanks.
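This is the same gbk-vs-UTF-8 issue described in the demo_page.py report earlier in this thread: on a Chinese-locale Windows machine, open() without an encoding argument defaults to gbk, while the config files are UTF-8. Either pass encoding="utf-8" at the failing open() in config.py, or enable Python's UTF-8 mode (PEP 540, e.g. set PYTHONUTF8=1 before launching) so every open() defaults to UTF-8. A sketch of the explicit fix, with the reading logic adapted from the traceback:

```python
def read_emotions(path):
    # The emotion file is UTF-8; decoding it as gbk (the Chinese-locale
    # Windows default) fails on bytes such as 0xae. An explicit encoding
    # makes the read platform-independent.
    with open(path, encoding="utf-8") as f:
        return [t.strip() for t in f.readlines()]
```

The same encoding argument should be added to every open() in config.py that reads a project text file.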
EmotiVoice/frontend.py", line 26, in split_py
if py[-1] == 'r':
IndexError: string index out of range
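The traceback shows py[-1] indexing an empty string, so an empty pinyin token is reaching split_py. A defensive sketch of the guard (the function name and return shape are illustrative, not EmotiVoice's actual API; I'm assuming the 'r' check detects an erhua suffix):

```python
def split_erhua_safe(py):
    # Illustrative guard: bail out on empty tokens before indexing
    # py[-1], which is what raises IndexError in frontend.py.
    if not py:
        return py, False
    if len(py) > 1 and py[-1] == 'r':
        return py[:-1], True   # strip the trailing 'r' suffix
    return py, False
```

The equivalent one-line fix in frontend.py would be changing the condition to `if py and py[-1] == 'r':` — and it may also be worth finding out why the frontend produces an empty token in the first place.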
Test inputs:
抱歉刚刚的回答可能让你感到不满意了。作为一个大语言模型,我并不具备情感和自主意识,我的回答是基于大量的数据和算法生成的。如果我的回答有不准确或者不恰当的地方,还请您多多包涵和指教。
我是由百川智能的工程师们开发和维护的。他们是一群富有创造力和激情的人,致力于为我提供更好的服务和功能。
测试一下中英混合文本,hello,你好啊。Hello, this is the best test for now。我们很期待您的到来,希望你在这次盛会中得到你想要的结果。
Maria_Kasper|哭唧唧|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京,清华大学
Maria_Kasper|非常开心|<sos/eos> uo3 sp1 l ai2 sp0 d ao4 sp1 b ei3 sp0 j ing1 sp3 q ing1 sp0 h ua2 sp0 d a4 sp0 x ve2 <sos/eos>|我来到北京,清华大学
Of the two inference texts above, one comes from the README sample and one from data/inference/text, yet I cannot hear any difference in the generated audio; the three speaking-rate settings also show no perceptible difference. The style_embedding values do differ, but the actual effect is nearly identical.
Will Apple (macOS) environments be supported later?
Can this project run without an NVIDIA GPU? I don't mean running it in Docker — I mean, for example, running it locally while making modifications.
I tried changing the sampling rate in the config file to 24k, but the output is clearly wrong. Does the open-source model support 24 kHz synthesis, and how should it be modified?
Can a Huggingface Space be made for this project ?
Looking at the speaker list, they all seem to be foreign speakers? Are there any ** speakers, or do I need to download some somewhere and import them into some location? Thanks.
Hello @netease-youdao,
Is there any way to support zero-shot voice cloning from a voice sample?
Thank you!