myshell-ai / openvoice
Instant voice cloning by MyShell.
Home Page: https://research.myshell.ai/open-voice
License: MIT License
As the title says, thanks! Any plan to open-source the training code?
Dear OpenVoice Contributors,
First and foremost, I would like to extend my sincerest commendations for the remarkable work you have accomplished with OpenVoice. The technology's ability to clone voice tones accurately and facilitate flexible voice style control is nothing short of revolutionary. Moreover, the zero-shot cross-lingual voice cloning feature is a testament to the innovative strides you are making in the field of speech synthesis.
Having perused your paper and explored the OpenVoice demos, I am thoroughly impressed by the system's capabilities. However, I would like to propose an enhancement that could potentially augment the versatility of OpenVoice, particularly in handling diverse linguistic contexts.
Issue: Expanding Linguistic Adaptability for Underrepresented Languages
While OpenVoice performs admirably with languages and accents present in the massive-speaker multi-lingual training dataset, there is an opportunity to extend its adaptability to underrepresented languages that are often not included in global datasets. These languages, which may have unique phonetic and prosodic characteristics, present a challenge for any voice cloning technology.
Proposed Enhancement:
Incorporating a Broader Range of Phonetic and Prosodic Features: By expanding the dataset to include a wider array of phonetic and prosodic features from underrepresented languages, OpenVoice could potentially improve its cloning accuracy for these languages.
Developing a Framework for Community-Driven Dataset Expansion: Establishing a platform where native speakers of underrepresented languages can contribute voice samples could enrich the training dataset and enhance the model's performance across a broader linguistic spectrum.
Integrating Adaptive Algorithms for Phonetic Variation: Implementing machine learning algorithms that can adapt to the phonetic variations of new languages could make OpenVoice more robust in handling the nuances of different linguistic contexts.
I believe these enhancements could not only refine the performance of OpenVoice but also contribute to the preservation and representation of linguistic diversity in the digital realm.
Thank you for considering my proposal. I eagerly await your thoughts on this matter and am keen to contribute further to this discussion.
Best regards,
yihong1120
The demo site is not accepting any new signups. Please fix this; it is disappointing.
Hello,
I've been reading your paper and am very interested in your project. I noticed that the paper mentions the use of an MSML (massive-speaker multi-lingual) dataset for training the model, but it doesn't specify the exact dataset used. I'm particularly interested in the emotion-style speech data that you've collected. This is a unique and valuable resource, and I'd love to learn more about it. Could you grant us access to your training dataset?
Thank you for your work on this project.
Error reported during installation:
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
After I install all the requirements and start openvoice_app.py, it throws an error saying "[Errno 2] No such file or directory: 'checkpoints/base_speakers/EN/config.json'".
Going into the web UI, ticking agree, and pressing submit, I get "[ERROR] Get target tone color error {str(e)}".
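(A side note: the literal "{str(e)}" in that message suggests the app prints the format placeholder itself rather than the exception, i.e. an f-string missing its f prefix. A hypothetical illustration:)

```python
# Hypothetical illustration of why the braces appear verbatim in the UI:
try:
    raise ValueError("boom")
except Exception as e:
    print("[ERROR] Get target tone color error {str(e)}")   # missing f: braces printed literally
    print(f"[ERROR] Get target tone color error {str(e)}")  # interpolates the actual error text
```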
Hi,
Thanks again for your nice work. Here are two more questions:
(1) There are two stages for training the convertor; how long should each stage be trained?
(2) How long (or how many iterations) does it take to get a model that performs like the one you provided?
I have trained for nearly 300k iterations, but the result is still quite bad.
Thanks again, and looking forward to your reply ^_^
From your README, you state:
This is an open-source implementation that approximates the performance of the internal voice clone technology of myshell.ai.
The "non-commercial" clause makes this project not open source, in the common usage of the term "open source".
"Open Source" has a generally accepted meaning of being able to use the digital artifacts for commercial purposes. The OSI and Wikipedia's entry on open-source licensing both articulate that commercial re-use is a (generally accepted) requirement of an "open source" license.
If the intent of the project is to create open-source voice cloning software, could you change the licensing terms, in the README.md, the headers of the source files, and the data licensing terms, to reflect this intent? For example, dropping the non-commercial clause would make this project open source.
If the intent is to keep the non-commercial clause of the license, indications of the project being "open source" should be removed as it isn't open source and can cause confusion for people wishing to use your code and data. A commonly accepted term is "source available" rather than "open source" to indicate that you've made the source available to view but not use commercially.
Installed locally on Manjaro Linux with NVIDIA drivers; no problems during installation.
I get this error in the info box every time:
[ERROR] Get target tone color error cuFFT error: CUFFT_INTERNAL_ERROR
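(For context: CUFFT_INTERNAL_ERROR is often reported when a PyTorch build compiled against CUDA 11.7 runs on newer RTX 40-series GPUs; a commonly suggested workaround is installing a PyTorch build compiled against CUDA 11.8 or later, though I cannot confirm that is the cause here.)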
Hi myshell team,
I'm VB, I lead the developer advocacy efforts for Audio at Hugging Face. Congratulations on releasing such a brilliant checkpoint.
It'd also be nice to upload the model weights to the Hub. This would increase the visibility of the model checkpoint and drive adoption.
The process to do so is quite simple:
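Roughly, with the huggingface_hub library (a sketch; the repo ID below is only illustrative):

```python
# Sketch of uploading the checkpoints to the Hugging Face Hub with
# huggingface_hub (pip install huggingface_hub). The repo ID is illustrative.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are logged in, e.g. via `huggingface-cli login`
api.create_repo(repo_id="myshell-ai/OpenVoice", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="checkpoints",      # local directory with the model weights
    repo_id="myshell-ai/OpenVoice",
    repo_type="model",
)
```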
Of course, I am happy to help/ guide you if you have any questions.
Cheers,
VB
I've tried the demo on Hugging Face. The voice is similar, but the naturalness is still not very good. Is it because of the TTS model? If we replace it with a better TTS model, can we expect a better result?
Please research and, if possible, add the following: the ability to blend multiple reference voices into a single cloned voice.
A good test while working on this would be to combine a reference from an American speaker, and a reference from a British speaker, and see if OpenVoice can blend them to create a convincing "mid-Atlantic" hybrid.
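If the tone color embedding space is roughly linear, one (untested) way to prototype this would be to average the two extracted embeddings before conversion. A minimal sketch, assuming the se_extractor API from demo_part1.ipynb and an already-loaded tone_color_converter; file names are placeholders:

```python
# Untested sketch: blend two reference accents by averaging their tone color
# embeddings. Assumes the se_extractor API from demo_part1.ipynb; the audio
# file names and the loaded tone_color_converter are placeholders.
import se_extractor

se_us, _ = se_extractor.get_se('american_ref.mp3', tone_color_converter,
                               target_dir='processed', vad=True)
se_uk, _ = se_extractor.get_se('british_ref.mp3', tone_color_converter,
                               target_dir='processed', vad=True)
mid_atlantic_se = 0.5 * se_us + 0.5 * se_uk  # naive linear blend of embeddings
```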
This would be an absolutely killer feature and I've seen no other software able to do it. Many thanks for considering this suggestion!
Hello,
Your model and paper look great.
I am deeply impressed by your model's ability to mimic tone color. I tested it on English and the results were very good.
However, when I tested the ToneColorConverter on Vietnamese, the results were not as good as for English. It even mistakes a female tone for a male tone. An explanation could be that the ToneColorConverter was not trained on Vietnamese datasets, so it may not capture features specific to Vietnamese.
Could you please suggest some measures to solve this problem?
I also look forward to your plans for training the base speaker model and tone color converter model on custom datasets.
Thank you.
Thank you very much for your contributions. This is a very useful and clear open-source project.
I noticed that it performed surprisingly well on English tasks. However, there seem to be some potential areas for improvement in the Chinese task, such as the lack of a style model that can adjust emotions.
Do you plan to release the relevant training process in the future, such as data and code? I want to try to fine-tune the Chinese base model and train the style-model. Thanks a lot.
Will this support Linux at some point? If yes, when?
How can one make adjustments for other languages such as Japanese, e.g. emotions, accents, rhythm, pauses, and intonation?
Can you provide instructions for Windows users? Some of the dependencies require different Python versions.
WARNING: A conda environment already exists at 'c:\Users\vovap\miniconda3\envs\openvoice'
Remove existing environment (y/[n])? y
Channels:
- defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: c:\Users\vovap\miniconda3\envs\openvoice
added / updated specs:
- python=3.9
The following NEW packages will be INSTALLED:
ca-certificates pkgs/main/win-64::ca-certificates-2023.12.12-haa95532_0
openssl pkgs/main/win-64::openssl-3.0.12-h2bbff1b_0
pip pkgs/main/win-64::pip-23.3.1-py39haa95532_0
python pkgs/main/win-64::python-3.9.18-h1aa4202_0
setuptools pkgs/main/win-64::setuptools-68.2.2-py39haa95532_0
sqlite pkgs/main/win-64::sqlite-3.41.2-h2bbff1b_0
tzdata pkgs/main/noarch::tzdata-2023d-h04d1e81_0
vc pkgs/main/win-64::vc-14.2-h21ff451_1
vs2015_runtime pkgs/main/win-64::vs2015_runtime-14.27.29016-h5e58377_2
wheel pkgs/main/win-64::wheel-0.41.2-py39haa95532_0
Proceed ([y]/n)? y
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
# $ conda activate openvoice
#
# To deactivate an active environment, use
#
# $ conda deactivate
E:\AI\OpenVoice\OpenVoice>conda activate openvoice
CondaError: Run 'conda init' before 'conda activate'
E:\AI\OpenVoice\OpenVoice>conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
Channels:
- pytorch
- nvidia
- defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: \ warning libmamba Added empty dependency for problem type SOLVER_RULE_UPDATE
failed
LibMambaUnsatisfiableError: Encountered problems while solving:
- package torchvision-0.14.1-py310_cpu requires python >=3.10,<3.11.0a0, but none of the providers can be installed
Could not solve for environment specs
The following packages are incompatible
├─ pin-1 is installable and it requires
│ └─ python 3.11.* , which can be installed;
└─ torchvision 0.14.1 is not installable because there are no viable options
├─ torchvision 0.14.1 would require
│ └─ python >=3.10,<3.11.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.14.1 would require
│ └─ python >=3.7,<3.8.0a0 , which conflicts with any installable versions previously reported;
├─ torchvision 0.14.1 would require
│ └─ python >=3.8,<3.9.0a0 , which conflicts with any installable versions previously reported;
└─ torchvision 0.14.1 would require
└─ python >=3.9,<3.10.0a0 , which conflicts with any installable versions previously reported.
Pins seem to be involved in the conflict. Currently pinned specs:
- python 3.11.* (labeled as 'pin-1')
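(Note: the solver output above shows the environment was pinned to Python 3.11, while torch 1.13.1 / torchvision 0.14.1 only provide builds for Python 3.7 through 3.10; recreating the environment with python=3.9, as in the first log, should let this solve succeed.)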
E:\AI\OpenVoice\OpenVoice>pip install -r requirements.txt
Collecting librosa==0.9.1 (from -r requirements.txt (line 1))
Downloading librosa-0.9.1-py3-none-any.whl (213 kB)
---------------------------------------- 213.1/213.1 kB 541.0 kB/s eta 0:00:00
Collecting faster-whisper==0.9.0 (from -r requirements.txt (line 2))
Downloading faster_whisper-0.9.0-py3-none-any.whl.metadata (11 kB)
Collecting pydub==0.25.1 (from -r requirements.txt (line 3))
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Collecting wavmark==0.0.2 (from -r requirements.txt (line 4))
Downloading wavmark-0.0.2-py3-none-any.whl.metadata (5.0 kB)
Collecting numpy==1.22.0 (from -r requirements.txt (line 5))
Downloading numpy-1.22.0.zip (11.3 MB)
---------------------------------------- 11.3/11.3 MB 16.8 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting eng_to_ipa==0.0.2 (from -r requirements.txt (line 6))
Downloading eng_to_ipa-0.0.2.tar.gz (2.8 MB)
---------------------------------------- 2.8/2.8 MB 174.9 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting inflect==7.0.0 (from -r requirements.txt (line 7))
Downloading inflect-7.0.0-py3-none-any.whl.metadata (21 kB)
Collecting unidecode==1.3.7 (from -r requirements.txt (line 8))
Downloading Unidecode-1.3.7-py3-none-any.whl.metadata (13 kB)
Collecting whisper-timestamped==1.14.2 (from -r requirements.txt (line 9))
Downloading whisper_timestamped-1.14.2-py3-none-any.whl.metadata (1.2 kB)
Collecting openai (from -r requirements.txt (line 10))
Downloading openai-1.6.1-py3-none-any.whl.metadata (17 kB)
Collecting python-dotenv (from -r requirements.txt (line 11))
Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting pypinyin==0.50.0 (from -r requirements.txt (line 12))
Downloading pypinyin-0.50.0-py2.py3-none-any.whl.metadata (12 kB)
Collecting cn2an==0.5.22 (from -r requirements.txt (line 13))
Downloading cn2an-0.5.22-py3-none-any.whl.metadata (10 kB)
Collecting jieba==0.42.1 (from -r requirements.txt (line 14))
Downloading jieba-0.42.1.tar.gz (19.2 MB)
---------------------------------------- 19.2/19.2 MB 5.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting gradio==3.48.0 (from -r requirements.txt (line 15))
Downloading gradio-3.48.0-py3-none-any.whl.metadata (17 kB)
Collecting langid==1.1.6 (from -r requirements.txt (line 16))
Downloading langid-1.1.6.tar.gz (1.9 MB)
---------------------------------------- 1.9/1.9 MB 127.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting audioread>=2.1.5 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Collecting scipy>=1.2.0 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading scipy-1.11.4-cp311-cp311-win_amd64.whl.metadata (60 kB)
---------------------------------------- 60.4/60.4 kB 3.3 MB/s eta 0:00:00
Collecting scikit-learn>=0.19.1 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading scikit_learn-1.3.2-cp311-cp311-win_amd64.whl.metadata (11 kB)
Collecting joblib>=0.14 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading joblib-1.3.2-py3-none-any.whl.metadata (5.4 kB)
Collecting decorator>=4.0.10 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting resampy>=0.2.2 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading resampy-0.4.2-py3-none-any.whl (3.1 MB)
---------------------------------------- 3.1/3.1 MB 98.9 MB/s eta 0:00:00
Collecting numba>=0.45.1 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading numba-0.58.1-cp311-cp311-win_amd64.whl.metadata (2.8 kB)
Collecting soundfile>=0.10.2 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading soundfile-0.12.1-py2.py3-none-win_amd64.whl (1.0 MB)
---------------------------------------- 1.0/1.0 MB 62.4 MB/s eta 0:00:00
Collecting pooch>=1.0 (from librosa==0.9.1->-r requirements.txt (line 1))
Downloading pooch-1.8.0-py3-none-any.whl.metadata (9.9 kB)
Requirement already satisfied: packaging>=20.0 in c:\users\vovap\miniconda3\lib\site-packages (from librosa==0.9.1->-r requirements.txt (line 1)) (23.1)
Collecting av==10.* (from faster-whisper==0.9.0->-r requirements.txt (line 2))
Downloading av-10.0.0-cp311-cp311-win_amd64.whl (25.3 MB)
---------------------------------------- 25.3/25.3 MB 12.1 MB/s eta 0:00:00
Collecting ctranslate2<4,>=3.17 (from faster-whisper==0.9.0->-r requirements.txt (line 2))
Downloading ctranslate2-3.23.0-cp311-cp311-win_amd64.whl.metadata (10 kB)
Collecting huggingface-hub>=0.13 (from faster-whisper==0.9.0->-r requirements.txt (line 2))
Downloading huggingface_hub-0.20.2-py3-none-any.whl.metadata (12 kB)
Collecting tokenizers<0.15,>=0.13 (from faster-whisper==0.9.0->-r requirements.txt (line 2))
Downloading tokenizers-0.14.1-cp311-none-win_amd64.whl.metadata (6.8 kB)
Collecting onnxruntime<2,>=1.14 (from faster-whisper==0.9.0->-r requirements.txt (line 2))
Downloading onnxruntime-1.16.3-cp311-cp311-win_amd64.whl.metadata (4.5 kB)
INFO: pip is looking at multiple versions of wavmark to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version:
    0.52.0 Requires-Python >=3.6,<3.9
    0.52.0rc3 Requires-Python >=3.6,<3.9
    0.53.0 Requires-Python >=3.6,<3.10
    0.53.0rc1.post1 Requires-Python >=3.6,<3.10
    0.53.0rc2 Requires-Python >=3.6,<3.10
    0.53.0rc3 Requires-Python >=3.6,<3.10
    0.53.1 Requires-Python >=3.6,<3.10
    0.54.0 Requires-Python >=3.7,<3.10
    0.54.0rc2 Requires-Python >=3.7,<3.10
    0.54.0rc3 Requires-Python >=3.7,<3.10
    0.54.1 Requires-Python >=3.7,<3.10
    0.55.0 Requires-Python >=3.7,<3.11
    0.55.0rc1 Requires-Python >=3.7,<3.11
    0.55.1 Requires-Python >=3.7,<3.11
    0.55.2 Requires-Python >=3.7,<3.11
    1.21.2 Requires-Python >=3.7,<3.11
    1.21.3 Requires-Python >=3.7,<3.11
    1.21.4 Requires-Python >=3.7,<3.11
    1.21.5 Requires-Python >=3.7,<3.11
    1.21.6 Requires-Python >=3.7,<3.11
    1.6.2 Requires-Python >=3.7,<3.10
    1.6.3 Requires-Python >=3.7,<3.10
    1.7.0 Requires-Python >=3.7,<3.10
    1.7.1 Requires-Python >=3.7,<3.10
    1.7.2 Requires-Python >=3.7,<3.11
    1.7.3 Requires-Python >=3.7,<3.11
    1.8.0 Requires-Python >=3.8,<3.11
    1.8.0rc1 Requires-Python >=3.8,<3.11
    1.8.0rc2 Requires-Python >=3.8,<3.11
    1.8.0rc3 Requires-Python >=3.8,<3.11
    1.8.0rc4 Requires-Python >=3.8,<3.11
    1.8.1 Requires-Python >=3.8,<3.11
ERROR: Could not find a version that satisfies the requirement torch<2.0 (from wavmark) (from versions: 2.0.0, 2.0.1, 2.1.0, 2.1.1, 2.1.2)
ERROR: No matching distribution found for torch<2.0
I am getting this error in the info box every time, but there were no problems during installation:
[ERROR] Get target tone color error cuFFT error: CUFFT_INTERNAL_ERROR
Hello, the Chinese tones in the demo don't sound quite right: 早 should be third tone, but it isn't read as third tone. Are there plans to make Chinese tone support more complete?
Use case:
Meeting with foreigners
Hi, Thanks for this great repository. It is amazing work.
My problem is that when I initialize OpenVoice's BaseSpeakerTTS, it uses ~3 GiB of memory and ~1 GiB of video RAM. I think it consumes too many resources. Do you have any ideas for optimizing it?
Really awesome-looking paper and samples; I was anxious to try it, but it seems the repo is empty. Or is there a different repo we should be tracking?
Installed it, and I get this error when running generation. It looks like a silero version needs to be specified in the code.
[ERROR] Get target tone color error Problem when installing silero with version None. Check versions here: https://github.com/snakers4/silero-vad/wiki/Version-history-and-Available-Models
how to collect reddit
Are there any plans to support the Ukrainian and Russian languages? Great product; I would like to try it.
I have successfully executed your project in its entirety and extend my heartfelt congratulations on the achievements you've made.
The default TTS conversion does not support Chinese very well; there are some tonal issues, and some words sound like they have a Guangxi accent 😂. I have replaced the TTS with real human recordings, which perform much better. Even when converting from male to female voices, the results are quite impressive.
If there could be support for real-time voice conversion in the future, the potential for this project would significantly expand. I believe there would be considerable attention in the broader entertainment broadcasting market in China or in-game voice communication scenarios. Even if there were delays of a few seconds, it would still be acceptable.
In conclusion, I hope this project can genuinely be implemented commercially.
Hi,
Saw the paper go out and was wondering if you're going to release the code as well.
Thanks!
The latest version of requirements doesn't contain pypinyin, and it is not a dependency for other packages, so
pip install -r requirements.txt
does not install it. As a result, demo_part1.ipynb gives an error:
ModuleNotFoundError Traceback (most recent call last)
Cell In[1], line 4
2 import torch
3 import se_extractor
----> 4 from api import BaseSpeakerTTS, ToneColorConverter
File OpenVoice/api.py:9
7 import os
8 import librosa
----> 9 from text import text_to_sequence
10 from mel_processing import spectrogram_torch
11 from models import SynthesizerTrn
File OpenVoice/text/__init__.py:2
1 """ from https://github.com/keithito/tacotron """
----> 2 from text import cleaners
3 from text.symbols import symbols
6 # Mappings from symbol to numeric ID and vice versa:
File OpenVoice/text/cleaners.py:3
1 import re
2 from text.english import english_to_lazy_ipa, english_to_ipa2, english_to_lazy_ipa2
----> 3 from text.mandarin import number_to_chinese, chinese_to_bopomofo, latin_to_bopomofo, chinese_to_romaji, chinese_to_lazy_ipa, chinese_to_ipa, chinese_to_ipa2
5 def cjke_cleaners2(text):
6 text = re.sub(r'\[ZH\](.*?)\[ZH\]',
7 lambda x: chinese_to_ipa(x.group(1))+' ', text)
File OpenVoice/text/mandarin.py:4
2 import sys
3 import re
----> 4 from pypinyin import lazy_pinyin, BOPOMOFO
5 import jieba
6 import cn2an
ModuleNotFoundError: No module named 'pypinyin'
I have not checked any other missing requirements.
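A likely fix, assuming the version pinned in the earlier install log still applies, is to install the missing package manually: pip install pypinyin==0.50.0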
Thank you very much for open-sourcing such a great voice tone cloning project.
I set up and ran the project locally, and I noticed:
1. The Chinese results are not very good.
2. It seems impossible to specify the tone or pronunciation of individual characters.
3. In sentences mixing Chinese and English, English words are split into single characters and read out letter by letter (this is probably a TTS issue).
4. I see that the voice cloning pipeline is: extract the tone color features from a template audio and fuse them into another speech clip (obtained from text-to-speech).
So I used a different TTS to generate the audio and then fused in the tone color; this works much better and sounds fairly natural (when the tone colors are somewhat similar).
I saw that you are planning another project for Chinese scenarios. Do you have a timeline? Once it is released, I will try it again!
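For reference, the replace-the-TTS workflow described above can be sketched with the API used in demo_part1.ipynb (paths are illustrative; tone_color_converter is assumed to be loaded as in the demo):

```python
# Sketch of the workflow described above: run any third-party TTS, then fuse
# in the target tone color. Based on the API used in demo_part1.ipynb; paths
# are illustrative and tone_color_converter is assumed to be loaded already.
import se_extractor

src_path = 'external_tts_output.wav'           # audio produced by any TTS
reference = 'resources/example_reference.mp3'  # template voice to clone

src_se, _ = se_extractor.get_se(src_path, tone_color_converter,
                                target_dir='processed', vad=True)
tgt_se, _ = se_extractor.get_se(reference, tone_color_converter,
                                target_dir='processed', vad=True)

tone_color_converter.convert(
    audio_src_path=src_path,  # speech whose tone color will be replaced
    src_se=src_se,
    tgt_se=tgt_se,
    output_path='output_cloned.wav',
)
```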
Not sure what's happening here. I managed to spin this up in the local Gradio app and recorded my own voice, but inference gave me an American-sounding output. I'm British; is that expected?
Thanks!
OK, after running python -m openvoice_app --share, I got this:
Loaded checkpoint 'checkpoints/base_speakers/EN/checkpoint.pth'
missing/unexpected keys: [] []
Loaded checkpoint 'checkpoints/base_speakers/ZH/checkpoint.pth'
missing/unexpected keys: [] []
Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []
F:\OpenVoice\installer_files\env\lib\site-packages\gradio\components\dropdown.py:90: UserWarning: The `max_choices` parameter is ignored when `multiselect` is False.
  warnings.warn(
Traceback (most recent call last):
File "F:\OpenVoice\installer_files\env\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "F:\OpenVoice\installer_files\env\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "F:\OpenVoice\openvoice_app.py", line 267, in
ref_gr = gr.Audio(
File "F:\OpenVoice\installer_files\env\lib\site-packages\gradio\component_meta.py", line 157, in wrapper
return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'info'
Has anyone compared the results to other state-of-the-art methods?
It seems text/cleaner's function chinese_to_ipa is not declared?
Could not load library libcudnn_cnn_infer.so.8. Error: libcuda.so: cannot open shared object file: No such file or directory
For anyone who gets the error that says something about "color" and "tone" and not being able to find a file (I didn't save the exact error message): I looked it up, saw a barely related post about FFmpeg, and decided it was worth a shot. I installed it with Chocolatey into the OpenVoice folder, and it shockingly fixed the issue.
Hi @Zengyi-Qin
The paper looks great. Unfortunately, the pre-trained model only works with English, although the examples contain other languages as well, which is misleading.
I tried adding a new language by modifying the code (adding tags and a converter to phonemes) and even managed to synthesize audio, but unfortunately it only sounds a little like the prompt.
Are you planning to open access (add an example) to train a custom model, so that the community can add their own languages and train the model on their own datasets?
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
Channels:
LibMambaUnsatisfiableError: Encountered problems while solving:
Could not solve for environment specs
The following package could not be installed
└─ pytorch-cuda 11.7** is not installable because it requires
└─ cuda 11.7.* , which does not exist (perhaps a missing channel).
What is this file that appears after launching the Gradio demo?
\venv\Lib\site-packages\gradio\frpc_windows_amd64_v0.2
It is detected as PUA:Win32/FRProxy by Microsoft Defender Antivirus (https://www.microsoft.com/en-us/windows/windows-defender?ocid=cx-wdsi-ency).
Please explain!
Hi,
Thanks for the great idea behind your work! Here is a question.
When training the convertor, the paper shows that lots of audio was collected. So for a sample like (text_1, audio_1), what is the input audio of the convertor (encoder)? audio_1, or an audio_x generated by the base TTS from text_1?
* If audio_1, it seems that the input equals the output?
* If audio_x, wouldn't the convertor be too strongly tied to the base TTS (the generated voice)?
Looking forward to your reply. Thanks again.
The demo on Spaces is awesome! It would also be great to have the Gradio demo available locally. This could help the community easily clone and test the model on their local hardware.
demo_part1.ipynb
reference_speaker = 'resources/example_reference.mp3'
target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, target_dir='processed', vad=True)
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
File ~/miniconda3/envs/openvoice/lib/python3.9/site-packages/whisper_timestamped/transcribe.py:1885, in get_vad_segments(audio, output_sample, min_speech_duration, min_silence_duration, dilatation, method)
1884 try:
-> 1885 _silero_vad_model, utils = torch.hub.load(repo_or_dir=repo_or_dir, model="silero_vad", onnx=onnx, source=source)
1886 except ImportError as err:
File ~/miniconda3/envs/openvoice/lib/python3.9/site-packages/torch/hub.py:539, in load(repo_or_dir, model, source, trust_repo, force_reload, verbose, skip_validation, *args, **kwargs)
538 if source == 'github':
--> 539 repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, trust_repo, "load",
540 verbose=verbose, skip_validation=skip_validation)
542 model = _load_local(repo_or_dir, model, *args, **kwargs)
File ~/miniconda3/envs/openvoice/lib/python3.9/site-packages/torch/hub.py:203, in _get_cache_or_reload(github, force_reload, trust_repo, calling_fn, verbose, skip_validation)
202 if not skip_validation:
--> 203 _validate_not_a_forked_repo(repo_owner, repo_name, ref)
205 cached_file = os.path.join(hub_dir, normalized_br + '.zip')
File ~/miniconda3/envs/openvoice/lib/python3.9/site-packages/torch/hub.py:162, in _validate_not_a_forked_repo(repo_owner, repo_name, ref)
161 url = f'{url_prefix}?per_page=100&page={page}'
--> 162 response = json.loads(_read_url(Request(url, headers=headers)))
163 # Empty response means no more data to process
File ~/miniconda3/envs/openvoice/lib/python3.9/site-packages/torch/hub.py:145, in _read_url(url)
144 def _read_url(url):
...
-> 1889 raise RuntimeError(f"Problem when installing silero with version {version}. Check versions here: https://github.com/snakers4/silero-vad/wiki/Version-history-and-Available-Models") from err
1890 finally:
1891 if need_folder_hack:
RuntimeError: Problem when installing silero with version None. Check versions here: https://github.com/snakers4/silero-vad/wiki/Version-history-and-Available-Models
Hi, I have some questions as below: