CorentinJ / Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
License: Other
Excuse my inexperience, but I tried to use your software since it looked very appealing to me.
I use an AMD GPU, so I followed the steps for installing torch without it. However, I keep getting this error message.
It seems my only failure is that it does not play back.
Is it even possible to use it without CUDA? What can I do to circumvent this error?
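For reference, a minimal sketch of the CPU-only direction, assuming the failure happens when a CUDA-trained checkpoint is loaded without a CUDA-enabled torch build (the path is illustrative):

    # Hedged sketch: load checkpoints onto the CPU when CUDA is unavailable.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    checkpoint = torch.load("encoder/saved_models/pretrained.pt", map_location=device)

Whether the rest of the toolbox runs CPU-only is a separate question, but map_location at least removes the CUDA deserialization error.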
The link to the YouTube video in the README won't work.
When you click the picture in the README file, YouTube opens but says "Video is not available".
Hi,
I am new to PyTorch and audio-related manipulation in Python. I have installed and configured everything along with the sample train-clean-100 dataset. I am now able to record 5-7 samples of my own voice, but for some reason the synthesis is not happening in my own voice; it seems to come out in a default male voice. Can you please guide me in the right direction?
Thanks for the help.
I have met some problems with visdom. What visdom version do you use?
Hello, thank you for your work. I have read your thesis briefly and liked the style of writing; the code quality is also good, so it's easy for researchers and hobbyists to play with it. I have experimented with a few voices and observed the following behaviors:
Sometimes there is a gap, an empty stretch in the spectrogram and final audio, like introduced pauses. What could be the reason for it? Is it because of the vocoder quality?
The audio volume changes with the text: for some texts the final audio has noticeably lower volume, for others significantly higher. How can I fix this? Is this also related to the vocoder (WaveNet)?
Also, since you mention in your thesis that each of the 3 systems can be trained independently, with independent datasets: if I want to finetune on a specific dataset, or even a single person who is a native English speaker, should I only retrain the vocoder, or does Tacotron also need to be retrained (or finetuned)?
Hi,
I have come across the following error when using the toolbox in low memory mode.
On this computer, my GPU only has 2GB so I need to use this mode.
I have tested this on another computer that has a GPU with 4GB RAM. The toolbox works perfectly in normal mode but when I turn on low_mem, I run into the same error.
I'm not sure what other information you would need to look into this so please let me know what else I can provide to help out.
Hello, and thank you for the great work! One of the limitations that I have noticed is that the synthesizer starts to have long gaps in speech if the input text length is short. @CorentinJ do you have any ideas why this is or how I could fix it? I'll also probably ask on Rayhane's repo if I can reproduce the issue on his synthesizer.
Am I correct in assuming that the issue is caused by the stop prediction in Taco2 not having a high enough activation, which results in long spectrograms?
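One workaround sketch, assuming the gaps are plain silence rather than noise: cut long silent stretches out of the generated waveform with librosa's silence splitting (the top_db threshold is a guess to tune):

    # Hedged post-processing sketch: remove long silent gaps from a waveform.
    import numpy as np
    import librosa

    def remove_long_gaps(wav, top_db=30):
        intervals = librosa.effects.split(wav, top_db=top_db)  # non-silent [start, end) pairs
        return np.concatenate([wav[start:end] for start, end in intervals])

This only masks the symptom; if the stop-token hypothesis above is right, the real fix is in the synthesizer.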
Every time I try a filepath in demo_cli.py I get the same error:
Caught exception: TypeError("argument of type 'WindowsPath' is not iterable")
I don't know what I'm doing wrong.
How should the filepath be formatted?
PS -> demo_toolbox.py is 100% working.
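A hedged workaround sketch: the TypeError suggests an older librosa/audioread combination that expects a str rather than a pathlib.Path, so converting the path before the call may be enough (the file name is hypothetical):

    from pathlib import Path
    import librosa

    fpath = Path("samples/my_voice.wav")  # hypothetical input file
    wav, sr = librosa.load(str(fpath), sr=16000)  # str() avoids the WindowsPath TypeError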
Traceback (most recent call last):
File "C:\Users\The Atomizer\Desktop\text\voice\voice\toolbox\__init__.py", line 52, in <lambda>
self.ui.browser_load_button.clicked.connect(lambda: self.load_from_browser())
File "C:\Users\The Atomizer\Desktop\text\voice\voice\toolbox\__init__.py", line 110, in load_from_browser
wav = Synthesizer.load_preprocess_wav(fpath)
File "C:\Users\The Atomizer\Desktop\text\voice\voice\synthesizer\inference.py", line 111, in load_preprocess_wav
wav = librosa.load(fpath, hparams.sample_rate)[0]
File "C:\Users\The Atomizer\Miniconda3\envs\voice\lib\site-packages\librosa\core\audio.py", line 119, in load
with audioread.audio_open(os.path.realpath(path)) as input_file:
File "C:\Users\The Atomizer\Miniconda3\envs\voice\lib\site-packages\audioread\__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.NoBackendError
When clicking Load in the GUI
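NoBackendError means audioread found no decoder for the file. A hedged sketch of two common fixes: install ffmpeg and make sure it is on PATH (needed for mp3/m4a), or, for plain wav/flac files, read the audio with soundfile instead:

    # Hedged fallback sketch: soundfile decodes wav/flac without audioread.
    import soundfile as sf

    wav, sr = sf.read("samples/my_voice.wav")  # hypothetical input file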
Hi, under Win 10 64-bit with Python 3.6, it fails to import print_args because it can't find argutils.
I think I have a relative import error but can't solve it.
By the way, nice job on what I heard in the YouTube demo.
If I manually try to import the utils from the root dir, it seems to load another utils file.
I have tried to generate the pretrained data, running the following commands (following your instructions):
python encoder_preprocess.py <datasets_root>
python encoder_train.py my_run <datasets_root>
I got the following errors. I also tried to use some audio from the dataset, but I got an exception (see picture); I understand I don't have nVidia drivers here.
C:\Users\admin\Downloads\Real-Time-Voice-Cloning>python encoder_preprocess.py "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root"
Traceback (most recent call last):
  File "encoder_preprocess.py", line 1, in <module>
    from encoder.preprocess import preprocess_librispeech, preprocess_voxceleb1, preprocess_voxceleb2
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\preprocess.py", line 1, in <module>
    from multiprocess.pool import ThreadPool
ModuleNotFoundError: No module named 'multiprocess'
C:\Users\admin\Downloads\Real-Time-Voice-Cloning>python encoder_train.py my_run "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root"
Arguments:
    run_id:          my_run
    clean_data_root: C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root
    models_dir:      encoder\saved_models
    vis_every:       10
    umap_every:      100
    save_every:      500
    backup_every:    7500
    force_restart:   False
    visdom_server:   http://localhost
    no_visdom:       False
No model "my_run" found, starting training from scratch.
Updating the visualizations every 10 steps.
WARNING:root:Setting up a new session...
Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\connection.py", line 80, in create_connection
    raise err
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 956, in send
    self.connect()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 183, in connect
    conn = self._new_conn()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/my_run%20(21-06%2001h12) (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 548, in _send
    data=json.dumps(msg),
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/my_run%20(21-06%2001h12) (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\visualizations.py", line 51, in __init__
    self.vis = visdom.Visdom(server, env=self.env_name, raise_exceptions=True)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 406, in __init__
    }, endpoint='env/' + env)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 562, in _send
    raise ConnectionError("Error connecting to Visdom server")
ConnectionError: Error connecting to Visdom server

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "encoder_train.py", line 46, in <module>
    train(**vars(args))
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\train.py", line 60, in train
    vis = Visualizations(run_id, vis_every, server=visdom_server, disabled=no_visdom)
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\visualizations.py", line 53, in __init__
    raise Exception("No visdom server detected. Run the command \"visdom\" in your CLI to "
Exception: No visdom server detected. Run the command "visdom" in your CLI to start it.
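A hedged reading of this log: the first failure only needs the missing package installed, and the second is just the visdom server not running; the argument listing above also shows a no_visdom option for skipping it entirely.

    pip install multiprocess
    visdom
    python encoder_train.py my_run <datasets_root> --no_visdom

(Run `visdom` in a separate terminal before training, or pass the no-visdom flag instead; the exact flag spelling is an assumption based on the printed no_visdom argument.)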
How can I generate the pretrained data files for Romanian? How should I structure the content of the folders and files to generate pretrained data for the Romanian language?
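One hedged way to reuse the existing preprocessing for another language is to mirror the LibriSpeech layout the scripts already recognize under datasets_root (speaker and book IDs are arbitrary placeholders):

    <datasets_root>/
        LibriSpeech/
            train-clean-100/
                <speaker_id>/
                    <book_id>/
                        <utterance>.flac
                        <speaker_id>-<book_id>.trans.txt

The encoder only needs the audio; the transcript files matter once you train the synthesizer.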
I have successfully run the program with your checkpoint using Griffin-Lim. My dataset is LibriSpeech. But when I used the pretrained WaveRNN vocoder, I got an error.
Here is the output:
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at vocoder\saved_models\pretrained\pretrained.pt
Traceback (most recent call last):
File "D:\Real-Time-Voice-Cloning-master\toolbox\__init__.py", line 249, in init_vocoder
vocoder.load_model(model_fpath)
File "D:\Real-Time-Voice-Cloning-master\vocoder\inference.py", line 31, in load_model
_model.load_state_dict(checkpoint['model_state'])
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for WaveRNN:
Unexpected key(s) in state_dict: "upsample.resnet.batch_norm.num_batches_tracked", "upsample.resnet.layers.0.batch_norm1.num_batches_tracked", "upsample.resnet.layers.0.batch_norm2.num_batches_tracked", "upsample.resnet.layers.1.batch_norm1.num_batches_tracked", "upsample.resnet.layers.1.batch_norm2.num_batches_tracked", "upsample.resnet.layers.2.batch_norm1.num_batches_tracked", "upsample.resnet.layers.2.batch_norm2.num_batches_tracked", "upsample.resnet.layers.3.batch_norm1.num_batches_tracked", "upsample.resnet.layers.3.batch_norm2.num_batches_tracked", "upsample.resnet.layers.4.batch_norm1.num_batches_tracked", "upsample.resnet.layers.4.batch_norm2.num_batches_tracked", "upsample.resnet.layers.5.batch_norm1.num_batches_tracked", "upsample.resnet.layers.5.batch_norm2.num_batches_tracked", "upsample.resnet.layers.6.batch_norm1.num_batches_tracked", "upsample.resnet.layers.6.batch_norm2.num_batches_tracked", "upsample.resnet.layers.7.batch_norm1.num_batches_tracked", "upsample.resnet.layers.7.batch_norm2.num_batches_tracked", "upsample.resnet.layers.8.batch_norm1.num_batches_tracked", "upsample.resnet.layers.8.batch_norm2.num_batches_tracked", "upsample.resnet.layers.9.batch_norm1.num_batches_tracked", "upsample.resnet.layers.9.batch_norm2.num_batches_tracked".
I used the network installer for CUDA before trying this and have pyenv local set to 3.7. Does anyone know why I'm getting this error?
Thanks!
OSX Version: 10.12.6 (16G2016)
Here's the stack trace:
Real-Time-Voice-Cloning git:(master) ✗ pip install -r requirements.txt --verbose
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-ephem-wheel-cache-3rpdn6k2
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4
Created requirements tracker '/private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4'
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-install-il8b8a11
Collecting tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
1 location(s) to search for versions of tensorflow-gpu:
* https://pypi.org/simple/tensorflow-gpu/
Getting page https://pypi.org/simple/tensorflow-gpu/
Looking up "https://pypi.org/simple/tensorflow-gpu/" in the cache
Request header has "max_age" as 0, cache bypassed
Starting new HTTPS connection (1): pypi.org:443
https://pypi.org:443 "GET /simple/tensorflow-gpu/ HTTP/1.1" 304 0
Analyzing links from page https://pypi.org/simple/tensorflow-gpu/
Skipping link https://files.pythonhosted.org/packages/e0/fe/9fb7fff32441dff89e00b359dade48f5f071127d604a7259fb6ee1f43e4f/tensorflow_gpu-0.12.0rc0-cp27-cp27mu-manylinux1_x86_64.whl#sha256=c97f916fc7edf0867149d8faa31cbee75bf4925bdfc0e0eef924cda1dd5c853b (from https://pypi.org/simple/tensorflow-gpu/); it is not compatible with this Python
....
Skipping link https://files.pythonhosted.org/packages/ad/be/9a5ab6b9757113b841695a203883aa1a7a3ac514258038c885aa31d443be/tensorflow_gpu-2.0.0b1-cp37-cp37m-win_amd64.whl#sha256=0c8ced99e74f10c66604fa61e40cb8cfbc073e89eb95c061626d3535958142b5 (from https://pypi.org/simple/tensorflow-gpu/); it is not compatible with this Python
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1)) (from versions: none)
Cleaning up...
Removed build tracker '/private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4'
ERROR: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
Exception information:
Traceback (most recent call last):
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 178, in main
status = self.run(options, args)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 352, in run
resolver.resolve(requirement_set)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 131, in resolve
self._resolve_one(requirement_set, req)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 294, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 242, in _get_abstract_dist_for
self.require_hashes
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 282, in prepare_linked_requirement
req.populate_link(finder, upgrade_allowed, require_hashes)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 198, in populate_link
self.link = finder.find_requirement(self, upgrade)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/index.py", line 792, in find_requirement
'No matching distribution found for %s' % req
pip._internal.exceptions.DistributionNotFound: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
Generated audio has noise artifacts between words, as in the attachment.
demo.zip
What could be the reason?
Hi, this is cool.
When will CPU-only mode be supported?
Thank you.
Just a question, but will there ever be a way to clone a voice where, instead of typing the text out, you can just have real-time (or almost real-time) voice conversion? As in, I could say something into the mic after "training" it to a different voice, and it would output audio in near real time. I don't know how else to explain it; I hope you know what I mean. XP
Hi:
I am trying to run your code on a CentOS server with X11 forwarding enabled. But when I try python demo_toolbox.py dataset, it prints:
Arguments:
datasets_root: dataset
enc_models_dir: encoder/saved_models
syn_models_dir: synthesizer/saved_models
voc_models_dir: vocoder/saved_models
Aborted
I believe I installed all required packages. It looks like the error is not caused by Python but by some low-level call, so is there any way to print more error messages? Or is there any way to run without the GUI? (I have X11 forwarding open on this server, but it still might not work as well as a machine with a real display.)
Thanks!
At present, training is very slow. Could I read all the data into memory up front and then fetch it from there directly? How should I modify the code to do that?
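A hedged sketch of the simplest in-memory approach: cache each loaded utterance so the disk is touched once per file (this assumes the per-utterance .npy files fit in RAM):

    # Hedged caching sketch for the data-loading path.
    from functools import lru_cache
    import numpy as np

    @lru_cache(maxsize=None)
    def load_frames(npy_path: str) -> np.ndarray:
        return np.load(npy_path)

Wiring this into the dataset classes is left as an exercise, and whether I/O is actually the bottleneck is worth profiling first.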
Pretrained models
Download the latest here.
I need " https://pan.baidu.com/ "
I can't get the pretrained model.
Please upload the pan.baidu.com version.
When synthesizing the text to speech on the audio of your choice (where you recommend using three audio files of the same speaker), do you average the speaker embeddings (from the three audio files) and input that into the trained model? If you don't average the speaker embeddings, what do you do? If it's too much work to explain, you can point me to which line of code deals with this. I can figure the rest out.
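For reference, a hedged sketch of the averaging variant of this idea, using the encoder module the repo exposes (file paths are hypothetical); since utterance embeddings are unit-norm, the mean is renormalized:

    import numpy as np
    from encoder import inference as encoder

    fpaths = ["speaker/utt1.wav", "speaker/utt2.wav", "speaker/utt3.wav"]  # hypothetical files
    embeds = [encoder.embed_utterance(encoder.preprocess_wav(p)) for p in fpaths]
    embed = np.mean(embeds, axis=0)
    embed /= np.linalg.norm(embed)  # keep the averaged embedding unit-norm

Whether this is what the toolbox itself does is exactly the question above.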
Here are the problems I got with Python 3.5:
1.
File "/data/Real-Time-Voice-Cloning/vocoder/models/fatchord_version.py", line 247
msg = f'| {pbar} {i*b_size}/{seq_len*b_size} | Batch Size: {b_size} | Gen Rate: {gen_rate:.1f}kHz | '
File "/data/Real-Time-Voice-Cloning/vocoder/models/fatchord_version.py", line 411
parameters = sum([np.prod(p.size()) for p in parameters]) / 1_000_000
"/data/Real-Time-Voice-Cloning/vocoder/display.py", line 50
temp_head = f'| {headings[i]} '
And it seems there are lots more.
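These are all Python 3.6+ syntax errors: f-strings and underscore-separated numeric literals like 1_000_000 do not parse on 3.5. Upgrading Python is the intended fix; a hedged 3.5-compatible rewrite of the failing lines would look like:

    # str.format() instead of f-strings, plain literal instead of 1_000_000
    msg = '| {} {}/{} | Batch Size: {} | Gen Rate: {:.1f}kHz | '.format(
        pbar, i * b_size, seq_len * b_size, b_size, gen_rate)
    parameters = sum([np.prod(p.size()) for p in parameters]) / 1000000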
Why is the loss calculated faster on the CPU, and what parameters does that depend on? Thanks.
Hello, very nice repo, especially the implementation of the d-vector speaker verification architecture.
Quick question about vocoder model training: does the WaveRNN vocoder take the d-vector speaker embedding, or is it just trained on all speakers in the training dataset without explicit conditioning? Curious about any vocoder experiments you've run in this regard.
I had to manually install them. Pip doesn't seem to want to install those two.
Hi,
If I wanted to integrate a new synthesizer (https://github.com/syang1993/gst-tacotron/), what would be the steps I would need to take?
I've tried to figure out why, but couldn't fix it:
The process completely hangs (forever) in this line:
Pressing Ctrl+C makes the UI responsive again, but loading obviously fails.
Running the same commands (import librosa and librosa.load("datasets/....", 16000)) in a shell works fine.
I thought it might be caused by something like this:
But setting this at the start of the process doesn't fix it:
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['JOBLIB_START_METHOD']='forkserver'
Attaching a debugger to that line works as expected, but as soon as you step into that line it's impossible to pause the process again. Running the same librosa.load command in the debug REPL also causes it to hang.
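A hedged guess at the mechanism: librosa's resampling fans out into BLAS threads, and a forked worker process can deadlock inside them. Two things worth trying: set OPENBLAS_NUM_THREADS=1 in the shell before launching Python (setting it inside the process is often too late, after numpy has loaded), or force the spawn start method before any pool is created:

    # Hedged sketch: avoid fork()-related deadlocks in worker processes.
    import multiprocessing

    if __name__ == "__main__":
        multiprocessing.set_start_method("spawn")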
I was able to get this repo working pretty well and even tested my own voice with a few utterances, using the train-clean-100 dataset. However, I would like to get better results and want to try the other datasets. The LibriSpeech datasets all extracted into the LibriSpeech folder as you describe; however, when I downloaded the VoxCeleb datasets, the zip files contain folders plus files named ***_dev_wav_partaa etc.
My question is: for VoxCeleb1, do I extract and dump everything under the VoxCeleb1 folder? Where do I put the txt files?
You have done a spectacular job with this and you deserve a medal. I am just a bit confused about how to extract these VoxCeleb datasets.
One more question: once I have all the datasets installed, what is the step-by-step process for creating a model of my own voice? Do I record a bunch of utterances in the toolbox, insert a bunch of utterances from the datasets, and then click synthesize and vocode? What is the best method and approach? I am sorry for asking so many questions that may seem obvious, but I am just now learning how to use this type of stuff and am eager to know more. This is more of a hobby for me.
No torch version in requirements.txt
Hi,
I downloaded the LibriSpeech/train-clean-100 dataset and I have it in the project root folder.
But when I try to execute demo_toolbox.py, it tells me that 'you do not have any of the recognized datasets in LibriSpeech'.
Maybe I'm passing the path wrong?
python demo_toolbox.py -d LibriSpeech
Do I need to do something different?
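A hedged reading of the error: -d expects the folder that contains LibriSpeech, not the LibriSpeech folder itself, so the layout would be:

    <datasets_root>/
        LibriSpeech/
            train-clean-100/
                ...

    python demo_toolbox.py -d <datasets_root>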
The error message is as follows; I hope to get your reply. Thanks.
No model "my_run" found, starting training from scratch.
Updating the visualizations every 10 steps.
......Traceback (most recent call last):
  File "encoder_train.py", line 47, in <module>
    train(**vars(args))
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/train.py", line 71, in train
    for step, speaker_batch in enumerate(loader, init_step):
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 568, in __next__
    return self._process_next_batch(batch)
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
Exception: Traceback (most recent call last):
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_verification_dataset.py", line 56, in collate
    return SpeakerBatch(speakers, self.utterances_per_speaker, partials_n_frames)
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_batch.py", line 8, in __init__
    self.partials = {s: s.random_partial(utterances_per_speaker, n_frames) for s in speakers}
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_batch.py", line 8, in <dictcomp>
    self.partials = {s: s.random_partial(utterances_per_speaker, n_frames) for s in speakers}
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker.py", line 35, in random_partial
    self._load_utterances()
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker.py", line 19, in _load_utterances
    self.utterance_cycler = RandomCycler(self.utterances)
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/random_cycler.py", line 15, in __init__
    raise Exception("Can't create RandomCycler from an empty collection")
Exception: Can't create RandomCycler from an empty collection
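A hedged interpretation: RandomCycler receives an empty utterance list, which usually means preprocessing left some speaker folder without any utterances. A quick scan like this can confirm it (the SV2TTS/encoder output layout is an assumption):

    # Hedged sketch: find processed speaker folders with no utterance files.
    from pathlib import Path

    clean_data_root = Path("<datasets_root>/SV2TTS/encoder")  # assumed output layout
    for speaker_dir in sorted(clean_data_root.iterdir()):
        if speaker_dir.is_dir() and not any(speaker_dir.glob("*.npy")):
            print("Empty speaker folder:", speaker_dir)

Deleting (or re-preprocessing) the empty folders should let training proceed.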
Does this repo use the vanilla version of WaveRNN (https://github.com/fatchord/WaveRNN), or was the architecture modified and the model retrained?
It's quite fast relative to my benchmark of tacotron2 + wavernn from https://github.com/erogol/WaveRNN, but (maybe) has a less natural voice.
BTW, I haven't tried the new universal vocoder version.
The encoder and synthesizer network training has a sequential relationship; the vocoder network is not dependent on these two networks, so can it be trained at any time?
I tried to run python demo_toolbox.py <dataset_root> and it gives an error (see below). I also tried running C:\Python37\Scripts>pip install unicode in cmd, but it doesn't help. Can anybody help me?
C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master> python demo_toolbox.py "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\datasets_root"
Traceback (most recent call last):
  File "demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\toolbox\__init__.py", line 3, in <module>
    from synthesizer import inference as synthesizer
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\inference.py", line 2, in <module>
    from synthesizer.synthesizer import Synthesizer
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\synthesizer.py", line 1, in <module>
    from synthesizer.utils.text import text_to_sequence
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\utils\text.py", line 2, in <module>
    from . import cleaners
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\utils\cleaners.py", line 14, in <module>
    from unidecode import unidecode
ModuleNotFoundError: No module named 'unidecode'
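The missing module here is unidecode, and the matching PyPI package is Unidecode; pip install unicode (tried above) installs an unrelated package. A one-line fix:

    pip install unidecode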
Hi guys, I tried to install this tool, following these steps:
but I got the following error (see below). Could you help me with some suggestions? I don't know Python well, but I will be very grateful if your suggestions help me.
Thank you in advance.
C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\Real-Time-Voice-Cloning-master>pip install -r requirements.txt
Requirement already satisfied: tensorflow-gpu<=1.14.0,>=1.10.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 1)) (1.13.1)
Requirement already satisfied: umap-learn in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 2)) (0.3.9)
Requirement already satisfied: visdom in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 3)) (0.1.8.8)
Collecting webrtcvad (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/89/34/e2de2d97f3288512b9ea56f92e7452f8207eb5a0096500badf9dfd48f5e6/webrtcvad-2.0.10.tar.gz
Collecting librosa>=0.5.1 (from -r requirements.txt (line 5))
Collecting matplotlib>=2.0.2 (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/3b/52/17dbb82ca36937dd4d0027fe1945c3c78bdb465b4736903d0904b7f595ad/matplotlib-3.1.0-cp37-cp37m-win_amd64.whl
Requirement already satisfied: numpy>=1.14.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 7)) (1.16.4)
Requirement already satisfied: scipy>=1.0.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 8)) (1.3.0)
Collecting tqdm (from -r requirements.txt (line 9))
  Using cached https://files.pythonhosted.org/packages/45/af/685bf3ce889ea191f3b916557f5677cc95a5e87b2fa120d74b5dd6d049d0/tqdm-4.32.1-py2.py3-none-any.whl
Collecting sounddevice (from -r requirements.txt (line 10))
  Using cached https://files.pythonhosted.org/packages/7f/15/fd6d923adccc64d2d93fcffc245bb2471a2509bb2905a89c4fc772ce4e35/sounddevice-0.3.13-py2.py3.cp26.cp27.cp32.cp33.cp34.cp35.cp36.cp37.cp38.pp27.pp32.pp33.pp34.pp35.pp36-none-win_amd64.whl
Requirement already satisfied: protobuf>=3.6.1 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.8.0)
Requirement already satisfied: wheel>=0.26 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.33.4)
Requirement already satisfied: tensorboard<1.14.0,>=1.13.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.13.1)
Requirement already satisfied: six>=1.10.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.12.0)
Requirement already satisfied: astor>=0.6.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.8.0)
Requirement already satisfied: termcolor>=1.1.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.1.0)
Requirement already satisfied: keras-preprocessing>=1.0.5 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.21.1)
Requirement already satisfied: keras-applications>=1.0.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.0.8)
Requirement already satisfied: tensorflow-estimator<1.14.0rc0,>=1.13.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.13.0)
Requirement already satisfied: absl-py>=0.1.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.7.1)
Requirement already satisfied: gast>=0.2.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.2.2)
Requirement already satisfied: scikit-learn>=0.16 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from umap-learn->-r requirements.txt (line 2)) (0.21.2)
Requirement already satisfied: numba>=0.37 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from umap-learn->-r requirements.txt (line 2)) (0.44.0)
Requirement already satisfied: tornado in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (6.0.2)
Requirement already satisfied: requests in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (2.22.0)
Requirement already satisfied: websocket-client in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (0.56.0)
Requirement already satisfied: torchfile in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (0.1.0)
Requirement already satisfied: pillow in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (6.0.0)
Requirement already satisfied: pyzmq in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (18.0.1)
Collecting decorator>=3.0.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/5f/88/0075e461560a1e750a0dcbf77f1d9de775028c37a19a346a6c565a257399/decorator-4.4.0-py2.py3-none-any.whl
Collecting resampy>=0.2.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
Requirement already satisfied: joblib>=0.12 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from librosa>=0.5.1->-r requirements.txt (line 5)) (0.13.2)
Collecting audioread>=2.0.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
Collecting python-dateutil>=2.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl
Collecting cycler>=0.10 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/dd/d9/3ec19e966301a6e25769976999bd7bbe552016f0d32b577dc9d63d2e0c49/pyparsing-2.4.0-py2.py3-none-any.whl
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/c6/ea/e5474014a13ab2dcb5056608e0716c600c3d8a8bcffb10ed55ccd6a42eb0/kiwisolver-1.1.0-cp37-none-win_amd64.whl
Collecting CFFI>=1.0 (from sounddevice->-r requirements.txt (line 10))
  Using cached https://files.pythonhosted.org/packages/2f/ad/9722b7752fdd88c858be57b47f41d1049b5fb0ab79caf0ab11407945c1a7/cffi-1.12.3-cp37-cp37m-win_amd64.whl
Requirement already satisfied: setuptools in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from protobuf>=3.6.1->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (40.8.0)
Requirement already satisfied: werkzeug>=0.11.15 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.15.4)
Requirement already satisfied: markdown>=2.6.8 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.1.1)
Requirement already satisfied: h5py in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from keras-applications>=1.0.6->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (2.9.0)
Requirement already satisfied: mock>=2.0.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.0.5)
Requirement already satisfied: llvmlite>=0.29.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from numba>=0.37->umap-learn->-r requirements.txt (line 2)) (0.29.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (1.25.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (2019.3.9)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (3.0.4)
Collecting pycparser (from CFFI>=1.0->sounddevice->-r requirements.txt (line 10))
Building wheels for collected packages: webrtcvad
  Building wheel for webrtcvad (setup.py) ... error
  ERROR: Complete output from command 'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\admin\AppData\Local\Temp\pip-wheel-7ewjy9xx' --python-tag cp37:
  ERROR: running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.7
  copying webrtcvad.py -> build\lib.win-amd64-3.7
  running build_ext
  building '_webrtcvad' extension
  error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
  ERROR: Failed building wheel for webrtcvad
  Running setup.py clean for webrtcvad
Failed to build webrtcvad
Installing collected packages: webrtcvad, decorator, resampy, audioread, librosa, python-dateutil, cycler, pyparsing, kiwisolver, matplotlib, tqdm, pycparser, CFFI, sounddevice
  Running setup.py install for webrtcvad ... error
    ERROR: Complete output from command 'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\admin\AppData\Local\Temp\pip-record-t598pe1f\install-record.txt' --single-version-externally-managed --compile:
    ERROR: running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    copying webrtcvad.py -> build\lib.win-amd64-3.7
    running build_ext
    building '_webrtcvad' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
    ----------------------------------------
ERROR: Command "'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\admin\AppData\Local\Temp\pip-record-t598pe1f\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\
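The decisive line in this log is "Microsoft Visual C++ 14.0 is required": webrtcvad ships as a source distribution on Windows and needs a C compiler to build. A hedged fix: install the Build Tools from the URL pip prints (https://visualstudio.microsoft.com/downloads/), then rerun the install:

    pip install webrtcvad
    pip install -r requirements.txt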
Chinese (Mandarin): #811
German: #571*
Swedish: #257*
* Requires Tensorflow 1.x (harder to set up).
Arabic: #871
Czech: #655
English: #388 (UK accent), #429 (Indian accent)
French: #854
Hindi: #525
Italian: #697
Polish: #815
Portuguese: #531
Russian: #707
Spanish: #789
Turkish: #761
Ukrainian: #492
Thanks for your great job. I have tried this project on my own computer (Win10, 1060ti 3GB), and I think the similarity of the voice is good. Do you have any ideas on how to improve the quality and similarity of the voice? Is parallel WaveNet a good way? Thank you.
Traceback (most recent call last):
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\toolbox\__init__.py", line 70, in <lambda>
    func = lambda: self.load_from_browser(self.ui.browse_file())
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\toolbox\__init__.py", line 110, in load_from_browser
    wav = Synthesizer.load_preprocess_wav(fpath)
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\synthesizer\inference.py", line 111, in load_preprocess_wav
    wav = librosa.load(fpath, hparams.sample_rate)[0]
  File "C:\ProgramData\Anaconda3\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\ProgramData\Anaconda3\lib\site-packages\audioread\__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError
I have installed ffmpeg, but still get this error.
I'm getting some kind of DLL load failure, which is probably unrelated to your project, but the CLI check passed and this happens when I run the toolbox.
PS C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning> python .\demo_toolbox.py
C:\Python36\lib\site-packages\numpy\core\__init__.py:29: UserWarning: loaded more than 1 DLL from .libs:
C:\Python36\lib\site-packages\numpy\.libs\libopenblas.BNVRK7633HSX7YVO2TADGR4A5KEKXJAW.gfortran-win_amd64.dll
C:\Python36\lib\site-packages\numpy\.libs\libopenblas.IPBC74C7KURV7CB2PKT5Z5FNR3SIBV4J.gfortran-win_amd64.dll
stacklevel=1)
Traceback (most recent call last):
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Python36\lib\imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "C:\Python36\lib\imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified procedure could not be found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\demo_toolbox.py", line 2, in <module>
from toolbox import Toolbox
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\toolbox\__init__.py", line 3, in <module>
from synthesizer import inference as synthesizer
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\synthesizer\inference.py", line 1, in <module>
from synthesizer.hparams import hparams
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\synthesizer\hparams.py", line 1, in <module>
from tensorflow.contrib.training import HParams
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\__init__.py", line 28, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Python36\lib\imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "C:\Python36\lib\imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified procedure could not be found.
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Sorry, this should really be just a pull request but I just don't have the time to make a proper implementation for you.
I've been dragging myself through the papers and your thesis. (I'm just a self taught hobbyist trying to squeeze by). Thanks so much for sharing your implementations and models.
I just needed this quickly and saw it was on your todo list :)
Took a bit of googling to kinda find out a way. So I thought I'd leave this here for you and others. (The errors it throws if you don't do this are a bit obtuse and misleading)
Add these and it'll save... Though there's something slightly off: every now and then there's a clicking, and in general the audio sounds a tad different than the playback.
Real-Time-Voice-Cloning/toolbox/ui.py
Line 143 in 2ef15c6
import soundfile as sf  # pip3 install soundfile

def float2pcm(self, sig, dtype='int32'):
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)

def save(self, wav, sample_rate):
    data = self.float2pcm(wav)
    sf.write('temp.wav', data, sample_rate)
Then make sure to call the save after generation
Real-Time-Voice-Cloning/toolbox/__init__.py
Line 213 in 2ef15c6
self.ui.save(wav, Synthesizer.sample_rate)
https://stackoverflow.com/a/45032967
Thanks again for your work and time
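A hedged side note on the clicking: soundfile also accepts float arrays in [-1, 1] directly and performs the PCM conversion itself when writing a wav, so the manual float2pcm step could be skipped:

    import soundfile as sf
    sf.write('temp.wav', wav, sample_rate)  # float input; soundfile converts to PCM

If the clicking persists either way, it is probably in the generated waveform itself rather than in the file writing.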
I have several audio files from the same person. In the GUI, I can load them using the browser and then synthesize and vocode some text. How can I do the same thing without the GUI?
Thanks
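A minimal non-GUI sketch, mirroring what demo_cli.py and the code snippet later on this page do (the model paths are assumed to match the pretrained downloads, the input file is hypothetical):

    import numpy as np
    from encoder import inference as encoder
    from synthesizer.inference import Synthesizer
    from vocoder import inference as vocoder

    # Load the three pretrained models (paths are the usual defaults).
    encoder.load_model("encoder/saved_models/pretrained.pt")
    synthesizer = Synthesizer("synthesizer/saved_models/logs-pretrained/taco_pretrained")
    vocoder.load_model("vocoder/saved_models/pretrained/pretrained.pt")

    # Embed the reference speaker, synthesize a spectrogram, then vocode it.
    wav = Synthesizer.load_preprocess_wav("my_speaker.wav")  # hypothetical file
    embed = encoder.embed_utterance(encoder.preprocess_wav(wav))
    specs = synthesizer.synthesize_spectrograms(["Hello world"], [embed])
    generated_wav = vocoder.infer_waveform(specs[0])

To use several files from the same person, you could embed each file and average, as in the embedding-averaging sketch earlier on this page.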
any plans to set this up in google colab as it provides free Nvidia t4 gpus for 12 hour cycles?
What should be the length of the input audio? What are the other requirements (can there be silence, etc.)?
Real-Time-Voice-Cloning/demo_cli.py
Line 133 in 2e8ef14
Is it feasible to use several audio files and averaged embedding for better quality?
In your thesis paper, you mentioned you couldn't produce meaningful alignments on the LibriTTS dataset even though it is the cleaner version of the LibriSpeech. Could you please explain what might be the reasons for the model not learning the alignments? Also, what are the preprocessing steps you did during training synthesizer on LibriTTS? Did you use Montreal Forced Aligner on the LibriTTS too?
The file open dialog does not show any files for me.
Replacing filter="*.mp3;*.flac;*.wav;*.m4a"
with filter="Audio Files (*.mp3 *.flac *.wav *.m4a)"
fixes it.
See doc: https://doc.qt.io/qt-5/qfiledialog.html#getOpenFileName
I load a new audio file and use the pretrained models. Here is the code:
from synthesizer.inference import Synthesizer
from encoder import inference as encoder
from vocoder import inference as vocoder
import numpy as np
import torch
import sys
enc_model_fpath = 'encoder/saved_models/pretrained.pt'
syn_model_dir = "synthesizer/saved_models/logs-pretrained/"
voc_model_fpath = "vocoder/saved_models/pretrained/pretrained.pt"
low_mem = False
encoder.load_model(enc_model_fpath)
synthesizer = Synthesizer(syn_model_dir + "/taco_pretrained", low_mem=low_mem, verbose = False)
vocoder.load_model(voc_model_fpath)
fpath = 'UserAudio/sample.wav'
wav_speaker = Synthesizer.load_preprocess_wav(fpath)
spec = Synthesizer.make_spectrogram(wav_speaker)
encoder_wav = encoder.load_preprocess_wav(wav_speaker)
embed, partial_embeds, _ = encoder.embed_utterance(encoder_wav, return_partials=True)
utterances = 'You will need the following whether you plan to use the toolbox only or to retrain the models. Thank you very much. I am proud of you.'
texts = utterances.split("\n")
embeds = np.stack([embed] * len(texts))
specs = synthesizer.synthesize_spectrograms(texts, embeds)
Here is the error after running the last line of code.
Constructing model: Tacotron
W0702 10:20:42.369393 139984982034304 ag_logging.py:145] Entity <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>>: ValueError: Failed to parse source code of <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>>, which Python reported as:
def __call__(self, inputs, state, scope=None):
"""Runs vanilla LSTM Cell and applies zoneout.
"""
# Apply vanilla LSTM
output, new_state = self._cell(inputs, state, scope)
if self.state_is_tuple:
(prev_c, prev_h) = state
(new_c, new_h) = new_state
else:
num_proj = self._cell._num_units if self._cell._num_proj is None else \
self._cell._num_proj
prev_c = tf.slice(state, [0, 0], [-1, self._cell._num_units])
prev_h = tf.slice(state, [0, self._cell._num_units], [-1, num_proj])
new_c = tf.slice(new_state, [0, 0], [-1, self._cell._num_units])
new_h = tf.slice(new_state, [0, self._cell._num_units], [-1, num_proj])
# Apply zoneout
if self.is_training:
# nn.dropout takes keep_prob (probability to keep activations) not drop_prob (
# probability to mask activations)!
c = (1 - self._zoneout_cell) * tf.nn.dropout(new_c - prev_c,
(1 - self._zoneout_cell)) + prev_c
h = (1 - self._zoneout_outputs) * tf.nn.dropout(new_h - prev_h,
(1 - self._zoneout_outputs)) + prev_h
else:
c = (1 - self._zoneout_cell) * new_c + self._zoneout_cell * prev_c
h = (1 - self._zoneout_outputs) * new_h + self._zoneout_outputs * prev_h
new_state = tf.nn.rnn_cell.LSTMStateTuple(c, h) if self.state_is_tuple else tf.concat(1, [c,
h])
return output, new_state
This may be caused by multiline strings or comments not indented at the same level as the code.
[The same AutoGraph warning and ZoneoutLSTMCell source listing then repeats three more times: once for the second ZoneoutLSTMCell instance, and twice more as plain WARNING lines.]
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
GTA mode: False
Synthesis mode: True
Input: (?, ?)
device: 0
embedding: (?, ?, 512)
enc conv out: (?, ?, 512)
encoder out (cond): (?, ?, 768)
decoder out: (?, ?, 80)
residual out: (?, ?, 512)
projected residual out: (?, ?, 80)
mel out: (?, ?, 80)
<stop_token> out: (?, ?)
Tacotron Parameters 28.439 Million.
Loading checkpoint: synthesizer/saved_models/logs-pretrained//taco_pretrained/tacotron_model.ckpt-278000
Where are run_live() and linear_dir defined? Are these missing imports?
flake8 testing of https://github.com/CorentinJ/Real-Time-Voice-Cloning on Python 3.7.1
$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
./synthesizer/synthesize.py:110:9: F821 undefined name 'run_live'
run_live(args, checkpoint_path, hparams)
^
./synthesizer/train.py:347:46: F821 undefined name 'linear_dir'
np.save(os.path.join(linear_dir, linear_filename), linear_prediction.T,
^
1     F821 undefined name 'linear_dir'
1     F821 undefined name 'run_live'
2
E901,E999,F821,F822,F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations" -- useful for readability, but they do not affect runtime safety.