CorentinJ / Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
License: Other
Excuse my inexperience, but I tried to use your software since it looked very appealing to me.
I use an AMD GPU, so I followed the steps for installing torch without it. However, I keep getting this error message.
It seems my only failure is that it does not play back.
Is it even possible to use it without CUDA? What can I do to circumvent this error?
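For reference, a minimal sketch of the CPU-only direction, assuming the failure happens when a CUDA-trained checkpoint is loaded without a CUDA-enabled torch build (the path is illustrative):

    # Hedged sketch: load checkpoints onto the CPU when CUDA is unavailable.
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    checkpoint = torch.load("encoder/saved_models/pretrained.pt", map_location=device)

Whether the rest of the toolbox runs CPU-only is a separate question, but map_location at least removes the CUDA deserialization error.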
The link to the YouTube video in the README won't work.
When you click the picture in the README file, YouTube opens but says "Video is not available".
Hi,
I am new to PyTorch and audio-related manipulation in Python. I have installed and configured everything along with the sample train-clean-100 dataset. I am now able to record 5-7 samples of my own voice, but for some reason the synthesis is not happening in my own voice; it seems to come out in a default male voice. Can you please guide me in the right direction?
Thanks for the help.
I have met some problems with visdom. What visdom version do you use?
Hello, thank you for your work. I have read your thesis briefly and liked the style of writing; the code quality is also good, so it's easy for researchers and hobbyists to play with it. I have experimented with a few voices and observed the following behaviors:
Sometimes there is a gap, an empty stretch in the spectrogram and final audio, like introduced pauses. What could be the reason for it? Is it because of the vocoder quality?
The audio volume changes with the text: for some texts the final audio has noticeably lower volume, for others significantly higher. How can I fix this? Is this also related to the vocoder (WaveNet)?
Also, since you mention in your thesis that each of the 3 systems can be trained independently, with independent datasets: if I want to finetune on a specific dataset, or even a single person who is a native English speaker, should I only retrain the vocoder, or does Tacotron also need to be retrained (or finetuned)?
Hi,
I have come across the following error when using the toolbox in low memory mode.
On this computer, my GPU only has 2GB so I need to use this mode.
I have tested this on another computer that has a GPU with 4GB RAM. The toolbox works perfectly in normal mode but when I turn on low_mem, I run into the same error.
I'm not sure what other information you would need to look into this so please let me know what else I can provide to help out.
Hello, and thank you for the great work! One of the limitations that I have noticed is that the synthesizer starts to have long gaps in speech if the input text length is short. @CorentinJ do you have any ideas why this is or how I could fix it? I'll also probably ask on Rayhane's repo if I can reproduce the issue on his synthesizer.
Am I correct in assuming that the issue is caused by the stop prediction in Taco2 not having a high enough activation, which results in long spectrograms?
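One workaround sketch, assuming the gaps are plain silence rather than noise: cut long silent stretches out of the generated waveform with librosa's silence splitting (the top_db threshold is a guess to tune):

    # Hedged post-processing sketch: remove long silent gaps from a waveform.
    import numpy as np
    import librosa

    def remove_long_gaps(wav, top_db=30):
        intervals = librosa.effects.split(wav, top_db=top_db)  # non-silent [start, end) pairs
        return np.concatenate([wav[start:end] for start, end in intervals])

This only masks the symptom; if the stop-token hypothesis above is right, the real fix is in the synthesizer.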
Every time I try a filepath in demo_cli.py I get the same error:
Caught exception: TypeError("argument of type 'WindowsPath' is not iterable")
I don't know what I'm doing wrong.
How should the filepath be formatted?
PS -> demo_toolbox.py is 100% working.
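A hedged workaround sketch: the TypeError suggests an older librosa/audioread combination that expects a str rather than a pathlib.Path, so converting the path before the call may be enough (the file name is hypothetical):

    from pathlib import Path
    import librosa

    fpath = Path("samples/my_voice.wav")  # hypothetical input file
    wav, sr = librosa.load(str(fpath), sr=16000)  # str() avoids the WindowsPath TypeError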
Traceback (most recent call last):
File "C:\Users\The Atomizer\Desktop\text\voice\voice\toolbox\__init__.py", line 52, in <lambda>
self.ui.browser_load_button.clicked.connect(lambda: self.load_from_browser())
File "C:\Users\The Atomizer\Desktop\text\voice\voice\toolbox\__init__.py", line 110, in load_from_browser
wav = Synthesizer.load_preprocess_wav(fpath)
File "C:\Users\The Atomizer\Desktop\text\voice\voice\synthesizer\inference.py", line 111, in load_preprocess_wav
wav = librosa.load(fpath, hparams.sample_rate)[0]
File "C:\Users\The Atomizer\Miniconda3\envs\voice\lib\site-packages\librosa\core\audio.py", line 119, in load
with audioread.audio_open(os.path.realpath(path)) as input_file:
File "C:\Users\The Atomizer\Miniconda3\envs\voice\lib\site-packages\audioread\__init__.py", line 116, in audio_open
raise NoBackendError()
audioread.NoBackendError
When clicking Load in the GUI
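NoBackendError means audioread found no decoder for the file. A hedged sketch of two common fixes: install ffmpeg and make sure it is on PATH (needed for mp3/m4a), or, for plain wav/flac files, read the audio with soundfile instead:

    # Hedged fallback sketch: soundfile decodes wav/flac without audioread.
    import soundfile as sf

    wav, sr = sf.read("samples/my_voice.wav")  # hypothetical input file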
Hi, under Win 10 64-bit with Python 3.6, it fails to import print_args because it can't find argutils.
I think I have a relative import error but can't solve it.
By the way, nice job on what I heard in the YouTube demo.
If I manually try to import the utils from the root dir, it seems to load another utils file.
I have tried to generate the pretrained data, running the following commands (following your instructions):
python encoder_preprocess.py <datasets_root>
python encoder_train.py my_run <datasets_root>
I got the following errors. I also tried to use some audio from the dataset, but I got an exception (see picture); I understand I don't have nVidia drivers here.
C:\Users\admin\Downloads\Real-Time-Voice-Cloning>python encoder_preprocess.py "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root"
Traceback (most recent call last):
  File "encoder_preprocess.py", line 1, in <module>
    from encoder.preprocess import preprocess_librispeech, preprocess_voxceleb1, preprocess_voxceleb2
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\preprocess.py", line 1, in <module>
    from multiprocess.pool import ThreadPool
ModuleNotFoundError: No module named 'multiprocess'
C:\Users\admin\Downloads\Real-Time-Voice-Cloning>python encoder_train.py my_run "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root"
Arguments:
    run_id:          my_run
    clean_data_root: C:\Users\admin\Downloads\Real-Time-Voice-Cloning\dataset_root
    models_dir:      encoder\saved_models
    vis_every:       10
    umap_every:      100
    save_every:      500
    backup_every:    7500
    force_restart:   False
    visdom_server:   http://localhost
    no_visdom:       False
No model "my_run" found, starting training from scratch.
Updating the visualizations every 10 steps.
WARNING:root:Setting up a new session...
Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\connection.py", line 80, in create_connection
    raise err
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\connection.py", line 70, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 603, in urlopen
    chunked=chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 355, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1275, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1224, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 1016, in _send_output
    self.send(msg)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\http\client.py", line 956, in send
    self.connect()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 183, in connect
    conn = self._new_conn()
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\connectionpool.py", line 641, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\urllib3\util\retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/my_run%20(21-06%2001h12) (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 548, in _send
    data=json.dumps(msg),
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/my_run%20(21-06%2001h12) (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000000000A7232B0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\visualizations.py", line 51, in __init__
    self.vis = visdom.Visdom(server, env=self.env_name, raise_exceptions=True)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 406, in __init__
    }, endpoint='env/' + env)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37\lib\site-packages\visdom\__init__.py", line 562, in _send
    raise ConnectionError("Error connecting to Visdom server")
ConnectionError: Error connecting to Visdom server

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "encoder_train.py", line 46, in <module>
    train(**vars(args))
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\train.py", line 60, in train
    vis = Visualizations(run_id, vis_every, server=visdom_server, disabled=no_visdom)
  File "C:\Users\admin\Downloads\Real-Time-Voice-Cloning\encoder\visualizations.py", line 53, in __init__
    raise Exception("No visdom server detected. Run the command \"visdom\" in your CLI to "
Exception: No visdom server detected. Run the command "visdom" in your CLI to start it.
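A hedged reading of this log: the first failure only needs the missing package installed, and the second is just the visdom server not running; the argument listing above also shows a no_visdom option for skipping it entirely.

    pip install multiprocess
    visdom
    python encoder_train.py my_run <datasets_root> --no_visdom

(Run `visdom` in a separate terminal before training, or pass the no-visdom flag instead; the exact flag spelling is an assumption based on the printed no_visdom argument.)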
How can I generate the pretrained data files for Romanian? How should I structure the content of the folders and files to generate pretrained data for the Romanian language?
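One hedged way to reuse the existing preprocessing for another language is to mirror the LibriSpeech layout the scripts already recognize under datasets_root (speaker and book IDs are arbitrary placeholders):

    <datasets_root>/
        LibriSpeech/
            train-clean-100/
                <speaker_id>/
                    <book_id>/
                        <utterance>.flac
                        <speaker_id>-<book_id>.trans.txt

The encoder only needs the audio; the transcript files matter once you train the synthesizer.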
I have successfully run the program with your checkpoint using Griffin-Lim. My dataset is LibriSpeech. But when I used the pretrained WaveRNN vocoder, I got an error.
Here is the output:
Building Wave-RNN
Trainable Parameters: 4.481M
Loading model weights at vocoder\saved_models\pretrained\pretrained.pt
Traceback (most recent call last):
File "D:\Real-Time-Voice-Cloning-master\toolbox\__init__.py", line 249, in init_vocoder
vocoder.load_model(model_fpath)
File "D:\Real-Time-Voice-Cloning-master\vocoder\inference.py", line 31, in load_model
_model.load_state_dict(checkpoint['model_state'])
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 721, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for WaveRNN:
Unexpected key(s) in state_dict: "upsample.resnet.batch_norm.num_batches_tracked", "upsample.resnet.layers.0.batch_norm1.num_batches_tracked", "upsample.resnet.layers.0.batch_norm2.num_batches_tracked", "upsample.resnet.layers.1.batch_norm1.num_batches_tracked", "upsample.resnet.layers.1.batch_norm2.num_batches_tracked", "upsample.resnet.layers.2.batch_norm1.num_batches_tracked", "upsample.resnet.layers.2.batch_norm2.num_batches_tracked", "upsample.resnet.layers.3.batch_norm1.num_batches_tracked", "upsample.resnet.layers.3.batch_norm2.num_batches_tracked", "upsample.resnet.layers.4.batch_norm1.num_batches_tracked", "upsample.resnet.layers.4.batch_norm2.num_batches_tracked", "upsample.resnet.layers.5.batch_norm1.num_batches_tracked", "upsample.resnet.layers.5.batch_norm2.num_batches_tracked", "upsample.resnet.layers.6.batch_norm1.num_batches_tracked", "upsample.resnet.layers.6.batch_norm2.num_batches_tracked", "upsample.resnet.layers.7.batch_norm1.num_batches_tracked", "upsample.resnet.layers.7.batch_norm2.num_batches_tracked", "upsample.resnet.layers.8.batch_norm1.num_batches_tracked", "upsample.resnet.layers.8.batch_norm2.num_batches_tracked", "upsample.resnet.layers.9.batch_norm1.num_batches_tracked", "upsample.resnet.layers.9.batch_norm2.num_batches_tracked".
I used the network installer for CUDA before trying this and have pyenv local set to 3.7. Does anyone know why I'm getting this error?
Thanks!
OSX Version: 10.12.6 (16G2016)
Here's the stack trace:
Real-Time-Voice-Cloning git:(master) ✗ pip install -r requirements.txt --verbose
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-ephem-wheel-cache-3rpdn6k2
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4
Created requirements tracker '/private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4'
Created temporary directory: /private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-install-il8b8a11
Collecting tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
1 location(s) to search for versions of tensorflow-gpu:
* https://pypi.org/simple/tensorflow-gpu/
Getting page https://pypi.org/simple/tensorflow-gpu/
Looking up "https://pypi.org/simple/tensorflow-gpu/" in the cache
Request header has "max_age" as 0, cache bypassed
Starting new HTTPS connection (1): pypi.org:443
https://pypi.org:443 "GET /simple/tensorflow-gpu/ HTTP/1.1" 304 0
Analyzing links from page https://pypi.org/simple/tensorflow-gpu/
Skipping link https://files.pythonhosted.org/packages/e0/fe/9fb7fff32441dff89e00b359dade48f5f071127d604a7259fb6ee1f43e4f/tensorflow_gpu-0.12.0rc0-cp27-cp27mu-manylinux1_x86_64.whl#sha256=c97f916fc7edf0867149d8faa31cbee75bf4925bdfc0e0eef924cda1dd5c853b (from https://pypi.org/simple/tensorflow-gpu/); it is not compatible with this Python
....
Skipping link https://files.pythonhosted.org/packages/ad/be/9a5ab6b9757113b841695a203883aa1a7a3ac514258038c885aa31d443be/tensorflow_gpu-2.0.0b1-cp37-cp37m-win_amd64.whl#sha256=0c8ced99e74f10c66604fa61e40cb8cfbc073e89eb95c061626d3535958142b5 (from https://pypi.org/simple/tensorflow-gpu/); it is not compatible with this Python
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1)) (from versions: none)
Cleaning up...
Removed build tracker '/private/var/folders/4p/9q3jmsg959s6gtjs4dc6sn14ks6wf6/T/pip-req-tracker-hwlp7vz4'
ERROR: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
Exception information:
Traceback (most recent call last):
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/cli/base_command.py", line 178, in main
status = self.run(options, args)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/commands/install.py", line 352, in run
resolver.resolve(requirement_set)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 131, in resolve
self._resolve_one(requirement_set, req)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 294, in _resolve_one
abstract_dist = self._get_abstract_dist_for(req_to_install)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/resolve.py", line 242, in _get_abstract_dist_for
self.require_hashes
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/operations/prepare.py", line 282, in prepare_linked_requirement
req.populate_link(finder, upgrade_allowed, require_hashes)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/req/req_install.py", line 198, in populate_link
self.link = finder.find_requirement(self, upgrade)
File "/Users/ketjohn/.pyenv/versions/3.7.3/lib/python3.7/site-packages/pip/_internal/index.py", line 792, in find_requirement
'No matching distribution found for %s' % req
pip._internal.exceptions.DistributionNotFound: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.10.0 (from -r requirements.txt (line 1))
Generated audio has noise artifacts between words, as in the attachment.
demo.zip
What could be the reason?
Hi, this is cool.
When will CPU-only mode be supported?
Thank you.
Just a question, but will there ever be a way to clone a voice where, instead of typing the text out, you can just have real-time (or almost real-time) voice conversion? As in, I could say something into the mic after "training" it to a different voice, and it would output audio in near real time. I don't know how else to explain it; I hope you know what I mean. XP
Hi:
I am trying to run your code on a CentOS server with X11 forwarding enabled. But when I try python demo_toolbox.py dataset, it prints:
Arguments:
datasets_root: dataset
enc_models_dir: encoder/saved_models
syn_models_dir: synthesizer/saved_models
voc_models_dir: vocoder/saved_models
Aborted
I believe I installed all required packages. It looks like the error is not caused by Python but by some low-level call, so is there any way to print more error messages? Or is there any way to run without the GUI? (I have X11 forwarding open on this server, but it still might not work as well as a machine with a real display.)
Thanks!
At present, training is very slow. Could I read all the data into memory up front and then fetch it from there directly? How should I modify the code to do that?
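A hedged sketch of the simplest in-memory approach: cache each loaded utterance so the disk is touched once per file (this assumes the per-utterance .npy files fit in RAM):

    # Hedged caching sketch for the data-loading path.
    from functools import lru_cache
    import numpy as np

    @lru_cache(maxsize=None)
    def load_frames(npy_path: str) -> np.ndarray:
        return np.load(npy_path)

Wiring this into the dataset classes is left as an exercise, and whether I/O is actually the bottleneck is worth profiling first.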
Pretrained models
Download the latest here.
I need " https://pan.baidu.com/ "
I can't get the pretrained model.
Please upload the pan.baidu.com version.
When synthesizing the text to speech on the audio of your choice (where you recommend using three audio files of the same speaker), do you average the speaker embeddings (from the three audio files) and input that into the trained model? If you don't average the speaker embeddings, what do you do? If it's too much work to explain, you can point me to which line of code deals with this. I can figure the rest out.
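For reference, a hedged sketch of the averaging variant of this idea, using the encoder module the repo exposes (file paths are hypothetical); since utterance embeddings are unit-norm, the mean is renormalized:

    import numpy as np
    from encoder import inference as encoder

    fpaths = ["speaker/utt1.wav", "speaker/utt2.wav", "speaker/utt3.wav"]  # hypothetical files
    embeds = [encoder.embed_utterance(encoder.preprocess_wav(p)) for p in fpaths]
    embed = np.mean(embeds, axis=0)
    embed /= np.linalg.norm(embed)  # keep the averaged embedding unit-norm

Whether this is what the toolbox itself does is exactly the question above.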
Here are the problems I got with Python 3.5:
1.
File "/data/Real-Time-Voice-Cloning/vocoder/models/fatchord_version.py", line 247
msg = f'| {pbar} {i*b_size}/{seq_len*b_size} | Batch Size: {b_size} | Gen Rate: {gen_rate:.1f}kHz | '
File "/data/Real-Time-Voice-Cloning/vocoder/models/fatchord_version.py", line 411
parameters = sum([np.prod(p.size()) for p in parameters]) / 1_000_000
"/data/Real-Time-Voice-Cloning/vocoder/display.py", line 50
temp_head = f'| {headings[i]} '
And it seems there are lots more.
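These are all Python 3.6+ syntax errors: f-strings and underscore-separated numeric literals like 1_000_000 do not parse on 3.5. Upgrading Python is the intended fix; a hedged 3.5-compatible rewrite of the failing lines would look like:

    # str.format() instead of f-strings, plain literal instead of 1_000_000
    msg = '| {} {}/{} | Batch Size: {} | Gen Rate: {:.1f}kHz | '.format(
        pbar, i * b_size, seq_len * b_size, b_size, gen_rate)
    parameters = sum([np.prod(p.size()) for p in parameters]) / 1000000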
Why is the loss calculated faster on the CPU, and what parameters does that depend on? Thanks.
Hello, very nice repo, especially the implementation of the d-vector speaker verification architecture.
Quick question about vocoder model training: does the WaveRNN vocoder take the d-vector speaker embedding, or is it just trained on all speakers in the training dataset without explicit conditioning? Curious about any vocoder experiments you've run in this regard.
I had to manually install them. Pip doesn't seem to want to install those two.
Hi,
If I wanted to integrate a new synthesizer (https://github.com/syang1993/gst-tacotron/), what would be the steps I would need to take?
I've tried to figure out why, but couldn't fix it:
The process completely hangs (forever) in this line:
Pressing Ctrl+C makes the UI responsive again, but loading obviously fails.
Running the same commands (import librosa and librosa.load("datasets/....", 16000)) in a shell works fine.
I thought it might be caused by something like this:
But setting this at the start of the process doesn't fix it:
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['JOBLIB_START_METHOD']='forkserver'
Attaching a debugger to that line works as expected, but as soon as you step into that line it's impossible to pause the process again. Running the same librosa.load command in the debug REPL also causes it to hang.
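A hedged guess at the mechanism: librosa's resampling fans out into BLAS threads, and a forked worker process can deadlock inside them. Two things worth trying: set OPENBLAS_NUM_THREADS=1 in the shell before launching Python (setting it inside the process is often too late, after numpy has loaded), or force the spawn start method before any pool is created:

    # Hedged sketch: avoid fork()-related deadlocks in worker processes.
    import multiprocessing

    if __name__ == "__main__":
        multiprocessing.set_start_method("spawn")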
I was able to get this repo working pretty well and even tested my own voice with a few utterances, using the train-clean-100 dataset. However, I would like to get better results and want to try the other datasets. The LibriSpeech datasets all extracted into the LibriSpeech folder as you describe; however, when I downloaded the VoxCeleb datasets, the zip files contain folders plus files named ***_dev_wav_partaa etc.
My question is: for VoxCeleb1, do I extract and dump everything under the VoxCeleb1 folder? Where do I put the txt files?
You have done a spectacular job with this and you deserve a medal. I am just a bit confused about how to extract these VoxCeleb datasets.
One more question: once I have all the datasets installed, what is the step-by-step process for creating a model of my own voice? Do I record a bunch of utterances in the toolbox, insert a bunch of utterances from the datasets, and then click synthesize and vocode? What is the best method and approach? I am sorry for asking so many questions that may seem obvious, but I am just now learning how to use this type of stuff and am eager to know more. This is more of a hobby for me.
No torch version in requirements.txt
Hi,
I downloaded the LibriSpeech/train-clean-100 dataset and I have it in the project root folder.
But when I try to execute demo_toolbox.py, it tells me that 'you do not have any of the recognized datasets in LibriSpeech'.
Maybe I'm passing the path wrong?
python demo_toolbox.py -d LibriSpeech
Do I need to do something different?
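A hedged reading of the error: -d expects the folder that contains LibriSpeech, not the LibriSpeech folder itself, so the layout would be:

    <datasets_root>/
        LibriSpeech/
            train-clean-100/
                ...

    python demo_toolbox.py -d <datasets_root>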
The error message is as follows; I hope to get your reply. Thanks.
No model "my_run" found, starting training from scratch.
Updating the visualizations every 10 steps.
......Traceback (most recent call last):
  File "encoder_train.py", line 47, in <module>
    train(**vars(args))
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/train.py", line 71, in train
    for step, speaker_batch in enumerate(loader, init_step):
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 568, in __next__
    return self._process_next_batch(batch)
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
Exception: Traceback (most recent call last):
  File "/home/zyq/miniconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_verification_dataset.py", line 56, in collate
    return SpeakerBatch(speakers, self.utterances_per_speaker, partials_n_frames)
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_batch.py", line 8, in __init__
    self.partials = {s: s.random_partial(utterances_per_speaker, n_frames) for s in speakers}
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker_batch.py", line 8, in <dictcomp>
    self.partials = {s: s.random_partial(utterances_per_speaker, n_frames) for s in speakers}
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker.py", line 35, in random_partial
    self._load_utterances()
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/speaker.py", line 19, in _load_utterances
    self.utterance_cycler = RandomCycler(self.utterances)
  File "/home/zyq/speech/Real-Time-Voice-Cloning/encoder/data_objects/random_cycler.py", line 15, in __init__
    raise Exception("Can't create RandomCycler from an empty collection")
Exception: Can't create RandomCycler from an empty collection
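A hedged interpretation: RandomCycler receives an empty utterance list, which usually means preprocessing left some speaker folder without any utterances. A quick scan like this can confirm it (the SV2TTS/encoder output layout is an assumption):

    # Hedged sketch: find processed speaker folders with no utterance files.
    from pathlib import Path

    clean_data_root = Path("<datasets_root>/SV2TTS/encoder")  # assumed output layout
    for speaker_dir in sorted(clean_data_root.iterdir()):
        if speaker_dir.is_dir() and not any(speaker_dir.glob("*.npy")):
            print("Empty speaker folder:", speaker_dir)

Deleting (or re-preprocessing) the empty folders should let training proceed.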
Does this repo use the vanilla version of WaveRNN (https://github.com/fatchord/WaveRNN), or was the architecture modified and the model retrained?
It's quite fast relative to my benchmark of tacotron2 + wavernn from https://github.com/erogol/WaveRNN, but (maybe) has a less natural voice.
BTW, I haven't tried the new universal vocoder version.
The encoder and synthesizer network training has a sequential relationship; the vocoder network is not dependent on these two networks, so can it be trained at any time?
I tried to run python demo_toolbox.py <dataset_root> and it gives an error (see below). I also tried running C:\Python37\Scripts>pip install unicode in cmd, but it doesn't help. Can anybody help me?
C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master> python demo_toolbox.py "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\datasets_root"
Traceback (most recent call last):
  File "demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\toolbox\__init__.py", line 3, in <module>
    from synthesizer import inference as synthesizer
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\inference.py", line 2, in <module>
    from synthesizer.synthesizer import Synthesizer
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\synthesizer.py", line 1, in <module>
    from synthesizer.utils.text import text_to_sequence
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\utils\text.py", line 2, in <module>
    from . import cleaners
  File "C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\synthesizer\utils\cleaners.py", line 14, in <module>
    from unidecode import unidecode
ModuleNotFoundError: No module named 'unidecode'
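The missing module here is unidecode, and the matching PyPI package is Unidecode; pip install unicode (tried above) installs an unrelated package. A one-line fix:

    pip install unidecode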
Hi guys, I tried to install this tool, following these steps:
but I got the following error (see below). Could you help me with some suggestions? I don't know Python well, but I will be very grateful if your suggestions help me.
Thank you in advance.
C:\Users\admin\Desktop\Real-Time-Voice-Cloning-master\Real-Time-Voice-Cloning-master>pip install -r requirements.txt
Requirement already satisfied: tensorflow-gpu<=1.14.0,>=1.10.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 1)) (1.13.1)
Requirement already satisfied: umap-learn in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 2)) (0.3.9)
Requirement already satisfied: visdom in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 3)) (0.1.8.8)
Collecting webrtcvad (from -r requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/89/34/e2de2d97f3288512b9ea56f92e7452f8207eb5a0096500badf9dfd48f5e6/webrtcvad-2.0.10.tar.gz
Collecting librosa>=0.5.1 (from -r requirements.txt (line 5))
Collecting matplotlib>=2.0.2 (from -r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/3b/52/17dbb82ca36937dd4d0027fe1945c3c78bdb465b4736903d0904b7f595ad/matplotlib-3.1.0-cp37-cp37m-win_amd64.whl
Requirement already satisfied: numpy>=1.14.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 7)) (1.16.4)
Requirement already satisfied: scipy>=1.0.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from -r requirements.txt (line 8)) (1.3.0)
Collecting tqdm (from -r requirements.txt (line 9))
  Using cached https://files.pythonhosted.org/packages/45/af/685bf3ce889ea191f3b916557f5677cc95a5e87b2fa120d74b5dd6d049d0/tqdm-4.32.1-py2.py3-none-any.whl
Collecting sounddevice (from -r requirements.txt (line 10))
  Using cached https://files.pythonhosted.org/packages/7f/15/fd6d923adccc64d2d93fcffc245bb2471a2509bb2905a89c4fc772ce4e35/sounddevice-0.3.13-py2.py3.cp26.cp27.cp32.cp33.cp34.cp35.cp36.cp37.cp38.pp27.pp32.pp33.pp34.pp35.pp36-none-win_amd64.whl
Requirement already satisfied: protobuf>=3.6.1 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.8.0)
Requirement already satisfied: wheel>=0.26 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.33.4)
Requirement already satisfied: tensorboard<1.14.0,>=1.13.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.13.1)
Requirement already satisfied: six>=1.10.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.12.0)
Requirement already satisfied: astor>=0.6.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.8.0)
Requirement already satisfied: termcolor>=1.1.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.1.0)
Requirement already satisfied: keras-preprocessing>=1.0.5 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.1.0)
Requirement already satisfied: grpcio>=1.8.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.21.1)
Requirement already satisfied: keras-applications>=1.0.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.0.8)
Requirement already satisfied: tensorflow-estimator<1.14.0rc0,>=1.13.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (1.13.0)
Requirement already satisfied: absl-py>=0.1.6 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.7.1)
Requirement already satisfied: gast>=0.2.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.2.2)
Requirement already satisfied: scikit-learn>=0.16 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from umap-learn->-r requirements.txt (line 2)) (0.21.2)
Requirement already satisfied: numba>=0.37 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from umap-learn->-r requirements.txt (line 2)) (0.44.0)
Requirement already satisfied: tornado in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (6.0.2)
Requirement already satisfied: requests in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (2.22.0)
Requirement already satisfied: websocket-client in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (0.56.0)
Requirement already satisfied: torchfile in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (0.1.0)
Requirement already satisfied: pillow in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (6.0.0)
Requirement already satisfied: pyzmq in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from visdom->-r requirements.txt (line 3)) (18.0.1)
Collecting decorator>=3.0.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/5f/88/0075e461560a1e750a0dcbf77f1d9de775028c37a19a346a6c565a257399/decorator-4.4.0-py2.py3-none-any.whl
Collecting resampy>=0.2.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
Requirement already satisfied: joblib>=0.12 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from librosa>=0.5.1->-r requirements.txt (line 5)) (0.13.2)
Collecting audioread>=2.0.0 (from librosa>=0.5.1->-r requirements.txt (line 5))
Collecting python-dateutil>=2.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl
Collecting cycler>=0.10 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/dd/d9/3ec19e966301a6e25769976999bd7bbe552016f0d32b577dc9d63d2e0c49/pyparsing-2.4.0-py2.py3-none-any.whl
Collecting kiwisolver>=1.0.1 (from matplotlib>=2.0.2->-r requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/c6/ea/e5474014a13ab2dcb5056608e0716c600c3d8a8bcffb10ed55ccd6a42eb0/kiwisolver-1.1.0-cp37-none-win_amd64.whl
Collecting CFFI>=1.0 (from sounddevice->-r requirements.txt (line 10))
  Using cached https://files.pythonhosted.org/packages/2f/ad/9722b7752fdd88c858be57b47f41d1049b5fb0ab79caf0ab11407945c1a7/cffi-1.12.3-cp37-cp37m-win_amd64.whl
Requirement already satisfied: setuptools in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from protobuf>=3.6.1->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (40.8.0)
Requirement already satisfied: werkzeug>=0.11.15 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (0.15.4)
Requirement already satisfied: markdown>=2.6.8 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.1.1)
Requirement already satisfied: h5py in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from keras-applications>=1.0.6->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (2.9.0)
Requirement already satisfied: mock>=2.0.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow-gpu<=1.14.0,>=1.10.0->-r requirements.txt (line 1)) (3.0.5)
Requirement already satisfied: llvmlite>=0.29.0 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from numba>=0.37->umap-learn->-r requirements.txt (line 2)) (0.29.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (1.25.3)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (2019.3.9)
Requirement already satisfied: idna<2.9,>=2.5 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in c:\users\admin\appdata\local\programs\python\python37\lib\site-packages (from requests->visdom->-r requirements.txt (line 3)) (3.0.4)
Collecting pycparser (from CFFI>=1.0->sounddevice->-r requirements.txt (line 10))
Building wheels for collected packages: webrtcvad
  Building wheel for webrtcvad (setup.py) ... error
  ERROR: Complete output from command 'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\admin\AppData\Local\Temp\pip-wheel-7ewjy9xx' --python-tag cp37:
  ERROR: running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.7
  copying webrtcvad.py -> build\lib.win-amd64-3.7
  running build_ext
  building '_webrtcvad' extension
  error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
  ERROR: Failed building wheel for webrtcvad
  Running setup.py clean for webrtcvad
Failed to build webrtcvad
Installing collected packages: webrtcvad, decorator, resampy, audioread, librosa, python-dateutil, cycler, pyparsing, kiwisolver, matplotlib, tqdm, pycparser, CFFI, sounddevice
  Running setup.py install for webrtcvad ... error
    ERROR: Complete output from command 'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\admin\AppData\Local\Temp\pip-record-t598pe1f\install-record.txt' --single-version-externally-managed --compile:
    ERROR: running install
    running build
    running build_py
    creating build
    creating build\lib.win-amd64-3.7
    copying webrtcvad.py -> build\lib.win-amd64-3.7
    running build_ext
    building '_webrtcvad' extension
    error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": https://visualstudio.microsoft.com/downloads/
    ----------------------------------------
ERROR: Command "'c:\users\admin\appdata\local\programs\python\python37\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\admin\AppData\Local\Temp\pip-record-t598pe1f\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\admin\AppData\Local\Temp\pip-install-iqkryxja\webrtcvad\
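The decisive line in this log is "Microsoft Visual C++ 14.0 is required": webrtcvad ships as a source distribution on Windows and needs a C compiler to build. A hedged fix: install the Build Tools from the URL pip prints (https://visualstudio.microsoft.com/downloads/), then rerun the install:

    pip install webrtcvad
    pip install -r requirements.txt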
Chinese (Mandarin): #811
German: #571*
Swedish: #257*
* Requires Tensorflow 1.x (harder to set up).
Arabic: #871
Czech: #655
English: #388 (UK accent), #429 (Indian accent)
French: #854
Hindi: #525
Italian: #697
Polish: #815
Portuguese: #531
Russian: #707
Spanish: #789
Turkish: #761
Ukrainian: #492
Thanks for your great job. I have tried this project on my own computer (Win10, 1060ti 3GB), and I think the similarity of the voice is good. Do you have any ideas on how to improve the quality and similarity of the voice? Is parallel WaveNet a good way? Thank you.
Traceback (most recent call last):
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\toolbox\__init__.py", line 70, in <lambda>
    func = lambda: self.load_from_browser(self.ui.browse_file())
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\toolbox\__init__.py", line 110, in load_from_browser
    wav = Synthesizer.load_preprocess_wav(fpath)
  File "D:\sdx\DeepFake\voice\Real-Time-Voice-Cloning\synthesizer\inference.py", line 111, in load_preprocess_wav
    wav = librosa.load(fpath, hparams.sample_rate)[0]
  File "C:\ProgramData\Anaconda3\lib\site-packages\librosa\core\audio.py", line 119, in load
    with audioread.audio_open(os.path.realpath(path)) as input_file:
  File "C:\ProgramData\Anaconda3\lib\site-packages\audioread\__init__.py", line 116, in audio_open
    raise NoBackendError()
audioread.exceptions.NoBackendError
I have installed ffmpeg, but still get this error.
I'm getting some kind of DLL load failure, which is probably unrelated to your project, but the CLI check passed and this happens when I run the toolbox.
PS C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning> python .\demo_toolbox.py
C:\Python36\lib\site-packages\numpy\core\__init__.py:29: UserWarning: loaded more than 1 DLL from .libs:
C:\Python36\lib\site-packages\numpy\.libs\libopenblas.BNVRK7633HSX7YVO2TADGR4A5KEKXJAW.gfortran-win_amd64.dll
C:\Python36\lib\site-packages\numpy\.libs\libopenblas.IPBC74C7KURV7CB2PKT5Z5FNR3SIBV4J.gfortran-win_amd64.dll
stacklevel=1)
Traceback (most recent call last):
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Python36\lib\imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "C:\Python36\lib\imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified procedure could not be found.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\demo_toolbox.py", line 2, in <module>
from toolbox import Toolbox
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\toolbox\__init__.py", line 3, in <module>
from synthesizer import inference as synthesizer
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\synthesizer\inference.py", line 1, in <module>
from synthesizer.hparams import hparams
File "C:\Users\gantm\Desktop\Code\ml\Real-Time-Voice-Cloning\synthesizer\hparams.py", line 1, in <module>
from tensorflow.contrib.training import HParams
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\__init__.py", line 28, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "C:\Users\gantm\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "C:\Python36\lib\imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "C:\Python36\lib\imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: DLL load failed: The specified procedure could not be found.
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Sorry, this should really be just a pull request but I just don't have the time to make a proper implementation for you.
I've been dragging myself through the papers and your thesis. (I'm just a self taught hobbyist trying to squeeze by). Thanks so much for sharing your implementations and models.
I just needed this quickly and saw it was on your todo list :)
Took a bit of googling to kinda find out a way. So I thought I'd leave this here for you and others. (The errors it throws if you don't do this are a bit obtuse and misleading)
Add these and it'll save... Though there's something slightly off: every now and then there's a clicking, and in general the audio sounds a tad different than the playback.
Real-Time-Voice-Cloning/toolbox/ui.py
Line 143 in 2ef15c6
import soundfile as sf  # pip3 install soundfile

def float2pcm(self, sig, dtype='int32'):
    i = np.iinfo(dtype)
    abs_max = 2 ** (i.bits - 1)
    offset = i.min + abs_max
    return (sig * abs_max + offset).clip(i.min, i.max).astype(dtype)

def save(self, wav, sample_rate):
    data = self.float2pcm(wav)
    sf.write('temp.wav', data, sample_rate)
Then make sure to call the save after generation
Real-Time-Voice-Cloning/toolbox/__init__.py
Line 213 in 2ef15c6
self.ui.save(wav, Synthesizer.sample_rate)
https://stackoverflow.com/a/45032967
Thanks again for your work and time
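A hedged side note on the clicking: soundfile also accepts float arrays in [-1, 1] directly and performs the PCM conversion itself when writing a wav, so the manual float2pcm step could be skipped:

    import soundfile as sf
    sf.write('temp.wav', wav, sample_rate)  # float input; soundfile converts to PCM

If the clicking persists either way, it is probably in the generated waveform itself rather than in the file writing.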
I have several audio files from the same person. In the GUI, I can load them using the browser and then synthesize and vocode some text. How can I do the same thing without the GUI?
Thanks
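A minimal non-GUI sketch, mirroring what demo_cli.py and the code snippet later on this page do (the model paths are assumed to match the pretrained downloads, the input file is hypothetical):

    import numpy as np
    from encoder import inference as encoder
    from synthesizer.inference import Synthesizer
    from vocoder import inference as vocoder

    # Load the three pretrained models (paths are the usual defaults).
    encoder.load_model("encoder/saved_models/pretrained.pt")
    synthesizer = Synthesizer("synthesizer/saved_models/logs-pretrained/taco_pretrained")
    vocoder.load_model("vocoder/saved_models/pretrained/pretrained.pt")

    # Embed the reference speaker, synthesize a spectrogram, then vocode it.
    wav = Synthesizer.load_preprocess_wav("my_speaker.wav")  # hypothetical file
    embed = encoder.embed_utterance(encoder.preprocess_wav(wav))
    specs = synthesizer.synthesize_spectrograms(["Hello world"], [embed])
    generated_wav = vocoder.infer_waveform(specs[0])

To use several files from the same person, you could embed each file and average, as in the embedding-averaging sketch earlier on this page.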
any plans to set this up in google colab as it provides free Nvidia t4 gpus for 12 hour cycles?
What should be the length of the input audio? What are the other requirements (can there be silence, etc.)?
Real-Time-Voice-Cloning/demo_cli.py
Line 133 in 2e8ef14
Is it feasible to use several audio files and averaged embedding for better quality?
In your thesis paper, you mentioned you couldn't produce meaningful alignments on the LibriTTS dataset even though it is the cleaner version of the LibriSpeech. Could you please explain what might be the reasons for the model not learning the alignments? Also, what are the preprocessing steps you did during training synthesizer on LibriTTS? Did you use Montreal Forced Aligner on the LibriTTS too?
The file open dialog does not show any files for me.
Replacing filter="*.mp3;*.flac;*.wav;*.m4a"
with filter="Audio Files (*.mp3 *.flac *.wav *.m4a)"
fixes it.
See doc: https://doc.qt.io/qt-5/qfiledialog.html#getOpenFileName
I load a new audio file and use the pretrained models. Here is the code:
from synthesizer.inference import Synthesizer
from encoder import inference as encoder
from vocoder import inference as vocoder
import numpy as np
import torch
import sys
enc_model_fpath = 'encoder/saved_models/pretrained.pt'
syn_model_dir = "synthesizer/saved_models/logs-pretrained/"
voc_model_fpath = "vocoder/saved_models/pretrained/pretrained.pt"
low_mem = False
encoder.load_model(enc_model_fpath)
synthesizer = Synthesizer(syn_model_dir + "/taco_pretrained", low_mem=low_mem, verbose = False)
vocoder.load_model(voc_model_fpath)
fpath = 'UserAudio/sample.wav'
wav_speaker = Synthesizer.load_preprocess_wav(fpath)
spec = Synthesizer.make_spectrogram(wav_speaker)
encoder_wav = encoder.load_preprocess_wav(wav_speaker)
embed, partial_embeds, _ = encoder.embed_utterance(encoder_wav, return_partials=True)
utterances = 'You will need the following whether you plan to use the toolbox only or to retrain the models. Thank you very much. I am proud of you.'
texts = utterances.split("\n")
embeds = np.stack([embed] * len(texts))
specs = synthesizer.synthesize_spectrograms(texts, embeds)
Here is the error after running the last line of code.
Constructing model: Tacotron
W0702 10:20:42.369393 139984982034304 ag_logging.py:145] Entity <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: converting <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>>: ValueError: Failed to parse source code of <bound method ZoneoutLSTMCell.__call__ of <synthesizer.models.modules.ZoneoutLSTMCell object at 0x7f4ff0110f28>>, which Python reported as:
def __call__(self, inputs, state, scope=None):
"""Runs vanilla LSTM Cell and applies zoneout.
"""
# Apply vanilla LSTM
output, new_state = self._cell(inputs, state, scope)
if self.state_is_tuple:
(prev_c, prev_h) = state
(new_c, new_h) = new_state
else:
num_proj = self._cell._num_units if self._cell._num_proj is None else \
self._cell._num_proj
prev_c = tf.slice(state, [0, 0], [-1, self._cell._num_units])
prev_h = tf.slice(state, [0, self._cell._num_units], [-1, num_proj])
new_c = tf.slice(new_state, [0, 0], [-1, self._cell._num_units])
new_h = tf.slice(new_state, [0, self._cell._num_units], [-1, num_proj])
# Apply zoneout
if self.is_training:
# nn.dropout takes keep_prob (probability to keep activations) not drop_prob (
# probability to mask activations)!
c = (1 - self._zoneout_cell) * tf.nn.dropout(new_c - prev_c,
(1 - self._zoneout_cell)) + prev_c
h = (1 - self._zoneout_outputs) * tf.nn.dropout(new_h - prev_h,
(1 - self._zoneout_outputs)) + prev_h
else:
c = (1 - self._zoneout_cell) * new_c + self._zoneout_cell * prev_c
h = (1 - self._zoneout_outputs) * new_h + self._zoneout_outputs * prev_h
new_state = tf.nn.rnn_cell.LSTMStateTuple(c, h) if self.state_is_tuple else tf.concat(1, [c,
h])
return output, new_state
This may be caused by multiline strings or comments not indented at the same level as the code.
[The same AutoGraph warning and ZoneoutLSTMCell source listing then repeats three more times: once for the second ZoneoutLSTMCell instance, and twice more as plain WARNING lines.]
initialisation done /gpu:0
Initialized Tacotron model. Dimensions (? = dynamic shape):
Train mode: False
Eval mode: False
GTA mode: False
Synthesis mode: True
Input: (?, ?)
device: 0
embedding: (?, ?, 512)
enc conv out: (?, ?, 512)
encoder out (cond): (?, ?, 768)
decoder out: (?, ?, 80)
residual out: (?, ?, 512)
projected residual out: (?, ?, 80)
mel out: (?, ?, 80)
<stop_token> out: (?, ?)
Tacotron Parameters 28.439 Million.
Loading checkpoint: synthesizer/saved_models/logs-pretrained//taco_pretrained/tacotron_model.ckpt-278000
Where are run_live() and linear_dir defined? Are these missing imports?
flake8 testing of https://github.com/CorentinJ/Real-Time-Voice-Cloning on Python 3.7.1
$ flake8 . --count --select=E9,F63,F72,F82 --show-source --statistics
./synthesizer/synthesize.py:110:9: F821 undefined name 'run_live'
run_live(args, checkpoint_path, hparams)
^
./synthesizer/train.py:347:46: F821 undefined name 'linear_dir'
np.save(os.path.join(linear_dir, linear_filename), linear_prediction.T,
^
1     F821 undefined name 'linear_dir'
1     F821 undefined name 'run_live'
2
E901,E999,F821,F822,F823 are the "showstopper" flake8 issues that can halt the runtime with a SyntaxError, NameError, etc. These 5 are different from most other flake8 issues, which are merely "style violations" -- useful for readability, but they do not affect runtime safety.