facebookarchive / loop



A method to generate speech across multiple speakers

License: Other

Python 88.80% Jupyter Notebook 8.56% Shell 2.64%

loop's Issues

Pretrained checkpoints for Blizzard 2013 and LJSpeech

Hi There!

In your paper, you mentioned training on LJSpeech and Blizzard 2013. Can you release those as well?

Thanks!

problem setting ?

Hi,

Want to confirm if the problem setting for this research is like this:

1. Some known speakers are trained from VCTK.
2. A speaker in the wild (or text) tries to mimic a known speaker from 1 through generate.py?

error on generate.py execution

I get an error upon executing:
python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth

(gpu_13) abhinav@ubuntu11:~/.../loop$ python generate.py  --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth
Traceback (most recent call last):
  File "generate.py", line 153, in <module>
    main()
  File "generate.py", line 132, in main
    out, attn = model([txt, spkr], feat)
  File "/home/abhinav/tensorflow/gpu_13/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/software/LM_stash/abhinav/projects/tts/loop/model.py", line 247, in forward
    context, ident = self.encoder(src[0], src[1])
  File "/home/abhinav/tensorflow/gpu_13/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/software/LM_stash/abhinav/projects/tts/loop/model.py", line 66, in forward
    outputs = self.lut_p(input)
  File "/home/abhinav/tensorflow/gpu_13/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/abhinav/tensorflow/gpu_13/local/lib/python2.7/site-packages/torch/nn/modules/sparse.py", line 94, in forward
    self.scale_grad_by_freq, self.sparse
  File "/home/abhinav/tensorflow/gpu_13/local/lib/python2.7/site-packages/torch/nn/_functions/thnn/sparse.py", line 48, in forward
    cls._renorm(indices, weight, max_norm, norm_type)
TypeError: _renorm() takes exactly 5 arguments (4 given)

I have followed all steps in the Setup segment

Blizzard Model

Can't use the Blizzard model without the original training data:

Traceback (most recent call last):
  File "generate.py", line 156, in <module>
    main()
  File "generate.py", line 83, in main
    train_dataset = NpzFolder(train_args.data + '/numpy_features')
  File "/home/michael/Desktop/loop/data.py", line 84, in __init__
    self.NPZ_EXTENSION))
RuntimeError: Found 0 npz in subfolders of: data/blizzard/numpy_features
Supported image extensions are: npz

It looks like generate.py uses parameters from the training data to generate.

RuntimeError: invalid argument 6: expected 3D tensor at /home/a524yangsen/soft/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:442

When I run the command provided in the documentation:
python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90

I encounter the problem below. What is wrong? Could you help me?
RuntimeError: invalid argument 6: expected 3D tensor at /home/a524yangsen/soft/pytorch/torch/lib/THC/generic/THCTensorMathBlas.cu:442

python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth
Traceback (most recent call last):
  File "generate.py", line 153, in <module>
    main()
  File "generate.py", line 142, in main
    norm_path)
  File "/mnt/sdb1/Learning/pytorch/loop/utils.py", line 257, in generate_merlin_wav
    weight=os.path.join(gen_dir, 'weight')), shell=True)
  File "/mnt/sdb1/Learning/pytorch/loop/utils.py", line 121, in pe
    for line in execute(cmd, shell=shell):
  File "/mnt/sdb1/Learning/pytorch/loop/utils.py", line 114, in execute
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command 'echo 1 1 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 1.4 | /mnt/sdb1/Learning/pytorch/loop/tools/SPTK-3.9/x2x +af > /mnt/sdb1/Learning/pytorch/loop/models/vctk/results/weight' returned non-zero exit status 127
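
Exit status 127 is what a POSIX shell returns when it cannot find the command, so a likely cause here (an assumption, not confirmed in the thread) is that the SPTK `x2x` binary is missing at the path generate.py expects. A minimal sketch for checking the binary before generating; the helper name `is_executable` is hypothetical, and the path is copied from the error above:

```python
import os


def is_executable(path):
    """Return True if `path` is an existing regular file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


# Path copied from the traceback above; adjust it to your own checkout.
x2x = "/mnt/sdb1/Learning/pytorch/loop/tools/SPTK-3.9/x2x"
if not is_executable(x2x):
    print("SPTK x2x not found or not executable: " + x2x)
```

If the check fails, rebuilding the tools as described in the Setup section of the readme would be the first thing to try.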

invalid combination of arguments error in training or generating

After installing, I tried to train or generate as described in the readme, and I get an "invalid combination of arguments" error as follows:

$ python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90
INFO - 09/06/17 09:30:25 - 0:00:00 - Namespace(K=10, attention_alignment=0.05, batch_size=64, checkpoint='', clip_grad=0.5, data='data/vctk', epochs=90, expName='checkpoints/vctk', gpu=0, hidden_size=256, ignore_grad=10000.0, lr=0.0001, max_seq_len=1000, mem_size=20, noise=4, nspk=22, output_size=63, seed=1, seq_len=100, vocabulary_size=44)
INFO - 09/06/17 09:30:25 - 0:00:00 - Building dataset.
INFO - 09/06/17 09:30:25 - 0:00:00 - Dataset ready!
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    main()
  File "train.py", line 175, in main
    model = Loop(args)
  File "/d2/jbaik/loop/model.py", line 217, in __init__
    self.decoder = Decoder(opt)
  File "/d2/jbaik/loop/model.py", line 137, in __init__
    opt.attention_alignment)
  File "/d2/jbaik/loop/model.py", line 87, in __init__
    self.N_a = getLinear(mem_elem, 3*K)
  File "/d2/jbaik/loop/model.py", line 15, in getLinear
    return nn.Sequential(nn.Linear(dim_in, dim_in/10),
  File "/home/jbaik/.pyenv/versions/3.6.2/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 41, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: torch.FloatTensor constructor received an invalid combination of arguments - got (float, int), but expected one of:
 * no arguments
 * (int ...)
      didn't match because some of the arguments have invalid types: (float, int)
 * (torch.FloatTensor viewed_tensor)
 * (torch.Size size)
 * (torch.FloatStorage data)
 * (Sequence data)

Could you give me a hint on how to handle this? Thanks!
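
The traceback shows `nn.Linear(dim_in, dim_in/10)` running under Python 3.6, where `/` always returns a float, so the layer size reaches the tensor constructor as a float. A minimal sketch of the difference, assuming the intended size is the floor of `dim_in / 10`; the helper name `bottleneck_dim` is hypothetical and not part of model.py, and 256 is used only as an illustrative value:

```python
def bottleneck_dim(dim_in):
    """Integer layer width: floor division yields an int on Python 2 and 3."""
    return dim_in // 10


dim_in = 256
print(type(dim_in / 10).__name__)   # float on Python 3
print(bottleneck_dim(dim_in))       # 25
```

Under the same assumption, replacing `dim_in/10` with `dim_in // 10` in `getLinear` would be the corresponding change.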

Adding new datasets to train

Hi,

How can I add new datasets (voices) for training? I want to use this dataset: https://linksync-2032.kxcdn.com/wp-content/uploads/2017/06/female-voice-1.zip

The files are all .wav files, and I want to add them as a dataset so I can use that voice.

Memory usage

Hi, and thank you for releasing your code. I'm currently trying to reproduce the VCTK results on an EC2 instance with a Kepler GPU, and I have a question rather than an issue:

Tqdm report shows iterations are taking around 9 seconds:

Train (loss 50.62) epoch 2: 28%|##7 | 35/126 [05:01<13:02, 8.60s/it]
Train (loss 48.54) epoch 2: 40%|#### | 51/126 [07:29<12:21, 9.89s/it]

And nvidia-smi shows a very low memory usage:

1208MiB / 11439MiB

So, I'm not sure if I'm missing something or if that is the expected performance.

Thanks.

Thank you for releasing the code

I tried to rebuild this model based on the details mentioned in the paper, but the results were bad. I used the Adam optimizer with a learning rate of 0.0002 and 0.5/0.9 momentum, as well as gradient clipping at 1, but the gradient still exploded in the first few epochs. Now I can check what is wrong with my implementation.

What's vctk_alt model?

Is this trained on noise level 4?

New Language

First of all, thank you for releasing the code.
I would like to know how difficult it would be to train on speaker data in a new language such as Turkish. As far as I saw, the generation step needs some kind of pronunciation dictionary. But what about the pre-processing steps, Merlin, and the other tools: are they language agnostic? Thank you in advance.

Creating norm_info_mgc_lf0_vuv_bap_63_MVN.dat for the Full VCTK dataset

Hi There!

For large datasets such as the full VCTK dataset, where extract_feats.py uses its multifolder feature, it's unclear what the norm_info/norm.dat file should be. The norm_info_mgc_lf0_vuv_bap_63_MVN.dat file is regenerated for each tmp split of the dataset. How do you create norm_info/norm.dat for datasets with more than 5000 files?

I believe you had to deal with the same problem with the 22 speaker dataset because it contains around 8000 files.

Thanks for your time, Michael. Happy to contribute back the findings.

P.S. I've been commenting in https://gist.github.com/kastnerkyle/cc0ac48d34860c5bb3f9112f4d9a0300 about changes needed to make the extract_feats.py script work. I can't submit a pull request. I know many people are struggling to get it running.

Training on big dataset.

Hello loop experts,

If I have a big dataset, say 12 speakers with around 50K samples in total, and I want to train a loop model, do any parameters need to be adjusted?

Full VCTK dataset

Hi There!

Did you try training on the full VCTK dataset? Does the quality get better?
How long does it take to train on the 22 speakers VCTK dataset?

Can we use multi-GPU to make the training faster? How?

From English into Chinese?

Hi, how can I prepare Chinese material as training data? Thank you.

How do I train on a new speaker?

Confusion of the Phoneme Generation

The paper says the phoneme transcription of the text is generated with the CMU lexicon. However, this code uses phonemizer, a toolkit that uses a US phone set. The two differ slightly in the phoneme set and in the number of phonemes. The paper also mentions adding two phonemes for pauses of two different lengths, but I cannot find where this is done in the code.

Thanks!

How to train on samples obtained from public sources? Too much noise and the speech is too fast!

I obtained some samples from public sources, but they are very noisy and the speech is too fast. After training, the generated sound is very vague and the individual tones cannot be told apart. How can I train on these noisy, fast samples? Thanks!

Why scale the outputs by 30 in Decoder.update_buffer?

The 30 here seems to be a magic number unless I missed something in the paper?

    def update_buffer(self, S_tm1, c_t, o_tm1, ident):
        # concat previous output & context
        idt = torch.tanh(self.F_u(ident))
        o_tm1 = o_tm1.squeeze(0)
        z_t = torch.cat([c_t + idt, o_tm1/30], 1)
        z_t = z_t.unsqueeze(2)
        Sp = torch.cat([z_t, S_tm1[:, :, :-1]], 2)

        # update S
        u = self.N_u(Sp.view(Sp.size(0), -1))
        u[:, :idt.size(1)] = u[:, :idt.size(1)] + idt
        u = u.unsqueeze(2)
        S = torch.cat([u, S_tm1[:, :, :-1]], 2)

        return S

Thanks.

Reproducing the results

Hi, thanks for open sourcing the code!

I am trying to reproduce your results. However, I am running into problems. I have been training:

sequence length: 100
epoch: 90
only American accent VCTK speaker samples
noise level 4

The problem is that only some speakers actually produce a speech signal based on the input; the majority of speakers produce only noise. Which speakers produce speech depends on the actual phoneme input. The attention does not seem to work correctly for these samples: it basically stays at the beginning of the sequence and does not advance.

Did you have a similar issue when training the model? Or do you have an idea what the problem could be?

good attention with speech output:
p226_009_11.pdf
p225_005_4.pdf

somewhat working:
p226_009_2.pdf

Most examples:
p226_009_9.pdf
p226_009_13.pdf
p226_009_1.pdf

Thanks!

Error during training

Hi,

I ran the training about 4 months ago and there was no issue. But now, when I add a new dataset and train with multiple speakers, I get this error.

cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:226

Can you please help?

Has anyone successfully trained a multi-speaker loop model with their own data?

Or used the same dataset of VCTK speakers but extracted the features locally?

What does a good loss curve look like?

After the first 90 epochs, I attained a loss of 36. Starting on the next 90 epochs with reduced noise and increased sequence length, it's dropped to 26.

How low should it get for a quality model?

Feature extraction

I trained loop with a subset of vctk data (American speakers). I found that the audio from those speakers when I run generate.py using my trained model are pretty bad. I just hear only a couple of words in a sentence and the rest is silence or noise.

My guess is that something went wrong during feature extraction. When I compare same feature extracted files i.e. p294_001.npz from the given s3 bucket and the one I feature extracted by running extract_feats.py, I see that vuv_idx from s3 has larger numbers (range: -5 to 5) compared to mine (range: -10e-02 to 5 )

I also noticed that text_features and audio_features are of different shape:
(226, 420) - s3
(540, 420) - me

Other features like durations and code2phone also look different.

May I know what changes I have to make to extract_feats.py to get features similar to the ones in s3?

Training data is slow.

I'm at the step where the model is trained. It took 10 hours for the first 33 epochs out of 90. Is this normal, or did I miss something? I'm new to this field.

Thanks.

Error when synthesizing waves with the downloaded model.

Hi,

When I try to generate waves using the downloaded model, some sentences produce errors. The error type is always the same, and I am not sure why. Could you please help?

Sentence1:
The boy's grandmother is his legal guardian.
Cmd:
python generate.py --text "The boy's grandmother is his legal guardian." --spkr 1 --checkpoint models/vctk/bestmodel.pth
Error:
Traceback (most recent call last):
  File "generate.py", line 151, in <module>
    main()
  File "generate.py", line 140, in main
    norm_path)
  File "/kaldi/loop/utils.py", line 266, in generate_merlin_wav
    base_r0=files['mgc'] + '_r0'), shell=True)
  File "/kaldi/loop/utils.py", line 121, in pe
    for line in execute(cmd, shell=shell):
  File "/kaldi/loop/utils.py", line 114, in execute
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '/kaldi/loop/tools/bin/SPTK-3.9/freqt -m 59 -a 0.58 -M 511 -A 0 < The_boy's_grandmother_is_his_legal_guardian..gen_1.mgc | /kaldi/loop/tools/bin/SPTK-3.9/c2acr -m 511 -M 0 -l 1024 > The_boy's_grandmother_is_his_legal_guardian..gen_1.mgc_r0' returned non-zero exit status 2

Sentence2:
When he's able to return to campaigning, Santorum will have to decide whether he wants to.
Cmd:
python generate.py --text "When he's able to return to campaigning, Santorum will have to decide whether he wants to." --spkr 1 --checkpoint models/vctk/bestmodel.pth
Error:
Traceback (most recent call last):
  File "generate.py", line 151, in <module>
    main()
  File "generate.py", line 140, in main
    norm_path)
  File "/kaldi/loop/utils.py", line 266, in generate_merlin_wav
    base_r0=files['mgc'] + '_r0'), shell=True)
  File "/kaldi/loop/utils.py", line 121, in pe
    for line in execute(cmd, shell=shell):
  File "/kaldi/loop/utils.py", line 114, in execute
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '/kaldi/loop/tools/bin/SPTK-3.9/freqt -m 59 -a 0.58 -M 511 -A 0 < When_he's_able_to_return_to_campaigning,_Santorum_will_have_to_decide_whether_he_wants_to..gen_1.mgc | /kaldi/loop/tools/bin/SPTK-3.9/c2acr -m 511 -M 0 -l 1024 > When_he's_able_to_return_to_campaigning,_Santorum_will_have_to_decide_whether_he_wants_to..gen_1.mgc_r0' returned non-zero exit status 2
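
Both failing sentences contain an apostrophe, and the output file name is built from the sentence and passed to a shell (`shell=True` in the traceback). An unbalanced `'` in a shell command is a syntax error, which would be consistent with exit status 2; this is a guess from the log, not a confirmed diagnosis. A minimal sketch of two possible workarounds, with a hypothetical helper name: sanitize the file name, or quote it with the standard library's `shlex.quote` (Python 3; `pipes.quote` is the Python 2 counterpart).

```python
import re
import shlex


def safe_basename(text):
    """Replace every character outside [A-Za-z0-9_.-] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", text)


sentence = "The boy's grandmother is his legal guardian."
print(safe_basename(sentence))
# Quoting keeps the original name but makes it safe inside a shell command.
print(shlex.quote(sentence.replace(" ", "_") + ".gen_1.mgc"))
```

A quicker test of the hypothesis is to rerun generate.py with the apostrophes removed from the `--text` argument.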

shape mismatch of "audio_features" between downloaded and generated npz files

There is a shape mismatch in the audio_features array between the npz files you uploaded and the npz files generated with the extract_feats script by Kyle.

For example,
in p299_405.npz,
shape of audio_features is (393,60) for uploaded npz file
shape is (829,60) for npz created by the extract_feats script.

This issue could possibly stem from silences not being removed by the extract_feats script, while it has been removed from the uploaded data.

Can you please recommend a solution for this?

Any clue? Missing norm_info_mgc_lf0_vuv_bap_63_MVN.dat while preprocessing the LJ Speech dataset.

I am getting the error below while preprocessing the LJ Speech dataset.

Traceback (most recent call last):
  File "extract_feats.py", line 1406, in <module>
    save_numpy_features()
  File "extract_feats.py", line 853, in save_numpy_features
    shutil.copy2(audio_norm_source, audio_norm_dest)
  File "/usr/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 82, in copyfile
    with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '/home/jax/latest_features/final_acoustic_data/norm_info_mgc_lf0_vuv_bap_63_MVN.dat'

Can loop support parallel training on multiple GPUs?

Hi Loop experts,

I have reproduced the original loop model with 8K VCTK samples. It took around 3 days on my Ubuntu GPU server, which has 2 GPUs.
Can loop support parallel training on multiple GPUs to accelerate training?

Thanks.

How many minutes of audio are needed to train a single speaker using the Blizzard model?

can not run the demo

I tested the demo but failed

python generate.py --text "hello world" --spkr 1 --checkpoint models/vctk/bestmodel.pth
Traceback (most recent call last):
  File "generate.py", line 153, in <module>
    main()
  File "generate.py", line 132, in main
    out, attn = model([txt, spkr], feat)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinwang/voiceloop/loop/model.py", line 247, in forward
    context, ident = self.encoder(src[0], src[1])
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinwang/voiceloop/loop/model.py", line 66, in forward
    outputs = self.lut_p(input)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/torch/nn/modules/sparse.py", line 94, in forward
    self.scale_grad_by_freq, self.sparse
  File "/usr/local/lib/python2.7/site-packages/torch/nn/_functions/thnn/sparse.py", line 48, in forward
    cls._renorm(indices, weight, max_norm, norm_type)
TypeError: _renorm() takes exactly 5 arguments (4 given)

How many hours or minutes of training data do we need to get very good performance?

No numpy_features_valid

Hello,

After running extract_feats.py all went through except I don't see any numpy_features_valid. Is it still needed? Do I manually create that?
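
The generate.py commands quoted in other issues here read validation files from `numpy_features_valid`, next to `numpy_features`. If extract_feats.py did not produce that folder, one option is to hold out part of the extracted files yourself. A minimal sketch, assuming a random 10% hold-out is acceptable; the fraction, the seed, and the helper name `split_train_valid` are assumptions, not values from the project:

```python
import random


def split_train_valid(names, valid_fraction=0.1, seed=0):
    """Shuffle file names reproducibly and return (train, valid) lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_valid = max(1, int(len(names) * valid_fraction))
    return names[n_valid:], names[:n_valid]


files = ["p294_%03d.npz" % i for i in range(1, 21)]
train, valid = split_train_valid(files)
print(len(train), len(valid))  # 18 2
```

The files in the `valid` list would then be moved into `numpy_features_valid`, for example with `shutil.move`.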

How long a speech can I synthesize from text?

I've tried it, and it works well. Thanks for sharing this great project!
I wonder how long a speech it can generate from text. It seems limited to less than 3 seconds when I try a slightly longer sentence. Where does the limit come from, and how can I make it longer? Is it related to the --seq-len option in training?
Thank you!

Decoder input

In the paper, decoder input seems to mix previous decoder output and ground truth input (+noise).
But it seems the decoder in the code only uses ground truth input with noise.
Am I missing something?

self.training?

Hi, I am reading your code and the code is really clean.
I noticed that in the class 'Loop' and 'Decoder' in python file 'model.py', 'self.training' is not defined but used as a condition statement. The inherited class 'torch.nn.Module' doesn't have an attribute named 'training' either.
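
For reference, `torch.nn.Module` does define this attribute: its constructor sets `self.training = True`, and `train()` / `eval()` switch it on the module and its submodules. The sketch below is a simplified pure-Python stand-in for that mechanism (the classes `ModuleSketch` and `DecoderSketch` are hypothetical and are not PyTorch code), shown only to illustrate why `self.training` is available in subclasses such as `Loop` and `Decoder`.

```python
class ModuleSketch(object):
    """Simplified stand-in for the training-flag logic of torch.nn.Module."""

    def __init__(self):
        self.training = True      # set by the base-class constructor
        self.children = []        # submodules registered on this module

    def train(self, mode=True):
        # Propagate the flag to this module and all submodules.
        self.training = mode
        for child in self.children:
            child.train(mode)
        return self

    def eval(self):
        return self.train(False)


class DecoderSketch(ModuleSketch):
    def forward(self):
        # Subclasses can branch on the inherited flag.
        return "train path" if self.training else "eval path"


dec = DecoderSketch()
print(dec.forward())         # train path
print(dec.eval().forward())  # eval path
```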

New Dataset

Hi, everything worked perfectly with your pre-processed VCTK data. Now I want to test with the Nancy dataset. I'm using the script you suggested, but I have 2 questions:

1. When I run the script I get 2 files in the norm_info folder: label_norm_HTS_420.dat and norm_info_mgc_lf0_vuv_bap_63_MVN.dat. Based on the shape, the correct file is norm_info_mgc_lf0_vuv_bap_63_MVN.dat, but I want to be sure.
2. In order to combine both datasets, should I run the script for each speaker and then somehow combine the norm files, or should I put all the data in one folder and process it?

Thanks.

Issue on generating with --text param

Hi when I try to run
sudo python generate.py --text "hello world" --spkr 1 --checkpoint models/vctk/bestmodel.pth
I always get this error.

Traceback (most recent call last):
  File "generate.py", line 153, in <module>
    main()
  File "generate.py", line 112, in main
    txt = text2phone(args.text, char2code)
  File "generate.py", line 43, in text2phone
    cmudict = nltk.corpus.cmudict.dict()
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py", line 116, in __getattr__
    self.__load()
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py", line 81, in __load
    except LookupError: raise e
LookupError: 
**********************************************************************
  Resource cmudict not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> import nltk
  >>> nltk.download('cmudict')
  
  Searched in:
    - '/home/jax/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
**********************************************************************

Thanks in advance.

Face 'Not a finite gradient or too big, ignoring.' when training other data.

Hello,

I ran into some issues when training a loop model with my own data. Please help.
I prepared a dataset with 12 speakers and 5000 sentences in total. I am using the parameters from the readme for training:
python train.py --expName myexp --data data/mydata --noise 4 --seq-len 100 --epochs 90 --nspk 12
python train.py --expName myexp_final --data data/mydata --checkpoint checkpoints/myexp/bestmodel.pth --noise 2 --seq-len 1000 --epochs 90 --nspk 12

The first training run finished without apparent issues. These are the last lines of its log:

INFO - 11/16/17 08:03:41 - 21:26:30 - ====> Train set loss: 31.4378
INFO - 11/16/17 08:04:01 - 21:26:50 - ====> Test set loss: 32.6544
INFO - 11/16/17 08:18:16 - 21:41:05 - ====> Train set loss: 31.4457
INFO - 11/16/17 08:18:37 - 21:41:26 - ====> Test set loss: 32.5302

But when I start the training with the second command, it frequently shows 'Not a finite gradient or too big, ignoring.' in the first epoch. I printed befgad in utils.py at the line below:
befgad = torch.nn.utils.clip_grad_norm(params, clip_th)

It has some values larger than 10000, such as:
42648.5450444
1599437.41826
167695.944851

I tried another experiment with 12 speakers and 10000 sentences, and the same issue happened when training the second model.

My questions are:

1. Why is the training separated into 2 steps?
2. Do I need to adjust some parameters for training, or what is the problem?

Thanks.
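
The training arguments logged in other issues here include `clip_grad=0.5` and `ignore_grad=10000.0`, and the norms printed above all exceed 10000, which matches the 'Not a finite gradient or too big, ignoring.' message. A minimal sketch of such a guard, assuming the update is skipped when the pre-clipping norm is non-finite or above the threshold; the function name `should_skip_update` is hypothetical, and the actual check in utils.py may differ:

```python
import math


def should_skip_update(grad_norm, ignore_th=10000.0):
    """Return True when the gradient norm is NaN/inf or larger than ignore_th."""
    return (not math.isfinite(grad_norm)) or grad_norm > ignore_th


for norm in (0.42, 42648.5450444, float("nan")):
    print(norm, should_skip_update(norm))
```

Under this reading, frequent skips mean that many batches produce very large gradients once the sequence length grows from 100 to 1000, so the message is a symptom rather than the error itself.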

ConnectionError: HTTPConnectionPool(host='localhost', port=8097)

Anyone else getting this error?

python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90
INFO - 09/06/17 08:52:38 - 0:00:00 - Namespace(K=10, attention_alignment=0.05, batch_size=64, checkpoint='', clip_grad=0.5, data='data/vctk', epochs=90, expName='checkpoints/vctk', gpu=0, hidden_size=256, ignore_grad=10000.0, lr=0.0001, max_seq_len=1000, mem_size=20, noise=4, nspk=22, output_size=63, seed=1, seq_len=100, vocabulary_size=44)
INFO - 09/06/17 08:52:38 - 0:00:00 - Building dataset.
INFO - 09/06/17 08:52:38 - 0:00:00 - Dataset ready!
Train (loss 50.63) epoch 1: 100%|█████████████| 126/126 [11:06<00:00,  4.17s/it]
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 240, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc867a836d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
INFO - 09/06/17 09:03:46 - 0:11:08 - ====> Train set loss: 55.6526
Valid (loss 51.16) epoch 1: 100%|███████████████| 11/11 [00:17<00:00,  1.73s/it]
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 240, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc858e3abd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
INFO - 09/06/17 09:04:04 - 0:11:26 - ====> Test set loss: 51.7753
Train (loss 42.89) epoch 2: 100%|█████████████| 126/126 [11:18<00:00,  5.10s/it]
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 240, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc8617b41d0>: Failed to establish a new connection: [Errno 111] Connection refused',))
INFO - 09/06/17 09:15:23 - 0:22:44 - ====> Train set loss: 49.4345
Valid (loss 47.57) epoch 2: 100%|███████████████| 11/11 [00:17<00:00,  1.73s/it]
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 240, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc85b904c10>: Failed to establish a new connection: [Errno 111] Connection refused',))
INFO - 09/06/17 09:15:40 - 0:23:02 - ====> Test set loss: 48.0748
Train (loss 45.03) epoch 3: 100%|█████████████| 126/126 [10:59<00:00,  4.99s/it]
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/visdom/__init__.py", line 240, in _send
    data=json.dumps(msg),
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc862629190>: Failed to establish a new connection: [Errno 111] Connection refused',))
INFO - 09/06/17 09:26:41 - 0:34:02 - ====> Train set loss: 46.5043
Valid (loss 44.79) epoch 3: 100%|███████████████| 11/11 [00:17<00:00,  1.73s/it]
Exception in user code:
------------------------------------------------------------
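
The refused connection on port 8097 suggests that no visdom server was running when train.py tried to plot; training itself continues, as the loss lines in the log show. The server is normally started in a separate terminal with `python -m visdom.server`. A minimal sketch for checking reachability up front, using only the standard library (the helper name `server_reachable` is hypothetical; host and port are taken from the error message):

```python
import socket


def server_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return False
    conn.close()
    return True


if not server_reachable():
    print("No server on localhost:8097; start one with: python -m visdom.server")
```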

What is the split ratio between the training set and test set of the VCTK data provided in your project?

espeak or espeak-NG?

[REDACTED]

Issue for training on new Dataset.

Hi,

Thanks for sharing the project. I am doing some experiments with the tools and have 2 questions.

1. The npz file downloaded with download_data.sh differs from the one generated by extract_feats.py from the same sample wave/text file, for example p294_001. Why does this happen? Other arrays also show some differences.
Downloaded:
phonemes
[28 22 19 41 21 3 22 31 34 11 22 5]
durations
[29 4 25 18 21 27 11 32 7 12 39 3]
Extracted:
phonemes
[28 22 19 40 21 3 22 31 33 11 22 5]
durations
[ 9 6 23 33 6 17 24 32 3 14 28 32]
2. If I want to retrain the model using my data, I need to extract features to prepare the npz files. Should I put the training set and validation set together when running extract_feats.py to get norm.dat, or should I process only the training data to get norm.dat and then start training?

Thank you in advance for your guidance. :)

Can we use the pretrained model that was used on this project to a new speaker?

'out of memory' occurred when training with a single speaker

Hi,

The error below occurred when training with a single speaker:
Train (loss 63.31) epoch 1: 3%|████▍ | 1/29 [00:23<11:10, 23.93s/it]THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 211, in <module>
    main()
  File "train.py", line 199, in main
    train(model, criterion, optimizer, epoch, train_losses)
  File "train.py", line 122, in train
    loss.backward()
  File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 146, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/linear.py", line 24, in backward
    grad_weight = torch.mm(grad_output.t(), input)
RuntimeError: cuda runtime error (2) : out of memory at /b/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.cu:66

Could you please help?

Training Epochs Number

How many epochs of training are needed before the output of generate.py starts to sound meaningful? Thanks!

slow down the voice

I find it speaks too fast. How can I slow down the voice?
