Comments (21)
Ah, I see, that makes sense. I'll go ahead and install the missing dependency then. Hopefully it'll like the latest version. Thank you :)
from mellotron.
just a simple missing dependency, so I ignored it and moved on to see the rest of the code.
The code stops at the missing dependency and so everything else in the first block has not been imported.
import librosa
import torch
from hparams import create_hparams
from model import Tacotron2, load_model
from waveglow.denoiser import Denoiser
from layers import TacotronSTFT
from data_utils import TextMelLoader, TextMelCollate
from text import cmudict, text_to_sequence
from mellotron_utils import get_data_from_musicxml
was never run.
I'd suggest installing any dependencies you can. Most are required to run the notebook.
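To see every missing dependency at once instead of hitting them one at a time, you can probe for the top-level packages before running the first cell. This is only a minimal sketch using the standard library: the helper name missing_modules is made up for illustration, and the package list is an assumption based on the imports and the pandas issue mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Assumed list of third-party packages needed by the notebook's first cell.
required = ["librosa", "torch", "pandas", "numpy"]
print("Missing:", missing_modules(required))
```

Anything printed in the list needs to be installed before the import cell will run to the end.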
from mellotron.
I opted to use version 0.25.3 of pandas since it was released around the same time this project was uploaded.
I also had to pin the existing version of numpy, or else conda would update it to the latest and cause issues, I believe: conda install pandas=0.25.3 numpy=1.16.4
No errors anymore except for IndentationError: unexpected indent, which seems harmless.
I guess I just rename checkpoint_##### to mellotron_ljs.pt, or is there a conversion process of checkpoints to the pt extension for inference?
Going to attempt to train waveglow next before running the full inference code
from mellotron.
It's easier to change
checkpoint_path = "models/mellotron_libritts.pt"
to
checkpoint_path = "outdir/checkpoint_XXXXXXX"
inside the notebook.
I don't believe any conversion is required to test the checkpoint.
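If you restart training often and end up with many checkpoints, a small helper can pick the newest one for the notebook. This is a sketch under the assumption that checkpoints are saved as checkpoint_<iteration> with no extension, as the names in this thread suggest; latest_checkpoint is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path


def latest_checkpoint(outdir):
    """Return the path of the highest-numbered checkpoint_<N> file, or None."""
    pattern = re.compile(r"checkpoint_(\d+)")
    best_iteration, best_path = -1, None
    for path in Path(outdir).iterdir():
        match = pattern.fullmatch(path.name)
        # Compare iteration counts numerically, not as strings.
        if match and int(match.group(1)) > best_iteration:
            best_iteration, best_path = int(match.group(1)), path
    return best_path
```

In the notebook you could then set checkpoint_path = str(latest_checkpoint("outdir")), after checking that the result is not None.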
from mellotron.
I must be missing something about the training procedure. I followed the training instructions in the waveglow readme because I thought you use separate mellotron and waveglow models to synthesize results. But when I try to train, I get:
(condaenv): python train.py -c config.json
Traceback (most recent call last):
File "train.py", line 39, in <module>
from mel2samp import Mel2Samp
File "C:\Users\camja\Desktop\mellotron\waveglow\mel2samp.py", line 38, in <module>
from tacotron2.layers import TacotronSTFT
ModuleNotFoundError: No module named 'tacotron2.layers'
from mellotron.
If I try to run it with the waveglow model available on the readme, I get this error
C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
which might be breaking it.
Because this is my result
and the audio it created https://voca.ro/kFaOGGxbLAj
from mellotron.
@camjac251
The Predicted Mel is from Mellotron therefore Mellotron is the one acting up here.
from mellotron.
@camjac251
Did you start your Mellotron model from scratch?
The Source rhythm should be a diagonal line, where each text input matches a part of the output time. However, this requires the model to be trained to a decent degree if you are not using pretrained weights as the starting point.
from mellotron.
Yeah I did. I started from nothing with the LJS speech dataset. It initially trained and had to be restarted every now and then, so I would resume with python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path outdir/checkpoint_##### every time. I did see another issue about quality loss with resuming (#30), so I changed use_saved_learning_rate=True, but didn't touch ignore_layers=['speaker_embedding.weight']. I trained it up to 16,000 iterations, when it started to look good on the predicted.
from mellotron.
@camjac251
Refer to the notebook.
https://github.com/NVIDIA/mellotron/blob/master/inference.ipynb
You can see that there is a Green/Yellow line in the last graph.
That's your alignment, i.e. how well the model has learned to link the text/f0 to the audio.
Your tensorboard output shows that the model is still learning alignment (top image in your comment).
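A rough way to put a number on this is to check how often the attention peak moves forward through the text as the decoder advances. The sketch below uses plain Python lists and assumes the alignment is laid out as one row per decoder step with one weight per text position; monotonic_fraction is a hypothetical helper, not something the repo provides.

```python
def monotonic_fraction(alignment):
    """Fraction of decoder steps whose attention peak does not move backwards.

    `alignment` is a list of rows, one per decoder step; each row holds the
    attention weights over the encoder (text) positions for that step.
    """
    # Index of the largest weight in each row, i.e. the attended text position.
    peaks = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    if len(peaks) < 2:
        return 1.0
    forward = sum(1 for prev, cur in zip(peaks, peaks[1:]) if cur >= prev)
    return forward / (len(peaks) - 1)
```

A value well below 1.0 means the peak jumps around, which matches a model that has not learned to attend. A value near 1.0 is necessary but not sufficient, since a peak that never moves also scores 1.0, so the plot is still worth looking at.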
from mellotron.
Those breaks are when my machine turned off during training, it's happened at least 20 times I think during training. I can only get 6 hours of constant training in a row per day before short bursts of restarting it.
from mellotron.
I let it run longer and tried today to get a result
@rafaelvalle Is this OK to ignore with waveglow?
C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
I feel like it might be why my audio is sounding like this
https://voca.ro/bT4CP3It1BV
Edit: If waveglow requires pytorch 1.0, then how come mellotron doesn't impose the same requirement in its readme if tacotron2 does?
from mellotron.
The issue comes from the mel-spectrogram you are producing.
Your model hasn't learned to attend yet.
from mellotron.
It's been a while since I last tried training, but that was after 2 weeks or so of constant training (at least uninterrupted, there were other sessions before)
I thought maybe applying the suggested settings from #30 could help, but it might've hurt instead. In my hparams I changed use_saved_learning_rate=True and ignore_layers=[]
Do commas and breaths mess up the alignment of data with training? I have quite a few samples that repeat half of the word before saying the full word, but I tried to add in those to the transcribed part and commas when a thought is changed mid sentence.
from mellotron.
Can you share a screenshot of your tensorboard logs with training and validation curves, the attention maps and predicted mel-spectrograms ?
from mellotron.
I'm not sure where to find the attention map. I went back and forth with the hparams settings and think I even started over with the LJSpeech model as the starter, and this might be the result of that; I can't remember. I started with warm start and that would run for a few weeks, I believe.
from mellotron.
The validation loss is going up, showing evidence that your model is overfitting.
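One way to act on this is to keep the checkpoint with the lowest validation loss rather than the latest one. The following is a minimal sketch, assuming you have the validation losses as a list in evaluation order; best_checkpoint and the patience parameter are illustrative names, not part of the training script.

```python
def best_checkpoint(val_losses, patience=3):
    """Return (best_index, overfitting) for a list of validation losses.

    `overfitting` is True when the lowest loss is at least `patience`
    evaluations old, i.e. validation has not improved since then.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    evaluations_since_best = len(val_losses) - 1 - best_index
    return best_index, evaluations_since_best >= patience
```

For example, with the made-up losses [0.9, 0.6, 0.5, 0.55, 0.6, 0.7] the best evaluation is the third one and the flag is set, which suggests testing the checkpoint saved around that point.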
from mellotron.
Does it need just more time and training data?
from mellotron.
Take a look at issues related to overfitting in the tacotron2 repo.
https://github.com/NVIDIA/tacotron2
from mellotron.
Ok thank you. I'll look for answers there.
I've been able to generate audio that sounded like the voice I was training with but some words in the sentence sounded a bit slurred or were missing in the generated audio. I was worried that it might've been my training set and that more time training wouldn't have helped.
from mellotron.
Augment your data if you can.
from mellotron.