
Comments (21)

camjac251 commented on August 23, 2024

Ah, I see, that makes sense. I'll go ahead and install the missing dependency then. Hopefully it'll work with the latest version. Thank you :)

CookiePPP commented on August 23, 2024

just a simple missing dependency, so I ignored it and moved on to see the rest of the code.

Execution stops at the missing dependency, so everything after it in the first block never gets imported. This block:

import librosa
import torch

from hparams import create_hparams
from model import Tacotron2, load_model
from waveglow.denoiser import Denoiser
from layers import TacotronSTFT
from data_utils import TextMelLoader, TextMelCollate
from text import cmudict, text_to_sequence
from mellotron_utils import get_data_from_musicxml

was never run.


I'd suggest installing any dependencies you can. Most are required to run the notebook.
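If it helps, here is a quick way to see which third-party packages are still missing before running the notebook (the module list is my guess, not the project's official requirements; adjust it to your checkout):

import importlib

# assumed set of third-party modules the notebook pulls in, directly or indirectly
for name in ["librosa", "torch", "music21", "pandas", "numpy", "matplotlib"]:
    try:
        importlib.import_module(name)
        print("ok:     ", name)
    except ImportError:
        print("missing:", name)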

camjac251 commented on August 23, 2024

I opted to use pandas 0.25.3 since it was released around the same time this project was uploaded.
I also had to pin the existing version of numpy, or conda would upgrade it to the latest release and, I believe, cause issues: conda install pandas=0.25.3 numpy=1.16.4.

No errors anymore except for an IndentationError: unexpected indent, which seems harmless.
I guess I just rename checkpoint_##### to mellotron_ljs.pt, or is there a conversion process from a checkpoint to the .pt extension for inference?
Going to attempt to train waveglow next before running the full inference code.

CookiePPP commented on August 23, 2024

It's easier to rename

checkpoint_path = "models/mellotron_libritts.pt"

to

checkpoint_path = "outdir/checkpoint_XXXXXXX"

inside the notebook.
I don't believe any conversion is required to test the checkpoint.
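For reference, loading the checkpoint in the notebook then looks roughly like this (a sketch based on the imports shown above; treat the exact lines as an assumption):

import torch
from hparams import create_hparams
from model import load_model

checkpoint_path = "outdir/checkpoint_XXXXXXX"  # your latest training checkpoint

hparams = create_hparams()
mellotron = load_model(hparams).cuda().eval()
# training checkpoints, like the released .pt files, keep the weights under 'state_dict'
mellotron.load_state_dict(torch.load(checkpoint_path)['state_dict'])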

camjac251 commented on August 23, 2024

I must be missing something about the training procedure. I followed the waveglow readme's training instructions because I thought you use a separate mellotron model and a separate waveglow model to synthesize results. But when I try to train, I get

(condaenv): python train.py -c config.json
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    from mel2samp import Mel2Samp
  File "C:\Users\camja\Desktop\mellotron\waveglow\mel2samp.py", line 38, in <module>
    from tacotron2.layers import TacotronSTFT
ModuleNotFoundError: No module named 'tacotron2.layers'
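Maybe a workaround (just a guess on my end, not confirmed anywhere in this thread) is to alias mellotron's own layers.py under the package name mel2samp.py expects, near the top of waveglow/train.py before the mel2samp import:

import sys
import types

sys.path.insert(0, "..")   # mellotron root, which contains layers.py, stft.py, audio_processing.py
import layers              # mellotron's TacotronSTFT lives here

# register a stand-in "tacotron2" package so that
# "from tacotron2.layers import TacotronSTFT" resolves
tacotron2_pkg = types.ModuleType("tacotron2")
tacotron2_pkg.layers = layers
sys.modules["tacotron2"] = tacotron2_pkg
sys.modules["tacotron2.layers"] = layers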

camjac251 commented on August 23, 2024

If I try to run it with the waveglow model linked in the readme, I get this error

C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

which might be breaking it, because this is my result:
[screenshot]
and the audio it created https://voca.ro/kFaOGGxbLAj

CookiePPP commented on August 23, 2024

@camjac251
The Predicted Mel is from Mellotron, so Mellotron is the one acting up here.
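As an aside, that SourceChangeWarning only says the pickled class source differs from the torch source you have installed; if you decide it is safe to ignore (my assumption here, not something verified in this thread), you can silence it:

import warnings
from torch.serialization import SourceChangeWarning

# hide the "source code of class ... has changed" warnings printed by torch.load
warnings.filterwarnings("ignore", category=SourceChangeWarning)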

CookiePPP commented on August 23, 2024

@camjac251
Did you start your Mellotron model from scratch?
The source rhythm should be a diagonal line, where each text input maps to a portion of the output timeline; however, when not starting from pretrained weights, this requires the model to be trained to a decent degree first.
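You can plot that alignment yourself with something like this (a sketch only; "model" and "inference_inputs" stand in for the loaded Mellotron model and the inputs the notebook prepares, including speaker and F0 conditioning):

import matplotlib.pyplot as plt

# inference returns (mel_outputs, mel_outputs_postnet, gate_outputs, alignments)
_, mel_outputs_postnet, _, alignments = model.inference(inference_inputs)

plt.imshow(alignments[0].float().data.cpu().numpy().T,
           aspect='auto', origin='lower', interpolation='none')
plt.xlabel('Decoder timestep (audio frames)')
plt.ylabel('Encoder timestep (text)')
plt.title('Attention alignment (should approach a diagonal)')
plt.show()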

camjac251 commented on August 23, 2024

Yeah, I did. I started from scratch with the LJSpeech dataset. Training had to be restarted every now and then, so I would resume with python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path outdir/checkpoint_##### each time. I did see another issue (#30) about quality loss when resuming, so I changed use_saved_learning_rate=True but didn't touch ignore_layers=['speaker_embedding.weight']. I trained it up to 16,000 iterations, when the predicted output started to look good.
[tensorboard screenshots]

CookiePPP commented on August 23, 2024

@camjac251
Refer to the notebook.
https://github.com/NVIDIA/mellotron/blob/master/inference.ipynb
You can see that there is a Green/Yellow line in the last graph.
That's your alignment, i.e. how well the model has learned to link the text/F0 to the audio.
Your tensorboard output shows that the model is still learning alignment (top image in your comment).

camjac251 commented on August 23, 2024

Those breaks are when my machine turned off during training; it's happened at least 20 times, I think. I can only get about 6 hours of uninterrupted training per day, followed by short bursts of restarting it.

camjac251 commented on August 23, 2024

I let it run longer and tried again today to get a result:
[screenshot]

@rafaelvalle Is this OK to ignore with waveglow?

C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

I feel like it might be why my audio is sounding like this
https://voca.ro/bT4CP3It1BV

Edit: If waveglow requires PyTorch 1.0, how come mellotron doesn't impose the same requirement in its readme when tacotron2 does?

rafaelvalle commented on August 23, 2024

The issue comes from the mel-spectrogram you are producing.
Your model hasn't learned to attend yet.

camjac251 commented on August 23, 2024

It's been a while since I last tried training, but that result was after roughly 2 weeks of constant training (at least uninterrupted; there were other sessions before).
I thought applying the settings suggested in #30 might help, but it may have hurt instead. In my hparams I changed use_saved_learning_rate=True and ignore_layers=[].

Do commas and breaths mess up the alignment during training? I have quite a few samples where the speaker repeats half a word before saying the full word; I tried to include those repetitions in the transcript, along with commas where a thought changes mid-sentence.

rafaelvalle commented on August 23, 2024

Can you share a screenshot of your tensorboard logs with the training and validation curves, the attention maps, and the predicted mel-spectrograms?

camjac251 commented on August 23, 2024

I'm not sure where to find the attention map. I went back and forth with the hparam settings, and I think I may even have started over with the LJSpeech model as the starting point, so this might be the result of that; I can't remember. I started with a warm start, and that ran for a few weeks, I believe.

[tensorboard screenshots]

rafaelvalle commented on August 23, 2024

The validation loss is going up, showing evidence that your model is overfitting.
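One way to pull those curves out of the event files and compare them outside the tensorboard UI (the tag names here are assumptions; check acc.Tags() for the real ones in your logdir):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("outdir/logdir")  # hypothetical path to your tensorboard logs
acc.Reload()
print(acc.Tags()["scalars"])             # list the scalar tags actually present
train = [(e.step, e.value) for e in acc.Scalars("training.loss")]
val = [(e.step, e.value) for e in acc.Scalars("validation.loss")]
print("last train loss:", train[-1])
print("last val loss:  ", val[-1])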

camjac251 commented on August 23, 2024

Does it need just more time and training data?

rafaelvalle commented on August 23, 2024

Take a look at issues related to overfitting in the tacotron2 repo.
https://github.com/NVIDIA/tacotron2

camjac251 commented on August 23, 2024

Ok, thank you. I'll look for answers there.
I've been able to generate audio that sounds like the voice I was training on, but some words in the sentence sounded a bit slurred or were missing from the generated audio entirely. I was worried that it might've been my training set and that more training time wouldn't have helped.

rafaelvalle commented on August 23, 2024

Augment your data if you can.
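For example, something along these lines with librosa (a sketch only; whether pitch and gain perturbation suit your Mellotron dataset is for you to verify):

import numpy as np
import librosa

def augment(wav, sr):
    # small random pitch shift (in semitones) plus a random gain change
    wav = librosa.effects.pitch_shift(wav, sr=sr, n_steps=np.random.uniform(-1.0, 1.0))
    wav = wav * np.random.uniform(0.8, 1.2)
    return np.clip(wav, -1.0, 1.0)

wav, sr = librosa.load("example.wav", sr=22050)  # hypothetical input clip
wav_aug = augment(wav, sr)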
