
Comments (21)

camjac251 commented on August 23, 2024

Ah, I see, that makes sense. I'll go ahead and install the missing dependency then. Hopefully it'll work with the latest version. Thank you :)

CookiePPP commented on August 23, 2024

just a simple missing dependency, so I ignored it and moved on to see the rest of the code.

Execution stops at the missing dependency, so everything after it in the first block never gets imported. This block:

import librosa
import torch

from hparams import create_hparams
from model import Tacotron2, load_model
from waveglow.denoiser import Denoiser
from layers import TacotronSTFT
from data_utils import TextMelLoader, TextMelCollate
from text import cmudict, text_to_sequence
from mellotron_utils import get_data_from_musicxml

was never run.


I'd suggest installing any dependencies you can. Most are required to run the notebook.
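If it helps, here is a quick way to see which third-party packages are still missing before running the notebook (the module list is my guess, not the project's official requirements; adjust it to your checkout):

import importlib

# assumed set of third-party modules the notebook pulls in, directly or indirectly
for name in ["librosa", "torch", "music21", "pandas", "numpy", "matplotlib"]:
    try:
        importlib.import_module(name)
        print("ok:     ", name)
    except ImportError:
        print("missing:", name)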

camjac251 commented on August 23, 2024

I opted to use pandas 0.25.3 since it was released around the same time this project was uploaded.
I also had to pin the existing version of numpy, or conda would upgrade it to the latest release and, I believe, cause issues: conda install pandas=0.25.3 numpy=1.16.4.

No errors anymore except for an IndentationError: unexpected indent, which seems harmless.
I guess I just rename checkpoint_##### to mellotron_ljs.pt, or is there a conversion process from a checkpoint to the .pt extension for inference?
Going to attempt to train waveglow next before running the full inference code.

CookiePPP commented on August 23, 2024

It's easier to rename

checkpoint_path = "models/mellotron_libritts.pt"

to

checkpoint_path = "outdir/checkpoint_XXXXXXX"

inside the notebook.
I don't believe any conversion is required to test the checkpoint.
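For reference, loading the checkpoint in the notebook then looks roughly like this (a sketch based on the imports shown above; treat the exact lines as an assumption):

import torch
from hparams import create_hparams
from model import load_model

checkpoint_path = "outdir/checkpoint_XXXXXXX"  # your latest training checkpoint

hparams = create_hparams()
mellotron = load_model(hparams).cuda().eval()
# training checkpoints, like the released .pt files, keep the weights under 'state_dict'
mellotron.load_state_dict(torch.load(checkpoint_path)['state_dict'])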

camjac251 commented on August 23, 2024

I must be missing something about the training procedure. I followed the waveglow readme's training instructions because I thought you use a separate mellotron model and a separate waveglow model to synthesize results. But when I try to train, I get

(condaenv): python train.py -c config.json
Traceback (most recent call last):
  File "train.py", line 39, in <module>
    from mel2samp import Mel2Samp
  File "C:\Users\camja\Desktop\mellotron\waveglow\mel2samp.py", line 38, in <module>
    from tacotron2.layers import TacotronSTFT
ModuleNotFoundError: No module named 'tacotron2.layers'
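Maybe a workaround (just a guess on my end, not confirmed anywhere in this thread) is to alias mellotron's own layers.py under the package name mel2samp.py expects, near the top of waveglow/train.py before the mel2samp import:

import sys
import types

sys.path.insert(0, "..")   # mellotron root, which contains layers.py, stft.py, audio_processing.py
import layers              # mellotron's TacotronSTFT lives here

# register a stand-in "tacotron2" package so that
# "from tacotron2.layers import TacotronSTFT" resolves
tacotron2_pkg = types.ModuleType("tacotron2")
tacotron2_pkg.layers = layers
sys.modules["tacotron2"] = tacotron2_pkg
sys.modules["tacotron2.layers"] = layers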

camjac251 commented on August 23, 2024

If I try to run it with the waveglow model linked in the readme, I get this error

C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

which might be breaking it, because this is my result:
[screenshot]
and the audio it created https://voca.ro/kFaOGGxbLAj

CookiePPP commented on August 23, 2024

@camjac251
The Predicted Mel is from Mellotron, so Mellotron is the one acting up here.
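As an aside, that SourceChangeWarning only says the pickled class source differs from the torch source you have installed; if you decide it is safe to ignore (my assumption here, not something verified in this thread), you can silence it:

import warnings
from torch.serialization import SourceChangeWarning

# hide the "source code of class ... has changed" warnings printed by torch.load
warnings.filterwarnings("ignore", category=SourceChangeWarning)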

CookiePPP commented on August 23, 2024

@camjac251
Did you start your Mellotron model from scratch?
The source rhythm should be a diagonal line, where each text input maps to a portion of the output timeline; however, when not starting from pretrained weights, this requires the model to be trained to a decent degree first.
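You can plot that alignment yourself with something like this (a sketch only; "model" and "inference_inputs" stand in for the loaded Mellotron model and the inputs the notebook prepares, including speaker and F0 conditioning):

import matplotlib.pyplot as plt

# inference returns (mel_outputs, mel_outputs_postnet, gate_outputs, alignments)
_, mel_outputs_postnet, _, alignments = model.inference(inference_inputs)

plt.imshow(alignments[0].float().data.cpu().numpy().T,
           aspect='auto', origin='lower', interpolation='none')
plt.xlabel('Decoder timestep (audio frames)')
plt.ylabel('Encoder timestep (text)')
plt.title('Attention alignment (should approach a diagonal)')
plt.show()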

camjac251 commented on August 23, 2024

Yeah, I did. I started from scratch with the LJSpeech dataset. Training had to be restarted every now and then, so I would resume with python train.py --output_directory=outdir --log_directory=logdir --checkpoint_path outdir/checkpoint_##### each time. I did see another issue (#30) about quality loss when resuming, so I changed use_saved_learning_rate=True but didn't touch ignore_layers=['speaker_embedding.weight']. I trained it up to 16,000 iterations, when the predicted output started to look good.
[tensorboard screenshots]

CookiePPP commented on August 23, 2024

@camjac251
Refer to the notebook.
https://github.com/NVIDIA/mellotron/blob/master/inference.ipynb
You can see that there is a Green/Yellow line in the last graph.
That's your alignment, i.e. how well the model has learned to link the text/F0 to the audio.
Your tensorboard output shows that the model is still learning alignment (top image in your comment).

camjac251 commented on August 23, 2024

Those breaks are when my machine turned off during training; it's happened at least 20 times, I think. I can only get about 6 hours of uninterrupted training per day, followed by short bursts of restarting it.

camjac251 commented on August 23, 2024

I let it run longer and tried again today to get a result:
[screenshot]

@rafaelvalle Is this OK to ignore with waveglow?

C:\Users\camja\anaconda3\envs\mello\lib\site-packages\torch\serialization.py:593: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)

I feel like it might be why my audio is sounding like this
https://voca.ro/bT4CP3It1BV

Edit: If waveglow requires PyTorch 1.0, how come mellotron doesn't impose the same requirement in its readme when tacotron2 does?

rafaelvalle commented on August 23, 2024

The issue comes from the mel-spectrogram you are producing.
Your model hasn't learned to attend yet.

camjac251 commented on August 23, 2024

It's been a while since I last tried training, but that result was after roughly 2 weeks of constant training (at least uninterrupted; there were other sessions before).
I thought applying the settings suggested in #30 might help, but it may have hurt instead. In my hparams I changed use_saved_learning_rate=True and ignore_layers=[].

Do commas and breaths mess up the alignment during training? I have quite a few samples where the speaker repeats half a word before saying the full word; I tried to include those repetitions in the transcript, along with commas where a thought changes mid-sentence.

rafaelvalle commented on August 23, 2024

Can you share a screenshot of your tensorboard logs with the training and validation curves, the attention maps, and the predicted mel-spectrograms?

camjac251 commented on August 23, 2024

I'm not sure where to find the attention map. I went back and forth with the hparam settings, and I think I may even have started over with the LJSpeech model as the starting point, so this might be the result of that; I can't remember. I started with a warm start, and that ran for a few weeks, I believe.

[tensorboard screenshots]

rafaelvalle commented on August 23, 2024

The validation loss is going up, showing evidence that your model is overfitting.
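One way to pull those curves out of the event files and compare them outside the tensorboard UI (the tag names here are assumptions; check acc.Tags() for the real ones in your logdir):

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("outdir/logdir")  # hypothetical path to your tensorboard logs
acc.Reload()
print(acc.Tags()["scalars"])             # list the scalar tags actually present
train = [(e.step, e.value) for e in acc.Scalars("training.loss")]
val = [(e.step, e.value) for e in acc.Scalars("validation.loss")]
print("last train loss:", train[-1])
print("last val loss:  ", val[-1])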

camjac251 commented on August 23, 2024

Does it need just more time and training data?

rafaelvalle commented on August 23, 2024

Take a look at issues related to overfitting in the tacotron2 repo.
https://github.com/NVIDIA/tacotron2

camjac251 commented on August 23, 2024

Ok, thank you. I'll look for answers there.
I've been able to generate audio that sounds like the voice I was training on, but some words in the sentence sounded a bit slurred or were missing from the generated audio entirely. I was worried that it might've been my training set and that more training time wouldn't have helped.

rafaelvalle commented on August 23, 2024

Augment your data if you can.
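For example, something along these lines with librosa (a sketch only; whether pitch and gain perturbation suit your Mellotron dataset is for you to verify):

import numpy as np
import librosa

def augment(wav, sr):
    # small random pitch shift (in semitones) plus a random gain change
    wav = librosa.effects.pitch_shift(wav, sr=sr, n_steps=np.random.uniform(-1.0, 1.0))
    wav = wav * np.random.uniform(0.8, 1.2)
    return np.clip(wav, -1.0, 1.0)

wav, sr = librosa.load("example.wav", sr=22050)  # hypothetical input clip
wav_aug = augment(wav, sr)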
