Previously I trained on a small dataset where the speaker was recorded in a single sho


The most optimal audio settings for training dataset preparation? about mellotron HOT 10 OPEN

nvidia commented on July 21, 2024

The most optimal audio settings for training dataset preparation?

from mellotron.

Comments (10)

rafaelvalle commented on July 21, 2024

Share an image of the training and validation loss, the rhythm (alignment map) and the pitch contour (f0) used during inference such that we can investigate better.


AndroYD84 commented on July 21, 2024

Here's the image of the training and validation loss and the rhythm (alignment map) and the pitch contour (f0) used during inference.
The style transfer results will likely improve, given how early training is (this checkpoint is at 35500 iterations; I also have one at 36000, but it couldn't generate any sound for some reason). However, the singing voice results are extremely loud and clip most of the time, which is odd because I don't remember this happening with the previous dataset I tested.
I have shared the entire dataset here if you would like to check, thanks!
P.S.: Now that I think about it, I'm using only upper-case letters for the transcription. Could this affect the results compared to using lower-case letters?
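To put a number on the "loud and clipping" symptom, a minimal sketch like the one below may help. It assumes the generated audio has already been loaded as a list of float samples in [-1, 1]; the function names and the 0.999 threshold are illustrative choices, not part of Mellotron.

```python
def clipping_ratio(samples, threshold=0.999):
    """Fraction of samples whose magnitude reaches the threshold.

    Assumes float samples in [-1, 1]; the threshold is an illustrative choice.
    """
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Made-up samples for illustration only.
example = [0.0, 1.0, -1.0, 0.5]
print(clipping_ratio(example))
print(peak_normalize(example))
```

Peak normalization only lowers the overall level; it cannot restore a waveform that was already clipped when it was generated.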


rafaelvalle commented on July 21, 2024

Set the smoothing factor of your validation loss to 0 and zoom in so that we can take a look at the curve. My suspicion is that you're not picking the model with the lowest validation loss.
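As a rough sketch of what picking the model with the lowest validation loss could look like, assuming the unsmoothed validation losses have been exported as (iteration, loss) pairs: the helper name and the numbers below are hypothetical and only illustrate the idea.

```python
def best_checkpoint(history):
    """Return the (iteration, loss) pair with the lowest validation loss."""
    if not history:
        raise ValueError("history is empty")
    return min(history, key=lambda pair: pair[1])


# Hypothetical unsmoothed validation losses; not taken from the actual run.
history = [(34500, 0.412), (35000, 0.398), (35500, 0.405), (36000, 0.421)]
iteration, loss = best_checkpoint(history)
print(f"lowest validation loss {loss:.3f} at iteration {iteration}")
```

In this made-up example the latest checkpoint is not the best one, which is the situation being suspected here.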

Also, I would love to hear singing samples from Donald Trump if you have them :-)


AndroYD84 commented on July 21, 2024

I went for a GIF, hope it's clearer now:
No zoom: https://i.imgur.com/NsMzzTO.gif
With zoom: https://i.imgur.com/HmkV5X7.gif


rafaelvalle commented on July 21, 2024

Just zoom in on the long line after the big drop such that we can see the variation on that line.


AndroYD84 commented on July 21, 2024

Alright, I hope it's better now: https://i.imgur.com/VRHpvd9.png
Oh, and I'll definitely share some Trump samples as soon as I can get decent results; the ones I made weren't worth keeping. In the past I synthesized some songs using Trump's voice, but with a completely different method from the one adopted here, so I'm not sure this is the place to share them as they're unrelated to this repo. Perhaps your email?


rafaelvalle commented on July 21, 2024

The training and validation curves, alignment and other things look fine.
Your data seems to have audio files from multiple speakers, although you're training it as a single speaker. This might be the source of the problem. You can also train further and check again later.


daxiangpanda commented on July 21, 2024

Can you share your batch_size and how many steps it took to get this good result? @AndroYD84


AndroYD84 commented on July 21, 2024

@daxiangpanda I haven't changed any parameters other than pointing to my own dataset and training from the provided LibriTTS pretrained model. However, this time I'm using a new dataset (4750 files, total duration 2h 55m), as the old one had a few transcription errors; I manually checked that the audio transcriptions were perfect this time.
The super loud screech I experienced before was still present at the beginning, but it disappeared the more I kept training. I'm just beginning to get listenable results at 235000 iterations.


rafaelvalle commented on July 21, 2024

Closing due to inactivity.

