rafaelvalle commented on July 21, 2024

Share an image of the training and validation loss, the rhythm (alignment map), and the pitch contour (f0) used during inference so that we can investigate further.

from mellotron.

AndroYD84 commented on July 21, 2024

Here's the image of the training and validation loss and the rhythm (alignment map) and the pitch contour (f0) used during inference.
The style transfer results will likely improve, considering how early in training this is (checkpoint at 35500 iterations; I also have one at 36000, but it couldn't generate any sound for some reason). However, the singing voice results are insanely loud and clip most of the time, which is odd because I don't remember this happening with the previous dataset I tested.
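One way to quantify the clipping described above is to measure how many samples hit full scale and, if the float output overshoots, rescale it before saving. A minimal sketch, assuming the synthesized audio is available as a float NumPy array nominally in [-1, 1]; the function names and the -1 dBFS target are illustrative choices, not part of Mellotron:

```python
import numpy as np


def clipping_ratio(audio, threshold=0.999):
    """Fraction of samples whose magnitude reaches the clipping threshold."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.size == 0:
        return 0.0
    return float(np.mean(np.abs(audio) >= threshold))


def peak_normalize(audio, peak_db=-1.0):
    """Scale audio so its largest absolute sample sits at peak_db dBFS."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        return audio  # silence: nothing to scale
    target = 10.0 ** (peak_db / 20.0)
    return audio * (target / peak)


# Hypothetical vocoder output that overshoots the [-1, 1] range.
audio = np.array([0.2, 1.4, -1.2, 0.5], dtype=np.float32)
print("clipped fraction:", clipping_ratio(audio))
print("peak after normalization:", float(np.max(np.abs(peak_normalize(audio)))))
```

Rescaling only helps if it happens before the audio is written to a fixed-point format such as 16-bit PCM; samples that were already clipped in a saved file cannot be recovered this way.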
I have shared the entire dataset here if you would like to check it. Thanks!
P.S.: Now that I think about it, I'm using only upper-case letters in the transcriptions. Could this affect the results compared to using lower-case letters?
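To rule out the casing question, one option is to lowercase the transcripts before training. Whether casing matters at all depends on the text cleaners your config applies; if they already lowercase the text, this step changes nothing. A minimal sketch, assuming a pipe-separated filelist whose lines look like `audiopath|text|speaker_id` (check your own filelist format; the helper names are hypothetical):

```python
def lowercase_transcript(line, sep="|"):
    """Lowercase only the transcript field of a 'path|text|speaker' line."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) < 2:
        raise ValueError(f"expected at least 2 fields, got: {line!r}")
    fields[1] = fields[1].lower()  # leave the path untouched: filesystems may be case sensitive
    return sep.join(fields)


def lowercase_filelist(src_path, dst_path):
    """Write a copy of a filelist with every transcript lowercased."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(lowercase_transcript(line) + "\n")
```

For example, `lowercase_transcript("wavs/0001.wav|HELLO THERE.|0")` returns `"wavs/0001.wav|hello there.|0"`.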

rafaelvalle commented on July 21, 2024

Set the smoothing factor of your validation loss plot to 0 and zoom in so that we can take a look at the curve. My suspicion is that you're not picking the model with the lowest validation loss.
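Why the smoothing factor matters: an exponential moving average lags the raw curve, so the minimum on a smoothed plot can land at a later iteration than the true minimum. A minimal sketch, assuming the raw validation losses have been exported as an `{iteration: loss}` mapping; the numbers are made up, and the EMA is a plain one that is not necessarily identical to TensorBoard's implementation:

```python
def ema(values, weight):
    """Exponential moving average; weight=0 returns the raw values unchanged."""
    smoothed = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def best_checkpoint(val_losses):
    """Return (iteration, loss) for the lowest raw validation loss."""
    return min(val_losses.items(), key=lambda kv: kv[1])


# Hypothetical raw validation losses keyed by training iteration.
val_losses = {33000: 0.60, 34000: 0.45, 35000: 0.50, 35500: 0.48, 36000: 0.58}
iters = sorted(val_losses)
raw = [val_losses[i] for i in iters]
smoothed = ema(raw, weight=0.6)

print("raw minimum:", best_checkpoint(val_losses))                # iteration 34000
print("smoothed minimum:", iters[smoothed.index(min(smoothed))])  # iteration 35500
```

With smoothing, the lowest point appears at 35500 even though the raw loss was lowest at 34000, which is why picking a checkpoint from the unsmoothed curve is safer.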

Also, would love to hear singing samples from Donald Trump if you have them :-)

AndroYD84 commented on July 21, 2024

I went for a GIF, hope it's clearer now:
No zoom: https://i.imgur.com/NsMzzTO.gif
With zoom: https://i.imgur.com/HmkV5X7.gif

rafaelvalle commented on July 21, 2024

Just zoom in on the long line after the big drop so that we can see the variation along that line.

AndroYD84 commented on July 21, 2024

Alright, I hope it's better now: https://i.imgur.com/VRHpvd9.png
Oh, and I'll definitely share some Trump samples as soon as I get decent results; the ones I made weren't worth keeping. In the past I synthesized some songs with Trump's voice, but I used a completely different method from the one adopted here, so I'm not sure this is the place to share them since they're unrelated to this repo. Perhaps by email?

rafaelvalle commented on July 21, 2024

The training and validation curves, alignment, and everything else look fine.
Your data seems to contain audio files from multiple speakers, although you're training it as a single speaker. This might be the source of the problem. You can also train further and check again later.
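If the clips can be labeled by speaker, one way to stop training them as a single voice is to give each speaker its own integer ID in the filelist. A minimal sketch, assuming a Mellotron-style `audiopath|text|speaker_id` filelist; the speaker labels, paths, and helper name are hypothetical, and the model must be configured for at least as many speakers as the IDs emitted:

```python
def build_filelist(entries):
    """Map (audiopath, text, speaker_name) tuples to 'path|text|id' lines.

    IDs are assigned in order of first appearance, so the same speaker
    name always gets the same integer.
    """
    speaker_ids = {}
    lines = []
    for audiopath, text, speaker in entries:
        sid = speaker_ids.setdefault(speaker, len(speaker_ids))
        lines.append(f"{audiopath}|{text}|{sid}")
    return lines, speaker_ids


entries = [
    ("wavs/0001.wav", "hello there.", "speaker_a"),
    ("wavs/0002.wav", "good morning.", "speaker_b"),
    ("wavs/0003.wav", "thank you.", "speaker_a"),
]
lines, ids = build_filelist(entries)
print("\n".join(lines))
print(len(ids), "speakers")
```

If only the target voice is wanted, the simpler alternative is to drop the clips from the other speakers entirely.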

daxiangpanda commented on July 21, 2024

Can you share your batch_size and how many steps it took to get this good result? @AndroYD84

AndroYD84 commented on July 21, 2024

@daxiangpanda I haven't changed any parameters other than pointing to my own dataset and training from the provided LibriTTS pretrained model. However, this time I'm using a new dataset (4750 files, total duration 2h 55m), as the old one had a few transcription errors; I manually checked that the transcriptions were perfect this time.
The super loud screech I experienced before was still present at the beginning, but it disappeared the more I trained. I'm just beginning to get listenable results at 235000 iterations.
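For scale, the new dataset described above works out to 2h 55m = 10500 s over 4750 files, or about 2.2 s per clip on average. A minimal sketch for computing these numbers for a folder of PCM WAV files using only the standard library; the `wavs` directory name and helper names are hypothetical:

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Duration of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def format_hms(seconds):
    """Format seconds as e.g. '2h 55m 00s'."""
    total = int(round(seconds))
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m:02d}m {s:02d}s"


def dataset_summary(wav_dir):
    """Return (file count, total seconds, mean seconds per file)."""
    paths = sorted(Path(wav_dir).glob("*.wav"))
    total = sum(wav_duration(p) for p in paths)
    mean = total / len(paths) if paths else 0.0
    return len(paths), total, mean
```

For example, `dataset_summary("wavs")` returns the file count, total seconds, and mean clip length, and `format_hms(10500)` gives `'2h 55m 00s'`.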

rafaelvalle commented on July 21, 2024

Closing due to inactivity.
