Comments (6)

wookladin commented on June 1, 2024

Hello!
Well, I've never fine-tuned on a single-speaker dataset before, so I'm not sure that fine-tuning the VC decoder will actually work well.
If it doesn't, it would be a good idea to train on VCTK or LibriTTS together with the single-speaker dataset.
On the other hand, in my experience HiFi-GAN worked well even when fine-tuned on a single-speaker dataset.

Please refer to my answer and try it. Thank you.
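
A minimal sketch of the "use both datasets together" idea, in case it helps. It assumes each metadata entry is a `(wav_path, text, speaker)` tuple; that layout, the helper names `merge_metadata` and `speaker_table`, and the `repeat` oversampling knob are all hypothetical and not taken from the assem-vc code.

```python
import random


def merge_metadata(multi_speaker, single_speaker, repeat=1, seed=0):
    """Combine two lists of (wav_path, text, speaker) entries.

    `repeat` duplicates the small single-speaker set so it is not
    drowned out by the much larger multi-speaker corpus (assumption:
    simple duplication is an acceptable form of oversampling here).
    """
    combined = list(multi_speaker) + list(single_speaker) * repeat
    # Seeded shuffle so the merged list is reproducible across runs.
    random.Random(seed).shuffle(combined)
    return combined


def speaker_table(entries):
    """Assign a stable integer ID to every speaker name in `entries`."""
    names = sorted({speaker for _, _, speaker in entries})
    return {name: idx for idx, name in enumerate(names)}
```

The merged list would then be written out in whatever metadata format the training scripts expect, and the speaker table checked against the model's speaker embedding size.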

from assem-vc.

vishalbhavani commented on June 1, 2024

Hi @wookladin ,

  1. As expected, single-speaker fine-tuning of the VC decoder resulted in overfitting because of the small amount of data. The lowest val loss, ~0.6, still seems pretty high. What was the best val loss in your multi-speaker experiments?
    [image]
  2. GTA fine-tuning of HiFi-GAN with the above model gave surprising results: the loss did not improve with training time. Is it because the decoder itself was not good enough to create decent GTA mels?
    [image]

P.S: I'm trying multi-speaker training now. I'll keep you posted on the results.

wookladin commented on June 1, 2024

Hi.

  1. I've uploaded the validation loss graph of the VC decoder in issue #17.
    The loss converges to around 0.2, so it seems your VC decoder is overfitted.
    A multi-speaker setting will help you avoid overfitting.
    Thank you for sharing the results!

  2. Unfortunately, I can't tell just by looking at the loss graph.
    Did you listen to the audio logged at the validation step?
    Its perceptual quality seems important for judging this.
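
For the overfitting check in point 1, a rough sketch of how the train and validation curves could be compared is below. The function names and the `patience` threshold are hypothetical, and the heuristic (validation loss stalled while training loss keeps falling) is only one common rule of thumb, not something taken from the assem-vc code. It does not replace listening to the logged audio.

```python
def best_checkpoint(val_losses):
    """Return (index, loss) of the evaluation with the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx, val_losses[idx]


def is_overfitting(train_losses, val_losses, patience=3):
    """Heuristic overfitting flag.

    True when the validation loss has not improved for at least
    `patience` evaluations while the training loss is still lower
    than it was at the best validation point.
    """
    if len(val_losses) <= patience:
        return False  # too few evaluations to judge
    best_idx, _ = best_checkpoint(val_losses)
    stalled = len(val_losses) - 1 - best_idx >= patience
    train_still_falling = train_losses[-1] < train_losses[best_idx]
    return stalled and train_still_falling
```

Both lists are assumed to hold one value per validation step, in order, for example as exported from the training logs.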

Thank you!

vishalbhavani commented on June 1, 2024

Hi @wookladin ,
Multi-speaker training solved the overfitting problem, as expected. The audio logged during vocoder training also sounds good now. Thanks.

kannadaraj commented on June 1, 2024

@vishalbhavani Thanks for confirming that it worked in your case. Did you warm-start from the pre-trained model when adding the new speaker, or did you train from scratch?

vishalbhavani commented on June 1, 2024

I started with the pre-trained model.