Comments (10)

patriotyk commented on May 27, 2024

@LEECHOONGHO I have published my model here: https://huggingface.co/patriotyk/vocos-mel-hifigan-compat-44100khz
It sounds great, and there are metrics.
@Mahmoud-ghareeb My model was trained on 800+ hours of audio. A vocoder doesn't require text transcripts, so you can easily use audiobooks for training. You don't even need to cut them at silences, because vocos internally splits the provided audio into smaller segments anyway.
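
As a rough illustration of the segment splitting mentioned above, here is a minimal sketch of cropping a fixed-length window at a random offset from a long recording. It is not the vocos implementation: the helper name `random_crop`, the zero-padding of short inputs, and the segment length of 16384 samples are all assumptions chosen for the example.

```python
import random


def random_crop(samples, num_samples, rng=random):
    """Return exactly num_samples items taken from samples.

    Hypothetical helper: long inputs are cropped at a random offset,
    short inputs are zero-padded on the right (an assumed policy).
    """
    if len(samples) >= num_samples:
        start = rng.randint(0, len(samples) - num_samples)
        return samples[start:start + num_samples]
    return samples + [0.0] * (num_samples - len(samples))


# Ten seconds of placeholder audio at 44.1 kHz, standing in for an audiobook chapter.
sample_rate = 44100
chapter = [0.0] * (10 * sample_rate)

# Seeded generator so the crop position is reproducible.
segment = random_crop(chapter, num_samples=16384, rng=random.Random(0))
print(len(segment))
```

Because every training step draws a fresh window like this, a long recording does not have to be pre-cut into utterances before training.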

from vocos.

patriotyk commented on May 27, 2024

Do you have standard TensorBoard logs? It would be interesting to compare.


LEECHOONGHO commented on May 27, 2024

@patriotyk Sorry, I changed the code to log to a WandB server, so I have neither local log files nor TensorBoard logs.


patriotyk commented on May 27, 2024

What is your validation loss at the last checkpoint? It is encoded in the checkpoint file name. I have been training at 44100 Hz for almost a week already, and the loss is still going down.
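
For comparing runs, the validation loss can be read back out of such file names. The sketch below assumes names containing a `val_loss=<number>` field, which is how PyTorch Lightning typically renders a `{val_loss:.4f}` filename template; the example file names and both helper names are hypothetical.

```python
import re

# Assumed pattern: "...val_loss=3.4512.ckpt". Adjust if your names differ.
VAL_LOSS_PATTERN = re.compile(r"val_loss=(\d+(?:\.\d+)?)")


def val_loss_from_filename(filename):
    """Return the validation loss embedded in filename, or None if absent."""
    match = VAL_LOSS_PATTERN.search(filename)
    return float(match.group(1)) if match else None


def best_checkpoint(filenames):
    """Return the file name with the lowest embedded validation loss."""
    scored = [(val_loss_from_filename(name), name) for name in filenames]
    scored = [(loss, name) for loss, name in scored if loss is not None]
    return min(scored)[1] if scored else None


# Hypothetical checkpoint names, for illustration only.
names = [
    "vocos_checkpoint_epoch=10_step=40000_val_loss=3.5120.ckpt",
    "vocos_checkpoint_epoch=12_step=50000_val_loss=3.4512.ckpt",
    "last.ckpt",
]
print(best_checkpoint(names))
```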


Jon-Zbw commented on May 27, 2024

> Training Loss, Generated Outputs.
>
> I hope this will be a reference for model training.
>
> https://api.wandb.ai/links/xi-speech-team/k0kdfwch

Thanks for your work. Could you share the training details of the 32 kHz model, such as your EnCodec model?
I found pretrained models for 24 kHz and 48 kHz, so I guess the 32 kHz audio is resampled to 24 kHz or 48 kHz for the pretrained EnCodec model and then resampled back to 32 kHz?


LEECHOONGHO commented on May 27, 2024

Sorry for the confusion.
I trained only a mel vocoder, not a decoder for EnCodec.

However, I plan to train a "Mel-Encodec" (a mel-spectrogram-to-RVQ encoder with a Vocos decoder, for various speech data) in the future.


LEECHOONGHO commented on May 27, 2024

> Do you have standard TensorBoard logs? It would be interesting to compare.

> What is your validation loss at the last checkpoint? It is encoded in the checkpoint file name. I have been training at 44100 Hz for almost a week already, and the loss is still going down.

I estimated the mel loss and the generator loss on a newly obtained dataset; they were 0.0942 and 2.82, respectively.
Because of the dataset's size, the loss on the eval dataset is no different from the loss on sampled training data.

How is your model's output quality? Any artifacts?


patriotyk commented on May 27, 2024

I am still training (third week). It is very slow. I will post my results when it finishes.


Mahmoud-ghareeb commented on May 27, 2024

How much data do we need for training?


Mahmoud-ghareeb commented on May 27, 2024

Great work, @patriotyk! Thank you so much.
