Comments (5)

echelon commented on August 18, 2024

Add these two hparams:

"n_speakers": 10,
"gin_channels": 16     

I'm not sure what the ideal value for gin_channels is to get a rich embedding, and I asked in another thread.

Your training data and validation CSVs should be in this format:

filename|numeric_speaker_id|transcript
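
As a rough illustration of that file format, here is a minimal sketch that parses such lines and flags speaker ids that would not fit the configured n_speakers. It assumes ids are zero-based and must be below n_speakers, which holds if the model looks them up in an embedding table with n_speakers rows. The helper names, file names, and transcripts are hypothetical and not part of glow-tts.

```python
def parse_filelist_line(line):
    """Split one `filename|numeric_speaker_id|transcript` line into typed fields."""
    # maxsplit=2 keeps any later "|" inside the transcript (a choice made for this sketch).
    filename, speaker_id, transcript = line.rstrip("\n").split("|", 2)
    return filename, int(speaker_id), transcript


def check_speaker_ids(rows, n_speakers):
    """Return the rows whose speaker id falls outside 0 .. n_speakers - 1."""
    return [row for row in rows if not 0 <= row[1] < n_speakers]


# Hypothetical entries, for illustration only.
lines = [
    "wavs/a_0001.wav|0|Hello there.",
    "wavs/b_0001.wav|9|Another speaker talking.",
    "wavs/c_0001.wav|10|This speaker id is out of range.",
]
rows = [parse_filelist_line(line) for line in lines]
bad_rows = check_speaker_ids(rows, n_speakers=10)
print(bad_rows)
```

Running a check like this over both the training and validation files before training can catch an id that is too large for the configured n_speakers.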

You'll need to swap out the loader:

-from data_utils import TextMelLoader, TextMelCollate 
+from data_utils import TextMelSpeakerLoader, TextMelSpeakerCollate       

You'll also need to change the forward function to accept the speaker id parameter g, and unpack the speaker ids from each batch as you enumerate the loader.
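
To make that last step more concrete, here is a minimal PyTorch sketch of a forward function that takes a speaker id tensor g, and of a loop that unpacks the ids from each batch. It is a toy stand-in, not the glow-tts model: the class name, the emb_g and proj layers, and the (x, sid) batch layout are assumptions made for illustration.

```python
import torch
from torch import nn


class SpeakerConditionedEncoder(nn.Module):
    """Toy module whose forward accepts a speaker id tensor `g` (hypothetical)."""

    def __init__(self, n_vocab, hidden_channels, n_speakers, gin_channels):
        super().__init__()
        self.emb = nn.Embedding(n_vocab, hidden_channels)
        # One learned vector of size gin_channels per speaker.
        self.emb_g = nn.Embedding(n_speakers, gin_channels)
        # The projection sees text features concatenated with the speaker vector.
        self.proj = nn.Conv1d(hidden_channels + gin_channels, hidden_channels, 3, padding=1)

    def forward(self, x, g=None):
        h = self.emb(x).transpose(1, 2)          # [batch, hidden, time]
        if g is not None:
            g = self.emb_g(g).unsqueeze(-1)      # [batch, gin, 1]
            g = g.expand(-1, -1, h.size(-1))     # [batch, gin, time]
            h = torch.cat([h, g], dim=1)         # [batch, hidden + gin, time]
        return self.proj(h)


model = SpeakerConditionedEncoder(n_vocab=50, hidden_channels=192, n_speakers=10, gin_channels=16)

# Assumed batch layout: (token ids, speaker ids). A real loader yields more fields.
batches = [(torch.randint(0, 50, (2, 7)), torch.tensor([3, 8]))]
for batch_idx, (x, sid) in enumerate(batches):
    out = model(x, g=sid)                        # speaker ids passed as `g`
```

In this toy module, calling model(x) without g leaves the input at hidden_channels and the convolution raises a channel mismatch, so every call site needs to pass the speaker ids once the layer sizes include gin_channels.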

from glow-tts.

marlon-br commented on August 18, 2024

I meant adding just one more voice, without retraining the whole model again.

dechubby commented on August 18, 2024

Sorry for jumping in, but could you please elaborate on the last part about changing the forward function? Thanks in advance!

ppanja commented on August 18, 2024

Hi @echelon,
This information is really useful. I believe I've made the necessary changes you suggested. In my case I've set n_speakers = 24 and gin_channels = 256, and the rest of the parameters in base.json are unchanged. The training set has 9102 samples. I'm getting the runtime error below.

RuntimeError: Given groups=1, weight of size 256 448 3, expected input[1, 192, 89] to have 448 channels, but got 192 channels instead

Can you please advise what is going wrong here?
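
One possible reading of the numbers in that error, offered as a hypothesis rather than a confirmed diagnosis: 448 equals 192 + 256, so the failing layer may have been built to take the text features concatenated with the speaker embedding, while the tensor that reached it carried only the 192 text channels, as would happen if the speaker ids were not passed on some call. The sketch below only checks that arithmetic. The function name is hypothetical, and hidden_channels = 192 is inferred from the error message.

```python
def conv_in_channels(hidden_channels, gin_channels, speaker_given):
    """Channels reaching a layer fed with text features plus an optional speaker embedding."""
    return hidden_channels + (gin_channels if speaker_given else 0)


hidden_channels = 192   # assumed: inferred from input[1, 192, 89] in the error
gin_channels = 256      # value reported in the comment

with_speaker = conv_in_channels(hidden_channels, gin_channels, speaker_given=True)
without_speaker = conv_in_channels(hidden_channels, gin_channels, speaker_given=False)
print(with_speaker, without_speaker)
```

If this reading is right, checking every place the model is called (training step, evaluation, and any initialization pass) for a missing speaker id argument would be a sensible next step.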

ppanja commented on August 18, 2024

Hi @marlon-br, @dechubby,
Were you able to run in multi-speaker mode? Did you make any other changes apart from those echelon mentioned?
I'm hitting an issue that I'm not able to debug.

Any help would be really appreciated.

Regards,
Prasanta
