Comments (5)
I think there is actually an idea that could be applied here: you could try using the exact same conditioning latent for EVERY line spoken by a specific character. That would require additional code, though.
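The per-character idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DL-Art-School or Tortoise code: it assumes each clip's conditioning latent has already been extracted as a flat list of floats and grouped by a character label, and the helper names (`build_character_latents`, `latent_for_line`) are hypothetical. Averaging a character's clip latents is just one way to choose a single fixed latent; reusing one representative clip's latent would satisfy the idea equally well.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_latent(latents: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length latent vectors."""
    if not latents:
        raise ValueError("need at least one latent")
    dim = len(latents[0])
    if any(len(v) != dim for v in latents):
        raise ValueError("all latents must have the same length")
    return [sum(v[i] for v in latents) / len(latents) for i in range(dim)]


def build_character_latents(
    clips_by_character: Dict[str, Sequence[Vector]],
) -> Dict[str, Vector]:
    """Hypothetical helper: collapse all clip latents of each character into one fixed latent."""
    return {name: mean_latent(clips) for name, clips in clips_by_character.items()}


def latent_for_line(character: str, table: Dict[str, Vector]) -> Vector:
    """Every line spoken by `character` is conditioned on the identical latent."""
    return table[character]


if __name__ == "__main__":
    # Toy 2-D "latents"; real conditioning latents are much larger.
    clips = {"alice": [[1.0, 2.0], [3.0, 4.0]], "bob": [[0.0, 0.0]]}
    table = build_character_latents(clips)
    print(latent_for_line("alice", table))
```

Because the table is built once, every line for a given character reuses the same vector instead of a per-clip latent, which is the "exact same conditioning latent" behavior described above.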
"Multispeaker" in the current case just means exposing the model to more kinds of speakers during training. Ideally the model would learn to clone all of them, conditioned on the input zero-shot latent; in practice it underfits severely with the small number of epochs available in fine-tuning. I suspect a much, much longer training run might teach the model to correctly remember all speakers, but it might also just lead to terrible overfitting on the existing lines.
from dl-art-school.
It's all learned implicitly. There's no fundamental difference between a single-speaker and a multi-speaker dataset apart from the variance of the distribution of conditioning latents and predicted audio.
It is perhaps better to model each wav file as an individual speaker: each speaker is a point in latent space, and there are general clusters corresponding to individual characters. You could perhaps circle each cluster and label it as the broad space of a single speaker's voice, but in practice there ought to be overlaps for a sufficiently diverse multispeaker dataset.
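The "each wav is a point, characters are overlapping clusters" view can be probed numerically. This is a minimal sketch, assuming per-clip conditioning latents are already extracted as flat float vectors grouped by a speaker label; the function names are hypothetical and not part of DL-Art-School. It computes each speaker's centroid and reports the fraction of clips that sit closer to some other speaker's centroid. Euclidean distance is an assumption made for simplicity; cosine distance may suit these latents better.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of one speaker's clip latents."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def nearest_speaker(latent: Vector, centroids: Dict[str, Vector]) -> str:
    """Label of the centroid closest to `latent` (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(latent, centroids[name]))


def cluster_overlap(clips_by_speaker: Dict[str, Sequence[Vector]]) -> float:
    """Fraction of clips whose latent is nearer another speaker's centroid than its own."""
    centroids = {name: centroid(vs) for name, vs in clips_by_speaker.items()}
    total = 0
    misplaced = 0
    for name, vectors in clips_by_speaker.items():
        for v in vectors:
            total += 1
            if nearest_speaker(v, centroids) != name:
                misplaced += 1
    return misplaced / total if total else 0.0


if __name__ == "__main__":
    separated = {"a": [[0.0, 0.0], [0.0, 1.0]], "b": [[10.0, 10.0], [10.0, 11.0]]}
    overlapping = {"a": [[0.0, 0.0], [4.0, 0.0]], "b": [[3.0, 0.0], [3.5, 0.0]]}
    print(cluster_overlap(separated), cluster_overlap(overlapping))
```

A value near zero suggests speakers occupy distinct regions; a larger value matches the overlap expected from a diverse multispeaker dataset, where "circling" one cluster per character no longer cleanly separates them.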
Hi, and thanks for your work.
As of now, if I fine-tune on a single-speaker dataset, it becomes a single-speaker model, or at least that is how it seems to me.
Even when I use the conditioning latents from another speaker during zero-shot inference, I always get the voice of the speaker I fine-tuned on.
Hi, is it possible to train this model on a multispeaker dataset? If so, could you give detailed information on how? Thank you in advance.
I tried training with a multispeaker dataset, but something goes wrong.
I am training on another language with a male-voice dataset. When I try to clone, it has a hard time cloning female or child voices, but it can clone other male voices (about 80% of the time). A model trained on female voices behaves similarly: it can clone male voices, but it does best on female voices (some high-pitched voices are hard to clone or come out with annoying noise).
When I train on mixed male and female data and then clone, the output sounds like a random voice: sometimes male, sometimes female, but not the voice I want to clone.