Hey @Zarbuvit. I feel your frustration. I ended up trying all of those libraries, and none of them worked well with glow-tts. Then I came across a fork of @seungwanpark's MelGAN written by @rishikksh20, and it worked perfectly!
Multi-band MelGAN that works with glow-tts:
https://github.com/rishikksh20/melgan
I forked his project and have been reworking it so it can be used as a package for inference:
https://github.com/seantempesta/melgan-1
(Note: I may have completely broken the training code, since I've only tested the inference parts after repackaging it.)
It's working!
@seantempesta, thank you for that repo!
I ended up mostly using https://github.com/rishikksh20/melgan, with the denoiser edited according to what @seantempesta did in his repo: https://github.com/seantempesta/melgan-1
As for the garbled words, that was entirely my own problem! I misnamed my models and used different methods of converting text to phonemes in training and in inference.
I am sorry for any time I caused you to waste on my stupidity.
Thank you for your help!
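The garbling came from converting text to phonemes one way during training and another way during inference. One way to catch that class of mistake is to save the text-preprocessing settings next to the checkpoint and compare them before synthesis. The sketch below is only an illustration. The config keys (cleaners, phonemizer, add_blank) and both helper functions are hypothetical, not part of glow-tts.

```python
import json

# Hypothetical preprocessing settings recorded at training time.
# The keys and values are illustrative, not actual glow-tts hparams.
TRAIN_PREPROCESSING = {
    "cleaners": ["english_cleaners"],
    "phonemizer": "cmudict",
    "add_blank": True,
}


def save_preprocessing(path, config):
    """Write the text-preprocessing settings to a JSON file next to the checkpoint."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, sort_keys=True)


def check_preprocessing(path, inference_config):
    """Raise ValueError if inference would convert text differently from training."""
    with open(path, encoding="utf-8") as f:
        train_config = json.load(f)
    # Collect every key whose value differs (or is missing on one side).
    mismatched = sorted(
        key
        for key in set(train_config) | set(inference_config)
        if train_config.get(key) != inference_config.get(key)
    )
    if mismatched:
        raise ValueError(f"preprocessing mismatch in: {', '.join(mismatched)}")
```

In use, save_preprocessing would run once when the checkpoint is written, and check_preprocessing would run at the start of the inference script with the settings that script is about to use.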
This is fantastic! Thanks for sharing!
@seantempesta Thank you so much! I will have a look now, and hopefully all goes well.
@seantempesta Sadly, this isn't working for me. I took the glow-tts inference code as is, removed the WaveGlow parts, and added the multi-band MelGAN generator and all of the inference code from rishikksh20 instead. I used his pretrained model.
I'm getting a clean voice, but it is garbled and I can't understand the words. This is similar to what I got using the Mozilla TTS multi-band MelGAN after applying the code changes you recommended in a different issue.
Also, a separate issue I'm having is that the denoiser isn't working. I get the error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
on the line audio = denoiser(audio, 0.1). This is strange to me, because I assumed an error like this would show up during vocoder inference. Instead, inference runs fine and only the denoiser crashes. If I comment out the denoiser, everything runs fine, except that the result is garbled as mentioned above.
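The error says the tensor passed to the denoiser has a single dimension (the valid indices are -1 and 0), while the code asked for dimension 1. If the Denoiser is the one copied from the WaveGlow repository, its STFT helper reads the sample count from size(1), so it expects audio shaped (batch, samples). A vocoder inference helper that squeezes its output to 1-D would then fail at exactly this point. Here is a minimal sketch of restoring the batch dimension, assuming that is the cause. as_batch is a hypothetical helper.

```python
import torch


def as_batch(audio: torch.Tensor) -> torch.Tensor:
    """Return audio shaped (batch, samples), which an STFT helper reading size(1) expects."""
    if audio.dim() == 1:
        return audio.unsqueeze(0)  # (samples,) -> (1, samples)
    if audio.dim() == 3 and audio.size(1) == 1:
        return audio.squeeze(1)  # (batch, 1, samples) -> (batch, samples)
    if audio.dim() == 2:
        return audio  # already (batch, samples)
    raise ValueError(f"unexpected audio shape {tuple(audio.shape)}")


audio = torch.randn(22050)  # stand-in for a squeezed, 1-D vocoder output
try:
    audio.size(1)  # reproduces the "Dimension out of range" IndexError
except IndexError as err:
    print("1-D tensor:", err)

batched = as_batch(audio)
print(batched.shape)  # torch.Size([1, 22050])
```

With this in place, the failing call would become audio = denoiser(as_batch(audio), 0.1).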
I saw that you changed the Denoiser in your repo to run on the CPU. Is this just a preference, or is it because it doesn't work on the GPU for you?
Did you run into any of these issues?
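On the CPU/GPU question: by default, the WaveGlow denoiser runs the vocoder on an all-zero mel spectrogram and keeps the magnitude spectrum of that output as a "bias". It then subtracts a scaled copy of the bias from the audio's magnitude spectrum. The version in the WaveGlow repository calls .cuda() directly, which ties it to the GPU. Below is a device-agnostic sketch of the same bias-subtraction idea, written against torch.stft and torch.istft. BiasDenoiser, its default n_fft and hop_length, and the way bias_audio is produced are all assumptions, not code from either repository. For example, bias_audio could come from a hypothetical vocoder(torch.zeros(1, 80, 88)) call.

```python
import torch


class BiasDenoiser(torch.nn.Module):
    """Subtract a scaled bias magnitude spectrum; runs on whatever device the module is on."""

    def __init__(self, bias_audio: torch.Tensor, n_fft: int = 1024, hop_length: int = 256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Buffers move together with the module on .to(device); no hard-coded .cuda().
        self.register_buffer("window", torch.hann_window(n_fft))
        # bias_audio: (1, samples), e.g. vocoder output for an all-zero mel (assumption).
        bias_spec = self._stft(bias_audio.detach().float().cpu()).abs()
        # Keep the first frame only, shape (1, freq, 1), so it broadcasts over time.
        self.register_buffer("bias_spec", bias_spec[:, :, :1])

    def _stft(self, audio: torch.Tensor) -> torch.Tensor:
        return torch.stft(audio, self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)

    def forward(self, audio: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
        # audio: (batch, samples) on the same device as the module.
        spec = self._stft(audio)
        magnitude, phase = spec.abs(), spec.angle()
        # Remove the scaled bias and clamp so magnitudes stay non-negative.
        denoised = torch.clamp(magnitude - self.bias_spec * strength, min=0.0)
        # Recombine with the original phase and invert back to a waveform.
        return torch.istft(torch.polar(denoised, phase), self.n_fft,
                           hop_length=self.hop_length, window=self.window,
                           length=audio.size(-1))
```

Because the window and bias spectrum are registered as buffers, BiasDenoiser(bias_audio).to(device) moves both together. The same code then runs on the CPU or the GPU, as long as the audio passed in is on that device too.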
Hello @Zarbuvit, can you share some of the phonemes you are using? I am struggling with phoneme representation.
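On representation in general: a phoneme-based model only sees integer IDs. What matters is that each phoneme token maps to the same ID at inference as it did during training. The sketch below uses a tiny, hypothetical ARPAbet symbol list; the real glow-tts inventory is larger and uses its own formatting. intersperse mirrors the helper glow-tts appears to use when add_blank is enabled, with the blank ID set to the number of symbols.

```python
# Hypothetical symbol inventory: pad, space, punctuation, and a few ARPAbet phonemes.
# A trained model must be fed exactly the inventory (and order) it was trained with.
SYMBOLS = ["_", " ", ",", ".", "AH0", "HH", "L", "OW1", "W", "ER1", "D"]
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def intersperse(lst, item):
    """Place `item` before, between, and after every element of `lst`."""
    result = [item] * (len(lst) * 2 + 1)
    result[1::2] = lst
    return result


def phonemes_to_ids(phonemes, add_blank=True):
    """Map phoneme tokens to IDs, failing loudly on symbols the model never saw."""
    unknown = [p for p in phonemes if p not in SYMBOL_TO_ID]
    if unknown:
        raise KeyError(f"symbols not in the training inventory: {unknown}")
    ids = [SYMBOL_TO_ID[p] for p in phonemes]
    if add_blank:
        # The blank ID is placed just past the regular symbols.
        ids = intersperse(ids, len(SYMBOLS))
    return ids


# "hello world" in ARPAbet
print(phonemes_to_ids(["HH", "AH0", "L", "OW1", " ", "W", "ER1", "L", "D"]))
```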
@Zarbuvit I have trained a Glow-TTS model, and it speaks very naturally, but there is a buzzing noise. Do you have any ideas? I tried another model that doesn't have the buzzing noise, but it has no temperature control.
Thank you!
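On temperature: Glow-TTS generates by inverting its decoder from a latent z sampled around the prior that the text encoder predicts. The model's noise_scale argument scales the random part of that sample, roughly z = mean + exp(log_std) * eps * noise_scale. Lower values give flatter, more deterministic output, and 0 returns the prior mean. Below is a stdlib-only sketch of that sampling step. sample_latent is hypothetical, and this thread does not establish whether lowering noise_scale affects the buzzing.

```python
import math
import random


def sample_latent(means, log_stds, noise_scale=1.0, rng=random):
    """Draw z = mean + exp(log_std) * eps * noise_scale for each value.

    noise_scale acts as a sampling temperature: 0.0 returns the means unchanged.
    """
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) * noise_scale
            for m, s in zip(means, log_stds)]


rng = random.Random(0)
means, log_stds = [0.0, 1.0, -0.5], [-1.0, -1.0, -1.0]
for scale in (0.0, 0.333, 0.667):
    print(scale, [round(v, 3) for v in sample_latent(means, log_stds, scale, rng)])
```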
Related Issues
- Runtime Error: Multi speaker
- GPU required or CPU-compatible?
- Different Languages us different amount of GPU memory
- multi speaker
- Output compared to Fastspeech2
- Models for finetuning
- Could not create monotonic_align
- Glowtts melspectrogram to fine tune hifigan
- RuntimeError: CUDA error: invalid device function
- ImportError: /glow-tts/monotonic_align/monotonic_align/core.cpython-38-x86_64-linux-gnu.so: failed to map segment from shared object
- Error using mel generated from glow-tts for hifi-gan training
- Can I apply MAS method to other model ?
- Query : How is the Model training different from the Model training of wave glow
- Multi speaker training error
- With out Training DDI
- An explanation for the source code of finding the alignment path in GlowTTS?
- DDI training compared to not DDI training
- [Question] How many iterations for the available pretrained model?
- [Question] about `intersperse` function.
- [CONTRIBUTION] Speech Dataset Generator