Comments (7)
@mush42 I took a different approach. I searched for the torchaudio.transforms.MelSpectrogram parameters that make the Vocos features match those of Matcha, which are also the same as those of HiFi-GAN.
The differences were in the frequency limits and in the mel scaling that torchaudio uses by default.
mel_spec_transform_mod = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=n_fft,
    hop_length=hop_length,
    n_mels=n_mels,
    center=padding == "center",
    power=1,
    f_min=0,
    f_max=8000,
    norm="slaney",
    mel_scale="slaney",
)
I also updated the feature extraction in the reconstruction loss. You can check the changes in this fork: https://github.com/wetdog/vocos/tree/matcha
The results sound good after 20 epochs with LibriTTS. We'll publish the checkpoints once the training finishes.
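To make the "frequency limits and mel scaling" difference concrete, here is a minimal pure-Python sketch of the two mel scales involved. It assumes the commonly used HTK and Slaney conversion formulas and a 22050 Hz sample rate (so the default upper limit would be sample_rate / 2 = 11025 Hz); the helper names are hypothetical, and this only illustrates where the band edges fall, not the actual torchaudio implementation.

```python
import math

# HTK-style mel scale (the "htk" option).
def hz_to_mel_htk(f_hz: float) -> float:
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz_htk(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Slaney-style mel scale: linear below 1 kHz, logarithmic above.
_F_SP = 200.0 / 3.0                     # Hz per mel in the linear region
_MIN_LOG_HZ = 1000.0                    # start of the logarithmic region
_MIN_LOG_MEL = _MIN_LOG_HZ / _F_SP      # mel value at 1 kHz (about 15)
_LOGSTEP = math.log(6.4) / 27.0         # step size in the log region

def hz_to_mel_slaney(f_hz: float) -> float:
    if f_hz < _MIN_LOG_HZ:
        return f_hz / _F_SP
    return _MIN_LOG_MEL + math.log(f_hz / _MIN_LOG_HZ) / _LOGSTEP

def mel_to_hz_slaney(mel: float) -> float:
    if mel < _MIN_LOG_MEL:
        return mel * _F_SP
    return _MIN_LOG_HZ * math.exp(_LOGSTEP * (mel - _MIN_LOG_MEL))

def band_edges(n_mels: int, f_min: float, f_max: float, scale: str) -> list:
    """Return the n_mels + 2 filter edge frequencies (Hz), evenly spaced in mel."""
    if scale == "htk":
        to_mel, to_hz = hz_to_mel_htk, mel_to_hz_htk
    else:
        to_mel, to_hz = hz_to_mel_slaney, mel_to_hz_slaney
    lo, hi = to_mel(f_min), to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [to_hz(lo + i * step) for i in range(n_mels + 2)]

# Assumed setup: 80 bins at 22050 Hz.
htk_edges = band_edges(80, 0.0, 11025.0, "htk")        # default-like: HTK scale, f_max = sr / 2
slaney_edges = band_edges(80, 0.0, 8000.0, "slaney")   # settings from the comment above

if __name__ == "__main__":
    for i in (1, 40, 80):
        print(f"edge {i}: htk={htk_edges[i]:.1f} Hz, slaney={slaney_edges[i]:.1f} Hz")
```

With different edges for every band, a vocoder trained on one layout sees shifted features when fed the other, which is consistent with the mismatch described above.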
from vocos.
Would you mind helping with this? I don't know where to start.
from vocos.
@mush42 Hey, I'm pissing around with the same thing currently.
I tried synthesising with Vocos as the vocoder for MatchaTTS. Vocos seems to expect 100 mel bins, while Matcha currently outputs spectrograms with 80 bins. I'm not sure of the best way forward: either retrain Matcha with 100 bins, or see whether zero padding could work. Earlier I tried simply zero-padding from 80 to 100 mel bins and synthesising through the Vocos mel head, but the quality wasn't great.
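One likely reason zero padding falls short is that mel bin k does not cover the same frequencies in an 80-bin and a 100-bin filterbank. The sketch below compares filter centre frequencies for the two layouts. It assumes, purely for illustration, a shared 0 to 8000 Hz range and the HTK mel formula (the same misalignment appears with the Slaney scale); `mel_centers` is a hypothetical helper, not a Vocos or Matcha function.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # HTK mel formula, used here only for brevity.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_mels: int, f_min: float = 0.0, f_max: float = 8000.0) -> list:
    """Centre frequency (Hz) of each triangular mel filter, evenly spaced in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + (i + 1) * step) for i in range(n_mels)]

centers_80 = mel_centers(80)
centers_100 = mel_centers(100)

if __name__ == "__main__":
    # The same bin index lands on a different frequency in each layout.
    for k in (0, 39, 79):
        print(f"bin {k}: 80-bin centre={centers_80[k]:.0f} Hz, "
              f"100-bin centre={centers_100[k]:.0f} Hz")
```

Under these assumptions every one of the first 80 bins sits at a lower frequency in the 100-bin layout, so padding 20 empty bins on top leaves the existing 80 bins misaligned with what a 100-bin model was trained on.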
from vocos.
You can check out my fork, which has a config for 22050 Hz Vocos: https://github.com/egorsmkv/vocos
from vocos.
@egorsmkv, even after training the model with the vocos.yaml config from your repo, the issue persists: the output is still robotic and low in volume.
@hubertsiuzdak @alealv, any help or guidance on this would be really appreciated!
from vocos.
How many steps did you train?
from vocos.
15k steps
from vocos.