Comments (5)
Hey, I've updated the Vocos API (#4) to make it easier to integrate with Bark. Take a look at the example notebook.
Hope it helps!
from vocos.
Thank you! For now this is the initial UI, but it will grow from here.
rsxdalv/tts-generation-webui#35
Hey rsxdalv, could you add a training section that lets us train our own Vocos model at a higher sample rate?
It's possible. Do you have a sample of the command, dataset, and config?
For the dataset, I am imagining multiple 10-second audio files. The config I was putting together for 48 kHz is:
# pytorch_lightning==1.8.6
seed_everything: 4444
data:
  class_path: vocos.dataset.VocosDataModule
  init_args:
    train_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.train
      sampling_rate: 48000
      num_samples: 16384
      batch_size: 16
      num_workers: 8
    val_params:
      filelist_path: E:\anaconda3\envs\vocos\TrainFiles\filelist.val
      sampling_rate: 48000
      num_samples: 48384
      batch_size: 16
      num_workers: 8
model:
  class_path: vocos.experiment.VocosExp
  init_args:
    sample_rate: 48000
    initial_learning_rate: 2e-4
    mel_loss_coeff: 45
    mrd_loss_coeff: 0.1
    num_warmup_steps: 0 # Optimizers warmup steps
    pretrain_mel_steps: 0 # 0 means GAN objective from the first iteration
    # automatic evaluation
    evaluate_utmos: true
    evaluate_pesq: true
    evaluate_periodicty: true
    feature_extractor:
      class_path: vocos.feature_extractors.MelSpectrogramFeatures
      init_args:
        sample_rate: 48000
        n_fft: 1024
        hop_length: 256
        n_mels: 100
        padding: center
    backbone:
      class_path: vocos.models.VocosBackbone
      init_args:
        input_channels: 100
        dim: 512
        intermediate_dim: 1536
        num_layers: 8
    head:
      class_path: vocos.heads.ISTFTHead
      init_args:
        dim: 512
        n_fft: 1024
        hop_length: 256
        padding: center
trainer:
  logger:
    class_path: pytorch_lightning.loggers.TensorBoardLogger
    init_args:
      save_dir: logs/
  callbacks:
    - class_path: pytorch_lightning.callbacks.LearningRateMonitor
    - class_path: pytorch_lightning.callbacks.ModelSummary
      init_args:
        max_depth: 2
    - class_path: pytorch_lightning.callbacks.ModelCheckpoint
      init_args:
        monitor: val_loss
        filename: vocos_checkpoint_{epoch}{step}{val_loss:.4f}
        save_top_k: 3
        save_last: true
    - class_path: vocos.helpers.GradNormCallback
  # Lightning calculates max_steps across all optimizer steps (rather than number of batches)
  # This equals to 1M steps per generator and 1M per discriminator
  max_steps: 2000000
  # You might want to limit val batches when evaluating all the metrics, as they are time-consuming
  limit_val_batches: 100
  accelerator: gpu
  strategy: ddp
  devices: [0]
  log_every_n_steps: 100
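The two `filelist_path` entries in the config point at plain-text files. Below is a minimal sketch of how they might be generated from a folder of clips, assuming the filelist format is one audio path per line. The helper name `write_filelists`, the `.wav` extension, and the 5% validation split are illustrative assumptions, not part of Vocos.

```python
import random
from pathlib import Path


def write_filelists(audio_dir, out_dir, val_fraction=0.05, seed=4444):
    """Write filelist.train and filelist.val, one audio path per line.

    Hypothetical helper (not part of Vocos). Assumes the clips are .wav
    files and that a filelist is simply one path per line.
    """
    files = sorted(str(p.resolve()) for p in Path(audio_dir).rglob("*.wav"))
    if len(files) < 2:
        raise ValueError("need at least two .wav files to make a split")

    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(files)

    # Hold out at least one file for validation.
    n_val = max(1, int(len(files) * val_fraction))
    val, train = files[:n_val], files[n_val:]

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "filelist.train").write_text("\n".join(train) + "\n", encoding="utf-8")
    (out / "filelist.val").write_text("\n".join(val) + "\n", encoding="utf-8")
    return len(train), len(val)
```

Calling `write_filelists("path/to/clips", "path/to/TrainFiles")` would then produce the two files that `train_params.filelist_path` and `val_params.filelist_path` refer to; both directory names here are placeholders.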
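A few quantities implied by the numbers in this config can be checked with plain arithmetic. The sketch below uses only values copied from the config; the function and key names are made up for illustration, and the split of `max_steps` assumes two optimizers (generator and discriminator), as the comment in the config describes.

```python
def crop_summary(sample_rate, n_fft, hop_length, num_samples):
    """Derive timing figures from STFT and crop settings (illustrative helper)."""
    return {
        # Length of one training/validation crop in seconds.
        "crop_seconds": num_samples / sample_rate,
        # Number of whole hops that fit into one crop.
        "hops_per_crop": num_samples // hop_length,
        # Feature frames produced per second of audio.
        "frames_per_second": sample_rate / hop_length,
        # Time span covered by one FFT window, in milliseconds.
        "window_ms": 1000 * n_fft / sample_rate,
    }


def steps_per_network(max_steps, num_optimizers=2):
    """max_steps counts every optimizer step, so divide by the optimizer count."""
    return max_steps // num_optimizers


if __name__ == "__main__":
    train = crop_summary(48_000, 1024, 256, 16_384)
    val = crop_summary(48_000, 1024, 256, 48_384)
    print("train crop:", train)
    print("val crop:  ", val)
    print("steps per network:", steps_per_network(2_000_000))
```

With these settings a training crop is roughly a third of a second and one FFT window spans about 21 ms. The same `n_fft` at 24 kHz would span about 43 ms, so whether `n_fft` and `hop_length` should be scaled along with the sample rate is a design choice worth considering.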