Unofficial PyTorch implementation of "Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus". Most of the code is based on VITS.
- MelStyleEncoder from StyleSpeech is used instead of the reference encoder.
- Implementation of untranscribed data training is omitted.
- The LibriTTS dataset (train-clean-100 and train-clean-360) is used. The sampling rate is set to 22050 Hz.
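LibriTTS file names encode speaker, chapter, and utterance IDs ({speaker}_{chapter}_{utterance}_{subutterance}.wav). A small helper like the following (not part of this repo; illustrative only) can recover those fields when building multi-speaker file lists:

```python
# Illustrative helper (not part of this repo): parse the ID fields from a
# LibriTTS file name such as "103_1241_000000_000001.wav".
from pathlib import Path

def parse_libritts_ids(wav_path: str) -> dict:
    """Split a LibriTTS file name into its ID fields."""
    speaker, chapter, utt, sub = Path(wav_path).stem.split("_")
    return {"speaker": speaker, "chapter": chapter,
            "utterance": utt, "subutterance": sub}

print(parse_libritts_ids("train-clean-100/103/1241/103_1241_000000_000001.wav"))
# {'speaker': '103', 'chapter': '1241', 'utterance': '000000', 'subutterance': '000001'}
```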
- TransferTTS (Zero-shot) (master branch)
- [TODO] TransferTTS (Few-shot) (fewshot branch)
- Python >= 3.6
- Clone this repository
- Install Python requirements. Please refer to requirements.txt.
- You may need to install espeak first:
apt-get install espeak
- Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
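For reference, the dynamic program that the Cython extension accelerates can be sketched in plain NumPy. This is illustrative only, not the repo's optimized monotonic_align module; it assumes a [T_text, T_mel] matrix of log-likelihoods and returns the best monotonic hard alignment:

```python
import numpy as np

def monotonic_alignment_search(log_p: np.ndarray) -> np.ndarray:
    """log_p: [T_text, T_mel] log-likelihoods. Returns a 0/1 alignment path
    where each mel frame is assigned to exactly one text token, monotonically."""
    T_text, T_mel = log_p.shape
    # Forward pass: Q[i, j] = best cumulative score ending at token i, frame j.
    Q = np.full((T_text, T_mel), -np.inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, T_mel):
        for i in range(min(j + 1, T_text)):  # token index cannot exceed frame index
            stay = Q[i, j - 1]                          # keep the same token
            move = Q[i - 1, j - 1] if i > 0 else -np.inf  # advance to next token
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the last token at the last frame.
    path = np.zeros((T_text, T_mel), dtype=np.int64)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and (i == j or Q[i - 1, j - 1] >= Q[i, j - 1]):
            i -= 1
    return path
```

The Cython build exists because this O(T_text × T_mel) loop runs once per training batch element and is far too slow in pure Python.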
Run
python prepare_wav.py --data_path [LibriTTS DATAPATH]
to preprocess the wav files.
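LibriTTS is distributed at 24 kHz, so this preparation step presumably resamples each wav to the 22050 Hz expected by the config. A minimal sketch of that resampling, using scipy for illustration (the script's actual backend may differ):

```python
# Illustrative resampling step (assumption: prepare_wav.py does something
# equivalent). Converts a 24 kHz LibriTTS waveform to 22050 Hz.
import numpy as np
from scipy.signal import resample_poly

def resample_to_22050(wav: np.ndarray, orig_sr: int = 24000) -> np.ndarray:
    """Resample a mono waveform to 22050 Hz via polyphase filtering."""
    if orig_sr == 22050:
        return wav
    return resample_poly(wav, 22050, orig_sr)

# One second of 24 kHz audio becomes 22050 samples.
one_second = np.zeros(24000, dtype=np.float32)
print(resample_to_22050(one_second).shape)  # (22050,)
```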
Train your model with
python train_ms.py -c configs/libritts.json -m libritts_base
Run inference with
python inference.py --ref_audio [REF AUDIO PATH] --text [INPUT TEXT]