This is a submission to the ZeroSpeech 2020 challenge.
This work has been accepted by Interspeech 2020; the paper can be found here
This work is based on Chorowski's WaveNet autoencoder model and WaveNet vocoder implementation.
This work consists of two models:
- WaveNet autoencoder + Instance Normalization (IN-WAE)
- WaveNet autoencoder + Sliced Vector Quantization (SVQ-WAE)
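To make the two latent-space operations named above concrete, here is a minimal NumPy sketch (not the repo's PyTorch implementation; function names, shapes, and the codebook layout are hypothetical). Instance normalization standardizes each channel of an utterance over time, and sliced vector quantization splits the latent vector into slices, each quantized against its own codebook:

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization (sketch): normalize each channel of a
    single utterance over its time axis. x has shape (channels, time)."""
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

def sliced_vq(z, codebooks):
    """Sliced vector quantization (sketch): split latent vector z into
    equal slices and replace each slice with its nearest codeword from
    that slice's own codebook.
    codebooks has shape (num_slices, codebook_size, slice_dim)."""
    num_slices, _, slice_dim = codebooks.shape
    slices = z.reshape(num_slices, slice_dim)
    quantized = []
    for s, cb in zip(slices, codebooks):
        idx = np.argmin(np.linalg.norm(cb - s, axis=1))  # nearest codeword
        quantized.append(cb[idx])
    return np.concatenate(quantized)
```

In the actual models these operations are applied inside the WaveNet autoencoder bottleneck and trained end-to-end; the sketch only shows the forward computation.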
- Python 3.6
- PyTorch 0.4.1
- tensorboardX
- challenge evaluation scripts repo
- librosa
- scipy
bash ./run_bash/download_dataset.sh
Unzipping the dataset requires 7z (>16.04) and a password.
bash ./run_bash/run_pre.sh 2020/2019  # pass 2020 or 2019 to select the challenge edition
- train the SVQ-WAE model:
bash ./run_bash/run_wv_vqvae_train.sh exp_name hps language
e.g.
bash ./run_bash/run_wv_vqvae_train.sh exp_name hps/wv_vqvae_hp.json english
- train the IN-WAE model:
bash ./run_bash/run_inae_train.sh exp_name hps language
e.g.
bash ./run_bash/run_inae_train.sh exp_name hps/inae_hp.json english