SkipVQVC

Implementation of SkipVQVC with variant settings. Skip connections are a powerful technique in deep learning. However, in autoencoder-based voice conversion (VC), skip connections are rarely used. They let the model learn too quickly and overfit to reconstruction, and such a model can no longer perform VC. In this paper, we discuss how quantization can form a bottleneck strong enough that VC with skip connections becomes possible.
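To make the idea of a quantization bottleneck concrete, here is a minimal sketch of nearest-neighbour vector quantization in plain Python. It is not the repository's `vq_model()`; the function names, the tiny codebook, and the frame values are all made up for illustration. The point is only that, however varied the input frames are, the output can contain nothing but codebook vectors.

```python
# Illustrative sketch only: not the repository's vq_model().
# The codebook and frame values below are made-up examples.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Replace every frame by its nearest codebook vector.

    Returns (indices, quantized): indices[i] is the position of the code
    chosen for frames[i], and quantized[i] is a copy of that code.
    """
    indices = []
    quantized = []
    for frame in frames:
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]   # 3 codes of dimension 2
frames = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7], [1.1, 0.8]]

indices, quantized = quantize(frames, codebook)
print(indices)     # [1, 0, 2, 1]
print(quantized)   # only codebook vectors survive the bottleneck
```

With a small codebook (the `-n` option in the training config), fewer distinct vectors can pass through, which is the sense in which quantization limits what the decoder can copy from the input.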

Preprocessing

python preprocessing.py [input_dir (VCTK/wav48)] [output_dir npy dir]

File architecture

# File 
- SkipVQVC
  |- logger (some utils used in tensorboard)
  |  |.
  |
  |- trainer (different trainers have different properties)
  |  |- train_normal.py
  |  |- train_rhythm.py (splits speech into a rhythm factor, should be used with the vqvc+_rhythm model)
  |  |- train_mean_std.py (train with input normalized by mean and std)
  |
  |- model (different models, e.g. normal, speaker VAE, rhythm)
  | |- .
  | |- .
  |
  |- utils
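
The tree above notes that train_mean_std.py trains with input normalized by mean and std. As a rough illustration of that kind of normalization (not the repository's code), the sketch below normalizes each feature dimension of one input to zero mean and unit standard deviation. The function name `normalize_mean_std`, the `eps` term, and the choice to compute statistics per input rather than over the whole dataset are assumptions.

```python
from statistics import mean, pstdev


def normalize_mean_std(frames, eps=1e-8):
    """Normalize each feature dimension to zero mean and unit std.

    frames: list of frames, each a list of floats of the same length.
    Statistics are computed over the frames of this one input (an assumption).
    """
    dims = list(zip(*frames))            # one tuple per feature dimension
    means = [mean(d) for d in dims]
    stds = [pstdev(d) for d in dims]     # population standard deviation
    normalized = [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]
    return normalized, means, stds


# Made-up input: 3 frames with 2 feature dimensions each.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized, means, stds = normalize_mean_std(frames)
print(means)        # per-dimension means
print(normalized)   # each column now has mean close to 0
```

Keeping `means` and `stds` around is useful if the normalized output later has to be mapped back to the original scale.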

Training config

  • -train_dir: your training directory
  • -test_dir: your testing directory (unseen speakers)
  • -m: the model to use from model/* (for example: vqvc+)
  • -n: number of vectors in the codebook
  • -ch: number of channels in the encoder and decoder
  • -t: the trainer to use from trainer/* (for example: train_normal)
  • --load_checkpoint: whether to load an existing checkpoint from the checkpoint directory (for example: True)
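
For orientation, the options above could be declared with `argparse` roughly as follows. This is a sketch, not the actual parser in train.py: the default values, the help strings, and the `str_to_bool` helper are assumptions, and only the flag names come from the list above.

```python
import argparse


def str_to_bool(value):
    """Interpret strings such as 'True' or 'False' (assumed value format)."""
    return value.strip().lower() in ("true", "1", "yes")


def build_parser():
    """Hypothetical parser mirroring the documented flags; defaults are guesses."""
    parser = argparse.ArgumentParser(description="SkipVQVC training options (sketch)")
    parser.add_argument("-train_dir", required=True, help="training directory")
    parser.add_argument("-test_dir", default=None,
                        help="testing directory (unseen speakers)")
    parser.add_argument("-m", default="vqvc+", help="model name in model/*")
    parser.add_argument("-n", type=int, default=64,
                        help="number of vectors in the codebook")
    parser.add_argument("-ch", type=int, default=64,
                        help="channels in encoder and decoder")
    parser.add_argument("-t", default="train_normal", help="trainer name in trainer/*")
    parser.add_argument("--load_checkpoint", type=str_to_bool, default=False,
                        help="load a matching checkpoint if one exists")
    return parser


# Parse the same arguments as the example command in this README.
args = build_parser().parse_args([
    "-train_dir", "/homes/aa/mel/mel.melgan",
    "-m", "vqvc+", "-n", "128", "-ch", "128", "-t", "train_normal",
])
print(args.m, args.n, args.ch, args.t, args.load_checkpoint)
```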

The checkpoint and output directories are generated automatically from your model, trainer, n_embed (codebook size), and channel settings. When loading a checkpoint, the files that match the current settings are loaded automatically.

Example

python train.py -train_dir /homes/aa/mel/mel.melgan -m vqvc+ -n 128 -ch 128 -t train_normal
--> "Saving model and optimizer state at iteration 0 to checkpoint/vqvc+_n128_ch128_train_normal/gen"
--> "Saving model and optimizer state at iteration 100 to checkpoint/vqvc+_n128_ch128_train_normal/gen"
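
The directory names in these log lines, and in the tensorboard command in this README, appear to follow the pattern `{model}_n{n_embed}_ch{channel}_{trainer}`. The sketch below reproduces that pattern; the helper names `run_name` and `run_dirs` are hypothetical, and the real code may build the paths differently.

```python
def run_name(model, n_embed, channel, trainer):
    """Build the run name seen in the example logs (inferred pattern)."""
    return f"{model}_n{n_embed}_ch{channel}_{trainer}"


def run_dirs(model, n_embed, channel, trainer):
    """Return the (checkpoint_dir, output_dir) pair for one configuration."""
    name = run_name(model, n_embed, channel, trainer)
    return f"checkpoint/{name}", f"output/{name}"


checkpoint_dir, output_dir = run_dirs("vqvc+", 128, 128, "train_normal")
print(checkpoint_dir)   # checkpoint/vqvc+_n128_ch128_train_normal
print(output_dir)       # output/vqvc+_n128_ch128_train_normal
```

Because the name encodes every setting, two runs with different `-n` or `-ch` values never share a checkpoint directory.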

Tensorboard

tensorboard --logdir output/vqvc+_n128_ch128_train_normal

The whole model is still under investigation to find the best parameters.

# If you want to reproduce the results in the paper.
python train.py -train_dir your-path-to-npy-dir -m vqvc+ -n 64 -ch 64 -t train_normal

# If you want to train with rhythm information (to adjust rhythm)
python train.py -train_dir your-path-to-npy-dir -m vqvc+_rhythm -n 128 -ch 128 -t train_rhythm

# If normal training does not work well for one-shot conversion, you can train the resample model.
# It resamples the quantized code, which removes more speaker information from the content.

python train.py -train_dir your-path-to-npy-dir -m vqvc+_resample -n 512 -ch 512 -t train_normal

# We find that normalization in the embedding space improves the result; you can try this
python train.py -train_dir your-path-to-npy-dir -m vqvc+ -n 64 -ch 512 -t train_simple_normalize


# Still under investigation: speaker quantize <--> cav on speaker embedding

Some details

All models are wrapped by vq_model(); details can be found in model/vqvc*
All trainers are wrapped by train_(); details can be found in trainer/train*
