awesome-voice-cloning

What is this?

A place for all things voice cloning. Make a PR!

Tacotron 2

CookiePPP Tacotron 2 Colabs

This is the main Synthesis Colab

This is the simplified Synthesis Colab

This is supposedly a newer version of the simplified Synthesis Colab

For the sake of completeness, this is the training colab

It's worth noting that the CookiePPP training Colab has (what I believe is) a major improvement over mine: an integrated grapheme-to-phoneme system, so that the model can learn from phonemes instead of stupid nonstandard English spellings. I believe this will only work with English transcripts.

Scripp's Training Colabs

Another link: this is my fully functional Colab notebook for Tacotron 2 training and synthesis, with explanatory notes. No hardware required: it trains your model on Google's free GPUs and saves the output to your Google Drive. The most complicated part is prepping your dataset before upload. It is currently set up to train from the LJSpeech-trained model, on 22050 Hz WAV files with 16-bit PCM encoding. (See the dataset section for help with this.)

Training

You can use this TensorBoard in parallel with the Tacotron2 for Dummies notebook to check the progress of your model. You will have to use "Factory Reset Runtime" every time you want to update the TensorBoard view. This is a GREAT way to visualize what's going on with your model, and much more useful than the alignment charts that the training Colab spits out.

Tensorboard

Converting graphemes to phonemes

Below is a hastily coded Python script that converts graphemes to phonemes in files already prepped for Tacotron 2 training. It takes each line of the form <filename.wav|transcription> and converts the transcription segment into IPA characters. This means the model shouldn't get confused by words that don't sound the way they are written, and in general it should learn better.

Script in Colab Form
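As a rough illustration of the line-by-line conversion described above (not the linked script itself), here is a minimal Python sketch. The names `TOY_LEXICON`, `toy_g2p`, `convert_filelist_line`, and `convert_filelist` are hypothetical, and the three-word lexicon is only a stand-in: a real run would plug in an actual grapheme-to-phoneme converter in its place.

```python
from typing import Callable, Dict

# Hypothetical stand-in lexicon for illustration only. A real conversion
# would use a proper G2P tool or pronunciation dictionary here.
TOY_LEXICON: Dict[str, str] = {
    "though": "ðoʊ",
    "tough": "tʌf",
    "through": "θɹu",
}


def toy_g2p(text: str) -> str:
    """Replace each whitespace-separated word found in TOY_LEXICON.

    Unknown words (and words with attached punctuation) pass through unchanged.
    """
    return " ".join(TOY_LEXICON.get(word.lower(), word) for word in text.split())


def convert_filelist_line(line: str, g2p: Callable[[str], str] = toy_g2p) -> str:
    """Convert the transcription half of one 'filename.wav|transcription' line."""
    stripped = line.rstrip("\n")
    path, sep, transcription = stripped.partition("|")
    if not sep:
        # No '|' separator: leave the line untouched rather than guess.
        return stripped
    return f"{path}|{g2p(transcription)}"


def convert_filelist(src: str, dst: str, g2p: Callable[[str], str] = toy_g2p) -> None:
    """Read a filelist from src and write the converted lines to dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():  # skip blank lines
                fout.write(convert_filelist_line(line, g2p) + "\n")
```

The file path is left alone and only the text after the first `|` is passed to the converter, so the output stays in the same filelist format the training notebook expects.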

Waveglow

On training Waveglow - Scripp

Dataset Resources

Tools

Noice's Watson Speech To Text Tool

ASSFAP

Scripp's Guide

Use ffmpeg to convert your wav files to the right format:

ffmpeg -y -i "$filename" -ac 1 -acodec pcm_s16le -ar 22050 -sample_fmt s16 "converted/$filename"

Or, on a whole directory:

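#!/bin/bash

# Create the output directory if it does not exist yet.
mkdir -p converted

# Convert every WAV file to mono, 16-bit PCM, 22050 Hz.
for filename in *.wav; do
    echo "Converting $filename"
    ffmpeg -y -i "$filename" -ac 1 -acodec pcm_s16le -ar 22050 -sample_fmt s16 "converted/$filename"
done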
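To double-check that converted files really are mono, 16-bit, 22050 Hz before uploading, something like the following sketch may help. It uses only Python's standard `wave` module; the function name `check_wav` and its default parameters are assumptions chosen to match the ffmpeg settings above, not part of any of the linked notebooks.

```python
import wave


def check_wav(path, rate=22050, channels=1, sample_width=2):
    """Return a list of format problems for a WAV file (empty list = looks fine).

    sample_width is in bytes, so 2 means 16-bit samples.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate is {wav.getframerate()}, expected {rate}")
        if wav.getnchannels() != channels:
            problems.append(f"channel count is {wav.getnchannels()}, expected {channels}")
        if wav.getsampwidth() != sample_width:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected {sample_width}")
    return problems
```

Calling `check_wav("converted/example.wav")` (a placeholder path) returns an empty list when the file matches. Note that the `wave` module raises `wave.Error` on files it cannot parse, which is itself a hint that the file needs converting.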

Datasets

Kanye West

LJSpeech Dataset: Old Reliable

Common Voice: Broad voice dataset sample with demographic metadata. Includes valid-invalid identifier as an indication of transcript quality.

VoxCeleb: 2000+ hours of celebrity utterances from 7000+ speakers. Audio is captured "in the wild," including background noise.

TED-LIUM: 452 hours of audio and aligned transcripts from TED talks.

LibriSpeech: 1000+ hour dataset of read English speech based on public domain audiobooks.

Creating a Dataset

Some Scripts for recording voices

SRT Splitting

AutoSub

Cam's Workflow

Glossary of Terms

Glossary

Contributors

noicevice, scripples
