
voice_cloner

A guide to cloning anyone's voice and using it for text-to-speech on Android

Table of Contents

  1. Introduction
  2. Getting Started
  3. Training
  4. Testing
  5. Creating the Android app
  6. Usage
  7. Notes


Introduction

This is a fun little project I made out of boredom. After seeing Kyubyong's text-to-speech model, I decided to create an Android application that reads what I write in my own voice. If you copy the code and follow my steps, you'll be able to do the same.

Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

This is the most important part. If you want this to work, make sure you have the following installed. Since the model uses old versions of Python and TensorFlow, I suggest creating a virtual environment and installing everything there.

  • Python 3.6
  • TensorFlow 1.15.0
  • librosa
  • tqdm
  • matplotlib
  • scipy
  • Android Studio
  • An Android phone (?)
  • Some time to lose
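A quick way to confirm the environment before going further is a small check script. This is only a sketch: the function names are made up for illustration, and it assumes each package in the list is importable under the lowercase name shown (for example `tensorflow`). It checks that the interpreter is Python 3.6 and reports which packages cannot be found, without importing them.

```python
import importlib.util
import sys

# Packages from the prerequisites list, keyed by their (assumed) import names.
REQUIRED_MODULES = ["tensorflow", "librosa", "tqdm", "matplotlib", "scipy"]


def python_version_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.6.x."""
    return tuple(version_info[:2]) == (3, 6)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: expected Python 3.6, found", sys.version.split()[0])
    for name in missing_modules():
        print("Missing package:", name)
```

Run it inside the virtual environment. If it prints nothing, the interpreter version and the Python packages are in place. It does not check the TensorFlow version, Android Studio or the phone.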

Data preparation

To clone your voice you need around 200 samples of it, each between 2 and 10 seconds long. Thanks to transfer learning, this means you can clone anyone's voice with only 15-20 minutes of audio.
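As a rough check on those numbers, 200 clips averaging 5 to 6 seconds come to about 17 to 20 minutes. The sketch below measures this for a set of recordings. It is not part of the project: the function names are hypothetical, and it assumes the samples are uncompressed PCM WAV files, which is what Python's standard `wave` module can read.

```python
import wave


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def summarize(paths, shortest=2.0, longest=10.0):
    """Return (total_minutes, out_of_range_paths) for the given clips."""
    total = 0.0
    out_of_range = []
    for path in paths:
        seconds = clip_seconds(path)
        total += seconds
        # Flag clips outside the 2-10 second range suggested above.
        if not shortest <= seconds <= longest:
            out_of_range.append(path)
    return total / 60.0, out_of_range
```

Calling `summarize` with the list of your sample paths returns the total length in minutes, plus the clips you may want to re-record or trim.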

  1. First, download the pretrained model if you want to make an English voice. Otherwise, find an online text-to-speech dataset in the desired language and train the model from scratch. For example, I made an Italian version of my voice, starting from this dataset. Here you can download the Italian pretrained model I generated. Make sure to put the pretrained model inside the 'logdir' directory.
  2. Inside LJSpeech-1.1, edit the transcript.csv file to match your audio samples. Each line must have this format: <audioFileName|original sentence|normalized sentence>, where the audio file name has no extension and the normalized sentence spells out numbers as words. Take a look at the original transcript.csv and you'll understand it easily. Then copy your audio samples into the wavs folder. To make the data generation process less painful, I suggest writing the transcript file first, then recording the sentences using record.py.
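Mistakes in the transcript are easy to make, so it can help to validate it before preprocessing. The following is a minimal sketch, not part of the repository: `check_transcript` is a hypothetical helper, and it assumes the samples in the wavs folder carry a `.wav` extension. It checks that every line has exactly three `|`-separated fields, that the file name is written without an extension, and that the matching audio file exists.

```python
import os


def check_transcript(transcript_path, wavs_dir):
    """Return a list of human-readable problems found in the transcript."""
    problems = []
    with open(transcript_path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split("|")
            if len(fields) != 3:
                problems.append(f"line {number}: expected 3 fields, got {len(fields)}")
                continue
            name = fields[0].strip()
            if name.lower().endswith(".wav"):
                problems.append(f"line {number}: drop the extension from '{name}'")
                continue
            # Assumption: each sample is stored as <name>.wav in the wavs folder.
            if not os.path.isfile(os.path.join(wavs_dir, name + ".wav")):
                problems.append(f"line {number}: missing audio file {name}.wav")
    return problems
```

An empty list means the transcript and the wavs folder agree. The check does not verify that the normalized sentence really spells out its numbers.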

Training

If you want to understand how the model works, you should read this paper. Otherwise, treat it as a black box and mechanically follow my steps.

  1. Edit hyperparams.py and make sure that prepro is set to True. Also, edit the data path to match the correct location on your local PC. Set the batch size to 16 or 32 depending on your RAM. You can also tune max_N and max_T.
  2. Run prepo.py only once. After this step you should see two new folders, 'mels' and 'mags'. If you change the dataset, delete mels and mags and run prepo.py again.
  3. Run 'python train.py 1'. The number of steps needed varies from voice to voice, but the result is usually decent after 10k steps.
  4. Run 'python train.py 2'. Train it for at least 2k steps, otherwise the voice will not sound human.

Testing

Open harvard_sentences.txt and edit the lines as you like. Then run 'python synthesize.py'. If everything works, a 'samples' directory should appear.

Creating the Android app

As you can see, generating sentences this way is not very convenient, so I decided to make the process more user-friendly. The Android app is basically just a wrapper that lets you generate audio clips, save them locally on the phone and share them. When you write something and press the play button in the app, the message is sent to server.py, which launches synthesize.py and then sends the audio back to the Android application.

If you want to use the application outside your local network, set up port forwarding for the port defined in server.py. The default port is '1234'. You can change it if you want, but remember to change the port in MainActivity.java as well. You also have to set your IP address in the same file.

By default the model only synthesizes sentences shorter than 10 seconds. In server.py I worked around this limit by splitting the input message into short sentences, running the synthesis on each one and merging the resulting audio clips.
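The split-and-merge workaround can be sketched roughly as follows. This is not the code in server.py: `split_message` and `merge_wavs` are hypothetical helpers, the 100-character limit is an assumed stand-in for "shorter than 10 seconds", and the merge assumes every synthesized clip is a PCM WAV file with the same channel count, sample width and sample rate.

```python
import re
import wave


def split_message(message, max_chars=100):
    """Split text into sentence-sized chunks of at most max_chars characters."""
    # Break after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", message.strip())
    chunks = []
    for sentence in sentences:
        current = ""
        # Long sentences are broken further at word boundaries.
        for word in sentence.split():
            candidate = word if not current else current + " " + word
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = word
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks


def merge_wavs(paths, output_path):
    """Concatenate WAV files that share the same audio format."""
    params = None
    frames = []
    for path in paths:
        with wave.open(path, "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path} has a different audio format")
            frames.append(clip.readframes(clip.getnframes()))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(output_path, "wb") as merged:
        merged.setnchannels(params[0])
        merged.setsampwidth(params[1])
        merged.setframerate(params[2])
        merged.writeframes(b"".join(frames))
```

In this sketch each chunk returned by `split_message` would be synthesized separately, and the resulting files passed in order to `merge_wavs` to produce the single clip sent back to the phone.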

Usage

  1. Import the Android_App folder into Android Studio and edit the IP address in MainActivity.java to match your own.
  2. Run 'python server.py' on your local PC, then leave it running for as long as you need.

Notes

  • If something is not clear or you bump into some weird error, don't be afraid to ask.
  • This is my first Android project, and I had no prior experience in mobile development. The code is probably not optimal, but it works.
  • The application runs both an Italian and an English model because I have cloned my voice in both languages. I think the code still works with a single model without any tweaks, though.

Contributors

kyubyong, simsax, totsui, w19787
