mtts's Introduction

A Demo of MTTS Mandarin/Chinese Text to Speech FrontEnd

Mandarin/Chinese text to speech based on statistical parametric speech synthesis, using the Merlin toolkit. This is a demo of a Chinese speech synthesis front end for Merlin; there are no plans to implement a complete front end.

This is only a demo of a Mandarin front end. It lacks some components, such as text normalization and prosody prediction, and the phone set and question set used by this project have not been fully tested yet.

Draft documentation (written in Mandarin)

Data

There is no open-source Mandarin speech synthesis dataset on the internet, so this project uses the thchs30 dataset to demonstrate speech synthesis.

Generated Samples

Listen to https://jackiexiao.github.io/MTTS/

How To Reproduce

  1. First, prepare data containing wav files and the matching txt transcripts (prosody marks are optional).
  2. Second, generate HTS labels using this project.
  3. Then use merlin/egs/mandarin_voice to train and generate a Mandarin voice.

Context related annotation & Question Set

Install

Python: 3.6
System: Linux (tested on Ubuntu 16.04)

pip install jieba pypinyin
sudo apt-get install libatlas3-base

Run bash tools/install_mtts.sh
Or download the required files yourself

Run Demo

bash run_demo.sh

Usage

1. Generate HTS Label by wav and text

  • Usage: Run python src/mtts.py txtfile wav_directory_path output_directory_path (paths may be absolute or relative) to get the HTS labels. If you have your own acoustic model trained with Montreal-Forced-Aligner, add -a your_acoustic_model.zip; otherwise, this project uses the thchs30.zip acoustic model by default.
  • Attention: Currently only Chinese characters are supported. The txt file must not contain any Arabic numerals or English letters.

txtfile example

A_01 这是一段文本
A_02 这是第二段文本
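
Because the input must not contain Arabic numerals or English letters, it can help to check a txtfile before generating labels. The following is a minimal sketch, not part of this project: find_invalid_lines is a hypothetical helper, and it assumes each line is an utterance ID, whitespace, then the text, and that prosody marks of the form #0 to #4 are allowed in the text.

```python
import re

# Characters the README says are not supported in the text
_UNSUPPORTED = re.compile(r"[0-9A-Za-z]")
# Optional prosody marks (#0 to #4) are removed before checking
_PROSODY_MARK = re.compile(r"#[0-4]")


def find_invalid_lines(lines):
    """Return (utterance_id, offending_chars) for lines with digits or letters."""
    problems = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        utt_id = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        text = _PROSODY_MARK.sub("", text)
        bad = _UNSUPPORTED.findall(text)
        if bad:
            problems.append((utt_id, "".join(bad)))
    return problems
```

For example, passing the lines of a txtfile (such as open(path, encoding='utf-8').read().splitlines()) returns an empty list when every line is acceptable under these assumptions.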

wav_directory example (the sampling rate should be higher than 16 kHz)

A_01.wav  
A_02.wav  
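
To check the sampling rate of a wav directory before running the front end, something like the sketch below may be useful. It is an illustration only: low_sample_rate_wavs is a hypothetical helper, the 16000 Hz default is an assumed reading of the requirement above as a minimum, and Python's standard wave module only reads uncompressed PCM wav files.

```python
import wave
from pathlib import Path


def low_sample_rate_wavs(wav_dir, min_rate=16000):
    """Return {filename: rate} for .wav files whose sampling rate is below min_rate."""
    too_low = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate < min_rate:
            too_low[path.name] = rate
    return too_low
```

An empty result means every wav file in the directory meets the assumed threshold.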

2. Generate HTS Label by text with or without alignment file

  • Usage: Run python src/mandarin_frontend.py txtfile output_directory_path
  • Or import mandarin_frontend:
from mandarin_frontend import txt2label

result = txt2label('向香港特别行政区同胞澳门和**同胞海外侨胞')
for line in result:
    print(line)

# With prosody marks and an alignment file (sfs file):
# result = txt2label('向#1香港#2特别#1行政区#1同胞#4澳门#2和#1**#1同胞#4海外#1侨胞',
#                    sfsfile='example_file/example.sfs')

See the source code for more information. Pay attention to the format of the alignment file (sfs file): each line is end_time phone_type, not start_time phone_type (which differs from Speech Ocean's data).
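
To illustrate the end-time convention, the sketch below converts end-time lines into (start, end, phone) segments. It is not this project's parser: read_sfs_lines is a hypothetical helper, and it assumes the two fields are separated by whitespace, that times are plain numbers, and that the first segment starts at time 0.

```python
def read_sfs_lines(lines):
    """Turn 'end_time phone' lines into (start, end, phone) tuples."""
    segments = []
    start = 0.0  # assumption: the first segment starts at time 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        end, phone = float(fields[0]), fields[1]
        segments.append((start, end, phone))
        start = end  # the next segment begins where this one ends
    return segments
```

Each segment's start time is recovered from the previous line's end time, which is why a file that stores start times instead would be misread.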

3. Forced-alignment

This project uses Montreal-Forced-Aligner for forced alignment.

  1. We trained the acoustic model on the thchs30 dataset (see misc/thchs30.zip); the dictionary we use is mandarin_mtts.lexicon. If you use a larger dataset than thchs30, you may get better alignment.
  2. If you want to use MFA's (Montreal-Forced-Aligner) pre-trained Mandarin model, the dictionary you need is mandarin-for-montreal-forced-aligner-pre-trained-model.lexicon.

Prosody Mark

You can generate HTS labels without prosody marks. We assume that a word segment is smaller than a prosodic word (this is adjusted in the code).

"#0", "#1", "#2", "#3" and "#4" are the prosody labeling symbols.

  • #0 stands for word segment
  • #1 stands for prosodic word
  • #2 stands for stressed word (in this project it is actually treated as #1)
  • #3 stands for prosodic phrase
  • #4 stands for intonational phrase
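
As a rough illustration of how marked text breaks down, the sketch below splits a string into (word, level) pairs and folds #2 into #1 as described above. It is not this project's implementation: split_prosody is a hypothetical helper, and giving an unmarked final word level 4 is an assumption.

```python
import re

_MARK = re.compile(r"#([0-4])")


def split_prosody(text, default_level=4):
    """Split marked text into (word, level) pairs; #2 is folded into #1."""
    # With one capture group, re.split yields [word, level, word, level, ..., last_word]
    pieces = _MARK.split(text)
    pairs = []
    for i in range(0, len(pieces), 2):
        word = pieces[i]
        if not word:
            continue  # empty piece, e.g. when the text ends with a mark
        level = int(pieces[i + 1]) if i + 1 < len(pieces) else default_level
        if level == 2:
            level = 1  # #2 is treated as #1 in this project
        pairs.append((word, level))
    return pairs
```

For instance, split_prosody('向#1香港#2特别') gives the pairs ('向', 1), ('香港', 1) and ('特别', 4), the last level coming from the assumed default.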

Future improvements

  • Text normalization
  • Better Chinese word segmentation
  • G2P: polyphone problem
  • Better label format and question set
  • Improved prosody analysis
  • Better alignment

Contributors

  • Jackiexiao
  • willian56
