tts_data_tools

Data processing tools for preparing speech and labels for training TTS voices

Installation

pip install git+https://github.com/zackhodari/tts_data_tools

Usage

Batch processing of datasets

tdt_process_dataset \
    --lab_dir DIR \
    --wav_dir DIR \
    --id_list FILE \
    --out_dir DIR \
    --question_file FILE \
    [--state_level | --no-state_level] \
    [--upsample_to_frame_level] \
    [--trim_silences] \
    [--subphone_feat_type STR] \
    [--calculate_normalisation] \
    [--normalisation_of_deltas]

Other batch processing scripts are defined in setup.py. Additional example scripts are given in scripts/; these can be used from the command line or as templates for your own pre-processing.

Label creation

lab_gen provides four tools that take text and convert it into numerical label features ready for training.

txt_to_utt.py and utt_to_lab.py wrap Festival (to be installed separately), and can extract full-context HTS-style labels.

align_lab.py wraps HTK (to be installed separately) and performs forced alignment using the waveform.

If you already have forced-aligned labels, you can use lab_to_feat.py to convert the text-formatted features to numpy arrays (saved as .npy):

tdt_lab_to_feat \
    --lab_dir DIR \
    --id_list FILE \
    --out_dir DIR \
    --question_file FILE \
    [--state_level | --no-state_level] \
    [--upsample_to_frame_level] \
    [--subphone_feat_type STR] \
    [--calculate_normalisation]
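
For orientation, forced-aligned HTS-style label files conventionally hold one segment per line as `start end full_context_label`, with times in units of 100 ns. The sketch below is not the lab_to_feat.py implementation; it only illustrates, under stated assumptions, what upsampling to frame level can mean: each segment-level feature row is repeated once per frame of that segment. The 5 ms frame shift, the helper names `parse_durations` and `upsample_to_frames`, and the toy labels and features are all hypothetical.

```python
import numpy as np

FRAME_SHIFT_100NS = 50_000  # assumed 5 ms frame shift, in HTS units of 100 ns


def parse_durations(label_lines, frame_shift=FRAME_SHIFT_100NS):
    """Return (labels, per-segment frame counts) from 'start end label' lines."""
    labels, n_frames = [], []
    for line in label_lines:
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split(maxsplit=2)
        labels.append(label)
        n_frames.append((int(end) - int(start)) // frame_shift)
    return labels, np.array(n_frames, dtype=np.int64)


def upsample_to_frames(segment_feats, n_frames):
    """Repeat each segment-level feature row once per frame of its segment."""
    return np.repeat(segment_feats, n_frames, axis=0)


# Toy example (hypothetical labels): segments of 50 ms, 100 ms and 25 ms.
toy_lab = [
    "0 500000 x^x-pau+h=e",
    "500000 1500000 x^pau-h+e=l",
    "1500000 1750000 pau^h-e+l=ou",
]
labels, durations = parse_durations(toy_lab)

# Placeholder features, one row per segment (real features would come from
# applying the question file to each label).
segment_feats = np.eye(3, dtype=np.float32)
frame_feats = upsample_to_frames(segment_feats, durations)
print(durations, frame_feats.shape)
```

With a different frame shift the frame counts change, so the value should match whatever the acoustic features were extracted with.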

Acoustic feature extraction

wav_gen provides a few common vocoders that can be used to extract typical TTS acoustic features. This includes F0 extraction with REAPER and full vocoder feature extraction with WORLD, which can be used as follows.

tdt_world \
    --wav_dir DIR \
    --id_list FILE \
    --out_dir DIR \
    [--calculate_normalisation] \
    [--normalisation_of_deltas]
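
The two normalisation flags are not described further here. One common reading in TTS pipelines, assumed for this sketch, is that per-dimension mean and standard deviation are accumulated over all utterances, optionally over the features with delta (first-difference) features appended. The code below is an illustration of that idea only, not the tdt_world implementation; the function names and the random toy data are hypothetical.

```python
import numpy as np


def add_deltas(feats):
    """Append first-order deltas to a [frames, dims] array.

    np.gradient uses central differences in the interior and one-sided
    differences at the first and last frame.
    """
    deltas = np.gradient(feats, axis=0)
    return np.concatenate([feats, deltas], axis=1)


def mean_std_over_utterances(utterances):
    """Accumulate per-dimension mean and std over variable-length utterances."""
    total_frames = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for feats in utterances:
        total_frames += feats.shape[0]
        sum_x = sum_x + feats.sum(axis=0)
        sum_x2 = sum_x2 + (feats ** 2).sum(axis=0)
    mean = sum_x / total_frames
    # Clip tiny negative values caused by floating-point rounding.
    variance = np.maximum(sum_x2 / total_frames - mean ** 2, 0.0)
    return mean, np.sqrt(variance)


# Toy data standing in for extracted acoustic features of two utterances.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(n, 2)) for n in (30, 45)]
with_deltas = [add_deltas(u) for u in utterances]

mean, std = mean_std_over_utterances(with_deltas)
normalised = [(u - mean) / std for u in with_deltas]
print(mean.shape, std.shape)
```

Accumulating sums rather than concatenating all utterances keeps memory use flat for large datasets, which is why the statistics are computed in a streaming fashion here.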

tts_data_tools's People

Contributors

zackhodari


tts_data_tools's Issues

run utt_to_lab.py

Excuse me, when I run utt_to_lab.py the first run is successful, but the second one fails with the following error:

File "/home/桌面/2019tts/tts_data_tools-master/tts_data_tools/lab_gen/utt_to_lab.py", line 97, in utts_to_dumps
    *glob.glob('label_POS_train/label_phone_align/dump/*')], check=True)
File "/home/anaconda3/envs/pytorch/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sed', '-i', '-e', 's/#/pau/g']' returned non-zero exit status 4.
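
One observation from the traceback: the failing command lists no file names after the sed expression, which suggests the glob matched nothing on the second run (for example because the dump directory was empty, or the script was started from a different working directory). This is a guess from the error text, not a confirmed diagnosis. The sketch below shows one way to fail early with a clearer message; `build_sed_command` is a hypothetical helper, not a function in tts_data_tools, and the file name `utt1.dump` is made up.

```python
import glob
import os
import tempfile


def build_sed_command(dump_dir):
    """Build the sed call, refusing to continue when no dump files exist."""
    files = sorted(glob.glob(os.path.join(dump_dir, '*')))
    if not files:
        raise FileNotFoundError(f'No dump files found in {dump_dir!r}')
    return ['sed', '-i', '-e', 's/#/pau/g', *files]


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory reproduces the "no files" situation.
    try:
        build_sed_command(tmp)
        empty_dir_rejected = False
    except FileNotFoundError:
        empty_dir_rejected = True

    # With at least one file present, the command gets its file arguments.
    dump_path = os.path.join(tmp, 'utt1.dump')
    with open(dump_path, 'w') as f:
        f.write('# a #\n')
    command = build_sed_command(tmp)
    # To actually run it: subprocess.run(command, check=True)

print(empty_dir_rejected, command[:4])
```

Using an absolute path for the dump directory, rather than one relative to the current working directory, would also rule out the second possible cause.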

Reduce the time cost from txt file to lab file

Hi,
It seems you are doing front-end processing for TTS in this project, and I have some questions about it.
I am using Merlin to deploy English speech synthesis, and I follow the script "prepare_labels_from_txt.sh" to convert raw text into .lab files. I see you are translating that shell script into Python, which also calls Festival.

My problem is that the front end takes a very long time, even compared with all of the later processing (duration model prediction, acoustic model prediction, and WORLD vocoder waveform generation).
For example, for the single sentence "Can I help you?", the front end takes 900 ms and the remaining steps take 120 ms.
How can I reduce the time spent in the Festival-based front end? I would appreciate any suggestions.

Bolong.Wen
