Shanghainese TTS

Dartmouth LING 48 Final Project: Improving TTS for Shanghainese
Yuanhao Chen, Spring 2023

Goal

To build a text-to-speech (TTS) system for Shanghainese from scratch, seeking to improve the production of tone sandhi compared to existing models by paying special attention to preprocessing of text.

Description

See writeup/main.pdf.

Dependencies

pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt  # for analysis of questionnaire results

Usage

See speech_synthesis/README.md.

Structure

phonemisation/: contains the phonemisation module
- See explanation of output in phonemisation/__init__.py
- Usage: python -m phonemisation "text to phonemise"
- Mechanism: Chinese sentence — word segmentation ⟶ Chinese words — romanisation ⟶ Shanghainese pinyin — phonemisation ⟶ Shanghainese phonemes
  - jieba is used for word segmentation
  - A Shanghainese dictionary I previously made is used for romanisation
    - Uses the Qieyun module to add the tone number 1 to syllables of the 陰平 yinping/inbin tone; other tones are phonologically unmarked
  - The romanisation_to_ipa function in romanisation.py converts the romanisation into IPA phonemes
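
The three stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: `TOY_LEXICON`, `TOY_IPA`, `greedy_segment` and `phonemise` are hypothetical names, the dictionary entries are placeholders, and a greedy longest-match segmenter stands in for jieba so that the example runs with the standard library only.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon: Chinese word -> romanised syllables.
# Placeholder entries for illustration; the project uses its own dictionary.
TOY_LEXICON: Dict[str, List[str]] = {
    "上海": ["zaon", "he"],
    "闲话": ["ghe", "gho"],
}

# Hypothetical romanisation -> IPA table (illustrative values only).
TOY_IPA: Dict[str, str] = {"zaon": "zɑ̃", "he": "hɛ", "ghe": "ɦɛ", "gho": "ɦo"}


def greedy_segment(sentence: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Stand-in segmenter: take the longest lexicon match at each position."""
    max_len = max(map(len, lexicon), default=1)
    words: List[str] = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or chunk in lexicon:
                words.append(chunk)
                i += size
                break
    return words


def phonemise(
    sentence: str,
    segment: Optional[Callable[[str], List[str]]] = None,
) -> List[List[str]]:
    """Sentence -> words -> romanisation -> IPA, grouped by word."""
    if segment is None:
        words = greedy_segment(sentence, TOY_LEXICON)
    else:
        words = segment(sentence)
    result: List[List[str]] = []
    for word in words:
        syllables = TOY_LEXICON.get(word)
        if syllables is None:
            continue  # this sketch simply skips out-of-vocabulary words
        result.append([TOY_IPA[s] for s in syllables])
    return result


if __name__ == "__main__":
    print(phonemise("上海闲话"))
```

If jieba is installed, passing `segment=jieba.lcut` swaps in its segmentation. The output stays grouped by word so that later steps can still see the word boundaries.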
make_metadata.py: uses the phonemisation module to convert transcription into IPA and generate metadata for training
- See below in data/
data/: contains the dataset used for training
- The transcriptions and audio files are adapted from this repo
  - Downsampled to 16 kHz for training
  - Currently, only shh.dict.cn/ is used for training
- The */metadata.txt files are generated by make_metadata.py
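
As a rough illustration of how a metadata file could be produced from transcriptions, here is a minimal sketch. The pipe-separated `id|phonemes` layout, the function names and the sample IDs are all assumptions made for this example; the actual format written by make_metadata.py may differ.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def format_metadata_lines(
    items: Iterable[Tuple[str, str]],
    phonemise: Callable[[str], str],
) -> List[str]:
    """Turn (audio_id, transcription) pairs into 'audio_id|phonemes' lines.

    The layout is a hypothetical one chosen for this sketch.
    """
    lines: List[str] = []
    for audio_id, text in items:
        phonemes = phonemise(text)
        if not phonemes:
            continue  # skip utterances that could not be phonemised
        lines.append(f"{audio_id}|{phonemes}")
    return lines


def write_metadata(path: str, lines: List[str]) -> None:
    """Write one utterance per line, UTF-8 encoded so IPA symbols survive."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # str.upper is a placeholder for the real phonemisation function.
    demo = format_metadata_lines([("0001", "abc"), ("0002", "")], str.upper)
    print(demo)
```

Keeping the formatting step separate from the file write makes it easy to check the lines before anything is saved to disk.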
training/
- Jupyter notebook for training the model
- Intended to be uploaded and run in Google Colab environment; needs to be modified for local use
- Uses the coqui-ai/TTS repo, which contains an implementation of VITS
writeup/: the write-up
speech_synthesis/: contains the speech synthesis model
- See speech_synthesis/README.md for more details
comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
- *-1.wav: produced by this model
- *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
- *-3.wav: spoken by myself
- stats.ipynb: Jupyter notebook for analysing the questionnaire results
