Shanghainese TTS

  • Dartmouth LING 48 Final Project: Improving TTS for Shanghainese
  • Yuanhao Chen [email protected] Spring 2023

Goal

To build a text-to-speech (TTS) system for Shanghainese from scratch, seeking to improve the production of tone sandhi compared to existing models by paying special attention to preprocessing of text.

Description

See writeup/main.pdf.

Dependencies

pip install -r phonemisation/requirements.txt
pip install -r speech_synthesis/requirements.txt
pip install -r comparison_questionnaire/requirements.txt  # for analysis of questionnaire results

Usage

See speech_synthesis/README.md.

Structure

  • phonemisation/: contains the phonemisation module
    • See explanation of output in phonemisation/__init__.py
    • Usage: python -m phonemisation "text to phonemise"
    • Mechanism: Chinese sentence → (word segmentation) → Chinese words → (romanisation) → Shanghainese pinyin → (phonemisation) → Shanghainese phonemes
      • jieba is used for word segmentation
      • A Shanghainese dictionary I previously made is used for romanisation
        • Uses Qieyun module to add the tone number 1 to syllables of 陰平 yinping/inbin tone; other tones are phonologically unmarked
      • The romanisation_to_ipa function in romanisation.py performs the phonemisation
  • make_metadata.py: uses the phonemisation module to convert transcription into IPA and generate metadata for training
    • See below in data/
  • data/: contains the dataset used for training
    • The transcriptions and audio files are adapted from this repo
      • Downsampled to 16kHz for training
      • Currently, only shh.dict.cn/ is used for training
    • The */metadata.txt files are generated by make_metadata.py
  • training/
    • Jupyter notebook for training the model
    • Intended to be uploaded and run in Google Colab environment; needs to be modified for local use
    • Uses the coqui-ai/TTS repo, which contains an implementation of VITS
  • writeup/: the write-up
  • speech_synthesis/: contains the speech synthesis model
  • comparison_questionnaire/: contains the questionnaire and audio files used to compare speech produced by this model, the Apple model, and a human speaker
    • *-1.wav: produced by this model
    • *-2.wav: produced by Apple VoiceOver (MacBook Pro 14-inch, 2021; macOS Ventura 13.0.1)
    • *-3.wav: spoken by me
    • stats.ipynb: Jupyter notebook for analysing the questionnaire results
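
The three-stage mechanism described under phonemisation/ (word segmentation, romanisation, phonemisation) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the module's actual code: the function names, the two toy lookup tables, and their entries are all hypothetical placeholders, and a greedy longest-match segmenter stands in for jieba so that the sketch runs without extra dependencies. The real module may also handle tone marking (via the Qieyun module), which is omitted here.

```python
# Hypothetical toy tables, for illustration only. They are NOT taken from the
# project's Shanghainese dictionary and are not authoritative transcriptions.
TOY_ROMANISATION = {"上海": "zaon he", "闲话": "ghe gho"}
TOY_IPA = {"zaon": "zɑ̃", "he": "hɛ", "ghe": "ɦɛ", "gho": "ɦo"}


def segment(sentence, vocabulary):
    """Split a sentence into words by greedy longest match.

    Stand-in for jieba: at each position, take the longest vocabulary
    entry that matches, falling back to a single character.
    """
    max_len = max(map(len, vocabulary))
    words = []
    i = 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in vocabulary or length == 1:
                words.append(candidate)
                i += length
                break
    return words


def romanise(words, dictionary):
    """Look up each word and split its romanisation into syllables.

    Words missing from the dictionary are passed through unchanged.
    """
    return [dictionary.get(word, word).split() for word in words]


def to_ipa(romanised_words, ipa_table):
    """Map each romanised syllable to IPA, keeping syllables grouped by word."""
    return [[ipa_table.get(syl, syl) for syl in word] for word in romanised_words]


def phonemise_sentence(sentence):
    """Run the three stages in order: segment, romanise, phonemise."""
    words = segment(sentence, TOY_ROMANISATION)
    return to_ipa(romanise(words, TOY_ROMANISATION), TOY_IPA)


if __name__ == "__main__":
    print(phonemise_sentence("上海闲话"))
```

The output keeps syllables grouped by word rather than flattening them into one sequence. That choice is deliberate in this sketch: word boundaries are the kind of information a tone-sandhi-aware front end would likely need to preserve.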
