Bopomofo2Chs

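This project uses a neural network to convert pinyin into Chinese characters. For example, given the continuous pinyin input mingtianzaoshangqingjiaoxingwo。, the model's task is to output the corresponding continuous Chinese characters 明天早上请叫醒我。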
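To make the input/output pair concrete, here is a minimal sketch of how the pinyin source string relates to its Chinese target. `TOY_PINYIN` and `to_pinyin` are hypothetical names, and the lookup table covers only the example sentence. The project lists xpinyin in its requirements, presumably for this conversion, but this sketch avoids that dependency.

```python
# Hypothetical lookup table covering only the example sentence.
# It stands in for a real pinyin library and is not the project's code.
TOY_PINYIN = {
    "明": "ming", "天": "tian", "早": "zao", "上": "shang",
    "请": "qing", "叫": "jiao", "醒": "xing", "我": "wo",
}


def to_pinyin(sentence: str) -> str:
    """Concatenate per-character pinyin; unknown characters pass through."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in sentence)


target = "明天早上请叫醒我。"   # what the model should output
source = to_pinyin(target)      # what the model receives as input
print(source)  # mingtianzaoshangqingjiaoxingwo。
```

The model has to learn the reverse direction, which is harder because the pinyin string carries neither tones nor syllable boundaries.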

The model is derived from the Tacotron architecture, shown in the figure below:

The detailed architecture of the CBHG module is shown in the figure below:

Dataset: the 1M-sentence Chinese news corpus from the Leipzig Corpora Collection.

Only zho_news_2007-2009_1M-sentences.txt is used.

Project layout

Bopomofo2Chs/
├── data
│   └── zho_news_2007-2009_1M-sentences.txt
├── dataset.py
├── eval
│   ├── eval_clean.txt
│   └── input.csv
├── model.py
├── network.ipynb
├── pre.py
├── README.md
└── train&eval.py

dataset.py: provides the data pipeline;

model.py: defines the network architecture;

pre.py: preprocesses the raw data;
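As a rough illustration of the kind of cleaning a preprocessing step like pre.py might perform, the sketch below strips everything except Chinese characters and a few punctuation marks. It is not the project's actual code. It assumes each corpus line has the form `id<TAB>sentence`, and `NON_HANZI` and `clean_line` are hypothetical names.

```python
import re

# Assumption: keep only CJK Unified Ideographs plus a few common
# full-width punctuation marks; everything else is dropped.
NON_HANZI = re.compile(r"[^\u4e00-\u9fff。,!?]")


def clean_line(line: str):
    """Return the cleaned sentence from one 'id<TAB>sentence' line, or None."""
    parts = line.rstrip("\n").split("\t", 1)
    if len(parts) != 2:
        return None  # malformed line without a tab separator
    sentence = NON_HANZI.sub("", parts[1])
    return sentence or None  # drop sentences that end up empty


sample = ["1\t明天早上请叫醒我。\n", "2\tHello world\n", "malformed line\n"]
cleaned = [s for s in map(clean_line, sample) if s]
print(cleaned)  # ['明天早上请叫醒我。']
```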

Requirements

xpinyin==0.5.6
tensorflow==1.9.0
Distance==0.1.3
numpy==1.15.2
regex==2019.06.08

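Usage

Download the dataset and place the required txt file in the location shown above, run pre.py, then run train&eval.py. The program writes the evaluation results to eval_res.csv under ./eval/.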

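Evaluation

This task uses edit distance as the evaluation metric; the normalized edit distance is the character error rate (CER). Part of the evaluation results is shown in the figure below:

After $1$ epoch of training, the overall CER is $0.05$.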
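The CER metric described above can be sketched in plain Python. This is a stand-in for illustration, not the project's evaluation code (the requirements list the Distance package). It assumes the total edit distance is normalized by the total number of reference characters; `edit_distance` and `cer` are hypothetical names.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


refs = ["明天早上请叫醒我。"]
hyps = ["明天早上请叫行我。"]  # one wrong character out of nine
print(round(cer(refs, hyps), 3))  # 0.111
```

Under this definition, a CER of $0.05$ means roughly one character edit for every twenty reference characters.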


References: Tacotron

Acknowledgement: Kyubyong
