Development and maintenance for cmudict
---------------------------------------
[20100118] (air)

The maintainer is responsible for acquiring and vetting new entries,
and for fixing errors that they otherwise encounter.

At this point, the cmudict project has been incrementally
re-organized; maintenance has been simplified and several aspects have
been automated. The scripts/ folder contains instructions and scripts
for routine maintenance. It has everything you should need to get
started.

Version numbers and files
-------------------------
There is no particular rule for incrementing the version number. To
date, the minor version (letter suffix) has been incremented to reflect
changes in maintainers. The major version (right now, the decimal) is
incremented when some (subjectively) large change occurs. For example,
the 0.6-->0.7 increment was marked by a large number of new entries
and by the removal of many incorrect entries from the preceding 0.6e
version.

The cmudict.*.phones file lists all legal phones, plus their phonetic class.
The cmudict.*.symbols file lists all legal phonetic symbols (the only
substantive difference is that stress combinations are explicitly noted).
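
The relationship between the two files can be sketched in a few lines of
Python. This is only an illustration, not one of the maintenance scripts:
it assumes each line of the phones file is a phone followed by whitespace
and its class, that vowels carry the class name "vowel", and that the
stress marks are the digits 0, 1 and 2. The function names are made up
for the example.

```python
def parse_phones(text):
    """Map each phone to its class, assuming 'PHONE<whitespace>CLASS' lines."""
    phones = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            phone, phone_class = fields
            phones[phone] = phone_class
    return phones


def expected_symbols(phones, stress_marks=("0", "1", "2")):
    """Build a symbol set: every phone, plus vowel+stress combinations.

    Assumption: only phones whose class is "vowel" take stress marks.
    """
    symbols = set(phones)
    for phone, phone_class in phones.items():
        if phone_class == "vowel":
            symbols.update(phone + mark for mark in stress_marks)
    return symbols


# Tiny made-up sample in the assumed layout.
sample = "AA\tvowel\nB\tstop\n"
phones = parse_phones(sample)
print(sorted(expected_symbols(phones)))
# ['AA', 'AA0', 'AA1', 'AA2', 'B']
```

Comparing such a derived set against the actual symbols file is one
possible way to spot a phone that was added to one file but not the other.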


Projects for the ambitious
--------------------------

1. Change the current flat-file version to a database format. This
should still allow producing a flat file, but it will simplify adding
useful information to the dictionary. Possible additions include:

a. part-of-speech information
b. domain information (e.g., location, medical, non-English, etc.)
c. spelling variants
d. source information (who, when, ...)
e. probabilities for pronunciation variants

There's additional stuff that can be done but the above bits seem the
most useful ones. I also have ideas on how to do it, so feel free to
get in touch (air at cs cmu edu).

2. Create an OS-independent GUI for managing the database. This should
allow the maintainer to view and modify entries, while dealing with
bookkeeping. It would be nice if the GUI included a synthesizer so
that entries can be checked by listening.
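
As a rough illustration of project 1, the sketch below stores entries in
SQLite (via Python's standard sqlite3 module) and regenerates a flat
file from them. Everything here is hypothetical: the table and column
names, the helper functions, and the flat-file layout (word, two spaces,
phones, with alternate pronunciations written as WORD(n)) are
assumptions made for the example, not the project's design. Spelling
variants are left out; they would probably need a table of their own.

```python
import sqlite3

# Hypothetical schema: one row per pronunciation of a word.
SCHEMA = """
CREATE TABLE entry (
    id      INTEGER PRIMARY KEY,
    word    TEXT NOT NULL,
    variant INTEGER NOT NULL DEFAULT 0,  -- 0 = primary pronunciation
    phones  TEXT NOT NULL,               -- space-separated symbols
    pos     TEXT,                        -- part of speech
    domain  TEXT,                        -- e.g. location, medical
    source  TEXT,                        -- who, when, ...
    prob    REAL,                        -- probability of this variant
    UNIQUE (word, variant)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (assumed) entry table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, word, phones, variant=0, pos=None, domain=None,
              source=None, prob=None):
    """Insert one pronunciation; phones is a list of symbols."""
    conn.execute(
        "INSERT INTO entry (word, variant, phones, pos, domain, source, prob)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (word, variant, " ".join(phones), pos, domain, source, prob),
    )


def export_flat(conn):
    """Produce flat-file text from the database, one line per row."""
    rows = conn.execute(
        "SELECT word, variant, phones FROM entry ORDER BY word, variant"
    )
    lines = []
    for word, variant, phones in rows:
        head = word if variant == 0 else f"{word}({variant})"
        lines.append(f"{head}  {phones}")
    return "\n".join(lines) + "\n"


conn = open_db()
add_entry(conn, "TOMATO", ["T", "AH0", "M", "EY1", "T", "OW2"], pos="noun")
add_entry(conn, "TOMATO", ["T", "AH0", "M", "AA1", "T", "OW2"], variant=1)
print(export_flat(conn), end="")
```

Keeping the extra fields as nullable columns means existing entries can
be loaded as-is and annotated later, while the export step ignores
everything the flat file has no place for.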
