Giter VIP home page Giter VIP logo

jyutcitzi-transliterate's Introduction

jyutcitzi-transliterate

A simple Python3 tool for the transliteration of the existing Sinoglyph-Latin script used in Hong Kong Cantonese into either a Jyutcitzi-only or Honzi-Jyutcitzi mixed script.

Example

$ cat sample.txt
喂,我哋唔係啱啱已經喺嗰度好咧啡噉噏咗你嗰啲咁𠵇𠺫嘅嘢㗎喇咩?嗱,你唔好咁忟憎先。喂你咪諗住玩我呀,我已經噒熟咗份嘢㗎啦。我哋淨話個presentation已經cover嗮阿Tim prepare嗰份document入面啲bullet point㗎嘞,你之前想幫手proofread份嘢就早響吖嘛。重有呀,你鬧乜Q唧,我噚日都係啱先喺Budapest飛返嚟咋嘛,使唔使咁火滾呀?唉算喇,我哋不如都係下個禮拜再開會傾過啦係咪先,我而家要趕住去健身中心做gym keep fit,係噉先啦掰掰。
$ python transliterate.py < sample.txt --mode web
禾兮`,我大丌゙五.⁼係爻彡¯々已經亾兮´丩个´度好力旡`夫旡¯丩今´噏止个´你丩个´大子¯丩今`臼了⁼力了¯丩旡´央旡˝丩乍`力乍`文旡¯?乃乍⁼,你五.⁼好丩今`忟憎先。禾兮`你文兮゙諗住玩我乍`⁼,我已經力卂¯熟止个´份央旡˝丩乍`力乍¯。我大丌゙淨話個并禾旡·厶`·厶云·天丌·厶亾云·已經臼乍·夫乍·厶介`阿天欠·并禾子·并旡·乍`·丩个´份大百·臼央今·文円·入面大子¯比乎·力乜·并丐·乃`·丩乍`力乍`,你之前想幫手并禾乎·夫`·夫禾必·份央旡˝就早響乍`¯文乍`。重有乍`⁼,你鬧乜臼么·止夕¯,我此今⁼日都係爻彡¯先亾兮´比乎·大十·并旡·厶天·飛返力兮⁼止乍¯文乍`,使五.⁼使丩今`火滾乍`⁼?介`¯算力乍`,我大丌゙不如都係下個禮拜再開會傾過力乍¯係文兮゙先,我而家要趕住去健身中心做止央欠·臼頁·夫必·,係丩今´先力乍¯掰々。
$ python transliterate.py < sample.txt > out.txt
$ python transliterate.py --mode web
---------------------------------
Interactive Jyutcitzi Interpreter
---------------------------------
Howto: Input some text, and the JCZ transliterator will transliterate into the user-specified writing orthography.

Please enter your text (Quit with CTRL + C (*nix) or CTRL + Z (Windows)):
喂,我哋唔係啱啱已經喺嗰度好咧啡噉噏咗你嗰啲咁𠵇𠺫嘅嘢㗎喇咩?
Output:
----------
禾兮`,我大丌゙五.⁼係爻彡¯々已經亾兮´丩个´度好力旡`夫旡¯丩今´噏止个´你丩个´大子¯丩今`臼了⁼力了¯丩旡´央旡˝丩乍`力乍`文旡¯?
----------

alt text

Installation

pip install ToJyutping
pip install pycantonese
pip install wordsegment

In order to render the Jyutcitzi glyphs when using the font mode, install the fonts at https://github.com/jyutcitzi/jyutcitzi-fonts.

Usage

usage: transliterate.py [-h] [-m MODE] [-s STYLE] [-r R] [-v V] [-t DIRECTION]
                        [--use_repeat_char BOOL] [--use_schwa_char BOOL]
                        [-a ALGORITHM] [-x BOOL]

optional arguments:
  -h, --help            show this help message and exit
  -m MODE, --mode MODE  use font or web-style characters
  -s STYLE, --style STYLE
                        use jcz-only (jcz_only) or sinoglyph-jcz (honzi_jcz)
                        writing style
  -r R, --r_block R     whether to use r, wl or w for representing the onset
                        'r'
  -v V, --v_block V     whether to use v or f for representing the onset 'v'
  -t DIRECTION, --tone_config DIRECTION
                        whether to use vertical or horizontal ticks for tones
                        1 and 4
  --use_repeat_char BOOL
                        whether to use the repeat glyph 々
  --use_schwa_char BOOL
                        whether to use 亇 for the final 'a' instead of 乍
  -a ALGORITHM, --alg ALGORITHM
                        algorithm for obtaining Jyutping romanizations:
                        PyCantonese or ToJyutping
  -x BOOL, --sep_eng_words BOOL
                        whether to add whitespace between English words in the
                        output

jyutcitzi-transliterate's People

Contributors

jyutcitzi avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.