
pyewts's Introduction


OpenPecha

Python Tibetan Unicode to Wylie (EWTS) Converter



Description

This library converts back and forth between Tibetan Unicode and EWTS (Extended Wylie Transliteration Scheme). The code is adapted from the Java ewts-converter.

It also converts ACIP transliteration to EWTS.

Installation

pip install pyewts

Examples

Convert Wylie to Unicode

import pyewts

converter = pyewts.pyewts()
print(converter.toUnicode("ba b+ba [a] ba\\u0f0b"))
# བ་བྦ་a་བ་

Convert Unicode to Wylie

print(converter.toWylie("༼༽"))
# ()

Catch Wylie warnings

>>> orig = """dangs
... zhwa
... dwang
... rma
... tshe
... phywa
... dge
... rgya
... dwags
... (rtse mgron)"""
>>> 
>>> print(orig)
dangs
zhwa
dwang
rma
tshe
phywa
dge
rgya
dwags
(rtse mgron)
>>> warns = []
>>> res = converter.toUnicode(orig, warns)
>>> print(res)
དངས
ཞྭ
དྭང
རྨ
ཚེ
ཕྱྭ
དགེ
རྒྱ
དྭགསརྩེམགྲོན
>>> print(warns)
['line 1: "dangs": Syllable should probably be "dngas".']

See demo.py
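
The warning strings collected above are plain text, so a caller who wants the line number and the offending syllable has to parse them. The sketch below does that with a regular expression. The pattern is inferred from the single warning shown in this README, and `WylieWarning` and `parse_warning` are hypothetical helper names, not part of pyewts; other warnings the library emits may not match.

```python
import re
from typing import NamedTuple, Optional


class WylieWarning(NamedTuple):
    line: int       # 1-based line number reported by the converter
    syllable: str   # the input syllable the warning refers to
    message: str    # the rest of the warning text


# Assumed format: line N: "syllable": message
# (inferred from the one example warning; not a documented contract)
_WARNING_RE = re.compile(r'^line (\d+): "([^"]*)": (.*)$')


def parse_warning(text: str) -> Optional[WylieWarning]:
    """Return a structured warning, or None if the text has another shape."""
    match = _WARNING_RE.match(text)
    if match is None:
        return None
    return WylieWarning(int(match.group(1)), match.group(2), match.group(3))


# The list below is copied from the example output above.
warns = ['line 1: "dangs": Syllable should probably be "dngas".']
parsed = [parse_warning(w) for w in warns]
for item in parsed:
    if item is not None:
        print(f"line {item.line}: {item.syllable!r}: {item.message}")
```

Returning `None` for unrecognised strings keeps the helper safe to use on warnings whose format differs from the example.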

Changes

See CHANGELOG.md.

License

The Python code is Copyright (C) 2018 Esukhia, provided under the MIT License. See CONTRIBUTORS.md for a list of authors and contributors.

Maintenance

Build the source dist:

rm -rf dist/
python3 setup.py clean sdist

and upload it with twine (version >= 1.11.0):

twine upload dist/*

Owner


pyewts's Issues

Python tools for Tibetan Latin to Unicode

Hi Elie. Nice to see you're using Python, as I remember you were mainly using lower level languages.

I am glad to see more work being done in this field. Just so you know, I also have a Python tool for solving similar problems: https://github.com/ironhouzi/pytib

I hope we could perhaps join our efforts, so that we could benefit from one another.

pytib is far from feature complete and takes a different approach than pyewts. I am also curious to see how well pytib handles EWTS, as pytib supports dynamic configuration of Latin definitions. pytib started as a pet project when I first learned Python, but as a professional Python programmer, I gave it a big rewrite last year. Still, I think it could use a whole lot more polish, so learning from other skilled developers solving similar problems is very inspiring.

The main difference between the two is definitely the algorithm. I've chosen a more analytical approach, and you're using lookup tables. I think there are pros and cons to both approaches. While the analytical approach gives rudimentary spell checking for free, its performance is not spectacular. I would assume lookup tables give good performance; performance is the current challenge I'm trying to tackle through concurrent processing. While I find the translation function pretty OK, the code that uses the parse() function for parsing documents was implemented rather quickly. It uses line-based handling instead of character-based handling; the latter seems like a better approach for managing correct Tibetan punctuation, and this will also need to be figured out before I can do any work on concurrent Latin-Tibetan parsing.
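
To make the lookup-table side of this comparison concrete, here is a minimal sketch of greedy longest-match transliteration. It is a toy, not the algorithm of pyewts or pytib: `TABLE`, `MAX_LEN` and `transliterate` are names invented for illustration, and the table covers only a few consonants and vowels, with no stacking, prefixes or spelling checks.

```python
# Toy table with a handful of EWTS tokens (assumption: illustrative subset only).
TABLE = {
    "k": "\u0f40",
    "kh": "\u0f41",
    "g": "\u0f42",
    "ng": "\u0f44",
    "t": "\u0f4f",
    "th": "\u0f50",
    "ts": "\u0f59",
    "tsh": "\u0f5a",
    "s": "\u0f66",
    "a": "",        # inherent vowel, not written
    "i": "\u0f72",
    "u": "\u0f74",
    "e": "\u0f7a",
    "o": "\u0f7c",
    " ": "\u0f0b",  # a space becomes a tsheg
}
MAX_LEN = max(len(key) for key in TABLE)


def transliterate(latin: str) -> str:
    """Convert Latin text by always taking the longest token found in TABLE."""
    out = []
    i = 0
    while i < len(latin):
        # Try the longest candidate first so "tsh" wins over "ts" and "t".
        for size in range(min(MAX_LEN, len(latin) - i), 0, -1):
            chunk = latin[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += size
                break
        else:
            # Unknown character: pass it through unchanged.
            out.append(latin[i])
            i += 1
    return "".join(out)


print(transliterate("tshe"))    # ཚེ
print(transliterate("ka kha"))  # ཀ་ཁ
```

Each position costs at most `MAX_LEN` dictionary lookups, which is where the expected speed of a table-driven converter comes from. The trade-off is that the table alone knows nothing about which letter combinations form valid syllables.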

Looking forward to learning from you.

The Tibetan syllables below can't be converted

I used pyewts to convert some text and here are the Tibetan syllables with errors.

dangs
zhwa
dwang
rma
tshe
phywa
dge
rgya
dwags

I don't have much time to look into it myself. Could someone please fix them?

Interestingly, it can convert left parenthesis but not the right one. For example,

(rtse mgron)

will be converted to

༼རྩེ་མགྲོན)
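
The expected pairing can be stated as a small check. This sketch is not pyewts code: `BRACKETS`, `convert_brackets` and `unconverted_brackets` are hypothetical names, and the mapping is taken from the README example in which "༼༽" converts to "()".

```python
# Expected mapping for the two parentheses: U+0F3C opens, U+0F3D closes.
BRACKETS = str.maketrans({"(": "\u0f3c", ")": "\u0f3d"})


def convert_brackets(text: str) -> str:
    """Replace both ASCII parentheses with their Tibetan counterparts."""
    return text.translate(BRACKETS)


def unconverted_brackets(converted: str) -> list:
    """List any ASCII parentheses left over in converter output."""
    return [ch for ch in converted if ch in "()"]


reported = "༼རྩེ་མགྲོན)"  # the output quoted in this report
print(unconverted_brackets(reported))  # [')']
print(convert_brackets("()"))          # ༼༽
```

A check like `unconverted_brackets` could serve as a regression test once the closing parenthesis is handled.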

unresolved reference

My Python IDE reports two unresolved references to TransConverter, on lines 804 and 806.
The library seems to work nonetheless, so this is a low-priority issue.
