
cihai's Introduction

cihai

Python library for CJK (Chinese, Japanese, Korean) data.

This project is under active development. Follow our progress and check back for updates!

Quickstart

API / Library (this repository)

$ pip install --user cihai

from cihai.core import Cihai

c = Cihai()

if not c.unihan.is_bootstrapped:  # download and install Unihan to db
    c.unihan.bootstrap()

query = c.unihan.lookup_char('好')
glyph = query.first()
print("lookup for 好: %s" % glyph.kDefinition)
# lookup for 好: good, excellent, fine; well

query = c.unihan.reverse_char('good')
print('matches for "good": %s ' % ', '.join([glph.char for glph in query]))
# matches for "good": 㑘, 㑤, 㓛, 㘬, 㙉, 㚃, 㚒, 㚥, 㛦, 㜴, 㜺, 㝖, 㤛, 㦝, ...

See API documentation and /examples.

CLI (cihai-cli)

$ pip install --user cihai-cli

Character lookup:

$ cihai info 好
char: 好
kCantonese: hou2 hou3
kDefinition: good, excellent, fine; well
kHangul: 
kJapaneseOn: KOU
kKorean: HO
kMandarin: hǎo
kTang: "*xɑ̀u *xɑ̌u"
kTotalStrokes: "6"
kVietnamese: háo
ucn: U+597D

Reverse lookup:

$ cihai reverse library
char: 圕
kCangjie: WLGA
kCantonese: syu1
kCihaiT: '308.302'
kDefinition: library
kMandarin: 
kTotalStrokes: '13'
ucn: U+5715
--------

UNIHAN data

All datasets that cihai uses have stand-alone tools to export their data. No library required.

Developing

$ git clone https://github.com/cihai/cihai.git
$ cd cihai/

Bootstrap your environment and learn more about contributing. We use the same conventions / tools across all cihai projects: pytest, sphinx, mypy, ruff, tmuxp, and file watcher helpers (e.g. entr(1)).

Python versions

  • 0.19.0: Last Python 3.7 release

cihai's People

Contributors

dependabot-preview[bot], frankier, kianmeng, pre-commit-ci[bot], pyup-bot, tony

cihai's Issues

Simplify usage

For now, simplify the API and usage:

  • Zero config, eliminate need for a config
    • Eliminate datasets via config
    • Replace datasets by cataloging current datasets and versions inside cihai's DB
  • Allow for minor configuration via YAML to override database and file locations
  • SQLite db backend by default
  • UNIHAN by default (#3)
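
The zero-config goal with optional overrides could be sketched roughly as follows. This is only an illustration, not cihai's actual configuration code: `DEFAULTS`, `resolve_config`, and the key names are hypothetical, and the overrides are passed as a plain dict (the shape a YAML loader would return) so the example has no dependencies.

```python
import copy

# Hypothetical defaults: SQLite backend and UNIHAN enabled, no config file needed.
DEFAULTS = {
    "database": {"url": "sqlite:///cihai.db"},
    "datasets": ["unihan"],
}


def resolve_config(overrides=None):
    """Return a copy of DEFAULTS with user overrides merged on top.

    Dict values are merged one level deep; anything else is replaced.
    """
    config = copy.deepcopy(DEFAULTS)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)
        else:
            config[key] = value
    return config


# With no overrides the defaults apply unchanged.
default_config = resolve_config()

# A user who only wants a different database location overrides one key.
custom_config = resolve_config({"database": {"url": "sqlite:////tmp/cihai.db"}})
```

Copying the defaults before merging keeps them untouched, so every call starts from the same baseline.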

Plugin system

Plugin system

Database architecture

https://github.com/cihai/unihan-db vs automap

Look at examples of SQLAlchemy enterprise architecture to find
a pattern that maximizes flexibility. We want to pick a good,
composable pattern for our long-term expansion, testability and
understanding, and for our downstream users (whether software
libraries or private projects).

Examples:

Related

#131

Quickstart Code Error: NameError: name 'unihan_options' is not defined

I installed cihai and tried the following code, but got this error:

Traceback (most recent call last):
  File "test_hanyuDict.py", line 6, in <module>
    c.unihan.bootstrap(unihan_options)
NameError: name 'unihan_options' is not defined

code details:

 from cihai.core import Cihai

 c = Cihai()

 if not c.unihan.is_bootstrapped:  # download and install Unihan to db
     c.unihan.bootstrap(unihan_options)

 query = c.unihan.lookup_char('好')
 glyph = query.first()
 print("lookup for 好: %s" % glyph.kDefinition)
 # lookup for 好: good, excellent, fine; well

 query = c.unihan.reverse_char('good')
 print('matches for "good": %s ' % ', '.join([glph.char for glph in query]))
 # matches for "good": 㑘, 㑤, 㓛, 㘬, 㙉, 㚃, 㚒, 㚥, 㛦, 㜴, 㜺, 㝖, 㤛, 㦝, ...
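
Python raises the traceback above before `bootstrap` is ever entered, because `unihan_options` is never assigned in the script. The Quickstart in this document calls `c.unihan.bootstrap()` with no arguments. The stand-in class below is hypothetical and is not cihai's implementation. It reproduces both behaviours without downloading anything.

```python
class StubUnihan:
    """Hypothetical stand-in for c.unihan; it only tracks a flag."""

    def __init__(self):
        self.is_bootstrapped = False

    def bootstrap(self, options=None):
        # The real method downloads and installs UNIHAN; here we just flip the flag.
        self.is_bootstrapped = True


def broken_quickstart(unihan):
    # Mirrors the failing snippet: the argument name was never assigned,
    # so evaluating it raises NameError before bootstrap() runs.
    unihan.bootstrap(unihan_options)  # noqa: F821


def fixed_quickstart(unihan):
    # Mirrors the current Quickstart: no argument is passed.
    if not unihan.is_bootstrapped:
        unihan.bootstrap()


try:
    broken_quickstart(StubUnihan())
except NameError as exc:
    error_message = str(exc)

stub = StubUnihan()
fixed_quickstart(stub)
```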

Initial Update

Hi 👊

This is my first visit to this fine repo, but it seems you have been working hard to keep all dependencies updated so far.

Once you have closed this issue, I'll create separate pull requests for every update as soon as I find one.

That's it for now!

Happy merging! 🤖

Incorporate unihan data in cihai

In order to get cihai working easily out of the box, data from UNIHAN must be incorporated in some way:

  1. Generate an index for the glyphs by iterating through the codes with a regex: https://github.com/cihai/cihai/blob/0a28ce182c5e34e69dbdab8c0c42bef0bc3b1e0d/tests/test_datasets.py

  2. When packaging cihai, download UNIHAN and pick out some default fields (like kDictionary) to include with the main set.

  3. Make all versions of cihai download the full UNIHAN.zip afterwards and include the data.
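
The first option, building a glyph index by matching each line with a regex, might look like the sketch below. It assumes the tab-separated layout of the UNIHAN text files (`U+XXXX`, field name, value). `UNIHAN_LINE` and `build_index` are hypothetical names that do not come from cihai or the linked test file.

```python
import re
from collections import defaultdict

# UNIHAN data files use tab-separated lines: "U+XXXX<TAB>kField<TAB>value".
UNIHAN_LINE = re.compile(
    r"^U\+(?P<code>[0-9A-F]{4,6})\t(?P<field>k\w+)\t(?P<value>.*)$"
)


def build_index(lines, fields=("kDefinition",)):
    """Map each character to {field: value} for the selected fields."""
    index = defaultdict(dict)
    for line in lines:
        match = UNIHAN_LINE.match(line.rstrip("\n"))
        if match is None:  # skips comments ("# ...") and blank lines
            continue
        if match["field"] in fields:
            char = chr(int(match["code"], 16))
            index[char][match["field"]] = match["value"]
    return dict(index)


# Sample lines; the values are the ones shown in the CLI output in this document.
sample = [
    "# excerpt in UNIHAN text format",
    "U+597D\tkDefinition\tgood, excellent, fine; well",
    "U+597D\tkMandarin\thǎo",
    "U+5715\tkDefinition\tlibrary",
]
index = build_index(sample)
```

Restricting `fields` corresponds to the second option of shipping only a few default fields, while passing every field name would approximate the full import.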

Example scripts/extensions to API

Hi,

I wrote some simple scripts to explore variants of Hanzi and the structure of the simplified<->traditional mapping. I was wondering if some of the things I am doing with SQLAlchemy here, and the parsing of lists of variant characters, would make a useful addition to the API of cihai? What do you think?

https://github.com/frankier/STIFF/blob/master/explore/difficulties.py
https://github.com/frankier/STIFF/blob/master/explore/variants.py
https://github.com/frankier/STIFF/blob/master/explore/utils.py
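
For readers unfamiliar with what parsing lists of variant characters involves, here is a minimal sketch. It is not taken from the linked scripts or from cihai. `parse_variants` is a hypothetical helper, and the value format is an assumption: space-separated `U+XXXX` code points, each optionally followed by `<` and a comma-separated source list. The sample strings are illustrative, not copied from the database.

```python
import re

# Assumed shape of a variant field value:
#   "U+5B78<kMatthews U+6588<kLau,kMatthews"
# i.e. code points separated by spaces, each with an optional "<sources" suffix.
VARIANT = re.compile(r"U\+([0-9A-F]{4,6})(?:<(\S+))?")


def parse_variants(value):
    """Return a list of (character, [sources]) pairs from a variant field value."""
    variants = []
    for code, sources in VARIANT.findall(value):
        char = chr(int(code, 16))
        variants.append((char, sources.split(",") if sources else []))
    return variants


plain = parse_variants("U+5B66")
annotated = parse_variants("U+5B78<kMatthews U+6588<kLau,kMatthews")
```

Returning plain tuples keeps the helper independent of any database layer, so it could sit on top of whatever query API returns the raw field value.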
