cjkvi-ids's Introduction

IDS data

This is a collection of various IDS (Ideographic Description Sequence) data.

Description

IDS (Ideographic Description Sequence) is a way to describe the structure of CJK Unified Ideographs.

An IDS consists of IDCs (Ideographic Description Characters), namely "⿰" (U+2FF0) to "⿻" (U+2FFB), and DCs (Description Characters), which are usually ideographs.
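
As a rough illustration of this structure, the sketch below parses an IDS string into a nested tuple. It assumes the usual operand counts (three for U+2FF2 and U+2FF3, two for the other IDCs in the range above) and treats every other character as a single DC. The name `parse_ids` is hypothetical and is not part of this repository; the sketch does not handle entity references or bracketed suffixes.

```python
# Operand count per IDC: U+2FF2 and U+2FF3 take three operands,
# the other IDCs in U+2FF0..U+2FFB take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(ids):
    """Parse an IDS into a nested tuple (idc, operand, ...); a bare DC stays a str."""
    chars = list(ids)  # Python iterates by code point, so non-BMP ideographs are safe
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(chars):
            raise ValueError("unexpected end of IDS")
        ch = chars[pos]
        pos += 1
        if ch in IDC_ARITY:
            # An IDC is followed by a fixed number of operands,
            # each of which is a DC or a nested IDS.
            operands = [parse() for _ in range(IDC_ARITY[ch])]
            return (ch, *operands)
        return ch

    tree = parse()
    if pos != len(chars):
        raise ValueError("trailing characters after IDS")
    return tree
```

For example, `parse_ids("⿱艹⿰氵工")` yields a tuple whose first element is "⿱", whose second is "艹", and whose third is the nested tuple for "⿰氵工".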

IDSes are important information about ideographs, as it may be possible to identify an ideograph from its IDS.

However, the same ideograph can sometimes be encoded by more than one IDS. Tools that normalize IDSes and identify the corresponding ideographs are therefore important. IDS tool is one such example.

Also, IDSes use the full range of CJK ideographs, so a font that covers all encoded ideographs (such as HanaMin or Hanamin AFDKO) should be used.

Encoding Policies

  • Compatibility ideographs whose IDSes are not equal to those of their corresponding unified ideographs may be used as DCs. When multiple compatibility ideographs have the same IDS, the one with the smaller character code is used. (e.g. ⻀,並,荒,冗,叟,切,巢,廾,戛,桒,甾,𤾡,舁,蕤,貫,黾)

  • The following non-ideographs may be used as DCs (for now): "αℓ△⺀⺄⺆⺈⺊⺌⺍⺶⺸⺻⺼〇〢キサ㇀㇉㇢㇞"

  • Encircled numerals ① to ⑳ represent unencoded DCs. The number denotes the stroke count of the DC, which is useful when calculating the total stroke count of an ideograph. This convention does not conform to Annex I of ISO/IEC 10646, so please replace these characters with the wildcard character '？' (U+FF1F) if you need strict conformance with the UCS standard.

  • IDS data files whose names end in "-cdp.txt" adopt PUA characters from CDP (the "Chinese Document Processing lab" at Academia Sinica) as DCs. They are denoted as entity references such as "&CDP-xxxx;".

    The mappings between PUA DCs and CDP references are enumerated at the end of "ids-cdp.txt". For details of the usable PUA characters, refer to the article on CDP at GlyphWiki. The relationship between CDP's hexadecimal numbers and Unicode BMP PUA code points is based on the EUDC code points defined by the Microsoft Big5 to PUA conversion table. The HanaMinAFDKO font supports these glyphs in the PUA.

  • The IDS of a compatibility ideograph may sometimes have compatibility ideographs as DCs, in order to clarify how its structure differs from that of the corresponding unified ideograph.

  • Letters in brackets after an IDS, such as "G", "T", "J", "K", and "V", indicate that the IDS is specific to the corresponding columns of the UCS code charts. "A" indicates AJ1-6 shapes, and "X" indicates a virtual shape that does not actually appear in the UCS specification but may match the code point according to Annex S of the UCS. Some such shapes appear in OS-bundled fonts such as MingLiu, MS-Mincho or SimSun, or in well-known dictionaries such as "Dai Kanwa Jiten". "O" indicates an obsolete shape that appeared in an older edition of the UCS standard but no longer does.
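
Several of the policies above can be handled with a few lines of Python. The following is only a sketch under stated assumptions: the helper names are hypothetical, the "xxxx" in a CDP reference is assumed to be exactly four hexadecimal digits, and a bracketed tag is assumed to consist of uppercase letters at the very end of the string. None of this is code from the repository.

```python
import re

ENCIRCLED_FIRST = 0x2460  # ① (one stroke)
ENCIRCLED_LAST = 0x2473   # ⑳ (twenty strokes)
WILDCARD = "\uFF1F"       # fullwidth question mark

# Assumption: "xxxx" in "&CDP-xxxx;" is exactly four hexadecimal digits.
CDP_REF = re.compile(r"&CDP-([0-9A-Fa-f]{4});")
# Assumption: the tag is a run of uppercase letters in brackets at the end.
SOURCE_TAG = re.compile(r"\[([A-Z]+)\]$")


def is_encircled(ch):
    return ENCIRCLED_FIRST <= ord(ch) <= ENCIRCLED_LAST


def split_source_tag(entry):
    """Split 'IDS[GTJ]' into ('IDS', 'GTJ'); the tag is '' when absent."""
    match = SOURCE_TAG.search(entry)
    if match is None:
        return entry, ""
    return entry[:match.start()], match.group(1)


def unencoded_strokes(ids):
    """Sum the stroke counts carried by encircled numerals in an IDS."""
    return sum(ord(ch) - ENCIRCLED_FIRST + 1 for ch in ids if is_encircled(ch))


def to_strict(ids):
    """Replace every encircled numeral with the U+FF1F wildcard."""
    return "".join(WILDCARD if is_encircled(ch) else ch for ch in ids)


def cdp_codes(ids):
    """Return the hexadecimal parts of all CDP entity references in an IDS."""
    return CDP_REF.findall(ids)
```

With an illustrative entry such as "⿹②一[GTKV]", `split_source_tag` separates the IDS from the tag "GTKV", `unencoded_strokes` reports 2 for the IDS part, and `to_strict` replaces ② with the wildcard. The stroke count covers only the unencoded DCs; the strokes of encoded DCs would have to come from another source.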

Licenses

  • 'ids.txt' is derived from the CHISE project. Its license follows their terms. 'ids-ext-cde.txt' is not directly based on the CHISE project and is not restricted to the GPLv2 license.

  • All other data are distributed under GPLv2.

Author

Contributors

kawabata, mashabow, jlhwung
