unicode's Introduction

This file is in UTF-8 encoding.

To use the unicode utility, you need:
 - python >=2.6 (the str format() method is needed), preferably a wide
   unicode build; however, python3 is recommended
 - the python optparse library (part of the standard library since python2.3)
 - the UnicodeData.txt file (http://www.unicode.org/Public), which
   you should put into /usr/share/unicode/, ~/.unicode/ or the current
   working directory.
    - apt-get install unicode-data  # Debian
    - dnf install unicode-ucd       # Fedora
 - if you want to see UniHan properties, you also need the Unihan.txt file,
   which should be put into /usr/share/unicode/, ~/.unicode/ or the
   current working directory.
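
As a rough illustration of the lookup described above, here is a minimal
Python 3 sketch that checks the three directories for a data file. The
function name find_data_file and the search order are assumptions made for
this example; the utility's own lookup code may differ.

```python
import os

# Directories named in the requirements list above. The order is an
# assumption made for this sketch, not taken from the utility itself.
SEARCH_DIRS = [
    "/usr/share/unicode",
    os.path.expanduser("~/.unicode"),
    os.curdir,
]


def find_data_file(filename, search_dirs=SEARCH_DIRS):
    """Return the first existing path to filename, or None if not found."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    for name in ("UnicodeData.txt", "Unihan.txt"):
        print(name, "->", find_data_file(name) or "not found")
```

Running the script prints, for each of the two files, the path where it was
found or "not found".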


Enter a regular expression, a hexadecimal number or some characters as an
argument. unicode will try to guess what you want to look up; see the
manpage if you want to force other behaviour (the manpage is also the
best documentation). In particular, -r forces searching for a regular
expression in the names of characters, and -s forces unicode to display
information about the characters given.

Here are just some examples:

$ unicode.py euro
U+20A0 EURO-CURRENCY SIGN
UTF-8: e2 82 a0   UTF-16BE: 20a0   Decimal: &#8352;
₠
Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

U+20AC EURO SIGN
UTF-8: e2 82 ac   UTF-16BE: 20ac   Decimal: &#8364;
€
Category: Sc (Symbol, Currency)
Bidi: ET (European Number Terminator)

$ unicode.py 00c0
U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
UTF-8: c3 80   UTF-16BE: 00c0   Decimal: &#192;
À (à)
Lowercase: U+00E0
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
Decomposition: 0041 0300
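
Several of the fields in these listings can be reproduced with Python's
standard unicodedata module, which may help when reading the output. The
sketch below is only an illustration in Python 3: it uses the database
bundled with Python rather than UnicodeData.txt, and the helper name
describe is made up for this example.

```python
import unicodedata


def describe(char):
    """Collect a few of the properties shown in the listings above."""
    return {
        "codepoint": "U+{:04X}".format(ord(char)),
        "name": unicodedata.name(char, "<unnamed>"),
        # Bytes of the UTF-8 encoding, as space-separated hex pairs.
        "utf8": " ".join("{:02x}".format(b) for b in char.encode("utf-8")),
        # Big-endian UTF-16 code units, as one hex string.
        "utf16be": char.encode("utf-16-be").hex(),
        "decimal": ord(char),
        "category": unicodedata.category(char),
        "bidi": unicodedata.bidirectional(char),
        "decomposition": unicodedata.decomposition(char),
    }


if __name__ == "__main__":
    for key, value in describe("\u00c0").items():
        print("{}: {}".format(key, value))
```

For U+00C0 the values of name, utf8, utf16be, category, bidi and
decomposition match the corresponding lines of the listing above.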



You can specify a range of characters as an argument; unicode will show
these characters in a nice tabular format, aligned to 256-character boundaries.
Use two dots ".." to indicate the range, e.g.

       unicode 0450..0520

will display the whole Cyrillic, Armenian and Hebrew blocks (characters from U+0400 to U+05FF), while

       unicode 0400..

will display just the characters from U+0400 up to U+04FF.
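
The alignment arithmetic behind these two examples can be sketched in a few
lines of Python. This is an illustration only; the function name
aligned_range is hypothetical, and treating an open-ended range as the
single 256-character block containing the start is inferred from the second
example above.

```python
def aligned_range(spec, block=256):
    """Expand 'START..END' (hexadecimal) to block-aligned inclusive bounds."""
    start_text, _, end_text = spec.partition("..")
    start = int(start_text, 16)
    # Assumption: with no END given, only the block containing START is shown.
    end = int(end_text, 16) if end_text else start
    low = start - start % block             # round down to a block boundary
    high = end - end % block + block - 1    # round up to the end of the block
    return low, high


if __name__ == "__main__":
    for spec in ("0450..0520", "0400.."):
        low, high = aligned_range(spec)
        print("{} -> U+{:04X}..U+{:04X}".format(spec, low, high))
```

With the two ranges from the text this gives U+0400..U+05FF and
U+0400..U+04FF.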

Use --fromcp to query codepoints from other encodings:

$ unicode --fromcp cp1250 -d 200
U+010C LATIN CAPITAL LETTER C WITH CARON
UTF-8: c4 8c  UTF-16BE: 010c  Decimal: &#268;
Č (Č)
Uppercase: U+010C
Category: Lu (Letter, Uppercase)
Bidi: L (Left-to-Right)
Decomposition: 0043 030C

Multibyte encodings are supported:

$ unicode --fromcp big5 -x aff3

and multi-char strings are supported, too:

$ unicode --fromcp utf-8 -x c599c3adc5a5
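
What these queries amount to can be imitated with Python's own codecs:
turn the number into bytes, then decode the bytes with the given encoding.
The sketch below (Python 3) is a guess at the idea, not the utility's
implementation, and the helper names are made up for this example.

```python
def bytes_from_decimal(text):
    """Turn a decimal number such as '200' into its big-endian bytes."""
    number = int(text, 10)
    length = max(1, (number.bit_length() + 7) // 8)
    return number.to_bytes(length, "big")


def bytes_from_hex(text):
    """Turn a hex string such as 'aff3' into bytes, two digits per byte."""
    return bytes.fromhex(text)


def codepoints(raw, codepage):
    """Decode raw bytes from codepage and list the resulting codepoints."""
    return ["U+{:04X}".format(ord(c)) for c in raw.decode(codepage)]


if __name__ == "__main__":
    print(codepoints(bytes_from_decimal("200"), "cp1250"))
    print(codepoints(bytes_from_hex("aff3"), "big5"))
    print(codepoints(bytes_from_hex("c599c3adc5a5"), "utf-8"))
```

The first call yields U+010C, as in the listing above; the last one yields
the three codepoints of the string "říť".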


On format (--format='...'):

The format string tells unicode which information should be displayed.
There is one (and only one) escape sequence recognised, \n for a new line.

You can use standard python .format() syntax. The following variables are
recognised:

{black} {red} {green} {yellow}
{blue} {magenta} {cyan} {white}  -- ANSI colours (foreground)

{on_black} {on_red} ...          -- ANSI colours (background)

{no_colour} {default} {bold}
{underline} {blink} {reverse}
{concealed}                      -- self-explaining ANSI escape codes

{ordc} -- unicode codepoint of the character (integer)
{name} -- unicode name of the character
{utf8} -- utf8 representation of the character (hexadecimal)
{utf16be} -- utf16 representation of the character (hexadecimal)
{decimal} -- decimal representation of the character
{opt_additional} -- optional representation in additional charset (-c); 
                    empty string if not specified
{pchar} -- the character itself
{opt_flipcase} -- upper- or lowercase opposite of the character, in parentheses;
                  empty if character is not cased
{opt_uppercase}{opt_lowercase} -- optional string describing uppercase
                                  or lowercase variant of the character;
                                  empty if character is not cased
{category} {category_desc} -- character category and its human readable description
{opt_numeric}{numeric_desc} -- the string `Numeric value:' and the numeric value
                               of the character; both empty if the character
                               has no numeric value
{opt_digit}{digit_desc} -- the string `Digit value:' and the digit value
                           of the character; both empty if the character
                           has no digit value
{opt_bidi}{bidi}{bidi_desc} -- the string `Bidi:', the bidi property and
                               a human readable description 
                               of the bidi property; empty if the character
                               has no bidi category
{mirrored_desc} -- the string 'Character is mirrored' if the character is mirrored,
                   empty otherwise
{opt_combining}{combining_desc} -- the string `Combining: ', combining class and a
                                   human readable description of the combining class;
                                   empty if the character is not combining
{opt_decomp}{decomp_desc} -- the string `Decomposition: ' and a hexadecimal sequence
                             of decomposition characters; empty if the character
                             has no decomposition
{opt_unicode_block}{opt_unicode_block_desc} -- the string `Unicode block:', range of
                                               the unicode block and description of said
                                               unicode block for the given character
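
To show how such a format string is filled in, here is a small Python 3
sketch that handles the \n escape and a handful of the variables listed
above. It is an approximation for illustration: the function name render is
hypothetical, only a few variables are provided, the colour variables are
left out, and the values come from Python's unicodedata module rather than
from the utility.

```python
import unicodedata


def render(char, template):
    """Fill a --format style template for a single character."""
    # Per the text above, \n is the only escape sequence recognised.
    template = template.replace("\\n", "\n")
    fields = {
        "ordc": ord(char),                      # integer codepoint
        "name": unicodedata.name(char, ""),
        "pchar": char,
        "utf8": " ".join("{:02x}".format(b) for b in char.encode("utf-8")),
        "category": unicodedata.category(char),
    }
    # Standard str.format() syntax, so specs such as {ordc:04X} work here.
    return template.format(**fields)


if __name__ == "__main__":
    print(render("\u20ac", "U+{ordc:04X} {name}\\nUTF-8: {utf8}\\n{pchar}"))
```

Since {ordc} is an integer, a format specification such as {ordc:04X} should
print it as four hexadecimal digits, assuming the utility passes the value to
.format() unchanged.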

unicode's People

Contributors

garabik, remram44, anomen-s, cben, davejagoda, raylu-stripe
