unicode

A kitchen-knife tool for Unicode. Its main features are:

  • Reading a string, with or without normalization.
  • Displaying the characters for specified Unicode code points or names.
  • Listing categories and their characters.
% unicode.py show -nN 'SMILING FACE WITH OPEN MOUTH' | unicode.py read -l
No.  Chr    EAA SZ CP   Name
==== ====== === == ==== ==========
  1: [😃 ] W   2: 0001F603 SMILING FACE WITH OPEN MOUTH
  2: [😄 ] W   2: 0001F604 SMILING FACE WITH OPEN MOUTH AND SMILING EYES
  3: [😅 ] W   2: 0001F605 SMILING FACE WITH OPEN MOUTH AND COLD SWEAT
  4: [😆 ] W   2: 0001F606 SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES

Reading String

It decodes the input string into Unicode and shows the resulting string. By default, it normalizes the string with NFC before showing it, so you can compose a decomposed string as shown below.

% cat sample/hoge.nfd.txt
ほげはげ

% cat sample/hoge.nfd.txt | unicode.py read
ほげはげ

If you want to see a decomposed string, you can use the -n NFD option.

% echo ほげはげ | unicode.py read -n NFD
ほげはげ
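The NFC/NFD round trip above can be reproduced with Python's standard unicodedata module. This is only a minimal sketch of the normalization itself, not the tool's own code; the helper name code_points is made up for illustration.

```python
import unicodedata

# Decomposed (NFD) text: HO, KE + combining voiced sound mark, HA, KE + mark.
nfd = "\u307b\u3051\u3099\u306f\u3051\u3099"
# NFC composes each KE + U+3099 pair into the single letter GE (U+3052).
nfc = unicodedata.normalize("NFC", nfd)

def code_points(text):
    """Return the code points of text as uppercase hex strings (hypothetical helper)."""
    return [f"{ord(ch):04X}" for ch in text]

print(code_points(nfd))  # ['307B', '3051', '3099', '306F', '3051', '3099']
print(code_points(nfc))  # ['307B', '3052', '306F', '3052']
print(unicodedata.normalize("NFD", nfc) == nfd)  # True
```

Both strings look identical on most terminals, which is why comparing code points is the reliable way to tell them apart.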

With the -l option, you can see the characters in detail.

% cat sample/hoge.nfd.txt | unicode read -n NFD -l
No.  Chr    EAA SZ CP   Name
==== ====== === == ==== ==========
  1: [ほ ] W   2: 307B HIRAGANA LETTER HO
  2: [け ] W   2: 3051 HIRAGANA LETTER KE
  3: [゙ ] W   2: 3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
  4: [は ] W   2: 306F HIRAGANA LETTER HA
  5: [け ] W   2: 3051 HIRAGANA LETTER KE
  6: [゙ ] W   2: 3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
  7: ['\n'] N   1: 000A None
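Most of the columns in the listing above can be derived from Python's unicodedata module. The sketch below is an illustration under that assumption, not the tool's implementation; the function name describe is hypothetical, and the SZ column is left out because how the tool computes it is not documented here.

```python
import unicodedata

def describe(text):
    """Yield (index, char, East Asian width, code point, name) for each character."""
    for index, ch in enumerate(text, start=1):
        yield (
            index,
            ch,
            unicodedata.east_asian_width(ch),  # e.g. 'W' (wide) or 'N' (neutral)
            f"{ord(ch):04X}",
            unicodedata.name(ch, None),        # None for unnamed characters such as '\n'
        )

# Decompose GE into KE + combining mark, then inspect every character.
for row in describe(unicodedata.normalize("NFD", "\u307b\u3052\n")):
    print(row)
```

Control characters have no Unicode name, which would explain the None shown for the trailing newline.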

More examples.

unicode.py read -f sample/hoge.nfd.txt
unicode.py read -f sample/hoge.nfc.txt -o output.txt
unicode.py read -f sample/hoge.nfc.txt -n NFD -l

Usage.

% unicode.py read -h
usage: unicode.py read [-h] [-f INPUT_FILE] [-o OUTPUT_FILE] [-l] [-c] [-E]
                       [-n NORMALIZE_MODE]

optional arguments:
  -h, --help            show this help message and exit
  -f INPUT_FILE         specify the filename for input.default is stdin.
  -o OUTPUT_FILE        specify the filename for output.default is stdout.
  -l                    show the unicode name for each charactor.
  -c                    add the number of charactors to the tail.
  -E                    disable to handle the input text as EAA.
  -n NORMALIZE_MODE, --normalize-mode NORMALIZE_MODE
                        specify a mode to normalize. valid mode is: ['NFC',
                        'NFKC', 'NFD', 'NFKD']

Displaying Characters from Unicode Code Points or Unicode Names.

Examples.

% unicode.py show 1F64b 1f3fb
🙋🏻
% unicode.py show 307B 3051 3099
ほげ
% unicode.py show 845B e0100
葛󠄀
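Turning hexadecimal code points into a string comes down to int(..., 16) followed by chr(). The sketch below shows that idea only; chars_from_hex is a hypothetical name, not a function of the tool.

```python
def chars_from_hex(*code_points):
    """Join the characters named by hexadecimal code point strings."""
    # int(..., 16) accepts upper- and lowercase hex digits alike.
    return "".join(chr(int(cp, 16)) for cp in code_points)

print(chars_from_hex("307B", "3051", "3099"))  # HO, KE, combining voiced sound mark
print(chars_from_hex("1F64b", "1f3fb"))        # emoji followed by a skin tone modifier
print(chars_from_hex("845B", "e0100"))         # ideograph followed by a variation selector
```

The second and third calls each produce two code points that a capable terminal renders as a single glyph.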

With the -l option, you can see the Unicode name of each code point.

% unicode.py show 307B 3051 3099 -l
  307B: ほ : HIRAGANA LETTER HO
  3051: け : HIRAGANA LETTER KE
  3099: ゙ : COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

With the -s option, you can see the characters in the range between two code points.

% unicode.py show 3041 304F -s
ぁあぃいぅうぇえぉおかがきぎく

If you want to specify the code points as decimal integers, use the -i option. The command below produces the same output as the one above.

% unicode.py show 12353 12367 -s -i
ぁあぃいぅうぇえぉおかがきぎく
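The equivalence of the hexadecimal and decimal forms can be checked with a short sketch. The function name char_range and its base parameter are assumptions made for illustration; base=10 stands in for the -i option.

```python
def char_range(first, last, base=16):
    """Return every character from first to last, inclusive.

    first and last are code point strings; base=10 mimics the -i option.
    """
    start, end = int(first, base), int(last, base)
    return "".join(chr(cp) for cp in range(start, end + 1))

hex_form = char_range("3041", "304F")
int_form = char_range("12353", "12367", base=10)
print(hex_form)
print(hex_form == int_form)  # True: 0x3041 == 12353 and 0x304F == 12367
```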

You can also look up characters by a full or partial Unicode name.

% unicode.py show -n 'SMILING FACE WITH OPEN MOUTH'
😃😄😅😆
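A name search like this can be approximated by scanning every code point and testing its name with unicodedata.name. This is a brute-force sketch, not necessarily how the tool searches; find_by_name is a hypothetical name, and the results depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata

def find_by_name(fragment):
    """Return all characters whose Unicode name contains fragment (case-insensitive)."""
    needle = fragment.upper()  # Unicode character names are uppercase
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        # Unnamed code points fall back to "", which never contains the needle.
        if needle in unicodedata.name(chr(cp), "")
    )

print(find_by_name("smiling face with open mouth"))
```

A full scan covers about 1.1 million code points, so a real tool would probably cache or index the names.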

Usage.

% unicode.py show -h
usage: unicode.py show [-h] [-s] [-l] [-i] [-n] [-x] [-N] arg [arg ...]

positional arguments:
  arg               a code point in hex.

optional arguments:
  -h, --help        show this help message and exit
  -s                indicate to show a series of the chars specified by the
                    two code points.
  -l                show the chars in virtical with the unicode name.
  -i                specify that the arg is a code point in integer.
  -n                specify that the arg is a part of a unicode name.
  -x                disable to show a unicode name of the option -l.
  -N, --no-newline  disable to add a newline.

Listing Unicode Characters.

You can list the characters in a category you specify. With no options, the command below shows the list of first-level categories.

% unicode.py list
0: European Scripts
1: Modifier Letters
2: Combining Marks
3: African Scripts
4: Middle Eastern Scripts
5: Central Asian Scripts
6: South Asian Scripts
7: Southeast Asian Scripts
8: Indonesia & Oceania Scripts
9: East Asian Scripts
10: American Scripts
11: Other
12: Notational Systems
13: Punctuation
14: Alphanumeric Symbols
15: Technical Symbols
16: Numbers & Digits
17: Mathematical Symbols
18: Emoji & Pictographs
19: Other Symbols
20: Specials
21: Private Use
22: Surrogates
23: Noncharacters in Charts

You can pick one of them to look into. Let's look at Emoji & Pictographs. With the -c option, you can see a list of subcategories under the category.

% unicode.py list -c 'Emoji & Pictographs'
## Emoji & Pictographs
0: 'Dingbats'
1: 'Ornamental Dingbats'
2: 'Emoticons'
3: 'Miscellaneous Symbols'
4: 'Miscellaneous Symbols And Pictographs'
5: 'Supplemental Symbols and Pictographs'
6: 'Symbols and Pictographs Extended-A'
7: 'Transport and Map Symbols'

You can specify a case-insensitive part of the name, or the index number. The two commands below produce the same output.

% unicode.py list -c emoji
% unicode.py list -c 18
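The hint matching described above (an index number, or a case-insensitive part of the name) might look like the following sketch. The name resolve_hint is hypothetical, and how the tool actually handles ambiguous or out-of-range hints is an assumption here.

```python
# First-level categories, in the order printed by "unicode.py list".
CATEGORIES = [
    "European Scripts", "Modifier Letters", "Combining Marks",
    "African Scripts", "Middle Eastern Scripts", "Central Asian Scripts",
    "South Asian Scripts", "Southeast Asian Scripts",
    "Indonesia & Oceania Scripts", "East Asian Scripts", "American Scripts",
    "Other", "Notational Systems", "Punctuation", "Alphanumeric Symbols",
    "Technical Symbols", "Numbers & Digits", "Mathematical Symbols",
    "Emoji & Pictographs", "Other Symbols", "Specials", "Private Use",
    "Surrogates", "Noncharacters in Charts",
]

def resolve_hint(hint, names):
    """Return the names matching hint: a list index or a case-insensitive substring."""
    if hint.isdigit():
        index = int(hint)
        # Assumption: an out-of-range index simply matches nothing.
        return [names[index]] if index < len(names) else []
    return [name for name in names if hint.lower() in name.lower()]

print(resolve_hint("emoji", CATEGORIES))  # ['Emoji & Pictographs']
print(resolve_hint("18", CATEGORIES))     # ['Emoji & Pictographs']
```

A hint such as "scripts" matches several categories, so a caller has to decide whether to list them all or ask for a narrower hint.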

Now, you can see a list of the characters in 'Emoticons'.

% unicode.py list -c 'Emoji & Pictographs' -k Emoticons
Emoticons 1F600-1F64F 80
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   0 😔 😕 😖 😗 😘 😙 😚 😛 😜 😝 😞 😟 😠 😡 😢 😣 😤 😥 😦 😧
   1 😨 😩 😪 😫 😬 😭 😮 😯 😰 😱 😲 😳 😴 😵 😶 😷 😸 😹 😺 😻
   2 😼 😽 😾 😿 🙀 🙁 🙂 🙃 🙄 🙅 🙆 🙇 🙈 🙉 🙊 🙋 🙌 🙍 🙎 🙏
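A grid like the one above is a code point range cut into fixed-width rows. The sketch below shows one way to do that; grid_rows is a hypothetical name, and the default of 20 columns is taken from the header row above rather than from the tool's source.

```python
def grid_rows(first, last, columns=20):
    """Split the code points first..last (inclusive) into rows of characters."""
    chars = [chr(cp) for cp in range(first, last + 1)]
    return [chars[i:i + columns] for i in range(0, len(chars), columns)]

rows = grid_rows(0x1F600, 0x1F64F)
# Header with column indexes, then one numbered line per row.
print("     " + " ".join(f"{col:2d}" for col in range(20)))
for number, row in enumerate(rows):
    print(f"{number:4d} " + " ".join(row))
```

Emoji are usually drawn two cells wide, so pairing each one with a two-character column index keeps the header roughly aligned.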

You can use the -k option in the same way as the -c option, so the two commands below produce the same output.

% unicode list -c emoji -k 2
% unicode list -c 18 -k emoti

Or, you can just specify 'emoti' with -k alone, since it searches the entire database.

% unicode.py list -k emoti
Emoticons 1F600-1F64F 80
      0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
   0 😔 😕 😖 😗 😘 😙 😚 😛 😜 😝 😞 😟 😠 😡 😢 😣 😤 😥 😦 😧
   1 😨 😩 😪 😫 😬 😭 😮 😯 😰 😱 😲 😳 😴 😵 😶 😷 😸 😹 😺 😻
   2 😼 😽 😾 😿 🙀 🙁 🙂 🙃 🙄 🙅 🙆 🙇 🙈 🙉 🙊 🙋 🙌 🙍 🙎 🙏

With the -a option, you can see all the characters under Emoji & Pictographs.

unicode.py l -c emoji -a

You may see many squares. That means your terminal doesn't support the symbols.

Usage.

% unicode.py list -h
usage: unicode list [-h] [-c CATEGORY_HINT] [-a] [-r] [-k KEYWORD_HINT]
                    [--columns NB_COLUMNS]

options:
  -h, --help            show this help message and exit
  -c CATEGORY_HINT      specify a unicode category, which is case-insensitive,
                        can be a part of the name, can be a number in the
                        list.
  -a                    show all chars under the category specified.
  -r                    show the range of code point. It is not valid when the
                        hint unique a subategory or the -a option is used.
  -k KEYWORD_HINT       specify a unicode sub category name, which is case-
                        insensitive, can be a part of the name, can be a
                        number in the list.
  --columns NB_COLUMNS  specify the number of the columns to show.
