lua-utf8-simple

This "library" is meant to be a very thin helper that you can easily drop into another project without really calling it a dependency. It aims to provide only the most minimal handling functions for working with utf8 strings. It does not aim to be feature-complete or even error-descriptive. It covers the practical cases, not the complex ones. You have been warned. =^__^=

The require() Line

local utf8 = require('utf8_simple')

The Only Functions You Need to Know

utf8.chars(s[, no_subs])

  • s: (string) the utf8 string to iterate over (by characters)
  • no_subs: (boolean) true yields each character's byte width instead of the character substring
-- i is the character/letter index within the string
-- c is the utf8 character (string of 1 or more bytes)
-- b is the byte index within the string
for i, c, b in utf8.chars('Αγαπώ τηγανίτες') do
	print(i, c, b)
end

Output:

1	Α	1
2	γ	3
3	α	5
4	π	7
5	ώ	9
6		11
7	τ	12
8	η	14
9	γ	16
10	α	18
11	ν	20
12	ί	22
13	τ	24
14	ε	26
15	ς	28
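To make the three loop values concrete, here is a minimal sketch of an iterator that yields the same kind of (character index, character, byte index) triples. It is not the library's code: `chars_sketch` is a hypothetical stand-in, and it assumes the input is valid utf8.

```lua
-- Hypothetical stand-in for utf8.chars(s); NOT the library's implementation.
-- Assumes s is valid utf8.
local function chars_sketch(s)
	local i, b = 0, 1 -- character index so far, byte index of the next character
	return function()
		if b > #s then return nil end
		-- one lead byte followed by its continuation bytes (0x80-0xBF);
		-- fall back to a single byte if b points at a stray continuation byte
		local c = s:match('^[^\128-\191][\128-\191]*', b) or s:sub(b, b)
		local start = b
		i, b = i + 1, b + #c
		return i, c, start
	end
end

local out = {}
for i, c, b in chars_sketch('Αγ τ') do
	out[#out + 1] = i .. ':' .. c .. ':' .. b
end
print(table.concat(out, ' ')) -- 1:Α:1 2:γ:3 3: :5 4:τ:6
```

The byte index jumps by 2 for each Greek letter and by 1 for the space, which is the same pattern as in the output above.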

ALTERNATE FORM

Creating small substrings can be a performance concern. Passing true as the 2nd parameter to utf8.chars() makes the iterator return the byte width of each character instead of the substring.

This is for situations when you only care about the byte width (less common).

-- i is the character/letter index within the string
-- w is the utf8 character width (in bytes)
-- b is the byte index within the string
for i, w, b in utf8.chars('Αγαπώ τηγανίτες', true) do
	print(i, w, b)
end

Output:

1	2	1
2	2	3
3	2	5
4	2	7
5	2	9
6	1	11
7	2	12
8	2	14
9	2	16
10	2	18
11	2	20
12	2	22
13	2	24
14	2	26
15	2	28
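The widths in the second column follow from the first byte of each character. As a rough sketch (the helper name `char_width` is hypothetical and is not part of the library), the width can be read off the lead byte like this:

```lua
-- Hypothetical helper, not part of utf8_simple: width in bytes of the
-- utf8 character that starts with the given lead byte value.
local function char_width(byte)
	if byte < 0x80 then return 1 end  -- 0xxxxxxx: plain ASCII
	if byte >= 0xF0 then return 4 end -- 11110xxx
	if byte >= 0xE0 then return 3 end -- 1110xxxx
	if byte >= 0xC0 then return 2 end -- 110xxxxx
	return nil                        -- 10xxxxxx: a continuation byte, not a lead byte
end

print(char_width(('Α'):byte())) -- 2
print(char_width((' '):byte())) -- 1
print(char_width(('♥'):byte())) -- 3
```

This is why every Greek letter above reports 2 and the space reports 1.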

utf8.map(s, f[, no_subs])

  • s: (string) the utf8 string to map 'f' over
  • f: (function) a function accepting: f(visual_index, utf8_char -or- width, byte_index)
  • no_subs: (boolean) true means don't make small substrings from each character (byte width instead)

returns: (nothing)

> utf8.map('Αγαπώ τηγανίτες', print) -- does the same as the first example above
> utf8.map('Αγαπώ τηγανίτες', print, true) -- the alternate form from above
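Any function with the same three parameters can be passed as f, not just print. The sketch below uses a hypothetical stand-in, `map_sketch`, so that it runs without the library; it assumes valid utf8 and only mimics the default (substring) form.

```lua
-- Hypothetical stand-in for utf8.map(s, f); NOT the library's implementation.
local function map_sketch(s, f)
	local i = 0
	-- '()' captures the byte position where each character starts
	for b, c in s:gmatch('()([^\128-\191][\128-\191]*)') do
		i = i + 1
		f(i, c, b)
	end
end

-- Collect every multi-byte character together with its position.
local multibyte = {}
map_sketch('cat♥dog', function(i, c, b)
	if #c > 1 then
		multibyte[#multibyte + 1] = { index = i, char = c, byte = b }
	end
end)

print(multibyte[1].index, multibyte[1].char, multibyte[1].byte) -- 4	♥	4
```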

Others

utf8.len(s)

  • s: (string) the utf8 string

returns: (number) the number of utf8 characters in s (not the byte length)

note: "invisible" utf8 characters (zero-width spaces, for example) are counted like any other character

> = utf8.len('Αγαπώ τηγανίτες')
15
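One common way to count utf8 characters is to count every byte that is not a continuation byte. The sketch below does that; `len_sketch` is a hypothetical name, it assumes valid utf8, and it is not necessarily how the library does it. It also shows the "invisible" character caveat from the note.

```lua
-- Hypothetical sketch, not the library's code: count the bytes that start
-- a character, i.e. everything outside the continuation range 0x80-0xBF.
local function len_sketch(s)
	-- gsub's second return value is the number of matches it made
	local _, count = s:gsub('[^\128-\191]', '')
	return count
end

print(len_sketch('cat♥dog')) -- 7
print(#'cat♥dog')            -- 9 (byte length: ♥ takes 3 bytes)

-- U+200B ZERO WIDTH SPACE is encoded as the bytes E2 80 8B. It renders as
-- nothing, yet it still counts as one character.
print(len_sketch('a\226\128\139b')) -- 3
```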

utf8.reverse(s)

  • s: (string) the utf8 string

returns: (string) the utf8-reversed form of s

note: reversing left-to-right utf8 strings that include directional formatting characters will look odd

> = utf8.reverse('Αγαπώ τηγανίτες')
ςετίναγητ ώπαγΑ
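The reason a utf8-aware reverse is needed is that string.reverse() works on bytes and scrambles multi-byte characters. A small sketch of the idea, using a hypothetical `reverse_sketch` (assumes valid utf8, not the library's code):

```lua
-- Hypothetical sketch, not the library's code: reverse by character, not by byte.
local function reverse_sketch(s)
	local chars = {}
	for c in s:gmatch('[^\128-\191][\128-\191]*') do
		table.insert(chars, 1, c) -- prepend, so the last character ends up first
	end
	return table.concat(chars)
end

print(reverse_sketch('cat♥dog')) -- god♥tac

-- The byte-wise reverse flips the three bytes of ♥ as well, producing invalid utf8.
print(('cat♥dog'):reverse() == reverse_sketch('cat♥dog')) -- false
```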

utf8.strip(s)

  • s: (string) the utf8 string

returns: (string) s with all non-ascii characters removed (characters > 1 byte)

> = utf8.strip('cat♥dog')
catdog
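Since every byte of a multi-byte utf8 character has its high bit set, stripping non-ascii characters can be sketched as a single gsub. `strip_sketch` is a hypothetical name and this is only one possible approach, not necessarily the library's:

```lua
-- Hypothetical sketch, not the library's code: drop every byte >= 0x80.
-- Lead bytes and continuation bytes of multi-byte characters all fall in that range.
local function strip_sketch(s)
	-- the parentheses discard gsub's second return value (the match count)
	return (s:gsub('[\128-\255]', ''))
end

print(strip_sketch('cat♥dog')) -- catdog
```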

utf8.replace(s, map)

  • s: (string) the utf8 string
  • map: (table) keys are utf8 characters to replace, values are their replacement

returns: (string) s with all the key-characters in map replaced

note: each key must be a single utf8 character; the values can be strings of any length

> = utf8.replace('∃y ∀x ¬(x ≺ y)', { ['∃'] = 'E', ['∀'] = 'A', ['¬'] = '\r\n', ['≺'] = '<' })
Ey Ax 
(x < y)
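Character-for-string replacement like this maps naturally onto string.gsub with a table as the replacement argument: each matched character is looked up as a key, and characters without an entry are left alone. A minimal sketch under that assumption (`replace_sketch` is hypothetical, assumes valid utf8, and is not the library's code):

```lua
-- Hypothetical sketch, not the library's code.
local function replace_sketch(s, map)
	-- Match one utf8 character at a time. With a table as the replacement,
	-- gsub looks each match up as a key and keeps the match when there is no entry.
	return (s:gsub('[^\128-\191][\128-\191]*', map))
end

print(replace_sketch('∃y ∀x (x ≺ y)', { ['∃'] = 'E', ['∀'] = 'A', ['≺'] = '<' }))
-- Ey Ax (x < y)
```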

utf8.sub(s, i, j)

  • s: (string) the utf8 string
  • i: (number) the starting character index in the utf8 string
  • j: (number) the ending character index in the utf8 string

returns: (string) the substring formed from i to j, inclusive (this is a utf8-aware string.sub())

> = utf8.sub('Αγαπώ τηγανίτες', 3, -5)
απώ τηγαν
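A utf8-aware sub has to translate character indices into byte offsets before it can slice. The following sketch shows one way to do that; `sub_sketch` is a hypothetical name, it assumes valid utf8, and treating a missing j as -1 is an assumption borrowed from string.sub() rather than something the library documents.

```lua
-- Hypothetical sketch, not the library's code.
local function sub_sketch(s, i, j)
	-- byte offset at which each character starts
	local starts = {}
	for b in s:gmatch('()[^\128-\191][\128-\191]*') do
		starts[#starts + 1] = b
	end
	local n = #starts
	j = j or -1 -- assumption: default to the last character, like string.sub
	-- negative indices count back from the end
	if i < 0 then i = n + i + 1 end
	if j < 0 then j = n + j + 1 end
	if i < 1 then i = 1 end
	if j > n then j = n end
	if i > j then return '' end
	-- the last wanted byte sits just before the start of character j + 1
	local last = (starts[j + 1] or #s + 1) - 1
	return s:sub(starts[i], last)
end

print(sub_sketch('cat♥dog', 4, -3)) -- ♥d
print(sub_sketch('cat♥dog', 5))     -- dog
```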
