icubaby

A C++ Baby Library to Immediately Convert Unicode. icubaby is a portable, header-only, dependency-free library for C++ 17 or later. It is fast, minimal, and easy to use for converting sequences of text between any of the Unicode UTF encodings. It does not allocate dynamic memory and neither throws nor catches exceptions.

icubaby is in no way related to the International Components for Unicode library and has no connection to any Intensive Care Unit!

Status

Category Badges
License License: MIT
Continuous Integration CI Build & Test Documentation Status
Static Analysis Quality Gate Status Codacy Badge CodeQL Microsoft C++ Code Analysis Coverity
Runtime Analysis Fuzz Test codecov
OpenSSF OpenSSF Scorecard OpenSSF Best Practices

Introduction

C++ 17 deprecated the standard library's <codecvt> header file, which contained its Unicode conversion facets. Those features weren’t easy to use correctly, but without them code is forced to look to other libraries. icubaby is one such library: it converts between the Unicode encodings. It is simple to use and exceptionally simple to integrate into a project.

The library offers an API which converts to and from UTF-8, UTF-16, or UTF-32 encodings. It can also consume a byte stream in which an optional byte order mark at the start of the stream identifies both the source encoding and the byte order.

Installation

icubaby is entirely contained within a single header file. Installation can be as simple as copying that file (include/icubaby/icubaby.hpp) into your project. It has no dependencies and self-configures to your environment.

Usage

Check out the project documentation: https://paulhuggett-icubaby.readthedocs.io/en

icubaby uses four different types to express the different Unicode encodings that it supports:

Type Meaning
std::byte Encoding and byte order are determined by the stream's byte order mark
icubaby::char8 UTF-8. icubaby::char8 is defined as char8_t when the native type is available and char otherwise
char16_t UTF-16 host-native endian
char32_t UTF-32 host-native endian
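
One way an alias like icubaby::char8 could be selected at compile time is with the standard __cpp_char8_t feature-test macro. The sketch below only illustrates that general technique and is an assumption, not a copy of icubaby's source. The sketch namespace and has_native_char8 are hypothetical names used only here.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical namespace, not icubaby's

#if defined(__cpp_char8_t)
using char8 = char8_t;  // native UTF-8 code unit type (C++ 20)
inline constexpr bool has_native_char8 = true;
#else
using char8 = char;  // fallback for compilers without char8_t
inline constexpr bool has_native_char8 = false;
#endif

// Either way, a UTF-8 code unit occupies exactly one byte.
static_assert(sizeof(char8) == 1);
// The flag is true exactly when the alias is not plain char.
static_assert(has_native_char8 == !std::is_same_v<char8, char>);

}  // namespace sketch
```

Code written against such an alias compiles unchanged under C++ 17 and C++ 20, which is presumably why the library exposes one.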

There are three ways to use the icubaby library depending on your needs:

  1. C++ 20 range adaptor
  2. Output Iterator interface
  3. Converting one code-unit at a time

1. C++ 20 Range Adaptor

C++ 20 introduced the ranges library for composable and less error-prone interaction with iterators and containers. In icubaby, we can transform a range of input values from one Unicode encoding to another using a single range adaptor:

auto const in = std::array{char32_t{0x1F600}};
auto r = in | icubaby::views::transcode<char32_t, char16_t>;
std::vector<char16_t> out;
std::ranges::copy(r, std::back_inserter(out));

This code converts a single Unicode code-point 😀 (U+1F600 GRINNING FACE) from UTF-32 to UTF-16 and will copy two UTF-16 code-units (0xD83D and 0xDE00) into the out vector.
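
The two code units quoted above follow from the surrogate-pair arithmetic that UTF-16 uses for code points beyond U+FFFF. As a minimal sketch of that arithmetic (to_surrogates is a hypothetical helper for illustration and is not part of icubaby):

```cpp
#include <array>

// Split a supplementary-plane code point into its UTF-16 high and low
// surrogates. Precondition (not checked): 0x10000 <= cp <= 0x10FFFF.
constexpr std::array<char16_t, 2> to_surrogates(char32_t cp) {
  char32_t const v = cp - 0x10000;  // 20-bit offset into the supplementary planes
  return {static_cast<char16_t>(0xD800 + (v >> 10)),     // top 10 bits
          static_cast<char16_t>(0xDC00 + (v & 0x3FF))};  // bottom 10 bits
}

// U+1F600 GRINNING FACE becomes the pair 0xD83D, 0xDE00.
static_assert(to_surrogates(0x1F600)[0] == 0xD83D);
static_assert(to_surrogates(0x1F600)[1] == 0xDE00);
```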

auto const in = std::array{std::byte{0xFE}, std::byte{0xFF}, std::byte{0x00},
                           std::byte{'A'},  std::byte{0x00}, std::byte{'b'}};
auto r = in | icubaby::views::transcode<std::byte, icubaby::char8>;
std::vector<icubaby::char8> out;
std::ranges::copy(r, std::back_inserter(out));

This snippet converts “Ab” (U+0041 LATIN CAPITAL LETTER A, U+0062 LATIN SMALL LETTER B) from big-endian UTF-16 to UTF-8.

See the C++20 Range Adaptor documentation for more details.
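
Identifying the encoding from a byte order mark comes down to comparing the first few bytes of the stream against the known BOM patterns. The following is a rough sketch of that idea only. The names encoding and detect_bom are hypothetical, the code is not icubaby's implementation, and it says nothing about what the library does when no BOM is present.

```cpp
#include <cstddef>

enum class encoding { unknown, utf8, utf16be, utf16le, utf32be, utf32le };

// Inspect up to the first four bytes of a buffer for a byte order mark.
// The four-byte UTF-32 patterns are tested first because the UTF-32 LE
// mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
inline encoding detect_bom(std::byte const* data, std::size_t size) {
  auto at = [&](std::size_t i) { return std::to_integer<unsigned>(data[i]); };
  if (size >= 4 && at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF) {
    return encoding::utf32be;
  }
  if (size >= 4 && at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00) {
    return encoding::utf32le;
  }
  if (size >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF) {
    return encoding::utf8;
  }
  if (size >= 2 && at(0) == 0xFE && at(1) == 0xFF) {
    return encoding::utf16be;
  }
  if (size >= 2 && at(0) == 0xFF && at(1) == 0xFE) {
    return encoding::utf16le;
  }
  return encoding::unknown;  // no recognizable mark
}
```

Applied to the six-byte array in the example above, this sketch would report big-endian UTF-16. One known ambiguity of this simple approach is that little-endian UTF-16 text whose first character after the mark is U+0000 looks identical to a UTF-32 LE mark.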

2. The Output Iterator Interface

auto const in = std::vector{char8_t{0xF0}, char8_t{0x9F}, char8_t{0x98}, char8_t{0x80}};
std::vector<char16_t> out;
icubaby::t8_16 t;
auto it = icubaby::iterator{&t, std::back_inserter (out)};
for (auto cu: in) {
  *(it++) = cu;
}
it = t.end_cp (it);

The icubaby::iterator<> class offers a familiar output iterator for using a transcoder. Each code unit from the input encoding is written to the iterator, which in turn writes the output encoding to a second iterator. This enables us to use standard algorithms such as std::copy with the library.
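
To illustrate the general pattern of an output iterator that feeds a transcoder and forwards the results to a second iterator, here is a self-contained sketch that does not use icubaby at all. Everything in it is hypothetical: latin1_to_utf8 is a toy stand-in transcoder, feeding_iterator is a simplified adaptor written for this example, and to_utf8 is a small driver. It is not how icubaby::iterator is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Toy transcoder (not icubaby): converts one Latin-1 code unit to UTF-8,
// writing one or two bytes to dest and returning the advanced iterator.
struct latin1_to_utf8 {
  using input_type = unsigned char;
  template <typename Out>
  Out operator()(input_type c, Out dest) const {
    if (c < 0x80) {
      *dest++ = static_cast<char>(c);
    } else {
      *dest++ = static_cast<char>(0xC0 | (c >> 6));
      *dest++ = static_cast<char>(0x80 | (c & 0x3F));
    }
    return dest;
  }
};

// Output iterator adaptor: each assignment passes a code unit to the
// transcoder, which writes its output through the wrapped iterator.
template <typename Transcoder, typename Out>
class feeding_iterator {
public:
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = void;

  feeding_iterator(Transcoder* t, Out out) : t_{t}, out_{out} {}
  feeding_iterator& operator=(typename Transcoder::input_type cu) {
    out_ = (*t_)(cu, out_);
    return *this;
  }
  // Dereference and increment are no-ops, as usual for output iterators.
  feeding_iterator& operator*() { return *this; }
  feeding_iterator& operator++() { return *this; }
  feeding_iterator operator++(int) { return *this; }

private:
  Transcoder* t_;
  Out out_;
};

// Because the adaptor models an output iterator, std::copy can drive it.
inline std::string to_utf8(std::vector<unsigned char> const& in) {
  std::string out;
  latin1_to_utf8 t;
  std::copy(in.begin(), in.end(), feeding_iterator{&t, std::back_inserter(out)});
  return out;
}
```

The design choice worth noting is that the adaptor holds a pointer to the transcoder rather than a copy, so any state the transcoder accumulates survives the iterator being copied by the algorithm.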

3. Converting One Code-Unit at a Time

Let’s try converting a single Unicode emoji character 😀 (U+1F600 GRINNING FACE) expressed as four UTF-8 code units (0xF0, 0x9F, 0x98, 0x80) to UTF-16 (where it is the surrogate pair 0xD83D, 0xDE00).

std::vector<char16_t> out;
auto it = std::back_inserter (out);
icubaby::t8_16 t;
for (auto cu: {0xF0, 0x9F, 0x98, 0x80}) {
  it = t (cu, it);
}
it = t.end_cp (it);

The out vector will contain two UTF-16 code units, 0xD83D and 0xDE00. See the explicit conversion documentation for more details.
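
Feeding a transcoder one code unit at a time implies that it keeps state between calls until a whole code point has arrived. The sketch below shows that idea for UTF-8 input under simplifying assumptions: utf8_accumulator is a hypothetical type, it is not part of icubaby, and it performs no validation of malformed input.

```cpp
// Accumulates UTF-8 code units into a code point, one unit per call.
// Assumes well-formed input; invalid sequences are not detected.
struct utf8_accumulator {
  char32_t cp = 0;  // code point built so far
  int pending = 0;  // continuation bytes still expected

  // Returns true when cp holds a complete code point.
  bool push(unsigned char cu) {
    if (pending == 0) {
      if (cu < 0x80) {  // single-byte (ASCII) sequence
        cp = cu;
        return true;
      }
      // The lead byte tells us how many continuation bytes follow.
      pending = cu >= 0xF0 ? 3 : cu >= 0xE0 ? 2 : 1;
      cp = static_cast<char32_t>(cu & (0x3F >> pending));  // lead payload bits
      return false;
    }
    cp = (cp << 6) | static_cast<char32_t>(cu & 0x3F);  // append 6 payload bits
    return --pending == 0;
  }
};
```

Pushing 0xF0, 0x9F, 0x98 and 0x80 in turn yields false three times and then true, with cp equal to 0x1F600. If the input stops while pending is still non-zero, the final sequence was truncated, which is presumably the kind of condition a finishing call such as end_cp exists to handle.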
