A UTF-7 stream encoder and decoder in ANSI C

This small, free-standing, public domain library encodes a stream of code points into UTF-7 and vice versa. It requires only a single, small struct for its entire internal state.

API

Initialize a context struct (struct utf7), set its buf and len fields to describe a buffer for input or output, then "pump" the encoder or decoder similarly to zlib.

char buffer[256];
struct utf7 ctx;

utf7_init(&ctx, "=@"); /* indirectly encode = and @ */
ctx.buf = buffer;
ctx.len = sizeof(buffer);

The context must be re-initialized before switching between encoding and decoding.

utf7_init()

void utf7_init(struct utf7 *, const char *indirect);

The utf7_init() function initializes a context for either encoding or decoding. The indirect argument is optional and may be NULL. It is ignored when the context is used for decoding. By default the encoder directly encodes every character that is permitted to be directly encoded. The indirect argument subtracts from this set of directly-encoded characters. This may be desirable for certain characters, such as = (EQUALS SIGN).
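
To illustrate how the indirect argument narrows the directly-encoded set, here is a minimal sketch. It assumes the default set is the one RFC 2152 permits (Set D, Set O, plus space, tab, CR, and LF); the helper is_direct() is hypothetical and is not part of this library, whose internal table may differ.

```c
#include <string.h>

/* Hypothetical helper, not part of the library.
 * Returns 1 if the ASCII character c may appear as itself in UTF-7
 * under RFC 2152 (Set D, Set O, space, tab, CR, LF), and is not
 * listed in indirect. indirect may be NULL. Returns 0 otherwise. */
static int is_direct(long c, const char *indirect)
{
    static const char set_d[] = "'(),-./:?";
    static const char set_o[] = "!\"#$%&*;<=>@[]^_`{|}";
    int ok;

    if (c <= 0 || c > 127)
        return 0; /* non-ASCII (and NUL) is never direct here */

    ok = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == ' ' || c == '\t' || c == '\r' || c == '\n' ||
         strchr(set_d, (int)c) != NULL ||
         strchr(set_o, (int)c) != NULL;

    /* The indirect list only ever removes characters from the set. */
    if (ok && indirect != NULL && strchr(indirect, (int)c) != NULL)
        ok = 0;
    return ok;
}
```

Under these assumptions, = and @ are direct by default and become indirect when "=@" is passed, while +, \ and ~ are never direct.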

utf7_encode()

int utf7_encode(struct utf7 *, long codepoint);

The utf7_encode() function writes a code point to the buffer pointed to by the context. The buf and len fields are updated on the context as output is produced. Code points outside the Basic Multilingual Plane (BMP) are automatically encoded as surrogate halves for UTF-7.
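
The surrogate split mentioned above follows the standard UTF-16 arithmetic. The sketch below shows that arithmetic in isolation; split_surrogates() is a hypothetical name used for illustration, not a function exported by this library.

```c
/* Hypothetical helper, not part of the library.
 * Splits a code point in U+10000..U+10FFFF into its UTF-16 high and
 * low surrogate halves. Returns 0 on success, or -1 if cp is outside
 * that range (BMP code points need no surrogates). */
static int split_surrogates(long cp, unsigned *hi, unsigned *lo)
{
    if (cp < 0x10000L || cp > 0x10FFFFL)
        return -1;
    cp -= 0x10000L;                         /* 20 bits remain */
    *hi = 0xD800u + (unsigned)(cp >> 10);   /* top 10 bits */
    *lo = 0xDC00u + (unsigned)(cp & 0x3FF); /* bottom 10 bits */
    return 0;
}
```

For example, U+1F600 splits into the halves 0xD83D and 0xDE00, each of which is then encoded as an ordinary 16-bit unit.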

When there is nothing more to encode, call the encoder with the special UTF7_FLUSH code point to force all remaining output from the context. This behaves just like any other code point, particularly with respect to the return values below, except that the value itself is not written into the output. After flushing, the context will effectively be reinitialized.

There are two possible return values:

  • UTF7_OK: The operation completed successfully.

  • UTF7_FULL: The output buffer filled up before the operation could be completed. Consume the output buffer as appropriate for your application, update the context's buf and len to a fresh buffer, and continue the operation by calling it again with the exact same arguments.

utf7_decode()

long utf7_decode(struct utf7 *);

This function operates in reverse, consuming input from buf on the context and returning a code point. Surrogate halves in the underlying stream are automatically recombined into a non-BMP code point.
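
Recombining surrogate halves is the inverse of the standard UTF-16 split. The following sketch shows only that arithmetic; join_surrogates() is a hypothetical name for illustration and does not reflect how this library structures its decoder.

```c
/* Hypothetical helper, not part of the library.
 * Combines a UTF-16 high surrogate (0xD800..0xDBFF) and low surrogate
 * (0xDC00..0xDFFF) into a code point in U+10000..U+10FFFF.
 * Returns -1 if either half is out of range. */
static long join_surrogates(unsigned hi, unsigned lo)
{
    if (hi < 0xD800u || hi > 0xDBFFu)
        return -1;
    if (lo < 0xDC00u || lo > 0xDFFFu)
        return -1;
    return 0x10000L + ((long)(hi - 0xD800u) << 10) + (long)(lo - 0xDC00u);
}
```

A high half with no following low half, or a low half on its own, has no valid combination, which is the kind of input a decoder would report as invalid.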

There are four possible return values:

  • UTF7_OK: Input was exhausted, but this is a valid ending for a stream.

  • UTF7_INCOMPLETE: Input was exhausted but more input is expected. If there is no more input, this should be treated as an error since the input was truncated.

  • UTF7_INVALID: The input is not valid UTF-7. The offending byte is pointed to by buf.

  • Any other return value is a code point.

conv7

Under tests/ is a simple command line tool called conv7 that converts between UTF-7 and other encodings via standard input and standard output. For example, to convert a UTF-8 file to UTF-7:

$ conv7 -f utf-8 <in-u8.txt >out-u7.txt

Or vice versa:

$ conv7 -t utf-8 <in-u7.txt >out-u8.txt
