kompress

This is a self-learning, proof-of-concept, no-dependency implementation of various lossless compression algorithms that can be combined into a GZip-compatible compressor/decompressor.

By design, it reads the stream of bytes/symbols only once, to comply with the GZip interface.

The API is similar to that of the Go gzip package: you first construct a writer (resp. a reader), then write (resp. read) through it to compress (resp. decompress) your data. Writers/Readers can (and should) be chained.

It is essential to close the Writer when finished, to ensure data is flushed.

For performance, you may want to buffer the initial io.Writer and io.Reader. This is not taken care of by the engines.
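The construct, write, close pattern described above can be sketched with the standard library alone. This example uses compress/gzip and bufio, not the kompress constructors, whose names are not given here. The roundTrip helper is hypothetical and only illustrates the chaining order, the mandatory Close, and the optional buffering.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// roundTrip compresses data through a buffered gzip.Writer, then
// decompresses it again through a buffered gzip.Reader.
func roundTrip(data []byte) ([]byte, error) {
	var sink bytes.Buffer

	// Writer chain: gzip.Writer -> bufio.Writer -> sink.
	bw := bufio.NewWriter(&sink)
	zw := gzip.NewWriter(bw)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close the outermost writer first so it emits its pending data,
	// then flush the buffer underneath it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	if err := bw.Flush(); err != nil {
		return nil, err
	}

	// Reader chain: sink -> bufio.Reader -> gzip.Reader.
	zr, err := gzip.NewReader(bufio.NewReader(&sink))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	out, err := roundTrip([]byte("hello hello hello"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s\n", out)
}
```

Skipping the Close call in this sketch would leave the compressed stream truncated, which is the failure mode the warning above is about.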

You may use these engines to preprocess data before it is gzipped, or to replace gzip completely.

At the moment, the following building blocks are available:

MyZip

A typical assembly of a byte-to-symbol block, then a repeat block, then an LZW block, then a Huffman block, then a bit2byte block. It compresses bytes into bytes.

DynReader and Writer

The DReader/Writer provides adaptive Huffman compression, compressing symbols into bits (after adding an EOF Symbol to the symbol alphabet). A scheduler defines how frequently the Huffman frequency tree is recomputed.

It relies on an engine that does the Huffman tree management, and on hwriter/reader, which implement fixed-tree Huffman encoding.
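As an illustration of the fixed-tree step, here is a minimal sketch that derives Huffman code lengths from a frequency table whose last entry stands for the EOF Symbol. The names node, nodeHeap and codeLengths are hypothetical and are not taken from the package. An adaptive scheme would rerun something like this whenever its scheduler decides the counts have changed enough.

```go
package main

import (
	"container/heap"
	"fmt"
)

// node is a Huffman tree node; symbol is meaningful for leaves only.
type node struct {
	weight      int
	symbol      int
	left, right *node
}

// nodeHeap is a min-heap of nodes ordered by weight.
type nodeHeap []*node

func (h nodeHeap) Len() int            { return len(h) }
func (h nodeHeap) Less(i, j int) bool  { return h[i].weight < h[j].weight }
func (h nodeHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *nodeHeap) Push(x interface{}) { *h = append(*h, x.(*node)) }
func (h *nodeHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// codeLengths returns the Huffman code length, in bits, of each symbol.
func codeLengths(freq []int) []int {
	h := &nodeHeap{}
	for s, f := range freq {
		*h = append(*h, &node{weight: f, symbol: s})
	}
	heap.Init(h)
	// Repeatedly merge the two lightest subtrees.
	for h.Len() > 1 {
		a := heap.Pop(h).(*node)
		b := heap.Pop(h).(*node)
		heap.Push(h, &node{weight: a.weight + b.weight, left: a, right: b})
	}
	lengths := make([]int, len(freq))
	var walk func(n *node, depth int)
	walk = func(n *node, depth int) {
		if n.left == nil { // leaf
			lengths[n.symbol] = depth
			return
		}
		walk(n.left, depth+1)
		walk(n.right, depth+1)
	}
	if h.Len() == 1 {
		walk((*h)[0], 0)
	}
	return lengths
}

func main() {
	// Four data symbols plus an EOF symbol (last entry) seen once.
	freq := []int{8, 4, 2, 1, 1}
	fmt.Println(codeLengths(freq)) // [1 2 3 4 4]
}
```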

LZW

This layer uses dictionary-based compression, based on the idea of the LZW algorithm, to compress from an alphabet to a larger alphabet, building a dictionary of known sequences along the way.
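A minimal sketch of the textbook LZW idea over integer symbols follows. It shows how the output alphabet grows beyond the input alphabet. The functions lzwEncode and lzwDecode and the pair type are hypothetical names, and the package's own dictionary handling (size limits, resets) may well differ.

```go
package main

import "fmt"

// pair identifies a dictionary entry: a known code followed by one symbol.
type pair struct{ prefix, sym int }

// lzwEncode maps symbols in [0, alphabetSize) to codes in a larger alphabet.
func lzwEncode(input []int, alphabetSize int) []int {
	var out []int
	if len(input) == 0 {
		return out
	}
	dict := make(map[pair]int)
	next := alphabetSize // first free code
	cur := input[0]
	for _, s := range input[1:] {
		if code, ok := dict[pair{cur, s}]; ok {
			cur = code // extend the current known sequence
			continue
		}
		out = append(out, cur)
		dict[pair{cur, s}] = next
		next++
		cur = s
	}
	return append(out, cur)
}

// lzwDecode rebuilds the same dictionary while reading the codes.
func lzwDecode(codes []int, alphabetSize int) []int {
	var out []int
	if len(codes) == 0 {
		return out
	}
	var entries []pair // entries[i] describes code alphabetSize+i
	var expand func(code int) []int
	expand = func(code int) []int {
		if code < alphabetSize {
			return []int{code}
		}
		e := entries[code-alphabetSize]
		return append(expand(e.prefix), e.sym)
	}
	prev := codes[0]
	out = append(out, expand(prev)...)
	for _, code := range codes[1:] {
		var seq []int
		if code < alphabetSize+len(entries) {
			seq = expand(code)
		} else {
			// The code is not in the dictionary yet: it must be the
			// previous sequence followed by its own first symbol.
			p := expand(prev)
			seq = append(p, p[0])
		}
		entries = append(entries, pair{prev, seq[0]})
		out = append(out, seq...)
		prev = code
	}
	return out
}

func main() {
	in := []int{0, 1, 0, 1, 0, 1, 0} // "ABABABA" with A=0, B=1
	codes := lzwEncode(in, 2)
	fmt.Println(codes)               // [0 1 2 4]
	fmt.Println(lzwDecode(codes, 2)) // [0 1 0 1 0 1 0]
}
```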

KDelta

This layer does not change the alphabet. It tries to predict the next Symbol, based on what it has seen so far, and encodes the delta between the prediction and the truth. It does not actually "compress" the message, but improves its statistical properties for a better Huffman compression stage if there are some distant redundancies in the message.
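The predict-then-encode-the-delta idea can be sketched as follows. The predictor here is an assumption chosen for brevity ("the next Symbol is whatever followed the current Symbol last time"). The actual predictor used by KDelta is not described here, and kdeltaEncode and kdeltaDecode are hypothetical names.

```go
package main

import "fmt"

// kdeltaEncode replaces each symbol by (symbol - prediction) mod alphabetSize.
// The prediction for the next symbol is the symbol that last followed the
// previous one (0 when nothing has been seen yet).
func kdeltaEncode(in []int, alphabetSize int) []int {
	follow := make([]int, alphabetSize)
	out := make([]int, len(in))
	prev := 0
	for i, s := range in {
		pred := follow[prev]
		out[i] = ((s-pred)%alphabetSize + alphabetSize) % alphabetSize
		follow[prev] = s
		prev = s
	}
	return out
}

// kdeltaDecode runs the same predictor and adds the delta back.
func kdeltaDecode(in []int, alphabetSize int) []int {
	follow := make([]int, alphabetSize)
	out := make([]int, len(in))
	prev := 0
	for i, d := range in {
		s := (follow[prev] + d) % alphabetSize
		out[i] = s
		follow[prev] = s
		prev = s
	}
	return out
}

func main() {
	in := []int{1, 2, 3, 1, 2, 3, 1, 2, 3}
	enc := kdeltaEncode(in, 4)
	fmt.Println(enc)                  // [1 2 3 1 0 0 0 0 0]
	fmt.Println(kdeltaDecode(enc, 4)) // [1 2 3 1 2 3 1 2 3]
}
```

The output is as long as the input, but once the pattern has been seen the deltas collapse to zeros, which skews the symbol frequencies in favour of a later entropy coding stage.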

Repeat

This layer compresses sequences of identical successive Symbols, using an additional "escaped" Symbol. Therefore, the resulting alphabet is one Symbol larger.
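A minimal sketch of run-length encoding with one extra escape Symbol is shown below. The wire layout (escape, symbol, count), the minRun threshold, and the names repeatEncode and repeatDecode are all assumptions made for illustration, not the package's actual format.

```go
package main

import (
	"errors"
	"fmt"
)

// minRun is the shortest run worth replacing by an escape triple.
const minRun = 4

// repeatEncode emits runs of at least minRun symbols as (esc, symbol, count),
// where esc == alphabetSize is the single symbol added to the alphabet.
func repeatEncode(in []int, alphabetSize int) []int {
	esc := alphabetSize
	// Counts must themselves be valid symbols of the original alphabet.
	maxRun := alphabetSize - 1
	if maxRun < 1 {
		maxRun = 1
	}
	var out []int
	for i := 0; i < len(in); {
		j := i
		for j < len(in) && in[j] == in[i] && j-i < maxRun {
			j++
		}
		if n := j - i; n >= minRun {
			out = append(out, esc, in[i], n)
		} else {
			out = append(out, in[i:j]...)
		}
		i = j
	}
	return out
}

// repeatDecode expands every (esc, symbol, count) triple back into a run.
func repeatDecode(in []int, alphabetSize int) ([]int, error) {
	esc := alphabetSize
	var out []int
	for i := 0; i < len(in); i++ {
		if in[i] != esc {
			out = append(out, in[i])
			continue
		}
		if i+2 >= len(in) {
			return nil, errors.New("truncated run")
		}
		for k := 0; k < in[i+2]; k++ {
			out = append(out, in[i+1])
		}
		i += 2
	}
	return out, nil
}

func main() {
	in := []int{7, 7, 7, 7, 7, 7, 3, 3, 5}
	enc := repeatEncode(in, 256)
	fmt.Println(enc) // [256 7 6 3 3 5]
	dec, err := repeatDecode(enc, 256)
	fmt.Println(dec, err)
}
```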

Utilities

BitBuffer : A FIFO buffer that can read/write bits, or bytes (seen as 8 bits). Closing triggers a flush, padding with 0 bits. An EOF Symbol must be used to recognize the actual end of file.

BitFromByteReader/BitToByteWriter : a conversion layer between bits and bytes.

LogWriter : Writes to / reads from the console, for debugging.
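The zero-padding behaviour described for BitBuffer, and the reason an EOF Symbol is needed, can be illustrated with a small sketch. The bitWriter type and the unpack helper are hypothetical stand-ins, not the package's own types, and the most-significant-bit-first order is an assumption.

```go
package main

import "fmt"

// bitWriter packs bits into bytes, most significant bit first.
type bitWriter struct {
	out   []byte
	cur   byte
	nbits uint
}

// WriteBit appends one bit, emitting a byte every 8 bits.
func (w *bitWriter) WriteBit(bit bool) {
	w.cur <<= 1
	if bit {
		w.cur |= 1
	}
	w.nbits++
	if w.nbits == 8 {
		w.out = append(w.out, w.cur)
		w.cur, w.nbits = 0, 0
	}
}

// Close pads the last partial byte with 0 bits and returns all bytes.
func (w *bitWriter) Close() []byte {
	for w.nbits != 0 {
		w.WriteBit(false)
	}
	return w.out
}

// unpack expands bytes back into bits, most significant bit first.
func unpack(data []byte) []bool {
	bits := make([]bool, 0, 8*len(data))
	for _, b := range data {
		for i := 7; i >= 0; i-- {
			bits = append(bits, b>>uint(i)&1 == 1)
		}
	}
	return bits
}

func main() {
	w := &bitWriter{}
	for _, bit := range []bool{true, false, true} {
		w.WriteBit(bit)
	}
	packed := w.Close()
	fmt.Printf("%08b\n", packed[0])   // 10100000
	fmt.Println(len(unpack(packed))) // 8
}
```

Three bits go in and eight come back: the reader cannot tell the five padding bits from data, which is why the symbol stream has to carry its own end marker.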

Note on performance:

This is NOT A PRODUCTION GRADE package.

As expected, performance is far from matching the built-in Go gzip package, as can be observed in the provided tests and benchmarks.

However, some of these blocks may be used as preprocessing layers, before the built-in GZip is applied, or when processing streams of Symbols that are not bytes, but significantly wider or narrower.

And anyway, it was fun to write and debug!
