python-libraptorq

Python 2.X CFFI bindings for libRaptorQ v0.1.x - C++11 implementation of RaptorQ Forward Error Correction codes, as described in RFC6330.

Warning: Using libRaptorQ RFC6330 API (which this module wraps around) properly requires knowledge of some concepts and parameters described in that RFC, and not using correct ones may result in undecodable data! See "Usage" section below for more details.

Warning: As far as I know (not a lawyer), there are lots of patents around the use of this technology, which might be important for any high-profile and commercial projects, especially in US and Canada.

General info

Quoting Wikipedia on Raptor code:

Raptor codes, as with fountain codes in general, encode a given message consisting of a number of symbols, k, into a potentially limitless sequence of encoding symbols such that knowledge of any k or more encoding symbols allows the message to be recovered with some non-zero probability.

Raptor ("RAPid TORnado") codes are the first known class of fountain codes with linear time encoding and decoding.

And RFC6330:

RaptorQ codes are a new family of codes that provide superior flexibility, support for larger source block sizes, and better coding efficiency than Raptor codes in RFC 5053.

... in most cases, a set of cardinality equal to the number of source symbols is sufficient; in rare cases, a set of cardinality slightly more than the number of source symbols is required.

In practice this means that a source data block of 1 MiB (for example) can, with very high probability, be recovered from any 1.002 MiB of the received symbols for it (figure from the "Application Layer Forward Error Correction for Mobile Multimedia Broadcasting Case Study" paper).

Note that, being a probabilistic algorithm, RaptorQ can have highly improbable pathological cases, and these can be exploited, e.g. by dropping specific data blocks (see the "Stopping a Rapid Tornado with a Puff" paper for more details).

Encoded data will be roughly the same size as the original plus the "repair symbols", i.e. there is almost no size overhead except for what is intentionally generated.

Usage

Module includes command-line script ("rq", when installed or as symlink in the repo), which has example code for both encoding and decoding, and can be used as a standalone tool, or for basic algorithm testing/showcase.

Can also be used from command-line via python2 -m libraptorq ... invocation (when installed as module), e.g. python2 -m libraptorq --help.

Important: With current 0.1.x libRaptorQ API, specifying unsuitable parameters for encoding, such as having symbol_size=16 and max_memory=200 for encoding 200K+ of data WILL result in silently producing encoded data that cannot be decoded.

Command-line script

Note: it's just an example/testing script to run and check if module works with specific parameters or see how to use it, don't rely on it as a production tool or anything like that.

To encode a file with 50% extra symbols (symbols being indivisible data chunks that are either stored/transmitted intact or lost entirely), and with 30% of the total (K required symbols + X repair symbols) dropped for testing purposes before saving the rest to "setup.py.enc":

% ./rq --debug encode -s16 -m200 --repair-symbols-rate 0.5 --drop-rate 0.3 setup.py setup.py.enc
Initialized RQEncoder (0.063s)...
Precomputed blocks (0.002s)...
Finished encoding symbols (9 blocks, 0.008s)...
Closed RQEncoder (0.002s)...
Encoded 1,721 B into 167 symbols (needed: >108, repair rate: 50%),
  45 dropped (30%), 122 left in output (1,952 B without ids)

Decode original file back from these:

% ./rq --debug decode setup.py.enc setup.py.dec
Initialized RQDecoder (0.064s)...
Decoded enough symbols to recover data (0.010s)...
Closed RQDecoder (0.002s)...
Decoded 1,721 B of data from 108 processed symbols (1,728 B without ids, symbols total: 122)

% sha256sum -b setup.py{,.dec}
36c50348459b51821a2715b0f5c4ef08647d66f77a29913121af4f0f4dfef454 *setup.py
36c50348459b51821a2715b0f5c4ef08647d66f77a29913121af4f0f4dfef454 *setup.py.dec

No matter which chunks are dropped (they get picked by random.choice), the file should be recoverable from the output as long as the number of chunks left (in each "block") is slightly (by ~0.2%, per the 1 MiB / 1.002 MiB figure above) above K.
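The numbers in the example logs above can be cross-checked with a bit of arithmetic. This is only a sketch: symbols_needed() is a hypothetical helper, not part of the module, and it counts symbols for the whole input, while the actual recoverability condition applies to each block separately.

```python
import math

def symbols_needed(data_len, symbol_size):
    """Minimum number of source symbols (K) needed to cover data_len bytes."""
    return int(math.ceil(data_len / float(symbol_size)))

# Values taken from the example run above (-s16, 1,721 B input)
data_len, symbol_size = 1721, 16
k = symbols_needed(data_len, symbol_size)
payload = k * symbol_size  # bytes carried by K symbols, last one padded

symbols_left = 122  # "122 left in output" in the encode log
left_bytes = symbols_left * symbol_size

print(k, payload, left_bytes)  # 108 1728 1952
print(symbols_left >= k)       # True
```

The 108 symbols and 1,728 B match the "needed" and "without ids" figures in the logs, and 1,952 B matches the size of the 122 symbols left after dropping.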

Output data ("setup.py.enc" in the example) for the script is JSON-encoded list of base64-encoded symbols, as well as some parameters for lib init (oti_scheme, oti_common).

Input data length and sha256 hash of source data are only there to make sure that decoded data is same as original (or exit with error otherwise).
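A rough sketch of what such a container could look like is below. The key names (data_bytes, data_sha256, symbols) and the use of URL-safe base64 are assumptions made for illustration, not the script's actual format; only oti_scheme and oti_common are names taken from the description above, and the values passed for them here are arbitrary placeholders.

```python
import base64, hashlib, json

def dump_container(data, oti_scheme, oti_common, symbols):
    """Serialize {symbol id: symbol bytes} plus decoder parameters to JSON.
    Key names here are hypothetical."""
    return json.dumps(dict(
        data_bytes=len(data),
        data_sha256=hashlib.sha256(data).hexdigest(),
        oti_scheme=oti_scheme, oti_common=oti_common,
        symbols=[
            [sym_id, base64.urlsafe_b64encode(sym).decode('ascii')]
            for sym_id, sym in sorted(symbols.items()) ] ))

def load_container(text):
    """Return (parameters dict, {symbol id: symbol bytes})."""
    box = json.loads(text)
    symbols = dict(
        (sym_id, base64.urlsafe_b64decode(sym.encode('ascii')))
        for sym_id, sym in box['symbols'] )
    return box, symbols

def matches_source(box, data):
    """Check decoded data against stored length and sha256 of the source."""
    return ( len(data) == box['data_bytes']
        and hashlib.sha256(data).hexdigest() == box['data_sha256'] )

source = b'some input string'
text = dump_container(source, 1, 2, {0: b'\x00' * 16, 1: b'\xff' * 16})
box, symbols = load_container(text)
print(sorted(symbols), matches_source(box, source))
```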

See output with --help option for all the other script parameters.

Python module

To use as a python2 module:

from libraptorq import RQEncoder

data = 'some input string' * 500

# Data size must be divisible by RQEncoder.data_size_div
data_len, n = len(data), RQEncoder.data_size_div
if data_len % n: data += '\0' * (n - data_len % n)

with RQEncoder(data, min_subsymbol_size=4, symbol_size=16, max_memory=200) as enc:

  symbols = dict()
  oti_scheme, oti_common = enc.oti_scheme, enc.oti_common

  for block in enc:
    symbols.update(block.encode_iter(repair_rate=0))

data_encoded = data_len, oti_scheme, oti_common, symbols

oti_scheme and oti_common are two integers specifying encoder options, needed to initialize decoder, which can be hard-coded (if constant) on both ends.

block.encode_iter() can be used without options to produce max possible amount of symbols, up to block.symbols + block.max_repair. Above example only produces K symbols - min amount required.

For decoding (reverse operation):

from libraptorq import RQDecoder

data_len, oti_scheme, oti_common, symbols = data_encoded

with RQDecoder(oti_common, oti_scheme) as dec:
  for sym_id, sym in symbols.viewitems(): dec.add_symbol(sym, sym_id)

  data = dec.decode()[:data_len]

Note that in practice, e.g. when transmitting each symbol in a UDP packet, one would want to send something like sym_id || sym_data || checksum, and keep sending these from block.encode_iter() until the other side acknowledges that it can decode the block (i.e. enough symbols were received, see RQDecoder.decode_block()), then start streaming the next block in a similar fashion.
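A minimal sketch of such sym_id || sym_data || checksum framing, using only the standard library. The field widths (32-bit big-endian id, CRC32 as the checksum) and the pack_symbol()/unpack_symbol() names are assumptions for illustration, not something the module provides or requires.

```python
import struct, zlib

SYM_ID = struct.Struct('>I')    # symbol id as big-endian uint32 (assumed width)
CHECKSUM = struct.Struct('>I')  # CRC32 over id + payload (assumed checksum)

def pack_symbol(sym_id, sym_data):
    """Build one packet: sym_id || sym_data || checksum."""
    body = SYM_ID.pack(sym_id) + sym_data
    return body + CHECKSUM.pack(zlib.crc32(body) & 0xffffffff)

def unpack_symbol(packet):
    """Return (sym_id, sym_data), or None if the packet is short or corrupted."""
    if len(packet) < SYM_ID.size + CHECKSUM.size: return None
    body, tail = packet[:-CHECKSUM.size], packet[-CHECKSUM.size:]
    (crc,) = CHECKSUM.unpack(tail)
    if zlib.crc32(body) & 0xffffffff != crc: return None
    (sym_id,) = SYM_ID.unpack(body[:SYM_ID.size])
    return sym_id, body[SYM_ID.size:]

packet = pack_symbol(7, b'x' * 16)
print(len(packet), unpack_symbol(packet) == (7, b'x' * 16))
```

Packets that fail the check can simply be discarded on the receiving side, same as lost ones, since any sufficiently large set of intact symbols should do.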

See __main__.py file (cli script) for an extended example, and libRaptorQ docs for info on its C API, which this module wraps around.

Installation

It's a regular package for Python 2.7 (not 3.X).

It uses and needs CFFI (can/should be installed by pip) and libRaptorQ v0.1.x installed (as libRaptorQ.so) on the system.

libRaptorQ v1.x (as opposed to current stable version 0.1.9) has different API and will not work with this module.

Using pip is the best way:

% pip install libraptorq

If you don't have pip, use:

% easy_install pip
% pip install libraptorq

Alternatively (see also pip2014.com and pip install guide):

% curl https://raw.github.com/pypa/pip/master/contrib/get-pip.py | python2
% pip install libraptorq

Or, if you absolutely must:

% easy_install libraptorq

But, you really shouldn't do that.

Current-git version can be installed like this:

% pip install 'git+https://github.com/mk-fg/python-libraptorq.git#egg=libraptorq'

Note that to install stuff in system-wide PATH and site-packages, elevated privileges are often required. Use "install --user", ~/.pydistutils.cfg or virtualenv to do unprivileged installs into custom paths.

Alternatively, ./rq tool can be run right from the checkout tree without any installation, if that's the only thing you need there.

Random Notes

  • See github-issue-1 for more info on what happens when encoding parameters (such as symbol_size and max_memory) are specified carelessly, and why command-line interface of this module does not have defaults for these.
  • libRaptorQ is currently used via CFFI in "ABI Mode" to avoid any extra hassle with compilation and the need for compiler, see CFFI docs on the subject for more info on what it means.
  • When decoding, libRaptorQ can raise errors for add_symbol() calls, when source block is already decoded and that extra symbol is not needed.
  • libRaptorQ allows specifying an "rq_type" parameter for internal data alignment size (C++ iterator element), which is hard-coded to RQ_ENC_32/RQ_DEC_32 in the module, for simplicity.
  • Lack of Python 3.X compatibility is due to me not using it at all (yet?) and so not needing it; I have nothing against it in principle.

python-libraptorq's Issues

Unable to decode file (checksum failed) when symbol size = 4

Hi,
I used this library to encode a file (size 4336 B) as follows:
./rq --debug encode -s 4 -m 1000000 -n 0.5 infile outfile
The log is as shown below:
2019-03-19 22:21:44 :: DEBUG :: Initialized RQEncoder (0.061s)...
2019-03-19 22:24:19 :: DEBUG :: Precomputed blocks (2.104s)...
2019-03-19 22:24:19 :: DEBUG :: Finished encoding symbols (1 blocks, 0.024s)...
2019-03-19 22:24:19 :: DEBUG :: Closed RQEncoder (0.001s)...
2019-03-19 22:24:19 :: DEBUG :: Encoded 4,336 B into 1,626 symbols (needed: >1,084, repair rate: 50%), 0 dropped (0%), 1,626 left in output (6,504 B without ids)

Now when I try to decode the outfile (without any dropout), using the following command:
./rq --debug decode outfile decodedfile
I get the following output:
2019-03-19 22:26:26 :: DEBUG :: Initialized RQDecoder (0.060s)...
2019-03-19 22:26:26 :: DEBUG :: Decoded enough symbols to recover data (0.026s)...
2019-03-19 22:26:26 :: DEBUG :: Closed RQDecoder (0.001s)...
2019-03-19 22:26:26 :: DEBUG :: Decoded 4,336 B of data from 1,084 processed symbols (4,336 B without ids, symbols total: 1,626)
2019-03-19 22:26:26 :: ERROR :: Operation failed - Data checksum (sha256) mismatch

Is this behavior somehow expected (invalid parameters?)? Or is there a bug? I increased the -s parameter to 8 and the problem persists. It seems to work at -s 12. The infile and outfile are attached (with .txt extension since Github wanted an extension).
infile.txt
outfile.txt

execution problem

Hi mk-fg, I just cloned your libraptorq project into my pc. I already installed all needed dependencies but when i try to exec ./rq --debug encode -s16 -m200 --repair-symbols-rate 0.5 --drop-rate 0.3 setup.py setup.py.enc i get into this error message:

Traceback (most recent call last):
  File "./rq", line 237, in <module>
    if __name__ == '__main__': sys.exit(main())
  File "./rq", line 221, in main
    try: data = encode(opts, data)
  File "./rq", line 60, in encode
    opts.subsymbol_size, opts.symbol_size, opts.max_memory ) as enc:
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__init__.py", line 189, in __init__
    self._ctx_init = self._lib.RaptorQ_Enc,
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 882, in __getattr__
    make_accessor(name)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 878, in make_accessor
    accessors[name](name)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 808, in accessor_function
    value = backendlib.load_function(BType, name)
AttributeError: function/symbol 'RaptorQ_Enc' not found in library '': python2: undefined symbol: RaptorQ_Enc

So could you help me please to fix this issue, thanks in advance.

Error running rq

Hi

I have installed libraptorq successfully, but get the following library error when running rq on Ubuntu 16.04 (64bit). Is this a known issue? Or I am doing something wrong?

root@ubuntu:~# rq --debug encode --repair-symbols-rate 0.5 --drop-rate 0.3 test.csv test.csv.enc
Traceback (most recent call last):
  File "/usr/local/bin/rq", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__main__.py", line 116, in main
    opts.min_subsymbol_size, opts.symbol_size, opts.max_memory ) as enc:
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__init__.py", line 179, in __init__
    super(RQEncoder, self).__init__()
  File "/usr/local/lib/python2.7/dist-packages/libraptorq/__init__.py", line 126, in __init__
    self._lib = self._ffi.dlopen('libRaptorQ.so') # ABI mode for simplicity
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 139, in dlopen
    lib, function_cache = _make_ffi_library(self, name, flags)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 769, in _make_ffi_library
    backendlib = _load_backend_lib(backend, libname, flags)
  File "/usr/local/lib/python2.7/dist-packages/cffi/api.py", line 758, in _load_backend_lib
    return backend.load_library(name, flags)
OSError: cannot load library libRaptorQ.so: /usr/local/lib/libRaptorQ.so: undefined symbol: LZ4_decompress_safe_continue

Python 3 Support

Is there any chance of Python 3 support for this module? I'd like to use it in a few projects I'm working on, but Python 2 is unsuitable for the type of code I'm working on.

Sending a JPEG buffer through an Encoding-Decoding Pipeline

I am trying to encode-decode a jpeg file using your framework. I can encode the data, but I can’t decode it back.

Here is what I have done:

  1. Read the data in binary and stored it in a string which kind of looks like this: '����C.. ����'
  2. Fed the data through the encoder as described in the README with the same parameters.
  3. Fed the encoded data back to the decoder (as described in the README) to reconstruct the jpeg.

However, I get an error when I try to create the decoder, i.e. with RQDecoder(otiScheme, otiCommon) as dec:. The error is the following:
  File "/Library/Python/2.7/site-packages/libraptorq/__init__.py", line 145, in open
    self._ctx = self._ctx_init[0](*self._ctx_init[1])
OverflowError: integer 201595027472 does not fit 'uint32_t'

I understand that this is an overflow, but I don’t know why oti_common is such a large number. Am I missing something? Is jpeg data not allowed?
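For reference, RFC6330 describes the Common FEC OTI as a 64-bit value: a 40-bit transfer length, 8 reserved bits, then a 16-bit symbol size. Assuming libRaptorQ packs oti_common the same way, the number from the error above can be split into its fields. split_oti_common() is a hypothetical helper written for this check, not a module function.

```python
def split_oti_common(oti_common):
    """Split a 64-bit Common FEC OTI value into (transfer length, symbol size).
    Layout per RFC6330: 40-bit length | 8 reserved bits | 16-bit symbol size."""
    transfer_length = oti_common >> 24
    symbol_size = oti_common & 0xffff
    return transfer_length, symbol_size

value = 201595027472  # number from the OverflowError above
fields = split_oti_common(value)

print(value > 0xffffffff)  # True
print(fields)              # (12016, 16)
```

So the value looks like a regular oti_common (12,016 B of data, symbol_size=16 as in the README example) that ended up in a 32-bit slot. One thing to check is argument order: the README example calls RQDecoder(oti_common, oti_scheme), while the call quoted above passes them the other way around.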

8e-09 in symbol output

I did a pip install libRaptorQ and then installed libRaptorQ.so version 0.10.0 from source (as you appear to know, 1.0 beta doesn't work). I then ran a test:

echo "Hello World" > /tmp/x
rq encode -m 1000000 -s 12 -n 3 /tmp/x /tmp/x1

The last base64 encoded vector in /tmp/x1 was:

    [
      50331648,
      "UjZ7ewtizgs8e-09"
    ]

Needless to say "8e-09" is not valid base64. Smells like a floating point rounding issue, but it is less clear how this could appear in the base64 output.
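One possibility worth checking here: "-" is not part of the standard base64 alphabet, but it is part of the URL-safe one (which uses "-" and "_" instead of "+" and "/"). A quick check with the standard library, treating the string above as URL-safe base64 (whether the script actually uses that variant is an assumption):

```python
import base64

encoded = 'UjZ7ewtizgs8e-09'  # string from the output above
symbol = base64.urlsafe_b64decode(encoded)

print(len(encoded), len(symbol))  # 16 12
```

The result is 12 bytes, which would match the -s 12 symbol size used in the command.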

cannot load library libRaptorQ.so on Mac Sierra

On my Mac Sierra (10.12.5), I cannot import libraptorq.

It fails with OSError: cannot load library libRaptorQ.so: dlopen(libRaptorQ.so, 2): image not found. Additionally, ctypes.util.find_library() did not manage to locate a library called 'libRaptorQ.so'.

I managed to fix it by renaming libRaptorQ.so to libRaptorQ.dylib on this line:
https://github.com/mk-fg/python-libraptorq/blob/master/libraptorq/__init__.py#L126

Ideally, importing the shared object should be handled in a platform agnostic manner.
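A rough sketch of what a platform-agnostic lookup could look like, using ctypes.util.find_library() with a per-platform fallback name. library_candidates() is a hypothetical helper, not something the module has; note also that find_library() expects the bare name ("RaptorQ", without "lib" prefix or extension) and returns None when nothing is found.

```python
import sys
from ctypes.util import find_library

def library_candidates(platform=None):
    """Return library names/paths to try with dlopen(), most specific first."""
    platform = platform or sys.platform
    names = []
    found = find_library('RaptorQ')  # bare name; None if not found
    if found: names.append(found)
    # Fallback to the conventional file name for the platform
    names.append('libRaptorQ.dylib' if platform == 'darwin' else 'libRaptorQ.so')
    return names

print(library_candidates())
```

Each candidate could then be passed to ffi.dlopen() in turn, keeping the first one that loads.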

function/symbol 'RaptorQ_Enc' not found in library '/usr/local/lib/libRaptorQ.dylib'

When I try to encode a string similar to the wiki, e.g. data = 'some input string' * 500, on Python 2.7, I get the following error:

AttributeError: function/symbol 'RaptorQ_Enc' not found in library '/usr/local/lib/libRaptorQ.dylib': dlsym(0x7f7f76cf57e0, RaptorQ_Enc): symbol not found

I have cloned, built and installed libRaptorQ from https://fenrirproject.org/Luker/libRaptorQ/tree/master.

There are no references to RaptorQ_Enc in libRaptorQ source code. In fact the only place where RaptorQ_Enc appears in this project is in the documentation under doc/libRaptorQ.tex

It would be nice if the developers would suggest a fix or a workaround.
