
lzma-rs's People

Contributors

bors[bot], ccamel, cccs-sadugas, chyyran, dependabot[bot], dragly, gendx, ibaryshnikov, killercup, lucab, nilfit, qnga, shnatsel, xvilka


lzma-rs's Issues

XZ: Unknown filter id 7

I'm trying to move my library over to this Rust implementation, but I'm running into some issues decompressing some firmware images. I've added a print of the header bytes below in case you wish to look into it; it might be something non-standard about these pieces of firmware.

---- test_08 stdout ----
Fetching file out.squashfs ...  => Hash mismatch: found debe0986658b276be78c3836779d20464a03d9ba0a40903e6e8e947e434f4d67, expected ce0bfab79550885cb7ced388caaaa9bd454852bf1f9c34789abc498eb6c74df6
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, ea, fe, 01, 80, 80, 08, 21, 01, 0a, 00, cc, 42, c9, 5a, e1, ff, ff, 7f, 62, 5d, 00, 3f, 91, 45, 84, 68]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 9d, 83, 03, 80, 80, 08, 21, 01, 0a, 00, d2, ff, 95, 0d, e1, ff, ff, c1, 95, 5d, 00, 24, 22, 44, 82, 2f]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d4, 84, 03, 80, 80, 08, 21, 01, 0a, 00, 71, be, 5b, 0b, e1, ff, ff, c2, 4c, 5d, 00, 30, 12, 0c, ac, 03]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d2, a4, 03, 80, 80, 08, 21, 01, 0a, 00, 95, 4c, b4, 84, e1, ff, ff, d2, 4a, 5d, 00, 07, 8b, 1c, 3d, f0]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 9b, d9, 03, 80, 80, 08, 21, 01, 0a, 00, 7d, 44, f7, e7, e1, ff, ff, ec, 93, 5d, 00, 45, 9b, 00, 83, 83]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e7, 9e, 03, 80, 80, 08, 21, 01, 0a, 00, a5, 4f, 71, b1, e1, ff, ff, cf, 5f, 5d, 00, 24, 31, d4, cf, fd]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 04, c1, cd, ff, 01, 80, 80, 08, 07, 00, 21, 01, 0a, 00, 00, 00, a5, 77, a2, 68, e1, ff, ff, 7f, c5, 5d, 00, 20]
thread 'test_08' panicked at 'called `Result::unwrap()` on an `Err` value: XzError("Unknown filter id 7")', src/compressor.rs:103:71
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- test_openwrt_netgear_ex6100v2 stdout ----
File openwrt-22.03.2-ipq40xx-generic-netgear_ex6100v2-squashfs-factory.img has matching hash inside hash list, skipping download
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, a9, 0c, 80, 40, 21, 01, 0c, 00, 00, 00, d8, 5a, ed, b2, e0, 1f, ff, 06, 21, 6c, 00, 01, 80, 1b, e0, 10]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, b6, 0f, 80, 40, 21, 01, 0c, 00, 00, 00, 78, bc, c5, 5d, e0, 1f, ff, 07, ae, 6c, 00, 7f, 80, 3c, 1a, 25]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d2, 0c, 80, 40, 21, 01, 0c, 00, 00, 00, b7, 24, 44, bc, e0, 1f, ff, 06, 4a, 6c, 00, 00, 68, 0a, 10, 2f]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e0, 08, 80, 40, 21, 01, 0c, 00, 00, 00, be, 27, ae, 8d, e0, 1f, ff, 04, 58, 6c, 00, 00, 68, 82, 3e, fe]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e1, 0c, 80, 40, 21, 01, 0c, 00, 00, 00, 8c, 1d, 80, 3f, e0, 1f, ff, 06, 59, 6c, 00, 00, 68, 82, 3e, fd]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d4, 07, c0, 21, 21, 01, 0c, 00, 00, 00, 91, 4f, 53, 37, e0, 10, bf, 03, cc, 6c, 00, 52, 00, 3c, 1a, 25]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d4, 07, c0, 21, 21, 01, 0c, 00, 00, 00, 91, 4f, 53, 37, e0, 10, bf, 03, cc, 6c, 00, 52, 00, 3c, 1a, 25]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d1, 20, 80, 40, 21, 01, 0c, 00, 00, 00, 03, ee, b6, 33, e0, 1f, ff, 10, 49, 6c, 00, 1d, 00, 31, 00, 52]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, c5, 01, b0, 03, 21, 01, 0c, 00, 00, 00, e8, 65, 57, 78, e0, 01, af, 00, bd, 6c, 00, 41, 2e, 3c, 40, 0d]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, f1, 08, 80, 40, 21, 01, 0c, 00, 00, 00, 68, cd, c2, 45, e0, 1f, ff, 04, 69, 6c, 00, 1a, 01, bc, 1b, 30]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d1, 20, 80, 40, 21, 01, 0c, 00, 00, 00, 03, ee, b6, 33, e0, 1f, ff, 10, 49, 6c, 00, 1d, 00, 31, 00, 52]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 97, 1b, 80, 40, 21, 01, 0c, 00, 00, 00, eb, c7, 76, 0c, e0, 1f, ff, 0d, 8f, 6c, 00, 00, 04, ff, f2, c3]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, b9, 15, 80, 40, 21, 01, 0c, 00, 00, 00, 1a, 2f, 20, 95, e0, 1f, ff, 0a, b1, 6c, 00, 33, 7f, fc, 40, 00]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, db, 13, e9, 26, 21, 01, 0c, 00, 00, 00, 92, 40, ea, 14, e0, 13, 68, 09, d3, 6c, 00, 00, 68, 9e, 60, af]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 04, c1, a2, 8d, 05, 98, eb, 0f, 07, 00, 21, 01, 0c, 00, 00, 00, d6, 56, 7d, 17, e2, 91, 75, ef, ff, 6c, 00, 11]
thread 'test_openwrt_netgear_ex6100v2' panicked at 'called `Result::unwrap()` on an `Err` value: XzError("Unknown filter id 7")', src/compressor.rs:103:71

---- test_tplink_ax1800 stdout ----
File img-1571203182_vol-ubi_rootfs.ubifs has matching hash inside hash list, skipping download
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, f7, 0e, 80, 40, 21, 01, 02, 00, 00, 00, 96, b3, 19, db, e0, 1f, ff, 07, 6f, 6c, 00, 01, 00, 1b, e0, 10]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 88, 0e, 80, 40, 21, 01, 02, 00, 00, 00, 83, 6d, 5b, dc, e0, 1f, ff, 07, 00, 6c, 00, 00, 68, a3, b3, 14]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, f8, 0f, 80, 40, 21, 01, 02, 00, 00, 00, 58, fc, 69, 3d, e0, 1f, ff, 07, f0, 6c, 00, 17, 9a, 49, c6, 93]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, ef, 11, 80, 40, 21, 01, 02, 00, 00, 00, 2a, ff, 19, 9c, e0, 1f, ff, 08, e7, 6c, 00, 00, 68, 36, 3b, 87]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, b6, 10, 80, 40, 21, 01, 02, 00, 00, 00, eb, 46, 63, ce, e0, 1f, ff, 08, 2e, 6c, 00, 00, 69, 86, 3a, ee]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, ba, 0f, 80, 40, 21, 01, 02, 00, 00, 00, c5, 2b, 26, a7, e0, 1f, ff, 07, b2, 6c, 00, 7f, 80, 3c, 1a, 22]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 84, 10, 80, 40, 21, 01, 02, 00, 00, 00, ee, 14, 65, a2, e0, 1f, ff, 07, fc, 6c, 00, 00, 62, 33, d1, c6]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e0, 13, 80, 40, 21, 01, 02, 00, 00, 00, 21, 8c, e4, 43, e0, 1f, ff, 09, d8, 6c, 00, 00, 68, 2c, 21, 4b]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e9, 10, 80, 40, 21, 01, 02, 00, 00, 00, 2e, 9b, 7c, 86, e0, 1f, ff, 08, 61, 6c, 00, 00, 68, 08, b7, 8b]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, a2, 11, 80, 40, 21, 01, 02, 00, 00, 00, 3a, 73, 5d, f7, e0, 1f, ff, 08, 9a, 6c, 00, 01, 80, 1b, e0, 10]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 83, 0c, 80, 40, 21, 01, 02, 00, 00, 00, f2, be, 4d, 0a, e0, 1f, ff, 05, fb, 6c, 00, 00, 67, fe, e0, 12]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, c3, 0c, 80, 40, 21, 01, 02, 00, 00, 00, 52, b9, f7, 94, e0, 1f, ff, 06, 3b, 6c, 00, 00, 67, fe, e0, 12]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d9, 0c, 80, 40, 21, 01, 02, 00, 00, 00, 73, a8, 7b, a4, e0, 1f, ff, 06, 51, 6c, 00, 00, 80, 2e, a3, b5]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, c6, 09, fb, 29, 21, 01, 02, 00, 00, 00, a9, b3, 5d, d7, e0, 14, fa, 04, be, 6c, 00, 05, 80, 30, d4, 90]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, c6, 09, fb, 29, 21, 01, 02, 00, 00, 00, a9, b3, 5d, d7, e0, 14, fa, 04, be, 6c, 00, 05, 80, 30, d4, 90]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 99, 22, 80, 40, 21, 01, 02, 00, 00, 00, e2, f6, f2, 70, e0, 1f, ff, 11, 11, 6c, 00, 00, 6a, 9e, 95, af]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, cd, 07, a0, 15, 21, 01, 02, 00, 00, 00, 7a, b3, f0, 80, e0, 0a, 9f, 03, c5, 6c, 00, 37, 00, 31, 18, 81]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, a8, 0f, 80, 40, 21, 01, 02, 00, 00, 00, 10, 7a, 7d, 84, e0, 1f, ff, 07, a0, 6c, 00, 7f, 05, 14, 42, 80]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 99, 22, 80, 40, 21, 01, 02, 00, 00, 00, e2, f6, f2, 70, e0, 1f, ff, 11, 11, 6c, 00, 00, 6a, 9e, 95, af]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, b5, 1f, 80, 40, 21, 01, 02, 00, 00, 00, 39, 32, ed, fa, e0, 1f, ff, 0f, ad, 6c, 00, 37, 9b, 88, ca, c0]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d9, 1e, 80, 40, 21, 01, 02, 00, 00, 00, 84, c2, 4d, 26, e0, 1f, ff, 0f, 51, 6c, 00, 21, 90, 46, 63, 01]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, bf, 24, 80, 40, 21, 01, 02, 00, 00, 00, ff, fc, ab, 41, e0, 1f, ff, 12, 37, 6c, 00, 04, 80, 09, e7, 03]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 96, 1e, 80, 40, 21, 01, 02, 00, 00, 00, a9, 9e, fc, 49, e0, 1f, ff, 0f, 0e, 6c, 00, 3a, 24, 3e, 61, e0]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, ad, 21, 80, 40, 21, 01, 02, 00, 00, 00, 65, e8, 67, 28, e0, 1f, ff, 10, a5, 6c, 00, 18, 8c, 2c, b4, 00]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 81, 21, 80, 40, 21, 01, 02, 00, 00, 00, 3b, 0b, 06, 7d, e0, 1f, ff, 10, 79, 6c, 00, 37, 19, c9, ea, 30]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, bf, 12, 80, 40, 21, 01, 02, 00, 00, 00, a7, 45, 80, 1c, e0, 1f, ff, 09, 37, 6c, 00, 20, 9d, 4a, 86, f2]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, e7, 0b, d0, 15, 21, 01, 02, 00, 00, 00, e7, 42, eb, 92, e0, 0a, cf, 05, df, 6c, 00, 00, 00, 7f, fe, 90]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, 8b, 2f, 80, 80, 10, 21, 01, 0c, 00, 00, 6d, 72, b3, 7b, e3, ff, ff, 17, 83, 6c, 00, 00, 7f, fe, 97, fe]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 03, c0, d8, 2d, 80, 80, 10, 21, 01, 0c, 00, 00, a0, 67, 66, 07, e3, ff, ff, 16, d0, 6c, 00, 00, 7f, fe, 97, fe]
[fd, 37, 7a, 58, 5a, 00, 00, 01, 69, 22, de, 36, 04, c1, d0, e0, 06, 80, 80, 10, 07, 00, 21, 01, 0c, 00, 00, 00, 47, 1e, 76, 6c, e2, 45, 63, ef, fe, 6c, 00, 3f]
thread 'test_tplink_ax1800' panicked at 'called `Result::unwrap()` on an `Err` value: XzError("Unknown filter id 7")', src/compressor.rs:103:71

issue: wcampbell0x2a/backhand#95
branch: https://github.com/wcampbell0x2a/backhand/tree/use-lzma-rs

API Suggestion: Add encoder and decoder structs

In the compression/decompression ecosystem, the usual pattern I see is encoder and decoder structs that implement io::Read and io::Write. While this library's API is perfectly usable, the difference in API makes it tough to swap an existing library out for this one.
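
For illustration, here is a minimal sketch (not part of lzma-rs) of what such a decoder struct could look like when built on the existing blocking API; because the current API is not streaming, this version decompresses everything up front and then serves the result through a Cursor.

use std::io::{self, Cursor, Read};

// Hypothetical wrapper type: decompresses eagerly, then exposes io::Read.
pub struct LzmaDecoder {
    inner: Cursor<Vec<u8>>,
}

impl LzmaDecoder {
    pub fn new<R: io::BufRead>(mut input: R) -> io::Result<Self> {
        let mut decompressed = Vec::new();
        lzma_rs::lzma_decompress(&mut input, &mut decompressed)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, format!("{:?}", e)))?;
        Ok(Self { inner: Cursor::new(decompressed) })
    }
}

impl Read for LzmaDecoder {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.inner.read(buf)
    }
}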

I have attached some links for your reference.

References

Limit memory use for huge dictionary

The current implementation of LZBuffer allocates the full dictionary size in the constructor, even if the decompressed file is much smaller, which can slow down decompression for small files. Use a dynamic strategy to grow the dictionary on-demand instead, or check the decompressed size in the header (if provided).
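
A minimal sketch of the second suggestion, with illustrative names (not the actual LZBuffer code): when the header provides a decompressed size, the initial allocation can be capped by it.

// Illustrative helper, not the crate's API: pick the initial buffer capacity.
fn initial_capacity(dict_size: usize, unpacked_size: Option<u64>) -> usize {
    match unpacked_size {
        // The output can never exceed the decompressed size from the header.
        Some(n) => dict_size.min(n as usize),
        // Unknown size: fall back to the full dictionary (or a growth strategy).
        None => dict_size,
    }
}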

Use const-generics

For now, the DecoderState and BitTree structs use heap allocations (in Vec) to store their state, despite having sizes known at compile time. This is mostly to avoid code duplication.

Once const generics are stable enough, they should be used instead to remove the need for dynamic allocation, and make the whole state struct more compact and contiguous (therefore potentially more cache-friendly).

This should hopefully improve performance a bit.
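
A rough sketch of the idea, with illustrative types and sizes (not the crate's actual definitions): with const generics, the probability array can live inline in the struct instead of behind a Vec.

#[derive(Clone)]
struct BitTree<const PROBS: usize> {
    // Inline array instead of Vec<u16>; PROBS would be 1 << num_bits.
    probs: [u16; PROBS],
}

impl<const PROBS: usize> BitTree<PROBS> {
    fn new() -> Self {
        // LZMA probabilities start at 1024, i.e. half of the 2048 total.
        Self { probs: [1024; PROBS] }
    }
}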

Travis/Bors integration broken

Bors timed out on the last two pull requests (#22 and #26). The continuous-integration/travis-ci/pr check doesn't show up on the relevant commits. However, some jobs appear in the Travis build history (staging branch).

I've had a quick look at the bors docs, but didn't find an obvious mis-configuration.

If anyone has an idea about why Bors could fail, help is welcome!

Add option to tune how the buffer size is allocated.

As mentioned in #22 (comment), there is a trade-off between memory usage and speed.

I suggest adding an option to control the initial size of the LZ buffer:

  • DictSize would initialize a buffer of the full dictionary's size right away. That's the behavior before #22.
  • InitialSize(value: usize) would instead initialize it to min(value, dict_size). The behavior after #22 is InitialSize(0).

It remains to be seen whether there would be a performance regression between the code before #22 and using the DictSize option.
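
A sketch of what the option could look like (the enum name is an assumption; the variants follow the list above):

pub enum BufferAllocation {
    // Allocate the full dictionary size right away (pre-#22 behavior).
    DictSize,
    // Start at min(value, dict_size); InitialSize(0) is the post-#22 behavior.
    InitialSize(usize),
}

fn initial_len(option: &BufferAllocation, dict_size: usize) -> usize {
    match option {
        BufferAllocation::DictSize => dict_size,
        BufferAllocation::InitialSize(value) => (*value).min(dict_size),
    }
}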

Use io::Write for the decompressor

The decompressor currently writes all the data to a Vec<u8>. The io::Write trait should be supported instead. Such an implementation requires adding an internal buffer to maintain the LZ dictionary.
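
A minimal sketch of that internal buffer, assuming a simple sliding window (names are illustrative, not the crate's code): decoded bytes are forwarded to any io::Write while the last dict_size bytes are kept around so back-references can still be resolved.

use std::collections::VecDeque;
use std::io::{self, Write};

struct DictWriter<W: Write> {
    sink: W,             // user-provided io::Write
    dict: VecDeque<u8>,  // last `dict_size` decoded bytes
    dict_size: usize,
}

impl<W: Write> DictWriter<W> {
    fn append_literal(&mut self, byte: u8) -> io::Result<()> {
        if self.dict.len() == self.dict_size {
            self.dict.pop_front(); // evict the oldest byte once the window is full
        }
        self.dict.push_back(byte);
        self.sink.write_all(&[byte])
    }

    // Byte located `dist` positions back, needed to resolve LZ matches.
    fn byte_back(&self, dist: usize) -> u8 {
        self.dict[self.dict.len() - dist]
    }
}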

Test against the XZ test suite

The XZ utils source distribution at https://tukaani.org/xz/ contains 63 small files exercising various features of the format. The test files are placed in the public domain and there is a description of the expected behavior for each one.

It would be nice to test lzma-rs against those files - it would have caught #32, for example.

Add better encoders

Encoders for LZMA, LZMA2 and .xz files are quite basic for the moment. Encoders that compress well are welcome!

Add configurable memory limit

Add a configurable memory limit to prevent the dictionary size from causing a denial-of-service style attack via memory exhaustion during decompression.

Right now there doesn't seem to be a fixed upper limit on how much the LZBuffer can grow. It will be flushed when the dict_size is reached but even this could be set to a large value.

The implementation should not prevent decompression when the dict_size included in the lzma header exceeds the memlimit. Instead, it should only result in an error when the buffer's actual size exceeds this limit.

An example implementation is available in this port of the LZMA SDK.
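
A sketch of that behavior (illustrative, not the crate's API): the check only fires when the buffer actually needs to grow past the limit, not when the header merely declares a large dict_size.

fn check_memlimit(current_len: usize, additional: usize, memlimit: Option<usize>) -> std::io::Result<()> {
    if let Some(limit) = memlimit {
        // Only error out when the buffer would really exceed the limit.
        if current_len + additional > limit {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                "memory limit exceeded while growing the LZ buffer",
            ));
        }
    }
    Ok(())
}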

Expose decoder/encoder primitives as public

I maintain a Rust implementation of MAME's Compressed Hunks of Data format, which is essentially chunks of data compressed using various compression algorithms, LZMA included. One of the quirks of this format is that, because it uses a very old LZMA (19.0), the encoding parameters are not saved in the output stream, so every CHD decoder essentially has to mimic the defaults of LZMA 19.0.

Thankfully lzma-rs allows this, but since the encode and decode modules aren't public, I had to fork and vendor lzma-rs to access primitives like LzmaParams in my crate. Additionally, I added LzmaParams::new to construct an instance manually for this purpose.

Ideally I would like to just be able to use mainline lzma-rs. Since the decoder is mostly documented already, would it be feasible to expose those primitives as public without much work, possibly behind a feature flag?
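
For context, this is roughly how the exposed primitives would be used; the paths and the LzmaProperties/LzmaParams::new signatures mirror the fork mentioned above and are assumptions, not the published lzma-rs API at the time of writing.

// Hypothetical usage; exact module paths and signatures are assumptions.
use lzma_rs::decode::lzma::{LzmaParams, LzmaProperties};

fn chd_lzma_params(dict_size: u32) -> LzmaParams {
    // CHD relies on the LZMA defaults lc = 3, lp = 0, pb = 2, which are not
    // stored in the stream and therefore must be reconstructed by the decoder.
    let props = LzmaProperties { lc: 3, lp: 0, pb: 2 };
    LzmaParams::new(props, dict_size, None)
}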

New release

@gendx hello! Thank you for the library. Would you like to make a new release? I need the raw_decoder feature for my project.

Add tests against reference libraries

To check that the implementation is correct, there should be tests against reference libraries such as https://crates.io/crates/lzma-sys, for example:

  • compressing with one implementation and decompressing with the other,
  • decompressing with both implementations and checking that the result (or error) is the same.
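
A sketch of the first kind of test, under the assumption that the xz2 crate (bindings to liblzma) is used as the reference implementation; the harness details are illustrative.

use std::io::Read;

#[test]
fn xz_roundtrip_against_liblzma() {
    let data = b"example payload, repeated enough to be compressible ".repeat(64);

    // Compress with lzma-rs...
    let mut compressed = Vec::new();
    lzma_rs::xz_compress(&mut &data[..], &mut compressed).unwrap();

    // ...and decompress with the reference implementation through xz2.
    let mut decompressed = Vec::new();
    xz2::read::XzDecoder::new(&compressed[..])
        .read_to_end(&mut decompressed)
        .unwrap();

    assert_eq!(decompressed, data);
}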

Check endianness of headers

Check whether the dictionary size and decompressed length headers use little-endian or big-endian encoding. The decompressed length is often replaced by a placeholder 0xFFFF_FFFF_FFFF_FFFF (in which case an end-of-stream marker is required instead), which makes it more difficult to test.
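
For reference, the .lzma ("LZMA alone") header stores both fields little-endian: one properties byte, a 4-byte dictionary size, then an 8-byte decompressed size where all ones means "unknown". A small parsing sketch:

use std::convert::TryInto;

fn parse_lzma_header(header: &[u8; 13]) -> (u8, u32, Option<u64>) {
    let props = header[0];
    // Dictionary size: 4 bytes, little-endian.
    let dict_size = u32::from_le_bytes(header[1..5].try_into().unwrap());
    // Decompressed size: 8 bytes, little-endian; u64::MAX is the placeholder.
    let unpacked = u64::from_le_bytes(header[5..13].try_into().unwrap());
    let unpacked_size = if unpacked == u64::MAX { None } else { Some(unpacked) };
    (props, dict_size, unpacked_size)
}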

Extraction fails with `Unknown filter id 8`

Test code

fn main() {
    let file = std::fs::File::open("/tmp/dump.xz").unwrap();
    lzma_rs::xz_decompress(&mut std::io::BufReader::new(file), &mut std::io::sink()).unwrap();
}

Output of the above program

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: XzError("Unknown filter id 8")', src/main.rs:5:86
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Output of xz -l dump.xz

Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1    683.7 KiB  2,048.0 KiB  0.334  None    dum

Link to file (please extract dump.zip to get the xz file): dump.zip

support streaming read

Currently, the library provides a blocking function that reads from io::BufRead and writes to io::Write. This forces the user of the library to read all the contents into memory, or into a file.

Sometimes, however, one only needs to traverse the data, not hold it all at once.

Such a thing could be achieved by a function that, given an io::Read, returns something that implements io::Read as well. This way, you can progressively read the compressed or decompressed stream, while the library internally reads from the underlying stream. This is how the xz2 crate works, for example; see the function signature of xz2::read::XzDecoder::new. It is also very flexible and intuitive: the decompressor starts to act like a "pipe" (in Unix terminology), rather than something that writes.

Support for this in lzma-rs would be very nice, I think. Personally, I'm raising the issue because I wanted to try this library in rua (https://github.com/vn971/rua). There I use an intermediate layer of decompression for another function that accepts Read (https://github.com/vn971/rua/blob/master/src/tar_check.rs#L26); however, the underlying library xz2 is not pure Rust, but uses bindings.
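
To illustrate the requested pattern, here is how it looks with xz2 today; an lzma-rs equivalent would be a new, currently hypothetical type.

use std::io::Read;

fn stream_decompress<R: Read>(compressed: R) -> std::io::Result<u64> {
    // The decoder wraps any `Read` and is itself a `Read`, so the data can be
    // traversed progressively without buffering the whole output in memory.
    let mut decoder = xz2::read::XzDecoder::new(compressed);
    std::io::copy(&mut decoder, &mut std::io::sink())
}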

Thoughts?

Seemingly valid lzma files result in "Corrupted range coding"

Processing some files results in an

LZMAError("Corrupted range coding")

even though the file is decoded fine by unlzma from XZ utils.

It seems like the assumption that self.code should not equal self.range in RangeDecoder::get_bit might be wrong:

if self.code == self.range {

I have attached a file that reproduces the issue:

bad-random-data.tar.gz (ironically I had to compress it as a .tar.gz to be allowed to upload it to GitHub 🙂)

You can verify that it is successfully decompressed by XZ Utils using

unlzma -k -f bad-random-data.lzma

I will create a PR with a suggested fix that simply drops the error and changes the definition of bit in that function.

Also, thanks for creating this library!

Make debug/tracing optional

The tracing macros such as info!, debug! or trace! from the log crate are not compiled away in release mode, because tracing can be enabled at runtime (via env_logger).

In principle, tracing can be disabled at compile time by enabling some features in the log crate (https://docs.rs/log/0.4.8/log/#compile-time-filters). However, enabling a feature for a crate affects all code that depends on this crate within a build. So if a binary depends on lzma-rs and wants to log something else, it cannot optimize away logging in lzma-rs via log's features.

A better solution would be to have a feature in lzma-rs to enable logging (as it is mostly useful for developing lzma-rs). By default, this logging feature would be off, so that users of lzma-rs generally see a performance improvement.
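
A sketch of how such a feature could be wired up inside lzma-rs (the feature name and the macro are assumptions, not existing code):

// With the hypothetical `logging` feature on, forward to the log crate...
#[cfg(feature = "logging")]
macro_rules! lzma_debug {
    ($($arg:tt)*) => { log::debug!($($arg)*) };
}

// ...and without it, the macro expands to nothing and the call sites vanish.
#[cfg(not(feature = "logging"))]
macro_rules! lzma_debug {
    ($($arg:tt)*) => {};
}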

Problems when porting lzma code Java -> Rust

Hey, I'm porting a pretty old application which uses 7zip and AES. The thing is, for AES to work the buffer needs to have a length divisible by 16. This is a problem, because adding trailing zeroes (like the old Java implementation did) crashes the program with a "Found end-of-stream marker but more bytes are available" error. Handling the error doesn't help, because the output stream is empty. Would it be a problem to add support for such behaviour? My project really depends on it :)

Publish some benchmarks on `README.md`

Hello!

Would it be possible for you to run some benchmarks against the C wrapper alternatives and post them in the readme or somewhere in this repo? It seems fairly comparable from my quick look, but even if the benchmarks wind up being slower, it wouldn't be bad to see roughly how much room for improvement is left.

Enable no_std support

In principle, lzma-rs doesn't have a long list of dependencies, and implements the core LZMA algorithm purely in Rust. Therefore, it could be made compatible with no_std build targets (at least when none of the dev dependencies/features are activated).
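
A minimal sketch of the gating in lib.rs, assuming a default "std" feature (the feature name is an assumption):

// Crate root: drop std unless the (default) "std" feature is enabled.
#![cfg_attr(not(feature = "std"), no_std)]

// Heap types (Vec, etc.) then come from `alloc` instead of `std`.
#[cfg(not(feature = "std"))]
extern crate alloc;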

.lz (lzip)

Is it possible to decompress .lz (lzip) files?

Reach 100% coverage.

The current coverage as reported by llvm-cov is 87% (Codecov).

Reaching 100% coverage is a blocker to stabilizing the crate.

lzma2_compress and zx_compress do not compress input

I was comparing compression methods to figure out what to choose for my project and noticed some strange results:

postcard: 14880

postcard deflate: 3499
postcard brotli: 2792
postcard lz4: 4821
postcard lzma: 8496
postcard lzma2: 14884
postcard xz: 14932

In this snippet, lzma, lzma2 and xz are done using lzma_compress, lzma2_compress and xz_compress respectively. It seemed weird to me that lzma2 and xz are so poor, so I stepped through the code and noticed that it's not applying compression at all for lzma2 and xz:

/// Compress data with LZMA2 and default [`Options`](compress/struct.Options.html).
pub fn lzma2_compress<R: io::BufRead, W: io::Write>(
    input: &mut R,
    output: &mut W,
) -> io::Result<()> {
    encode::lzma2::encode_stream(input, output)
}

// ...

pub fn encode_stream<R, W>(input: &mut R, output: &mut W) -> io::Result<()>
where
    R: io::BufRead,
    W: io::Write,
{
    let mut buf = vec![0u8; 0x10000];
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            // status = EOF
            output.write_u8(0)?;
            break;
        }

        // status = uncompressed reset dict
        output.write_u8(1)?;
        // unpacked size
        output.write_u16::<BigEndian>((n - 1) as u16)?;
        // contents
        output.write_all(&buf[..n])?;
    }
    Ok(())
}

This sounds like either a bug, or something that should be documented better. As I understand from the README, lzma-rs is primarily a decoder, which is confusing since these compression functions are present.

Support multi-stream files

A new stream is allowed to start after the previous stream ends, according to the xz spec. I'm not sure what the point of that is, but I'm trying to decode Packages.xz files (part of the Debian package repository structure), and for some reason they have an extra zero-length stream at the end of the file:

00150620  e9 8d 27 96 8d 26 27 1c  00 00 00 00 55 59 08 b9  |..'..&'.....UY..|
00150630  33 3c 6c 5d 00 01 a6 8c  54 ff 8b 95 03 00 00 00  |3<l]....T.......|
00150640  fb c4 02 a2 14 17 3b 30  03 00 00 00 00 04 59 5a  |......;0......YZ|
00150650  fd 37 7a 58 5a 00 00 04  e6 d6 b4 46 00 00 00 00  |.7zXZ......F....|
00150660  1c df 44 21 1f b6 f3 7d  01 00 00 00 00 04 59 5a  |..D!...}......YZ|
00150670

If I remove the last 16 bytes the file decompresses correctly, but otherwise I get an error.
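
A caller-side sketch of what multi-stream handling could look like, under the assumption that each call to xz_decompress consumes exactly one stream from the reader (and that the trailing stream itself decodes cleanly):

use std::io::{BufRead, BufReader, Read};

fn decompress_all_streams<R: Read>(input: R) -> std::io::Result<Vec<u8>> {
    let mut reader = BufReader::new(input);
    let mut out = Vec::new();
    // Keep decoding streams until the underlying reader is exhausted.
    while !reader.fill_buf()?.is_empty() {
        lzma_rs::xz_decompress(&mut reader, &mut out)
            .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, format!("{:?}", e)))?;
    }
    Ok(out)
}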

Add a test encoder taking LZ matches as input

The current dumb encoder only generates a sequence of byte literals to be encoded one by one. This makes it hard to unit test decoding of LZ matches (distance + length).

Although finding matches in real inputs is a non-trivial task (#9), there could be some "synthetic" encoder taking as input a sequence of already prepared matches.

Something like the following.

enum LZMAElement {
    Literal(u8),
    Match(usize, usize),
}

fn compress(input: impl Iterator<Item=LZMAElement>) {
    /* TODO */
}

Provide features to import parts of the crate

For now, dependents of lzma-rs import all of LZMA, LZMA2 and XZ, even if they only need LZMA for example. This could lead to unnecessary dependencies if for example SHA-256 checksums are added for the XZ layer (#32).

In the next breaking release, the code could be split with Cargo features:

  • by default, lzma is available,
  • the lzma2 feature enables lzma2 on top of it,
  • the xz feature enables all of lzma, lzma2 and xz,
  • the sha256 feature enables support for SHA-256 checksums (instead of returning an Err) - only relevant for xz.
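
A rough sketch of the corresponding cfg gating on the Rust side (module names are illustrative; the actual split would also need matching Cargo feature declarations):

// Only compiled when the `lzma2` feature is enabled.
#[cfg(feature = "lzma2")]
pub mod lzma2 {
    // LZMA2 encoder/decoder would live here.
}

// The `xz` feature would transitively enable `lzma` and `lzma2`.
#[cfg(feature = "xz")]
pub mod xz {
    // XZ container support, including the optional `sha256` checksum handling.
}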
