deflate-rs's Introduction

deflate-rs

An implementation of a DEFLATE encoder in pure Rust. It is not a direct port, but takes some inspiration from zlib, miniz and zopfli. The API is based on the one in the flate2 crate, which contains bindings to zlib, miniz_oxide, and miniz.

Deflate encoding is supported both with and without zlib and gzip metadata; zlib dictionaries are not supported. No unsafe code is used.

Encoding in gzip format requires enabling the 'gzip' feature.

This library is now mostly in maintenance mode; the focus is on the Rust backend of flate2 instead.

The minimum required Rust version is 1.32.0 due to the use of library functions for endianness conversion (the unit tests require a newer version).

Usage:

Simple compression function:

use deflate::deflate_bytes;

let data = b"Some data";
// `data` is already a reference and coerces to the `&[u8]` that deflate_bytes expects.
let compressed = deflate_bytes(data);

Using a writer:

use std::io::Write;

use deflate::Compression;
use deflate::write::ZlibEncoder;

let data = b"This is some test data";
let mut encoder = ZlibEncoder::new(Vec::new(), Compression::Default);
encoder.write_all(data).unwrap();
let compressed_data = encoder.finish().unwrap();

Other deflate/zlib Rust projects from various people

  • flate2: FLATE, Gzip, and Zlib bindings for Rust; can use miniz_oxide for a pure Rust implementation.
  • Zopfli in Rust: Rust port of zopfli.
  • inflate: DEFLATE decoder implemented in Rust.
  • miniz-oxide: Port of miniz to Rust.
  • libflate: Another DEFLATE/Zlib/Gzip encoder and decoder written in Rust. (Only does some very light compression.)

License

deflate is distributed under the terms of both the MIT and Apache 2.0 licences.

bitstream.rs is © @nwin and was released under both MIT and Apache 2.0.

Some code in length_encode.rs has been ported from the miniz library, which is public domain.

The test data (tests/pg11.txt) is borrowed from Project Gutenberg and is available under the public domain, or the Project Gutenberg Licence.

deflate-rs's People

Contributors

aschampion, atouchet, cuviper, eclipseo, ferjm, heroickatora, ignatenkobrain, johntitor, lovasoa, mbrubeck, oherrala, oyvindln, razrfalcon

deflate-rs's Issues

Port fast mode from miniz

Miniz has a special fast compression function that's used when max_hash_checks is set to one. Its results seem to fall somewhere between normal compression using hash_checks = 1 and RLE mode.

Proper writer behaviour

Write is only supposed to be called once on the wrapped writer in each write call. Currently we call write a fair number of times for each call to {deflate/zlibwriter}::write. In addition to being in violation of the trait, the current implementation assumes that the writer will write all bytes on each write call, which is wrong and can cause compression to fail. We also shouldn't solve this by using write_all internally, see rust-lang/flate2-rs#92.
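
One possible way to respect the one-call rule is to queue encoder output and hand it to the wrapped writer with a single write call, keeping whatever was not accepted for the next call. The following is a minimal sketch of that idea and not the crate's code; QueuedWriter, queue, write_pending_once and the Limited test writer are hypothetical names introduced for illustration.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper: buffers encoder output and forwards it with at most
/// one `write` call on the inner writer per invocation.
struct QueuedWriter<W: Write> {
    inner: W,
    pending: Vec<u8>,
}

impl<W: Write> QueuedWriter<W> {
    fn new(inner: W) -> Self {
        QueuedWriter { inner, pending: Vec::new() }
    }

    /// Queue bytes produced by the compressor.
    fn queue(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
    }

    /// Call `write` on the inner writer at most once and keep the rest queued.
    /// Returns the number of bytes that are still pending afterwards.
    fn write_pending_once(&mut self) -> io::Result<usize> {
        if self.pending.is_empty() {
            return Ok(0);
        }
        let written = self.inner.write(&self.pending)?;
        self.pending.drain(..written);
        Ok(self.pending.len())
    }
}

/// Test writer that accepts at most `max` bytes per call.
struct Limited {
    data: Vec<u8>,
    max: usize,
}

impl Write for Limited {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.max);
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With a writer that accepts two bytes per call, five queued bytes take three calls to drain, and no call assumes that the inner writer took everything.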

Panic on mips64 / mips64el

When using deflate on a mips64 / mips64el target, the deflate crate panics with the following assertion:

---- parse::llanfair stdout ----
	thread 'parse::llanfair' panicked at 'The generated length codes were not valid!', /cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.16/src/length_encode.rs:393:8
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Remove the flate2 dev-dependency

flate2 is listed as a dev-dependency. Unfortunately, there's no way to have optional dev-dependencies. flate2 has a lot of dependencies of its own and it would be great to avoid having to download and build those dependencies during our Docker builds.

I understand that the flate2 dev-dependency is used for comparative benchmarks. It would be great if the benchmarks could be moved to a separate crate (e.g. deflate-bench, similar to crypto-bench) so that projects that don't run the benchmarks don't need to pull in flate2.

Assertion error from fuzzing

Fuzzing (fuzzer code here) triggers an assertion error in this line (looks like an overflow thing):

thread '<unnamed>' panicked at 'assertion failed: `(left == right)` (left: `32767`, right: `0`)', /home/pascal/.cargo/git/checkouts/deflate-rs-44887ade842f84eb/8e1ec1e/src/chained_hash_table.rs:126

You can find the whole log here: https://gist.github.com/killercup/f117fd4a55ba3855b74d04acdfaf46d5 (make sure to look at the crash file in raw mode; it's encoded as a raw Rust byte string).

Implement further ZLIB flush modes.

Using flush on the writer currently ends the compression stream and writes a trailer (without resetting the encoder). The writers in flate2 call miniz/zlib with SYNC_FLUSH, which outputs the current pending data and adds an empty block at the end. This is probably the behaviour we should emulate.
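
For reference, the empty block that a sync flush appends is a non-final stored block of length zero. In the DEFLATE format a stored block consists of the block header bits, padding to the next byte boundary, a 16-bit LEN and its one's complement NLEN, both little-endian. The sketch below assumes the output is already byte-aligned, which a real encoder cannot rely on; append_sync_marker is a hypothetical helper name.

```rust
/// Appends an empty, non-final stored block to `out`.
/// Assumption: `out` is byte-aligned, i.e. no partially filled byte is pending.
fn append_sync_marker(out: &mut Vec<u8>) {
    // Header byte: BFINAL = 0, BTYPE = 00 (stored); the remaining bits are padding.
    out.push(0x00);
    let len: u16 = 0;
    // LEN followed by NLEN (one's complement of LEN), both little-endian.
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&(!len).to_le_bytes());
}
```

The last four bytes written are 00 00 FF FF, the marker a decoder can use to recognise a flush point.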

Encoding huffman length values sometimes missing trailing zero.

Encoding Huffman length values misses a trailing zero value if the last few length values before it result in outputting the code for repeating the previous length value. If this happens, decoding the data will result in garbage. This can only happen with data that ends up producing no distance values, since any subsequent zeroes are ignored when encoding, so it is exceedingly rare for this to actually occur.

A fix is incoming.
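
To make the failure case concrete, here is a small sketch of the run-length scheme DEFLATE uses for code lengths: code 16 repeats the previous length 3 to 6 times, and codes 17 and 18 repeat zero 3 to 10 and 11 to 138 times. It is not the crate's length_encode.rs; LenCode, encode_lengths and decode_lengths are hypothetical names, and codes 17 and 18 are folded into one variant for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LenCode {
    /// A code length written directly (0..=15).
    Literal(u8),
    /// Code 16: repeat the previous length 3..=6 times.
    RepeatPrev(u8),
    /// Codes 17/18: repeat zero 3..=138 times.
    RepeatZero(u8),
}

fn encode_lengths(lengths: &[u8]) -> Vec<LenCode> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < lengths.len() {
        let value = lengths[i];
        let run = lengths[i..].iter().take_while(|&&l| l == value).count();
        i += run;
        // A non-zero value has to be written once before it can be repeated.
        let mut left = if value == 0 {
            run
        } else {
            out.push(LenCode::Literal(value));
            run - 1
        };
        let max_repeat = if value == 0 { 138 } else { 6 };
        while left >= 3 {
            let n = left.min(max_repeat);
            out.push(if value == 0 {
                LenCode::RepeatZero(n as u8)
            } else {
                LenCode::RepeatPrev(n as u8)
            });
            left -= n;
        }
        // Runs shorter than three are written out one by one.
        for _ in 0..left {
            out.push(LenCode::Literal(value));
        }
    }
    out
}

fn decode_lengths(codes: &[LenCode]) -> Vec<u8> {
    let mut out = Vec::new();
    for &code in codes {
        match code {
            LenCode::Literal(v) => out.push(v),
            LenCode::RepeatPrev(n) => {
                let prev = *out.last().expect("repeat code needs a previous length");
                out.extend(std::iter::repeat(prev).take(n as usize));
            }
            LenCode::RepeatZero(n) => out.extend(std::iter::repeat(0).take(n as usize)),
        }
    }
    out
}
```

For the lengths 5, 5, 5, 5, 5, 0 the encoder emits a literal 5, a repeat-previous of 4 and a final literal 0. Dropping that last literal is the kind of omission described above: the decoder would reconstruct one length too few.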

v0.8.5 generates invalid PNGs

I'm opening this issue because updating deflate from 0.8.4 to 0.8.5 in dezoomify-rs makes the tests fail. deflate is an indirect dependency of this project, used to generate and read PNG files.

The following image is a PNG created with png v0.16.6 and deflate v0.8.5 which cannot be opened with the same libraries:

err

The error returned is: CorruptFlateStream.

Running the same code (generating and reading the PNG) with png v0.16.6 and deflate v0.8.4 works without errors.

debug assertion fails in writer.rs

Hi!

Using deflate through the image crate generates a panic in debug mode on line 50 of writer.rs.

thread 'main' panicked at 'assertion failed: (left == right) (left: 60100, right: 32769)', /home/christer/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.8/src/writer.rs:50

Properly implement lazy matching

Right now we only check one byte ahead. We should check more bytes, and discard short matches that have a very long distance, to reach a compression level similar to zlib and miniz.
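
As an illustration of the two ideas in this issue, the sketch below defers a match when the next position offers a longer one, and ignores minimum-length matches that are far away. It uses a naive quadratic search rather than a hash chain, omits the window and maximum-length limits, and the TOO_FAR value of 4096 is an assumption chosen for illustration; none of the names come from the crate.

```rust
const MIN_MATCH: usize = 3;
/// Assumed cutoff for illustration: minimum-length matches farther away than
/// this are treated as not worth encoding.
const TOO_FAR: usize = 4096;

#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Literal,
    Match { len: usize, dist: usize },
}

/// Naive search for the longest earlier occurrence of the bytes at `pos`,
/// returned as (length, distance). Ties go to the closer match.
fn longest_match(data: &[u8], pos: usize) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for start in 0..pos {
        let len = data[start..]
            .iter()
            .zip(&data[pos..])
            .take_while(|(a, b)| a == b)
            .count();
        let dist = pos - start;
        let too_far = len == MIN_MATCH && dist > TOO_FAR;
        if len >= MIN_MATCH && !too_far && best.map_or(true, |(best_len, _)| len >= best_len) {
            best = Some((len, dist));
        }
    }
    best
}

/// Lazy matching with one byte of look-ahead: emit a literal instead of the
/// current match when the match starting at the next byte is longer.
fn lazy_decision(data: &[u8], pos: usize) -> Decision {
    let (len, dist) = match longest_match(data, pos) {
        Some(found) => found,
        None => return Decision::Literal,
    };
    if pos + 1 < data.len() {
        if let Some((next_len, _)) = longest_match(data, pos + 1) {
            if next_len > len {
                return Decision::Literal;
            }
        }
    }
    Decision::Match { len, dist }
}
```

In the input "abc-bcde-abcde", position 9 has a match of length 3, but position 10 has one of length 4, so the sketch emits the 'a' as a literal and takes the longer match next. Looking further ahead would repeat the same comparison at later positions.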

Flushing `ZlibEncoder` loops infinitely on some writers

When the underlying writer only accepts very small write requests, the flush process of a ZlibEncoder appears to try to write the same data over and over without terminating.

Full reproduction source code
use std::io::{self, Write};

fn main() {
    let _ = deflate::write::ZlibEncoder::new(SmallWriter::new(vec![], 2), deflate::Compression::Fast).flush();
}

struct SmallWriter<W: Write> {
    writer: W,
    small: usize,
}

impl<W: Write> SmallWriter<W> {
    fn new(writer: W, buf_len: usize) -> SmallWriter<W> {
        SmallWriter {
            writer,
            small: buf_len,
        }
    }
}

impl<W: Write> Write for SmallWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Never write more than `small` bytes at a time.
        let small = buf.len().min(self.small);
        self.writer.write(&buf[..small])
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
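
For comparison, a flush loop that terminates on such writers has to advance past the bytes that were accepted and treat a zero-length write as an error. The following is a minimal sketch of that loop, not the encoder's actual flush code; drain_pending and the Chunked test writer are hypothetical names.

```rust
use std::io::{self, Write};

/// Writes everything in `pending` to `writer`, tolerating partial writes.
fn drain_pending<W: Write>(writer: &mut W, pending: &mut Vec<u8>) -> io::Result<()> {
    while !pending.is_empty() {
        match writer.write(&pending[..]) {
            // A writer that accepts nothing would otherwise loop forever.
            Ok(0) => {
                return Err(io::Error::new(
                    io::ErrorKind::WriteZero,
                    "writer accepted no bytes",
                ))
            }
            // Drop only the bytes that were actually written.
            Ok(n) => {
                pending.drain(..n);
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    writer.flush()
}

/// Test writer that accepts at most `limit` bytes per call.
struct Chunked {
    out: Vec<u8>,
    limit: usize,
}

impl Write for Chunked {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(self.limit);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Note that this loop is suitable for a final flush or finish step, where writing everything is the goal; it is not a substitute for the single-call contract of Write::write itself.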

AFL crash

The AFL fuzz I started in #37 found a crash (after 37 days and 2 cycles!) where the decompressed data does not match the input (with CompressionOptions::default()). I'll look into it in the next few days and also PR the fuzz binary. In the meantime here's the crash input.
id:000000,sig:06,src:000831,op:havoc,rep:64.zip

gzip

Implement support for compression with a gzip header/trailer.

Performance improvements

The performance is still not great compared to miniz/zlib on files with long runs of the same byte.

EDIT:
See next post.

Profiling reveals that lz77::longest_match and lz77::get_match_length are where most time is spent.

get_match_length is particularly problematic for data with many repetitions of one literal, which causes a lot of calls to this function. (There will be a large number of entries in the hash chain for the 3-byte sequences of this byte.) Currently it uses two zipped iterators to compare the matches, which may not be ideal performance-wise. C implementations of deflate seem to check multiple bytes at once by casting the bytes to larger data types. I've tested this, but it didn't seem to make a difference.
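
For readers unfamiliar with the multi-byte comparison trick, a safe Rust version could look like the sketch below: it compares eight bytes at a time as little-endian u64 values and uses the lowest set bit of the XOR to locate the first differing byte. This is an illustrative sketch, not the crate's get_match_length; match_length and read_u64_le are hypothetical names, and whether it is faster than zipped iterators would have to be measured.

```rust
/// Reads eight bytes starting at `offset` as a little-endian u64.
fn read_u64_le(bytes: &[u8], offset: usize) -> u64 {
    let mut chunk = [0u8; 8];
    chunk.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(chunk)
}

/// Number of leading bytes that `a` and `b` have in common.
fn match_length(a: &[u8], b: &[u8]) -> usize {
    let max = a.len().min(b.len());
    let mut len = 0;
    // Compare eight bytes per step while both slices have that many left.
    while len + 8 <= max {
        let diff = read_u64_le(a, len) ^ read_u64_le(b, len);
        if diff != 0 {
            // In little-endian order the lowest set bit belongs to the
            // first byte that differs.
            return len + (diff.trailing_zeros() / 8) as usize;
        }
        len += 8;
    }
    // Compare the remaining tail byte by byte.
    while len < max && a[len] == b[len] {
        len += 1;
    }
    len
}
```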

In the longest_match function, array lookups seem to be the main cause of the slowdown (maybe because further instructions depend on the array value?). If we can find a way to reduce the number of lookups, or the length of the hash chains, without impacting compression ratio, that would help to improve performance.

For lower compression levels, other compressors simply hard-limit the length of the hash chains, and additionally reduce the hash chain length adaptively when there is a decent match.

Panic in `ChainedHashTable::add_hash_value` when encoding

This code panics when compiled in debug mode:

extern crate deflate;

use std::io::Write;
use deflate::CompressionOptions;
use deflate::write::GzEncoder;

fn main() {
    let fp = Vec::new();
    let mut fp = GzEncoder::new( fp, CompressionOptions::default() );

    fp.write( &[0] ).unwrap();
    fp.flush().unwrap();
    fp.write( &[0] ).unwrap();
    fp.write( &[0, 0] ).unwrap();
}
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `0`', /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/chained_hash_table.rs:141:9
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::sys_common::backtrace::print
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
   2: std::panicking::default_hook::{{closure}}
             at libstd/panicking.rs:211
   3: std::panicking::default_hook
             at libstd/panicking.rs:227
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:475
   5: std::panicking::continue_panic_fmt
             at libstd/panicking.rs:390
   6: std::panicking::begin_panic_fmt
             at libstd/panicking.rs:345
   7: deflate::chained_hash_table::ChainedHashTable::add_hash_value
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/chained_hash_table.rs:141
   8: deflate::lz77::process_chunk_lazy
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:346
   9: deflate::lz77::process_chunk
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:217
  10: deflate::lz77::lz77_compress_block
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/lz77.rs:655
  11: deflate::compress::compress_data_dynamic_n
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/compress.rs:130
  12: deflate::writer::compress_until_done
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:25
  13: <deflate::writer::DeflateEncoder<W> as std::io::Write>::flush
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:137
  14: <deflate::writer::gzip::GzEncoder<W> as std::io::Write>::flush
             at /home/kou/.cargo/registry/src/github.com-1ecc6299db9ec823/deflate-0.7.18/src/writer.rs:454
  15: repro::main
             at src/main.rs:15
  16: std::rt::lang_start::{{closure}}
             at /checkout/src/libstd/rt.rs:74
  17: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:310
  18: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:106
  19: std::rt::lang_start_internal
             at libstd/panicking.rs:289
             at libstd/panic.rs:392
             at libstd/rt.rs:58
  20: std::rt::lang_start
             at /checkout/src/libstd/rt.rs:74
  21: main
  22: __libc_start_main
  23: _start

rustc 1.29.0-nightly (6a1c0637c 2018-07-23)
x86_64-unknown-linux-gnu
deflate 0.7.18

Deal with incompressible data

We want to avoid the compressed stream expanding more than needed when encountering incompressible (high entropy) data. Ideally, stored blocks should be output if compressing a block fails to reduce the size, but we want to do this without having to keep an excessively large input buffer.
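
As a rough sketch of the fallback, the code below wraps raw input in stored blocks (at most 65,535 bytes each, with a 5-byte header per block when byte-aligned) and picks whichever output is smaller. It assumes the whole input is available and that `compressed` is a complete raw DEFLATE stream for the same input, so it does not address the buffering concern; stored_blocks and emit_smaller are hypothetical names.

```rust
const MAX_STORED_LEN: usize = 65_535;

/// Encodes `data` as a sequence of stored (uncompressed) DEFLATE blocks.
fn stored_blocks(data: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(data.len() + 5 * (data.len() / MAX_STORED_LEN + 1));
    if data.is_empty() {
        // A single final stored block with LEN = 0.
        out.extend_from_slice(&[0x01, 0x00, 0x00, 0xFF, 0xFF]);
        return out;
    }
    let block_count = (data.len() + MAX_STORED_LEN - 1) / MAX_STORED_LEN;
    for (index, chunk) in data.chunks(MAX_STORED_LEN).enumerate() {
        let is_final = index + 1 == block_count;
        // Header byte: BFINAL in the lowest bit, BTYPE = 00, rest is padding.
        out.push(is_final as u8);
        let len = chunk.len() as u16;
        out.extend_from_slice(&len.to_le_bytes());
        out.extend_from_slice(&(!len).to_le_bytes());
        out.extend_from_slice(chunk);
    }
    out
}

/// Returns the stored encoding of `raw` if it is smaller than `compressed`.
fn emit_smaller(compressed: Vec<u8>, raw: &[u8]) -> Vec<u8> {
    let stored = stored_blocks(raw);
    if stored.len() < compressed.len() {
        stored
    } else {
        compressed
    }
}
```

With this layout the worst-case expansion is five bytes per 65,535 bytes of input.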

Remove use of deprecated mem::uninitialized

Either use MaybeUninit, or see if the current compiler avoids the excessive stack copies so we can avoid unsafe altogether.

The current usage should be safe, as it's used with a copy type that does not have invalid values, though as stated in the deprecation blog, mem::uninitialized may actually never be completely safe.
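
The two options could look roughly like the sketch below, which fills a table of u16 values once through MaybeUninit and once with plain zero-initialisation. The table type, its size and the function names are assumptions for illustration and do not reflect where the crate actually used mem::uninitialized.

```rust
use std::mem::MaybeUninit;

const TABLE_LEN: usize = 256;

/// Option 1: skip zero-initialisation with `MaybeUninit` (needs `unsafe`).
fn table_with_maybe_uninit() -> [u16; TABLE_LEN] {
    let mut table = MaybeUninit::<[u16; TABLE_LEN]>::uninit();
    let first = table.as_mut_ptr() as *mut u16;
    for i in 0..TABLE_LEN {
        // SAFETY: `i` is in bounds, and `write` never reads the old value.
        unsafe { first.add(i).write(i as u16) };
    }
    // SAFETY: every element was written in the loop above.
    unsafe { table.assume_init() }
}

/// Option 2: zero-initialise first and rely on the compiler to optimise.
fn table_safe() -> [u16; TABLE_LEN] {
    let mut table = [0u16; TABLE_LEN];
    for (i, slot) in table.iter_mut().enumerate() {
        *slot = i as u16;
    }
    table
}
```

Both functions produce the same table; the second avoids unsafe entirely at the cost of an initial zero fill that the optimiser may or may not remove.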

deepin linux build error

error: macro undefined: 'assert_ne!'
--> src/writer.rs:19:5
|
19 | assert_ne!(flush_mode, Flush::None);
| ^^^^^^^^^
|
= help: did you mean assert_eq!?

error: aborting due to previous error

error: Could not compile deflate.

Thanks.

I am a beginner at Rust.

Add windows CI

Need to set up AppVeyor; it's important to test on Windows, as the stack size there is smaller.

Internal assertion fails at images larger than 4800×4800

I recently made use of this library in an effort to make a mandelbrot set, but in my quest to achieve ever higher resolutions I ran into a problem: an internal assertion fails if I exceed 4800×4800 pixels.

You can find the relevant code here: https://github.com/ElectricCoffee/mandelbrot

The error that is generated goes like this:

PS D:\Code\rust\mandelbrot> cargo run
   Compiling num-complex v0.2.4
   Compiling mandelbrot v0.1.0 (D:\Code\rust\mandelbrot)
    Finished dev [unoptimized + debuginfo] target(s) in 4.04s
     Running `target\debug\mandelbrot.exe`
Generating mandelbrot_8000x8000.png.
Generating image data, hold tight...
Flattening...
Writing image data...
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `192008000`,
 right: `192008189`', <::std::macros::panic macros>:5:6
stack backtrace:
   0: backtrace::backtrace::trace_unsynchronized
             at C:\Users\VssAdministrator\.cargo\registry\src\github.com-1ecc6299db9ec823\backtrace-0.3.40\src\backtrace\mod.rs:66
   1: std::sys_common::backtrace::_print_fmt
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:77
   2: std::sys_common::backtrace::_print::{{impl}}::fmt
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:59
   3: core::fmt::write
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libcore\fmt\mod.rs:1052
   4: std::io::Write::write_fmt<std::sys::windows::stdio::Stderr>
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\io\mod.rs:1426
   5: std::sys_common::backtrace::_print
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:62
   6: std::sys_common::backtrace::print
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\sys_common\backtrace.rs:49
   7: std::panicking::default_hook::{{closure}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:204
   8: std::panicking::default_hook
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:224
   9: std::panicking::rust_panic_with_hook
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:472
  10: std::panicking::begin_panic_handler
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:380
  11: std::panicking::begin_panic_fmt
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:334
  12: deflate::writer::compress_until_done<alloc::vec::Vec<u8>>
             at <::std::macros::panic macros>:5
  13: deflate::writer::ZlibEncoder<alloc::vec::Vec<u8>>::output_all<alloc::vec::Vec<u8>>
             at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\deflate-0.8.3\src\writer.rs:205
  14: deflate::writer::ZlibEncoder<alloc::vec::Vec<u8>>::finish<alloc::vec::Vec<u8>>
             at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\deflate-0.8.3\src\writer.rs:212
  15: png::encoder::Writer<std::io::buffered::BufWriter<std::fs::File>>::write_image_data<std::io::buffered::BufWriter<std::fs::File>>
             at C:\Users\Electric Coffee\.cargo\registry\src\github.com-1ecc6299db9ec823\png-0.16.1\src\encoder.rs:172
  16: mandelbrot::main
             at .\src\main.rs:73
  17: std::rt::lang_start::{{closure}}<()>
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd\rt.rs:67
  18: std::rt::lang_start_internal::{{closure}}
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\rt.rs:52
  19: std::panicking::try::do_call<closure-0,i32>
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:305
  20: panic_unwind::__rust_maybe_catch_panic
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libpanic_unwind\lib.rs:86
  21: std::panicking::try
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panicking.rs:281
  22: std::panic::catch_unwind
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\panic.rs:394
  23: std::rt::lang_start_internal
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\/src\libstd\rt.rs:51
  24: std::rt::lang_start<()>
             at /rustc/b8cedc00407a4c56a3bda1ed605c6fc166655447\src\libstd\rt.rs:67
  25: main
  26: invoke_main
             at d:\agent\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
  27: __scrt_common_main_seh
             at d:\agent\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
  28: BaseThreadInitThunk
  29: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: process didn't exit successfully: `target\debug\mandelbrot.exe` (exit code: 101)
PS D:\Code\rust\mandelbrot>

The above error is for an image 8000×8000 px in size.
