Comments (4)
It seems like we can do that, and this is where the correct use of decompress_len, as mentioned in #35, belongs. When we add this functionality we will have the following scenarios:
- User provided bytearray (with or without output_len, as we can also use decompress_len for the estimate):
  - We can de/compress, then resize the resulting bytearray to the actual size if needed. Super!
- User provided bytes:
  - If they also provided output_len, we're good for de/compression.
  - It appears decompress_len gives an exact answer; so long as that is successful, we can do a single allocation for decompression.
  - max_compress_len gives the max compressed size; in this case, they would likely get trailing null bytes back.
The _into addition for raw obviously doesn't matter for these points, so all good there.
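To make the two scenarios concrete, here is a minimal sketch using only the standard library. It uses zlib as a stand-in codec, and compress_into and max_compress_len are hypothetical helpers written for this illustration; they are not cramjam's functions and do not reflect its implementation. The bound is zlib's compressBound formula, assumed here to play the role of a max-size estimate.

```python
import zlib


def max_compress_len(n: int) -> int:
    """Hypothetical worst-case size estimate (zlib's compressBound formula)."""
    return n + (n >> 12) + (n >> 14) + (n >> 25) + 13


def compress_into(data: bytes, out: bytearray) -> int:
    """Hypothetical helper: compress `data` into a preallocated buffer.

    Returns the number of bytes actually written. The buffer itself is
    not resized here; that decision is left to the caller.
    """
    compressed = zlib.compress(data)
    n = len(compressed)
    if n > len(out):
        raise ValueError("output buffer too small")
    out[:n] = compressed  # same-length slice assignment keeps the buffer size
    return n


data = b"snappy " * 500

# Scenario 1: a bytearray can be shrunk to the real size after the fact.
resizable = bytearray(max_compress_len(len(data)))
written = compress_into(data, resizable)
del resizable[written:]  # truncate in place, no extra copy of the payload

# Scenario 2: a fixed-size result cannot be shrunk, so anything past
# `written` is left as the null bytes the buffer was allocated with.
fixed = bytearray(max_compress_len(len(data)))
written_fixed = compress_into(data, fixed)
padded = bytes(fixed)             # what the caller would receive
trailing = padded[written_fixed:]  # all zero bytes
```

In the second scenario the caller has to keep the returned byte count around and slice with it, which is why an exact output_len matters more for bytes than for bytearray.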
For the second part, cramjam tries to follow the layout of the Rust crate it uses for snappy. For example, snappy.de/compress_raw uses the de/encoders from the snap::raw module.
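On the earlier point that decompress_len appears to give an exact answer: the raw snappy format starts each block with the uncompressed length stored as a little-endian varint, so the size can be read without decompressing anything. The sketch below is a pure-Python illustration of reading that preamble; read_uncompressed_len is a hypothetical name used here, not a function from cramjam or the snap crate.

```python
def read_uncompressed_len(raw: bytes) -> int:
    """Read the varint length preamble of a raw snappy block.

    Hypothetical helper for illustration only. Each byte carries 7 bits
    of the length, lowest bits first; a set high bit means more follow.
    """
    result = 0
    shift = 0
    for byte in raw[:5]:  # a 32-bit varint occupies at most 5 bytes
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result
        shift += 7
    raise ValueError("truncated or malformed length preamble")


# Hand-built raw block: length 5, then a literal tag (0x10) and 5 bytes.
block = b"\x05\x10hello"
size = read_uncompressed_len(block)

# With the exact size known up front, the output needs only one allocation.
output = bytearray(size)
```

The length is only as trustworthy as the input, so a real decoder would still validate it while decompressing.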
from cramjam.
I'll just give a 👍 in here - snappy-raw is the most important one to get fast from parquet's point of view. I'm happy about the naming convention.
from cramjam.
I don't know if we can get the de/compress_raw functions to support output_len, as the raw de/compression functions there only output a new buffer, unlike the others, which can take any writeable object. I can dig into the src later and see if it would even be possible, but I suspect the one who wrote it has a good reason, as the other portions of the crate do implement reader/writer parameters for framed de/compression.
The PR referenced here does implement de/compress_raw_into, so I hope that is good for you in the meantime.
I would also point out that, while I don't know what data sizes you're working with, in the benchmarks the current de/compress_raw variants are extremely close to python-snappy and even edge it out in a couple of cases.
from cramjam.
de/compress_raw_into now follows the same API as the other variants from #45, and de/compress_raw supports output_len as well.
from cramjam.