Comments (17)
Like this?
use pyo3::buffer::PyBuffer;
use std::slice; // needed for slice::from_raw_parts
// value: &PyAny
let buf: PyBuffer<u8> = value.extract().unwrap(); // expose possible error
let rustbuf: &[u8] = unsafe {
    slice::from_raw_parts(buf.buf_ptr() as *const u8, buf.len_bytes())
};
from cramjam.
Will the suggested numpy.frombuffer work for you for now?
Yes, the hack works. Didn't performance-test it, but the overhead should be negligible.
Or
m = memoryview(a).cast("B")
cramjam.lz4.compress_block(m)
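For reference, a stdlib-only sketch of the cast trick above, with array.array standing in for the numpy array a (variable names are illustrative):

```python
import array

# A float64 buffer standing in for the numpy array `a` above.
a = array.array("d", [1.0, 2.0, 3.0])

# Reinterpret the contiguous float view as unsigned bytes, without copying.
m = memoryview(a).cast("B")

print(m.format, m.nbytes)  # B 24  (3 doubles * 8 bytes each)
```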
All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it? (no rush)
Can leverage the buffer protocol to make a zero-copy numpy array and then pass that; would this be suitable? (same goes for PickleBuffer)
Prefer to steer clear of implementing support for an arbitrary amount of types, especially if those types already implement the buffer protocol.
>>> import cramjam
>>> import numpy as np
>>> data = memoryview(b'data')
>>> buf = np.frombuffer(data, dtype=np.uint8)
>>> cramjam.lz4.compress(buf)
cramjam.Buffer(len=23)
In retrospect, can likely add a generic PyObject variant and at runtime poke at it to see whether it implements the buffer protocol, and get the buffer from there. Suspect it may be slightly slower, but it would be a good general solution for the random objects implementing it.
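On the Python side, that runtime poke is essentially an attempted memoryview() call; a minimal sketch (the helper name is hypothetical, not part of cramjam):

```python
import array

def supports_buffer_protocol(obj) -> bool:
    """Return True if obj implements the CPython buffer protocol."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True

print(supports_buffer_protocol(b"data"))           # True
print(supports_buffer_protocol(bytearray(3)))      # True
print(supports_buffer_protocol(array.array("B")))  # True
print(supports_buffer_protocol(42))                # False
```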
Suspect it may be slightly slower
I seriously doubt the slowdown will be noticeable as long as the compression itself deals with 10kiB+
Prefer to steer clear of implementing support for an arbitrary amount of types
I agree that the individual types should not be explicitly implemented (numpy shouldn't either!)
Will the suggested numpy.frombuffer work for you for now?
Was interested and prototyped it this morning:
In [1]: import cramjam
In [2]: data = memoryview(b'data')
In [3]: cramjam.lz4.compress(data)
Out[3]: cramjam.Buffer(len=23)
In [4]: import array
In [5]: out = array.array('B', list(range(23)))
In [6]: out
Out[6]: array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])
In [7]: cramjam.lz4.compress_into(data, out)
Out[7]: 23
In [8]: out
Out[8]: array('B', [4, 34, 77, 24, 68, 64, 94, 4, 0, 0, 128, 100, 97, 116, 97, 0, 0, 0, 0, 53, 138, 34, 33])
Guess a proper implementation would take a day or two. May take some time for me to get around to it in my free time; hopefully the np.frombuffer workaround is sufficient for you in the meantime.
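For readers without cramjam installed, the compress_into pattern above can be sketched with stdlib zlib: compress into a preallocated writable buffer and return the byte count. The function here is an analogue for illustration, not cramjam's actual implementation:

```python
import zlib

def compress_into(data: bytes, out: bytearray) -> int:
    """Compress `data` into the preallocated buffer `out`; return bytes written."""
    compressed = zlib.compress(data)
    if len(compressed) > len(out):
        # Simplification; cramjam's own semantics may differ.
        raise ValueError("output buffer too small")
    out[: len(compressed)] = compressed
    return len(compressed)

data = b"data" * 100
out = bytearray(128)
n = compress_into(data, out)
assert zlib.decompress(bytes(out[:n])) == data
```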
Roughly, that's basically how the prototype goes. Probably would need to wrap it similar to PyBytes/PyByteArray, since all variants in BytesType have Read/Write implemented and would need to maintain some positional state between the read/write call(s). PyBuffer has handy checks for things like readonly, but as_slice gives &[ReadOnlyCell<T>], which is less ergonomic than getting the buffer directly like you did here.
Think whatever wrapper it is, BytesType(PythonBuffer) for example, PythonBuffer ought to have TryFrom<PyBuffer<u8>> which checks for c_contiguous, dimensions == 1, that readonly is appropriate for the context (i.e., needs writable for _into calls), and maybe some other invariants. If all that checks out, use the underlying buffer directly from there.
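Those invariants map one-to-one onto memoryview attributes on the Python side; a minimal sketch of the checks (the function name is hypothetical):

```python
def check_buffer(obj, writable: bool = False) -> memoryview:
    """Validate contiguity, dimensionality, and mutability of a buffer object."""
    m = memoryview(obj)
    if not m.c_contiguous:
        raise ValueError("buffer must be C-contiguous")
    if m.ndim != 1:
        raise ValueError("buffer must be one-dimensional")
    if writable and m.readonly:
        raise ValueError("buffer must be writable for _into calls")
    return m.cast("B")  # byte view, regardless of element type

check_buffer(b"data")                      # read-only is fine as input
check_buffer(bytearray(8), writable=True)  # writable destination passes
try:
    check_buffer(b"data", writable=True)   # read-only destination fails
except ValueError as e:
    print(e)  # buffer must be writable for _into calls
```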
Can try out 2.7.0rc2 for the buffer updates done in #100.
from cramjam.
It doesn't seem to work
>>> cramjam.__version__
'2.7.0-rc2'
>>> cramjam.lz4.compress_block(bytearray(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(memoryview(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(numpy.ones(10))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'ndarray' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'ndarray' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8
>>> cramjam.lz4.compress_block(memoryview(numpy.ones(10)))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'memoryview' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'memoryview' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8
Needs to be bytes
>>> np.ones(10).dtype
dtype('float64')
>>> cramjam.lz4.compress_block(np.ones(10, dtype=np.uint8)) # or np.ones(10).tobytes()
cramjam.Buffer<len=15>
Or better yet np.ones(10).view(np.uint8) for no copies.
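The no-copy behaviour is observable: a cast view shares memory with its source, while a bytes()-style conversion copies. A stdlib illustration, with array.array standing in for the numpy array:

```python
import array

a = array.array("d", [1.0])
byte_view = memoryview(a).cast("B")   # zero-copy reinterpretation
byte_copy = bytes(byte_view)          # independent copy (like .tobytes())

byte_view[0] ^= 0xFF                  # mutate through the byte view...
assert a[0] != 1.0                    # ...the original value changed
assert bytes(byte_view) != byte_copy  # ...but the earlier copy did not
```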
All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it?
I agree, this is not great.
I'd like to get the bytes view of buffers directly, which can be done, but the current implementation of PyBuffer<T> requires a type, in this case u8, and we don't actually need that since we can always just get the bytes view. So it would 'only' need a refactor to use the C FFI more directly and avoid using PyBuffer<T> at all.
@crusaderky v2.7.0rc3 ought to work for you.
Yep works great! 👍