As of 2.7.0rc1, cramjam seems to be incompatible with memoryview and PickleBuffer obje

Like this? <div class="highlight highlight-source-rust notranslate position-relati

Will the suggested numpy.frombuffer work for you for now? </blockquot

Or <div class="highlight highlight-source-python notranslate position-relative ove

Was interested and prototype'd it this morning: <div class="highlight highlight-so

Support for memoryview and PickleBuffer about cramjam HOT 17 CLOSED

crusaderky commented on June 1, 2024

Support for memoryview and PickleBuffer

from cramjam.

Comments (17)

martindurant commented on June 1, 2024 1

Like this?

use pyo3::buffer::PyBuffer;

        // value: &PyAny
        let buf: PyBuffer<u8> = value.extract().unwrap(); // expose possible error
        let rustbuf: &[u8] = unsafe {
            slice::from_raw_parts(buf.buf_ptr() as *const u8, buf.len_bytes())
        };

from cramjam.

crusaderky commented on June 1, 2024 1

Will the suggested numpy.frombuffer work for you for now?

Yes, the hack works. Didn't performance-test it but it should be negligible.

from cramjam.

crusaderky commented on June 1, 2024 1

m = memoryview(a).cast("B")
cramjam.lz4.compress_block(m)

All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it? (no rush)

from cramjam.

milesgranger commented on June 1, 2024

Can leverage the buffer protocol to make zero copy into numpy and then pass that; would this be suitable? (same goes for PickleBuffer)

Prefer to steer clear of implementing support for an arbitrary amount of types, especially if those types already implement the buffer protocol.

>>> import numpy as np
>>> data = memoryview(b'data')
>>> buf = np.frombuffer(data, dtype=np.uint8)
>>> cramjam.lz4.compress(buf)
cramjam.Buffer(len=23)

from cramjam.

milesgranger commented on June 1, 2024

In retrospect, can likely add a generic PyObject variant and at runtime poke at it to see if it implements the buffer protocol and get the buffer from there. Suspect it may be slightly slower, but would be a good general solution for the random objects implementing it.

from cramjam.

crusaderky commented on June 1, 2024

Suspect it may be slightly slower

I seriously doubt the slowdown will be noticeable as long as the compression itself deals with 10kiB+

from cramjam.

crusaderky commented on June 1, 2024

Prefer to steer clear of implementing support for an arbitrary amount of types

I agree that the individual types should not be explicitly implemented (numpy shouldn't either!)

from cramjam.

milesgranger commented on June 1, 2024

Will the suggested numpy.frombuffer work for you for now?

from cramjam.

milesgranger commented on June 1, 2024

Was interested and prototype'd it this morning:

In [1]: import cramjam

In [2]: data = memoryview(b'data')

In [3]: cramjam.lz4.compress(data)
Out[3]: cramjam.Buffer(len=23)

In [4]: import array

In [5]: out = array.array('B', list(range(23)))

In [6]: out
Out[6]: array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

In [7]: cramjam.lz4.compress_into(data, out)
Out[7]: 23

In [8]: out
Out[8]: array('B', [4, 34, 77, 24, 68, 64, 94, 4, 0, 0, 128, 100, 97, 116, 97, 0, 0, 0, 0, 53, 138, 34, 33])

Guess a proper implementation would take a day or two. May take some time for me to get around to in my free time; hopefully the np.frombuffer is sufficient for you in the meantime.

from cramjam.

milesgranger commented on June 1, 2024

Roughly, basically how the prototype goes. Probably would need to wrap it similar to PyBytes/PyByteArray since all variants in BytesType have Read/Write implemented and would need to maintain some positional state value between the read/write call(s). PyBuffer has the handy checks for thing slike readonly but as_slice gives &[ReadOnlyCell<T>] which is less ergonomic than getting the buffer directly like you did here.

Think whatever wrapper it is, BytesType(PythonBuffer), for example, PythonBuffer ought to have TryFrom<PyBuffer<u8>> which checks for c_contiguous, dimensions == 1, readonly is appropriate for the context (ie, needs writable for _into calls) and maybe some other invariants. If all that checks out, use the underlying buffer directly from there.

from cramjam.

milesgranger commented on June 1, 2024

Can try out 2.7.0rc2 for the buffer updates done in #100

from cramjam.

crusaderky commented on June 1, 2024

It doesn't seem to work

>>> cramjam.__version__
'2.7.0-rc2'
>>> cramjam.lz4.compress_block(bytearray(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(memoryview(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(numpy.ones(10))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'ndarray' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'ndarray' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8
>>> cramjam.lz4.compress_block(memoryview(numpy.ones(10)))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'memoryview' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'memoryview' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8

from cramjam.

milesgranger commented on June 1, 2024

Needs to be bytes

>>> np.ones(10).dtype
dtype('float64')
>>> cramjam.lz4.compress_block(np.ones(10, dtype=np.uint8))  # or np.ones(10).tobytes()
cramjam.Buffer<len=15>

from cramjam.

milesgranger commented on June 1, 2024

Or better yet np.ones(10).view(np.uint8) for no copies.

from cramjam.

milesgranger commented on June 1, 2024

All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it?

I agree, this is not great.

I'd like to get the bytes view of buffers directly which can done, but the current implementation of PyBuffer<T> requires a type, in this case u8, and we don't actually need that as we can always just get the bytes view. So would 'only' need a refactor to use the cffi more directly to avoid using PyBuffer<T> at all.

from cramjam.

milesgranger commented on June 1, 2024

@crusaderky v2.7.0rc3 ought to work for you.

from cramjam.

crusaderky commented on June 1, 2024

Yep works great! 👍

from cramjam.

Support for memoryview and PickleBuffer about cramjam HOT 17 CLOSED

Comments (17)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent