martindurant commented on June 1, 2024

Like this?

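use pyo3::buffer::PyBuffer;
use std::slice;

        // value: &PyAny
        // extract() fails unless the object exposes a buffer whose item format is compatible with u8
        let buf: PyBuffer<u8> = value.extract().unwrap(); // expose possible error
        // SAFETY: assumes the buffer is C-contiguous (see buf.is_c_contiguous());
        // the slice must not outlive `buf`, which keeps the underlying memory alive
        let rustbuf: &[u8] = unsafe {
            slice::from_raw_parts(buf.buf_ptr() as *const u8, buf.len_bytes())
        };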

crusaderky commented on June 1, 2024

> Will the suggested numpy.frombuffer work for you for now?

Yes, the hack works. Didn't performance-test it but it should be negligible.

crusaderky commented on June 1, 2024

Or

m = memoryview(a).cast("B")
cramjam.lz4.compress_block(m)

All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it? (no rush)

milesgranger commented on June 1, 2024

You can leverage the buffer protocol to make a zero-copy numpy view and then pass that; would this be suitable? (The same goes for PickleBuffer.)

Prefer to steer clear of implementing support for an arbitrary amount of types, especially if those types already implement the buffer protocol.

>>> import cramjam
>>> import numpy as np
>>> data = memoryview(b'data')
>>> buf = np.frombuffer(data, dtype=np.uint8)
>>> cramjam.lz4.compress(buf)
cramjam.Buffer(len=23)

milesgranger commented on June 1, 2024

In retrospect, we can likely add a generic PyObject variant and poke at it at runtime to see if it implements the buffer protocol, then get the buffer from there. Suspect it may be slightly slower, but it would be a good general solution for the random objects implementing it.
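A rough Python-level sketch of the "poke at it at runtime" idea, assuming a hypothetical helper `as_byte_view` (not part of cramjam's API): anything that supports the buffer protocol is accepted through `memoryview`, and everything else is rejected with a `TypeError`. The Rust side would do the equivalent check on the `PyObject` variant.

```python
import array


def as_byte_view(obj) -> memoryview:
    """Hypothetical helper (not cramjam API): accept any object exposing the
    buffer protocol and return a flat, zero-copy view of its raw bytes."""
    try:
        # memoryview() only succeeds for objects implementing the buffer protocol
        view = memoryview(obj)
    except TypeError as exc:
        raise TypeError(
            f"{type(obj).__name__!r} does not support the buffer protocol"
        ) from exc
    # Reinterpret as unsigned bytes without copying; cast() requires a
    # C-contiguous buffer with a native single-character item format
    return view.cast("B")


# bytes, bytearray, array.array and memoryview are all accepted without copying
for obj in (b"data", bytearray(b"data"), array.array("B", b"data"), memoryview(b"data")):
    print(type(obj).__name__, as_byte_view(obj).tobytes())
```

The design choice is to dispatch on capability (does it export a buffer?) rather than on concrete types, which matches the preference in this thread for not special-casing individual classes.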

crusaderky commented on June 1, 2024

> Suspect it may be slightly slower

I seriously doubt the slowdown will be noticeable as long as the compression itself deals with 10 KiB or more.

crusaderky commented on June 1, 2024

> Prefer to steer clear of implementing support for an arbitrary amount of types

I agree that the individual types should not be explicitly implemented (numpy shouldn't either!)

milesgranger commented on June 1, 2024

Will the suggested numpy.frombuffer work for you for now?

milesgranger commented on June 1, 2024

Was interested and prototyped it this morning:

In [1]: import cramjam

In [2]: data = memoryview(b'data')

In [3]: cramjam.lz4.compress(data)
Out[3]: cramjam.Buffer(len=23)

In [4]: import array

In [5]: out = array.array('B', list(range(23)))

In [6]: out
Out[6]: array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

In [7]: cramjam.lz4.compress_into(data, out)
Out[7]: 23

In [8]: out
Out[8]: array('B', [4, 34, 77, 24, 68, 64, 94, 4, 0, 0, 128, 100, 97, 116, 97, 0, 0, 0, 0, 53, 138, 34, 33])

Guess a proper implementation would take a day or two. It may take some time for me to get around to it in my free time; hopefully the np.frombuffer workaround is sufficient for you in the meantime.

milesgranger commented on June 1, 2024

That's roughly how the prototype goes. It would probably need to be wrapped similarly to PyBytes/PyByteArray, since all variants in BytesType implement Read/Write, and it would need to maintain some positional state between the read/write calls. PyBuffer has handy checks for things like readonly, but as_slice gives &[ReadOnlyCell<T>], which is less ergonomic than getting the buffer directly as you did here.
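To illustrate the positional-state point: a Read/Write-style wrapper over a buffer has to remember where the previous read or write stopped. The Python class below is a hypothetical stand-in for such a wrapper (the name `BufferCursor` and its methods are illustrative, not cramjam's Rust implementation); it works on a zero-copy `memoryview` and keeps a single `pos` shared by `read` and `write`.

```python
class BufferCursor:
    """Hypothetical sketch: a file-like cursor over any buffer-protocol object."""

    def __init__(self, obj):
        self.view = memoryview(obj).cast("B")  # zero-copy byte view
        self.pos = 0  # position persisted between read/write calls

    def read(self, n: int = -1) -> bytes:
        # Read up to n bytes (or everything remaining if n < 0) from pos
        end = len(self.view) if n < 0 else min(self.pos + n, len(self.view))
        chunk = self.view[self.pos:end].tobytes()
        self.pos = end
        return chunk

    def write(self, data: bytes) -> int:
        # Writing requires a writable buffer (e.g. the output of an _into call)
        if self.view.readonly:
            raise TypeError("buffer is read-only")
        # A fixed-size buffer cannot grow, so truncate to the remaining space
        n = min(len(data), len(self.view) - self.pos)
        self.view[self.pos:self.pos + n] = data[:n]
        self.pos += n
        return n


out = bytearray(8)
cur = BufferCursor(out)
cur.write(b"abc")
cur.write(b"de")  # continues where the previous write stopped
```

Unlike PyBytes/PyByteArray, a generic buffer has a fixed size, so this sketch truncates writes instead of growing the target; whether a real wrapper should truncate or raise is an open design choice.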


Whatever the wrapper ends up being (BytesType(PythonBuffer), for example), I think PythonBuffer ought to implement TryFrom<PyBuffer<u8>>, checking that the buffer is C-contiguous, that dimensions == 1, that readonly is appropriate for the context (i.e., it must be writable for the _into calls), and maybe some other invariants. If all that checks out, use the underlying buffer directly from there.
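The invariants listed above can be approximated at the Python level with `memoryview` attributes. This is only a hypothetical analogue of the proposed `TryFrom<PyBuffer<u8>>` check (the function name `check_buffer` is made up for illustration): it rejects non-C-contiguous or multi-dimensional buffers, and requires a writable buffer when the caller is an `_into`-style function.

```python
def check_buffer(obj, writable: bool = False) -> memoryview:
    """Hypothetical validation mirroring the checks proposed for PythonBuffer."""
    view = memoryview(obj)
    if not view.c_contiguous:
        raise ValueError("buffer must be C-contiguous")
    if view.ndim != 1:
        raise ValueError(f"buffer must be 1-dimensional, got ndim={view.ndim}")
    # _into calls write their output into the buffer, so it must be writable
    if writable and view.readonly:
        raise ValueError("output buffer must be writable")
    return view


check_buffer(b"input data")                 # fine as a read-only source
check_buffer(bytearray(23), writable=True)  # fine as an _into destination
```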

milesgranger commented on June 1, 2024

You can try out 2.7.0rc2 for the buffer updates done in #100.

crusaderky commented on June 1, 2024

It doesn't seem to work

>>> cramjam.__version__
'2.7.0-rc2'
>>> cramjam.lz4.compress_block(bytearray(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(memoryview(b"123"))
cramjam.Buffer<len=8>
>>> cramjam.lz4.compress_block(numpy.ones(10))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'ndarray' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'ndarray' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8
>>> cramjam.lz4.compress_block(memoryview(numpy.ones(10)))
TypeError: argument 'data': failed to extract enum BytesType ('Buffer | File | pybuffer')
- variant RustyBuffer (Buffer): TypeError: failed to extract field BytesType::RustyBuffer.0, caused by TypeError: 'memoryview' object cannot be converted to 'Buffer'
- variant RustyFile (File): TypeError: failed to extract field BytesType::RustyFile.0, caused by TypeError: 'memoryview' object cannot be converted to 'File'
- variant PyBuffer (pybuffer): TypeError: failed to extract field BytesType::PyBuffer.0, caused by BufferError: buffer contents are not compatible with u8

milesgranger commented on June 1, 2024

It needs to be bytes; the default dtype here is float64:

>>> np.ones(10).dtype
dtype('float64')
>>> cramjam.lz4.compress_block(np.ones(10, dtype=np.uint8))  # or np.ones(10).tobytes()
cramjam.Buffer<len=15>

milesgranger commented on June 1, 2024

Or better yet np.ones(10).view(np.uint8) for no copies.

milesgranger commented on June 1, 2024

> All other compression libraries don't have this caveat and are happy to ingest any PickleBuffer; could you fix it?

I agree, this is not great.

I'd like to get the bytes view of buffers directly, which can be done, but the current implementation of PyBuffer<T> requires a type parameter, in this case u8, and we don't actually need one since we can always just get the bytes view. So it would 'only' need a refactor to use the FFI more directly and avoid PyBuffer<T> altogether.
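Why the item type doesn't matter can be seen from the Python side: a typed buffer such as an `array.array('d')` advertises format `'d'` with 8-byte items (the kind of buffer a `u8`-typed request rejects, as in the `float64` error above), yet its raw bytes are always reachable without a copy. A small stdlib-only illustration, not cramjam code:

```python
import array
import struct

values = array.array("d", [1.0, 2.0])  # two native doubles
typed = memoryview(values)
raw = typed.cast("B")  # same memory, viewed as unsigned bytes

print(typed.format, typed.itemsize, len(typed))  # d 8 2
print(raw.format, raw.itemsize, len(raw))        # B 1 16

# Writing through the byte view updates the original array: no copy was made
raw[:8] = struct.pack("d", 3.5)
print(values[0])  # 3.5
```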

milesgranger commented on June 1, 2024

@crusaderky v2.7.0rc3 ought to work for you.

crusaderky commented on June 1, 2024

Yep works great! 👍
