Memory consumption (kryo issue, closed)

esotericsoftware commented on July 19, 2024

Memory consumption

from kryo.

Comments (9)

commented on July 19, 2024

From [email protected] on March 20, 2010 17:43:09

While they have the option to perform random access, most (but not all) serializers
access the buffer sequentially and could indeed be written using
InputStream/OutputStream. However, the first time random access is needed, a
stream-based approach degrades to the current situation where a (possibly large)
buffer is required.

For example, the Compressor class wraps a Serializer to perform compression and
decompression. It must write the length of the compressed data, then the compressed
data. The length is needed first because the compressed object may not be at the root
of the object graph, and upon decompression we need to know how many bytes to
process. For this to be stream-based, the Compressor class would have to do its own
buffering of the output data so it could write the length to the OutputStream before
the data.
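The length-first layout described above can be sketched with a plain ByteBuffer: reserve four bytes, write the compressed data, then go back and patch the length with an absolute put. This is only an illustration of why random access is convenient here. The class `LengthPrefixedCompression` and its methods are hypothetical and are not Kryo's Compressor; the compression itself uses the JDK's `Deflater` and `Inflater`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not Kryo's Compressor: illustrates the length prefix only.
final class LengthPrefixedCompression {
    // Writes [int compressedLength][compressed bytes] at the buffer's current position.
    static void writeCompressed(ByteBuffer buffer, byte[] data) {
        int lengthPosition = buffer.position();
        buffer.putInt(0); // placeholder, patched once the compressed size is known

        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int count = deflater.deflate(chunk);
            buffer.put(chunk, 0, count); // BufferOverflowException if the buffer is too small
        }
        deflater.end();

        int compressedLength = buffer.position() - lengthPosition - 4;
        buffer.putInt(lengthPosition, compressedLength); // random access: absolute put
    }

    // Reads the length first, so only the compressed bytes of this object are consumed.
    static byte[] readCompressed(ByteBuffer buffer) {
        int compressedLength = buffer.getInt();
        byte[] compressed = new byte[compressedLength];
        buffer.get(compressed);

        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inflater.finished()) {
                int count = inflater.inflate(chunk);
                if (count == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("Truncated compressed data");
                }
                out.write(chunk, 0, count);
            }
        } catch (DataFormatException ex) {
            throw new IllegalStateException("Corrupt compressed data", ex);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

A stream-based version would have to compress into a temporary buffer of its own before it could emit the length, which is the extra buffering mentioned above.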

As you mentioned, an interface could be passed around instead of ByteBuffer to hide
whether ByteBuffers or streams are used. The question is, would this API change be
worth it? The current approach is ideal if you need a byte array or ByteBuffer in the
end anyway, or if you need to write the length first (eg, when writing to a
stream-based protocol such as TCP). The current approach is less ideal when you have
an arbitrarily large object graph, because you may not know the serialized size
beforehand.

You may find the ObjectBuffer class useful. It provides methods to serialize and
deserialize using byte arrays and streams. It handles the necessary buffering. It can
be given an initial and maximum size. If an operation fails because the buffer is too
small, its size will be doubled (up to the maximum) and the operation retried.
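The double-and-retry behaviour described above can be sketched with plain java.nio. This is not ObjectBuffer's actual code; `GrowingBuffer` and its `write` method are hypothetical names, and the sketch assumes the write can simply be repeated from the start after the buffer has been replaced.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Hypothetical class, not Kryo's ObjectBuffer: shows the double-and-retry idea only.
final class GrowingBuffer {
    private final int maxCapacity;
    private ByteBuffer buffer;

    GrowingBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initialCapacity <= maxCapacity");
        }
        this.buffer = ByteBuffer.allocate(initialCapacity);
        this.maxCapacity = maxCapacity;
    }

    // Runs the write on a cleared buffer. On overflow the capacity is doubled
    // (capped at maxCapacity) and the whole write is retried from scratch.
    byte[] write(Consumer<ByteBuffer> writer) {
        while (true) {
            buffer.clear();
            try {
                writer.accept(buffer);
                buffer.flip();
                byte[] bytes = new byte[buffer.remaining()];
                buffer.get(bytes);
                return bytes;
            } catch (BufferOverflowException ex) {
                int capacity = buffer.capacity();
                if (capacity >= maxCapacity) throw ex; // already at the limit
                buffer = ByteBuffer.allocate((int) Math.min(maxCapacity, capacity * 2L));
            }
        }
    }

    int capacity() {
        return buffer.capacity();
    }
}
```

Each failed attempt repeats all the work done so far, so a generous initial size keeps the number of retries low.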

In a multithreaded environment, if an ObjectBuffer per thread is too much memory
(thread count * max size), it may be acceptable to use a thread safe pool of
ObjectBuffers. The pool size would be less than the number of threads. When
necessary, threads would block until an ObjectBuffer becomes available.
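A minimal sketch of such a pool, assuming a fixed number of buffers handed out through a blocking queue. It pools plain ByteBuffers to stay self-contained; a pool of ObjectBuffers would follow the same shape. `BufferPool` and `withBuffer` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical pool: memory use is poolSize * bufferCapacity, regardless of thread count.
final class BufferPool {
    private final BlockingQueue<ByteBuffer> available;

    BufferPool(int poolSize, int bufferCapacity) {
        available = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            available.add(ByteBuffer.allocate(bufferCapacity));
        }
    }

    // Blocks until a buffer is free, runs the task, and always returns the buffer to the pool.
    <T> T withBuffer(Function<ByteBuffer, T> task) {
        ByteBuffer buffer;
        try {
            buffer = available.take();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a buffer", ex);
        }
        try {
            buffer.clear();
            return task.apply(buffer);
        } finally {
            available.add(buffer);
        }
    }

    int availableCount() {
        return available.size();
    }
}
```

The task must copy out whatever it needs before returning, because the buffer is reused by the next caller.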

To serialize an object graph, the graph is normally going to fit in memory. The
buffer size needed to serialize the object graph is normally going to be less than
the memory used by the Java object representation. These days memory is abundant. I'm
not sure how much trouble it is worth to try to avoid buffering the serialized bytes.

Is it possible that this bug is more about the unfriendliness of the API when the
buffer is too small? Maybe ObjectBuffer should make its ByteBuffer available, so the
same buffer growing functionality would be available to users who need a ByteBuffer
rather than a byte array or stream.

from kryo.

commented on July 19, 2024

From [email protected] on March 20, 2010 20:02:45

i'm not the original poster, but the idea of implementing an auto-growing-buffer would
make the API dramatically easier to use. personally, i can't see the move to a stream
helping in any significant way

from kryo.

commented on July 19, 2024

From [email protected] on March 20, 2010 20:49:13

i was thinking that handling the buffer.grow in the buffer itself might be a good
idea. but did a little bit of reading (and wrote a quick ByteArray delegation).
sounds like the conventional wisdom is that it's a problem (see the note in yellow at
the top of the mina link)

http://mina.apache.org/iobuffer.html
"The main reason why MINA has its own wrapper on top of nio ByteBuffer is to have
extensible buffers. This was a very bad decision"

http://stackoverflow.com/questions/1774651/growing-bytebuffer

i think that their primary complaint is the copying of the backing array on grow.
doesn't seem like too big a deal. but then, i guess i think that i'm mostly going to
be sending relatively small hunks

the recommendation is to use a direct ByteBuffer (you're already doing this in your
introduction) and just set a large size - the OS won't actually allocate memory until
it's needed ...

from kryo.

commented on July 19, 2024

From [email protected] on March 20, 2010 21:16:07

When you allocate a direct ByteBuffer, the contiguous block of memory is claimed at
that time.

The OP has a multithreaded environment where he is serializing potentially large
object graphs, so he will need one buffer per thread which can use up a lot of memory.

ObjectBuffer will grow as needed (if inefficiently). This at least means you don't
have to allocate a huge amount per thread, just in case. However, if each thread
really does need a large buffer, currently either you need a lot of memory on the
machine or you'll have to limit the number of large ObjectBuffers you create.

Everything is simplest when a ByteBuffer is used everywhere. Ideally we can work
around any issues this causes.

from kryo.

commented on July 19, 2024

From [email protected] on March 21, 2010 10:01:11

Hi all!

First of all, thanks for the very quick and complete reply. Second, I'm sorry I
posted this as a defect, as this is more of a comment/enhancement.

The reason why I wanted to have InputStream/OutputStream was that I want to use Kryo
with Spring remoting via HTTP
(http://static.springsource.org/spring/docs/3.0.x/reference/html/remoting.html#remoting-httpinvoker),
and as I have no idea what the size of the serialized objects will be, I need to
potentially allocate a "huge" buffer.

I do realize that, as the Spring implementation is also required to send a correct
Content-Length header, I expect it to buffer internally on write anyway. Still,
having two buffers seems non-ideal: if I could write directly to the OutputStream,
then the only buffer (probably a ByteArrayOutputStream) would be in the Spring implementation.

I will check out the ObjectBuffer and see how that works out, as that would at least
not allocate much more than I need. I guess I could make my code a bit "adaptive" as
well, by adding an "ExpandObjectBufferListener" that would tell me that a certain
object type required more memory than expected, so an expansion was done.

Thanks again for your time!

Best regards
Morten

from kryo.

commented on July 19, 2024

From [email protected] on March 24, 2010 18:24:16

Spring makes my eyes hurt. I think it is all the pointy brackets. ;)

Not much I can say about the two buffers. Kryo doesn't really have a solution to
avoid this. If it were really needed, I think we would use a list of ByteBuffers,
similar to PyroNet:
http://code.google.com/p/pyronet/source/browse/trunk/%20pyronet/jawnae/pyronet/util/ByteStream.java
This way we could have a hook to allow you to handle each ByteBuffer as it was
filled. For now though, I'd like to try to get by as simply as possible, using just
ByteBuffer.

FWIW, ObjectBuffer logs a debug message when it is resized. Debug logging can be
enabled with "Log.set(Log.LEVEL_DEBUG);" (assuming you aren't using a hardcoded
logging level, see the minlog project for more info). On a related note, what Kryo is
doing can be observed on a very low level with the TRACE logging level.

Labels: -Type-Defect Type-Enhancement

from kryo.

commented on July 19, 2024

From [email protected] on March 24, 2010 18:24:37

(No comment was entered for this change.)

Status: WontFix

from kryo.

commented on July 19, 2024

From [email protected] on June 04, 2010 14:20:31

a proof of concept implementation of a serializer (and a Kryo subclass to simplify
registration) that wraps the normal serializers, and flushes the ByteBuffer as
needed. i'm not wrapping the primitive serializers - so i catch overflows (going to
try intercepting the primitives next)

the flushing method that i use is naive - it just stores the data in a linked list of
byte arrays. but could obviously write to an output stream (for spring) or directly
to the network
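The overflow-catching approach described above might look roughly like the following. This is a guess at the general shape and not the attached Flush.java: `OverflowFlusher` and its methods are hypothetical, each "component" stands in for one wrapped serializer call, and the sketch assumes a single component always fits into an empty buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.util.LinkedList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch, not the attached Flush.java.
final class OverflowFlusher {
    private final ByteBuffer buffer;
    private final List<byte[]> chunks = new LinkedList<>();

    OverflowFlusher(int capacity) {
        buffer = ByteBuffer.allocate(capacity);
    }

    // Writes one component. On overflow the partial write is discarded, the buffer
    // is flushed, and the component is retried once in the now empty buffer.
    void write(Consumer<ByteBuffer> component) {
        int start = buffer.position();
        try {
            component.accept(buffer);
        } catch (BufferOverflowException ex) {
            buffer.position(start); // drop the partially written component
            flush();
            component.accept(buffer); // overflows again if the component exceeds the capacity
        }
    }

    // Moves the buffered bytes into the chunk list and empties the buffer.
    void flush() {
        buffer.flip();
        if (buffer.hasRemaining()) {
            byte[] chunk = new byte[buffer.remaining()];
            buffer.get(chunk);
            chunks.add(chunk);
        }
        buffer.clear();
    }

    int chunkCount() {
        return chunks.size();
    }

    // Flushes what is left and concatenates all chunks.
    byte[] toByteArray() {
        flush();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Replacing the chunk list with an OutputStream in `flush()` would give the streaming variant mentioned above.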

i don't do any processing on the read side - just a pure delegate

inspired by nate's comments from the mailing list (8th comment in this thread):
http://groups.google.com/group/kryo-users/browse_thread/thread/f936d2b459638211

> Another solution is to periodically dump the contents of the buffer. This
...
> Interestingly, this could be built without needing changes to Kryo. I am
> curious how well this would work. Anyone feel like implementing it? :)

Attachment: Flush.java

from kryo.

commented on July 19, 2024

From [email protected] on June 05, 2010 14:20:23

another flushing meta serializer implementation - wraps primitives, and never
allocates a larger buffer. doesn't work for (large) strings, but could - probably
best to just provide a FlushStringSerializer

using this slows things down by 10-20% for my simple test case - every primitive
results in an additional call. but the penalty doesn't depend on the size of the
output - there's no reallocation of buffers, no try catch. the try-catch based
flusher above has almost no penalty if the buffer is large enough, but if the initial
buffer is a lot smaller than the size of the largest component, eg an array, the
penalty goes up dramatically (i'm assuming it's O(n^2) but i haven't really checked)
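The check-before-every-primitive variant can be sketched as follows, again as an assumption about the general idea and not the attached Flush.java. `FlushingPrimitiveWriter` is a hypothetical name. Each write first makes room by flushing to the sink, so the buffer never grows and no exception is caught; byte arrays are written in slices so they may be larger than the buffer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch, not the attached Flush.java.
final class FlushingPrimitiveWriter {
    private final ByteBuffer buffer;
    private final OutputStream sink;

    FlushingPrimitiveWriter(int capacity, OutputStream sink) {
        if (capacity < 8) throw new IllegalArgumentException("capacity must hold a long");
        this.buffer = ByteBuffer.allocate(capacity);
        this.sink = sink;
    }

    void writeInt(int value) {
        ensure(4);
        buffer.putInt(value);
    }

    void writeLong(long value) {
        ensure(8);
        buffer.putLong(value);
    }

    // Arrays are copied slice by slice, flushing whenever the buffer is full.
    void writeBytes(byte[] bytes) {
        int offset = 0;
        while (offset < bytes.length) {
            if (!buffer.hasRemaining()) flush();
            int count = Math.min(buffer.remaining(), bytes.length - offset);
            buffer.put(bytes, offset, count);
            offset += count;
        }
    }

    // The extra call paid on every primitive write.
    private void ensure(int byteCount) {
        if (buffer.remaining() < byteCount) flush();
    }

    // Writes the buffered bytes to the sink and empties the buffer.
    void flush() {
        try {
            sink.write(buffer.array(), 0, buffer.position());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        buffer.clear();
    }
}
```

The cost is one bounds check per write regardless of output size, which matches the fixed overhead described above.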

Attachment: Flush.java

from kryo.
