pmmp / ext-encoding

High-performance `ByteBuffer` implementation for PHP, planned for use in a future version of PocketMine-MP

M4 0.74% C++ 44.38% C 11.87% PHP 43.01%

ext-encoding's Issues

Use boost endian library for dealing with byte order conversions

This removes the hassle of dealing with intrinsics, since the library handles them on our behalf, which means essentially free performance.

This is low priority right now, since there are other, more effective ways of improving performance.
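For illustration, the sketch below shows the kind of hand-written byte order handling that an endian library would hide. It uses only standard C++ rather than Boost (Boost.Endian offers conversion functions such as `big_to_native()` for the same purpose), and the function names `byteswap32`, `readUInt32BE` and `writeUInt32LE` are hypothetical, not part of ext-encoding.

```cpp
#include <cstdint>

// Portable byte swap. Compilers commonly recognise this pattern and emit a
// single byte-swap instruction, which is what an intrinsic would do explicitly.
inline std::uint32_t byteswap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Reads a big-endian 32-bit integer from raw bytes, independent of the host
// byte order, by assembling the value one byte at a time.
inline std::uint32_t readUInt32BE(const unsigned char* bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8)  |
            static_cast<std::uint32_t>(bytes[3]);
}

// Writes a 32-bit integer as little-endian bytes, independent of the host
// byte order.
inline void writeUInt32LE(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((v >> 24) & 0xFFu);
}
```

Every integer width and signedness needs its own variant of these helpers, which is the repetition a library would take over.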

Reusable buffers

Currently, around 2/3 of the overhead of write* comes from zend_string_init() and/or zend_string_alloc(), due to emalloc() of tiny new strings.
In practice, these strings are discarded almost immediately: once they have been appended to a buffer, they are no longer useful.
This is a giant waste of CPU time (though it should be noted that this accounts for less than 1/3 of the total time in benchmarks, due to overhead added by PHP code).

This could be avoided by accepting a string by-reference to write bytes into instead of allocating new strings every time, although it would be better to have some dedicated type which we could reserve() bytes in (like a vector or similar).

This would also enable having large pooled reusable buffers for encoding, which would further improve performance.
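A minimal sketch of the idea, assuming a caller-owned `std::string` stands in for the dedicated buffer type. The names `writeUInt16LE` and `encodeMany` are hypothetical and only illustrate writing into an existing buffer instead of returning a fresh string per value.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the two little-endian bytes of `v` to a buffer owned by the caller.
// No temporary string is created for the encoded value.
inline void writeUInt16LE(std::string& out, std::uint16_t v) {
    out.push_back(static_cast<char>(v & 0xFF));
    out.push_back(static_cast<char>((v >> 8) & 0xFF));
}

// Encodes `count` values into `out`, reusing whatever storage it already has.
// clear() resets the length; common implementations keep the capacity, so a
// pooled buffer does not need to be reallocated on every use.
inline void encodeMany(std::string& out, const std::uint16_t* values, std::size_t count) {
    out.clear();
    out.reserve(count * 2); // one allocation up front, at most
    for (std::size_t i = 0; i < count; ++i) {
        writeUInt16LE(out, values[i]);
    }
}
```

Calling `encodeMany` repeatedly with the same `out` is the pooled-buffer case: after the first call has grown the storage, later calls of similar size typically allocate nothing.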

Unit tests

The following things should be tested:

to be continued ...

ByteBuffer::__construct() behaviour is always surprising to one use case or the other

If the caller wants to read from a ByteBuffer, the offset should be 0, so that the bytes can be read from the beginning.
If the caller wants to write to a ByteBuffer, the offset should be placed at the end of the input data, as if writeByteArray() had been used.

Possible solutions:

Make the offset parameter mandatory
Make a parameter that indicates whether the offset should be at the beginning or the end (slightly easier to use than a mandatory offset parameter)
Make the constructor private and have ::reader() and ::writer() static factory functions
Split ByteBuffer into a ByteBufferReader and ByteBufferWriter
Separate read offset and write offset (effectively what BinaryStream was implicitly doing)
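As one illustration, the static factory option could look roughly like the sketch below. `ByteBufferSketch` is a hypothetical stand-in written in C++ for brevity, not the extension's actual class; only the names `reader()`, `writer()` and `writeByteArray()` come from the discussion above.

```cpp
#include <cstddef>
#include <string>
#include <utility>

class ByteBufferSketch {
public:
    // For reading: the offset starts at the beginning of the data.
    static ByteBufferSketch reader(std::string data) {
        return ByteBufferSketch(std::move(data), 0);
    }

    // For writing: the offset starts at the end of any initial data,
    // as if that data had been written with writeByteArray().
    static ByteBufferSketch writer(std::string data = "") {
        const std::size_t end = data.size();
        return ByteBufferSketch(std::move(data), end);
    }

    // Writes bytes at the current offset, overwriting existing bytes and
    // extending the buffer if the write runs past the end.
    void writeByteArray(const std::string& bytes) {
        buffer_.replace(offset_, bytes.size(), bytes);
        offset_ += bytes.size();
    }

    std::size_t offset() const { return offset_; }
    const std::string& toString() const { return buffer_; }

private:
    // Private constructor: callers must state their intent via a factory.
    ByteBufferSketch(std::string data, std::size_t offset)
        : buffer_(std::move(data)), offset_(offset) {}

    std::string buffer_;
    std::size_t offset_;
};
```

With this shape the caller's intent is explicit at the call site, so neither use case gets a surprising default offset.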

Inefficient resizing of buffer when buffer length exceeded

When appending to the end of a ByteBuffer, only as many bytes as needed are allocated, which will result in many allocations for large data if the initial buffer size was small.

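The standard way to do this is to grow the buffer geometrically, typically by doubling its size whenever it fills up. This is what, for example, common std::vector implementations do in C++ (the exact growth factor is implementation-defined).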
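The difference between the two strategies can be made concrete with a small simulation. The function below is a hypothetical helper that only counts reallocations; it does not allocate anything itself.

```cpp
#include <cstddef>

// Counts how many reallocations are needed to append `appends` chunks of
// `chunk` bytes to a buffer that starts with `initialCapacity` bytes.
// With doubling == false the buffer grows to exactly the size needed;
// with doubling == true the capacity at least doubles on each growth.
inline std::size_t countReallocations(std::size_t initialCapacity,
                                      std::size_t chunk,
                                      std::size_t appends,
                                      bool doubling) {
    std::size_t capacity = initialCapacity;
    std::size_t used = 0;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < appends; ++i) {
        const std::size_t needed = used + chunk;
        if (needed > capacity) {
            const std::size_t grown = doubling ? capacity * 2 : needed;
            // Never grow to less than what this append requires.
            capacity = grown < needed ? needed : grown;
            ++reallocations;
        }
        used = needed;
    }
    return reallocations;
}
```

Appending 1000 chunks of 4 bytes to a 16-byte buffer takes 996 reallocations when growing exactly, but only 8 when doubling (16 to 4096 bytes).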

Sub buffers

A ByteBuffer which can read from a subsection of another ByteBuffer's memory would be useful for reducing allocations.

For example, Minecraft packet batches contain length-prefixed packet buffers. Each of these buffers currently has to be copied to a new BinaryStream via substr() in order to be decoded. This allocation could be avoided if BinaryStream were capable of directly reading the memory of the original stream.
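A rough sketch of non-copying sub buffers follows. It assumes, purely for simplicity, that each packet is prefixed by a single length byte (the real batch format may use a different length encoding), and `BufferView` and `splitBatch` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A non-owning view into another buffer's memory: no bytes are copied.
// The view is only valid while the original buffer is alive and unmodified.
struct BufferView {
    const char* data;
    std::size_t length;
};

// Splits a batch of length-prefixed packets into views over the batch.
// ASSUMPTION: the prefix is one unsigned byte holding the packet length.
inline std::vector<BufferView> splitBatch(const std::string& batch) {
    std::vector<BufferView> packets;
    std::size_t offset = 0;
    while (offset < batch.size()) {
        const std::size_t length = static_cast<unsigned char>(batch[offset]);
        ++offset;
        if (length > batch.size() - offset) {
            break; // truncated packet: stop rather than read out of bounds
        }
        packets.push_back(BufferView{batch.data() + offset, length});
        offset += length;
    }
    return packets;
}
```

The trade-off is lifetime management: a sub buffer must keep the parent's memory alive (or be invalidated with it), which in PHP would mean holding a reference to the parent.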

API to get current used length of a buffer

This is a surprising omission, forcing the use of toString() which creates a copy of the buffer.

Missing API equivalents

BinaryStream::feof() - tells whether there are bytes left to read
BinaryStream::getRemaining() - effectively substr(buffer, offset)
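The semantics described above could be sketched as follows. `StreamSketch` is a hypothetical minimal type, and returning an empty string at the end of the buffer is a simplifying assumption, not necessarily what BinaryStream does.

```cpp
#include <cstddef>
#include <string>

struct StreamSketch {
    std::string buffer;
    std::size_t offset = 0;

    // True when there are no bytes left to read.
    bool feof() const {
        return offset >= buffer.size();
    }

    // Everything from the current offset to the end of the buffer,
    // i.e. effectively substr(buffer, offset).
    // ASSUMPTION: an exhausted stream yields an empty string here.
    std::string getRemaining() const {
        return offset >= buffer.size() ? std::string() : buffer.substr(offset);
    }
};
```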
