BEVE - Binary Efficient Versatile Encoding

Version 1.0

A high performance, tagged binary data specification. It is comparable to JSON, MessagePack, CBOR, etc., but designed for higher performance and scientific computing.

See Discussions for polls and active development on the specification.

  • Maps to and from JSON
  • Schema less, fully described, like JSON (can be used in documents)
  • Little endian for maximum performance on modern CPUs
  • Blazingly fast, designed for SIMD
  • Future proof, supports large numerical types (such as 128 bit integers and higher)
  • Designed for scientific computing, supports brain floats, matrices, and complex numbers
  • Simple, designed to be easy to integrate

BEVE is designed to be faster on modern hardware than CBOR, BSON, and MessagePack, and it is also more space efficient for typed arrays.

Performance vs MessagePack

The following table lists how many times faster BEVE (using Glaze) reads and writes compared with other libraries and their binary formats.

Test              Libraries (vs Glaze)    Read (Times Faster)    Write (Times Faster)
Test Object       msgpack-c (c++)         1.9X                   13X
double array      msgpack-c (c++)         14X                    50X
float array       msgpack-c (c++)         29X                    81X
uint16_t array    msgpack-c (c++)         73X                    167X

Performance test code

The table below shows binary message size versus BEVE. A positive value means the binary produced is larger than BEVE.

Test              Libraries (vs Glaze)    Message Size
Test Object       msgpack-c (c++)         -3.4%
double array      msgpack-c (c++)         +12%
float array       msgpack-c (c++)         +25%
uint16_t array    msgpack-c (c++)         +50%
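
The typed array rows are consistent with a simple per-element count. MessagePack places a one-byte type marker in front of every fixed-width number, while a BEVE typed array stores only the raw values after a single HEADER and SIZE. The sketch below is a rough model, not the benchmark code: the function name is hypothetical, array headers are ignored, and it assumes every value needs its full width (MessagePack encodes small integers more compactly).

```python
def per_element_overhead(width_in_bytes: int) -> float:
    """Relative size increase from one extra marker byte per element."""
    with_marker = width_in_bytes + 1   # marker byte + raw value
    raw_only = width_in_bytes          # raw value only
    return with_marker / raw_only - 1


for name, width in {"double": 8, "float": 4, "uint16_t": 2}.items():
    print(f"{name}: +{per_element_overhead(width):.1%}")
```

Under these assumptions the model gives +12.5%, +25%, and +50%, in line with the measured rows above.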

Why Tagged Messages?

Flexibility and efficiency

JSON is ubiquitous because it is tagged (has keys), and therefore messages can be sent in part. Furthermore, extending specifications and adding more fields is far easier with tagged messages and unordered mapping. Tags also make the format more human friendly. However, tags are entirely optional, and structs can be serialized as generic arrays.

Endianness

The endianness must be little endian.

File Extension

The standard extension for BEVE files is .beve.

Implementations

C++

  • Glaze (supports JSON and BEVE through the same API)

Matlab

Python

Right Most Bit Ordering

The right most bit is denoted as the first bit, or bit of index 0.

Concerning Compression

Note that BEVE is not a compression algorithm. It uses some bit packing to be more space efficient, but strings and numerical values see no compression. This means that BEVE binary is very compressible, like JSON, and it is encouraged to use compression algorithms like LZ4, Zstandard, Brotli, etc. where size is critical.

Compressed Unsigned Integer

A compressed unsigned integer uses the first two bits to denote the number of bytes used to express an integer. The rest of the bits indicate the integer value.

Wherever all caps SIZE is used in the specification, it refers to a size indicator that uses a compressed unsigned integer.

SIZE refers to the count of array members, object members, or bytes in a string. It does not refer to the number of raw bytes except for UTF-8 strings.

#      Number of Bytes    Integer Value (N)
0      1                  N < 64 [2^6]
1      2                  N < 16384 [2^14]
2      4                  N < 1073741824 [2^30]
3      8                  N < 4611686018427387904 [2^62]
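
The layout above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation: the names encode_size and decode_size are hypothetical, and it assumes the two indicator bits are the two lowest bits of the first byte (following the right most bit ordering rule) with the whole integer written little endian.

```python
# Byte widths selected by the two-bit indicator (0..3).
WIDTHS = (1, 2, 4, 8)


def encode_size(n: int) -> bytes:
    """Encode n as a compressed unsigned integer."""
    if n < 0:
        raise ValueError("SIZE must be non-negative")
    for indicator, width in enumerate(WIDTHS):
        # Two bits are spent on the indicator, so 8*width - 2 remain.
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("value does not fit in 62 bits")


def decode_size(buf: bytes, offset: int = 0):
    """Return (value, next_offset) for the SIZE starting at offset."""
    width = WIDTHS[buf[offset] & 0b11]
    if offset + width > len(buf):
        raise ValueError("buffer ends inside SIZE")
    raw = int.from_bytes(buf[offset:offset + width], "little")
    return raw >> 2, offset + width
```

A decoder has to read the indicator before consuming the value: a count of 63 fits in one byte, while 64 needs two.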

Byte Count Indicator

Wherever all caps BYTE COUNT is used, it describes this mapping.

#      Number of bytes
0      1
1      2
2      4
3      8
4      16
5      32
6      64
7      128
...
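
Each indicator doubles the byte count, so the mapping is a power of two. A small sketch, with hypothetical helper names and assuming the indicator is the three-bit field used in the HEADER:

```python
def byte_count(indicator: int) -> int:
    """Number of bytes for a BYTE COUNT indicator (0 -> 1, 1 -> 2, ...)."""
    if not 0 <= indicator <= 7:
        raise ValueError("indicator must fit in three bits")
    return 1 << indicator


def byte_count_indicator(nbytes: int) -> int:
    """Inverse mapping: indicator for a power-of-two byte width."""
    indicator = nbytes.bit_length() - 1
    if nbytes < 1 or 1 << indicator != nbytes or indicator > 7:
        raise ValueError("byte width must be a power of two up to 128")
    return indicator
```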

Header

Every VALUE begins with a byte header. Any unspecified bits must be set to zero.

Wherever all caps HEADER is used, it describes this header.

The first three bits denote types:

0 -> null or boolean                          0b00000'000
1 -> number                                   0b00000'001
2 -> string                                   0b00000'010
3 -> object                                   0b00000'011
4 -> typed array                              0b00000'100
5 -> generic array                            0b00000'101
6 -> extension                                0b00000'110
7 -> reserved                                 0b00000'111
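
Reading the type from a HEADER byte only requires masking the three lowest bits. A minimal sketch, where the enum and function names are hypothetical:

```python
from enum import IntEnum


class BeveType(IntEnum):
    NULL_OR_BOOLEAN = 0
    NUMBER = 1
    STRING = 2
    OBJECT = 3
    TYPED_ARRAY = 4
    GENERIC_ARRAY = 5
    EXTENSION = 6
    RESERVED = 7


def header_type(header: int) -> BeveType:
    """Extract the type stored in bits 0..2 of a HEADER byte."""
    return BeveType(header & 0b111)
```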

Nomenclature

Wherever DATA is used, it denotes bytes of data without a HEADER.

Wherever VALUE is used, it denotes a binary structure that begins with a HEADER.

Wherever SIZE is used, it refers to a compressed unsigned integer that denotes a count of array members, object members, or bytes in a string.

0 - Null

Null is simply 0

0 - Boolean

The next bit (bit 3) is set to indicate a boolean. The bit after it (bit 4, the fifth bit from the right) is set for true and cleared for false.

false      0b000'01'000
true       0b000'11'000
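
Null and boolean values fit entirely in the HEADER byte. The following sketch shows one way to encode and decode them; the function names are hypothetical.

```python
NULL_HEADER = 0b00000000
FALSE_HEADER = 0b00001000
TRUE_HEADER = 0b00011000


def encode_null_or_bool(value) -> bytes:
    """Encode None, True, or False as a single HEADER byte."""
    if value is None:
        return bytes([NULL_HEADER])
    if value is True:
        return bytes([TRUE_HEADER])
    if value is False:
        return bytes([FALSE_HEADER])
    raise TypeError("expected None or bool")


def decode_null_or_bool(header: int):
    """Decode a HEADER byte whose type bits (0..2) are zero."""
    if header & 0b111 != 0:
        raise ValueError("not a null or boolean header")
    if not header & 0b1000:          # bit 3 clear: null
        return None
    return bool(header & 0b10000)    # bit 4 carries the boolean value
```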

1 - Number

The next two bits of the HEADER indicate whether the number is floating point, signed integer, or unsigned integer.

Floating point types must conform to the IEEE-754 standard.

0 -> floating point      0b000'00'001
1 -> signed integer      0b000'01'001
2 -> unsigned integer    0b000'10'001

The next three bits of the HEADER are used as the BYTE COUNT.

Note: brain floats use the byte count indicator for 1 byte (indicator 0), even though they use 2 bytes per value. This slot is used because float8_t is not supported and not typically useful.

See Fixed width integer types for integer specification.

bfloat16_t    0b000'00'001 // brain float
float16_t     0b001'00'001
float32_t     0b010'00'001 // float
float64_t     0b011'00'001 // double
float128_t    0b100'00'001
int8_t        0b000'01'001
int16_t       0b001'01'001
int32_t       0b010'01'001
int64_t       0b011'01'001
int128_t      0b100'01'001
uint8_t       0b000'10'001
uint16_t      0b001'10'001
uint32_t      0b010'10'001
uint64_t      0b011'10'001
uint128_t     0b100'10'001
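
The table can be reproduced by packing three fields into one byte: the BYTE COUNT indicator in bits 5..7, the number class in bits 3..4, and the type (1) in bits 0..2. The sketch below uses hypothetical names and covers only widths that Python's struct module and int.to_bytes handle directly, so bfloat16_t and float128_t values are not encoded here.

```python
import struct

KIND_BITS = {"float": 0, "int": 1, "uint": 2}
FLOAT_FORMATS = {2: "<e", 4: "<f", 8: "<d"}  # little-endian IEEE-754


def number_header(kind: str, nbytes: int) -> int:
    """Build the HEADER byte for a number of the given class and width."""
    indicator = nbytes.bit_length() - 1
    if nbytes < 1 or 1 << indicator != nbytes or indicator > 7:
        raise ValueError("byte width must be a power of two up to 128")
    return (indicator << 5) | (KIND_BITS[kind] << 3) | 0b001


def encode_number(value, kind: str, nbytes: int) -> bytes:
    """Encode HEADER followed by the little-endian value."""
    header = bytes([number_header(kind, nbytes)])
    if kind == "float":
        return header + struct.pack(FLOAT_FORMATS[nbytes], value)
    return header + value.to_bytes(nbytes, "little", signed=(kind == "int"))
```

For example, encode_number(30, "int", 4) produces the int32_t header 0b010'01'001 followed by the four bytes 1e 00 00 00.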

2 - Strings

Strings must be encoded with UTF-8.

Layout: HEADER | SIZE | DATA

Strings as Object Keys or Typed String Arrays

When strings are used as keys in objects or typed string arrays the HEADER is not included.

Layout: SIZE | DATA
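
Both layouts can be sketched with one helper for the headerless form and a wrapper that prepends the HEADER. The names are hypothetical, and encode_size assumes the compressed unsigned integer stores its indicator in the two lowest bits of a little-endian integer. Note that SIZE counts UTF-8 bytes, not characters.

```python
STRING_HEADER = 0b00000010


def encode_size(n: int) -> bytes:
    """Compressed unsigned integer: two indicator bits, then the value."""
    for indicator, width in enumerate((1, 2, 4, 8)):
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("SIZE out of range")


def encode_key(text: str) -> bytes:
    """Headerless string, as used for object keys: SIZE | DATA."""
    data = text.encode("utf-8")
    return encode_size(len(data)) + data


def encode_string(text: str) -> bytes:
    """Standalone string VALUE: HEADER | SIZE | DATA."""
    return bytes([STRING_HEADER]) + encode_key(text)
```

A string of 64 bytes or more is the first case where SIZE occupies two bytes, so a decoder must check the indicator bits instead of assuming a single-byte SIZE.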

3 - Object

The next two bits of the HEADER indicate the type of key.

0 -> string
1 -> signed integer
2 -> unsigned integer

For integer keys the next three bits of the HEADER indicate the BYTE COUNT.

An object KEY must not contain a HEADER as the type of the key has already been defined.

Layout: HEADER | SIZE | KEY[0] | VALUE[0] | ... KEY[N] | VALUE[N]
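
A minimal sketch of the string-keyed case follows. It handles only string, null, boolean, and nested object values to stay short; the function names are hypothetical, and encode_size assumes the compressed unsigned integer stores its indicator in the two lowest bits of a little-endian integer.

```python
OBJECT_HEADER = 0b00000011  # type 3, key type 0 (string)


def encode_size(n: int) -> bytes:
    for indicator, width in enumerate((1, 2, 4, 8)):
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("SIZE out of range")


def encode_key(text: str) -> bytes:
    """Keys carry no HEADER: SIZE | DATA."""
    data = text.encode("utf-8")
    return encode_size(len(data)) + data


def encode_value(value) -> bytes:
    if value is None:
        return b"\x00"
    if value is True:
        return b"\x18"
    if value is False:
        return b"\x08"
    if isinstance(value, str):
        return b"\x02" + encode_key(value)
    if isinstance(value, dict):
        return encode_object(value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_object(obj: dict) -> bytes:
    """HEADER | SIZE (member count) | KEY | VALUE | ..."""
    out = bytearray([OBJECT_HEADER])
    out += encode_size(len(obj))
    for key, value in obj.items():
        out += encode_key(key)
        out += encode_value(value)
    return bytes(out)
```

Here SIZE is the number of members, not a byte length, so a reader must decode each KEY and VALUE in turn to find the end of the object.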

4 - Typed Array

The next two bits indicate the type stored in the array:

0 -> floating point
1 -> signed integer
2 -> unsigned integer
3 -> boolean or string

For integral and floating point types, the next three bits of the type header are the BYTE COUNT.

For boolean or string types the next bit indicates whether the type is a boolean or a string.

0 -> boolean // packed as single bits to the nearest byte
1 -> string // an array of strings (not an array of characters)

Layout: HEADER | SIZE | DATA
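
For numeric element types, the elements are written back to back with no per-element HEADER. The sketch below uses hypothetical names, supports only the widths that Python's struct module provides, and assumes the compressed unsigned integer stores its indicator in the two lowest bits of a little-endian integer.

```python
import struct

KIND_BITS = {"float": 0, "int": 1, "uint": 2}
STRUCT_CODES = {
    ("float", 2): "e", ("float", 4): "f", ("float", 8): "d",
    ("int", 1): "b", ("int", 2): "h", ("int", 4): "i", ("int", 8): "q",
    ("uint", 1): "B", ("uint", 2): "H", ("uint", 4): "I", ("uint", 8): "Q",
}


def encode_size(n: int) -> bytes:
    for indicator, width in enumerate((1, 2, 4, 8)):
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("SIZE out of range")


def encode_typed_array(values, kind: str, nbytes: int) -> bytes:
    """HEADER | SIZE (element count) | raw little-endian elements."""
    code = STRUCT_CODES[(kind, nbytes)]
    indicator = nbytes.bit_length() - 1
    header = (indicator << 5) | (KIND_BITS[kind] << 3) | 0b100
    body = struct.pack(f"<{len(values)}{code}", *values)
    return bytes([header]) + encode_size(len(values)) + body
```

Three uint16_t values therefore take 1 + 1 + 6 = 8 bytes in total.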

Boolean Arrays

Boolean arrays are stored using single bits for booleans and packed to the nearest byte.

String Arrays

String arrays do not include the string HEADER for each element.

Layout: HEADER | SIZE | string[0] | ... string[N]
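
A string array can be sketched as follows. The HEADER value is derived from the rules above (type 4, element type 3, and the next bit set for strings); the function names are hypothetical, and encode_size assumes the compressed unsigned integer stores its indicator in the two lowest bits of a little-endian integer.

```python
# bits 0..2 = 4 (typed array), bits 3..4 = 3 (boolean or string), bit 5 = 1 (string)
STRING_ARRAY_HEADER = (1 << 5) | (3 << 3) | 0b100


def encode_size(n: int) -> bytes:
    for indicator, width in enumerate((1, 2, 4, 8)):
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("SIZE out of range")


def encode_string_array(strings) -> bytes:
    """HEADER | SIZE (element count) | (SIZE | DATA) for each string."""
    out = bytearray([STRING_ARRAY_HEADER])
    out += encode_size(len(strings))
    for text in strings:
        data = text.encode("utf-8")
        out += encode_size(len(data))  # byte length of this string
        out += data
    return bytes(out)
```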

5 - Generic Array

Generic arrays expect elements to have headers.

Layout: HEADER | SIZE | VALUE[0] | ... VALUE[N]
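
Because every element carries its own HEADER, a generic array can mix types. The sketch below handles only string, null, boolean, and nested list elements; the function names are hypothetical, and encode_size assumes the compressed unsigned integer stores its indicator in the two lowest bits of a little-endian integer.

```python
GENERIC_ARRAY_HEADER = 0b00000101


def encode_size(n: int) -> bytes:
    for indicator, width in enumerate((1, 2, 4, 8)):
        if n < 1 << (8 * width - 2):
            return ((n << 2) | indicator).to_bytes(width, "little")
    raise ValueError("SIZE out of range")


def encode_value(value) -> bytes:
    if value is None:
        return b"\x00"
    if value is True:
        return b"\x18"
    if value is False:
        return b"\x08"
    if isinstance(value, str):
        data = value.encode("utf-8")
        return b"\x02" + encode_size(len(data)) + data
    if isinstance(value, list):
        return encode_generic_array(value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_generic_array(items: list) -> bytes:
    """HEADER | SIZE (element count) | VALUE | VALUE | ..."""
    out = bytearray([GENERIC_ARRAY_HEADER])
    out += encode_size(len(items))
    for item in items:
        out += encode_value(item)
    return bytes(out)
```

Compared with a typed string array, this form spends one extra byte per element on the string HEADER.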

See extensions.md for additional extension specifications. These are considered to be a formal part of the BEVE specification, but are not expected to be as broadly implemented.

beve's Issues

TypeError: Cannot convert undefined or null to object

JSON and most languages do not support undefined values, and the write_beve function in the BEVE JavaScript implementation throws this error when an undefined value is passed to it. This is a potential bug in the JavaScript implementation and needs to be addressed.

Suggestions:

  1. Remove object properties whose value is undefined (like browsers do in REST API requests)
  2. Convert array elements whose value is undefined to null

I was able to solve the problem by adding a new "if" branch under write_value.

else if (value === null) {
        // Null is a single HEADER byte with every bit set to zero.
        const header = 0b00000000;
        writer.append_uint8(header);
    }

does not parse correctly after compressing large string size

[JavaScript] I get this error if the string size is 64 or more

Input:

const sampleData = {
    postId: "9776bfe3-6a5e-4b5c-b93d-4037447a0b4a",
    title: "Dedecor aspernatur defessus tamdiu amet amita facere tametsi.aa",
    content: "Commemoro vomica cupressus coepi virga demitto. Thesis ipsa dencio acceptus vociferor victus quasi ventito. Corporis tempora territo arcus.",
    createdAt: "Sun Jan 14 2024",
    updatedAt: "Thu Jun 27 2024",
    name: "John Doe",
    age: 30,
    isActive: true,
    isMarried: null,
    isStudent: false,
    courses: ["Math", "Science", "History"],
    grades: [95.3, 88, 92.7, "A+"],
    address: {
        street: "123 Main St",
        city: "Anytown",
        zip: "12345"
    }
};

Output:

{
  postId: "9776bfe3-6a5e-4b5c-b93d-4037447a0b4a",
  title: "Dedecor aspernatur defessus tamdiu amet amita facere tametsi.aa",
  content: "ommemoro vomica cupressus coepi virga demitto. Thesis ipsa deee$createdAt\u0002<Sun Jan 14 2024$updatedAt\u0002<Thu Jun 27 2024\u0010name\u0002 John Doe\fageI\u001e\u0000\u0000\u0000 isActive\u0018$isMarried\u0000$isStudent\b\u001ccourses\u0005\f\u0002\u0010Math\u0002\u001cScience\u0002\u001cHistory\u0018grades\u0005\u0010a33333�W@IX\u0000\u0000\u0000a�����,W@\u0002\bA+\u001caddress\u0003\f\u0018street\u0002,123 Main St\u0010city\u0002\u001cAnytown\fzip\u0002\u001412345",
  "": null,
}

Clarify role of SIZE for arrays and objects

For arrays and objects, the SIZE field can be interpreted in one of two ways:

  1. the number of array elements or object members, or
  2. the total number of bytes of all array elements or object members.

I'm assuming the first option is used, but would greatly prefer and suggest this to be explicitly stated in README.md.
