colfer's Introduction

Colfer

Colfer is a binary serialization format optimized for speed and size.

The project's compiler colf(1) generates source code from schema definitions to marshal and unmarshal data structures.

This is free and unencumbered software released into the public domain. The format is inspired by Protocol Buffers.

Language Support

  • C, ISO/IEC 9899:2011 compliant a.k.a. C11, C++ compatible
  • Go, a.k.a. golang
  • Java, Android compatible
  • JavaScript, a.k.a. ECMAScript, NodeJS compatible
  • 🚧 Gergely Bódi realised a functional Dart port.
  • 🚧 Karthik Kumar Viswanathan has a Python alternative under construction.

Features

  • Simple and straightforward in use
  • No dependencies other than the core library
  • Both faster and smaller than the competition
  • Robust against malicious input
  • Maximum of 127 fields per data structure
  • No support for enumerations
  • Framed; suitable for concatenation/streaming

TODOs

  • Rust and Python support
  • Protocol revision

Use

Download a prebuilt compiler or run go get -u github.com/pascaldekloe/colfer/cmd/colf to make one yourself. Homebrew users can also brew install colfer.

The command prints its own manual when invoked without arguments.

NAME
	colf — compile Colfer schemas

SYNOPSIS
	colf [-h]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] C [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-s expression] [-l expression] Go [file ...]
	colf [-vf] [-b directory] [-p package] [-t files] \
		[-x class] [-i interfaces] [-c file] \
		[-s expression] [-l expression] Java [file ...]
	colf [-vf] [-b directory] [-p package] \
		[-s expression] [-l expression] JavaScript [file ...]

DESCRIPTION
	The output is source code for either C, Go, Java or JavaScript.

	For each operand that names a file of a type other than
	directory, colf reads the content as schema input. For each
	named directory, colf reads all files with a .colf extension
	within that directory. If no operands are given, the contents of
	the current directory are used.

	A package definition may be spread over several schema files.
	The directory hierarchy of the input is not relevant to the
	generated code.

OPTIONS
  -b directory
    	Use a base directory for the generated code. (default ".")
  -c file
    	Insert a code snippet from a file.
  -f	Normalize the format of all schema input on the fly.
  -h	Prints the manual to standard error.
  -i interfaces
    	Make all generated classes implement one or more interfaces.
    	Use commas as a list separator.
  -l expression
    	Set the default upper limit for the number of elements in a
    	list. The expression is applied to the target language under
    	the name ColferListMax. (default "64 * 1024")
  -p package
    	Compile to a package prefix.
  -s expression
    	Set the default upper limit for serial byte sizes. The
    	expression is applied to the target language under the name
    	ColferSizeMax. (default "16 * 1024 * 1024")
  -t files
    	Supply custom tags with one or more files. Use commas as a list
    	separator. See the TAGS section for details.
  -v	Enable verbose reporting to standard error.
  -x class
    	Make all generated classes extend a super class.

TAGS
	Tags, a.k.a. annotations, are source code additions for structs
	and/or fields. Input for the compiler can be specified with the
	-t option. The data format is line-oriented.

		<line> :≡ <qual> <space> <code> ;
		<qual> :≡ <package> '.' <dest> ;
		<dest> :≡ <struct> | <struct> '.' <field> ;

	Lines starting with a '#' are ignored (as comments). Java output
	can take multiple tag lines for the same struct or field. Each
	code line is applied in order of appearance.

EXIT STATUS
	The command exits 0 on success, 1 on error and 2 when invoked
	without arguments.

EXAMPLES
	Compile ./io.colf with compact limits as C:

		colf -b src -s 2048 -l 96 C io.colf

	Compile ./*.colf with a common parent as Java:

		colf -p com.example.model -x com.example.io.IOBean Java

BUGS
	Report bugs at <https://github.com/pascaldekloe/colfer/issues>.

	Text validation is not part of the marshalling and unmarshalling
	process. C and Go just pass any malformed UTF-8 characters. Java
	and JavaScript replace unmappable content with the '?' character
	(ASCII 63).

SEE ALSO
	protoc(1), flatc(1)

It is recommended to commit the generated source code into the respective version control to preserve build consistency and minimise the need for compiler installations. Alternatively, you may use the Maven plugin.

<plugin>
	<groupId>net.quies.colfer</groupId>
	<artifactId>colfer-maven-plugin</artifactId>
	<version>1.11.2</version>
	<configuration>
		<packagePrefix>com/example</packagePrefix>
	</configuration>
</plugin>

Schema

Data structures are defined in .colf files. The format is quite self-explanatory.

// Package demo offers a demonstration.
// These comment lines will end up in the generated code.
package demo

// Course is the grounds where the game of golf is played.
type course struct {
	ID    uint64
	name  text
	holes []hole
	image binary
	tags  []text
}

type hole struct {
	// Lat is the latitude of the cup.
	lat float64
	// Lon is the longitude of the cup.
	lon float64
	// Par is the difficulty index.
	par uint8
	// Water marks the presence of water.
	water bool
	// Sand marks the presence of sand.
	sand bool
}

See what the generated code looks like in C, Go, Java or JavaScript.

The following table shows how Colfer data types are applied per language.

Colfer     C                     Go            Java          JavaScript
bool       char                  bool          boolean       Boolean
uint8      uint8_t               uint8         byte †        Number
uint16     uint16_t              uint16        short †       Number
uint32     uint32_t              uint32        int †         Number
uint64     uint64_t              uint64        long †        Number ‡
int32      int32_t               int32         int           Number
int64      int64_t               int64         long          Number ‡
float32    float                 float32       float         Number
float64    double                float64       double        Number
timestamp  timespec              time.Time ††  time.Instant  Date + Number
text       const char* + size_t  string        String        String
binary     uint8_t* + size_t     []byte        byte[]        Uint8Array
list       * + size_t            slice         array         Array
  • † signed representation of unsigned data, i.e. may overflow to negative.
  • ‡ range limited to [1 - 2⁵³, 2⁵³ - 1]
  • †† timezone not preserved
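In Java, the unsigned Colfer types marked † arrive in signed primitives, so a uint8 of 200 reads back as -56. The sketch below assumes you only need to read such values correctly. It converts them with the standard JDK 8+ unsigned helpers. The class name UnsignedFields is hypothetical and is not part of the generated code.

```java
// Hypothetical helper class: recovers unsigned Colfer values from the signed
// Java types marked † in the table. All conversions are standard JDK 8+ methods.
final class UnsignedFields {
    // uint8 held in a byte: 200 is stored as -56.
    static int uint8(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // uint16 held in a short.
    static int uint16(short s) {
        return Short.toUnsignedInt(s);
    }

    // uint32 held in an int.
    static long uint32(int i) {
        return Integer.toUnsignedLong(i);
    }

    // uint64 held in a long has no wider primitive, so render it as text.
    static String uint64(long l) {
        return Long.toUnsignedString(l);
    }
}
```

For arithmetic on uint64 values, Long.compareUnsigned and Long.divideUnsigned avoid the sign problem without any conversion.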

Lists may contain floating points, text, binaries or data structures.

Security

Colfer is suited for untrusted data sources such as network I/O or bulk streams. Marshalling and unmarshalling come with built-in size protection to ensure predictable memory consumption. The format prevents memory bombs by design.

The marshaller may not produce malformed output, regardless of the data input. In no event may the unmarshaller read outside the boundaries of a serial. Fuzz testing has not revealed any vulnerabilities yet. More computing power for fuzzing is welcome.
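The size protection described above comes down to one rule: check a decoded length against a limit before allocating anything. This minimal sketch makes several assumptions. It assumes a protobuf-style varint for the length prefix (7 bits per byte, least significant group first, high bit as continuation). The class and method names are hypothetical. The limit mirrors the documented default of colf's -s option (ColferSizeMax). This is not the generated unmarshal code.

```java
// Hypothetical length reader: decodes a varint at buf[offset] and rejects sizes
// above the limit before the caller allocates anything for the payload.
final class SizeGuard {
    // Mirrors the documented default of colf's -s option ("16 * 1024 * 1024").
    static final long COLFER_SIZE_MAX = 16L * 1024 * 1024;

    static int readSize(byte[] buf, int offset) {
        long x = 0;
        for (int shift = 0; shift < 63; shift += 7) {
            if (offset >= buf.length) {
                throw new IllegalArgumentException("unexpected end of data");
            }
            int b = buf[offset++] & 0xff;
            x |= (long) (b & 0x7f) << shift;
            if (b < 0x80) {
                // Last byte of the varint: enforce the limit before returning.
                if (x > COLFER_SIZE_MAX) {
                    throw new IllegalArgumentException("size " + x + " exceeds " + COLFER_SIZE_MAX);
                }
                return (int) x;
            }
        }
        throw new IllegalArgumentException("varint too long");
    }
}
```

Because the check runs before any `new byte[size]`, a forged length prefix costs only a few bytes of parsing rather than a large allocation.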

Compatibility

Name changes do not affect the serialization format. Deprecated fields should be renamed to clearly discourage their use. For backwards compatibility new fields must be added to the end of colfer structs. Thus the number of fields can be seen as the schema version.

Performance

Colfer aims to be the fastest and the smallest format without compromising on reliability. See the benchmark wiki for a comparison. Suboptimal performance is treated like a bug.

colfer's People

Contributors

dependabot[bot], frantic, guilt, katrinwab, kbarrette, magiconair, nim4, oliviergfr, pascaldekloe

colfer's Issues

Support fixed-size arrays

For most languages this prevents memory allocation as the content can be embedded into the struct. There are many good uses such as IPv6 addresses, UUIDs and binary signatures.

[JAVA] NPE while comparing two objects with .equals

This is the equals method of my class (compiled with colfer):

public final boolean equals(Metadata o) {
    return o != null 
        && o.getClass() == Metadata.class 
        && this.creationTime == null ? o.creationTime == null : this.creationTime.equals(o.creationTime) 
        && this.statistic == null ? o.statistic == null : this.statistic.equals(o.statistic) 
        && this.source == null ? o.source == null : this.source.equals(o.source) 
        && this.agent == null ? o.agent == null : this.agent.equals(o.agent) 
        && this.accessControl == null ? o.accessControl == null : this.accessControl.equals(o.accessControl);
  }

If o == null, I get an NPE at the creationTime line. As I understand it, the conditional operator ?: has lower precedence than &&, so each ternary takes the whole && chain before it as its condition. When o is null, that condition is false. The else branch, this.creationTime.equals(o.creationTime) && ..., then dereferences o and triggers the NPE.

I tried adding parentheses so that each null check is evaluated as a self-contained comparison, and it works:

public final boolean equals(Metadata o) {
    return o != null 
        && o.getClass() == Metadata.class 
        && (this.creationTime == null ? o.creationTime == null : this.creationTime.equals(o.creationTime))
        && (this.statistic == null ? o.statistic == null : this.statistic.equals(o.statistic))
        && (this.source == null ? o.source == null : this.source.equals(o.source))
        && (this.agent == null ? o.agent == null : this.agent.equals(o.agent))
        && (this.accessControl == null ? o.accessControl == null : this.accessControl.equals(o.accessControl));
  }
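The parenthesized version above fixes the precedence problem. A shorter null-safe variant uses java.util.Objects.equals, which returns true for two nulls and never dereferences a null argument. This sketch uses a hypothetical stand-in for the generated Metadata class, and the field types are assumed.

```java
import java.util.Objects;

// Hypothetical stand-in for the generated Metadata class; the field types are assumed.
final class Metadata {
    Object creationTime, statistic, source, agent, accessControl;

    // Objects.equals(a, b) is true when both are null and never dereferences a null.
    public boolean equals(Metadata o) {
        return o != null
            && o.getClass() == Metadata.class
            && Objects.equals(this.creationTime, o.creationTime)
            && Objects.equals(this.statistic, o.statistic)
            && Objects.equals(this.source, o.source)
            && Objects.equals(this.agent, o.agent)
            && Objects.equals(this.accessControl, o.accessControl);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Metadata && equals((Metadata) o);
    }

    @Override
    public int hashCode() {
        return Objects.hash(creationTime, statistic, source, agent, accessControl);
    }
}
```

Each && operand is now a plain method call, so there is no ternary left whose precedence could swallow the o != null guard.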

Problem with int64 type in JavaScript API

We have a problem: when Maven generates the JavaScript API, our int64 parameters get the same, incorrect field index in the marshal code. Please fix it as soon as possible; it is very important for us.

For example, the serverEventDatetime parameter is int64.

this.Journal.prototype.marshal = function () {
    var segs = [];
    if (this.serverEventDatetime) {


        var seg = [4]; // PROBLEM IN THIS LINE, MUST BE: var seg = [0];


        if (this.serverEventDatetime < 0) {
            seg[0] |= 128;
            if (this.serverEventDatetime < Number.MIN_SAFE_INTEGER)
                fail('colfer: api/Journal field serverEventDatetime exceeds Number.MIN_SAFE_INTEGER');
            encodeVarint(seg, -this.serverEventDatetime);
        } else {
            if (this.serverEventDatetime > Number.MAX_SAFE_INTEGER)
                fail('colfer: api/Journal field serverEventDatetime exceeds Number.MAX_SAFE_INTEGER');
            encodeVarint(seg, this.serverEventDatetime);
        }
        segs.push(seg);
    }
    if (this.subsystemCode) {
        var utf = encodeUTF8(this.subsystemCode);
        var seg = [1];
        encodeVarint(seg, utf.length);
        segs.push(seg);
        segs.push(utf)
    }
    if (this.code) {
        var utf = encodeUTF8(this.code);
        var seg = [2];
        encodeVarint(seg, utf.length);
        segs.push(seg);
        segs.push(utf)
    }
// ................................................................................
this.Journal.prototype.unmarshal = function (data) {
// ................................................................................
// In unmarshal, the header check is correct:
    if (header == 0) {
        var x = readVarint();
        if (x < 0) fail('colfer: api/IJournal field serverEventDatetime exceeds Number.MAX_SAFE_INTEGER');
        this.serverEventDatetime = x;
        readHeader();
    } else if (header == (0 | 128)) {
        var x = readVarint();
        if (x < 0) fail('colfer: api/Journal field serverEventDatetime exceeds Number.MAX_SAFE_INTEGER');
        this.serverEventDatetime = -1 * x;
        readHeader();
    }

    if (header == 1) {
        var size = readVarint();
        if (size < 0)
            fail('colfer: api.Journal.subsystemCode size exceeds Number.MAX_SAFE_INTEGER');
        else if (size > colferSizeMax)
            fail('colfer: api.Journal.subsystemCode size ' + size + ' exceeds ' + colferSizeMax + ' UTF-8 bytes');

        var start = i;
        i += size;
        if (i > data.length) fail(EOF);
        this.subsystemCode = decodeUTF8(data.subarray(start, i));
        readHeader();
    }

    if (header == 2) {
        var size = readVarint();
        if (size < 0)
            fail('colfer: api.Journal.code size exceeds Number.MAX_SAFE_INTEGER');
        else if (size > colferSizeMax)
            fail('colfer: api.Journal.code size ' + size + ' exceeds ' + colferSizeMax + ' UTF-8 bytes');

        var start = i;
        i += size;
        if (i > data.length) fail(EOF);
        this.code = decodeUTF8(data.subarray(start, i));
        readHeader();
    }
// ................................................................................

Is this used in production anywhere?

Hi, I work for a large mobile app company that is interested in replacing JSON with something faster. The one disappointing thing is that Colfer appears to target only Go and JS. We can't use those for iOS or Android, at least not without a fair amount of finagling. Are there any plans to port it to C/C++? Also, do any companies use it? A list of companies that use it, or accounts of their experiences, would really sell the format. How much security auditing has occurred?

The numbers on https://github.com/eishay/jvm-serializers/wiki are tantalizing, so I'm curious to hear more :)

Support for "oneof" enum-based types

Protobuf allows creation of Messages which can hold one element of a certain set.

message Message {
    oneof value {  // <-- creates a field and an enum with {A, B}
        MsgA A = 1;
        MsgB B = 2;
    }
}
message MsgA {
   ....
}
message MsgB {
   ....
}

which allows a switch over the generated enum field:

Message m = Message.parseFrom(inputStream);
switch (m.getValueCase()) {
    case A: m.getA(); // ...
        break;
    case B: m.getB(); // ...
        break;
}

This is useful because Java style checkers can warn when a switch statement does not cover every enum constant.

From my understanding, Colfer supports this with something like:

type Message struct {
	head text
	body binary
}
type A struct {
...
}

and constructing Java objects like this:

A a = new A(); a.set(....)....;
byte[] b = new byte[1024];
int n = a.marshal(b, 0);
Message m = new Message(); m.setHead(A.class.getSimpleName()); m.setBody(java.util.Arrays.copyOf(b, n));

which in turn allows switching over the class name string:

switch (m.getHead()) {
    case "A": new A().unmarshal(m.getBody(), 0); break;
}

But I find creating these Java objects a little cumbersome just so I can switch over them.
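Colfer has no oneof today. One possible workaround is to restrict head to the constant names of a Java enum. The switch on the receiving side then runs over enum constants, so style checkers can flag missing cases much as they do for protobuf's generated ValueCase. Kind and route are hypothetical names, and the returned strings stand in for the real unmarshal calls.

```java
// Hypothetical enum listing every payload type that may appear in Message.head.
enum Kind { A, B }

final class Dispatch {
    // Maps the head string onto Kind; Kind.valueOf throws IllegalArgumentException
    // for an unknown head, so unexpected payloads fail loudly.
    static String route(String head) {
        switch (Kind.valueOf(head)) {
            case A:
                return "unmarshal body as A";
            case B:
                return "unmarshal body as B";
            default:
                throw new IllegalStateException("unhandled kind " + head);
        }
    }
}
```

On the sending side, m.setHead(Kind.A.name()) would replace A.class.getSimpleName(). The set of legal heads is then defined in one place.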

EBNF Spec Needed

The current spec is ambiguous. For example:

"Data structures consist of zero or more field value definitions followed by a termination byte 0x7f."

  • Data structures: why the plural? Is a Colfer serial one data structure, or consecutive data structures in a stream?
  • Is the whole serial terminated by 0x7f, or is each field followed by 0x7f? (I guess the former.)

Could you please write an EBNF spec? EBNF is unambiguous and takes fewer characters than prose descriptions. A good example is The Go Programming Language Specification.
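Until such a spec exists, here is one reading of the header byte, consistent with the quoted sentence and with the 127-field limit from the feature list. This sketch assumes the low 7 bits hold the field index and the high bit (0x80) is a per-type flag; the generated JavaScript in the int64 issue sets this flag for negative values. It also assumes the single byte 0x7f ends a struct. The Header class is hypothetical and does not represent the official format definition.

```java
// Hypothetical view of a Colfer header byte under the assumptions stated above.
final class Header {
    static final int END = 0x7f;

    static int index(int header) {
        return header & 0x7f;
    }

    static boolean flag(int header) {
        return (header & 0x80) != 0;
    }

    static boolean isEnd(int header) {
        return header == END;
    }

    // Indexes 0..126 leave 0x7f free as the terminator, hence at most 127 fields.
    static int make(int index, boolean flag) {
        if (index < 0 || index > 126) {
            throw new IllegalArgumentException("field index out of range: " + index);
        }
        return flag ? (index | 0x80) : index;
    }
}
```

Under this reading, 0x7f terminates one struct, and the next byte in a stream can start the next serial. That would explain the "Framed; suitable for concatenation/streaming" feature.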

No JavaScript Benchmarks?

I would love to see how the built-in JSON and Protobuf fare against Colfer. It's kind of unfair that C, Go and Java make it to the benchmark games while JS stays at home. Could it be that the JS version couldn't live up to the Colfer performance promise?

Comments in generated code

Hi,

Colfer puts the name of the schema file into a comment in the code it generates via the Maven plugin. I am using a Windows machine, and Colfer does not escape the "\" in the file path, so the Java compiler then complains about an 'illegal escape character'. Is there a way to disable this comment, or to have Colfer escape the path before putting it in the comment?

Thx
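One likely cause is that javac treats a backslash followed by the letter u as the start of a Unicode escape, even inside comments. A Windows path segment such as "users" after a separator can therefore break compilation. The sketch below shows one simple alternative to escaping: normalize the separators before the path is written into the comment. CommentPath and forComment are hypothetical names, and colf's actual handling may differ.

```java
// Hypothetical helper: prepares a schema path for use inside a generated Java comment.
final class CommentPath {
    // Replaces every backslash with '/', so no backslash reaches the comment
    // and javac has nothing to interpret as an escape sequence.
    static String forComment(String path) {
        return path.replace('\\', '/');
    }
}
```

Forward slashes remain readable as a Windows path, and unlike doubling the backslashes they cannot interact with any escape processing.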

Java String fields explicitly set to null throw a NullPointerException when marshal() is called

I am using the latest (1.11.2) Colfer Maven plugin to generate Java objects for serialization. I noticed that a NullPointerException is thrown when String fields are explicitly set to null. The generated code contains the following for the String field title:

if (! this.title.isEmpty()) {

Can this be changed to check for null first before calling isEmpty()? Something like
if (this.title!=null && ! this.title.isEmpty()) {

Java Builder pattern for code chaining

Currently all setValue(Value x) methods return void. If they returned the object, method chaining would become possible.

Current example:

SomeClass sc = new SomeClass();
sc.setValue(42);
sc.setStringValue("string");

Builder pattern example:

SomeClass sc = new SomeClass().setValue(42).setStringValue("string");
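The requested chaining only needs each setter to return this. Here is a minimal sketch with a hypothetical SomeClass. It does not describe what colf currently generates.

```java
// Hypothetical class with fluent setters: each setter returns this,
// so calls can be chained in a single expression.
final class SomeClass {
    private int value;
    private String stringValue = "";

    SomeClass setValue(int value) {
        this.value = value;
        return this;
    }

    SomeClass setStringValue(String stringValue) {
        this.stringValue = stringValue;
        return this;
    }

    int getValue() {
        return value;
    }

    String getStringValue() {
        return stringValue;
    }
}
```

One trade-off: some JavaBeans-based tools expect setters to return void and may not recognize these methods as property setters. A separate family such as withValue(...), kept alongside the void setters, would avoid that conflict.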

Partial unmarshal in Java

Hello,

A question: I have my serialized Colfer byte array inside a ByteBuffer, and I would like to access only some specific fields. Is this possible without unmarshalling everything, or should I write a customized version of the generated unmarshal method?

Regards
Tamer

Segmentation fault in a unique case

Hi,
I hope all are great.

Scenario:
I have the following mapping in my chappee.colf file.

package chappee
type mappee struct {
	maptype   uint32
	msgtype   uint32
	callmsg   text
	debugleve text
	minseed   uint64
	maxseed   uint64
}

During packing, if I set the last two values, packing succeeds.
If I set only the first four (or any of them) and leave the last two unset, I get a segmentation fault.
It would be great if Colfer itself took care of values that are not set.

It can be taken as a bug, or it can be taken as a feature request.

Cheers,
infoginx.com

How to use the library for Java

Please help me. I looked at the read/write statistics of this library versus others, and it performs well, but I don't understand how to use it. Thanks for reading.

Why not base colf's Java output on annotation processing?

@Colfer could be used to mark POJO classes. An annotation processor would then scan each class and generate the code for serialization and deserialization.

This is the most popular way to do it in Java. Annotation processors are a standard part of the Java compiler.

Colfer has great potential on Android, and annotation processors are very common in Android builds.

Document uint16 better.

Why do uint16's compressed and uncompressed encodings flip the meaning of the (index | 0x80) flag? The flag works the opposite way from uint32 and uint64, which is a bit confusing.

Compiler generates non-compiling code (Java)

I tried it with the downloaded compiler and this colf file:

package asdf

type PrimArraysColfer struct {
	id             []uint64
	time           []uint64
	valueInteger   []int32
	valueInteger64 []int64
	valueFloat     []float32
	valueDouble    []float64
}

Integer overflow in generated Go code on 32 bit

Hi,
I used 'go get' to get the Colfer tool yesterday and today tried to generate code for a simple schema. Unfortunately, the generated code does not compile on 32-bit x86 due to uint overflow errors. Specifically, the timestamp causes the issue:
if v := o.From; !v.IsZero() { if s := uint(v.Unix()); s < 1<<32 { l += 9 } else { l += 13 } }

Colfer.go:160: constant 4294967296 overflows uint

I'm running go1.7.3 linux/386 if that helps. I've attached the .colf file and I'll gladly help test the fix!

Br.
Andreas

schema.colf.zip

Suggestion: Continuous Fuzzing

Hi, I'm Yevgeny Pats, founder of Fuzzit, a continuous fuzzing as a service platform.

I saw that you implemented Fuzz targets but they are currently not running as part of the CI.

We have a free plan for OSS and I would be happy to contribute a PR if that's interesting.
The PR will include the following

  • Continuous Fuzzing of master branch which will generate new corpus and look for new crashes
  • Regression testing on every PR, which runs the fuzzers through the whole generated corpus and the crashes fixed in the previous step. This will prevent new or old bugs from creeping into master.

You can see our basic example here and you can see an example of "in the wild" integration here.

Let me know if this is something worth working on.

Cheers,
Yevgeny

Rust support

Hello,

I wanted to ask about rust support.

Considering how such code is usually generated with macros, should we just create a crate that implements only the protocol (serialization and deserialization) logic?
Also, conforming with Serde to get automatic ser/deser of rust struct would be a very nice addition as well.

Cheers,

Mathieu

Field order optimizations

The order for struct equality could be improved. Also for C and Go the ordering of the fields is relevant (memory alignment).

Simple usage example

I am interested in seeing a simple Java code example that uses Colfer for serialization into a ByteStream. I am currently conducting a study on the energy efficiency of serialization and deserialization on Android devices, specifically for large numerical data types such as vectors, and would like to include Colfer in my study. Short of asking @eishay, a link to a very basic use case would be appreciated, as it would provide a quick head start.

Weird discrepancy in SizeMax sanity checks

Hey again,

While porting the code to Rust, I've come across a very weird logic difference between two functionally identical sanity checks in the generated Go code.

Here and here it checks whether l exceeds ColferSizeMax, which seems logical and reasonable.

But a bit before (here) it performs the same check against x which is only the length of the list of nested objects.
Shouldn't it check against l instead like the others?

Maybe the generated test is not up to date and the bug has already been fixed, I have no idea, but in any case this might need a change.

BufferOverflowException in writeObject

Maybe I missed something, but it looks like a bug:
In Java, the generated class has a writeObject(ObjectOutputStream out) method that uses a buf array of size 1024 by default. On java.nio.BufferUnderflowException, the array size is quadrupled.
But in case when buf array is too small to handle marshalled object, java.nio.BufferOverflowException is thrown by marshal(byte[] buf, int offset) method, not java.nio.BufferUnderflowException, so buf is not enlarged and writeObject ends with exception.
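This sketch shows the retry logic the report implies: catch the BufferOverflowException that marshal actually throws, then retry with a larger buffer. The Marshaler interface and GrowingBuffer class are hypothetical stand-ins for the generated marshal(byte[] buf, int offset). They assume, as the report describes, that marshal returns the end offset and throws when buf is too small. The maxSize cap is an added assumption to keep growth bounded.

```java
import java.nio.BufferOverflowException;
import java.util.Arrays;

// Hypothetical stand-in for a generated class's marshal(byte[] buf, int offset):
// returns the end offset, throws BufferOverflowException when buf is too small.
interface Marshaler {
    int marshal(byte[] buf, int offset);
}

final class GrowingBuffer {
    // Retries with a buffer four times larger after each overflow, up to maxSize.
    static byte[] marshal(Marshaler m, int initialSize, int maxSize) {
        int size = Math.min(initialSize, maxSize);
        while (true) {
            byte[] buf = new byte[size];
            try {
                int end = m.marshal(buf, 0);
                return Arrays.copyOf(buf, end);
            } catch (BufferOverflowException e) {
                if (size >= maxSize) {
                    throw e; // give up: the object does not fit in maxSize bytes
                }
                // Widen to long so that size * 4 cannot overflow int.
                size = (int) Math.min((long) size * 4, maxSize);
            }
        }
    }
}
```

A natural choice for maxSize is the ColferSizeMax limit set with colf's -s option, since the marshaller would reject anything larger anyway.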

Timezone offset for Go

It would be great if this could be supported. Right now I have to serialise extra fields to rematerialize it correctly.

Unless it is very hard, I think I can give it a stab if you point me in the right direction.
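The workaround mentioned above, serializing an extra field for the offset, works the same way in any target. The type table maps a Colfer timestamp to time.Time in Go and to an Instant in Java, and per the †† note neither preserves the timezone. The sketch below is a Java analogue rather than Go code. The TimeWithOffset class is hypothetical and pairs the instant with an int32 offset in seconds.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical carrier: a Colfer timestamp (Instant) plus an extra int32 field
// holding the UTC offset in seconds, so the original offset can be restored.
final class TimeWithOffset {
    Instant instant;
    int offsetSeconds;

    static TimeWithOffset of(OffsetDateTime t) {
        TimeWithOffset x = new TimeWithOffset();
        x.instant = t.toInstant();
        x.offsetSeconds = t.getOffset().getTotalSeconds();
        return x;
    }

    OffsetDateTime toOffsetDateTime() {
        return OffsetDateTime.ofInstant(instant, ZoneOffset.ofTotalSeconds(offsetSeconds));
    }
}
```

This keeps only a fixed offset, not a named zone with daylight-saving rules. Restoring a named zone would need another field, such as a text zone ID.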

Support re-use of existing arrays if of same size.

This is Java-specific, I'm not sure how other generations work. When unmarshalling arrays (any list or byte array) a new array is always constructed. This needlessly creates garbage when the old array has the exact same size as the new array and can be directly re-used. This can be a common use-case for some applications (fixed size images or buffers, for example).
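The requested reuse comes down to checking the length before allocating. This minimal sketch assumes the payload is copied out of the serial, as in the generated unmarshal code. ArrayReuse and copyRange are hypothetical names.

```java
// Hypothetical helper: copies src[from, to) into old when old already has exactly
// the right length, and allocates a new array only otherwise.
final class ArrayReuse {
    static byte[] copyRange(byte[] old, byte[] src, int from, int to) {
        int n = to - from;
        byte[] dst = (old != null && old.length == n) ? old : new byte[n];
        System.arraycopy(src, from, dst, 0, n);
        return dst;
    }
}
```

The trade-off is aliasing. Any code still holding a reference to the previous array will see its contents overwritten, so reuse is only safe when the object owns its arrays exclusively.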

JS version works incorrectly

Hi. I have a problem with the generated JS code. I use Golang on the server and JS on the client. The Golang version marshals and unmarshals correctly, but the JS version returns a wrong packet after unmarshalling a packet from the Golang server.
This is my code:

type update struct {
	id uint16
	x float32
	y float32
	rotation float32
	DT float32
	HP uint8
	commandId uint32
	maxBulletId uint16
}

type update_arr struct {
	packetType uint8
	players []update
}

Specifically, the update_arr packet is decoded incorrectly by the JS version.

On Golang server I have this data:

{
  "id": 0,
  "x": 100,
  "y": 100,
  "rotation": 0,
  "DT": 0,
  "HP": 100,
  "commandId": 0,
  "maxBulletId": 0
}

But on JS client:

{
  "id": 0,
  "x": 2.4262940954800533e35,
  "y": -131072.03125,
  "rotation": 0,
  "DT": 0,
  "HP": 100,
  "commandId": 0,
  "maxBulletId": 0
}
