
geobuf's Introduction

Geobuf


Geobuf is a compact binary encoding for geographic data.

Geobuf provides nearly lossless compression of GeoJSON data into protocol buffers. Advantages over using GeoJSON alone:

  • Very compact: typically makes GeoJSON 6-8 times smaller.
  • 2-2.5x smaller even when comparing gzipped sizes.
  • Very fast encoding and decoding — even faster than native JSON parse/stringify.
  • Can accommodate any GeoJSON data, including extensions with arbitrary properties.

The encoding format also potentially allows:

  • Easy incremental parsing — get features out as you read them, without the need to build in-memory representation of the whole data.
  • Partial reads — read only the parts you actually need, skipping the rest.

Think of this as an attempt to design a simple, modern Shapefile successor that works seamlessly with GeoJSON. Unlike Mapbox Vector Tiles, it aims for nearly lossless compression of datasets — without tiling, projecting coordinates, flattening geometries or stripping properties.

Note that the encoding schema is not stable yet — it may still change as we get community feedback and discover new ways to improve it.

"Nearly" lossless means coordinates are encoded with precision of 6 digits after the decimal point (about 10cm).

Sample compression sizes

Data             JSON       JSON (gz)   Geobuf     Geobuf (gz)
US zip codes     101.85 MB  26.67 MB    12.24 MB   10.48 MB
Idaho counties   10.92 MB   2.57 MB     1.37 MB    1.17 MB

API

encode

var buffer = geobuf.encode(geojson, new Pbf());

Given a GeoJSON object and a Pbf object to write to, returns a Geobuf as a Uint8Array of bytes. In Node, you can use Buffer.from to convert the result back to a Buffer.
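
For example, a minimal Node sketch of encoding a GeoJSON file to disk (file names are placeholders, not part of the library):

var fs = require('fs');
var Pbf = require('pbf');
var geobuf = require('geobuf');

// read a GeoJSON file, encode it, and write the resulting bytes out
var geojson = JSON.parse(fs.readFileSync('data.json', 'utf8'));
var buffer = geobuf.encode(geojson, new Pbf()); // Uint8Array
fs.writeFileSync('data.pbf', Buffer.from(buffer));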

decode

var geojson = geobuf.decode(new Pbf(data));

Given a Pbf object with Geobuf data, returns a GeoJSON object. When loading Geobuf data over XMLHttpRequest, you need to set responseType to arraybuffer.
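
For example, a browser sketch of loading Geobuf over XMLHttpRequest (the URL is a placeholder):

var xhr = new XMLHttpRequest();
xhr.open('GET', 'data.pbf', true);
xhr.responseType = 'arraybuffer'; // without this the binary payload would be mangled into a string
xhr.onload = function () {
    var geojson = geobuf.decode(new Pbf(xhr.response));
    // use the GeoJSON object here
};
xhr.send();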

Install

Node and Browserify:

npm install geobuf

Browser build CDN links:

Building locally:

npm install
npm run build-dev # dist/geobuf-dev.js (development build)
npm run build-min # dist/geobuf.js (minified production build)

Command Line

npm install -g geobuf

Installs these nifty binaries:

  • geobuf2json: turn Geobuf from stdin or specified file to GeoJSON on stdout
  • json2geobuf: turn GeoJSON from stdin or specified file to Geobuf on stdout
  • shp2geobuf: given a Shapefile filename, send Geobuf on stdout

json2geobuf data.json > data.pbf
shp2geobuf myshapefile > data.pbf
geobuf2json data.pbf > data.json

Note that for big files the geobuf2json command can be pretty slow; the bottleneck is not the decoding itself but the native JSON.stringify call on the decoded object, which is needed to pipe it as a string to stdout. On some files, this step can take 40 times longer than the actual decoding.

See Also

  • geojsonp — the prototype that led to this project
  • pygeobuf — Python implementation of Geobuf
  • twkb — a geospatial binary encoding that doesn't support topology and doesn't encode any non-geographic properties besides id
  • vector-tile-spec
  • topojson — an extension of GeoJSON that supports topology
  • WKT and WKB — popular in databases
  • EWKB — a popular superset of WKB


geobuf's Issues

Command line conversion loses the properties of a FeatureCollection

It looks to me from the code that the properties of a FeatureCollection are never written.

So I'm thinking this might be a bug:

echo '{"type": "FeatureCollection", "properties": {"name": "collection"}, "features": []}' \
 | ./node_modules/.bin/json2geobuf \
 | ./node_modules/.bin/geobuf2json \
 | jq .
# -> {"type":"FeatureCollection","features":[]}

Implementation guide.

I'm wondering whether there is any example implementation guide for geobuf, anything that could guide someone like me (who's totally lost as to how to use it). Ideally some type of Leaflet/Mapbox-related implementation example would be great.

I understand that you first need to encode your GeoJSON file, and then include the geobuf browser build in your Leaflet page.

For leaflet, I get that you can convert the encoded file to geojson in the browser like so:

var layer = L.geoJson( geobuf.decode( new Pbf(data) ) ).addTo(map);

However, I'm totally lost as to how to bring my .pbf file into Leaflet. How do I bring it in? Can I just include it the same way I would a normal GeoJSON layer, i.e.:

<script src="json_County201602090.pbf"></script>

And 'data' would be a variable inside my pbf file?

Can someone shed some light on how to properly use geobuf?

Apologies if my issue isn't that sophisticated. I'd really appreciate some kind of guide.

Thanks.
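
For reference, a minimal sketch of one way this can be done (not an official example; the map variable and file name come from the question above, everything else is an assumption): a .pbf file is binary data, so it can't be loaded with a script tag; instead, fetch the bytes with XMLHttpRequest and decode them.

var xhr = new XMLHttpRequest();
xhr.open('GET', 'json_County201602090.pbf', true);
xhr.responseType = 'arraybuffer'; // get raw bytes, not a string
xhr.onload = function () {
    var geojson = geobuf.decode(new Pbf(xhr.response));
    L.geoJson(geojson).addTo(map); // map is an existing L.Map instance
};
xhr.send();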

Leaflet geobuf example

Hi, is it possible to get a Leaflet example map, perhaps using the US zip code data you mention, as an example of how well geobuf performs and to show how it is implemented?

Thanks,

Conor.

Feature set or feature?

Should this encode single features, like WKT, or feature sets, like GeoJSON? What's the overhead of always doing feature sets?

Does not retain feature.id

Not sure if this is by design or not?

var assert = require('assert');
var geobuf = require('geobuf');
var f = {
    type: 'Feature',
    id: 'hello there',
    properties: { some: 'thing' },
    geometry: {
        type: 'Point',
        coordinates: [ 0, 0 ]
    }
};

// throws
assert.equal(f, geobuf.geobufToFeature(geobuf.featureToGeobuf(f).toBuffer()));

Browser version cannot be created

I downloaded the latest release, ran npm install and then npm run build-dev (or build-min), and it errors out because it cannot find the build-dev or build-min scripts.

What is the correct way of producing browser js?

Geobuf Index

Let's discuss indexing. Previous discussion: #27 (comment)

I think the solution I'd like to see here is a PBF-based index format that would come as a separate file alongside the Geobuf file and would store:

  • a serialized R-tree (rbush) with leaves pointing to feature offsets in the Geobuf PBF
  • a map of feature ids to feature offsets for fast single-feature seeking

The R-tree serialization should avoid embedded messages because they are hard to decode lazily. I'd imagine one possible solution to be nodes stored as a flat set of messages, with references to children implemented as offset pointers.
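
As a rough JavaScript illustration of the idea (assuming rbush's toJSON/fromJSON serialization and a hypothetical featureOffsets array of {bbox, offset} records produced by a lower-level scan of the Geobuf buffer):

var RBush = require('rbush');

// build an in-memory R-tree whose leaf entries carry feature byte offsets
var tree = new RBush();
featureOffsets.forEach(function (f) {
    tree.insert({
        minX: f.bbox[0], minY: f.bbox[1],
        maxX: f.bbox[2], maxY: f.bbox[3],
        offset: f.offset // where the feature starts in the Geobuf PBF
    });
});

// rbush can round-trip its internal tree structure as plain JSON; a real index
// format would instead encode this (plus an id -> offset map) as flat PBF messages
var serialized = JSON.stringify(tree.toJSON());
var restored = new RBush().fromJSON(JSON.parse(serialized));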

Support missing coordinates

[[10, 10, 5], [10, 10]] currently roundtrips to [[10, 10, 5], [10, 10, 0]]. Ideally we should support things like that.

Ditch TopoJSON support?

@mourner: I also had some thoughts about whether I made a mistake by pushing TopoJSON support instead of keeping things simple and limited to GeoJSON
The size compression benefits are good but it makes the format significantly more complex, this will harm adoption rate
well, not very complex currently, but it'll be much more complex when we make it streamable
@tmcw: hm, i think yes, we should dump it.
i'm divided on topojson because of the dual purpose of topology in it
like, a really good open source implementation of a topology-supporting geometry system... super useful
but topojson is mainly doing it to save bytes

Inaccurate floating point arithmetics produce invalid Geometries (non-closed LinearRings)

How to reproduce

Input GeoJSON file created with ogr2ogr (polygon.geojson)

{
  "type": "FeatureCollection",
  "features": [
    { "type": "Feature", "properties": { }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 5425435.733081569895148, 2012689.63544030720368 ], [ 5425333.066045090556145, 2012658.8061882276088 ], [ 5425324.357915714383125, 2012693.518385621719062 ], [ 5425426.5193927353248, 2012720.238697179593146 ], [ 5425435.733081569895148, 2012689.63544030720368 ] ] ] } }
  ]
}

1. json2geobuf data/polygon.geojson > data/polygon.geobuf

2. geobuf2json data/polygon.geobuf

Output:

{"type":"FeatureCollection","features":[{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[5425435.733082,2012689.63544],[5425333.066046,2012658.806188],[5425324.357917,2012693.518385],[5425426.519394,2012720.238697],[5425435.733083,2012689.63544]]]},"properties":{}}]}

Note that the first and last vertices are not equal.

re: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math/round

Geobuf Format: IDs

IDs can be of type string or sint32. What's the rationale behind this? Can we support 64-bit IDs? I think that in this day and age 32-bit IDs are too limiting, and OSM data would need 64-bit IDs.

Geobuf format and streaming writes

/cc @mourner @springmeyer @artemp

The problem

The geobuf format as it is currently defined can't be written in a stream. Instead, the whole data has to be assembled in memory first and then written out. This directly follows from the way Protobuf encodes its messages. A good description of the problem can be found in the header comments of the protobuf writer of the UPB library. In short the problem is that Protobuf uses nested messages to encode the data and each message has a length header which is encoded as a Varint. But we can't write out the length (or even know how long the length field is, because a Varint is of variable length) before we have the whole message assembled.

This is, of course, a major problem for a format that is intended for huge files.

A possible solution: Remove outermost message

Because this is an inherent limitation of the Protobuf format, we have to look outside the format for a solution. Of course we could throw away the whole Protobuf format, but that's not needed. What's needed is a wrapper around it so that the Protobuf encoder/decoder only sees part of the data. In the simplest case we encode the data in pieces:

We remove the outermost message Data and then write each of the other data pieces as its own protobuf message. We might want to move the keys, dimensions, and precision into a message Header or so. The oneof data_type construct doesn't work any more; instead we just have to keep reading messages until EOF. But we don't know which kinds of messages will be in there (Feature, FeatureCollection, etc.), so we have to add this information to the newly introduced message Header in some way and then parse accordingly. This should certainly be doable and doesn't require a lot of change. But I think there is...

A better solution: Chunking the data

This looks slightly more complicated at first but has many advantages, so bear with me. Let's encode the data in chunks; each chunk gets a length field and the data:

CHUNK
    LENGTH (4 bytes)
    DATA (LENGTH bytes)
CHUNK
    LENGTH (4 bytes)
    DATA (LENGTH bytes)
...

Each DATA block is a complete Protobuf message which can be parsed on its own. This idea is, of course, not new; it is what the OpenStreetMap OSM PBF format does.

A typical DATA field will contain maybe a few thousand geometries or features. Note that this does not mean that the contents of the different chunks are somehow logically distinct. Logically this is still one data stream. The chunking is purely an encoding issue and files with the same data split up into different sized chunks would still represent the same data.
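
A minimal Node sketch of this kind of length-prefixed framing (illustrative only; the 4-byte big-endian length is an assumption matching the layout above):

// write one chunk: a 4-byte length header followed by the protobuf-encoded data
function writeChunk(stream, data) {
    var header = Buffer.alloc(4);
    header.writeUInt32BE(data.length, 0);
    stream.write(header);
    stream.write(data);
}

// read one chunk back: first the length, then exactly that many bytes
function readChunk(buffer, pos) {
    var length = buffer.readUInt32BE(pos);
    return {
        data: buffer.slice(pos + 4, pos + 4 + length), // one complete protobuf message
        next: pos + 4 + length                         // position of the following chunk
    };
}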

This format has some added advantages:

  • the LENGTH field can tell us how much memory to allocate for buffering the DATA part
  • reading and writing can be done in parallel, because several threads can work on encoding/decoding different chunks at the same time. In fact that's what Libosmium does when parsing OSM PBF.
  • concatenating two files is trivial: deal with the headers, then just copy data chunks

Now it gets a little bit more complicated than that. (This is again based on experience with the OSM PBF format.) The first chunk should probably contain some kind of header. This could include metadata such as the dimensions setting and the keys. All other chunks contain the data itself. So chunks (OSM PBF calls them Blobs) need to contain some kind of type identifier to differentiate between a header chunk and a data chunk. OSM PBF does this by adding another level of Protobuf encoding (see the BlobHeader and Blob messages), which seems like overkill to me. It makes the implementation rather confusing and probably slower. Instead we can just add a type field:

CHUNK
    HEADER (fixed size)
        TYPE (1 byte, first chunk always =META)
        LENGTH (4 bytes)
    DATA (LENGTH bytes)
CHUNK
    HEADER (fixed size)
        TYPE (1 byte, following chunks always =GEOMDATA)
        LENGTH (4 bytes)
    DATA (LENGTH bytes)
...

Strictly speaking we can live without that TYPE field, because the header always has to be the first chunk and the following chunks are data, but having the type seems cleaner and gives us more flexibility. And maybe we want to have different types of data? This is something that has to be explored.

OSM PBF adds another useful feature: encoding chunks (or Blobs) with zlib or other compression formats. Each chunk can be optionally compressed, and the type of compression is noted in the chunk header. This can squeeze out the last bytes from the resulting files, but it is still possible to encode and decode the file in chunks and in parallel. To add this we need, again, some type field. And we should also store the size of the uncompressed data, because that lets us give the decompressor a buffer of the correct size.

Note that I have used 4-byte length fields in my description. This is probably enough for each chunk; in fact, chunks should not get too big, because each one has to fit into memory after all (several of them if we encode/decode in parallel). OSM PBF has some extra constraints on the sizes of different structures, which can help implementations because fixed-size buffers can be used.

Note also that there is no overall length field for the whole file. That's important, because it allows us to stream-write the data without knowing beforehand how many features the file will contain or how big it will be. (We might want to end the file with some END chunk that marks the end of the file to guard against truncation. Optionally it could contain a checksum. This is something that OSM PBF is missing, but it could be useful to detect data corruption.)

And while we are at it, I suggest adding a fixed 4-byte (or so) magic header that is always the same but can be used by tools such as file to determine the file type easily, and a fixed-size version field for future-proofing the format.

This brings us to something like this:

MAGIC (fixed size)
VERSION (=1, fixed size)
CHUNK
    HEADER (fixed size)
        TYPE (1 byte, first chunk always =META)
        COMPRESSION_TYPE (1 byte)
        RAW_LENGTH (4 bytes)
        ENCODED_LENGTH (4 bytes)
    DATA (LENGTH bytes)
CHUNK
    HEADER (fixed size)
        TYPE (1 byte, following chunks always =GEOMDATA)
        COMPRESSION_TYPE (1 byte)
        RAW_LENGTH (4 bytes)
        ENCODED_LENGTH (4 bytes)
    DATA (LENGTH bytes)
...
CHUNK
    HEADER (fixed size)
        TYPE (1 byte, last chunk always =END)
        COMPRESSION_TYPE (1 byte)
        RAW_LENGTH (4 bytes)
        ENCODED_LENGTH (4 bytes)
    DATA (LENGTH bytes)
        CHECKSUM

Some padding might be necessary to have length fields on 4-byte boundaries etc. And all length fields should probably be encoded in network byte order. Those details can be worked out.

Inside the DATA we'd still use the Protobuf-encoded data (nearly) as before. No big change there. For some things, such as the keys, we have to discuss whether they fit better in the META header or the DATA part. In the META and END blocks we can use Protobuf too, or any other encoding. Because that's not a lot of data, it isn't that important to pack it tightly, and using a simpler format might allow simpler access to the metadata. On the other hand, Protobuf is tried and true and allows for easy extensibility.

Purpose, use cases, and priorities?

Perhaps you are already planning to address this in #3 but I'd like to know:

What is the primary purpose of this? Why come up with a new format in a landscape dominated by shapefiles, GeoJSON, and (suboptimal) OGC specs? Don't get me wrong, I think applying the ideas of vector tiles to a more open-ended format is brilliant and I want to see where this goes; I just think we'd all benefit from a little more insight into the justification.

What are the primary use cases for this? Is it intended to provide a compact means of transferring data from server to client (i.e., browser), from file storage to server (i.e., vector tiles to mapnik), etc?

What are your main priorities for this? How would you rank things like the following?

  • speed of encoding / decoding
  • file size
  • support for streaming parser (i.e., random access to feature at a time) vs parsing entire file into a local data representation that allows random access

Configurable precision

1e6 rounding is hardcoded in geobuf, but some datasets don't lose much with a lower precision like 1e4. Perhaps we could make this configurable and also encode it as a property in the format to give more room for geometry compression.

This should become more relevant with delta encoding (#23), because lower-precision data will have much smaller deltas.
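
For illustration only, the effect of the rounding at different precisions looks roughly like this (precision here is the number of decimal digits kept):

// precision 6 keeps roughly 10 cm of detail; precision 4 roughly 10 m
function roundCoord(x, precision) {
    var factor = Math.pow(10, precision);
    return Math.round(x * factor) / factor;
}
roundCoord(12.3456789, 6); // 12.345679
roundCoord(12.3456789, 4); // 12.3457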

Error: Unimplemented type: 3

I'm making a fairly simple call to the Mapbox API in a Node script, and it fails to decode the response when following the geobuf example. Logging the body shows that a valid, still-encoded response has come back from the API, but attempting to decode it throws this:

/Users/wboykinm/github/tribes/processing/water/node_modules/pbf/index.js:204
        else throw new Error('Unimplemented type: ' + type);
                   ^
Error: Unimplemented type: 3
    at Object.Pbf.skip (/Users/wboykinm/github/tribes/processing/water/node_modules/pbf/index.js:204:20)
    at Object.Pbf.readFields (/Users/wboykinm/github/tribes/processing/water/node_modules/pbf/index.js:40:45)
    at Object.decode (/Users/wboykinm/github/tribes/processing/water/node_modules/geobuf/decode.js:17:19)
    at Request._callback (/Users/wboykinm/github/tribes/processing/water/get.js:29:26)
    at Request.self.callback (/Users/wboykinm/github/tribes/processing/water/node_modules/request/request.js:199:22)
    at Request.emit (events.js:110:17)
    at Request.<anonymous> (/Users/wboykinm/github/tribes/processing/water/node_modules/request/request.js:1036:10)
    at Request.emit (events.js:129:20)
    at IncomingMessage.<anonymous> (/Users/wboykinm/github/tribes/processing/water/node_modules/request/request.js:963:12)
    at IncomingMessage.emit (events.js:129:20)

Am I missing some basic preprocessing of the API response?
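
One likely cause (an assumption, not confirmed in this report) is that the response body has been decoded into a UTF-8 string before reaching Pbf, corrupting the binary data. With the request library, passing encoding: null keeps the body as a raw Buffer (apiUrl is a placeholder):

var request = require('request');
var Pbf = require('pbf');
var geobuf = require('geobuf');

var apiUrl = '...'; // placeholder for the Mapbox API URL being queried

// encoding: null makes request return the raw bytes instead of a UTF-8 string
request({url: apiUrl, encoding: null}, function (err, res, body) {
    if (err) throw err;
    var geojson = geobuf.decode(new Pbf(body));
});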

mapbox/pbf as an encoding/decoding alternative?

As an alternative to the Protobuf.js and protocol-buffers libraries, could we use Konstantin's pbf? It's much more low-level and has no .proto reading, but it's also simpler and gives us more control over encoding and decoding, potentially making it faster.

shp2geobuf fails with shapefile containing null geometries

node node_modules\geobuf\bin\shp2geobuf contour_5.shp > contour_5.geobuf

node_modules\geobuf\encode.js:51
    if (obj.type === 'FeatureCollection') {
           ^
TypeError: Cannot read property 'type' of null
    at analyze (node_modules\geobuf\encode.js:51:12)
    at analyze (node_modules\geobuf\encode.js:56:9)
    at analyze (node_modules\geobuf\encode.js:52:51)
    at encode (node_modules\geobuf\encode.js:26:5)
    at node_modules\geobuf\bin\shp2geobuf:9:26
    at node_modules\geobuf\node_modules\shapefile\index.js:15:5
    at node_modules\geobuf\node_modules\shapefile\read.js:27:11
    at notify (node_modules\geobuf\node_modules\shapefile\node_modules\queue-async\queue.js:47:18)
    at node_modules\geobuf\node_modules\shapefile\node_modules\queue-async\queue.js:39:16
    at FSReqWrap.oncomplete (fs.js:95:15)

Logging obj at the start of the analyze function shows this before the TypeError occurs:

-------obj:
 { type: 'Feature',
  properties: { ID: 51, ELEV: 375 },
  geometry: null }
-------obj:
 null

Include projection info

Since this deals in native projections, it should encode something about the projection in the protobuf. How about a proj4 string?

shp2geobuf fails with shapefile containing one feature

shp2geobuf data/polygon.shp > data/polygon.geobuf

/Users/artem/Projects/mapbox/geobuf/encode.js:50
    if (obj.type === 'FeatureCollection') {
           ^
TypeError: Cannot read property 'type' of undefined
    at analyze (/Users/artem/Projects/mapbox/geobuf/encode.js:50:12)
    at encode (/Users/artem/Projects/mapbox/geobuf/encode.js:26:5)
    at /Users/artem/Projects/mapbox/geobuf/bin/shp2geobuf:8:26
    at /Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/index.js:14:23
    at /Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/read.js:26:29
    at notify (/Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/node_modules/queue-async/queue.js:45:26)
    at /Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/node_modules/queue-async/queue.js:35:11
    at /Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/read.js:17:33
    at /Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/index.js:70:27
    at readRecordHeader (/Users/artem/Projects/mapbox/geobuf/node_modules/shapefile/shp.js:34:40)

Nested properties are not handled properly

Nested properties, e.g.,

var feat = {
    type: 'Feature',
    geometry: {
        type: 'Point',
        coordinates: [0, 0]
    },
    properties: {
        nested: {nope: 'yep'}
    }
};

are not handled properly; they are lost during encode / decode.
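
One possible workaround sketch (not part of the library; stringifyNested and parseNested are hypothetical helpers): serialize nested values to JSON strings before encoding and parse them back after decoding.

function stringifyNested(feature) {
    for (var key in feature.properties) {
        var value = feature.properties[key];
        if (value !== null && typeof value === 'object') {
            feature.properties[key] = JSON.stringify(value); // store as a plain string property
        }
    }
    return feature;
}

function parseNested(feature) {
    for (var key in feature.properties) {
        var value = feature.properties[key];
        if (typeof value === 'string' && (value.charAt(0) === '{' || value.charAt(0) === '[')) {
            try { feature.properties[key] = JSON.parse(value); } catch (e) { /* leave as-is */ }
        }
    }
    return feature;
}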

Stream multiple geobuf messages

I have a toolchain where I pipe GeoJSON documents as line delimited JSON from one to another. It would be nice to encode the single GeoJSON instances as geobuf and pipe those around too.

My first try was to use binary-split to split a stream into several Geobuf instances. Unfortunately you have to pass a special splitOn byte sequence, which I don't know, since the default line break could also appear inside a Geobuf.

So is there any byte sequence that reliably marks the start or end of a Geobuf?
I know of #37, but there only the streaming mode of a single GeoJSON document has been discussed.

Fix 1.0.x release tags

Tags for 1.0.0 and 1.0.1 should be prefixed with a v for consistency with other tags on this project.

Using geobuf in browser, require not found

I am trying to use geobuf in the browser.
I ran
npm run build-min
and got a geobuf-min.js file, which I have put in the same folder as the HTML file.
I keep getting this error when trying to use the decode function:
Uncaught ReferenceError: require is not defined
Am I perhaps doing something wrong with the browserify part?
Thank you very much

geobuf does not play well with pbf >= 1.3.6

I have a server -> client setup where I encode and transfer GeoJSON using geobuf. In the last 24 hours a couple of new releases of pbf have been made (1.3.6 and 2.0.0). When upgrading to either of these, geobuf seems to generate (or just parse) malformed GeoJSON.

This is when I encode with 1.3.6 or 2.0.0 on the server side and use a matching pbf version on the client end. If I downgrade to 1.3.5, everything looks fine again.

Not sure if this is a pbf bug or just an interplay problem between pbf and geobuf.

A revised breaking protobuf schema for Geobuf

The more ideas I find to improve Geobuf sizes, the more I realize that we would need to completely rewrite the schema to support the improvements, breaking compatibility. So I'm opening this ticket to start a discussion about what a perfect Geobuf schema would look like (not to say this is a priority, but it's still a good thing to discuss).

I wrote a prototype schema with all the improvements I could think of here: https://gist.github.com/mourner/3c6ddca04c9772593302

The main difference is utilizing the power and flexibility of the new oneof statement to create a better and more compact schema.

Features:

  • the data itself contains information whether it's Feature, FeatureCollection, Geometry or GeometryCollection, so you don't need to guess this when decoding
  • keys and values for properties are stored separately in the top-level Data object, and features only store indexes to them (like vector-tile-spec does); keys and possibly values are reduced to unique values #26 — for much better properties packing
  • geometry coordinates are a oneof set of different fields (depending on type), which solves the ambiguity with null values since a oneof field can be empty (without a default value), and also makes it easier to understand and work with
  • feature has a oneof of either geometry or geometry collection instead of repeated geometries
  • coordinates are stored as delta-encoded sint32 to take full advantage of varint and zig-zag encoding #24 — for much more compact geometries
  • Value message is also a oneof of different value types
  • the property value int type is sint32 instead of int64 — it's more compact and JS doesn't actually handle int64 values; in addition, uint type becomes uint32
  • the data contains optional flag indicating whether it contains altitude (third coordinate), since this is a global-level setting
  • the data also contains optional precision information (6 by default) #25

@tmcw @springmeyer what do you think?

Delta encoding

geobuf uses delta encoding by virtue of using protobuf

Turns out it actually doesn't, which leaves great room for geometry compression. Going to look into this and open a PR.
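
For context, a rough sketch of what delta encoding of integer-scaled coordinates looks like: only the differences between consecutive values are stored, and since those differences stay small for dense geometries they pack into short zig-zag varints.

function deltaEncode(values) {
    var deltas = [];
    var prev = 0;
    for (var i = 0; i < values.length; i++) {
        deltas.push(values[i] - prev); // small numbers for nearby coordinates
        prev = values[i];
    }
    return deltas;
}

deltaEncode([12345678, 12345680, 12345675]); // [12345678, 2, -5]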

Switch to sint32 for coords

Since we use 1e6 encoding, it might make sense to switch coords from int64 to int32 for better geometry compression.

The int32 range is -2,147,483,648 through 2,147,483,647; since 180 × 1e6 = 180,000,000, that is plenty of headroom for the usual -180..180 range plus a handful of repeating worlds (roughly ±2147 degrees). We probably should not care about other CRSs because CRS support is going to be dropped from the GeoJSON spec.

Geobuf Format: Too many options

Looking at the proto file, if I interpret it correctly, I think there are too many options for how the file format can actually look. The data_type seems to suggest the outermost structure can be either a FeatureCollection, a Feature, a Geometry or a Topology. Is that necessary? Can't it always be a FeatureCollection, possibly with just one feature in it, which in turn contains one Geometry? I am concerned that different implementors will implement slightly different subsets of the format, making implementations incompatible.

What's the difference between Value, properties and custom_properties? Repeating those fields in Feature and Geometry doesn't look good to me. Do we really need that? In my understanding a feature is a geometry plus some attributes. If we can put attributes on the geometry, why is there a distinction between feature and geometry?

Why does Topology have no properties, just Value and custom_properties? (Unlike Feature and Geometry, which have all three.)

Fancy encoding for properties

Since @mourner is checking this library out, might as well:

Could we encode properties in a more efficient way? Right now we're re-encoding the property name and value for every single feature. We could save a lot of space if (a) we encode names once and then use an array, or (b) we do some magic around structs.
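
A rough sketch of option (a), with illustrative names only (this is not the eventual schema): collect each distinct key once and have every feature reference it by index.

function buildKeys(featureCollection) {
    var keys = [];
    var lookup = {};
    featureCollection.features.forEach(function (feature) {
        for (var key in feature.properties) {
            if (!(key in lookup)) {
                lookup[key] = keys.length;
                keys.push(key);
            }
        }
    });
    // each feature would then store pairs of [keyIndex, valueIndex]
    // instead of repeating the key string itself
    return {keys: keys, lookup: lookup};
}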

License

Please add a license file.

I submitted a PR without first checking against this, but assume that you'll use the same license as other mapbox repos.

column-major order

For overnoded shapefiles, we might be able to gain a big advantage by ordering coordinates by dimension rather than in tuples. But there's a parsing and generation overhead, and random access is harder.

Specification

As with vector-tile-spec, we should have a SPEC.md specification for this.

Compatibility with "classic GIS file formats"

I think we need to think about compatibility with "classic GIS file formats" like Shapefiles. What I mean by that is all those formats that only support one type of geometry per layer and maybe even only one layer per file. It should be well defined how those files map to Geobuf files. I am not saying we should limit ourselves to what those files support. But I think it would be useful to define a subset of the Geobuf format that is guaranteed to map well to those formats. Maybe even define some flag that can be set promising that the file contents behave in that way.

Large integer attribute values are altered

For example, using the Census TIGER states dataset, the attributes ALAND and AWATER are large integers (e.g. 62266581604).

Here they are getting encoded as float types and thus get altered (e.g., 62266580992).

For testing I was doing this: shp -> geobuf -> geojson and comparing the property values.

I tried adding the ability to detect and set integer types around line 149 but that produced odd property values in geojson (e.g., "ALAND":{"low":2137039460,"high":14,"unsigned":true}). Presumably this is because Long is not used (but should be)?

Simply setting the values as double type preserves proper values, but produces a bigger geobuf file (as expected).

Presumably the proper fix would be to detect the proper numeric type and encode using that, e.g.,

        switch (typeof v) {
            case 'number':
                // (v | 0) === v is only true for integers that fit in 32 bits,
                // so very large integers like ALAND would still need a wider type
                if ((v | 0) === v) {
                    val.set(v > 0 ? 'uint_value' : 'int_value', v);
                } else {
                    val.set('float_value', v);
                }
                break;
            case 'boolean':
                val.set('bool_value', v);
                break;
            case 'string':
                val.set('string_value', v.toString());
                break;
        }

Not sure how to detect which to use (float vs. double), but in this case we wanted uint anyway.
