flatgeobuf / flatgeobuf Goto Github PK

A performant binary encoding for geographic data based on flatbuffers

License: BSD 2-Clause "Simplified" License

JavaScript 0.23% Shell 0.26% TypeScript 16.83% C# 17.33% C++ 11.46% Java 11.17% Rust 36.61% Go 5.99% Makefile 0.11%

flatgeobuf's Introduction

FlatGeobuf

A performant binary encoding for geographic data based on flatbuffers that can hold a collection of Simple Features including circular interpolations as defined by SQL-MM Part 3.

Inspired by geobuf and flatbush. Deliberately does not support random writes for simplicity and to be able to cluster the data on a packed Hilbert R-Tree enabling fast bounding box spatial filtering. The spatial index is optional to allow the format to be efficiently written as a stream, support appending, and for use cases where spatial filtering is not needed.

Goals are to be suitable for large volumes of static data, significantly faster than legacy formats without size limitations for contents or metainformation and to be suitable for streaming/random access.

The site switchfromshapefile.org has more in depth information about the problems of legacy formats and provides some alternatives but acknowledges that the current alternatives has some drawbacks on their own, for example they are not suitable for streaming.

FlatGeobuf is open source under the BSD 2-Clause License.

Examples

Specification

MB: Magic bytes (0x6667620366676201)
H: Header (variable size flatbuffer)
I (optional): Static packed Hilbert R-tree index (static size custom buffer)
DATA: Features (variable size flatbuffers)

The fourth byte in the magic bytes indicates major specification version. The last byte of the magic bytes indicate patch level. Patch level is backwards compatible so an implementation for a major version should accept any patch level version.

Any 64-bit flatbuffer value contained anywhere in the file (for example coordinates) is aligned to 8 bytes to from the start of the file or feature to allow for direct memory access.

Encoding of any string value is assumed to be UTF-8.

A changelog of the specification is available here.

Performance

Preliminary performance tests has been done using road data from OSM for Denmark in SHP format from download.geofabrik.de, containing 906602 LineString features with a set of attributes.

	Shapefile	GeoPackage	FlatGeobuf	GeoJSON	GML
Read full dataset	1	1.02	0.46	15	8.9
Read w/spatial filter	1	0.94	0.71	705	399
Write full dataset	1	0.77	0.39	3.9	3.2
Write w/spatial index	1	1.58	0.65	-	-
Size	1	0.72	0.77	1.2	2.1

The test was done using GDAL implementing FlatGeobuf as a driver and measurements for repeated reads using loops of ogrinfo -qq -oo VERIFY_BUFFERS=NO runs and measurements for repeated writes was done with ogr2ogr conversion from the original to a new file with -lco SPATIAL_INDEX=NO and -lco SPATIAL_INDEX=YES respectively.

Note that for the test with spatial filter a small bounding box was chosen resulting in only 1204 features. The reason for this is to primarily test the spatial index search performance.

As performance is highly data dependent I've also made similar tests on a larger dataset with Danish cadastral data consisting of 2511772 Polygons with extensive attribute data.

	Shapefile	GeoPackage	FlatGeobuf
Read full dataset	1	0.23	0.12
Read w/spatial filter	1	0.31	0.26
Write full dataset	1	0.95	0.63
Write w/spatial index	1	1.07	0.70
Size	1	0.77	0.95

Optimizing Remotely Hosted FlatGeobufs

If you're accessing a FlatGeobuf file over HTTP, consider using a CDN to minimize latency.

In particular, when using the spatial filter to get a subset of features, multiple requests will be made. Often round-trip latency, rather than throughput, is the limiting factor. A caching CDN can be especially helpful here.

Fetching a subset of a file over HTTP utilizes Range requests. If the page accessing the FGB is hosted on a different domain from the CDN, Cross Origin policy applies, and the required Range header will induce an OPTIONS (preflight) request.

Popular CDNs, like Cloudfront, support Range Requests, but don't cache the requisite preflight OPTIONS requests by default. Consider enabling OPTIONS request caching . Without this, the preflight authorization request could be much slower than necessary.

Features

Reference implementation for JavaScript, TypeScript, C++, C#, Go, Java and Rust
Efficient I/O (streaming and random access)
GDAL/OGR driver
GeoServer WFS output format

Supported applications / libraries

Fiona (1.8.18 and forward)
GDAL (3.1 and forward)
Geo Data Viewer (Visual Studio Code extension) (2.3 and forward)
GeoServer (2.17 and forward)
GeoTools (23.0 and forward)
MapServer (with GDAL >=3.1.0)
PostGIS (3.2.0 and forward)
pyogrio
QField
QGIS (3.16 and forward)
ldproxy (3.3 and forward)
gogama/flatgeobuf

Documentation

TypeScript / JavaScript

API Docs

Prebuilt bundles (intended for browser usage)

flatgeobuf.min.js (contains the generic module)
flatgeobuf-geojson.min.js (contains the geojson module)
flatgeobuf-ol.min.js (contains the ol module)

Node usage

See this example for a minimal how to depend on and use the flatgeobuf npm package.

FAQ

Why not use WKB geometry encoding?

It does not align on 8 bytes so it not always possible to consume it without copying first.

Why not use Protobuf?

Performance reasons and to allow streaming/random access.

Why not use compression as part of the format?

Separation of concerns and to allow random access.

Why am I not getting expected performance in GDAL?

Default behaviour is to assume untrusted data and verify buffer integrity for safety. If you have trusted data and want maximum performance make sure to set the open option VERIFY_BUFFERS to NO.

What about MapBox Vector Tiles?

FlatGeobuf does not aim to compete with MapBox Vector Tiles. MVTs are great for rendering but they are relatively expensive to create and is a lossy format, where as FlatGeobuf is lossless and very fast to write especially if a spatial index is not needed.

Why does it not work with create-react-app?

See #244 for root cause and workaround.

Does FlatGeobuf support mixing features with and without geometry with spatial index?

Currently it likely does not but could in the future, see #260.

flatgeobuf's People

Contributors

Stargazers

Watchers

flatgeobuf's Issues

CRS meta finalization

wrt. #6 it now looks to me like a simple "wkt" as string property will not cut it. Found that GeoPackage have a fallback for WKT vs. WKT2 described by http://www.geopackage.org/guidance/extensions/wkt_for_crs.md.html so mabye at https://github.com/bjornharrtell/flatgeobuf/blob/master/src/fbs/header.fbs#L32:

Introduce definition assumed to be a WKT definition.
Introduce definition_12_063 assumed to be a WKT2 definition.
Introduce name.
Introduce description.
Remove wkt.

Add constraints on large header size; malicious input could cause OOM

Reading this slice of bytes into the Rust FgbReader will cause memory exhaustion on my machine:

let input: &[u8] = &[102, 103, 98, 3, 102, 103, 98, 0, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 219, 216, 216, 216, 216, 216, 216, 216, 39, 39, 39, 39, 39, 32, 39, 39, 39, 39, 39, 39, 39, 39, 39, 10, 169, 247, 247, 247, 247];

Live Heap Allocations: 681051920 bytes in 616 chunks; quarantined: 82746 bytes in 35 chunks; 151073 other chunks; total chunks: 151724; showing top 95% (at most 8 unique contexts)
656877351 byte(s) (96%) in 1 allocation(s)
    #0 0x11016021d in wrap_malloc+0x9d (librustc-nightly_rt.asan.dylib:x86_64+0x4521d)
    #1 0x10e2188bf in flatgeobuf::file_reader::FgbReader::open::h4507c5254c3f3c4e+0x22f (read:x86_64+0x1000018bf)
    #2 0x10e22b34a in rust_fuzzer_test_input+0x62a (read:x86_64+0x10001434a)
    #3 0x10e22d4ce in __rust_try+0xe (read:x86_64+0x1000164ce)
    #4 0x10e22d163 in LLVMFuzzerTestOneInput+0x133 (read:x86_64+0x100016163)
    #5 0x10e2300c1 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long)+0x131 (read:x86_64+0x1000190c1)
    #6 0x10e22f45d in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*)+0x3d (read:x86_64+0x10001845d)
    #7 0x10e231979 in fuzzer::Fuzzer::MutateAndTestOne()+0x249 (read:x86_64+0x10001a979)
    #8 0x10e2329c5 in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&)+0x385 (read:x86_64+0x10001b9c5)
    #9 0x10e25351f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long))+0x210f (read:x86_64+0x10003c51f)
    #10 0x10e262d42 in main+0x22 (read:x86_64+0x10004bd42)
    #11 0x7fff20346630 in start+0x0 (libdyld.dylib:x86_64+0x15630)

MS: 1 EraseBytes-; base unit: 5689a55a7e79130882f4b9a5e56078827298d1e0
0x66,0x67,0x62,0x3,0x66,0x67,0x62,0x0,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0xdb,0xd8,0xd8,0xd8,0xd8,0xd8,0xd8,0xd8,0x27,0x27,0x27,0x27,0x27,0x20,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0x27,0xa,0xa9,0xf7,0xf7,0xf7,0xf7,
fgb\x03fgb\x00'''''''''''''\xdb\xd8\xd8\xd8\xd8\xd8\xd8\xd8''''' '''''''''\x0a\xa9\xf7\xf7\xf7\xf7
artifact_prefix='/Users/coreyf/tmp/flatgeobuf/src/rust/fuzz/artifacts/read/'; Test unit written to /Users/coreyf/tmp/flatgeobuf/src/rust/fuzz/artifacts/read/oom-fee1960a9368951f58513973b2c8002cc219f1e4
Base64: ZmdiA2ZnYgAnJycnJycnJycnJycn29jY2NjY2NgnJycnJyAnJycnJycnJycKqff39/c=
SUMMARY: libFuzzer: out-of-memory

A couple options:

Hard limit on max header size
Customizable limit on max header size via a new FgbReaderBuilder

Discovered while fuzzing via #84

Errors on POINT EMPTY (Java)

Empty versions of each of the geometry types are valid geometries, whose WKTs are of the form "POINT EMPTY", "MULTIPOLYGON EMPTY", etc. I added Java tests for each of them (except for GEOMETRYCOLLECTION), which all passed except "POINT EMPTY" (stack trace below).

This may be related to the fact that empty points cannot be serialized into (strict) WKB, with the rough reasoning that any other empty geometry can just have the list of points/geometries be empty, which doesn't work for points. A standard workaround (done by ESRI, EWKB, etc) is to use NaNs for the coordinates for empty points.

Would flatgeobuf be interested in supporting this use case?

java.lang.ArrayIndexOutOfBoundsException: 0

	at org.wololo.flatgeobuf.geotools.GeometryConversions.deserialize(GeometryConversions.java:114)
	at org.wololo.flatgeobuf.geotools.FeatureConversions.deserialize(FeatureConversions.java:86)
	at org.wololo.flatgeobuf.geotools.FeatureCollectionConversions.deserialize(FeatureCollectionConversions.java:153)
	at org.wololo.flatgeobuf.test.GeometryRoundtripTest.roundTrip(GeometryRoundtripTest.java:91)
	at org.wololo.flatgeobuf.test.GeometryRoundtripTest.pointEmpty(GeometryRoundtripTest.java:106)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

Optimize for parallel I/O

v3 spec non indexed FlatGeobuf aren't suitable for massively parallel I/O. I think what is needed to do this are one of:

Mandatory feature index (offsets, in spec v3 this is only available when FlatGeobuf is indexed)
Chunked data

I'm leaning on feature index. Possibly as post data section to allow streaming write.

Fallback to string representation for unknown types

Encoding in QGIS is set to "system"

I know, this might be better issued in QGIS, but maybe that's a shorter way here.

When loading a FlatGeobuf file in QGIS , the encoding gets set to "system" by default, which is never UTF-8. So there are encoding problems.

Could you please advertise the layer as UTF-8 only (or whatever the layer is in)? See OSGeo/gdal#2254

Benchmark comparison

I've added a benchmark to the GeoZero library comparing read performance of different spatial formats.

Description and results: https://github.com/pka/geozero/tree/master/geozero-bench

I'm interested in feedback espcially compared to the mentionend FlatGeobuf benchmark.

Performance tuning

Shapefile/GDAL

Benchmark	Median 18.6.	Median 20.6.
countries/1-shp	1.36 ms	1.22 ms
countries_bbox/1-shp	1.18 ms	1.17 ms
buildings/1-shp	6.21 s	2.72 s
buildings_bbox/1-shp	176.20 ms	92.38 ms

GDAL reads all fields by default, which made it rather slow especially in the buildings benchmark. Time has been signifcantly improved by ignoring all unused fields.

FlatGeobuf

Benchmark	Median 14.6.	Median 20.6.
countries/2-fgb	0.20 ms	0.18 ms
countries_bbox/2-fgb	0.03 ms	0.02 ms
buildings/2-fgb	0.94 s	0.94 s
buildings_bbox/2-fgb	68.50 ms	25.04 ms

FGB results were very good from the beginning. But the buildings_bbox time was beaten by FlatGeobuf over HTTP (59.24 ms), which was surprising. It turned out that the file reader doing many seek operations when filtering with a bbox, doesn't play well with std::io::BufRead. Using the alternative implementation seek_bufread improved the performance by a wide margin.

PostGIS

Benchmark	Median 14.6.	Median 20.6.
countries/7-postgis_postgres	12.02 ms	0.74 ms
countries/7-postgis_sqlx	18.81 ms	0.99 ms
countries_bbox/7-postgis_postgres	11.22 ms	0.20 ms
countries_bbox/7-postgis_sqlx	-	0.15 ms
buildings/7-postgis_postgres	3.23 s	3.16 s
buildings_bbox/7-postgis_postgres	132.20 ms	102.99 ms
buildings_bbox/7-postgis_sqlx	-	158.40 ms

The first obvservation was that the SQLx library was significantly slower than rust-postgres.
It turned out, that it was the only one using an SSL encrypted connection by default. Turning off encryption braught it to the same level, but there is still a difference to be analyzed. After a few rounds I decided to reuse the DB connection between measurement loops. This is more realistic, since all relevant implementations use a connection pool.

Comparison

Combining results from the FlatGeobuf test with the GeoZero benchmarks gives the following results for the bigger dataset:

	Shapefile	GeoPackage	FlatGeobuf
Read full dataset	1	0.23	0.12
buildings	1	1.58	0.35
Read w/spatial filter	1	0.31	0.26
buildings_bbox	1	1.14	0.27

FGB results are extremely close for the spatial filter test, which is the most important one. The GeoPackage results are similar for the small dataset, but completely different for the large dataset. Why GeoPackage with GeoZero is slower than Shapefiles for big datasets has to be analyzed.

Spec v4 goals and features

Dictionary for common strings (#43)
Optimize for parallel I/O (#82)
Defined data boundaries to fx. allow additional data after feature data (probably needed for the above)
Better extensibility with definitions of reserved use (fx. define how to handle unknown/new column types) (#79)
Consider memory model (#106)
Consider attribute data access (alignment, indexable)

Support for circular/arc geometries

TS generic: TypeError: i is not a function

The non-GeoJSON flatgeobuf distribution fails for me (flatgeobuf-geojson.min.js works):

var flatgeobuf_generic = require('flatgeobuf/dist/flatgeobuf.min.js')
var {readFileSync} = require('fs')
var path = '../test/data/ca_a.fgb'
var buf = readFileSync(path)
flatgeobuf_generic.deserialize(buf)

TypeError: i is not a function
    at /Users/kyle/unfolded/sandbox/flatgeobuf-parser/js-test/node_modules/flatgeobuf/dist/flatgeobuf.min.js:1:31696
    at Object.t.deserialize (/Users/kyle/unfolded/sandbox/flatgeobuf-parser/js-test/node_modules/flatgeobuf/dist/flatgeobuf.min.js:1:31720)
    at evalmachine.<anonymous>:1:20
    at Script.runInThisContext (vm.js:120:20)
    at Object.runInThisContext (vm.js:311:38)
    at run ([eval]:1054:15)
    at onRunRequest ([eval]:888:18)
    at onMessage ([eval]:848:13)
    at process.emit (events.js:310:20)
    at emit (internal/child_process.js:876:12)

Tested with this file, with geometry type LINESTRING Z:

ca_a.fgb.zip

Logo proposal

Hello, i am a graphic designer. I would love to contribute a logo free for your project if you would have me.

Add method to add post feature data payloads

Idea from #43 (comment).

Could probably be used for other things too.

[TS/JS] MultiPolygon features can cause unhandled TypeError

Java ref. impl. cannot handle many column/property types

Index optimizations for streaming

Current index structure which is based on Flatbush might not be optimal for streaming traversal. Some observations:

Traversal is backwards as tree has root in the end
Feature offset array is separate from tree (is however nice to find a feature by id)

Perhaps both observations is simply a matter of I/O optimization and strategy.

Incomplete property types support

The list of known types is a bit limited:

https://github.com/bjornharrtell/flatgeobuf/blob/master/src/java/src/main/java/org/wololo/flatgeobuf/geotools/FeatureTypeConversions.java#L44

Should support all Number subtypes, Date, binary if possible, and probably fall back on string for anything else ("For real world code is complex, and full of terrors").

features_count = 0 == unknown ?

https://github.com/bjornharrtell/flatgeobuf/blob/master/src/fbs/header.fbs mentions

  features_count: ulong;        // Number of features in the dataset (0 = unknown)

Is 0 == unknown still true ? I guess this might make sense for a writer that would operate in pure streaming mode, in which case none of the packed Hilbert R-tree indext and Feature offsets index sections should be there. (https://github.com/bjornharrtell/flatgeobuf doesn't indicate that the Feature offsets index section can be optional). As far as I can see, the GDAL driver probably doesn't handle 0 == unknown properly

TS: Need exported flatbuffer files to use generic dist bundle

A bit of a continuation of #67...

I'd like to express that it's difficult currently to use the generic dist bundle (flatgeobuf.min.js), because the only exports are serialize and deserialize.

As described earlier in #67, with the generic endpoint, you have to write your own function to parse each feature. As in the GeoJSON version, you'd have to parse the geometries and properties using something similar to parseProperties and fromGeometry:
https://github.com/bjornharrtell/flatgeobuf/blob/170a59153fe19cbf163fc296c75cf132461a456f/src/ts/geojson/feature.ts#L18-L19

But in order to parse geometries, you need to know the GeometryType:
https://github.com/bjornharrtell/flatgeobuf/blob/170a59153fe19cbf163fc296c75cf132461a456f/src/ts/geojson/geometry.ts#L74-L86

And in order to parse properties, you need to know the ColumnType:
https://github.com/bjornharrtell/flatgeobuf/blob/170a59153fe19cbf163fc296c75cf132461a456f/src/ts/generic/feature.ts#L116-L125

Therefore, flatgeobuf.min.js is rather useless, because you still need to copy all the generated flatbuffer files to do anything.

I'm unable to use anything in lib/ because it's not compiled at all, and I get constant

Uncaught SyntaxError: Cannot use import statement outside a module

errors in Node. (We're working on loaders.gl, a collection of JS file loaders that need to work in both browser and Node.)

In summary, I'd suggest exporting more objects in the generic module. The ol and geojson dist bundles make sense to only have deserialize and serialize exports because there's no customization, but I think at a minimum you should re-export the generated flatbuffer files.

[feature request] get variable names & summary stats

In https://github.com/r-spatial/leafem/blob/master/inst/htmlwidgets/lib/FlatGeoBuf/fgb.js#L173-L200 I am allowing a user to specify field names of a flatgeobuf file to scale different aspects of the visual representation of a leaflet layer depending on the values of those fields.

updateStyle = function(style_obj, feature, scale, scaleValues) {
  var cols = Object.keys(style_obj);
  var vals = Object.values(style_obj);

  var out = {};

  for (i = 0; i < cols.length; i++) {
    if (vals[i] === null) {
      out[cols[i]] = feature.properties[cols[i]];
    } else {
      if (scaleValues !== undefined) {
        //if (Object.keys(feature.properties).includes(vals[i])) {
        if (scaleValues[i] === true) {
          vals[i] = rescale(
            feature.properties[vals[i]]
            , scale[cols[i]].to[0]
            , scale[cols[i]].to[1]
            , scale[cols[i]].from[0]
            , scale[cols[i]].from[1]
          );
        }
      }
      out[cols[i]] = vals[i];
    }
  }

  return out;
};

This can be used to map numeric variables to point size (radius), line width (weight), line opacity (opacity) and fill opacity (fillOpacity). The user will provide the field names in the scaleFields argument and will need to provide some scaling rules in the scale argument. The scale is an object with (up to) 4 key-value pairs for each scaleField, e.g.

{opacity:{to_min:0, to_max:1, from_min:<min_field_value>, from_max:<max_field_value>}}

At the moment this poses two challenges:

I don't know the structure of the attribute table
I don't know the min/max values of the field(s) in question

For 1. I am currently inspecting the first result of deserializeStream to get the field names and compare them to the style object here. The relevant part is:

if (scaleFields === undefined &
                result.value.properties !== undefined) {
              var vls = Object.values(style);
              scaleFields = [];
              vls.forEach(function(name) {
                if (name in result.value.properties) {
                  scaleFields.push(true);
                } else {
                  scaleFields.push(false);
                }
              });
            }

This is fine unless fields are missing from the first result.value.properties because of missing data. The fetch API seems to drop the fields alltogether if the value for the field is missing/undefined. Hence, being able to read field names prior to the fetch call would be great so I can be sure to capture all field names present in the file.

For 2. I am currently relying on user input for from_min and from_max as I don't see any other way of calculating the overall min and max data values for a given field when data is being streamed. So something similar to ogrinfo including min/max values for each field would be great (unless of course the overhead for calculating these will be too large).

Here's an example where I map radius, fillOpacity and weight to 50k points (gif is too large for github, hence the link to the tweet).

https://twitter.com/TimSalabim3/status/1267424569398374403

Hope this is clear enough.

Add GeoTools DataStore

I am interested in adding a Geotools DataStore that would be able to read and write flatgeobuf files. I am working on a Geotools pull request:

jericks/geotools@fc4dc88

but I was wondering if you would like this done as a GeoTools modules or in this repository.

JS: Import as module

I see that your Leaflet/OpenLayers examples load the built source in the script tag, from the /dist/ folder

https://cdn.jsdelivr.net/npm/[email protected]/dist/flatgeobuf-geojson.min.js
https://cdn.jsdelivr.net/npm/[email protected]/dist/flatgeobuf-ol.min.js

However, there's currently no exported object from the module. So if I try to require the package, I get an error:

require('flatgeobuf')
// Error: Cannot find module 'flatgeobuf'

This means that it's not possible now (I think) to use it with a bundler, like Webpack or Browserify.

It would be helpful to export some objects, to be able to do one of

const flatgeobuf = require('flatgeobuf');
import flatgeobuf from "flatgeobuf";

flatgeobuf.deserialize(buffer)

TS/JS usage - fetch features for bounding box

I'm having some trouble figuring out how to use the javascript module to retrieve a subset of features. Is there some way I can parse just the header of a very large flatgeobuf, use the index to identify byte ranges for a given bounding box, and then parse features for those bytes?

Eventually I'd like to stick a large flatgeobuf on s3 and run analysis on subsets of the data, using byte-range GET requests to just grab the data I need. Is that the sort of workflow this format is intended to support?

edit... looking at the format again, it appears I'd have trouble identifying how much of the file's header to retrieve in order to get the entire index. I'd appreciate any thoughts you might have on that as well.

Direct parser to Leaflet

I've investigated Leaflet source to get an idea how to do this but as of yet I don't understand it well enough...

Serialization format

Currently flatbuffers but there might be reasons to reevaluate.

A new interesting contender - https://chronoxor.github.io/FastBinaryEncoding/ and see mapbox/tippecanoe#777 for evidence that flatbuffers isn't optimal in fx. Go.

Support Geometry subclasses in Java

Currently, Geometry subclasses in Java do not appear to be supported.

If I create a subclass like this:

class MyPoint extends Point { public MyPoint(CoordinateSequence coordinates, GeometryFactory factory) { super(coordinates, factory); } }

the GeometryConversions.toGeometryType(MyPoint.class); will throw a RuntimeException "Unknown geometry type".

I think all we need to do is switch around the order of the classes in the isAssignableFrom method:

Instead of

geometryClass.isAssignableFrom(Point.class)

we need

Point.class.isAssignableFrom(geometryClass)

I can work on a PR with tests if this sounds right.

Provide general Column size meta

Should indicate if a Column is of fixed size and in that case the byte length, or if the column is of variable length (and as such has a size prefix),

This would allow new column types to be added without breaking backwards compatibility (which should then be able to simply skip the unknown column type values)

Does not support streaming write

Caused simply by it bailing on featurecount 0, which should be valid for an unindexed flatgeobuf output.

C++ ref impl streaming API

Not sure how to implement. Perhaps need some more library support fx. https://jscheiny.github.io/Streams/.

TS: 3D FlatGeobuf parsed to GeoJSON as 2D

Here only the XY array portion of geometries are parsed to GeoJSON:

https://github.com/bjornharrtell/flatgeobuf/blob/7d9d5d59ee1da99d44da1dddebf2d57c38433e3c/src/ts/geojson/geometry.ts#L63-L76

I'd argue that GeoJSON should include the Z dimension if it exists. From the spec:

An OPTIONAL third-position element SHALL be the height in meters above or below the WGS 84 reference ellipsoid.

Idea: short strings & shared strings

Short string values can be common. It could be decided to have a ShortString type where the string length would be only on a single byte (so strings from 0 to 255 characters).
Variant of the above, have a VarString type where the string length is encoded as a variable-length integer, using Protocol Buffer encoding

There are also situations where many features will share the same value. This is typically true of OSM tag values. This could be handled by having a dictionary of string values, in a zone past the header, and have features pointing to that. This is what is used for mapbox vector tiles.
For example, the most significant bit of the lenght of a string set to 1 could mean "the rest of the bits are the index in the dictionary", and set to 0 "this is the string length, followed by the string value". The writer will have to have some logic to know in advance which strings are to be put in the dictionary and which are not. For cases where the field can only hold a enumerated set of values, this is of course obvious. Actually the zone with the dictionary could possibly be put after all features, so for the sake of generality the header would contain an offset to that zone, so that writers can decide if they put it at the beginning or the end. A streaming writer could still put it at the end by buffering the N first features to establish the dictionary by analyzing if some values are repeated or if they are unique, and use that for the rest of the features.

tippecanoe support

Help @ericfischer out by giving him some more specific details maybe about flatgeobuf please, thx :)

see: mapbox/tippecanoe#777

C++ ref impl allocates and copies memory unnecessarily

Is GeoJSON so slow ?

While I see that flatgeobuf is better than shapefiles, I was surprised by how much faster than GeoJSON it is.
Is that because GeoJSON uses a DOM style method of loading the entire thing first ? Or is it because there is no index in the file itself, requiring a lot of file scanning ?
This isnt really an issue with flatgeobuf itself. I was just wondering if there is a better way to use GeoJSON that doesnt make it look so bad in the benchmarks.

Optimize I/O for JavaScript implementation

Prior art is the general prefetch strategy in GDAL, see https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access.

Geoserver + flatGeobuff + OpenLayers running example

Hi Bjorn,
is there any working example of Openlayers querying geoserver flatgeobuff features ?
I tried to create a vector layer in openlayer using the following code :
const vectorSource = new VectorSource({
//strategy: allStrategy,
strategy: bboxStrategy,
loader: async function(extent) {
const baseUrl = 'http://localhost:8080/geoserver/topp/ows'
const baseParams = 'service=WFS&version=1.0.0&request=GetFeature'
const typeNameParam = 'typeName=' + 'cite%3Aznet_mv_network'
const bboxParam = 'bbox=' + extent.join(',')
const outputFormatParam = 'outputFormat=application/flatgeobuf'
const url = ${baseUrl}?${baseParams}&${typeNameParam}&${bboxParam}&${outputFormatParam}
const response = await fetch(url)
//const response = await fetch('http://localhost:8080/geoserver/topp/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=topp%3Astates&outputFormat=application%2Fflatgeobuf')
let asyncIterator = flatgeobuf.deserializeStream(response.body, ol)
this.clear()
for await (let feature of asyncIterator)
this.addFeature(feature)
}
});

It looks like the "fbs" coming from geoserver are not correctly received or creted in geoserver.

Thanks
Regards

Support binary property type/values

C++ ref impl creates invalid indexes

Webpack issues with entrypoint module

In one dependent project I now (after changes for #52) get this problem:

ERROR in ./node_modules/flatgeobuf/lib/geojson.js
Module not found: Error: Can't resolve 'web-streams-polyfill/ponyfill' in '/home/bjorn/code/dai-edit-frontend/node_modules/flatgeobuf/lib'
@ ./node_modules/flatgeobuf/lib/geojson.js 1:0-63 8:89-103
@ /tmp/broccoli-57758Xe4x9f86Svt9/cache-465-bundler/staging/app.js
@ multi /tmp/broccoli-57758Xe4x9f86Svt9/cache-465-bundler/staging/l.js /tmp/broccoli-57758Xe4x9f86Svt9/cache-465-bundler/staging/app.js⠋ buildingcleaning up...

This seems more of a webpack issue than anything else.. and I don't know how to fix it other than backtracking on the entry point module as I don't really see the use for it myself.

@kylebarron what is your take on this issue?

Missing support for MultiPolygon and more

more = GC and mixed collections

Project FlatBuffers.Core.csproj seems missing

Project FlatBuffers.Core.csproj seems missing from flatgeobuf/src/net/ directory but referenced in solution file FlatGeobuf.sln

(unless I missed something)

Feature properties data is unaligned

Undecided if this is a spec issue or something that implementations should handle.

Support 4D coords

As discussed in https://lists.osgeo.org/pipermail/proj/2019-June/008657.html it would be good if any future proof format of spatial features can explicitly represent temporal dimension i.e 4D.

By providing well known meta for possible dimensions X,Y,Z,M and T for a coordinate we can support 4D.

Open questions:

Does unix timestamp work as temporal dimension?
Can X,Y vs Z possibly/reasonably need separate temporal dimensions?

Using library with Java 8 leads to ByteBuffer NoSuchMethodErrors

When using the flatgeobuf jar published in maven with Java 8, you get NoSuchMethodErrors when ByteBuffers are used.

java.lang.NoSuchMethodError: java.nio.ByteBuffer.position(I)Ljava/nio/ByteBuffer;

This is a fairly well known issue:

geotools/geotools#2878

https://issues.apache.org/jira/browse/UIMA-5904

jetty/jetty.project#3244

But instead of adding casts, we can just use a maven profile that adds the release 8 configuration when compiling with Java 9 or above.

JS: Parse directly to flat array?

Is https://github.com/bjornharrtell/flatgeobuf/blob/master/src/ts/geojson.ts the current JS/TS API? I'm curious if it's possible to access the parsed data without creating individual Feature objects.

In particular, when using Deck.gl, a high performance GPU-accelerated geospatial visualization library, performance is best when it's possible to keep geometries as flat typed arrays, since 1) you don't have to pay the time cost for individual object creation and 2) the data needs to be in flat typed arrays to be uploaded to the GPU.

Given what I understand of the format, it would seem possible to read the metadata and create flat typed arrays very fast. I presume the number of coordinates of each feature is known from the initial metadata?

Extended column metadata

As per current spec the metadata for a column consists of only name and type (https://github.com/bjornharrtell/flatgeobuf/blob/bb80f92a6d8d6ff277da7b7d590529e8f0c1fe83/src/fbs/header.fbs#L44-L46). It could be helpful to optionally be able to provide more details, fx. length/width/precision and nullability.

This could probably be added without breaking spec v3 backward compatibility.

Decide on geometry binary representation

Currently this is a custom flatbuffers based representation. The advantage is that specification and implementation are simplified by courtesy of flatbuffer schema and code generation, which also guarantees that the data is aligned for direct access. The main drawback is that it non-standard.

The alternative is WKB. AFAIK it does not look like WKB is suitable for direct access though.

Excessive point serialization overhead?

Serializing a (2d) point uses 136 bytes (which in the current implementation is held in a 144-byte capacity byte[]). A WKB serialization takes 21 bytes. A WKB that is padded for alignment would be 24 bytes (endian[1], type[4], padding[3], x[8], y[8]).

Assuming 8 magic bytes and about 13 bytes for the header, could the size be closer to 40 bytes? (magic[8] + header_plus_padding[16] + x[8] + y[8])?

This is relevant if you are parsing a large column of geometries; it would reduce network IO and memory usage by ~66%.

flatgeobuf / flatgeobuf Goto Github PK

flatgeobuf's Introduction

FlatGeobuf

Examples

Specification

Performance

Optimizing Remotely Hosted FlatGeobufs

Features

Supported applications / libraries

Documentation

TypeScript / JavaScript

Prebuilt bundles (intended for browser usage)

Node usage

FAQ

Why not use WKB geometry encoding?

Why not use Protobuf?

Why not use compression as part of the format?

Why am I not getting expected performance in GDAL?

What about MapBox Vector Tiles?

Why does it not work with create-react-app?

Does FlatGeobuf support mixing features with and without geometry with spatial index?

flatgeobuf's People

Contributors

Stargazers

Watchers

Forkers

flatgeobuf's Issues

Performance tuning

Shapefile/GDAL

FlatGeobuf

PostGIS

Comparison

Recommend Projects

Recommend Topics

Recommend Org