Error in example - wrong road layer

Howdy Folks - I just noticed a small error in the example code where the roads layer from mapbox-streets is wrong:

layers: ['roads'] should in fact be layers: ['road']

Similarly in buffer.js it should be:

module.exports = function (tileLayers, opts, done) {
  var road = tileLayers.streets.road;
  var bufferedRoad = turf.buffer(road, 20, 'meters');
  done(null, bufferedRoad);
};

documentation holes

We definitely need to document raw - we don't even mention it right now.

Optionally, we may want to add a section about how to optimize (use raw, use rbush for lots of intersections, etc) and talk about the effect of tiles with buffers and how to generate custom mbtiles without buffers

support other sources

This may end up being irrelevant depending on new architecture, but if we keep data source configuration centralized, we should think about supporting more sources. Maybe use tilelive, so supporting other sources is trivial / zero changes to tile reduce internals

Background: I'm trying to do OSM stats by country. This means that I either:

  • Need to split jobs up by country, OR
  • Need to know what country a tile is in during each worker's step of reduce.

Using Geojson-vt and Natural Earth Admin 0 boundaries, I can easily figure out which countries are present in a tile, but currently that requires me to do geojsonvt + fs.readFile inside every single worker

flexible cover scoping

I think we should make scoping jobs extremely flexible:

  • bbox
  • polygon
  • anything geojson - for example, feed in a feature collection of census block group points, so only areas where people live are computed; feed in a collection of roads, so only tiles that contain road networks are processed
  • tiles - feed in a list of tiles so custom filters can be precomputed; the tiles can be down sampled or up sampled extremely quickly on the fly using recursive calls to tilebelt.getParent and tilebelt.getChildren

This will be pretty simple to support with tile-cover and tilebelt, and the type of cover can be implicitly classified automatically (given this list anyway).
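As a rough illustration of the implicit classification mentioned above, here is a minimal sketch. `classifyCover` is a hypothetical helper, not part of tile-reduce, tile-cover, or tilebelt, and it assumes a bbox is a flat array of four numbers and a tile list is an array of `[x, y, z]` arrays.

```javascript
// Hypothetical helper: guesses which kind of cover was passed in.
// Assumes bbox = [west, south, east, north] and tiles = [[x, y, z], ...].
function classifyCover(cover) {
  if (Array.isArray(cover)) {
    var isNumber = function (n) { return typeof n === 'number'; };
    if (cover.length === 4 && cover.every(isNumber)) return 'bbox';
    var isTile = function (t) { return Array.isArray(t) && t.length === 3; };
    if (cover.length > 0 && cover.every(isTile)) return 'tiles';
    return 'unknown';
  }
  if (cover && typeof cover === 'object' && typeof cover.type === 'string') {
    if (cover.type === 'Polygon') return 'polygon';
    if (cover.type === 'Feature' && cover.geometry &&
        cover.geometry.type === 'Polygon') return 'polygon';
    return 'geojson'; // any other GeoJSON object
  }
  return 'unknown';
}
```

A list of four tiles is still classified as tiles rather than a bbox, because its members are arrays and not numbers.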

It seems pretty obvious that bbox and polygon should be supported. Does it make sense to support arbitrary geojson objects (given that tile-cover can handle these already), and tiles which will provide granular control + index caching?

@aaronlidman @MateoV

browser support

It should be possible to make this work in modern browsers via web workers. To do this:

  • factor out getVectorTile, and provide a browser version that uses xhr instead of request. (Note: I think the browser will handle the gzip transparently.)
  • use webworkify as a substitute for child_process.fork
  • tests (how??)

Memory leak

When you're not accumulating results on reduce events, the memory consumption still creeps up so you can easily go out of memory on a large number of tiles. This doesn't look right; there's probably a big memory leak somewhere.

Passing options to workers

I'm thinking about using tile-reduce to power a little utility module, where I'd need to pass options along from the main process to the workers. Could the map module get passed the serialized tile reduce options as one of its initial args?

Cannot select layers from a remote source

In, there is an example on URL sources:

sources: [
  {
    name: 'streets',
    url: '{z}/{x}/{y}.vector.pbf',
    layers: ['roads'],
    maxrate: 10
  }
]

I expected only the roads layer to be transferred. However, I still get all layers in streets. The layers option seems to have no effect.

read files from disk

This feature would allow for caching or pre-downloading a region, which would speed up jobs that use tons of HTTP requests. I'm thinking that a file path with the usual {x} {y} {z} would suffice.
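A minimal sketch of what the path templating could look like, assuming tiles are `[x, y, z]` arrays and that a missing file simply means "no data for this tile". `tilePath` and `readTile` are hypothetical names, not existing tile-reduce functions.

```javascript
var fs = require('fs');

// Fill a template such as '/tiles/{z}/{x}/{y}.vector.pbf' for tile [x, y, z].
function tilePath(template, tile) {
  return template
    .replace(/\{x\}/g, String(tile[0]))
    .replace(/\{y\}/g, String(tile[1]))
    .replace(/\{z\}/g, String(tile[2]));
}

// Read a tile from disk; a missing file is reported as null data, not an error.
function readTile(template, tile, done) {
  fs.readFile(tilePath(template, tile), function (err, buffer) {
    if (err && err.code === 'ENOENT') return done(null, null);
    done(err || null, buffer);
  });
}
```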

persistent reduce queue

I think the reducer should be fired when all of the map operations are complete. This will slow things down a tiny amount, but not significantly in most cases. It will also eliminate race conditions, and will allow for much better reliability (internet goes down during a job, event gets "lost" for whatever reason, etc.). This will also allow for anonymous reducers off the client's machine that can be run whenever necessary, or even incrementally updated.

There are a few possibilities for how this should be stored, but I am leaning towards dynamo (or dynalite for local jobs, if it is robust enough. If it's not, then we can use leveldb).

I am still thinking through whether or not we should still have the reduce event at all. It could be useful in some form for progress updates, but if that's all we use it for, it could simply send back the percent complete, and the tile processed.

cc @rclark


  • start: reports that the cover has been computed for mapping jobs and sends back the tiles that will be processed
  • reduce: called to return the results of each tile; this should be used for incremental computations and fast accumulating values, rather than heavy computation (that should be done in the worker)
  • end: job is complete, so any results can be tied together and output if necessary
  • error: send back any errors so they can be handled or thrown
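To make the event flow above concrete, here is a small sketch built on Node's `EventEmitter`. `createJob` is a hypothetical stand-in that only mimics the order of the events; it does not fetch or decode tiles and is not the tile-reduce implementation.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical stand-in for a job: emits start, one reduce per tile, then end.
function createJob(tiles, mapFn) {
  var job = new EventEmitter();
  job.run = function () {
    job.emit('start', tiles);
    tiles.forEach(function (tile) {
      var result;
      try {
        result = mapFn(tile); // heavy work belongs here, in the worker
      } catch (err) {
        job.emit('error', err);
        return;
      }
      job.emit('reduce', result, tile);
    });
    job.emit('end');
  };
  return job;
}

var total = 0;
var log = [];
var job = createJob([[0, 0, 1], [1, 0, 1]], function (tile) {
  return tile[0] + 1; // pretend each tile yields a small count
});

job.on('start', function (tiles) { log.push('start:' + tiles.length); });
job.on('reduce', function (result) { total += result; }); // cheap accumulation only
job.on('end', function () { log.push('end:' + total); });
job.on('error', function (err) { log.push('error:' + err.message); });
job.run();
```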

processors should be async

Processors may need to use async resources (eg: tile buffer crawling, c++ libs, etc.). For this to be possible, we need a standard node callback interface, instead of the current sync interface.
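A sketch of how the callback interface could sit next to the current sync one. `toAsync` is a hypothetical adapter that assumes a processor taking two or more arguments already expects a node-style callback.

```javascript
// Current style: the processor returns its result directly.
function countSync(tile) {
  return tile[2];
}

// Proposed style: the processor reports through a node-style callback.
function countAsync(tile, done) {
  setImmediate(function () { done(null, tile[2]); });
}

// Hypothetical adapter so existing sync processors keep working.
function toAsync(processor) {
  if (processor.length >= 2) return processor; // already takes a callback
  return function (tile, done) {
    var result;
    try {
      result = processor(tile);
    } catch (err) {
      return done(err);
    }
    done(null, result);
  };
}
```

With an adapter like this, the worker could always call `toAsync(processor)(tile, done)` regardless of which style the processor was written in.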

optionally hit worker even if some sources didn't hold data for the tile

Per this code:

for (var i = 0; i < results.length; i++) {
  data[sources[i].name] = results[i];
  if (!results[i]) return process.send({reduce: true});
}

the worker bails out and returns a reduce event if any source doesn't have data for the requested tile. This is usually great, but in some cases where you want to compare disparate data sources and are relying on reduce events to send back information about how much data from each source does or doesn't exist in a tile, you end up losing information.

For example, say I want to find the length of roads in San Francisco that are matched by GPS datapoints. I would like to keep a tally of the total length of road in the bbox, as well as how much is matchable by GPS points. Right now, if there is no GPS data in the tile, we bail out, so I'm missing some of the total length information.

To maintain compatibility and provide optimization for the usual cases where you want this bail-out behavior, I'm proposing we add a tile-reduce option for this, maybe requireAllSources: false (defaulted true).
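A sketch of how the loop could honor such an option. `collectSourceData` is a hypothetical helper; returning `null` stands in for the current bail-out, where the caller would send the empty reduce event.

```javascript
// Returns null when the worker should bail out, otherwise the data per source.
// requireAllSources defaults to true, which keeps today's behavior.
function collectSourceData(sources, results, options) {
  var requireAll = !options || options.requireAllSources !== false;
  var data = {};
  for (var i = 0; i < results.length; i++) {
    if (!results[i] && requireAll) return null; // caller sends {reduce: true}
    data[sources[i].name] = results[i] || null; // missing source stays visible
  }
  return data;
}
```

With `requireAllSources: false`, the map function would receive `null` for a source that has no data in the tile and could still tally the others.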

cc @morganherlocker @aaronlidman @mourner

example code doesn't match format of latest-planet.mbtiles

I downloaded the latest planet mbtiles from

The count example works with the included data set, but not with the 22gb planet mbtiles.

The example code uses the key value to count, e.g. to count buildings.

  "vector_layers": [
    {
      "id": "buildings",
      "description": "",
      "minzoom": 15,
      "maxzoom": 15,
      "fields": {
        "id": "Number",
        "osm_id": "Number",
        "type": "String",
        "name": "String"
      }
    },
    {
      "id": "roads",
      "description": "",
      "minzoom": 15,
      "maxzoom": 15,
      "fields": {
        "id": "Number",
        "osm_id": "Number",
        "type": "String",
        "name": "String",
        "tunnel": "Number",
        "bridge": "Number",
        "oneway": "Number",
        "z_order": "Number",
        "class": "String",
        "access": "String",
        "service": "String",
        "ref": "String"
      }
    }
  ]

However, in the planet mbtiles the key always seems to be osm, and the fields contain the tags. I'm not able to figure out how to convert the examples to work with the full dataset because the data structure is different, i.e. building is a field, not a key.

  "vector_layers": [
      "id": "osm",
      "description": "",
      "minzoom": 12,
      "maxzoom": 12,
      "fields": {
        "_osm_way_id": "Number",
        "_version": "Number",
        "_changeset": "Number",
        "_uid": "Number",
        "_user": "String",
        "_timestamp": "Number",
        "hires": "String",
        "hires:checkdate": "String",
        "hires:imagery": "String",
        "source": "String",
        "boat": "String",
        "highway": "String",
        "name": "String",
        "note": "String",
        "name:en": "String",
        "waterway": "String",
        "natural": "String",
        "width": "String",
        "boundary": "String",
        "maritime": "String",
        "admin_level": "String",
        "border_type": "String",
        "water": "String",
        "source:name": "String",
        "power": "String",
        "building": "String",

I'd appreciate any guidance. I'm by no means a developer, but I'm really interested in using this code for some data analysis.


Let's build a simple script that benchmarks count & road diff on a small/moderate area. Then we can keep an eye on general perf, and feel out whether newer node versions will bump our perf.

request throttling

We should have an option for max worker tile requests per second, along with a conservative default. If we set this to 50/sec, we could safely say that the max with compositing + 4 cores would be ~1k total per second.
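A minimal sketch of a per-worker throttle, assuming requests are spaced evenly rather than sent in bursts. `createThrottle` is a hypothetical helper; the clock is passed in so the spacing is easy to reason about.

```javascript
// Returns a function that, given the current time in ms, says how long the
// next request must wait so that no more than maxPerSecond are started.
function createThrottle(maxPerSecond) {
  var interval = 1000 / maxPerSecond; // e.g. 50/sec -> one request per 20 ms
  var nextSlot = 0;
  return function waitTime(now) {
    var start = Math.max(now, nextSlot);
    nextSlot = start + interval;
    return start - now;
  };
}
```

A worker could then schedule each request with something like `setTimeout(sendRequest, waitTime(Date.now()))`, where `sendRequest` is whatever function issues the tile request.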

Nextgen: dealing with interleaved stdout output

I think I have a good plan for the interleaved output bug. It's clear that we need to pipe processes to the main thread so that the output is done by a single process. My luck with diff on Node 0.12 was probably due to its new feature of stream corking/uncorking (buffering writes) by default, which possibly made the actual writes to stdout happen less often.

Even when piping, many worker streams are still piped to stdout at the same time and each worker pipes buffer chunks instead of logical pieces of output, so interleaved output still happens. To fix it, we need to make sure that we pipe to stdout in logical bits so that output from one tile is never split into several chunks.

We can do that by splitting each stream on a tile-by-tile basis before piping to main stdout. Splitting by linebreaks is not ideal since you may not have linebreaks at all (e.g. if you use process.stdout.write in each tile), and you may have many linebreaks in each tile output which we don't want to split by (it can get interleaved). Additionally, after you split, you have to re-add a linebreak to each chunk, which is an additional performance overhead.

Instead, we could manually write an RS ASCII character (0x1e, borrowed from JSON text sequences spec) after each map fn run in worker.js, and then split by the character. This way we split only per tile, and do not have to append anything to each chunk. Additionally, we can minimize the performance overhead of splitting by using binary-split instead of split, since we don't need string conversion to control the output.

The only limitation that we'd have to impose with this approach is stating in the docs that you MUST output anything just before calling the done callback (and not in a different process tick if the map function is async).
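The record-separator idea described above can be sketched with plain Buffers. This is not binary-split itself, just a hand-rolled splitter showing the behavior we'd rely on; `createSplitter` is a hypothetical name.

```javascript
var RS = 0x1e; // ASCII record separator written after each tile's output

// Returns a write(chunk) function that calls onRecord once per complete
// tile output, even when a record is spread over several chunks.
function createSplitter(onRecord) {
  var pending = [];
  return function write(chunk) {
    var start = 0;
    var index;
    while ((index = chunk.indexOf(RS, start)) !== -1) {
      pending.push(chunk.slice(start, index));
      onRecord(Buffer.concat(pending)); // one whole tile, linebreaks intact
      pending = [];
      start = index + 1;
    }
    if (start < chunk.length) pending.push(chunk.slice(start));
  };
}
```

Because records stay as Buffers, nothing is converted to a string and nothing has to be appended before writing to stdout.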

Alternatively, we could introduce a special API, e.g. another argument to done like this:

module.exports = function(data, tile, done) {
  done(null, data.osm.osm.length, "My output");
};

Another future problem that may arise is when you want to stream binary output (which may contain 0x1e byte), e.g. streaming PNG raster files. But you could probably deal with this in an alternative way, e.g. providing an option to split by a different sequence of characters (each PNG starts with a unique set of bytes).

This is a tricky problem to tackle, but this seems like an acceptable solution.

cc @tcql @morganherlocker

TileReduce or Tile Reduce

Simple question, but we should nail it down: Should we CamelCase or just do two words? I've been using Tile Reduce in blog posts but it would make sense to follow the lead of MapReduce - as much as I'm not sure why the world ever started to CamelCase outside of programming languages ;-)

@morganherlocker @tcql @mourner ?

Invalid GeoJSON Polygons passed to map step

In some cases I'm seeing invalid GeoJSON Polygons passed to the map step. It looks like features that consist of multiple exterior polygons are being converted from vector tiles to a GeoJSON Polygon instead of a GeoJSON MultiPolygon.

Then, when these Polygons are used with turf.intersect(), it throws a "TopologyError: side location conflict" exception.

Here's a test case that shows the problem.

Input file: dc.json

Convert to MBTiles using Tippecanoe:

$ tippecanoe -f -o dc.mbtiles -Z 15 -z 15 -b 0 -ps dc.json

Decode one tile with tippecanoe-decode:

$ tippecanoe-decode dc.mbtiles 15 9378 12535
{ "type": "FeatureCollection", "features": [
{ "type": "Feature", "properties": { "STFIPS": "11", "CTFIPS": "11001", "STATE": "District of Columbia", "COUNTY": "District of Columbia" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -76.965699, 38.897320 ], [ -76.968470, 38.893357 ], [ -76.968760, 38.892036 ], [ -76.968237, 38.891032 ], [ -76.970214, 38.891032 ], [ -76.970214, 38.899582 ], [ -76.965852, 38.899582 ], [ -76.965699, 38.897320 ] ] ], [ [ [ -76.965710, 38.891032 ], [ -76.966973, 38.891032 ], [ -76.966501, 38.892122 ], [ -76.966000, 38.894021 ], [ -76.965710, 38.891032 ] ] ], [ [ [ -76.962199, 38.897320 ], [ -76.962100, 38.896821 ], [ -76.963985, 38.891032 ], [ -76.965222, 38.891032 ], [ -76.965699, 38.897320 ], [ -76.964200, 38.899582 ], [ -76.962481, 38.899582 ], [ -76.962199, 38.897320 ] ] ], [ [ [ -76.959227, 38.891032 ], [ -76.962204, 38.891032 ], [ -76.961099, 38.896618 ], [ -76.961132, 38.896812 ], [ -76.961199, 38.897217 ], [ -76.961703, 38.898319 ], [ -76.961169, 38.899509 ], [ -76.961137, 38.899582 ], [ -76.959227, 38.899582 ], [ -76.959227, 38.891032 ] ] ] ] } }
] }

Note that the output is correctly a MultiPolygon (containing 4 Polygons). See

Run through a test TileReduce:

This simply processes the one tile of interest (15 9378 12535), outputs the feature in the tile, and attempts a turf.intersect() which throws an exception. Note that the feature as passed to the map function is a Polygon, not a MultiPolygon as tippecanoe-decode produces for the same tile.

Converting the Polygon to a MultiPolygon manually allows the turf.intersect() to work.
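For reference, the manual conversion can be sketched as below. It assumes that rings wound the same way as the first ring are exteriors and that rings wound the opposite way are holes belonging to the preceding exterior. `polygonToMultiPolygon` is a hypothetical helper, not something tile-reduce or turf provides.

```javascript
// Shoelace-style sum; only the sign is used, to compare winding directions.
function signedArea(ring) {
  var sum = 0;
  for (var i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    sum += (ring[j][0] - ring[i][0]) * (ring[i][1] + ring[j][1]);
  }
  return sum;
}

// Regroup the rings of a Polygon into a MultiPolygon.
function polygonToMultiPolygon(geometry) {
  if (geometry.type !== 'Polygon') return geometry;
  var polygons = [];
  var exteriorSign = 0;
  geometry.coordinates.forEach(function (ring) {
    var area = signedArea(ring);
    if (area === 0) return; // skip degenerate rings
    var sign = area > 0 ? 1 : -1;
    if (exteriorSign === 0) exteriorSign = sign; // first ring is an exterior
    if (sign === exteriorSign) polygons.push([ring]);
    else polygons[polygons.length - 1].push(ring); // hole of the last exterior
  });
  return { type: 'MultiPolygon', coordinates: polygons };
}
```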

$ ./tilereduce_test.js 
Starting up 8 workers... Job started.
Processing 1 tiles.
1 tiles processed in 0s.
map tile [9378,12535,15]
feature = { type: 'Feature',
   { type: 'Polygon',
      [ [ [ -76.96570068597794, 38.89732062336043 ],
          [ -76.96847140789032, 38.89335845766496 ],
          [ -76.96876108646393, 38.89203699076319 ],
          [ -76.96823805570602, 38.89103282648847 ],
          [ -76.97021484375, 38.89103282648847 ],
          [ -76.97021484375, 38.89958342598271 ],
          [ -76.96585357189178, 38.89958342598271 ],
          [ -76.96570068597794, 38.89732062336043 ] ],
        [ [ -76.965711414814, 38.89103282648847 ],
          [ -76.96697473526001, 38.89103282648847 ],
          [ -76.96650266647339, 38.8921225841508 ],
          [ -76.9660010933876, 38.89402231327574 ],
          [ -76.965711414814, 38.89103282648847 ] ],
        [ [ -76.9622004032135, 38.89732062336043 ],
          [ -76.96210116147995, 38.89682171160487 ],
          [ -76.96398675441742, 38.89103282648847 ],
          [ -76.96522325277328, 38.89103282648847 ],
          [ -76.96570068597794, 38.89732062336043 ],
          [ -76.96420133113861, 38.89958342598271 ],
          [ -76.96248203516006, 38.89958342598271 ],
          [ -76.9622004032135, 38.89732062336043 ] ],
        [ [ -76.959228515625, 38.89103282648847 ],
          [ -76.96220576763153, 38.89103282648847 ],
          [ -76.9611006975174, 38.89661922340707 ],
          [ -76.96113288402557, 38.89681336158753 ],
          [ -76.96119993925095, 38.89721833629872 ],
          [ -76.96170419454575, 38.89832052381641 ],
          [ -76.96117043495178, 38.89951036613891 ],
          [ -76.9611382484436, 38.89958342598271 ],
          [ -76.959228515625, 38.89958342598271 ],
          [ -76.959228515625, 38.89103282648847 ] ] ] },
   { STFIPS: '11',
     CTFIPS: '11001',
     STATE: 'District of Columbia',
     COUNTY: 'District of Columbia' } };
square = { type: 'Feature',
   { type: 'Polygon',
      [ [ [ -76.965, 38 ],
          [ -76, 38 ],
          [ -76, 38.895 ],
          [ -76.965, 38.895 ],
          [ -76.965, 38 ] ] ] },
  properties: {} };
*** turf.intersect exception: TopologyError: side location conflict [ (-76.96570068597794, 38.89732062336043) ]
converting to MultiPolygon
feature = { type: 'Feature',
   { type: 'MultiPolygon',
      [ [ [ [ -76.96570068597794, 38.89732062336043 ],
            [ -76.96847140789032, 38.89335845766496 ],
            [ -76.96876108646393, 38.89203699076319 ],
            [ -76.96823805570602, 38.89103282648847 ],
            [ -76.97021484375, 38.89103282648847 ],
            [ -76.97021484375, 38.89958342598271 ],
            [ -76.96585357189178, 38.89958342598271 ],
            [ -76.96570068597794, 38.89732062336043 ] ] ],
        [ [ [ -76.965711414814, 38.89103282648847 ],
            [ -76.96697473526001, 38.89103282648847 ],
            [ -76.96650266647339, 38.8921225841508 ],
            [ -76.9660010933876, 38.89402231327574 ],
            [ -76.965711414814, 38.89103282648847 ] ] ],
        [ [ [ -76.9622004032135, 38.89732062336043 ],
            [ -76.96210116147995, 38.89682171160487 ],
            [ -76.96398675441742, 38.89103282648847 ],
            [ -76.96522325277328, 38.89103282648847 ],
            [ -76.96570068597794, 38.89732062336043 ],
            [ -76.96420133113861, 38.89958342598271 ],
            [ -76.96248203516006, 38.89958342598271 ],
            [ -76.9622004032135, 38.89732062336043 ] ] ],
        [ [ [ -76.959228515625, 38.89103282648847 ],
            [ -76.96220576763153, 38.89103282648847 ],
            [ -76.9611006975174, 38.89661922340707 ],
            [ -76.96113288402557, 38.89681336158753 ],
            [ -76.96119993925095, 38.89721833629872 ],
            [ -76.96170419454575, 38.89832052381641 ],
            [ -76.96117043495178, 38.89951036613891 ],
            [ -76.9611382484436, 38.89958342598271 ],
            [ -76.959228515625, 38.89958342598271 ],
            [ -76.959228515625, 38.89103282648847 ] ] ] ] },
   { STFIPS: '11',
     CTFIPS: '11001',
     STATE: 'District of Columbia',
     COUNTY: 'District of Columbia' } };
intersect = { type: 'Feature',
  properties: {},
   { type: 'MultiPolygon',
      [ [ [ [ -76.96142100332834, 38.895 ],
            [ -76.959228515625, 38.895 ],
            [ -76.959228515625, 38.89103282648847 ],
            [ -76.96220576763153, 38.89103282648847 ],
            [ -76.96142100332834, 38.895 ] ] ],
        [ [ [ -76.965, 38.89103282648847 ],
            [ -76.965, 38.895 ],
            [ -76.96269454111474, 38.895 ],
            [ -76.96398675441742, 38.89103282648847 ],
            [ -76.965, 38.89103282648847 ] ] ] ] } };

Tile encoding troubles for disconnected road detection

What I thought before was spherical geometry precision problems I now think is really tile-encoding problems.

Example: The real location is lat="38.9347951" lon="-77.0533697"

In the through way it is encoded at z12 as [-77.05332040786743,38.93477700153804], [1247,1476]

In the way that ends there it is encoded at z12 as [-77.05336332321167,38.93479369264057], [1245,1475]

Or at least that's what it looks like. I would have expected tile encoding to drop nodes but not to relocate them.
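To check what the encoder should produce, here is a sketch of the usual Web Mercator quantization, assuming the bracketed pairs above are in-tile coordinates on a 4096-unit extent (the issue does not say so explicitly). `lonLatToTilePixel` is a hypothetical helper.

```javascript
// Project lon/lat to a tile [x, y, z] plus integer coordinates inside that
// tile, using the standard Web Mercator tiling formulas.
function lonLatToTilePixel(lon, lat, zoom, extent) {
  var n = Math.pow(2, zoom);
  var x = (lon + 180) / 360 * n;
  var sinLat = Math.sin(lat * Math.PI / 180);
  var y = (0.5 - Math.log((1 + sinLat) / (1 - sinLat)) / (4 * Math.PI)) * n;
  var tileX = Math.floor(x);
  var tileY = Math.floor(y);
  return {
    tile: [tileX, tileY, zoom],
    pixel: [
      Math.round((x - tileX) * extent),
      Math.round((y - tileY) * extent)
    ]
  };
}

var encoded = lonLatToTilePixel(-77.0533697, 38.9347951, 12, 4096);
```

Under that assumption, one grid step at z12 is 360 / (4096 * 4096), roughly 0.0000215 degrees of longitude, so the two decoded longitudes quoted above sit about two grid steps apart.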

Change mapOptions from global to a worker parameter

This would be a breaking change. At the moment mapOptions are sent to workers as global objects. I am pretty sure this is safe, since workers should not share globals across processes, however, I think it is better to be explicit. I propose we remove the global assignment and add another parameter to our worker functions. The new interface would look like this:

function(data, tile, write, opts, done) {}
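A sketch of a worker written against the proposed signature, with a tiny harness standing in for the worker process. The option names `source` and `layer` are made up for illustration.

```javascript
// Worker using the proposed interface: options arrive as an argument
// instead of being read from a global.
function map(data, tile, write, opts, done) {
  var layers = data[opts.source] || {};
  var features = layers[opts.layer] || [];
  write(tile.join('/') + ' ' + features.length + '\n');
  done(null, features.length);
}

// Harness: collects what the worker writes and what it passes to done.
var output = [];
var result = null;
map(
  { streets: { road: ['a', 'b', 'c'] } },
  [9378, 12535, 15],
  function (chunk) { output.push(chunk); },
  { source: 'streets', layer: 'road' },
  function (err, value) { result = value; }
);
```

Since nothing is read from globals, the same function can be exercised in a plain unit test without forking a process.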


cc @tcql @mourner @aaronlidman @MateoV

Access token?

Looks like the example code is missing an access token in the vtile url that prevents it from being usable.

npm test fails under Windows

I ran npm test on tile-reduce 3.0. It fails on Windows 10 with node v5.1.0, while it succeeds on Ubuntu, which is strange. Here is the error message:

  1) test/test.count.js count implementation, mbtiles cover found all features in overlapping mbtiles:

      Error: found all features in overlapping mbtiles
      + expected - actual


      at EventEmitter.<anonymous> (test\test.count.js:53:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at (internal/child_process.js:440:11)

  2) test/test.count.js count implementation, explicit mbtiles cover found all features in overlapping mbtiles:

      Error: found all features in overlapping mbtiles
      + expected - actual


      at EventEmitter.<anonymous> (test\test.count.js:72:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at (internal/child_process.js:440:11)

  3) test/test.count.js count implementation, tileStream cover found all features in listed tiles:

      Error: found all features in listed tiles
      + expected - actual


      at EventEmitter.<anonymous> (test\test.count.js:91:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at (internal/child_process.js:440:11)

Output results to mapbox studio

Hi, I am looking for suggestions. Recently, I have been using tile-reduce to compute some statistics on osm-qa-tiles. I want to calculate a metric (e.g. road density) on each tile at zoom 12, then output it to mapbox studio for visualization. The problem is how to store the result. As zoom 12 has about 16 million tiles, GeoJSON, MBTiles, or UTF8Grids may be too large for storage. So I hope to get some advice from you, as you are experts in this field. :pray:

is tilereduce a class or a function?

  • classes should be uppercased and initialized
  • functions should be lowercased

The example has a confusing invocation of tilereduce:

var TileReduce = new require('tile-reduce');
var tilereduce = TileReduce(bbox, opts);

A more common expectation would be

var TileReduce = require('tile-reduce');
var tilereduce = new TileReduce(bbox, opts);
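If we want both invocation styles to keep working, one common pattern is a constructor that tolerates being called without `new`. This is a sketch only; the stored properties are placeholders and not the real tile-reduce internals.

```javascript
// Constructor that also works when called as a plain function.
function TileReduce(bbox, opts) {
  if (!(this instanceof TileReduce)) return new TileReduce(bbox, opts);
  this.bbox = bbox;       // placeholder state for the sketch
  this.opts = opts || {};
}

var viaNew = new TileReduce([-77.1, 38.8, -76.9, 39.0], { zoom: 15 });
var viaCall = TileReduce([-77.1, 38.8, -76.9, 39.0], { zoom: 15 });
```

Both variables end up holding a `TileReduce` instance, so the docs could pick one style without breaking the other.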

document api

There is an example now, but there should also be explicit docs.

  • What is a valid coverArea?
  • What goes into the options object?

optional extra throttle

Algorithms that involve crawling tiles may make extra requests. This option would allow for throttling beyond the default 200/sec limit to account for this.

to-fix example

@aaronlidman can you sketch out what to-fix input would look like? My understanding is that if we output a csv where each row holds geometry (WKT?), to-fix should be able to handle this. Do we need to make a custom plugin for each type of task with its own UI?

For the purpose of this example, let's say we had a tile-reduce job that output a collection of geojson points where there were disconnected major roads identified. What would be the best way to get this data into to-fix?

cc @lxbarth @ericfischer

