
tile-reduce's Introduction

TileReduce

TileReduce is a geoprocessing library that implements MapReduce to let you run scalable distributed spatial analysis using JavaScript and Mapbox Vector Tiles. TileReduce coordinates tasks across all available processors on a machine, so your analysis runs lightning fast.

Install

npm install @mapbox/tile-reduce

Usage

A TileReduce processor is composed of two parts: the "map" script and the "reduce" script. The "map" script contains the expensive per-tile processing you want to distribute, while the "reduce" script handles the quick aggregation step.

'map' script

The map script operates on each individual tile. Its purpose is to receive one tile at a time, analyze or process that tile, and write data or send results back to the reduce script.

See the count example processor's map script
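
A minimal sketch of a map script, modeled on the count example. The source name osmdata and layer name osm are assumptions, as is the exact shape of sources (assumed here to be GeoJSON FeatureCollections keyed by source name, then layer name, matching the non-raw default described below):

// map.js - a minimal sketch; source and layer names are assumptions
module.exports = function (sources, tile, write, done) {
  // with the default (non-raw) sources, each layer is a GeoJSON FeatureCollection
  var layer = sources.osmdata.osm;
  var count = layer ? layer.features.length : 0;

  // write() streams data to the main process (process.stdout or the `output` stream);
  // here we just send a result back for the reduce step
  done(null, count);
};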

'reduce' script

The reduce script serves both to initialize TileReduce with job options, and to handle reducing results returned by the map script for each tile.

See the count example processor's reduce script
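
A minimal sketch of a reduce script assembled from the options and events documented below (the MBTiles path and bounding box are placeholders):

// reduce.js - a minimal sketch using options and events documented in this README
var path = require('path');
var tilereduce = require('@mapbox/tile-reduce');

var count = 0;

tilereduce({
  zoom: 15,
  map: path.join(__dirname, 'map.js'),
  sources: [
    {
      name: 'osmdata',
      mbtiles: path.join(__dirname, 'latest.planet.mbtiles'), // placeholder path
      layers: ['osm']
    }
  ],
  bbox: [-77.12, 38.80, -76.90, 38.97] // any valid [w, s, e, n] bounding box
})
.on('reduce', function (result) {
  // accumulate the values sent back by each map script's done() callback
  count += result;
})
.on('end', function () {
  console.log('Total count: ' + count);
});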

Options

Basic Options

zoom (required)

zoom specifies the zoom level of tiles to retrieve from each source.

tilereduce({
	zoom: 15,
	// ...
})

map (required)

Path to the map script, which will be executed against each tile

tilereduce({
	map: path.join(__dirname, 'map.js')
	// ...
})

maxWorkers

By default, TileReduce creates one worker process per CPU. maxWorkers may be used to limit the number of workers created

tilereduce({
  maxWorkers: 3,
  // ...
})

output

By default, any data written from workers is piped to process.stdout on the main process. You can pipe to an alternative writable stream using the output option.

tilereduce({
	output: fs.createWriteStream('output-file'),
	// ...
})

log

Disables logging and progress output

tilereduce({
	log: false,
	// ...
})

mapOptions

Passes through arbitrary options to workers. Options are made available to map scripts as global.mapOptions

tilereduce({
	mapOptions: {
		bufferSize: 4
	}
	// ...
})
// map.js
module.exports = function (sources, tile, write, done) {
  global.mapOptions.bufferSize; // = 4
};

Specifying Sources (required)

Sources are specified as an array in the sources option:

tilereduce({
	sources: [
		/* source objects */
	],
	// ...
})

MBTiles sources:

tilereduce({
    sources: [
      {
        name: 'osmdata',
        mbtiles: __dirname+'/latest.planet.mbtiles',
        layers: ['osm']
      }
    ]
})

MBTiles work well for optimizing tasks that request many tiles, since the data is stored on disk. Create your own MBTiles from vector data using tippecanoe, or use OSM QA Tiles, a continuously updated MBTiles representation of OpenStreetMap.

URL

Remote Vector Tile sources accessible over HTTP work well for mashups of datasets and for datasets that would not be practical to fit on a single machine. Be aware that HTTP requests are slower than MBTiles, and throttling is typically required to avoid disrupting servers at high tile volumes. maxrate dictates how many requests per second will be made to each remote source.

sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    layers: ['roads'],
    maxrate: 10
  }
]

raw

By default, sources will be automatically converted from their raw Vector Tile representation to GeoJSON. If you set raw: true in an MBTiles or URL source, the raw Vector Tile data will be provided, allowing you to lazily parse features as needed. This is useful in some situations for maximizing performance.

sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    raw: true
  }
]
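
A sketch of lazily parsing a raw source inside a map script. It assumes raw sources arrive as parsed vector-tile objects whose layers expose length, feature(i), and toGeoJSON(x, y, z) (the @mapbox/vector-tile API); check the version you are using for the exact shape delivered. The source name streets, layer name road, and the class property are assumptions:

// map.js - a sketch; assumes a raw source arrives as a parsed vector-tile object
module.exports = function (sources, tile, write, done) {
  var streets = sources.streets;
  var count = 0;

  var roads = streets && streets.layers && streets.layers.road;
  if (roads) {
    for (var i = 0; i < roads.length; i++) {
      // only pay the GeoJSON conversion cost for the features you actually need
      var feature = roads.feature(i).toGeoJSON(tile[0], tile[1], tile[2]);
      if (feature.properties.class === 'street') count++;
    }
  }

  done(null, count);
};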

Specifying Job Area

Jobs run over a geographic region represented by a set of tiles. TileReduce also accepts several area definitions that will be automatically converted into tiles.

BBOX

A valid bounding box array.

tilereduce({
	bbox: [w, s, e, n],
	// ...
})

GeoJSON

A valid GeoJSON geometry of any type.

tilereduce({
	geojson: {"type": "Polygon", "coordinates": [/* coordinates */]},
	// ...
})

Tile Array

An array of quadtiles represented as xyz arrays.

tilereduce({
	tiles: [
		[x, y, z]
	],
	// ...
})

Tile Stream

Tiles can be read from an object mode node stream. Each object in the stream should be either a string in the format x y z or an array in the format [x, y, z].

tilereduce({
	tileStream: /* an object mode node stream */,
	// ...
})

Line-separated tile list files can easily be converted into the appropriate object mode streams using binary-split:

var split = require('binary-split'),
	fs = require('fs');

tilereduce({
	tileStream: fs.createReadStream('/path/to/tile-file').pipe(split()),
	// ...
})

Source Cover

When using MBTiles sources, a list of tiles to process can be automatically retrieved from the source metadata

tilereduce({
	sourceCover: 'osmdata',
	sources: [
		{
			name: 'osmdata',
			mbtiles: __dirname+'/latest.planet.mbtiles'
		}
	]
	// ...
})

Events

TileReduce returns an EventEmitter.

start

Fired once all workers are initialized and before the first tiles are sent for processing

tilereduce({/* ... */})
.on('start', function () {
	console.log('starting');
});

map

Fired just before a tile is sent to a worker. Receives the tile and worker number assigned to process the tile.

tilereduce({/* ... */})
.on('map', function (tile, workerId) {
	console.log('about to process ' + JSON.stringify(tile) +' on worker '+workerId);
});

reduce

Fired when a tile has finished processing. Receives data returned in the map function's done callback (if any), and the tile.

var count = 0;
tilereduce({/* ... */})
.on('reduce', function (result, tile) { 
	console.log('got a count of ' + result + ' from ' + JSON.stringify(tile));
	count++;
});

end

Fired when all queued tiles have been processed. Use this event to output final reduce results.

var count = 0;
tilereduce({/* ... */})
.on('end', function () {
	console.log('Total count was: ' + count);
});

Processor Examples

Development

Testing

npm test

Linting

npm run lint

Test Coverage

npm run cover

tile-reduce's People

Contributors

aaronlidman, bsrinivasa, defvol, deniscarriere, e-n-f, emgrasmeder, ingalls, jingsam, mateov, morganherlocker, mourner, tcql, tmcw, tyrasd, waldyrious

tile-reduce's Issues

persistent reduce queue

I think the reducer should be fired when all of the map operations are complete. This will slow things down a tiny amount, but not significantly in most cases. It will also eliminate race conditions, and will allow for much better reliability (internet goes down during a job, event gets "lost" for whatever reason, etc.). This will also allow for anonymous reducers off the client's machine that can be run whenever necessary, or even incrementally updated.

There are a few possibilities for how this should be stored, but I am leaning towards dynamo (or dynalite for local jobs, if it's robust enough; if not, we can use leveldb).

I am still thinking through whether or not we should keep the reduce event at all. It could be useful in some form for progress updates, but if that's all we use it for, it could simply send back the percent complete and the tile processed.

cc @rclark

Passing options to workers

I'm thinking about using tile-reduce to power a little utility module, where I'd need to pass options along from the main process to the workers. Could the map module get passed the serialized tile reduce options as one of its initial args?

Invalid GeoJSON Polygons passed to map step

In some cases I'm seeing invalid GeoJSON Polygons passed to the map step. It looks like features that consist of multiple exterior polygons are being converted from vector tiles to a GeoJSON Polygon instead of a GeoJSON MultiPolygon.

Then, when these Polygons are used with turf.intersect(), it throws a "TopologyError: side location conflict" exception.

Here's a test case that shows the problem.

Input file: dc.json https://gist.github.com/jamesbursa/2026d7338b7a3d227732#file-dc-json

Convert to MBTiles using Tippecanoe:

$ tippecanoe -f -o dc.mbtiles -Z 15 -z 15 -b 0 -ps dc.json

Decode one tile with tippecanoe-decode:

$ tippecanoe-decode dc.mbtiles 15 9378 12535
{ "type": "FeatureCollection", "features": [
{ "type": "Feature", "properties": { "STFIPS": "11", "CTFIPS": "11001", "STATE": "District of Columbia", "COUNTY": "District of Columbia" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -76.965699, 38.897320 ], [ -76.968470, 38.893357 ], [ -76.968760, 38.892036 ], [ -76.968237, 38.891032 ], [ -76.970214, 38.891032 ], [ -76.970214, 38.899582 ], [ -76.965852, 38.899582 ], [ -76.965699, 38.897320 ] ] ], [ [ [ -76.965710, 38.891032 ], [ -76.966973, 38.891032 ], [ -76.966501, 38.892122 ], [ -76.966000, 38.894021 ], [ -76.965710, 38.891032 ] ] ], [ [ [ -76.962199, 38.897320 ], [ -76.962100, 38.896821 ], [ -76.963985, 38.891032 ], [ -76.965222, 38.891032 ], [ -76.965699, 38.897320 ], [ -76.964200, 38.899582 ], [ -76.962481, 38.899582 ], [ -76.962199, 38.897320 ] ] ], [ [ [ -76.959227, 38.891032 ], [ -76.962204, 38.891032 ], [ -76.961099, 38.896618 ], [ -76.961132, 38.896812 ], [ -76.961199, 38.897217 ], [ -76.961703, 38.898319 ], [ -76.961169, 38.899509 ], [ -76.961137, 38.899582 ], [ -76.959227, 38.899582 ], [ -76.959227, 38.891032 ] ] ] ] } }
] }

Note that the output is correctly a MultiPolygon (containing 4 Polygons). See https://gist.github.com/jamesbursa/2026d7338b7a3d227732#file-tile-json

Run through a test TileReduce:
https://gist.github.com/jamesbursa/2026d7338b7a3d227732#file-tilereduce_test-js
https://gist.github.com/jamesbursa/2026d7338b7a3d227732#file-tilereduce_test_map-js

This simply processes the one tile of interest (15 9378 12535), outputs the feature in the tile, and attempts a turf.intersect() which throws an exception. Note that the feature as passed to the map function is a Polygon, not a MultiPolygon as tippecanoe-decode produces for the same tile.

Converting the Polygon to a MultiPolygon manually allows the turf.intersect() to work.
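
A sketch of that manual conversion, assuming (as in this case) that each ring of the mistyped Polygon is actually a separate exterior ring:

// Treat each ring of the (incorrectly typed) Polygon as its own exterior ring
function polygonToMultiPolygon(feature) {
  return {
    type: 'Feature',
    properties: feature.properties,
    geometry: {
      type: 'MultiPolygon',
      coordinates: feature.geometry.coordinates.map(function (ring) {
        return [ring];
      })
    }
  };
}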

$ ./tilereduce_test.js 
Starting up 8 workers... Job started.
Processing 1 tiles.
1 tiles processed in 0s.
map tile [9378,12535,15]
---------------------------------------------------------------
feature = { type: 'Feature',
  geometry: 
   { type: 'Polygon',
     coordinates: 
      [ [ [ -76.96570068597794, 38.89732062336043 ],
          [ -76.96847140789032, 38.89335845766496 ],
          [ -76.96876108646393, 38.89203699076319 ],
          [ -76.96823805570602, 38.89103282648847 ],
          [ -76.97021484375, 38.89103282648847 ],
          [ -76.97021484375, 38.89958342598271 ],
          [ -76.96585357189178, 38.89958342598271 ],
          [ -76.96570068597794, 38.89732062336043 ] ],
        [ [ -76.965711414814, 38.89103282648847 ],
          [ -76.96697473526001, 38.89103282648847 ],
          [ -76.96650266647339, 38.8921225841508 ],
          [ -76.9660010933876, 38.89402231327574 ],
          [ -76.965711414814, 38.89103282648847 ] ],
        [ [ -76.9622004032135, 38.89732062336043 ],
          [ -76.96210116147995, 38.89682171160487 ],
          [ -76.96398675441742, 38.89103282648847 ],
          [ -76.96522325277328, 38.89103282648847 ],
          [ -76.96570068597794, 38.89732062336043 ],
          [ -76.96420133113861, 38.89958342598271 ],
          [ -76.96248203516006, 38.89958342598271 ],
          [ -76.9622004032135, 38.89732062336043 ] ],
        [ [ -76.959228515625, 38.89103282648847 ],
          [ -76.96220576763153, 38.89103282648847 ],
          [ -76.9611006975174, 38.89661922340707 ],
          [ -76.96113288402557, 38.89681336158753 ],
          [ -76.96119993925095, 38.89721833629872 ],
          [ -76.96170419454575, 38.89832052381641 ],
          [ -76.96117043495178, 38.89951036613891 ],
          [ -76.9611382484436, 38.89958342598271 ],
          [ -76.959228515625, 38.89958342598271 ],
          [ -76.959228515625, 38.89103282648847 ] ] ] },
  properties: 
   { STFIPS: '11',
     CTFIPS: '11001',
     STATE: 'District of Columbia',
     COUNTY: 'District of Columbia' } };
square = { type: 'Feature',
  geometry: 
   { type: 'Polygon',
     coordinates: 
      [ [ [ -76.965, 38 ],
          [ -76, 38 ],
          [ -76, 38.895 ],
          [ -76.965, 38.895 ],
          [ -76.965, 38 ] ] ] },
  properties: {} };
*** turf.intersect exception: TopologyError: side location conflict [ (-76.96570068597794, 38.89732062336043) ]
---------------------------------------------------------------
converting to MultiPolygon
feature = { type: 'Feature',
  geometry: 
   { type: 'MultiPolygon',
     coordinates: 
      [ [ [ [ -76.96570068597794, 38.89732062336043 ],
            [ -76.96847140789032, 38.89335845766496 ],
            [ -76.96876108646393, 38.89203699076319 ],
            [ -76.96823805570602, 38.89103282648847 ],
            [ -76.97021484375, 38.89103282648847 ],
            [ -76.97021484375, 38.89958342598271 ],
            [ -76.96585357189178, 38.89958342598271 ],
            [ -76.96570068597794, 38.89732062336043 ] ] ],
        [ [ [ -76.965711414814, 38.89103282648847 ],
            [ -76.96697473526001, 38.89103282648847 ],
            [ -76.96650266647339, 38.8921225841508 ],
            [ -76.9660010933876, 38.89402231327574 ],
            [ -76.965711414814, 38.89103282648847 ] ] ],
        [ [ [ -76.9622004032135, 38.89732062336043 ],
            [ -76.96210116147995, 38.89682171160487 ],
            [ -76.96398675441742, 38.89103282648847 ],
            [ -76.96522325277328, 38.89103282648847 ],
            [ -76.96570068597794, 38.89732062336043 ],
            [ -76.96420133113861, 38.89958342598271 ],
            [ -76.96248203516006, 38.89958342598271 ],
            [ -76.9622004032135, 38.89732062336043 ] ] ],
        [ [ [ -76.959228515625, 38.89103282648847 ],
            [ -76.96220576763153, 38.89103282648847 ],
            [ -76.9611006975174, 38.89661922340707 ],
            [ -76.96113288402557, 38.89681336158753 ],
            [ -76.96119993925095, 38.89721833629872 ],
            [ -76.96170419454575, 38.89832052381641 ],
            [ -76.96117043495178, 38.89951036613891 ],
            [ -76.9611382484436, 38.89958342598271 ],
            [ -76.959228515625, 38.89958342598271 ],
            [ -76.959228515625, 38.89103282648847 ] ] ] ] },
  properties: 
   { STFIPS: '11',
     CTFIPS: '11001',
     STATE: 'District of Columbia',
     COUNTY: 'District of Columbia' } };
intersect = { type: 'Feature',
  properties: {},
  geometry: 
   { type: 'MultiPolygon',
     coordinates: 
      [ [ [ [ -76.96142100332834, 38.895 ],
            [ -76.959228515625, 38.895 ],
            [ -76.959228515625, 38.89103282648847 ],
            [ -76.96220576763153, 38.89103282648847 ],
            [ -76.96142100332834, 38.895 ] ] ],
        [ [ [ -76.965, 38.89103282648847 ],
            [ -76.965, 38.895 ],
            [ -76.96269454111474, 38.895 ],
            [ -76.96398675441742, 38.89103282648847 ],
            [ -76.965, 38.89103282648847 ] ] ] ] } };
---------------------------------------------------------------

to-fix example

@aaronlidman can you sketch out what to-fix input would look like? My understanding is that if we output a CSV where each row holds geometry (WKT?), to-fix should be able to handle this. Do we need to make a custom plugin for each type of task with its own UI?

For the purpose of this example, let's say we had a tile-reduce job that output a collection of GeoJSON points identifying disconnected major roads. What would be the best way to get this data into to-fix?

cc @lxbarth @ericfischer

benchmark

Let's build a simple script that benchmarks count & road diff on a small/moderate area. Then we can keep an eye on general perf, and feel out whether newer Node versions will bump our perf.

TileReduce or Tile Reduce

Simple question, but we should nail it down: Should we CamelCase or just do two words? I've been using Tile Reduce in blog posts but it would make sense to follow the lead of MapReduce - as much as I'm not sure why the world ever started to CamelCase outside of programming languages ;-)

@morganherlocker @tcql @mourner ?

read files from disk

This feature would allow for caching or pre-downloading a region, which would speed up jobs that use tons of HTTP requests. I'm thinking that a file path with the usual {x} {y} {z} would suffice.
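
A sketch of what such a source definition could look like (the path option here is hypothetical, not an existing tile-reduce option):

sources: [
  {
    name: 'streets',
    // hypothetical: read pre-downloaded tiles from disk instead of over HTTP
    path: '/data/tile-cache/{z}/{x}/{y}.vector.pbf',
    layers: ['road']
  }
]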

Tile encoding troubles for disconnected road detection

What I previously thought were spherical geometry precision problems now look like tile-encoding problems.

Example: http://www.openstreetmap.org/node/1004264211. The real location is lat="38.9347951" lon="-77.0533697"

In the through way http://www.openstreetmap.org/way/38132834 it is encoded at z12 as [-77.05332040786743,38.93477700153804], [1247,1476]

In the way that ends there http://www.openstreetmap.org/way/6054333 it is encoded at z12 as [-77.05336332321167,38.93479369264057], [1245,1475]

Or at least that's what it looks like. I would have expected tile encoding to drop nodes but not to relocate them.

is tilereduce a class or a function?

  • classes should be uppercased and initialized
  • functions should be lowercased

The example has a confusing invocation of tilereduce:

var TileReduce = new require('tile-reduce');
...
var tilereduce = TileReduce(bbox, opts);

A more common expectation would be

var TileReduce = require('tile-reduce');
...
var tilereduce = new TileReduce(bbox, opts);

request throttling

We should have an option for max worker tile requests per second, along with a conservative default. If we set this to 50/sec, we could safely say that the max with compositing + 4 cores would be ~1k total per second.

Change mapOptions from global to a worker parameter

This would be a breaking change. At the moment mapOptions are sent to workers as global objects. I am pretty sure this is safe, since workers should not share globals across processes; however, I think it is better to be explicit. I propose we remove the global assignment and add another parameter to our worker functions. The new interface would look like this:

function(data, tile, write, opts, done) {}

Thoughts?

cc @tcql @mourner @aaronlidman @MateoV

npm test fails under Windows

I ran npm test on tile-reduce 3.0. It fails on Windows 10 with Node v5.1.0, while it succeeds on Ubuntu, which is strange. Here is the error message:

  1) test/test.count.js count implementation, mbtiles cover found all features in overlapping mbtiles:

      Error: found all features in overlapping mbtiles
      + expected - actual

      -0
      +36597

      at EventEmitter.<anonymous> (test\test.count.js:53:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at Pipe.channel.onread (internal/child_process.js:440:11)

  2) test/test.count.js count implementation, explicit mbtiles cover found all features in overlapping mbtiles:

      Error: found all features in overlapping mbtiles
      + expected - actual

      -0
      +36597

      at EventEmitter.<anonymous> (test\test.count.js:72:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at Pipe.channel.onread (internal/child_process.js:440:11)

  3) test/test.count.js count implementation, tileStream cover found all features in listed tiles:

      Error: found all features in listed tiles
      + expected - actual

      -0
      +16182

      at EventEmitter.<anonymous> (test\test.count.js:91:7)
      at shutdown (src\index.js:136:8)
      at reduce (src\index.js:126:36)
      at ChildProcess.handleMessage (src\index.js:47:25)
      at handleMessage (internal/child_process.js:686:10)
      at Pipe.channel.onread (internal/child_process.js:440:11)

optionally hit worker even if some sources didn't hold data for the tile

Per this code:

for (var i = 0; i < results.length; i++) {
  data[sources[i].name] = results[i];
  if (!results[i]) return process.send({reduce: true});
}

the worker bails out and returns a reduce event if any source doesn't have data for the requested tile. This is usually great, but in some cases where you want to compare disparate data sources and are relying on reduce events to send back information about how much data each source does or doesn't have in a tile, you end up losing information.

For example, say I want to find the length of roads in San Francisco that are matched by GPS datapoints. I would like to keep a tally of the total length of road in the bbox, as well as how much is matchable by GPS points. Right now, if there is no GPS data in the tile, we bail out, so I'm missing some of the total length information.

To maintain compatibility and keep the optimization for the usual cases where you want this bail-out behavior, I'm proposing we add a tile-reduce option for this, maybe requireAllSources: false (defaulting to true).
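
A sketch of how the proposed option might be used (requireAllSources is a proposal here, not an existing option; source names and paths are placeholders):

tilereduce({
  requireAllSources: false, // proposed: still run the map script when a source has no data for the tile
  sources: [
    {name: 'roads', mbtiles: __dirname + '/roads.mbtiles'},        // placeholder
    {name: 'gpstraces', mbtiles: __dirname + '/gps-traces.mbtiles'} // placeholder
  ]
  // ...
})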

cc @morganherlocker @aaronlidman @mourner

document api

There is an example now, but there should also be explicit docs.

  • What is a valid coverArea?
  • What goes into the options object?

flexible cover scoping

I think we should make scoping jobs extremely flexible:

  • bbox
  • polygon
  • anything geojson - for example, feed in a feature collection of census block group points, so only areas where people live are computed; feed in a collection of roads, so only tiles that contain road networks are processed
  • tiles - feed in a list of tiles so custom filters can be precomputed; the tiles can be downsampled or upsampled extremely quickly on the fly using recursive calls to tilebelt.getParent and tilebelt.getChildren (see the sketch below)

This will be pretty simple to support with tile-cover and tilebelt, and the type of cover can be implicitly classified automatically (given this list anyway).

It seems pretty obvious that bbox and polygon should be supported. Does it make sense to support arbitrary geojson objects (given that tile-cover can handle these already), and tiles which will provide granular control + index caching?

@aaronlidman @MateoV
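
A sketch of the down/up sampling idea using tilebelt (getParent and getChildren are existing tilebelt functions; the surrounding helpers are illustrative):

var tilebelt = require('@mapbox/tilebelt');

// Downsample a tile to a coarser (lower) zoom by walking up the quadtree
function toZoom(tile, zoom) {
  while (tile[2] > zoom) tile = tilebelt.getParent(tile);
  return tile;
}

// Upsample a tile to a finer (higher) zoom by expanding into children
function expandToZoom(tile, zoom) {
  if (tile[2] >= zoom) return [tile];
  return tilebelt.getChildren(tile).reduce(function (acc, child) {
    return acc.concat(expandToZoom(child, zoom));
  }, []);
}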

example code doesn't match format of latest-planet.mbtiles

I downloaded the latest planet MBTiles from https://s3.amazonaws.com/mapbox/osm-qa-tiles/latest.planet.mbtiles.gz

The count example works with the included data set, but not with the 22 GB planet MBTiles.

The example code counts features by layer key, e.g. counting buildings.

{
  "vector_layers": [
    {
      "id": "buildings",
      "description": "",
      "minzoom": 15,
      "maxzoom": 15,
      "fields": {
        "id": "Number",
        "osm_id": "Number",
        "type": "String",
        "name": "String"
      }
    },
    {
      "id": "roads",
      "description": "",
      "minzoom": 15,
      "maxzoom": 15,
      "fields": {
        "id": "Number",
        "osm_id": "Number",
        "type": "String",
        "name": "String",
        "tunnel": "Number",
        "bridge": "Number",
        "oneway": "Number",
        "z_order": "Number",
        "class": "String",
        "access": "String",
        "service": "String",
        "ref": "String"
      }
    }
  ]
}

However, in the planet MBTiles the layer key always seems to be osm, and the fields contain the OSM tags. I'm just not able to figure out how to convert the examples to work with the full dataset, since the data structure is different, i.e. building is a field, not a layer key.


{
  "vector_layers": [
    {
      "id": "osm",
      "description": "",
      "minzoom": 12,
      "maxzoom": 12,
      "fields": {
        "_osm_way_id": "Number",
        "_version": "Number",
        "_changeset": "Number",
        "_uid": "Number",
        "_user": "String",
        "_timestamp": "Number",
        "hires": "String",
        "hires:checkdate": "String",
        "hires:imagery": "String",
        "source": "String",
        "boat": "String",
        "highway": "String",
        "name": "String",
        "note": "String",
        "name:en": "String",
        "waterway": "String",
        "natural": "String",
        "width": "String",
        "boundary": "String",
        "maritime": "String",
        "admin_level": "String",
        "border_type": "String",
        "water": "String",
        "source:name": "String",
        "power": "String",
        "building": "String",

I'd appreciate any guidance; I'm by no means a developer, but I'm really interested in using this code for some data analysis.
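
Not an official answer, but a sketch of a map script adapted for osm-qa-tiles, where everything lives in the single osm layer and building is a feature property (the source name osmdata is an assumption):

// map.js - a sketch for osm-qa-tiles; source name 'osmdata' is an assumption
module.exports = function (sources, tile, write, done) {
  var osm = sources.osmdata.osm; // the single 'osm' layer, as a GeoJSON FeatureCollection
  var buildings = 0;

  if (osm) {
    osm.features.forEach(function (feature) {
      // in osm-qa-tiles, OSM tags such as 'building' are feature properties
      if (feature.properties.building) buildings++;
    });
  }

  done(null, buildings);
};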

Nextgen: dealing with interleaved stdout output

I think I have a good plan for the interleaved output bug. It's clear that we need to pipe processes to the main thread so that the output is done by a single process. My luck with diff on Node 0.12 was probably due to its new feature of stream corking/uncorking (buffering writes) by default, which possibly made the actual writes to stdout happen less often.

Even when piping, many worker streams are still piped to stdout at the same time and each worker pipes buffer chunks instead of logical pieces of output, so interleaved output still happens. To fix it, we need to make sure that we pipe to stdout in logical bits so that output from one tile is never split into several chunks.

We can do that by splitting each stream on a tile-by-tile basis before piping to the main stdout. Splitting by linebreaks is not ideal since you may not have linebreaks at all (e.g. if you use process.stdout.write in each tile), and you may have many linebreaks in each tile output which we don't want to split by (it can get interleaved). Additionally, after you split, you have to re-add a linebreak to each chunk, which is additional performance overhead.

Instead, we could manually write an RS ASCII character (0x1e, borrowed from JSON text sequences spec) after each map fn run in worker.js, and then split by the character. This way we split only per tile, and do not have to append anything to each chunk. Additionally, we can minimize the performance overhead of splitting by using binary-split instead of split, since we don't need string conversion to control the output.

The only limitation that we'd have to impose with this approach is stating in the docs that you MUST output anything just before calling the done callback (and not in a different process tick if the map function is async).

Alternatively, we could introduce a special API, e.g. another argument to done like this:

module.exports = function(data, tile, done) {
  done(null, data.osm.osm.length, "My output");
};

Another future problem that may arise is when you want to stream binary output (which may contain 0x1e byte), e.g. streaming PNG raster files. But you could probably deal with this in an alternative way, e.g. providing an option to split by a different sequence of characters (each PNG starts with a unique set of bytes).

This is a tricky problem to tackle, but this seems like an acceptable solution.
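
A sketch of the proposed approach on the main-process side, assuming each worker writes an RS byte (0x1e) after each tile's output; workers here stands for the array of forked child processes, and binary-split accepts a custom delimiter:

// main process - a sketch of the proposed RS-delimited piping
var split = require('binary-split');

workers.forEach(function (worker) {
  worker.stdout
    .pipe(split(new Buffer([0x1e]))) // emit one complete, per-tile chunk at a time
    .on('data', function (chunk) {
      // a whole tile's output is written in a single call, so it can't interleave
      process.stdout.write(chunk);
    });
});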

cc @tcql @morganherlocker

Memory leak

When you're not accumulating results on reduce events, the memory consumption still creeps up, so you can easily go out of memory on a large number of tiles. This doesn't look right; there's probably a big memory leak somewhere.

support other sources

This may end up being irrelevant depending on the new architecture, but if we keep data source configuration centralized, we should think about supporting more sources. Maybe use tilelive, so supporting other sources is trivial, with zero changes to tile-reduce internals.

Background: I'm trying to do OSM stats by country. This means that I either:

  • Need to split jobs up by country, OR
  • Need to know what country a tile is in during each worker's step of reduce.

Using Geojson-vt and Natural Earth Admin 0 boundaries, I can easily figure out which countries are present in a tile, but currently that requires me to do geojsonvt + fs.readFile inside every single worker

Can not select by layers from a remote sources

In README.md, there is an example of URL sources:

sources: [
  {
    name: 'streets',
    url: 'https://b.tiles.mapbox.com/v4/mapbox.mapbox-streets-v6/{z}/{x}/{y}.vector.pbf',
    layers: ['roads'],
    maxrate: 10
  }
]

I expected only the roads layer to be transferred. However, I still get all layers in streets. The layers option seems to have no effect.

Output results to mapbox studio

Hi, I am looking for suggestions. Recently I have been using tile-reduce to do some statistics on osm-qa-tiles. I want to calculate a metric (e.g. road density) on each zoom 12 tile, then output it to Mapbox Studio for visualization. The problem is how to store the result. As zoom 12 has 16 million tiles, GeoJSON, MBTiles, or UTF8Grids may be too large for storage. So I hope to get some advice from you, as you are experts in this field.

Error in example - wrong road layer

Howdy Folks - I just noticed a small error in the example code where the roads layer from mapbox-streets is wrong:

layers: ['roads'] should in fact be layers: ['road']

Similarly in buffer.js it should be:

module.exports = function (tileLayers, opts, done){
  var road = tileLayers.streets.road;
  var bufferedRoad = turf.buffer(road, 20, 'meters');
  done(null, bufferedRoad);
}

optional extra throttle

Algorithms that involve crawling tiles may make extra requests. This option would allow for throttling beyond the default 200 requests per second limit to account for this.

documentation holes

We definitely need to document raw - we don't even mention it right now.

Optionally, we may want to add a section about how to optimize (use raw, use rbush for lots of intersections, etc.) and talk about the effect of tiles with buffers and how to generate custom MBTiles without buffers.

Access token?

Looks like the example code is missing an access token in the vtile url that prevents it from being usable.

events

  • start: reports that the cover has been computed for mapping jobs and sends back the tiles that will be processed
  • reduce: called to return the results of each tile; this should be used for incremental computations and fast accumulating values, rather than heavy computation (that should be done in the worker)
  • end: job is complete, so any results can be tied together and output if necessary
  • error: send back any errors so they can be handled or thrown

browser support

It should be possible to make this work in modern browsers via web workers. To do this:

  • factor out getVectorTile, and provide a browser version that uses xhr instead of request. (Note: I think the browser will handle the gzip transparently.)
  • use webworkify as a substitute for child_process.fork
  • tests (how??)

processors should be async

Processors may need to use async resources (e.g. tile buffer crawling, C++ libs, etc.). For this to be possible, we need a standard Node callback interface, instead of the current sync interface.
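
For illustration, a sketch of what a callback-style (async) processor could look like, along the lines of the interface this README now documents (the file path is a placeholder):

// map.js - a sketch of an async, callback-style processor
var fs = require('fs');

module.exports = function (sources, tile, write, done) {
  // async work (file reads, network calls, native libs) finishes before done() is called
  fs.readFile('/path/to/extra-data.json', 'utf8', function (err, raw) {
    if (err) return done(err);
    var extra = JSON.parse(raw);
    done(null, Object.keys(extra).length);
  });
};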
