mapbox / node-mapnik-bench

Framework for quickly exploring tile rendering performance across mapnik versions

node-mapnik-bench's Introduction

Node Mapnik Bench

node-mapnik-bench is a set of scripts used for testing performance between versions of Node Mapnik.

Setup

Setup installs some basic Node.js dependencies into ./node_modules, and installs node-mapnik binaries plus related tilelive modules into the ./mapnik-versions directory.

Clone repository

git clone git@github.com:mapbox/node-mapnik-bench.git

Install dependencies

npm install

Install Node Mapnik versions you'd like to benchmark

cd mapnik-versions/<version>
npm install

To install files needed for the benchmark (this takes a little while):

node scripts/download.js

Usage

There are three major ways to use Node Mapnik Bench.

bin: `bench`

usage:
  bench <file> <list of mapnik versions>

example:
  # will test us-counties-polygons.geojson against the latest and v3.5.0 versions of Node Mapnik
  bench ./test/fixtures/us-counties-polygons.geojson latest v3.5.0

output:
  # { source: '/Users/mapsam/mapbox/node-mapnik-bench/test/fixtures/us-counties-polygons.geojson',
  #   version: 'v3.5.0',
  #   options: { threadpool: 6 },
  #   time: 
  #     { start: 1460401496573,
  #       xml: 1460401496796,
  #       bridge: 1460401496835,
  #       info: 1460401496835,
  #       load: 1460401496835,
  #       copy: 1460401497500 },
  #   sink: 'noop://',
  #   memory: 
  #   { max_rss: '51.02MB',
  #     max_heap: '12.07MB',
  #     max_heap_total: '29.71MB' },
  #   tile_count: 533 }

bin: `benchall`

Test a group of files

usage:
  benchall <fixture_index> <type> <list of mapnik versions>

example:
  # will test all geojsons in /testcases against latest and v3.5.0 versions of Node Mapnik
  benchall ./testcases/index.js geojson latest v3.5.0

output:
  # saves a timestamped JSON file to /visual
  visual/1454461994.json

View the output by serving the visual directory and opening visual/index.html with the JSON file's timestamp as the query string.

cd visual
python -m SimpleHTTPServer
localhost:8000/visual/index.html?1454461994 # in your browser

`bench(file, version, options, callback)`

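var bench = require('./lib/index.js');

// assumed to mirror the `options` field shown in the ./bin/bench output above
var options = { threadpool: 6 };

bench('./test/fixtures/us-counties-polygons.geojson', 'latest', options, function(err, stats) {
  if (err) throw err;
  console.log(stats); // same as JSON from ./bin/bench above
});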

Testcases

To test an entire suite of files against multiple versions of Node Mapnik, the files must be structured in a particular manner. Check out the testcases directory to get started.

Add a new testcase

TODO

Mapnik versions

All of the mapnik versions we test are in the mapnik-versions directory. To add a new version you can create a new directory named after the tag, branch, or gitsha you would like to use. The name of the directory is the name you'll use in the benchmark commands.

Add a package.json with the proper pointers - here is what latest looks like:

{
  "name": "gdal-tiling-bench-version",
  "version": "1.0.0",
  "main": "package.json",
  "dependencies": {
    "mapnik": "https://github.com/mapnik/node-mapnik/tarball/master",
    "tilelive-bridge":"https://github.com/mapbox/tilelive-bridge/tarball/master"
  }
}

Once added, you can npm install in that new directory.

Test

npm test

node-mapnik-bench's Issues

feature suggestions

@mapsam (/cc @springmeyer) you might want to look at (and maybe carry over) some additions I made in https://github.com/BergWerkGIS/investigate-gpx

`--concurrency`

Controls how many maps are created in the mapnik pool.
I've carried over Math.ceil(require('os').cpus().length * 16) from some tilelive module, but I think it gets limited to 16 somewhere else anyway (I don't remember where I saw that).

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L23

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L150
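
As a rough illustration of the default described above, the sketch below resolves a pool size from an optional --concurrency value. The resolveConcurrency helper and its cap argument are hypothetical; the cap only stands in for the limit of 16 that may or may not exist elsewhere.

```javascript
var os = require('os');

// Hypothetical helper: pick the pool size from a --concurrency value.
// The fallback is the expression quoted above; `cap` is an assumption
// standing in for the limit that might be applied somewhere else.
function resolveConcurrency(requested, cap) {
  var fallback = Math.ceil(os.cpus().length * 16);
  var value = Number(requested);
  if (!Number.isInteger(value) || value < 1) value = fallback;
  return cap ? Math.min(value, cap) : value;
}

console.log(resolveConcurrency(undefined)); // default derived from the CPU count
console.log(resolveConcurrency('4'));       // explicit --concurrency 4
console.log(resolveConcurrency('64', 16));  // explicit value clamped by the assumed cap
```

Keeping the cap as a separate argument makes it easy to benchmark with and without it.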

`--show-progress`

More verbose output than --verbose.
For example, it shows tiles/sec not only at the end but also during the run.

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L24

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L35

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L130-L137

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L153-L161
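
A minimal sketch of intermediate tiles/sec reporting, not taken from the linked bench.js. The createProgress helper is hypothetical, and the clock is injectable so the example can be run with a fake timer.

```javascript
// Hypothetical progress tracker: counts finished tiles and reports tiles/sec.
// `now` is an injectable clock (defaults to Date.now) to keep the demo deterministic.
function createProgress(now) {
  now = now || Date.now;
  var start = now();
  var tiles = 0;
  return {
    tick: function () { tiles += 1; },
    rate: function () {
      var seconds = (now() - start) / 1000;
      return seconds > 0 ? tiles / seconds : 0;
    },
    report: function () {
      return tiles + ' tiles, ' + this.rate().toFixed(1) + ' tiles/sec';
    }
  };
}

// Demo with a fake clock: 50 tiles finished after two seconds.
var fakeTime = 0;
var progress = createProgress(function () { return fakeTime; });
for (var i = 0; i < 50; i++) progress.tick();
fakeTime = 2000;
console.log(progress.report()); // 50 tiles, 25.0 tiles/sec
```

In a real run, tick() would be called once per rendered tile and report() from a setInterval timer that is cleared when the copy finishes.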

workarounds to generate `webp` tiles

If you are working with vector sources, the trick is to include a raster in the mapnik.xml:
https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/style.xml.template#L28-L39

Caution(!): This adds a little raster at null-island, so be sure to always include --bounds when using it; otherwise the map extent will be pretty big and slow things down considerably.

The mapnik.xml has to have style in its name to set the file extension of the resulting tiles correctly:

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L88

This is just for getting the correct file extension.
No matter what extension you pass here

sink = 'file://' + path.join(__dirname, argv.output+'?filetype=' + filetype);

the output format of the tiles is always determined automatically by tilelive-bridge, depending on what types of layers are found in the map.
Hence (workaround #1).

slippy map for viewing generated tiles

Run the bench, then run node show-tiles-server.js, then open http://localhost:666

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles-server.js

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles.html

https://github.com/BergWerkGIS/investigate-gpx/tree/9e86ced0559e88195857417a66609552837d1125/leaflet

It's working for webp tiles.
Not so much for pbf files. I somehow have trouble extracting them.
Maybe that's an exercise left to the reader 😏
https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles-server.js#L59-L95

Include vector data

Right now gdal-tiling-bench is only testing against TIF files, but with the advancement of vector data inputs in mapnik, it seems important to benchmark file types such as GeoJSON and Shapefiles.

Additionally, running benchmarks against different kinds of vector data will be helpful. This includes points, lines, polygons, multipolygons, etc. This will allow us to begin benchmarking specific operations running in GDAL/mapnik to get a more defined view into where the bottlenecks are.

I like the idea of keeping these large files up on s3, so will continue working that way unless we are using tiny files.

Any input from @flippmoke @jakepruitt on what will make this benchmarking more useful for the v2 push is appreciated!

Todo

GeoJSON test cases including different data types and sizes
Shapefile additions with different data types and sizes (@springmeyer mentioned working with indexed tiles here?)

Fix Appveyor failures

Currently all deploys to AppVeyor are failing, which is annoying!

cc/ @springmeyer

Should we run this on an ec2 instance?

From @jakepruitt

When running these benchmarks to compare the v2 branch, should we do this on a morec2 to homogenize the data and make it reproducible?

@springmeyer agrees it would be great to see these results as well, and this makes it clearer that we should include operating system information with the benchmarks as we run them so we aren't comparing 🍎 to 📙 (no orange fruit emoji 😢).

Seems worthwhile to keep local testing, though. Let's look into running on AWS as an option.

Running benchall in the same process results in mapnik dupes

@springmeyer uncovered a sizeable issue with the current refactor of Node Mapnik Bench. Since we are now running everything within the same node process, we are introducing multiple versions of mapnik. I thought I had checked for this, but unfortunately it was missed. So, it's unclear how much of an effect this has had on our numbers, but it's safe to say that we need to fix it asap in order to make our benchmark numbers more reliable.

After chatting with @springmeyer we're going to stick with running bench, but refactor the benchall.js script so it executes each bench command in its own process using child_process.exec - like we do for generating XML on the fly - and capturing the output from stdout and writing to a file.
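
The refactor described above could be sketched roughly as follows. This is not the actual benchall.js: buildCommands and runSequentially are hypothetical helpers, the ./bin/bench path is taken from the usage section, and the demo uses harmless echo commands so it runs without any mapnik install.

```javascript
var exec = require('child_process').exec;
var fs = require('fs');
var os = require('os');
var path = require('path');

// Build one shell command per version so each process loads a single mapnik.
function buildCommands(file, versions) {
  return versions.map(function (version) {
    return './bin/bench ' + file + ' ' + version;
  });
}

// Run the commands one at a time, each in its own process, collecting stdout.
function runSequentially(commands, callback) {
  var outputs = [];
  (function next(i) {
    if (i === commands.length) return callback(null, outputs);
    exec(commands[i], function (err, stdout) {
      if (err) return callback(err);
      outputs.push(stdout.trim());
      next(i + 1);
    });
  })(0);
}

// Demo with stand-in commands; swap in buildCommands(file, versions) for a real run.
var demo = ['echo latest', 'echo v3.5.0'];
runSequentially(demo, function (err, outputs) {
  if (err) throw err;
  // Hypothetical output location; the real script writes into /visual.
  var target = path.join(os.tmpdir(), 'bench-' + Date.now() + '.json');
  fs.writeFileSync(target, JSON.stringify(outputs, null, 2));
  console.log('wrote', target);
});
```

Running the commands sequentially rather than in parallel keeps the runs from competing for CPU, which matters when the numbers are being compared across versions.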

I'll take a crack at this so we can get @flippmoke some numbers fast!

cc @mapbox/mapnik

Ignore downloaded files

something like:

testcases/**/*.shp
testcases/**/*.geojson
testcases/**/*.tif

update `usage` file

currently the usage file does not reflect how to use this bencher, needs some updates!

cc @BergWerkGIS

Add instructions for non root install

This would help make setup easier. I'll make some changes in the coming days.

Issues with Threading GDAL

We are going to have big problems with any performant multithreading of TIFFs in GDAL. My last attempt led me down the path of this and this. After lots of changes, this only really allowed safe multithreaded reads on memory datasets. I was planning on expanding the scope of the RFC to make multithreaded processing safe across all of GDAL and its drivers.

The discussions for this can be found:

So the core issues in GDAL associated with this are:

Thread safety of global objects
Thread safety of the block cache (which isn't currently thread safe)
Thread safety of the individual dataset object (many operations around it are not thread safe)
Thread safety of the drivers (TIFF isn't thread safe)
Thread safety of individual file handlers (inside each driver)

I have ideas on how to solve all of these problems, but this is a HUGE effort, and will require a lot of thinking and help/support from the GDAL community.

/cc @springmeyer

Rename this repo to mapnik-bench

We are starting to move strictly into the mapnik version world here, and the gdal-tiling-bench name lives on. What's the protocol for updating this @springmeyer?

Investigate the RasterIO window size Mapnik requests

@rouault noticed:

Actually, looking at RasterIO calls, I see that the first ones are done with large source windows, and involve (nearest neighbour) downsampling on GDAL side. e.g. RasterIO(band=1,5208,0,5228,5228,bufxsize=1026,bufysize=1026). Later calls are done on 165x165 windows and don't involve downsampling on GDAL side.

This is odd to me and worth a closer look. My assumption was that because tilelive-bridge is requesting 512x512 tiles from node-mapnik and because buffer-size:0 for the layer and filter-factor is 2 the window requested of GDAL should be 1024x1024.
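
The expectation above is simple arithmetic, sketched here under an assumed relationship between tile size, buffer size, and filter factor. The formula is a guess at what the paragraph implies, not something confirmed from Mapnik's source.

```javascript
// Assumed relationship (not confirmed against Mapnik):
// window = (tile size + 2 * buffer size) * filter factor
function expectedWindow(tileSize, bufferSize, filterFactor) {
  return (tileSize + 2 * bufferSize) * filterFactor;
}

console.log(expectedWindow(512, 0, 2)); // 1024
```

The RasterIO call quoted by @rouault reports bufxsize=1026 and bufysize=1026, two pixels more than this estimate, which is part of what needs a closer look.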

points-1000 fails in v2_spec

Getting this error:

~/mapbox/gdal-tiling-bench[visualize]$ node test.js testcases/geojson/points-1000/map.xml mapnik-versions/v2_spec/

Config -> using node v0.10.40
Config -> using mapnik at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/mapnik/lib/mapnik.js
Config -> source options: {"close":true,"minzoom":0,"maxzoom":4,"bounds":[-179.4798170775175,-89.94831437710673,179.27351941354573,89.85857545863837],"type":"pyramid"}
Config -> threadpool size: 6
Config -> sink: noop://

/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:260
    map.render(new mapnik.VectorTile(+z,+x,+y), opts, function(err, image) {
               ^
TypeError: required parameters (z, x, and y) must be greater then or equal to zero
    at Function.Bridge.getVector (/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:260:16)
    at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:174:20
    at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/node_modules/mapnik-pool/node_modules/generic-pool/lib/generic-pool.js:291:11
    at loaded (/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/node_modules/mapnik-pool/index.js:27:28)

Remove webp encoding

I plan to write a custom tilelive source that pulls 512x512 tiles from node-mapnik but does not encode them. This will allow us to 1) drop the tilelive-bridge dependency and 2) focus the benchmark more on rasterio read speeds.
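
The plan above could look roughly like the sketch below. It assumes the usual tilelive source shape (getTile(z, x, y, callback) and getInfo(callback)); RawSource and the injected render function are hypothetical, and the stand-in renderer only allocates an empty RGBA buffer instead of calling node-mapnik.

```javascript
// Hypothetical source that hands back unencoded pixels.
// `render` is an injected function (z, x, y, size, callback); in a real
// benchmark it would call into node-mapnik, which is not shown here.
function RawSource(render) {
  this.render = render;
  this.tileSize = 512;
}

RawSource.prototype.getTile = function (z, x, y, callback) {
  this.render(z, x, y, this.tileSize, function (err, pixels) {
    if (err) return callback(err);
    callback(null, pixels, {}); // raw buffer, no webp encoding step
  });
};

RawSource.prototype.getInfo = function (callback) {
  callback(null, { minzoom: 0, maxzoom: 4 }); // placeholder zoom range
};

// Stand-in renderer: an empty RGBA buffer of the right size.
function fakeRender(z, x, y, size, callback) {
  callback(null, Buffer.alloc(size * size * 4));
}

var tileBytes = null;
new RawSource(fakeRender).getTile(0, 0, 0, function (err, tile) {
  if (err) throw err;
  tileBytes = tile.length;
  console.log(tileBytes); // 1048576 (512 * 512 * 4)
});
```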

Run travis on cron

I think we should run this repo on a cron to ensure it's always green when we come back to use it. I propose running on a monthly interval. Sound good? /cc @mapsam @BergWerkGIS @dnomadb

Output data into JSON for visualization

It would be really neat to store all of the information we're printing to the console into a data.json file so we can visualize it. I'm imagining something like this table that just looks at speeds against versions and different fixture data:
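
A minimal sketch of what writing that data could look like, assuming stats objects shaped like the bench output in the Usage section. The toRow helper, the choice of fields, and the output location are hypothetical, and treating copy minus start as the elapsed time is an assumption.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Reduce one stats object to a flat row that a table or chart can consume.
function toRow(stats) {
  return {
    source: path.basename(stats.source),
    version: stats.version,
    elapsed_ms: stats.time.copy - stats.time.start, // assumed total run time
    tile_count: stats.tile_count
  };
}

// Sample values copied from the bench output shown in the Usage section.
var sample = {
  source: '/Users/mapsam/mapbox/node-mapnik-bench/test/fixtures/us-counties-polygons.geojson',
  version: 'v3.5.0',
  time: { start: 1460401496573, copy: 1460401497500 },
  tile_count: 533
};

var rows = [toRow(sample)];
var target = path.join(os.tmpdir(), 'data.json'); // hypothetical location
fs.writeFileSync(target, JSON.stringify(rows, null, 2));
console.log(rows[0].elapsed_ms); // 927
```

One row per fixture and version keeps the file easy to pivot into a speeds-by-version table.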

Overviews

Questions to look into:

How much do overviews help the overall run-time of this benchmark?
How can we ensure overviews are built most efficiently to benefit optimized mercator tile requests (and can we also think about validating overviews as "happy" like https://github.com/mapbox/node-happytiff)?
When using VRTs, when is it more optimal to generate the overviews on the VRT vs. on the referenced tiffs themselves?

Migrate all s3 downloads to a common bucket

Right now we have things in springmeyer personal and mapbox/playground/mapsam. Something like mapbox/mapnik-bench?

Optimization potential?

/cc @springmeyer @mapsam

Looking into speeding up tiff uploads I used a new tool from Microsoft (Concurrency Visualizer) and ran benchall with 2 tiffs and 1 node-mapnik version (latest changes from master (exec) are not yet carried over to my branch).

I think it's interesting that 84% of the time is spent on synchronisation and only 16% on execution.

Also, there seem to be 7 threads that don't do anything at all.

I think, I'm going to dig a bit deeper here and see if there is an easy way to increase parallelization and decrease time spent with synchronisation.