Node Mapnik Bench

node-mapnik-bench is a set of scripts used for testing performance between versions of Node Mapnik.

Setup

This will install some basic node.js deps into ./node_modules and it will install node-mapnik binaries + related tilelive modules into the ./mapnik-versions directory.

Clone repository

git clone git@github.com:mapbox/node-mapnik-bench.git

Install dependencies

npm install

Install Node Mapnik versions you'd like to benchmark

cd mapnik-versions/<version>
npm install

To install files needed for the benchmark (this takes a little while):

node scripts/download.js

Usage

There are three major ways to use Node Mapnik Bench.

bin: bench

usage:
  bench <file> <list of mapnik versions>

example:
  # will test us-counties-polygons against the latest and v3.5.0 versions of Node Mapnik
  bench ./test/fixtures/us-counties-polygons.geojson latest v3.5.0

output:
  # { source: '/Users/mapsam/mapbox/node-mapnik-bench/test/fixtures/us-counties-polygons.geojson',
  #   version: 'v3.5.0',
  #   options: { threadpool: 6 },
  #   time: 
  #     { start: 1460401496573,
  #       xml: 1460401496796,
  #       bridge: 1460401496835,
  #       info: 1460401496835,
  #       load: 1460401496835,
  #       copy: 1460401497500 },
  #   sink: 'noop://',
  #   memory: 
  #   { max_rss: '51.02MB',
  #     max_heap: '12.07MB',
  #     max_heap_total: '29.71MB' },
  #   tile_count: 533 }
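
The values under time in the output above look like cumulative epoch timestamps in milliseconds, so per-phase durations have to be derived by subtraction. A minimal sketch of that calculation follows. It assumes the phases run in the order they are printed, and phaseDurations is a hypothetical helper, not part of the library.

```javascript
// Sketch: turn the cumulative timestamps in `stats.time` into per-phase durations.
// Assumption: every value is epoch milliseconds and phases run in the listed order.
function phaseDurations(time) {
  const order = ['start', 'xml', 'bridge', 'info', 'load', 'copy'];
  const durations = {};
  for (let i = 1; i < order.length; i++) {
    durations[order[i]] = time[order[i]] - time[order[i - 1]];
  }
  durations.total = time.copy - time.start;
  return durations;
}

// Timestamps copied from the sample output above.
const sample = {
  start: 1460401496573,
  xml: 1460401496796,
  bridge: 1460401496835,
  info: 1460401496835,
  load: 1460401496835,
  copy: 1460401497500
};

console.log(phaseDurations(sample));
// { xml: 223, bridge: 39, info: 0, load: 0, copy: 665, total: 927 }
```

In this sample nearly all of the time falls between load and copy, which is presumably where the tiles are produced.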

bin: benchall

Test a group of files

usage:
  benchall <fixture_index> <type> <list of mapnik versions>

example:
  # will test all geojsons in /testcases against latest and v3.5.0 versions of Node Mapnik
  benchall ./testcases/index.js geojson latest v3.5.0

output:
  # saves a timestamped JSON file to /visual
  visual/1454461994.json

View the output by opening the JSON file with visual/index.html.

cd visual
python -m SimpleHTTPServer # Python 2; on Python 3 use: python3 -m http.server
localhost:8000/visual/index.html?1454461994 # in your browser

bench(file, version, options, callback)

var bench = require('./lib/index.js');

bench('./test/fixtures/us-counties-polygons.geojson', 'latest', options, function(err, stats) {
  if (err) throw err;
  console.log(stats); // same as JSON from ./bin/bench above
});

Testcases

To test an entire suite of files against multiple versions of Node Mapnik, the files must be structured in a particular manner. Check out the testcases directory to get started.

Add a new testcase

TODO

Mapnik versions

All of the mapnik versions we test are in the mapnik-versions directory. To add a new version you can create a new directory named after the tag, branch, or gitsha you would like to use. The name of the directory is the name you'll use in the benchmark commands.

Add a package.json with the proper pointers - here is what latest looks like:

{
  "name": "gdal-tiling-bench-version",
  "version": "1.0.0",
  "main": "package.json",
  "dependencies": {
    "mapnik": "https://github.com/mapnik/node-mapnik/tarball/master",
    "tilelive-bridge":"https://github.com/mapbox/tilelive-bridge/tarball/master"
  }
}

Once added, you can npm install in that new directory.
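
As a rough sketch, the package.json for a new version directory could be generated instead of written by hand. This assumes that GitHub's /tarball/<ref> URL form, shown with master above, is also what you want for a tag, branch, or gitsha, and that tilelive-bridge stays on master. versionPackage is a hypothetical helper name.

```javascript
// Sketch: build the package.json contents for a new mapnik-versions/<ref> directory.
// Assumption: the /tarball/<ref> URL form used with "master" also suits other refs.
function versionPackage(ref) {
  return {
    name: 'gdal-tiling-bench-version',
    version: '1.0.0',
    main: 'package.json',
    dependencies: {
      mapnik: 'https://github.com/mapnik/node-mapnik/tarball/' + ref,
      // Assumption: tilelive-bridge stays on master, as in the "latest" example.
      'tilelive-bridge': 'https://github.com/mapbox/tilelive-bridge/tarball/master'
    }
  };
}

console.log(JSON.stringify(versionPackage('v3.5.0'), null, 2));
```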

Test

npm test

node-mapnik-bench's People

Contributors

flippmoke, mapsam

node-mapnik-bench's Issues

feature suggestions

@mapsam (/cc @springmeyer) you might want to look at (and maybe carry over) some additions I made in https://github.com/BergWerkGIS/investigate-gpx

--concurrency

control how many maps are created in the mapnik pool.
I've carried over Math.ceil(require('os').cpus().length * 16) from some tilelive module, but I think somewhere else it gets limited to 16 anyway (don't remember where I saw that).

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L23

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L150
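
To illustrate the default described above, here is a minimal sketch of the calculation with an optional cap. The multiplier of 16 is taken from the quoted expression. The cap of 16 is only the limit suspected to exist elsewhere, not a confirmed value, and defaultConcurrency is a hypothetical name.

```javascript
const os = require('os');

// Sketch: derive a default pool size from the CPU count, with an optional upper cap.
// The cap is an assumption; the issue only suspects that a limit of 16 exists somewhere.
function defaultConcurrency(cpuCount, multiplier, cap) {
  const requested = Math.ceil(cpuCount * multiplier);
  return cap ? Math.min(requested, cap) : requested;
}

console.log(defaultConcurrency(os.cpus().length, 16));     // uncapped
console.log(defaultConcurrency(os.cpus().length, 16, 16)); // capped at 16
```

If such a cap does apply downstream, any machine with more than one core would end up with the same pool size, which is one reason to expose the value as a flag.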

--show-progress

more verbose output than --verbose.
e.g. shows tiles/sec not only at the end, but also in between

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L24

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L35

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L130-L137

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L153-L161
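
The idea of reporting tiles/sec during the run, not only at the end, can be sketched roughly as below. This is not the code from the linked bench.js. createProgress and its parameters are hypothetical, and the clock is passed in so the logic is easy to test.

```javascript
// Sketch: report tiles/sec at intervals while tiles are being written.
// `createProgress` and its parameters are hypothetical names for illustration.
function createProgress(intervalMs, now, report) {
  const started = now();
  let lastReport = started;
  let tiles = 0;
  return {
    // Call once per finished tile.
    tick() {
      tiles++;
      const t = now();
      if (t - lastReport >= intervalMs) {
        lastReport = t;
        report(tiles, tiles / ((t - started) / 1000));
      }
    },
    count() { return tiles; }
  };
}

// Usage: print an intermediate rate at most once per second.
const progress = createProgress(1000, Date.now, (tiles, rate) => {
  console.log(tiles + ' tiles, ' + rate.toFixed(1) + ' tiles/sec');
});
progress.tick();
```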

workarounds to generate webp tiles

If you are working with vector sources the trick is to include a raster in the mapnik.xml:
https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/style.xml.template#L28-L39

Caution(!): This adds a little raster at null-island, so be sure to always include --bounds when using it; otherwise the map extent will be pretty big and slow things down considerably.

The mapnik.xml has to have style in its name to set the file extension of the resulting tiles correctly:

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/bench.js#L88

This is just for getting the correct file extension.
No matter what extension you pass here

sink = 'file://' + path.join(__dirname, argv.output+'?filetype=' + filetype);

the output format of the tiles is always determined by tilelive-bridge automatically, depending on what types of layers are found in the map.
Hence (workaround #1).

slippy map for viewing generated tiles

run the bench, then node show-tiles-server.js, then http://localhost:666

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles-server.js

https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles.html

https://github.com/BergWerkGIS/investigate-gpx/tree/9e86ced0559e88195857417a66609552837d1125/leaflet

It's working for webp tiles.
Not so much for pbf files. I somehow have troubles extracting them.
Maybe that's an exercise left to the reader
https://github.com/BergWerkGIS/investigate-gpx/blob/9e86ced0559e88195857417a66609552837d1125/show-tiles-server.js#L59-L95

Include vector data

Right now gdal-tiling-bench is only testing against TIF files, but with the advancement of vector data inputs in mapnik, it seems important to benchmark on file types such as GeoJSON and Shapefiles.

Additionally, running benchmarks against different instances of these vector data will be helpful. This includes points, lines, polygons, multipolygons, etc. This will allow us to begin benchmarking against specific operations running in GDAL/mapnik to get a more defined view into where the bottlenecks are.

I like the idea of keeping these large files up on s3, so will continue working that way unless we are using tiny files.

Any input from @flippmoke @jakepruitt on what will make this benchmarking more useful for the v2 push is appreciated!

Todo

  • GeoJSON test cases including different data types and sizes
  • Shapefile additions with different data types and sizes (@springmeyer mentioned working with indexed tiles here?)

Should we run this on an ec2 instance?

From @jakepruitt

When running these benchmarks to compare the v2 branch, should we do this on a morec2 to homogenize the data and make it reproducible?

@springmeyer agrees it would be great to see these results as well, and this makes it more apparent that we should include operating system information with the benchmarks as we run them so we aren't comparing 🍎 to 📙 (no orange fruit emoji 😢).

Seems worthwhile to keep local testing, though. Let's look into running on AWS as an option.
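
One way to record operating system information alongside each run is sketched below using Node's os module. The field names are hypothetical and are not part of the current stats output.

```javascript
const os = require('os');

// Sketch: collect host details to store next to each benchmark result.
// The field names are hypothetical; they are not part of the current stats output.
function environmentInfo() {
  const cpus = os.cpus();
  return {
    platform: os.platform(),
    release: os.release(),
    arch: os.arch(),
    cpu_model: cpus.length ? cpus[0].model : 'unknown',
    cpu_count: cpus.length,
    total_memory_mb: Math.round(os.totalmem() / 1024 / 1024),
    node: process.version
  };
}

console.log(environmentInfo());
```

Storing this object with every result file would make it obvious when two runs came from different machines.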

Running benchall in the same process results in mapnik dupes

@springmeyer uncovered a sizeable issue with the current refactor of Node Mapnik Bench. Since we are now running everything within the same node process, we are introducing multiple versions of mapnik. I thought I had checked for this, but unfortunately it was missed. So, it's unclear how much of an effect this has had on our numbers, but it's safe to say that we need to fix it asap in order to make our benchmark numbers more reliable.

After chatting with @springmeyer we're going to stick with running bench, but refactor the benchall.js script so it executes each bench command in its own process using child_process.exec - like we do for generating XML on the fly - and capturing the output from stdout and writing to a file.
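
A minimal sketch of the process-per-run idea follows. It is not the actual benchall.js refactor. It uses execFileSync rather than child_process.exec to keep the example short and free of shell quoting, and a trivial child command stands in for the bench script.

```javascript
const { execFileSync } = require('child_process');

// Sketch: run each command in a fresh process and capture what it prints to stdout.
// Assumption: execFileSync stands in for the child_process.exec call described above.
function runIsolated(command, args) {
  return execFileSync(command, args, { encoding: 'utf8' }).trim();
}

// Stand-in for `bench <file> <version>`: each child just prints its own pid.
const versions = ['latest', 'v3.5.0'];
const results = versions.map((version) => ({
  version: version,
  childPid: runIsolated(process.execPath, ['-p', 'process.pid'])
}));

console.log(results, 'parent pid:', process.pid);
```

Because every child is a separate Node process, a module loaded for one version cannot linger in memory while the next version is measured.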

I'll take a crack at this so we can get @flippmoke some numbers fast!

cc @mapbox/mapnik

Issues with Threading GDAL

We are going to have big problems with any performant multithreading of TIFFs in GDAL. My last attempt led me down the path of this and this. After lots of changes, this only really allowed multithreaded read-safe processing on memory datasets. I was planning on expanding the scope of the RFC to make multithreading safe across all of GDAL and its drivers.

The discussions for this can be found:

So the core issues in GDAL associated with this are:

  • Thread safety of global objects
  • Thread safety of the block cache (which isn't currently block safe)
  • Thread safety of the individual dataset object (many operations around it are not thread safe)
  • Thread safety of the drivers (TIFF isn't thread safe)
  • Thread safety of individual file handlers (inside each driver)

I have ideas on how to solve all of these problems, but this is a HUGE effort, and will require a lot of thinking and help/support from the GDAL community.

/cc @springmeyer

Investigate the RasterIO window size Mapnik requests

@rouault noticed:

Actually, looking at RasterIO calls, I see that the first ones are done with large source windows, and involve (nearest neighbour) downsampling on GDAL side. e.g. RasterIO(band=1,5208,0,5228,5228,bufxsize=1026,bufysize=1026). Later calls are done on 165x165 windows and don't involve downsampling on GDAL side.

This is odd to me and worth a closer look. My assumption was that because tilelive-bridge is requesting 512x512 tiles from node-mapnik and because buffer-size:0 for the layer and filter-factor is 2 the window requested of GDAL should be 1024x1024.
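
The arithmetic behind that expectation can be written out as a small sketch. It assumes that with buffer-size 0 the requested window side is simply the tile size multiplied by filter-factor. Comparing the result to the bufxsize from the quoted RasterIO call is an interpretation, not something the issue states.

```javascript
// Sketch of the arithmetic behind the expectation above.
// Assumption: with buffer-size 0 the window side is tile size * filter-factor.
function expectedWindowSide(tileSize, filterFactor) {
  return tileSize * filterFactor;
}

const expected = expectedWindowSide(512, 2);
const observedBufSize = 1026; // bufxsize from the RasterIO call quoted above
console.log(expected, observedBufSize - expected); // 1024 2
```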

points-1000 fails in v2_spec

Getting this error:

~/mapbox/gdal-tiling-bench[visualize]$ node test.js testcases/geojson/points-1000/map.xml mapnik-versions/v2_spec/

Config -> using node v0.10.40
Config -> using mapnik at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/mapnik/lib/mapnik.js
Config -> source options: {"close":true,"minzoom":0,"maxzoom":4,"bounds":[-179.4798170775175,-89.94831437710673,179.27351941354573,89.85857545863837],"type":"pyramid"}
Config -> threadpool size: 6
Config -> sink: noop://

/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:260
    map.render(new mapnik.VectorTile(+z,+x,+y), opts, function(err, image) {
               ^
TypeError: required parameters (z, x, and y) must be greater then or equal to zero
    at Function.Bridge.getVector (/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:260:16)
    at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/index.js:174:20
    at /Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/node_modules/mapnik-pool/node_modules/generic-pool/lib/generic-pool.js:291:11
    at loaded (/Users/mapsam/mapbox/gdal-tiling-bench/mapnik-versions/v2_spec/node_modules/tilelive-bridge/node_modules/mapnik-pool/index.js:27:28)

Remove webp encoding

I plan to write a custom tilelive source that pulls 512x512 tiles from node-mapnik but does not encode them. This will allow us to 1) drop the tilelive-bridge dependency and 2) focus the benchmark more on rasterio read speeds.

Output data into JSON for visualization

It would be really neat to store all of the information we're printing to the console into a data.json file so we can visualize it. I'm imagining something like this table that just looks at speeds against versions and different fixture data:

[Screenshot: table of speeds by version and fixture data]
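
As a rough sketch, results could be written to a file named after the Unix timestamp in seconds. This assumes that the visual/1454461994.json name follows that pattern. outputName and saveResults are hypothetical helpers, and the example writes to a temporary directory rather than ./visual.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Sketch: name the results file after the Unix timestamp in seconds.
// Assumption: this matches the visual/1454461994.json naming pattern.
function outputName(date) {
  return Math.floor(date.getTime() / 1000) + '.json';
}

function saveResults(dir, results, date) {
  const file = path.join(dir, outputName(date));
  fs.writeFileSync(file, JSON.stringify(results, null, 2));
  return file;
}

// Usage: write to a temporary directory rather than ./visual.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bench-'));
const file = saveResults(dir, [{ version: 'latest', tile_count: 533 }], new Date(1454461994000));
console.log(path.basename(file)); // 1454461994.json
```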

Overviews

Questions to look into:

  • How much do overviews help the overall run-time of this benchmark
  • How to ensure overviews are built most efficiently to benefit optimized mercator tile requests (and can we also think about validating overviews as "happy" like https://github.com/mapbox/node-happytiff)
  • When using VRTs, when is it more optimal to generate the overviews on the VRT vs. on the referenced tiffs themselves

Optimization potential?

/cc @springmeyer @mapsam

Looking into speeding up tiff uploads I used a new tool from Microsoft (Concurrency Visualizer) and ran benchall with 2 tiffs and 1 node-mapnik version (latest changes from master (exec) are not yet carried over to my branch).

I think it's interesting that 84% of the time is spent on synchronisation and only 16% on execution.

Also, there seem to be 7 threads that don't do anything at all.

I think, I'm going to dig a bit deeper here and see if there is an easy way to increase parallelization and decrease time spent with synchronisation.
