Giter VIP home page Giter VIP logo

raster_sampling's Introduction

raster_sampling

Compilation of resources and inter-comparison of raster sampling strategies

Introduction and Motivation

Sampling a raster or ndarray at points is a common, basic need, but there is surprisingly little discussion or intercomparison of available options.

This is such an important topic, and the scientific literature is full of errors related to a lack of expertise when sampling (i.e. using nearest neighbor for large grid cells). Most new users will go with rasterio sample function, without considering consequences.

There is value in general implementation of different approaches in a centralized library, potentially with different wrappers depending on input object (DataFrame, numpy arrays, xarray DataSet). This could be installed as a dependency and used for different applications.

Goals for sprint

  • Continue compiling notes/resources on available approaches
  • Discuss and prepare test data
  • Design intercomparison function/notebook to document spread of values/results when using different methods
  • Discuss potential integration with existing efforts (demquery, icepyx, rioxarray)

Existing approaches

  1. rasterio sample for 2D raster

  2. Zonal statistics (local window or circular region)

  3. scipy.ndimage.map_coordinates

  4. scipy interpolation options (for a local window of raster pixels)

  5. demquery https://github.com/kylebarron/demquery (thanks to @scottyhq!)

  6. xarray interpolation and sampling (http://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation)

    • Example for GeoDataFrame of points points (creates new dimension "z" to store output):
    import xarray, rioxarray, geopandas
    
    #Open raster dataset to sample
    dem = rioxarray.open_rasterio("dem.tif")
    
    #Open point vector dataset and reproject
    gdf = geopandas.read_file("points.geojson").to_crs(dem.rio.crs)
    
    #Add z dimension to store samples
    x = xarray.DataArray(gdf.geometry.x.values, dims=“z”)
    y = xarray.DataArray(gdf.geometry.y.values, dims=“z”)
    
    #Sample the raster
    samples = dem.interp(x=x, y=y) #default method is linear
    
    #Add to the geodataframe
    gdf[‘dem’] = samples.values
    
    #Compute difference with values in another column
    gdf[‘h_mean diff’] = gdf[‘h_mean’] - gdf[‘dem’]
    
  7. pangeo-pyinterp: https://github.com/CNES/pangeo-pyinterp

  8. Custom implementations in existing tools from @SmithB and @tsutterley

  9. Build bilinear equations in lsq framework, then solve: https://github.com/SmithB/LSsurf

  10. https://geoviews.org/user_guide/Resampling_Grids.html (suggested by Friedrich)

  11. https://github.com/OGGM/oggm

  12. https://pyresample.readthedocs.io/en/latest/

  13. https://docs.generic-mapping-tools.org/dev/grdtrack.html

Scaling Considerations

  1. scipy ndinterp - create interpolator once, use multiple times
  2. @tsutterley has implemention for MPI tile-based loading/sampling, using scipy splines:

Issues

  • Handling of raster nodata/nan - many algorithms will propagate nans differently
  • Performance for large point arrays, large rasters
  • Input raster cell dimensions vs point sampling density
  • Handling of different raster data types (e.g., only use nearest for classified rasters)
  • xarray - great for ndarrays from GCMs, most efficient ways to chunk large point coordinate arrays, perform operations

How to choose?

  • Functions to assess input raster properties (e.g., classified vs. continuous, large gradients, spatial frequency content) and point properties
  • Relative point density for points and raster
  • Potentially add some logic to select best method based on these metrics

Method Comparison

  • For same input points and raster/ndarray, extract values from all samples
  • No "truth", but can look at various statistics
  • ATL06 Already has DEM attributes in ATL06, ArcticDEM/REMA (reverts to GDEM) - details on how sampling is performed unknown (Jeff Lee)

Candidate use cases

  • Simple use case: starting with point coordinates in NumPy arrays, raster DEM as GeoTiff
  • More advanced use case: GCM reanalysis data, xyzt point coordinates

User Guide and Documentation

  • Provide general recommendations based on typical scenarios
  • Document case where different methods yield very different results

Next steps

  • Review resources listed above, compile notes
  • Start creating notebooks with snippet for approach or specific problem, links to documentation

raster_sampling's People

Contributors

dshean avatar friedrichknuth avatar shashankbice avatar

Stargazers

Tauseef Bashir avatar Zack Aemmer avatar Robin Cole avatar  avatar Bob McNabb avatar

Watchers

James Cloos avatar  avatar  avatar Tyler Sutterley avatar Yara Mohajerani avatar

Forkers

bessoh2

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.