Giter VIP home page Giter VIP logo

stars's Introduction

Spatiotemporal Arrays: Raster and Vector Datacubes

R-CMD-check CRAN cran checks Downloads status Codecov test coverage

Spatiotemporal data often comes in the form of dense arrays, with space and time being array dimensions. Examples include

  • socio-economic or demographic data,
  • environmental variables monitored at fixed stations,
  • raster maps
  • time series of satellite images with multiple spectral bands,
  • spatial simulations, and
  • climate or weather model output.

This R package provides classes and methods for reading, manipulating, plotting and writing such data cubes, to the extent that there are proper formats for doing so.

Raster and vector data cubes

The canonical data cube most of us have in mind is that where two dimensions represent spatial raster dimensions, and the third time (or band), as e.g. shown here:

By data cubes however we also consider higher-dimensional cubes (hypercubes) such as a five-dimensional cube where in addition to time, spectral band and sensor form dimensions:

or lower-dimensional cubes such as a raster image:

suppressPackageStartupMessages(library(dplyr))
library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
tif = system.file("tif/L7_ETMs.tif", package = "stars")
read_stars(tif) |>
  slice(index = 1, along = "band") |>
  plot()

Raster data do not need to be regular and aligned with North/East, and package stars supports besides regular also rotated, sheared, rectilinear and curvilinear rasters:

Vector data cubes arise when we do not have two regularly discretized spatial dimensions, but a single dimension that points to distinct spatial feature geometries, such as polygons (e.g. denoting administrative regions):

or points (e.g. denoting sensor locations):

NetCDF’s CF-convention calls this a discrete axis.

NetCDF, GDAL

stars provides two functions to read data: read_ncdf and read_stars, where the latter reads through GDAL. (In the future, both will be integrated in read_stars.) For reading NetCDF files, package RNetCDF is used, for reading through GDAL, package sf provides the binary linking to GDAL.

For vector and raster operations, stars uses as much as possible the routines available in GDAL and PROJ (e.g. st_transform, rasterize, polygonize, warp). Read more about this in the vignette on vector-raster conversions, reprojection, warping.

Out-of-memory (on-disk) rasters

Package stars provides stars_proxy objects (currently only when read through GDAL), which contain only the dimensions metadata and pointers to the files on disk. These objects work lazily: reading and processing data is postponed to the moment that pixels are really needed (at plot time, or when writing to disk), and is done at the lowest spatial resolution possible that still fulfills the resolution of the graphics device. More details are found in the stars proxy vignette.

The following methods are currently available for stars_proxy objects:

methods(class = "stars_proxy")
#  [1] [               [[<-            [<-             adrop          
#  [5] aggregate       aperm           as.data.frame   c              
#  [9] coerce          dim             droplevels      filter         
# [13] hist            image           initialize      is.na          
# [17] Math            merge           mutate          Ops            
# [21] plot            predict         print           pull           
# [25] rename          select          show            slice          
# [29] slotsFromS3     split           st_apply        st_as_sf       
# [33] st_as_stars     st_crop         st_dimensions<- st_downsample  
# [37] st_mosaic       st_normalize    st_redimension  st_sample      
# [41] st_set_bbox     transmute       write_stars    
# see '?methods' for accessing help and source code

Raster and vector time series analysis example

In the following, a curvilinear grid with hourly precipitation values of a hurricane is imported and the first 12 time steps are plotted:

prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars")
(prec = read_stars(gdal_subdatasets(prec_file)[[1]]))
# stars object with 3 dimensions and 1 attribute
# attribute(s):
#                                         Min. 1st Qu. Median     Mean 3rd Qu.
# Total_precipitation_surface... [kg/m^2]    0       0   0.75 4.143009    4.63
#                                           Max.
# Total_precipitation_surface... [kg/m^2] 163.75
# dimension(s):
#      from  to                  offset   delta         refsys
# x       1  87                      NA      NA WGS 84 (CRS84)
# y       1 118                      NA      NA WGS 84 (CRS84)
# time    1  23 2018-09-13 19:00:00 UTC 1 hours        POSIXct
#                                  values x/y
# x    [87x118] -80.61 [°],...,-74.88 [°] [x]
# y      [87x118] 32.44 [°],...,37.62 [°] [y]
# time                               NULL    
# curvilinear grid
# or: (prec = read_ncdf(prec_file, curvilinear = c("lon", "lat"), ignore_bounds = TRUE))
sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg") |> 
  st_transform(st_crs(prec)) -> nc # transform from NAD27 to WGS84
nc_outline = st_union(st_geometry(nc))
plot_hook = function() plot(nc_outline, border = 'red', add = TRUE)
prec |>
  slice(index = 1:12, along = "time") |>
  plot(downsample = c(3, 3, 1), hook = plot_hook)

and next, intersected with with the counties of North Carolina, where the maximum precipitation intensity was obtained per county, and plotted:

a = aggregate(prec, by = nc, FUN = max)
plot(a, max.plot = 23, border = 'grey', lwd = .5)

We can integrate over (reduce) time, for instance to find out when the maximum precipitation occurred. The following code finds the time index, and then the corresponding time value:

index_max = function(x) ifelse(all(is.na(x)), NA, which.max(x))
b = st_apply(a, "geom", index_max)
b |>  mutate(when = st_get_dimension_values(a, "time")[b$index_max]) |>
  select(when) |>
  plot(key.pos = 1, main = "time of maximum precipitation")

With package cubble, we can make a glyph map to see the magnitude and timings of county maximum precipitation:

library(cubble)
# 
# Attaching package: 'cubble'
# The following object is masked from 'package:stats':
# 
#     filter
library(ggplot2)
a |> setNames("precip") |>
  st_set_dimensions(2, name = "tm") |>
  units::drop_units() |>
  as_cubble(key = id, index = tm) |>
  suppressWarnings() -> a.cb
a.cb |>
  face_temporal() |>
  unfold(long, lat) |>
  mutate(tm = as.numeric(tm)) |>
  ggplot(aes(x_major = long, x_minor = tm, y_major = lat, y_minor = precip)) +
  geom_sf(data = nc, inherit.aes = FALSE) +
  geom_glyph_box(width = 0.3, height = 0.1) +
  geom_glyph(width = 0.3, height = 0.1)
# Warning: There were 84 warnings in `dplyr::mutate()`.
# The first warning was:
# ℹ In argument: `y = .data$y_major + rescale11(.data$y_minor) * .data$height/2`.
# ℹ In group 12: `group = 12`.
# Caused by warning in `min()`:
# ! no non-missing arguments to min; returning Inf
# ℹ Run `dplyr::last_dplyr_warnings()` to see the 83 remaining warnings.
# Warning: Removed 966 rows containing missing values or values outside the scale range
# (`geom_glyph_box()`).
# Warning: Removed 966 rows containing missing values or values outside the scale range
# (`geom_glyph()`).

Other packages for data cubes

Package gdalcubes can be used to create data cubes (or functions from them) from image collections, sets of multi-band images with varying

  • spatial resolution
  • spatial extent
  • coordinate reference systems (e.g., spread over multiple UTM zones)
  • observation times

and does this by resampling and/or aggregating over space and/or time. It reuses GDAL VRT’s and gdalwarp for spatial resampling and/or warping, and handles temporal resampling or aggregation itself.

ncdfgeom reads and writes vector data cubes from and to netcdf files in a standards-compliant way.

Packages raster and its successor, terra are powerful packages for handling raster maps and stacks of raster maps both in memory and on disk, but do not address

  • non-raster time series,
  • multi-attribute rasters time series
  • rasters with mixed type attributes (e.g., numeric, logical, factor, POSIXct)
  • rectilinear or curvilinear rasters

A list of stars commands matching existing raster commands is found in this wiki. A list of translations in the opposite direction (from stars to raster or terra) still needs to be made.

A comment on the differences between stars and terra is found here.

Other stars resources:

Acknowledgment

This project has been realized with financial support from the

stars's People

Contributors

a-benini avatar ailich avatar ateucher avatar btupper avatar dblodgett-usgs avatar djnavarro avatar edzer avatar erickchacon avatar ethanwhite avatar etiennebr avatar flahn avatar floriandeboissieu avatar gavg712 avatar gdkrmr avatar github-actions[bot] avatar iod-ine avatar joshobrien avatar kadyb avatar kendonb avatar loreabad6 avatar mdsumner avatar mtennekes avatar nowosad avatar olivroy avatar pat-s avatar przell avatar rubak avatar tsamsonov avatar uribo avatar yutannihilation avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

stars's Issues

missing value where TRUE/FALSE needed

I am opening the file used in the first stars blog via raster, then to transform it to stars via st_as_stars. However I am getting this error:
Error in if (diff(range(ud))/mean(ud) < 1e-10) { : missing value where TRUE/FALSE needed

If instead I read two bands at a time, converting them works.

To reproduce:

inputFile <- "full_data_daily_2013.nc" #Same data used in star's first blog post:http://r-spatial.org/r/2017/11/23/stars1.html#reading-a-raster-time-series-netcdf

r <- raster::raster(inputFile, varname="p")
s <- raster::stack(inputFile, varname="p", bands=1:2)

r2 <- stars::st_as_stars(r) # DOES NOT WORK
s2 <- stars::st_as_stars(s) # WORKS

Merging stars objects

While trying to merge existing stars objects (stored in a list as_stars) along bands with c.stars(), I found that the "to" value for that dimension to be inconsistent:

> as_stars
[[1]]
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster     
 Min.   : 951  
 1st Qu.:1032  
 Median :1057  
 Mean   :1063  
 3rd Qu.:1086  
 Max.   :1352  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   1      NA    NA                           NA    NA      z

[[2]]
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster       
 Min.   : 238.0  
 1st Qu.: 519.0  
 Median : 595.0  
 Mean   : 620.9  
 3rd Qu.: 699.0  
 Max.   :3016.0  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   1      NA    NA                           NA    NA      z

[[3]]
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster       
 Min.   : 411.0  
 1st Qu.: 646.0  
 Median : 731.0  
 Mean   : 741.2  
 3rd Qu.: 821.0  
 Max.   :2535.0  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   1      NA    NA                           NA    NA      z

Creating a new stars object, t1 equal to the 1st element in the list as_stars:

> t1 = as_stars[[1]]
> t1
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster     
 Min.   : 951  
 1st Qu.:1032  
 Median :1057  
 Mean   :1063  
 3rd Qu.:1086  
 Max.   :1352  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   1      NA    NA                           NA    NA      z

Adding the 2nd element of as_stars to the new stars object, t1

> t1 = c(t1, as_stars[[2]], along = "band")
> t1
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster       
 Min.   : 238.0  
 1st Qu.: 595.0  
 Median : 991.0  
 Mean   : 842.1  
 3rd Qu.:1059.0  
 Max.   :3016.0  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   2      NA    NA                           NA    NA      z

Looks fine till here, but after adding the 3rd element of as_stars to t1...

> t1 = c(t1, as_stars[[3]], along = "band")
> t1
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster       
 Min.   : 238.0  
 1st Qu.: 620.0  
 Median : 776.0  
 Mean   : 808.5  
 3rd Qu.:1036.0  
 Max.   :3016.0  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   4      NA    NA                           NA    NA      z

The "to" value for "band" is 4 instead of the expected 3. Further adding the 1st element of as_stars to t1 results in

> t1 = c(t1, as_stars[[1]], along = "band")
> t1
stars object with 3 dimensions and 1 attribute
attribute(s):
    raster       
 Min.   : 238.0  
 1st Qu.: 668.0  
 Median : 992.0  
 Mean   : 872.2  
 3rd Qu.:1059.0  
 Max.   :3016.0  
dimension(s):
     from  to  offset delta                       refsys point values
x       1 300  699960    10 +proj=utm +zone=34 +south...    NA   NULL
y       1 300 7900000   -10 +proj=utm +zone=34 +south...    NA   NULL
band    1   8      NA    NA                           NA    NA      z

t1 having a "to" value of 8 for bands while it should be 4. It seems to be increasing exponentially in powers of 2 in general. Values obtained using dim() seems to be fine though:

> dim(t1)
   x    y band 
 300  300    4 

Thanks!

stars_proxy objects

Brought up by @adrfantini in #48.

This is a first and very rough shot at proxy objects in stars. They contain only the dimension table and a pointer to the data (now: the file name, we could add the GDALDataset pointer if this helps perfomance-wise). Based on this we can do subsetting, cropping, and thinning; only when we plot those cells that we can see as pixels are read. The speedup for an Sentinel-2 image (4 10-m bands, 10000 x 10000) over complete reading and thinning in memory is a factor 100 or so (at thinning factor 43).

library(stars)
# Loading required package: abind
# Loading required package: sf
# Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3

tif = system.file("tif/L7_ETMs.tif", package = "stars")
s2 = "SENTINEL2_L1C:/vsizip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"

p = read_stars(s2, proxy = TRUE)
p
# stars_proxy object with 1 attribute in file:
# $`MTD_MSIL1C.xml:10m:EPSG_32632`
# [1] "SENTINEL2_L1C:/vsizip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.zip/S2A_MSIL1C_20180220T105051_N0206_R051_T32ULE_20180221T134037.SAFE/MTD_MSIL1C.xml:10m:EPSG_32632"

# dimension(s):
#       from    to offset delta                       refsys point values
# x        1 10980  3e+05    10 +proj=utm +zone=32 +datum...    NA   NULL
# y        1 10980  6e+06   -10 +proj=utm +zone=32 +datum...    NA   NULL
# bands    1     4     NA    NA                           NA    NA   NULL
st_crs(p)
# Coordinate Reference System:
#   EPSG: 32632 
#   proj4string: "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs"
st_dimensions(p)
#       from    to offset delta                       refsys point values
# x        1 10980  3e+05    10 +proj=utm +zone=32 +datum...    NA   NULL
# y        1 10980  6e+06   -10 +proj=utm +zone=32 +datum...    NA   NULL
# bands    1     4     NA    NA                           NA    NA   NULL

#p2 = p[,1:4000, 40:4060,]
p2 = p

stars:::get_downsample(dim(p2))
# [1] 43
png("s2.png", 1000, 1000)
system.time(plot(p2, rgb = 2:4))
#    user  system elapsed 
#   0.970   0.208   0.942 
system.time(plot(read_stars(s2), rgb = 2:4))
#    user  system elapsed 
#  82.905   5.385  36.679 

s2

NetCDF speed

I've done a comparison of stars to raster for multiple files, and stars wins for a time series of 10 files, but raster takes over at 100.

One improvement is to avoid reading all the metadata that ncmeta gets:

 #meta = ncmeta::nc_meta(.x)
  meta = list(grid = ncmeta::nc_grids(.x), 
              axis = ncmeta::nc_axes(.x), 
              dimension = ncmeta::nc_dims(.x))

but also key is not reading that for every file, so vectorizing read_ncdf will provide the edge that raster has (with its quick = TRUE in stack (assuming all the files are equivalent).

Not reproducible, requires local file preparation.

## files obtained with bowerbird from ## ftp.sltac.cls.fr/Core/SEALEVEL_GLO_PHY_L4_REP_OBSERVATIONS_008_047/dataset-duacs-rep-global-merged-allsat-phy-l4-v3/1993/
##f <- raadtools::sshfiles()$fullname[1:10]

# f <- c("dt_global_allsat_phy_l4_19930101_20170110.nc", "dt_global_allsat_phy_l4_19930102_20170110.nc", 
#   "dt_global_allsat_phy_l4_19930103_20170110.nc", "dt_global_allsat_phy_l4_19930104_20170110.nc", 
#   "dt_global_allsat_phy_l4_19930105_20170110.nc", "dt_global_allsat_phy_l4_19930106_20170110.nc", 
#   "dt_global_allsat_phy_l4_19930107_20170110.nc", "dt_global_allsat_phy_l4_19930108_20170110.nc", 
#   "dt_global_allsat_phy_l4_19930109_20170110.nc", "dt_global_allsat_phy_l4_19930110_20170110.nc"
# )


## object for ncsub arbument
mat <- cbind(start = c(600, 100, 1), count = c(200, 200, 1))

## concatenate along time (for these files)
c3 <- function(...) c.stars(..., along = 3)

## set 0 ncol, 0 nrow as extent so that we crop like NetCDF start/count
set_ext0 <- function(x) {
  raster::setExtent(x, raster::extent(0, ncol(x), 0, nrow(x)))
}

## STARS
rbenchmark::benchmark(stars = do.call(c3, lapply(f, read_ncdf, var = "sla", ncsub = mat)), 
                      raster = crop(set_ext0(raster::stack(f, varname = "sla", quick = TRUE)), extent(600, 800, 100, 300)), 
                      replications = 10)
# test replications elapsed relative user.self sys.self user.child sys.child
# 2 raster           10   4.339    1.061     3.680    0.101          0         0
# 1  stars           10   4.088    1.000     3.017    0.088          0         0

That said, ncdf4::nc_open than ncmeta, and all the metadata is in that connection object. I used RNetCDF for Thredds cross platform, ultimately it might be best to simply parse the ncdf4 object (now that I understand things).

I think this will be too dynamic for PRs for a while, but this example is very encouraging.

The output from above:

x
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
      sla         
 Min.   :-0.7071  
 1st Qu.:-0.0641  
 Median :-0.0203  
 Mean   :-0.0145  
 3rd Qu.: 0.0334  
 Max.   : 0.5710  
 NA's   :1990     
dimension(s):
  from  to  offset delta refsys point values
1    1 200 149.875  0.25     NA    NA   NULL
2    1 200 -65.125  0.25     NA    NA   NULL
3    1 100   15706     1     NA    NA   NULL
> y
class       : RasterBrick 
dimensions  : 200, 200, 40000, 10  (nrow, ncol, ncell, nlayers)
resolution  : 1, 1  (x, y)
extent      : 600, 800, 100, 300  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +rf=298.257 +a=6378136.3 
data source : in memory
names       : Sea.level.anomaly.1, Sea.level.anomaly.2, Sea.level.anomaly.3, Sea.level.anomaly.4, Sea.level.anomaly.5, Sea.level.anomaly.6, Sea.level.anomaly.7, Sea.level.anomaly.8, Sea.level.anomaly.9, Sea.level.anomaly.10 
min values  :             -0.7071,             -0.6642,             -0.6066,             -0.5564,             -0.5812,             -0.6116,             -0.6391,             -0.6520,             -0.6627,              -0.6613 
max values  :              0.5650,              0.5710,              0.5752,              0.5806,              0.5996,              0.6237,              0.6507,              0.6746,              0.6927,               0.7106 

Errors reading NetCDF files coming from climate model

Thanks for stars, it looks extremely promising. I'm using the latest github version.

I've tested it on a few files coming from our climate model, RegCM, which produces well-described NetCDF data as output.
The data is on a Lambert Conformal Conic grid and the CRS is described, as per CORDEX guidelines, by a crs variable.

Two kind of errors are recurring for different files. The first (input file link):

> sts <- st_stars("CORDEXRUN_4_STS.1990010100.nc")
xlon, xlat, mask, topo, ps, tsmax, tsmin, prmax, pr, sund, psmin, psavg, mrros, mrro, tasmax, tasmin, tas, sfcWindmax, time_bnds, 
Error in check_equal_dimensions(dots) : 
  object 1 and 5 have different dimensions
In addition: Warning messages:
1: In CPL_read_gdal(x, options, driver, TRUE) :
  GDAL Message 1: dimension #1 (time_bounds) is not a Longitude/X dimension.
2: In CPL_read_gdal(x, options, driver, TRUE) :
  GDAL Message 1: dimension #0 (time) is not a Latitude/Y dimension.

The second (input file link):

> rad <- st_stars("CORDEXRUN_4_RAD.1990010100.nc")
xlon, xlat, mask, topo, ps, rsns, rsnl, rtnscl, rsnscl, rtnlcl, rsnlcl, rts, rsnt, prw, clwvi, clivi, rtl, clh, clm, cll, cl, Error: inherits(value, "units") || inherits(value, "symbolic_units") is not TRUE

You can check the structure of those files using ncdf4::nc_open() from R or ncdump -h from the command line.

Reading multi-band multi-temporal rasters

While trying to create a stars object from a timeseries of GeoTIFF files whose filenames (with paths relative to the location of the table), time index, timestamps, band numbers and band labels are listed in a file like this using read_stars() as seen here, we found the dimensions being reported by dim() and attr() to be different.

While this could still be fixed by changing the underlying structures manually, it is probably not a good idea in general. I think modifying the read_stars() function instead so that it could take as argument

  • the file names of the rasters with path (list or array of strings)
  • the number of time-steps (integer)
  • the number of bands in each time-step (integer)
  • timestamps (list or array of strings or POSIXct objects)
  • band labels (list or array of strings)

so that the dimensions are calculated automatically would be useful for a number of EO applications using rasters.

stars to Raster object

I was trying to convert a stars object stars_obj to a Raster object using as() and encountered the following error:

> stars_obj
stars object with 4 dimensions and 1 attribute
attribute(s):
     attr        
 Min.   : 238.0  
 1st Qu.: 621.0  
 Median :1009.0  
 Mean   : 887.3  
 3rd Qu.:1132.0  
 Max.   :3631.0  
dimension(s):
     from  to              offset   delta                       refsys point              values
x       1 300              699960      10 +proj=utm +zone=34 +south...    NA                NULL
y       1 300             7900000     -10 +proj=utm +zone=34 +south...    NA                NULL
band    1   2                  NA      NA                           NA    NA                   z
time    1   3 2017-05-01 08:16:11 10 days                           NA    NA 2017-05-11 08:20:11
> as(stars_obj, "Raster")
Error: length(dim(x)) %in% c(2, 3) is not TRUE

I believe it has something to do with this line here in st_as_raster(). I am not sure about its role but is that conditional really necessary? Thanks!

borrow ideas from xarray?

I am not sure where to leave this comment, but the stars proposal (which I am very excited about!) reminds me of the Python xarray package, which seems to provide support for multidimensional arrays, with operator broadcasting, multiple indices and dataframe-like structures. Would it be right to compare the goals of stars to those of xarray?

Performance of different polygonizing methods analyzed

From #12 I performed some tests on the performance of different polygonizing methods for raster data, both for stars and for raster data.

The input files are the same used in the first blog post about stars, that is the NetCDF file from ftp://ftp.dwd.de/pub/data/gpcc/full_data_daily_V1/full_data_daily_2013.nc.gz.

Code:

library(microbenchmark)
library(ggplot2)

inputFile <- "full_data_daily_2013.nc" #Same data used in star's first blog post:http://r-spatial.org/r/2017/11/23/stars1.html#reading-a-raster-time-series-netcdf

r <- raster::raster(inputFile, varname="p")
s <- raster::stack(inputFile, varname="p", bands=1:2)

r2 <- stars::st_as_stars(r)
s2 <- stars::st_as_stars(s)

mb <- microbenchmark(times=5,
    tr1 <- spex::polygonize(r, na.rm=TRUE),
    tr2 <- raster::rasterToPolygons(r, na.rm=TRUE),
    tr3 <- as(raster::rasterToPolygons(r, na.rm=TRUE), "sf"),
    tr4 <- sf::st_as_sf(r2, as_points = FALSE, na.rm=TRUE),
    tr5 <- stars::st_xy2sfc(r2, as_points = FALSE, na.rm=TRUE),
    ts1 <- spex::polygonize(s, na.rm=TRUE),
    ts2 <- raster::rasterToPolygons(s, na.rm=TRUE),
    ts3 <- as(raster::rasterToPolygons(s, na.rm=TRUE), "sf"),
    ts4 <- sf::st_as_sf(s2, as_points = FALSE, na.rm=TRUE),
    ts5 <- stars::st_xy2sfc(s2, as_points = FALSE, na.rm=TRUE),
    fr1 <- spex::polygonize(r, na.rm=FALSE),
    fr2 <- raster::rasterToPolygons(r, na.rm=FALSE),
    fr3 <- as(raster::rasterToPolygons(r, na.rm=FALSE), "sf"),
    fr4 <- sf::st_as_sf(r2, as_points = FALSE, na.rm=FALSE),
    fr5 <- stars::st_xy2sfc(r2, as_points = FALSE, na.rm=FALSE),
    fs1 <- spex::polygonize(s, na.rm=FALSE),
    fs2 <- raster::rasterToPolygons(s, na.rm=FALSE),
    fs3 <- as(raster::rasterToPolygons(s, na.rm=FALSE), "sf"),
    fs4 <- sf::st_as_sf(s2, as_points = FALSE, na.rm=FALSE),
    fs5 <- stars::st_xy2sfc(s2, as_points = FALSE, na.rm=FALSE)
)

mb$expr <- factor(mb$expr, levels=(as_tibble(mb) %>% group_by(expr) %>% summarise(meantime=mean(time)) %>% arrange(meantime))$expr)
mb$na.rm <- FALSE
mb$na.rm[grep("na.rm = TRUE", mb$expr)] <- TRUE
gg <- ggplot(mb, aes(x=time/1000000000, y=expr, color=na.rm)) +
    geom_point() +
    scale_y_discrete("", position = "right") +
    scale_x_continuous("Seconds", expand=c(0.02, 0)) +
    ggtitle("Timings of different methods for polygonizing raster data in R, 5 iterations each")
ggsave(gg, file="timings.png", h=5, w=10)

Result:
Result:

spex::polygonize is by far the fastest. The methods from sf and stars also are missing na.rm so they take a performance hit on a dataset with many NAs, but in any case they do not outperform the other methods. Note that each different function outputs slightly different data: some output SpatialPolygonsDataFrame, some sf and some stars objects.
Overall spex::polygonize is not only the fastest but also the most convenient, outputting a clean simple features collection with a different column for each band (timestep).

Note that, as it should be, polygonizing a multi-band file (s, s2) has almost no performance hit over a single-band file (r, r2).

Support subsetting by index (slice())

Python's xarray conveniently provides a way to subset a dimension by value (sel(), equivalent to our filter()) and by index (isel(), equivalent to dplyr's slice(), I think). How can one achieve the same selection-by-index with stars? If not yet implemented, is this planned?

Setting ob_tran CRS

I have a netcdf file (link) of which I know the proper CRS, but the CRS is not encoded in the file itself:
+proj=ob_tran +o_proj=longlat +o_lat_p=29.0 +o_lon_p=0.0 +lon_0=15.0 +a=6367470 +b=6367470

stars, however, refuses to use it:

st_crs(s) = '+proj=ob_tran +o_proj=longlat +o_lat_p=29.0 +o_lon_p=0.0 +lon_0=15.0 +a=6367470 +b=6367470'
GDAL cannot import PROJ.4 string `+proj=ob_tran +o_proj=longlat +o_lat_p=29.0 +o_lon_p=0.0 +lon_0=15.0 +a=6367470 +b=6367470': returning missing CRS

This is very similar to what happened in r-spatial/sf#651 and has to do, as far as I understand it, with limited support in GDAL for the transformation ob_tran.

In the case of sf, a workaround using lwgeom was possible. Is there anything similar that can be done for stars?

st_bbox() + raster::extent issue

(I am not sure if the stars repo is the best place to open this issue. However, I was unable to use st_bbox() on the raster::extent object without loading stars).

st_bbox always return the same coordinates and crs (-180, -90, 180, 90; 4326) from different raster::extent objects.
The reproducible code is below.

library(raster)                                               
#> Loading required package: sp
library(sf)                                                   
#> Linking to GEOS 3.6.1, GDAL 2.2.1, proj.4 4.9.3
library(stars)                                                
#> Loading required package: abind
r = raster(system.file("external/test.grd", package="raster"))
r                                                             
#> class       : RasterLayer 
#> dimensions  : 115, 80, 9200  (nrow, ncol, ncell)
#> resolution  : 40, 40  (x, y)
#> extent      : 178400, 181600, 329400, 334000  (xmin, xmax, ymin, ymax)
#> coord. ref. : +init=epsg:28992 +towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812 +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs 
#> data source : /home/jakub/R/x86_64-redhat-linux-gnu-library/3.4/raster/external/test.grd 
#> names       : test 
#> values      : 128.434, 1805.78  (min, max)
st_bbox(extent(r))                                            
#> xmin ymin xmax ymax 
#> -180  -90  180   90
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.3 (2017-11-30)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2018-03-30
#> Packages -----------------------------------------------------------------
#>  package   * version    date       source                          
#>  abind     * 1.4-5      2016-07-21 CRAN (R 3.4.1)                  
#>  backports   1.1.2      2017-12-13 CRAN (R 3.4.3)                  
#>  base      * 3.4.3      2017-11-30 local                           
#>  class       7.3-14     2015-08-30 CRAN (R 3.4.3)                  
#>  classInt    0.1-24     2017-04-16 cran (@0.1-24)                  
#>  compiler    3.4.3      2017-11-30 local                           
#>  datasets  * 3.4.3      2017-11-30 local                           
#>  DBI         0.8        2018-03-02 cran (@0.8)                     
#>  devtools    1.13.4     2017-11-09 CRAN (R 3.4.1)                  
#>  digest      0.6.15     2018-01-28 cran (@0.6.15)                  
#>  e1071       1.6-8      2017-02-02 cran (@1.6-8)                   
#>  evaluate    0.10.1     2017-06-24 CRAN (R 3.4.1)                  
#>  graphics  * 3.4.3      2017-11-30 local                           
#>  grDevices * 3.4.3      2017-11-30 local                           
#>  grid        3.4.3      2017-11-30 local                           
#>  htmltools   0.3.6      2017-04-28 CRAN (R 3.4.1)                  
#>  knitr       1.20       2018-02-20 cran (@1.20)                    
#>  lattice     0.20-35    2017-03-25 CRAN (R 3.4.3)                  
#>  magrittr    1.5        2014-11-22 CRAN (R 3.4.1)                  
#>  memoise     1.1.0      2017-04-21 CRAN (R 3.4.1)                  
#>  methods   * 3.4.3      2017-11-30 local                           
#>  raster    * 2.6-7      2017-11-13 cran (@2.6-7)                   
#>  Rcpp        0.12.16    2018-03-13 cran (@0.12.16)                 
#>  rgdal       1.2-16     2017-11-21 cran (@1.2-16)                  
#>  rmarkdown   1.8        2017-11-17 CRAN (R 3.4.1)                  
#>  rprojroot   1.3-2      2018-01-03 CRAN (R 3.4.3)                  
#>  sf        * 0.6-2      2018-03-30 Github (r-spatial/sf@08248e6)   
#>  sp        * 1.2-7      2018-01-19 cran (@1.2-7)                   
#>  stars     * 0.1-1      2018-03-30 Github (r-spatial/stars@290d9ba)
#>  stats     * 3.4.3      2017-11-30 local                           
#>  stringi     1.1.6      2017-11-17 CRAN (R 3.4.1)                  
#>  stringr     1.3.0      2018-02-19 cran (@1.3.0)                   
#>  tools       3.4.3      2017-11-30 local                           
#>  udunits2    0.13       2016-11-17 CRAN (R 3.4.1)                  
#>  units       0.5-1      2018-01-08 cran (@0.5-1)                   
#>  utils     * 3.4.3      2017-11-30 local                           
#>  withr       2.1.1.9000 2018-02-26 Github (jimhester/withr@5d05571)
#>  yaml        2.1.16     2017-12-12 cran (@2.1.16)

Will I be able to create NetCDF files?

Hi and congrats by the proposal grant. Many people are excited about this package, including me.

I have a question and not really an issue.

Will we be able to create NetCDF files? I have my package which creates a flux of emissions in a grid (currently as SpatialGridDataFrame but I'm gonna migrate to SF soon) and it would be very beneficial to me to have a tool to create NetCDF from this grids. The idea is to create inputs for atmospheric models such as WRF-Chem, CMAQ, BRAMS etc.

Thanks and congrats

Add geom_stars for ggplot plotting

A while back sf has implemented a geom_sf which provides a ggplot2 plotting interface for sf objects. And it is awesome.

stars imho should also provide something similar.

str problem with units class

I see

x <- read_stars(system.file("nc/reduced.nc", package = "stars"))
str(x)
#List of 4
# $ sst :Error in Ops.units(ao, 1e-10) : 
# both operands of the expression should be "units" objects

I can't see where this is happening, it makes things a bit hard because the units are not at all relevant for mechanical uses, though I can unclass to get the information needed, but only if I know something about the structure already i.e.

str(unclass(x))  ## no help

str(unclass(x[[1]]))  ## now I have hook

Is there any way we could lighten up on the units formality to make exploration easier?

360_day calendar attribute is ignored ; a warning could be issued

Several climate models do not use the gregorian calendar in their NetCDF output: they often use 360_day calendars, which are quite challenging to deal with in R (AFAIK only the PCICt package can help, Python has more advanced tools for this). See the CF-Conventions for the list of convened-upon NetCDF calendars.

stars currently ignores the calendar option and treats the data (using units) as if it was in a gregorian calendar, as it is the only calendar supported (see also r-quantities/units#80).

It would be helpful if a warning was issued if a calendar attribute was found with value other than gregorian or standard, and maybe proleptic_gregorian.

However I do not know if you want to introduce such an ad-hoc check into stars or units.

fasterize compatibility

I envisioned fasterize as a link between sf and raster data types. Only one (one-way) operation is currently implemented but I had planned a fast version of raster2polygons, and potentially others. I had just been thinking about outputs besides in-memory rasters, and would be happy to try to implement stars-style input and output formats.

If netcdf files with identical timestamps are read in, only the first one is actually read

I can attach files if needed, but it's trivial to reproduce by taking one file and copying it with a different name.
This holds true for files with any number of timesteps.

read_stars(c("file.nc", "file_copy.nc"), sub="pr")
pr, 
pr, 
pr, 
stars object with 3 dimensions and 1 attribute
attribute(s):
 NETCDF:"file.nc":pr [kg/m^2/s]
 Min.   :0.000e+00                          
 1st Qu.:7.000e-08                          
 Median :4.038e-06                          
 Mean   :1.205e-05                          
 3rd Qu.:1.793e-05                          
 Max.   :4.492e-03                          
dimension(s):
     from  to              offset  delta                       refsys point
x       1 527            -3162000  12000 +proj=lcc +lat_1=30 +lat_...    NA
y       1 527             3162000 -12000 +proj=lcc +lat_1=30 +lat_...    NA
time    1   1 1970-03-31 01:30:00     NA                      POSIXct    NA
     values
x      NULL
y      NULL
time   NULL

bowerbird package to build example data sources

This a heads-up about one tool to build example data sets. We can use this to build collections of a variety of file sources for intensive testing. I am happy to help anyone with this. This issue is just a general resource, it can be closed but otherwise having a general place to put links and resources is useful (perhaps the Wiki tab here?)

bowerbird is our general tool for getting data, including rasters, remote sensing, model output in NetCDF, GRIB, HDF, vector data (any format actually, though most in practice are "gridded") - it can be used to build test file sets, and will keep the collection up to date with efficient source/target comparison

https://github.com/AustralianAntarcticDivision/bowerbird

One very simple example is the daily Optimally Interpolated Sea Surface Temperature data (1981 to now) on a regular 0.25 degree grid, one file is used in this early example:

#5

Data reading and writing in chunked NetCDF files

Just a heads up.
As you know NetCDF4 files are chunked: the data is stored in big data blocks which can be of any dimension. The chunks can be zipped. Different variables can have different chunking schemes and compression.

Say I have a XYT 100x100x50 dataset. If I want to access a single time slice, a chunking of 100x100x1 makes much more sense than a chunking of 10x10x50. The opposite of course is true if I want to read time series.

  1. When writing NetCDF files (if this option will be provided), stars should provide an interface to choose the chunking. Zipping should be an option too.
  2. When opening NetCDF files, stars should provide information about the chunking.
  3. When reading data from NetCDF files, stars should use the most efficient access method given the chunk sizes, i.e. read as few chunks as possible and avoid reading them multiple times (yes, some programs do this, ignoring the chunking specs).

AFAIK this should all be doable through GDAL, but I might be wrong.

Other examples

The abstract is very clear. I don't know if it's relevant to include these examples in the abstract, but I wanted to share my thoughts:

  • Spatial simulation often yield multidim rasters.
  • Multi-band time series such as Landsat series are a good example of something that requires heavy adaptation of current infrastructures
  • This is an extension and it might not apply to the current proposal, but raster catalog (a large list of -potentially sparse-- rasters) is also not very well supported, but it is similar to running small batches.

Having great support for gdal vrt format could be a good infrastructure to build on, I think it is listed in the proposal?

reading grib files

Hello,

One of the most commonly used formats in the world of numerical weather prediction is the grib format. Will it be possible with the package stars to read the metadata of grib files ?

Grib files are already readable thanks the rgdal and raster packages but, unless I'm mistaken, it is not possible to have access to their metadata. We can not know, for example, which weather variables are included in these files, what are the forecasting steps, the run of the forecast etc ...

Two other packages, rNOMADS and gribr, exist but they are for the moment rather incomplete and do not benefit from all the formalism already developed for sf and stars.

Files in Grib format are available here:
ftp://nomads.ncdc.noaa.gov/GFS/Grid4/201803/20180311/

A tool developed by the European center to read these files with a python API exists:
https://software.ecmwf.int/wiki/display/ECC/ecCodes+Home

Maybe this post is off-topic for the stars package but be aware that if you made reading grib files easier with R, it could benefit a lot of people, avoiding to convert grib files into netcdf ones with external tools.

Best regards,

Ch.Chaussin

issue with "new_dim" temporary name

I don't totally get the context yet, so this is a bit of a blind report - but it's a useful data set for experimenting with high-D.

This file has an interesting set of two 4D and two 5D variables (subdatasets), they are individually on four different grids.

https://github.com/potterzot/easyncdf/raw/master/tests/data/test.nc4

Obtain that file, or install the package, get the sds names:

f <- system.file("tests/data/test.nc4", package = "easyncdf")
sds <- c("Precipitable_water_entire_atmosphere_single_layer_ens", "Total_precipitation_surface_6_Hour_Accumulation_ens", 
"v-component_of_wind_isobaric_ens", "Minimum_temperature_height_above_ground_6_Hour_Minimum_ens")
## ncmeta::nc_vars(f) %>% dplyr::filter(ndims > 3) %>% dplyr::pull(name)

Each variable ends up with "new_dim", as an extra dim - which doesn't belong:

lapply(sds, function(x) names(st_dimensions(stars::read_stars(f, sub = x))))

I believe that means this line must be a loop so that dim_name is passed explicitly via c, but there's probably a better solution (and I couldn't get it to work).

structure(do.call(c, ret), names = nms)

Error in st_as_sf for subsets of data

I'm getting an error when trying to st_as_sf() a subset of a dataset, if the subset does not start from 1:

library(stars)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
st_as_sf(x[,,,1], as_points=FALSE) #works
st_as_sf(x[,1:200,1:200,1], as_points=FALSE) #works
st_as_sf(x[,1:2,1:2,1], as_points=FALSE) #works
st_as_sf(x[,2:3,2:3,1], as_points=FALSE) #Does not work:
#Error in xy2sfc(cc, dims, as_points) : prod(dm) == nrow(cc) is not TRUE

Provide a method for modifying dimensions

Sometimes I want to modify dimensions. Say, for example, my file has 4 timesteps, one for each season. They have timestamps associated to them, but I'd prefer to have character values c("DJF", "MAM", "JJA", "SON").

Currently one can do this with the following:

attr(ss2, "dimensions")[["time"]]$values <- c("DJF", "MAM", "JJA", "SON")

I realize however this is not elegant nor, probably, correct, since I should also change refsys and possibly other metadata.
A mechanism to help with this could be handy... in case it is not there already and I completely missed it!

Support for reading station data timeseries (NetCDF example)

Followup of our discussions at EGU: stars aims at reading station data interpreting the X-Y coordinates correctly as sf point features. I did a bit of digging on the topic, although I can only speak for NetCDF data since that 's what I know best.

The CF conventions (ch. 9) define a set of supported features: point, timeSeries, trajectory, profile, timeSeriesProfile, trajectoryProfile. The convention goes into the details on how these should be defined, see here for timeSeries for example. There are several recommendations and a few mandatory attributes.

Here is how to create an example timeseries file for 10 stations over 20 timesteps using R:

library(ncdf4)

time <- ncdim_def("time"   , "days since 1970-01-01 00:00:00 UTC", vals=as.integer(1:20), longname="time", calendar="gregorian")
stat <- ncdim_def("station", ""                                  , vals=as.integer(1:10), create_dimvar=FALSE)

num  <- ncvar_def("num" , ""              , list(stat)      , longname="Station number", prec="integer")
var  <- ncvar_def("pr"  , "kg m-2 s-1"    , list(time, stat), longname="Total precipitation flux", missval=-10)
lat  <- ncvar_def("lat" , "degrees_north" , list(stat)      , longname="Station latitude")
lon  <- ncvar_def("lon" , "degrees_east"  , list(stat)      , longname="Station longitude")
alt  <- ncvar_def("alt" , "m"             , list(stat)      , longname="Vertical distance above the surface")

nc <- nc_create("timeseries.nc", list(num, var, lat, lon, alt), force_v4=TRUE)

ncvar_put(nc, num, vals=as.integer(1:10))
ncvar_put(nc, var, vals=runif(20 * 10, 0, 100/86400))
ncvar_put(nc, lat, vals=40:49)
ncvar_put(nc, lon, vals=10:19)
ncvar_put(nc, alt, vals=100*(1:10))

ncatt_put(nc, var, "coordinates", "lat lon alt num")
ncatt_put(nc, num, "cf_role", "timeseries_id")
ncatt_put(nc, var, "standard_name", "precipitation_flux")
ncatt_put(nc, lat, "standard_name", "latitude")
ncatt_put(nc, lon, "standard_name", "longitude")
ncatt_put(nc, alt, "standard_name", "height")

ncatt_put(nc, 0, "featureType", "timeSeries")
ncatt_put(nc, 0, "Conventions", "CF-1.7")

nc_close(nc)

The created file has metadata (ncdump -h timeseries.nc):

netcdf timeseries {
dimensions:
        station = 10 ;
        time = 20 ;
variables:
        int num(station) ;
                num:long_name = "Station number" ;
                num:cf_role = "timeseries_id" ;
        int time(time) ;
                time:units = "days since 1970-01-01 00:00:00 UTC" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
        float pr(station, time) ;
                pr:units = "kg m-2 s-1" ;
                pr:_FillValue = -10.f ;
                pr:long_name = "Total precipitation flux" ;
                pr:coordinates = "lat lon alt num" ;
                pr:standard_name = "precipitation_flux" ;
        float lat(station) ;
                lat:units = "degrees_north" ;
                lat:long_name = "Station latitude" ;
                lat:standard_name = "latitude" ;
        float lon(station) ;
                lon:units = "degrees_east" ;
                lon:long_name = "Station longitude" ;
                lon:standard_name = "longitude" ;
        float alt(station) ;
                alt:units = "m" ;
                alt:long_name = "Vertical distance above the surface" ;
                alt:standard_name = "height" ;

// global attributes:
                :featureType = "timeSeries" ;
                :Conventions = "CF-1.7"
}

This AFAICT should be 100% CF-compliant.

stars currently does not understand what to do with this dataset:

library(stars)
(a <- read_stars("timeseries.nc"))
stars object with 2 dimensions and 1 attribute
attribute(s):
 timeseries.nc      
 Min.   :6.871e-07  
 1st Qu.:3.588e-04  
 Median :6.584e-04  
 Mean   :6.104e-04  
 3rd Qu.:8.768e-04  
 Max.   :1.151e-03  
dimension(s):
  from to offset delta refsys point values
x    1 20     NA    NA     NA    NA   NULL
y    1 10     NA    NA     NA    NA   NULL
Warning messages:
1: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #1 (time) is not a Longitude/X dimension.
2: In CPL_read_gdal(x, options, driver, read_data, NA_value) :
  GDAL Message 1: dimension #0 (station) is not a Latitude/Y dimension.

Obviously we would want to have single sf point dimension (lat, lon, elevation), plus a time dimension.

Whether this needs to be implemented by GDAL or by stars directly I do not know.

Support for curvilinear grids

I added some Sentinel5P data to the starsdata package, and examples of creating, plotting and subsampling curvilinear grids here (yes, starsdata is now on travis too). Comments are welcome.

I think that reading such grids with GDAL will never work, since this is a data model not covered by GDAL; reading it with a read_ncdf driver could work directly, I guess.

About naming: we now have a read_stars and a read_ncdf. Is the idea to merge these into read_stars, or to keep them split into read_gdal and read_ncdf?

Error while reading file: time unit not recognized

I am trying to open this 700KB netcdf file using stars, but I get the following error:

read_stars("chym-intdb-fil_masked_reg1_22_01.nc")   
Error in as.POSIXct.numeric(pr$dim_extra[[v]]) : 
  'origin' must be supplied
In addition: Warning message:
ignoring unrecognized unit: day as %Y%m%d.%f

The reason is that the time in the file is specified in this weird format:

        double time(time) ;
                time:standard_name = "time" ;
                time:units = "day as %Y%m%d.%f" ;
                time:calendar = "proleptic_gregorian" ;
                time:axis = "T"

Maybe Stars should:
a) (maybe) be able to read such format (this is one for the units package) and/or
b) be robust to this kind of problems, with a flag to ignore errors and return NA values for the problematic variables, so that one can fix the problem by issuing st_set_dimensions afterwards

Subsetting with a polygon with different CRS should error more explicitly if CRS is different

I am subsetting a stars object with a polygon, just like in the blog post:

tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)[,,,1]
pol <- x %>% st_bbox() %>% st_as_sfc() %>% st_centroid() %>% st_buffer(300)
plot(x[pol])   #works
plot(x[st_transform(pol, 4326)])   #Does not work

The error is:

Error in .Primitive("[")(x = L7_ETMs.tif, c(1L, 0L, -1L, -2L, -3L, -4L,  : 
  only 0's may be mixed with negative subscripts

Which is not very explicit and can be confusing to newcomers. An error like the one returned by st_intersection() (Error: st_crs(x) == st_crs(y) is not TRUE) would be more fitting.

Also, when using a different, much larger dataset (which I cannot share right now due to network limitations), the error instead is:

Error in `[.default`(x = `NETCDF:"griddedhourly2.nc":pr`, c(2848691L,  : 
  subscript out of bounds

But the function hangs instead of closing, forcing CTRL+C. In this case, too, manually transforming the CRS beforehand fixes the problem.

improve vignettes

Add to the vignettes without results (eval=FALSE) links to where the vignettes (blog posts) with evaluate=TRUE are found.

Calculate area of cells and/or provide a way to calculate area-weighted statistics

In many cases, the area of the cells of a stars object is not constant (e.g. lat-lon grids, or irregular grids in general). It would sometimes be useful to be able to obtain the area with the likes of st_area().

One example use-case is doing field (x-y) averages of a given variable over the third dimension:

tif = system.file("tif/L7_ETMs.tif", package = "stars")
x1 = read_stars(tif)
st_apply(x1, 3, mean)

This average, for irregular grids, is not area-weighted and thus can differ quite a bit from the real area average. Being able to calculate areas without going through sf would facilitate the correct computation of field means.

Also, if not too complicated, it would be nice if a way to calculate area-weighted statistics (any-dimension-weighted, really) were provided... although I have no idea on how this could be implemented.

Sparse matrix data linked to rasters

Helo, I was wondering to turn my one-off code that manipulates sparse matrices (from Matrix and my mefa4 package) and raster objects, but then I run into this project. I was wondering if there is anything on the horizon re supporting sparse matrix representations along with simple features and the proposed raster type objects?

To put my question into context, here is a use case from my experience. Quite often in ecology we summarize landscapes for prediction in discrete units (unit area pixels in a grid, or some mostly square and similar sized units from legal township surveys, see example). Within each unit, composition of land cover is assessed (either satellite based pixels or resource inventory polygons). This naturally leads to sparsity (~10% matrix fill) as the number of classes increases, because there is usually quite a bit of class imbalance.

I'd be happy to provide prototype or collaborate if there is interest within the project.

names of spatial raster dimensions need to be x and y

@mdsumner I noted that in

> library(stars)
Loading required package: abind
Loading required package: sf
Linking to GEOS 3.6.2, GDAL 2.2.3, proj.4 4.9.3
> example(read_ncdf)

rd_ncd> f <- system.file("nc/reduced.nc", package = "stars")

rd_ncd> read_ncdf(f)
stars object with 4 dimensions and 4 attributes
attribute(s):
     anom               err             ice             sst       
 Min.   :-10.160   Min.   :0.110   Min.   :0.010   Min.   :-1.80  
 1st Qu.: -0.580   1st Qu.:0.160   1st Qu.:0.470   1st Qu.:-0.03  
 Median : -0.080   Median :0.270   Median :0.920   Median :13.65  
 Mean   : -0.186   Mean   :0.263   Mean   :0.718   Mean   :12.99  
 3rd Qu.:  0.210   3rd Qu.:0.320   3rd Qu.:0.960   3rd Qu.:24.81  
 Max.   :  2.990   Max.   :0.840   Max.   :1.000   Max.   :32.97  
 NA's   :4448      NA's   :4448    NA's   :13266   NA's   :4448   
dimension(s):
     from  to offset delta refsys point values
lon     1 180      0     2     NA    NA   NULL
lat     1  90    -89     2     NA    NA   NULL
zlev    1   1      0   NaN     NA    NA   NULL
time    1   1   1460   NaN     NA    NA   NULL

the coordinates are named lon and lat. This makes perfect sense from a NetCDF point of view (where dimensions are named, although not standardized: lon may also be long or longitude), but in GDAL they are not named (though function have names with "x", "y" and "band").

Following GDAL (and sp where coordinate dimensions names are not well implemented, and simple features where they are hard coded named X Y Z M) in stars I made the design decision that x and y raster coordinates are named x and y. Objects like above are valid, but won't plot and won't integrate with sf objects without changing the coordinate names. I don't find that terribly elegant, but do you, @mdsumner see another way out? An alternative would e.g. be the first dimension is always (assumed to be) x, the second (assumed to be) y. Right now it is arbitrary which dimension numbers are x and y.

Subsetting stars by variable in a non global environment

We encountered an issue, when you try to subset a stars object with a variable that was not defined in the global environment. With the following code snippet you can reproduce the error.

library(stars)
foo = function(index) {
  tif = system.file("tif/L7_ETMs.tif", package = "stars")
  x = read_stars(tif)
  x[,,,index]
}
foo(index=1)

Error in eval(mc, x) : object 'index' not found

It seem that there is a scope issue when calling eval here.

Add support for discrete non-square (hexagonal) raster grids

On Roger Bivand's suggestion, I'm reposting here from R-sig-Geo:

I am wondering if current R raster operations (e.g. extract(), resample(), distance(), etc.) can be modified to work with honeycomb (hexagonal) grids instead of simple square grids? This is driven by e.g.
work at Uber https://www.slideshare.net/stonse/ml-and-data-science-at-uber-gitpro-talk-2017, and other articles http://strimas.com/spatial/hexagonal-grids/ and also R packages:

I'm being asked to evaluate the pros/cons of using different tessellations, and wondering what's the extent of R's support for non square grids.

Thanks!

OpenDataCube

A python interface, postGIS index to collection of netCDF files, here.

See this gdal-dev thread.

stars::st_transform produces different st_bbox than sf::st_transform

As a follow-up to the discussion in #12 (comment) I found some strange behaviour when calculating st_bbox on a transformed stars object. I think the resulting bbox should be identical to what we receive when we transform the bbox only (using sf).

library(mapview)
library(stars)
library(raster)

r = raster::raster(nrows = 4, ncols = 3, ymn = 48, ymx = 49, xmn = 10, xmx = 11)
r[] = 1:12
x = st_as_stars(r)

bb4326 = st_as_sfc(st_bbox(x))
bb3857 = st_transform(bb4326, crs = 3857)
bb3857_2 = st_as_sfc(st_bbox(st_transform(x, 3857)))
> bb3857

Geometry set for 1 feature 
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 1113195 ymin: 6106855 xmax: 1224514 ymax: 6274861
epsg (SRID):    3857
proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
POLYGON ((1113195 6106855, 1224514 6106855, 122...
> bb3857_2

Geometry set for 1 feature 
geometry type:  POLYGON
dimension:      XY
bbox:           xmin: 1113195 ymin: 6077470 xmax: 1231630 ymax: 6274861
epsg (SRID):    NA
proj4string:    +proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs
POLYGON ((1113195 6077470, 1231630 6077470, 123...
mapview(bb3857_2, col.regions = "red") + mapview(bb3857)

screenshot at 2018-01-20 13 06 14

As we can see only xmin and ymax are identical.
The raster equivalent produces the same extent as bb3857 when using projectExtent but something different altogether when projecting the raster first and then computing the extent.

ext3857 = extent(projectExtent(r, crs = "+init=epsg:3857"))
ext3857_2 = extent(projectRaster(r, crs = "+init=epsg:3857", method = "ngb"))

> ext3857
class       : Extent 
xmin        : 1113195 
xmax        : 1224514 
ymin        : 6106855 
ymax        : 6274861 
> ext3857_2
class       : Extent 
xmin        : 1113195 
xmax        : 1224495 
ymin        : 6106861 
ymax        : 6274861 


> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stars_0.0          magrittr_1.5       dplyr_0.7.4        sf_0.6-1           mapview_2.2.2     
[6] leaflet_1.1.0.9000 raster_2.6-7       sp_1.2-6          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14      bindr_0.1         compiler_3.4.3    plyr_1.8.4        R.methodsS3_1.7.1
 [6] R.utils_2.5.0     base64enc_0.1-3   iterators_1.0.8   class_7.3-14      tools_3.4.3      
[11] gdalUtils_2.0.1.7 digest_0.6.12     jsonlite_1.5      tibble_1.3.4      satellite_1.0.0  
[16] lattice_0.20-35   viridisLite_0.2.0 pkgconfig_2.0.1   rlang_0.1.4       png_0.1-7        
[21] foreach_1.4.3     shiny_1.0.4       DBI_0.7           crosstalk_1.0.0   parallel_3.4.3   
[26] yaml_2.1.14       rgdal_1.2-8       bindrcpp_0.2      e1071_1.6-8       gdtools_0.1.6    
[31] htmlwidgets_0.9   webshot_0.4.1     stats4_3.4.3      classInt_0.1-24   grid_3.4.3       
[36] svglite_1.2.1     glue_1.1.1        R6_2.2.2          udunits2_0.13     scales_0.4.1.9002
[41] codetools_0.2-15  htmltools_0.3.6   units_0.5-1       abind_1.4-5       assertthat_0.2.0 
[46] mime_0.5          xtable_1.8-2      colorspace_1.3-2  httpuv_1.3.5      munsell_0.4.3    
[51] R.oo_1.21.0      

adding to stars plot does not work

library(stars)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
pol <- x %>% st_bbox() %>% st_as_sfc() %>% st_centroid() %>% st_buffer(300)
plot(x[pol][,,,1], reset = FALSE)
plot(pol, add = TRUE)

should add the polygon, but doesn't

How to rename dimensions?

How can one rename a dimension in a stars object? Maybe a st_dimensions<- method would be appropriate for this? Or maybe an option to st_set_dimensions()?

Error in CPL_read_gdal(x, options, driver, read_data, NA_value) : negative length vectors are not allowed

I am getting the error in the title when trying to open a 450MB NetCDF file which was created with gdal_translate. What is causing it?
Here is the description of the file:

read_stars_meta

read_stars_meta("DEM20.nc")
$filename                                                                                                                                                            
[1] "/home/afantini/places/clima-archive/flood/DEM/PCN_IT_DEM/DEM20.nc"                                                                                              

$driver                                                                                                                                                              
[1] "netCDF"                     "Network Common Data Format"                                                                                                        

$cols                                                                                                                                                                
[1]     1 52281                                                                                                                                                      

$rows                                                                                                                                                                
[1]     1 67098                                                                                                                                                      

$bands                                                                                                                                                               
[1] 1 1                                                                                                                                                              

$proj_wkt                                                                                                                                                            
[1] "PROJCS[\"UTM Zone 32, Northern Hemisphere\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"METERS\",1]]"                                                                                               

$proj4string                                                                                                                                                         
[1] "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs "                                                                                                             

$geotransform                                                                                                                                                        
[1]  309908.1      20.0       0.0 5271300.4       0.0     -20.0                                                                                                      

$datatype                                                                                                                                                            
[1] "Int16"                                                                                                                                                          

$sub                                                                                                                                                                 
[1] NA                                                                                                                                                               

$meta                                                                                                                                                                
 [1] "Band1#grid_mapping=transverse_mercator"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
 [2] "Band1#long_name=GDAL Band Number 1"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
 [3] "Band1#_FillValue=-32768"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
 [4] "NC_GLOBAL#Conventions=CF-1.5"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
 [5] "NC_GLOBAL#GDAL=GDAL 2.1.0, released 2016/04/25"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
 [6] "NC_GLOBAL#history=Fri May 05 11:47:09 2017: GDAL CreateCopy( /home/netapp-clima-users1/afantini/DEM20.nc, ... )"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
 [7] "transverse_mercator#false_easting=500000"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
 [8] "transverse_mercator#false_northing=0"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
 [9] "transverse_mercator#GeoTransform=309908.1464839161 20 0 5271300.41447363 0 -20 "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
[10] "transverse_mercator#grid_mapping_name=transverse_mercator"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
[11] "transverse_mercator#inverse_flattening=298.257223563"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
[12] "transverse_mercator#latitude_of_projection_origin=0"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
[13] "transverse_mercator#longitude_of_central_meridian=9"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
[14] "transverse_mercator#longitude_of_prime_meridian=0"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
[15] "transverse_mercator#long_name=CRS definition"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
[16] "transverse_mercator#scale_factor_at_central_meridian=0.9996"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
[17] "transverse_mercator#semi_major_axis=6378137"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
[18] "transverse_mercator#spatial_ref=PROJCS[\"UTM Zone 32, Northern Hemisphere\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"METERS\",1]]"                                                              
[19] "x#long_name=x coordinate of projection"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[20] "x#standard_name=projection_x_coordinate"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
[21] "x#units=m"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
[22] "y#long_name=y coordinate of projection"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
[23] "y#standard_name=projection_y_coordinate"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
[24] "y#units=m"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    

$band_meta                                                                                                                                                           
$band_meta[[1]]                                                                                                                                                      
[1] "grid_mapping=transverse_mercator" "long_name=GDAL Band Number 1"                                                                                                
[3] "NETCDF_VARNAME=Band1"             "_FillValue=-32768"                                                                                                           


$units                                                                                                                                                               
[1] NA                                                                                                                                                               

$point                                                                                                                                                               
[1] NA                                                                                                                                                               

attr(,"dims")                                                                                                                                                        
    x     y bands                                                                                                                                                    
52281 67098     1                                                                                                                                                    
attr(,"file_names")                                                                                                                                                  
[1] "/home/afantini/places/clima-archive/flood/DEM/PCN_IT_DEM/DEM20.nc"                                                                                              
attr(,"dimensions")                                                                                                                                                  
      from    to   offset delta                       refsys point values                                                                                            
x        1 52281 309908.1    20 +proj=utm +zone=32 +datum...    NA   NULL                                                                                            
y        1 67098  5271300   -20 +proj=utm +zone=32 +datum...    NA   NULL                                                                                            
bands    1     1       NA    NA                           NA    NA   NULL                                                                                            
attr(,"class")                                                                                                                                                       
[1] "stars_meta"

gdalinfo

gdalinfo DEM20.nc
Driver: netCDF/Network Common Data Format                                                                                                                            
Files: DEM20.nc                                                                                                                                                      
Size is 52281, 67098                                                                                                                                                 
Coordinate System is:                                                                                                                                                
PROJCS["UTM Zone 32, Northern Hemisphere",                                                                                                                           
    GEOGCS["WGS 84",                                                                                                                                                 
        DATUM["WGS_1984",                                                                                                                                            
            SPHEROID["WGS 84",6378137,298.257223563,                                                                                                                 
                AUTHORITY["EPSG","7030"]],                                                                                                                           
            AUTHORITY["EPSG","6326"]],                                                                                                                               
        PRIMEM["Greenwich",0,                                                                                                                                        
            AUTHORITY["EPSG","8901"]],                                                                                                                               
        UNIT["degree",0.0174532925199433,                                                                                                                            
            AUTHORITY["EPSG","9122"]],                                                                                                                               
        AUTHORITY["EPSG","4326"]],                                                                                                                                   
    PROJECTION["Transverse_Mercator"],                                                                                                                               
    PARAMETER["latitude_of_origin",0],                                                                                                                               
    PARAMETER["central_meridian",9],                                                                                                                                 
    PARAMETER["scale_factor",0.9996],                                                                                                                                
    PARAMETER["false_easting",500000],                                                                                                                               
    PARAMETER["false_northing",0],                                                                                                                                   
    UNIT["METERS",1]]                                                                                                                                                
Origin = (309908.146483916090801,5271300.414473630487919)
Pixel Size = (20.000000000000000,-20.000000000000000)
Metadata:
  Band1#grid_mapping=transverse_mercator
  Band1#long_name=GDAL Band Number 1
  Band1#_FillValue=-32768
  NC_GLOBAL#Conventions=CF-1.5
  NC_GLOBAL#GDAL=GDAL 2.1.0, released 2016/04/25
  NC_GLOBAL#history=Fri May 05 11:47:09 2017: GDAL CreateCopy( /home/netapp-clima-users1/afantini/DEM20.nc, ... )
  transverse_mercator#false_easting=500000
  transverse_mercator#false_northing=0
  transverse_mercator#GeoTransform=309908.1464839161 20 0 5271300.41447363 0 -20 
  transverse_mercator#grid_mapping_name=transverse_mercator
  transverse_mercator#inverse_flattening=298.257223563
  transverse_mercator#latitude_of_projection_origin=0
  transverse_mercator#longitude_of_central_meridian=9
  transverse_mercator#longitude_of_prime_meridian=0
  transverse_mercator#long_name=CRS definition
  transverse_mercator#scale_factor_at_central_meridian=0.9996
  transverse_mercator#semi_major_axis=6378137
  transverse_mercator#spatial_ref=PROJCS["UTM Zone 32, Northern Hemisphere",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",9],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["METERS",1]]
  x#long_name=x coordinate of projection
  x#standard_name=projection_x_coordinate
  x#units=m
  y#long_name=y coordinate of projection
  y#standard_name=projection_y_coordinate
  y#units=m
Corner Coordinates:
Upper Left  (  309908.146, 5271300.414) (  6d28'21.68"E, 47d34' 2.15"N)
Lower Left  (  309908.146, 3929340.414) (  6d54'15.98"E, 35d29'21.97"N)
Upper Right ( 1355528.146, 5271300.414) ( 20d15'59.73"E, 47d 2'15.85"N)
Lower Right ( 1355528.146, 3929340.414) ( 18d22'39.42"E, 35d 8'32.39"N)
Center      (  832718.146, 4600320.414) ( 12d59' 6.49"E, 41d29' 7.62"N)
Band 1 Block=52281x1 Type=Int16, ColorInterp=Undefined
  NoData Value=-32768
  Metadata:
    grid_mapping=transverse_mercator
    long_name=GDAL Band Number 1
    NETCDF_VARNAME=Band1
    _FillValue=-32768

ncdump

ncdump -h DEM20.nc
netcdf DEM20 {
dimensions:
        x = 52281 ;
        y = 67098 ;
variables:
        char transverse_mercator ;
                transverse_mercator:grid_mapping_name = "transverse_mercator" ;
                transverse_mercator:longitude_of_central_meridian = 9. ;
                transverse_mercator:false_easting = 500000. ;
                transverse_mercator:false_northing = 0. ;
                transverse_mercator:latitude_of_projection_origin = 0. ;
                transverse_mercator:scale_factor_at_central_meridian = 0.9996 ;
                transverse_mercator:long_name = "CRS definition" ;
                transverse_mercator:longitude_of_prime_meridian = 0. ;
                transverse_mercator:semi_major_axis = 6378137. ;
                transverse_mercator:inverse_flattening = 298.257223563 ;
                transverse_mercator:spatial_ref = "PROJCS[\"UTM Zone 32, Northern Hemisphere\",GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563,AUTHORITY[\"EPSG\",\"7030\"]],AUTHORITY[\"EPSG\",\"6326\"]],PRIMEM[\"Greenwich\",0,AUTHORITY[\"EPSG\",\"8901\"]],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AUTHORITY[\"EPSG\",\"4326\"]],PROJECTION[\"Transverse_Mercator\"],PARAMETER[\"latitude_of_origin\",0],PARAMETER[\"central_meridian\",9],PARAMETER[\"scale_factor\",0.9996],PARAMETER[\"false_easting\",500000],PARAMETER[\"false_northing\",0],UNIT[\"METERS\",1]]" ;
                transverse_mercator:GeoTransform = "309908.1464839161 20 0 5271300.41447363 0 -20 " ;
        double x(x) ;
                x:standard_name = "projection_x_coordinate" ;
                x:long_name = "x coordinate of projection" ;
                x:units = "m" ;
        double y(y) ;
                y:standard_name = "projection_y_coordinate" ;
                y:long_name = "y coordinate of projection" ;
                y:units = "m" ;
        short Band1(y, x) ;
                Band1:long_name = "GDAL Band Number 1" ;
                Band1:_FillValue = -32768s ;
                Band1:grid_mapping = "transverse_mercator" ;

// global attributes:
                :Conventions = "CF-1.5" ;
                :GDAL = "GDAL 2.1.0, released 2016/04/25" ;
                :history = "Fri May 05 11:47:09 2017: GDAL CreateCopy( /home/netapp-clima-users1/afantini/DEM20.nc, ... )" ;
}

Move C++ parts of stars (reading GDAL) to sf

This will have several advantages:

  • only one package to maintain with GDAL bindings
  • st_read could return a stars object when reading a raster dataset (you could argue that is not type safe, but there is little point in having an st_read for feature, and st_stars for raster/array data; the tidy analogues read_sf and read_stars would make sense)
  • it would be easier for early adopters to install dev versions of stars because no compiling required

Provide a way for subsetting with polygons without cropping the bounding box

I have a particular task in which I subset several gridded files (identical in X-Y size) with different polygons, then write them back by going through raster. The issue is that after subsetting all the files have different extent, since by default subsetting crops the stars item to the polygon's bounding box.

Would it be possible to provide an option for subsetting (something like stars[sf, crop = FALSE]) which allows for the bounding box to be retained, but the external values set to NA?

Dimensions do not show up as changed after filter(), but underlying data is correctly filtered

There is no visual indication that the size of a filter()ed stars object has changed, take e.g. the example file:

tif = system.file("tif/L7_ETMs.tif", package = "stars")
(x = read_stars(tif))

Gives:

dimension(s):
     from  to  offset delta                       refsys point values
x       1 349  288776  28.5 +proj=utm +zone=25 +south... FALSE   NULL
y       1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE   NULL
band    1   6      NA    NA                           NA    NA   NULL

While if we filter it x %>% filter(band>3) gives... exactly the same dimensions (but different summary values, omitted here):

dimension(s):
     from  to  offset delta                       refsys point values
x       1 349  288776  28.5 +proj=utm +zone=25 +south... FALSE   NULL
y       1 352 9120761 -28.5 +proj=utm +zone=25 +south... FALSE   NULL
band    1   6      NA    NA                           NA    NA   NULL

dim(x) ; dim(x %>% filter(band>3)) gives correct dimension sizes:

   x    y band 
 349  352    6 
   x    y band 
 349  352    3

Stars/GDAL reading wrong data

See this file: a 4-dimension x-y-t-bin file with counts of occurrences of a given event for 426x426 cells, 4 timesteps and 201 (intensity) bins. The file metadata is misleading, it is not precipitation, disregard it.

Lets first open the file with ncdf4 and see the total count of occurrences in bin number 1, for each timestep and for the whole region:

library(ncdf4)
nc = nc_open("testfile.nc")
nc #dimension order is x,y,bin,time
sum(ncvar_get(nc, "pr", start=c(1,1,1,1), count=c(-1,-1,1,1)))
# 432302623 for timestep 1 in bin 1
sum(ncvar_get(nc, "pr", start=c(1,1,1,2), count=c(-1,-1,1,1)))
# 441358876 for timestep 2 in bin 1
sum(ncvar_get(nc, "pr", start=c(1,1,1,3), count=c(-1,-1,1,1)))
# 418819656 for timestep 3 in bin 1
sum(ncvar_get(nc, "pr", start=c(1,1,1,4), count=c(-1,-1,1,1)))
# 409579295 for timestep 4 in bin 1

Let's try with stars:

library(stars)
s = read_stars("testfile.nc")
names(s) = "Count"
s #dimension order is x,y,time,bin
sum(s[,,,1,1]$Count)
# 432302623  for timestep 1 in bin 1, correct
sum(s[,,,2,1]$Count)
# 34734822 for timestep 2 in bin 1, wrong
sum(s[,,,3,1]$Count)
# 9828848 for timestep 3 in bin 1, wrong
sum(s[,,,4,1]$Count)
# 3336025 for timestep 4 in bin 1, wrong

What am I doing wrong? Is GDAL reading the file wrongly, or is it me?

Errors in plot() and for small subset of data

I'm getting an error when plotting small subsets of data:

library(stars)
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
plot(x) # works
plot(x[,100:120, 100:120,1]) # works
plot(x[,1:5, 1:5,1]) # works
plot(x[,1:4, 1:4,1]) # Does not work:
#Error in .image_scale(values, col, breaks = breaks, key.pos = key.pos,  : 
#  must have one more break than colour

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.