Giter VIP home page Giter VIP logo

Comments (24)

adrfantini avatar adrfantini commented on July 18, 2024 1

About naming: we now have a read_stars and a read_ncdf. Is the idea to merge these into read_stars, or to keep them split into read_gdal and read_ncdf?

That depends on the level of complexity that you want the ncdf reader to reach, I guess. It's worth to split it, if you want it to be full-featured, for clarity's sake.

from stars.

mdsumner avatar mdsumner commented on July 18, 2024 1

I never replied about the read_ncdf/read_gdal/read_stars name issue.

I guess we need an abstraction over the ncsub and RasterIO idioms, I don't see that either of them is a good user-interface, but at least ncsub is the general form if you take it to cbind(start, count, outlength).

Use that as the general form and provide a bunch of helpers, to convert between geotransform/extent-dim-res/coord-ranges/RasterIO/start-count-stride? I've kind of been collecting these little components slowly, not being sure how to express them all in a single place.

Users will want a bounding box in coordinate space, and an output size or resolution - and often they'll be created by drawing or dragging sliders rather than directly numerically, or by a bit of back and forth guesswork reading the extent/proxy. Does it make sense to use the NetCDF start/count model, and add NearestNeighbour resampling option for outlength (after getting from the source), and then the gdal version can convert those to offset/window/outsize and also allow added resampling options?

from stars.

edzer avatar edzer commented on July 18, 2024 1

Thanks for your thoughts - great ideas; these should go probably into a new issue, given their importance and the topic of this one.

Continuing on support for curvilinear grids, two thoughts came up related to st_transforming them:

  1. transforming a curvilinear grid should be a simple matter of transforming the two coordinate matrices
  2. transforming any non-curvilinear grid should, at least optionally, return a curvilinear grid without doing any resampling.

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

Just had a first look at this file

starsdata/sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc

and it's not readable either by CRAN RNetCDF or ncdf4, because it's composed of groups. RNetCDF is aiming for group support in 2.0.0 but afaik no such plans (or appetite) exist for ncdf4 - oops, sorry I was wrong ncdf4 has had groups for a while now.

mjwoods/RNetCDF#7

You could definitely read it with hdf5r, but I see you use GDAL with SDS syntax. (There might be other better hdf5 packages, it's been pretty dynamic lately).

I'd parked this with tidync, but it's probably time to get across the groups and the data model. It's been a while but with RNetCDF 2.0 from Github you need the extra group levels but the variables are accessible:

s5p = system.file("sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc", package = "starsdata")
con = open.nc(s5p)
grp.inq.nc(con, "/PRODUCT/SUPPORT_DATA/GEOLOCATIONS")

## get latitude from the right group
 var.get.nc(grp.inq.nc(con, "/PRODUCT")$self, "latitude")

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

I've personally never seen a real curvilinear grid - that is, a grid which is NOT rectilinear in some projection - in a NetCDF file, is this something that you've seen often, or is it just a (very convenient) way to work around the many projected files which lack proper projection information?
Most programs people use in climate (e.g. CDO, GrADS) just define the grid as curvilinear whenever they do not understand the projection.

Are curvilinear grids going to play well with st_as_sf? What about raster conversion (iirc raster only support regular grids)?
(will test but can't right now, internet too slow to download starsdata)

from stars.

edzer avatar edzer commented on July 18, 2024

@adrfantini I think it boils down to what you mean by some projection. If you mean a known, parametric projection (e.g. known PROJ string) then an example might be quite a few level 0 or level 1 satellite datasets; think of a raw dove satellite image. And yes, for model data it seems that you can choose a projection (although, as we've seen, if you rotate the Earth first, it doesn't make things easier), in which case it is more of a way of cheating your way out of doing it properly. A third where they arise: when you transform a regular or rectilinear grid to anything else, but don't want to resample (yet).

The plot commands you see in the data_model vignette first convert to sf and then plots.

from stars.

edzer avatar edzer commented on July 18, 2024

These lines btw convert from raster centers to (the coordinates of) raster cell outlines. I simply take the first and last step size and copy it one cell out. I have the feeling that it will not be robust against grids covering poles or date lines (but who knows).

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

There are heaps of examples of all kinds. The case @adrfantini mentions is pretty common in polar data, but also for Mercator -no record of the regular grid is kept in but sometimes can be inferred. It's not for automatic detection, though. There's an intermediate between points and polygons, for both the AREA and POINT cases - way more efficient to store and plot as meshes. There's no model for this in stars but I think it can hang off the bare structures as they are (storing rectilnear vectors or coordinate arrays). I got distracted by the groups thing in Sentinel but hoping to illustrate this soon.

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

@edzer Thanks for the clarification, got it now.
@mdsumner would love to see meshes mesh with stars ;)

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

Testing curvilinear grids right now... nice!

I have ben trying with a cropped version of the dataset in #52 (link, 2.3MB):

library(stars)
hmr = '/dev/shm/hmr.nc'
lat = read_stars(hmr, sub='lat') #Does not work, reads 'pr' instead. Separate issue?
library(raster) #Let's use raster to get the data in then
lat = raster(hmr, varname = 'lat') %>% st_as_stars %>% adrop
lon = raster(hmr, varname = 'lon') %>% st_as_stars %>% adrop
nit = read_stars(hmr) %>% adrop
ll = setNames(c(lon, lat), c("x", "y"))
nit.c = st_as_stars(nit, curvilinear = ll)
st_crs(nit.c) = 4326
plot(nit.c, breaks = "equal", reset = FALSE, axes = TRUE, as_points = FALSE, border = NA)
library(rnaturalearth)
plot(st_geometry(ne_countries(scale = "medium", returnclass="sf")), add = TRUE, border = 'grey')

hmr
Yay!

It's still not clear to me what happens when exporting to Raster:

as(nit.c, 'Raster')

class       : RasterLayer 
dimensions  : 248, 223, 55304  (nrow, ncol, ncell)
resolution  : 0.07258437, 0.05175483  (x, y)
extent      : 4.015204, 20.20152, 35.5148, 48.35  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 51.40039  (min, max)

The resulting dataset is equally spaced in x-y (which afaik is the only kind of data that raster support). Isn't this wrong?

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

Here is very rough form from your example of treating this as a triangulated mesh, with the coordinate values as the vertices of the triangles.

coords <- cbind(as.vector(st_dimensions(nit.c)$x$values), 
                as.vector(st_dimensions(nit.c)$y$values))

tri <- RTriangle::triangulate(RTriangle::pslg(P = cbind(rep(seq_len(nrow(nit.c)), ncol(nit.c)), 
                                                        rep(seq_len(ncol(nit.c)), each = nrow(nit.c))))
)




graphics.off()
plot(coords, type = "n", asp = 1/cos(42 * pi/180))

XY <- coords[t(tri$T), ]
ID <- rep(1:nrow(tri$T), each = 3)
## watch out here, we need a triangle vertex to get the colours in the right order
COL <- grey(seq(0, 1, length = 12))[scales::rescale(nit.c$hmr_ita.nc[tri$T[,1]], c(1, 12))]
isLL <- FALSE

vps <- gridBase::baseViewports()
grid::pushViewport(vps$inner, vps$figure, vps$plot)
grid::grid.polygon(XY[,1], XY[,2], ID,  gp = grid::gpar(col = NA, fill = COL),
                   default.units = "native")
grid::popViewport(3)
plot(st_geometry(ne_countries(scale = "medium", returnclass="sf")), add = TRUE, border = 'grey')

Essentially we triangulate the index of the coordinates, then apply the values of the coordinates as we like - so you can trivially reproject them too, and it'll scale better than conversion to POLYGON.

The triangulation is on the coordinates, not the implied corners of the cells - so it's shrunk in a bit, I make this distinction more obvious here: http://rpubs.com/cyclemumner/415712

(I'm not suggesting we use RTriangle to do this basic index-to-triangulation, I just don't have the brain space to get the indexing logic right at the moment - I'll fix it soonish!)

I have to use grid::polygon because it's fast, it's vectorized over ID so doesn't need to repeatedly call to draw different colours. So, I also need to mix grid and base graphics here.

image

I worked out a few other things here, I've been chasing this for a while from work in quadmesh::mesh_plot - and it made me realize what's wrong with quadmesh's interpolation with raster::extract: http://rpubs.com/cyclemumner/421078

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

That is indeed fast. Took 0.4s on my machine, as compared to 11s for the normal sf::plot.
Impressive.

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

I made a comment about ncdf4 and groups above that was wrong, it does already support them - they are discoverable inside nc_open(file)$groups.

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

Back to the NetCDF groups:

ncdf4 doesn't seem to support groups in a useable way, but I'll keep exploring. RNetCDF 2.0.0 definitely will support it well, but doesn't have a release schedule afaik (the project is active, though).

An alternative is rhdf5 from BioConductor, installed with "BiocInstaller::biocLite" - the nice things is the data frame info upfront with all the variables and their groups nice arranged. The dim is in there as a character column as well.

(s5p = system.file("sentinel5p/S5P_NRTI_L2__NO2____20180717T120113_20180717T120613_03932_01_010002_20180717T125231.nc", package = "starsdata"))

info <- rhdf5::h5ls(s5p)
paths <- file.path(info$group, info$name)
lon <- rhdf5::h5read(s5p, paths[grep("^longitude$", info$name)])
lat <- rhdf5::h5read(s5p, paths[grep("^latitude$", info$name)])
nit <- rhdf5::h5read(s5p, paths[grep("^nitrogendioxide_summed_total_column$", info$name)])
str(lon)
# num [1:450, 1:278, 1] 3.38 3.46 3.55 3.63 3.71 ...
str(nit)
# num [1:450, 1:278, 1] 9.97e+36 5.72e-05 6.01e-05 6.05e-05 6.78e-05 ...

A downside is we don't get missing value handling as per ncdf4 or GDAL, I haven't explored further

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

Another correction, for ncdf4: the group-path qualified variable names are simply available as names in the usual way.

I.e. get latitude

nc <- nc_open(s5p)
ncvar_get(nc, names(nc$var)[1])

Apologies for the long exploration, ncdf4 is definitely the best option!

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

I am trying to reproduce this using a domain file from our climate model: here it is.

Everything is fine but nothing at all is plotted:

lon = read_stars('/home/netapp-clima-scratch/afantini/regcm_domain/EURO-CORDEX_81_DOMAIN000.nc', sub = 'xlon')
lat = read_stars('/home/netapp-clima-scratch/afantini/regcm_domain/EURO-CORDEX_81_DOMAIN000.nc', sub = 'xlat')
topo = read_stars('/home/netapp-clima-scratch/afantini/regcm_domain/EURO-CORDEX_81_DOMAIN000.nc', sub = 'topo')
ll = setNames(c(lon, lat), c("x", "y"))
topo.c = st_as_stars(topo, curvilinear = ll)
plot(topo.c, breaks = "equal", reset = FALSE, axes = TRUE, as_points = TRUE, pch = 16, key.pos = NULL)

from stars.

edzer avatar edzer commented on July 18, 2024

I see this plot:
x

from stars.

edzer avatar edzer commented on July 18, 2024

Note, btw, that the refsys is misleading:

> topo.c
stars object with 2 dimensions and 1 attribute
attribute(s):
 NETCDF:"EURO-CORDEX_81_DOMAIN000.nc":topo [m]
 Min.   :   0.00                              
 1st Qu.:   0.00                              
 Median :  83.59                              
 Mean   : 265.15                              
 3rd Qu.: 354.25                              
 Max.   :3202.35                              
dimension(s):
  from  to offset delta                       refsys point
x    1 530     NA    NA +proj=lcc +lat_1=30 +lat_...    NA
y    1 530     NA    NA +proj=lcc +lat_1=30 +lat_...    NA
                                    values    
x [530x530] -63.2598 [°], ..., 82.8959 [°] [x]
y  [530x530] 14.6105 [°], ..., 76.5259 [°] [y]
curvilinear grid
> st_crs(topo.c)
Coordinate Reference System:
  No EPSG code
  proj4string: "+proj=lcc +lat_1=30 +lat_2=65 +lat_0=48 +lon_0=9.75 +x_0=-6000 +y_0=-6000 +a=6371229 +b=6371229 +units=m +no_defs"

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

It looks like we should be ignoring the lon,lat arrays in this case and just using jx,jy which seems like eminently sensible regular rectlinear coordinates.

nc <- ncdf4::nc_open("EURO-CORDEX_81_DOMAIN000_54/EURO-CORDEX_81_DOMAIN000.nc")
image(ncdf4::ncvar_get(nc, "jx"), 
          ncdf4::ncvar_get(nc, "iy"), 
         ncdf4::ncvar_get(nc, "topo"), useRaster = TRUE)
plot(st_transform(rnaturalearth::ne_coastline(returnclass = "sf"), "+proj=lcc +lat_1=30.00 +lat_2=65.00 +lat_0=48.00 +lon_0=9.75 +x_0=-6000. +y_0=-6000. +ellps=sphere +a=6371229. +b=6371229. +units=m +no_defs")$geometry, add = TRUE, border = "black")

I've never seen it done so clearly, the actual regular grid is right there. Most examples I've seen are more cryptic.

from stars.

mdsumner avatar mdsumner commented on July 18, 2024

GDAL does it right.

gdalinfo NETCDF:"EURO-CORDEX_81_DOMAIN000.nc":topo
Size is 530, 530
Coordinate System is:
PROJCS["unnamed",
    GEOGCS["unknown",
        DATUM["unknown",
            SPHEROID["Spheroid",6371229,0]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",30],
    PARAMETER["standard_parallel_2",65],
    PARAMETER["latitude_of_origin",48],
    PARAMETER["central_meridian",9.75],
    PARAMETER["false_easting",-6000],
    PARAMETER["false_northing",-6000],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]
Origin = (-3180000.000000000000000,3180000.000000000000000)
Pixel Size = (12000.000000000000000,-12000.000000000000000)

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

Note, btw, that the refsys is misleading:

@edzer why do you say so?

from stars.

edzer avatar edzer commented on July 18, 2024

This is also the way read_stars reads it:

o = read_stars("EURO-CORDEX_81_DOMAIN000.nc", sub = "topo")
plot(o, axes=T)
st_crs(o)
# Coordinate Reference System:
#   No EPSG code
#   proj4string: "+proj=lcc +lat_1=30 +lat_2=65 +lat_0=48 +lon_0=9.75 +x_0=-6000 +y_0=-6000 +a=6371229 +b=6371229 +units=m +no_defs"

lcc

I guess that when we create a curvilinear (long/lat) grid out of it in the second st_as_stars step, we should drop the native (lcc) CRS and replace it with ... st_crs(4326)?

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

I'm not sure. Both grids are correct: a rectilinear equally spaced grid in x-y and a curvilinear lat-lon grid, both indicating the same points (albeit grid corners might be slightly different?). It might be handy to have both.

from stars.

adrfantini avatar adrfantini commented on July 18, 2024

I see this plot:
x

Updated to latest stars, works. Might have been a few commits behind on this machine, sorry. The title is garbled, tho.

from stars.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.