
intake-thredds's Introduction

intake-thredds

CI: GitHub Workflow Status, Code Coverage Status, pre-commit.ci status
Docs: Documentation Status
Package: Conda, PyPI
License: License

Intake interface to THREDDS data catalogs.

Installation

Intake-thredds can be installed from PyPI with pip:

python -m pip install intake-thredds

Intake-thredds is also available from conda-forge for conda installations:

conda install -c conda-forge intake-thredds

See documentation for more information.
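
A minimal usage sketch, following the same pattern that appears in the issues below (the catalog URL here is a hypothetical placeholder):

import intake

# minimal sketch: open a THREDDS catalog and lazily read its first dataset;
# the catalog URL is a hypothetical placeholder
cat_url = "https://thredds.example.org/thredds/catalog/some/catalog.xml"
catalog = intake.open_thredds_cat(cat_url)
file = list(catalog)[0]
source = catalog[file]
ds = source().to_dask()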

intake-thredds's People

Contributors

aaronspring, andersy005, dependabot[bot], martindurant, pre-commit-ci[bot], raybellwaves


intake-thredds's Issues

use open_mfdataset with concat_dim

Eventually I would like to concatenate different members or initial dates. So far we can only concatenate along a dimension that already exists in all of the datasets.

url = "simplecache::https://www.ncei.noaa.gov/thredds/catalog/model-gefs-003/202008/20200831/catalog.xml"
intake.open_thredds_merged(url, ['NCEP gens-a Grid 3 Member-Forecast 1[1-2]*-372 for 2020-08-31 00:00*'], driver='netcdf', xarray_kwargs=dict(
        engine="cfgrib",
        concat_dim='member',
        backend_kwargs=dict(
            filter_by_keys={"typeOfLevel": "heightAboveGround", "shortName": "2t"}
        ),
    )
).to_dask()
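
For reference, the requested behavior mirrors what xr.open_mfdataset does when concatenating along a new dimension; a minimal sketch with hypothetical file names:

import xarray as xr

# sketch of the requested behavior: stack files along a new 'member'
# dimension that does not exist inside the individual files;
# the file names are hypothetical placeholders
files = ["member_01.grb2", "member_02.grb2"]
ds = xr.open_mfdataset(
    files,
    engine="cfgrib",
    combine="nested",     # concatenate in the order the files are given
    concat_dim="member",  # creates the new 'member' dimension
)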

Implementing an NCSS Driver

Coming from the MetPy/siphon world and wanting to better integrate my workflows into the Pangeo ecosystem, I'd like to start using intake-thredds to work with TDS instances. While the package currently makes OPeNDAP and full netCDF file access straightforward, there is nothing yet to handle the NetCDF Subset Service (NCSS), which is a convenient feature of siphon.

Would there be support for including an NCSS intake driver in intake-thredds? If so, I'd be glad to put together a PR as I work to incorporate intake into my workflows (and understand how intake drivers work in more detail).
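
For context, this is roughly what NCSS access looks like with siphon today; a sketch in which the catalog URL, bounding box, and variable name are placeholder assumptions:

from siphon.catalog import TDSCatalog

# sketch of siphon's NCSS workflow; the catalog URL, bounding box, and
# variable name are hypothetical placeholders
cat = TDSCatalog("https://thredds.example.org/thredds/catalog/some/catalog.xml")
ds = list(cat.datasets.values())[0]
ncss = ds.subset()                 # NetCDF Subset Service access point
query = ncss.query()
query.lonlat_box(north=45, south=40, east=-90, west=-100)
query.variables("Temperature_surface")
query.accept("netcdf4")
data = ncss.get_data(query)        # server-side subset, returned as netCDF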

add Aaron's tweet example to docs

I added a minor tweak here to use cfVarName, which matches the name of the returned DataArray.

url = "simplecache::https://www.ncei.noaa.gov/thredds/catalog/model-gefs-003/202008/20200831/catalog.xml"
intake.open_thredds_merged(
    url,
    path=["NCEP gens-a Grid 3 Member-Forecast *-372 for 2020-08-31 00:00*"],
    driver="netcdf",
    concat_kwargs={"dim": "member"},
    xarray_kwargs=dict(
        engine="cfgrib",
        backend_kwargs=dict(
            filter_by_keys={"typeOfLevel": "heightAboveGround", "cfVarName": "t2m"}
        ),
    ),
).to_dask()

issues opening GFS archive

Thanks for this package. I tried it today but couldn't open a file. I'm able to open the file using xr.open_dataset, so I'm not sure whether intake-thredds expects the data to be in a certain format.

I'm reading in an archived GFS forecast (https://rda.ucar.edu/datasets/ds084.1/#!description). Note that you will need a login to access it (https://stackoverflow.com/questions/66178846/read-in-authorized-opendap-url-using-xarray).

url = "https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200201/gfs.0p25.2020020100.f000.grib2"
ds = xr.open_mfdataset([url])

vs

cat_url = "https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200201/catalog.xml"
catalog = intake.open_thredds_cat(cat_url, name="GFS-catalog")
file = list(catalog)[0]
source = catalog[file]
ds = source().to_dask()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-c6e6fbd68981> in <module>
----> 1 ds = source().to_dask()
      2 ds

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/miniconda/envs/main/lib/python3.8/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/opendap.py in _open_dataset(self)
     92         import xarray as xr
     93         store = self._get_store()
---> 94         self._ds = xr.open_dataset(store, chunks=self.chunks, **self._kwargs)

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    555 
    556     with close_on_error(store):
--> 557         ds = maybe_decode_store(store, chunks)
    558 
    559     # Ensure source filename always stored in dataset object (GH issue #2550)

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/api.py in maybe_decode_store(store, chunks)
    451 
    452     def maybe_decode_store(store, chunks):
--> 453         ds = conventions.decode_cf(
    454             store,
    455             mask_and_scale=mask_and_scale,

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    637         encoding = obj.encoding
    638     elif isinstance(obj, AbstractDataStore):
--> 639         vars, attrs = obj.load()
    640         extra_coords = set()
    641         close = obj.close

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/common.py in load(self)
    111         """
    112         variables = FrozenDict(
--> 113             (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    114         )
    115         attributes = FrozenDict(self.get_attrs())

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in get_variables(self)
     97 
     98     def get_variables(self):
---> 99         return FrozenDict(
    100             (k, self.open_store_variable(self.ds[k])) for k in self.ds.keys()
    101         )

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/utils.py in FrozenDict(*args, **kwargs)
    451 
    452 def FrozenDict(*args, **kwargs) -> Frozen:
--> 453     return Frozen(dict(*args, **kwargs))
    454 
    455 

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in <genexpr>(.0)
     98     def get_variables(self):
     99         return FrozenDict(
--> 100             (k, self.open_store_variable(self.ds[k])) for k in self.ds.keys()
    101         )
    102 

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in open_store_variable(self, var)
     94     def open_store_variable(self, var):
     95         data = indexing.LazilyOuterIndexedArray(PydapArrayWrapper(var))
---> 96         return Variable(var.dimensions, data, _fix_attributes(var.attributes))
     97 
     98     def get_variables(self):

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    340         """
    341         self._data = as_compatible_data(data, fastpath=fastpath)
--> 342         self._dims = self._parse_dimensions(dims)
    343         self._attrs = None
    344         self._encoding = None

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/variable.py in _parse_dimensions(self, dims)
    601         dims = tuple(dims)
    602         if len(dims) != self.ndim:
--> 603             raise ValueError(
    604                 "dimensions %s must have the same length as the "
    605                 "number of data dimensions, ndim=%s" % (dims, self.ndim)

ValueError: dimensions ('height_above_ground_layer',) must have the same length as the number of data dimensions, ndim=2

KeyError: 'HTTPServer'

A combination of #37 and #26

I was trying the same logic to access the GFS archive:

url = "simplecache::https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200101/catalog.xml"
intake.open_thredds_merged(
    url,
    path=["gfs.0p25.2020010100.f*.grib2"],
    driver="netcdf",
    concat_kwargs={"dim": "time"},
    xarray_kwargs=dict(
        engine="cfgrib", backend_kwargs=dict(filter_by_keys={"cfVarName": "t2m"})
    ),
).to_dask()

This gives KeyError: 'HTTPServer' with the traceback below:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-d2ab48b9bee8> in <module>
      1 url = "simplecache::https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200101/catalog.xml"
----> 2 intake.open_thredds_merged(
      3     url,
      4     path=["gfs.0p25.2020010100.f*.grib2"],
      5     driver="netcdf",

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/source.py in _open_dataset(self)
     95 
     96         if self._ds is None:
---> 97             cat = ThreddsCatalog(self.urlpath, driver=self.driver)
     98             for i in range(len(self.path)):
     99                 part = self.path[i]

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in __init__(self, url, driver, **kwargs)
     28         self.url = url
     29         self.driver = driver
---> 30         super().__init__(**kwargs)
     31 
     32     def _load(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/catalog/base.py in __init__(self, entries, name, description, metadata, ttl, getenv, getshell, persist_mode, storage_options)
     98         self.updated = time.time()
     99         self._entries = entries if entries is not None else self._make_entries_container()
--> 100         self.force_reload()
    101 
    102     @classmethod

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/catalog/base.py in force_reload(self)
    156         """Imperative reload data now"""
    157         self.updated = time.time()
--> 158         self._load()
    159 
    160     def reload(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in _load(self)
     79 
     80         self._entries.update(
---> 81             {
     82                 ds.name: LocalCatalogEntry(
     83                     ds.name,

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in <dictcomp>(.0)
     85                     self.driver,
     86                     True,
---> 87                     {'urlpath': access_urls(ds, self), 'chunks': {}},
     88                     [],
     89                     [],

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in access_urls(ds, self)
     73             elif self.driver == 'netcdf':
     74                 driver_for_access_urls = 'HTTPServer'
---> 75             url = ds.access_urls[driver_for_access_urls]
     76             if 'fsspec_pre_url' in self.metadata.keys():
     77                 url = f'{self.metadata["fsspec_pre_url"]}{url}'

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/siphon/catalog.py in __getitem__(self, key)
    219     def __getitem__(self, key):
    220         """Return value from case-insensitive lookup of ``key``."""
--> 221         return super(CaseInsensitiveDict, self).__getitem__(CaseInsensitiveStr(key))
    222 
    223     def __setitem__(self, key, value):

Unnecessarily Large Data Request

I'm not sure if this is a bug report, a feature request, or user error. I'm trying to access a giant dataset from the NCAR RDA in a smart way (downloading only what's necessary for the calculation), but a large data request is made anyway and exceeds the server's 500 MB limit.

Here's my code:

import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar
import intake


wrf_url = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
           'PGW3D/2006/catalog.xml')
catalog_u = intake.open_thredds_merged(wrf_url, path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url, path=['*_V_2006060*'])

ds_u = catalog_u.to_dask()
ds_u['U'] = ds_u.U.chunk("auto")
ds_v = catalog_v.to_dask()
ds_v['V'] = ds_v.V.chunk("auto")
ds = xr.merge((ds_u, ds_v))


def unstagger(ds, var, coord, new_coord):
    var1 = ds[var].isel({coord: slice(None, -1)})
    var2 = ds[var].isel({coord: slice(1, None)})
    return ((var1 + var2) / 2).rename({coord: new_coord})


with ProgressBar():
    ds['U_unstaggered'] = unstagger(ds, 'U', 'west_east_stag', 'west_east')
    ds['V_unstaggered'] = unstagger(ds, 'V', 'south_north_stag', 'south_north')
    ds['speed'] = np.hypot(ds.U_unstaggered, ds.V_unstaggered)
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()

This fails with

Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 29, in <module>
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 862, in __call__
    return plot(self._da, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 293, in plot
    darray = darray.squeeze().compute()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 951, in compute
    return new.load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 925, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataset.py", line 862, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/base.py", line 571, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/array/core.py", line 116, in getter
    c = np.asarray(c)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 357, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 521, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/conventions.py", line 62, in __getitem__
    return np.asarray(self.array[key], dtype=self.dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 39, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 711, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 47, in _getitem
    result = robust_getitem(array, key, catch=ValueError)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/common.py", line 64, in robust_getitem
    return array[key]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 323, in __getitem__
    out.data = self._get_data_index(index)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 353, in _get_data_index
    return self._data[index]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/handlers/dap.py", line 170, in __getitem__
    raise_for_status(r)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/net.py", line 38, in raise_for_status
    raise HTTPError(
webob.exc.HTTPError: 403 403

because the data request is too large.

Folks at NCAR tell me the request comes across as

rda.ucar.edu/thredds/dodsC/files/g/ds612.0/PGW3D/2006/wrf3d_d01_PGW_U_20060607.nc.dods?U%5B0:1:7%5D%5B0:1:49%5D%5B0:1:1014%5D%5B0:1:1359%5D (URL-decoded: U[0:1:7][0:1:49][0:1:1014][0:1:1359])

essentially pulling an entire variable.

Is what I'm trying to do supposed to work?

I can use siphon directly w/o issue:

import numpy as np
import matplotlib.pyplot as plt
from siphon.catalog import TDSCatalog

catUrl = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
          'PGW3D/2006/catalog.xml')
catalog = TDSCatalog(catUrl)
U_file = 'wrf3d_d01_PGW_U_20060718.nc'
V_file = 'wrf3d_d01_PGW_V_20060718.nc'
ds = catalog.datasets[U_file]
dataset = ds.remote_access()
u = dataset.variables['U']
ds = catalog.datasets[V_file]
dataset = ds.remote_access()
v = dataset.variables['V']
speed = np.hypot(u[1, 10, 0:1014, 0:1359], v[1, 10, 0:1014, 0:1359])
plt.imshow(speed)
plt.show()

but in that case I don't have all the xarray niceties w/o extra work.
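
One way to keep the xarray niceties while avoiding the oversized request might be to subset down to the needed time and level before doing the arithmetic, so that the OPeNDAP backend only asks for small slices; a sketch reusing the unstagger helper above (untested against this server):

import numpy as np

# sketch: select one time and one level first, then unstagger, so only
# small slabs are requested instead of the full U and V variables
sub = ds[['U', 'V']].sel(Time='2006-06-07T18:00').isel(bottom_top=10)
sub['U_unstaggered'] = unstagger(sub, 'U', 'west_east_stag', 'west_east')
sub['V_unstaggered'] = unstagger(sub, 'V', 'south_north_stag', 'south_north')
speed = np.hypot(sub.U_unstaggered, sub.V_unstaggered)
speed.plot()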

Transfer repository to intake organization

intake-thredds has been in a working state for a while, and I was wondering if we could transfer it to the intake GitHub organization so as to increase its visibility.

@martindurant, are there any checklists or guidelines that need to be followed for a project to be accepted into the intake GitHub organization?

issue opening multiple obs4MIP

TDS Catalog: https://dpesgf03.nccs.nasa.gov/thredds/catalog/esgcet/legacy/obs4MIPs.ECMWF.ERA-interim.atmos.mon.v20160614.html

HTTPServer: https://dpesgf03.nccs.nasa.gov/thredds/fileServer/obs4MIPs/ECMWF/assimilation/obs4MIPs/reanalysis/atmos/ua/mon/grid/ECMWF/assimilation/V1.0/ua_assimilation-ECMWF_level-4_v1.0_201201-201212.nc

import intake_thredds

intake_thredds.THREDDSMergedSource(
    url='https://dpesgf03.nccs.nasa.gov/thredds/catalog/esgcet/legacy/obs4MIPs.ECMWF.ERA-interim.atmos.mon.v20160614.xml',
    path='ua_assimilation-ECMWF_level-4_v1.0_201201-201212.nc',
    driver='netcdf',
).to_dask()
...
KeyError: 'HTTPServer'

enable xarray_kwargs

likely connected to #26

Currently there is no way to specify kwargs to be passed to xarray.open_dataset. This works in intake-xarray directly:

import intake_xarray

intake_xarray.NetCDFSource(
    'simplecache::https://www.ncei.noaa.gov/thredds/fileServer/model-gefs-003/202008/20200831/gensanl-b_3_20200831_1800_000_20.grb2',
    xarray_kwargs=dict(engine='cfgrib', backend_kwargs=dict(filter_by_keys={'typeOfLevel': 'meanSea'})),
).to_dask()

but intake-thredds does not currently accept such xarray_kwargs.

add xarray_kwargs to ThreddsCatalog

I'm looking at GFS data e.g. https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.html?dataset=grib/NCEP/GFS/Global_0p25deg/Best

You can't get it through pydap:

import xarray as xr
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/Best", engine="pydap")

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 69191: ordinal not in range(128)

See pydap/pydap#196. I may create an issue in xarray to work out how to apply pydap/pydap#196 (comment) in the above call, perhaps using backend_kwargs.

Therefore, you have to get it through netcdf, e.g.:

import xarray as xr
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/Best", engine="netcdf4")
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2", engine="netcdf4")

With netcdf as the driver, the dataset must have an HTTPServer access URL, e.g. https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2/catalog.html?dataset=grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2

Using this with ThreddsCatalog will pass it through to intake_xarray.netcdf.NetCDFSource.

However, intake_xarray.netcdf.NetCDFSource can't guess the engine, so it would be nice to be able to specify it at the ThreddsCatalog stage.

cat_url = "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2/catalog.xml"

import intake
catalog = intake.open_thredds_cat(cat_url, driver="netcdf")
# nice to do intake.open_thredds_cat(cat_url, driver="netcdf", xarray_kwargs=dict(engine="netcdf4"))
source = catalog["GFS_Global_0p25deg_20210913_1800.grib2"]
ds = source().to_dask()
ValueError                                Traceback (most recent call last)
/var/folders/rf/26llfhwd68x7cftb1z3h000w0000gp/T/ipykernel_837/3050318223.py in <module>
----> 1 ds = source().to_dask()
      2 ds

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/miniconda3/envs/main/lib/python3.9/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/netcdf.py in _open_dataset(self)
     90             url = fsspec.open(self.urlpath, **self.storage_options).open()
     91 
---> 92         self._ds = _open_dataset(url, chunks=self.chunks, **kwargs)
     93 
     94     def _add_path_to_ds(self, ds):

~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    479 
    480     if engine is None:
--> 481         engine = plugins.guess_engine(filename_or_obj)
    482 
    483     backend = plugins.get_backend(engine)

~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    146         )
    147 
--> 148     raise ValueError(error_msg)
    149 
    150 

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib', 'pydap', 'rasterio', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html

circle CI fail before and after #3

I tried to play around with #3 but I didn't get it running. @andersy005 @martindurant

Then I saw that Circle CI had already been failing before #3: https://app.circleci.com/pipelines/github/NCAR/intake-thredds?branch=master

Maybe we need to use OpenDapSource from intake_xarray instead of NetCDFSource, i.e. use xr.open_dataset() on the underlying OPeNDAP links?
https://gist.github.com/aaronspring/fa96793675c94504c5158610e4f8e007
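
For illustration, a sketch of what that could look like via intake-xarray's opendap driver (the dataset URL is a hypothetical placeholder, and the exact signature should be checked against the installed intake-xarray version):

import intake

# sketch assuming intake-xarray's 'opendap' driver; the URL is a
# hypothetical placeholder
source = intake.open_opendap(
    "https://thredds.example.org/thredds/dodsC/some/dataset.nc",
    chunks={},
)
ds = source.to_dask()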

BTW: this is a really neat jupyterlab extension for THREDDS: https://github.com/eWaterCycle/jupyterlab_thredds

Can I choose `parallel=True` like in `xr.open_mfdataset`?

Hi! I stumbled across this intake driver and it looks like exactly what I need, so thank you for your work!

I have already made an example for myself where I have a bunch of file locations on a THREDDS server, which I read in with xr.open_mfdataset(), like so:

# filelocs is a list of file locations on the THREDDS server
ds = xr.open_mfdataset(filelocs, drop_variables=['siglay', 'siglev', 'Itime2'], parallel=True, compat='override',
                       combine='by_coords', data_vars='minimal', coords='minimal')

This took about 2 minutes for 127 files.

But I need this combined dataset represented in an intake catalog, which is where intake-thredds comes in. I think I have properly mapped the keywords I used in xr.open_mfdataset onto the intake-thredds API, like this:

import intake
from datetime import datetime

date = datetime(2022, 1, 1)  # example value; the date used originally was not shown

cat_url = 'https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/LEOFS/MODELS/catalog.xml'
source = intake.open_thredds_merged(
    cat_url,
    path=[date.strftime('%Y'),
          date.strftime('%m'),
          date.strftime('%d'),
          date.strftime('nos.leofs.fields.????.%Y%m%d.t12z.nc')],
    concat_kwargs={'dim': 'time',
                   'data_vars': 'minimal',
                   'coords': 'minimal',
                   'compat': 'override',
                   'combine_attrs': 'override'},
    xarray_kwargs=dict(
        drop_variables=['siglay', 'siglev', 'Itime2'],
    ),
)

But when I then try to look at the resulting lazily loaded combined Dataset with source.to_dask(), it takes forever and eventually fails with "Bad Gateway". The only difference I can see is that I don't know how to pass parallel=True, which I used when calling xr.open_mfdataset, to intake.open_thredds_merged.

Is there a way to use parallel=True? Thank you for your help!
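
As a possible workaround while intake-thredds has no parallel option, one could collect the OPeNDAP URLs with siphon and hand them straight to xr.open_mfdataset; a sketch, where the name filter and flat catalog layout are illustrative assumptions:

import xarray as xr
from siphon.catalog import TDSCatalog

# sketch: list the catalog's datasets with siphon, collect their OPeNDAP
# URLs, and let xr.open_mfdataset(parallel=True) open them concurrently;
# the name filter and flat catalog layout are assumptions
cat = TDSCatalog('https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/LEOFS/MODELS/catalog.xml')
urls = [ds.access_urls['OPENDAP']
        for name, ds in cat.datasets.items()
        if 'nos.leofs.fields' in name]
ds = xr.open_mfdataset(urls, parallel=True, compat='override',
                       combine='by_coords', data_vars='minimal',
                       coords='minimal',
                       drop_variables=['siglay', 'siglev', 'Itime2'])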
