
intake-thredds's Introduction

intake-thredds

CI: GitHub Workflow Status, Code Coverage Status, pre-commit.ci status
Docs: Documentation Status
Package: Conda, PyPI
License: License

Intake interface to THREDDS data catalogs.

Installation

Intake-thredds can be installed from PyPI with pip:

python -m pip install intake-thredds

Intake-thredds is also available from conda-forge for conda installations:

conda install -c conda-forge intake-thredds

See documentation for more information.
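
A minimal usage sketch, following the same pattern that appears in the issues below (the catalog URL here is a hypothetical placeholder):

import intake

# minimal sketch: open a THREDDS catalog and lazily read its first dataset;
# the catalog URL is a hypothetical placeholder
cat_url = "https://thredds.example.org/thredds/catalog/some/catalog.xml"
catalog = intake.open_thredds_cat(cat_url)
file = list(catalog)[0]
source = catalog[file]
ds = source().to_dask()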

intake-thredds's People

Contributors

aaronspring, andersy005, dependabot[bot], martindurant, pre-commit-ci[bot], raybellwaves


intake-thredds's Issues

use open_mfdataset with concat_dim

Eventually I would like to concatenate different members or initial dates. So far we can only concatenate along a dimension that already exists in all of the datasets.

url = "simplecache::https://www.ncei.noaa.gov/thredds/catalog/model-gefs-003/202008/20200831/catalog.xml"
intake.open_thredds_merged(url, ['NCEP gens-a Grid 3 Member-Forecast 1[1-2]*-372 for 2020-08-31 00:00*'], driver='netcdf', xarray_kwargs=dict(
        engine="cfgrib",
        concat_dim='member',
        backend_kwargs=dict(
            filter_by_keys={"typeOfLevel": "heightAboveGround", "shortName": "2t"}
        ),
    )
).to_dask()
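
For reference, the requested behavior mirrors what xr.open_mfdataset does when concatenating along a new dimension; a minimal sketch with hypothetical file names:

import xarray as xr

# sketch of the requested behavior: stack files along a new 'member'
# dimension that does not exist inside the individual files;
# the file names are hypothetical placeholders
files = ["member_01.grb2", "member_02.grb2"]
ds = xr.open_mfdataset(
    files,
    engine="cfgrib",
    combine="nested",     # concatenate in the order the files are given
    concat_dim="member",  # creates the new 'member' dimension
)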

Implementing an NCSS Driver

Coming from the MetPy/siphon world and wanting to better integrate my workflows into the Pangeo ecosystem, I'd like to start using intake-thredds to work with TDS instances. While the package currently makes OPeNDAP and full netCDF file access straightforward, there is nothing yet to handle the NetCDF Subset Service (NCSS), which is a convenient feature of siphon.

Would there be support for including an NCSS intake driver in intake-thredds? If so, I'd be glad to put together a PR as I work to incorporate intake into my workflows (and understand how intake drivers work in more detail).
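
For context, this is roughly what NCSS access looks like with siphon today; a sketch in which the catalog URL, bounding box, and variable name are placeholder assumptions:

from siphon.catalog import TDSCatalog

# sketch of siphon's NCSS workflow; the catalog URL, bounding box, and
# variable name are hypothetical placeholders
cat = TDSCatalog("https://thredds.example.org/thredds/catalog/some/catalog.xml")
ds = list(cat.datasets.values())[0]
ncss = ds.subset()                 # NetCDF Subset Service access point
query = ncss.query()
query.lonlat_box(north=45, south=40, east=-90, west=-100)
query.variables("Temperature_surface")
query.accept("netcdf4")
data = ncss.get_data(query)        # server-side subset, returned as netCDF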

add Aaron's tweet example to docs

I added a minor tweak here to use cfVarName, which matches the name of the returned DataArray.

url = "simplecache::https://www.ncei.noaa.gov/thredds/catalog/model-gefs-003/202008/20200831/catalog.xml"
intake.open_thredds_merged(
    url,
    path=["NCEP gens-a Grid 3 Member-Forecast *-372 for 2020-08-31 00:00*"],
    driver="netcdf",
    concat_kwargs={"dim": "member"},
    xarray_kwargs=dict(
        engine="cfgrib",
        backend_kwargs=dict(
            filter_by_keys={"typeOfLevel": "heightAboveGround", "cfVarName": "t2m"}
        ),
    ),
).to_dask()

issues opening GFS archive

Thanks for this package. I tried it today but couldn't open a file. I'm able to open the file using xr.open_dataset, so I'm not sure whether intake-thredds expects the data to be in a certain format.

I'm reading in an archived GFS forecast (https://rda.ucar.edu/datasets/ds084.1/#!description). Note that you will need a login to access it (https://stackoverflow.com/questions/66178846/read-in-authorized-opendap-url-using-xarray).

url = "https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200201/gfs.0p25.2020020100.f000.grib2"
ds = xr.open_mfdataset([url])

vs

cat_url = "https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200201/catalog.xml"
catalog = intake.open_thredds_cat(cat_url, name="GFS-catalog")
file = list(catalog)[0]
source = catalog[file]
ds = source().to_dask()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-c6e6fbd68981> in <module>
----> 1 ds = source().to_dask()
      2 ds

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/miniconda/envs/main/lib/python3.8/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/miniconda/envs/main/lib/python3.8/site-packages/intake_xarray/opendap.py in _open_dataset(self)
     92         import xarray as xr
     93         store = self._get_store()
---> 94         self._ds = xr.open_dataset(store, chunks=self.chunks, **self._kwargs)

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    555 
    556     with close_on_error(store):
--> 557         ds = maybe_decode_store(store, chunks)
    558 
    559     # Ensure source filename always stored in dataset object (GH issue #2550)

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/api.py in maybe_decode_store(store, chunks)
    451 
    452     def maybe_decode_store(store, chunks):
--> 453         ds = conventions.decode_cf(
    454             store,
    455             mask_and_scale=mask_and_scale,

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/conventions.py in decode_cf(obj, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    637         encoding = obj.encoding
    638     elif isinstance(obj, AbstractDataStore):
--> 639         vars, attrs = obj.load()
    640         extra_coords = set()
    641         close = obj.close

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/common.py in load(self)
    111         """
    112         variables = FrozenDict(
--> 113             (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    114         )
    115         attributes = FrozenDict(self.get_attrs())

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in get_variables(self)
     97 
     98     def get_variables(self):
---> 99         return FrozenDict(
    100             (k, self.open_store_variable(self.ds[k])) for k in self.ds.keys()
    101         )

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/utils.py in FrozenDict(*args, **kwargs)
    451 
    452 def FrozenDict(*args, **kwargs) -> Frozen:
--> 453     return Frozen(dict(*args, **kwargs))
    454 
    455 

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in <genexpr>(.0)
     98     def get_variables(self):
     99         return FrozenDict(
--> 100             (k, self.open_store_variable(self.ds[k])) for k in self.ds.keys()
    101         )
    102 

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/backends/pydap_.py in open_store_variable(self, var)
     94     def open_store_variable(self, var):
     95         data = indexing.LazilyOuterIndexedArray(PydapArrayWrapper(var))
---> 96         return Variable(var.dimensions, data, _fix_attributes(var.attributes))
     97 
     98     def get_variables(self):

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    340         """
    341         self._data = as_compatible_data(data, fastpath=fastpath)
--> 342         self._dims = self._parse_dimensions(dims)
    343         self._attrs = None
    344         self._encoding = None

~/miniconda/envs/main/lib/python3.8/site-packages/xarray/core/variable.py in _parse_dimensions(self, dims)
    601         dims = tuple(dims)
    602         if len(dims) != self.ndim:
--> 603             raise ValueError(
    604                 "dimensions %s must have the same length as the "
    605                 "number of data dimensions, ndim=%s" % (dims, self.ndim)

ValueError: dimensions ('height_above_ground_layer',) must have the same length as the number of data dimensions, ndim=2

KeyError: 'HTTPServer'

A combination of #37 and #26

I was trying the same logic to access the GFS archive:

url = "simplecache::https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200101/catalog.xml"
intake.open_thredds_merged(
    url,
    path=["gfs.0p25.2020010100.f*.grib2"],
    driver="netcdf",
    concat_kwargs={"dim": "time"},
    xarray_kwargs=dict(
        engine="cfgrib", backend_kwargs=dict(filter_by_keys={"cfVarName": "t2m"})
    ),
).to_dask()

This gives KeyError: 'HTTPServer' with the traceback below:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-d2ab48b9bee8> in <module>
      1 url = "simplecache::https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200101/catalog.xml"
----> 2 intake.open_thredds_merged(
      3     url,
      4     path=["gfs.0p25.2020010100.f*.grib2"],
      5     driver="netcdf",

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/source.py in _open_dataset(self)
     95 
     96         if self._ds is None:
---> 97             cat = ThreddsCatalog(self.urlpath, driver=self.driver)
     98             for i in range(len(self.path)):
     99                 part = self.path[i]

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in __init__(self, url, driver, **kwargs)
     28         self.url = url
     29         self.driver = driver
---> 30         super().__init__(**kwargs)
     31 
     32     def _load(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/catalog/base.py in __init__(self, entries, name, description, metadata, ttl, getenv, getshell, persist_mode, storage_options)
     98         self.updated = time.time()
     99         self._entries = entries if entries is not None else self._make_entries_container()
--> 100         self.force_reload()
    101 
    102     @classmethod

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake/catalog/base.py in force_reload(self)
    156         """Imperative reload data now"""
    157         self.updated = time.time()
--> 158         self._load()
    159 
    160     def reload(self):

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in _load(self)
     79 
     80         self._entries.update(
---> 81             {
     82                 ds.name: LocalCatalogEntry(
     83                     ds.name,

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in <dictcomp>(.0)
     85                     self.driver,
     86                     True,
---> 87                     {'urlpath': access_urls(ds, self), 'chunks': {}},
     88                     [],
     89                     [],

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/intake_thredds/cat.py in access_urls(ds, self)
     73             elif self.driver == 'netcdf':
     74                 driver_for_access_urls = 'HTTPServer'
---> 75             url = ds.access_urls[driver_for_access_urls]
     76             if 'fsspec_pre_url' in self.metadata.keys():
     77                 url = f'{self.metadata["fsspec_pre_url"]}{url}'

~/opt/miniconda3/envs/test_env/lib/python3.9/site-packages/siphon/catalog.py in __getitem__(self, key)
    219     def __getitem__(self, key):
    220         """Return value from case-insensitive lookup of ``key``."""
--> 221         return super(CaseInsensitiveDict, self).__getitem__(CaseInsensitiveStr(key))
    222 
    223     def __setitem__(self, key, value):

Unnecessarily Large Data Request

I'm not sure if this is a bug report, a feature request, or user error. I'm trying to access a giant dataset from the NCAR RDA in a smart way (downloading only what's necessary for the calculation), but a large data request is made anyway and exceeds the server's 500 MB limit.

Here's my code:

import numpy as np
import xarray as xr
from dask.diagnostics import ProgressBar
import intake


wrf_url = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
           'PGW3D/2006/catalog.xml')
catalog_u = intake.open_thredds_merged(wrf_url, path=['*_U_2006060*'])
catalog_v = intake.open_thredds_merged(wrf_url, path=['*_V_2006060*'])

ds_u = catalog_u.to_dask()
ds_u['U'] = ds_u.U.chunk("auto")
ds_v = catalog_v.to_dask()
ds_v['V'] = ds_v.V.chunk("auto")
ds = xr.merge((ds_u, ds_v))


def unstagger(ds, var, coord, new_coord):
    var1 = ds[var].isel({coord: slice(None, -1)})
    var2 = ds[var].isel({coord: slice(1, None)})
    return ((var1 + var2) / 2).rename({coord: new_coord})


with ProgressBar():
    ds['U_unstaggered'] = unstagger(ds, 'U', 'west_east_stag', 'west_east')
    ds['V_unstaggered'] = unstagger(ds, 'V', 'south_north_stag', 'south_north')
    ds['speed'] = np.hypot(ds.U_unstaggered, ds.V_unstaggered)
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()

This fails with

Traceback (most recent call last):
  File "/home/decker/classes/met325/rda_plot.py", line 29, in <module>
    ds.speed.isel(bottom_top=10).sel(Time='2006-06-07T18:00').plot()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 862, in __call__
    return plot(self._da, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/plot/plot.py", line 293, in plot
    darray = darray.squeeze().compute()
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 951, in compute
    return new.load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataarray.py", line 925, in load
    ds = self._to_temp_dataset().load(**kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/dataset.py", line 862, in load
    evaluated_data = da.compute(*lazy_data.values(), **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/base.py", line 571, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/threaded.py", line 79, in get
    results = get_async(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 507, in get_async
    raise_exception(exc, tb)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 315, in reraise
    raise exc
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/dask/array/core.py", line 116, in getter
    c = np.asarray(c)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 357, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 521, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/conventions.py", line 62, in __getitem__
    return np.asarray(self.array[key], dtype=self.dtype)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 422, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 39, in __getitem__
    return indexing.explicit_indexing_adapter(
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/core/indexing.py", line 711, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 47, in _getitem
    result = robust_getitem(array, key, catch=ValueError)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/xarray/backends/common.py", line 64, in robust_getitem
    return array[key]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 323, in __getitem__
    out.data = self._get_data_index(index)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/model.py", line 353, in _get_data_index
    return self._data[index]
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/handlers/dap.py", line 170, in __getitem__
    raise_for_status(r)
  File "/home/decker/local/miniconda3/envs/met325/lib/python3.10/site-packages/pydap/net.py", line 38, in raise_for_status
    raise HTTPError(
webob.exc.HTTPError: 403 403

because the data request is too large.

Folks at NCAR tell me the request comes across as

rda.ucar.edu/thredds/dodsC/files/g/ds612.0/PGW3D/2006/wrf3d_d01_PGW_U_20060607.nc.dods?U%5B0:1:7%5D%5B0:1:49%5D%5B0:1:1014%5D%5B0:1:1359%5D (URL-decoded: U[0:1:7][0:1:49][0:1:1014][0:1:1359])

essentially pulling an entire variable.

Is what I'm trying to do supposed to work?

I can use siphon directly w/o issue:

import numpy as np
import matplotlib.pyplot as plt
from siphon.catalog import TDSCatalog

catUrl = ('https://rda.ucar.edu/thredds/catalog/files/g/ds612.0/'
          'PGW3D/2006/catalog.xml')
catalog = TDSCatalog(catUrl)
U_file = 'wrf3d_d01_PGW_U_20060718.nc'
V_file = 'wrf3d_d01_PGW_V_20060718.nc'
ds = catalog.datasets[U_file]
dataset = ds.remote_access()
u = dataset.variables['U']
ds = catalog.datasets[V_file]
dataset = ds.remote_access()
v = dataset.variables['V']
speed = np.hypot(u[1, 10, 0:1014, 0:1359], v[1, 10, 0:1014, 0:1359])
plt.imshow(speed)
plt.show()

but in that case I don't have all the xarray niceties w/o extra work.
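
One way to keep the xarray niceties while avoiding the oversized request might be to subset down to the needed time and level before doing the arithmetic, so that the OPeNDAP backend only asks for small slices; a sketch reusing the unstagger helper above (untested against this server):

import numpy as np

# sketch: select one time and one level first, then unstagger, so only
# small slabs are requested instead of the full U and V variables
sub = ds[['U', 'V']].sel(Time='2006-06-07T18:00').isel(bottom_top=10)
sub['U_unstaggered'] = unstagger(sub, 'U', 'west_east_stag', 'west_east')
sub['V_unstaggered'] = unstagger(sub, 'V', 'south_north_stag', 'south_north')
speed = np.hypot(sub.U_unstaggered, sub.V_unstaggered)
speed.plot()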

Transfer repository to intake organization

intake-thredds has been in a working state for a while, and I was wondering if we could transfer it to the intake GitHub organization so as to increase its visibility.

@martindurant, are there any checklists or guidelines that need to be followed for a project to be accepted into the intake GitHub organization?

issue opening multiple obs4MIP

TDS Catalog: https://dpesgf03.nccs.nasa.gov/thredds/catalog/esgcet/legacy/obs4MIPs.ECMWF.ERA-interim.atmos.mon.v20160614.html

HTTPServer: https://dpesgf03.nccs.nasa.gov/thredds/fileServer/obs4MIPs/ECMWF/assimilation/obs4MIPs/reanalysis/atmos/ua/mon/grid/ECMWF/assimilation/V1.0/ua_assimilation-ECMWF_level-4_v1.0_201201-201212.nc

import intake_thredds

intake_thredds.THREDDSMergedSource(
    url='https://dpesgf03.nccs.nasa.gov/thredds/catalog/esgcet/legacy/obs4MIPs.ECMWF.ERA-interim.atmos.mon.v20160614.xml',
    path='ua_assimilation-ECMWF_level-4_v1.0_201201-201212.nc',
    driver='netcdf',
).to_dask()
...
KeyError: 'HTTPServer'

enable xarray_kwargs

likely connected to #26

Currently there is no way to specify kwargs to be passed to xarray.open_dataset. This works in intake-xarray directly:

import intake_xarray

intake_xarray.NetCDFSource(
    'simplecache::https://www.ncei.noaa.gov/thredds/fileServer/model-gefs-003/202008/20200831/gensanl-b_3_20200831_1800_000_20.grb2',
    xarray_kwargs=dict(engine='cfgrib', backend_kwargs=dict(filter_by_keys={'typeOfLevel': 'meanSea'})),
).to_dask()

but intake-thredds does not currently accept such xarray_kwargs.

add xarray_kwargs to ThreddsCatalog

I'm looking at GFS data e.g. https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/catalog.html?dataset=grib/NCEP/GFS/Global_0p25deg/Best

You can't get it through pydap:

import xarray as xr
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/Best", engine="pydap")

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 69191: ordinal not in range(128)

See pydap/pydap#196. I may create an issue in xarray to work out how to apply pydap/pydap#196 (comment) in the above call, perhaps using backend_kwargs.

Therefore, you have to get it through netcdf, e.g.:

import xarray as xr
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/Best", engine="netcdf4")
ds = xr.open_dataset("https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2", engine="netcdf4")

With netcdf as the driver, the dataset must have an HTTPServer access URL, e.g. https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2/catalog.html?dataset=grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2

Using this with ThreddsCatalog will pass it through to intake_xarray.netcdf.NetCDFSource.

However, intake_xarray.netcdf.NetCDFSource can't guess the engine, so it would be nice to be able to specify it at the ThreddsCatalog stage.

cat_url = "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p25deg/GFS_Global_0p25deg_20210913_1800.grib2/catalog.xml"

import intake
catalog = intake.open_thredds_cat(cat_url, driver="netcdf")
# nice to do intake.open_thredds_cat(cat_url, driver="netcdf", xarray_kwargs=dict(engine="netcdf4"))
source = catalog["GFS_Global_0p25deg_20210913_1800.grib2"]
ds = source().to_dask()
ValueError                                Traceback (most recent call last)
/var/folders/rf/26llfhwd68x7cftb1z3h000w0000gp/T/ipykernel_837/3050318223.py in <module>
----> 1 ds = source().to_dask()
      2 ds

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in to_dask(self)
     67     def to_dask(self):
     68         """Return xarray object where variables are dask arrays"""
---> 69         return self.read_chunked()
     70 
     71     def close(self):

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in read_chunked(self)
     42     def read_chunked(self):
     43         """Return xarray object (which will have chunks)"""
---> 44         self._load_metadata()
     45         return self._ds
     46 

~/miniconda3/envs/main/lib/python3.9/site-packages/intake/source/base.py in _load_metadata(self)
    234         """load metadata only if needed"""
    235         if self._schema is None:
--> 236             self._schema = self._get_schema()
    237             self.dtype = self._schema.dtype
    238             self.shape = self._schema.shape

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/base.py in _get_schema(self)
     16 
     17         if self._ds is None:
---> 18             self._open_dataset()
     19 
     20             metadata = {

~/miniconda3/envs/main/lib/python3.9/site-packages/intake_xarray/netcdf.py in _open_dataset(self)
     90             url = fsspec.open(self.urlpath, **self.storage_options).open()
     91 
---> 92         self._ds = _open_dataset(url, chunks=self.chunks, **kwargs)
     93 
     94     def _add_path_to_ds(self, ds):

~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    479 
    480     if engine is None:
--> 481         engine = plugins.guess_engine(filename_or_obj)
    482 
    483     backend = plugins.get_backend(engine)

~/miniconda3/envs/main/lib/python3.9/site-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    146         )
    147 
--> 148     raise ValueError(error_msg)
    149 
    150 

ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'h5netcdf', 'scipy', 'cfgrib', 'pydap', 'rasterio', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
http://xarray.pydata.org/en/stable/getting-started-guide/installing.html
http://xarray.pydata.org/en/stable/user-guide/io.html

circle CI fail before and after #3

I tried to play around with #3 but I didn't get it running. @andersy005 @martindurant

Then I saw that Circle CI had already been failing before #3: https://app.circleci.com/pipelines/github/NCAR/intake-thredds?branch=master

Maybe we need to use OpenDapSource from intake_xarray instead of NetCDFSource, i.e. use xr.open_dataset() on the underlying OPeNDAP links?
https://gist.github.com/aaronspring/fa96793675c94504c5158610e4f8e007
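
For illustration, a sketch of what that could look like via intake-xarray's opendap driver (the dataset URL is a hypothetical placeholder, and the exact signature should be checked against the installed intake-xarray version):

import intake

# sketch assuming intake-xarray's 'opendap' driver; the URL is a
# hypothetical placeholder
source = intake.open_opendap(
    "https://thredds.example.org/thredds/dodsC/some/dataset.nc",
    chunks={},
)
ds = source.to_dask()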

BTW: this is a really neat jupyterlab extension for THREDDS: https://github.com/eWaterCycle/jupyterlab_thredds

Can I choose `parallel=True` like in `xr.open_mfdataset`?

Hi! I stumbled across this intake driver and it looks like exactly what I need, so thank you for your work!

I have already made an example for myself where I have a bunch of file locations on a THREDDS server, which I read in with xr.open_mfdataset(), like so:

# filelocs is a list of file locations on the THREDDS server
ds = xr.open_mfdataset(filelocs, drop_variables=['siglay', 'siglev', 'Itime2'], parallel=True, compat='override',
                       combine='by_coords', data_vars='minimal', coords='minimal')

This took about 2 minutes for 127 files.

But I need this combined dataset represented in an intake catalog, which is where intake-thredds comes in. I think I have properly mapped the keywords I used in xr.open_mfdataset onto the intake-thredds API, like this:

import intake
from datetime import datetime

date = datetime(2022, 1, 1)  # example value; the date used originally was not shown

cat_url = 'https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/LEOFS/MODELS/catalog.xml'
source = intake.open_thredds_merged(
    cat_url,
    path=[date.strftime('%Y'),
          date.strftime('%m'),
          date.strftime('%d'),
          date.strftime('nos.leofs.fields.????.%Y%m%d.t12z.nc')],
    concat_kwargs={'dim': 'time',
                   'data_vars': 'minimal',
                   'coords': 'minimal',
                   'compat': 'override',
                   'combine_attrs': 'override'},
    xarray_kwargs=dict(
        drop_variables=['siglay', 'siglev', 'Itime2'],
    ),
)

But when I then try to look at the resulting lazily loaded combined Dataset with source.to_dask(), it takes forever and eventually fails with "Bad Gateway". The only difference I can see is that I don't know how to pass parallel=True, which I used when calling xr.open_mfdataset, to intake.open_thredds_merged.

Is there a way to use parallel=True? Thank you for your help!
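
As a possible workaround while intake-thredds has no parallel option, one could collect the OPeNDAP URLs with siphon and hand them straight to xr.open_mfdataset; a sketch, where the name filter and flat catalog layout are illustrative assumptions:

import xarray as xr
from siphon.catalog import TDSCatalog

# sketch: list the catalog's datasets with siphon, collect their OPeNDAP
# URLs, and let xr.open_mfdataset(parallel=True) open them concurrently;
# the name filter and flat catalog layout are assumptions
cat = TDSCatalog('https://opendap.co-ops.nos.noaa.gov/thredds/catalog/NOAA/LEOFS/MODELS/catalog.xml')
urls = [ds.access_urls['OPENDAP']
        for name, ds in cat.datasets.items()
        if 'nos.leofs.fields' in name]
ds = xr.open_mfdataset(urls, parallel=True, compat='override',
                       combine='by_coords', data_vars='minimal',
                       coords='minimal',
                       drop_variables=['siglay', 'siglev', 'Itime2'])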
