inspectds

A CLI utility that prints the metadata of datasets in various formats (e.g. NetCDF, Zarr, GRIB, SELAFIN).

Powered by xarray.

Prerequisites

You need the following binary dependencies:

  • Python >= 3.9
  • Optionally, eccodes, which is necessary for GRIB support.

Installation

The recommended way to install inspectds is with pipx:

pipx install inspectds

Or, if you want support for GRIB, SELAFIN, or both:

pipx install 'inspectds[grib]'
pipx install 'inspectds[selafin]'
pipx install 'inspectds[all]'

If you want to install the latest development version from git, then use:

pipx install 'git+https://github.com/pmav99/inspectds.git#egg=inspectds[all]'

Usage

NetCDF

$ inspectds tests/data/example_1.nc

Dimensions: (lat: 5, level: 4, lon: 10, time: 1)
Coordinates:
  * lat      (lat) int32 20 30 40 50 60
  * lon      (lon) int32 -160 -140 -118 -96 -84 -52 -45 -35 -25 -15
  * level    (level) int32 1000 850 700 500
  * time     (time) datetime64[ns] 1996-01-01T12:00:00
Data variables:
    temp     (time, level, lat, lon) float32 ...
    rh       (time, lat, lon) float32 ...

Zarr

$ inspectds tests/data/store.zarr

Dimensions: (lat: 19, lon: 36, time: 12)
Coordinates:
  * lat      (lat) int64 -90 -80 -70 -60 -50 -40 -30 ... 30 40 50 60 70 80 90
  * lon      (lon) int64 -180 -170 -160 -150 -140 -130 ... 130 140 150 160 170
  * time     (time) datetime64[ns] 2001-01-31 2001-02-28 ... 2001-12-31
Data variables:
    aaa      (lon, lat, time) int64 ...

GRIB

$ inspectds tests/data/example.grib

Dimensions: (number: 2, time: 3, isobaricInhPa: 2, latitude: 3, longitude: 4)
Coordinates:
  * number         (number) int64 0 1
  * time           (time) datetime64[ns] 2017-01-01 ... 2017-01-02
    step           timedelta64[ns] ...
  * isobaricInhPa  (isobaricInhPa) float64 850.0 500.0
  * latitude       (latitude) float64 90.0 0.0 -90.0
  * longitude      (longitude) float64 0.0 90.0 180.0 270.0
    valid_time     (time) datetime64[ns] ...
Data variables:
    z        (number, time, isobaricInhPa, latitude, longitude) float32 ...
    t        (number, time, isobaricInhPa, latitude, longitude) float32 ...

SELAFIN

$ inspectds tests/data/iceland.slf

Dimensions: (time: 13, node: 3526)
Coordinates:
    x        (node) float32 14kB -13.99 -14.97 -15.89 ... -13.52 -16.31 -12.28
    y        (node) float32 14kB 57.38 60.19 69.79 63.11 ... 66.37 69.34 63.52
  * time     (time) datetime64[ns] 104B 2017-10-01 ... 2017-10-01T12:00:00
Data variables:
    S        (time, node) float32 183kB ...

More info:

$ inspectds --help

 Usage: inspectds [OPTIONS] PATH

╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *    path      PATH  The path to the dataset [default: None] [required]                                                                                                                                                                                     │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --dataset-type                                       [auto|netcdf|zarr|grib|selafin]  The dataset type. If 'auto', then it gets inferred from PATH [default: auto]                                                                                          │
│ --mask-and-scale         --no-mask-and-scale                                          Whether to mask and scale the dataset [default: no-mask-and-scale]                                                                                                    │
│ --dimensions             --no-dimensions                                              Whether to include 'Dimensions' in the output [default: dimensions]                                                                                                   │
│ --coordinates            --no-coordinates                                             Whether to include 'Coordinates' in the output [default: coordinates]                                                                                                 │
│ --variables              --no-variables                                               Whether to include 'Variables' in the output [default: variables]                                                                                                     │
│ --variable-attributes    --no-variable-attributes                                     Whether to include the variable attributes in the output [default: no-variable-attributes]                                                                            │
│ --global-attributes      --no-global-attributes                                       Whether to include the global attributes in the output [default: no-global-attributes]                                                                                │
│ --full                   --no-full                                                    Display full output. Overrides any other option [default: no-full]                                                                                                    │
│ --version                                                                             Display the version                                                                                                                                                   │
│ --help                                                                                Show this message and exit.                                                                                                                                           │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
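
The help text says that with `--dataset-type auto` the type is inferred from PATH. A minimal sketch of how such inference could work, assuming it is based on the file suffix; the suffix table below is an illustration built from the examples in this README, not the project's actual implementation:

```python
from pathlib import Path

# Hypothetical suffix table, derived from the example files shown above.
SUFFIX_TO_TYPE = {
    ".nc": "netcdf",
    ".zarr": "zarr",
    ".grib": "grib",
    ".slf": "selafin",
}


def infer_dataset_type(path: str) -> str:
    """Guess the dataset type from the suffix of ``path``."""
    suffix = Path(path).suffix.lower()
    try:
        return SUFFIX_TO_TYPE[suffix]
    except KeyError:
        # Unknown suffix: ask the user to be explicit instead of guessing.
        raise ValueError(
            f"Cannot infer the dataset type of {path!r}; "
            "pass --dataset-type explicitly"
        ) from None
```

When the suffix is unknown or misleading, passing `--dataset-type` explicitly avoids the inference altogether.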

Development

mamba env create --file ci/py3.11.yml --name inspectds_dev
conda activate inspectds_dev
make init
make test

inspectds's People

Contributors

pmav99, pre-commit-ci[bot]

inspectds's Issues

Inspect remote files

For example:

$ inspectds http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/ncep.reanalysis/surface/air.sig995.2020.nc

Should output something like this:

<xarray.Dataset> Size: 62MB
Dimensions:  (lat: 73, lon: 144, time: 1464)
Coordinates:
  * lat      (lat) float32 292B 90.0 87.5 85.0 82.5 ... -82.5 -85.0 -87.5 -90.0
  * lon      (lon) float32 576B 0.0 2.5 5.0 7.5 10.0 ... 350.0 352.5 355.0 357.5
  * time     (time) datetime64[ns] 12kB 2020-01-01 ... 2020-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 62MB ...
Attributes:
    Conventions:                     COARDS
    title:                           4x daily NMC reanalysis (2014)
    history:                         created 2017/12 by Hoop (netCDF2.3)
    description:                     Data is from NMC initialized reanalysis\...
    platform:                        Model
    dataset_title:                   NCEP-NCAR Reanalysis 1
    _NCProperties:                   version=2,netcdf=4.6.3,hdf5=1.10.5
    References:                      http://www.psl.noaa.gov/data/gridded/dat...
    DODS_EXTRA.Unlimited_Dimension:  time
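
One possible starting point, sketched under the assumption that the CLI currently validates PATH as a local file: detect URLs up front and skip the existence check for them. The helper names and the set of schemes are hypothetical.

```python
import os
from urllib.parse import urlparse

# Schemes treated as remote in this sketch; the set is an assumption.
REMOTE_SCHEMES = {"http", "https"}


def is_remote(path: str) -> bool:
    """Return True if ``path`` looks like an HTTP(S) URL rather than a local file."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def resolve_target(path: str) -> str:
    """Return the string to hand to the opener, validating local paths only."""
    if is_remote(path):
        # Remote URLs (e.g. OPeNDAP endpoints) cannot be checked with os.path.
        return path
    if not os.path.exists(path):
        raise FileNotFoundError(f"No such file or directory: {path}")
    return path
```

Whether xarray can then open the URL depends on the installed backend having OPeNDAP support.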

[pipx] Cannot determine package name from spec

I am trying to

pipx install 'git+https://github.com/pmav99/inspectds.git'

but I get

  ERROR: Command errored out with exit status 1:
   command: /tmp/tmp_rg9cnoj/bin/python /home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp0b5qvxbs
       cwd: /tmp/pip-req-build-_o9kujoi
  Complete output (14 lines):
  Traceback (most recent call last):
    File "/home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
      main()
    File "/home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
      return hook(metadata_directory, config_settings)
    File "/tmp/pip-build-env-dh_nw11_/overlay/lib/python3.8/site-packages/poetry/core/masonry/api.py", line 43, in prepare_metadata_for_build_wheel
      poetry = Factory().create_poetry(Path(".").resolve(), with_dev=False)
    File "/tmp/pip-build-env-dh_nw11_/overlay/lib/python3.8/site-packages/poetry/core/factory.py", line 43, in create_poetry
      raise RuntimeError("The Poetry configuration is invalid:\n" + message)
  RuntimeError: The Poetry configuration is invalid:
    - Additional properties are not allowed ('group' was unexpected)

  ----------------------------------------
WARNING: Discarding git+https://github.com/pmav99/inspectds. Command errored out with exit status 1: /tmp/tmp_rg9cnoj/bin/python /home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp0b5qvxbs Check the logs for full command output.
ERROR: Command errored out with exit status 1: /tmp/tmp_rg9cnoj/bin/python /home/nik/.local/pipx/shared/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp0b5qvxbs Check the logs for full command output.
Cannot determine package name from spec 'git+https://github.com/pmav99/inspectds'. Check package spec for errors.

I am trying to understand whether the error is caused by a local misconfiguration or whether something needs to be updated in the repository.

Allow dropping variables

Some NetCDF files are not compatible with xarray (source). We should either let the user drop the problematic variables or detect and drop them automatically.
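
For the user-driven variant, xarray's `open_dataset` already accepts a `drop_variables` argument. A minimal sketch of forwarding a list of names to it; the helper names are hypothetical, and how the names would be collected on the command line is left open:

```python
from typing import Any, Sequence


def build_open_kwargs(drop_variables: Sequence[str] = ()) -> dict:
    """Build keyword arguments for ``xarray.open_dataset``."""
    kwargs: dict = {}
    if drop_variables:
        # xarray skips these variables while parsing the file.
        kwargs["drop_variables"] = list(drop_variables)
    return kwargs


def open_dataset(path: str, drop_variables: Sequence[str] = ()) -> Any:
    """Open ``path`` with xarray, skipping the named variables."""
    import xarray as xr  # imported lazily so the helper above stays dependency-free

    return xr.open_dataset(path, **build_open_kwargs(drop_variables))
```

Automatic detection would need an extra step to work out which variables fail to decode, which this sketch does not attempt.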

Fail gracefully when inspecting a GRIB file and GRIB support is not available

We now throw an exception similar to this:

$ inspectds /home/user/Downloads/20220725.00.tropical_cyclone.grib
Traceback (most recent call last):

  File "/home/user/.local/bin/inspectds", line 8, in <module>
    sys.exit(app())
             ^^^^^

  File "/home/user/.local/pipx/venvs/inspectds/lib/python3.11/site-packages/inspectds/cli.py", line 126, in inspect_dataset
    dataset_type = infer_dataset_type(path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/user/.local/pipx/venvs/inspectds/lib/python3.11/site-packages/inspectds/cli.py", line 96, in infer_dataset_type
    dataset_type = DATASET_TYPE.GRIB
                   ^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.11/enum.py", line 786, in __getattr__
    raise AttributeError(name) from None

AttributeError: GRIB

We should suggest installing cfgrib instead.
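
A minimal sketch of such a check, assuming it runs before the dataset type is resolved. The function names and the wording of the message are hypothetical; the install command is the one from the README.

```python
import importlib.util
from typing import Optional


def grib_support_available() -> bool:
    """Return True if the cfgrib package can be found in this environment."""
    return importlib.util.find_spec("cfgrib") is not None


def require_grib_support(path: str, available: Optional[bool] = None) -> None:
    """Exit with an installation hint when GRIB support is missing."""
    if available is None:
        available = grib_support_available()
    if not available:
        # SystemExit with a string prints the message to stderr and exits with status 1.
        raise SystemExit(
            f"Cannot inspect {path!r}: GRIB support (cfgrib) is not installed.\n"
            "Install it with: pipx install 'inspectds[grib]'"
        )
```

The `available` parameter exists only to make the function easy to test without uninstalling cfgrib.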

Use Annotated

The new typer API suggests using Annotated. We should switch to that.

Use indexpath="" on GRIB files

Can't create file 'GRIB/ssrd/ssrd_CDS_era5_2018.grib.923a8.idx'
Traceback (most recent call last):
  File "/home/mavropa/.local/pipx/venvs/inspectds/lib/python3.8/site-packages/cfgrib/messages.py", line 343, in from_indexpath_or_filestream
    with compat_create_exclusive(indexpath) as new_index_file:
  File "/scratch/mavropa/conda/jeodpp/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/mavropa/.local/pipx/venvs/inspectds/lib/python3.8/site-packages/cfgrib/messages.py", line 264, in compat_create_exclusive
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
OSError: [Errno 30] Read-only file system: 'GRIB/ssrd/ssrd_CDS_era5_2018.grib.923a8.idx'
Can't read index file 'GRIB/ssrd/ssrd_CDS_era5_2018.grib.923a8.idx'
Traceback (most recent call last):
  File "/home/mavropa/.local/pipx/venvs/inspectds/lib/python3.8/site-packages/cfgrib/messages.py", line 353, in from_indexpath_or_filestream
    index_mtime = os.path.getmtime(indexpath)
  File "/scratch/mavropa/conda/jeodpp/lib/python3.8/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: 'GRIB/ssrd/ssrd_CDS_era5_2018.grib.923a8.idx'
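
The traceback shows cfgrib trying to create its `.idx` index file next to the GRIB file, which fails on a read-only file system. cfgrib accepts `indexpath=""` as a backend keyword argument to skip writing the index. A minimal sketch of passing it through xarray; the helper names and the `write_index` switch are hypothetical:

```python
from typing import Any


def grib_backend_kwargs(write_index: bool = False) -> dict:
    """Return backend kwargs for cfgrib; an empty indexpath disables the .idx file."""
    if write_index:
        # Keep cfgrib's default behaviour of writing an index next to the file.
        return {}
    return {"indexpath": ""}


def open_grib(path: str, write_index: bool = False) -> Any:
    """Open a GRIB file with xarray and cfgrib without touching the file system."""
    import xarray as xr  # imported lazily; requires xarray and cfgrib

    return xr.open_dataset(
        path,
        engine="cfgrib",
        backend_kwargs=grib_backend_kwargs(write_index),
    )
```

Skipping the index means cfgrib rescans the file on each run, which is usually acceptable for a one-off metadata inspection.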

Load coordinates by default

I.e. instead of this:

Dimensions:     (step: 73, latitude: 285, longitude: 313)
Coordinates:
    number      int64 ...
    time        datetime64[ns] ...
  * step        (step) timedelta64[ns] 00:00:00 01:00:00 ... 3 days 00:00:00
    surface     float64 ...
  * latitude    (latitude) float64 74.97 74.9 74.83 74.76 ... 55.15 55.08 55.01
  * longitude   (longitude) float64 -29.95 -29.88 -29.81 ... -8.085 -8.015
    valid_time  (step) datetime64[ns] ...

display this:

Dimensions:     (step: 73, latitude: 285, longitude: 313)
Coordinates:
    number      int64 0
    time        datetime64[ns] 2018-10-01
  * step        (step) timedelta64[ns] 00:00:00 01:00:00 ... 3 days 00:00:00
    surface     float64 0.0
  * latitude    (latitude) float64 74.97 74.9 74.83 74.76 ... 55.15 55.08 55.01
  * longitude   (longitude) float64 -29.95 -29.88 -29.81 ... -8.085 -8.015
    valid_time  (step) datetime64[ns] 2018-10-01 ... 2018-10-04
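
One way this could be done, sketched under the assumption that the dataset is opened lazily and printed afterwards: load only the coordinate variables before printing, so the data variables stay lazy. The function name is hypothetical.

```python
def load_coordinates(ds):
    """Load every coordinate of an xarray.Dataset into memory, in place.

    Data variables are left untouched, so they are still shown as ``...``.
    """
    for name in ds.coords:
        # Variable.load() reads the values and caches them on the variable.
        ds.variables[name].load()
    return ds
```

Coordinates are usually small compared to the data variables, so loading them should add little overhead for most files.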
