

datatree


Datatree is a prototype implementation of a tree-like hierarchical data structure for xarray.

Datatree was born after the xarray team recognised a need for a new hierarchical data structure that was more flexible than a single xarray.Dataset object. The initial motivation was to represent netCDF files / Zarr stores with multiple nested groups in a single in-memory object, but datatree.DataTree objects have many other uses.

Why Datatree?

You might want to use datatree for:

  • Organising many related datasets, e.g. results of the same experiment with different parameters, or simulations of the same system using different models,
  • Analysing similar data at multiple resolutions simultaneously, such as when doing a convergence study,
  • Comparing heterogeneous but related data, such as experimental and theoretical data,
  • I/O with nested data formats such as netCDF / Zarr groups.

Talk slides on Datatree from AMS-python 2023

Features

The approach used here is based on benbovy's DatasetNode example - the basic idea is that each tree node wraps up to a single xarray.Dataset. The differences are that this effort:

  • Uses a node structure inspired by anytree for the tree,
  • Implements path-like getting and setting,
  • Has functions for mapping user-supplied functions over every node in the tree,
  • Automatically dispatches some of xarray.Dataset's API over every node in the tree (such as .isel),
  • Has a bunch of tests,
  • Has a printable representation that currently looks like this:

[Image: example of the printed representation of a DataTree]

Get Started

You can create a DataTree object in 3 ways:

  1. Load from a netCDF file (or Zarr store) that has groups via open_datatree().
  2. Using the init method of DataTree, which creates an individual node. You can then specify the nodes' relationships to one another, either by setting the .parent and .children attributes, or through __getitem__/__setitem__ access, e.g. dt['path/to/node'] = DataTree().
  3. Create a tree from a dictionary of paths to datasets using DataTree.from_dict().

Development Roadmap

Datatree currently lives in a repository separate from the main xarray package. This allows the datatree developers to make changes, experiment, and improve it faster.

Eventually we plan to fully integrate datatree upstream into xarray's main codebase, at which point the github.com/xarray-contrib/datatree repository will be archived. This should not cause much disruption to code that depends on datatree - you will likely only have to change the import line (i.e. from "from datatree import DataTree" to "from xarray import DataTree").

However, until this full integration occurs, datatree's API should not be considered to have the same level of stability as xarray's.

User Feedback

We really really really want to hear your opinions on datatree! At this point in development, user feedback is critical to help us create something that will suit everyone's needs. Please raise any thoughts, issues, suggestions or bugs, no matter how small or large, on the GitHub issue tracker.
