Subseasonal Data Python Package

The subseasonal_data package provides an API for loading and manipulating the SubseasonalClimateUSA dataset developed for training and benchmarking subseasonal forecasting models. Here, subseasonal refers to climate and weather forecasts made 2-6 weeks in advance. See DATA.md for a description of dataset contents, sources, and processing.

Getting Started

  • Install the subseasonal data package: pip install subseasonal-data
  • Define the environment variable $SUBSEASONALDATA_PATH to point to your desired data directory; any accessed data files will be read from, saved to, or synced with this directory
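
As a minimal sketch, the variable can also be set from within Python. Whether the package reads the variable at import time or at call time is not stated here, so setting it before importing the package is an assumption made for safety. `configure_data_dir` is a hypothetical helper, not part of the package.

```python
import os
from pathlib import Path


def configure_data_dir(path):
    """Create the data directory if needed and export SUBSEASONALDATA_PATH.

    Hypothetical helper for illustration; not part of subseasonal_data.
    """
    data_dir = Path(path).expanduser().resolve()
    data_dir.mkdir(parents=True, exist_ok=True)
    # Only affects the current process and any child processes it starts.
    os.environ["SUBSEASONALDATA_PATH"] = str(data_dir)
    return data_dir
```

For example, calling `configure_data_dir("~/subseasonal_data_files")` at the top of a script, before `import subseasonal_data`, would point the package at that directory for the rest of the session. The directory name here is only a placeholder.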

This package is compatible with Python version 3.6+.

The underlying data is made available through Azure and is updated through a daily data collection and processing pipeline. To download the data through this package, you will need to have the Azure Storage CLI azcopy installed on your machine.
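
Because downloads depend on azcopy, it can help to confirm that the executable is on your PATH before starting. The following is a sketch using only the Python standard library. `azcopy_available` is a hypothetical helper, and the executable name `azcopy` is taken from the paragraph above.

```python
import shutil


def azcopy_available(executable="azcopy"):
    """Return True if the given executable can be found on the PATH.

    Hypothetical helper for illustration; not part of subseasonal_data.
    """
    return shutil.which(executable) is not None
```

A script might call `azcopy_available()` and print installation instructions instead of attempting a download when it returns `False`.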

Usage Examples

Detailed usage examples are provided in the Getting Started and Examples notebooks in the examples folder. We recommend starting there.

Quick examples:

  • Download all data

WARNING: This requires an estimated 175GB disk space.

from subseasonal_data import downloader

downloader.download()

  • List files in a data directory

downloader.list_subdir_files(data_subdir="combined_dataframes")

  • Download one data file

downloader.download_file(
    data_subdir="combined_dataframes",
    filename="all_data-us_precip_34w.feather",
    verbose=True,
)

  • Load ground truth data

from subseasonal_data import data_loaders

# Loads into a pandas DataFrame
df = data_loaders.get_ground_truth("us_precip")

  • Load combined dataframes

data_loaders.load_combined_data("all_data", "us_tmp2m", "34w")
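
Given the estimated 175GB needed for a full download, one option is to check free disk space first. The sketch below uses only the standard library. `has_enough_space` and `download_if_space` are hypothetical helpers, and treating 175GB as 175 × 10^9 bytes is an assumption.

```python
import shutil

# Assumption: the 175GB estimate is interpreted as decimal gigabytes.
REQUIRED_BYTES = 175 * 10**9


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


def download_if_space(download_fn, data_dir, required_bytes=REQUIRED_BYTES):
    """Call `download_fn()` only when `data_dir` has enough free space.

    `download_fn` is any zero-argument callable, so this helper does not
    depend on the subseasonal_data package itself.
    """
    if not has_enough_space(data_dir, required_bytes):
        raise RuntimeError(
            f"Less than {required_bytes} bytes free under {data_dir!r}"
        )
    return download_fn()
```

With the package installed, this could be used as `download_if_space(downloader.download, os.environ["SUBSEASONALDATA_PATH"])`, provided the data directory already exists.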

See the Examples.ipynb notebook for an example of how to retrieve historical temperature data using the subseasonal_data package.

For Developers

Installation

Install from source in editable mode using pip install -e . in this directory or pip install -e path/to/directory from another directory.

Running tests

To test your installation, run python -m unittest [test_name].py from the subseasonal_data/tests directory or python -m unittest path/to/tests/folder/[test_name].py. Example:

python -m unittest subseasonal_data/tests/test_data_loaders.py

Generating Documentation

This project's documentation is generated with Sphinx. The HTML theme is the Read the Docs Sphinx theme (the sphinx_rtd_theme package), which must also be installed.

To generate a local copy of the documentation from a clone of this repository, run python setup.py build_sphinx -W -E -a, which will build the documentation and place it under the build/sphinx/html path.

The reStructuredText files that make up the documentation are stored in the docs directory; module documentation is automatically generated by the Sphinx build process.

Data Usage and Citation

The SubseasonalClimateUSA dataset is released under a CC BY 4.0 license, and the subseasonal_data repository code is released under an MIT license.

If you make use of the subseasonal_data package or the SubseasonalClimateUSA dataset, please acknowledge the Python package, the individual data sources described in DATA.md, and the associated SubseasonalClimateUSA publication:

SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking
Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, and Lester Mackey. Advances in Neural Information Processing Systems (NeurIPS). Dec. 2023.

@InProceedings{mouatadid2023subseasonal,
  title = {SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking},
  author = {Soukayna Mouatadid and Paulo Orenstein and Genevieve Flaspohler and Miruna Oprescu and Judah Cohen and Franklyn Wang and Sean Knight and Maria Geogdzhayeva and Sam Levang and Ernest Fraenkel and Lester Mackey},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2023},
  volume = {36},
  publisher = {Curran Associates, Inc.},
  editor = {A. Oh and T. Naumann and A. Globerson and K. Saenko and M. Hardt and S. Levine},
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

subseasonal_data's Issues

Datasets no longer accessible

Receiving an azcopy Login Credentials error when attempting to download both the gt tmp2m and precip files.

azcopy copy https://subseasonalusa.blob.core.windows.net/subseasonalusa/dataframes/gt-us_tmax-1d.h5 . 
INFO: Scanning...
INFO: azcopy: A newer version 10.16.0 is available to download


failed to perform copy command due to error: Login Credentials missing. No SAS token or OAuth token is present and the resource is not public

SubX CFSv2 data access

I am trying to access the SubX CFSv2 temperature forecasts used in this project in order to do some slightly different preprocessing and am running into an error with the netCDF4 back-end of xarray.

import xarray as xa

x = xa.open_dataset(
    "https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.NCEP/.CFSv2/.forecast/.tas/dods",
    decode_times=False,
)
x['tas'][0, :, :, :, :].data

causes the following error

File src/netCDF4/_netCDF4.pyx:4406, in netCDF4._netCDF4.Variable.__getitem__()

File src/netCDF4/_netCDF4.pyx:5348, in netCDF4._netCDF4.Variable._get()

IndexError: index exceeds dimension bounds

This also occurs when simply trying to access the data as x['tas'].data

Is this how you access the data source for this project? If not, would you mind detailing how you actually download the source files?

Thanks!
