
weather-provider-api's Introduction


Weather Provider Library and API

This API is intended to help you fetch weather data from different data sources in an efficient and uniform way. By just supplying a list of locations and a time window you can get data for a specific source immediately.

This project can currently be found at: https://github.com/alliander-opensource/Weather-Provider-API

For more information, also check out this webinar:

Webinar Weather Provider API

The project uses a number of data sources for the acquisition of weather data. The following weather data sources are currently supported by this API:

DATA SOURCE #1: KNMI Historical data per day / hour

Consists of data from 35 weather stations, covering temperature, sunshine, clouds, air pressure, wind and precipitation.

A full description of available weather variables is available for the data per day: http://projects.knmi.nl/klimatologie/daggegevens/selectie.cgi

The data per hour consists of a subset of the variables above; a full description is available at: http://projects.knmi.nl/klimatologie/uurgegevens/selectie.cgi

DATA SOURCE #2: KNMI prediction data (14 day prediction, per block of 6 hours)

Prediction data for weather stations: De Bilt, Den Helder (De Kooy), Groningen (Eelde), Leeuwarden, Maastricht (Beek), Schiphol, Twente and Vlissingen

Available weather variables are temperature, wind, precipitation, CAPE (convective available potential energy) for summer, and snow for winter.

An interactive graph can be found at:
https://www.knmi.nl/nederland-nu/weer/waarschuwingen-en-verwachtingen/weer-en-klimaatpluim

DATA SOURCE #3: KNMI prediction data (48 hour, per hour prediction)

Prediction data is updated every 6 hours (00, 06, 12 and 18 UTC+00) based on the HARMONIE AROME model of KNMI.

Geographical resolution is 0.037 degrees west-east and 0.023 degrees north-south.

A full description of available weather variables is available at: https://www.knmidata.nl/data-services/knmi-producten-overzicht/atmosfeer-modeldata/data-product-1

DATA SOURCE #4: KNMI current weather data ('Actuele waarnemingen' / current observations)

DATA SOURCE #5: CDS (Climate Data Store) hourly data from 1979 to present

ERA5 is the fifth-generation ECMWF (European Centre for Medium-Range Weather Forecasts) atmospheric reanalysis of the global climate. The ERA5 data released so far covers the period from 1979 to 2-3 months before the present. ERA5 provides worldwide data for temperature and pressure, wind (at 100 meter height), radiation and heat, clouds, evaporation and runoff, precipitation and rain, snow, soil, etc. The spatial resolution of the data set is approximately 80 km.

A full description of available weather variables is available at: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview

NOTE: The Weather Provider Library and API currently stores only a selection of the available variables in its archives.

Input parameters to use the API

General input parameters

  • coords: a nested, three-layer list representing a list of polygons; points are treated as one-point polygons
  • start: start time of the output data (not needed for prediction data)
  • end: end time of the output data (not needed for prediction data)
  • data_time: history or prediction data
  • data_source: KNMI, Climate Data Store (CDS, not available yet) or DarkSky (not available yet)
  • data_timestep: day, hour, or day_part (blocks of 6 hours)
  • weather_factors: list of weather factors; defaults to all available weather factors
  • output_unit: org (original), human_readable or SI (International System of Units); defaults to the original names and units of the data sources.
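
Combining these parameters, a request might look like the minimal sketch below, assuming a JSON-encoded POST request. The endpoint path is an assumption; check the API docs interface (http://127.0.0.1:8080/docs when running locally) for the actual routes.

    import requests

    # Hypothetical endpoint path; the parameter names follow the list above.
    request_body = {
        "coords": [[[5.18, 52.10]]],        # one one-point polygon (lon, lat)
        "start": "2020-01-01",              # not needed for prediction data
        "end": "2020-01-31",
        "data_time": "history",
        "data_source": "KNMI",
        "data_timestep": "day",
        "weather_factors": ["TG", "TN", "EV24"],
        "output_unit": "SI",
    }
    response = requests.post("http://127.0.0.1:8080/weather", json=request_body, timeout=60)
    response.raise_for_status()
    print(response.json())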

Case specific input parameters

Choosing (a group of) weather variables for historical data from KNMI

For historical data from KNMI, the value for weather_factors can be a list of the desired variables in any order, indicated by their acronyms and separated by ':', for example TG:TN:EV24.

The following acronyms are defined to indicate groups of variables:

  • WIND = DDVEC:FG:FHX:FHN:FX - wind
  • TEMP = TG:TN:TX:T10N - temperature
  • SUNR = SQ:SP:Q - sunshine duration and global radiation
  • PRCP = DR:RH:EV24 - precipitation and evaporation
  • PRES = PG:PGX - pressure at sea level
  • VICL = VVN:VVX:NG - visibility and clouds
  • MSTR = UG:UX:UN - humidity
  • ALL - all variables (default)
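
Expanding a group acronym is then just a lookup before the request is sent. The sketch below mirrors the groups listed above; it illustrates the expansion only and is not the API's internal implementation.

    # Group acronyms and their member factors, as listed above.
    FACTOR_GROUPS = {
        "WIND": "DDVEC:FG:FHX:FHN:FX",
        "TEMP": "TG:TN:TX:T10N",
        "SUNR": "SQ:SP:Q",
        "PRCP": "DR:RH:EV24",
        "PRES": "PG:PGX",
        "VICL": "VVN:VVX:NG",
        "MSTR": "UG:UX:UN",
    }

    def expand_factors(factors: str) -> str:
        """Replace any group acronyms in a ':'-separated factor list."""
        parts = [part.strip() for part in factors.split(":")]
        return ":".join(FACTOR_GROUPS.get(part, part) for part in parts)

    print(expand_factors("WIND:TG"))  # DDVEC:FG:FHX:FHN:FX:TG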

Choosing the name and unit for output

The output data from the four KNMI data sources may have different names and units for the same weather variable, which may not be easy to use in analytics.

This API provides an option to choose a standard name/unit for the most commonly used weather variables; see the table below. The value of output_unit can be set to:

  • org: keep the original names and units
  • SI: convert the variable names to the SI/human-readable names and the units to SI units
  • human: convert the variable names to the SI/human-readable names and the units to human-readable units.
| Hist day name | Hist day unit | Hist hour name | Hist hour unit | Forecast 14d name | Forecast 14d unit | Forecast 48h name | Forecast 48h unit | SI/Human readable name | SI unit | Human readable unit |
|---|---|---|---|---|---|---|---|---|---|---|
| FG | 0.1 m/s | FH | 0.1 m/s | wind_speed | km/uur | | | wind_speed | m/s | m/s |
| FHX | 0.1 m/s | FX | 0.1 m/s | | | | | wind_speed_max | m/s | m/s |
| TG | 0.1 celsius | T | 0.1 celsius | temperature | celsius | 2T | K | temperature | K | celsius |
| Q | J/cm2 | Q | J/cm2 | | | GRAD | J m**-2 | global_radiation | J/m2 | J/m2 |
| RH | 0.1 mm | RH | 0.1 mm | precipitation | mm | | | precipitation | m | mm |
| PG | 0.1 hPa | P | 0.1 hPa | | | LSP | Pa | air_pressure | Pa | Pa |
| NG | [1,2…9] | N | [1,2…9] | | | | | cloud_cover | [1,2…9] | [1,2…9] |
| UG | % | U | % | | | | | humidity | % | % |

The CDS data uses only SI units, and as such there is no distinction between org and SI.
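
The conversions implied by the table are simple scale/offset operations. The sketch below illustrates a few org-to-SI conversions for the KNMI daily data; the factors follow from the listed units, but the API's actual conversion code may differ.

    # org -> SI conversions for some KNMI daily factors, per the table above.
    def knmi_day_to_si(factor: str, value: float) -> float:
        if factor == "TG":            # 0.1 celsius -> K
            return value * 0.1 + 273.15
        if factor in ("FG", "FHX"):   # 0.1 m/s -> m/s
            return value * 0.1
        if factor == "RH":            # 0.1 mm -> m
            return value * 0.1 / 1000.0
        if factor == "PG":            # 0.1 hPa -> Pa
            return value * 0.1 * 100.0
        return value

    print(knmi_day_to_si("TG", 104))  # 10.4 degrees celsius -> 283.55 K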

Getting started - using as a package/project

Prerequisites

This package requires Python 3.8 or later. See requirements.txt for a list of dependencies. The package works under at least Linux and Windows environments (other operating systems have not been tested).

Installing

  1. Clone the repo
  2. Navigate to the root folder
  3. Install the dependencies using conda and/or pip, depending on your environment:
conda install --file requirements.txt
pip install -r requirements.txt
  4. Ready for use!

Using as a full project

The full API can be run by executing main.py. With the exception of ERA5 Single Levels and Harmonie Arome data, every data source can then be accessed using either the created endpoints or the API docs interface at the running location (127.0.0.1:8080 when running locally).

Specific calls can now be run by executing the proper command. For examples, check out the \bin folder.

Using as a wheel

Install the wheel into your project environment and import the required classes. Usually this will be either a specific Weather Model or the Weather Controller.
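
For example, using the Weather Controller might look like the sketch below. The module path and method call are hypothetical placeholders; check the contents of the installed wheel for the actual names.

    # Hypothetical import; the actual module layout may differ per version.
    from weather_provider_api.routers.weather.controller import WeatherController

    controller = WeatherController()
    # A controller routes a request to the right source and model, e.g.:
    # dataset = controller.get_weather(source_id="knmi", model_id="daggegevens", ...)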

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Contact

To contact the project owners directly please e-mail us at [email protected]

Authors

This project was initially created by:

  • Tongyou Gu - Original API development
  • Jeroen van de Logt - Functions in utilities
  • Bas Niesink - Implementation weather REST API
  • Raoul Linnenbank - Active API Development, Geo positioning, CDS ERA5, caching, remodeling, Harmonie Arome and optimisation

Currently, this project is governed in an open-source fashion; this is documented in PROJECT_GOVERNANCE.

License

This project is licensed under the Mozilla Public License, version 2.0 - see LICENSE for details

Licenses for third-party code

This project includes third-party code, which is licensed under its own respective open-source licenses. SPDX-License-Identifier headers are used to show which license is applicable. The corresponding license files can be found in the LICENSES directory.

Acknowledgments

Thanks to the teams Inzicht & Analytics and Strategie & Innovatie for making this project possible.

A big thanks as well to Alliander for being the main sponsor for this open source project.

And of course a big thanks to the guys of IT New Business & R&D for providing such an easy-to-use Python environment in the cloud.


weather-provider-api's Issues

WPLA 2.x\\CHANGE REQUEST - Save the Harmonie Arome weather forecast data for three years instead of one year

For a subsidy project (Alliander together with VSL, universities and other international partners) we would like to use the historical Harmonie weather predictions. In this project, the uncertainty of load forecasts (based on weather forecasts) will be studied. In order to gather enough data for the analysis, we need the historical forecasts for a longer time, e.g. 3 years.

Request: the current version saves the historical data for the past year. Can this be adjusted to the past 3 years?

WPLA Dev 3.0 \\ Reconstruction of the original version 2.x models for version 3.0

The following features are included:

  • The new models should support v2.x identical output via the harmonizing utilities.
  • The new models should be based on the optimal source for the data they entail.
    (If, for instance, the data is available on both the KNMI Data Platform (KDP) and the site itself, the KDP should be used. If the data is available in multiple formats, the format should be used that is the least likely to cause problems when translating into harmonized NetCDF4.)
    For the known formats (excluding GRIB) this order is, from most to least preferred:
    NetCDF4, NetCDF3, json, csv, txt
    Formats not mentioned should be judged on the likely amount of effort required to convert them to the harmonized NetCDF4 file format.
    Note: GRIB is not rated, as it is not so much a file format as a content format. A GRIB file's contents could be harder to parse than translating data from multiple website pages, or almost as easy as loading a NetCDF4 file directly. A secondary reason is that parsing GRIB files requires pygrib or cfgrib, and both have issues operating properly within a Windows environment.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.

WPLA 3.x \\ Feature: The addition of a new model (and source) that contains the weather / forecast for the upcoming years

A request has been made to add a new model (and, if needed, a source) to the Weather Provider Library and API.

This would be a novel addition to the current set of sources and models, in the sense that this model would request data for future dates specifically, dealing not with regular forecasts but rather with extensive predictions based on past weather combined with climate change estimates.

This new model would be based on the "Future climate projections of surface weather variables, wind power, and solar power capacity factors across North-West Europe" dataset, also used in the CLEARHEADS project (a project focused on predicting the effects of climate change on future weather).

--- Original Feature Request as posted by Tongyou Gu---

This question originally comes from Dominic Hauck at Alliander. His team has found a strong correlation between solar data and voltage complaints. In order to make a prognosis for voltage complaints in the coming years, the team needs weather/climate forecasts. For now they have a workaround (calculating the prognosis based on historical data). This new data request would be interesting for later.

A possible data source (thank you @raoul):
As part of the CLEARHEADS project, which aims to make predictive statements about the best placement of wind turbines and solar panels in view of climate change, a predictive weather dataset has (as I understand it) been built for the whole of (north-western) Europe.

Information about the project in question, in the form of a webinar:

CLEARHEADS Data Showcase Webinar - Hannah Bloomfield - YouTube

The dataset for the upcoming weather:

Future climate projections of surface weather variables, wind power, and solar power capacity factors across North-West Europe - University of Reading Research Data Archive

--- END: Original Feature Request ---

WPLA Dev 3.0 \\ WeatherModel, WeatherSource and WeatherController development

The following features are included:

  • Base functionality for the three classes is operational (a minimal sketch of the intended hierarchy follows this list).
    This includes required-attribute detection and the related error handling, abstract method definitions and documentation, hookup of the model to the source and of the source to the controller, and placeholders for advanced functionality that will be included later. External models and sources, as well as the configuration of those sources through manual files, should also be possible. Output is limited to only the harmonized NetCDF4 response for the get_weather() function at this stage.
  • The full harmonization system.
  • A dummy WeatherModelBase class and a dummy WeatherSourceBase class are constructed, both as a tutorial for users and to test the base functionality of the three classes.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.
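
The sketch below illustrates one way the three classes could hook together, with required-attribute detection and an abstract get_weather() returning a harmonized dataset. Attribute names and signatures are illustrative assumptions, not the project's final API.

    from abc import ABC, abstractmethod
    from typing import Dict

    import xarray as xr

    class WeatherModelBase(ABC):
        """A single weather model; subclasses must define identifying attributes."""

        def __init_subclass__(cls, **kwargs):
            super().__init_subclass__(**kwargs)
            # Required-attribute detection with related error handling.
            for required in ("model_id", "model_name"):
                if not hasattr(cls, required):
                    raise TypeError(f"{cls.__name__} is missing required attribute '{required}'")

        @abstractmethod
        def get_weather(self, coords, begin, end) -> xr.Dataset:
            """Return the requested data as a harmonized (NetCDF4-ready) Dataset."""

    class WeatherSourceBase:
        """Hooks one or more models up to a single data source."""

        def __init__(self) -> None:
            self.models: Dict[str, WeatherModelBase] = {}

    class WeatherController:
        """Hooks up the sources and routes get_weather() requests to them."""

        def __init__(self) -> None:
            self.sources: Dict[str, WeatherSourceBase] = {}

        def get_weather(self, source_id: str, model_id: str, **request) -> xr.Dataset:
            return self.sources[source_id].models[model_id].get_weather(**request)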

WPLA 2.x\\ BUG - Precipitation always zero with KNMI "Uurgegevens"

The Bug:
When using any other format than NetCDF (3 or 4), the returned value for "precipitation" (internal name "RH") is always 0, regardless of the actual amount of rain.

To Reproduce

  1. Request the precipitation for a period with heavy precipitation in any text-based format.
  2. Compare with the NetCDF(4) output for the same period.

Expected behavior
The value should be indicative of the precipitation.

Additional context
From the preliminary investigation the issue already seems clear:

The precipitation has a default harmonization level of meters per hour.

This was likely still one of the initial settings, as even the heaviest rainfall in the Netherlands hasn't surpassed 94 mm per hour yet. Because even 94 mm is only 0.094 m, and the average rain in an hour amounts to about 1 to 2 mm (i.e. 0.001 m to 0.002 m), the cause is clear.

In text-based files, values are automatically rounded to 2 decimals, meaning most values end up as 0 or -1.
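
A quick check makes the rounding problem concrete (a minimal illustration of the cause described above, not the project's code):

    # Typical hourly rain of 1.5 mm is 0.0015 m after harmonization to metres.
    rain_in_m = 1.5 / 1000
    print(round(rain_in_m, 2))  # 0.0 -> reported as "always zero" in text formats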

Screenshots
[screenshot: the returned precipitation values next to the precipitation graph]

Please note how the value remains 0, even though the graph itself increases and decreases with the quantity.

WPLA Dev 3.0 \\ Set up GitHub Actions

The following features are included:

  • GitHub Actions should cover at least the following:
    -- automated testing of new pull requests on the main branch
    -- triggered compiling of a new image for a new version release
    -- if needed for advanced functionality, interface setup between SonarCloud and GitHub
  • All of these actions should be tested under a number of extreme situations.

[FEATURE] Activate Dependabot

Is your feature request related to a problem? Please describe.
Make it easier to maintain your dependencies

Describe the solution you'd like
Dependabot takes the effort out of maintaining your dependencies. You can use it to ensure that your repository automatically keeps up with the latest releases of the packages and applications it depends on.

You enable Dependabot version updates by checking a configuration file into your repository. The configuration file specifies the location of the manifest, or of other package definition files, stored in your repository. Dependabot uses this information to check for outdated packages and applications. Dependabot determines if there is a new version of a dependency by looking at the semantic versioning (semver) of the dependency to decide whether it should update to that version.
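
A minimal configuration of the kind described above could look like this for the project's pip dependencies (the directory and schedule are assumptions to adjust as needed):

    # .github/dependabot.yml
    version: 2
    updates:
      - package-ecosystem: "pip"
        directory: "/"
        schedule:
          interval: "weekly"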

Describe alternatives you've considered
The alternative is to do it manually.

Additional context
For more information see: https://docs.github.com/en/code-security/supply-chain-security/keeping-your-dependencies-updated-automatically/about-dependabot-version-updates

GitHub also supports Dependabot, see: https://github.com/alliander-opensource/power-grid-model/network/updates

WPLA Dev 3.0 \\ Setup output file formatting toolset

The following features are included:

  • Each file format has both a v2.x output mode and a v3.0 output mode.
    The 2.x output does not contain meta-data and flattens the data regardless of whether the output format supports multidimensional datasets.
    The 3.0 output does contain meta-data and retains its original dimensions if the output format can handle it.
  • At least the following formats for output should be supported for v2.x:
    NetCDF4, NetCDF3, json_data, json, csv
  • At least the following formats for output should be supported for v3.0:
    NetCDF4, NetCDF3, json_data, json, grib, excel, csv
    'excel' returns the 'xlsx' open format. 'grib' returns a reformat of the original NetCDF4 directly into GRIB using a standard transformation. (extended grib tools are not required)
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.

WPLA 2.x\\ Sphinx Autodocumentation actions

Ideally, we'd want up-to-date documentation on everything the Weather Provider API v2.x entails, ready on a GitHub Page.

To achieve this, we'd set up a GitHub Action to build and distribute Sphinx documentation upon version updates.

The following features are included:

  • A Github Action that builds the Sphinx documentation from scratch and places the results at the appropriate Github Page.

WPLA Dev 3.0 \\ Setup API request monitoring

While some metrics should already be available via Prometheus, and a decent log-interpreting system could retrieve almost any required information by scraping the logs, data on issues with the handling of requests, the parsing of data and the storing of files should be more readily available.

By either generating a secure custom interface or (if that proves practical enough) using Prometheus, the goal is a dashboard that allows for easy, enhanced monitoring of the API while it is running. Dashboard configuration settings and methods should be included in the project.

WPLA 2.x\\ MODEL ISSUE - KNMI 'waarnemingen' model no longer appears to work

The model 'waarnemingen' for the source 'knmi' no longer appears to work as intended.

At the time of writing this issue, a call usually results in a 412 HTTP status code and only the output:
{"detail":"wind_direction"}

The most likely cause of this is a change in the way the website is built.

Planned course of action:
Locally run model requests to find out what the model initially gathers and how this results in the new output.

This issue was first noticed by Frank Bakker (Alliander)

WPLA Dev 3.0 \\ Set up structured harmonization tool set

Development runs simultaneously with the development of the core classes and the new versions of the old models, to allow for adjustments on the go. After completion of the new versions, though, the current code base for this tool set should be re-evaluated and (if needed) rebuilt for more efficiency.

The following features are included:

  • Standard conversion tools for use within Xarray Datasets, using pint_xarray for easy conversion between unit systems (see the sketch after this list).
  • Time Formatter for easy swapping between the source data's original timezone and UTC.
  • Name Harmonization Element selector for building a standardized (or semi-standardized if not otherwise possible) name for a field.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.
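
With pint-xarray, such a standard conversion could look like the sketch below; the variable name and units are illustrative, and the exact accessor usage may differ from what the tool set ends up doing.

    import xarray as xr
    import pint_xarray  # noqa: F401 -- registers the .pint accessor

    ds = xr.Dataset({"temperature": ("time", [10.4, 11.0])})
    ds["temperature"].attrs["units"] = "degC"

    quantified = ds.pint.quantify()                  # attach pint units from the attrs
    converted = quantified.pint.to(temperature="K")  # convert to SI
    print(converted.pint.dequantify())               # back to plain arrays with unit attrs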

Note for the installation of ECCODES under a Windows OS

Currently the entire installation of the Weather Provider API can be done by installing requirements.txt, and for Linux distributions this means you can start using the API immediately.

However, for Windows installations there is still a small hurdle to overcome:
One of the dependencies needed for GRIB support (used by, for instance, the Harmonie Arome WeatherModel) is ECCODES. This dependency contains some tools and definitions needed to handle GRIB files properly. Unfortunately, Windows isn't officially supported, and though a fully tested roll-out exists for the Windows Conda distribution, the PyPI roll-out is a single version tested only on most Linux distributions and macOS.

This means the following:

  • If your Windows environment has access to Conda, you can get everything to work by installing the python-eccodes package:
    conda install -c conda-forge python-eccodes
  • If your Windows environment does not have access to Conda, you'll have to either manually get and compile the package from GitHub:
    https://github.com/ecmwf/eccodes-python/
    or download the appropriate file from:
    https://anaconda.org/conda-forge/eccodes/files
    and install it manually.
    (The pip installation should already have placed the folders to be replaced where they're expected by the not-quite-working version.)

Using the latest version supported by Conda should normally work without any problems.

WPLA 2.x \\ BUG - Couldn't select data from ERA5SL repositories.

The bug itself

One of the previous fixes, which also brought version increases, while not breaking any tests or causing any errors, appears to have introduced a bug:

  • When selecting data from the ERA5 repositories, no data could be found, even when directly addressing locations and times known to have been added to the Repositories themselves.

This bug occurs:

  • Always, regardless of OS or deployment method, when selecting data for the "ERA5 SL" or "ERA5 Land" models.

The cause:

The cause was a complicated one...

When constructing the final Xarray Dataset that becomes the source for the output, a field "coord" is built by combining the "lat" and "lon" index fields. This field (in short) allows the NetCDF4 and NetCDF3 output formats to be used with maps. While the creation of the field itself still works as intended, the creation meant a full rebuild of the existing multi-indexed index.

During this process some form of floating point issue occurred, causing all of the "lon" values (and somehow only the "lon" values) to be replaced by values that were slightly off from their 2-decimal counterparts. This meant that a 3.2 value might suddenly be represented by a relative 3.200001 value, for example when compared directly to a standard float value of 3.2.

This, of course, caused any comparison to the grid values to be off ever so slightly, resulting in no data being found for that value and an error indicating "no data found"...

The solution:

As the actual value is still correct, if a different representation of it, the only issue is the comparison of these values to the grid-rounded values of weather data requests. Because of that, we'll only address the comparison and let the actual output formats do the rest of the rounding.
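
The mismatch and the comparison-side fix can be demonstrated in a few lines (an illustration of the described approach, not the project's code):

    import numpy as np

    grid_lon = 3.2000000000000046   # index value after the multi-index rebuild
    requested = 3.2                 # grid-rounded request value

    print(grid_lon == requested)               # False -> "no data found"
    print(np.isclose(grid_lon, requested))     # True: tolerance-based comparison
    print(np.round(grid_lon, 2) == requested)  # True: round before comparing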

WPLA Dev 3.0\\Sphinx Autodocumentation actions

Ideally, we'd want up-to-date documentation on everything the Weather Provider API v3.0 entails, ready on a GitHub Page.

To achieve this, we'd set up a GitHub Action to build and distribute Sphinx documentation upon version updates.

The following features are included:

  • A Github Action that builds the Sphinx documentation from scratch and places the results at the appropriate Github Page.

WPLA Dev 3.0 \\ RESEARCH 03 - Solve issues with PyGRIB when running on Windows

The goal of WPLA is to be as OS-independent as possible. All models, tools, API functionality, etc. should be available for any standard OS
(most Linux distributions, the currently supported Windows versions, and macOS).

One thorn in this project's side on this has been the parsing of GRIB files.
For Python, the packages cfgrib and pygrib exist for handling GRIB files, but both require the ECCODES toolset for some less standardized access methods for the more complex GRIB files.

Unfortunately, ECCODES isn't doing too well on Windows installations...

Project Goal:
Find a way to keep OS dependency at zero, by either finding a reliable way to use the ECCODES toolset from within any Windows environment without forcing people to do things like install Conda, or by removing the ECCODES toolset from the equation entirely (though this would likely mean moving away from cfgrib and pygrib...).

Note: One direction to start with is the observation that ECCODES for Windows mostly seems to no longer work with Python 3.7 nowadays. If this proves to be the case, the only people who would still have problems would be those running the API from Windows without Conda (manually installing ECCODES on Windows using Conda still allows WPLA to work fully, even if it is a bit of work).

Pygrib error in requirements.txt

Cloned the repo and ran requirements.txt on a Windows machine with Python 3.9.2 in a venv.
Got an error on the pygrib library when building the wheel. Is there an option to get this working?

Machine I'm using:

  • Windows 10
  • Python 3.9.2
  • PyCharm
  • PowerShell
  • venv environment

Error message I'm getting:

Building wheels for collected packages: pygrib
  Building wheel for pygrib (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: 'c:\users\user\appdata\local\programs\python\python39\python.exe' 'c:\users\user\appdata\local\programs\python\python39\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\user\AppData\Local\Temp\tmp6xs9yxku'
       cwd: C:\Users\user\AppData\Local\Temp\pip-install-zj4ok7qn\pygrib_ba5bdd98d0974d72b6da402f935163e7
  Complete output (21 lines):
  eccodes not found, build may fail...
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.9
  creating build\lib.win-amd64-3.9\pygrib
  copying pygrib\__init__.py -> build\lib.win-amd64-3.9\pygrib
  running build_ext
  cythoning pygrib/_pygrib.pyx to pygrib\_pygrib.c
  C:\Users\user\AppData\Local\Temp\pip-build-env-9hp1mogx\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-zj4ok7qn\pygrib_ba5bdd98d0974d72b6da402f935163e7\pygrib\_pygrib.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  building 'pygrib._pygrib' extension
  creating build\temp.win-amd64-3.9
  creating build\temp.win-amd64-3.9\Release
  creating build\temp.win-amd64-3.9\Release\pygrib
  C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -Ic:\users\user\appdata\local\programs\python\python39\include -Ic:\users\user\appdata\local\programs\python\python39\include -IC:\Users\user\AppData\Local\Temp\pip-build-env-9hp1mogx\overlay\Lib\site-packages\numpy\core\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include -IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt /Tcpygrib\_pygrib.c /Fobuild\temp.win-amd64-3.9\Release\pygrib\_pygrib.obj
  _pygrib.c
  C:\Users\user\AppData\Local\Temp\pip-build-env-9hp1mogx\overlay\Lib\site-packages\numpy\core\include\numpy\npy_1_7_deprecated_api.h(14) : Warning Msg: Using deprecated NumPy API, disable it with #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
  pygrib\_pygrib.c(620): fatal error C1083: Cannot open include file: 'grib_api.h': No such file or directory
  error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.28.29910\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
  ----------------------------------------
  ERROR: Failed building wheel for pygrib
Failed to build pygrib
ERROR: Could not build wheels for pygrib which use PEP 517 and cannot be installed directly

WPLA 2.x\\ BUG - ERA5 Land model downloading limit

When downloading data for the ERA5 Land repository, the API hits a 1000-item download limit.
Because a single factor for a full month already amounts to 744 items, it is impossible to download this dataset as a single file per month.

Intended fix:
The safest way to fix this is by downloading on a per-factor basis. As no single factor can overstep the 1000-item count on its own, this works regardless of the number of factors downloaded, whereas downloading on a "per ... days" basis could theoretically overstep the limit when the number of supported factors increases in the future.

As such, the intended solution is (sketched below):

  • Download months per factor.
  • Merge the factors together
  • Store the month
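
A sketch of that per-factor flow with the CDS API and xarray; the dataset name is real, but the chosen factors, date and file names are illustrative only:

    import cdsapi
    import xarray as xr

    FACTORS = ["2m_temperature", "total_precipitation"]  # illustrative factors

    client = cdsapi.Client()
    per_factor_files = []
    for factor in FACTORS:
        target = f"era5_land_2021_01_{factor}.nc"
        client.retrieve(
            "reanalysis-era5-land",
            {
                "variable": factor,
                "year": "2021",
                "month": "01",
                "day": [f"{day:02d}" for day in range(1, 32)],
                "time": [f"{hour:02d}:00" for hour in range(24)],
                "format": "netcdf",
            },
            target,
        )  # one request per factor: 31 days * 24 hours = 744 items < 1000
        per_factor_files.append(target)

    # Merge the factors together and store the month as a single file.
    month = xr.merge(xr.open_dataset(path) for path in per_factor_files)
    month.to_netcdf("era5_land_2021_01.nc")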

WPLA 2.x\\ SUPPORT REQUEST - Remove the _request_weather_factors() function

Removing the _request_weather_factors() function

The _request_weather_factors() function was added to each and every model during version 1.x production to help with the interpretation between the original and the harmonized factor names. The method of implementation, however, was far from optimal, and it became a weirdly duplicated function that parsed factors for every model setting even though only one of those models would be used per call.

It unfortunately never got fixed or cleaned up: every model is very different, and a small mistake in dealing with this function easily results in incorrect responses for one or two models, due to the lack of transparency about what is actually used where.

However

To properly leave version 2.x in a stable and easily manageable state, part of the EOL cycle needs to remove this problematic function in favor of either a model-independent function or multiple model-specific functions that have no relation to the other models anymore.

WPLA 2.x\\ BUG - Memory usage issues

Some users of (at least) the Gunicorn image encounter high memory usage when dealing with the AROME updating process and while requesting AROME data. This results in the update process requiring up to, and at times over, twelve gigabytes of memory, and the request parsing even running out of memory when only a few megabytes are requested.

Steps to reproduce the behavior:

  1. Use Gunicorn image based on version 2.47.031.
  2. Adjust lock-file to match that of the reporting party to duplicate exact container settings.
  3. Verify occurrence of memory usage problems

Expected behavior
When updating AROME data, the maximum memory usage should barely go above the size of the files being processed (400-500 MB for now). The same goes for regular requests, which shouldn't generate any kind of issue under several gigabytes of data processing.

Additional context
This issue is a strange one, because even in the image we install via Poetry with a .lock file in the folder. That should normally mean there aren't many possible differences between installations of the image, which makes one wonder even more about the reason for such huge differences in memory usage.

WPLA 2.x\\ BUG - Arome responses too large to get to responses

When asking the API to gather Arome data, even the smallest requests have often become too large to process and properly return a response before some form of timeout happens.

This renders Arome requests next to impossible to use in practice.

WPLA Dev 3.1 \\ Job queue system development

The following features are included:

  • The system can parse one or multiple get_weather() requests from one or multiple WeatherModels / WeatherSources as one or multiple request-queues, depending on the estimated required processing bandwidth (memory, download speeds, available storage space, etc.).
  • The system organizes the requests into a sequential list. To prevent flooding, the processing order is primarily based on the request order, but the system will interlace other request sources every other request, as follows (see the sketch after this list):
    Request-list A is split into three request-queues: A1, A2 and A3
    A1 > A2 > A3
    Request-list B of three request-queues is added to the list: B1, B2 and B3
    'A1 > B1 > A2 > A3 > B2 > B3' # The interlace skips the A2-A3 sequence for B
    Request-list C, consisting of a single request-queue, is added: C1
    'A1 > B1 > A2 > C1 > A3 > B2 > B3'
    Note: Had C consisted of two queues, this would have been: 'A1 > B1 > A2 > C1 > A3 > B2 > C2 > B3'

    In short: the system looks for two sequential items in the list from the same source. Every uneven match is used to place an item of the latest request.
  • The system has extended error handling for partial failures, issues with sequencing, unexpected issues with storage and invalid queue components.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.
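
The interlacing rule can be sketched as follows: every uneven (1st, 3rd, ...) position where two consecutive queue items share a source becomes an insertion point, and remaining items are appended. This is an illustration of the description above; the exact placement of leftover items (e.g. C2 in the note) may differ in the final system.

    from typing import List, Tuple

    def interlace(queue: List[Tuple[str, int]], new_items: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
        # Positions where two consecutive items come from the same source.
        slots = [i + 1 for i in range(len(queue) - 1) if queue[i][0] == queue[i + 1][0]]
        # Keep every uneven (1st, 3rd, ...) match, at most one per new item.
        slots = slots[::2][: len(new_items)]
        result = list(queue)
        for offset, (slot, item) in enumerate(zip(slots, new_items)):
            result.insert(slot + offset, item)
        result.extend(new_items[len(slots):])  # leftovers go to the end
        return result

    queue = [("A", 1), ("A", 2), ("A", 3)]
    queue = interlace(queue, [("B", 1), ("B", 2), ("B", 3)])
    print(queue)  # A1 > B1 > A2 > A3 > B2 > B3
    queue = interlace(queue, [("C", 1)])
    print(queue)  # A1 > B1 > A2 > C1 > A3 > B2 > B3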

WPLA Dev 3.0 \\ RESEARCH 01 - Investigate the use of the 'Bottleneck' package.

The 'Bottleneck' package contains a number of fast NumPy array functions. Some Xarray users swear by this package in certain circumstances to speed up the handling of NaN values in an Xarray Dataset. Supposedly this can even help accelerate writing these datasets to file, or loading them from one.

The goal of this research is to find out how this would work, and if this would help with the parsing of repository files or large downloaded NetCDF4 (and possibly even GRIB) files.

Project Goal:
A benchmark has been set up to try and speed up the loading, altering and saving of a large NetCDF4 file (concatenated CDS ERA5 SL repository files?), and comparative results (to the regular transactions) are available.

WPLA Dev 3.0 \\ RESEARCH 02 - Find a reliable method of maintaining the list of KNMI weather stations

Weather stations come and go. Probably not that often, but when they do, the KNMI models should ideally pick up on that.

Project Goal:
Find a method of getting an up-to-date list of all of the weather stations (ideally including locations and information on their availability for certain datasets). Contact with the KNMI is required for at least validating the data, but a conversation from the start is preferred, as they know the full extent of where to find information such as this.

WPLA Dev 3.0 \\ API skeleton development

The following features are included:

  • The full API skeleton functionality.
    This means that the API can already be run as intended, even if no Weather classes exist as of yet. The API also already has version validation, basic error handling, Prometheus monitoring and logging.
  • The basic project file layout as it is to be used for the final project.
    This means that most of the package and file folders are already added to the project at this point, even if most of them are still empty.
  • Version handling is taken care of by Poetry (for now) to allow for more version flexibility and easier upgrades.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.

CDS access bug caused by changes in their handling of embargoed data

Most of the publicly available data doesn't become available immediately after being processed.
For the ERA 5 Single Levels dataset, for instance, the data is embargoed for at least the first five days before becoming publicly available for download.

Up until two months ago there wasn't any problem with this, as the CDS API would just return whatever requested data was within scope. But since about the start of June, the system has changed.

Now whenever a range of data results in an embargoed data range being requested, the following error is returned:
RuntimeError: the request you have submitted is not valid. Mars server task finished in error; UserError: Restricted access to ERA5T. Please, check that your date selection is valid. For more information, visit https://climate.copernicus.eu/climate-reanalysis [mars]; Error code is -2; Request failed; Some errors reported (last error -2).

This wouldn't be a problem if at any given time the embargoed data consisted of exactly 5 days or less, but depending on events that embargo may be lengthened to well beyond those 5 days, and there is no known method to properly request which data is protected, and which data isn't.

More information is available at the CDS Confluence User Interaction Forum:
https://confluence.ecmwf.int/pages/viewpage.action?pageId=277352608

Intended solution
For now, we will decide between excluding the most likely range for embargoed data (5-10 days) and using a CDS API request 'hack' for the most recent two months.

After that, a more permanent fix will follow as soon as CDS itself presents a solution for the situation.
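
The first option amounts to clamping the end of any requested range before building the CDS request; a minimal sketch, assuming the upper end (10 days) of the mentioned embargo window:

    from datetime import date, timedelta

    EMBARGO_DAYS = 10  # upper end of the 5-10 day window mentioned above

    def clamp_end_date(requested_end: date) -> date:
        """Never request dates that are likely still embargoed."""
        last_public = date.today() - timedelta(days=EMBARGO_DAYS)
        return min(requested_end, last_public)

    print(clamp_end_date(date.today()))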

WPLA Dev 3.0 \\ Repository system development

The following features are included:

  • Working repository base class that allows for all of the three types of repository: 'cache', 'history' and 'hybrid'.
    Error handling should be capable of handling the full scope of expected issues like storage size issues, simultaneous reading/writing issues and invalid data handling.
  • The base class should be as generic as possible, to help prevent any issues possibly caused by individual settings.
  • All of the code has both proper sphinx-proof documentation and full unit test coverage.

Decisions to be made:

  • Determine whether the system should rank cached data based on the time of the most recent request, or on the number of requests made in a certain time period.

WPLA 2.x\\ BUG - Duplicate indexation issues with processing AROME files

For a couple of months now, there have been issues when processing new AROME files. Because of duplicates that somehow appear in the indexation, the AROME files can't be properly reconstructed from their original GRIB1 custom file data, which causes the data to fail to be stored. A recent patch allowed the remaining data to still be included, but as the number of duplicates is far higher than expected, very little data remains for the source files that have these duplicates, while the source files without them generate without any issue.

Steps to reproduce the behavior:

  1. Process a full set of AROME files locally (about 5-8 days)
  2. Compare the source files processed with duplicates to those without duplicates to identify the cause of the duplication.
    (likely the cause is, as often happens with other meteorological data, differences between raw and processed data)
  3. Adapt processing strategy to take into account duplicates based on known facts, allowing for either type of data to exist, and the definitive one to prevail, always.
  4. Verify workings

Expected behavior
AROME files should be downloaded and processed from GRIB1 custom files into stored NetCDF4 files losslessly
(not counting fields that have never had a descriptor or specification by KNMI in their documents).

Additional context
This really brings home once again how much it matters that the source dataset properly informs users, at all times, about what is in the dataset and how it can be recognized...
