
modelskill's Introduction

ModelSkill: flexible model skill evaluation.


ModelSkill is a Python package for scoring MIKE models (other models can be evaluated as well).

Contribute with new ideas in the discussion, report an issue, or browse the documentation. Access observational data (e.g. altimetry data) from the sister library WatObs.

Use cases

ModelSkill aims to be your companion through the different phases of a MIKE modelling workflow.

  • Model setup - exploratory phase
  • Model calibration
  • Model validation and reporting - communicate your final results

Installation

From pypi:

> pip install modelskill

Or the development version:

> pip install https://github.com/DHI/modelskill/archive/main.zip

Example notebooks

Workflow

  1. Define ModelResults
  2. Define Observations
  3. Match Observations and ModelResults
  4. Do plotting, statistics, reporting using the Comparer

Read more about the workflow in the getting started guide.

Example of use

Start by defining model results and observations:

>>> import modelskill as ms
>>> mr = ms.DfsuModelResult("HKZN_local_2017_DutchCoast.dfsu", name="HKZN_local", item=0)
>>> HKNA = ms.PointObservation("HKNA_Hm0.dfs0", item=0, x=4.2420, y=52.6887, name="HKNA")
>>> EPL = ms.PointObservation("eur_Hm0.dfs0", item=0, x=3.2760, y=51.9990, name="EPL")
>>> c2 = ms.TrackObservation("Alti_c2_Dutch.dfs0", item=3, name="c2")

Then match the observations with the model result, extracting model values at the observation points:

>>> cc = ms.match([HKNA, EPL, c2], mr)

With the comparer object, cc, all sorts of skill assessments and plots can be made:

>>> cc.skill().round(2)
               n  bias  rmse  urmse   mae    cc    si    r2
observation                                                
HKNA         385 -0.20  0.35   0.29  0.25  0.97  0.09  0.99
EPL           66 -0.08  0.22   0.20  0.18  0.97  0.07  0.99
c2           113 -0.00  0.35   0.35  0.29  0.97  0.12  0.99

Overview of observation locations

ms.plotting.spatial_overview([HKNA, EPL, c2], mr, figsize=(7,7))


Scatter plot

cc.plot.scatter()


Timeseries plot

Timeseries plots can either be static and report-friendly (matplotlib) or interactive with zoom functionality (plotly).

cc["HKNA"].plot.timeseries(width=1000, backend="plotly")



modelskill's Issues

`scatter` bins should allow sequence of numbers

The current implementation of the scatter plot allows bins to be an integer (the number of bins) or a float (the bin size), but not a sequence of numbers.

This is necessary for fine-grained control and for consistency across plots, as sketched below.
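A minimal sketch of the requested usage with synthetic data; the sequence call is the proposed API, not the current one:

import numpy as np
from modelskill.plotting import scatter

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 1.0, 500)           # synthetic "observed" values
mod = obs + rng.normal(0.0, 0.3, 500)    # synthetic "modelled" values

edges = np.arange(0.0, 8.25, 0.25)       # explicit edges, reusable across plots
scatter(obs, mod, bins=25)               # supported today: number of bins
# scatter(obs, mod, bins=edges)          # proposed: sequence of bin edges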

compare now raises error if no data

This makes one of the notebooks fail, and makes me wonder whether it would be better to just warn and return an empty Comparer (or None) for workflows that go through a lot of measurements (see the sketch below):

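A hedged sketch of the loop such a workflow would use; observations and mr are hypothetical, and the exception type is assumed:

import warnings
import modelskill as ms

comparers = []
for obs in observations:   # hypothetical list of observations
    try:
        comparers.append(ms.match(obs, mr))   # mr: hypothetical model result
    except ValueError:     # assumed exception when there is no overlapping data
        warnings.warn(f"no overlapping data for {obs.name}, skipping")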

Speed up test execution

Is it possible to speed up some of these slow tests, or mark them as slow and run them only in CI? (A sketch follows the timing output below.)

$ pytest --durations=10
===================================================================================
3.30s call     tests/test_metrics.py::test_pr
1.47s call     tests/test_multivariable_compare.py::test_mv_mm_taylor
1.41s call     tests/test_multimodelcompare.py::test_mm_taylor
1.16s setup    tests/test_multivariable_compare.py::test_mv_mm_taylor
1.15s call     tests/test_multimodelcompare.py::test_mm_scatter
0.87s call     tests/test_combine_comparers.py::test_concat_model_different_time
0.80s call     tests/test_comparer.py::test_minimal_plots
0.79s call     tests/test_multimodelcompare.py::test_custom_metric_skilltable_mm_scatter
0.73s call     tests/regression/test_regression_rose.py::test_wind_rose_image_identical
0.60s call     tests/test_multivariable_compare.py::test_mv_mm_scatter
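A minimal sketch of the marker approach; the marker name is our choice and should be registered under [tool.pytest.ini_options] markers in pyproject.toml:

import time
import pytest

@pytest.mark.slow           # skip locally with: pytest -m "not slow"
def test_mv_mm_taylor():
    time.sleep(2)           # stand-in for the expensive Taylor-diagram test

CI would then run plain pytest, while local runs use pytest -m "not slow".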

no mesh info when having model result as .dfs0

When loading model results from .dfs0 for track data comparison, I am missing info on the model domain to retrieve the projection, as suggested by @ecomodeller here: #27 (comment)

Domain info would also be required for improved spatial plotting.

Should we allow something like ModelResult(fn='my.dfs0', geometry_fn='my_corresponding.dfsu')?

Or what am I missing? Can this already be achieved by adding a second model with .dfsu?

2DHist grid and Scatter Grid

Known issue (maybe not too relevant?): when using show_hist=True and backend='matplotlib', the histogram bins do not match the axes grid.
One option is to manually pass the bins as the axes ticks, or to adjust the axes to match the bins, but this is not yet resolved (a workaround sketch follows below).
When using backend='plotly' the bins and the grid do match.
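A hedged sketch of the manual workaround, assuming the scatter plot accepts explicit bin edges and returns a matplotlib Axes (check your installed version); cc is an existing comparer:

import numpy as np

edges = np.arange(0.0, 5.5, 0.5)   # one grid used for both bins and ticks
# ax = cc.plot.scatter(show_hist=True, bins=edges, backend="matplotlib")
# ax.set_xticks(edges); ax.set_yticks(edges)   # align the axes grid with the bins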

Validate options default metrics

The user can select default metrics by assigning a list of strings to fmskill.options.metrics.list. Validation, however, only occurs upon use (e.g. with cmp.skill()) and not when the user defines the metrics. This is confusing; it is better to fail fast (see the sketch below).
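A minimal sketch of fail-fast validation via a property setter; the option class and the metric set here are illustrative, not the actual fmskill internals:

VALID_METRICS = {"bias", "rmse", "urmse", "mae", "cc", "si", "r2"}

class MetricsOption:
    def __init__(self):
        self._list = ["bias", "rmse"]

    @property
    def list(self):
        return self._list

    @list.setter
    def list(self, values):
        unknown = set(values) - VALID_METRICS
        if unknown:                      # validate at assignment time
            raise ValueError(f"unknown metrics: {sorted(unknown)}")
        self._list = list(values)

opts = MetricsOption()
opts.list = ["bias", "rmse"]             # ok
# opts.list = ["biaz"]                   # raises here, not later at cmp.skill()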

Spatial Skill Error

When trying to follow the spatial skill example notebook, I get an error when adding **kwargs, e.g.:

ss.plot("rmse", model='SW_1', cmap='YlOrRd')

plots the figure with the corresponding cmap but then complains:

AttributeError: 'Rectangle' object has no property 'cmap'

The same happens with

ss.plot("bias", model='SW_1', levels=[-1,0,1])
AttributeError: 'Rectangle' object has no property 'levels'

.sel() returning comparer instead of .sel_df()?

How about a .sel() method that returns a copy of the original comparer with reduced data, instead of:
https://github.com/DHI/fmskill/blob/aa1e47d7d0d4d8887faca763270a360d9c4d7da7/fmskill/compare.py#L354

That would allow calls like cc.sel().skill(), cc.sel().scatter() or, in the future, cc.sel().spatial_skill(), without copying all the selection arguments to each method that allows selection.

The dataframe could still be returned via cc.sel().all_df(). Or maybe it should then become cc.sel().to_dataframe().

Further methods can easily be added, as long as they all return a comparer. Example: cc.sel().add_domains().skill(), where add_domains() could be a method to assign data points to sub-domains (#12). A sketch follows below.
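A minimal sketch of the proposed chaining design; the names (data, the toy skill body) are illustrative, not the actual implementation:

import pandas as pd

class Comparer:
    def __init__(self, data: pd.DataFrame):
        self.data = data

    def sel(self, model=None, start=None, end=None) -> "Comparer":
        df = self.data
        if model is not None:
            df = df[df["model"] == model]
        if start is not None or end is not None:
            df = df.loc[start:end]      # assumes a time-sorted index
        return Comparer(df.copy())      # reduced copy, original untouched

    def skill(self) -> pd.DataFrame:
        # toy stand-in: per-model means instead of real skill metrics
        return self.data.groupby("model")[["obs_val", "mod_val"]].mean()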

support dataframes with localized timezones

It seems it is not possible to create a comparer from dataframes with a localized timezone:

Exception has occurred: TypeError
'.dt' accessor only available for DataArray with datetime64 timedelta64 dtype or for arrays containing cftime datetime objects.
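A workaround sketch in the meantime: convert to UTC and drop the timezone before handing the dataframe to the comparer:

import pandas as pd

idx = pd.date_range("2017-10-27", periods=3, freq="h", tz="Europe/Copenhagen")
df = pd.DataFrame({"Hm0": [1.2, 1.4, 1.3]}, index=idx)

# convert to UTC, then strip tz info so the index is naive datetime64
df.index = df.index.tz_convert("UTC").tz_localize(None)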

Model Results failed to initialize

Discussed in #130

Originally posted by conghung June 10, 2022
When I initialize a ModelResult with a file path as the parameter, it throws an error.
I am just using the unit test file provided with the source code.

kwords in metrics

Hi,

A very specific request, but I need to pass **kwargs to some metrics, as in:

cmp = modelskill.compare(mod=mod, obs=obs)
cmp.skill(metrics=['bias','pr'], AAP=7)

which would change the annual number of peaks in the peak ratio from 2 to 7, but I can't find a way to pass these kwargs. :(
Also, some metrics already have a weights kwarg, e.g. urmse:

urmse(obs: np.ndarray, model: np.ndarray, weights: Optional[np.ndarray] = None)

I am not sure whether cmp.skill(metrics=['bias','urmse'], weights=0.9) will do anything (see the workaround sketch below).
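A workaround sketch: wrap the metric to fix its keyword arguments and pass the wrapper as a callable; this assumes that skill() accepts callables and that the peak-ratio metric takes an AAP keyword in your installed version:

import modelskill.metrics as mtr

def pr7(obs, model):
    # peak ratio with 7 annual peaks instead of the default 2
    return mtr.pr(obs, model, AAP=7)

# cmp.skill(metrics=["bias", pr7])   # cmp: an existing comparer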

ComparerCollection plot kde - title

With many observations the title becomes very long and is not useful.

For a single comparer it works better, but the title still cannot be empty.

make_unique_index : dt

Hi,
I have been having trouble using a track observation with scatterometer data (instead of altimetry data) against a model: at every timestep I have a bunch of measurements instead of a single point; a good analogy would be a single-beam vs multi-beam bathy survey, or a handgun vs a shotgun. I have worked around this by increasing dt in the make_unique_index function (0.01 by default):
https://github.com/DHI/fmskill/blob/8d858a15004bd95466a51968d7baf27088db196f/fmskill/utils.py#L16
If I change this to a larger value, say 0.153 s, my scripts work. My current problem is doing this outside fmskill, because the dt argument is hard-coded:
https://github.com/DHI/fmskill/blob/8d858a15004bd95466a51968d7baf27088db196f/fmskill/observation.py#L441
Could dt be an explicit argument in the connector element or in the track observation? Or maybe it already is and I am not seeing it? (A sketch of the helper follows below.)
Thanks
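A minimal sketch of what a make_unique_index-style helper does, to make the role of dt concrete; this is not the library's actual implementation:

import pandas as pd

def make_unique_index(index: pd.DatetimeIndex, dt: float = 0.01) -> pd.DatetimeIndex:
    # nudge the k-th duplicate of a timestamp forward by k*dt seconds
    seen: dict = {}
    out = []
    for t in index:
        k = seen.get(t, 0)
        out.append(t + pd.Timedelta(seconds=k * dt))
        seen[t] = k + 1
    return pd.DatetimeIndex(out)

idx = pd.DatetimeIndex(["2021-01-01 00:00"] * 3)
print(make_unique_index(idx, dt=0.153))   # three points 0.153 s apart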

Mikeio Spatial gone?

Hi,

I updated mikeio to 1.6.1 and fmskill to modelskill 1.0 (alpha), did some very small refactoring of my repo to make my tests run, and they DO run on my laptop, but they crash when I submit pull requests: the pipelines fail because the spatial.FM_geometry module is gone, or something like that. Basically I cannot do import modelskill on the cloud anymore.

I literally just changed import fmskill to import modelskill, and it works on my laptop with mikeio 1.6.1 and the latest cloned repo.

Any suggestions?

The pipeline's installed dependencies include
mikeio-1.6.1
modelskill-1.0a0
so it should be OK, but it is not, for whatever reason.

Problem with bin size in spatial_skill

Hi,

I think there is a problem with bin selection in the spatial plots when the bins are not fully defined by the user as (min, max, delta).

To exemplify: I have track observations between Lat = [+31.4, +65.62] deg.

If I define my bin spacing as 0.25 deg and build the spatial skill grid from +32 deg to +66 deg, I get data as expected between 32 and 66 deg.

But if I instead just pass binsize=0.25 and let fmskill define the min and max latitude, my data now starts at roughly latitude +36 deg, even though the bin size is still 0.25 deg. I am somehow losing 4 deg of data (~400 km of satellite data). Not ideal; most likely a bug (see the sketch below).

cc: @Hendrik1987
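A sketch of an edge derivation that would avoid the clipping: snap the first edge down and the last edge up to whole multiples of the bin size instead of starting the grid at the data minimum:

import numpy as np

lat = np.array([31.4, 40.0, 55.3, 65.62])   # synthetic track latitudes
binsize = 0.25
first = np.floor(lat.min() / binsize) * binsize
last = np.ceil(lat.max() / binsize) * binsize
edges = np.arange(first, last + binsize, binsize)
print(edges[0], edges[-1])   # 31.25 65.75 -- the full range is covered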

colors when plotting

Hi,
when plotting time series (e.g. c.plot_timeseries()) I cannot choose the colormap.
Right now the workaround is to set the colors by hand (discrete colors), e.g. c._mod_colors[0]='blue', c._mod_colors[1]='red', and so on.
It would be nice to have something like

c.plot_timeseries(cmap='tab10')

Thanks
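A sketch of how the requested cmap option could map to discrete line colors; illustrative, not the current API:

import matplotlib

cmap = matplotlib.colormaps["tab10"]        # a qualitative (discrete) colormap
mod_colors = [cmap(i) for i in range(4)]    # one RGBA tuple per model
print(mod_colors[0])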

_wind_rose.py wave direction issue

In the source code of _wind_rose.py, the sectors for the wave direction do not include directions from 0 up to half_dir_step.

    dir_step = 360 // n_sectors
    half_dir_step = dir_step / 2

    n_dir_labels = n_sectors if n_dir_labels is None else n_dir_labels

    thetai = np.linspace(
        start=half_dir_step,
        stop=360 + half_dir_step,
        num=int(((360 + half_dir_step) - half_dir_step) / dir_step + 1),
    )
    thetac = thetai[:-1] + half_dir_step

On line 125 of the source code, half_dir_step is added to 360, so the bins include directions greater than 360 while directions from 0 up to half_dir_step are not included.

I worked around this in my implementation by replacing wave directions from 0 to half_dir_step with 360 + direction (so those values are greater than 360 degrees), as sketched below.
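A sketch of that workaround with synthetic directions; n_sectors is chosen for illustration:

import numpy as np

n_sectors = 16
half_dir_step = (360 / n_sectors) / 2       # 11.25 deg

dirs = np.array([2.0, 5.0, 15.0, 350.0])    # synthetic wave directions
wrapped = np.where(dirs < half_dir_step, dirs + 360.0, dirs)
print(wrapped)   # [362. 365.  15. 350.] -- all now inside the binning range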

SI calculation error

Hi all,
I think there might be an error in the calculation of the scatter index. I used another script and got different results, so I checked, and I think the mean value is not being taken into account somewhere in the equation. The scatter index can be interpreted as

SI = sqrt( (1/N) * sum_i (M_i - O_i)^2 ) / Xmean

with Xmean = mean of the observations. I reviewed a few comparer objects and the other script checks out, but I don't get the same in fmskill; the formula in FMSkill does not seem to take the 1/N into account.
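For concreteness, a minimal implementation of the SI definition written above; note that the library's own si may use demeaned (bias-removed) differences, so the two definitions can legitimately disagree:

import numpy as np

def scatter_index(obs: np.ndarray, model: np.ndarray) -> float:
    # SI = RMSE normalised by the mean of the observations
    rmse = np.sqrt(np.mean((model - obs) ** 2))
    return rmse / np.mean(obs)

obs = np.array([1.0, 1.5, 2.0, 2.5])
mod = np.array([1.1, 1.4, 2.2, 2.4])
print(round(scatter_index(obs, mod), 3))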

Erroneous track extraction of dfsu from MIKE if input observations are not sorted over time

My observations were altimeter track data that were not sorted over time (I had put together spatial "chunks" of data).
The track extractions on model data from ERA5 (.nc), CMEMS (.nc) and WW3 (.grib) worked fine but when extracting from MIKE (.dfsu) it gave wrong wave parameters e.g. negative values.
Sorting the altimeter track data over time fixed the issue.
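A workaround sketch: sort the track dataframe chronologically before building the track observation; the file name is hypothetical:

import pandas as pd

df = pd.read_csv("altimetry_chunks.csv", index_col=0, parse_dates=True)  # hypothetical file
df = df.sort_index()   # dfsu track extraction appears to assume time-sorted input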

Scatter plot with density cross-diagonal artifact

There seems to be an issue with the 2D density plot used in the scatter plot:

from modelskill.plotting import scatter
import matplotlib.pyplot as plt
import numpy as np

# strongly correlated synthetic data to expose the density artifact
np.random.seed(42)
X = np.random.multivariate_normal([0, 0], [[1, 0.98], [0.98, 1]], 20000)
x, y = X[:, 0], X[:, 1]

fig, ax = plt.subplots(1, 2, figsize=(14, 6))
scatter(x, y, ax=ax[0])  # left: density-coloured points
scatter(x, y, show_density=False, show_hist=True, show_points=False, bins=100, ax=ax[1])  # right: 2D histogram

These bands seem like an artifact.

identify items only through EUMType

Currently a parameter is identified via its itemInfo, either taken from the .dfs0 file
https://github.com/DHI/fmskill/blob/aa1e47d7d0d4d8887faca763270a360d9c4d7da7/fmskill/observation.py#L121
or added manually by the user if the observation is read from a dataframe.

Is itemInfo sufficient for all our needs? For example, EUMType does not allow distinguishing between different types of wave periods (Tp, T02), or does it?

I suggest additional flexibility by adding an attrs dict to observations and model results, similar to xarray's attrs (sketched below): http://xarray.pydata.org/en/stable/generated/xarray.DataArray.attrs.html

Also relevant: xarray's logic for plot labels http://xarray.pydata.org/en/stable/plotting.html#simple-example
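A sketch of what such an attrs dict could hold, mirroring xarray's convention; illustrative only, not the current API:

# illustrative attrs for a wave-period observation
obs_attrs = {
    "long_name": "Peak wave period",
    "units": "s",
    "short_name": "Tp",   # disambiguates Tp from T02 where EUMType cannot
}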

Z argument lost

Hi, a short problem.

With the previous version I was able to do (after the extraction, or after cc=con.extract())
obs=cc.observation
and that would return the point observation. My point observation has x, y and z, and I can still define it that way.
However, now I cannot access the observation object from the extracted comparer, and the z value is lost.
Can we somehow pass the z value through the extraction, or can I retrieve the observation from the comparer again, as with cc.observation?

Math in documentation

Hi,
I was trying to check whether the formula I wrote for the peak ratio renders correctly in math style, yet when I open the documentation the math is not rendered.

Custom metrics in scatter plot (skill table) error

Hi,

I used to be able to include custom metrics in the skill table (a long time ago, e.g. with a custom metric EV).

I can still use custom metrics with the .skill() function, e.g. the metrics ev and pr.

But when I try to plot the scatter plot with the skill table, custom metrics are no longer allowed, and I get an error.

I think the error occurs because only metrics defined inside metrics.py are allowed now:
https://github.com/DHI/modelskill/blob/ced02fd248b021de579b13765de8cee51c30cdfb/modelskill/metrics.py#L601C1-L607C2
and this was not a requirement before (see the workaround sketch below).
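A workaround sketch: pass the custom metric as a callable instead of a string, so the name lookup in metrics.py is bypassed; this assumes your installed version accepts callables in the skill table (cc is an existing comparer):

import numpy as np

def ev(obs: np.ndarray, model: np.ndarray) -> float:
    # explained variance: 1 - Var(residual) / Var(obs)
    return 1.0 - np.var(obs - model) / np.var(obs)

# cc.plot.scatter(skill_table=["bias", "rmse", ev])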

Adding comparers changes original

If I have two comparers, c1 and c2, and try to add them using add_comparer():

> c3 = c1.add_comparer(c2)

the first comparer, c1, is changed; the second is not. You therefore cannot do the same operation twice:

> c3 = c1.add_comparer(c2)
> c4 = c1.add_comparer(c2)    # will fail!
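A workaround sketch until add_comparer is made non-mutating: deep-copy the left-hand comparer first (c1 and c2 are the existing comparers from above):

import copy

c3 = copy.deepcopy(c1).add_comparer(c2)   # c1 itself stays untouched
c4 = copy.deepcopy(c1).add_comparer(c2)   # now this works too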
