nci / scores
Metrics for the verification, evaluation and optimisation of forecasts, predictions or models.
Home Page: https://scores.readthedocs.io/
License: Apache License 2.0
Update the procedure for how to do a scores release, following insights from the v0.4 release.
Is your feature request related to a problem? Please describe.
To ensure consistent style and code quality, we need to add some static code checks. The requested feature is a new GitHub Action to run static code analysis, including black, pylint, isort, and mypy.
Describe the solution you'd like
New GitHub Actions to run the above-mentioned tools automatically on push and on pull request.
Happy to be assigned to this task
I did some testing of dim handling using mse from main in a Jupyter notebook. Here is some code and the problems I found.
import xarray as xr
from scores.continuous import mse

fcst = xr.DataArray(
    data=[[1., 2, 3], [2, 3, 4]],
    dims=['stn', 'date'],
    coords=dict(stn=[101, 102], date=['2022-01-01', '2022-01-02', '2022-01-03'])
)
obs = xr.DataArray(
    data=[[0., 2, 4], [.5, 2.2, 3.5]],
    dims=['source', 'date'],
    coords=dict(source=['a', 'b'], date=['2022-01-01', '2022-01-02', '2022-01-03'])
)
1. mse(fcst, obs, reduce_dims=[]) returns a single value (i.e. reduces all dimensions). It would be preferable if it reduced no dimensions.
2. mse(fcst, obs, preserve_dims=['source', 'date', 'stn']) returns a single value (i.e. reduces all dimensions) rather than an array with all dimensions preserved. The same happens with mse(fcst, fcst, preserve_dims=fcst.dims).
3. The mse docstring says, for 'preserve_dims', that "the forecast and observed dimensions must match precisely". This can be removed: it works perfectly fine if they don't match (using usual xarray broadcasting), and I don't think it is even desirable that fcst and obs have matching dimensions.

The intention is to use Read the Docs for documentation hosting, including auto-generated API documentation.
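To illustrate the broadcasting point, a minimal sketch using only xarray (no scores): subtracting arrays with non-matching dimensions broadcasts over the union of dims rather than failing.

```python
import xarray as xr

# fcst and obs deliberately have different (non-matching) dimensions
fcst = xr.DataArray([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], dims=["stn", "date"])
obs = xr.DataArray([[0.0, 2.0, 4.0], [0.5, 2.2, 3.5]], dims=["source", "date"])

# xarray broadcasts over the union of dimensions, so the squared error
# gains both 'stn' and 'source'
sq_error = (fcst - obs) ** 2
```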
Lift code coverage from 99 to 100%
Re-introduce a 100% coverage requirement
The initial implementation of weightings works for xarray data but doesn't consider other data types.
The README instructions are slightly incorrect with regard to the sample_data API.
The configuration settings in pylintrc should be moved into pyproject.toml in order to adopt a consistent approach and also reduce the number of configuration files at the top level
"scores" is undergoing initial setup and maintenance. Information on this process should be placed in the README.
The docstring in https://github.com/nci/scores/blob/develop/src/scores/probability/functions.py#L19 needs to be clearer, to ensure that the user understands that the rounding_precision arg is different from specifying how many decimal places to round to.
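For illustration only (this helper is hypothetical, not the scores function): the idea is that rounding_precision behaves like a grid spacing rather than a count of decimal places.

```python
def round_to_precision(x, rounding_precision):
    """Round x to the nearest multiple of rounding_precision.

    Note: rounding_precision is a step size (e.g. 0.5), NOT a number of
    decimal places. Illustrative sketch only.
    """
    return round(x / rounding_precision) * rounding_precision
```

For example, round_to_precision(1.3, 0.5) gives 1.5, whereas round(1.3, 0.5) is not even valid Python, since round takes an integer number of decimal places.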
GitHub Actions improvement
I plan to fix this myself; just adding this as an issue so it doesn't get forgotten:
The pre-commit command in GitHub Actions should be changed to pre-commit run -a so as to scan all files. However, many mypy and lint issues need to be solved first, and there are currently outstanding PRs to fix some of these issues. I will make this change once those PRs are merged.
This will make things like the Flip-Flop index #54 easier to migrate into scores
In various parts of the code, it still says the weighting hasn't been implemented when it actually has https://github.com/nci/scores/blob/develop/src/scores/continuous.py#L32
We need to go through and update these docstrings.
The weightings keyword argument was added as standard to function signatures but not implemented. This is intended to allow a weightings array to be passed through, representing things like area averaging, population density weighting or another kind of importance or significance weighting. Requirements need to first be developed more clearly and then the functionality implemented.
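As a sketch of the intended pass-through (hypothetical helper, not the scores API), assuming weights simply multiply the pointwise scores before any reduction:

```python
import numpy as np

def apply_weights(values, weights=None):
    """Multiply pointwise scores by a weights array (e.g. area or
    population-density weights) before reduction; None means unweighted.

    Hypothetical sketch only; the real requirements are still to be developed.
    """
    if weights is None:
        return values
    return values * np.asarray(weights)
```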
Hi folks, not sure if this is a bug but I thought I'd drop it in here anyway...
I am trying to do a simple MAE between two pandas Series using scores.continuous.mae, which the docstring tells me is kosher, while trying to preserve dimensions. When I do this I get an AttributeError: 'Series' object has no attribute 'dims'.
trace:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_2250677/1926751918.py in ?()
----> 1 scores.continuous.mae(loaded_site_forecast.data.siteforecast_air_temperature, loaded_site_forecast.data.observations_temperature_at_screen_level, preserve_dims='all')
~/mambaforge/envs/site/lib/python3.11/site-packages/scores/continuous.py in ?(fcst, obs, reduce_dims, preserve_dims, weights)
141 ae = abs(error)
142 ae = scores.functions.apply_weights(ae, weights)
143
144 if preserve_dims is not None or reduce_dims is not None:
--> 145 reduce_dims = scores.utils.gather_dimensions(fcst.dims, obs.dims, reduce_dims, preserve_dims)
146
147 if reduce_dims is not None:
148 _ae = ae.mean(dim=reduce_dims)
~/mambaforge/envs/site/lib/python3.11/site-packages/pandas/core/generic.py in ?(self, name)
6198 and name not in self._accessors
6199 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6200 ):
6201 return self[name]
-> 6202 return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'dims'
I expect that this is purely because no one has ever run a pd.Series through this pathway before, and so the missing .dims hasn't been an issue...
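As a possible workaround until Series handling is decided (sketch only; the data here is made up):

```python
import pandas as pd

fcst = pd.Series([1.0, 2.0, 3.0])
obs = pd.Series([0.5, 2.5, 2.0])

# Workaround 1: compute MAE directly in pandas
mae_value = (fcst - obs).abs().mean()

# Workaround 2 (untested assumption: scores accepts the converted object):
# pass series.to_xarray() into scores.continuous.mae instead of the Series
```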
Tasks:
• Add mypy to the pre-commit config
• Ensure mypy runs without issues

Add root mean squared error to continuous
When preserve_dims="all", gather_dimensions returns an empty set.
In a scoring function, the set of dimensions to reduce is then empty (as we don't want to reduce any dimensions); however, dataarray.mean() reduces everything, which is the opposite of what we want.
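A numpy analogue of the behaviour we need (assuming the reduction step can pass an explicit empty axis rather than calling mean() bare):

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# mean() with no axis reduces every dimension to a scalar...
total = np.mean(a)

# ...whereas an explicit empty axis tuple reduces nothing, which is the
# behaviour we want when the set of dimensions to reduce is empty
untouched = np.mean(a, axis=())
```

The xarray reduction would need the analogous guard, so that an empty set of dims is not silently treated like "reduce everything".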
Currently we have scores.stats.tests for statistical tests. This may be confusing, since it is not a directory for pytest tests, and it is worth considering other options.
The intention is to prepare a paper for submission to the Journal of Open Source Software (JOSS), see https://joss.readthedocs.io/en/latest/submitting.html.
This issue is to track the development of 'paper.md', the markdown behind the paper which is submitted to JOSS.
Add nbmake stage for notebook e2e tests to ensure notebooks don’t break during PRs.
Add type hints for all functions (not just the public functions)
Details here:
https://docs.pytest.org/en/7.1.x/explanation/pythonpath.html#test-modules-conftest-py-files-inside-packages
This will also cut down on unnecessary directories, relative imports and relative paths in the suite.
We need to migrate the Modified Diebold-Mariano test statistic across.
Add an RMSE score, as described here
This will follow the design pattern evident in scores.continuous.mse
utils_test_data.py needs cleaning up. It contains some code that is commented out, as well as some test objects that I don't think are used, e.g. utils_test_data.DA_1.
Implement isotonic regression.
This is relevant to both continuous and probabilistic forecasts of binary outcomes.
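For reference, the classic pool-adjacent-violators algorithm (PAVA) gives a nondecreasing least-squares fit; a compact sketch (unweighted case, not the eventual scores implementation):

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: nondecreasing fit minimising squared error.

    Sketch only: unweighted, increasing-fit case.
    """
    blocks = []  # each block is [pooled_value, count]
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # merge backwards while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, n2 = blocks.pop()
            v1, n1 = blocks.pop()
            blocks.append([(v1 * n1 + v2 * n2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, v) for v, n in blocks])
```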
Dimensions handling plain language requirements:
• We accept xarray broadcasting rules
• Sets, lists and tuples are all acceptable here
• A string supplied to preserve_dims or reduce_dims is converted to a single-element list containing that string
• If 'all' is specified, and there is a dimension in the data called 'all', do what would normally be done, but raise a warning and explain how to control the behaviour by putting 'all' inside a list instead
• If reduce_dims and preserve_dims are both non-null, raise an exception
• (default) If reduce_dims and preserve_dims are both null, reduce all dimensions
• If reduce_dims is [], then reduce nothing
• If reduce_dims is 'all', then reduce everything
• If preserve_dims is [], then reduce everything
• If preserve_dims is 'all', then reduce nothing
• If a dimension is present in reduce_dims or preserve_dims but does not actually appear in the data, throw an exception
• If reduce_dims is ['a', 'b'], reduce dimensions 'a' and 'b' (after broadcasting if applicable)
• If preserve_dims is ['a', 'b'], reduce all dimensions that are not 'a' or 'b' (after broadcasting if applicable). Need to test the special case where all dimensions within fcst/obs are specified in the list, to avoid the bug in point 2 of #18
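The rules above can be sketched in plain Python (hypothetical helper; only loosely based on scores.utils.gather_dimensions):

```python
import warnings

def gather_dims(all_dims, reduce_dims=None, preserve_dims=None):
    """Return the set of dimensions to reduce, per the requirements above.

    Sketch only; broadcasting is assumed to have happened already, so
    all_dims is the union of the fcst and obs dimensions.
    """
    all_dims = set(all_dims)

    if reduce_dims is not None and preserve_dims is not None:
        raise ValueError("specify at most one of reduce_dims and preserve_dims")

    # default: reduce everything
    if reduce_dims is None and preserve_dims is None:
        return all_dims

    preserving = preserve_dims is not None
    spec = preserve_dims if preserving else reduce_dims

    if isinstance(spec, str):
        if spec == "all":
            if "all" in all_dims:
                warnings.warn(
                    "'all' is also a dimension name; wrap it in a list "
                    "to target the dimension rather than every dimension"
                )
            spec = all_dims  # 'all' means every dimension
        else:
            spec = [spec]  # a bare string becomes a one-element list
    spec = set(spec)

    unknown = spec - all_dims
    if unknown:
        raise ValueError(f"dimensions not present in the data: {unknown}")

    return all_dims - spec if preserving else spec
```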
multicategorical_impl.py:109: error: Argument 5 to "_single_category_score" has incompatible type "float | None"; expected "float" [arg-type]
multicategorical_impl.py:117: error: Item "int" of "Dataset | Literal[0]" has no attribute "mean" [union-attr]
multicategorical_impl.py:210: error: Incompatible types in assignment (expression has type "int", variable has type "ndarray[Any, dtype[Any]]") [assignment]
multicategorical_impl.py:211: error: Incompatible types in assignment (expression has type "int", variable has type "ndarray[Any, dtype[Any]]") [assignment]
Found 4 errors in 1 file (checked 1 source file)
Coverage is currently around 99%. This addresses a few uncovered lines.
GRIB data is compatible with xarray/scores and is still widely used in the atmospheric sciences.
We should mention somewhere (perhaps the README) how to read it, e.g. use engine='cfgrib' when opening a GRIB file with xarray.
Requires cfgrib: https://github.com/ecmwf/cfgrib
Currently utils.gather_dimensions doesn't handle the case when reduce_dims and preserve_dims are both None.
In this case it should return set(all_dims).
Add the scoring function in Taggart, R., Loveday, N. and Griffiths, D., 2022. A scoring framework for tiered warnings and multicategorical forecasts based on fixed risk measures. Quarterly Journal of the Royal Meteorological Society, 148(744), pp.1389-1406.
Also include concave ROC curve calculation once isotonic regression has been completed.
Add type hints to all public functions (non-public functions will be done in a separate issue)
Currently the Diebold Mariano test statistic brings data into memory due to the autocovariance calculation.
We should see if we can wrap this function so that it works nicely with dask, using something like https://docs.xarray.dev/en/stable/user-guide/dask.html#automatic-parallelization-with-apply-ufunc-and-map-blocks
At the moment you can't choose whether the decision thresholds in the FIRM score have left or right endpoints. This would be good to add.
It would be nice if contributors had a checklist of things to do/check when creating a pull request for a new metric, aligned with the contributor guide: https://docs.github.com/en/communities/using-templates-to-encourage-useful-issues-and-pull-requests/creating-a-pull-request-template-for-your-repository
Ideally there would be multiple templates, to also cover pull requests that aren't adding a new score.
murphy_impl.py:93: error: Incompatible types in assignment (expression has type "str", variable has type "Literal['quantile', 'huber', 'expectile']") [assignment]
The README is missing a couple of metrics in the table and should be updated.
We need some tests verifying that, when the forecast and observation data are chunked xarray objects, the metric can be called and the data isn't brought into memory before .compute is called.
These should be applied to each metric.