
doi-bor / pyforecast


PyForecast is a statistical modeling tool used by Reclamation water managers and reservoir operators to train and build predictive models for seasonal inflows and streamflows. PyForecast allows users to make current water-year forecasts using models developed with the program.

License: Other

Python 94.07% JavaScript 4.23% CSS 0.19% HTML 0.92% Batchfile 0.34% Inno Setup 0.25%
forecasting hydrology machine-learning python statistical-models

pyforecast's People

Contributors

dependabot[bot], dloney, jslanini, kevinfol, tjrocha

pyforecast's Issues

Brute Force Feature Selection?

Just spit-balling here... Would it be useful to have a Brute Force Feature Selection method incorporated into the software? For relatively simple models (e.g., 15 predictors), we could brute-force evaluate every possible combination of predictors (32,767 combinations) and have the program report the top-performing models. The benefit becomes more apparent with fewer predictors, since the number of candidate equations is (2^n) - 1, where n is the number of predictors.
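A minimal sketch of what brute-force selection could look like, assuming a hypothetical `score_fn` that fits one model on a tuple of predictor IDs and returns a skill metric where larger is better. The function names are illustrative and are not existing PyForecast code.

```python
import heapq
from itertools import combinations


def all_predictor_subsets(predictor_ids):
    """Yield every non-empty combination of predictor IDs: (2**n) - 1 subsets in total."""
    for size in range(1, len(predictor_ids) + 1):
        yield from combinations(predictor_ids, size)


def brute_force_select(predictor_ids, score_fn, top_k=10):
    """Score every subset and return the top_k (score, subset) pairs, best first.

    score_fn is a hypothetical callable: it fits a model on one subset and returns
    a skill metric where larger is better. heapq.nlargest keeps only top_k results
    in memory instead of all (2**n) - 1 scores.
    """
    scored = ((score_fn(subset), subset) for subset in all_predictor_subsets(predictor_ids))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])
```

With 15 predictors, the generator yields 32,767 subsets, which matches the count above. Each added predictor doubles the work, so the approach is only practical for small pools.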

On a related note, I noticed that the existing Feature Selection algorithm evaluates the same model multiple times, especially when that model is not selected/stored in the list of viable regression models. We might gain some performance by keeping only the salient metrics (Predictor IDs and Selected Metric) for every model run in memory. The algorithm could then check this in-memory object so it never evaluates the same model twice.
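One way to sketch that in-memory store is a cache keyed by a frozenset of predictor IDs. A frozenset makes (A, B) and (B, A) the same key. `score_fn` is again a hypothetical model-fitting callable, and `CachedEvaluator` is an illustrative name.

```python
class CachedEvaluator:
    """Wrap a model-scoring function so each predictor combination is fit only once.

    Only the salient results are kept: predictor IDs (as an order-independent
    frozenset) mapped to the selected metric.
    """

    def __init__(self, score_fn):
        self._score_fn = score_fn  # hypothetical: fits a model, returns the metric
        self._cache = {}
        self.evaluations = 0  # number of actual model fits performed

    def score(self, predictor_ids):
        key = frozenset(predictor_ids)
        if key not in self._cache:
            self.evaluations += 1
            # Pass a sorted tuple so the underlying fit always sees a stable order.
            self._cache[key] = self._score_fn(tuple(sorted(key)))
        return self._cache[key]
```

The feature-selection loop would call `evaluator.score(...)` wherever it currently fits a model directly. Repeated visits to the same combination then cost only a dictionary lookup.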

Data tab improvements

  1. Default the map layer to terrain.
  2. Default NOAA sites to unselected; the stations are overwhelming and a low priority for inclusion in a forecast.
  3. Data set table: don't display PyID. Column order should be:
    Name, Parameter, Type, ID. The URL can be hyperlinked to the name.
  4. Add PDSI and perhaps climate set selection to the HUC when clicked.
  5. Remove the boxes in the lower right for NRCC, PDSI, PRISM, etc.

Saving Bug

If the software crashes while it is saving to an existing *.fcst file, the file is emptied and all the work it contained is lost. Proposal: save to a *.temp.fcst file first, then rename it over the existing *.fcst file.

Hard to reproduce: the software needs to crash during a save.
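A sketch of the proposed write-then-rename save. It assumes the project has already been serialized to bytes; how PyForecast actually serializes a *.fcst file is not shown here, and `save_forecast_file` is a hypothetical name. The temporary file is created in the same directory so the final rename never crosses filesystems.

```python
import os
import tempfile


def save_forecast_file(path, data: bytes):
    """Write data to path so a crash mid-save never leaves an emptied *.fcst file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory as the target, so os.replace is a rename.
    fd, tmp_path = tempfile.mkstemp(suffix=".temp.fcst", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # make sure the bytes reach disk before swapping
        # Overwrite the existing file in one call (atomic on POSIX; on Windows it is
        # a single replace, but the Python docs only guarantee atomicity on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        # The original *.fcst is untouched; just clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the crash happens before `os.replace`, the worst case is a stray *.temp.fcst file next to an intact original.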

Linearity assumption-transformations and residuals

One of the fundamental assumptions of linear regression is... linearity. We need to give users the ability to test whether their models meet this assumption. A common approach is to plot predicted values (x-axis) against residuals (y-axis). If the linearity assumption holds, the residuals will scatter randomly around zero with no visible pattern.
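A minimal sketch of the data behind that diagnostic plot, using an ordinary least-squares fit with an intercept via NumPy. `predicted_vs_residuals` is an illustrative name, not an existing PyForecast function. The matplotlib lines are left as comments so the block runs without a plotting backend.

```python
import numpy as np


def predicted_vs_residuals(X, y):
    """Fit OLS with an intercept and return (predicted, residuals) for a diagnostic plot."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single predictor column
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    return predicted, y - predicted


# To draw the plot:
# import matplotlib.pyplot as plt
# predicted, residuals = predicted_vs_residuals(X, y)
# plt.scatter(predicted, residuals); plt.axhline(0.0); plt.show()
```

Suppose a straight line is fit to data that actually follow y = x². The residuals are then positive at both ends and negative in the middle, which is the kind of curved pattern this plot is meant to reveal.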

Streamflow data are frequently not normally distributed. A common distribution for them (and a common transformation in forecasting) is lognormal. I suggest we implement transformations, starting with lognormal. NRCS also uses square and cubic transformations, but I'm not sure of the reasoning behind these.
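A sketch of a lognormal (log-transform) regression: fit on log(flow), then back-transform forecasts with exp(). The function name is hypothetical. Note that exp() of a log-space prediction estimates the median of a lognormal forecast, not its mean; a bias correction is a separate design decision.

```python
import numpy as np


def fit_lognormal_regression(X, y):
    """Fit OLS on log(y) and return (coef, predict) where predict returns flow units."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if np.any(y <= 0):
        # log() is undefined for zero or negative flows.
        raise ValueError("log transform requires strictly positive flows")
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.log(y), rcond=None)

    def predict(X_new):
        X_new = np.asarray(X_new, dtype=float)
        if X_new.ndim == 1:
            X_new = X_new[:, None]
        # Back-transform from log space (median of the lognormal forecast).
        return np.exp(np.column_stack([np.ones(len(X_new)), X_new]) @ coef)

    return coef, predict
```

The same fit-transformed, back-transform-forecast structure would apply to the other transformations mentioned above if they are added later.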

Delete pn-development branch

Since we're using compiled releases now, I don't see any harm in reducing the number of branches to just the master branch. I just merged the pn-development branch into master BTW.

Metric Units - Please add this functionality

Currently parameter units are limited to imperial units. Would it be possible to develop the capacity to use metric units for input parameters and datasets? Your northern neighbours would be very grateful to see this added functionality. Let me know if we can assist in any way :)
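As a starting point, the conversions themselves are fixed constants, since the international foot is defined as exactly 0.3048 m. The unit labels below ("cfs", "acre-ft", "in", "degF") are assumptions and not necessarily the strings PyForecast stores.

```python
# Exact factors derived from 1 ft = 0.3048 m and 1 in = 25.4 mm.
CUBIC_METERS_PER_CUBIC_FOOT = 0.3048 ** 3          # 0.028316846592
CUBIC_METERS_PER_ACRE_FOOT = 43560 * 0.3048 ** 3   # 1233.48183754752

# Hypothetical unit labels -> (metric unit, multiplicative factor)
LINEAR_CONVERSIONS = {
    "cfs": ("m3/s", CUBIC_METERS_PER_CUBIC_FOOT),
    "acre-ft": ("m3", CUBIC_METERS_PER_ACRE_FOOT),
    "in": ("mm", 25.4),
}


def to_metric(value, unit):
    """Convert one imperial value to metric and return (value, metric_unit)."""
    if unit == "degF":
        # Temperature needs an offset, not just a scale factor.
        return (value - 32.0) * 5.0 / 9.0, "degC"
    metric_unit, factor = LINEAR_CONVERSIONS[unit]
    return value * factor, metric_unit
```

Temperature is handled separately because it is the only affine conversion in the set. Every other unit is a single multiplication.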

Incorporate seasonal to daily disaggregation scheme

@danbroman is creating a temporal disaggregation scheme using a k-nearest-neighbor (kNN) approach for the risk-based S&T project. When complete, it should be incorporated so users can directly generate daily-timestep forecasts suitable for reservoir operations modeling (e.g., RiverWare ops models).
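This is not @danbroman's scheme. It is only a minimal illustration of one simple form of kNN disaggregation, under stated assumptions. `historical_daily` is a hypothetical mapping of year to that season's daily volumes, in the same units as the seasonal forecast. The k years whose seasonal totals are closest to the forecast are selected, and each one's daily shape is rescaled so it sums to the forecast volume.

```python
import numpy as np


def knn_disaggregate(seasonal_volume, historical_daily, k=3):
    """Return [(year, daily_trace)] for the k analog years nearest to seasonal_volume.

    historical_daily: hypothetical dict of year -> 1-D array of daily volumes for the
    same season. Each returned trace keeps that year's daily pattern but is scaled
    to sum to seasonal_volume. Years with a zero total would divide by zero.
    """
    totals = {year: float(np.sum(flows)) for year, flows in historical_daily.items()}
    nearest = sorted(totals, key=lambda yr: abs(totals[yr] - seasonal_volume))[:k]
    return [
        (yr, np.asarray(historical_daily[yr], dtype=float) * (seasonal_volume / totals[yr]))
        for yr in nearest
    ]
```

Returning k traces rather than one gives a small ensemble of daily shapes. A full kNN scheme would typically also weight or resample among the neighbors.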

Forecast Options Tab - PN Refinements

  • Allow users to drag and drop top-level predictor names from the All Available Predictors tree into the PredictorPool container under the adjacent Equation Pools tree. The code should filter the subsetted predictors and only add those that are in the past relative to the equation. (e.g., dropping SNOTEL site X into the January-01st equation should only add subsetted predictors that come before January and/or are aggregated from Oct-01 to Dec-31.)
  • Catch the error raised when Apply Options is pressed while the dataset dictionary is empty.
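A sketch of the "only predictors in the past" filter. It assumes each subsetted predictor carries the month/day its aggregation period ends, as in the hypothetical dict fields `end_month` and `end_day`. Dates are placed inside a reference water year (Oct 1 to Sep 30), so December sorts before January. The reference year 2019 is an arbitrary non-leap choice, which means Feb 29 is not supported.

```python
from datetime import date


def water_year_date(month, day, water_year=2019):
    """Place a month/day inside a reference water year (Oct 1 - Sep 30)."""
    year = water_year - 1 if month >= 10 else water_year
    return date(year, month, day)


def predictors_available_for(equation_month, equation_day, subsetted_predictors):
    """Keep only predictors whose aggregation period ends before the equation date.

    subsetted_predictors: hypothetical list of dicts with 'id', 'end_month', 'end_day'.
    """
    issue = water_year_date(equation_month, equation_day)
    return [
        p for p in subsetted_predictors
        if water_year_date(p["end_month"], p["end_day"]) < issue
    ]
```

For the January-01st equation in the example above, an Oct-01 to Dec-31 aggregate passes the filter and a January aggregate does not.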

Expert Data Import Mode

Allow the Import feature on the Data tab to import data arrays instead of a single series at a time.

PN needs to import custom data arrays that do not fit the current data acquisition scheme in PyForecast (monthly data, custom indices, etc.). An Excel (sigh, I know...) data pre-processor that will generate daily data arrays for PyForecast is being developed in parallel, and PyForecast needs to accept that tool's output array.

I'm thinking we define the inputs the current Import feature needs (Dataset Name, Parameter Name, Units, and Resampling) as headers in the input data array, and have the code loop through the columns and add each entry in the array.
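A sketch of that column loop using only the standard library. The file layout is an assumption: the first four rows carry the Import fields as labeled header rows, column 0 then holds dates, and every other column is one series. Empty cells become None. `parse_data_array` is a hypothetical name.

```python
import csv
import io

HEADER_FIELDS = ["Dataset Name", "Parameter Name", "Units", "Resampling"]


def parse_data_array(text):
    """Parse a multi-series CSV whose first rows hold the Import metadata.

    Returns one dict per series column with the header fields plus a 'Data'
    list of (date_string, value_or_None) pairs.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < len(HEADER_FIELDS):
        raise ValueError("file is missing one or more header rows")
    header_rows, data_rows = rows[:len(HEADER_FIELDS)], rows[len(HEADER_FIELDS):]
    for label, row in zip(HEADER_FIELDS, header_rows):
        if row[0].strip() != label:
            raise ValueError(f"expected header row {label!r}, found {row[0]!r}")
    datasets = []
    for col in range(1, len(header_rows[0])):  # one series per column
        entry = {field: header_rows[i][col].strip() for i, field in enumerate(HEADER_FIELDS)}
        entry["Data"] = [
            (row[0], float(row[col]) if row[col].strip() else None)
            for row in data_rows
        ]
        datasets.append(entry)
    return datasets
```

Each returned dict carries the same four fields the single-series Import dialog asks for. The existing import code path could therefore be called once per column.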

Propose improving "Summary" tab

The Summary tab is currently out of order relative to the workflow and could be improved.

  • Move the tab after the "Regression" tab.

  • Forecast equations should automatically calculate a forecast when added to the Summary tab.

  • Physical units should be added to plots.

  • Can we make the Forecast ID editable so forecasters can use the default or create their own meaningful forecast name?

  • Round coefficients and forecasted values.

Propose refactoring of the GUI code

Should we refactor the GUI code in PyForecast_GUI.py? We could split the tabs into their own code files by GUI element (summaryTabGui.py, dataTabGui.py, etc.). This would also let us split the application.py code file based on which GUI element each function supports (summaryTabCode.py, dataTabCode.py, etc.).

Doing this should allow us to more easily trace which functions/methods support which GUI operation and perhaps allow us to identify places where we can share/optimize functions/methods.

Thoughts, @kevinfol?

Correlation matrix crashes

Data Tab->Data Analysis->Data Analysis Options->Show Correlation Matrix crashes. Email me if you need a sample fcst file to crash it with.

Forced Predictors defined at the Equation Pool level

@kevinfol: Looking for feedback on how you and/or your users feel about how PN handles forced predictors. There are two options. Option 1 is to make another ForcedPredictor list at the EquationPool level to store the PredictorIDs, so users can explicitly mark which predictors are 'forced' into the algorithm. Option 2 is to make this a little more hidden by storing a parallel array alongside the existing PredictorPool list to track which PredictorIDs should be forced at feature-selection time.

The screenshot below shows what I'm thinking about. Option 1 would have a separate list of ForcedPredictors on the GUI; Option 2 would let users right-click on predictors and set them to forced.

[Screenshot not reproduced]
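To illustrate how the Option 2 parallel array might be consumed at feature-selection time, here is a sketch using exhaustive enumeration: every candidate subset contains all forced predictors plus any combination of the optional ones. The names are hypothetical. For SFFS/SFBS, the analogous idea would be to start from the forced set and never remove its members.

```python
from itertools import combinations


def candidate_subsets(predictor_pool, forced_flags):
    """Yield predictor subsets that always include the forced predictors.

    predictor_pool and forced_flags are parallel lists: forced_flags[i] is True
    when predictor_pool[i] must appear in every candidate model.
    """
    if len(predictor_pool) != len(forced_flags):
        raise ValueError("predictor_pool and forced_flags must be the same length")
    forced = tuple(p for p, f in zip(predictor_pool, forced_flags) if f)
    optional = [p for p, f in zip(predictor_pool, forced_flags) if not f]
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            subset = forced + extra
            if subset:  # skip the empty model when nothing is forced
                yield subset
```

With k forced predictors out of n, this yields 2^(n-k) subsets instead of (2^n) - 1. Forcing predictors therefore also shrinks the brute-force search.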

Consecutive regression runs crashes software

Fix issues when running consecutive regressions. The first regression run works fine, but subsequent runs with the exact same settings run slower and eventually crash the program. It might be an issue with how arrays/objects are initialized during each regression run.

Issue can be reproduced by running any regression (MLR, PCA, ZSCORE) consecutively with any feature selection algorithm (SFFS, SFBS, BruteForce) one right after the other. Issue is most evident with the BruteForce selection algorithm.
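The cause has not been confirmed. However, one common Python pattern produces exactly this symptom: state that outlives a run and keeps growing, such as a mutable default argument or a list stored on a long-lived object. The sketch contrasts that pitfall with creating fresh per-run state. Both `buggy_run` and `RegressionRun` are illustrative names, not PyForecast code.

```python
def buggy_run(predictor_ids, score_fn, results=[]):
    # BUG (illustrative): the default list is created once, when the function is
    # defined, so every call keeps appending to the same ever-growing list.
    for pid in predictor_ids:
        results.append((pid, score_fn(pid)))
    return results


class RegressionRun:
    """Holds all state for one regression/feature-selection run."""

    def __init__(self, predictor_ids):
        self.predictor_ids = list(predictor_ids)
        self.evaluated_models = []  # fresh list per run, never shared between runs

    def run(self, score_fn):
        for pid in self.predictor_ids:
            self.evaluated_models.append((pid, score_fn(pid)))
        return self.evaluated_models
```

Comparing the size of such containers after the first and second runs is a quick way to check whether this pattern is involved.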

License Decision

Decide on adopting a license for the software. I'm assuming this software was initially developed with government funds on government time, so some kind of open source license might be most appropriate. Thoughts, @kevinfol?

https://opensource.org/licenses

Implement LASSO Regression

LASSO regression is a useful technique for building sparse models and aiding in variable selection. Suggest we implement it as an alternative to the existing methods.
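A self-contained sketch of LASSO fit by cyclic coordinate descent with soft-thresholding, using NumPy only. It minimizes (1/(2n))·||y − b0 − Xb||² + alpha·||b||₁, the same objective scaling scikit-learn uses. Whether PyForecast would use scikit-learn or its own solver is an open choice, and the function names here are hypothetical.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within gamma of zero become exactly 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=1000, tol=1e-8):
    """Return (intercept, coefficients) minimizing (1/(2n))||y - b0 - Xb||^2 + alpha*||b||_1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering removes the intercept from the updates
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    residual = yc.copy()  # equals yc - Xc @ b while b is all zeros
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            old = b[j]
            # Correlation of column j with the residual computed without b_j's contribution.
            rho = Xc[:, j] @ residual / n + col_sq[j] * old
            b[j] = soft_threshold(rho, alpha) / col_sq[j]
            if b[j] != old:
                residual -= Xc[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ b
    return intercept, b
```

With alpha = 0 this reduces to ordinary least squares. As alpha grows, weak predictors are driven to exactly zero, which is the variable-selection behavior described above.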

Develop unit tests for core functionality

Unit tests for the following (at a minimum) should be developed:

  • read/write of *.fcst files
  • web requests for data (USGS, NRCS, GP Hydromet, PN Hydromet)
  • regression outputs (MLR, PCA) - needs development of a good testing dataset
  • validation of JSON files for the mapping interface
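For the regression-output item, a synthetic dataset with known coefficients is one way to get a "good testing dataset". `fit_mlr` below is a stand-in for PyForecast's MLR routine, and the dataset and coefficients are invented for the test. Run it with `python -m unittest` or pytest.

```python
import unittest

import numpy as np


def fit_mlr(X, y):
    """Stand-in for PyForecast's MLR routine (hypothetical): returns [intercept, coefs...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def make_synthetic_dataset(n_years=40, seed=42):
    """Invented data with known truth: inflow = 50 + 3*SWE + 1.5*precip + small noise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 30, size=(n_years, 2))
    noise = rng.normal(0, 0.1, size=n_years)
    y = 50 + 3.0 * X[:, 0] + 1.5 * X[:, 1] + noise
    return X, y


class TestMLR(unittest.TestCase):
    def test_recovers_known_coefficients(self):
        X, y = make_synthetic_dataset()
        np.testing.assert_allclose(fit_mlr(X, y), [50, 3.0, 1.5], atol=0.5)

    def test_exact_fit_without_noise(self):
        X, _ = make_synthetic_dataset()
        y = 10 - 2 * X[:, 0] + 4 * X[:, 1]
        np.testing.assert_allclose(fit_mlr(X, y), [10, -2, 4], atol=1e-8)
```

A fixed seed keeps the dataset identical across runs, so any failure reflects a change in the regression code rather than random data.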

Model skill metrics

I suggest replacing Nash-Sutcliffe Efficiency (NSE) with Mallows' Cp as a metric of skill. For in-sample linear regression fits, NSE is identical to R-squared. Also, all of our metrics are based on squared error. This punishes models for outliers and skews model selection toward models that fit really big years.

It might also be useful to include mean absolute error (MAE), which is similar to RMSE but weights all errors linearly instead of squaring them. If a forecaster is not worried about outliers or high-water years, it might be nice to select models based on non-squared error. For example, in a drought year with record-low snowpack, we would want to avoid favoring models that fit the really high-water years.
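A sketch of the metrics under discussion. Mallows' Cp uses the standard definition Cp = SSE_p / MSE_full − n + 2p. Here p counts the candidate's fitted parameters, including the intercept, and MSE_full is the residual mean square of the model containing all candidate predictors. The function names are illustrative.

```python
import numpy as np


def rmse(obs, pred):
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mae(obs, pred):
    """Mean absolute error: each error counts in proportion to its size, not its square."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(obs - pred)))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum((obs - mean(obs))**2)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))


def mallows_cp(sse_subset, n_params_subset, mse_full, n_obs):
    """Mallows' Cp for a candidate model; values near p (and small) are preferred.

    sse_subset: residual sum of squares of the candidate model
    n_params_subset: p, fitted parameters in the candidate, including the intercept
    mse_full: SSE_full / (n - p_full) for the model with all candidate predictors
    """
    return sse_subset / mse_full - n_obs + 2 * n_params_subset
```

One large miss shows the difference between the error metrics. With errors of [0, 0, 0, 4], RMSE is 2 but MAE is 1, so the squared metric lets a single year dominate.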

Develop distribution pipeline

Develop a distribution mechanism for compiling the required code and dependencies (cx_Freeze) and for building an installer (InnoSetup).

Data tab improvements

  1. POR selection: the years box should be aligned directly below the years check box, rather than with "POR".
  2. Should Fill NaNs be the default? Or do we want users to make a conscious decision to fill missing data?
  3. Does Update Data do anything different from Import Data? If not, remove it.
  4. Import dataset: why is CSV/Excel import in a different location than web data import? All data import should be in the same location.

NRCS SOAP Service Error

Instantiating a connection to the NRCS service via NRCS = Client('https://www.wcc.nrcs.usda.gov/awdbWebService/services?WSDL') now fails. Error shown below. E-mailed NRCS to inquire about the error. This impacts both existing PyForecast installations and NextFlow -- something probably changed on the NRCS side...

Files:
..\PyForecast\Resources\DataLoaders\Default\NRCS_WCC.py
..\NextFlow\resources\DataLoaders\NRCS_WCC.py

Error:
HTTPSConnectionPool(host='wcc.sc.egov.usda.gov', port=443): Max retries exceeded with url: /awdbWebService/services?WSDL (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x3681E0D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
