wswup / gridwxcomp

Comparison of weather station and gridded climate datasets

Home Page: https://gridwxcomp.readthedocs.io/en/latest/

License: Apache License 2.0

Python 96.53% TeX 3.47%

gridwxcomp's People

maschull thomasott314 markwbrown dongyi1996 emilymarschallniswonger shafeequeigsnrr montimaj

gridwxcomp's Issues

2D interpolation routines

Currently 2D interpolation uses the radial basis function method of scipy.interpolate.Rbf which offers several out of the box options:

'multiquadric': sqrt((r/self.epsilon)**2 + 1)
'inverse': 1.0/sqrt((r/self.epsilon)**2 + 1)
'gaussian': exp(-(r/self.epsilon)**2)
'linear': r
'cubic': r**3
'quintic': r**5
'thin_plate': r**2 * log(r)

And also allows for user defined functions to be passed.

The problem is that most of these methods can overestimate gradients and produce large local extrema. I have been experimenting with the smoothing parameter, which is not well documented. So far the "linear" option with smoothing=-1e-3 appears to be the best combination when testing on the UC and example data. There may be a way to scale the smoothing and epsilon parameters of RBF methods adaptively to deal with different datasets, i.e. sparse or dense.

For now I am going to change the default to Rbf "linear" and add the smoothing parameter option with a default of -1e-3. Results are reasonable.

inverse distance weighting

Apparently the "linear" Rbf behaves somewhat like inverse distance weighting (IDW), but the results can differ quite a bit. Unfortunately I have not found an IDW implementation in Python, but I have found example code that I am testing and modifying for our use, see: here and here.

Also, GDAL has a command line tool. On the surface (I haven't tested it yet) it has some major advantages: it interpolates and makes the raster in one step with parallelization options, and it has several other useful methods such as IDW with K nearest neighbors. It would require a couple of preliminary steps, like writing a CSV file and a VRT meta file, but they look straightforward.
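For reference, the basic IDW estimate discussed above fits in a few lines of pure Python. This is only a sketch of the textbook method, not the code gridwxcomp uses; the function name idw_interpolate and the power default of 2 are assumptions for illustration.

```python
import math


def idw_interpolate(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from scattered points.

    Hypothetical helper: `points` is a list of (x, y) tuples, `values` the
    matching observations, and `power` the distance exponent.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in zip(points, values):
        dist = math.hypot(target[0] - x, target[1] - y)
        if dist == 0.0:
            # The target coincides with a station: return its value exactly.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Four made-up stations on a unit square with made-up bias ratios.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ratios = [0.8, 0.9, 1.0, 1.1]
center = idw_interpolate(stations, ratios, (0.5, 0.5))
```

Because the result is a weighted average with positive weights, it always stays between the smallest and largest input values, which is one reason IDW is attractive when RBF surfaces overshoot.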

other options and comparisons

I have also tested scipy.interpolate.griddata and it is fairly straightforward however it has a few disadvantages:

only interpolates within inner bounding from points (convex hull)
only has 3 methods: nearest, linear, and cubic
see:
https://stackoverflow.com/questions/50816375/scipy-griddata-with-linear-and-cubic-yields-nan

This post made me initially lean towards RBFs:
https://stackoverflow.com/questions/37872171/how-can-i-perform-two-dimensional-interpolation-using-scipy?rq=1

pip install gridwxcomp error: Could not find module 'geos_c.dll'

addition of pair_kpa column creates conflict with appending to older files

Adding pair_kpa to newer downloads creates an issue with the .dropna() on line 346. Results in file with only updated data (no original data).

Consider adding option to recognize old vs new files or other higher level fix.

Consider adding a try/except/retry option to the opendap gridmet download tool

The error below occurs occasionally when downloading from the northwest knowledge site using the opendap download tool. Consider adding a retry loop to avoid random stops when downloading multiple locations.
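A retry loop along these lines could wrap the download call. This is a minimal sketch: the helper name retry, the OSError exception type, and the simulated flaky_download function are all assumptions, since the actual error raised by the opendap tool is not shown here.

```python
import time


def retry(func, attempts=3, wait_seconds=1.0, exceptions=(OSError,)):
    """Call func() up to `attempts` times, sleeping longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            time.sleep(wait_seconds * attempt)


# Stand-in for a download that fails twice before succeeding.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("simulated connection drop")
    return "data"


result = retry(flaky_download, attempts=5, wait_seconds=0.0)
```

Limiting the caught exceptions to connection-type errors keeps genuine bugs (bad paths, bad variable names) from being silently retried.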

Can't create the gridwxcomp environment

I'm trying to install the conda environment for gridwxcomp from the command line.

I've run the code <conda env create -f environment.yml> from the command line on my Windows 10 PC. The return is <SpecNotFound: Can't process without a name>.

Thanks for your help.

add wind direction as optional output

Can we add gridMET wind direction as an optional output in the download_gridmet_opendap.py tool?
Default output files should not include wind direction.

Here is the variable name information:

ReadTheDocs issues with sphinx-click, module mocking, and build requirements

Recently I added a reference API for the gridwxcomp command line interface, which uses the Click module. I used the sphinx-click extension to help document it and it worked great, e.g. see here. However, building on ReadTheDocs resulted in many issues and, at least for now, I decided to move the docs to github-pages. This avoids adding many unwanted commits to master for debugging ReadTheDocs builds, which are hard to test without committing to GitHub. When building the docs manually with sphinx-build, all issues with the docs can be tested and corrected before committing.

The issue: sphinx-click is not part of the standard sphinx extensions and needs to be installed before building the docs, so I added it to the build requirements with a conda environment for ReadTheDocs, however when build requirements are given, ReadTheDocs ignores the directives to mock modules such as numpy or earth engine ee, and instead requires all unsupported imports to also be installed as a build requirement.

The conda build env would work fine by installing all needed dependencies, as it does for the module itself; however, earth engine ee is problematic. Once earth engine is installed by ReadTheDocs, the build later crashes because one module initializes it (specifically the line ee.Initialize() in gridwxcomp.download_gridmet_ee.py), which triggers a request for the user's Google credentials, and I don't want to give them away by adding them to docs files.

I tried different options of module mocking in conf.py including using autodoc_mock_imports and the method here, both work, however I could not get ReadTheDocs to mock some modules (e.g. numpy, ee, bokeh, ...) while installing others needed for building docs. I tried different admin options for the build process on ReadTheDocs including

a virtual env using setup.py install for building dependencies (sphinx-click and click)
using a pip requirements.txt and a conda environment.yml for building dependencies
using the setup.py install option
different options in a ReadTheDocs configuration file

I also tried forcing the install of sphinx-click and click (the two build requirements that actually need to be installed, not mocked) in conf.py (using subprocess) while retaining the module mocking (using autodoc_mock_imports) that had previously worked fine for earth engine. However, that also triggers ReadTheDocs to build a virtual env, even if it is unchecked in the admin advanced settings, and it then ignores the module mocks, resulting in import errors on numpy, scipy, ee, etc.
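For readers unfamiliar with the mocking setup mentioned above, it amounts to a short fragment in conf.py. The sketch below shows only that part and does not resolve the ReadTheDocs behaviour described in this issue; the extension name "sphinx_click" is an assumption based on current sphinx-click releases (older releases may use a different module path), and the mocked module list is just the examples named in this post.

```python
# docs/conf.py (fragment, illustrative only)

extensions = [
    "sphinx.ext.autodoc",
    # Assumption: recent sphinx-click versions register under this name.
    "sphinx_click",
]

# Modules that autodoc should replace with mock objects at import time,
# so they do not have to be installed in the docs build environment.
autodoc_mock_imports = ["numpy", "scipy", "ee", "bokeh"]
```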

I did not find any identical problem on Stack Exchange, Google, etc.; however, many others have run into similar issues. Maybe this post will be useful for others.

The updated docs are now at: https://wswup.github.io/gridwxcomp/

Test env and main workflow on Windows

It would be good to test running the new scripts on Windows. I have successfully tested the following on Windows 7, but it would be good to verify that it works on other systems.

Start with activating the windows conda environment:

cd env
conda env create -f env_windows.yml
activate gridwxcomp

For now install from the root directory using

pip install -e .

Make sure earth engine can be initialized first then run the following scripts in order:

# create CSV with station/gridMET pairs from station metadata 
gridwxcomp prep-input example_data/Station_Data.txt 
# download gridMET time series that overlap stations, 2 yrs
gridwxcomp download-gridmet-ee merged_input.csv -o test_gridmet_data -y 2016-2017
# calculate monthly bias ratios
gridwxcomp calc-bias-ratios merged_input.csv -o test_ratios
# create spatial surface using inverse distance rbf at 400m res, 5 cell buffer, calc zonal stats
gridwxcomp spatial -i test_ratios/summary_comp.csv -b 5

The final result should be a series of files in the test_ratios directory containing the monthly bias ratios for each of the four stations in example_data/Station_Data.txt. These files should include: etr_mm_summary_comp.csv, etr_mm_summary.csv. The final zonal means for each gridMET cell in the interpolation region should be in test_ratios/spatial/etr_mm_invdist_400m/gridMET_stats.csv.

Also, the directory test_ratios/spatial/ will be created containing a fishnet grid grid.shp that corresponds with the overall gridMET grid but only bounding the stations with a 5 cell buffer, a point shapefile with monthly mean bias ratios for each station etr_mm_summary_pts.shp, and 14 GeoTIFF rasters of interpolated surfaces of bias ratios e.g. Jan_mean.tiff should all be in test_ratios/spatial/etr_mm_invdist_400m/.

If all works well, you can try modifying the last step to use a different interpolation grid resolution with the -s [--scale] command line option, which scales the original gridMET grid size (4 km); the default is 0.1, or 400 m. You can also try using different interpolation options:

'invdist'
'invdistnn'
'average'
'linear'
'nearest'

or radial basis functions for interpolation:

'multiquadric'
'inverse_rbf'
'gaussian'
'linear'
'cubic'
'quintic'
'thin_plate'

using the -f [--function] command line option.

So to use nearest interpolation function at 200m resolution run:

gridwxcomp spatial test_ratios/summary_comp.csv -b 5 -f nearest -s 0.05

Or to specify interpolation of a coefficient of variation as opposed to a mean bias ratio use the [-l, --layer] option:

gridwxcomp spatial test_ratios/summary_comp.csv -l April_to_oct_cv -b 5 -f nearest -s 0.05

spatial.py output dir cannot match input dir of summary .csv (crashes)

spatial.py will overwrite the input summary.csv (e.g. eto_mm_summary_comp_all_yrs.csv) and crash mid-run if the -i and -o folder locations match.

Add check or modify "copied" filename, so overwrite doesn't erase original data file mid run.
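One way to implement the suggested check is to compare the resolved source and destination paths before copying and rename the copy when they collide. This is a sketch only; safe_copy_path and the "_copy" suffix are hypothetical names, not part of spatial.py.

```python
from pathlib import Path


def safe_copy_path(input_csv, out_dir, suffix="_copy"):
    """Return a destination path in out_dir that never equals input_csv."""
    src = Path(input_csv).resolve()
    dest = (Path(out_dir) / src.name).resolve()
    if dest == src:
        # Same folder for input and output: rename the copy instead of
        # overwriting the original summary file.
        dest = dest.with_name(src.stem + suffix + src.suffix)
    return dest


same_dir = safe_copy_path("test_ratios/summary_comp.csv", "test_ratios")
other_dir = safe_copy_path("test_ratios/summary_comp.csv", "test_out")
```

With matching folders the copy becomes summary_comp_copy.csv; otherwise the original file name is kept.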

plot.py: regression through zero fails with negative values

Possible dataframe data type error due to pandas update

An error was thrown in plot.py when I was running gridwxcomp.interpolate:
'pandas dataframe float object has no attribute astype'

which corresponded to lines 734 and 735
min_yr = df.start_year.min().astype(int)
max_yr = df.end_year.max().astype(int)

I have pandas version 0.25.1, maybe this is the result of a pandas update, but when I made this change, I was able to run the function.
min_yr = int(df.start_year.min())
max_yr = int(df.end_year.max())

I had the same error while running gridwxcomp.plot.daily_comparison for lines 143 and 144 of plot.py

Not sure if it is only with higher versions of pandas, I used the environment.yml file which specifies pandas>=0.24.

Minor typo in column header

Awesome package! Thanks so much for creating such a useful and cool package!

I noticed a minor typo in the example_data StationST_Daily_output.csv, the column header was labeled 'Eto (mm)' which was throwing an error in the function calc_bias_ratios when grid_variable = 'eto_mm'

By changing it to 'ETo (mm)' with a capital 'T' instead of lowercase 't', the function worked.
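Besides correcting the example file, the column lookup could be made tolerant of capitalization differences like this one. The sketch below is an assumption about how that might look; find_column is a hypothetical helper, not a gridwxcomp function.

```python
def find_column(columns, wanted):
    """Return the actual column name matching `wanted`, ignoring case
    and leading/trailing whitespace."""
    lookup = {name.strip().lower(): name for name in columns}
    key = wanted.strip().lower()
    if key not in lookup:
        raise KeyError(f"column {wanted!r} not found in {list(columns)}")
    return lookup[key]


# Header row as it appeared in the example file (lowercase 't').
headers = ["date", "Eto (mm)", "ETr (mm)"]
match = find_column(headers, "ETo (mm)")
```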

restructure repository, remove legacy

Going to restructure the repo and clean it up by first keeping legacy scripts in a "legacy" branch so that they will always be accessible by switching to this branch. These scripts will include

biascorrect.py
extract_monthlyratios.py
download_gridmet_ETr_tiff_daily_ee.py
download_gridmet_ETr_tiff_monthly_ee.py

Second, a subdirectory named gridwxcomp will be created for all other submodules and data of the gridwxcomp module for packaging with PyPi. This will also clean up the repo quite a bit.

environment.yml file not working to create gridwxcomp env

After running the create environment command using the .yml file I get the following errors:

This is kind of hard to read, but the main error says "UnsatisfiableError: The following specifications were found to be incompatible with each other:". Then there is a long list of packages that are apparently incompatible.

Unable to install Jupyter Notebook within gridwxcomp environment

I can't open a jupyter notebook within the gridwxcomp environment, and when I open a notebook in my base environment, it can't see the gridwxcomp environment.

Attempts to install jupyter notebook were met with failure, even after installing nb_conda_kernels as suggested by this post.

The error seemed to be caused by a package inconsistency, as follows:

The environment is inconsistent, please check the package plan carefully
The following packages are causing the inconsistency:

conda-forge/win-64::fiona==1.8.13=py38hb7fdc2d_0

conda-forge/noarch::rasterstats==0.14.0=py_0
failed with initial frozen solve. Retrying with flexible solve.

I haven't worked with jupyter notebooks very much, so it's possible I am missing something very basic here. Has anyone else encountered this problem?

Many thanks in advance.

Add "update" option to the gridmet download tool

Add an option to update a specific year or range of years. Right now the download script keeps all original data. This feature will be useful when model bugs are identified and corrections are applied after the "permanent" status review.
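The requested behaviour could look roughly like the following: records inside the chosen year range are replaced by freshly downloaded ones, and everything else is kept. This is a sketch with plain dictionaries keyed by ISO date strings; update_years and the sample values are made up for illustration and do not reflect the download script's internals.

```python
def update_years(existing, fresh, start_year, end_year):
    """Replace records in [start_year, end_year] with fresh ones; keep the rest.

    Both inputs map 'YYYY-MM-DD' strings to values.
    """
    def in_range(date_str):
        return start_year <= int(date_str[:4]) <= end_year

    # Keep old records outside the range, then add new records inside it.
    merged = {d: v for d, v in existing.items() if not in_range(d)}
    merged.update({d: v for d, v in fresh.items() if in_range(d)})
    return dict(sorted(merged.items()))


existing = {"2015-12-31": 5.0, "2016-01-01": 4.0, "2016-01-02": 4.5}
fresh = {"2016-01-01": 4.2, "2016-01-02": 4.4, "2017-01-01": 3.0}
updated = update_years(existing, fresh, 2016, 2016)
```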

temperature bias calculations

All temperature bias calculations should be based on the difference between station and gridMET values, not the ratio of station to gridMET.
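A small sketch of the proposed rule, assuming the gridMET names tmin_c and tmax_c identify the temperature variables and that the difference is taken as station minus gridMET; station_bias is a hypothetical helper.

```python
# Assumption: these gridMET variable names mark temperature data.
TEMPERATURE_VARS = {"tmin_c", "tmax_c"}


def station_bias(station_value, grid_value, variable):
    """Bias of gridMET relative to the station for one paired observation.

    Temperatures use a difference (station minus gridMET); all other
    variables use a ratio (station divided by gridMET).
    """
    if variable in TEMPERATURE_VARS:
        return station_value - grid_value
    return station_value / grid_value


temp_bias = station_bias(12.5, 10.0, "tmax_c")
etr_bias = station_bias(6.0, 5.0, "etr_mm")
```

A ratio of Celsius temperatures becomes unstable or undefined as the gridMET value approaches 0, which a difference avoids.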

Add VPD as optional export variable in download_gridmet_opendap.py

http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_met_vpd_1979_CurrentYear_CONUS.nc.html

vpd_kpa

error message about the fiona package import

I followed the directions on the GitHub page for gridwxcomp and got the environment and installation set up, but unfortunately we get this error message about the fiona package import that gridwxcomp is using. I'm pretty sure fiona handles some of the GIS stuff, so perhaps take a look at it.


extend to other variables instead of only ETr

Ultimately the package would be more useful if any climatic variables that are recorded in climate stations and gridMET could be compared, for example precipitation, or observed evapotranspiration, as opposed to modeled potential evapotranspiration only. For now the climatic data at stations may include those that result from first running the pyWeatherQAQC module on climatic station data, i.e.

TAvg (C)	TMax (C)	TMin (C)	TDew (C)	Vapor Pres (kPa)	RHAvg (%)	RHMax (%)	RHMin (%)	Rs (w/m2)	Rs_TR (w/m2)	Rso (w/m2)	Windspeed (m/s)	Precip (mm)	Data_ETr (mm)	Data_ETo (mm)	Calc_ETr (mm)	Calc_ETo (mm)

gridMET data includes:

Maximum temperature, minimum temperature, precipitation accumulation, downward surface shortwave radiation, wind velocity, humidity (maximum and minimum relative humidity and specific humidity), and reference evapotranspiration (ASCE Penman-Monteith)

The gridMET variables downloaded from earth engine and renamed by download_gridmet_ee.py include

u2_ms | tmin_c | tmax_c | srad_wm2 | ea_kpa | prcp_mm | etr_mm | eto_mm

So there is quite a bit of overlap. One way is to simply use a dictionary that maps station variable names to gridMET names, e.g.:

{
    'Calc_ETr (mm)' : 'etr_mm',
    'Calc_ETo (mm)' : 'eto_mm',
    ...
}

To avoid redundancy or overwriting files produced for different variables, a system will be needed to either use different working directories or save certain files under a naming scheme. For example, it would not make sense to recreate a fishnet unless it is for a different bounding area or set of stations; if the user wanted to calculate bias ratios for etr and eto, they should be able to use the same working directory and fishnet grid for interpolation.
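One possible naming scheme is to prefix per-variable outputs with the gridMET variable name, so several variables can share a working directory. The sketch below uses only the two mapping entries given above (the remaining pairs are intentionally left out); summary_path is a hypothetical helper and the file name pattern is an assumption.

```python
from pathlib import Path

# Only the two pairs listed in this issue; other variables are omitted.
STATION_TO_GRIDMET = {
    "Calc_ETr (mm)": "etr_mm",
    "Calc_ETo (mm)": "eto_mm",
}


def summary_path(workdir, station_variable):
    """Build a per-variable summary file path inside a shared directory."""
    grid_name = STATION_TO_GRIDMET[station_variable]
    return Path(workdir) / f"{grid_name}_summary_comp.csv"


etr_file = summary_path("test_ratios", "Calc_ETr (mm)")
eto_file = summary_path("test_ratios", "Calc_ETo (mm)")
```

Files that do not depend on the variable, such as the fishnet grid, would keep a single shared name and be reused.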