snowex / snowexsql

A project to access the database holding data from the NASA SnowEx campaign

License: Other
Once the lidar datasets are in the db, we should make an example differencing them for comparison. This would make a great notebook.
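A minimal sketch of what that differencing could look like with rasterio; the file names are hypothetical and the two rasters are assumed to already share a grid and CRS:

import rasterio

# Hypothetical file names; the actual lidar rasters would come from the db.
with rasterio.open('qsi_flight1_dem.tif') as first, \
     rasterio.open('qsi_flight2_dem.tif') as second:
    # Assumes both rasters share the same grid and CRS; in practice they
    # may need to be resampled to a common grid first.
    diff = second.read(1) - first.read(1)
    profile = first.profile

with rasterio.open('dem_difference.tif', 'w', **profile) as out:
    out.write(diff, 1)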
Download the 2016 Data from the NSIDC and upload to the database
To add a new layer or point data type, the metadata parser must be able to pick out the column in a CSV. We need to add documentation describing this, which basically consists of documenting how to add a name to: https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/b4a0fb2baadedcd96fa95275c3d2262c69ed0cf4/snowxsql/metadata.py#L390
The following things need to be fixed with the onset of the new, broader database:
site_name == 'Grand Mesa'
ST_Transform, given that there are now multiple projections
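For the ST_Transform item above, a sketch of transforming geometries on the fly in a query via sqlalchemy/PostGIS; the target EPSG, connection string, and the assumption that PointData carries site_name are all illustrative:

from sqlalchemy import func
from snowexsql.data import PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Transform every geometry to a single CRS (EPSG 4326 here, as an example)
# so results stored in multiple UTM zones can be compared in one query.
qry = session.query(func.ST_AsText(func.ST_Transform(PointData.geom, 4326)))
qry = qry.filter(PointData.site_name == 'Grand Mesa')
results = qry.limit(10).all()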
Megan has provided an example of what the met data will look like. We need a script to upload this, so when the final data is ready they can use it as a template for uploading.
In trying to install snowexsql from the command line, installation failed due to "A GDAL API version must be specified..". However, this issue was solved by installing fiona. Upon updating the requirements.txt for the microstructure project, everything installs when fiona is a dependency.
Each file has a polarization associated with it, which is noted in the annotation file. We simply need to include this in the comments of the data.
Hi Micahs,
I think it would be great to also add the CSU and UNM GPR datasets. These are really complementary with Tate's BSU dataset. Links below.
https://nsidc.org/data/SNEX20_GM_CSU_GPR/versions/1
https://nsidc.org/data/SNEX20_UNM_GPR/versions/1
Thanks!
Ryan
As pointed out by @dshean, the elevation data recorded in the pit site data uses the WGS84 ellipsoid. The Grand Mesa DEM from the USGS is in NAVD88. The snow depths dataset also has an elevation column which is not adjusted. I don't have a preference here, but they should all be the same.
This should also be documented.
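For illustration, a sketch of converting an ellipsoid height to NAVD88 with pyproj; the coordinates are made up, and the conversion needs the PROJ geoid grids available:

from pyproj import Transformer

# WGS84 3D (ellipsoid heights) -> NAVD88 orthometric heights.
# Requires the US geoid grid files from a full PROJ data install.
transformer = Transformer.from_crs('EPSG:4979', 'EPSG:4326+5703', always_xy=True)

lon, lat, ellipsoid_m = -108.2, 39.03, 3100.0  # illustrative point on Grand Mesa
lon, lat, navd88_m = transformer.transform(lon, lat, ellipsoid_m)
print(f'NAVD88 elevation: {navd88_m:.2f} m')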
The credentials addition I made a couple of days ago broke the install script on ubuntu. Looks like the role names I chose don't exist at the time of creating the databases so postgres doesn't know who they belong to.
There are a few instances of PointsData vs. PointData that might cause confusion. See the Querying table for the .limit() and .count() code snippets in the Cheat Sheet.
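For reference, the corrected usage with the singular class name; the connection string is an assumption:

from snowexsql.data import PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# The class is PointData (singular), matching snowexsql.data.
n_points = session.query(PointData).count()
first_ten = session.query(PointData).limit(10).all()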
As pointed out by Michael Durand, if a user wants to calculate the SSA from an SMP profile they will need the whole profile. Storing the whole profile is impractical right now in the db due to upload time and a potential slowdown of db queries (a couple billion records). So instead we should offer the original file name in the metadata so that one could find the file at the NSIDC.
Additionally, the SMP ID is accidentally not being included in the upload due to using description instead of comments, so it is currently ignored.
To allow for more collaborations, we need the following at a minimum:
In the cheat sheet, the usage of ST_Union references the wrong module. This might cause some confusion.
"What I would like to be able to do with the UAVSAR data is to extract the coherence and phase change from all 4 polarizations, for a 10m area around a given location (say at a pit). That would be an ideal example if you get to it." - HP Marshall
Download all 4 or 5 sites of the Lidar data to add to the db.
This is a placeholder to add items to the cheat sheet as they come up:
.count()
Not sure what happened but pytest is failing only on GH and not locally. Looks to be more datetime issues.
RTD is failing on missing tags in markdown cells in a couple jupyter notebooks. See https://readthedocs.org/api/v2/build/13828170.txt
Please add the 2020 SnowEx SnowDepth from Snow Poles Dataset. The .csv file will be emailed separately.
NSIDC data set: https://nsidc.org/data/NSIDC-0768/versions/1
This data set consists of global, seasonal snow classifications determined from air temperature, precipitation, and wind speed climatologies. The classifications represent a climatological average for the 39-year period from 1981–2019.
Specifically for the snowexsql database, I recommend adding the North American downscaled rasters (tiffs, netCDF, and/or ascii) files at 300m, 1km, 5km, and 50km (see the NSIDC User's Guide for detail). The snow classifications (tundra, prairie, montane forest, ephemeral, etc.) will help users filter data from multiple SnowEx campaigns where the objectives were to study snow in different snow climate types.
The associated paper citation for this data set is Sturm, M., & Liston, G. E. (2021). Revisiting the global seasonal snow classification: An updated dataset for earth system applications. Journal of Hydrometeorology, 22(11), 2917-2938.
Here's the link to access snow-free LiDAR 1m DEM from Grand Mesa:
https://viewer.nationalmap.gov/basic/
Download the data for Grand Mesa and create a script to download it. Sounds like it will download almost all of the data for the area flown.
@hpmarshall would like the UAVSAR for all 13 sites added to the database. He has a matlab script I would like to rewrite in python to download them.
Use HWY 65 to confirm whether there is a vertical datum issue in our example showing how to calculate the snow depth maps from the two snow-off DEMs.
We need to upload the GPR two way travel time data (point data) to the database. This will require a script for uploading and whatever supporting metadata is needed.
Since I am not sure how to download from GDrive in a headless state like the EC2 instance, I figure it's a good time to move to scripts that pull the actual NSIDC data.
HP would like a comparison of all 30K point depth measurements to the QSI - USGS rasters and QSI - ASO rasters.
All points should be compared (across all dates) to the first flights (QSI 1) and then all points to the second flight (QSI 2).
The simplest way we can do this is to use ST_Value, which simply grabs the nearest pixel. In the other snow depth comparison we make an effort to recast/reshape the rasters to a common grid and resolution for matrix math.
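A sketch of that nearest-pixel approach, assuming the depths are PointData of type 'depth' and the difference raster lives in ImageData; the type strings and connection string are illustrative:

from sqlalchemy import func
from snowexsql.data import ImageData, PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Pair each measured depth with the raster pixel value at the same location.
qry = session.query(
    PointData.value,
    func.ST_Value(ImageData.raster, PointData.geom))
qry = qry.filter(PointData.type == 'depth')
# Only evaluate tiles that actually contain the point.
qry = qry.filter(func.ST_Intersects(func.ST_Envelope(ImageData.raster), PointData.geom))
pairs = qry.all()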
It was concluded yesterday during the sprint meeting for hack week that having more tools for extracting a dataset from the db into a file might be very helpful.
I am thinking:
A great suggestion by @meganmason: datasets collected by hand or by observation don't have instruments. As such they get listed as None. This should be Manual instead, to be clearer and easier to filter on.
HP asked for the HRRR data over GM to be uploaded.
@hpmarshall do you know what variables you want?
@hpmarshall wants an example of showing how to extract the following from all the pits across all time:
Tests are passing and failing with different tz offsets. We will need to look at this more closely.
There are new files included in the published pit data for Grand Mesa. The uploader script fails when encountering a couple of summary files.
The fix is to ignore any summary files in the add_profiles.py uploader.
It was proposed by Emilio at UW yesterday during their sprint meeting to have some kind of cheat sheet for people to use as a quick reference for the db. I think this is a great idea. We could make it in the docs and then have a way for people to print that page to PDF if need be.
We have the CSV from Tate with the Two Way Travel Time, Depth, and SWE derived from the GPR data, so now we just need to expand the script to add them in.
Querying the DB, I found that we are not assigning the surveyor to a few datasets.
It looks like the snow depth data could be updated simply by passing the surveyor kwarg to the uploader (see the sketch below).
The pit data is more complex. Site data is not currently being passed with the actual profiles: although it is uploaded to the site details table, the profiles never see the operators kwarg.
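A sketch of the snow depth fix, assuming the batch uploader forwards keyword arguments to each file's metadata; the class name, file name, and surveyor value are all assumptions for illustration:

from snowexsql.batch import UploadProfileBatch

# Hypothetical re-upload with the surveyor assigned up front.
b = UploadProfileBatch(['depths.csv'], surveyor='example surveyor', db_name='snowex')
b.push()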
During our snowex hackweek call it came up that the depths spiral example in the Jupyter notebooks gallery under docs is pulling all the points in the radius. Running it now would probably grab the GPR data too. We need to be more specific with that example.
After our first live run during hackweek we had a ton of issues with the database.
Still sorting through what else needs to be adjusted but this is the starting place. These eventually need to be added to the mod_conf.py script.
Due to the nature of the binary files, especially the grd files being in complex format, I would like to have a smaller dataset that stays in the repo to test against. This is a little tricky, as we have to extract the binary lines that represent the cropped area. So we need a script to produce this data and then store it in the db to run tests against.
Seems like there might be an issue installing psycopg2 for conda users. @mikedurand and @robbiemallett had this issue.
Mike was able to get past it with pip install psycopg2-binary
It's come up a couple of times how to distinguish datasets in the DB that are published to the NSIDC from those that are still in progress. I think we should add a DOI column for all data. If the column is None, then it is an anticipated dataset. If it has one, then we know it came from the NSIDC. @meganmason and I came up with the idea of having a script to update a list of anticipated/published datasets in the repo for hackweek users to reference, which I think would be very useful for project sessions.
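A sketch of how the proposed column could be used, assuming a doi field is added to the data tables (it does not exist yet):

from snowexsql.data import LayerData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Published NSIDC datasets would carry a DOI; in-progress ones would be None.
published = session.query(LayerData).filter(LayerData.doi.isnot(None))
anticipated = session.query(LayerData).filter(LayerData.doi.is_(None))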
Now that the UAVSAR data is completely available, finalize the interpretation, re-projection, and uploading to the DB.
The SMP CSVs are not being read correctly, which leads to missing location information. This has never happened before, because when we first uploaded the SMP data it was with the SMP log file to account for updates to the time issue in the files. That's since been fixed and we're no longer using the SMP log file.
What I did:
python3 add_smp.py
on the snowex server.
What happened:
snowexsql.batch INFO Accessing Database localhost/snowex
snowexsql.batch INFO Preparing to upload 958 files...
snowexsql.metadata INFO Interpreting metadata in /home/ubuntu/snowexsql/scripts/download/data/SNOWEX/SNEX20_SMP.001/csv_resampled/SNEX20_SMP_S06M0655_1N6_20200128.CSV
snowexsql.metadata DEBUG Found end of header at line 8...
snowexsql.metadata INFO Names to be uploaded as main data are: force
snowexsql.metadata DEBUG Column Data found to be 3 columns based on Line 7
snowexsql.metadata DEBUG Discovered 0 lines of valid header info.
Traceback (most recent call last):
File "add_smp.py", line 45, in <module>
main()
File "add_smp.py", line 38, in main
b.push()
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 223, in push
self._push_one(f, **meta)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 127, in _push_one
d = self.UploaderClass(f, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/upload.py", line 34, in __init__
self.hdr = DataHeader(profile_filename, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 377, in __init__
self.info = self.interpret_data(info)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 717, in interpret_data
info = add_geom(info, self.epsg)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/projection.py", line 72, in add_geom
epsg, info['easting'], info['northing']),
KeyError: 'easting'
In testing, I added a test and was able to reproduce the error locally.
Allow for rasters to be stored in S3 as Cloud Optimized GeoTIFFs.
Relevant docs:
https://www.crunchydata.com/blog/postgis-raster-and-crunchy-bridge
https://postgis.net/docs/using_raster_dataman.html#RT_Cloud_Rasters
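For the client side, a sketch of reading such a COG directly from S3 with rasterio; the bucket and key are hypothetical:

import rasterio

# GDAL's /vsis3/ handler issues HTTP range requests, so only the tiles
# overlapping the requested window are actually downloaded.
with rasterio.open('/vsis3/snowex-rasters/grand_mesa_dem.tif') as src:
    block = src.read(1, window=((0, 512), (0, 512)))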
After realizing that the SMP data would take 3 days to upload into the database, @hpmarshall decided to reduce the data to every 100th sample for now.
After walking through the whole process it occurred to me I am making too many assumptions about what the end user might know. We need a very simple notebook to encourage people to see the bare mechanics of the database.
There have been multiple instances of folks wanting a dataframe but not necessarily having a geom column, which the query_to_geopandas function requires (the resulting SQL column must be named geom) to work.
This can be solved with two approaches.
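For example (a sketch, assuming no geometry is needed at all), plain pandas can build a dataframe straight from the query:

import pandas as pd
from snowexsql.data import LayerData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

qry = session.query(LayerData.depth, LayerData.value)
# read_sql accepts the compiled statement, so no geom column is required.
df = pd.read_sql(qry.statement, engine)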
https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/master/requirements.txt has many optional dependencies that could be removed or use more relaxed pins to make installation easier in existing python environments.
One issue that arises from the current setup is that installing on a jupyterhub running jlab>=3 can pull in an incompatible version of jupyterlab:
https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/2fb72ec2dc5f48fd8bd2b207553f90a05100ff75/requirements.txt#L10
One option for all the visualization-related libraries is to define them as optional dependencies:
https://stackoverflow.com/questions/6237946/optional-dependencies-in-distutils-pip
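A sketch of what that could look like in setup.py; the package groupings here are illustrative:

from setuptools import setup

setup(
    name='snowexsql',
    install_requires=['sqlalchemy', 'geoalchemy2', 'pandas'],
    # Visualization extras install only with: pip install snowexsql[viz]
    extras_require={'viz': ['matplotlib', 'jupyterlab']},
)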
For pinning specific versions for a consistent development environment, consider using https://python-poetry.org or https://pip.pypa.io/en/stable/reference/pip_freeze/ or https://github.com/conda-incubator/conda-lock to generate lock files
Right now we test 3.6, 3.7, 3.8, 3.9 on ubuntu latest. But @hpmarshall has been having issues on mac and during hackweek some folks in the projects had issues on windows. Github actions allows us to add operating systems to our builds.
@micah-prime found this issue using the following query.
from snowexsql.data import LayerData
from snowexsql.db import get_db
from snowexsql.conversions import query_to_geopandas
from datetime import date

# Connect to the database (name taken from the standard localhost setup)
db_name = 'localhost/snowex'
engine, session = get_db(db_name)

# Pull the density profile for pit 6N16 on Feb 8, 2020
qry = session.query(LayerData)
qry = qry.filter(LayerData.date == date(2020, 2, 8))
qry = qry.filter(LayerData.site_id == "6N16")
qry = qry.filter(LayerData.type == 'density')
df = query_to_geopandas(qry, engine)
This produces a data frame that has the density profile twice: one copy has the value AND samples A, B, C, and the other has only the value.