snowex / snowexsql

A project to access the database holding data from the NASA SnowEx campaign

License: Other
Once the lidar datasets are in the db, we should make an example differencing them for comparison. This would make a great notebook.
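A minimal sketch of what that differencing could look like with rasterio; the file names are hypothetical and the two rasters are assumed to already share a grid and CRS:

import rasterio

# Hypothetical file names; the actual lidar rasters would come from the db.
with rasterio.open('qsi_flight1_dem.tif') as first, \
     rasterio.open('qsi_flight2_dem.tif') as second:
    # Assumes both rasters share the same grid and CRS; in practice they
    # may need to be resampled to a common grid first.
    diff = second.read(1) - first.read(1)
    profile = first.profile

with rasterio.open('dem_difference.tif', 'w', **profile) as out:
    out.write(diff, 1)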
Download the 2016 Data from the NSIDC and upload to the database
To add a new layer or point data type, the metadata parser must be able to pick out the column in a CSV. We need to add documentation describing this, which basically consists of documenting how to add a name to: https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/b4a0fb2baadedcd96fa95275c3d2262c69ed0cf4/snowxsql/metadata.py#L390
The following things need to be fixed with the onset of the new, broader database:
site_name == 'Grand Mesa'
ST_Transform, given that there are now multiple projections
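For the ST_Transform item above, a sketch of transforming geometries on the fly in a query via sqlalchemy/PostGIS; the target EPSG, connection string, and the assumption that PointData carries site_name are all illustrative:

from sqlalchemy import func
from snowexsql.data import PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Transform every geometry to a single CRS (EPSG 4326 here, as an example)
# so results stored in multiple UTM zones can be compared in one query.
qry = session.query(func.ST_AsText(func.ST_Transform(PointData.geom, 4326)))
qry = qry.filter(PointData.site_name == 'Grand Mesa')
results = qry.limit(10).all()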
Megan has provided an example of what the met data will look like. We need a script to upload this, so when the final data is ready they can use it as a template for uploading.
In trying to install snowexsql from the command line, installation failed due to "A GDAL API version must be specified..". However, this issue was solved by installing fiona. Upon updating the requirements.txt for the microstructure project, everything installs when fiona is a dependency.
Each file has a polarization associated with it, which is noted in the annotation file. We simply need to include this in the comments of the data.
Hi Micahs,
I think it would be great to also add the CSU and UNM GPR datasets. These are really complementary with Tate's BSU dataset. Links below.
https://nsidc.org/data/SNEX20_GM_CSU_GPR/versions/1
https://nsidc.org/data/SNEX20_UNM_GPR/versions/1
Thanks!
Ryan
As pointed out by @dshean, the elevation data recorded in the pit site data uses the WGS84 ellipsoid. The Grand Mesa DEM from the USGS is in NAVD88. The snow depths dataset also has an elevation column which is not adjusted. I don't have a preference here, but they should all be the same.
This should also be documented.
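For illustration, a sketch of converting an ellipsoid height to NAVD88 with pyproj; the coordinates are made up, and the conversion needs the PROJ geoid grids available:

from pyproj import Transformer

# WGS84 3D (ellipsoid heights) -> NAVD88 orthometric heights.
# Requires the US geoid grid files from a full PROJ data install.
transformer = Transformer.from_crs('EPSG:4979', 'EPSG:4326+5703', always_xy=True)

lon, lat, ellipsoid_m = -108.2, 39.03, 3100.0  # illustrative point on Grand Mesa
lon, lat, navd88_m = transformer.transform(lon, lat, ellipsoid_m)
print(f'NAVD88 elevation: {navd88_m:.2f} m')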
The credentials addition I made a couple of days ago broke the install script on ubuntu. Looks like the role names I chose don't exist at the time of creating the databases so postgres doesn't know who they belong to.
There are a few instances of PointsData vs. PointData that might cause confusion. See the Querying table for the .limit() and .count() code snippets in the Cheat Sheet.
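For reference, the corrected usage with the singular class name; the connection string is an assumption:

from snowexsql.data import PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# The class is PointData (singular), matching snowexsql.data.
n_points = session.query(PointData).count()
first_ten = session.query(PointData).limit(10).all()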
As pointed out by Michael Durand, if a user wants to calculate the SSA from an SMP profile they will need the whole profile. Storing the whole profile is impractical right now in the db due to upload time and a potential slowdown of db queries (a couple billion records). So instead we should offer the original file name in the metadata so that one could find the file at the NSIDC.
Additionally, the SMP ID is accidentally not being included in the upload due to using description instead of comments, so it is currently ignored.
To allow for more collaborations, we need the following at a minimum:
In the cheat sheet, the usage of ST_Union references the wrong module. This might cause some confusion.
"What I would like to be able to do with the UAVSAR data is to extract the coherence and phase change from all 4 polarizations, for a 10m area around a given location (say at a pit). That would be an ideal example if you get to it." - HP Marshall
Download all 4 or 5 sites of the Lidar data to add to the db.
This is a placeholder to add items to the cheat sheet as they come up:
.count()
Not sure what happened but pytest is failing only on GH and not locally. Looks to be more datetime issues.
RTD is failing on missing tags in markdown cells in a couple jupyter notebooks. See https://readthedocs.org/api/v2/build/13828170.txt
Please add the 2020 SnowEx SnowDepth from Snow Poles Dataset. The .csv file will be emailed separately.
NSIDC data set: https://nsidc.org/data/NSIDC-0768/versions/1
This data set consists of global, seasonal snow classifications determined from air temperature, precipitation, and wind speed climatologies. The classifications represent a climatological average for the 39-year period from 1981–2019.
Specifically for the snowexsql database, I recommend adding the North American downscaled rasters (tiffs, netCDF, and/or ascii) files at 300m, 1km, 5km, and 50km (see the NSIDC User's Guide for detail). The snow classifications (tundra, prairie, montane forest, ephemeral, etc.) will help users filter data from multiple SnowEx campaigns where the objectives were to study snow in different snow climate types.
The associated paper citation for this data set is Sturm, M., & Liston, G. E. (2021). Revisiting the global seasonal snow classification: An updated dataset for earth system applications. Journal of Hydrometeorology, 22(11), 2917-2938.
Here's the link to access snow-free LiDAR 1m DEM from Grand Mesa:
https://viewer.nationalmap.gov/basic/
Download the data for Grand Mesa and create a script to download it. Sounds like it will download almost all of the data for the area flown.
@hpmarshall would like the UAVSAR for all 13 sites added to the database. He has a matlab script I would like to rewrite in python to download them.
Use HWY 65 to confirm whether there is a vertical datum issue in our example showing how to calculate the snow depth maps from the two snow-off DEMs.
We need to upload the GPR two way travel time data (point data) to the database. This will require a script for uploading and whatever supporting metadata is needed.
Since I am not sure how to download from GDrive in a headless state like the EC2 instance, I figure it's a good time to move to scripts that pull the actual NSIDC data.
HP would like a comparison of all 30K point depth measurements to the QSI - USGS rasters and QSI - ASO rasters.
All points should be compared (across all dates) to the first flights (QSI 1) and then all points to the second flight (QSI 2).
The simplest way we can do this is to use ST_Value, which simply grabs the nearest pixel. In the other snow depth comparison we make an effort to recast/reshape the rasters to a common grid and resolution for matrix math.
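A sketch of that nearest-pixel approach, assuming the depths are PointData of type 'depth' and the difference raster lives in ImageData; the type strings and connection string are illustrative:

from sqlalchemy import func
from snowexsql.data import ImageData, PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Pair each measured depth with the raster pixel value at the same location.
qry = session.query(
    PointData.value,
    func.ST_Value(ImageData.raster, PointData.geom))
qry = qry.filter(PointData.type == 'depth')
# Only evaluate tiles that actually contain the point.
qry = qry.filter(func.ST_Intersects(func.ST_Envelope(ImageData.raster), PointData.geom))
pairs = qry.all()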
It was concluded yesterday during the sprint meeting for hack week that having more tools for extracting a dataset from the db into a file might be very helpful.
I am thinking:
A great suggestion by @meganmason: datasets collected by hand or by observation don't have instruments. As such they get listed as None. This should be Manual instead, to be clearer and easier to filter on.
HP asked for the HRRR data over GM to be uploaded.
@hpmarshall do you know what variables you want?
@hpmarshall wants an example of showing how to extract the following from all the pits across all time:
Tests are passing and failing with different tz offsets. We will need to look at this more closely.
There are new files included in the published pit data for Grand Mesa. The uploader script fails when encountering a couple of summary files.
The fix is to ignore any summary files in the add_profiles.py uploader.
It was proposed by Emilio at UW yesterday during their sprint meeting to have some kind of cheat sheet for people to use as a quick reference for the db. I think this is a great idea. We could make it in the docs and then have a way for people to print that page to PDF if need be.
We have the CSV from Tate with the Two Way Travel Time, Depth, and SWE derived from the GPR data, so now we just need to expand the script to add them in.
Querying the DB, I found that we are not assigning the surveyor to a few datasets.
It looks like the snow depth data could be updated simply by passing the surveyor kwarg to the uploader (see the sketch below).
The pit data is more complex. Site data is not currently being passed with the actual profiles: although it is uploaded to the site details table, the profiles never see the operators kwarg.
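A sketch of the snow depth fix, assuming the batch uploader forwards keyword arguments to each file's metadata; the class name, file name, and surveyor value are all assumptions for illustration:

from snowexsql.batch import UploadProfileBatch

# Hypothetical re-upload with the surveyor assigned up front.
b = UploadProfileBatch(['depths.csv'], surveyor='example surveyor', db_name='snowex')
b.push()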
During our snowex hackweek call it came up that the depths spiral example in the Jupyter notebooks gallery under docs is pulling all the points in the radius. Running it now would probably grab the GPR data too. We need to be more specific with that example.
After our first live run during hackweek we had a ton of issues with the database.
Still sorting through what else needs to be adjusted but this is the starting place. These eventually need to be added to the mod_conf.py script.
Due to the nature of the binary files, especially the grd files being in complex format, I would like to have a smaller dataset that stays in the repo to test against. This is a little tricky, as we have to extract the binary lines that represent the cropped area. So we need a script to produce this data and then store it in the db to run tests against.
Seems like there might be an issue installing psycopg2 for conda users. @mikedurand and @robbiemallett had this issue.
Mike was able to get past it with pip install psycopg2-binary
It's come up a couple of times how to distinguish datasets in the DB that are published to the NSIDC from those that are still in progress. I think we should add a DOI column for all data. If the column is None, then it is an anticipated dataset. If it has one, then we know it came from the NSIDC. @meganmason and I came up with the idea of having a script to update a list of anticipated/published datasets in the repo for hackweek users to reference, which I think would be very useful for project sessions.
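A sketch of how the proposed column could be used, assuming a doi field is added to the data tables (it does not exist yet):

from snowexsql.data import LayerData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Published NSIDC datasets would carry a DOI; in-progress ones would be None.
published = session.query(LayerData).filter(LayerData.doi.isnot(None))
anticipated = session.query(LayerData).filter(LayerData.doi.is_(None))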
Now that the UAVSAR data is completely available, finalize the interpretation, re-projection, and uploading to the DB.
The SMP CSVs are not being read correctly, which leads to missing location information. This has never happened before, because when we first uploaded the SMP data it was with the SMP log file to account for updates to the time issue in the files. That's since been fixed and we're no longer using the SMP log file.
What I did:
python3 add_smp.py
on the snowex server.
What happened:
snowexsql.batch INFO Accessing Database localhost/snowex
snowexsql.batch INFO Preparing to upload 958 files...
snowexsql.metadata INFO Interpreting metadata in /home/ubuntu/snowexsql/scripts/download/data/SNOWEX/SNEX20_SMP.001/csv_resampled/SNEX20_SMP_S06M0655_1N6_20200128.CSV
snowexsql.metadata DEBUG Found end of header at line 8...
snowexsql.metadata INFO Names to be uploaded as main data are: force
snowexsql.metadata DEBUG Column Data found to be 3 columns based on Line 7
snowexsql.metadata DEBUG Discovered 0 lines of valid header info.
Traceback (most recent call last):
File "add_smp.py", line 45, in <module>
main()
File "add_smp.py", line 38, in main
b.push()
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 223, in push
self._push_one(f, **meta)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 127, in _push_one
d = self.UploaderClass(f, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/upload.py", line 34, in __init__
self.hdr = DataHeader(profile_filename, **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 377, in __init__
self.info = self.interpret_data(info)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 717, in interpret_data
info = add_geom(info, self.epsg)
File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/projection.py", line 72, in add_geom
epsg, info['easting'], info['northing']),
KeyError: 'easting'
In testing, I added a test and was able to reproduce the error locally.
Allow for rasters to be stored in S3 as Cloud Optimized GeoTIFFs.
Relevant docs:
https://www.crunchydata.com/blog/postgis-raster-and-crunchy-bridge
https://postgis.net/docs/using_raster_dataman.html#RT_Cloud_Rasters
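For the client side, a sketch of reading such a COG directly from S3 with rasterio; the bucket and key are hypothetical:

import rasterio

# GDAL's /vsis3/ handler issues HTTP range requests, so only the tiles
# overlapping the requested window are actually downloaded.
with rasterio.open('/vsis3/snowex-rasters/grand_mesa_dem.tif') as src:
    block = src.read(1, window=((0, 512), (0, 512)))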
After realizing that the SMP data would take 3 days to upload into the database, @hpmarshall decided to reduce the data to every 100th sample for now.
After walking through the whole process it occurred to me I am making too many assumptions about what the end user might know. We need a very simple notebook to encourage people to see the bare mechanics of the database.
There have been multiple instances of folks wanting a dataframe but not necessarily having a geom column, which the query_to_geopandas function requires (the resulting SQL column must be named geom) to work.
This can be solved with two approaches.
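For example (a sketch, assuming no geometry is needed at all), plain pandas can build a dataframe straight from the query:

import pandas as pd
from snowexsql.data import LayerData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

qry = session.query(LayerData.depth, LayerData.value)
# read_sql accepts the compiled statement, so no geom column is required.
df = pd.read_sql(qry.statement, engine)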
https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/master/requirements.txt has many optional dependencies that could be removed or use more relaxed pins to make installation easier in existing python environments.
One issue that arises from the current setup is that installing on a jupyterhub running jlab>=3 can pull in an incompatible version of jupyterlab:
https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/2fb72ec2dc5f48fd8bd2b207553f90a05100ff75/requirements.txt#L10
One option for all the visualization-related libraries is to define them as optional dependencies:
https://stackoverflow.com/questions/6237946/optional-dependencies-in-distutils-pip
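A sketch of what that could look like in setup.py; the package groupings here are illustrative:

from setuptools import setup

setup(
    name='snowexsql',
    install_requires=['sqlalchemy', 'geoalchemy2', 'pandas'],
    # Visualization extras install only with: pip install snowexsql[viz]
    extras_require={'viz': ['matplotlib', 'jupyterlab']},
)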
For pinning specific versions for a consistent development environment, consider using https://python-poetry.org or https://pip.pypa.io/en/stable/reference/pip_freeze/ or https://github.com/conda-incubator/conda-lock to generate lock files
Right now we test 3.6, 3.7, 3.8, 3.9 on ubuntu latest. But @hpmarshall has been having issues on mac and during hackweek some folks in the projects had issues on windows. Github actions allows us to add operating systems to our builds.
@micah-prime found this issue using the following query.
from snowexsql.data import LayerData
from snowexsql.db import get_db
from snowexsql.conversions import query_to_geopandas
from datetime import date

# Connect to the database (name taken from the standard localhost setup)
db_name = 'localhost/snowex'
engine, session = get_db(db_name)

# Pull the density profile for pit 6N16 on Feb 8, 2020
qry = session.query(LayerData)
qry = qry.filter(LayerData.date == date(2020, 2, 8))
qry = qry.filter(LayerData.site_id == "6N16")
qry = qry.filter(LayerData.type == 'density')
df = query_to_geopandas(qry, engine)
This produces a data frame that has the density profile twice: one copy has the value AND samples A, B, C, and the other has only the value.