
snowexsql's People

Contributors

dependabot[bot], hpmarshall, jomey, kschwab, micah-prime, micahjohnson150


snowexsql's Issues

Updates for the new database

The following things need to be fixed with the rollout of the new, broader database:

  • Update the docs about the spatial references and timezones of the database
  • Update the examples to limit queries to site_name == 'Grand Mesa'
  • Add an example of ST_Transform now that there are multiple projections (see the sketch below)
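
A minimal sketch of what that example could look like, assuming the PointData table and the connection string used elsewhere in these issues; the target EPSG (26912, UTM Zone 12N) is illustrative:

from sqlalchemy import func
from snowexsql.data import PointData
from snowexsql.db import get_db

# Connect to the database
engine, session = get_db('localhost/snowex')

# Reproject every Grand Mesa point geometry to UTM Zone 12N (EPSG:26912)
# on the database side before returning it
qry = session.query(func.ST_Transform(PointData.geom, 26912))
qry = qry.filter(PointData.site_name == 'Grand Mesa')
results = qry.limit(10).all()
session.close()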

Add met data to the database

Megan has provided an example of what the met data will look like. We need a script to upload it so that, when the final data are ready, it can serve as a template for uploading them.

requirements.txt --> fiona?

While trying to install snowexsql from the command line, the installation failed with "A GDAL API version must be specified..."; however, the issue was solved by installing fiona. After updating requirements.txt for the microstructure project, everything installs when fiona is a dependency.

Standardize the vertical datum across datasets

As pointed out by @dshean, the elevation data recorded in the pit site data uses the WGS84 ellipsoid, while the Grand Mesa DEM from the USGS is in NAVD88. The snow depths dataset also has an elevation column which is not adjusted. I don't have a preference here, but they should all be the same.

This should also be documented.
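
A minimal sketch of one way to make the adjustment with pyproj, assuming PROJ has the relevant geoid grids installed; the coordinates are hypothetical:

from pyproj import Transformer

# WGS84 3D (EPSG:4979) -> WGS84 horizontal + NAVD88 heights (EPSG:5703)
transformer = Transformer.from_crs('EPSG:4979', 'EPSG:4326+5703', always_xy=True)

lon, lat, h_ellipsoid = -108.2, 39.03, 3100.0  # hypothetical Grand Mesa point
lon, lat, h_navd88 = transformer.transform(lon, lat, h_ellipsoid)
print(f'NAVD88 elevation: {h_navd88:.2f} m')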

Credential mods broke the install script

The credentials addition I made a couple of days ago broke the install script on Ubuntu. It looks like the role names I chose don't exist at the time the databases are created, so Postgres doesn't know who they belong to.

Typo in Cheat Sheet

There are a few instances of PointsData vs. PointData that might cause confusion. See the Querying table in the Cheat Sheet for the .limit() and .count() code snippets.
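
For reference, the corrected snippets should presumably read along these lines (PointData, not PointsData):

from snowexsql.data import PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Count all the point records
n = session.query(PointData).count()

# Return only the first ten records
first_ten = session.query(PointData).limit(10).all()
session.close()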

Improve SMP metadata for provenance

As pointed out by Michael Durand, if a user wants to calculate SSA from an SMP profile they will need the whole profile. Storing whole profiles in the db is impractical right now due to upload time and a potential slowdown of db queries (a couple billion records). So instead we should store the original file name in the metadata so that the file can be found at NSIDC.

Additionally, the SMP ID is accidentally not being included in the upload because description was used instead of comments, so it is currently ignored.

Add CI to snowexsql

To allow for more collaboration, we need the following at a minimum:

  1. Automatic documentation building via Read the Docs
  2. pytest runs via GitHub Actions

Add Snow Classification data set (Sturm & Liston, 2021)

NSIDC data set: https://nsidc.org/data/NSIDC-0768/versions/1

This data set consists of global, seasonal snow classifications determined from air temperature, precipitation, and wind speed climatologies. The classifications represent a climatological average for the 39-year period from 1981–2019.

Specifically for the snowexsql database, I recommend adding the North American downscaled rasters (GeoTIFF, NetCDF, and/or ASCII files) at 300 m, 1 km, 5 km, and 50 km (see the NSIDC User's Guide for details). The snow classifications (tundra, prairie, montane forest, ephemeral, etc.) will help users filter data from the multiple SnowEx campaigns whose objectives were to study snow in different snow climate types.

The associated paper citation for this data set is: Sturm, M., & Liston, G. E. (2021). Revisiting the global seasonal snow classification: An updated dataset for Earth system applications. Journal of Hydrometeorology, 22(11), 2917–2938.

Upload script for GPR data

We need to upload the GPR two-way travel time data (point data) to the database. This will require an upload script and whatever supporting metadata is needed.

Compare all point to raster snow depths

HP would like a comparison of all 30K point depth measurements to the QSI − USGS rasters and the QSI − ASO rasters.

All points should be compared (across all dates) to the first flight (QSI 1), and then all points to the second flight (QSI 2).

The simplest way we can do this is to use ST_Value, which simply grabs the nearest pixel. In the other snow depth comparison we make an effort to resample the rasters to a common grid and resolution for matrix math.
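
A minimal sketch of the ST_Value approach, assuming snowexsql's ImageData table holds the rasters in a raster column; the filters are illustrative:

from sqlalchemy import func
from snowexsql.data import ImageData, PointData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Pair each measured depth with the value of the nearest raster pixel
qry = session.query(
    PointData.value,
    func.ST_Value(ImageData.raster, PointData.geom))

# Only sample raster tiles that actually contain the point
qry = qry.filter(func.ST_Intersects(func.ST_Envelope(ImageData.raster), PointData.geom))
qry = qry.filter(PointData.type == 'depth')
results = qry.limit(100).all()
session.close()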

Add more abstracted functions/tools to make accessing data easier

It was concluded yesterday during the sprint meeting for hackweek that more tools for extracting a dataset from the db into a file would be very helpful. I am thinking:

  1. Download a raster to file, with a shapefile to crop it
  2. Download the points that land inside a shapefile (see the sketch below)
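
A minimal sketch of idea 2, using the existing query_to_geopandas helper; the shapefile path is hypothetical:

import geopandas as gpd
from sqlalchemy import func
from snowexsql.data import PointData
from snowexsql.db import get_db
from snowexsql.conversions import query_to_geopandas

engine, session = get_db('localhost/snowex')

# Read the area of interest and hand it to PostGIS as WKT
boundary = gpd.read_file('area_of_interest.shp')  # hypothetical shapefile
shape = boundary.geometry.unary_union.wkt
epsg = boundary.crs.to_epsg()

# Keep only the points that fall inside the boundary
qry = session.query(PointData).filter(
    func.ST_Within(PointData.geom, func.ST_GeomFromText(shape, epsg)))
df = query_to_geopandas(qry, engine)
session.close()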

Make a cheat sheet to help Hack Week users

It was proposed yesterday by Emilio at UW during their sprint meeting to have some kind of cheat sheet for people to use as a quick reference for the db. I think this is a great idea. We could put it in the docs and provide a way for people to print that page to PDF if need be.

Surveyor missing in CSV uploads

Querying the DB, I found that we are not assigning the surveyor to a few datasets:

  • depth from mesa
  • depth from magnaprobe
  • depth from pit ruler
  • force from snowmicropen
  • density from LayerData
  • permittivity from LayerData
  • manual_wetness from LayerData
  • grain_type from LayerData
  • grain_size from LayerData
  • hand_hardness from LayerData
  • temperature from LayerData
  • lwc_vol from LayerData

It looks like the snow depth data could simply be updated by passing the surveyor kwarg to the uploader.

The pit data is more complex. The site data is not currently being passed with the actual profiles; although it is uploaded to the site details, the profiles never see the operators kwarg.
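
A minimal sketch of the snow depth fix, assuming the batch uploader forwards keyword arguments as record metadata (UploadProfileBatch is an assumed entry point; the real upload scripts may differ):

from snowexsql.batch import UploadProfileBatch  # assumed class name

filenames = ['depths.csv']  # hypothetical input files

# Pass the surveyor explicitly so it is attached to every record
b = UploadProfileBatch(filenames, surveyor='Surveyor Name',  # hypothetical value
                       db_name='localhost/snowex')
b.push()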

get_db and postgres.conf need adjustments

After our first live run during hackweek we had a ton of issues with the database.

  • max_connections was 100
  • tcp_keepalives_idle was 0

I'm still sorting through what else needs to be adjusted, but this is the starting place. These changes eventually need to be added to the mod_conf.py script.
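
A minimal sketch of the kind of change mod_conf.py could apply; the values and conf path are illustrative, not tuned recommendations:

# Bump connection settings in postgresql.conf
settings = {
    'max_connections': '300',     # the default 100 was exhausted during hackweek
    'tcp_keepalives_idle': '60',  # 0 let dead connections linger
}

conf_path = '/etc/postgresql/12/main/postgresql.conf'  # assumed location

with open(conf_path) as fp:
    lines = fp.readlines()

with open(conf_path, 'w') as fp:
    for line in lines:
        key = line.split('=')[0].replace('#', '').strip()
        if key in settings:
            line = f"{key} = {settings[key]}\n"
        fp.write(line)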

Make a small test dataset for the UAVSAR data

Due to the nature of the binary files, especially the grd files being in complex format, I would like to have a smaller dataset that stays in the repo to test against. This is a little tricky, as we have to extract the binary lines that represent the cropped area. So we need a script to produce this data, which we can then store in the db to run tests against.
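
A minimal sketch of the cropping step, assuming the .grd file is flat binary complex64 and the scene width comes from the annotation (.ann) file; all the numbers are hypothetical:

import numpy as np

ncols = 4950  # hypothetical scene width read from the .ann file
data = np.fromfile('uavsar.grd', dtype=np.complex64).reshape(-1, ncols)

# Extract the rows/columns covering the small test area
subset = data[100:150, 200:260]  # hypothetical crop window
subset.tofile('uavsar_test.grd')  # small fixture that can live in the repo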

Add DOI to the DB

It's come up a couple of times how to distinguish datasets in the DB that are published at NSIDC from those that are still in progress. I think we should add a DOI column for all data: if the column is None, the dataset is anticipated; if it has a value, we know it came from NSIDC. @meganmason and I came up with the idea of a script that updates a list of anticipated/published datasets in the repo for hackweek users to reference, which I think would be very useful for project sessions.
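
A minimal sketch of how the proposed column could be used, assuming a nullable doi column is added to the data tables:

from snowexsql.data import LayerData
from snowexsql.db import get_db

engine, session = get_db('localhost/snowex')

# Datasets already published at NSIDC vs. those still anticipated
published = session.query(LayerData.type).filter(LayerData.doi.isnot(None)).distinct().all()
anticipated = session.query(LayerData.type).filter(LayerData.doi.is_(None)).distinct().all()
session.close()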

Metadata doesn't properly read SMP header

The SMP CSVs are not being read correctly, which leads to missing location information. This never happened before because when we first uploaded the SMP data it was with the SMP log file, to account for the time issue in the files. That's since been fixed and we're no longer using the SMP log file.

What I did:

  • Ran python3 add_smp.py on the snowex server

What happened:

snowexsql.batch INFO Accessing Database localhost/snowex
snowexsql.batch INFO Preparing to upload 958 files...
snowexsql.metadata INFO Interpreting metadata in /home/ubuntu/snowexsql/scripts/download/data/SNOWEX/SNEX20_SMP.001/csv_resampled/SNEX20_SMP_S06M0655_1N6_20200128.CSV
snowexsql.metadata DEBUG Found end of header at line 8...
snowexsql.metadata INFO Names to be uploaded as main data are: force
snowexsql.metadata DEBUG Column Data found to be 3 columns based on Line 7
snowexsql.metadata DEBUG Discovered 0 lines of valid header info.
Traceback (most recent call last):
  File "add_smp.py", line 45, in <module>
    main()
  File "add_smp.py", line 38, in main
    b.push()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 223, in push
    self._push_one(f, **meta)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/batch.py", line 127, in _push_one
    d = self.UploaderClass(f, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/upload.py", line 34, in __init__
    self.hdr = DataHeader(profile_filename, **kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 377, in __init__
    self.info = self.interpret_data(info)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/metadata.py", line 717, in interpret_data
    info = add_geom(info, self.epsg)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/snowexsql-0.1.0-py3.8.egg/snowexsql/projection.py", line 72, in add_geom
    epsg, info['easting'], info['northing']),
KeyError: 'easting'

I added a test and was able to reproduce this locally.
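
A minimal sketch of such a regression test, assuming DataHeader (seen in the traceback above) and a resampled SMP CSV checked in as a fixture:

from snowexsql.metadata import DataHeader

def test_smp_header_finds_position():
    # The header must yield coordinates so add_geom() can build a geometry
    hdr = DataHeader('SNEX20_SMP_S06M0655_1N6_20200128.CSV')  # test fixture
    assert 'easting' in hdr.info
    assert 'northing' in hdr.info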

Not all queries have a geom column to make a GeoDataFrame

There have been multiple instances of folks wanting a dataframe without necessarily having a geom column, which the query_to_geopandas function requires (the resulting SQL column must be named geom) to work.

This could be solved with two approaches:

  1. Add a kwarg pass-through for the geometry column name, for columns that contain geometry but are not named geom
  2. Add a new function called query_to_pandas to get a dataframe when no geometry is available (see the sketch below)
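
A minimal sketch of idea 2, mirroring query_to_geopandas but with no geometry requirement:

import pandas as pd

def query_to_pandas(query, engine):
    """Convert a SQLAlchemy query into a plain DataFrame (no geom needed)."""
    # Render the query to raw SQL and let pandas execute it
    sql = query.statement.compile(compile_kwargs={'literal_binds': True})
    return pd.read_sql(str(sql), engine)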

Reduce dependencies and relax pins for easier installation

https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/master/requirements.txt has many optional dependencies that could be removed, or given more relaxed pins, to make installation easier in existing Python environments.

One issue that arises from the current setup is that installing on a JupyterHub running jlab>=3 can pull in an incompatible version of jupyterlab:
https://github.com/hpmarshall/SnowEx2020_SQLcode/blob/2fb72ec2dc5f48fd8bd2b207553f90a05100ff75/requirements.txt#L10
One option for all the visualization-related libraries is to define them as optional dependencies:
https://stackoverflow.com/questions/6237946/optional-dependencies-in-distutils-pip

For pinning specific versions for a consistent development environment, consider using https://python-poetry.org, https://pip.pypa.io/en/stable/reference/pip_freeze/, or https://github.com/conda-incubator/conda-lock to generate lock files.
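
A minimal sketch of the optional-dependency route in setup.py; the package lists are illustrative:

from setuptools import setup

setup(
    name='snowexsql',
    version='0.1.0',
    install_requires=['sqlalchemy', 'geoalchemy2', 'pandas'],  # illustrative core deps
    # Visualization tools only install with `pip install snowexsql[viz]`
    extras_require={'viz': ['matplotlib', 'jupyterlab>=3']},
)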

Increase OS build coverage

Right now we test Python 3.6, 3.7, 3.8, and 3.9 on ubuntu-latest, but @hpmarshall has been having issues on macOS, and during hackweek some folks in the projects had issues on Windows. GitHub Actions allows us to add operating systems to our builds.

Duplicate density profile data?

@micah-prime found this issue using the following query.

from datetime import date

from snowexsql.data import LayerData
from snowexsql.db import get_db
from snowexsql.conversions import query_to_geopandas

# Connect to the database (db_name was undefined in the original snippet;
# 'localhost/snowex' matches the server logs above)
engine, session = get_db('localhost/snowex')

# Grab the density profile at site 6N16 on 2020-02-08
qry = session.query(LayerData)
qry = qry.filter(LayerData.date == date(2020, 2, 8))
qry = qry.filter(LayerData.site_id == '6N16')
qry = qry.filter(LayerData.type == 'density')

df = query_to_geopandas(qry, engine)

This produces a dataframe that has the density profile twice: one copy has the value and samples A, B, C, and the other has only the value.
