Comments (21)
We don't have a "template". I added some thoughts on how the API could/should look in the wiki: https://github.com/oceanmodeling/searvey/wiki/API-design
but feel free to open a new ticket to discuss this further.
from searvey.
I understand the point, but I wonder if we should bring this to Timothy's attention first (with an issue on dataretrieval) and see what he has to say. Having said that, I leave it up to you guys.
I think it would be better to do what you suggest. I already created an issue: DOI-USGS/dataretrieval-python#59. Only two of us were present at the last meeting, so I just wanted to relay what was discussed. I haven't implemented anything for USGS yet.
@mroberge, Thanks for mentioning HyRiver. As Martin said, PyGeoHydro includes a class called NWIS that provides access to several NWIS endpoints (you can check out this example notebook). Also, I developed robust and performant engines for working with web services (AsyncRetriever and PyGeoOGC), so feel free to explore them and let me know if you need any help.
@cheginit I learned about your toolset a couple of weeks ago while working on a different project. Your software stack is very impressive and useful; however, since searvey is focused on giving access to the original data from the source at the lowest level, it makes more sense to use minimal packages like dataretrieval. That being said, I'm looking forward to using your software stack in other projects.
thanks!
Thanks to @flackdl for sharing the location of the latest file:
https://github.com/flackdl/cwwed/blob/ad39f0e9bea6a0a3bdbc937fea41994f4ed359ba/scripts/usgs.py
Great, I've made a first draft of the implementation here:
https://github.com/oceanmodeling/StormEvents/blob/7054095b4cb54ac733ea40091a5a2ffa1210c50b/stormevents/usgs/events.py#L313-L375
Thanks @zacharyburnettNOAA. See the email I just sent to Danny.
@SorooshMani-NOAA provided some input via email. I repost here for completeness:
Today I noticed this package on GitHub: https://github.com/USGS-python/dataretrieval
I was wondering if this retrieves the same data that you were interested in or if there's another USGS database that you'd like to query?
This one seems to have the following data available for retrieval:
- instantaneous values (iv)
- daily values (dv)
- statistics (stat)
- site info (site)
- discharge peaks (peaks)
- discharge measurements (measurements)
- water quality samples (qwdata)
which seems to be what the Water Services REST API provides:
https://waterservices.usgs.gov/rest/
George, if this is the same database that Jack is interested in, does it make sense to add a "normalization" wrapper on top of the dataretrieval package, or should searvey use the REST API directly?
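For context on the "use the REST API directly" option: the services listed above are plain HTTP GET endpoints, so a request can be formed with nothing but the standard library. A minimal sketch (the station number and parameter code below are illustrative placeholders, not from this thread):

```python
# Sketch: forming a request URL for the NWIS "instantaneous values" (iv)
# service by hand, without the dataretrieval package.
from urllib.parse import urlencode

BASE_URL = "https://waterservices.usgs.gov/nwis/iv/"

def build_iv_url(site: str, parameter_cd: str, start: str, end: str) -> str:
    """Return a GET URL for the iv service, requesting JSON output."""
    params = {
        "format": "json",
        "sites": site,
        "parameterCd": parameter_cd,
        "startDT": start,
        "endDT": end,
    }
    return f"{BASE_URL}?{urlencode(params)}"

# Illustrative call: gage height (00065) for a made-up query window.
url = build_iv_url("01646500", "00065", "2022-12-01", "2022-12-02")
```

The dv/stat/site/peaks services follow the same pattern with different endpoint paths.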
I looked a bit into dataretrieval and it looks good. It already has users, they are considering a conda package (see issue 44 therein), and the lead developer works for USGS, which is beneficial for updates and access.
If it exposes all the data, then we can write a wrapper and use it as an upstream dependency.
We can also invite Timothy Hodson to a meeting to discuss it.
Documenting relevant email between me and @Rjialceky (slightly modified):
[...] CSDL [...] is interested in the following observations in support of the coastal application teams modeling work for NOAA products and services:
- Surface water level
- Water level datums, relative and geodetic observations
- Water temperature
- Water salinity
- Water currents
I am primarily interested in datum points in support of navigation products and services; and, where unavailable, in the surface water levels to formulate new datums. The challenge, of course, is to have searvey assemble available observations sourced from NOAA, IOC, USGS, etc. into the normalized categories above. In the case of USGS, the number of [potentially] available parameters to sort out from their observation sites looks especially large, so any software API / wrapper that makes that easier and more maintainable should be leveraged:
@brey @pmav99 I can't find the other ticket where we discussed normalization and/or standardization of the outputs. Given the quoted email above, how would you approach adding getter functions? Do we have a template to follow?
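Absent a template, one low-tech starting point would be a lookup table from NWIS parameter codes to the normalized categories in the email above. A hypothetical sketch (the table, the function, and the category names are assumptions for illustration, not searvey API; the codes are from the public NWIS parameter code list):

```python
# Hypothetical normalization layer: map NWIS parameter codes onto the
# observation categories from the email above. Not part of searvey.
NWIS_CODE_TO_CATEGORY = {
    "00065": "surface_water_level",  # gage height, feet
    "00010": "water_temperature",    # temperature, water, deg C
    "00480": "water_salinity",       # salinity, parts per thousand
    "72255": "water_currents",       # mean water velocity (illustrative)
}

def normalize_parameter(code: str) -> str:
    """Return the normalized category for an NWIS parameter code."""
    return NWIS_CODE_TO_CATEGORY.get(code, "other")
```

Getter functions could then group whatever columns come back from NWIS under these categories.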
So does that mean that, if we want to add USGS data, we (for now) just need to return the raw output we get from their API? In that case, is it really meaningful to have a wrapper around the USGS dataretrieval package, given that it already returns a dataframe?
Today I was exploring the dataretrieval package for obtaining USGS datasets. It seems that dataretrieval removes a lot of metadata from the NWIS response when creating the data tables. For example, when getting the "instantaneous value" record for a station, we might get something like the following as the response from the web API:
{
"name": "USGS:0148472405:00035:00000",
"sourceInfo": {
"geoLocation": {
"geogLocation": {
"latitude": 38.1389722,
"longitude": -75.18363889,
"srs": "EPSG:4326"
},
"localSiteXY": []
},
"note": [],
"siteCode": [
{
"agencyCode": "USGS",
"network": "NWIS",
"value": "0148472405"
}
],
"siteName": "BUNTINGS GUT NEAR CEDARTOWN, MD",
"siteProperty": [
{
"name": "siteTypeCd",
"value": "ST-TS"
},
{
"name": "hucCd",
"value": "02040303"
},
{
"name": "stateCd",
"value": "24"
},
{
"name": "countyCd",
"value": "24047"
}
],
"siteType": [],
"timeZoneInfo": {
"daylightSavingsTimeZone": {
"zoneAbbreviation": "EDT",
"zoneOffset": "-04:00"
},
"defaultTimeZone": {
"zoneAbbreviation": "EST",
"zoneOffset": "-05:00"
},
"siteUsesDaylightSavingsTime": true
}
},
"values": [
{
"censorCode": [],
"method": [
{
"methodDescription": "",
"methodID": 234506
}
],
"offset": [],
"qualifier": [
{
"network": "NWIS",
"qualifierCode": "P",
"qualifierDescription": "Provisional data subject to revision.",
"qualifierID": 0,
"vocabulary": "uv_rmk_cd"
}
],
"qualityControlLevel": [],
"sample": [],
"source": [],
"value": [
{
"dateTime": "2022-12-06T12:00:00.000-05:00",
"qualifiers": [
"P"
],
"value": "1.2"
}
]
}
],
"variable": {
"noDataValue": -999999.0,
"note": [],
"oid": "45807109",
"options": {
"option": [
{
"name": "Statistic",
"optionCode": "00000"
}
]
},
"unit": {
"unitCode": "mph"
},
"valueType": "Derived Value",
"variableCode": [
{
"default": true,
"network": "NWIS",
"value": "00035",
"variableID": 45807109,
"vocabulary": "NWIS:UnitValues"
}
],
"variableDescription": "Wind speed, miles per hour",
"variableName": "Wind speed, mph",
"variableProperty": []
}
}
But the resulting dataset only contains (example rows are not from the same station!):
00060 00060_cd site_no 00065 00065_cd
datetime
2022-12-06 08:45:00-05:00 4.48 P 0148471320 3.72 P
Does it make sense, then, to use the web API directly instead (going back to the original question!)? Since in any case we need to create tables of constants, such as parameter codes and quality codes, dataretrieval may not really take much heavy lifting off of searvey development in the end.
There is also the delay in fixing issues in dataretrieval and waiting for a release to reach conda for searvey to depend on. Right now, for example, there are some issues when retrieving data from stations in different time zones that result in an exception.
After discussing the comment above with @pmav99 during the data retrieval meeting, we decided it makes more sense to start by calling the NWIS API directly and use our own mapping of responses to data frames.
There are a variety of Python packages that use the USGS API. I set up a discussion among the authors here: mroberge/hydrofunctions#79
- Taher Chegini @cheginit just added some elegant code to his HyRiver package that deals with timezone information from the USGS metadata.
- My hydrofunctions package requests data, stores the original response, and formats it into dataframes upon request. My plan is to offer more ways to organize the dataframe in the future: a 'tidy' format, wide, and multi-index.
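For readers unfamiliar with the terms, the 'wide' vs 'tidy' layouts mentioned above can be illustrated with plain pandas (station values and parameter codes below are made up):

```python
# 'wide': one row per timestamp, one column per parameter code,
# like the dataframe excerpt earlier in this thread.
import pandas as pd

wide = pd.DataFrame({
    "datetime": pd.to_datetime(["2022-12-06 08:45", "2022-12-06 09:00"]),
    "00060": [4.48, 4.50],  # discharge
    "00065": [3.72, 3.70],  # gage height
})

# 'tidy': one observation per row, with the parameter code as a column.
tidy = wide.melt(id_vars="datetime", var_name="parameter_cd",
                 value_name="value")
```

A multi-index layout would instead stack (site, parameter) into the column index.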
Thank you @mroberge, this information is very helpful.
I just realized that the get_iv metadata item in the returned tuple can include information about the parameter code or site. I thought that the metadata only included header or URL information, but if the right arguments are passed, more information is extracted and included. I think the main question now is: how much do we want to keep the data from the REST API untouched?
For IOC and COOPS stations we pretty much return whatever is provided by the web services, but for USGS NWIS we have to do some transformation either way. Can we then just take the output of dataretrieval (or even one of the other packages from #14 (comment)) as the main source of data and return it with minimal changes to fit searvey API conventions?
@brey, @pmav99, @saeed-moghimi-noaa, if you haven't already, I highly recommend reading this summary by @mroberge: mroberge/hydrofunctions#79 (mentioned in #14 (comment)).
After that I'd like us to re-evaluate why we want to add USGS support within searvey. My take is:
- searvey is a one-stop shop for [original] measurement data used for validating coastal ocean models
- dataretrieval returns the data in a form very close to the original source (the NWIS REST API)
- We don't want to reinvent the wheel
I'm just thinking out loud, but given the above (as opposed to what I said to @pmav99 the other day), maybe it makes more sense to follow the original plan of using the dataretrieval package and just assume the return values are the original data from the source.
What do you think?
What you suggested makes sense, and I am fine with that. However, I will let @brey and @pmav99, as the lead developers of searvey, have the final say.
Thanks,
After the discussion with @SorooshMani-NOAA a few days back, and seeing his progress (!) using dataretrieval, let's go with that. Thanks Soroosh.
I will close this issue; we can open more specific ones as needed during the implementation.