oceanmodeling / searvey
Sea state observational data retrieval
Home Page: https://searvey.readthedocs.io/en/stable/
License: GNU General Public License v3.0
Including a new data source: see https://www.kristinelarson.net/portfolio/measuring-water-levels-with-gps/
Suggested to look at by John Galetzka (NOAA Federal)
Line 150 in 0b0e372
The above referenced normalization creates a problem when trying to save to (geo)json. For example:
from searvey import ioc
import shapely
geo_box = shapely.geometry.box(-25.0, 56.0, -12.0, 74.0)
tg = ioc.get_ioc_stations(region=geo_box)
tg.dtypes
tg.to_file('stations.json')
Setting
tg['country'] = tg.country.astype('str')
solves the problem. Maybe this should be included in the final output on the searvey side? I am not sure why the category dtype is needed.
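A minimal user-side workaround can be sketched with plain pandas (the `country` column here is synthetic, mimicking only the dtype of the real output):

```python
import pandas as pd

# Build a frame with a categorical column, mimicking the dtype of the
# `country` column returned by get_ioc_stations() (synthetic data).
df = pd.DataFrame({"country": pd.Categorical(["IS", "IS", "FO"])})
assert str(df["country"].dtype) == "category"

# Cast every categorical column to str so (geo)json serialisation works
for col in df.columns:
    if isinstance(df[col].dtype, pd.CategoricalDtype):
        df[col] = df[col].astype(str)

assert df["country"].dtype == object
```

After this cast, GeoDataFrame.to_file() no longer trips over the category dtype.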
StormEvents.coops.coops_stations()
Hi Guys, I have been able to successfully read both the COOPS and IOC GeoDataFrames and display the stations on a map (yellow is COOPS).
I noticed that some variable names are different in the data sets. E.g., the station name in the COOPS dataset is named "name" and in the IOC dataset "location". Will this be unified at some point (naming convention)?
I will start with some filtering of attributes soon and I assume this could be an interesting topic to discuss.
Best!
The IOC stations have links to upstream providers that may provide datum information. E.g.
https://www.ioc-sealevelmonitoring.org/station.php?code=AN15
which, in turn, references pages whose links include monthly sea level data: https://psmsl.org/data/obtaining/rlr.monthly.data/2098.rlrdata
We should investigate this further.
Hey All,
I am trying to retrieve the USGS stations for the CERA workflow and I am just running the following two lines (in a python script, python version is 3.10):
from searvey import usgs, stations
usgs_stations = usgs.get_usgs_stations()
I get a pretty lengthy error log - any idea what that could be? Thanks!
Originally posted by @carolakaiser in #79 (comment)
Hi All,
Would it be possible to retrieve the sensor types for the IOC stations as a metadata attribute (with a list of sensors) for each station?
@gseroka @allenea @carolakaiser @SorooshMani-NOAA
All
Here I am adding some communication between @allenea, @carolakaiser, and others on how to access Army Corps WL data and its relative vertical datum.
It would be great if we continue this discussion here, to keep good documentation, until we find a brave volunteer to implement the support in Searvey.
Please tag others here as well.
Thanks
-Saeed
Update
I revised the quoted email formatting style to be easier to read:
FYI... I followed up with the US Army Corps (for the southern region location) and the only way to know those offsets is by looking at the [dated] website. It looks like most gages are going to be in NGVD29 and the numbers seem to line up with the VDatum offset to NAVD88. However, that likely doesn't apply to all gages, from what I've been told. So it may take some reverse engineering to determine what the offset to NAVD88 is, and from there you could use VDatum to figure out which datum correction gives you that offset value. Or we'll have to manually collect those offsets and store them somewhere that they can be referenced.
Eric
Thank you. That's what we'll do. I just wanted to make sure there was no better streamlined approach for determining that information.
Best,
Eric
If memory serves correct NGVD29 is predominant. You can navigate to the gage on the site and it should tell you what datum is used.
lb
Hi Lysanias,
This is very informative. Thank you for providing such a quick reply.
I am noticing one key piece of information missing. The "Adjustment for vertical datum NAVD88 (2009.55): -0.96 ft." is important, because in none of these options is the vertical datum in which the gage collects data explicitly stated. Maybe all gages are NGVD29?
https://rivergages.mvr.usace.army.mil/WaterControl/stationinfo2.cfm?sid=01300
If that is the case, can you confirm or provide documentation that supports that assumption? I used VDatum and it shows the offset to be -0.961. So not exact but pretty close.
Is this information in the API/web service anywhere or is there documentation that provides this information?
V/r,
Eric
Eric,
RiverGages: The only parameter that may need to change is the site id, unless you are pulling time series data. GetVariableInfo, GetSites, and GetSiteInfo only rely on the "site" parameter, while GetValues and GetFValues require the additional parameters to be set to pull the correct data. All parameters require a value for all methods, even if they are not used by the method, because of how ColdFusion components operate when invoked.
The first link is an example of how to query for site information. The other methods query for meta-data and observations:
Below are all the methods for the RiverGages WaterML web service. All the URL parameters are always required for any call; follow the above example as a template.
- GetValues
- GetVariableInfo
- GetSites
- GetSiteInfo
- GetFValues (same as GetValues except it only returns observed/measured values versus forecast values)
lb
Eric,
We have a web service that provides temporal and meta-data. I will send you the details.
lb
Hi,
I was wondering if you had an API or some sort of way for us to easily look up things like lat/lon, gage elevation, gage datum, offset to NAVD88 to other offsets that you have collected and provided via the website. Do we have to manually go through the website and record these offsets?
We have learned about this method for retrieving data, however, we aren't sure what datum this data is in or what offsets we need to make to correctly adjust the water level datum.
Can you please advise?
Eric
Currently there are 3 different functions for retrieving the COOPS station data:
coops_stations()
coops_stations_within_region()
coops_stations_within_bounds()
I think that these functions could be merged in a single function with something like the following signature:
def coops_stations(
    region: Optional[Union[Polygon, MultiPolygon]] = None,
    lon_min: Optional[float] = None,
    lon_max: Optional[float] = None,
    lat_min: Optional[float] = None,
    lat_max: Optional[float] = None,
    station_status: Optional[StationStatus] = None,
) -> pd.DataFrame:
The idea is that:
- If neither region nor the BBox are specified, then all the stations are returned. This should be the default behavior.
- If region is specified, then the stations outside of the region are filtered out.
- If both region and the BBox are specified, then an exception is raised.
@zacharyburnett what do you think?
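The dispatch logic of such a merged function might be sketched as follows (coops_stations_merged and its string return values are placeholders, not real searvey code; only the argument validation is shown):

```python
from typing import Optional


def coops_stations_merged(
    region=None,
    lon_min: Optional[float] = None,
    lon_max: Optional[float] = None,
    lat_min: Optional[float] = None,
    lat_max: Optional[float] = None,
):
    """Sketch of the merged entry point: validate mutually exclusive arguments."""
    bbox_given = any(v is not None for v in (lon_min, lon_max, lat_min, lat_max))
    if region is not None and bbox_given:
        # both region and BBox specified -> exception, as proposed above
        raise ValueError("specify either region or a bounding box, not both")
    if region is not None:
        return "stations filtered by region"  # placeholder code path
    if bbox_given:
        return "stations filtered by bbox"    # placeholder code path
    return "all stations"                     # default behavior
```

The string returns would of course be replaced by the actual station-retrieval and filtering code paths.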
Function calls should be avoided as default arguments.
import datetime
import time

def print_isotime(timestamp: datetime.datetime = datetime.datetime.now()) -> None:
    print(timestamp.isoformat())

for i in range(3):
    print_isotime()
    time.sleep(1)
The above snippet will print:
2023-03-03T12:15:00.714514
2023-03-03T12:15:00.714514
2023-03-03T12:15:00.714514
I.e. the timestamps do not change. This happens due to early binding of default arguments and is a well-known gotcha.
We use this anti-pattern in IOC and USGS. I introduced it and I guess that @SorooshMani-NOAA just followed my lead.
Line 252 in f7763c6
Line 280 in f7763c6
In practical terms this is a problem, because in a long running script, after midnight the date will not be updated.
There are several ways to avoid this. The most common one is to use None
as the default value and make the function call in the body of the function. E.g.:
def print_time(timestamp: datetime.datetime | None = None) -> None:
    if timestamp is None:
        timestamp = datetime.datetime.now()
    print(timestamp.isoformat())
Nevertheless, this is not ideal when the code is statically typed, because it changes the signature. Anyhow, we should remove the anti-pattern.
As far as the abstraction architecture goes, I propose that the data getters be abstracted into at least 2 layers, with the user-facing layer returning geopandas.GeoDataFrame or xarray.Dataset objects. The key point here is that one data source may contribute to multiple data products, and one data product might be contributed to by multiple data sources. Separating into these distinct layers would make it much easier to maintain data sources and add new sources and/or products in the future.
In IOC code at:
Lines 327 to 344 in 0fc8bae
is the RateLimit object shared between all the threads, or is a new copy created for each one? If the latter is true, then it might not work as intended.
This is in fact the EMODNET function, custom-made for a subset of the available data.
Since critech has another API, soon to be available from Azure, we should generalise this one and later develop a new one for critech.
Notes from today's meeting:
- state column
- get_usgs_stations() should contain a column with the data_type/variable.
- get_usgs_stations(region) should retrieve all the stations and use region to filter them. Now it uses the "region" USGS API and this is limiting the number of stations we can retrieve.
- searvey.stations.get_stations()
IOC has a lot of stations. Each station sends data at a different moment. When you aggregate all the data in a single xr.Dataset, you end up with a huge number of timestamps, which means that the dataset has a huge number of NaNs, which also means that we need a huge amount of RAM.
In order to reduce the RAM requirements, we decided to truncate the seconds from the timestamps. This helps a lot with the size, and it should not significantly change the signal. Nevertheless, if there are stations that return data multiple times per minute, then we end up with duplicate timestamps. At the moment we are dropping these stations.
Instead of that we should:
This is the relevant code:
Lines 233 to 241 in 08941a3
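One possible approach, sketched with plain pandas on synthetic data (not the searvey implementation): floor the timestamps to the minute and aggregate duplicates instead of dropping the station:

```python
import pandas as pd

# Synthetic per-station series with sub-minute samples (illustrative data)
idx = pd.to_datetime([
    "2023-01-01 00:00:10",
    "2023-01-01 00:00:40",  # same minute as the row above -> duplicate after truncation
    "2023-01-01 00:01:05",
])
ser = pd.Series([1.0, 2.0, 3.0], index=idx)

# Truncate seconds, then average the duplicates instead of discarding the station
truncated = ser.copy()
truncated.index = truncated.index.floor("min")
deduped = truncated.groupby(level=0).mean()

assert not deduped.index.duplicated().any()
```

Averaging is one choice; keeping the first or last sample per minute would work equally well for this purpose.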
Both the IOC and COOPS notebooks work fine on Binder. The CERA one produces some errors and the USGS_data one crashes the kernel, most likely due to the memory constraint. Note that Binder has a ceiling of 2GB of memory.
@SorooshMani-NOAA Can you have a look and push some new versions of the notebooks? There is no need for a release, since the notebooks are read from the repo.
Hey All,
As discussed in our meeting, we have been asked to explore the possibility of adding USGS stations for the US Pacific Coast. This will very likely be a larger number of stations and your service to retrieve USGS stations is very helpful. Thank you for that!
Would it be possible to get any information about the vertical datum for each USGS station? The USGS website has this info, but it requires manual work to get it. Any way to get this in an automated way would be very much appreciated!
Thank you.
While working on #28 I noticed some random CI failures while trying to build the docs. The error was something like this:
Exception occurred:
File "/opt/hostedtoolcache/Python/3.10.5/x64/lib/python3.10/site-packages/sphinx/builders/html/__init__.py", line 1125, in dump_search_index
os.replace(searchindexfn + '.tmp', searchindexfn)
FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/searvey/searvey/docs/build/html/searchindex.js.tmp' -> '/home/runner/work/searvey/searvey/docs/build/html/searchindex.js'
It seems that there is a pending sphinx issue: sphinx-doc/sphinx#10111
Until the issue gets fixed and/or a workaround gets posted, I don't think that there is anything we can do about it. I am creating this issue just to raise awareness of it.
FWIW, I don't think we need to block any PR if we get such an error.
This test is currently failing:
Lines 90 to 105 in 2a510e9
with:
=============================================================================================================== FAILURES ===============================================================================================================
__________________________________________________________________________________________ test_get_usgs_station_data[truncate_seconds=False] __________________________________________________________________________________________
tests/usgs_test.py:105: in test_get_usgs_station_data
assert len(df) == no_records
E assert 18 == 11
The dataframe we get is:
df = usgs.get_usgs_station_data(
    usgs_code="15484000",
    endtime=datetime.date(2022, 10, 1),
    period=1,
    truncate_seconds=False,
)
value qualifier unit name
site_no datetime code option
15484000 2022-09-30 08:00:00+00:00 00065 7.32 A ft Gage height, feet
2022-09-30 16:30:00+00:00 00065 7.31 A ft Gage height, feet
2022-10-01 00:00:00+00:00 00065 7.3 A ft Gage height, feet
2022-10-01 02:30:00+00:00 00065 7.28 A ft Gage height, feet
2022-10-01 04:15:00+00:00 00065 7.29 A ft Gage height, feet
2022-10-01 19:00:00+00:00 00065 7.27 A ft Gage height, feet
2022-10-02 03:15:00+00:00 00065 7.26 A ft Gage height, feet
2022-09-30 08:00:00+00:00 00065 ak dot&pf bridge datum 639.04 P ft Gage height, feet
2022-09-30 16:30:00+00:00 00065 ak dot&pf bridge datum 639.03 P ft Gage height, feet
2022-10-01 00:00:00+00:00 00065 ak dot&pf bridge datum 639.02 P ft Gage height, feet
2022-10-01 02:30:00+00:00 00065 ak dot&pf bridge datum 639.0 P ft Gage height, feet
2022-10-01 04:15:00+00:00 00065 ak dot&pf bridge datum 639.01 P ft Gage height, feet
2022-10-01 19:00:00+00:00 00065 ak dot&pf bridge datum 638.99 P ft Gage height, feet
2022-10-02 03:15:00+00:00 00065 ak dot&pf bridge datum 638.98 P ft Gage height, feet
2022-09-30 08:25:00+00:00 72414 ak dot&pf bridge datum 632.3 P ft Streambed elevation at measurement point, abov...
2022-09-30 08:55:00+00:00 72414 ak dot&pf bridge datum 632.21 P ft Streambed elevation at measurement point, abov...
2022-09-30 12:55:00+00:00 72414 ak dot&pf bridge datum 632.12 P ft Streambed elevation at measurement point, abov...
2022-09-30 14:55:00+00:00 72414 ak dot&pf bridge datum 632.38 P ft Streambed elevation at measurement point, abov...
[18 rows x 4 columns]
I guess that USGS started to return the additional 7 rows recently. Judging from the output, the extra rows that cause the issue are the first 7, i.e. the ones without an option.
Is it normal for USGS to add data like this?
Anyhow, this specific test does not seem to be testing what it is supposed to test (i.e. truncate_seconds), because the timestamps don't contain any seconds anyhow. We should probably choose a different combination of usgs_station and/or time period. Furthermore, it will probably be more robust if we make the assertion like this:
df = usgs.get_usgs_station_data(..., truncate_seconds=False)
df_trunc = usgs.get_usgs_station_data(..., truncate_seconds=True)
assert len(df) > len(df_trunc)
seconds_trunc = df_trunc.index.to_frame().datetime.dt.second
assert (seconds_trunc == 0).all()
assert df_with
pinging @SorooshMani-NOAA
PS. There are more tests failing, probably due to USGS returning extra data, but I didn't dig deeper into those.
FAILED tests/usgs_test.py::test_get_usgs_station_data[truncate_seconds=True] - assert 18 == 11
FAILED tests/usgs_test.py::test_get_usgs_station_data[truncate_seconds=False] - assert 18 == 11
FAILED tests/usgs_test.py::test_get_usgs_data[truncate_seconds=True] - AssertionError: assert 226 == 222
FAILED tests/usgs_test.py::test_get_usgs_data[truncate_seconds=False] - AssertionError: assert 226 == 222
FAILED tests/usgs_test.py::test_get_usgs_station_data_by_string_enddate - assert 30 == 17
As discussed yesterday in the meeting, it might be a good idea to include datum information in the COOPS metadata/data.
I only had a brief look, but I think that in order to retrieve the datum of each station you need to make an extra HTTP request for each station. E.g.
https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/1611400/datums.json
That being said, if I understand this correctly, the data we retrieve from COOPS are requested using the STND datum (i.e. Station Datum) by default, which I believe is 0 for all the stations. This might be a bit problematic when you request data from multiple stations, because the values will not be comparable across stations.
Another issue that we might have is what happens when you request historical data. Some stations might reference a different epoch, which means that we might need to retrieve the "supersededdatums" instead. E.g.
https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations/2695540/supersededdatums.xml
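A per-station lookup might be sketched like this (only the URL template comes from the example above; the helper names are hypothetical):

```python
import requests

# URL template taken from the datums.json example above
MDAPI_DATUMS = (
    "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi"
    "/stations/{station_id}/datums.json"
)


def datums_url(station_id: str) -> str:
    # One extra HTTP request per station, as noted above
    return MDAPI_DATUMS.format(station_id=station_id)


def get_station_datums(station_id: str) -> dict:
    """Fetch the datums JSON for a single station (makes a network call)."""
    response = requests.get(datums_url(station_id), timeout=30)
    response.raise_for_status()
    return response.json()
```

E.g. get_station_datums('1611400') would fetch the datums for the station referenced earlier; doing this for many stations would need to respect rate limits.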
The os.sched_getaffinity(0) used in multi.py is only available on Linux (not OSX nor Windows).
https://docs.python.org/3/library/os.html#interface-to-the-scheduler
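A portable fallback might look like this sketch (available_cpu_count is a hypothetical helper name):

```python
import os


def available_cpu_count() -> int:
    """Prefer the scheduler affinity where it exists (Linux),
    fall back to os.cpu_count() on macOS/Windows."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity masks, e.g. inside containers
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1
```

The `or 1` guards against os.cpu_count() returning None, which the docs allow.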
Right now, since all the functions return basic Dataframe and Dataset objects, it is not possible to chain-call some methods for easier access. The following snippet demonstrates what I mean:
stations = coops.coops_stations_within_region(region=bbox)
data = coops.coops_product_within_region(
    'water_level', region=bbox, start_date=start, end_date=end)
In this case the stations dataframe has some information not included in the data object, e.g. name. If we have a lightweight wrapper around these return values, then some additional functionality can be provided to do something like this:
stations = coops.coops_stations_within_region(region=bbox)
data = stations.get_product('water_level', start_date=start, end_date=end)
As well as more sophisticated functionality such as normalizing, vdatum, etc. In this case the raw underlying data can be provided by a .raw or .data method or property of the light wrapper.
Of course I understand this goes against the idea of providing a light API to get raw data for the user. So maybe this functionality could go into a downstream package that lightly wraps searvey.
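The wrapper idea might be sketched like this (StationsFrame, .raw, and get_product are hypothetical names; the real product retrieval is stubbed out):

```python
import pandas as pd


class StationsFrame:
    """Hypothetical lightweight wrapper around a stations DataFrame."""

    def __init__(self, frame: pd.DataFrame):
        self._frame = frame

    @property
    def raw(self) -> pd.DataFrame:
        # Escape hatch: the underlying, unwrapped data
        return self._frame

    def get_product(self, product: str, start_date=None, end_date=None):
        # Placeholder: a real implementation would call something like
        # coops_product_within_region() for the wrapped stations.
        return f"{product} for {len(self._frame)} stations"


stations = StationsFrame(pd.DataFrame({"name": ["8410140", "8632837"]}))
```

With such a wrapper, stations.get_product('water_level', start_date=start, end_date=end) becomes possible while stations.raw still exposes the plain DataFrame.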
The README should contain some basic examples of using the API.
In stormevents there's an assumption in one of the automated tests that if an empty geometry is passed to coops_stations_within_region, the returned data frame will be empty. Right now that test is broken because the logic in searvey is that:
Lines 824 to 827 in d654ad6
@pmav99, @carolakaiser, in your opinion, does it make sense to keep searvey as it is and just update the stormevents test, or should I modify how coops_stations_within_region works in searvey?
What was the use-case for adding this code @carolakaiser? Do you have a case where, if you pass an empty geometry, you want to get all of the stations? Or are you passing None instead of an empty geometry to get all the stations? If your use-case is covered by passing None, then we can change this to if region is None: and both of our use-cases are covered.
What do you think?
Hi Zach,
Laura and I had a chance to go through the stormevents package. We wanted to send an email to you to describe our current data gaps, in case stormevents would like to tackle some of these issues. Laura mentioned that it can be difficult to obtain some USGS and ACE data, especially streamflow data. See the attached spreadsheets for such data. Having this data accessible, and in the same format as CO-OPS data, would be a big help. In addition, it would be nice to have more unified and standardized data (e.g., similar metadata, similar datum, similar units, similar variable names, etc.). We saw that searvey is working on data standardization and were wondering if stormevents is working on data standardization too.
Just some thoughts! Sorry for the tardiness of this email - my internet was down this morning!
Best,
Josh
ACE_stationlist.csv
USGS_stationlist.csv
When COOPS stations are in maintenance, the API might respond with corrupted data for those stations (e.g. duplicate timestamps). This causes an exception when, in searvey code, we are trying to create a Dataset from the Dataframe.
Note that this is true even for the historical data of the stations. We need to distinguish the stations in normal vs maintenance mode and automatically filter out stations under maintenance (maybe throw warnings too?) in order to avoid Python exceptions.
Example of a response Dataframe that causes the problem:
v s f q t
2018-08-30 00:00:00 -0.788 0.033 0,0,0,0 v
2018-08-30 00:00:00 0.186 0.042 0,0,0,0 v
2018-08-30 00:06:00 -0.812 0.039 0,0,0,0 v
2018-08-30 00:06:00 0.218 0.045 0,0,0,0 v
2018-08-30 00:12:00 -0.835 0.033 0,0,0,0 v
... ... ... ... ..
2018-09-17 23:48:00 0.661 0.073 0,0,0,0 v
2018-09-17 23:54:00 0.701 0.104 0,0,0,0 v
2018-09-17 23:54:00 -0.442 0.054 0,0,0,0 v
2018-09-18 00:00:00 0.699 0.107 0,0,0,0 v
2018-09-18 00:00:00 -0.449 0.061 0,0,0,0 v
And the line of code is
Line 365 in 0fc8bae
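A possible guard before the Dataframe-to-Dataset conversion, sketched on synthetic data resembling the sample above (warning and dropping rather than failing; this is not the searvey implementation):

```python
import pandas as pd

# Illustrative frame with duplicated timestamps, as in the sample above
idx = pd.to_datetime([
    "2018-08-30 00:00:00",
    "2018-08-30 00:00:00",  # duplicate from a station under maintenance
    "2018-08-30 00:06:00",
])
df = pd.DataFrame({"v": [-0.788, 0.186, -0.812]}, index=idx)

dupes = df.index.duplicated(keep=False)
if dupes.any():
    # Warn and drop the duplicates so the Dataset conversion cannot fail
    print(f"dropping {dupes.sum()} rows with duplicate timestamps")
    df = df[~df.index.duplicated(keep="first")]

assert df.index.is_unique
```

Keeping the first sample is arbitrary; flagging the station as "under maintenance" and filtering it out entirely, as proposed above, is the more complete fix.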
Thanks for the great tool. I just want to report that the download process for water temperature went through nicely. However, it generated an error on netcdf file output:
In [1]: run data_region.py
> get data ... water_temperature
> plot figure for water_temperature
> save netcdf file
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File /mnt/c/Users/Saeed.Moghimi/Documents/work/linux_working/00-working/15-stormevents/data_region.py:48, in <module>
46 if True:
47 print(' > save netcdf file')
---> 48 data.to_netcdf(var + '.nc')
51 if True:
52 import geopandas
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/core/dataset.py:1901, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1898 encoding = {}
1899 from ..backends.api import to_netcdf
-> 1901 return to_netcdf(
1902 self,
1903 path,
1904 mode,
1905 format=format,
1906 group=group,
1907 engine=engine,
1908 encoding=encoding,
1909 unlimited_dims=unlimited_dims,
1910 compute=compute,
1911 invalid_netcdf=invalid_netcdf,
1912 )
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
1068 # to avoid this mess of conditionals
1069 try:
1070 # TODO: allow this work (setting up the file for writing array data)
1071 # to be parallelized with dask
-> 1072 dump_to_store(
1073 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1074 )
1075 if autoclose:
1076 store.close()
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1116 if encoder:
1117 variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/backends/common.py:261, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
258 if writer is None:
259 writer = ArrayWriter()
--> 261 variables, attributes = self.encode(variables, attributes)
263 self.set_attributes(attributes)
264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/backends/common.py:350, in WritableCFDataStore.encode(self, variables, attributes)
347 def encode(self, variables, attributes):
348 # All NetCDF files get CF encoded by default, without this attempting
349 # to write times, for example, would fail.
--> 350 variables, attributes = cf_encoder(variables, attributes)
351 variables = {k: self.encode_variable(v) for k, v in variables.items()}
352 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/conventions.py:859, in cf_encoder(variables, attributes)
856 # add encoding for time bounds variables if present.
857 _update_bounds_encoding(variables)
--> 859 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
861 # Remove attrs from bounds variables (issue #2921)
862 for var in new_vars.values():
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/conventions.py:859, in <dictcomp>(.0)
856 # add encoding for time bounds variables if present.
857 _update_bounds_encoding(variables)
--> 859 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
861 # Remove attrs from bounds variables (issue #2921)
862 for var in new_vars.values():
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/conventions.py:279, in encode_cf_variable(var, needs_copy, name)
277 var = maybe_default_fill_value(var)
278 var = maybe_encode_bools(var)
--> 279 var = ensure_dtype_not_object(var, name=name)
281 for attr_name in CF_RELATED_DATA:
282 pop_to(var.encoding, var.attrs, attr_name)
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/conventions.py:234, in ensure_dtype_not_object(var, name)
231 inferred_dtype = np.dtype(float)
232 fill_value = inferred_dtype.type(np.nan)
--> 234 data = _copy_with_dtype(data, dtype=inferred_dtype)
235 data[missing] = fill_value
236 else:
File ~/miniconda3/envs/adcircpy_env/lib/python3.10/site-packages/xarray/conventions.py:195, in _copy_with_dtype(data, dtype)
189 """Create a copy of an array with the given dtype.
190
191 We use this instead of np.array() to ensure that custom object dtypes end
192 up on the resulting array.
193 """
194 result = np.empty(data.shape, dtype)
--> 195 result[...] = data
196 return result
TypeError: float() argument must be a string or a real number, not 'NAType'
In [2]:
Here is my code:
from shapely.geometry import box
from datetime import datetime,timedelta
from stormevents.coops import coops_product_within_region
import geopandas
from matplotlib import pyplot as plt
#vars = ['water_level','water_temperature','wind','salinity']
#vars = ['wind','salinity']
vars = ['water_level','water_temperature']
vars = ['water_temperature']
#hsofs bnd
#bnd = -99.0,5,-52.8,46.3 #US mainland
bnd = -98.5 ,-84.5,24.,31.5 #GulfM
bnd = -74.5 , -73.55 , 40.35 , 41.2 #san_newyork
now_ = datetime.now()
past_ = now_ - timedelta(0.25)
for var in vars:
    print(' > get data ...', var)
    data = coops_product_within_region(
        product=var,
        start_date=past_.isoformat(),
        end_date=now_.isoformat(),
        region=box(*bnd),
    )
    print(' > plot figure for ', var)
    plt.figure()
    countries = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
    figure, axis = plt.subplots(1, 1)
    figure.set_size_inches(12, 12 / 1.61803398875)
    countries.plot(color='lightgrey', ax=axis, zorder=-1)
    plt.scatter(x=data.x, y=data.y, c=data.v.mean(axis=1))
    plt.colorbar()
    plt.xlim(*bnd[:2])
    plt.ylim(*bnd[2:])
    axis.set_title(var)
    plt.savefig(var + '.png')
    plt.close('all')

if True:
    print(' > save netcdf file')
    data.to_netcdf(var + '.nc')
Please look into this.
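Until the root cause is addressed, a possible user-side workaround, sketched with plain pandas/numpy (assuming the failure comes from pandas NA values inside an object-dtype variable, as the traceback's 'NAType' error suggests):

```python
import numpy as np
import pandas as pd

# Object column containing pd.NA, the kind of value xarray's CF encoder
# chokes on when writing netCDF (synthetic data)
s = pd.Series([7.3, pd.NA, 7.1], dtype="object")

# Coerce to a plain float column: pd.NA becomes np.nan, which can be encoded
clean = pd.to_numeric(s, errors="coerce").astype(float)

assert clean.dtype == np.float64
assert np.isnan(clean.iloc[1])
```

Applying such a coercion to the offending variables of the Dataset before data.to_netcdf(...) should avoid the TypeError shown above.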
The IOC source has a dedicated REST web API for retrieving data; see http://www.ioc-sealevelmonitoring.org/service.php
We need to test it and include a corresponding function.
When filtering the COOPS stations for being active like this:
coops_active = stations_coops[stations_coops['is_active']]  # comes back as True/False
there are some results that show a station as active, but those stations either do not exist on the https://tidesandcurrents.noaa.gov website or deliver a "No data was found" result.
What API do you use to get the active status (is it the NOAA OpenDAP server or anything like that)?
Examples:
https://tidesandcurrents.noaa.gov/waterlevels.html?id=8632837
https://tidesandcurrents.noaa.gov/waterlevels.html?id=8574728
@zacharyburnettNOAA @SorooshMani-NOAA
Adding USGS timeseries from https://stn.wim.usgs.gov/FEV/#FlorenceSep2018
This is code I got from @flackdl. See also this repo developed by Danny: https://github.com/flackdl/cwwed .
import os
import re
import sys
import errno
import requests
from io import open
EVENT_ID_MATTHEW = 135 # default
# capture an event_id from the command line, defaulting to Matthew
EVENT_ID = sys.argv[1] if len(sys.argv) > 1 else EVENT_ID_MATTHEW
# file type "data"
# https://stn.wim.usgs.gov/STNServices/FileTypes.json
FILE_TYPE_DATA = 2
# deployment types
# https://stn.wim.usgs.gov/STNServices/DeploymentTypes.json
DEPLOYMENT_TYPE_WATER_LEVEL = 1
DEPLOYMENT_TYPE_WAVE_HEIGHT = 2
DEPLOYMENT_TYPE_BAROMETRIC = 3
DEPLOYMENT_TYPE_TEMPERATURE = 4
DEPLOYMENT_TYPE_WIND_SPEED = 5
DEPLOYMENT_TYPE_HUMIDITY = 6
DEPLOYMENT_TYPE_AIR_TEMPERATURE = 7
DEPLOYMENT_TYPE_WATER_TEMPERATURE = 8
DEPLOYMENT_TYPE_RAPID_DEPLOYMENT = 9
# create output directory
output_directory = 'output'
try:
    os.makedirs(output_directory)
except OSError as exception:
    if exception.errno != errno.EEXIST:
        raise
# fetch event data files
files_req = requests.get('https://stn.wim.usgs.gov/STNServices/Events/{}/Files.json'.format(EVENT_ID))
files_req.raise_for_status()
files_json = files_req.json()
# fetch event sensors
sensors_req = requests.get('https://stn.wim.usgs.gov/STNServices/Events/{}/Instruments.json'.format(EVENT_ID))
sensors_req.raise_for_status()
sensors_json = sensors_req.json()
# filter sensors down to barometric ones
barometric_sensors = [sensor for sensor in sensors_json if sensor.get('deployment_type_id') == DEPLOYMENT_TYPE_BAROMETRIC]
# print file urls for barometric sensors for this event
for file in files_json:
    if file['filetype_id'] == FILE_TYPE_DATA and file['instrument_id'] in [s['instrument_id'] for s in barometric_sensors]:
        file_url = 'https://stn.wim.usgs.gov/STNServices/Files/{}/item'.format(file['file_id'])
        # fetch the actual file
        file_req = requests.get(file_url, stream=True)
        # capture the filename from the headers so we can save it appropriately
        match = re.match('.*filename="(?P<filename>.*)"', file_req.headers['Content-Disposition'])
        if match:
            filename = match.group('filename')
        else:
            filename = '{}.unknown'.format(file['file_id'])
            print('COULD NOT FIND "filename" in header, saving as {}'.format(filename))
        print('{}\t\t({})'.format(filename, file_url))
        with open('{}/{}'.format(output_directory, filename), 'wb') as f:
            for chunk in file_req.iter_content(chunk_size=1024):
                f.write(chunk)
Hey All,
I have a shapefile that I would like to use to filter the USGS stations by region.
The shapefile can be downloaded here: https://tinyurl.com/4h6evhfk
I convert the shapefile to a geopandas GeoDataFrame. This is read in successfully (I can see a matplotlib plot). However, the retrieval of the USGS stations for this region fails. Could you please help?
My code is this (please note, this is a Python environment under Windows):
CERA_test.py.zip
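The geometric part of the intended workflow can be sketched with shapely alone (the polygon and station coordinates below are made up; the real region would come from the dissolved shapefile geometry):

```python
from shapely.geometry import Point, Polygon

# Stand-in for the dissolved shapefile geometry (assumption: any shapely
# polygon can be passed as `region`); here a small box over the Gulf coast
region = Polygon([(-93.0, 29.0), (-89.0, 29.0), (-89.0, 31.0), (-93.0, 31.0)])

# Illustrative station coordinates (lon, lat)
stations = {
    "inside": (-91.0, 30.0),
    "outside": (-80.0, 25.0),
}

# Keep only the stations whose point falls within the region
selected = {
    code for code, (lon, lat) in stations.items()
    if region.contains(Point(lon, lat))
}
```

In geopandas terms, gdf.geometry.unary_union would produce a single shapely geometry from the shapefile's features, which could then be used for this kind of filtering.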
Spoke with Saeed today about linking searvey with OCS's autoval (https://github.com/noaa-ocs-modeling/autoval), using the Pacific SCHISM model as a test case. Autoval uses csdllib (https://github.com/noaa-ocs-modeling/csdllib) to download CO-OPS observed water level data.
The specific piece of code in csdllib that does this is
https://github.com/noaa-ocs-modeling/csdllib/blob/master/csdllib/data/coops.py -- the getData function (see below).
A sample URL that it downloads is https://opendap.co-ops.nos.noaa.gov/axis/webservices/waterlevelrawsixmin/plain/response.jsp?stationId=8410140&beginDate=20230302%2018:06&endDate=20230310%2012:00&datum=MSL&unit=0&timeZone=0&Submit=Submit
8410140 1 WL 2023-03-02 18:06 -1.868 0.017 1 0 0 0
8410140 1 WL 2023-03-02 18:12 -1.887 0.007 1 0 0 0
8410140 1 WL 2023-03-02 18:18 -1.901 0.010 0 0 0 0
8410140 1 WL 2023-03-02 18:24 -1.905 0.008 1 0 0 0
8410140 1 WL 2023-03-02 18:30 -1.910 0.010 1 0 0 0
8410140 1 WL 2023-03-02 18:36 -1.914 0.009 1 0 0 0
8410140 1 WL 2023-03-02 18:42 -1.923 0.009 0 0 0 0
Would you be able to add capability to searvey to download CO-OPS (and eventually IOC and USGS) observations in this format?
def getData(stationID, dateRange, tmpDir=None,
            product='waterlevelrawsixmin', datum='MSL', units='meters', verbose=False):
    """
    Allows for downloading the observations from NOAA's
    Center for Operational Oceanographic Products and Services (COOPS)
    via the OpenDAP server http://opendap.co-ops.nos.noaa.gov .
    Args:
        stationID (str): 7 character-long CO-OPS station ID.
        dateRange (datetime, datetime): start and end dates of retrieval.
    """
Trying python -m pip install -e . gives:
ERROR: Command errored out with exit status 1:
command: /Users/brey/miniconda3/envs/searvey/bin/python /Users/brey/miniconda3/envs/searvey/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py prepare_metadata_for_build_wheel /var/folders/rz/63vzrg3547x1_d0z04_vrb_00000gn/T/tmpsva1lntj
cwd: /Users/brey/GitHub/searvey
Complete output (14 lines):
Traceback (most recent call last):
File "/Users/brey/miniconda3/envs/searvey/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
main()
File "/Users/brey/miniconda3/envs/searvey/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/Users/brey/miniconda3/envs/searvey/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 164, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/private/var/folders/rz/63vzrg3547x1_d0z04_vrb_00000gn/T/pip-build-env-8dcboa1n/overlay/lib/python3.9/site-packages/poetry/core/masonry/api.py", line 43, in prepare_metadata_for_build_wheel
poetry = Factory().create_poetry(Path(".").resolve(), with_dev=False)
File "/private/var/folders/rz/63vzrg3547x1_d0z04_vrb_00000gn/T/pip-build-env-8dcboa1n/overlay/lib/python3.9/site-packages/poetry/core/factory.py", line 43, in create_poetry
raise RuntimeError("The Poetry configuration is invalid:\n" + message)
RuntimeError: The Poetry configuration is invalid:
- Additional properties are not allowed ('group' was unexpected)
Hi All,
I have been trying to get the metadata for the IOC stations and I am currently getting an error when setting the activity_threshold. As of today (June 27), there seems to be at least one data record in the IOC stations that does not contain the expected value in the required field but the string 'NA' instead.
The code snippet in stations.py is currently doing this :
ioc_gdf = ioc_gdf.assign(
delay=pd.concat(
(
ioc_gdf.delay[ioc_gdf.delay.str.endswith("'")].str[:-1].astype(int),
ioc_gdf.delay[ioc_gdf.delay.str.endswith("h")].str[:-1].astype(int) * 60,
ioc_gdf.delay[ioc_gdf.delay.str.endswith("d")].str[:-1].astype(int) * 24 * 60,
)
)
)
I made the following workaround:
def as_int(x):
    try:
        return int(x)
    except (TypeError, ValueError):
        # e.g. 'NA' entries that cannot be parsed as integers
        return -1
# Normalize IOC
# Convert delay to minutes
ioc_gdf = ioc_gdf.assign(
delay=pd.concat(
(
ioc_gdf.delay[ioc_gdf.delay.str.endswith("'")].str[:-1].apply(as_int),
ioc_gdf.delay[ioc_gdf.delay.str.endswith("h")].str[:-1].apply(as_int) * 60,
ioc_gdf.delay[ioc_gdf.delay.str.endswith("d")].str[:-1].apply(as_int) * 24 * 60,
)
)
)
This works, but maybe you have a better coding syntax. In any case, I wanted to bring this to your attention.
Thanks!
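A vectorized alternative, as a sketch (assuming delay is a string column with ', h, or d suffixes; unparseable entries such as 'NA' come out as NaN rather than -1):

```python
import pandas as pd

def delay_to_minutes(delay: pd.Series) -> pd.Series:
    # minutes per unit for each recognized suffix
    multipliers = {"'": 1, "h": 60, "d": 24 * 60}
    factor = delay.str[-1].map(multipliers)                  # NaN for unknown suffixes
    value = pd.to_numeric(delay.str[:-1], errors="coerce")   # 'NA' etc. become NaN
    return value * factor

delays = pd.Series(["5'", "2h", "1d", "NA"])
minutes = delay_to_minutes(delays)
```

Using NaN instead of a sentinel like -1 keeps downstream filtering simple (e.g. .notna()).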
We should ensure that the example notebooks are kept in sync with the code
Line 274 in 7e822c1
Setting datetime.date above overlooks the hours component. I suggest we use pandas Timestamps for more flexibility.
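To illustrate the difference (a toy example, not searvey code):

```python
import datetime
import pandas as pd

# datetime.date drops the time-of-day entirely
d = datetime.date(2023, 3, 2)

# pandas Timestamp keeps full sub-daily resolution
ts = pd.Timestamp("2023-03-02 18:06")
```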
I have checked out the latest searvey repository (tag v0.2.1) - it works great, thank you!
For the IOC stations, the coordinates are given with a lower precision (2 or 3 decimal places) but sometimes the stations have better coordinates in the details page of the IOC website.
Would it be possible to get high precision coordinates for IOC? I understand that this would require additional requests, so maybe as a code option?
Line 75 in aa5a640
Currently we define a station as active in a very ad-hoc way. COOPS has an upstream definition while a time threshold is used for IOC (see above).
We need to come up with a consistent definition or avoid the tag altogether and let the user subset the data after retrieval.
@SorooshMani-NOAA I tried to execute the CERA_workflow.ipynb notebook and it seems to be broken. You get a traceback in cell 7, but the root problem is in cell 6:
usgs_stations_w_param = usgs_stations[usgs_stations.parm_cd.isin(params_of_interest)]
is_active = (datetime.now() - usgs_stations_w_param.end_date) < timedelta(days=3)
usgs_stations_of_interest = usgs_stations_w_param[is_active]
More specifically, the end_date of usgs_stations_w_param is NaT for all rows, and as a result usgs_stations_of_interest is an empty dataframe. This causes the next cell, which does the actual plotting, to fail.
Can you have a look?
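As a quick diagnostic (a sketch with toy data; the column and variable names mirror the notebook), one can make the missing end_date case explicit before filtering — when every row is NaT the active subset comes out empty, which is what breaks the plotting cell:

```python
import pandas as pd

# Toy stand-in for usgs_stations_w_param with one valid end_date
usgs_stations_w_param = pd.DataFrame(
    {"end_date": pd.to_datetime([None, "2099-01-01", None])}
)

has_end = usgs_stations_w_param.end_date.notna()  # count usable rows first
is_active = (pd.Timestamp.now() - usgs_stations_w_param.end_date) < pd.Timedelta(days=3)
usgs_stations_of_interest = usgs_stations_w_param[has_end & is_active]
```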
@zacharyburnettNOAA @pmav99 @brey what is the plan to move the "get data" functionality of StormEvents to searvey?
Should there be standardized input parameters and a standardized output dataframe for all data sources in searvey? In that case, for example, does it make sense to move the USGS high water marks to searvey? They take an event_id as input and return a dataframe with non-station locations based on the input county and state. Also, how should dataframes be indexed?
this package has some info on how CO-OPS fields should be parsed:
https://github.com/GClunies/noaa_coops/blob/b1dce750dbbd710f146f67aa6d4b0a34d81854d7/noaa_coops/noaa_coops.py#L599-L609
Can we have some examples in a notebook (or notebooks) like the ones in my original repo? Unfortunately I have deleted them, so I do hope you have them somewhere.
This is just a placeholder. Perhaps we need to move this to a discussion board.
See:
In order to speed up data retrieval, it makes sense to parallelize the requests for the station data, e.g. using async or multithreading. Nevertheless, I couldn't find any reference to rate limiting on the respective websites. Does anyone know the maximum number of concurrent requests we can send to IOC and COOPS without violating their ToS?
https://api.tidesandcurrents.noaa.gov/api/prod/
https://www.ioc-sealevelmonitoring.org/disclaimer.php
Do we know anyone that we could ask?
Eric Allen (NOAA NWS) asked whether searvey queries can filter on [non-]tidal station designation. The designation needs to be rigorously defined and added as metadata to water level stations from NOAA, USGS, etc. I have attached a PDF document from NOAA CO-OPS SOP re: non-tidal classification.
NOAA_Classification_of_a_Water_Level_Station_as_Non-tidal.pdf
@pmav99 the version of searvey in pyproject.toml is still 0.2.1
Line 3 in d654ad6
so the PyPI upload failed with "File already exists.":
https://github.com/oceanmodeling/searvey/actions/runs/4269408156/jobs/7432567646#step:7:33