usgs-cmg / stglib
Routines used by the USGS Coastal/Marine Hazards & Resources Program to process oceanographic time-series data
License: Other
The current link is to: https://ioos.github.io/notebooks_demos/other_resources/, which lands you on a 404.
I think the updated link should be: https://ioos.github.io/ioos_code_lab/content/ioos_installation_conda.html
I have been looking for runvecdat2cdf.py. It is mentioned in the documentation, but I can't find it on the repository. I want to process data from a Nortek vector, specifically including analog outputs. But first, I need to find this file. Thank you!
This is something we have done in the past for up-looking ADCP profiles. Now that we are moving on to python, do we still want this? Or do we trim to a depth bin that guarantees all data are included? I'm asking because I don't see such a method here in stglib, maybe I missed it.
If so, there are two approaches:
Vector Analog Input (SeaPoint) data are being output in counts rather than volts, despite unit specification in config file.
@dnowacki-usgs
I'm trying to use methods from utils.py in core in a new ipynb, and followed the syntax in cdf2nc.py, but get errors. Please set me straight- thanks!
ValueError Traceback (most recent call last)
in ()
1 import xarray as xr
2 import stglib
----> 3 from ..core import utils
ValueError: attempted relative import beyond top-level package
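The error above is expected: relative imports like `from ..core import utils` only work from modules that live inside the package, never from a top-level script or notebook. A minimal sketch of the difference, using a throw-away package (`mylib` here is just a stand-in for stglib):

```python
import os
import sys
import tempfile

# Build a tiny throw-away package (mylib/core/utils.py) to demonstrate
# why the relative form fails at the notebook level.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mylib", "core")
os.makedirs(pkg)
open(os.path.join(root, "mylib", "__init__.py"), "w").close()
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("def hello():\n    return 'hi'\n")

sys.path.insert(0, root)

# A relative import is only legal *inside* a package module; at the top
# level there is no parent package, so Python raises an error.
try:
    exec("from ..core import utils")  # what the notebook tried
except (ImportError, ValueError):
    relative_failed = True

# The absolute form works anywhere once the package is on sys.path.
from mylib.core import utils
print(utils.hello())
```

So in the notebook, `from stglib.core import utils` (after `import stglib`) should work where the relative form does not.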
I believe you said the documentation was mostly auto-generated from comments in the code, so I'm not sure where or how to add content like "4/19/18: only works on up-looking non-HR instruments". I tried clicking the "Edit on GitHub" link, and the file it was looking for was not found.
It would also be good to put something like this in the processing overview: "the only modification needed to the programs is changing sys.path.insert to the local path to stglib. After that's done, the programs should run without needing additional modification."
Sequoia Scientific's LISST instrument is used by a handful of folks at USGS; it would be great to bring it in to stglib. As we start to add it, here's where we can discuss problems, questions, etc.
This should be for all data with a depth. From Ellyn:
For EXO and dwave data, this should contain the nominal measurement depth; we normally use WATER_DEPTH - initial_instrument_height. You could use a pressure-based depth from each if you prefer; just add an attribute to the variable saying where it came from.
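As a sketch of the convention Ellyn describes (the variable name, attribute names, and provenance note below are illustrative, not necessarily stglib's actual ones):

```python
import numpy as np
import xarray as xr

# Sketch: derive the nominal measurement depth from the global attributes
# and record where the value came from, per the convention above.
ds = xr.Dataset(attrs={"WATER_DEPTH": 10.0, "initial_instrument_height": 1.5})
ds["depth"] = xr.DataArray(
    np.atleast_1d(ds.attrs["WATER_DEPTH"] - ds.attrs["initial_instrument_height"]),
    dims="depth",
    attrs={
        "units": "m",
        "note": "Computed as WATER_DEPTH - initial_instrument_height",  # provenance
    },
)
print(float(ds["depth"][0]))  # -> 8.5
```

If a pressure-based depth were used instead, only the `note` attribute would change.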
The EXO seems to have the ability to remove outputs from the final nc file, via the "drop_vars" keyword in the config file:
https://stglib.readthedocs.io/en/latest/config.html#exo
Could that function be added to other instruments? Specifically, I'd like to output a vector file that does not include velocities - we used the vector simply to log data from an external sensor (the LISST-ABS), so we want the AnalogInput field, but not the velocities.
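A sketch of what such an option could do for a Vector dataset, using xarray's `drop_vars` (the variable names and config key below are illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: keep AnalogInput but drop the velocity variables, driven by a
# drop_vars list as it might appear in the instrument's config file.
ds = xr.Dataset(
    {
        "u_1205": ("time", np.zeros(4)),
        "v_1206": ("time", np.zeros(4)),
        "AnalogInput": ("time", np.ones(4)),
    }
)
config = {"drop_vars": ["u_1205", "v_1206"]}  # hypothetical yaml contents
ds = ds.drop_vars(config["drop_vars"], errors="ignore")
print(list(ds.data_vars))  # -> ['AnalogInput']
```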
gbts.read_nerrs() reads a csv file containing atmospheric data to use in correcting pressure data. Not all atmospheric data will be in the same format, but columns could be organized to match a spec if one were given. I'm trying to run this chunk of code on data from West Falmouth Harbor; the columns of this data are sample #, date/time, press, temp, followed by 5 empty columns at the end. Since the csv file referenced isn't provided, it's hard to know how to reformat the data.
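For illustration only, a hedged guess at reading a file with that column layout using pandas (the column names, positions, and sample values are assumptions, not the actual read_nerrs spec):

```python
import io
import pandas as pd

# Two made-up rows shaped like the West Falmouth data described above:
# sample #, date/time, press, temp, then five empty trailing columns.
raw = io.StringIO(
    "1,2015-06-01 00:00,1013.2,18.4,,,,,\n"
    "2,2015-06-01 00:06,1013.1,18.5,,,,,\n"
)
df = pd.read_csv(
    raw,
    header=None,
    usecols=[1, 2],  # keep only the date/time and pressure columns
    names=["sample", "time", "atmpres", "temp"] + [f"x{i}" for i in range(5)],
    parse_dates=["time"],
)
print(df["atmpres"].tolist())  # -> [1013.2, 1013.1]
```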
So, the LISST-ABS requires a short warm up period before the data it collects is good. The docs for the instrument say it should be about 30 seconds, although it seems to be closer to 3 seconds based on inspecting the data. Is there a flag (or could one be added) to remove a certain number of pings at the start of each burst? I imagine this would live in the yaml file for each instrument. To start, it would be great if this flag worked for a vector (we've been using a vector to log the LISST-ABS), although maybe eventually other instruments would also like to have it.
Specifically, I imagine the flag is something like "n_warmup_pings", and it means that stglib ignores the data from the first "n_warmup_pings" during each burst. So the data output by stglib includes pings n_warmup_pings +1 to the total number of pings collected each burst.
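The proposed flag could be sketched like this (`n_warmup_pings` does not exist in stglib yet; it is the suggestion above, and the variable name is illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: drop the first n_warmup_pings samples of every burst by
# slicing along the sample dimension of a (time, sample) burst file.
ds = xr.Dataset(
    {"AnalogInput": (("time", "sample"), np.arange(12).reshape(3, 4))}
)
n_warmup_pings = 2  # would come from the instrument's yaml config
ds = ds.isel(sample=slice(n_warmup_pings, None))
print(ds["AnalogInput"].shape)  # -> (3, 2)
```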
@dnowacki-usgs
I'm still trying to use examples from your code and not getting it quite right... In this gist https://gist.github.com/emontgomery-usgs/18741fdd410668beaece811f7340ee9d, you'll see the code I'm trying to use to convert a file Marinna wrote using Xarray to EPIC. The content seems right, except that it doesn't have time as unlimited. I'm sure you've run into this, and know the magic that I've missed.
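The missing magic may simply be the `unlimited_dims` argument to `to_netcdf`, which writes time as the record (UNLIMITED) dimension; a minimal sketch (variable names illustrative, written in-memory here so no file is left behind):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: ask xarray to make time the UNLIMITED/record dimension at
# write time via unlimited_dims.
ds = xr.Dataset(
    {"P_1": ("time", np.random.rand(5))},
    coords={"time": pd.date_range("2018-01-01", periods=5, freq="h")},
)
nc_bytes = ds.to_netcdf(unlimited_dims=["time"])  # in-memory netCDF here
print(type(nc_bytes))
```

Writing to a path works the same way: `ds.to_netcdf("out.nc", unlimited_dims=["time"])`.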
I didn't find the Grand Bay module (gbts) in the repo, and it's needed in the aqd_make_press_ac.ipynb example, so it fails on the "import gbts" line. I'm not sure where plotly comes from, but it doesn't import successfully on my computer either. Perhaps it should go in the env, so python will know it needs it?
EXO variable names were changed in the new KOR software. Updates were made to the read_exo and read_exo_header functions to accommodate these variable name changes. ds_rename_vars was not updated, which is limiting QA/QC for those variables with name changes and excluding attributes in the .nc file.
After line 291 (in exo.py), I suggest adding "Chlorophyll_ug_per_L": "Fch_906", "BGA_PE_RFU": "BGAPErfu", "BGA_PE_ug_per_L": "BGAPE".
Thank you!
This came up today - most of the time now we want to make atmospheric pressure corrections to the recorded pressure data from most of the instruments processed by this library. Some might consider such a correction interpretive - depending on the location of the deployment. Either way, the information necessary to make the decision to correct needs to be available to the code. Some of this is already provided for in .yml files.
The Aquadopp and EXO .yml files have
zeroed_pressure: 'Yes' # was pressure zeroed before deployment
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
The dwave has
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
and should have a zeroed pressure note.
We should record the date, time, time zone, and location where the zeroing was done (if it was done), so that the atmospheric pressure can be looked up, and perhaps even the local measurement of atmospheric pressure along with the tag of the weather station used (Chatham, say, is KCQX).
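A sketch of how that metadata might look in the .yml (every key below except zeroed_pressure and P_1ac_note is hypothetical, not an existing stglib attribute):

```yaml
zeroed_pressure: 'Yes'  # was pressure zeroed before deployment
# Suggested additional metadata (hypothetical keys, not yet in stglib):
zeroed_pressure_datetime: '2021-05-03 14:22 UTC'
zeroed_pressure_location: 'Woods Hole, MA dock'
zeroed_pressure_atmpres: 1016.8  # mbar, local barometer reading at zeroing
zeroed_pressure_atmpres_station: 'KCQX'  # e.g., Chatham, MA
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
```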
Tell me what you think!
I am processing dwave data. runrskcdf2nc.py (which calls cdf_to_nc) generates what should be an EPIC-compliant file, and that file contains time_cf, time, time_2, epic_time, and epic_time2.
In that configuration, we don't need epic_time*.
One could argue that going forward, we should just produce files with time only, in CF convention. The EPIC times are a pain in python anyway; these are burst files with (time, sample) dimensions that get rejected by xarray: 'time' has more than 1-dimension and the same name as one of its dimensions ('time', 'sample'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
It's not clear how this is happening. Maybe because in cdf2nc.py both ds = utils.create_epic_times(ds) and ds = utils.create_2d_time(ds) are applied? We could comment out the utils.create_epic_times(ds) call on line 42, but I'll bet there is too much other code dependent on it, so we may want a keyword argument to control this.
Marinna
Processing RBR D|wave burst pressure data using the runrskcsv2cdf.py script fails if there are incomplete (or irregular) bursts, except when the irregular burst is the last one in the burst.txt file. The existing code relies on all bursts except the last being exactly the length specified in the samples_per_burst attribute, which it uses to reshape the burst data into (time, sample) dimensions. I have encountered data with an irregular burst in the middle of the deployment but good bursts otherwise, and the only way to get the good data to process was to manually remove the bad burst(s) from the burst.txt file.
I suggest using the burst counter and time stamp in the burst.txt file to check the consistency of each burst, and if a bad burst is encountered, fill (or trim) the missing (or extra) values and proceed. Also look for any unexpected events in the events.txt file, and if any are encountered, warn the user to investigate potential issues with the deployment.
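The suggested pad/trim step could be sketched as follows (a hypothetical helper, not stglib code; it uses a per-sample burst id where the real code would use the burst counter from burst.txt):

```python
import numpy as np

def regularize_bursts(samples, burst_ids, samples_per_burst):
    """Pad short bursts with NaN and trim long ones so every burst has
    exactly samples_per_burst values, instead of assuming all bursts but
    the last are complete."""
    out = []
    for b in np.unique(burst_ids):
        burst = samples[burst_ids == b][:samples_per_burst]  # trim extras
        pad = samples_per_burst - burst.size
        if pad > 0:  # fill missing values with NaN
            burst = np.concatenate([burst, np.full(pad, np.nan)])
        out.append(burst)
    return np.vstack(out)  # shape (n_bursts, samples_per_burst)

# An irregular middle burst: burst 2 has only 2 of 3 samples.
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
ids = np.array([1, 1, 1, 2, 2, 3, 3, 3])
burst_matrix = regularize_bursts(vals, ids, 3)
print(burst_matrix.shape)  # -> (3, 3)
```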
I know that stglib is under constant development, but the philosophy of conda-forge is release early, release often. Would be nice to have it there, if only to do conda install stglib --only-deps
Hi Dan,
I have two quick questions:
Could you please add import xarray as xr and import numpy as np to the beginning of nc2waves.py? Those are needed when handling continuous data and using make_wave_bursts.
Would you be ok with adding the reindexing by nearest method with a 10-minute tolerance back for atmos_correct?
I am noticing that when processing continuous dwave data, the atmospheric pressure correction is not working properly. Over at the WH office, we are using a Jupyter Notebook to create the atmpres.cdf file, and the Jupyter Notebook is not able to create a reindexed .cdf file on the 4 Hz continuous time base (at least not in a reasonable time). I have been relying on the run script to do the re-indexing using the nearest method.
I see how this could be an issue because most atmpres.cdf files are reindexed when created in the Jupyter Notebook. That atmospheric pressure data would then be reindexed twice, but I don't think the second reindex by the script would actually change the time if it's already been matched up in the Jupyter Notebook, right? One way to avoid this would be to add an if/elif statement to atmos_correct that only re-indexes if the data are from a continuous dwave; all other atmpres.cdf files would not be reindexed.
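The reindexing being requested can be sketched with xarray's `reindex` (the station data, times, and variable name below are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: match 6-minute met-station pressure onto the instrument time
# base by nearest neighbor, with a 10-minute tolerance so larger gaps
# become NaN instead of stale values.
met = xr.DataArray(
    np.array([1013.0, 1013.5, 1014.0]),
    coords={"time": pd.date_range("2021-01-01 00:00", periods=3, freq="6min")},
    dims="time",
    name="atmpres",
)
inst_time = pd.date_range("2021-01-01 00:00", periods=4, freq="5min")
atmpres = met.reindex(
    time=inst_time, method="nearest", tolerance=pd.Timedelta("10min")
)
print(atmpres.values)  # each instrument time gets the nearest met value
```

Re-running the same reindex on already-matched times is a no-op, which supports the point above that a second reindex should not change anything.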
Hopefully all this made sense, if not I can elaborate...
Thanks
Which ones don't?
@dnowacki-usgs, @mmartini-usgs
One of the approaches to using python for our code in the future is using xarray for everything (so time is CF), then at the end, convert back to EPIC. Dan, you've already written most of this I think, but I'm stumped on some of the details.
I'm using a file MM wrote with xarray to test. In that file, _FillValue for all variables is NaN, which doesn't match our convention. In the files I've reviewed that were generated with your code, _FillValue is correct. Do you avoid having the wrong value from the get-go using some xarray argument, or did you write a replace_nan_fillvalue function that I haven't found?
In utils I found ds_add_attributes(), which has this:

def add_attributes(var, dsattrs):
    var.attrs.update({
        'serial_number': dsattrs['serial_number'],
        'initial_instrument_height': dsattrs['initial_instrument_height'],
        'nominal_instrument_depth': dsattrs['nominal_instrument_depth'],
        'height_depth_units': 'm',
        'sensor_type': dsattrs['INST_TYPE'],
        '_FillValue': 1e35})
Is that how you deal with it? What about variables that are defined as short? Is it smart enough to cast the 1e35 to float or double, depending on how the variable is declared?
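One hedged way to handle both concerns in xarray (an assumption, not necessarily how stglib actually does it) is to keep NaN in memory and declare the fill value in each variable's encoding dict, choosing a dtype-appropriate fill for shorts rather than letting 1e35 be cast:

```python
import numpy as np
import xarray as xr

# Sketch: per-variable _FillValue via the encoding dict, applied at
# write time; integer ("short") variables get an integer fill.
ds = xr.Dataset(
    {
        "P_1": ("time", np.array([1.0, np.nan, 3.0])),        # float -> 1e35
        "cnt": ("time", np.array([1, 2, 3], dtype="int16")),  # short -> -32768
    }
)
ds["P_1"].encoding["_FillValue"] = 1e35
ds["cnt"].encoding["_FillValue"] = np.int16(-32768)
# ds.to_netcdf(...) would then write the NaNs in P_1 as 1e35 on disk.
print(ds["P_1"].encoding["_FillValue"], ds["cnt"].encoding["_FillValue"])
```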
Thanks!
xmltodict is required when importing stglib.
(https://github.com/dnowacki-usgs/stglib/blob/master/requirements.txt)
If it isn't in one's environment one gets this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-4-ae9bfbeb1548> in <module>()
11 import sys
12 sys.path.append("c:\\projects\\python\\stglib")
---> 13 import xmltodict
14 import stglib
15 get_ipython().run_line_magic('matplotlib', 'inline')
ModuleNotFoundError: No module named 'xmltodict'
It is not part of the IOOS package (http://ioos.github.io/notebooks_demos/other_resources/)
The workaround:
activate IOOS
conda install xmltodict
The permanent solution:
Get xmltodict on the list in ioos.github.io
So that we can create conda environments. See #31.
Line 145 of rsk2cdf.py encounters an error: encoding is not a valid keyword.
if ("instrument_type" in ds.attrs) and (ds.attrs["instrument_type"] == "rbr_duo"):
ds["T_28"] = xr.DataArray(
t["temp"],
coords=[times, samples],
dims=("time", "sample"),
name="Temperature",
attrs={
"units": "C",
"long_name": "Temperature",
"epic_code": 28,
"serial_number": ds.attrs["serial_number"],
},
encoding={"_FillValue": 1e35},
)
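A likely explanation is that xr.DataArray() takes no encoding keyword, so a plausible fix (an assumption, mirroring the snippet above with made-up data) is to set the .encoding dict after construction:

```python
import numpy as np
import xarray as xr

# Sketch: build the variable without encoding=, then set .encoding.
t28 = xr.DataArray(
    np.full((2, 3), 20.0),  # stand-in temperature values
    dims=("time", "sample"),
    name="Temperature",
    attrs={"units": "C", "long_name": "Temperature", "epic_code": 28},
)
t28.encoding["_FillValue"] = 1e35  # instead of encoding={...} in the call
print(t28.encoding["_FillValue"])
```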
Hi Dan,
Merge 143 included a continuous dwave test. Because the data are in a zipped folder, they are being unzipped during the test, and git wants to track the unzipped folder. I guess the only way around this is to submit a PR with the unzipped folder included in the stglib test data folder?
-Bo
For downlookers, is there an option in the config file for opting out of trimming, atmospheric correction, and waves computation? I'm fairly sure the last 3-4 lines are only needed if you're doing atmospheric correction and waves; should they be left or removed? It's odd that they're in the hdr2cdf step, since the waves processing happens in DIWASP. Are they only used to populate attributes (which shouldn't be added if you're not doing waves)?
I'm inclined to say trim_method: 'none', but is there a way to see what the options are? Sorry, I don't know how to query it.
I had a seemingly successful run of an example from MVCO14 with your code, but the matrix data in the -a.nc file is wrong. The program is incredibly fast, though. Output files attached.
(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python runaqdhdr2cdf.py ../../../glob_att1010.txt 10104_config.txt
Loading ASCII files
Insrument orientation: DOWN
Center_first_bin = 0.400000
bin_size = 0.200000
bin_count = 12.000000
User instructed that instrument was pointing DOWN
Time shifted by: 65 s
Finished writing data to 10104aqd-raw.cdf
(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python /Users/emontgomery/python_progs/DNstg/stglib/scripts/runaqdcdf2nc.py 10104aqd-raw.cdf
first burst in full file: 2014-07-01T04:00:00.000000000
last burst in full file: 2014-09-24T17:30:00.000000000
Clipping data using Deployment_date and Recovery_date
first burst in trimmed file: 2014-07-01T04:00:00.000000000
last burst in trimmed file: 2014-09-23T23:50:00.000000000
User instructed that instrument was pointing DOWN
Data are in XYZ coordinates; transforming to Earth coordinates
Rotating heading and horizontal velocities by -14.800000 degrees
Using NON-atmospherically corrected pressure to trim
Done writing netCDF file 10104aqd-a.nc
The resulting -a.nc file opened in ncBrowse and has reasonable-looking data in P_1 and hdg_1215, but u and v both just showed up as black. I opened both in Matlab; both u and v contain all 0's. There is variation in the VEL* variables in the CDF file, so it seems good that far... Not sure where to look for what's wrong; please advise.
The RSK section of code is specific to the RBR DWave and burst data. The Virtuoso Tu has different field names, sampling types, etc. in the raw data file that need to be handled differently than the code is currently structured.
I propose separate files to handle DWave and Virtuoso-Tu data as they are so different.
This seems like something applicable to the mooring, not individual instruments. @emontgomery-usgs any thoughts on this?
Alex and I are processing exo data.
When running runexocsv2cdf.py, Python raises an error message stating KeyError: 'KOR Export File'.
This error message is generated when the read_exo_header function (defined in exo.py) is called.
The read_exo_header function is written to read both old and new KOR export files and uses a try/except statement.
However, the KeyError exception is not included, and Python stops running.
By replacing except pd.ParserError with except KeyError after the 'old KOR file' block, read_exo_header will fully run.
More KeyErrors are generated when the read_exo function (defined in exo.py) applies sensor serial numbers to each sensor.
These errors are raised because of differences in variable names between the old and new KOR file versions.
By adding a few 'try and except' statements with variable names corresponding to the new KOR files, no errors are generated when calling the read_exo function.
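The fallback pattern described above could be sketched like this (the `read_header` helper and the new-format key are hypothetical, not the actual exo.py code):

```python
import pandas as pd

def read_header(meta):
    """Try the old KOR layout first, and fall back to the new layout on
    either a parser error or a missing key. Field names are illustrative."""
    try:
        return meta["KOR Export File"]  # old-format location
    except (pd.errors.ParserError, KeyError):
        return meta["Export File"]  # assumed new-format key

# A new-format header is missing the old key, so the fallback fires.
print(read_header({"Export File": "new-style"}))  # -> new-style
```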
I think these changes are more of a temporary fix and hope to receive some feedback about how to properly modify these functions.
Thanks!
Bo
Hi Dan,
I've got some AQD data that were affected by periods of burial by marsh material. During burial periods the amplitude/AGC is low, and I tried using a cutoff_ampl threshold to fill these periods of burial, but it didn't seem to work. I also couldn't find a function in the AQD cdf2nc.py or aqdutils.py that would fill data by an AGC threshold. Is this located somewhere that I am overlooking?
If it is not in stglib, could I add this function to aqdutils.py as a QAQC option?
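If added, the QA/QC option could be a simple mask, e.g. with xarray's `where` (the variable names, values, and threshold below are illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: mask velocities wherever amplitude/AGC falls below a cutoff,
# so burial periods become NaN (to be written as _FillValue later).
ds = xr.Dataset(
    {
        "u_1205": ("time", np.array([0.1, 0.2, 0.3, 0.4])),
        "AGC_1202": ("time", np.array([80.0, 30.0, 85.0, 25.0])),
    }
)
cutoff_ampl = 40.0  # would come from the config file
ds["u_1205"] = ds["u_1205"].where(ds["AGC_1202"] >= cutoff_ampl)
print(np.isnan(ds["u_1205"].values))  # burial periods are now NaN
```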
Thanks,
Bo
I don't think the LISST functions are able to remove variables from the final outputs, so un-used analoginputs still show up in the final nc files, at the moment. Can the LISST codes be updated so that adding "drop_vars: ['AnalogInput1','AnalogInput2']" to the .yaml file would result in those fields not showing up in the final nc?
Similarly, it might be nice if the command-line output said something about drop_vars, e.g. "Dropping variables x and y from the final nc file."
I tried downloading stglib on Aug 2, 2022, and today (Aug 10) I tried using two scripts (runecocsv2cdf.py and runecocdf2nc.py). runecocdf2nc.py uses utils.py.
My version of utils.py has 1195 lines, and line 1008 reads
deltat = np.asscalar((ds["time"][1] - ds["time"][0]) / np.timedelta64(1, "s")). Python flagged np.asscalar as a problem. I think this has already been fixed, as the version of utils.py on GitHub has 1278 lines, and line 1089 (the new location of the add_delta_t function) no longer uses np.asscalar.
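For reference, np.asscalar was deprecated and eventually removed from NumPy; the same delta-t can be computed with .item() (a sketch of the fixed line, using made-up times):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: the asscalar-free version of the delta-t computation.
ds = xr.Dataset(coords={"time": pd.date_range("2022-08-01", periods=3, freq="10min")})
deltat = ((ds["time"][1] - ds["time"][0]) / np.timedelta64(1, "s")).item()
print(deltat)  # -> 600.0
```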
My conclusion (really Steve's because he was helping me) is that in downloading and setting up with conda I didn't get the updates, since utils.py says it was updated 28 days ago. Might there be an issue here?
Yesterday I got the example config file for Aqd from http://stglib.readthedocs.io/en/latest/config.html, now an exo config file is there. Was this intentional?
I'm trying to figure how to work with xr, and thought this script would help, but don't find read_hobo in grandbay, djnpy or stglib. cvt* doesn't seem to import to get it- where does it live?
could follow the puvq code (I believe some of the functionality has already been added for other instruments)
When looking into issue #77, I noticed some exo variables were not paired with the sensor serial numbers.
For example, Turb, T_28, and S_41 do not have the "sensor_serial_number" attribute after processing to the final .nc file.
I made some changes to read_exo() in a feature branch.
Dan, let me know if I should lump the edits from this issue and issue #77 into one PR.
If you would like to make these changes yourself, that's fine too!
Thanks!
I tried a clock drift correction when processing some Aquadopp data (mean currents only). The drift correction (+15 sec) was applied to the time units attribute in the -raw.cdf file as an offset. Then, when trying to do atmospheric correction (ac) of the pressure data during nc file creation, it failed because there was a mismatch in time between the 'atmpres' variable from atmpres.cdf and 'Pressure' in the aqd-raw.cdf file. Here is a link to the files:
https://github.com/ssuttles-usgs/stglib/tree/clockerr_issue/clockerr_issue
Also, the clock drift should be applied as a linear correction between the time the clock was set before the deployment and the time it was checked after recovery, not as a simple offset.
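A linear correction could be sketched as follows (a hypothetical helper, not stglib's implementation; the dates and drift are made up):

```python
import pandas as pd

def linear_drift_correct(times, t_set, t_check, drift_seconds):
    """Apply clock drift linearly: 0 s at the time the clock was set,
    growing to drift_seconds at the post-recovery check."""
    frac = (times - t_set) / (t_check - t_set)  # 0 -> 1 over the deployment
    return times - pd.to_timedelta(frac * drift_seconds, unit="s")

# Example: a clock found 15 s fast at recovery.
times = pd.date_range("2021-06-01", "2021-09-01", freq="12h")
corrected = linear_drift_correct(times, times[0], times[-1], 15)
print(corrected[-1] - times[-1])  # full -15 s correction at recovery
```

The start of the record is untouched and the correction grows linearly, unlike a constant offset applied to the time units attribute.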
Presently in the stglib workflow, any QA/QC actions on data variables are specified in the config.yaml file, which is ingested as an argument at the first processing step, where the raw instrument data are read and written to a raw.cdf file. It would be desirable to allow QA/QC actions to be specified at later steps in the process, so that the raw.cdf file would not need to be recreated each time. One idea that has been discussed would be to allow a new qaqc.yaml file, containing QA/QC actions, as an optional argument at the step(s) where the .nc files for data release are generated (e.g. runexocdf2nc.py). This could be implemented in a similar way to the optional atmospheric pressure correction argument (--atmpres) that is used to correct submerged pressure data for changes in local atmospheric pressure.
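A sketch of what such an optional qaqc.yaml might contain (the file, the keys, and any --qaqc argument are all hypothetical, not existing stglib features):

```yaml
# Hypothetical qaqc.yaml, passed at the cdf-to-nc step, e.g.:
#   runexocdf2nc.py 1234exo-raw.cdf --qaqc qaqc.yaml
Turb_min: 0
Turb_max: 1500
Turb_bad_ens: [['2021-07-04 12:00', '2021-07-05 00:00']]
SpC_max_diff: 5
```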
KorEXO Software V2.3.10.0 (released Oct 2020) renamed "BGA" to "TAL." They are the same variables, just different nomenclature (blue-green algae vs. total algae). The name change causes some errors when calling various functions from exo.py...
I've made some changes in a branch that seem to have fixed the problem.
-Thanks!