usgs-cmg / stglib
Routines used by the USGS Coastal/Marine Hazards & Resources Program to process oceanographic time-series data
License: Other
The current link is to: https://ioos.github.io/notebooks_demos/other_resources/, which lands you on a 404.
I think the updated link should be: https://ioos.github.io/ioos_code_lab/content/ioos_installation_conda.html
I have been looking for runvecdat2cdf.py. It is mentioned in the documentation, but I can't find it on the repository. I want to process data from a Nortek vector, specifically including analog outputs. But first, I need to find this file. Thank you!
This is something we have done in the past for up-looking ADCP profiles. Now that we are moving on to python, do we still want this? Or do we trim to a depth bin that guarantees all data are included? I'm asking because I don't see such a method here in stglib, maybe I missed it.
If so, there are two approaches:
Vector Analog Input (SeaPoint) data are being output in counts rather than volts, despite unit specification in config file.
@dnowacki-usgs
I'm trying to use methods from utils.py in core in a new ipynb, and followed the syntax in cdf2nc.py, but get errors. Please set me straight- thanks!
ValueError Traceback (most recent call last)
in ()
1 import xarray as xr
2 import stglib
----> 3 from ..core import utils
ValueError: attempted relative import beyond top-level package
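The error above is expected: relative imports like `from ..core import utils` only work from modules that live inside the package, never from a top-level script or notebook. A minimal sketch of the difference, using a throw-away package (`mylib` here is just a stand-in for stglib):

```python
import os
import sys
import tempfile

# Build a tiny throw-away package (mylib/core/utils.py) to demonstrate
# why the relative form fails at the notebook level.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mylib", "core")
os.makedirs(pkg)
open(os.path.join(root, "mylib", "__init__.py"), "w").close()
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("def hello():\n    return 'hi'\n")

sys.path.insert(0, root)

# A relative import is only legal *inside* a package module; at the top
# level there is no parent package, so Python raises an error.
try:
    exec("from ..core import utils")  # what the notebook tried
except (ImportError, ValueError):
    relative_failed = True

# The absolute form works anywhere once the package is on sys.path.
from mylib.core import utils
print(utils.hello())
```

So in the notebook, `from stglib.core import utils` (after `import stglib`) should work where the relative form does not.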
I believe you said the documentation was mostly auto-generated from comments in the code, so I'm not sure where or how to add content like "4/19/18: only works on up-looking non-HR instruments". I tried clicking the "Edit on GitHub" link, and the file it was looking for was not found.
It would also be good to put something like this in the processing overview: "the only modification needed to the programs is changing sys.path.insert to the local path to stglib. After that's done, the programs should run without needing additional modification."
Sequoia Scientific's LISST instrument is used by a handful of folks at USGS; it would be great to bring it in to stglib. As we start to add it, here's where we can discuss problems, questions, etc.
This should be for all data with a depth. From Ellyn:
For EXO and dwave data, this should contain the nominal measurement depth; we normally use WATER_DEPTH - initial_instrument_height. You could use a pressure-based depth from each if you prefer; just add an attribute to the variable saying where it came from.
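As a sketch of the convention Ellyn describes (the variable name, attribute names, and provenance note below are illustrative, not necessarily stglib's actual ones):

```python
import numpy as np
import xarray as xr

# Sketch: derive the nominal measurement depth from the global attributes
# and record where the value came from, per the convention above.
ds = xr.Dataset(attrs={"WATER_DEPTH": 10.0, "initial_instrument_height": 1.5})
ds["depth"] = xr.DataArray(
    np.atleast_1d(ds.attrs["WATER_DEPTH"] - ds.attrs["initial_instrument_height"]),
    dims="depth",
    attrs={
        "units": "m",
        "note": "Computed as WATER_DEPTH - initial_instrument_height",  # provenance
    },
)
print(float(ds["depth"][0]))  # -> 8.5
```

If a pressure-based depth were used instead, only the `note` attribute would change.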
The EXO seems to have the ability to remove outputs from the final nc file, via the "drop_vars" keyword in the config file:
https://stglib.readthedocs.io/en/latest/config.html#exo
Could that function be added to other instruments? Specifically, I'd like to output a vector file that does not include velocities - we used the vector simply to log data from an external sensor (the LISST-ABS), so we want the AnalogInput field, but not the velocities.
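A sketch of what such an option could do for a Vector dataset, using xarray's `drop_vars` (the variable names and config key below are illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: keep AnalogInput but drop the velocity variables, driven by a
# drop_vars list as it might appear in the instrument's config file.
ds = xr.Dataset(
    {
        "u_1205": ("time", np.zeros(4)),
        "v_1206": ("time", np.zeros(4)),
        "AnalogInput": ("time", np.ones(4)),
    }
)
config = {"drop_vars": ["u_1205", "v_1206"]}  # hypothetical yaml contents
ds = ds.drop_vars(config["drop_vars"], errors="ignore")
print(list(ds.data_vars))  # -> ['AnalogInput']
```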
gbts.read_nerrs() reads a csv file containing atmospheric data to use in correcting pressure data. Not all atmospheric data will be in the same format, but columns could be organized to match a spec if one were given. I'm trying to run this chunk of code on data from West Falmouth Harbor; the columns of this data are sample #, date/time, press, temp, followed by 5 empty columns at the end. Since the csv file referenced isn't provided, it's hard to know how to reformat the data.
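For illustration only, a hedged guess at reading a file with that column layout using pandas (the column names, positions, and sample values are assumptions, not the actual read_nerrs spec):

```python
import io
import pandas as pd

# Two made-up rows shaped like the West Falmouth data described above:
# sample #, date/time, press, temp, then five empty trailing columns.
raw = io.StringIO(
    "1,2015-06-01 00:00,1013.2,18.4,,,,,\n"
    "2,2015-06-01 00:06,1013.1,18.5,,,,,\n"
)
df = pd.read_csv(
    raw,
    header=None,
    usecols=[1, 2],  # keep only the date/time and pressure columns
    names=["sample", "time", "atmpres", "temp"] + [f"x{i}" for i in range(5)],
    parse_dates=["time"],
)
print(df["atmpres"].tolist())  # -> [1013.2, 1013.1]
```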
So, the LISST-ABS requires a short warm up period before the data it collects is good. The docs for the instrument say it should be about 30 seconds, although it seems to be closer to 3 seconds based on inspecting the data. Is there a flag (or could one be added) to remove a certain number of pings at the start of each burst? I imagine this would live in the yaml file for each instrument. To start, it would be great if this flag worked for a vector (we've been using a vector to log the LISST-ABS), although maybe eventually other instruments would also like to have it.
Specifically, I imagine the flag is something like "n_warmup_pings", and it means that stglib ignores the data from the first "n_warmup_pings" during each burst. So the data output by stglib includes pings n_warmup_pings +1 to the total number of pings collected each burst.
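The proposed flag could be sketched like this (`n_warmup_pings` does not exist in stglib yet; it is the suggestion above, and the variable name is illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: drop the first n_warmup_pings samples of every burst by
# slicing along the sample dimension of a (time, sample) burst file.
ds = xr.Dataset(
    {"AnalogInput": (("time", "sample"), np.arange(12).reshape(3, 4))}
)
n_warmup_pings = 2  # would come from the instrument's yaml config
ds = ds.isel(sample=slice(n_warmup_pings, None))
print(ds["AnalogInput"].shape)  # -> (3, 2)
```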
@dnowacki-usgs
I'm still trying to use examples from your code and not getting it quite right... In this gist https://gist.github.com/emontgomery-usgs/18741fdd410668beaece811f7340ee9d, you'll see the code I'm trying to use to convert a file Marinna wrote using Xarray to EPIC. The content seems right, except that it doesn't have time as unlimited. I'm sure you've run into this, and know the magic that I've missed.
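The missing magic may simply be the `unlimited_dims` argument to `to_netcdf`, which writes time as the record (UNLIMITED) dimension; a minimal sketch (variable names illustrative, written in-memory here so no file is left behind):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: ask xarray to make time the UNLIMITED/record dimension at
# write time via unlimited_dims.
ds = xr.Dataset(
    {"P_1": ("time", np.random.rand(5))},
    coords={"time": pd.date_range("2018-01-01", periods=5, freq="h")},
)
nc_bytes = ds.to_netcdf(unlimited_dims=["time"])  # in-memory netCDF here
print(type(nc_bytes))
```

Writing to a path works the same way: `ds.to_netcdf("out.nc", unlimited_dims=["time"])`.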
I didn't find the Grand Bay module (gbts) in the repo, and it's needed in the aqd_make_press_ac.ipynb example, so it fails on the "import gbts" line. I'm not sure where plotly comes from, but it doesn't import successfully on my computer either. Perhaps it should go in the env, so python will know it needs it?
EXO variable names were changed in the new KOR software. Updates were made to the read_exo and read_exo_header functions to accommodate these variable name changes. ds_rename_vars was not updated, which is limiting QA/QC for those variables with name changes and excluding attributes in the .nc file.
After line 291 (in exo.py), I suggest adding "Chlorophyll_ug_per_L": "Fch_906", "BGA_PE_RFU": "BGAPErfu", "BGA_PE_ug_per_L": "BGAPE".
Thank you!
This came up today - most of the time now we want to make atmospheric pressure corrections to the recorded pressure data from most of the instruments processed by this library. Some might consider such a correction interpretive - depending on the location of the deployment. Either way, the information necessary to make the decision to correct needs to be available to the code. Some of this is already provided for in .yml files.
The Aquadopp and EXO .yml files have
zeroed_pressure: 'Yes' # was pressure zeroed before deployment
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
The dwave has
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
and should have a zeroed pressure note.
We should record the date, time, time zone, and location where the zeroing was done (if it was done), so that the atmospheric pressure can be looked up, and perhaps even the local measurement of atmospheric pressure along with the tag of the weather station used (Chatham, say, is KCQX).
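A sketch of how that metadata might look in the .yml (every key below except zeroed_pressure and P_1ac_note is hypothetical, not an existing stglib attribute):

```yaml
zeroed_pressure: 'Yes'  # was pressure zeroed before deployment
# Suggested additional metadata (hypothetical keys, not yet in stglib):
zeroed_pressure_datetime: '2021-05-03 14:22 UTC'
zeroed_pressure_location: 'Woods Hole, MA dock'
zeroed_pressure_atmpres: 1016.8  # mbar, local barometer reading at zeroing
zeroed_pressure_atmpres_station: 'KCQX'  # e.g., Chatham, MA
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
```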
Tell me what you think!
I am processing dwave data. runrskcdf2nc.py (which calls cdf_to_nc) generates what should be an EPIC-compliant file, and that file contains time_cf, time, time_2, epic_time, and epic_time2.
In that configuration, we don't need epic_time*.
One could argue that going forward, we should just produce files with time only, in CF convention. The EPIC times are a pain in python anyway; these are burst files with (time, sample) dimensions that get rejected by xarray: 'time' has more than 1-dimension and the same name as one of its dimensions ('time', 'sample'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
It's not clear how this is happening. Maybe because in cdf2nc.py both ds = utils.create_epic_times(ds) and ds = utils.create_2d_time(ds) are applied? We could comment out the utils.create_epic_times(ds) call on line 42, but I'll bet there is too much other code dependent on it, so we may want a keyword argument to control this.
Marinna
Processing RBR D|wave burst pressure data using the runrskcsv2cdf.py script fails if there are incomplete (or irregular) bursts, except when the irregular burst is the last one in the burst.txt file. The existing code relies on all bursts except the last being exactly the length specified in the samples_per_burst attribute, which it uses to reshape the burst data into (time, sample) dimensions. I have encountered data with an irregular burst in the middle of the deployment but good bursts otherwise, and the only way to get the good data to process was to manually remove the bad burst(s) from the burst.txt file.
I suggest using the burst counter and time stamp in the burst.txt file to check the consistency of each burst, and if a bad burst is encountered, fill (or trim) the missing (or extra) values and proceed. Also look for any unexpected events in the events.txt file, and if any are encountered, warn the user to investigate potential issues with the deployment.
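The suggested pad/trim step could be sketched as follows (a hypothetical helper, not stglib code; it uses a per-sample burst id where the real code would use the burst counter from burst.txt):

```python
import numpy as np

def regularize_bursts(samples, burst_ids, samples_per_burst):
    """Pad short bursts with NaN and trim long ones so every burst has
    exactly samples_per_burst values, instead of assuming all bursts but
    the last are complete."""
    out = []
    for b in np.unique(burst_ids):
        burst = samples[burst_ids == b][:samples_per_burst]  # trim extras
        pad = samples_per_burst - burst.size
        if pad > 0:  # fill missing values with NaN
            burst = np.concatenate([burst, np.full(pad, np.nan)])
        out.append(burst)
    return np.vstack(out)  # shape (n_bursts, samples_per_burst)

# An irregular middle burst: burst 2 has only 2 of 3 samples.
vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
ids = np.array([1, 1, 1, 2, 2, 3, 3, 3])
burst_matrix = regularize_bursts(vals, ids, 3)
print(burst_matrix.shape)  # -> (3, 3)
```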
I know that stglib is under constant development, but the philosophy of conda-forge is release early, release often. Would be nice to have it there, if only to do conda install stglib --only-deps
Hi Dan,
I have two quick questions:
Could you please add import xarray as xr and import numpy as np to the beginning of nc2waves.py? Those are needed when handling continuous data and using make_wave_bursts.
Would you be ok with adding the reindexing by nearest method with a 10-minute tolerance back for atmos_correct?
I am noticing that when processing continuous dwave data, the atmospheric pressure correction is not working properly. Over at the WH office, we are using a Jupyter Notebook to create the atmpres.cdf file, and the Jupyter Notebook is not able to create a reindexed .cdf file on the 4 Hz continuous time base (at least not in a reasonable time). I have been relying on the run script to do the re-indexing using the nearest method.
I see how this could be an issue because most atmpres.cdf files are reindexed when created in the Jupyter Notebook. That atmospheric pressure data would then be reindexed twice, but I don't think the second reindex by the script would actually change the time if it's already been matched up in the Jupyter Notebook, right? One way to avoid this would be to add an if/elif statement to atmos_correct that only re-indexes if the data are from a continuous dwave; all other atmpres.cdf files would not be reindexed.
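The reindexing being requested can be sketched with xarray's `reindex` (the station data, times, and variable name below are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: match 6-minute met-station pressure onto the instrument time
# base by nearest neighbor, with a 10-minute tolerance so larger gaps
# become NaN instead of stale values.
met = xr.DataArray(
    np.array([1013.0, 1013.5, 1014.0]),
    coords={"time": pd.date_range("2021-01-01 00:00", periods=3, freq="6min")},
    dims="time",
    name="atmpres",
)
inst_time = pd.date_range("2021-01-01 00:00", periods=4, freq="5min")
atmpres = met.reindex(
    time=inst_time, method="nearest", tolerance=pd.Timedelta("10min")
)
print(atmpres.values)  # each instrument time gets the nearest met value
```

Re-running the same reindex on already-matched times is a no-op, which supports the point above that a second reindex should not change anything.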
Hopefully all this made sense, if not I can elaborate...
Thanks
Which ones don't?
@dnowacki-usgs, @mmartini-usgs
One of the approaches to using python for our code in the future is using xarray for everything (so time is CF), then at the end, convert back to EPIC. Dan, you've already written most of this I think, but I'm stumped on some of the details.
I'm using a file MM wrote with xarray to test. In that file, _FillValue for all variables is NaN, which doesn't match our convention. In the files I've reviewed that were generated with your code, _FillValue is correct. Do you avoid having the wrong value from the get-go using some xarray argument, or did you write a replace_nan_fillvalue function that I haven't found?
In utils I found ds_add_attributes(), which has this:

def add_attributes(var, dsattrs):
    var.attrs.update({
        'serial_number': dsattrs['serial_number'],
        'initial_instrument_height': dsattrs['initial_instrument_height'],
        'nominal_instrument_depth': dsattrs['nominal_instrument_depth'],
        'height_depth_units': 'm',
        'sensor_type': dsattrs['INST_TYPE'],
        '_FillValue': 1e35})
Is that how you deal with it? What about variables that are defined as short? Is it smart enough to cast the 1e35 to float or double, depending on how the variable is declared?
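One hedged way to handle both concerns in xarray (an assumption, not necessarily how stglib actually does it) is to keep NaN in memory and declare the fill value in each variable's encoding dict, choosing a dtype-appropriate fill for shorts rather than letting 1e35 be cast:

```python
import numpy as np
import xarray as xr

# Sketch: per-variable _FillValue via the encoding dict, applied at
# write time; integer ("short") variables get an integer fill.
ds = xr.Dataset(
    {
        "P_1": ("time", np.array([1.0, np.nan, 3.0])),        # float -> 1e35
        "cnt": ("time", np.array([1, 2, 3], dtype="int16")),  # short -> -32768
    }
)
ds["P_1"].encoding["_FillValue"] = 1e35
ds["cnt"].encoding["_FillValue"] = np.int16(-32768)
# ds.to_netcdf(...) would then write the NaNs in P_1 as 1e35 on disk.
print(ds["P_1"].encoding["_FillValue"], ds["cnt"].encoding["_FillValue"])
```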
Thanks!
xmltodict is required when importing stglib.
(https://github.com/dnowacki-usgs/stglib/blob/master/requirements.txt)
If it isn't in one's environment one gets this error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-4-ae9bfbeb1548> in <module>()
11 import sys
12 sys.path.append("c:\\projects\\python\\stglib")
---> 13 import xmltodict
14 import stglib
15 get_ipython().run_line_magic('matplotlib', 'inline')
ModuleNotFoundError: No module named 'xmltodict'
It is not part of the IOOS package (http://ioos.github.io/notebooks_demos/other_resources/)
The workaround:
activate IOOS
conda install xmltodict
The permanent solution:
Get xmltodict on the list in ioos.github.io
So that we can create conda environments. See #31.
Line 145 of rsk2cdf.py encounters an error: encoding is not a valid keyword.
if ("instrument_type" in ds.attrs) and (ds.attrs["instrument_type"] == "rbr_duo"):
ds["T_28"] = xr.DataArray(
t["temp"],
coords=[times, samples],
dims=("time", "sample"),
name="Temperature",
attrs={
"units": "C",
"long_name": "Temperature",
"epic_code": 28,
"serial_number": ds.attrs["serial_number"],
},
encoding={"_FillValue": 1e35},
)
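A likely explanation is that xr.DataArray() takes no encoding keyword, so a plausible fix (an assumption, mirroring the snippet above with made-up data) is to set the .encoding dict after construction:

```python
import numpy as np
import xarray as xr

# Sketch: build the variable without encoding=, then set .encoding.
t28 = xr.DataArray(
    np.full((2, 3), 20.0),  # stand-in temperature values
    dims=("time", "sample"),
    name="Temperature",
    attrs={"units": "C", "long_name": "Temperature", "epic_code": 28},
)
t28.encoding["_FillValue"] = 1e35  # instead of encoding={...} in the call
print(t28.encoding["_FillValue"])
```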
Hi Dan,
Merge 143 included a continuous dwave test. Because the data are in a zipped folder, they are being unzipped during the test, and git wants to track the unzipped folder. I guess the only way around this is to submit a PR with the unzipped folder included in the stglib test data folder?
-Bo
For downlookers, is there an option in the config file for opting out of trimming, atmospheric correction, and waves computation? I'm fairly sure the last 3-4 lines are only needed if you're doing atmospheric correction and waves; should they be left or removed? It's odd that they're in the hdr2cdf step, since the waves processing happens in DIWASP. Are they only used to populate attributes (which shouldn't be added if you're not doing waves)?
I'm inclined to say trim_method: 'none', but is there a way to see what the options are? Sorry, I don't know how to query it.
I had a seemingly successful run of an example from MVCO14 with your code, but the matrix data in the -a.nc file is wrong. The program is incredibly fast, though. Output files attached.
(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python runaqdhdr2cdf.py ../../../glob_att1010.txt 10104_config.txt
Loading ASCII files
Insrument orientation: DOWN
Center_first_bin = 0.400000
bin_size = 0.200000
bin_count = 12.000000
User instructed that instrument was pointing DOWN
Time shifted by: 65 s
Finished writing data to 10104aqd-raw.cdf
(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python /Users/emontgomery/python_progs/DNstg/stglib/scripts/runaqdcdf2nc.py 10104aqd-raw.cdf
first burst in full file: 2014-07-01T04:00:00.000000000
last burst in full file: 2014-09-24T17:30:00.000000000
Clipping data using Deployment_date and Recovery_date
first burst in trimmed file: 2014-07-01T04:00:00.000000000
last burst in trimmed file: 2014-09-23T23:50:00.000000000
User instructed that instrument was pointing DOWN
Data are in XYZ coordinates; transforming to Earth coordinates
Rotating heading and horizontal velocities by -14.800000 degrees
Using NON-atmospherically corrected pressure to trim
Done writing netCDF file 10104aqd-a.nc
The resulting -a.nc file opened in ncBrowse and has reasonable-looking data in P_1 and hdg_1215, but u and v both just showed up as black. I opened both in Matlab; both u and v contain all 0's. There is variation in the VEL* variables in the CDF file, so it seems good that far... Not sure where to look for what's wrong; please advise.
The RSK section of code is specific to the RBR DWave and burst data. The Virtuoso Tu has different field names, sampling types, etc. in the raw data file that need to be handled differently than the code is currently structured.
I propose separate files to handle DWave and Virtuoso-Tu data as they are so different.
This seems like something applicable to the mooring, not individual instruments. @emontgomery-usgs any thoughts on this?
Alex and I are processing exo data.
When running runexocsv2cdf.py, Python raises an error message stating KeyError: 'KOR Export File'.
This error message is generated when the read_exo_header function (defined in exo.py) is called.
The read_exo_header function is written to read both old and new KOR export files and uses a try/except statement.
However, the KeyError exception is not included, and Python stops running.
By replacing except pd.ParserError with except KeyError after the 'old KOR file' block, read_exo_header will fully run.
More KeyErrors are generated when the read_exo function (defined in exo.py) applies sensor serial numbers to each sensor.
These errors are raised because of differences in variable names between the old and new KOR file versions.
By adding a few 'try and except' statements with variable names corresponding to the new KOR files, no errors are generated when calling the read_exo function.
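The fallback pattern described above could be sketched like this (the `read_header` helper and the new-format key are hypothetical, not the actual exo.py code):

```python
import pandas as pd

def read_header(meta):
    """Try the old KOR layout first, and fall back to the new layout on
    either a parser error or a missing key. Field names are illustrative."""
    try:
        return meta["KOR Export File"]  # old-format location
    except (pd.errors.ParserError, KeyError):
        return meta["Export File"]  # assumed new-format key

# A new-format header is missing the old key, so the fallback fires.
print(read_header({"Export File": "new-style"}))  # -> new-style
```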
I think these changes are more of a temporary fix and hope to receive some feedback about how to properly modify these functions.
Thanks!
Bo
Hi Dan,
I've got some AQD data that were affected by periods of burial by marsh material. During burial periods the amplitude/AGC is low, and I tried using a cutoff_ampl threshold to fill these periods of burial, but it didn't seem to work. I also couldn't find a function in the AQD cdf2nc.py or aqdutils.py that would fill data by an AGC threshold. Is this located somewhere that I am overlooking?
If it is not in stglib, could I add this function to aqdutils.py as a QAQC option?
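If added, the QA/QC option could be a simple mask, e.g. with xarray's `where` (the variable names, values, and threshold below are illustrative):

```python
import numpy as np
import xarray as xr

# Sketch: mask velocities wherever amplitude/AGC falls below a cutoff,
# so burial periods become NaN (to be written as _FillValue later).
ds = xr.Dataset(
    {
        "u_1205": ("time", np.array([0.1, 0.2, 0.3, 0.4])),
        "AGC_1202": ("time", np.array([80.0, 30.0, 85.0, 25.0])),
    }
)
cutoff_ampl = 40.0  # would come from the config file
ds["u_1205"] = ds["u_1205"].where(ds["AGC_1202"] >= cutoff_ampl)
print(np.isnan(ds["u_1205"].values))  # burial periods are now NaN
```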
Thanks,
Bo
I don't think the LISST functions are able to remove variables from the final outputs, so un-used analoginputs still show up in the final nc files, at the moment. Can the LISST codes be updated so that adding "drop_vars: ['AnalogInput1','AnalogInput2']" to the .yaml file would result in those fields not showing up in the final nc?
Similarly, it might be nice if the command-line output said something about drop_vars, e.g. "Dropping variables x and y from the final nc file."
I tried downloading stglib on Aug 2, 2022, and today (Aug 10) I tried using two scripts (runecocsv2cdf.py and runecocdf2nc.py). runecocdf2nc.py uses utils.py.
My version of utils.py has 1195 lines, and line 1008 reads
deltat = np.asscalar((ds["time"][1] - ds["time"][0]) / np.timedelta64(1, "s")). Python flagged np.asscalar as a problem. I think this has already been fixed, as the version of utils.py on GitHub has 1278 lines, and line 1089 (the new location of the add_delta_t function) no longer uses np.asscalar.
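For reference, np.asscalar was deprecated and eventually removed from NumPy; the same delta-t can be computed with .item() (a sketch of the fixed line, using made-up times):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sketch: the asscalar-free version of the delta-t computation.
ds = xr.Dataset(coords={"time": pd.date_range("2022-08-01", periods=3, freq="10min")})
deltat = ((ds["time"][1] - ds["time"][0]) / np.timedelta64(1, "s")).item()
print(deltat)  # -> 600.0
```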
My conclusion (really Steve's because he was helping me) is that in downloading and setting up with conda I didn't get the updates, since utils.py says it was updated 28 days ago. Might there be an issue here?
Yesterday I got the example config file for Aqd from http://stglib.readthedocs.io/en/latest/config.html, now an exo config file is there. Was this intentional?
I'm trying to figure how to work with xr, and thought this script would help, but don't find read_hobo in grandbay, djnpy or stglib. cvt* doesn't seem to import to get it- where does it live?
could follow the puvq code (I believe some of the functionality has already been added for other instruments)
When looking into issue #77, I noticed some exo variables were not paired with the sensor serial numbers.
For example, Turb, T_28, and S_41 do not have the "sensor_serial_number" attribute after processing to the final .nc file.
I made some changes to read_exo() in a feature branch.
Dan, let me know if I should lump the edits from this issue and issue #77 into one PR.
If you would like to make these changes yourself, that's fine too!
Thanks!
I tried a clock drift correction when processing some Aquadopp data (mean currents only). The drift correction (+15 sec) was applied to the time units attribute in the -raw.cdf file as an offset. Then, when trying to do atmospheric correction (ac) of the pressure data during nc file creation, it failed because there was a mismatch in time between the 'atmpres' variable from atmpres.cdf and 'Pressure' in the aqd-raw.cdf file. Here is a link to the files:
https://github.com/ssuttles-usgs/stglib/tree/clockerr_issue/clockerr_issue
Also, the clock drift should be applied as a linear correction between the time the clock was set before the deployment and the time it was checked after recovery, not as a simple offset.
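A linear correction could be sketched as follows (a hypothetical helper, not stglib's implementation; the dates and drift are made up):

```python
import pandas as pd

def linear_drift_correct(times, t_set, t_check, drift_seconds):
    """Apply clock drift linearly: 0 s at the time the clock was set,
    growing to drift_seconds at the post-recovery check."""
    frac = (times - t_set) / (t_check - t_set)  # 0 -> 1 over the deployment
    return times - pd.to_timedelta(frac * drift_seconds, unit="s")

# Example: a clock found 15 s fast at recovery.
times = pd.date_range("2021-06-01", "2021-09-01", freq="12h")
corrected = linear_drift_correct(times, times[0], times[-1], 15)
print(corrected[-1] - times[-1])  # full -15 s correction at recovery
```

The start of the record is untouched and the correction grows linearly, unlike a constant offset applied to the time units attribute.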
Presently in the stglib workflow, any QA/QC actions on data variables are specified in the config.yaml file, which is ingested as an argument at the first processing step, where the raw instrument data are read and written to a raw.cdf file. It would be desirable to allow QA/QC actions to be specified at later steps in the process, so that the raw.cdf file would not need to be recreated each time. One idea that has been discussed would be to allow a new qaqc.yaml file, containing QA/QC actions, as an optional argument at the step(s) where the .nc files for data release are generated (e.g. runexocdf2nc.py). This could be implemented in a similar way to the optional atmospheric pressure correction argument (--atmpres) that is used to correct submerged pressure data for changes in local atmospheric pressure.
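A sketch of what such an optional qaqc.yaml might contain (the file, the keys, and any --qaqc argument are all hypothetical, not existing stglib features):

```yaml
# Hypothetical qaqc.yaml, passed at the cdf-to-nc step, e.g.:
#   runexocdf2nc.py 1234exo-raw.cdf --qaqc qaqc.yaml
Turb_min: 0
Turb_max: 1500
Turb_bad_ens: [['2021-07-04 12:00', '2021-07-05 00:00']]
SpC_max_diff: 5
```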
KorEXO Software V2.3.10.0 (released Oct 2020) renamed "BGA" to "TAL." They are the same variables, just different nomenclature (blue-green algae vs. total algae). The name change causes some errors when calling various functions from exo.py...
I've made some changes in a branch that seem to have fixed the problem.
-Thanks!