stglib's Issues

where do run scripts live?

I have been looking for runvecdat2cdf.py. It is mentioned in the documentation, but I can't find it in the repository. I want to process data from a Nortek Vector, specifically including the analog outputs, but first I need to find this file. Thank you!

depth trimming using a surface following algorithm

This is something we have done in the past for up-looking ADCP profiles. Now that we are moving to python, do we still want this? Or do we trim to a depth bin that guarantees all data are included? I'm asking because I don't see such a method here in stglib; maybe I missed it.

If so, there are two approaches (a sketch of the first follows the list):

  1. Use pressure to determine the surface location. This assumes the pressure measurement is good; note that older ADCP data will not have pressure, but maybe that data is old enough that we don't care.
  2. Use the backscatter peak, preferably from the center beam. This has been written in MATLAB already. Does it exist in python? Should it be written?
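
For the pressure-based approach, a minimal sketch, assuming an up-looking profile dataset with a bindist coordinate (each bin's distance from the transducer, in m) and pressure in dbar treated as meters of water; all the variable names here are assumptions:

```python
def trim_bins_by_pressure(ds, pvar="P_1"):
    """Mask profile bins above the instantaneous sea surface (up-looker).

    ds: xarray Dataset; `bindist`, `P_1`, and the velocity names are
    assumed, not confirmed stglib conventions.
    """
    # A bin is above the surface when its distance from the up-looking
    # transducer exceeds the water level measured by the pressure sensor.
    above_surface = ds["bindist"] > ds[pvar]
    for var in ["u_1205", "v_1206", "w_1204"]:
        if var in ds:
            ds[var] = ds[var].where(~above_surface)
    return ds
```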

syntax to import core/utils?

@dnowacki-usgs
I'm trying to use methods from utils.py (in core) in a new ipynb. I followed the syntax in cdf2nc.py but get errors. Please set me straight; thanks!

ValueError                                Traceback (most recent call last)
in ()
      1 import xarray as xr
      2 import stglib
----> 3 from ..core import utils

ValueError: attempted relative import beyond top-level package
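
(For reference: relative imports like from ..core import utils only work from inside the package itself, which is why they raise ValueError in a notebook. The absolute form should work; a minimal sketch:)

```python
import stglib
from stglib.core import utils  # absolute import works outside the package

# `from ..core import utils` is only valid inside stglib's own modules,
# where Python knows the importing module's package context
```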

how to edit content on http://stglib.readthedocs.io/en/latest/aqd.html

I believe you said the documentation was mostly auto-generated from comments in the code, so I'm not sure where or how to add content like "4/19/18 - only works on up-looking non-HR instruments". I tried clicking the "Edit on GitHub" link, and the file it was looking for was not found.

It would also be good to put something like this in the processing overview: "the only modification needed to the programs is changing sys.path.insert to the local path to stglib. After that's done, the programs should run without needing additional modification."

adding LISST to stglib

Sequoia Scientific's LISST instrument is used by a handful of folks at USGS; it would be great to bring it into stglib. As we start to add it, here's where we can discuss problems, questions, etc.

depth variable/dimension

This should apply to all data with a depth. From Ellyn:

For exo and dwave data that contain the nominal measurement depth, we normally use WATER_DEPTH - initial_instrument_height. You could use a pressure-based depth from each if you prefer; just add an attribute to the variable saying where it came from.
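
A minimal sketch of assigning a nominal depth this way (the attribute names follow the quote above; the variable layout is an assumption):

```python
import xarray as xr

# nominal measurement depth from the global attributes, with a note
# recording how it was computed
ds["depth"] = xr.DataArray(
    [ds.attrs["WATER_DEPTH"] - ds.attrs["initial_instrument_height"]],
    dims="depth",
    attrs={
        "units": "m",
        "positive": "down",
        "note": "nominal depth computed as WATER_DEPTH - initial_instrument_height",
    },
)
```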

atmospheric file format description someplace?

gbts.read_nerrs() reads a csv file containing atmospheric data to use in correcting pressure data. Not all atmospheric data will be in the same format, but columns could be organized to match a spec if one were given. I'm trying to run this chunk of code on data from West Falmouth Harbor; the columns of this data are sample #, date time, press, temp, followed by 5 empty columns. Since the csv file referenced isn't provided, it's hard to know how to reformat the data.
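
In the absence of a spec, a hypothetical sketch of reading the West Falmouth Harbor layout described above (the filename and column names are assumptions):

```python
import pandas as pd

df = pd.read_csv(
    "west_falmouth_atmos.csv",
    header=0,                      # assumes one header row; use None if absent
    names=["sample", "datetime", "press", "temp"],
    usecols=range(4),              # ignore the five trailing empty columns
    parse_dates=["datetime"],
)
```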

remove a number of pings at the start of each burst

So, the LISST-ABS requires a short warm-up period before the data it collects are good. The docs for the instrument say it should be about 30 seconds, although it seems closer to 3 seconds based on inspecting the data. Is there a flag (or could one be added) to remove a certain number of pings at the start of each burst? I imagine this would live in the yaml file for each instrument. To start, it would be great if this flag worked for a Vector (we've been using a Vector to log the LISST-ABS), although maybe eventually other instruments would also like to have it.

Specifically, I imagine the flag is something like n_warmup_pings, meaning that stglib ignores the data from the first n_warmup_pings of each burst. So the data output by stglib would include pings n_warmup_pings + 1 through the total number of pings collected each burst.
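
A minimal sketch of the behavior, assuming a (time, sample) burst layout and that the hypothetical n_warmup_pings yaml entry ends up in the dataset attributes:

```python
# drop the first n_warmup_pings samples of every burst
if "n_warmup_pings" in ds.attrs:
    n = int(ds.attrs["n_warmup_pings"])
    ds = ds.isel(sample=slice(n, None))
```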

ipynb examples need help- gbts and plotly not imported

I didn't find Grand Bay/py in the repo, and it's needed in the aqd_make_press_ac.ipynb example, so it fails on the import gbts line. I'm not sure where plotly comes from, but it doesn't import successfully on my computer either. Perhaps it should go in the environment file, so python will know it needs it?

Update new variable names in exo.py

EXO variable names were changed in the new KOR software. Updates were made to the read_exo and read_exo_header functions to accommodate these variable name changes, but ds_rename_vars was not updated, which is limiting QA/QC for the variables with name changes and excluding their attributes from the .nc file.

After line 291 (in exo.py), I suggest adding "Chlorophyll_ug_per_L": "Fch_906", "BGA_PE_RFU": "BGAPErfu", "BGA_PE_ug_per_L": "BGAPE" (see the sketch below).
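
For concreteness, the proposed entries in context (the surrounding dict name and structure are assumptions about exo.py's rename mapping):

```python
varnames = {
    # ... existing rename entries ...
    "Chlorophyll_ug_per_L": "Fch_906",
    "BGA_PE_RFU": "BGAPErfu",
    "BGA_PE_ug_per_L": "BGAPE",
}
```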

Thank you!

tracking the zeroing of pressure sensors

This came up today: most of the time now we want to make atmospheric pressure corrections to the recorded pressure data from most of the instruments processed by this library. Some might consider such a correction interpretive, depending on the location of the deployment. Either way, the information necessary to make the decision to correct needs to be available to the code. Some of this is already provided for in .yml files.

The Aquadopp and EXO .yml files have

zeroed_pressure: 'Yes' # was pressure zeroed before deployment
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'

The dwave has only

P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'

and should also have a zeroed-pressure note.

We should add the date, time, time zone, and location where the zeroing was done (if it was done), so that the atmospheric pressure at that moment can be looked up, and perhaps even the local measurement of atmospheric pressure along with the tag of the weather station used (Chatham, say, is KCQX).
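
A hypothetical sketch of what the expanded .yml metadata could look like (every key beyond zeroed_pressure is a proposal, not an existing stglib attribute):

```yaml
zeroed_pressure: 'Yes'                     # was pressure zeroed before deployment
zeroed_pressure_time: '2018-04-19 14:32'   # when the zeroing was done
zeroed_pressure_tz: 'UTC'                  # time zone of the zeroing time
zeroed_pressure_location: 'Woods Hole, MA'
zeroed_pressure_atmos_mbar: 1016.2         # local atmospheric pressure at zeroing
zeroed_pressure_station: 'KCQX'            # weather station used for the lookup
```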

Tell me what you think!

too many times in Dwave burst file

I am processing DWave data.

runrskcdf2nc.py (which calls cdf_to_nc) generates what should be an EPIC-compliant file, and that file contains time_cf, time, time_2, epic_time, and epic_time2.

In that configuration, we don't need epic_time*

One could argue that going forward we should just produce files with time only, in CF convention. The EPIC times are a pain in python anyway: these are burst files with (time, sample) dimensions, which xarray rejects with 'time' has more than 1-dimension and the same name as one of its dimensions ('time', 'sample'); xarray disallows such variables because they conflict with the coordinates used to label dimensions.

It's not clear how this is happening. Maybe because in cdf2nc.py both ds = utils.create_epic_times(ds) and ds = utils.create_2d_time(ds) are applied? We could comment out ds = utils.create_epic_times(ds) on line 42, but I'll bet there is too much other code that depends on it, so we may want a keyword argument to control this (a sketch follows).
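
A hypothetical sketch of such a keyword (the signature and loader are assumptions; the utils calls are the ones named above):

```python
from stglib.core import utils

def cdf_to_nc(cdf_filename, epic_times=True):
    ds = open_raw_cdf(cdf_filename)  # hypothetical loader for the -raw.cdf file
    if epic_times:
        # create the EPIC time variables only when the caller asks for them
        ds = utils.create_epic_times(ds)
        ds = utils.create_2d_time(ds)
    return ds
```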

Marinna

Problems with handling irregular burst in RBR D|wave processing

Processing RBR D|wave burst pressure data using the runrskcsv2cdf.py script fails if there are incomplete (or irregular) bursts, except when the irregular burst is the last one in the burst.txt file. Existing code in rsk.csv2cdf.py relies on all bursts except the last being exactly the length specified in the samples_per_burst attribute, which it uses to reshape the burst data into (time, sample) dimensions. I have encountered burst data with an irregular burst in the middle of the deployment but good bursts otherwise, and the only way to get the good data to process was to manually remove the bad burst(s) from the burst.txt file.

I suggest using the burst counter and time stamp in the burst.txt file to check the consistency of each burst and, if a bad burst is encountered, filling missing values (or trimming extra ones) and proceeding; a sketch follows. Also look for any unexpected events in the events.txt file, and if any are encountered, warn the user to further investigate potential issues with the deployment.
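
A minimal sketch of the fill/trim step, assuming the bursts have already been parsed from burst.txt into a list of 1-D arrays and spb is the samples_per_burst attribute:

```python
import numpy as np

def regularize_bursts(bursts, spb):
    """Pad short bursts with NaN and trim long ones to samples_per_burst."""
    fixed = []
    for n, b in enumerate(bursts):
        if len(b) < spb:
            print(f"burst {n}: {len(b)} samples, padding to {spb}")
            b = np.pad(b.astype(float), (0, spb - len(b)), constant_values=np.nan)
        elif len(b) > spb:
            print(f"burst {n}: {len(b)} samples, trimming to {spb}")
            b = b[:spb]
        fixed.append(b)
    return np.vstack(fixed)  # shape (time, sample)
```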

runaqdhdr2cdf.py doesn't like '30 m' in read_aqd_hdr

Hi-

I got this error when trying to use your code to re-run 983 (error screenshot omitted).
If you have time, would you show me a bit more about how to go about debugging?

So far, all I can tell is that it doesn't like something it found in the Aqd header: the '30 m' isn't in either of the argument files.

Make a new release and make it available on conda-forge

I know that stglib is under constant development, but the philosophy of conda-forge is "release early, release often." It would be nice to have it there, if only to be able to do conda install stglib --only-deps.

nc2waves.py and atmos_correct from utils.py

Hi Dan,

I have two quick questions:

  1. Could you please add import xarray as xr and import numpy as np to the beginning of nc2waves.py? Those are needed when handling continuous data and using make_wave_bursts.

  2. Would you be OK with adding the reindexing by nearest method, with a 10-minute tolerance, back to atmos_correct?

I am noticing that when processing continuous dwave data, the atmospheric pressure correction is not working properly. Over at the WH office, we are using a Jupyter notebook to create the atmpres.cdf file, and the notebook is not able to create a reindexed .cdf file on the 4 Hz continuous time base (at least not in a reasonable time). I have been relying on the run script to do the reindexing using the nearest method.

I can see how this could be an issue, because most atmpres.cdf files are already reindexed when created in a Jupyter notebook. So that atmospheric pressure data would be reindexed twice, but I don't think the second reindex by the script would actually change the time if it's already been matched up in the notebook, right? One way to avoid this would be to add an if/elif statement to atmos_correct that only reindexes if the data are from a continuous dwave; all other atmpres.cdf files would not be reindexed (a sketch of the reindex follows).
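
A hypothetical sketch of the reindex being requested (the atmpres.cdf variable name and the P_1/P_1ac names are assumptions; ds is the instrument dataset):

```python
import numpy as np
import xarray as xr

met = xr.load_dataset("atmpres.cdf")
# match atmospheric pressure onto the (possibly 4 Hz continuous) data time base
atmpres = met["atmpres"].reindex(
    time=ds["time"], method="nearest", tolerance=np.timedelta64(10, "m")
)
ds["P_1ac"] = ds["P_1"] - atmpres
```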

Hopefully all this made sense, if not I can elaborate...

Thanks

How do you avoid default fill value of NaN in xarray?

@dnowacki-usgs, @mmartini-usgs
One of the approaches to using python for our code in the future is to use xarray for everything (so time is CF) and then convert back to EPIC at the end. Dan, you've already written most of this, I think, but I'm stumped on some of the details.

I'm using a file MM wrote with xarray to test. In that file, _FillValue for all variables is NaN, which doesn't match our convention. In the files I've reviewed that were generated with your code, _FillValue is correct. Do you avoid having the wrong thing from the get-go via some xarray argument, or did you write a replace_nan_fillvalue that I haven't found?

In utils I found ds_add_attributes() that has this:
def add_attributes(var, dsattrs):
    var.attrs.update({
        'serial_number': dsattrs['serial_number'],
        'initial_instrument_height': dsattrs['initial_instrument_height'],
        'nominal_instrument_depth': dsattrs['nominal_instrument_depth'],
        'height_depth_units': 'm',
        'sensor_type': dsattrs['INST_TYPE'],
        '_FillValue': 1e35})

Is that how you deal with it? What about variables that are defined as short? Is it smart enough to cast the 1e35 to float or double, depending on how the variable is declared?
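
(For what it's worth, in xarray the usual way to control the on-disk fill value is through each variable's encoding rather than attrs; note that for variables stored as short, the fill value must itself be representable in that dtype, since 1e35 won't fit in an int16. A minimal sketch, with the variable and file names assumed:)

```python
# the encoding is applied when the file is written: in-memory NaNs become
# 1e35 on disk for this float variable
ds["u_1205"].encoding["_FillValue"] = 1e35
ds.to_netcdf("out.nc")
```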

Thanks!

xmltodict module missing

xmltodict is required when importing stglib.
(https://github.com/dnowacki-usgs/stglib/blob/master/requirements.txt)

If it isn't in one's environment one gets this error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-ae9bfbeb1548> in <module>()
     11 import sys
     12 sys.path.append("c:\\projects\\python\\stglib")
---> 13 import xmltodict
     14 import stglib
     15 get_ipython().run_line_magic('matplotlib', 'inline')

ModuleNotFoundError: No module named 'xmltodict'

It is not part of the IOOS package (http://ioos.github.io/notebooks_demos/other_resources/)

The workaround:

activate IOOS
conda install xmltodict

The permanent solution: get xmltodict added to the package list at ioos.github.io.

encoding not a keyword in xr.DataArray

Line 145 of rsk2cdf.py encounters an error: encoding is not a valid keyword for xr.DataArray.

if ("instrument_type" in ds.attrs) and (ds.attrs["instrument_type"] == "rbr_duo"):
    ds["T_28"] = xr.DataArray(
        t["temp"],
        coords=[times, samples],
        dims=("time", "sample"),
        name="Temperature",
        attrs={
            "units": "C",
            "long_name": "Temperature",
            "epic_code": 28,
            "serial_number": ds.attrs["serial_number"],
        },
        encoding={"_FillValue": 1e35},
    )
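
A minimal sketch of one possible fix: construct the DataArray without the keyword and set the fill value through .encoding afterward (the surrounding names are taken from the snippet above):

```python
ds["T_28"] = xr.DataArray(
    t["temp"],
    coords=[times, samples],
    dims=("time", "sample"),
    name="Temperature",
    attrs={
        "units": "C",
        "long_name": "Temperature",
        "epic_code": 28,
        "serial_number": ds.attrs["serial_number"],
    },
)
# xr.DataArray() takes no `encoding` kwarg; set it on the variable instead,
# and it will be honored when the dataset is written to netCDF
ds["T_28"].encoding["_FillValue"] = 1e35
```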

Dwave test zipped folder

Hi Dan,

Merge 143 included a continuous dwave test. Because the data are in a zipped folder, they get unzipped during the test, and git then wants to track the unzipped folder. I guess the only way around this is to submit a PR with the unzipped folder included in the stglib test data folder?

-Bo

Does code work with down-looking instruments?

For down-lookers, is there an option in the config file for opting out of trimming, atmospheric correction, and waves computation? I'm fairly sure the last 3-4 lines are only needed if you're doing atmospheric correction and waves; should they be left in or removed? It's odd that they're in the hdr2cdf step, since the waves processing happens in diwasp. Are they only used to populate attributes (which shouldn't be added if you're not doing waves with them)?

I'm inclined to say trim_method: 'none', but is there a way to see what the options are? Sorry, I don't know how to query it.

Output of a sample test run of runaqdhdr2cdf.py on 10104aqd is all 0's- why?

I had a seemingly successful run of an example from MVCO14 with your code, but the matrix data in the -a.nc file are wrong. The program is incredibly fast, though. Output files attached.

(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python runaqdhdr2cdf.py ../../../glob_att1010.txt 10104_config.txt
Loading ASCII files
Insrument orientation: DOWN
Center_first_bin = 0.400000
bin_size = 0.200000
bin_count = 12.000000
User instructed that instrument was pointing DOWN
Time shifted by: 65 s
Finished writing data to 10104aqd-raw.cdf

(ioos) C:\home\data\proc\MVCO2014\1010_QD1_21m\aqd_11275\raw>python /Users/emontgomery/python_progs/DNstg/stglib/scripts/runaqdcdf2nc.py 10104aqd-raw.cdf
first burst in full file: 2014-07-01T04:00:00.000000000
last burst in full file: 2014-09-24T17:30:00.000000000
Clipping data using Deployment_date and Recovery_date
first burst in trimmed file: 2014-07-01T04:00:00.000000000
last burst in trimmed file: 2014-09-23T23:50:00.000000000
User instructed that instrument was pointing DOWN
Data are in XYZ coordinates; transforming to Earth coordinates
Rotating heading and horizontal velocities by -14.800000 degrees
Using NON-atmospherically corrected pressure to trim
Done writing netCDF file 10104aqd-a.nc

The resulting -a.nc file opened in ncBrowse has reasonable-looking data in P_1 and hdg_1215, but u and v both just showed up as black. I opened both in MATLAB; both u and v contain all 0's. There is variation in the VEL* variables in the CDF file, so it seems good that far... Not sure where to look for what's wrong; please advise.

RBR Virtuoso Tu data handling needed

The RSK section of the code is specific to the RBR DWave and burst data. The Virtuoso Tu has different field names, sampling types, etc. in the raw data file that need to be handled differently than the code is currently structured for.

I propose separate files to handle DWave and Virtuoso Tu data, as they are so different.

KORexo file version issues

Alex and I are processing exo data.

When running runexocsv2cdf.py, Python raises KeyError: 'KOR Export File'.

This error is generated when the read_exo_header function (defined in exo.py) is called. read_exo_header is written to read both old and new KOR export files and uses a try/except statement; however, the KeyError exception is not caught, so Python stops. By replacing except pd.ParserError with except KeyError after the 'old KOR file' block, read_exo_header will run to completion (a sketch follows the next paragraph).

More KeyErrors are generated when the read_exo function (also defined in exo.py) applies sensor serial numbers to each sensor. These errors arise from differences in variable names between the old and new KOR file versions. By adding a few try/except statements with variable names corresponding to the new KOR files, no errors are generated when calling read_exo.
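
A hypothetical sketch of the header fix (parse_old_kor_header and parse_new_kor_header are stand-ins for the two branches of read_exo_header; catching both exception types covers old-format parse errors as well as the new-format KeyError):

```python
import pandas as pd

def read_header(f):
    try:
        return parse_old_kor_header(f)  # hypothetical old-format branch
    except (pd.errors.ParserError, KeyError):
        # new KOR export files lack the old header keys, which surfaces as
        # a KeyError rather than a parse error
        return parse_new_kor_header(f)  # hypothetical new-format branch
```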

I think these changes are more of a temporary fix and hope to receive some feedback about how to properly modify these functions.

Thanks!
Bo

cutoff_ampl feature for aqd

Hi Dan,

I've got some AQD data that were affected by periods of burial by marsh material. During burial periods the amplitude/AGC is low, and I tried using a cutoff_ampl threshold to fill these periods of burial, but it didn't seem to work. I also couldn't find a def in the AQD cdf2nc.py or aqdutils.py that would fill data by an AGC threshold. Is this located somewhere that I am overlooking?

If it is not in stglib, could I add this function to aqdutils.py as a QAQC option? A sketch of what I have in mind follows.
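
A minimal sketch, assuming EPIC-style names (AGC_1202 for amplitude, u_1205/v_1206/w_1204 for velocity) and a hypothetical cutoff_ampl value carried over from the config yaml:

```python
# mask velocities wherever the AGC falls below the cutoff
if "cutoff_ampl" in ds.attrs:
    good = ds["AGC_1202"] >= ds.attrs["cutoff_ampl"]
    for v in ["u_1205", "v_1206", "w_1204"]:
        if v in ds:
            ds[v] = ds[v].where(good)
```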

Thanks,
Bo

drop_vars functionality for processing LISST data

I don't think the LISST functions are able to remove variables from the final outputs, so unused analog inputs still show up in the final nc files at the moment. Could the LISST code be updated so that adding drop_vars: ['AnalogInput1','AnalogInput2'] to the .yaml file results in those fields not showing up in the final nc?

Similarly, it might be nice if the command-line output said something about drop_vars, e.g. "Dropping variables x and y from the final nc file." A sketch of the behavior follows.
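
A minimal sketch, assuming the yaml entry ends up in the dataset attributes:

```python
if "drop_vars" in ds.attrs:
    print("Dropping variables from the final file:", ds.attrs["drop_vars"])
    ds = ds.drop_vars(ds.attrs["drop_vars"], errors="ignore")
```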

downloading stglib following the "easy (local machine)" strategy doesn't give all the updates

I downloaded stglib on Aug 2, 2022, and today (Aug 10) I tried using two scripts (runecocsv2cdf.py and runecocdf2nc.py); runecocdf2nc.py uses utils.py.

My version of utils.py has 1195 lines, and line 1008 reads
deltat = np.asscalar((ds["time"][1] - ds["time"][0]) / np.timedelta64(1, "s")). Python told me that the asscalar was a problem. I think this has already been fixed, as the version of utils.py on github has 1278 lines, and line 1089 (the new location of the add_delta_t function) no longer uses asscalar.

My conclusion (really Steve's, because he was helping me) is that in downloading and setting up with conda I didn't get the updates, since utils.py says it was updated 28 days ago. Might there be an issue here?

in cvt_hobomet2xr, where is read_hobo?

I'm trying to figure out how to work with xr, and thought this script would help, but I don't find read_hobo in grandbay, djnpy, or stglib. cvt* doesn't seem to import it either; where does it live?

Missing serial numbers from some EXO variables

When looking into issue #77, I noticed some exo variables were not paired with the sensor serial numbers.
For example, Turb, T_28, and S_41 do not have the "sensor_serial_number" attribute after processing to the final .nc file.

I made some changes to read_exo() in a feature branch.

Dan, let me know if I should lump the edits from this issue and issue #77 into one PR.
If you would like to make these changes yourself, that's fine too!

Thanks!

Clock drift correction needs fixing.

I tried a clock drift correction when processing some Aquadopp data (mean currents only). The drift correction (+15 sec) was applied as an offset to the time units attribute in the -raw.cdf file. Then, when trying to apply the atmospheric correction (ac) to the pressure data during nc file creation, it failed because of a time mismatch between the 'atmpres' variable from atmpres.cdf and 'Pressure' in the aqd-raw.cdf file. Here is a link to the files:

https://github.com/ssuttles-usgs/stglib/tree/clockerr_issue/clockerr_issue

Also, the clock drift should be applied as a linear correction between the time the clock was set before the deployment and when it was checked after recovery, not as a simple offset. A sketch follows.
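
A minimal sketch of a linear drift correction, assuming the drift accrued linearly from zero at the time the clock was set to drift_s seconds at recovery (positive if the instrument clock ran fast):

```python
import numpy as np

drift_s = 15.0                        # measured drift at recovery, in seconds
t = ds["time"].values                 # datetime64[ns] array
frac = (t - t[0]) / (t[-1] - t[0])    # 0 at deployment, 1 at recovery
correction = frac * np.timedelta64(int(drift_s * 1e9), "ns")
ds = ds.assign_coords(time=t - correction)
```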

Capability to do qaqc without recreating -raw.cdf file

Presently in the stglib workflow, any qaqc actions on data variables are specified in the config.yaml file, which is ingested as an argument at the first processing step, where the raw instrument data are read and written to a raw.cdf file. It would be desirable to have the added capability of specifying qaqc actions at later steps in the process, so that the raw.cdf file would not need to be recreated each time. One idea that has been discussed is to allow a new qaqc.yaml file, containing qaqc actions, as an optional argument at the step(s) where the .nc files for data release are generated (e.g. runexocdf2nc.py). This could be implemented in a similar way to the optional atmospheric pressure correction argument (--atmpres) that is used to correct submerged pressure data for changes in local atmospheric pressure. A hypothetical invocation follows.
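
A hypothetical invocation of the proposed interface, mirroring the existing --atmpres option (the --qaqc flag does not exist yet, and the filenames are examples):

```
python runexocdf2nc.py 11001exo-raw.cdf --atmpres atmpres.cdf --qaqc qaqc.yaml
```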

Updated KorEXO .csv file (variable name change)

KorEXO Software V2.3.10.0 (released Oct 2020) renamed "BGA" to "TAL." They are the same variables, just different nomenclature (blue-green algae vs. total algae). The name change causes some errors when calling various functions from exo.py.

I've made some changes in a branch that seem to have fixed the problem.

-Thanks!
