
obs-standards's Introduction

OBS-standards

Definition of marine seismology data and metadata standards, and possibly creation of documents for users

The goals are to:

  • Define standards (and propose new ones if needed)
  • Help data providers to create standardized data and metadata
  • Help users to understand and use marine seismology data
  • Include standards in new “Guidelines for specific datasets” within the StationXML and miniSEED documentation

We want to have a document or documents ready for validation/vote before the next FDSN meeting (September 2025 Lisbon)

The current structure is

  • one document on standards
  • a second document on useful, publicly available software tools
  • a third preambule.md document that holds motivation information (was in original documents, not ready to throw it out yet)
  • A "References" directory holding old standards documents

The first two documents are initially based on the information in the "References" directory

obs-standards's People

Contributors

waynecrawford


obs-standards's Issues

How to specify the time correction method in the miniSEED files?

Current recommendation is to set the data quality code to "D" if the time is "NOT CLOCK CORRECTED" and "Q" if the time is "CLOCK CORRECTED". This cannot be carried over into miniSEED3, in which the data quality code is replaced by a numeric "data publication version" (0-255). It also has the disadvantage that, if we wish to provide both RAW and "CLOCK CORRECTED" data, standard database formats used by data centers (e.g., the SeisComP Data Structure [SDS]) give the same filename to both. Some options are:

  1. Use miniSEED3's "data publication version", with "NOT CLOCK CORRECTED" data versions starting at 0 and "CLOCK CORRECTED" data starting at 100

    • Advantages: downloads would (probably) return the highest version by default, allow user to select a version (like the current FDSN data webservice for the data quality flag). Does not break current (miniSEED2) convention.
    • Disadvantages: Non-standard use of this field. SDS would still not allow separate files for each method.
  2. Use miniSEED3's "Extra Header Field", maybe even an FDSN-standard one (would have to request from FDSN)

    • Advantages: Clear specification of time correction method. Does not conflict with defined usage of a field.
    • Disadvantages: Webservices may not have a way to download "preferred" method, nor to request a specific method? Standard database file naming conventions would not allow separate files for each method.
  3. Use a custom location, such as 00-D for "NOT CLOCK CORRECTED" method and 00-Q for "CLOCK CORRECTED" method

    • Advantages: Would allow separate files for each method. Easy to specify which method the user wants
    • Disadvantages: Not standard use of location code (but maybe it could become, as the dash has no standard use?). Not miniseed2-compatible (too many location characters). Would need separate channel-location definitions in StationXML
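The filename collision mentioned above can be illustrated with a minimal sketch of the SDS day-file layout (Year/NET/STA/CHAN.TYPE/NET.STA.LOC.CHAN.TYPE.YEAR.DAY). The function name `sds_path`, the network code "XX" and the station "A01A" are hypothetical, and the `quality` argument exists only to show that the quality code plays no part in the path.

```python
from datetime import date


def sds_path(net, sta, loc, chan, day, quality=None, data_type="D"):
    """Return the SDS relative path of one day file.

    `quality` (the miniSEED2 data quality code) is accepted but unused:
    the SDS layout has no slot for it. `data_type` is the SDS TYPE field
    ("D" means waveform data), which is unrelated to the quality code.
    """
    year = day.year
    doy = day.timetuple().tm_yday
    return (f"{year}/{net}/{sta}/{chan}.{data_type}/"
            f"{net}.{sta}.{loc}.{chan}.{data_type}.{year}.{doy:03d}")


day = date(2016, 5, 27)
raw = sds_path("XX", "A01A", "00", "BHZ", day, quality="D")
corrected = sds_path("XX", "A01A", "00", "BHZ", day, quality="Q")
print(raw)
print("same file for D and Q:", raw == corrected)

# Option 3 (custom location codes) would give distinct files
raw_loc = sds_path("XX", "A01A", "00-D", "BHZ", day)
corrected_loc = sds_path("XX", "A01A", "00-Q", "BHZ", day)
print("same file for 00-D and 00-Q:", raw_loc == corrected_loc)
```

Under these assumptions, options 1 and 2 leave both versions competing for one path, while option 3 separates them at the cost of a non-standard location code.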

How to specify clock drift in metadata?

Linear drift and piecewise linear drift seem to be accepted. Polynomial drift has been requested by DEPAS. Below are proposed structures:

Linear drift correction

    type: "linear"
    start_sync_reference: '2015-04-22T09:21:00Z'
    start_sync_instrument: '2015-04-22T09:21:00Z'
    end_sync_reference: '2016-05-28T22:59:00.1843Z'
    end_sync_instrument: '2016-05-28T22:59:02Z'

If clock drift is assumed but some or all of the values were not measured, each missing value should be represented by 'None'
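As an illustration of how the linear structure could be applied, here is a minimal sketch that interpolates the offset (reference minus instrument) between the two syncs and adds it to an instrument time. The function names `parse` and `linear_correct` are hypothetical, and the sketch assumes all four sync values were measured.

```python
from datetime import datetime, timedelta, timezone


def parse(text):
    """Parse an ISO 8601 UTC string with an optional fraction and trailing 'Z'."""
    text = text.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def linear_correct(instrument_time, drift):
    """Return the corrected time for one instrument time, assuming linear drift."""
    s_ref = parse(drift["start_sync_reference"])
    s_inst = parse(drift["start_sync_instrument"])
    e_ref = parse(drift["end_sync_reference"])
    e_inst = parse(drift["end_sync_instrument"])
    start_offset = (s_ref - s_inst).total_seconds()
    end_offset = (e_ref - e_inst).total_seconds()
    # Fraction of the deployment elapsed, measured on the instrument clock
    frac = (instrument_time - s_inst) / (e_inst - s_inst)
    offset = start_offset + frac * (end_offset - start_offset)
    return instrument_time + timedelta(seconds=offset)


drift = {
    "type": "linear",
    "start_sync_reference": "2015-04-22T09:21:00Z",
    "start_sync_instrument": "2015-04-22T09:21:00Z",
    "end_sync_reference": "2016-05-28T22:59:00.1843Z",
    "end_sync_instrument": "2016-05-28T22:59:02Z",
}
corrected_end = linear_correct(parse(drift["end_sync_instrument"]), drift)
print(corrected_end.isoformat())
```

At the end sync, the corrected time should coincide with `end_sync_reference`, which gives a simple sanity check on the sign convention.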

Piecewise correction

    type: "piecewise"
    interpolation: 'linear'  # 'linear' or 'cubic'
    syncs_reference:
        - '2015-04-22T09:21:00Z'
        - '2016-05-28T22:59:00.1843Z'
    syncs_instrument:
        - '2015-04-22T09:21:00Z'
        - '2016-05-28T22:59:02Z'

This gives the same output as "linear" if there are only two syncs and interpolation='linear'
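A minimal sketch of the piecewise case with interpolation='linear' follows. The function name `piecewise_offset` is hypothetical, and the middle sync point is invented purely for illustration; only the first and last syncs come from the example above. Outside the sync range, this sketch simply extends the nearest segment.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def parse(text):
    """Parse an ISO 8601 UTC string with an optional fraction and trailing 'Z'."""
    text = text.rstrip("Z")
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def piecewise_offset(t_inst, syncs_inst, offsets):
    """Linearly interpolate the offset (seconds) at instrument time t_inst."""
    i = bisect_right(syncs_inst, t_inst) - 1
    i = min(max(i, 0), len(syncs_inst) - 2)  # clamp to a valid segment
    x0, x1 = syncs_inst[i], syncs_inst[i + 1]
    frac = (t_inst - x0) / (x1 - x0)
    return offsets[i] + frac * (offsets[i + 1] - offsets[i])


# The middle sync is hypothetical
syncs_reference = [parse(t) for t in (
    "2015-04-22T09:21:00Z", "2015-10-01T00:00:00.5Z", "2016-05-28T22:59:00.1843Z")]
syncs_instrument = [parse(t) for t in (
    "2015-04-22T09:21:00Z", "2015-10-01T00:00:01Z", "2016-05-28T22:59:02Z")]
offsets = [(r - i).total_seconds()
           for r, i in zip(syncs_reference, syncs_instrument)]

for t in syncs_instrument:
    print(t.isoformat(), piecewise_offset(t, syncs_instrument, offsets))
```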

Polynomial correction

Wayne made this up without checking with someone who's actually done this.

    type: "polynomial"
    start_sync_reference: '2015-04-22T09:21:00Z'
    a: [0, 1.1e-9, 1.5e-6, 2.45e-18]
    checks:
        start_sync_instrument: '2015-04-22T09:21:00Z'
        end_sync_reference: '2016-05-28T22:59:00.1843Z'
        end_sync_instrument: '2016-05-28T22:59:02Z'

corrected_time would equal instrument_time + a[0] + a[1]*dTime + a[2]*dTime^2 + a[3]*dTime^3 ...

where dTime = instrument_time - start_sync_instrument

  • start_sync_instrument provides a check on a[0]
  • end_sync_reference and end_sync_instrument provide a check on the offset at the end of the experiment
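The polynomial formula and its end-of-experiment check can be sketched as follows. This assumes dTime is expressed in seconds (the proposal does not state the unit), and the names `polynomial_offset` and `end_sync_residual` are hypothetical. The coefficients used here are a purely linear, made-up set chosen to fit the end sync; they are not the values from the proposal.

```python
from datetime import datetime


def polynomial_offset(d_time, a):
    """Evaluate a[0] + a[1]*d_time + a[2]*d_time**2 + ... with Horner's rule."""
    total = 0.0
    for coeff in reversed(a):
        total = total * d_time + coeff
    return total


def end_sync_residual(a, d_time_end, measured_end_offset):
    """Difference between the polynomial's offset and the measured end offset."""
    return polynomial_offset(d_time_end, a) - measured_end_offset


# dTime at the end sync: end_sync_instrument - start_sync_instrument, in seconds
d_time_end = (datetime(2016, 5, 28, 22, 59, 2)
              - datetime(2015, 4, 22, 9, 21, 0)).total_seconds()
# Measured end offset: end_sync_reference - end_sync_instrument
measured = -1.8157

# Hypothetical coefficients: a[0] = 0 and a constant drift rate only
a = [0.0, measured / d_time_end]
residual = end_sync_residual(a, d_time_end, measured)
print(f"residual at end sync: {residual:.3e} s")
```

A large residual would indicate that the coefficients and the recorded end syncs disagree, which is the purpose of the `checks` block.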

Optional parameters

The following parameters may be included for more information:

    time_base: 'Seascan MCXO, 1e-8 nominal drift' 
    reference: 'GPS'  

These structures are embedded in the StationXML file as a JSON-coded string in a <Comment> field with subject="Clock Drift".
In the future, a separate namespace may be created to allow a more specific and structured representation.

Below is an example of the first proposition in a <Comment> field:

<Comment subject="Clock Drift">
<Value>{type: linear, start_sync_reference: 2015-04-22T09:21:00Z, start_sync_instrument: 0, end_sync_reference: 2016-05-28T22:59:00.1843Z, end_sync_instrument: 2016-05-28T22:59:02Z}</Value>
</Comment>
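For data preparers, one way to produce and read back such a comment is sketched below with the Python standard library. It assumes the value is strict JSON (quoted keys and strings); a missing measurement would then be written as `None` in Python and appear as `null` in the JSON string. The element is built in isolation here, not inside a full StationXML document.

```python
import json
import xml.etree.ElementTree as ET

drift = {
    "type": "linear",
    "start_sync_reference": "2015-04-22T09:21:00Z",
    "start_sync_instrument": "2015-04-22T09:21:00Z",
    "end_sync_reference": "2016-05-28T22:59:00.1843Z",
    "end_sync_instrument": "2016-05-28T22:59:02Z",
}

# Build <Comment subject="Clock Drift"><Value>{...}</Value></Comment>
comment = ET.Element("Comment", subject="Clock Drift")
ET.SubElement(comment, "Value").text = json.dumps(drift)
xml_text = ET.tostring(comment, encoding="unicode")
print(xml_text)

# Read it back: parse the XML, then decode the JSON string
recovered = json.loads(ET.fromstring(xml_text).find("Value").text)
print(recovered["end_sync_reference"])
```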

2024 AGU Abstract

A first draft:

The FDSN Action Group on Marine Seismology Data and Metadata Standards

(alphabetical list of Action Group Members, + WG chairs?)

Marine seismological data are crucial to studying many local, regional and global-scale processes, including subduction zones, mid-ocean spreading centers, interplate volcanoes, deep and shallow hot-spots, mantle circulation and global earth structure. It is crucial that these data arrive at FDSN-standard data centers and that they are useable by the entire seismological community. The FDSN Action Group on Marine Seismology Data and Metadata Standards is working to develop and publish international standards for marine-specific data and metadata, as well as a list of validated open-source tools for processing specific to marine data. We aim to propose these standards to the FDSN at the summer 2025 Lisbon IASPEI meeting. We present here the current state of the proposed standards and invite you to comment on and/or add to them.

How to specify `start_date`s and `end_date`s (and cut data)

Need a way to specify the difference between expected and obtained data. Simplest solution is:

  • Set station start_date and end_date as the expected beginning of recording and the release time, respectively
  • Set channel start_date and end_date to the actual beginning and end of useable data
  • Cut miniSEED data to correspond to channel start_date and end_date

But this creates other issues:

  • Broadband seismometers usually take a few hours to come online. Should this data be "cut" to where the useable data starts? Could give false impression of data success rate.
  • Some users want to see even bad data, to confirm that it is in fact unusable.

Could we leave all data in (or at least as much as is necessary to see that the data goes bad at the end, if this is the case) and specify a channel start_date and end_date that are tighter than the actual data? (Channel dates can be contained inside the channel data, but station dates cannot.)

Similar questions are being asked for land data (@KaseyAderhold): we should make sure our solution is compatible with theirs, but appropriate for OBSs.

Information about leveling systems in StationXML file

Dear Wayne,

I think that information about the leveling system may help researchers understand how sensors are tilted. During the analysis of tilt noise, I found that the sensor tilt angles estimated from the Z-H transfer function are much larger than those reported by the leveling unit's tilt meter (Kawano et al., 2023, BSSA), and that the tilt angle of the Z-component of the CMG-3T is aligned with the tilt of the leveling unit. This discrepancy could be recognized only when the leveling unit information was described.

For example

leveling unit: ERI GLC-02, tolerance : 0.03 degree, accuracy : 0.01 degree

In the case of Japanese BBOBSs, the tilt angles of the two horizontal components of the leveling unit are recorded once a day during an observation, so it is possible to describe the observed tilt angle. It does not happen often, but tilt angles can change during an observation, so I think describing the observed tilt in comments is not appropriate. Therefore, only the tolerance and accuracy of the leveling system are included.

Cheers,

Takehi

Add OBS-specific parameters using external namespace?

GFZ does this for their instrument ids.
obspy seems to throw out this information
Could be useful for structurable information like clock drift (currently shoved into comments using JSON strings, but difficult for data preparers to parse).

Add leap-second correction line to msmod?

Currently, instructions for correcting leap seconds include two slightly complicated calls to msmod.

For example, for a positive leap second on 2016-12-31T23:59:60, the calls are :

msmod --timeshift -1 -ts 2016,366,23:59:59.999999
msmod --actflags '4,1' -tsc 2016,366,23:59:59.999999 -tec 2016,366,23:59:59.999999

It would be simpler to have a specific msmod command, such as

msmod -lsp 3692217600	37

where the above values are directly copied from the leap-seconds.list file, or

msmod -lsp 2017,1,0:0:0

which is clearer, but requires the operator to calculate the day of/after the leap second (easier for 1 January than for 1 July!)
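The day-of-year calculation could also be done programmatically from the leap-seconds.list values. The sketch below relies on the first column of that file being seconds since 1900-01-01T00:00:00 (the NTP epoch); the function names are hypothetical, and the `-lsp` option itself is only the proposal above, not an existing msmod flag.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def parse_leap_line(line):
    """Return (effective datetime, TAI-UTC) from one leap-seconds.list line."""
    fields = line.split("#")[0].split()
    ntp_seconds, tai_minus_utc = int(fields[0]), int(fields[1])
    return NTP_EPOCH + timedelta(seconds=ntp_seconds), tai_minus_utc


def msmod_time(dt):
    """Format a datetime as 'year,day-of-year,hour:minute:second'."""
    return f"{dt.year},{dt.timetuple().tm_yday},{dt.hour}:{dt.minute}:{dt.second}"


effective, tai_minus_utc = parse_leap_line("3692217600\t37")
print(msmod_time(effective))                         # first second after the leap second
print(msmod_time(effective - timedelta(seconds=1)))  # last regular second before it
```

The second printed value corresponds to the day used in the two existing msmod calls (2016, day 366).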

Any thoughts on this?

I'm not sure if this is something that should be combined in the request to the software company, or if Chad Trabant could more easily (and readily) code it himself.

Cheers

How to label type and support for each standard/recommendation

One option would be to prefix every item by:

  • [FDSN] if it is an existing FDSN standard (included for information). For example, the channel codes.
  • [STD {ratio}] if it is a proposed OBS standard, for example how to indicate the time correction state
  • [REC {ratio}] if it is a proposed OBS recommendation, for example providing SHIFTED data to data centers

where {ratio} is the ratio of yes votes to total members voting, for example "6/7"

How to handle leap-seconds

Need to clarify documentation on how to handle leap-seconds:

  1. Note sync times corrected for leap-seconds or not?
  2. Calculate drift using sync times corrected for leap seconds?
  3. Order to correct (leap seconds, then drift, or vice-versa)?
  4. Other questions?

First we have to agree amongst ourselves

Proposed changes to "Clock Correction"

The current specification for clock corrections is:

Full structure (as YAML):

"clock_correction":
    "drift":
        "base": {"instrument": "Seascan MCXO, ~1e-8 nominal drift", "reference": "GNSS"}
        "type": "piecewise"
        "interpolation": "linear"
        "syncs_reference_instrument": 
            - ["2015-04-23T11:20:00", "2015-04-23T11:20:00"]
            - ["2016-05-27T14:00:00.2450", "2016-05-27T14:00:00"]

As a StationXML Comment

<Comment subject="clock_correction">
    <Value>{"drift": {"base": {"instrument": "Seascan MCXO, ~1e-8 nominal drift", "reference": "GNSS"}, "type": "piecewise", "interpolation": "linear", "syncs_reference_instrument": [["2015-04-23T11:20:00", "2015-04-23T11:20:00"], ["2016-05-27T14:00:00.2450", "2016-05-27T14:00:00"]]}}</Value>
 </Comment>

In msmod drift correction specification file

interpolation: linear
# Reference time           Instrument time
2015-04-23T11:20:00        2015-04-23T11:20:00
2016-05-27T14:00:00.2450   2016-05-27T14:00:00

Propose 2 changes:

  1. combine type and interpolation into type: ['piecewise_linear', 'cubic_spline']
  2. Allow instrument times to be specified as seconds to add to reference time

The new msmod specification would look like

In msmod drift correction specification file

type: piecewise_linear
# Reference time         Instrument time
2015-04-23T11:20:00        +0.0
2016-05-27T14:00:00.2450   2016-05-27T14:00:00
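To check that the proposed file is easy to read by machine, here is a minimal parser sketch. It assumes the format exactly as proposed above: `key: value` header lines, `#` comments, and two whitespace-separated columns, where an instrument entry starting with `+` or `-` is a number of seconds to add to the reference time. The function names are hypothetical and this is not msmod's own parser.

```python
from datetime import datetime, timedelta

SPEC = """\
type: piecewise_linear
# Reference time         Instrument time
2015-04-23T11:20:00        +0.0
2016-05-27T14:00:00.2450   2016-05-27T14:00:00
"""


def parse_time(text):
    """Parse an ISO 8601 time with an optional fractional-seconds part."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in text else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(text, fmt)


def parse_drift_spec(text):
    """Return (header dict, list of (reference, instrument) datetimes)."""
    header, syncs = {}, []
    for raw in text.splitlines():
        line = raw.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0].endswith(":"):       # header line such as "type: ..."
            header[fields[0][:-1]] = " ".join(fields[1:])
            continue
        reference = parse_time(fields[0])
        if fields[1][0] in "+-":          # relative form: seconds to add
            instrument = reference + timedelta(seconds=float(fields[1]))
        else:                             # absolute form
            instrument = parse_time(fields[1])
        syncs.append((reference, instrument))
    return header, syncs


header, syncs = parse_drift_spec(SPEC)
print(header)
for reference, instrument in syncs:
    print(reference.isoformat(), instrument.isoformat())
```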

Simplify users.md

Don't repeat anything presented in standards.md. Just list topics for Data and Metadata; the only independent part is the Software.

Specify how to convert geophone "N" and "E" channels to "1" and "2"

For a seismometer, the "N" component becomes the "1" channel and the "E" component becomes the "2" channel. For a geophone, if the horizontals have the same "reverse polarity" as the vertical, would the equivalent rule be to map the "N" component to the "2" channel and the "E" component to the "1" channel?

How to correct for leap-seconds?

OBS dataloggers do not (to my knowledge) account for leap seconds, so the data (and possibly the clock sync) needs to be corrected for this.

In the recommended clock correction section, we specify that the "reference" (GNSS, in general) and instrument times should be compared for the beginning and end of the experiment, and maybe at intermediate times if the drift is measurably non-linear.

My question is: should the instrument time provided be "adjusted" for any leap-seconds?

FOR:

  • Drift correction calculations do not need to account for leap seconds: if users DON'T do this, then the proposed msmod drift correction will skew everything
  • If intermediate clock comparisons are provided, it is much easier to figure them out if the leap-second has already been eliminated from the equation.

AGAINST:

  • the term "instrument time" implies the time indicated by the instrument, not a corrected time
  • Correcting the leap-second offset programmatically avoids operator error.

I'm not sure how this would work for time cross-correlations or John's CSAC-correction: would these be more logical in one case than in the other?

Ideally, the instrument times would be the true ones and the leap second correction (which would be run before correcting clock drift) would somehow modify the instrument times in the clock comparison file, maybe adding a "time zone" like +00:01 to the given time to indicate that the leap second was corrected?

OBS elements missing in StationXML

These are elements important for OBSs but which are not currently in StationXML. If they are not in StationXML we will place them there as Comments, or using an obs-specific namespace. But we can propose them to FDSN as future StationXML additions.

Maybe make a separate document on this?

How to specify leap seconds in metadata

The standards.md document currently states:

Structure is:

time: 2016-082T23:59:60Z
type: +
description: Positive leap-second (a 61-second minute)
correction_data:
    - msmod --timeshift -1 -ts 2016,182,23:59:59.999999
    - msmod --actflags '4,1' -tsc 2016,182,23:59:59.999999 -tec 2016,182,23:59:59.999999
correction_end_sync_instrument: subtracted one second from displayed instrument time

only time and type are required

[REC {/}] Embedded in a StationXML <Comment>. Possible future namespace element, as for clock drift. Below is a StationXML example

<Comment subject="Leap Second">
<Value>{time: 2016-082T23:59:60Z, type: '+', description: 'Positive leap-second (a 61-second minute)', correction_data: ['msmod --timeshift -1 -ts 2016,182,23:59:59.999999', 'msmod --actflags "4,1" -ts 2016,182,23:59:36 -te 2016,183,00:00:36'], correction_end_sync_instrument: subtracted one second from displayed instrument time}</Value>
</Comment>

I suggest a change to

Structure is:

leapseconds:
    values:
        -
            list_file_string: "3692217600      37      # 1 Jan 2017"
            type: "+"
    corrected_in_basic_miniseed: false
    corrected_in_syncs_instrument: true

values is an array/list, to allow for more than one leap-second during a deployment.

list_file_string should be directly copied from leap-seconds.list, which is available online at several sites, including https://data.iana.org/time-zones/tzdb/leap-seconds.list. The user should verify that the "File expires on" date is later than the last instrument channel's end-date.

type indicates whether the second number in the list_file_string is greater than the previous line's value ("+") or less than the previous line's value ("-"). As of June 2024, all leap seconds have been type "+"

corrected_in_basic_miniseed indicates whether the "raw" miniSEED data (if any) correctly integrates the leap second. For most OBS deployments, this value should be false as dataloggers without GPS don't (yet?) have a way to integrate leap seconds.

corrected_in_syncs_instrument indicates whether the instrument sync times have been corrected for the leap second(s). They generally should be, but in some cases (many sync times and/or more than one leap second) this may be best left to an algorithm that takes these values as input rather than to a human operator.
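As a rough illustration of how the suggested structure could be filled in automatically, the sketch below selects the leap seconds that fall within a deployment and derives `type` by comparing each line's second number with the previous line's. The function name `leapseconds_entry` is hypothetical, the three embedded lines reproduce leap-seconds.list values for 2012, 2015 and 2017 (spacing may differ from the real file), the deployment dates are example values, and the two `corrected_in_*` flags are hard-coded placeholders that the operator would have to set.

```python
from datetime import datetime, timedelta, timezone

NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)

LEAP_LINES = [
    "3550089600      35      # 1 Jul 2012",
    "3644697600      36      # 1 Jul 2015",
    "3692217600      37      # 1 Jan 2017",
]


def leapseconds_entry(lines, start, end):
    """Build the 'leapseconds' structure for a deployment from start to end."""
    values = []
    previous_tai_utc = None
    for line in lines:
        ntp_seconds, tai_utc = (int(f) for f in line.split("#")[0].split())
        effective = NTP_EPOCH + timedelta(seconds=ntp_seconds)
        if previous_tai_utc is not None and start < effective <= end:
            values.append({
                "list_file_string": line,
                "type": "+" if tai_utc > previous_tai_utc else "-",
            })
        previous_tai_utc = tai_utc
    return {"leapseconds": {
        "values": values,
        "corrected_in_basic_miniseed": False,   # placeholder, set by operator
        "corrected_in_syncs_instrument": True,  # placeholder, set by operator
    }}


entry = leapseconds_entry(
    LEAP_LINES,
    start=datetime(2016, 9, 10, tzinfo=timezone.utc),
    end=datetime(2017, 7, 13, 11, 25, tzinfo=timezone.utc),
)
print(entry)
```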

[REC {/}] The information should be embedded in a StationXML <Comment> with subject='Clock Corrections'. This is a possible future namespace element (see #1 ), as is clock drift. Below is a StationXML example

<Comment subject="Clock Corrections">
<Value>{leapseconds: {values: [{list_file_string: '3692217600      37      # 1 Jan 2017', type: '+'}],
                       corrected_in_syncs_instrument: true,
                       corrected_in_basic_miniseed: true}}</Value>
</Comment>

The advantages of the proposed changes are:

  • Simpler entry of leap-second values by the user, with less need for interpretation
  • Clearer information about what has been corrected and what has not
  • A clearer tie to the drift-based clock corrections (the subject changes to "Clock Corrections", which would be shared with the drift-based clock corrections)

I removed the information about how the leap second was corrected because it seems like a lot of information for the user to enter. The standards.md document should still include this information as a recommendation for data file processing

It may be wise to also change the structure of the clock drift corrections to something like:

     drift:
        type: "linear"
        start_sync_reference: "2016-09-10T00:00:00Z"
        end_sync_reference: "2017-07-13T11:25:00.6189Z"
        end_sync_instrument: "2017-07-13T11:25:01Z"

and also use subject="Clock Corrections" in the associated StationXML <Comment>.
This would give a clearer hierarchy in the StationXML file

Location codes to specify time-shifted data

[1] Do we want to use location codes to indicate whether or not data are time corrected (as opposed to using, e.g., "Q" and "D" quality flags)?

[2] If location codes are used, what is the convention? Is using 00-49 (00 being the de facto "default" location) to specify time-corrected (as opposed to raw) potentially confusing?

Station names for repeated deployments

The standards.md document currently states:

Station names for repeated deployments
[REC {/}] If OBSs are deployed repeatedly at one site (to make a long series), use an incrementing alphanumeric character at the end of the station name (e.g., A01A, then A01B, then A01C for subsequent deployments at the same approximate location). This may be a de facto "standard", but I haven't seen it written down

On our July 10th call when voting on this item, Takehi brought up that in practice ERI will reoccupy sites and designate a new epoch under the same station name, with no incrementing character. We had some discussion of location uncertainties during redeployment. I suggested we reference existing standards or conventions on when to rename a station versus start a new epoch, for example renaming when the uncertainty in reoccupying the same site is greater than the threshold those standards set.

What parameters for cubic spline interpolation?

Need to answer this in order to present a new proposal to companies (#24)

For the cubic spline, have to choose which type:

  • "natural": 2nd derivative = 0 at the two end points
  • other boundary conditions on the curves

Other possible interpolations are (from scipy 1.14.0)

  • "akima1d": how does this work?
  • "pchipinterpolator": how does this work?

Should write a simple code to compare the results of each

Note that our case is a simplification of the general case, since control points are monotonically increasing.

We want to avoid putting too much curvature on big stretches of time, allow more in small stretches:

  • metric minimizing curvature*length?
  • metric minimizing deviation from linear interpolation?
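A first pass at the comparison code could look like the sketch below, which uses the "deviation from linear interpolation" metric from the list above. The sync points are invented for illustration (instrument time in days since the first sync, offsets in seconds), and the two cubic-spline boundary conditions shown are only two of the choices `scipy.interpolate.CubicSpline` offers.

```python
import numpy as np
from scipy.interpolate import Akima1DInterpolator, CubicSpline, PchipInterpolator

# Hypothetical sync points: long stretches and one short stretch at the end
t_days = np.array([0.0, 30.0, 200.0, 380.0, 400.0])
offset_s = np.array([0.0, -0.10, -0.95, -1.70, -1.82])

interpolators = {
    "natural cubic spline": CubicSpline(t_days, offset_s, bc_type="natural"),
    "not-a-knot cubic spline": CubicSpline(t_days, offset_s, bc_type="not-a-knot"),
    "akima": Akima1DInterpolator(t_days, offset_s),
    "pchip": PchipInterpolator(t_days, offset_s),
}

# Reference: piecewise-linear interpolation on a fine time grid
t_fine = np.linspace(t_days[0], t_days[-1], 2001)
linear = np.interp(t_fine, t_days, offset_s)

# Metric: maximum absolute deviation from the piecewise-linear offsets
max_dev = {name: float(np.max(np.abs(f(t_fine) - linear)))
           for name, f in interpolators.items()}
for name, dev in max_dev.items():
    print(f"{name:>24s}: max deviation from linear = {dev:.4f} s")
```

All four interpolants pass through the sync points, so they differ only in how much curvature they put between them; running this on real sync data would show which choice stays closest to linear over long stretches.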

Put link to obsinfo `subnetwork` file format in users.md

So that people running campaigns know what metadata is needed and how to validate it.

If there is another complete metadata format, please let us know.

The "subnetwork" format should be updated to work without file dependencies (creates station-level StationXML)

If/how to specify time base nominal drift (or precision)

Currently, the field drift:time_base is open text, and the example in standards.md contains both the Model and the "nominal drift", which Wayne probably copied from the documentation. Questions include:

  • Should we include "nominal drift" information?
  • If so
    • would it be better named "precision" (or "nominal precision")?
    • should it be left in the time_base field (no standard for whether to put it there or not, but also no strict requirement for verification) or in a separate field (nominal_precision?, nominal_drift)?

@KAderhold was afraid that, without some verification by the user, the value communicated by the manufacturer could be false
@jcollins considers that using the manufacturer's value is not a problem and gives a reference for whether the actual observed drift can be considered "within spec" or not
