
Comments (15)

krischer commented on July 23, 2024

You can just delete the events and write them again:

In [5]: ds
Out[5]:
ASDF file [format version: 1.0.0]: 'test.h5' (0.0 bytes)
    Contains 0 event(s)
    Contains waveform data from 0 station(s).

In [6]: ds.add_quakeml(obspy.read_events())

In [7]: ds
Out[7]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
    Contains 3 event(s)
    Contains waveform data from 0 station(s).

In [8]: del ds.events

In [9]: ds
Out[9]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
    Contains 0 event(s)
    Contains waveform data from 0 station(s).

In [10]: ds.add_quakeml(obspy.read_events())

In [11]: ds
Out[11]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
    Contains 3 event(s)
    Contains waveform data from 0 station(s).

wjlei1990 commented on July 23, 2024

Hi Youyi,

I remember you mentioned that if you replace the event information, you can't process the file anymore. Is that true? Please confirm.

chukren commented on July 23, 2024

Thanks, but now I am not sure that's the problem. When we create ASDF files we associate the waveform data with an event, and after the source update the resource id has changed.

In [17]: ev2.preferred_origin
Out[17]: <bound method Event.preferred_origin of Event(resource_id=ResourceIdentifier(id="smi:local/ndk/C090497A/event"), event_type=u'earthquake', event_type_certainty=u'known')>

In [18]: ev1.preferred_origin
Out[18]: <bound method Event.preferred_origin of Event(resource_id=ResourceIdentifier(id="smi:local/cmtsolution/C090497A/event"), event_type=u'earthquake’)> 

However, the change in the event information doesn't seem to matter; the signal processing crashes for other reasons, which I believe have to do with the data itself. I will need to look at it again tomorrow.

  File "/autofs/nccs-svm1_home1/youyir/src/pypaw/src/pypaw/process.py", line 28, in process_wrapper return process(stream, inventory=inv, **param)
  File "/autofs/nccs-svm1_home1/youyir/src/pytomo3d/pytomo3d/signal/process.py", line 212, in process st.detrend("linear")
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 241, in new_func return func(*args, **kwargs)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/stream.py", line 2304, in detrend tr.detrend(type=type)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 258, in new_func return func(*args, **kwargs)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 241, in new_func return func(*args, **kwargs)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/trace.py", line 231, in new_func result = func(*args, **kwargs)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/trace.py", line 1817, in detrend self.data = func(self.data, **options)
  File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/scipy/signal/signaltools.py", line 1553, in detrend newdata = newdata.astype(dtype)
ValueError: could not convert string to float:

krischer commented on July 23, 2024

The preferred_origin attribute is a method and you have to call it to actually get the origin.
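
For example (a minimal sketch; the event here is just one of ObsPy's bundled example events):

import obspy

ev = obspy.read_events()[0]

# preferred_origin is a method; calling it returns an Origin object,
# or None if no preferred origin is set.
origin = ev.preferred_origin()
if origin is not None:
    print(origin.time, origin.latitude, origin.longitude)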

For the other error: Can you create a minimal example that reproduces that behavior?

chukren commented on July 23, 2024

Yes, I should call preferred_origin, but the point is that the resource_id has changed. I was worried because it might cause a problem since we tie the waveforms to the source with ds.add_waveforms(st, tag=tag, event_id=event), but now it seems not to matter.
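
For reference, this is roughly how the association is made at write time (a sketch; the file name and tag are placeholders):

import obspy
import pyasdf

ds = pyasdf.ASDFDataSet("test.h5")
cat = obspy.read_events()
ds.add_quakeml(cat)

# The event_id argument ties the waveforms to one particular event;
# it accepts an Event object or its resource id.
ds.add_waveforms(obspy.read(), tag="raw_recording", event_id=cat[0])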

I am looking into the data problem to see if I can find the issue; will keep you posted.

krischer commented on July 23, 2024

Well yes - if you change the origin it should have a different id. You can always manually force it, but I recommend against it: each id should uniquely identify something.

Do you need functionality to add event/origin/... ids to existing waveforms?

chukren commented on July 23, 2024

It would be very nice to have such a function. BTW, I am still puzzled how the waveforms tie to a certain event. What would happen if we have multiple events in one ASDF file?

krischer commented on July 23, 2024

I'll add that functionality.

A waveform is tied to an event via its id. Each event has a unique id, thus multiple events are no problem. Each waveform can also be tied to multiple events - in that case it will be associated with multiple event ids.
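
As an illustration, the association can be inspected when reading the data back (a sketch; that the ids surface as tr.stats.asdf.event_ids is my recollection of the pyasdf behavior, so treat that attribute name as an assumption):

import pyasdf

ds = pyasdf.ASDFDataSet("test.h5")
for station in ds.waveforms:
    for tag in station.get_waveform_tags():
        for tr in getattr(station, tag):
            # Each trace carries the id(s) of the event(s) it is tied to,
            # assuming the waveforms were added with event_id set.
            print(tr.id, tr.stats.asdf.event_ids)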

krischer commented on July 23, 2024

In HDF5 speak: each waveform can have an attribute event_id that can store one or more ids.
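
In plain h5py terms that would look something like this (a sketch; the exact attribute name on disk is defined by the ASDF spec, so event_ids below is an assumption):

import h5py

with h5py.File("test.h5", "r") as f:
    for station_name, station in f["Waveforms"].items():
        for name, dataset in station.items():
            # Attribute name is an assumption -- check the ASDF
            # definition for the exact spelling.
            ids = dataset.attrs.get("event_ids")
            print(station_name, name, ids)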

chukren commented on July 23, 2024

Thanks @krischer! That will make things better organized.

baagaard-usgs commented on July 23, 2024

The use of a single dataset for the entire catalog seems like a significant design flaw to me. If I have a large number of events and want to add a new one, I have to read the entire catalog, delete the dataset, append the new event to the QuakeML, and write the new dataset. Meanwhile, to add stations and/or waveforms, all I have to do is add the datasets for the new ones without touching the old ones.
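
For concreteness, the read-modify-write cycle that is currently required looks roughly like this (a sketch using the delete-and-rewrite approach shown above; new_event is a placeholder):

import obspy
import pyasdf

ds = pyasdf.ASDFDataSet("catalog.h5")

# Read the entire catalog, append the new event, then rewrite everything.
cat = ds.events
new_event = obspy.read_events()[0]  # placeholder for the actual new event
cat.append(new_event)
del ds.events
ds.add_quakeml(cat)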

How difficult would it be to alter the HDF5 layout so that the catalog is stored via something like
Catalog/{event_id}/QuakeML? This provides consistency with the station layout. This would also allow adding auxiliary datasets like finite-source rupture models to the event, e.g., Catalog/{event_id}/Rupture Models/{tag}.

krischer commented on July 23, 2024

This would require a new version of the ASDF format, but this is something we have done before and I don't see any major difficulty there. The reason we originally did not do this is that it is possible (and people do use this) to attach additional information at the Catalog level, which in your proposed scheme could not be retained. Also, the QuakeML data format is pretty much already designed to do this.

That being said, it is true that XML is subpar for some applications and that it is cumbersome to work with very large event data sets; in this particular instance, the trade-off of not retaining catalog-level information might be acceptable and a good idea.

How many events are you thinking about? There are a lot of tricks one could use to make this fast at the library level without touching the underlying data format.

Adding auxiliary data to a place not within the AuxiliaryData group would break a lot of assumptions and I'm not sure it's worth it. One could still store event information as is right now, or with your proposed changes, and then store rupture models under /AuxiliaryData/RuptureModels/{event_id}/{tag} or something like it. Then the connection to events is trivial.
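
Something along these lines should already work (a sketch; the data_type and path naming are suggestions, and the array is a stand-in for an actual rupture model):

import numpy as np
import pyasdf

ds = pyasdf.ASDFDataSet("test.h5")

event_id = "C090497A"  # placeholder
tag = "initial_model"  # placeholder
ds.add_auxiliary_data(
    data=np.random.random((10, 10)),  # stand-in for the model data
    data_type="RuptureModels",
    path="%s/%s" % (event_id, tag),
    parameters={"description": "example rupture model"})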

Implementing your proposed change would require three steps:

(1) Update the definition https://github.com/SeismicData/ASDF_definition
(2) Add a new schema to the validator. It's already set up to deal with multiple ASDF versions, so this is pretty simple. https://github.com/SeismicData/asdf_validate/tree/master/asdf_validate/schemas
(3) Add support to the Python (https://github.com/SeismicData/pyasdf) and C (https://github.com/SeismicData/asdf-library) libraries. As it's versioned this is somewhat optional, but of course one would also like to use it.

baagaard-usgs commented on July 23, 2024

After thinking more about my use cases, it probably makes sense for me to generate an ASDF file for each event. These can be merged as needed into larger ASDF files when generating a data set for a collection of events. This seems better than changing the ASDF format.
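
A merge along these lines should be possible with the existing API (a sketch; the file names are placeholders, and the hasattr check for StationXML is an assumption about the accessor's behavior):

import pyasdf

# Merge several single-event files into one combined data set.
combined = pyasdf.ASDFDataSet("combined.h5")
for filename in ["event_1.h5", "event_2.h5"]:  # placeholders
    source = pyasdf.ASDFDataSet(filename, mode="r")
    combined.add_quakeml(source.events)
    for station in source.waveforms:
        # Copy the StationXML if present, then every waveform tag.
        if hasattr(station, "StationXML"):
            combined.add_stationxml(station.StationXML)
        for tag in station.get_waveform_tags():
            combined.add_waveforms(getattr(station, tag), tag=tag)
    del source  # let the source file be closed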

krischer commented on July 23, 2024

I'll probably add some kind of read-only multi-file dataset object pretty soon, as it would come in handy for some things we do. The idea is to offer the same interface as the ASDFDataSet object, even if the data originates from many different files.

krischer commented on July 23, 2024

No activity for a while. Closing for now.
