Comments (15)
You can just delete the events and write them again:
In [5]: ds
Out[5]:
ASDF file [format version: 1.0.0]: 'test.h5' (0.0 bytes)
Contains 0 event(s)
Contains waveform data from 0 station(s).
In [6]: ds.add_quakeml(obspy.read_events())
In [7]: ds
Out[7]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
Contains 3 event(s)
Contains waveform data from 0 station(s).
In [8]: del ds.events
In [9]: ds
Out[9]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
Contains 0 event(s)
Contains waveform data from 0 station(s).
In [10]: ds.add_quakeml(obspy.read_events())
In [11]: ds
Out[11]:
ASDF file [format version: 1.0.0]: 'test.h5' (8.8 KB)
Contains 3 event(s)
Contains waveform data from 0 station(s).
from pyasdf.
Hi Youyi,
I remember you mentioned that if you replace the event information, you can no longer process the file. Is that true? Can you confirm it?
from pyasdf.
Thanks, but now I am not sure that's the problem. When we create ASDF files we associate the waveform data with an event, and after the source update the id is changed:
In [17]: ev2.preferred_origin
Out[17]: <bound method Event.preferred_origin of Event(resource_id=ResourceIdentifier(id="smi:local/ndk/C090497A/event"), event_type=u'earthquake', event_type_certainty=u'known')>
In [18]: ev1.preferred_origin
Out[18]: <bound method Event.preferred_origin of Event(resource_id=ResourceIdentifier(id="smi:local/cmtsolution/C090497A/event"), event_type=u'earthquake')>
However, the change in event information doesn't seem to matter; the signal processing crashes for other reasons, which I believe have something to do with the data itself. I will look at it again tomorrow.
File "/autofs/nccs-svm1_home1/youyir/src/pypaw/src/pypaw/process.py", line 28, in process_wrapper
    return process(stream, inventory=inv, **param)
File "/autofs/nccs-svm1_home1/youyir/src/pytomo3d/pytomo3d/signal/process.py", line 212, in process
    st.detrend("linear")
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 241, in new_func
    return func(*args, **kwargs)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/stream.py", line 2304, in detrend
    tr.detrend(type=type)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 258, in new_func
    return func(*args, **kwargs)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/util/decorator.py", line 241, in new_func
    return func(*args, **kwargs)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/trace.py", line 231, in new_func
    result = func(*args, **kwargs)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/obspy/core/trace.py", line 1817, in detrend
    self.data = func(self.data, **options)
File "/ccs/home/youyir/src/anaconda/lib/python2.7/site-packages/scipy/signal/signaltools.py", line 1553, in detrend
    newdata = newdata.astype(dtype)
ValueError: could not convert string to float:
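The ValueError at the bottom of the traceback suggests the array handed to SciPy's detrend holds string data rather than floats. A minimal standard-library illustration of the same failure mode (no ObsPy or SciPy needed; this only mimics the final cast, not the full pipeline):

```python
# SciPy's detrend ultimately calls newdata.astype(dtype); casting
# string-valued data to float raises the same ValueError seen above.
try:
    float("")  # stand-in for casting a string-valued sample to float
except ValueError as err:
    print(err)  # the "could not convert string to float" message
```

If the trace data really contains strings, the problem likely lies in how the waveform data was written, not in the event information.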
from pyasdf.
The preferred_origin attribute is a method and you have to call it to actually get the origin.
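For illustration, a toy stand-in (not ObsPy's actual class) showing the difference between referencing the bound method and calling it:

```python
class Event:
    """Toy stand-in for ObsPy's Event, for illustration only."""

    def __init__(self, origin):
        self._origin = origin

    def preferred_origin(self):
        # Mirrors ObsPy: the origin is returned by a method, not an attribute.
        return self._origin


ev = Event(origin="smi:local/ndk/C090497A/origin")
print(ev.preferred_origin)    # a bound method object, as in Out[17]/Out[18]
print(ev.preferred_origin())  # the actual origin value
```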
For the other error: Can you create a minimal example that reproduces that behavior?
from pyasdf.
Yes, I should call preferred_origin(), but the point is that the resource_id changed. I was worried it might cause problems since we tie the waveforms to the source with ds.add_waveforms(st, tag=tag, event_id=event), but now it seems not to matter.
I am looking into the data problem to see if I can find the issue; I'll keep you posted.
from pyasdf.
Well yes - if you change the origin it should have a different id...you can always manually force it but I recommend against it. Each id should uniquely identify something.
Do you need functionality to add event/origin/... ids to existing waveforms?
from pyasdf.
It would be very nice to have such a function. BTW, I am still puzzled about how a waveform is tied to a certain event. What would happen if we have multiple events in one ASDF file?
from pyasdf.
I'll add that functionality.
A waveform is tied to an event via its id. Each event has a unique id, thus multiple events are no problem. Each waveform can also be tied to multiple events; in that case it will be associated with multiple event ids.
from pyasdf.
In HDF5 speak: each waveform can have an attribute event_id that can store one or more ids.
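Conceptually, this is a many-to-many mapping from waveforms to event ids. A plain-Python sketch of the idea (a toy model, not pyasdf's internal implementation; the waveform name and second event id are made up):

```python
# Toy model of the HDF5 layout: each waveform dataset carries an
# "event_id" attribute holding one or more event resource ids.
waveform_attrs = {
    "BW.RJOB..EHZ__2009-08-24T00:20:03__raw": {
        "event_id": [
            "smi:local/ndk/C090497A/event",
            "smi:local/ndk/C090497B/event",  # hypothetical second event
        ]
    }
}


def events_for(waveform):
    """Return the event ids a waveform is associated with."""
    return waveform_attrs[waveform]["event_id"]


print(events_for("BW.RJOB..EHZ__2009-08-24T00:20:03__raw"))
```

Because the association lives on the waveform side, several events can coexist in one file without ambiguity.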
from pyasdf.
Thanks @krischer ! That will make things better organized.
from pyasdf.
The use of a single dataset for the entire catalog seems like a significant design flaw to me. If I have a large number of events and want to add a new one, I would have to read the entire catalog, delete the dataset, append the new event to the QuakeML, and write the new dataset. Meanwhile to add stations and/or waveforms, all I have to do is add the datasets for the new ones and don't touch the old ones.
How difficult would it be to alter the HDF5 layout so that the catalog is stored via something like Catalog/{event_id}/QuakeML? This provides consistency with the station layout. It would also allow adding auxiliary datasets like finite-source rupture models to the event, e.g., Catalog/{event_id}/Rupture Models/{tag}.
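A plain-Python sketch of the proposed per-event layout (this is the hypothetical structure under discussion, not the current ASDF format). The point is that adding an event only touches its own entry, leaving the rest of the catalog alone:

```python
# Hypothetical per-event catalog layout: Catalog/{event_id}/QuakeML.
catalog = {}  # maps a per-event path to its QuakeML document


def add_event(event_id, quakeml):
    """Add one event without rewriting the rest of the catalog."""
    catalog["Catalog/{}/QuakeML".format(event_id)] = quakeml


add_event("smi:local/ndk/C090497A/event", "<quakeml>...</quakeml>")
add_event("smi:local/ndk/C090497B/event", "<quakeml>...</quakeml>")
print(sorted(catalog))
```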
from pyasdf.
This would require a new version of the ASDF format, but this is something we have done before and I don't see any major difficulty there. The reason we originally did not do this is that it is possible (and people do use this) to attach additional information at the Catalog level, which in your proposed scheme could not be retained. Also, the QuakeML data format is pretty much already designed to do this.
That being said, it is true that XML is subpar for some applications and cumbersome for very large event data sets, so in this particular instance the trade-off of not retaining catalog-level information might be acceptable and a good idea.
How many events are you thinking about? There are a lot of tricks one could use to make this fast at the library level without touching the underlying data format.
Adding auxiliary data to a place outside the AuxiliaryData group would break a lot of assumptions and I'm not sure it's worth it. One could still store event information as is right now (or with your proposed changes) and then store rupture models under /AuxiliaryData/RuptureModels/{event_id}/{tag} or something like it. Then the connection to events is trivial.
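A small sketch of the suggested auxiliary-data addressing (the path scheme is the one proposed above; the helper function is hypothetical):

```python
def rupture_model_path(event_id, tag):
    """Build the suggested /AuxiliaryData/RuptureModels/{event_id}/{tag} path.

    A short event label is used here; full QuakeML resource ids contain
    '/' and ':' and would need escaping before use as an HDF5 path.
    """
    return "/AuxiliaryData/RuptureModels/{}/{}".format(event_id, tag)


print(rupture_model_path("C090497A", "preferred"))
# -> /AuxiliaryData/RuptureModels/C090497A/preferred
```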
Making your proposed change would require three steps:
(1) Update the definition: https://github.com/SeismicData/ASDF_definition
(2) Add a new schema to the validator. It's already set up to deal with multiple ASDF versions, so this is pretty simple: https://github.com/SeismicData/asdf_validate/tree/master/asdf_validate/schemas
(3) Add support to the Python (https://github.com/SeismicData/pyasdf) and C (https://github.com/SeismicData/asdf-library) libraries. As it's versioned this is somewhat optional, but one of course would also like to use it.
from pyasdf.
After thinking more about my use cases, it probably makes sense for me to generate an ASDF file for each event. These can be merged as needed into larger ASDF files when generating a data set for a collection of events. This seems better than changing the ASDF format.
from pyasdf.
I'll probably add some kind of read-only multi-file dataset object pretty soon, as it would come in handy for some things we do. The idea is to offer the same interface as the ASDFDataSet object even if the data originates from many different files.
from pyasdf.
No activity for a while. Closing for now.
from pyasdf.