
detex's People

Contributors

d-chambers


detex's Issues

using subSamp = True in createCluster causes waveform misalignment

If a SubSpace instance is created from a ClusterStream instance that was initialized with subSamp = True, the subsample extrapolation can cause misalignment in the multiplexed waveforms. This results in a higher-dimensional representation than necessary. The following examples from the intro tutorial illustrate the point. The first is a plot from a SubSpace instance whose Cluster instance used subSamp = False, and the second used subSamp = True.

image

image

Can no longer get hypoDD times

I used a standard TemplateKey from an earthquake catalog. I first ran
cl = detex.createCluster(CCreq=0.68,trim=[5,30],fetch_arg='../EventWaveForms',fileName='clustDD.pkl',enforceOrigin=True)
and then

cl.writeSimpleHypoDDInput(minCC=0.70)
Traceback (most recent call last):
  File "", line 1, in
    cl.writeSimpleHypoDDInput(minCC=0.70)
  File "/home/pankow/anaconda/lib/python2.7/site-packages/detex/subspace.py", line 110, in writeSimpleHypoDDInput
    trdf = self.TRDF[self.TRDF.Station == sta].iloc[0]
AttributeError: 'ClusterStream' object has no attribute 'TRDF'

KML function to represent clusters

A method of the detex.subspace.Cluster class to output KML files (for Google Earth) showing the cluster structure at each station would be useful for visualization.
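Such a method might be sketched with only the standard library (simplekml, already in the project's dependency list, would work equally well). Everything below is hypothetical, not detex code: `clusters_to_kml` and its `{cluster_id: [(name, lat, lon), ...]}` input layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def clusters_to_kml(clusters, path):
    """Hypothetical helper (not part of detex): write one KML Folder per
    cluster, with a Placemark per event, so a station's cluster structure
    can be browsed in Google Earth.

    `clusters` maps a cluster id to a list of (name, lat, lon) tuples.
    """
    ET.register_namespace("", KML_NS)  # serialize without namespace prefixes
    kml = ET.Element("{%s}kml" % KML_NS)
    doc = ET.SubElement(kml, "{%s}Document" % KML_NS)
    for cid, events in clusters.items():
        folder = ET.SubElement(doc, "{%s}Folder" % KML_NS)
        ET.SubElement(folder, "{%s}name" % KML_NS).text = "cluster_%s" % cid
        for name, lat, lon in events:
            pm = ET.SubElement(folder, "{%s}Placemark" % KML_NS)
            ET.SubElement(pm, "{%s}name" % KML_NS).text = name
            point = ET.SubElement(pm, "{%s}Point" % KML_NS)
            # KML coordinate order is lon,lat
            coords = ET.SubElement(point, "{%s}coordinates" % KML_NS)
            coords.text = "%.6f,%.6f" % (lon, lat)
    ET.ElementTree(kml).write(path, xml_declaration=True, encoding="UTF-8")
    return path
```

One folder per cluster keeps Google Earth's sidebar navigable even for stations with many clusters.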

No events survived pre-processing, check DataFetcher and event quality

Hi Chambers,

I recently installed the code and am trying to run the intro example provided on the website. However, I am having some issues when I call the function createCluster():

Cannot remove response without a valid inventoryArg, setting removeResponse to False
Starting IO operations and data checks
/opt/anaconda/lib/python3.5/site-packages/scipy/linalg/basic.py:1226: RuntimeWarning: internal gelsd driver lwork query error, required iwork dimension not returned. This is likely the result of LAPACK bug 0038, fixed in LAPACK 3.2.2 (released July 21, 2010). Falling back to 'gelss' driver.
  warnings.warn(mesg, RuntimeWarning)
Traceback (most recent call last):
  File "Intro_detex.py", line 5, in <module>
    cl = detex.createCluster(CCreq=0)
  File "/opt/anaconda/lib/python3.5/site-packages/detex-1.0.8-py3.5.egg/detex/construct.py", line 122, in createCluster
    detex.log(__name__, msg, level='error')
  File "/opt/anaconda/lib/python3.5/site-packages/detex-1.0.8-py3.5.egg/detex/__init__.py", line 138, in log
    raise e(msg)
Exception: No events survived pre-processing, check DataFetcher and event quality

These are lines I am using so far:

import detex
detex.getdata.makeDataDirectories()
cl = detex.createCluster(CCreq=0.4)

Have you seen this before? Thanks in advance for the help.

too many columns, fail to write .index.db

detex fails to write the indkey table in .index.db. This happens with the latest version of detex I have; I also tried an older version. I deleted and re-indexed, both through clustering and with getdata, and hit the same issue: too many columns. When I do get it to write a .index.db, I then get an error because there is no 'indkey' table, so it still seems to be a related error.

Traceback and associated files below.

Traceback (most recent call last):
  File "", line 1, in
    detex.getdata.makeDataDirectories(getContinuous=False)
  File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 174, in makeDataDirectories
  File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 202, in _getTemData
  File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/getdata.py", line 936, in indexDirectory
  File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/util.py", line 880, in saveSQLite
    DF, Tablename, con=conn, flavor='sqlite', if_exists='append')
  File "/home/linville/Applications/anaconda/lib/python2.7/site-packages/detex/pandas_dbms.py", line 83, in write_frame
    cur.execute(schema)
OperationalError: too many columns on indkey

Archive.zip
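For reference, this error almost certainly comes from SQLite itself rather than from any particular detex version: SQLite's default SQLITE_MAX_COLUMN limit is 2000 columns per table, so a sufficiently wide indkey DataFrame will fail at CREATE TABLE time no matter what writes it. A minimal reproduction:

```python
import sqlite3

# SQLite is compiled with SQLITE_MAX_COLUMN = 2000 by default, so a
# CREATE TABLE with more columns fails exactly like the traceback above.
conn = sqlite3.connect(":memory:")
cols = ", ".join("c%d INTEGER" % i for i in range(2001))
try:
    conn.execute("CREATE TABLE indkey (%s)" % cols)
    failed = False
    message = ""
except sqlite3.OperationalError as exc:
    failed = True
    message = str(exc)  # "too many columns on indkey"
```

If the DataFrame really needs that many columns, splitting it across several tables (or pivoting wide data to long format) sidesteps the limit.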

Detex compilation warnings with Anaconda3 install.

Derrick,
After installing the newer Anaconda3-5.0.0-MacOSX-x86_64 and using the configuration below,

conda config --add channels conda-forge
conda create -n detex python=2.7
source activate detex
conda install pyqt=4 (Had to use an older version)
conda install joblib
conda install simplekml
conda install basemap
conda install obspy

we get the warning

/home/blycker/anaconda3/envs/detex/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by " \

I didn't know if you were still supporting this, but I thought I should drop you a line.

Bill

Events that occur too close in time link events to clusters that don't belong

Some events that occur close in time to other events can both be included in the data window pulled when each event is correlated against every other event to determine event clusters.

For example, in the following 2 plots:

image

the first contains 2 events (although only one is in the catalog).

Most of the events in the cluster associate with the second event; one event, however, associates better with the first.

image

This results in the events in the first plot being aligned to the second event in that plot. Consequently, the waveform gets poorly aligned, as in this plot of a waveform group:

one_event_doesnt_fit

put detex on pypi

Putting detex on PyPI will allow easier updates and installation through pip/easy_install.

Failed alignment due to similar end effects

The current function in the construct model for calculating waveform correlation coefficients (_CCX2)
works by taking one of the two waveforms (each of length n) and zero-padding it with n elements at the beginning and n elements at the end. Conceptually, the other waveform is then slid over the zero-padded waveform and the CC is calculated at every time step. This can allow similar parts of the end of one waveform and the beginning of the other (such as filter effects) to produce the highest correlation coefficient in the CC trace, even though only a few samples are actually similar, as is the case in the plots shown in issue 19. When this happens it breaks the alignment algorithm in the createSubSpace call.

To remedy this, the waveform to be padded will only receive n/2 zero elements at the beginning and n/2 zero elements at the end.

edit: The padding had to remain n on both sides or else the normalization gets thrown off, but the correlation coefficient vector is now sliced to the n/2 bound before determining its max, which accomplishes essentially the same thing.
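The fix described in the edit can be illustrated in plain Python (this is an un-normalized sketch of the idea, not detex's actual _CCX2; `best_lag` is a hypothetical helper): the full set of 2n+1 lags is still computed against n zeros of padding on each side, but the argmax is searched only within ±n/2 lags, so end-effect matches at extreme lags can no longer win.

```python
def best_lag(a, b):
    """Hypothetical illustration of the fix described above (not detex's
    _CCX2). Returns the lag (in samples) such that b[j] best matches
    a[j + lag], searched only within +/- n/2 samples."""
    n = len(a)
    padded = [0.0] * n + list(a) + [0.0] * n
    # cc[k] corresponds to lag (k - n); there are 2n + 1 lags in total
    cc = [sum(padded[k + j] * b[j] for j in range(n)) for k in range(2 * n + 1)]
    # slice the search to the +/- n/2 bound before taking the max
    lo, hi = n - n // 2, n + n // 2 + 1
    k = max(range(lo, hi), key=lambda i: cc[i])
    return k - n
```

In practice detex computes normalized coefficients in the frequency domain; the point here is only the restricted argmax.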

Error messages when running ss.SVD and the final ss.detex steps

I have been able to run detex subspace successfully on my stations for the year 2003; however, when I try to add another year (in this case 2004, though it has happened with other years as well), I get the following message when running ss.SVD():
2003-2004ss svderror
To try to continue with the program, I set the threshold (ss.SVD(Threshold=0.25)) and run ss.detex(useSingles=True, fillZeros=True), but then get the following error message, which looks very similar to the previous one:
2003-2004ss detexerror
I tried attaching my Station and Template Keys to this posting, but it wouldn't let me (it said "Attaching documents requires write permission to this repository"). I can email them to you if needed. All the continuous data I used for these runs was filtered using the script you and Kris wrote a few weeks ago.

Missing information in SQLite database

I ran the detex command and got all the detections for each station. When I went to use the results module, I discovered that the only table that made it into the SQLite database is ss_df, the one containing the detection results for the subspaces. After producing the detections for the final station, it printed this before stopping:

sta lta req of 8 failing on station IMU, dropping sta/lta requirement
sta lta req of 8 failing on station IMU, dropping sta/lta requirement
sta lta req of 8 failing on station TCRU, dropping sta/lta requirement
UU.TCRU starting on 2018-11-20T15:23:00 is shorter than expected
sta lta req of 8 failing on station TCRU, dropping sta/lta requirement
UU.TCRU starting on 2019-05-01T01:00:00 is shorter than expected
sta lta req of 8 failing on station TCRU, dropping sta/lta requirement
Traceback (most recent call last):
  File "", line 1, in
  File "/home/arecord/Subspace/detex/detex/subspace.py", line 1873, in detex
    self.setSinglesThresholds()
  File "/home/arecord/Subspace/detex/detex/subspace.py", line 1089, in setSinglesThresholds
    useSubSpaces=False, **kwargs)
  File "/home/arecord/Subspace/detex/detex/subspace.py", line 1743, in getFAS
    issubspace=False)
  File "/home/arecord/Subspace/detex/detex/fas.py", line 59, in _initFAS
    ssArrayTD, ssArrayFD, staltalimit)
  File "/home/arecord/Subspace/detex/detex/fas.py", line 119, in _getDSVect
    detex.log(name, msg, level='error')
  File "/home/arecord/Subspace/detex/detex/__init__.py", line 138, in log
    raise e(msg)
Exception: Could not get any data for ECUT

createCluster removes templates if not exactly equal to the median length

In trying to cluster older data, ~75% of the events were being removed. Relu found that there is an if statement that deletes templates whose length is not exactly equal to the median template length. While this makes sense for removing lower-quality data, it is also removing templates that differ by a single point.

This one-point difference in template length is introduced by the obspy trim function. We think it would be best to keep all waveforms that differ by only a single point, but use only the median template length in the calculations.
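The proposed behavior could be sketched like this (a hypothetical helper under the assumptions above, not detex's actual filter): keep any template within one sample of the median length, then trim the survivors to a common length rather than discarding them outright.

```python
from statistics import median

def keep_near_median(templates, tol=1):
    """Hypothetical sketch of the proposed fix: instead of deleting every
    template whose length differs from the median, keep those within
    `tol` samples of it, then trim all survivors to their common minimum
    length so downstream calculations see equal-length arrays."""
    med = int(median(len(t) for t in templates))
    kept = [t for t in templates if abs(len(t) - med) <= tol]
    if not kept:
        return []
    m = min(len(t) for t in kept)
    return [t[:m] for t in kept]
```

Trimming to the shortest kept length avoids having to pad the one-sample-short traces produced by obspy's trim.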

writing new detections to template key

When using res.writeDetections(eventDir='DetectedEvents',updateTemKey=True)
to update the TemplateKey, the 'TIME' format is wrong. Looking at the bottom of the TemplateKey (head and tail shown below), TIME appears to be in epoch seconds instead of a date-time stamp. When I later tried to use this template key to get lag times with
cl = detex.createCluster(CCreq=0.68,trim=[5,30],fetch_arg='../EventWaveForms',fileName='clustDD.pkl',enforceOrigin=True)
memory fills up and python crashes. I can, however, run clustering with
cl = detex.createCluster(CCreq=0.68,trim=[5,30],fetch_arg='../EventWaveForms')
and get cluster results.

[brewster:DATA/CIRCLEVILLE/Detections_1wk] pankow% head TemplateKey.csv_det
,Unnamed: 0,TIME,NAME,LAT,LON,MAG,DEPTH,STMP
0,0.0,2010-09-29T15-48-59.63,2010-09-29T15-48-59.63,38.202,-112.251833333,1.29,4.32,1285775339.63
1,1.0,2011-01-03T12-06-36.88,2011-01-03T12-06-36.88,38.2473333333,-112.33983333299999,4.56,5.4,1294056396.88
2,2.0,2011-01-03T12-10-08.66,2011-01-03T12-10-08.66,38.2491666667,-112.30616666700001,2.92,2.03,1294056608.66
3,3.0,2011-01-03T12-23-19.05,2011-01-03T12-23-19.05,38.248666666700004,-112.320333333,0.96,1.68,1294057399.05
[brewster:DATA/CIRCLEVILLE/Detections_1wk] pankow% tail TemplateKey.csv_det
397,,1294325117.3600001,d2011-01-06T14-45-17,,,0.12130792350080775,,
398,,1294328589.025,d2011-01-06T15-43-09,,,0.42995299945604737,,
399,,1294330616.4850001,d2011-01-06T16-16-56,,,0.05488828309071697,,
400,,1294331688.385,d2011-01-06T16-34-48,,,-0.1258472051222964,,
401,,1294332071.1999998,d2011-01-06T16-41-11,,,-0.06266347145329888,,

Station UU.NMU Makes No Detections When It Should

UU.NMU is a close station (~30 km) to my swarm of interest, with good template event waveforms. However, it is making no detections even though it looks like there are good detections to be made, ones that were detected by my other 3 stations: UU.MSU, UU.IMU, and UU.DWU.

Here is an example of a detection UU.NMU is missing.
2003-11-28t16-48-00

I have tried running just the vertical component, in case there was an issue mixing one 3-component station with single-component stations, but again got no detections. Any thoughts on why this would be happening? It would be useful for me and others to be able to use this station.

SVD ObsPyDeprecation warning

Just wanted to give you a heads up in case you have not seen this:
/home/pankow/anaconda/lib/python2.7/site-packages/obspy/__init__.py:159: ObsPyDeprecationWarning: Module 'obspy.station' is deprecated and will stop working with the next ObsPy version. Please import module 'obspy.core.inventory' instead.
  ObsPyDeprecationWarning)
/home/pankow/anaconda/lib/python2.7/site-packages/obspy/__init__.py:159: ObsPyDeprecationWarning: Module 'obspy.station.response' is deprecated and will stop working with the next ObsPy version. Please import module 'obspy.core.inventory.response' instead.
  ObsPyDeprecationWarning)
/home/pankow/anaconda/lib/python2.7/site-packages/obspy/__init__.py:159: ObsPyDeprecationWarning: Module 'obspy.station.util' is deprecated and will stop working with the next ObsPy version. Please import module 'obspy.core.inventory.util' instead.
  ObsPyDeprecationWarning)

'list' object has no attribute 'tolist' error; fails to write ss_info to SubSpace.db

When I run ss.detex after ss.SVD(threshold=0.15), it fails to write ss_info to SubSpace.db and reports the error: 'list' object has no attribute 'tolist'. I checked the code, and it seems that self.histSubSpace[sta][skey] returns a list, not an array.
image
image
image
I ran ss.detex before with a whole year of continuous data and the SVD threshold set to None; it worked well and didn't report this error. This time I ran ss.detex on just April, a month in which station US.HILD has no continuous data, and the error occurred.
Now I'm running ss.detex on the April data with the SVD threshold set to None to see whether this error occurs again.
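If the diagnosis above is right, a small defensive coercion would make the write path indifferent to whether the histogram is a list or an ndarray. This is a sketch only; `as_plain_list` is a hypothetical helper, not detex code:

```python
import numpy as np

def as_plain_list(hist):
    """Hypothetical guard for the bug described above: self.histSubSpace
    values are sometimes plain lists (e.g. when a station has no continuous
    data for the period), so coerce to ndarray before calling .tolist()."""
    return np.asarray(hist).tolist()
```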

Detections for only subset of data in Continuous Waveforms

The options UTCstart and UTCend do not seem to work when creating detections from continuous data: even with these values set, detex looks through all the continuous data. Stephen suggested that the search window be tied to the date strings in the StationKey.
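Stephen's suggestion amounts to intersecting each continuous-data chunk with the requested window before scanning it. A minimal sketch, with `clamp_to_window` as a hypothetical helper and datetime objects standing in for the UTC date strings:

```python
from datetime import datetime

def clamp_to_window(chunk_start, chunk_end, utc_start, utc_end):
    """Hypothetical sketch of tying the detection scan to UTCstart/UTCend
    (or the StationKey date strings): return the overlap of a continuous
    chunk with the requested window, or None if there is no overlap, so
    out-of-window chunks can be skipped entirely."""
    start = max(chunk_start, utc_start)
    end = min(chunk_end, utc_end)
    return (start, end) if start < end else None
```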

Detex not reading continuous waveform data

Trying to run SVD or detections has resulted in Detex being unable to read the continuous data files.

I have tried the data in both pickle and mseed format. Neither has worked properly.

Using data that was pulled by previous versions of Detex I received this error (file format pickle):

screenshot 2016-03-02 11 37 22


When trying to run SVD on the older data, it skips all waveforms and ends up with no data to run SVD.

screenshot 2016-03-02 11 59 43


Thinking that there might be something wrong with the continuous waveform data, I downloaded a month's worth of continuous data to a new directory. This was completed in both pickle and mseed format using Detex 1.0.6.
After the download completed, Detex began to auto-index the ContinuousWaveform directory with this result:

screenshot 2016-03-02 15 33 30


I tried running SVD (after terminating the auto indexing and Detex tried to index again), with this result:
screenshot 2016-03-02 15 36 47


Just to try it, I created a subspace with the new data:

screenshot 2016-03-02 15 37 02


I have been able to successfully use detex.pickTimes() and have been able to see those waveforms. Detex has also had no problem reading either pickle or mseed format EventWaveforms. The clusters have been produced without error. I am using fillZeros=True, but all other parameters (minus directory location variables) have remained unchanged.

My TemplateKey:
TemplateKey.txt

My Station Key:
StationKey.txt

*Both have been switched to ".txt" for uploading.

*If you choose to download the data, the four stations will be about 9 GB of data for the single month. The event waveforms take up approximately 50 MB of disk space.

Clustering returning empty link matrix

I've been trying to create clusters and haven't been having any luck. I keep getting this error:


screen shot 2015-10-13 at 1 58 47 pm


I've attached the Template Key and the Station Key files that I'm using. I did try running the tutorial again to be sure it wasn't Detex and the tutorial ran correctly. I tried just running the clustering on one station (IMU) and I still received the same error. I'm going to keep trying to figure out why I'm having this trouble, but any insight into the issue would be great.

Oh, I can pull data; it's just getting the cluster to form that I'm having issues with.

If it helps narrow down the issue, I have also had trouble creating a subspace. I thought that maybe it was my previous cluster, which was created correctly just a few days ago (with more stations in the Station Key and more events in the Template Key), so I tried remaking the cluster and started receiving the above error.

StationKey.txt
TemplateKey.txt

Detex not loading subspace

I have been trying to get Detex to load the subspace that I created earlier without success.
The error that I am receiving is regarding "no Subspace stream":

screenshot 2016-03-02 11 33 46

I have attempted to load subspaces created with pickle and mseed formats (thinking there may possibly be some loss of data between formats). Neither one will load properly.

Below is a snippet of the output created by a subspace that failed to load once written out to the drive.

screenshot 2016-03-02 15 37 02

This subspace was created with newly downloaded EventWaveform and Continuous data, attempting to look over one month of continuous data.

_attachResponse function of getdata raises HTTPError when used with "uuss" setting

This happens because the chan parameter passed to _attachResponse is a list of channels, while obspy's FDSN get_stations function requires a string that may use wildcards. A quick solution is to loop over the channels and add each inventory to an empty inventory, like so:

def _attachResponse(fet, st, start, end, net, sta, loc, chan):
    """
    Function to attach response from inventory or client
    """
    if not fet.removeResponse or fet.inventory is None:
        return st
    if isinstance(fet.inventory, obspy.station.inventory.Inventory):
        st.attach_response(fet.inventory)
    else:
        inv = obspy.station.Inventory([], 'detex')
        for cha in chan:
            inv += fet.inventory.get_stations(starttime=start,
                                              endtime=end,
                                              network=net,
                                              station=sta,
                                              location=loc,  # get_stations takes 'location', not 'loc'
                                              channel=cha,
                                              level="response")
        st.attach_response(inv)
    return st

createSubspace routine failing

For select datasets, the createSubspace module fails. Screenshot of the output below.
screen shot 2015-10-30 at 4 06 48 pm

Relu has researched this error, and it seems that the program fails if there are no singletons. We think this is unwanted behavior. It would be nice to be able to treat all events in an area as one subspace, or to look at small datasets in which all events are correlated.

exclude temporally coincident detections from unassociated templates?

It doesn't seem robust to report a new event from temporally correlated detections on 2 stations with uncorrelated templates. So if you have templates A-C and you look at which templates are linked in a detection requiring 2 stations, they should look like [A,A], [B,B], or [C,C]. If A and C made a subspace, they could look like [[A,C],C] or [[A,C],A], etc.

worth adding a new filter in results?

If helpful, I'm using the following code to verify template continuity. It's a hack, so check it if you follow this route.

screen shot 2016-09-21 at 3 29 17 pm

# %%

sgdb = loadSQLite('SubSpace.db', 'sg_info')
ssdb = loadSQLite('SubSpace.db', 'ss_info')

"""I think Kris makes this with res = detex.results.detResults(blah,blah)
res.Dets.to_pickle('detections_2RS.pkl')"""

with open('detections_2RS.pkl', 'rb') as f:
    detections = pickle.load(f)

templates = readKey('TemplateKey.csv')

# %%

yams = []
for i in range(len(detections.Dets)):
    tempyam = []
    for j in range(len(detections.Dets[i])):
        each = detections.Dets[i].reset_index(drop=True)
        try:
            tempyam.append([templates[templates['NAME'] == sgdb[(sgdb['Name'] == each.Name[j]) & (sgdb['Sta'] == each.Sta[j])].Events.iloc[0]].index[0]])
        except:
            junkyam = list(np.empty(len(ssdb[(ssdb['Name'] == each.Name[j]) & (ssdb['Sta'] == each.Sta[j])].Events.iloc[0].split(','))))
            for k in range(len(ssdb[(ssdb['Name'] == each.Name[j]) & (ssdb['Sta'] == each.Sta[j])].Events.iloc[0].split(','))):
                junkyam[k] = templates[templates['NAME'] == ssdb[(ssdb['Name'] == each.Name[j]) & (ssdb['Sta'] == each.Sta[j])].Events.iloc[0].split(',')[k]].index[0]
            tempyam.append(junkyam)
    yams.append(tempyam)

idx = [min(len(set(x).intersection(set(yams[i][0]))) for x in yams[i]) for i in range(len(yams))]

where idx == 0 is not a valid detection
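The continuity test the script above encodes can be stated more directly: a multi-station detection is plausible only if the template (or subspace member) sets on the contributing stations share at least one event. A minimal sketch of that rule (`templates_consistent` is a hypothetical name, not part of detex or the script):

```python
def templates_consistent(per_station_events):
    """Hypothetical sketch of the proposed filter: given, for each station
    contributing to a detection, the collection of template events behind
    the matching template or subspace, keep the detection only if all the
    collections share at least one event (e.g. [A,A] or [[A,C],A] pass,
    while [A,B] fails)."""
    sets = [set(s) for s in per_station_events]
    return bool(set.intersection(*sets))
```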

writeSimpleHypoDDInput

When calculating differences in lag times, this function sometimes writes 'nan' into the dt.cc file. I have not investigated further.
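Until the root cause is found, the writer could simply skip pairs whose differential time or correlation is nan rather than emitting them into dt.cc. A hedged sketch; `write_dt_line` and its line format are hypothetical, not detex's actual writer:

```python
import math

def write_dt_line(fh, sta, dt, cc, phase="P"):
    """Hypothetical guard for the issue above: refuse to write a dt.cc
    observation line when the differential time or weight is nan,
    returning False so the caller can count skipped pairs."""
    if math.isnan(dt) or math.isnan(cc):
        return False
    fh.write("%s %9.4f %6.4f %s\n" % (sta, dt, cc, phase))
    return True
```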
