ai4eps / gamma Goto Github PK

GaMMA: Earthquake Phase Association using a Bayesian Gaussian Mixture Model

License: MIT License

Python 6.32% Jupyter Notebook 93.65% Dockerfile 0.03%

gamma's Introduction

GaMMA: Gaussian Mixture Model Associator

1. Install

pip install git+https://github.com/wayneweiqiang/GaMMA.git

The implementation is based on the Gaussian mixture models in scikit-learn

2. Related papers

Zhu, Weiqiang et al. "Earthquake Phase Association using a Bayesian Gaussian Mixture Model." (2021)
Zhu, Weiqiang, and Gregory C. Beroza. "PhaseNet: A Deep-Neural-Network-Based Seismic Arrival Time Picking Method." arXiv preprint arXiv:1803.03211 (2018).

3. Examples

Hyperparameters:
- use_amplitude (default = True): If using amplitude information.
- vel (default = {"p": 6.0, "s": 6.0 / 1.75}): velocity for P and S phases.
- use_dbscan: If using dbscan to cut a long sequence of picks into segments. Using DBSCAN can significantly speed up associaiton using small windows.
- dbscan_eps (default = 10.0s): The maximum time between two picks for one to be considered as a neighbor of the other. See details in DBSCAN
- dbscan_min_samples (default = 3): The number of samples in a neighborhood for a point to be considered as a core point. See details in DBSCAN
- oversampling_factor (default = 10): The initial number of clusters is determined by (Number of picks)/(Number of stations)/(Inital points) * (oversampling factor).
- initial_points (default=[1,1,1] for (x, y, z) directions): Initial earthquake locations (cluster centers). For a large area over 10 degrees, more initial points are helpful, such as [2,2,1].
- covariance_prior (default = (5, 5)): covariance prior of time and amplitude residuals. Because current code only uses an uniform velocity model, a large covariance prior can be used to avoid splitting one event into multiple events.
- Filtering low quality association
  - min_picks_per_eq: Minimum picks for associated earthquakes. We can also specify minimum P or S picks:
  - min_p_picks_per_eq: Minimum P-picks for associated earthquakes.
  - min_s_picks_per_eq: Minimum S-picks for associated earthquakes.
  - max_sigma11: Max phase time residual (s)
  - max_sigma22: Max phase amplitude residual (in log scale)
  - max_sigma12: Max covariance term. (Usually not used)

Note the association speed is controlled by dbscan_eps and oversampling_factor. Larger values are preferred, but at the expense of a slower association speed.

Synthetic Example

Example using PhaseNet picks

See details in the notebook: example_phasenet.ipynb

Example using Seisbench

See details in the notebook: example_seisbench.ipynb

More examples can be found in the earthquake detection workflow -- QuakeFlow

gamma's People

Contributors

Stargazers

Watchers

gamma's Issues

All magnitudes are coming out ~8

All magnitudes are coming out as ~8 (>1000 events in a one month period). I'm unclear how Gamma is calculating the magnitudes and what I can adjust to make this come out correctly. Running other datasets through the same workflow results in proper magnitudes that are close to what is in their catalogues.

ValueError when disabling DBSCAN

As shown below, when we are in the else branch (config["use_dbscan"]=False), L73 will try to remove -1 in unique_labels. And it will raise ValueError: list.remove(x): x not in list.

GaMMA/gamma/utils.py

Lines 70 to 73 in 067893c

 else: 

 labels = np.zeros(len(data)) 

 unique_labels = [0] 

 unique_labels.remove(-1)

Associating Sparse Network with GaMMA

Hello,

I was curious if GaMMA is able to associate phase picks obtained from a very sparse network composed of 3 instruments?

I have good phase picks acquired from PhaseNet, however when I try to associate the events GaMMA yields the dreaded -1 event index for phase arrivals and generates no catalogs. I've played around with the parameters extensively with all tests resulting the same.

Is GaMMA limited by the number of stations and/or the spatial footprint of the network?

Thank you!

error when using dbscan

Hello,

I got the following errors when using dbscan. Any advice?

Thanks.

Traceback (most recent call last):
File "", line 1, in
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
prepare(preparation_data)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/runpy.py", line 289, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/runpy.py", line 96, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Volumes/Durian/Projects/Gofar/G_5_GaMMa.py", line 108, in
catalogs, assignments = association(picks, stations, config, event_idx0, config["method"])
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/site-packages/gamma/utils.py", line 98, in association
manager = mp.Manager()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/managers.py", line 562, in start
self._process.start()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

Traceback (most recent call last):
File "/Volumes/Durian/Projects/Gofar/G_5_GaMMa.py", line 108, in
catalogs, assignments = association(picks, stations, config, event_idx0, config["method"])
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/site-packages/gamma/utils.py", line 98, in association
manager = mp.Manager()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/context.py", line 57, in Manager
m.start()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/managers.py", line 566, in start
self._address = reader.recv()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/Users/wenyuanfan/opt/anaconda3/envs/ogamma/lib/python3.10/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError

Inquiries about how to deal with earthquake locations at association range boundaries

@zhuwq0
Hi Weiqiang,
thank you for developing this association algorithm program.
I have some problems in the process of my own seismic data. The earthquakes that occurred at the boundary locations of a research area looked odd, probably due to improper parameter settings on my part or some other issue I was unaware of. The earthquakes at these boundary locations, including latitude, longitude, and depth, seem to take on a cut-off appearance. I would like to ask what is the cause of this problem. How can I avoid a similar problem?

Question: Eikonal Model

Hi I was wondering what is the variable h was in the Eikonal equation for 1D model velocity? Thanks!

Tutorial notebook raises IndexError exception

When I try to run the example_synthetic.ipynb tutorial script, the last cell raises an exception:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[gamma.ipynb] Cell 4 in 3
     [26]     covariance_prior = np.array([[1,0], [0,1]]) 
     [28] print(data.shape, centers_init.shape)
---> [30] gmm = BayesianGaussianMixture(n_components=num_event_init, 
     [31]                                       station_locs=phase_loc, 
     [32]                                       phase_type=phase_type,
     [33]                                       weight_concentration_prior = 1[/](https://file+.vscode-resource.vscode-cdn.net/)num_event_init,
     [34]                                       mean_precision_prior = mean_precision_prior,
     [35]                                       covariance_prior = covariance_prior,
     [36]                                       init_params="centers",
     [37]                                       centers_init=centers_init.copy(),
     [38]                                       loss_type="l1",
     [39]                                     #   max_covar=10**2,
     [40]                                       ).fit(data)
     [42] # gmm = GaussianMixture(n_components=num_event_init,
     [43] #                               station_locs=phase_loc, 
     [44] #                               phase_type=phase_type, 
   (...)
     [50] #                             #   dummy_prob=0.01,
     [51] #                               ).fit(data)
     [53] t_end = time.time()

File [site-packages/gamma/_base.py:204], in BaseMixture.fit(self, X, y)
    182 def fit(self, X, y=None):
    183     """Estimate model parameters with the EM algorithm.
    184 
    185     The method fits the model ``n_init`` times and sets the parameters with
   (...)
    202     self
    203     """
--> 204     self.fit_predict(X, y)
    205     return self

File [gamma/_base.py:249], in BaseMixture.fit_predict(self, X, y)
    246 self._print_verbose_msg_init_beg(init)
    248 if do_init:
--> 249     self._initialize_parameters(X, random_state)
    251 lower_bound = (-np.infty if do_init else self.lower_bound_)
    253 for n_iter in range(1, self.max_iter + 1):

File [gamma/_base.py:168], in BaseMixture._initialize_parameters(self, X, random_state)
    164 else:
    165     raise ValueError("Unimplemented initialization method '%s'"
    166                      % self.init_params)
--> 168 self._initialize(X, resp)

File [~/anaconda3/envs/abyss/lib/python3.9/site-packages/gamma/_bayesian_mixture.py:543](https://file+.vscode-resource.vscode-cdn.net/mnt/Pollux/ABYSS/association/~/anaconda3/envs/abyss/lib/python3.9/site-packages/gamma/_bayesian_mixture.py:543), in BayesianGaussianMixture._initialize(self, X, resp)
    541 self._estimate_weights(nk)
    542 self._estimate_means(nk, xk)
--> 543 self._estimate_precisions(nk, xk, sk)
    544 self.centers_ = centers

File [gamma/_bayesian_mixture.py:594], in BayesianGaussianMixture._estimate_precisions(self, nk, xk, sk)
    578 def _estimate_precisions(self, nk, xk, sk):
    579     """Estimate the precisions parameters of the precision distribution.
    580 
    581     Parameters
   (...)
    592         'spherical' : (n_components,)
    593     """
--> 594     {"full": self._estimate_wishart_full,
    595      "tied": self._estimate_wishart_tied,
    596      "diag": self._estimate_wishart_diag,
    597      "spherical": self._estimate_wishart_spherical
    598      }[self.covariance_type](nk, xk, sk)
    600     self.precisions_cholesky_ = _compute_precision_cholesky(
    601         self.covariances_, self.covariance_type, self.max_covar)

File [gamma/_bayesian_mixture.py:626], in BayesianGaussianMixture._estimate_wishart_full(self, nk, xk, sk)
    623 self.covariances_ = np.empty((self.n_components, n_features, n_features))
    625 for k in range(self.n_components):
--> 626     diff = xk[k] - self.mean_prior_
    627     self.covariances_[k] = (self.covariance_prior_ + nk[k] * sk[k] +
    628                             nk[k] * self.mean_precision_prior_ [/](https://file+.vscode-resource.vscode-cdn.net/)
    629                             # self.mean_precision_[k] * np.outer(diff, diff))
    630                             self.mean_precision_[k] * np.dot(diff.T, diff)[/](https://file+.vscode-resource.vscode-cdn.net/)n_samples)
    632 # Contrary to the original bishop book, we normalize the covariances

IndexError: index 39 is out of bounds for axis 0 with size 39

It looks like the shape of centers_init might be off.

Add validate argument to untils.py

Hello there,
I started with GaMMA and after my first Error prompts I noticed that maybe there is one small enhancement for the utils.py in function convert_picks_to_csv in line 46:

 meta = stations.merge(picks["id"], how="right", on="id")

    meta = stations.merge(picks["id"], how="right", on="id", validate="one_to_many")

In that way you can make sure that there are no duplicates in stations DataFrame.

I hope that's an improvement.

Best regards

Bug: ValueError: array must not contain infs or NaNs

Dear Weiqiang,

Sometimes I have gotten an unexpected error when I tried to associate some seismic phases. If you want to reproduce the error, I attached both picks and stations csv.
picks.csv
stations.csv

This is the error:

File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/gamma/_bayesian_mixture.py", line 526, in _initialize self._estimate_precisions(nk, xk, sk) File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/gamma/_bayesian_mixture.py", line 584, in _estimate_precisions self.covariances_, self.covariance_type, self.max_covar) File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/gamma/_gaussian_mixture.py", line 484, in _compute_precision_cholesky cov_chol = linalg.cholesky(covariance, lower=True) File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/scipy/linalg/decomp_cholesky.py", line 91, in cholesky check_finite=check_finite) File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/scipy/linalg/decomp_cholesky.py", line 19, in _cholesky a1 = asarray_chkfinite(a) if check_finite else asarray(a) File "/home/emmanuel/anaconda3/envs/eqt/lib/python3.7/site-packages/numpy/lib/function_base.py", line 489, in asarray_chkfinite "array must not contain infs or NaNs") ValueError: array must not contain infs or NaNs

And the input parameters are:
x(km)= array([ 674.5275551 , 1777.84677595]), y(km)= array([ 491.76753437, 1605.99660826]), z(km)= array([ 0, 180]), bfgs_bounds= ((673.52755510494535, 1778.846775953371), (490.76753436815903, 1606.996608260825), (0, 181), (None, None) use_dbscan=False, use_amplitude=False, dbscan_eps=10.0, dbscan_min_samples=3, vel = {"p": 7.0, "s": 7.0 / 1.75}, method="BGMM", oversample_factor=20, min_picks_per_eq=5, max_sigma11=2.0, max_sigma22=1.0, max_sigma12=1.0,

This error only has happened sometimes, in other picks data GaMMA works properly. Do you know what could be the error? or is there something I am doing wrong?
Thanks for your help.
Regards

About sigma parameters

Hi!

I don't understand very well the purpose of sigma parameters. Could you explain me them please?

max_sigma11
max_sigma22
max_sigma12

I am using GaMMA to associate seismic events in a regional network, I have noticed that by setting the max_sigma11=20 parameter, I have more associations than max_sigma11=2 (default parameter).

False Detections

Hi Weiqiang,

in your paper you write "it is challenging to evaluate the false positive associations in these catalogs."
Do you have any idea/solution to get rid of these false positive associations?

Thanks
Janis

In case you don't like to answer publicly: janis.heuel(at)rub.de

How to use a 1D velocity model

Hello!

I am beginning to use GaMMA in conjunction with PhaseNet picks. However, I would like to incorporate a 1D velocity model instead of a constant one. How can I achieve this? When I use a constant velocity, I get 10 earthquake detections (even though only 3 are actually recorded in my "events.csv" file). However, if I attempt to set the velocity as shown in the "example_phasenet.ipynb" (where there are some commented lines regarding the "Eikonal for 1D velocity model"), I end up with more than 10,000 earthquake detections.

What steps should I take?

Thank you very much in advance,
Melania

Shallow false detections

Hi Weiqiang,

I've been testing GaMMA in different scenarios lately. In general, it works really well, but I noticed that I often get very shallow (<5 km) false detections. I put a picture below with an example. Some of the very shallow seismicity is real but there are also a lot of false detections. I've tried this with different pickers and in different regions. Do you have any ideas how these shallow false detections can be explained and how to avoid them?

Error: Found array with 0 sample(s) (shape=(0, 3)) while a minimum of 1 is required.

Greetings,

I have some issues regarding the seisbench example that I don't know how to resolve.

The error "Found array with 0 sample(s) (shape=(0, 3)) appears at the line below when I change the network from "CX" to any other network:

catalogs, assignments = association(pick_df, station_df, config, method=config["method"], pbar=pbar)

I am not sure why the error occurs since the code generates many picks in the given area. I have attached my code in notebook format.

test.zip

Understanding GaMMA Catalog Output

Hi,

I'm a bit confused as to what the gamma_score in the Gamma catalog output is actually telling us. Is it simply a measure of the confidence level of an associated event?

Need for some tips on setting parameters

Since there is no any document ready until now, It would be helpful to give us some tips and notes on how to select best parameters controling the final results. It seems the following are the most important ones that should be choosen with care while working in local or regional scales:

dbscan_eps
dbscan_min_samples
max_sigma11
any other?
Thanks.

Question: Velocity model requirements relative to station coverage

Hi!

I'm currently using GAMMA and have a question regarding the requirements for the velocity model.
Is it a requirement for the velocity model to include the geographical locations of all seismic stations used in the event location process?

I have run it in both ways and just wondering if you have any insight.

Cheers,
D

Misspelled Link for DBSCAN in Docs

Hello there,
When reading the README.md I discovered that the assigned links to DBSCAN-documentation are misspelled.
They are:
https://https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html

This link causes an Error for not being able to find the site requested.

Could you get rid of the first https:// in the docs?

Thanks in advance

Increasing Memory usage in time

Hello,

This is more of a comment than a problem, but I had a problem with OOM Killer (Linux security measure to keep RAM usage <100%) killing subprocesses that cause the pool to never exit, and so loses all events found in the run.

It seems that some sub-processes increase their memory usage over time, doubling or tripling the overall memory usage of the run. (see the image for reference, the black curve is the main script, each child is a subprocess running the associate function from utils.py)

I tried adding a maxtasksperchild parameter to the pool, to see if it solves the problems which is why there is a lot of child's process (but it did not work)

Breaking the picks' fille into several parts solve the problem, but I thought it was worth telling.

Best,
Antoine

catalog station by station?

Hello,

Thank you for creating this code, it has worked amazing for my data. I hava a question though, I would like to get the earthquake location using NNL, and for that I need to get the station timestamp for each of the events.
Is there any way to extract this information from the catalog generated by GaMMA?

From another phase association method I was using I was using my setting up my results as (this is one earthquake):

id time sta pha prob
2 2011-06-04T00:00:32.920000Z POC p 0.721363
3 2011-06-04T00:00:33.240000Z PHU p 0.466320
4 2011-06-04T00:00:34.420000Z POC p 0.997895
5 2011-06-04T00:00:36.188391Z ANT p 0.936827
6 2011-06-04T00:00:37.640000Z RAN p 0.658889,

so I don't have a time stamp for each station. Is it possible?

Picks ID

Hi,

I utilized GaMMA to associate some picks to one event. This is the Pandas dataframe of my picks:

        id             timestamp            prob      type
 0    SX.MUHB  2018-01-02 08:25:31.980000  0.408322    p
 1    SX.WIMM  2018-01-02 08:25:29.530000  0.375889    p
 2    SX.WIMM  2018-01-02 08:25:34.940000  0.380612    s
 3    TH.CHRS  2018-01-02 08:25:33.010000  0.893861    p
 4    TH.CHRS  2018-01-02 08:25:41.290000  0.575885    s
 5   TH.NEUST  2018-01-02 08:25:32.920000  0.779552    p
 6   TH.NEUST  2018-01-02 08:25:40.550000  0.649434    s
 7    SX.NEUB  2018-01-02 08:25:34.010000  0.316887    p
 8    SX.NEUB  2018-01-02 08:25:44.970000  0.377629    s
 9     GR.CLZ  2018-01-02 08:25:37.040000  0.397447    p
10    GR.CLZ  2018-01-02 08:25:47.950000  0.204681    s
11   TH.POSS  2018-01-02 08:25:35.960000  0.804459    p
12   TH.POSS  2018-01-02 08:25:46.740000  0.529056    s
13   TH.TAUT  2018-01-02 08:25:38.150000  0.371234    p
14   TH.TAUT  2018-01-02 08:25:51.690000  0.396445    s

and this is the output of picks associated to the event (following the SeisBench example):

         pick_idx  event_idx  prob_gamma
 0          1          1       0.000921
 1          2          1       0.073889
 2          3          1       0.069758
 3          4          1       0.174539
 4          5          1       0.113167
 5          6          1       0.151336
 6          7          1       0.110181
 7          8          1       0.061309
 8          9          1       0.042372
 9         11          1       0.066970
10        12          1       0.013890
11        13          1       0.143820

This might be obvious, but my question is, to which pick on my picks dataframe do the indexes in the pick_idx column correspond? For example, pick_idx 1 corresponds to the pick with id SX.WIMM (type p) or to SX.MUHB (type p).

Thanks in advance for the help!

TypeError: unsupported operand type(s) for -: 'float' and 'str' when running association

Hi, when I run association on my own dataset I get this error:

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/utils.py", line 193, in associate
gmm = BayesianGaussianMixture(
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/_base.py", line 204, in fit
self.fit_predict(X, y)
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/_base.py", line 249, in fit_predict
self._initialize_parameters(X, random_state)
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/_base.py", line 162, in _initialize_parameters
resp, centers, means = initialize_centers(X, self.phase_type, self.centers_init, self.station_locs, random_state)
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/seismic_ops.py", line 430, in initialize_centers
means[i, :, :] = calc_time(centers_init[i : i + 1, :], station_locs, phase_type)
File "/home/dinko/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/seismic_ops.py", line 141, in calc_time
tt = np.linalg.norm(ev_loc - station_loc, axis=-1, keepdims=True) / v + ev_t
TypeError: unsupported operand type(s) for -: 'float' and 'str'
"""

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
Cell In[73], line 2
1 assigments=[]
----> 2 catalogs, assignments = association(pick_df, station_df, config, method=config["method"])
4 catalog = pd.DataFrame(catalogs)
5 assignments = pd.DataFrame(assignments, columns=["pick_idx", "event_idx", "prob_gamma"])

File ~/anaconda3/envs/seisbench/lib/python3.9/site-packages/gamma/utils.py:113, in association(picks, stations, config, event_idx0, method, **kwargs)
111 chunk_size = max(len(unique_labels)//(config["ncpu"]*20), 1)
112 with mp.get_context('spawn').Pool(config["ncpu"]) as p:
--> 113 results = p.starmap(
114 associate,
115 [
116 [
117 k,
118 labels,
119 data,
120 locs,
121 phase_type,
122 phase_weight,
123 pick_idx,
124 pick_station_id,
125 config,
126 timestamp0,
127 vel,
128 method,
129 event_idx,
130 lock,
131 ]
132 for k in unique_labels
133 ],
134 chunksize=chunk_size,
135 )
136 # resuts is a list of tuples, each tuple contains two lists events and assignment
137 # here we flatten the list of tuples into two lists
138 events, assignment = [],[]

File ~/anaconda3/envs/seisbench/lib/python3.9/multiprocessing/pool.py:372, in Pool.starmap(self, func, iterable, chunksize)
366 def starmap(self, func, iterable, chunksize=None):
367 '''
368 Like map() method but the elements of the iterable are expected to
369 be iterables as well and will be unpacked as arguments. Hence
370 func and (a, b) becomes func(a, b).
371 '''
--> 372 return self._map_async(func, iterable, starmapstar, chunksize).get()

File ~/anaconda3/envs/seisbench/lib/python3.9/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
769 return self._value
770 else:
--> 771 raise self._value

TypeError: unsupported operand type(s) for -: 'float' and 'str'

I based my example on seisbench example, which works fine on my machine.
I also compared my "pick_df" and "station_df" with the seisbench example and everything that is supposed to be a string is a string and what's float is a float. Could you please point out where I cause this error to be raised?

Here is how I define my config variable:

config = {}
config["dims"] = ['x(km)', 'y(km)', 'z(km)']
config["use_dbscan"] = True
config["use_amplitude"] = False
config["x(km)"] = (250, 850)
config["y(km)"] = (4500, 5800)
config["z(km)"] = (0, 30)
config["vel"] = {"p": 6.0, "s": 6.0 / 1.75}
config["method"] = "BGMM"
if config["method"] == "BGMM":
config["oversample_factor"] = 4
if config["method"] == "GMM":
config["oversample_factor"] = 1

config["bfgs_bounds"] = (
(config["x(km)"][0] - 1, config["x(km)"][1] + 1), # x
(config["y(km)"][0] - 1, config["y(km)"][1] + 1), # y
(0, config["z(km)"][1] + 1), # x
(None, None), # t
)
config["dbscan_eps"] = 12 # seconds
config["dbscan_min_samples"] = 5

config["min_picks_per_eq"] = 5
config["min_p_picks_per_eq"] = 0
config["min_s_picks_per_eq"] = 0
config["max_sigma11"] = 6.0
config["max_sigma22"] = 1.0
config["max_sigma12"] = 1.0

config["ncpu"]=12

for k, v in config.items():
print(f"{k}: {v}")

Documentation

Hello!

I was wondering if it was posible for you at the moment to share more documentation for GaMMA. Since, as of now, it seems that only examples of usage are provided and i would like to do some further reading.

Juan.

Config dictionary for local, shallow events

Hi,

I am testing the GaMMA associator to associate picks of shallow seismic events, including quarry blasts following the examples provided e.g. in SeisBench. This are the parameters in the config dictionary:

 config = {'use_amplitude': False, 'dims': ['x(km)', 'y(km)', 'z(km)'], 'vel': {'p': 5.6, 's': 3.9},
          'use_dbscan': False, 'dbscan_min_samples': 3, 'dbscan_eps': 30, 'min_picks_per_eq': 3, 
          'oversample_factor': 10, 'x(km)': (1e-3 * ev_east - 111.11, 1e-3 * ev_east + 111.11),
          'y(km)': (1e-3 * ev_nort - 111.11, 1e-3 * ev_nort + 111.11), 'z(km)': (0, 50), 'max_sigma11': 2.0,
          'max_sigma22': 1.0, 'max_sigma12': 1.0, 'method': 'BGMM'}

ev_east and ev_north are the event easting and northing, which I know beforehand.
The parameter selection does not seem to be optimal for the picks I am trying to associate, as many picks which clearly correspond to my event of interest are ignored, as you can see in this plot:

Do you have any suggestions?

Cheers!

Differences when reproducing example

Hello! I tried to reproduce the example using synthetic data in the documentation and the result I obtained was different (and actually incorrect, because it associated picks to additional, non existent events). A copy-paste of the code produced the same incorrect result. Any ideas on what could be causing this difference? Maybe using a different version of any of the dependencies?

True events:
loc=400.000 t0=33.333 mag=2.000
loc=1600.000 t0=93.333 mag=2.352
loc=880.000 t0=153.333 mag=2.169
loc=1360.000 t0=213.333 mag=2.551
loc=1120.000 t0=273.333 mag=2.766
loc=640.000 t0=333.333 mag=3.000
GMMA time = 0.5283012390136719

Associated events:
loc=400.0 t0=33.3 mag=2.0 sigma11=0.495
loc=1517.6 t0=77.4 mag=4.2 sigma11=0.358
loc=1600.0 t0=93.3 mag=2.4 sigma11=0.185
loc=880.0 t0=153.3 mag=2.2 sigma11=0.161
loc=1360.0 t0=213.3 mag=3.1 sigma11=0.088
loc=1120.0 t0=273.3 mag=2.8 sigma11=0.098
loc=498.1 t0=292.0 mag=3.3 sigma11=0.600
loc=877.6 t0=372.9 mag=2.2 sigma11=1.197

Cheers!

Example GaMMA + SeisBench

Hi!
We've built an example using GaMMA together with SeisBench to quickly create a seismicity catalog with data from an FDSN webservice. You can find it here and also run it on Google Colab. I was wondering whether you'd maybe like to add it to the list of examples in this repo and/or your documentation?

use_amplitude parameter guidance

Why i can't turn on use_amplitude parameter with chile region in example notebook?
In the other hand, i have tried using 1 event with 6 stations (3 components), but when I compare it with the original event information, the magnitude of the gamma generated is 8 (always 8 even if the dbscan parameter is changed repeatedly), whereas it should be 3.5. When i turn off turn use_amplitude the gamma standard_catalog return nothing, 0 event

microseismic parameter setting

try to test this method on microseismic event data, but there is no output. Can you guide the parameter setting?

the script example_synthetic.ipynb has fails to execute successfully

When i run the script example_synthetic.ipynb,I found that the script fails to execute successfully on my computer.
I did not modify any parameters in the script, while executing it .

In the first step( Prepare synthetic data), the cell runs successfully. The output parameter phase_type has a shape of (137,), which includes 65 'p' items and 72 's' items.

However, in the second step(Association with GaMMA ), the cell fails to run successfully. The Error occurs at "gmm= BayesianGaussianMixture(...)". and I have verified that the input parameter num_event_init is 41.

my Python version is 3.7.13, my Numpy version is 1.21.5.

error report is as follow

error report

IndexError Traceback (most recent call last)
/tmp/ipykernel_98731/1795965897.py in
33 loss_type="l1",
34 # max_covar=10**2,
---> 35 ).fit(data)
36
37 # gmm = GaussianMixture(n_components=num_event_init,

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/_base.py in fit(self, X, y)
202 self
203 """
--> 204 self.fit_predict(X, y)
205 return self
206

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/_base.py in fit_predict(self, X, y)
247
248 if do_init:
--> 249 self.initialize_parameters(X, random_state)
250
251 lower_bound = (-np.infty if do_init else self.lower_bound)

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/_base.py in _initialize_parameters(self, X, random_state)
166 % self.init_params)
167
--> 168 self._initialize(X, resp)
169
170 @AbstractMethod

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/_bayesian_mixture.py in _initialize(self, X, resp)
541 self._estimate_weights(nk)
542 self._estimate_means(nk, xk)
--> 543 self.estimate_precisions(nk, xk, sk)
544 self.centers = centers
545

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/_bayesian_mixture.py in _estimate_precisions(self, nk, xk, sk)
596 "diag": self._estimate_wishart_diag,
597 "spherical": self.estimate_wishart_spherical
--> 598 }[self.covariance_type](nk, xk, sk)
599
600 self.precisions_cholesky = _compute_precision_cholesky(

~/software/Anaconda3/envs/phasenet/lib/python3.7/site-packages/gamma/bayesian_mixture.py in estimate_wishart_full(self, nk, xk, sk)
624
625 for k in range(self.n_components):
--> 626 diff = xk[k] - self.mean_prior
627 self.covariances[k] = (self.covariance_prior_ + nk[k] * sk[k] +
628 nk[k] * self.mean_precision_prior_ /

IndexError: index 39 is out of bounds for axis 0 with size 39

Additionally, I would like to understand why the parameter num_event_init is defined as min(int(len(data)/min(num_station, 20) * 6), len(data)).
Why is there a comparison with min(num_station, 20)? What is the significance of the value 6? Does it represent six components related to 'p' and 's'?"

Limit on number of picks and GaMMA inside a loop

Hi,

I have been using GaMMA to associate some of the automatic picks I have and it has worked well. Now I want to use it for the thousands of picks that I have from years of data.

Is there a la limit to the number of picks GaMMA can input? I tried putting the GaMMA association inside a loop but after a few times it does not process the data frame completely, so I am thinking of doing big batches of picks.

Any recommendations on how to use GaMMA with a really large number of picks will be really appreciated!

Olivia PS

'float' object has no attribute 'log'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/Users/whs/VS_code/PhasePick/performer/position_analysis.py", line 547, in
gamma_location(filepath, "")
File "/Users/whs/VS_code/PhasePick/performer/position_analysis.py", line 82, in gamma_location
catalogs, assignments = association(pick_df, station_df, config, method=config["method"])
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/utils.py", line 86, in association
events_, assignment_ = associate(
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/utils.py", line 208, in associate
gmm = BayesianGaussianMixture(
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_base.py", line 204, in fit
self.fit_predict(X, y)
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_base.py", line 256, in fit_predict
log_prob_norm, log_resp = self._e_step(X)
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_base.py", line 311, in _e_step
log_prob_norm, log_resp = self._estimate_log_prob_resp(X)
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_base.py", line 516, in _estimate_log_prob_resp
weighted_log_prob = self._estimate_weighted_log_prob(X)
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_base.py", line 469, in _estimate_weighted_log_prob
return self._estimate_log_prob(X) + self._estimate_log_weights()
File "/Users/whs/anaconda3/envs/TF2/lib/python3.9/site-packages/gamma/_bayesian_mixture.py", line 770, in _estimate_log_prob
log_gauss += np.log(self.phase_weight)[:,np.newaxis]
TypeError: loop of ufunc does not support argument 0 of type float which has no callable log method

my pick dataframe is
id timestamp prob type
0 1 2023-07-03 20:52:18.439 0.982461 p
2 2 2023-07-03 20:52:18.439 0.980837 p
6 5 2023-07-03 20:52:18.449 0.976844 p
1 1 2023-07-03 20:52:18.459 0.990819 s
3 2 2023-07-03 20:52:18.459 0.992283 s
4 4 2023-07-03 20:52:18.459 0.964814 p
5 4 2023-07-03 20:52:18.469 0.989519 s
8 6 2023-07-03 20:52:18.469 0.974579 p
7 5 2023-07-03 20:52:18.479 0.991218 s
9 6 2023-07-03 20:52:18.489 0.98942 s

Torch dependency

Hi Weiqiang. Thanks for maintaining this useful code repository. I wanted to give GaMMA a try, but it seems that the dependency list is not up-to-date. seismic_ops includes torch as a dependency, which is missing from requirements.txt. Also, would it be possible to make this an optional dependency? Getting PyTorch to play nice alongside TensorFlow and JAX is practically impossible, especially with GPU support, so GaMMA would not be compatible with most of my environments. Is the Eikonal solver an essential part of the BayesianGaussianMixture routine?

	else:
	labels = np.zeros(len(data))
	unique_labels = [0]
	unique_labels.remove(-1)

ai4eps / gamma Goto Github PK

gamma's Introduction

1. Install

2. Related papers

3. Examples

gamma's People

Contributors

Stargazers

Watchers

Forkers

gamma's Issues

Recommend Projects

Recommend Topics

Recommend Org