
cpde's Introduction

Hi there, I'm Iurii Katser

I am a lead data scientist, researcher, and lecturer. My research interests are anomaly detection, technical diagnostics, time-series analysis, industrial data processing, and predictive analytics.

  • 🔩 I previously worked at various industrial companies; now I am Lead DS at conundrum.ai
  • 🚀 I'm a co-founder of the waico.ru startup
  • 🔭 I finished my PhD thesis (Neural Network Based Algorithms of the Anomaly Detection in the Industrial Data Processing) at Skoltech
  • 🌱 I'm currently learning dev/MLOps (e.g., deployment, CI/CD, monitoring) and process optimization
  • ⚡ Too serious to have a fun fact

Connect with me

tg medium linkedin site kaggle researchgate

cpde's People

Contributors

dependabot[bot], ykatser

cpde's Issues

Mistake: Cost functions - Linear VS RBF

Hi, I am currently working on the paper that you published with this repository.

In the paper, it is said that the cost functions used to build the ensemble methods are:

  • ar(1)
  • mahalanobis
  • l1
  • l2
  • linear

By looking at the code, it seems that the linear cost function has been replaced by the rbf cost function. Indeed, the error method of the CostNew class returns:

return [abs(sub - med).sum(),
        self.signal[start:end].var(axis=0).sum() * (end - start),
        residual.sum(),
        val,
        # val_normal * (end - start),  # normal
        val_rbf,  # RBF
       ]

The last line is the rbf cost function instead of the linear cost function.
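For readers who want to check this, an rbf-style cost can be sketched with plain numpy. This is a simplified kernel segment cost with a Gaussian kernel and a median-heuristic bandwidth (a common default), not the repository's exact CostNew code:

```python
import numpy as np

def fit_gram(signal):
    """Precompute the Gaussian-kernel Gram matrix of the whole signal,
    using a median-heuristic bandwidth (a common default)."""
    s = np.asarray(signal, dtype=float).reshape(len(signal), -1)
    # pairwise squared distances between all samples
    sq = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    med = np.median(sq[sq > 0])
    return np.exp(-sq / med)

def rbf_cost(gram, start, end):
    """Kernel segment cost: c(y_{start..end}) = n - (1/n) * sum_ij K(y_i, y_j).
    Homogeneous segments get a low cost; mixed segments a high one."""
    n = end - start
    return n - gram[start:end, start:end].sum() / n

# a signal with a mean shift at t = 50
rng = np.random.default_rng(0)
signal = np.r_[rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)]
gram = fit_gram(signal)

whole = rbf_cost(gram, 0, 100)
split = rbf_cost(gram, 0, 50) + rbf_cost(gram, 50, 100)
```

Splitting at the true change point lowers the total cost, which is exactly what the search methods exploit.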

As I was curious to see how the rbf cost function performs on TEP and SKAB, I forked this repository and ran the experiments again. The results can be seen here.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

Implementation details: (Win & Dynp) vs Binseg

Hi, I am currently working on the paper that you published with this repository.

By looking at your code, I realized that you are treating the search methods Win and Dynp differently than Binseg. Indeed, for Win and Dynp you are simply scaling the scores and aggregating them. On the other hand, for Binseg you are first taking the opposite of the scores before scaling and aggregating them, and then you are taking the opposite of the output again. This can be seen here in binsegensembling.py:

gain, bkp = max(
    np.array([(-1) * selected_aggregation(self.ensembling)(np.array(scores) * (-1)),
              np.array(gain_list)[:, 1]]).T,
    key=lambda x: x[0])

This is understandable, since this is a gain and the score for Dynp is a cost; by taking the opposite, you are dealing with the same kind of scores. My problem is that for Win the score is also a gain, but you are not doing the same "trick" as for Binseg. Indeed, here, in windowensembling.py, the opposite of the score is not taken:

self.score = selected_aggregation(self.ensembling)(np.array(score))

I would personally suggest applying the same trick to Win as to Binseg. By not doing the "trick", it is as if other aggregation functions had been used: for example, the min aggregation function effectively acts as the max aggregation function.
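The swap can be demonstrated with a toy example (made-up scores; the scaling step is omitted for clarity): min-aggregating the gains directly gives exactly what the Binseg-style trick produces with max, so omitting the trick turns min into max and vice versa.

```python
import numpy as np

# toy gain scores from three cost functions at five candidate breakpoints
scores = np.array([[3.0, 1.0, 4.0, 1.0, 5.0],
                   [2.0, 7.0, 1.0, 8.0, 2.0],
                   [9.0, 2.0, 6.0, 5.0, 3.0]])

# Win-style: aggregate the gains directly with min
direct_min = np.min(scores, axis=0)

# Binseg-style "trick": negate (gain -> cost), aggregate with max, negate back
trick_max = -np.max(-scores, axis=0)
```

The two results coincide, confirming that skipping the trick silently swaps the aggregation function.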

I was curious to see what we could get with this idea, so I implemented the "trick" for the search method Win as well. I tried the max aggregation function for the three search methods. The results are available in the forked repository.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

Mistake: ar(1) vs ar(5) in Win

Hi, I am currently working on the paper that you published with this repository.

By looking at your Notebooks, I realized that, for the search method Win, the results shown in the paper for ar(1) are actually the results for ar(5). Indeed, in TEP_experiment_with_NAB_metric.ipynb and in SKAB_experiment.ipynb the models that are tried are:

models = (
    {'cost':'ar', 'params':{'order':1}, 'width':10},
    {'cost':'ar', 'params':{'order':1}, 'width':15},
    {'cost':'ar', 'params':{'order':5}, 'width':20},
    {'cost':'mahalanobis', 'params':{}, 'width':10},
    {'cost':'mahalanobis', 'params':{}, 'width':15},
    {'cost':'mahalanobis', 'params':{}, 'width':20},
    {'cost':'l1', 'params':{}, 'width':10},
    {'cost':'l1', 'params':{}, 'width':15},
    {'cost':'l1', 'params':{}, 'width':20},
    {'cost':'l2', 'params':{}, 'width':10},
    {'cost':'l2', 'params':{}, 'width':15},
    {'cost':'l2', 'params':{}, 'width':20},
#     {'cost':'linear', 'params':{}, 'width':10},
#     {'cost':'linear', 'params':{}, 'width':40},
#     {'cost':'linear', 'params':{}, 'width':100}
#     {'model':'rbf', 'params':{}, 'width':40},
#     {'model':'rbf', 'params':{}, 'width':100},
)

Hence, the model ar(1) with width=20 is never tried; the results reported in the paper for that configuration are actually those of ar(5).
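For completeness, the fix amounts to restoring order 1 in the third ar entry (a corrected fragment of the models tuple; the remaining entries are unchanged):

```python
models = (
    {'cost': 'ar', 'params': {'order': 1}, 'width': 10},
    {'cost': 'ar', 'params': {'order': 1}, 'width': 15},
    {'cost': 'ar', 'params': {'order': 1}, 'width': 20},  # was 'order': 5
    # ... the remaining cost functions are unchanged
)
```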

I have made this small change in a forked repository and rerun the experiments.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

n_bkps = Unknown

Really enjoying this notebook! I am wondering, though: if I don't know the number of breakpoints, the ruptures package allows a penalty parameter to penalize the inclusion of another changepoint. When I try to run the windowEnsemble algorithm, it doesn't allow pen=10.

Any idea what's going on? I can't get my head around it!
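For context, plain ruptures lets predict() stop either with a known n_bkps or with a penalty pen that each extra breakpoint's gain must exceed. A toy sketch of such a penalized stopping rule (hypothetical gains, not the ensemble code, which currently exposes only n_bkps):

```python
def breakpoints_by_penalty(gains, pen):
    """Greedy penalized selection: keep candidate breakpoints,
    best gain first, as long as the gain exceeds the penalty."""
    kept = []
    for bkp, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        if gain > pen:
            kept.append(bkp)
    return sorted(kept)

# hypothetical per-breakpoint gains from a detection score
gains = {120: 50.0, 340: 22.0, 410: 8.0, 77: 3.0}
```

With pen=10 only the two strong breakpoints survive; with pen=0 every candidate is kept, which is why the penalty replaces a known breakpoint count.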

Mistake: Cost New

Hi, I am currently working on the paper that you published with this repository.

I am trying to add an example of your work to the gallery of examples of ruptures. See the PR here.

I think I spotted a mistake that would change all the results. I would like to know whether what I think is true, @YKatser.
Let's take the case of Window Ensemble. Going through the code, I realized that the object CostNew() is initialized only once before going through the entire dataset. For example, in the notebook TEP_experiment_with_NAB_metric there is:

%%time
cost = CostNew()  # !!! Initialized only once !!!
table1 = []

for n in tnrange(1, NUM_CPDE, desc='agg functions loop'):
    for w in tqdm_notebook([10, 20, 30], desc='width loop', leave=False):
        table1.append(windowEnsemble(cost=cost, data=test, num_agg_func=n, width=w))
        clear_output()

This seems not to be a problem since you call the method cost.fit() for each new signal.
The main problem is coming from the following loop in CostNew().fit():

# Mahalanobis metric if self.metric is None
if self.metric is None:
    covar = np.cov(s_.T)
    self.metric = inv(
        covar.reshape(1, 1) if covar.size == 1 else covar)

This means that for the first signal you go through, self.metric will be None. It will then be set and kept fixed for all the other signals coming afterwards. It is a shame, since you have shown that Mahalanobis is the best performing single cost. This is the reason why the ruptures package added the variable self.has_custom_metric; that fix was made in a PR.
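A stripped-down reproduction of the caching behaviour (a sketch, not the actual CostNew code): the metric is computed lazily on the first fit() and silently reused for every later signal, while re-initializing the cost object per signal gives each signal its own metric.

```python
import numpy as np
from numpy.linalg import inv

class LazyMahalanobisCost:
    """Minimal reproduction: the metric is only computed while it is None,
    i.e. on the very first fit(); later signals silently reuse it."""
    def __init__(self):
        self.metric = None

    def fit(self, signal):
        self.signal = signal
        if self.metric is None:  # the bug: never recomputed afterwards
            covar = np.cov(signal.T)
            self.metric = inv(covar.reshape(1, 1) if covar.size == 1 else covar)
        return self

rng = np.random.default_rng(0)
a = rng.normal(0, 1.0, size=(200, 2))    # unit-variance signal
b = rng.normal(0, 10.0, size=(200, 2))   # very different scale

cost = LazyMahalanobisCost()
stale = cost.fit(a).metric.copy()
cost.fit(b)                               # metric is NOT updated
fresh = LazyMahalanobisCost().fit(b).metric  # per-signal initialization
```

The reused metric stays equal to the first signal's, while a freshly initialized object computes a very different one for the second signal.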

Here is what I got by initializing CostNew() for each signal:
[screenshot: results with CostNew() initialized per signal]

As a reminder, these are the former results that you have in the paper:
[screenshot: former results from the paper]

Here, you can see that the best performing aggregation function and scaling function have changed.

As I was curious to see what the results would be for the other experiments, I ran the experiments again with this fix in a fork.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)
