
cpde's Introduction

Hi there, I'm Iurii Katser

I am a lead data scientist, researcher, and lecturer. My research interests are anomaly detection, technical diagnostics, time-series analysis, industrial data processing, and predictive analytics.

  • 🔩 I previously worked at various industrial companies; now I am Lead DS at conundrum.ai
  • 🚀 I'm a co-founder of the waico.ru startup
  • 🔭 I finished my PhD thesis (Neural Network Based Algorithms of the Anomaly Detection in the Industrial Data Processing) at Skoltech
  • 🌱 I'm currently learning dev/MLOps (e.g., deployment, CI/CD, monitoring) and process optimization
  • ⚡ Too serious to have a fun fact

Connect with me

tg medium linkedin site kaggle researchgate

cpde's People

Contributors

dependabot[bot], ykatser

cpde's Issues

Mistake: Cost functions - Linear VS RBF

Hi, I am currently working on the paper that you published with this repository.

In the paper, it is said that the cost functions used to build the ensemble methods are:

  • ar(1)
  • mahalanobis
  • l1
  • l2
  • linear

By looking at the code, it seems that the linear cost function has been replaced by the rbf cost function. Indeed, the error method of the CostNew class returns:

return [abs(sub - med).sum(),
        self.signal[start:end].var(axis=0).sum() * (end - start),
        residual.sum(),
        val,
        # val_normal * (end - start),  # normal
        val_rbf,  # RBF
       ]

The last line is the rbf cost function instead of the linear cost function.
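For readers who want to check this, an rbf-style cost can be sketched with plain numpy. This is a simplified kernel segment cost with a Gaussian kernel and a median-heuristic bandwidth (a common default), not the repository's exact CostNew code:

```python
import numpy as np

def fit_gram(signal):
    """Precompute the Gaussian-kernel Gram matrix of the whole signal,
    using a median-heuristic bandwidth (a common default)."""
    s = np.asarray(signal, dtype=float).reshape(len(signal), -1)
    # pairwise squared distances between all samples
    sq = np.sum((s[:, None, :] - s[None, :, :]) ** 2, axis=-1)
    med = np.median(sq[sq > 0])
    return np.exp(-sq / med)

def rbf_cost(gram, start, end):
    """Kernel segment cost: c(y_{start..end}) = n - (1/n) * sum_ij K(y_i, y_j).
    Homogeneous segments get a low cost; mixed segments a high one."""
    n = end - start
    return n - gram[start:end, start:end].sum() / n

# a signal with a mean shift at t = 50
rng = np.random.default_rng(0)
signal = np.r_[rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)]
gram = fit_gram(signal)

whole = rbf_cost(gram, 0, 100)
split = rbf_cost(gram, 0, 50) + rbf_cost(gram, 50, 100)
```

Splitting at the true change point lowers the total cost, which is exactly what the search methods exploit.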

As I was curious to see how the rbf cost function performs on TEP and SKAB, I forked this repository and ran the experiments again. The results can be seen here.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

Implementation details: (Win & Dynp) vs Binseg

Hi, I am currently working on the paper that you published with this repository.

By looking at your code, I realized that you are treating the search methods Win and Dynp differently than Binseg. Indeed, for Win and Dynp you are simply scaling the scores and aggregating them. On the other hand, for Binseg you are first taking the opposite of the scores before scaling and aggregating them, and then you are taking the opposite of the output again. This can be seen here in binsegensembling.py:

gain, bkp = max(
    np.array([(-1) * selected_aggregation(self.ensembling)(np.array(scores) * (-1)),
              np.array(gain_list)[:, 1]]).T,
    key=lambda x: x[0])

This is understandable, since this is a gain and the score for Dynp is a cost; by taking the opposite, you are dealing with the same kind of scores. My problem is that for Win the score is also a gain, but you are not doing the same "trick" as for Binseg. Indeed, here, in windowensembling.py, the opposite of the score is not taken:

self.score = selected_aggregation(self.ensembling)(np.array(score))

I would personally suggest applying the same trick to Win as to Binseg. By not doing the "trick", it is as if other aggregation functions had been used: for example, the min aggregation function effectively acts as the max aggregation function.
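The swap can be demonstrated with a toy example (made-up scores; the scaling step is omitted for clarity): min-aggregating the gains directly gives exactly what the Binseg-style trick produces with max, so omitting the trick turns min into max and vice versa.

```python
import numpy as np

# toy gain scores from three cost functions at five candidate breakpoints
scores = np.array([[3.0, 1.0, 4.0, 1.0, 5.0],
                   [2.0, 7.0, 1.0, 8.0, 2.0],
                   [9.0, 2.0, 6.0, 5.0, 3.0]])

# Win-style: aggregate the gains directly with min
direct_min = np.min(scores, axis=0)

# Binseg-style "trick": negate (gain -> cost), aggregate with max, negate back
trick_max = -np.max(-scores, axis=0)
```

The two results coincide, confirming that skipping the trick silently swaps the aggregation function.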

I was curious to see what we could get with this idea, so I implemented the "trick" for the search method Win as well. I tried the max aggregation function for the three search methods. The results are available in the forked repository.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

Mistake: ar(1) vs ar(5) in Win

Hi, I am currently working on the paper that you published with this repository.

By looking at your Notebooks, I realized that, for the search method Win, the results shown in the paper for ar(1) are actually the results for ar(5). Indeed, in TEP_experiment_with_NAB_metric.ipynb and in SKAB_experiment.ipynb the models that are tried are:

models = (
    {'cost':'ar', 'params':{'order':1}, 'width':10},
    {'cost':'ar', 'params':{'order':1}, 'width':15},
    {'cost':'ar', 'params':{'order':5}, 'width':20},
    {'cost':'mahalanobis', 'params':{}, 'width':10},
    {'cost':'mahalanobis', 'params':{}, 'width':15},
    {'cost':'mahalanobis', 'params':{}, 'width':20},
    {'cost':'l1', 'params':{}, 'width':10},
    {'cost':'l1', 'params':{}, 'width':15},
    {'cost':'l1', 'params':{}, 'width':20},
    {'cost':'l2', 'params':{}, 'width':10},
    {'cost':'l2', 'params':{}, 'width':15},
    {'cost':'l2', 'params':{}, 'width':20},
#     {'cost':'linear', 'params':{}, 'width':10},
#     {'cost':'linear', 'params':{}, 'width':40},
#     {'cost':'linear', 'params':{}, 'width':100}
#     {'model':'rbf', 'params':{}, 'width':40},
#     {'model':'rbf', 'params':{}, 'width':100},
)

Hence, the model ar(1) with width=20 is never tried; the results reported in the paper for that configuration are actually those of ar(5).
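For completeness, the fix amounts to restoring order 1 in the third ar entry (a corrected fragment of the models tuple; the remaining entries are unchanged):

```python
models = (
    {'cost': 'ar', 'params': {'order': 1}, 'width': 10},
    {'cost': 'ar', 'params': {'order': 1}, 'width': 15},
    {'cost': 'ar', 'params': {'order': 1}, 'width': 20},  # was 'order': 5
    # ... the remaining cost functions are unchanged
)
```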

I have made this small change in a forked repository and rerun the experiments.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)

n_bkps = Unknown

Really enjoying this notebook! I am wondering, though: if I don't know the number of breakpoints, the ruptures package allows a penalty parameter to penalize the inclusion of another changepoint. When I try to run the windowEnsemble algorithm, it doesn't allow pen=10.

Any idea what's going on? I can't get my head around it!
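For context, plain ruptures lets predict() stop either with a known n_bkps or with a penalty pen that each extra breakpoint's gain must exceed. A toy sketch of such a penalized stopping rule (hypothetical gains, not the ensemble code, which currently exposes only n_bkps):

```python
def breakpoints_by_penalty(gains, pen):
    """Greedy penalized selection: keep candidate breakpoints,
    best gain first, as long as the gain exceeds the penalty."""
    kept = []
    for bkp, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
        if gain > pen:
            kept.append(bkp)
    return sorted(kept)

# hypothetical per-breakpoint gains from a detection score
gains = {120: 50.0, 340: 22.0, 410: 8.0, 77: 3.0}
```

With pen=10 only the two strong breakpoints survive; with pen=0 every candidate is kept, which is why the penalty replaces a known breakpoint count.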

Mistake: Cost New

Hi, I am currently working on the paper that you published with this repository.

I am trying to add an example of your work to the gallery of examples of ruptures. See the PR here.

I think I spotted a mistake that would change all the results. I would like to know whether what I think is true, @YKatser.
Let's take the case of Window Ensemble. Going through the code, I realized that the object CostNew() is initialized only once before going through the entire dataset. For example, in the notebook TEP_experiment_with_NAB_metric there is:

%%time
cost = CostNew()  # !!! Initialized only once !!!
table1 = []

for n in tnrange(1, NUM_CPDE, desc='agg functions loop'):
    for w in tqdm_notebook([10, 20, 30], desc='width loop', leave=False):
        table1.append(windowEnsemble(cost=cost, data=test, num_agg_func=n, width=w))
        clear_output()

This seems not to be a problem since you call the method cost.fit() for each new signal.
The main problem is coming from the following loop in CostNew().fit():

# Mahalanobis metric if self.metric is None
if self.metric is None:
    covar = np.cov(s_.T)
    self.metric = inv(
        covar.reshape(1, 1) if covar.size == 1 else covar)

This means that for the first signal you go through, self.metric will be None. It will then be set and kept fixed for all the other signals coming afterwards. It is a shame, since you have shown that Mahalanobis is the best performing single cost. This is the reason why the ruptures package added the variable self.has_custom_metric; that fix was made in a PR.
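A stripped-down reproduction of the caching behaviour (a sketch, not the actual CostNew code): the metric is computed lazily on the first fit() and silently reused for every later signal, while re-initializing the cost object per signal gives each signal its own metric.

```python
import numpy as np
from numpy.linalg import inv

class LazyMahalanobisCost:
    """Minimal reproduction: the metric is only computed while it is None,
    i.e. on the very first fit(); later signals silently reuse it."""
    def __init__(self):
        self.metric = None

    def fit(self, signal):
        self.signal = signal
        if self.metric is None:  # the bug: never recomputed afterwards
            covar = np.cov(signal.T)
            self.metric = inv(covar.reshape(1, 1) if covar.size == 1 else covar)
        return self

rng = np.random.default_rng(0)
a = rng.normal(0, 1.0, size=(200, 2))    # unit-variance signal
b = rng.normal(0, 10.0, size=(200, 2))   # very different scale

cost = LazyMahalanobisCost()
stale = cost.fit(a).metric.copy()
cost.fit(b)                               # metric is NOT updated
fresh = LazyMahalanobisCost().fit(b).metric  # per-signal initialization
```

The reused metric stays equal to the first signal's, while a freshly initialized object computes a very different one for the second signal.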

Here is what I got by initializing CostNew() for each signal:
[screenshot: results with CostNew() initialized per signal]

As a reminder, these are the former results that you have in the paper:
[screenshot: former results from the paper]

Here, you can see that the best performing aggregation function and scaling function have changed.

As I was curious to see what the results would be for the other experiments, I ran the experiments again with this fix in a fork.

ps: I had a lot of fun working on your paper. Thanks a lot for your work and your interesting ideas :)
