
bayesian_changepoint_detection's Introduction

Bayesian Changepoint Detection

Methods to get the probability of a changepoint in a time series. Both online and offline methods are available. Read the following papers to really understand the methods:

[1] Paul Fearnhead, Exact and Efficient Bayesian Inference for Multiple
Changepoint Problems, Statistics and Computing 16.2 (2006), pp. 203--213

[2] Ryan P. Adams, David J.C. MacKay, Bayesian Online Changepoint Detection,
arXiv:0710.3742 (2007)

[3] Xiang Xuan, Kevin Murphy, Modeling Changing Dependency Structure in
Multivariate Time Series, ICML (2007), pp. 1055--1062

To see it in action have a look at the example notebook.

To install:

# Enter a directory of your choice, activate your python virtual environment.
git clone https://github.com/hildensia/bayesian_changepoint_detection.git
cd bayesian_changepoint_detection
pip install .
# Now you can use bayesian_changepoint_detection in Python.

Or via pip (this installs an older version of the package that does not work with Python 3):

pip install bayesian-changepoint-detection
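
Once installed, here is a minimal usage sketch for the offline method, pieced together from the call signature that appears in a traceback further down this page; the module layout and argument names may differ between releases, and the final line is an assumption about the notebook's plotting convention rather than documented API:

from functools import partial
import numpy as np
import bayesian_changepoint_detection.offline_changepoint_detection as offcd

data = np.random.randn(200)  # placeholder series; use your own 1-D array

# Q: evidence terms, P: segment log-likelihoods, Pcp: log-probabilities of a
# changepoint before each index (shapes follow the package's conventions).
Q, P, Pcp = offcd.offline_changepoint_detection(
    data,
    partial(offcd.const_prior, l=(len(data) + 1)),
    offcd.gaussian_obs_log_likelihood,
    truncate=-40,
)

# Assumed convention: sum over the number of changepoints to get, for each
# index, the probability that a changepoint occurs there.
changepoint_prob = np.exp(Pcp).sum(0)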

bayesian_changepoint_detection's People

Contributors

closedloop, danich1, hildensia, mathdr, minesh1291, multimeric, nariox, shahsmit14, vladimirfokow


bayesian_changepoint_detection's Issues

How to utilize R matrix to detect change points?

In the current version of the code,
Nw=10; ax.plot(R[Nw,Nw:-1])
is used to exhibit the changepoints. Although it works fine, I am really confused about the reasoning behind it. I tried to plot the run length with maximum probability at each time step, i.e. the y index of the maximum probability in each x column, but the result showed the run length keeps going up... I also went back to Adams's paper but found nothing about changepoint identification (he just stops at the R matrix)... I also tried to find Adams's MATLAB code, but it seems to have been removed...

I am trying to use this method in my work, and I believe it's the best to fully understand it before any deployment. Any help will be appreciated and thanks a lot!
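
For anyone else puzzling over this, here is a hedged sketch of the two readings of R (assuming R is the matrix returned by the online detection, indexed as R[run_length, time]): R[Nw, Nw:-1] is the probability mass on run length Nw at each step, i.e. the probability that a changepoint happened Nw steps earlier, while the arg-max view mentioned above gives the most probable run length, which grows between changepoints and drops back towards zero at a changepoint.

import matplotlib.pyplot as plt

Nw = 10  # report changepoints with a delay of Nw observations

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)

# A peak at time t suggests a changepoint around t - Nw.
ax1.plot(R[Nw, Nw:-1])
ax1.set_ylabel("P(run length = %d)" % Nw)

# Most probable run length per time step; sharp drops indicate changepoints.
ax2.plot(R[:, :-1].argmax(axis=0))
ax2.set_ylabel("MAP run length")
plt.show()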

Other observation models besides Gaussian

Hi. I was wondering if you had any insight into extending your code to include other emission models besides Gaussian. In particular, how about a GMM with a known number of Gaussians?

I was going to take a stab at implementing it and submit a PR, but wanted to get your input first.

Thanks

Dan

Example notebook does not work

If I click on the "example notebook" link (an nbviewer link), I get a "too many redirects" error.

It would be nice if the example notebook were easily accessible in the repo (maybe I overlooked it...), because we don't need a live notebook / nbviewer to figure out whether the example fits our use case.

How to adjust the sensitivity of the BOCD algorithm?

There is always a tradeoff between false alarms and missed alarms: when the algorithm is more sensitive, it has a higher false alarm rate and a lower missed alarm rate. My question is: is it possible to adjust the sensitivity of this algorithm by changing the hyperparameters (e.g., alpha, beta, kappa, mu)? Thank you!
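
Not an authoritative answer, but a sketch of the two knobs one would expect to matter, assuming the usual BOCPD setup: the hazard rate sets how often changepoints are expected a priori, and beta (the prior noise scale of the Student-t model) sets how large a shift must be before it is preferred over noise. The function and the dictionaries below are illustrative, not the package's API:

from functools import partial
import numpy as np

def constant_hazard(lam, r):
    # Constant hazard: prior changepoint probability of 1/lam at every step.
    return np.full(r.shape, 1.0 / lam)

# More sensitive: changepoints expected roughly every 50 steps, small prior
# noise scale -> more (and earlier) detections, but more false alarms.
sensitive = dict(hazard=partial(constant_hazard, 50),
                 alpha=0.1, beta=0.01, kappa=1.0, mu=0.0)

# Less sensitive: changepoints expected roughly every 1000 steps, larger prior
# noise scale -> fewer detections, but more missed alarms.
conservative = dict(hazard=partial(constant_hazard, 1000),
                    alpha=0.1, beta=1.0, kappa=1.0, mu=0.0)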

About the conditions to use bocpd

Hi, nice to meet you. I want to ask a basic question: if I don't know the distribution of the data (i.e. it is not the normal distribution), can I still use BOCPD?
Thank you!

API

The API in its current state is pretty much tailored to my needs. It would be interesting to make it more generally usable. Any ideas, anyone?

'FloatingPointError: underflow encountered in logaddexp' occurs when setting np.seterr(all='raise')

Hi,

I installed bayesian_changepoint_detection from this github repository.

By accidentally setting np.seterr(all='raise'), I was able to trigger the following exception.

I am not sure whether this has any relevance for further processing, but I wanted to draw attention to it for people working on / with this library.

/home/user/venv/env01/bin/python3.6 /home/user/PycharmProjects/project01/snippet.py
Use scipy logsumexp().
Traceback (most recent call last):
  File "/home/user/PycharmProjects/project01/snippet.py", line 68, in <module>
    Q, P, Pcp = offcd.offline_changepoint_detection(data, partial(offcd.const_prior, l=(len(data) + 1)), offcd.gaussian_obs_log_likelihood, truncate=-40)
  File "/home/user/experiments/original-unforked/bayesian_changepoint_detection/bayesian_changepoint_detection/offline_changepoint_detection.py", line 98, in offline_changepoint_detection
    Q[t] = np.logaddexp(P_next_cp, P[t, n-1] + antiG)
FloatingPointError: underflow encountered in logaddexp

Process finished with exit code 1
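
For anyone else hitting this: assuming the underflow itself is benign (a log-sum-exp that underflows just means an extremely small probability), a minimal workaround is to relax the floating-point error state around the call while keeping np.seterr(all='raise') everywhere else:

from functools import partial
import numpy as np
import bayesian_changepoint_detection.offline_changepoint_detection as offcd

with np.errstate(under='ignore'):  # treat underflow as harmless inside the call
    Q, P, Pcp = offcd.offline_changepoint_detection(
        data,
        partial(offcd.const_prior, l=(len(data) + 1)),
        offcd.gaussian_obs_log_likelihood,
        truncate=-40,
    )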

Ascertain direction

Hi

Awesome package, thanks for putting this together. Quick question: once you have the change point, have you also thought about determining the direction of the change from that point?

so far, I have tried:

  1. Comparing to data around previous change points: the issue with this is that the algorithm might not have identified all change points.
  2. Comparing data before/after the change point (see the sketch below): the change point detection could be delayed, so I am not sure how many points to go back. Also, if I check the change point against the next couple of points, it slows me down.

ta!
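
A minimal sketch of option 2 from the list above, under the assumption that waiting for a short window of points after the detected changepoint is acceptable; change_direction, cp and window are hypothetical names, not part of the package:

import numpy as np

def change_direction(data, cp, window=10):
    # Compare the mean over `window` points before and after index `cp`.
    before = np.mean(data[max(0, cp - window):cp])
    after = np.mean(data[cp:cp + window])
    diff = after - before
    spread = np.std(data[max(0, cp - window):cp + window]) + 1e-12
    if abs(diff) < 0.1 * spread:  # tolerance is arbitrary; tune for your data
        return "flat"
    return "up" if diff > 0 else "down"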

R matrix in online_changepoint_detection function

Please correct me if I am wrong.

It seems like the R matrix is conflating the joint probability and the conditional probability. The last step in the iteration turns it into a conditional probability. When you then calculate the growth/changepoint probabilities, you are using the conditional probabilities instead of the joint probabilities as depicted in the paper.

Updating parameters for bayesian online change point

I think my question is related to this one, which was closed without an answer:
#19

In your example, you apply the Student t-distribution as the likelihood. I understand the distribution and its parameters, but I have a question about how you set up the prior and update its parameters in the code. The relevant lines are:

df = 2*self.alpha
scale = np.sqrt(self.beta * (self.kappa+1) / (self.alpha * self.kappa))

I don't understand what alpha, beta and kappa correspond to. How did you arrive at this expression? The paper by Adams and MacKay refers to updating sufficient statistics. Is your expression related to that? If so, how can I do the same for another distribution, say a Gaussian? In my comment, I am referring to the following formula in the paper:

[equation image from the paper]
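
Not the author's answer, but for reference: alpha, beta, kappa and mu are the parameters of a Normal-Gamma prior on the Gaussian's mean and precision, and updating them after each observation is exactly the sufficient-statistics update the Adams and MacKay paper refers to; the posterior predictive of that conjugate pair is a Student-t with df = 2*alpha and the scale shown in the code above. A sketch of the standard conjugate update:

import numpy as np

def normal_gamma_update(x, mu, kappa, alpha, beta):
    # Standard conjugate update of a Normal-Gamma prior after observing x.
    mu_new = (kappa * mu + x) / (kappa + 1)
    kappa_new = kappa + 1
    alpha_new = alpha + 0.5
    beta_new = beta + kappa * (x - mu) ** 2 / (2 * (kappa + 1))
    return mu_new, kappa_new, alpha_new, beta_new

def student_t_predictive(mu, kappa, alpha, beta):
    # Posterior predictive parameters (df, loc, scale) implied by the prior.
    df = 2 * alpha
    scale = np.sqrt(beta * (kappa + 1) / (alpha * kappa))
    return df, mu, scale

For a Gaussian likelihood with known variance, the analogous bookkeeping uses a Normal prior on the mean (Normal-Normal conjugacy) instead.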

Zero run length probability error

When we compute p(r_0 | x_{1:t}), you have

R[0, t+1] = np.sum( R[0:t+1, t] * predprobs * H)

but shouldn't it be

R[0, t+1] = np.sum( R[0:t+1, t] * predprobs[0] * H)

as we calculate the predictive probability assuming no data.

Scaling of Data

Hi,
I've noticed that the scaling of the data can have an effect on the result, but I am not sure why it would, and I can't find any reason for it in the code or references. Below are the CP probabilities for the same data with and without a constant factor; they are somewhat different.

Are there some assumptions about the input data I am missing?
Thanks

[plots of changepoint probabilities for the original and the rescaled data]
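
A hedged guess at the cause rather than a confirmed answer: the prior hyperparameters (alpha, beta, kappa, mu) are fixed numbers and therefore encode an absolute scale, so multiplying the data by a constant changes how plausible a given jump looks under the prior. One common workaround is to standardize the data before running the detection:

import numpy as np

# Put the series on a unit scale so fixed hyperparameters match its magnitude.
data_std = (data - np.mean(data)) / np.std(data)
# ...then run the changepoint detection on data_std instead of data.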

Implementing learning bocpd

I was wondering if you had any plans or need for the online extension described in "Adaptive Sequential Bayesian Change Point Detection."

I was going to implement it for a work project, and I would like to keep it in this package as it fits the theme.

Thoughts?

Add pyx file again

It was removed during a PR. Is there a good way to keep the Cython and Python versions in sync? I'm not sure if I prefer one over the other (Python is better for debugging, Cython is faster).

Can you explain how update_theta() in Student_T works?

I read the paper, but I couldn't find the reason why the parameters are updated in that way.
Besides, I am thinking about replacing the Student-t with a gamma distribution, but I don't know how to update its parameters. Can you give me some advice?

Thank you!

Zeros in the log function for online detection

I ran the entire sample code and got an error in the log function for online detection. It says "divide by zero encountered in log". I consistently get this error, both for random time series and for my own data. I am wondering why there are zeros and how to avoid them. Could you please take a look when you get some time? Thank you so much!
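
If the message comes from taking the log of exact zeros in the returned probabilities (for instance when plotting log-probabilities of R), a minimal way to silence it is the context manager below; whether those zeros are expected in the first place is an assumption on my part:

import numpy as np

with np.errstate(divide='ignore'):  # log(0) becomes -inf instead of raising
    log_R = np.log(R)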

Confused about the R matrix interpretation

Hi,

I am confused about the interpretation of the R matrix returned by the online detection algorithm. In the notebook example, the third plot is R[Nw,Nw:-1], which is described as "the probability at each time step for a sequence length of 0, i.e. the probability of the current time step to be a changepoint."
So why do we choose the indices R[Nw,Nw:-1]? Why not R[Nw,:]?

Also, it was mentioned as an example that R[7,3] means the probability at time step 7 of having a sequence of length 3, so does R[Nw,Nw:-1] mean that we are taking all the probabilities at time step Nw?

Do you have any suggestions to help me understand the output R?

Thanks

Calculation of predprobs

Hi. According to the paper, predprobs is defined as the probability of x_t given r_(t-1) and x_t^(r). To my understanding, this means that at each time t we should compute the predictive distribution given different x_t^(r): if r_t = r_(t-1) + 1, and we assume r_t = 4 and t = 10, then we can use x9, x8, x7, x6, x5 to calculate the posterior and thus the predictive distribution. But based on predprobs = observation_likelihood.pdf(x), predprobs[i] is the predictive distribution after observing the first i data points. That doesn't seem consistent with the original definition, yet it works well. I have been struggling with this issue for a while.
Correct me if I misunderstand the concept. Thanks.

just a question

Hello,
do you have any idea about online multivariate change point detection?

Online, but what about streaming?

The algorithm runs online, but with the assumption that we know the length of the dataset a priori. What about streaming scenarios where we have a continuous stream of data? Is there an (efficient) way to run the online algorithm without knowing the length of the dataset?
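
Not part of the package, but a hedged sketch of how the same recursion can be run over an unbounded stream: keep only the current run-length distribution (which grows by one entry per observation) instead of pre-allocating the full R matrix. The likelihood object is assumed to expose pdf(x) and update_theta(x), like the Student-t model discussed elsewhere on this page; in practice you would also truncate run lengths with negligible mass to bound memory.

import numpy as np

def stream_bocpd(observations, likelihood, hazard=1.0 / 250):
    # `observations` can be any iterable or generator: a socket, a queue, a file...
    run_length_probs = np.array([1.0])  # P(run length = 0) = 1 before any data
    for x in observations:
        pred = likelihood.pdf(x)                           # predictive prob. per run length
        growth = run_length_probs * pred * (1 - hazard)    # the current run continues
        cp = np.sum(run_length_probs * pred * hazard)      # the run resets to length 0
        run_length_probs = np.append(cp, growth)
        run_length_probs /= run_length_probs.sum()         # normalise
        likelihood.update_theta(x)                         # update sufficient statistics
        yield run_length_probs                             # current run-length distribution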

Scipy Import Error on newer versions

Hi guys,

there is an import issue when using newer SciPy versions.

It would be a quick fix if you adapt the import statement in offline_changepoint_detection.py:

try:  # SciPy >= 0.19
    from scipy.special import comb, logsumexp
except ImportError:
    from scipy.misc import comb, logsumexp  # noqa

Why does the probability exceed one?

I ran the given online detection example in the notebook, and I assumed the y axis indicates the probability of a changepoint (am I right?). But the y values range from zero to hundreds.
I am not very familiar with the math, so could someone please explain this outcome?

Thanks.

New release on PyPI?

Is it possible to get a new release on PyPI? The offline detection does not work in the release currently on PyPI.

Thank you
