Thanks for starting a discussion on this. There are a few reasons I avoided returning the logLik class:
- I originally didn't implement the logLik function at all; it was suggested by a reviewer of the JSS paper, so it was added later.
- The changepoints listed are not valid for ANY penalty function, only for the penalty that was used to generate the changepoint set. Thus, simply returning the logLik and allowing users to easily pass it to AIC() and BIC() might encourage them to think that this is a valid thing to do.
- I return a length-two vector because users often want the raw cost function at the optimal value (for comparison with fits from other optimization procedures) alongside the value optimised by the algorithm used (the penalised cost, i.e. -2*log-likelihood + penalty).
- I return -2*log-likelihood because this is the cost that is optimised; the documentation and output are very clear on this. I agree that this isn't standard and, whilst I'm open to changing it, the change would not be backwards compatible, so I'm conscious it may break a lot of existing user code without raising an error.
I'm open to hearing rebuttal to these points.
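For reference, the two entries of that vector can be written out explicitly. This is a sketch under stated assumptions: $\tau$ is the returned changepoint set, $\ell(\tau)$ its maximised log-likelihood, $|\tau|$ the number of changepoints, and the total penalty is taken to be the per-changepoint penalty value $\beta$ (the pen.value slot) times $|\tau|$.

$$
\bigl(\, -2\,\ell(\tau),\;\; -2\,\ell(\tau) + \beta\,|\tau| \,\bigr)
\quad\Longrightarrow\quad
\ell(\tau) = -\tfrac{1}{2} \times (\text{first entry}),
\qquad
\beta\,|\tau| = (\text{second entry}) - (\text{first entry}).
$$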
1. I'm open to changing this functionality, especially for logLik(), as the current behaviour is bad practice.
2. This is a problem because people don't understand that when they run BIC() on the result, the changepoint set isn't the BIC-optimal one. I have found, many times, that the majority of people who use the package are not stats people. They often find code on the internet and replicate it for their own data. It is great that people can use more complex stats methods this way, but the nuance is lost: running lm() and then AIC() is appropriate, whereas running cpt.*() and then BIC() isn't. I enjoy trying to write code that these people can use and easily understand how to use.
3. As I said above, I'm happy to change logLik() and add an objective_fun() or similarly named function to deal with the penalized likelihood calculation.
4. The name is hard because the penalty is either a preset, like BIC or MBIC, or it is "manual", and I don't think "manual" makes sense as a name. Hence I've described it in words via the mathematical expression (-2*Loglike+pen) instead; this works for all penalty values.
As a side note, no one should ever be using AIC() with changepoint models - it asymptotically recovers too many changepoints.
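The optimality point in item 2 can be stated compactly. Under the same assumed notation ($\ell$ the maximised log-likelihood, $|\tau|$ the number of changepoints), with $\beta$ the penalty used in the call and $\beta'$ any other per-changepoint penalty (for instance one derived from BIC), the returned set $\tau_\beta$ minimises only the criterion it was fitted with:

$$
\tau_\beta = \arg\min_{\tau}\,\bigl\{-2\,\ell(\tau) + \beta\,|\tau|\bigr\},
\qquad
-2\,\ell(\tau_\beta) + \beta'\,|\tau_\beta| \;\ge\; \min_{\tau}\,\bigl\{-2\,\ell(\tau) + \beta'\,|\tau|\bigr\}.
$$

Equality is not guaranteed when $\beta' \neq \beta$, so evaluating a second criterion at $\tau_\beta$ gives that criterion's value for this particular segmentation, not its minimum. For comparison, AIC charges a constant 2 per parameter whereas BIC charges $\log n$, which grows with the series length $n$; that lighter AIC penalty is the reason behind the side note about AIC admitting too many changepoints.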
Oh, but also, I just realized my computation is wrong, because fitness() should return the value of the objective function, not just the penalty!
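One way to have fitness() return the objective rather than just the penalty is sketched below. This is a minimal sketch, not the tidychangepoint or changepoint implementation: objective_value() is a hypothetical helper, and it assumes the total penalty is the per-changepoint penalty value times the number of changepoints. It returns a single value named after the penalty type, in line with the naming proposal made in this thread.

```r
# Hypothetical helper (not part of changepoint or tidychangepoint):
# return the penalised objective as a single value named after the penalty type.
# ASSUMPTION: total penalty = pen_value * n_cpts.
objective_value <- function(neg2_loglik, pen_value, n_cpts, pen_type) {
  stopifnot(is.numeric(neg2_loglik), length(neg2_loglik) == 1)
  stats::setNames(neg2_loglik + pen_value * n_cpts, pen_type)
}

# Illustrative inputs: the MBIC value and the 5 changepoints mirror the reprex
# in this thread, but the -2*logLik value of 100 is made up.
obj <- objective_value(100, pen_value = 23.56658, n_cpts = 5, pen_type = "MBIC")
obj
```

With these inputs, obj is a single number, 217.8329, named MBIC.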
Thanks for the quick response!
FWIW, I still think it would be better to:
- change the behavior of logLik() to conform to the standards outlined above. [See also the "Value" section of the documentation for logLik().] I guess I'm not sure what that would mean for backwards compatibility and how big of a deal that would be.
- retain the function likelihood() with the current behavior (although I think objective_fun() would be a more accurate name).
With respect to your reasoning:
- LOL. Was it Reviewer 2??
- I'm new to this, so bear with me, but I think you mean "optimal" instead of "valid". That is, I think you are saying that since the changepoint set returned by the algorithm (let's call it $\tau$) is only optimal under the penalty function specified in the function call, it would be wrong to take the log-likelihood reported by the algorithm, compute the AIC (or BIC, or any other penalty function), and then claim that this value, AIC($\tau$), is the lowest possible AIC. That would be an incorrect interpretation of the AIC value: to justify that claim, you would have to run the algorithm again with that penalty function specified. So that's what you're worried about, and we're in agreement on that. However, I don't understand what would be invalid about simply computing AIC($\tau$). It's just a value, and as long as you understand that it's not necessarily the optimal value, I don't see the danger. In fact, I suspect it might be useful in a comparative analysis of penalty functions, etc.
- Fair enough, but to me this adds grist to my mill, because the two values you're reporting would be more clearly reported by logLik() and AIC() behaving as I've outlined. If these functions worked as described above, it would be more straightforward to do the kind of comparative analysis you're talking about. If you want the penalised value and you're using AIC, just run AIC(x) and you get that value. In the current framework, it's logLik(x)[2], which is not so obvious. Also, if you want the actual log-likelihood, you have to compute -logLik(x)[1]/2, which is also weird.
- I would think about a different function, called objective_fun() (or whatever), that returns this value. I would also change the name of the vector to be the name of the penalty function. So in the above case, instead of -2*Loglike+pen the name would be AIC (or MBIC, or whatever was appropriate).
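The extraction steps described above can be made concrete. A minimal sketch, assuming the returned vector is laid out as c(-2*logLik, -2*logLik + penalty) as described earlier in this thread; unpack_cost() is a hypothetical helper, and the input numbers are illustrative rather than output from a real fit.

```r
# Hypothetical helper: unpack a length-two vector laid out as
# c(-2 * logLik, -2 * logLik + penalty), the layout described in this thread.
unpack_cost <- function(v) {
  stopifnot(is.numeric(v), length(v) == 2)
  c(
    loglik    = -v[[1]] / 2,     # the actual log-likelihood
    penalised = v[[2]],          # the objective value that was minimised
    penalty   = v[[2]] - v[[1]]  # total penalty added to -2 * logLik
  )
}

# Illustrative values only, not output from a real fit
parts <- unpack_cost(c(100, 217.8329))
parts
```

Each quantity gets its own name here, which is the point being argued: with separate logLik() and objective functions, none of this index arithmetic would be needed.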
In general, I think function names should reflect what the function does as much as possible, and methods should work within the guidelines of their corresponding generic. So logLik() should return a log-likelihood value (with the corresponding class, as defined in stats), AIC() should return an AIC value, and objective_fun() should return the value of the objective function. [Generally, each function should probably return only one value.]
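What a stats-conforming return value could look like is sketched below. cost_to_logLik() is a hypothetical converter, not part of changepoint: it wraps a -2*log-likelihood cost in the "logLik" class from stats (a numeric value carrying df and nobs attributes), which is what lets AIC() and BIC() dispatch on it. The df value is an assumption the caller must supply; how many parameters a changepoint model has (changepoint locations plus segment parameters) is itself a modelling choice.

```r
# Hypothetical converter (not part of changepoint): wrap a -2 * log-likelihood
# cost in the standard "logLik" class from stats so AIC() and BIC() work on it.
# ASSUMPTION: `df` (number of estimated parameters) and `nobs` come from the caller.
cost_to_logLik <- function(neg2_loglik, df, nobs) {
  structure(-neg2_loglik / 2, df = df, nobs = nobs, class = "logLik")
}

ll <- cost_to_logLik(100, df = 5, nobs = 50)  # illustrative numbers
print(ll)
AIC(ll)   # -2 * (-50) + 2 * 5 = 110
BIC(ll)   # 100 + 5 * log(50), about 119.56
```

Because the class and attributes follow the stats conventions, the generics do the penalty arithmetic, so a user never has to know which index holds which quantity.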
Thanks for considering this. If you're curious, this is the compatibility layer I've written for changepoint in tidychangepoint. So I already have my workaround. But as per Hadley's advice, I'm trying to push the logLik.cpt() method into changepoint rather than overwriting it (bad manners!) in tidychangepoint! :)
Thanks for indulging me. :)
FYI, I'm now using fitness() as a generic method to return a named vector with the value of the objective function. So it works like this:
```r
library(tidychangepoint)
x <- segment(CET, method = "pelt")
#> method: pelt
fitness(x)
#> MBIC
#> 23.56658
x$segmenter
#> Class 'cpt' : Changepoint Object
#> ~~ : S4 class containing 12 slots with names
#> cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est
#>
#> Created on : Thu Jun 8 14:09:02 2023
#>
#> summary(.) :
#> ----------
#> Created Using changepoint version 2.2.4
#> Changepoint type : Change in mean and variance
#> Method of analysis : PELT
#> Test Statistic : Normal
#> Type of penalty : MBIC with value, 23.56658
#> Minimum Segment Length : 2
#> Maximum no. of cpts : Inf
#> Changepoint Locations : 55 57 309 311 330
```
Created on 2024-04-23 with reprex v2.1.0