
Comments (14)

pat-alt commented on June 5, 2024

There are many other ways to compute prediction intervals, but I think we should limit our focus here to CP. I also don't think I want to change the name of this package: one reason is that it also implements conformal classification, and those predictions are set-valued. That said, PredictionIntervals.jl sounds like a good name for an umbrella package designed to generate prediction intervals (using different approaches beyond CP) for any supervised model.

ryantibs commented on June 5, 2024

Yes, that is correct. Of course if you're seeking guaranteed 90% coverage, so \alpha = 0.1, then you could always use the JK+ with \alpha = 0.05.

However I wouldn't say the interpretation is that the JK-minimax is better. The JK+ at level \alpha often has close to level 1-\alpha coverage in practice, and provably so under stability conditions (as the paper shows). In practice, the JK-minimax is often overly conservative. So practically, I would favor the JK+ in most scenarios.
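
For reference, here is my paraphrase of the two guarantees being discussed (Theorems 1 and 2 of the paper, assuming only exchangeability): the JK+ at level \alpha satisfies P(Y_{n+1} ∈ Ĉ^{JK+}_\alpha(X_{n+1})) >= 1 - 2\alpha, while the JK-minimax satisfies P(Y_{n+1} ∈ Ĉ^{minmax}_\alpha(X_{n+1})) >= 1 - \alpha. That is why running the JK+ at \alpha = 0.05 guarantees at least 90% coverage.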

pat-alt commented on June 5, 2024

Sure thing! I'll probably turn back to this later next week - happy to have a chat then. Obviously, do also feel free to create a PR in the meantime.

azev77 commented on June 5, 2024

Here is some code for the Jackknife+ (and friends) prediction intervals:

using Random, GLMNet, MLJ;    # GLMNet for the penalized fits, MLJ for partition
import Statistics: quantile

n = 1_000; σ = 1.0;
X = [ones(n) sin.(1:n) cos.(n:-1:1) exp.(sin.(1:n))]
θ = [9.0; 7.0; 3.0; 21.0]
Noise = randn(MersenneTwister(49), n) # n rows
cef(x; θ=θ) = x * θ;        # Conditional Expectation Function
skedastic(x; σ=σ) = σ;      # Skedastic Function. Homoskedastic.
y = cef(X; θ=θ) + skedastic(X; σ=σ) .* Noise

train, calibration, test = partition(eachindex(y), 0.4, 0.4)
#train, calibration, test = partition(eachindex(y), 0.4, 0.4, shuffle=true, rng=444)

i_new = test[1] # index of new data point.
y_train = y[train]; X_train = X[train,:];

scfit(x, y) = GLMNet.glmnetcv(x, y, nlambda=1_000, alpha=1.0) # alpha: 0 = ridge, 1 = lasso
scpr(m, x)  = GLMNet.predict(m, x);

α = 0.05 # miscoverage rate α∈[0,1]

# compute in-sample residuals in the training data
m  = scfit(X_train, y_train);
ŷ = scpr(m, X);                # predict on the full dataset
ŷ_new_is = ŷ[i_new];           # full-sample prediction at the new point
res_is = y_train .- ŷ[train]   # in-sample residuals

# compute LOO residuals in the training data
res_LOO, ŷ_new_LOO = Float64[], Float64[];
for tt in train
    keep = train .!= tt           # drop the training row whose original index is tt
    ty = y_train[keep]            # y_train leaving out tt
    tX = X_train[keep, :]         # X_train leaving out tt
    #
    tm = scfit(tX, ty)            # fit on training data minus row tt
    tŷ = scpr(tm, X)              # predict on the full dataset
    #
    push!(res_LOO, y[tt] - tŷ[tt])  # LOO residual at the held-out point
    push!(ŷ_new_LOO, tŷ[i_new])     # LOO prediction at the new point
end

"Naive PI: use in-sample residuals to estimate oos residuals"
"problem: in-sample residuals usually smaller than oos residuals"
err_Naive = quantile(abs.(res_is), 1 - α)
LB_Naive = ŷ_new_is - err_Naive 
UB_Naive = ŷ_new_is + err_Naive

"Jackknife PI: use training LOO residuals to estimate oos residuals"
"problem: sensitive to instability"
err_LOO = quantile(abs.(res_LOO), 1 - α)
LB_J = ŷ_new_is - err_LOO 
UB_J = ŷ_new_is + err_LOO

"Jackknife+ PI: use sample quantile of each ŷ_new_LOO[tt] adjusted by its own res_LOO[tt]"
LB_Jp = quantile(ŷ_new_LOO - abs.(res_LOO),   α)
UB_Jp = quantile(ŷ_new_LOO + abs.(res_LOO), 1-α) 

"Jackknife-minmax PI"
LB_Jmm = minimum(ŷ_new_LOO) - err_LOO
UB_Jmm = maximum(ŷ_new_LOO) + err_LOO
#
#v = ŷ_new_LOO - abs.(res_LOO);
#quantile(v,α) == -quantile(-v,1-α)  # SANITY CHECK 

LB_Naive, LB_J, LB_Jp, LB_Jmm
UB_Naive, UB_J, UB_Jp, UB_Jmm


# TODO: Split Conformal / Full Conformal / K-fold CV+ / K-fold cross-conformal

@ablaom (I had trouble implementing this in MLJ, so I used GLMNet.jl directly)
The idea is pretty simple (as explained in the paper):
[image]

Naive prediction intervals (ignores overfitting):
[image]

Properties:
[image]

Notation (sample quantile):
[image]

Also:
[image]

Some theoretical results:
[image]

@ryantibs this is probably a dumb question (I probably misunderstood the paper), but suppose I want a prediction interval that contains Y_{n+1} 90% of the time (\alpha = 0.1).
Does Theorem 1 imply the 90% Jackknife+ PI will contain it at least 80% of the time?
Does Theorem 2 imply the 90% Jackknife-minmax PI will contain it at least 90% of the time?
Would that then mean Jackknife-minmax PIs are "better" than Jackknife+?

azev77 commented on June 5, 2024

Thanks @ryantibs! I re-read the paper.
If we assume exchangeability, coverage for JK+ is >= 1-2α.
If we assume exchangeability & out-of-sample stability, coverage for JK+ is >= 1-α (plus a term that vanishes as n gets big).

  1. Are there analogous theoretical results that bound the maximum coverage of these prediction intervals? I've seen "valid PIs" defined as follows:
     [image]

  2. IIUC what the paper calls "assumption-free theory" relies on the assumption of exchangeability. If we are predicting time-series data, this can be a very strong assumption...

pat-alt commented on June 5, 2024

Nice one @azev77 👍🏽 should be straightforward to add this here. Happy to do that myself, but perhaps you'd prefer creating a PR so you'll show up as a contributor? (I'd like to turn to #5 first anyway, and I have some work to do this week on some other packages)

Edit: I've invited you as a collaborator and created a separate branch for this.

ryantibs commented on June 5, 2024

As a general comment, I would say that validity just means that the coverage is at least 1-\alpha, i.e., you don't undercover. The upper bound is of course nice, and says that you don't overcover by "too much". Traditional conformal methods have this property.
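
For concreteness, the two-sided statement I have in mind (the overcoverage bound is the standard split conformal one, assuming almost surely distinct scores): P(Y_{n+1} ∈ Ĉ_\alpha(X_{n+1})) >= 1 - \alpha (validity, no undercoverage), and P(Y_{n+1} ∈ Ĉ_\alpha(X_{n+1})) <= 1 - \alpha + 1/(n_cal + 1) (no overcoverage by "too much").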

  1. We don't have an overcoverage bound for JK+.

  2. "Assumption-free" is used to refer to the fact that there are no distributional assumptions on the joint distribution of (X,Y). But yes to your broader point, certainly I agree that i.i.d. (or exchangeable) sampling is key. None of the traditional conformal methods, JK+, or CV+ really apply outside of this setting. For time series you will have to look at extensions or adaptions of these techniques---there are by now several, take a look fro example at our "beyond exchangeability" paper and references therein.

azev77 commented on June 5, 2024

@pat-alt thanks for being so open to collaborating.
I have some other work I have to finish before I look into PRs.
In the meantime, here is a table I made to try to summarize some of the PI methods in @ryantibs's JK+ paper:
(to do: implement the heteroskedasticity-robust (sort of) PIs in the PNAS paper by @kwuthrich et al.)
[image: summary table]
Notation:
[image]

azev77 commented on June 5, 2024

I just tried to understand what the package does to compute Naive PIs for regression (NaiveConformalRegressor).
I'll point out a small warning regarding terminology.

The JK+ paper (linked above) calls methods "naive" when the score is computed using in-sample residuals.
-This is naive because we expect residuals outside the training data to be bigger...
The method this pkg calls naive uses out-of-sample residuals, with outcomes in the calibration data.
-Probably different papers in the literature have different definitions of "naive"...

using MLJ 
import Statistics: quantile
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees

#Make Data 
X, y = randn(1_000, 2), randn(1_000)
train, calibration, test = partition(eachindex(y), 0.4, 0.4)
Xcal = X[calibration,:]; ycal = y[calibration];
Xnew = X[test,:];        ynew = y[test];

#Fit training data 
model = EvoTreeRegressor() 
mach = machine(model, X, y)
fit!(mach, rows=train)

#Get PIs manually w/o a pkg (NaiveConformalRegressor)

## 2: non-conformity scores w/ calibration data ##
ŷcal = MLJ.predict(mach, Xcal)          # or: MLJ.predict(mach, rows=calibration)
scores_cal = @.(abs(ŷcal - ycal))       # absolute calibration residuals
scores_cal = sort(scores_cal, rev=true) # sorted non-conformity scores (descending)

## 3: get PIs ##
α = 0.05; # miscoverage α ∈ [0,1]
n = length(scores_cal)
p̂ = ceil((n+1) * (1-α)) / n       # finite-sample-corrected quantile level
p̂ = clamp(p̂, 0.0, 1.0)
q̂ = quantile(scores_cal, p̂)
ŷnew = MLJ.predict(mach, Xnew)    # or: MLJ.predict(mach, rows=test)
PInew = [(lower = ŷᵢ - q̂, upper = ŷᵢ + q̂) for ŷᵢ in ŷnew]

Some comments (to consider maybe at some point):

  1. it may be useful to allow the user to specify their own score() function as input? (a rough sketch of this idea follows below)
  2. maybe allow alternative cross-validation techniques (like in my JK+ code above)?
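
A hypothetical sketch of point 1, wrapping the manual steps above into a helper that accepts a user-supplied score function (the name split_conformal and its signature are mine, not the package's API; it reuses mach, quantile, and MLJ from the snippet above):

function split_conformal(mach, Xcal, ycal, Xnew; α = 0.05,
                         score = (ŷ, y) -> abs.(ŷ .- y))
    s = score(MLJ.predict(mach, Xcal), ycal)        # non-conformity scores
    n = length(s)
    p̂ = clamp(ceil((n + 1) * (1 - α)) / n, 0.0, 1.0)
    q̂ = quantile(s, p̂)                              # calibrated score quantile
    ŷ = MLJ.predict(mach, Xnew)
    return [(lower = ŷᵢ - q̂, upper = ŷᵢ + q̂) for ŷᵢ in ŷ]
end

#PInew = split_conformal(mach, Xcal, ycal, Xnew; α = 0.05) # same as the manual steps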

PS: before I let "scope creep" get me too carried away, my goals are to implement:
-Jackknife/JK+ methods
-the DCP methods in this PNAS paper
-possibly compare with probabilistic prediction methods such as NGBoost.py
-hopefully the pkg will become easy enough to use that other authors can add their own methods...

pat-alt commented on June 5, 2024

Thanks @azev77

You're right, that was a misnomer. What I had originally called NaiveConformalRegressor is in fact a simple Inductive Conformal Regressor (also referred to as a Split Conformal Regressor, because it relies on splitting the training data into a training and a calibration set). Jackknife and related methods do not use data splitting and therefore fall into the category of what this tutorial calls transductive CP.

I've changed the code base (see #10) to incorporate that broad distinction: models of type InductiveConformalModel <: ConformalModel rely on a separate calibration step involving a calibration set, while models of type TransductiveConformalModel <: ConformalModel only need to be fitted on training data. I've made that distinction explicit for now.

I've also implemented the Jackknife now (see #15). Adding other approaches is very straightforward from here: authors/contributors just need to subtype the relevant ConformalModel (either inductive or transductive) and dispatch the scoring logic. If you have a look at the way I've now implemented the Jackknife [here], you should see what I have in mind in terms of extensibility.
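
A minimal sketch of that pattern, with illustrative names (the abstract types are as described above; the score function and its signature are hypothetical, not necessarily the package's actual interface):

abstract type ConformalModel end
abstract type InductiveConformalModel <: ConformalModel end
abstract type TransductiveConformalModel <: ConformalModel end

# a new transductive method is just a subtype plus its own scoring logic
struct MyJackknifeRegressor <: TransductiveConformalModel
    model  # any supervised MLJ model
end

# dispatch the non-conformity scoring on the new type (illustrative signature)
score(cm::MyJackknifeRegressor, fitresult, X, y) =
    abs.(y .- MLJ.predict(cm.model, fitresult, X))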

I'll close this issue and related branch now, but feel free to continue discussion here.

pat-alt commented on June 5, 2024

Added multiple other approaches in #18 @azev77:

Regression:

  • Inductive
  • Naive Transductive
  • Jackknife
  • Jackknife+
  • Jackknife-minmax
  • CV+
  • CV-minmax

Classification:

  • Inductive (LABEL)
  • Adaptive Inductive

azev77 commented on June 5, 2024

Hi @pat-alt, I was actually hoping to add JK+ etc. myself. I'm traveling right now (in Vienna), but I want to look into the code when I'm back...

pat-alt commented on June 5, 2024

Great - it will be good to have a second pair of eyes glance over it 🙏 safe travels!

azev77 commented on June 5, 2024

I have a weird question about prediction interval terminology.

Are there other approaches to PIs besides conformal & JK+? @ryantibs @pat-alt @ablaom

What about packages such as ngboost.py, which return an entire predicted distribution? If we compute the PIs implied by that predicted distribution, what are those PIs called?
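
(For concreteness, here is the kind of thing I mean, assuming a model that outputs a parametric predictive distribution; the Normal parameters below are made up:)

using Distributions
d = Normal(1.2, 0.5)  # hypothetical predicted distribution for one new point
α = 0.10
PI = (quantile(d, α/2), quantile(d, 1 - α/2))  # implied central 90% PI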

I probably should've asked this question sooner, but is there any advantage to naming this pkg PredictionIntervals.jl and making it flexible enough to incorporate a variety of methods (including methods that haven't been thought of yet...)?
