aangelopoulos / conformal-prediction

Lightweight, useful implementation of conformal prediction on real data.

Home Page: http://people.eecs.berkeley.edu/~angelopoulos/blog/posts/gentle-intro/

License: MIT License

Python 0.68% Jupyter Notebook 99.18% MATLAB 0.14%
computer-vision conformal conformal-prediction distribution-shift natural-language-processing time-series time-series-prediction uncertainty uncertainty-estimation uncertainty-quantification

conformal-prediction's Introduction

Conformal Prediction

rigorous uncertainty quantification for any machine learning task

This repository is the easiest way to start using conformal prediction (a.k.a. conformal inference) on real data. Each of the notebooks applies conformal prediction to a real prediction problem with a state-of-the-art machine learning model.

No need to download the models or data in order to run conformal prediction

Raw model outputs for several large-scale real-world datasets and a small amount of sample data from each dataset are downloaded automatically by the notebooks. You can develop and test conformal prediction methods entirely in this sandbox, without ever needing to run the original model or download the original data. Open a notebook to see the expected output. You can use these notebooks to experiment with existing methods or as templates to develop your own.

Example notebooks

Notebooks can be run immediately using the provided Google Colab links

Colab links are in the top cell of each notebook

To run these notebooks locally, you just need the correct dependencies installed; then press "run all cells"! The notebooks will automatically download all required data and model outputs. You will need 1.5 GB of free space on your computer for the auto-downloaded data. If you want to see how we generated the precomputed model outputs and data subsamples, see the files in generation-scripts; there is one for each dataset. To create a conda environment with the correct dependencies, run conda env create -f environment.yml. If you still get a dependency error, make sure to activate the conformal environment within the Jupyter notebook.

Citation

This repository is meant to accompany our paper, A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification, which contains a detailed explanation of each example along with attributions. If you find this repository useful, in addition to the relevant methods and datasets, please cite:

@article{angelopoulos2021gentle,
  title={A gentle introduction to conformal prediction and distribution-free uncertainty quantification},
  author={Angelopoulos, Anastasios N and Bates, Stephen},
  journal={arXiv preprint arXiv:2107.07511},
  year={2021}
}

Videos

If you're interested in learning about conformal prediction in video form, watch our videos below!

A Tutorial on Conformal Prediction

A Tutorial on Conformal Prediction Part 2: Conditional Coverage

A Tutorial on Conformal Prediction Part 3: Beyond Conformal Prediction

conformal-prediction's People

Contributors: aangelopoulos, harryzhangog, madhav-kanda, stephenbates19

conformal-prediction's Issues

Score function for APS

Hello,

Thank you for providing these notebooks for conformal prediction, they have been immensely helpful.

Reading through the section 2.1 of the paper on "Classification with Adaptive Prediction Sets" and the associated notebook, I had some questions about the scoring function.

Namely, the paper provides the score function

$$s(x,y) = \sum_{j=1}^k \hat{f}(x)_{\pi_j(x)}$$

where $y = \pi_k(x)$. Why are we including $\hat{f}(x)_{y}$ in the sum? Doing so would lead to some possibly problematic scores. Consider, for example, a perfect predictor that assigns all of its mass to the correct label $y$, and a completely incorrect predictor that assigns all of its mass to some incorrect label $\ne y$. Both of these predictors would have the same score of 1. This breaks the assumption that a higher score corresponds to misalignment between the forecaster and the true label.

Investigating this issue further, I tried modifying the score function to greedily include all classes up to, but not including, the true label. Intuitively, a higher score would then correspond to more probability mass assigned to incorrect labels, which is a better estimate of misalignment. Coding this up in the notebook for APS, this little fix increased the coverage slightly, but more importantly it decreased the mean size of the confidence sets to 3.3 (compared to 187.5 in the original notebook). The confidence sets on the ImageNet examples also seem to make more sense upon preliminary inspection. This could possibly address an issue raised previously.
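For concreteness, here is a minimal sketch of the two score variants being compared (my own code, not the notebook's); the only difference is whether the true label's own softmax mass is added:

import numpy as np

# Sketch of the APS-style score; include_true_label=True matches the paper's
# definition, False is the modification described above.
def aps_score(softmax, y, include_true_label=True):
    order = np.argsort(softmax)[::-1]            # classes ranked by predicted probability
    rank_of_y = int(np.where(order == y)[0][0])  # position of the true label in that ranking
    end = rank_of_y + 1 if include_true_label else rank_of_y
    return softmax[order][:end].sum()

# Degenerate case from above: a perfect and a completely wrong predictor both score 1
perfect, wrong = np.zeros(5), np.zeros(5)
perfect[2], wrong[4] = 1.0, 1.0
print(aps_score(perfect, 2), aps_score(wrong, 2))                # 1.0 1.0
print(aps_score(perfect, 2, False), aps_score(wrong, 2, False))  # 0.0 1.0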

Is there a typo/error in the score function of APS that would explain these results?

Thanks in advance!

Predictive uncertainty in weather-time-series-distribution-shift notebook

First of all, I would like to thank all contributors to this repository. Appreciate the great work that goes behind creating and maintaining this repository.

I was looking through the notebook, weather-time-series-distribution-shift.ipynb, and noticed that in the last 4 lines of the second section, we have:
sort_idx = np.argsort(times)
pred_mean = pred_mean[sort_idx]
temperatures = temperatures[sort_idx]
times = times[sort_idx]

Should the sorting also be done for the uncertainty data, i.e., by adding the line pred_uncertainty=pred_uncertainty[sort_idx]?
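For concreteness, the block with the proposed change would read (a sketch reusing the notebook's variable names):

sort_idx = np.argsort(times)
pred_mean = pred_mean[sort_idx]
pred_uncertainty = pred_uncertainty[sort_idx]  # proposed additional line
temperatures = temperatures[sort_idx]
times = times[sort_idx]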

Over-coverage/Coverage violation in APS-Randomized algorithm

Thank you for providing such a valuable resource. I have a few inquiries regarding the APS-Randomized algorithm.
To begin, I'd like to refer to the upper bound result for CP calibration, as stated in Theorem 2.2 of "Distribution-Free Predictive Inference For Regression":
$\mathbb{P}(Y_{test} \in C(X_{test}, U_{test}, \hat{q})) < 1 - \alpha + \frac{1}{n-1}$

Upon running the APS-Randomized algorithm for 100 trials, I observed a mean coverage of approximately 93%, consistent with the empirical coverage in the provided repo example (0.93020408163265). One possible rationale for this deviation: the "split conformal algorithm" in the referenced paper operates with a deterministic model ($\mathcal{A}$), while in APS-Randomized both the generated scores and the threshold are randomized, which may cause potential challenges.
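For reference, my understanding of the randomized score (in the style of Romano et al.'s randomized APS; a sketch, not necessarily the notebook's exact code) is that only a uniform fraction of the true label's softmax mass is added, which is what makes both the calibration scores and the resulting sets random:

import numpy as np

# Sketch: randomized APS-style score for one example, with u ~ Uniform(0, 1)
# drawn independently per example.
def randomized_aps_score(softmax, y, u):
    order = np.argsort(softmax)[::-1]
    rank_of_y = int(np.where(order == y)[0][0])
    mass_above = softmax[order][:rank_of_y].sum()  # mass of classes ranked above the true label
    return mass_above + u * softmax[y]             # randomize the true label's own contribution

# e.g. scores = [randomized_aps_score(f, y, u)
#                for f, y, u in zip(softmaxes, labels, np.random.rand(len(labels)))]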

Moreover, apart from the favorable over-coverage exhibited by this algorithm, its conditional coverage, quantifiable using metrics like SSCV, surpasses that of the APS algorithm outlined in the RAPS paper (which is RAPS with $\lambda = 0$), while maintaining identical set sizes. I'm interested in understanding the underlying rationale behind this algorithm and would appreciate insights into its origins, particularly if it was derived from a specific academic paper.
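(By SSCV I mean the size-stratified coverage violation from the RAPS paper; a rough sketch of how I compute it, with bin edges of my own choosing, is below.)

import numpy as np

# Sketch of size-stratified coverage violation: the worst deviation of empirical
# coverage from 1 - alpha across bins of prediction-set size.
def sscv(pred_sets, labels, alpha, bins=((0, 1), (2, 3), (4, 10), (11, 100), (101, 1000))):
    sizes = np.array([len(s) for s in pred_sets])
    covered = np.array([y in s for s, y in zip(pred_sets, labels)])
    violations = []
    for lo, hi in bins:
        stratum = (sizes >= lo) & (sizes <= hi)
        if stratum.any():
            violations.append(abs(covered[stratum].mean() - (1 - alpha)))
    return max(violations)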
Thank you for your assistance!
Lahav.

Conformal risk control question

Hi, thank you so much for the great work. I have a question regarding the notebook on conformal risk control.

In the notebook, you defined the risk optimization objective as

def lamhat_threshold(lam): return false_negative_rate(cal_sgmd>=lam, cal_gt_masks) - ((n+1)/n*alpha - 1/(n+1))

However, in section 4.3 of the paper, the threshold is defined as $\alpha - \frac{B-\alpha}{n}$, which means the denominator of the subtracted term should be $n$. Why is it $n+1$ in the code? Thanks.
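For reference, spelling out the algebra behind my question: with $B$ the bound on the loss,

$$\alpha - \frac{B-\alpha}{n} = \frac{(n+1)\alpha - B}{n} = \frac{n+1}{n}\alpha - \frac{B}{n},$$

so with $B = 1$ I would expect the code to subtract $\frac{1}{n}$ rather than $\frac{1}{n+1}$.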

[Question] Why the upper bound of the selective risk is non-monotonic in the tutorial of selective classification?

While trying to understand the method for selective classification, I tried to run your code and plot the curves of the selective risk and its upper bound.
This is the code that I added to plot these curves:

import numpy as np
import matplotlib.pyplot as plt

# assumes selective_risk, selective_risk_ub, lambdas, and alpha from the notebook
selective_risk_values = np.array([selective_risk(lam) for lam in lambdas])
selective_risk_values_ub = np.array([selective_risk_ub(lam) for lam in lambdas])
plt.plot(lambdas, selective_risk_values, label='Selective Risk')
plt.plot(lambdas, selective_risk_values_ub, label='Selective Risk upper bound')
plt.axhline(y=alpha, color='red', linestyle='--', label='alpha')
plt.legend()
plt.show()

I was expecting the upper bound to be monotonic and decreasing, but as you can see in the image below, it is not.

From the paper "Gentle Introduction to Conformal Prediction", I assumed that the upper bound should be monotonic, because it was introduced precisely to overcome the fact that the selective risk itself is non-monotonic (section 5.5, Selective classification).

[screenshot: selective risk and its upper bound plotted against lambda]

When testing it with my own dataset, I get an extreme example of this behaviour.

[screenshot: selective risk curves on my own dataset]

Is this expected behaviour, or is the assumption that the upper bound is monotonic wrong?

Thank you!

Improved baseline computation for conformal prediction under distribution shift

First, many thanks for this awesome repo and the tutorial on split conformal prediction. While going through the section on conformal prediction under distribution shift and the corresponding example on weather prediction with time-series distribution shift, I noticed that the naive implementation for determining $\hat{q}$ uses an expanding window: it takes all scores up to time $t$, computes the quantile, and iterates over $t$

naive_qhats = np.array( [np.quantile(scores[:t], np.ceil((t+1)*(1-alpha))/t, interpolation='higher') for t in range(K+1, scores.shape[0]) ] )

But one can think of another approach: use a rolling window of fixed size $K$ (in the example you were using $K=1000$) and compute the quantile on each window. I rewrote and tested the function to support both options below

import numpy as np

def compute_qhats(scores, alpha, wsize, opt='fixed_window'):
    """Compute conformal quantiles with either a rolling (fixed) or an expanding window."""
    qhats = []
    q_levels = []
    K = wsize
    for t in range(K + 1, scores.shape[0]):
        if opt == 'fixed_window':
            start = t - K - 1          # rolling window of the most recent K + 1 scores
            nsamples = K + 1           # = t - start
        elif opt == 'expanding_window':
            start = 0                  # all scores up to time t (the naive implementation)
            nsamples = t
        q_level = np.ceil((nsamples + 1) * (1 - alpha)) / nsamples
        q_levels.append(q_level)
        qhats.append(np.quantile(scores[start:t], q_level, interpolation='higher'))
    return np.array(q_levels), np.array(qhats)

Plot of $\hat{q}$ for the expanding-window approach (i.e. the naive implementation): [screenshot]

vs. the plot of $\hat{q}$ for the rolling-window approach: [screenshot]

If we compare the results to the weighted conformal prediction approach [screenshot], it is very similar to the rolling window, but with the cost of additional computation for finding the infimum of $q$:
$$\hat{q} = \inf\left\{ q : \sum_{i=1}^{n} \tilde{w}_i \mathbb{1}\{s_i \leq q\} \geq 1 - \alpha \right\}$$

which requires finding the roots of the expression above after moving $1-\alpha$ to the left side of the inequality (I am using the generalized expression, but in practice we use the window-based adaptation from section 5.3).
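To make the comparison concrete, here is a small helper sketching how that infimum can be evaluated, assuming the weights $\tilde{w}_i$ are already normalized (my own sketch, not the notebook's code):

import numpy as np

# Sketch: smallest score q whose weighted empirical CDF reaches 1 - alpha.
def weighted_conformal_quantile(scores, w_tilde, alpha):
    order = np.argsort(scores)
    s_sorted = scores[order]
    cdf = np.cumsum(w_tilde[order])          # weighted empirical CDF at each sorted score
    idx = np.searchsorted(cdf, 1 - alpha)    # first index where the CDF reaches 1 - alpha
    # if the total weight never reaches 1 - alpha, the infimum is infinite; fall back to the max score
    return s_sorted[min(idx, len(s_sorted) - 1)]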

Lastly, here is the comparison of coverage over time for the three approaches: [screenshot]

In fact, when computing the overall coverage, the rolling-window version achieves the best coverage, 0.900665 vs. 0.8995545 for the weighted version.

It might be that for this data the constraint of finding the infimum does not add much. But if the argument for weighted conformal prediction is based on weighting the recent observations in a window, then a sensibly defined window should be sufficient to counter the drift, especially since we are not "learning the weights $w$" but rather fixing them to be uniform across the window in both cases (unless I am missing something here :) ).

On a separate note, one minor issue in the code is the size of the window $K$: as written, it translates to $K+1$ observations being used, and in the case of weighted conformal prediction the first observation is omitted when computing the qhats.

Thank you again for your work and effort to make conformal prediction accessible to the masses.

Size of Prediction Sets using APS Different Than Reported in RAPS Paper

Hello,

Thank you so much for providing the conformal prediction tutorial & corresponding notebooks, they are super helpful!

I had a question regarding the size of the prediction sets returned using the APS method. In the implementation provided in the notebooks, the prediction sets are far larger than reported in your paper that introduced RAPS. The notebook implementation returns sets that are on average >200 labels, whereas the paper reports an average set size of 10.4 on ResNet-152.

I have not done extensive evaluation on RAPS, but it seems the notebook implementation also returns slightly larger sets (set size of ~3).

I was wondering if you have any ideas as to what might be causing this discrepancy, and what the best way to replicate the results in the paper might be.

Also, I wasn't sure which repo this issue should be opened in, so apologies if it doesn't fit here. Thanks in advance!
