efs-opensource / calibration-framework Goto Github PK

The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.

Home Page: https://efs-opensource.github.io/calibration-framework/

License: Apache License 2.0

Python 100.00%

calibration-framework's Issues

ReliabilityDiagram fails to import because of tikzplotlib

When trying to plot a ReliabilityDiagram I got this traceback:

File "REDACTED", line 31, in
from netcal.presentation import ReliabilityDiagram
File "REDACTED.venv\Lib\site-packages\netcal\presentation_init_.py", line 25, in
from .ReliabilityDiagram import ReliabilityDiagram
File "REDACTED.venv\Lib\site-packages\netcal\presentation\ReliabilityDiagram.py", line 14, in
import tikzplotlib
File "REDACTED.venv\Lib\site-packages\tikzplotlib_init_.py", line 5, in
from ._save import Flavors, get_tikz_code, save
from . import _axes
File "REDACTED.venv\Lib\site-packages\tikzplotlib_axes.py", line 3, in
from matplotlib.backends.backend_pgf import (
ImportError: cannot import name 'common_texification' from 'matplotlib.backends.backend_pgf' (REDACTED.venv\Lib\site-packages\matplotlib\backends\backend_pgf.py)

This seems to me to be caused by this issue in the tikzplotlib library and a quick fix would be to downgrade matplotlib to before 3.8.

Pyro import fails in 1.2.1 netcal.scaling

In 1.2.1 importing netcal.scaling results in the following error:

Traceback (most recent call last):
  File "/path_to_script/ecal.py", line 3, in <module>
    from netcal.scaling import TemperatureScaling, LogisticCalibration
  File "/path_to_miniconda/lib/python3.8/site-packages/netcal/scaling/__init__.py", line 28, in <module>
    from .AbstractLogisticRegression import AbstractLogisticRegression
  File "/path_to_miniconda/lib/python3.8/site-packages/netcal/scaling/AbstractLogisticRegression.py", line 26, in <module>
    import pyro
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/__init__.py", line 4, in <module>
    import pyro.poutine as poutine
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/poutine/__init__.py", line 4, in <module>
    from .handlers import (
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/poutine/handlers.py", line 60, in <module>
    from .collapse_messenger import CollapseMessenger
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/poutine/collapse_messenger.py", line 7, in <module>
    from pyro.distributions.distribution import COERCIONS
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/distributions/__init__.py", line 4, in <module>
    import pyro.distributions.torch_patch  # noqa F403
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/distributions/torch_patch.py", line 87, in <module>
    @patch_dependency("torch.distributions.constraints._CorrCholesky.check")
  File "/path_to_miniconda/lib/python3.8/site-packages/pyro/distributions/torch_patch.py", line 18, in patch_dependency
    module = getattr(module, part)
AttributeError: module 'torch.distributions.constraints' has no attribute '_CorrCholesky'

This is with pytorch 1.7.1, python 3.8, pyro-ppl 1.7.0. Reproduction is as simple as import netcal.scaling.

Is it accuracy - or is it the relative frequency of positive examples in the bin?

Dear Fabian,

Thank you for the time you put into this repo and for open sourcing your code!

I have never used netcal before, and so I found myself comparing it to other libraries/pieces of code that do similar things. Concerning the visualisation function(s), specifically netcal.presentation.ReliabilityDiagram, I was wondering: is the quantity you plot on the y axis really the accuracy, or is it the relative frequency of positive examples in each bin (as, from my understanding, it should be in calibration curves)?

Checking the code here, in particular this snippet:

for batch_X, batch_matched, batch_hist, batch_median in zip(X, matched, histograms, median_confidence):
            acc_hist, conf_hist, _, num_samples_hist = batch_hist
            empty_bins, = np.nonzero(num_samples_hist == 0)

            # calculate overall mean accuracy and confidence
            mean_acc.append(np.mean(batch_matched))
            mean_conf.append(np.mean(batch_X))

assuming batch_matched stores the ground truth labels for each batch, I am pretty confident that should not be named "accuracy" (still - I confess I have not spent a lot of time trying to understanding perfectly what the various function should return).

I have also tried to compare the results from netcal with scikit-learn calibration_curve function - whose documentation state returns "the proportion of samples whose class is the positive class, in each bin (fraction of positives)", and the results look very similar, if not identical, to what I get with netcal.

It would be amazing if you could clarify this!

Cheers,
Dennis.

Basic binary classification case

Hi, I'm having problems understanding what's the proper use of the library for a very simple binary classifier. I have a 1-D array of binary labels {0, 1} and a 1-D array of model predictions with probability values p in range (0, 1). Those values reflect the probability of a positive class.

Plugging those values into e.g. the reliability diagram, I got the following plot:

Confidence histogram makes sense to me, as most samples are negative and classifier correctly assigns a low probability. But I'm not sure how to interpret the reliability diagram -- what do the dark red bars suggest here? Also, ECE I received is very high (>0.8).

I tried to reverse the probabilities for negative samples, i.e. if a label is 0, then the probability is (1-p). This gives a more justifiable plot:

Could you confirm that for negative samples the probability should reflect probability of a negative class, not the positive class, even in a binary classification case?

Also, it might be worth clarifying that the confidence estimates for some functions (e.g. Platt's / temperature scaling) are supposed to be in the prediction space and not logit space. After reading official papers and implementations it might be confusing because conversion prediction -> logit is done behind the scenes, and information in docs about this would be helpful.

Seeing strange behavior in reliability diagram for histogram binning

I've run HistogramBinning on several datasets and always seem to end up with an empty right-most bin on the post-calibration reliability diagram (see figure below).

Bug in Reliability Diagram

The ReliabilityDiagram creates bins with values which are not in the input.
Code to reproduce it:

import numpy as np
import matplotlib.pyplot as plt
from netcal.presentation import ReliabilityDiagram
# Generate true and predicts values
y_true = np.random.randint(0, 2, 100).astype(np.float32)
# Generate perfect predictions:
y_pred = y_true.copy() 
n_bins = 10
diagram = ReliabilityDiagram(n_bins)
_ = diagram.plot(y_pred, y_true)

ece algorithm has bug using for binary classification

the parameter I pass: X is the proba, y is the label vector.
I found that the result is different from which i get from tfp.stats.expected_calibration_error.
I check both codes, and I found that it is different to caculate the acc for each bin. I think the code in netcal.metrics.MIscalibration.py 386 line may be wrong, the code is

matched = np.array(y)
but even if y true label is zero, it can be matched as long as the predicted label is 0.

In your code, you calculate the acc as the portion of the positive sample of the total samples when the y is one dim.

Error in the code of temperature scaling

Based on the formula for TS ,which is softmax(z/T) where T is the Temperature.But in the repository,the code represents softmax(z*T),where weight T is calculated.Can you please confirm this.

TemperatureScaling

Why is TemperatureScaling applied to confidences instead of logits?

Mean Accuracy Treshold

How is the mean accuracy in the ReliabilityDiagram Calculated? What is the threshold used to select a binary outcome?
Would it be possible to add a parameter to use as a threshold?

Thanks

Missing sdist URL in pypi

Hi,

thank you for the library! I am using it to compute some calibration metrics like ECE and so on, so it came in really handy.

I have one small request: could you provide a link to the sdist in pypi? The reason I need it is a fairly unusual one - I want to use netcal withi a custom pyodide bin in a frontend application and wanted to create a pyodide package with their mkpgk wrapper (see https://pyodide.readthedocs.io/en/latest/new_packages.html). This failed with the error
Exception: No sdist URL found for package netcal (https://pypi.org/project/netcal/).

I can root around it, of course, but it would be an easy fix to make it work :)

Reliability diagram correctness

Hi,when I tried to plot the Reliability diagram for CIFAR 10 resnet110 model,the plot contains the blue region's filled for low level bins also even though there are no probability values present in those bins.Is this anything that is default in the code

Issue with Probabilistic Regression GPBeta

I'm getting the following error when using GPBeta. Please help resolve this issue.

TypeError Traceback (most recent call last)
Cell In[3], line 92
87 varscaling.fit((ensemble_avg_flats_np, ensemble_std_flats_np), label_flats_np)
89 # Ensure the jitter value is set appropriately for gpytorch
90 # gpytorch.settings.cholesky_jitter.value = jitter
---> 92 gpbeta.fit((ensemble_avg_flats_np, ensemble_std_flats_np), label_flats_np)
94 # # Corrected fitting for GPBeta
95 # gpbeta.jitter = jitter # Set the jitter value in the gpnormal instance
96 # with gpytorch.settings.cholesky_jitter(jitter):
97 # gpbeta.fit((ensemble_avg_flats_np, ensemble_std_flats_np), label_flats_np)
98
99 # Save the calibration models
100 varscaling.save_model(os.path.join(base_dir, 'var_scaling.pkl'))

File ~/miniconda3/envs/nnUNet/lib/python3.11/site-packages/netcal/regression/gp/AbstractGP.py:693, in AbstractGP.fit(self, X, y, tensorboard)
690 best_loss, best_parameters = float('inf'), {}
692 # enter optimization loop and iterate over epochs and batches
--> 693 with gpytorch.settings.cholesky_jitter(float=self.jitter, double=self.jitter), tqdm(total=self.n_epochs) as pbar:
694 step = 0
695 for epoch in range(self.n_epochs):

TypeError: _dtype_value_context.init() got an unexpected keyword argument 'float'

Feature request: Allow for pre-existing figure object for rendering reliability diagrams

Feature request: allow the plotting routines for reliability diagram to take pre-existing Matplotlib Figure instances to work with so that the user is able to perform the figure handling.
Add new optional parameter to consume figure object (not axis objects, as most of the diagrams create multiple subplots).

Getting Error While Installing Netcal (Pyro-ppl library Issue)

File "/workspace/system-paper/main_util.py", line 23, in
from netcal.scaling import TemperatureScaling
File "/opt/conda/lib/python3.6/site-packages/netcal/scaling/init.py", line 28, in
from .AbstractLogisticRegression import AbstractLogisticRegression
File "/opt/conda/lib/python3.6/site-packages/netcal/scaling/AbstractLogisticRegression.py", line 26, in
import pyro
File "/opt/conda/lib/python3.6/site-packages/pyro/init.py", line 4, in
import pyro.poutine as poutine
File "/opt/conda/lib/python3.6/site-packages/pyro/poutine/init.py", line 4, in
from .handlers import (
File "/opt/conda/lib/python3.6/site-packages/pyro/poutine/handlers.py", line 60, in
from .collapse_messenger import CollapseMessenger
File "/opt/conda/lib/python3.6/site-packages/pyro/poutine/collapse_messenger.py", line 7, in
from pyro.distributions.distribution import COERCIONS
File "/opt/conda/lib/python3.6/site-packages/pyro/distributions/init.py", line 4, in
import pyro.distributions.torch_patch # noqa F403
File "/opt/conda/lib/python3.6/site-packages/pyro/distributions/torch_patch.py", line 87, in
@patch_dependency("torch.distributions.constraints._CorrCholesky.check")
File "/opt/conda/lib/python3.6/site-packages/pyro/distributions/torch_patch.py", line 18, in patch_dependency
module = getattr(module, part)
AttributeError: module 'torch.distributions.constraints' has no attribute '_CorrCholesky

NaN outputs

Sometimes I get NaN with the transform function.
In these cases, the below warning is observed when calling fit function:
/usr/local/lib/python3.10/dist-packages/netcal/binning/HistogramBinning.py:280: RuntimeWarning: invalid value encountered in divide
calibrated = np.divide(calibrated, normalizer)

how use multi-classes logits in detection calibration?

Since we use inverse_sigmoid to reconstruct the logits, Why not use the multi-class output directly?
So the LogisticCalibration model can learn the correlation between different classes?

Where can I view the paper "Bayesian Confidence Calibration for Epistemic Uncertainty Modelling"?

Where can I view the paper "Bayesian Confidence Calibration for Epistemic Uncertainty Modelling"?
I can't find it anywhere on the internet including on google scholar search. I can only find is the abstract. I would like to have a read for reference to my paper related to model calibration.
If the paper is yet published, can you please generously send me a copy to [email protected]?
Thank you.

Support for vector and matrix scaling methods

Can netcal support vector and matrix scaling methods

Wrong Identification of Multi-Class Classification in Metrics Calculation

Setup:

Using metrics package for classification (e.g., netcal.metrics.ECE)
Binary classification (number of distinct ground-truth labels: 2)
Input array with shape (n, 2) with n samples and confidence scores for the negative/positive classes, respectively

In this scenario, the metric erroneously identifies the input as mulit-class, although the input is binary. This results in an error.

inference with calibrated model

Once i have calibrated my model, how can i use the calibrated model to run inference on an image

ReliabilityDiagram.plot() makes duplicate copy of figure

Code to recreate (I ran it in Google Colab):

!pip install netcal

import numpy as np
from netcal.presentation import ReliabilityDiagram

conf = np.random.rand(1000)
ground = np.random.randint(0, 2, 1000)

diag = ReliabilityDiagram(20)
diag.plot(conf, ground)

Results of !pip show netcal:

Name: netcal
Version: 1.3.5
Summary: The net:cal calibration framework is a Python 3 library for measuring and mitigating miscalibration of uncertainty estimates, e.g., by a neural network.
Home-page: 
Author: Fabian Küppers
Author-email: [[email protected]](mailto:[email protected])
License: Apache-2.0
Location: /usr/local/lib/python3.9/dist-packages
Requires: gpytorch, matplotlib, numpy, pyro-ppl, scikit-learn, scipy, tensorboard, tikzplotlib, torch, torchvision, tqdm
Required-by:

RuntimeError: On detection mode, it is mandatory to provide binary labels y in [0,1].

Hi,
I am passing the below input to the LogisticCalibration, but it is giving the runtime error.

`confidence_scores = np.array([0.70745564, 0.71694]
matched = np.array([1, 1] # as both are boxes are matched with the ground truth's
relative_x_position = np.array([0.7543349742889405, 0.24766819924116135])
input = np.stack((confidences_scores, relative_x_position), axis=1)

lr = LogisticCalibration(detection=True, use_cuda=False) # flag 'detection=True' is mandatory for this method
lr.fit(input, matched)
calibrated = lr.transform(input)
`

Thanks

Incorrect documentation for ECE usage

In the readme and API reference docs:

from netcal.metrics import ECE

n_bins = 10

ece = ECE(n_bins)
uncalibrated_score = ece.measure(confidences)
calibrated_score = ece.measure(calibrated)

This triggers:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-ea3fcf5ef398> in <module>
      4 
      5 ece = ECE(n_bins)
----> 6 uncalibrated_score = ece.measure(confidences)
      7 calibrated_score = ece.measure(calibrated)
      8 print('uncalibrated_score', uncalibrated_score)

TypeError: measure() missing 1 required positional argument: 'y'

The correct documentation is:

from netcal.metrics import ECE

n_bins = 10

ece = ECE(n_bins)
uncalibrated_score = ece.measure(confidences, ground_truth)
calibrated_score = ece.measure(calibrated, ground_truth)

Pickling objects

I wanted to pickle the LogisticCalibration() class after I had fit it (for later re-use), but I was getting an error related to can't pickle _thread.Rlock objects.

I was able to find a work-around by setting logger in the class no None. Seems a bit hacky, but it did work. Might be something to think about in future releases

TemperatureScaling().transform() return input confidences.

I run CIFAR.py example in the classification folder, and print both the inputs and outputs of TemperatureScaling().transform(), the inputs and outputs are always the same. Why the calibration doesn't work?

transform single prediction

Hi,
I want to transform my predicted output one at a time. However, it's throwing error as it squeezes (1,num_classes) shaped output to (num_classes).
Thanks!

TemperatureScaling().transform() binary-case output

TemperatureScaling().transform() returns confidences for the second class in the binary classification case. This behavior seems unintuitive; ideally return shape should be (n, k), but if it has to be (n,) then it should return confidences of the first class rather than the second.

Numpy explicit dtypes

According to https://numpy.org/doc/stable/release/1.20.0-notes.html#deprecations, the "generic" Numpy dtypes np.float and np.int were deprecated and have been removed in the latest Numpy release 1.24.0 according to https://numpy.org/doc/stable/release/1.24.0-notes.html#expired-deprecations .
An update is required to remove the "generic" dtypes completely.

mean_estimate flag in scaling calibration transform method does not work as intended

When the mean_estimate parameter is flagged as True, it returns the error expected np.ndarray (got Tensor). This only happens when method in the constructor is initialized as 'variational' or 'mcmc'. When method is initialized as 'mle', there are no problems yielding numpy array, i.e. inputs to the fit and transform are valid.

Re-using HistogramBinning

Hi, I noticed the following behavior when using HistogramBinning:

from netcal.binning import HistogramBinning
import numpy as np

labels = np.random.randint(2, size=(100,))
preds = np.random.uniform(size=(100,))

estimator = HistogramBinning()
for i in range(2):
    print(f"Loop {i}")
    estimator.fit(preds, labels)

with the code above, first loop will run correctly but second will throw AttributeError: Parameter 'bins' must be int for classification mode. (as this line changes bins from int to array).

This can be fixed by re-initializing HistogramBinning every time in the loop, but this error doesn't show up in other estimators so I thought it would be worth bringing up here. Maybe there's a way to avoid this, and if not I'll keep this issue for others that might encounter this problem.

DType Error for LogisticCalibrationDependent

I recently upgraded from netcal version 1.2.1 to 1.3.1, and now I can no longer fit a LogisticCalibrationDependent instance without the following error occurring: RuntimeError: Found dtype Double but expected Float. My code matches the examples in terms of dtypes for the features (np.float32) and matched vector (np.int32). Exact same code works with version 1.2.1.

The error is being thrown from the following line in AbstractLogisticRegression (line 582 according to pdb):

torch.nn.BCELoss(reduction='mean')(torch.sigmoid(x), y)

I'm using PyTorch version 1.11.0.

Question about input range of multivariate confidence calibration

Hello, I would like to ask a question that arose while doing research using the great platform you provide.
This question is about the function fit() of the class Abstract Calibration implemented in netcal/AbstractCalibration.py.
Looking at line 164, regardless of the task (whether classification or detection), the range of input X is limited to a value between 0 and 1.

If calibration is performed using box parameters together, elements such as width and length will be outside the range. Is there a reason why you implemented it as above?
Also, if I want to use box parameters, could you please recommend how to convert them to that range and calibrate them?

ENIR

The following error sometimes occurs when working with ENIR

ValueError: Array of size zero to minimum decrement operation that has no identity

ECE measrue error -" ValueError: The dimension of bins must be equal to the dimension of the sample x."

Hi,

I'm trying to use the EC.Emeasrue function, in accorgance with the example in the readme, but get the following error:
ValueError: The dimension of bins must be equal to the dimension of the sample x.

I'm running this dummy exampe:

import numpy as np

ground_truth = np.asarray([1, 1, 0]) 
confidences = np.asarray([[0.1, 0.8], [0.3, 0.7,], [0.2, 0.8]]) 

n_bins = 10
ece = ECE(n_bins)
uncalibrated_score = ece.measure(confidences, ground_truth)

The function return value when the confidences are of shape (n_samples, ).

Am I doing something wrong?

EDIT: It's seems that when going to 3 class classification, its workin and 2-classes classification must be formulized a single-logit.

Thanks

How to extract temperature value for future use?

Hello,
I am not able to figure out ,how to extract temperature value of my model so that i can use that in my test data.
thanks

Use of nan parameters in np.nan_to_num requires numpy>=1.17

In Miscalibration.py, L233 you are using the "nan" parameters in np.nan_to_num. This parameters was added in version 1.17 of numpy. You probably need to update the requirements for your package to use numpy 1.17 :)

Temperature scaling for Multi-label classification

If we were to use Temperature scaling for Multi-label classification, do we work under the assumption that every class is independent of each other and perform calibration on each of our classes independently?

LogisticCalibration implementation differences to scikit-learn

Hi @fabiankueppers,

Thanks for creating this great library. It works perfectly for our use case 😊 There's just one thing I'm wondering about:

For LogisticCalibration, the documentation states that it implements Platt scaling. However, I've found that it yields quite different results than when implementing it with the logistic model in sklearn.

So this

from netcal.scaling import LogisticCalibration

LC = LogisticCalibration()
LC.fit(np.array(pred), np.array(labels))
calibrated_prob = LC.transform(np.array(pred))

gives very different results from this:

from sklearn.linear_model import LogisticRegression as LR
lr = LR().fit(np.reshape(pred,(-1,1)), labels)
calibrated_prob = lr.predict_proba(np.reshape(pred,(-1,1)))[:,1]

Are there any intended differences between your implementation and sklearn? Or are we just comparing it wrong?

I've found it challenging to tell by looking at the code alone.

Thanks,
Patrick

How to evaluate using D-ECE

Hi I am trying to evaluate my object detection model
Should I concat predictions for all images together, or how to do it
Is there code with working example?
Thanks

Problems measuring miscalibration

I'm trying to do this, as you pointed out:
uncalibrated_score = ece.measure(confidences)

but I'm getting this error:
TypeError: measure() is missing 1 required positional argument: 'y'

confidences is a NumPy object already:
{ndarray: (512, 8)}

### EDIT

I've added the ground truth as you did in one of your examples.
uncalibrated_score = ece.measure(confidences, ground_truth)

Where ground_truth are the encoded labels. Neither confidences nor ground_truth have NaN values, but I'm getting:
TypeError: nan_to_num() got an unexpected keyword argument 'nan'

### EDIT
YOU NEED NUMPY >= 1.17 FOR THIS TO WORK.

Can this be used in face verifaction and how?

Classification example not running

I installed the calibration framework from scratch as described and was using an conda environment, but when I try to run the classification examples I got an error. It looks like that it is related to the ENIR.

Get path of all Near Isotonic Regression models with mPAVA ...
Traceback (most recent call last):
File "/home/labor/calibration-framework/examples/classification/CIFAR.py", line 169, in
cross_validation(model, use_cuda=use_cuda, domain=domain)
File "/home/labor/calibration-framework/examples/classification/CIFAR.py", line 135, in cross_validation
success = cross_validation_5_2(models=models, datafile=datafile, bins=bins, save_models=save_models, domain=domain)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/Decorator.py", line 90, in new_f
return f(*args, **kwds)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/examples/classification/utils.py", line 236, in cross_validation_5_2
instance.fit(build_set_sm, build_set_gt)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/Decorator.py", line 62, in new_f
return f(*args, **kwds)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/binning/ENIR.py", line 239, in fit
self._multiclass_instances = self._create_one_vs_all_models(X, y, ENIR, self.score_function,
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/AbstractCalibration.py", line 568, in _create_one_vs_all_models
model.fit(onevsall_confidence, onevsall_ground_truth)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/Decorator.py", line 62, in new_f
return f(*args, **kwds)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/binning/ENIR.py", line 270, in fit
self._model_scores, self._binning_models = self._elbow(X, y, model_list, self.score_function, alpha=0.001)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/Decorator.py", line 35, in new_f
return f(*args, **kwds)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/AbstractCalibration.py", line 498, in _elbow
model_scores = self._calc_model_scores(confidences, ground_truth, model_list, score_function)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/Decorator.py", line 35, in new_f
return f(*args, **kwds)
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/netcal/AbstractCalibration.py", line 468, in _calc_model_scores
model_scores = np.exp((np.min(score) - score) / 2.)
File "<array_function internals>", line 200, in amin
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2946, in amin
return _wrapreduction(a, np.minimum, 'min', axis, None, out,
File "/home/labor/miniconda3/envs/cal/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

Used setup:

Ubuntu 20.04

conda environment
conda create --name cal python=3.10
conda activate cal
Repo clone and install
git clone https://github.com/EFS-OpenSource/calibration-framework
cd calibration-framework/
python3 -m pip install .
Execute examples
cd examples/classification/
python CIFAR.py

netcal.binning.BBQ.transform() sometimes returns values that are outside of the [0,1] range

Code to reproduce issue:
`
#insert here any model to calculate the confidence array, I got this error with multiple different models in multiple different datasets for binary classification

            bbq_calibration = BBQ()
            bbq_calibration.fit(y_conf_cal[:,1], y_cal)
            y_conf_bbq = bbq_calibration.transform(y_conf_cal[:,1])

`
sometimes the y_conf_bbq would contain values that go outside 0 and 1, I suspect that it is a floating point error since when I tested to see what numbers it gave outside the [0,1] range I got 1.0000000000000002, but as it was relatively rare I did not try multiple times to see wether different anomalous values are possible.
If indeed it is a floating point error simply clipping the output should be fine to fix this error.

Is classification logit or probability used as input for temperature scaling?

I run the classification examplar code for CIFAR dataset, I find the .npz files used store the classification probability instead of the classification logit. Does it mean there is an discrepancy between the original temperature alrogithm [1] and this reproduced algorithm? Thanks for your explanation.

[1] Chuan Guo, Geoff Pleiss, Yu Sun and Kilian Q. Weinberger: "On Calibration of Modern Neural Networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017. Get source online https://arxiv.org/abs/1706.04599`_

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Hello,

I have this error when trying to run the following:
from netcal.presentation import ReliabilityDiagram

SystemError Traceback (most recent call last)
File ~/miniconda3/envs/drain/lib/python3.11/site-packages/IPython/core/formatters.py:340, in BaseFormatter.call(self, obj)
338 pass
339 else:
--> 340 return printer(obj)
341 # Finally look for special method names
342 method = get_real_method(obj, self.print_method)

File ~/miniconda3/envs/drain/lib/python3.11/site-packages/IPython/core/pylabtools.py:152, in print_figure(fig, fmt, bbox_inches, base64, **kwargs)
149 from matplotlib.backend_bases import FigureCanvasBase
150 FigureCanvasBase(fig)
--> 152 fig.canvas.print_figure(bytes_io, **kw)
153 data = bytes_io.getvalue()
154 if fmt == 'svg':

File ~/miniconda3/envs/drain/lib/python3.11/site-packages/matplotlib/backend_bases.py:2042, in FigureCanvasBase.print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2036 if bbox_inches:
2037 # call adjust_bbox to save only the given area
2038 if bbox_inches == "tight":
2039 # When bbox_inches == "tight", it saves the figure twice.
2040 # The first save command (to a BytesIO) is just to estimate
2041 # the bounding box of the figure.
-> 2042 result = print_method(
2043 io.BytesIO(),
...
521 cbook.open_file_cm(filename_or_obj, "wb") as fh:
--> 522 _png.write_png(renderer._renderer, fh,
523 self.figure.dpi, metadata=metadata)

SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats

Information on relative_x_position variable

Hi and thanks for this great repo,

I'm trying to use the repo to calibrate the confidence scores from a BERT model I fine-tuned. My problem is a binary classification and I want to use Platt Scaling (LogisticCalibration class). I am not sure I understand what the relative_x_position variable refers to? Could you please help me understand this?

Thanks a lot in advance.

LogisticCalibration use _inverse_sigmoid

Hi Fabian Küppers
when use LogisticCalibration() for binary class, why use _inverse_sigmoid(X) rather than X.

version 1.0 netcal    
`        # if binary, use sigmoid instead of softmax
    if self.num_classes <= 2 or self.independent_probabilities:
        logit = self._inverse_sigmoid(X) 
    else:
        logit = self._inverse_softmax(X)

    # otherwise, use SciPy optimzation. Usually, this is much faster
    if self.num_classes > 2:
        # convert ground truth to one hot if not binary
        y = self._get_one_hot_encoded_labels(y, self.num_classes)

    # if temperature scaling, fit single parameter
    if self.temperature_only:
        theta_0 = np.array(1.0)

    # else fit bias and weights for each class (one parameter on binary)
    else:
        if self._is_binary_classification():
            theta_0 = np.array([0.0, 1.0])
        else:
            theta_0 = np.concatenate((np.zeros(self.num_classes), np.ones(self.num_classes)))

    # perform minimization of squared loss - invoke SciPy optimization suite
    result = optimize.minimize(fun=self._loss_function, x0=theta_0,
                               args=(logit, y))`

Thanks

efs-opensource / calibration-framework Goto Github PK

calibration-framework's Issues

Recommend Projects

Recommend Topics

Recommend Org