
mlr3fairness

Machine Learning Fairness Extension for mlr3.


Installation

Install the development version from GitHub:

remotes::install_github("mlr-org/mlr3fairness")
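Since the repository carries a CRAN status badge, the released version can presumably also be installed from CRAN:

install.packages("mlr3fairness")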

Why should I care about fairness in machine learning?

Machine learning model predictions can be skewed by a range of factors and thus might be considered unfair towards certain groups or individuals. An example is the COMPAS algorithm, a popular commercial algorithm used by judges and parole officers to score a criminal defendant's likelihood of reoffending (recidivism). Studies have shown that the algorithm might be biased in favor of white defendants. Biases can occur in a large variety of situations where algorithms automate or support human decision making, e.g. credit checks and automated HR tools, among a variety of other domains.

The goal of mlr3fairness is to allow for auditing mlr3 learners, visualizing fairness problems, and subsequently trying to improve fairness using debiasing strategies.


⚠️ Note: Bias auditing and debiasing based solely on observational data cannot guarantee the fairness of a decision-making system. Several biases, for example those arising from the data, cannot be detected using the approaches implemented in mlr3fairness. The goal of this software is instead to allow for a better understanding of the studied model and to provide first hints at possible fairness problems.


Feature Overview


Protected Attribute

mlr3fairness requires information about the protected attribute with respect to which we want to assess fairness. This can be set via the column role "pta" (protected attribute):

task$col_roles$pta = "variable_name"

If a non-categorical or more complex protected attribute is required, it can be computed manually and added to the task. mlr3fairness does not require a specific type for the pta column, but will compute one metric for every unique value in it.
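A minimal sketch of such a manually computed attribute, assuming the compas task contains the columns sex and race (everything beyond the documented col_roles$pta assignment is illustrative):

library(mlr3)
library(mlr3fairness)

task = tsk("compas")
# Combine two columns into a single intersectional attribute
dat = task$data(cols = c("sex", "race"))
task$cbind(data.table::data.table(sex_race = interaction(dat$sex, dat$race)))
# One metric is computed for every unique value of "sex_race"
task$col_roles$pta = "sex_race"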

Fairness Metrics

mlr3fairness offers a variety of fairness metrics. Metrics are prefixed with fairness. and can be found in the msr() dictionary. Most fairness metrics are based on a difference between two protected groups (e.g. male and female) for a given metric (e.g. the false positive rate: fpr). See the vignette for a more in-depth introduction to fairness metrics and how to choose them.

library(mlr3)
library(mlr3fairness)
key                    description
fairness.acc           Absolute differences in accuracy across groups
fairness.mse           Absolute differences in mean squared error across groups
fairness.fnr           Absolute differences in false negative rates across groups
fairness.fpr           Absolute differences in false positive rates across groups
fairness.tnr           Absolute differences in true negative rates across groups
fairness.tpr           Absolute differences in true positive rates across groups
fairness.npv           Absolute differences in negative predictive values across groups
fairness.ppv           Absolute differences in positive predictive values across groups
fairness.fomr          Absolute differences in false omission rates across groups
fairness.fp            Absolute differences in false positives across groups
fairness.tp            Absolute differences in true positives across groups
fairness.tn            Absolute differences in true negatives across groups
fairness.fn            Absolute differences in false negatives across groups
fairness.cv            Difference in positive class predictions, also known as the Calders-Verwer gap or demographic parity
fairness.eod           Equalized odds: mean of absolute differences between true positive and false positive rates across groups
fairness.pp            Predictive parity: mean of absolute differences between ppv and npv across groups
fairness.acc_eod=.05   Accuracy under an equalized-odds < 0.05 constraint
fairness.acc_ppv=.05   Accuracy under a ppv-difference < 0.05 constraint

Additional custom fairness metrics can easily be constructed; the vignette contains more details. The fairness_tensor() function can be used with a Prediction to print group-wise confusion matrices for each protected-attribute group. Fairness can furthermore be measured within each group separately using MeasureSubgroup and groupwise_metrics(), as sketched below.
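A minimal sketch of both utilities, assuming the compas task ships with a pre-set pta column:

library(mlr3)
library(mlr3fairness)

task = tsk("compas")
prediction = lrn("classif.rpart")$train(task)$predict(task)

# One confusion matrix per protected group
fairness_tensor(prediction, task = task)

# Score a base metric separately within each protected group
prediction$score(groupwise_metrics(msr("classif.acc"), task), task = task)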

Fairness Visualizations

Visualizations can be used with either a Prediction, ResampleResult or a BenchmarkResult. For more information regarding those objects, refer to the mlr3 book.

  • fairness_accuracy_tradeoff: Plot available trade-offs between fairness and model performance.

  • compare_metrics: Compare fairness across models and cross-validation folds.

  • fairness_prediction_density: Density plots of predictions for each protected-attribute group (all three plots are sketched after this list).
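A minimal sketch of the three plots on a small benchmark (learner keys and fold count are arbitrary choices):

library(mlr3)
library(mlr3fairness)

task = tsk("compas")
learners = lrns(c("classif.rpart", "classif.featureless"), predict_type = "prob")
bmr = benchmark(benchmark_grid(task, learners, rsmp("cv", folds = 3)))

# Fairness vs. performance trade-off across folds and learners
fairness_accuracy_tradeoff(bmr, msr("fairness.fpr"))
# Compare accuracy and fairness across the benchmarked models
compare_metrics(bmr, msrs(c("classif.acc", "fairness.fpr")))

# Density of predicted probabilities per protected group, on a single Prediction
prediction = lrn("classif.rpart", predict_type = "prob")$train(task)$predict(task)
fairness_prediction_density(prediction, task)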

Debiasing Methods

Debiasing methods can be used to improve the fairness of a given model. mlr3fairness includes several methods that can be used together with mlr3pipelines to obtain fair(er) models:

library(mlr3pipelines)

# Prepend a reweighing step to a decision tree learner
lrn = as_learner(po("reweighing_wts") %>>% lrn("classif.rpart"))
# Resample on a subset of the compas task and score a fairness metric
rs = resample(lrn, task = tsk("compas")$filter(1:500), rsmp("cv"))
rs$score(msr("fairness.acc"))

Overview:

key              output.num   input.type.train   input.type.predict   output.type.train
EOd              1            TaskClassif        TaskClassif          NULL
reweighing_os    1            TaskClassif        TaskClassif          TaskClassif
reweighing_wts   1            TaskClassif        TaskClassif          TaskClassif

Fair Learners

mlr3fairness furthermore contains several learners that can be used to directly learn fair models:

key                 package   reference
regr.fairfrrm       fairml    Scutari et al., 2021
classif.fairfgrrm   fairml    Scutari et al., 2021
regr.fairzlm        fairml    Zafar et al., 2019
classif.fairzlrm    fairml    Zafar et al., 2019
regr.fairnclm       fairml    Komiyama et al., 2018
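A minimal sketch, assuming the fairml package is installed:

library(mlr3)
library(mlr3fairness)

task = tsk("compas")
# Fair generalized ridge regression model (Scutari et al., 2021)
learner = lrn("classif.fairfgrrm")
learner$train(task)
learner$predict(task)$score(msr("fairness.acc"), task = task)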

Datasets

mlr3fairness includes two fairness datasets: adult and compas. See ?adult and ?compas for additional information regarding columns.

You can load them using tsk("<key>"):
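library(mlr3)
library(mlr3fairness)

tsk("compas")
tsk("adult_train")
tsk("adult_test")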

Model Cards & Datasheets

An important step towards achieving more equitable outcomes from ML models is adequate documentation of datasets and models. mlr3fairness comes with reporting aids for models and datasets: it provides empty templates that can be used to create interactive reports through R Markdown.

Report             Description                Reference               Example
report_modelcard   Model card for ML models   Mitchell et al., 2018   link
report_datasheet   Datasheet for datasets     Gebru et al., 2018      link
report_fairness    Fairness report            - [1]                   link

Usage:

The report_* functions instantiate a new .Rmd template that contains a set of pre-defined questions, which can be used for reporting, as well as initial graphics. The goal is for users to extend this .Rmd file into comprehensive documentation of a dataset or ML model, or to document a model's fairness. The file can later be converted into an HTML report using rmarkdown's render().

rmdfile = report_datasheet()
rmarkdown::render(rmdfile)
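The remaining templates presumably follow the same pattern (a sketch):

rmdfile = report_modelcard()
rmarkdown::render(rmdfile)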

Demo for Adult Dataset

We provide a short example detailing how mlr3fairness integrates with the mlr3 ecosystem.

library(mlr3)
library(mlr3fairness)

# Initialize the fairness measure
fairness_measure = msr("fairness.fpr")
# Initialize the tasks
task_train = tsk("adult_train")
task_test = tsk("adult_test")
# Initialize the model
learner = lrn("classif.rpart", predict_type = "prob")

# Train, predict and verify the fairness metric
learner$train(task_train)
predictions = learner$predict(task_test)
predictions$score(fairness_measure, task = task_test)

# Visualize the predicted probability density for each protected group
fairness_prediction_density(predictions, task_test)

Extensions

  • The mcboost package integrates with mlr3 and offers additional debiasing post-processing functionality for classification, regression and survival.

Other Fairness Toolkits in R

  • The AI Fairness 360 toolkit offers an R extension that allows for bias auditing, visualization and mitigation.
  • fairmodels integrates with the DALEX R package and similarly allows for bias auditing, visualization and mitigation.
  • The fairness package allows for bias auditing in R.
  • The fairml package contains methods for learning de-biased regression and classification models. Learners from fairml are included as learners in mlr3fairness.

Other Fairness Toolkits

  • Aequitas: allows for constructing a fairness report for different fairness metrics along with visualization in Python.
  • fairlearn: allows for model auditing and debiasing as well as visualization in Python.
  • AI Fairness 360: allows for model auditing and debiasing as well as visualization in R and Python.

Future Development

Several future developments are currently planned. Contributions are highly welcome!

  • Visualizations: improvements to visualizations, such as anchor points and others. See issues.
  • Debiasing Methods: More debiasing methods, post-processing and in-processing.

Bugs, Feedback and Questions

mlr3fairness is a free and open-source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an issue on the GitHub page! In case of problems or bugs, it is often helpful to provide a "minimum working example" that showcases the behaviour.

Footnotes

  1. The fairness report is inspired by the Aequitas Bias report.

mlr3fairness's People

Contributors

be-marc, github-actions[bot], mllg, pat-s, pfistfl, sebffischer, superp0tat0


mlr3fairness's Issues

ToDo's: Visualization

Types of visualizations:

Visualize Models

  • Do we want to have this? Let us decide later!

Visualize Fairness Metrics

  • Create "Fairness Prediction Density Plot"
    • Support Prediction Class
  • Create "Fairness Comparison Plot"
    • Support Prediction Class
    • Support BenchmarkResult Class
    • Support ResampleResult Class
  • Create "Fairness vs Accuracy Plot"
    • Support Prediction Class
    • Support BenchmarkResult Class
    • Support ResampleResult Class
  • autoplot(bmr) with fairness metrics -> this does not need any work, but should be in a vignette.
  • Circle the anchor point in accuracy-fairness trade-off plots.
  • Custom plots? Collect ideas!

Vignettes:

  • Write a general vignette on visualizations for fairness. Show each plot on an example.
  • Document how to get a data.frame of fairness metrics that can be used for custom visualizations.
    Example: Creating a fairness vs. performance plot

Roxygenize Citations

  • bibtex citations for EOd
  • bibtex citations for Reweighing
  • bibtex citations for fairness metrics and EOd

Collect list of debiasing methods in the wiki

  • Go over other fairness toolkits
  • Collect names of debiasing methods (or just links to the relevant documentation sections)
  • Select the minimal set that occurs in all of them

DoD: Wiki Entry with Top 5 Debiasing Methods and Links to debiasing method overview for all major packages

ToDo's: Measures

Unit Tests:

  • Write unit tests for datasets (data types, column types, number of rows/cols, pta)
  • Write unit tests for MeasureFairness (data types, parameters, produces an error on wrong inputs).
    This should perhaps error if we try to use it with a task that does not have a pta.
  • Write unit tests for pre-defined measures in zzz.R
  • Write unit tests for "operations" in MeasureFairness

Code:

  • Migrate operations code to a single operations.R.

Implement Measures (auto-loaded in zzz.R):

  • Top 10 Measures from Wiki
  • Equalized odds might not be a 1-line function, so consider implementing it and then just exporting it in zzz.R.
    Extend base_measure to a list of Measures; those will then be added together.
  • Are there measures that we currently cannot compute with our approach?
    We should at least consider one probability-score / calibration-based measure!

Vignettes

  • Write a vignette on measuring fairness of a classifier / regressor.
    • Should explain the idea behind measuring fairness: Why do we do it?
    • Should have a table of pre-defined fairness metrics (in zzz.R) with explanations, perhaps with links to the literature
    • Should contain 1-2 examples (COMPAS, Adult) for measuring fairness (3 different measures in total), using benchmark()
      and showing how to score a Prediction object.
      Predictions: prd$score(msr("fairness.fpr"), task)
      Benchmark: bmr$aggregate(...)
    • Should explain how to build your own measure by using MeasureFairness

Organization:

  • Go through fairness-related issues and migrate them here / check whether they should be closed

compas dataset

Are the preprocessing steps standard? Can we point to a paper where this is done as we do?

adult dataset

Are the preprocessing steps standard? Can we point to a paper where this is done as we do?

fairness toolkits

Toolkits such as:

  • Aequitas
  • IBM AI Fairness 360
  • Microsoft Fairlearn
  • Amazon SageMaker Clarify
  • Fairness.jl

DoD: Collect a list of Fairness Toolkits in the Wiki with links to all of them.

Better groupwise scoring

Should perhaps include functionality to report scores grouped by variables (incl. binning for numerics) together with counts for the groups. Additionally, perhaps provide an intersectional perspective.

Refactor issues

I am not happy with the issues I created, for the following reasons:

  • They are too broad: separate them out into smaller issues that can be solved independently.
  • They are missing a definition of done.

update wiki

Update the wiki:
Go through the timeline and see what is still current and what has changed with your improved understanding.
Make a list of 10 metrics you feel are important. Check whether they can be implemented using the current procedure.

Measure

# user supplied: measure = msr(...)

MeasureFairness = R6Class("MeasureFairness",
  inherit = Measure,
  public = list(
    measure = NULL,
    initialize = function(measure) {
      self$measure = measure
      # here do some other things: set task types,
      # properties, minimize, ...
      self$minimize = measure$minimize
    }
  ),
  private = list(
    .score = function(...) {
      m1 = self$measure$score(group1, ...)
      m2 = self$measure$score(group2, ...)

      if (difference == "quotient") m1 / m2
      else abs(m1 - m2)
    }
  )
)

MeasureFairness$new(msr("classif.acc"))
MeasureFairness$new(msr("regr.rmse"))

RFC's

Add to the attic folder:

  • RFC for metrics
  • RFC for debiasing methods
  • RFC for visualizations

Fix the test errors

There are currently errors in the test scripts for:

  • Visualization
  • Equalized Odds PipeOps
  • Measures
