
equalityml's Introduction

EqualityAI Logo

Continuous Integration License Contributor Covenant

Equality AI EqualityML

Let's end algorithmic bias together!

Equality AI (EAI) is a public-benefit corporation dedicated to providing developers with evidence-based tools to end algorithmic bias. Our tools are built by developers, for developers. We know that developers want their models to be fair, but we also understand that bias is difficult and intimidating to address.

The EAI EqualityML repository provides tools and guidance on how to incorporate fairness metrics and bias mitigation methods into model fitting, so as to safeguard the people on the receiving end of our models from bias.

If you like what we're doing, give us a ⭐ and join our EAI Manifesto!!


We have extended EqualityML to include other aspects of Responsible AI (see the full framework in Figure 1) and collaboration features to create our Beta MLOps Developer Studio. Become a Beta user by visiting our website!

Figure 1: Full Responsible AI Framework.

Introduction

Incorporating bias mitigation methods and fairness metrics into the traditional end-to-end MLOps pipeline is called fairness-based machine learning (ML) or fair machine learning. However, fair ML comes with its own challenges. We assembled a diverse team of statisticians and ML experts to provide evidence-based guidance on fairness metric selection and use, and validated code to properly run bias mitigation methods.

Click to read our findings:

Fairness Metric

  • A statistical measure of the output of a machine learning model, based on a mathematical definition of fairness.

Fairness Metric Guide: We have combined fairness metrics and bias mitigation into a unified syntax.
Statistical Parity | Conditional Statistical Parity | Negative Predictive Parity | Equal Opportunity | Balance for Positive Class | Predictive Parity | Well Calibration | Calibration | Conditional Use Accuracy | Predictive Equality | Balance for Negative Class | Equalized Odds | Overall Balance
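
For intuition, the sketch below shows how one such metric, the statistical parity ratio, is commonly computed: the positive-prediction rate of the unprivileged group divided by that of the privileged group, with 1.0 indicating parity. The column names and helper function are hypothetical; this is an illustration, not the EqualityML implementation.

# Minimal sketch (not the EqualityML implementation) of the statistical parity
# ratio. Column names and the privileged class value are hypothetical.
import pandas as pd

def statistical_parity_ratio(df: pd.DataFrame,
                             prediction_col: str = "y_pred",
                             protected_col: str = "sex",
                             privileged_class=1) -> float:
    privileged = df[df[protected_col] == privileged_class]
    unprivileged = df[df[protected_col] != privileged_class]
    # Positive-prediction rate per group; a ratio of 1.0 indicates parity
    return unprivileged[prediction_col].mean() / privileged[prediction_col].mean()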

Bias Mitigation

  • Methods or algorithms applied to a machine learning dataset or model to improve the fairness of the model output. Many mitigation methods have been proposed in the literature; they can be broadly classified by where they are applied: to the dataset (pre-processing), during model fitting (in-processing), or to the model predictions (post-processing).

Bias Mitigation Guide:
Resampling | Reweighting | Disparate Impact Remover | Correlation Remover

Figure 2: Bias mitigation can be performed in the pre-processing, in-processing, and post-processing of a model.
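
As a generic illustration of the pre-processing idea behind reweighting (in the spirit of Kamiran and Calders' reweighing), the sketch below assigns each (protected group, label) combination the weight expected frequency / observed frequency, so that the protected attribute and the label become independent under the weighted distribution. This is a simplified sketch with hypothetical column arguments, not the EqualityML implementation.

# Generic reweighing sketch (not the EqualityML implementation): weight each
# (protected group, label) cell by expected / observed frequency.
import pandas as pd

def reweighing_weights(df: pd.DataFrame, protected_col: str, target_col: str) -> pd.Series:
    p_group = df[protected_col].value_counts(normalize=True)
    p_label = df[target_col].value_counts(normalize=True)
    p_joint = df.groupby([protected_col, target_col]).size() / len(df)

    def weight(row):
        a, y = row[protected_col], row[target_col]
        return (p_group[a] * p_label[y]) / p_joint[(a, y)]

    # One weight per row; pass these as sample weights when refitting the model
    return df.apply(weight, axis=1)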

Need a specific metric or method? Just let us know!

Potential Uses

  • Bias mitigation methods are employed to address bias in data and/or machine learning models, and fairness metrics are needed to mathematically represent the fairness or bias levels of an ML model.
Use | Description
As a metric | Quantify a measure of fairness (a.k.a. a fairness metric) targeting a specific source of bias
Evaluate fairness | Fairness metrics can be used to mathematically represent the fairness level of an ML model. They can also be used to monitor a model.
Create parity on fairness | Unlike model performance metrics (e.g., loss, accuracy), fairness metrics affect your final model selection by requiring parity (i.e., equality) on appropriate fairness metrics before model deployment.
Select most fair model | Balance fairness with performance metrics when selecting the final model.
Apply methods to improve the fairness & performance tradeoff | Improve the fairness of a model by applying bias mitigation methods.

Table 1: The potential uses for fairness metrics and bias mitigation methods.

Note: Parity is achieved when a fairness metric (such as the percent of positive predictions) has the same value across all levels of a sensitive attribute. Sensitive attributes are attributes such as race, gender, age, and other patient attributes that are of primary concern when it comes to fairness, and are typically protected by law.

Through these steps we safeguard against bias by:

  1. Creating metrics targeting sources of bias to balance alongside our performance metrics in evaluation, model selection, and monitoring.
  2. Applying bias mitigation methods to improve fairness without compromising performance.

EAI EqualityML Workflow

We have conducted an extensive literature review and theoretical analysis of dozens of fairness metrics and mitigation methods. The theoretical properties of those mitigation methods were analyzed to determine their suitability under various conditions, and the results were used to create our framework for a pre-processing workflow.

Pre-processing Workflow | Tool or Guidance provided
1. Select Fairness Metric | Use our Fairness Metric Selection Questionnaire & Tree to determine appropriate fairness metric(s)
2. Data Preparation
3. Fit Prediction Model
4. Compute Model Results and Evaluate Fairness Metric | Use the EqualityML method fairness_metric to evaluate the fairness of a model
5. Run Bias Mitigation | Use the EqualityML method bias_mitigation to run various bias mitigation methods on your dataset
6. Compute Model Results and Fairness Metric After Mitigation | Refit the model on the data returned by bias_mitigation and re-run fairness_metric
7. Compare Model Results and Fairness Metric Before and After Mitigation | Compare the fairness_metric results obtained before and after bias_mitigation

Table 2: The Equality AI recommended pre-processing workflow with tools and guidance made available per step.

We recommend assessing the fairness of the same ML model after bias mitigation is applied. By comparing the predictions before and after mitigation, you can assess whether, and to what extent, fairness has improved. The trade-off between the accuracy and fairness of the machine learning model can also be examined.

In-processing and Post-processing are still under development. Do you need this now? Let us know!

Guidance on selecting Fairness Metrics

To make fairness metric selection easy, we have provided a few essential questions you must answer to identify the appropriate fairness metric for your use case. Click here for the questionnaire. Complete the questionnaire, then refer to the scoring guide to map your answers to the desired metrics.


Figure 3: Tree representation of questionnaire.

After identifying the important fairness criteria, we recommend trying multiple bias mitigation strategies to optimize the fairness-performance tradeoff (see the comparison sketch after the Quick Tour below).

EqualityML Installation

Python

The EqualityML python package can be installed from PyPI.

pip install equalityml

Manual Installation

Clone the latest version of this repository:

git clone https://github.com/EqualityAI/EqualityML.git

In the root directory of the project run the command:

poetry install

Package Testing

To run the test suite for the EqualityML package, first install the test dependencies, then call pytest:

poetry install --with test
pytest tests

Quick Tour

Check out the example below to see how EqualityML can be used to assess fairness metrics and mitigate unwanted bias in the dataset.

from sklearn.linear_model import LogisticRegression
from equalityml import FAIR
import numpy as np
import pandas as pd

# Sample unfair dataset
random_col = np.random.normal(size=30)
sex_col = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
weight_col = [80, 75, 70, 65, 60, 85, 70, 75, 70, 70, 70, 80, 70, 70, 70, 80, 75, 70, 65, 70,
              70, 75, 80, 75, 75, 70, 65, 70, 75, 65]
target_col = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
              0, 1, 0, 1, 1, 0, 0, 1, 1, 0]
training_data = pd.DataFrame({"random": random_col, "sex": sex_col, "weight": weight_col, 
                              "Y": target_col})
    
# Train a machine learning model (for example LogisticRegression)
ml_model = LogisticRegression()
ml_model.fit(training_data.drop(columns="Y"), training_data["Y"])

# Instantiate a FAIR object
fair_obj = FAIR(ml_model=ml_model, 
                training_data=training_data,
                target_variable="Y",
                protected_variable="sex", 
                privileged_class=1)

# Evaluate a fairness metric (for example statistical parity ratio)
metric_name = 'statistical_parity_ratio'
fairness_metric = fair_obj.fairness_metric(metric_name)

# In case the model is unfair in terms of checked fairness metric (value is not close to 1), 
# EqualityML provides a range of methods to try to mitigate bias in Machine Learning models. 
# For example, we can use 'resampling' to perform mitigation on training dataset.

mitigation_method = "resampling"
mitigation_result = fair_obj.bias_mitigation(mitigation_method)

# Now we can re-train the machine learning model on the mitigated data and
# evaluate the fairness metric again
mitigated_data = mitigation_result['training_data']
ml_model.fit(mitigated_data.drop(columns="Y"), mitigated_data["Y"])

fair_obj.update_classifier(ml_model)
new_fairness_metric = fair_obj.fairness_metric(metric_name)

# print the unmitigated fairness metric
print(f"Unmitigated fairness metric = {fairness_metric}")

# print the mitigated fairness metric
print(f"Mitigated fairness metric = {new_fairness_metric}")

# All available fairness metrics and bias mitigation methods can be printed by calling:
fair_obj.print_fairness_metrics()
fair_obj.print_bias_mitigation_methods()
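
Following the recommendation above to try several mitigation strategies, a minimal sketch of comparing methods is shown below. It reuses fair_obj, ml_model, training_data and metric_name from the example, and assumes each listed method returns mitigated training data under the 'training_data' key, as 'resampling' does; call fair_obj.print_bias_mitigation_methods() to see what is actually available.

# Sketch: compare the fairness metric across several bias mitigation methods.
# The method names below are illustrative and assumed to behave like
# 'resampling' in the example above.
from sklearn.base import clone

results = {"no mitigation": fairness_metric}  # unmitigated value from above
for method in ["resampling", "resampling-preferential", "correlation-remover"]:
    mitigated = fair_obj.bias_mitigation(method)["training_data"]

    candidate_model = clone(ml_model)  # fresh, unfitted copy with the same parameters
    candidate_model.fit(mitigated.drop(columns="Y"), mitigated["Y"])

    fair_obj.update_classifier(candidate_model)
    results[method] = fair_obj.fairness_metric(metric_name)

for method, value in results.items():
    print(f"{method}: {value}")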

R

The EqualityML R package can be installed from CRAN:

install.packages("equalityml")

or developer version from GitHub:

devtools::install_github("EqualityAI/equalityml/equalityml-r")

For more details regarding the R package, please check here.

Responsible AI Takes a Community

The connections and trade-offs between fairness, explainability, and privacy require a holistic approach to Responsible AI development in the machine learning community. We are starting with the principle of fairness and working towards a solution that incorporates multiple aspects of Responsible AI for data scientists and healthcare professionals. We have much more in the works, and we want to know—what do you need? Do you have a Responsible AI challenge you need to solve? Drop us a line and let’s see how we can help!

EqualityAI Logo

Contributing to the project

Equality AI uses both GitHub and Slack to manage our open source community. To participate:

  1. Join the Slack community (https://equalityai.com/slack)
    • Introduce yourself in the #Introductions channel. We're all friendly people!
  2. Check out the CONTRIBUTING file to learn how to contribute to our project, report bugs, or make feature requests.
  3. Try out EqualityML
    • Hit the top right "star" button on GitHub to show your love!
    • Follow the recipe above to use the code.
  4. Provide feedback on your experience using the GitHub discussions or the Slack #support channel
    • For any questions or problems, send a message on Slack, or send an email to [email protected].

equalityml's People

Contributors

bjb2088, jamesng-dev, joaogranja, jzdavis66, nyujwc331, proinsights


equalityml's Issues

Support for phenotype assessment tools?

Hi! We love your work @onefact and are happy to help if we can.

Work I helped develop during my postdoc is here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10148336/

We have assessed several large language models for compliance with the Affordable Care Act non-discrimination clause (https://www.hhs.gov/about/leadership/melanie-fontes-rainer.html).

Specifically, the demographic parity metric is one I haven't found in your repository, and such an assessment is necessary prior to training machine learning/artificial intelligence algorithms using labels derived from clinical phenotypes. For example, the presence or absence of a disease could be computed as a SQL query executed against a clinical data repository such as the one we work with from the NIH, researchallofus.org (@all-of-us).

Are such algorithmic fairness criteria for clinical phenotype assessment out of scope for @EqualityAI?

Please let us know as we will be releasing open source tools around this over the summer and don't want to duplicate your excellent work here!

Dependency issue on Tkinter while using on databricks jupyter notebooks

Hello,
I'm trying to install equalityml on Databricks, which is an environment that doesn't allow us to install system libraries using something like apt install python-tk.

Here's a screenshot of the error I get:
[Screenshot of the installation error]

I started investigating the issue and found this ticket: Trusted-AI/AIF360#415

From the above issue, I realized that tkinter is no longer needed as a dependency. However, AIF360 hasn't had a new release since September, and so far I've gotten no response when I asked for a newer release.

I've tried installing AIF360 from source, but the latest commit has other, newer dependencies that created issues.

That's why I've created a fork of AIF360 to have a quick fix for this issue until they make a new release including the fix.
https://github.com/lanterno/aif360

So I'm opening this issue to see if anyone else is having a similar problem, and also to propose a solution.

I've tried modifying the EqualityML dependencies to use the fork I mentioned above, but it still didn't work because of complex, interconnected dependencies that caused conflicts.

The solution that I finally managed to get to work came after switching the dependency management to poetry instead of pip.

Poetry has better dependency resolution, and I was lucky that it solved the dependency issue without problems.

I'm not sure if the plan is for EqualityML to switch to poetry, but it does seem like a good alternative, since it gives more power than a classic requirements.txt and setup.py.

I will have a PR ready soon, and we can discuss it.

Correlation remover doesn't show up in plots when using compare_mitigation_methods()

Suggest we add it to the list here.

def map_bias_mitigation(self):
    return {'treatment_equality_ratio': [''],
            'treatment_equality_difference': [''],
            'balance_positive_class': [''],
            'balance_negative_class': [''],
            'equal_opportunity_ratio': [''],
            'accuracy_equality_ratio': [''],
            'predictive_parity_ratio': [''],
            'predictive_equality_ratio': [''],
            'statistical_parity_ratio': ['disparate-impact-remover', 'resampling',
                                         'resampling-preferential', 'reweighing']}

self.threshold may mean self._threshold?

I didn't notice self.threshold being declared. Possibly I missed it. I did see self._threshold. Is it declared someplace I missed? If not, possible typo.

score = binary_threshold_score(self.orig_ml_model,
                               testing_data[self.features],
                               testing_data[self.target_variable],
                               scoring=scoring,
                               threshold=self.threshold,
                               utility_costs=utility_costs)
fairness_metric = self.fairness_metric(self._metric_name)
comparison_df.loc['reference'] = [score, fairness_metric]
# Iterate over mitigation methods list and re-evaluate score and fairness metric
for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)

Independent thresholds for each method when using compare_mitigation_methods

Currently the compare_mitigation_methods() function seems to rely on a pre-defined threshold.

for mitigation_method in mitigation_methods:
    ml_model = self.model_mitigation(mitigation_method=mitigation_method, **kwargs)
    if self.mitigated_testing_data is not None:
        testing_data = self.mitigated_testing_data
    else:
        testing_data = self.testing_data if self.testing_data is not None else self.training_data
    score = binary_threshold_score(ml_model,
                                   testing_data[self.features],
                                   testing_data[self.target_variable],
                                   scoring=scoring,
                                   threshold=self.threshold,
                                   utility_costs=utility_costs)
    fairness_metric = self.fairness_metric(self._metric_name)
    comparison_df.loc[mitigation_method] = [score, fairness_metric]

It is statistically more correct to select a new threshold for each model. This will require taking the decision_maker as input from the user when calling compare_mitigation_methods(). I don't know if this is urgent, as the current approach seems to give a good approximate result.

Also, recall that the threshold function uses a random seed.

Correlation remover uses fit_transform() instead of fit() and transform()

The training and testing sets should use the same correlation remover object. It should be fit() on only the training data, and should transform() both the training data and the testing data.

These lines of code show an example:

cr = CorrelationRemover(sensitive_feature_ids=['sex'], alpha=1)
cr.fit(train_data.drop(['two_year_recid'], axis=1))
train2 = cr.transform(train_data.drop(['two_year_recid'], axis=1))
test2 = cr.transform(testing_data.drop(['two_year_recid'], axis=1))

These are the relevant GitHub references for review.

def _cr_removing_data(self,
                      data,
                      alpha=1.0):
    """
    Filters out sensitive correlations in a dataset using 'CorrelationRemover' function from fairlearn package.
    """
    # Getting correlation coefficient for mitigation_method 'correlation_remover'. The input alpha parameter is
    # used to control the level of filtering between the sensitive and non-sensitive features
    # remove the outcome variable and sensitive variable
    data_rm_columns = data.columns.drop([self.protected_variable, self.target_variable])
    cr = CorrelationRemover(sensitive_feature_ids=[self.protected_variable], alpha=alpha)
    data_std = cr.fit_transform(data.drop(columns=[self.target_variable]))
    train_data_cr = pd.DataFrame(data_std, columns=data_rm_columns, index=data.index)
    # Concatenate data after correlation remover
    mitigated_data = pd.concat(
        [pd.DataFrame(data[self.target_variable]),
         pd.DataFrame(data[self.protected_variable]),
         train_data_cr], axis=1)
    # Keep the same columns order
    mitigated_data = mitigated_data[data.columns]
    return mitigated_data

elif mitigation_method == "correlation-remover":
    mitigated_training_data = self._cr_removing_data(self.training_data, alpha)
    mitigated_dataset['training_data'] = mitigated_training_data
    self.mitigated_training_data = mitigated_training_data
    if self.testing_data is not None:
        mitigated_testing_data = self._cr_removing_data(self.testing_data, alpha)
        mitigated_dataset['testing_data'] = mitigated_testing_data
        self.mitigated_testing_data = mitigated_testing_data

Comment:

Disparate impact remover is coded to use only fit_transform(). AIF360 did not provide a transform() function (https://github.com/Trusted-AI/AIF360/blob/master/aif360/algorithms/preprocessing/disparate_impact_remover.py). They say "In order to transform test data in the same manner as training data, the distributions of attributes conditioned on the protected attribute must be the same." We could technically make the same assumption and always use fit_transform() for correlation remover, because it seems to perform well, but I think it is bad practice: in deployment we sometimes make only one prediction at a time, we can't estimate correlation when only one observation is present, and fit_transform() will error out. This is a weakness of the disparate impact remover.

Add random_seed for resampling

Resampling is random so we need to make it reproducible.

I think dalex's resample uses numpy; possibly we can just put np.random.seed(random_seed) in the code someplace.

Let me know if we need to talk about this one.

def _resampling_data(self,
                     data,
                     mitigation_method):
    """
    Resample the input data using 'resample' function from dalex package.
    """
    # Uniform resampling
    idx_resample = 0
    if (mitigation_method == "resampling-uniform") or (mitigation_method == "resampling"):
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='uniform',
                                verbose=False)
    # Preferential resampling
    elif mitigation_method == "resampling-preferential":
        _pred_prob = self._predict_binary_prob(self.orig_ml_model, data)
        idx_resample = resample(data[self.protected_variable],
                                data[self.target_variable],
                                type='preferential', verbose=False,
                                probs=_pred_prob)
    mitigated_data = data.iloc[idx_resample, :]
    return mitigated_data
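
A minimal sketch of the proposed change is below. It assumes (to be verified) that dalex's resample draws its randomness from numpy's global RNG; the random_seed parameter is hypothetical.

# Hypothetical sketch: seed numpy's global RNG before resampling, assuming
# dalex's resample relies on it (an assumption to verify).
import numpy as np

def _resampling_data(self, data, mitigation_method, random_seed=None):
    if random_seed is not None:
        np.random.seed(random_seed)
    # ... existing uniform / preferential resampling logic as above ...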

Inform user of 1/metric calculation

Suggestion for documentation.
We should inform the user here that we return the fairness parity metric or 1/metric as they are equivalent.

Returns the fairness metric score for the input fairness metric name.

Suggested text: "Returns the fairness metric score for the input fairness metric name. Note that in cases where the fairness metric is > 1 we return 1/fairness metric score to allow for easy comparison. "
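
For illustration, the suggested convention amounts to something like the following hypothetical snippet (not the package's code): ratios greater than 1 are inverted so every reported score lies in (0, 1] and "closer to 1 is fairer" reads the same way for all metrics.

# Hypothetical illustration of the documented convention.
def normalized_score(metric_value: float) -> float:
    return 1 / metric_value if metric_value > 1 else metric_value

print(normalized_score(1.25))  # 0.8
print(normalized_score(0.80))  # 0.8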
