This repository contains source code and slides for the talk "Fair machine learning" at Cascadia R Conf in June 2024. The slides for the talk are available here.
To learn more about machine learning with R:
- Machine learning with tidymodels: tmwr.org
- More example notebooks with tidymodels are at tidymodels.org. Two fairness-oriented ones:
  - Are GPT detectors fair?, focused on how different fairness metrics encode different interpretations of fairness across linguistic proficiency.
  - Fair prediction of hospital readmission, focused on training models that are near-fair with respect to a set of fairness metrics across racial groups.
- An overview of our reading group's conclusions and the kinds of fairness-oriented workflows we decided to support.
I mentioned these works in the talk:
- ccao-data/public from folks at the Cook County Assessor's Office—source code for a talk that inspired the property assessment example.
- An Unfair Burden by Jason Grotto in 2017, an article from the Chicago Tribune investigating unfairness in Cook County's property tax system.
- How Lower-Income Americans Get Cheated on Property Taxes from the NYTimes Editorial Board in 2021, which performs a similar analysis across counties in the US and additionally engages with the racial disparities in property taxation.
- Algorithmic Fairness: Choices, Assumptions, and Definitions from Mitchell et al. in 2021. A great reference for the various choices and assumptions that underlie fairness-oriented analysis of ML models.
In this repository:
- index.qmd contains the source code for the slides. The slides use images in the /figures directory.
- /docs is auto-generated from index.qmd. Content in that folder is likely unhelpful for a human reader, and is better viewed at the links above. :)