Official code for using / reproducing CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge". This code allows one to regularize interpretations (computed via contextual decomposition) to improve neural networks trained in PyTorch.
Note: this repo is actively maintained. For any questions, please file an issue.
- fully-contained data/models/code for reproducing and experimenting with CDEP
- the src folder contains the core code for running and penalizing contextual decomposition
- in addition, we run experiments on 4 datasets, each of which is located in its own folder
- notebooks in these folders show demos for the different experiments
- tested with python 3.6 and pytorch 1.0
ISIC skin-cancer classification - using CDEP, the network learns to ignore spurious patches present in the training set, improving test performance!
ColorMNIST - penalizing the contributions of individual pixels allows us to teach a network to learn a digit's shape instead of its color, improving its test accuracy from 0.5% to 25.1%
Fixing text gender biases - CDEP can help a network avoid learning spurious biases in a dataset, such as reliance on gendered words
Using CDEP requires two steps:
- run CD/ACD on your model. Specifically, 3 things must be altered:
- the pred_ims function must be replaced by a function you write using your own trained model. This function gets predictions from a model given a batch of examples (a sketch is given after this list)
- the model must be replaced with your model
- the current CD implementation doesn't always work for all types of networks. If you are getting an error inside of `cd.py`, you may need to write a custom function that iterates through the layers of your network (see `cd.py` for examples)
- add CD scores to the loss function (see the notebooks; a second sketch after this list shows one possible form)
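
As a rough illustration of the first step, here is a minimal sketch of a pred_ims-style function. The exact signature expected by the CD/ACD code may differ; the `device` argument and the input handling here are illustrative assumptions, and `model` is your own trained classifier:

```python
import numpy as np
import torch
import torch.nn.functional as F

def pred_ims(model, ims, device='cpu'):
    """Return class probabilities for a batch of examples.

    `model` is your own trained pytorch classifier; `ims` is a numpy
    array (or tensor) of shape (batch, ...) matching the model's input.
    """
    model.eval()
    with torch.no_grad():
        batch = torch.as_tensor(np.asarray(ims), dtype=torch.float32, device=device)
        logits = model(batch)              # (batch, num_classes)
        probs = F.softmax(logits, dim=1)   # predictions for each example
    return probs.cpu().numpy()
```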
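
And a sketch of the second step, adding CD scores to the loss. This assumes the `cd` function in `src/cd.py` returns (relevant, irrelevant) score tensors for a mask (`blob`) marking features that should not matter for the prediction; the penalty form and the hyperparameter `lam` below are illustrative, and the notebooks use dataset-specific variants:

```python
import torch
import torch.nn.functional as F
from cd import cd  # core CD implementation from the src folder

def cdep_loss(model, x, y, blob, lam=1.0):
    """Classification loss plus a penalty on the CD score of `blob`."""
    ce = F.cross_entropy(model(x), y)  # standard prediction loss
    # decompose the output into the contribution of the masked features
    # (relevant) and everything else (irrelevant)
    rel, irrel = cd(blob, x, model)
    # penalize how much the masked features contribute to the prediction
    # (illustrative penalty; see the notebooks for the exact losses used)
    penalty = rel.abs().mean()
    return ce + lam * penalty
```

In a training loop this loss is used like any other: compute `loss = cdep_loss(model, x, y, blob)`, then call `loss.backward()` and step the optimizer.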
- this work is part of an overarching project on interpretable machine learning, guided by the PDR framework
- for related work, see the github repo for acd (hierarchical interpretations)
- for related work, see the github repo for disentangled attribution curves
- feel free to use/share this code openly
- if you find this code useful for your research, please cite the following:
@article{rieger2019interp,
  title={Interpretations are useful: penalizing explanations to align neural networks with prior knowledge},
  author={Rieger, Laura and Singh, Chandan and Murdoch, W James and Yu, Bin},
  journal={arXiv preprint arXiv:1909.13584},
  year={2019}
}