Official code for using / reproducing CDEP from the paper "Interpretations are useful: penalizing explanations to align neural networks with prior knowledge". This code allows one to regularize interpretations (computed via contextual decomposition) to improve neural networks trained in PyTorch.
Note: this repo is actively maintained. For any questions, please file an issue.
- fully-contained data/models/code for reproducing and experimenting with CDEP
- the src folder contains the core code for running and penalizing contextual decomposition
- in addition, we run experiments on 4 datasets, each of which is located in its own folder
- notebooks in these folders show demos for the different experiments
- tested with python 3.6 and pytorch 1.0
ISIC skin-cancer classification - using CDEP, the network learns to ignore spurious patches present in the training set, improving test performance!
ColorMNIST - penalizing the contributions of individual pixels allows us to teach a network to learn a digit's shape instead of its color, improving its test accuracy from 0.5% to 25.1%
Fixing text gender biases - CDEP can help a network avoid learning spurious biases in a dataset, such as reliance on gendered words
Using CDEP requires two steps:
- run CD/ACD on your model. Specifically, 3 things must be altered:
- the pred_ims function must be replaced by a function you write using your own trained model. This function gets predictions from a model given a batch of examples (a sketch is given after this list)
- the model must be replaced with your model
- the current CD implementation doesn't always work for all types of networks. If you are getting an error inside of `cd.py`, you may need to write a custom function that iterates through the layers of your network (see `cd.py` for examples)
- add CD scores to the loss function (see the notebooks; a second sketch after this list shows one possible form)
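
As a rough illustration of the first step, here is a minimal sketch of a pred_ims-style function. The exact signature expected by the CD/ACD code may differ; the `device` argument and the input handling here are illustrative assumptions, and `model` is your own trained classifier:

```python
import numpy as np
import torch
import torch.nn.functional as F

def pred_ims(model, ims, device='cpu'):
    """Return class probabilities for a batch of examples.

    `model` is your own trained pytorch classifier; `ims` is a numpy
    array (or tensor) of shape (batch, ...) matching the model's input.
    """
    model.eval()
    with torch.no_grad():
        batch = torch.as_tensor(np.asarray(ims), dtype=torch.float32, device=device)
        logits = model(batch)              # (batch, num_classes)
        probs = F.softmax(logits, dim=1)   # predictions for each example
    return probs.cpu().numpy()
```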
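
And a sketch of the second step, adding CD scores to the loss. This assumes the `cd` function in `src/cd.py` returns (relevant, irrelevant) score tensors for a mask (`blob`) marking features that should not matter for the prediction; the penalty form and the hyperparameter `lam` below are illustrative, and the notebooks use dataset-specific variants:

```python
import torch
import torch.nn.functional as F
from cd import cd  # core CD implementation from the src folder

def cdep_loss(model, x, y, blob, lam=1.0):
    """Classification loss plus a penalty on the CD score of `blob`."""
    ce = F.cross_entropy(model(x), y)  # standard prediction loss
    # decompose the output into the contribution of the masked features
    # (relevant) and everything else (irrelevant)
    rel, irrel = cd(blob, x, model)
    # penalize how much the masked features contribute to the prediction
    # (illustrative penalty; see the notebooks for the exact losses used)
    penalty = rel.abs().mean()
    return ce + lam * penalty
```

In a training loop this loss is used like any other: compute `loss = cdep_loss(model, x, y, blob)`, then call `loss.backward()` and step the optimizer.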
- this work is part of an overarching project on interpretable machine learning, guided by the PDR framework
- for related work, see the github repo for acd (hierarchical interpretations)
- for related work, see the github repo for disentangled attribution curves
- feel free to use/share this code openly
- if you find this code useful for your research, please cite the following:
@article{rieger2019interp,
  title={Interpretations are useful: penalizing explanations to align neural networks with prior knowledge},
  author={Rieger, Laura and Singh, Chandan and Murdoch, W James and Yu, Bin},
  journal={arXiv preprint arXiv:1909.13584},
  year={2019}
}