ibreakdown's Introduction

Model Agnostic Local Attributions

Overview

The iBreakDown package is a model-agnostic tool for explaining predictions of black-box ML models. A Break Down table shows the contribution of every variable to the final prediction. A Break Down plot presents these contributions in a concise graphical way. SHAP (SHapley Additive exPlanations) values are calculated as averages over Break Down profiles computed for random orderings of the variables. The package works for binary classifiers as well as regression models.
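The Break Down idea can be sketched in a few lines of base R. This is an illustrative re-implementation under simplifying assumptions, not the package's code. It uses a toy model `toy_model`, random reference data, and a variable order supplied by hand, whereas iBreakDown has its own way of choosing the order. Each variable is fixed to its value in the new observation, one at a time, and the change in the mean prediction over the data is recorded. Averaging these contributions over random orders gives SHAP-style attributions.

```r
# Toy model with an interaction and random reference data (illustrative assumptions)
set.seed(1)
ref_data <- data.frame(x1 = runif(200), x2 = runif(200), x3 = runif(200))
toy_model <- function(df) 2 * df$x1 + df$x2 * df$x3
new_obs <- data.frame(x1 = 0.9, x2 = 0.8, x3 = 0.1)

# Break Down sketch: fix variables one by one (in `ord`) to their values in `obs`
# and record how much the mean prediction over `data` moves at each step.
break_down_sketch <- function(model, data, obs, ord) {
  intercept <- mean(model(data))          # average prediction = starting point
  current <- data
  mean_prev <- intercept
  contrib <- setNames(numeric(length(ord)), ord)
  for (v in ord) {
    current[[v]] <- obs[[v]]              # set variable v for every row
    mean_now <- mean(model(current))
    contrib[v] <- mean_now - mean_prev
    mean_prev <- mean_now
  }
  list(intercept = intercept, contributions = contrib)
}

# SHAP-like sketch: average Break Down contributions over B random orders
shap_sketch <- function(model, data, obs, B = 25) {
  vars <- names(obs)
  runs <- replicate(B, {
    ord <- sample(vars)
    break_down_sketch(model, data, obs, ord)$contributions[vars]
  })
  rowMeans(runs)                          # one averaged contribution per variable
}

bd <- break_down_sketch(toy_model, ref_data, new_obs, ord = c("x1", "x2", "x3"))
phi <- shap_sketch(toy_model, ref_data, new_obs, B = 10)
```

The contributions telescope, so `bd$intercept + sum(bd$contributions)` equals `toy_model(new_obs)` for any order. Each order costs p + 1 passes of the model over the data, which fits the O(p) complexity stated for iBreakDown.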

iBreakDown is the successor of the breakDown package. It is faster (complexity O(p) instead of O(p^2)), supports variable interactions, and offers interactive explanations with D3.js visualizations. It is imported and used to compute model explanations in multiple packages, e.g. DALEX, modelStudio and arenar.

The methodology behind the iBreakDown package is described in the arXiv paper and in the Explanatory Model Analysis book. The package is part of the DrWhy.AI universe.

Installation

# the easiest way to get iBreakDown is to install it from CRAN:
install.packages("iBreakDown")

# or install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("ModelOriented/iBreakDown")

Learn more

Find more examples in the EMA book: https://ema.drwhy.ai/.

This version also works with D3: see an example and demo.

Acknowledgments

Work on this package was financially supported by the NCN Opus grant 2016/21/B/ST6/02176.

ibreakdown's People

Contributors

adamizdebski, agosiewska, hbaniecki, komosinskid, learningasigoxyz, maksymiuks, mrdomani, nspyrison, pbiecek

ibreakdown's Issues

plotD3.break_down plots within tabs apart from active tab do not load/size correctly

I have an R Markdown document with several tabs, each containing a Break Down plot. On every non-active tab, the plot is rendered shifted to the left of the screen, which cuts off the y-axis.

I note that this issue occurs because the plot's ability to re-render on window resize has been overridden. Perhaps a simple fix would be to provide a boolean argument that enables re-rendering on window resize where required?

Code review for v 1.0.0

What to check:

  • is the documentation sufficient to understand the functions' parameters
  • is the description in DESCRIPTION up to date and meaningful
  • are examples easy to understand and follow
  • are descriptions in vignettes easy to understand and follow
  • is the README.md file easy to understand, consistent with other descriptions
  • is the R code easy to understand and follow
  • R code should be modular (long functions are bad)
  • are function names and variable names consistent

add default title to break_down plot

The plot function for break_down objects has no title.

We should:

  1. add the default title "Break Down profile"
  2. add the default subtitle "created for the XXX model", where XXX is extracted from the explainer (explainer$label). Note that the plot function may take multiple explainers; all of their labels should be included.

Here is an example

library("DALEX")
library("iBreakDown")
titanic <- na.omit(titanic)
set.seed(1313)
# columns 1, 2, 6, 9: gender, age, fare, survived
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1, 2, 6, 9)]
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare, data = titanic_small, family = "binomial")
# survived is the 4th column of titanic_small, so drop it from the explainer data
explain_titanic_glm <- explain(model_titanic_glm, data = titanic_small[, -4], y = titanic_small$survived == "yes")
bd_glm <- break_down(explain_titanic_glm, titanic_small[1, ])
plot(bd_glm, max_features = 3)
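A minimal sketch of the proposed defaults follows. It assumes the labels are plain strings taken from each explainer's `label` element. `make_bd_titles()` is a hypothetical helper, not an iBreakDown function.

```r
# Hypothetical helper: default title and subtitle for a Break Down plot,
# built from one or more explainer labels (e.g. explainer$label).
make_bd_titles <- function(labels) {
  labels <- unique(as.character(labels))   # one entry per model, duplicates dropped
  noun <- if (length(labels) > 1) "models" else "model"
  list(
    title = "Break Down profile",
    subtitle = paste("created for the", paste(labels, collapse = ", "), noun)
  )
}

make_bd_titles("lm")$subtitle
make_bd_titles(c("glm", "rf", "glm"))$subtitle
```

If plot() returns a ggplot2 object (an assumption here), the strings could be applied with `+ ggplot2::labs(title = ..., subtitle = ...)`.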

Local attributions plot too wide due to unrounded model values

For regression models, the table returned by local_attributions() contains variable values that are not rounded (left column):

                                             contribution
rf  model: intercept                                4.085
rf  model: joy = 0                                 -0.539
rf  model: negative = 0.5                          -0.380
rf  model: disgust = 0.214285714285714             -0.684

This would not be an issue, except that the long labels make the plot too wide, which narrows the main panel.

Is there any way to round the decimal values on the y-axis?
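One option is to round numeric values only when the display label is built, leaving the stored values untouched. Below is a minimal base-R sketch. `make_labels()` is hypothetical and assumes values arrive as strings or numbers. The example values are the ones from the table above.

```r
# Hypothetical helper: "variable = value" labels with numeric values rounded
# for display; non-numeric values (e.g. factor levels) are kept as they are.
make_labels <- function(variable, value, digits = 3) {
  num <- suppressWarnings(as.numeric(value))
  shown <- ifelse(is.na(num), as.character(value), as.character(round(num, digits)))
  paste(variable, "=", shown)
}

labs <- make_labels(c("joy", "negative", "disgust"),
                    c("0", "0.5", "0.214285714285714"))
labs
```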

plotting SHAP object

Hello,
In the last line of the following code, I am trying to plot the 30 highest contributing predictors. I added max_features = 30, but the output did not change. Any suggestions?

testing_rand <- testing[sample(nrow(testing), 500), ]
xgb <- DALEX::explain(xgb_03, data = testing_rand[, 1:345],
                      y = testing_rand$TRANSITIONED == "YES",
                      label = "for Member FLU1533123501")
ive_xgb <- iBreakDown::shap(xgb,
                            new_observation = dplyr::filter(testing_rand[, 1:345], MEMBER == 'FLU1533123501'),
                            B = 5)
plot(ive_xgb, max_features = 30)

Support for multiple observations

The functions local_attributions() and local_interactions() break when more than one observation is passed via the new_observation parameter. Moreover, the two functions fail in different ways.
local_attributions() returns a broken plot.
local_interactions() throws an error:
Error in data.frame(variable = variable, contribution = contribution, :
arguments imply differing number of rows: 10, 2108, 1

How about at least printing a warning when more than one observation is provided?
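A minimal sketch of such a guard is shown below. `check_single_observation()` is a hypothetical helper, not part of iBreakDown. It warns and keeps only the first row, which is one possible policy. Stopping with an error would be another.

```r
# Hypothetical input guard: warn when new_observation has more than one row
# and continue with the first row only.
check_single_observation <- function(new_observation) {
  if (NROW(new_observation) > 1) {
    warning("new_observation has ", NROW(new_observation),
            " rows; only the first one will be explained.", call. = FALSE)
    new_observation <- new_observation[1, , drop = FALSE]
  }
  new_observation
}

# Capture the warning instead of printing it
msg <- NULL
one_row <- withCallingHandlers(
  check_single_observation(mtcars[1:3, ]),
  warning = function(w) { msg <<- conditionMessage(w); invokeRestart("muffleWarning") }
)
```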

library("DALEX")
library("randomForest")
titanic <- na.omit(titanic)
model_titanic_rf <- randomForest(survived == "yes" ~ gender + age + class + embarked +
                                   fare + sibsp + parch,  data = titanic)
explain_titanic_rf <- explain(model_titanic_rf, 
                              data = titanic[,-9],
                              y = titanic$survived == "yes", 
                              label = "Random Forest v7")
library("iBreakDown")
rf_la <- local_attributions(explain_titanic_rf, titanic)
plot(rf_la)

rf_li <- local_interactions(explain_titanic_rf, titanic)
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=pl_PL.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=pl_PL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] iBreakDown_0.9.4    randomForest_4.6-14 titanic_0.1.0       DALEX_0.3.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        pillar_1.3.1      compiler_3.5.2    plyr_1.8.4        remotes_2.0.2     prettyunits_1.0.2 tools_3.5.2      
 [8] testthat_2.0.1    digest_0.6.18     pkgbuild_1.0.2    pkgload_1.0.2     memoise_1.1.0     tibble_2.0.1      gtable_0.2.0     
[15] lattice_0.20-38   pkgconfig_2.0.2   rlang_0.3.1       Matrix_1.2-15     cli_1.0.1         rstudioapi_0.9.0  curl_3.3         
[22] yaml_2.2.0        withr_2.1.2       dplyr_0.8.0.1     fs_1.2.6          desc_1.2.0        devtools_2.0.1    rprojroot_1.3-2  
[29] grid_3.5.2        tidyselect_0.2.5  glue_1.3.0        R6_2.4.0          processx_3.2.1    sessioninfo_1.1.1 ggplot2_3.1.0    
[36] purrr_0.3.0       callr_3.1.1       magrittr_1.5      usethis_1.4.0     backports_1.1.3   scales_1.0.0      ps_1.3.0         
[43] assertthat_0.2.0  colorspace_1.4-0  labeling_0.3      lazyeval_0.2.1    munsell_0.5.0     crayon_1.3.4     

Wrong values in shap sign column

The sign column should be updated on new values of the contribution column.

extracted_contributions <- sapply(result, function(chunk) {
chunk[order(chunk$label, chunk$variable), "contribution"]
})
result_average <- result[[1]]
result_average <- result_average[order(result_average$label, result_average$variable),]
result_average$contribution <- rowMeans(extracted_contributions)
result_average$B <- 0
result <- c(list(result_average), result)
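A minimal sketch of the fix follows. After averaging, `sign` is derived from the averaged `contribution` rather than kept from the first permutation. The toy `result` list is made up. The sketch also assumes `sign` stores `sign(contribution)` as a string, whereas the real iBreakDown column may use a different encoding, such as a factor.

```r
# Toy per-permutation results (made-up values); the first chunk's signs are
# deliberately different from the signs of the averages.
result <- list(
  data.frame(label = "m", variable = c("a", "b"), contribution = c(0.1, -0.1),
             sign = c("1", "-1"), stringsAsFactors = FALSE),
  data.frame(label = "m", variable = c("a", "b"), contribution = c(-0.5, 0.5),
             sign = c("-1", "1"), stringsAsFactors = FALSE)
)

extracted_contributions <- sapply(result, function(chunk) {
  chunk[order(chunk$label, chunk$variable), "contribution"]
})
result_average <- result[[1]]
result_average <- result_average[order(result_average$label, result_average$variable), ]
result_average$contribution <- rowMeans(extracted_contributions)
# Recompute sign from the averaged contributions instead of reusing chunk 1's
result_average$sign <- as.character(sign(result_average$contribution))
```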

Aggregate local_interactions to estimate shap with interactions

Hi,
Thanks for the package! I was wondering how the variable order is set when calculating local interactions. Could there be a way to randomize that order, so that the contribution is measured for several different orders (giving an estimate closer to what SHAP would output)?
I tried passing different variable orders to local_interactions(..., order = ), but nothing changes, so I don't know whether I am missing a step.

Script example:

# get the variable names and interactions
tmp <- colnames(X)
tmp <- combn(tmp, m = 2)
tmp <-unlist(lapply(asplit(tmp, MARGIN = 2), paste, collapse = ':'))
varN <- c(colnames(X), tmp)

# create different orders
var_orders <- list()
for (i in 1:5){
    set.seed(i)
    var_orders[[i]] <- sample(varN)
}

# get the contributions for different orders
res <- list()
i <- 1
for (vo in var_orders){
    res[[i]] <- local_interactions(x = explain_rf, new_observation = X[1, ], interaction_preference = 10, order = vo)
    i <- i+1
}
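Once the per-order results are collected, the contributions can be averaged per variable name. The sketch below uses a toy `res` list with made-up values and assumes each element is a data frame with `variable` and `contribution` columns. A term that appears only in some runs (for example, a pair kept as an interaction in one order but split in another) is averaged only over the runs where it appears. For such terms the result is therefore not a Shapley value.

```r
# Toy stand-in for `res`: one data frame per variable order (made-up values)
res <- list(
  data.frame(variable = c("x1", "x2", "x1:x2"), contribution = c(0.30, 0.10, 0.05),
             stringsAsFactors = FALSE),
  data.frame(variable = c("x2", "x1", "x1:x2"), contribution = c(0.20, 0.25, 0.00),
             stringsAsFactors = FALSE)
)

# Stack all runs and average each variable's contribution across orders
all_runs <- do.call(rbind, res)
avg <- aggregate(contribution ~ variable, data = all_runs, FUN = mean)
avg
```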

Error when passing model/data to break_down function

From break_down() examples:
This throws a note:

library("iBreakDown")
library("DALEX")
library("randomForest")
set.seed(1313)

model <- randomForest(status ~ . , data = HR)
new <- HR_test[1,]

explainer_rf <- explain(model,
                        data = HR[1:1000,1:5],
                        y = HR$status[1:1000])

Please note that 'y' is a factor.[...]

This works:

break_down(explainer_rf, new)

This throws an error:

break_down(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)

Error in break_down.default(x = model, data = HR[1:1000, 1:5], new_observation = new, :
promise already under evaluation: recursive default argument reference or earlier problems?

This throws an error:

local_attributions(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)

Error in colMeans(yhatpred) : 'x' must be numeric
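The second error is consistent with `predict` returning class labels rather than numbers. By default, predict() on a randomForest classifier returns a factor, and base R's colMeans() only accepts numeric input. The sketch below reproduces the message and shows that a numeric `predict_function` avoids it. A base-R logistic regression on `mtcars` stands in for the real model here. For a randomForest classifier, the analogous choice would be `function(m, x) predict(m, x, type = "prob")[, 2]`, as in the SER examples further down.

```r
# Class labels cannot be averaged: colMeans() rejects non-numeric input
labels_pred <- data.frame(yhat = factor(c("fired", "ok", "promoted")))
err <- tryCatch(colMeans(labels_pred), error = function(e) e)
conditionMessage(err)

# A predict_function that returns probabilities (glm stands in for the real model)
fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob_predict <- function(model, newdata) predict(model, newdata, type = "response")
avg_pred <- colMeans(data.frame(yhat = prob_predict(fit, mtcars)))
avg_pred
```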

plot() incompatible with old objects

I cannot use the plot function with old break_down objects (created about half a year ago).

Please find an example of an old object in the attachment.
old_bd.zip

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.0.0  iBreakDown_1.0.1 dplyr_0.8.5      ggplot2_3.3.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4       rstudioapi_0.11  magrittr_1.5     tidyselect_1.0.0 munsell_0.5.0   
 [6] colorspace_1.4-1 R6_2.4.1         rlang_0.4.5      tools_3.6.1      grid_3.6.1      
[11] packrat_0.5.0    gtable_0.3.0     DALEX_1.0.1      withr_2.1.2      assertthat_0.2.1
[16] digest_0.6.25    tibble_2.1.3     lifecycle_0.2.0  crayon_1.3.4     purrr_0.3.3     
[21] farver_2.0.3     glue_1.3.2       labeling_0.3     compiler_3.6.1   pillar_1.4.3    
[26] scales_1.1.0     pkgconfig_2.0.3 

Significant number rounding doesn't seem right

library("DALEX")
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model,
                        data = apartments_test[,2:5],
                        y = apartments_test$m2.price)

iBreakDown::break_down(explainer_rf, new_observation = apartments_test[1,2:5])
apartments_test[1,2:5]

This example shows wrong observation values in break_down object.

nice_format <- function(x) {

This formatting could be done in print and plot functions (if really needed), not in local_attributions (and probably round instead of signif).
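The difference between `signif()` and `round()` is easy to see on a year-like value. The values below are illustrative and are not taken from `apartments_test`.

```r
# Illustrative values (not from apartments_test)
vals <- c(construction.year = 1953, surface = 25.47)

sig <- signif(vals, 3)   # 3 significant digits: the year becomes 1950
rnd <- round(vals, 3)    # 3 decimal places: both values are kept

# Display-only formatting, as could be done inside print() or plot()
format(vals, digits = 4)
```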

plotD3 - support for multiple models

Something is wrong with how D3 plots are displayed.

library("DALEX2")
library("breakDown2")
library("randomForest")
set.seed(1313)
model <- randomForest(status ~ . , data = HR)
new_observation <- HR_test[1,]

explainer_rf <- explain(model,
                        data = HR[1:1000,1:5],
                        y = HR$status[1:1000])

bd_rf <- local_attributions(explainer_rf,
                            new_observation)
plotD3(bd_rf)

session info

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] breakDown2_0.9.1    reticulate_1.10     randomForest_4.6-14 iBreakDown_0.9.3   
[5] DALEX2_0.9         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       rstudioapi_0.9.0 magrittr_1.5     tidyselect_0.2.5
 [5] munsell_0.5.0    lattice_0.20-38  colorspace_1.4-0 R6_2.3.0        
 [9] rlang_0.3.1      plyr_1.8.4       dplyr_0.8.0.1    tools_3.5.2     
[13] grid_3.5.2       gtable_0.2.0     htmltools_0.3.6  digest_0.6.18   
[17] yaml_2.2.0       lazyeval_0.2.1   assertthat_0.2.0 tibble_2.0.1    
[21] crayon_1.3.4     Matrix_1.2-15    purrr_0.2.5      ggplot2_3.1.0   
[25] htmlwidgets_1.3  glue_1.3.0       compiler_3.5.2   pillar_1.3.1    
[29] r2d3_0.2.3       scales_1.0.0     jsonlite_1.6     pkgconfig_2.0.2 

local_attributions fails for classification: incorrect number of subscripts on matrix

For classification, local_attributions() returns the error:

Error in contribution[nrow(contribution), ] <- cummulative[nrow(contribution),  : 
  incorrect number of subscripts on matrix

One hint for the root cause might be the warning message thrown by the explainer - it tries to calculate numeric residuals which of course it cannot do:

      DALEX.explainer <- DALEX::explain(
        model = model_object,
        data = features,
        y = training.set$.outcome == TARGET.VALUE,
        label = paste(model_object$method, " model"),
        colorize = TRUE
      )
  
  A new explainer has been created!  
Warning message:
In mean.default(residuals) :
  argument is not numeric or logical: returning NA

Reproducible example:

random.case <- structure(list(anger = 0.166666666666667, anticipation = 0, disgust = 0.166666666666667, 
    fear = 0.166666666666667, joy = 0, negative = 0.25, positive = 0.0833333333333333, 
    sadness = 0.0833333333333333, surprise = 0.0833333333333333, 
    trust = 0), class = "data.frame", row.names = c(NA, -1L))

training.set <- structure(list(.outcome = structure(c(3L, 4L, 5L, 4L, 4L, 5L, 
5L, 4L, 3L, 3L, 3L, 5L, 4L, 3L, 3L, 1L, 4L, 3L, 4L, 5L, 3L, 2L, 
5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), 
    anger = c(0, 0.0434782608695652, 0, 0, 0, 0.1, 0, 0.037037037037037, 
    0.0192307692307692, 0, 0, 0, 0, 0.0673076923076923, 0.181818181818182, 
    0.0408163265306122, 0, 0, 0, 0.0285714285714286, 0.0526315789473684, 
    0.0952380952380952, 0, 0.0441176470588235, 0), anticipation = c(0.333333333333333, 
    0.217391304347826, 0.125, 0.15, 0.2, 0.2, 0.217391304347826, 
    0.111111111111111, 0.173076923076923, 0.166666666666667, 
    0.111111111111111, 0.157894736842105, 0.214285714285714, 
    0.115384615384615, 0.0909090909090909, 0.0408163265306122, 
    0, 0.166666666666667, 0, 0.114285714285714, 0.184210526315789, 
    0.0476190476190476, 0.133333333333333, 0.102941176470588, 
    0.176470588235294), disgust = c(0, 0, 0, 0, 0, 0, 0, 0.0185185185185185, 
    0.0192307692307692, 0.0833333333333333, 0.0740740740740741, 
    0, 0, 0.0288461538461538, 0, 0.0204081632653061, 0, 0, 0.111111111111111, 
    0, 0, 0.0952380952380952, 0, 0.0294117647058824, 0), fear = c(0, 
    0.0434782608695652, 0, 0.05, 0, 0, 0, 0.0185185185185185, 
    0, 0, 0, 0, 0, 0.0673076923076923, 0, 0.0408163265306122, 
    0, 0.0833333333333333, 0.111111111111111, 0, 0.0263157894736842, 
    0.0952380952380952, 0, 0.0294117647058824, 0), joy = c(0, 
    0.130434782608696, 0.166666666666667, 0.15, 0.233333333333333, 
    0.2, 0.173913043478261, 0.166666666666667, 0.0961538461538462, 
    0.166666666666667, 0.037037037037037, 0.210526315789474, 
    0.214285714285714, 0.0961538461538462, 0.181818181818182, 
    0.0204081632653061, 0.333333333333333, 0.0833333333333333, 
    0.222222222222222, 0.2, 0.105263157894737, 0.0952380952380952, 
    0.2, 0.147058823529412, 0.176470588235294), negative = c(0, 
    0.0869565217391304, 0.0833333333333333, 0.1, 0, 0, 0, 0.0555555555555556, 
    0.0769230769230769, 0.166666666666667, 0.0740740740740741, 
    0.0526315789473684, 0.0714285714285714, 0.105769230769231, 
    0.181818181818182, 0.204081632653061, 0, 0.166666666666667, 
    0.222222222222222, 0.0285714285714286, 0.105263157894737, 
    0.19047619047619, 0, 0.102941176470588, 0.0294117647058824
    ), positive = c(0.333333333333333, 0.217391304347826, 0.291666666666667, 
    0.4, 0.3, 0.3, 0.347826086956522, 0.333333333333333, 0.326923076923077, 
    0.25, 0.259259259259259, 0.315789473684211, 0.285714285714286, 
    0.240384615384615, 0.181818181818182, 0.244897959183673, 
    0.333333333333333, 0.25, 0.222222222222222, 0.4, 0.342105263157895, 
    0.238095238095238, 0.4, 0.235294117647059, 0.352941176470588
    ), sadness = c(0.333333333333333, 0.0434782608695652, 0.0416666666666667, 
    0, 0, 0, 0, 0.0185185185185185, 0.0576923076923077, 0, 0.0740740740740741, 
    0, 0, 0.0480769230769231, 0.0909090909090909, 0.142857142857143, 
    0, 0, 0.111111111111111, 0, 0.0526315789473684, 0.0952380952380952, 
    0, 0.0441176470588235, 0.0294117647058824), surprise = c(0, 
    0.0434782608695652, 0.0833333333333333, 0.05, 0.0666666666666667, 
    0, 0.0434782608695652, 0.037037037037037, 0.0192307692307692, 
    0, 0.111111111111111, 0.0526315789473684, 0, 0.0865384615384615, 
    0, 0.0408163265306122, 0, 0, 0, 0.0285714285714286, 0.0526315789473684, 
    0, 0.0666666666666667, 0.0735294117647059, 0.0294117647058824
    ), trust = c(0, 0.173913043478261, 0.208333333333333, 0.1, 
    0.2, 0.2, 0.217391304347826, 0.203703703703704, 0.211538461538462, 
    0.166666666666667, 0.259259259259259, 0.210526315789474, 
    0.214285714285714, 0.144230769230769, 0.0909090909090909, 
    0.204081632653061, 0.333333333333333, 0.25, 0, 0.2, 0.0789473684210526, 
    0.0476190476190476, 0.2, 0.191176470588235, 0.205882352941176
    )), row.names = c(NA, 25L), class = "data.frame")

library("dplyr")   # for %>% and select()

model.rf <- caret::train(
  form = .outcome ~ .,
  data = training.set,
  method = "rf",
  trControl = caret::trainControl(
    method = "repeatedcv", number = 5, repeats = 5)
)

target <- training.set$.outcome
features <- training.set %>% select(-.outcome)

TARGET.VALUE <- "1"

DALEX.explainer <- DALEX::explain(
  model = model.rf,
  data = features,
  y = target == TARGET.VALUE,
  label = paste(model.rf$method, "model"),   # model.rf; model_object is not defined here
  colorize = TRUE
)

DALEX.attribution <- DALEX.explainer %>%
  iBreakDown::local_attributions(random.case)

Baseline is ignored?

I'm running the following code:


set.seed(17)
x1 <- runif(1000, -10, 10)
x2 <- runif(1000, -10, 10)
y <- 0.05*x1^2 + 0.05*x2^2

v2_df <- data.frame(x = x1,
                    y = x2,
                    z = y)

true_model <- function(model, newdata) {
  0.05*newdata[, 1]^2 + 0.05*newdata[, 2]^2
}

library(DALEX)
v2_explainer <- explain(list(), data = v2_df[, -3], predict_function = true_model,
                        label = "double_quadratic")

library(iBreakDown)
ibd_expl_1 <- local_attributions(v2_explainer, data.frame(x = -6, y = -6))
ibd_expl_2 <- local_attributions(v2_explainer, data.frame(x = -6, y = -6), baseline = 0)

ibd_plot_1 <- plot(ibd_expl_1, baseline = 0)
ibd_plot_2 <- plot(ibd_expl_2, baseline = 0)

ibd_plot_1
ibd_plot_2

And I get the same plot in both cases; neither of them starts at 0.

examples for SER

library("titanic")
head(titanic_train)


titanic_small <- titanic_train
titanic_small <- titanic_small[,c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")]
titanic_small$Survived <- factor(titanic_small$Survived)
titanic_small$Sex <- factor(titanic_small$Sex)
titanic_small$Embarked <- factor(titanic_small$Embarked)
titanic_small <- na.omit(titanic_small)
titanic_train[760,]



library("randomForest")
rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + 
                         Parch + Fare + Embarked, 
data = titanic_small)
rf_model


library("breakDown2")
library("DALEX2")
predict_fuction <- function(m,x) predict(m, x, type = "prob")[,2]
rf_explain <- explain(rf_model, data = titanic_small,
                      y = titanic_small$Survived == "1", label = "RF",
                      predict_function = predict_fuction)

# plot D3 explanations
library("breakDown2")
rf_la <- local_attributions(rf_explain, titanic_small[2,])
rf_la
plotD3(rf_la)


rf_la <- local_attributions(rf_explain, titanic_small[2,], order = 2:8)
rf_la
plotD3(rf_la)

rf_la <- local_attributions(rf_explain, titanic_small[2,], 
                            order = c("Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"))
rf_la
plotD3(rf_la)

rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("Age", "Sex", "Pclass", "SibSp"))
rf_la
plotD3(rf_la, max_features = 10)

rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("Sex", "Age", "Pclass", "SibSp"))
rf_la
plotD3(rf_la, max_features = 10)


rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("SibSp", "Age", "Sex", "Pclass"))
rf_la
plotD3(rf_la, max_features = 10)


rf_la <- local_interactions(rf_explain, titanic_small[7,])
rf_la

rf_la <- local_interactions(rf_explain, titanic_small[7,], interaction_preference  = 50)
rf_la
plotD3(rf_la)
## 
# Model-level uncertainty

# create 10 bootstrap samples

models <- lapply(1:10, function(i) {
  titanic_B <- titanic_small[sample(1:nrow(titanic_small), replace = TRUE),]
  
  rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + 
                             Parch + Fare + Embarked, 
                           data = titanic_B)
  rf_model
})


attributions <- lapply(models, function(rf_model) {
  # titanic_B only exists inside the model-fitting lapply(), so use titanic_small here
  rf_explain <- explain(rf_model, data = titanic_small,
                        y = titanic_small$Survived == "1", label = "RF",
                        predict_function = predict_fuction)
  local_attributions(rf_explain, titanic_small[7,])
})

local_interactions(rf_explain, titanic_small[7,])

plotD3(attributions[[1]], max_features = 10)
plotD3(attributions[[2]], max_features = 10)
plotD3(attributions[[3]], max_features = 10)
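The spread of attributions across bootstrap models can be summarised per variable. The sketch below makes one swap that should be stated plainly: a linear model on `mtcars` stands in for the random forests. For an additive model without interactions, the Break Down contribution of variable j has the closed form beta_j * (x_j - mean(x_j)) whatever the order, so the sketch needs no package code.

```r
# Model-level uncertainty sketch: lm on mtcars stands in for the random forest
set.seed(42)
vars <- c("wt", "hp")
obs <- unlist(mtcars[7, vars])            # observation to explain

contribs <- t(sapply(1:10, function(i) {
  boot <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  fit <- lm(mpg ~ wt + hp, data = boot)
  # Break Down contribution for an additive linear model, relative to the
  # bootstrap sample used as explainer data
  coef(fit)[vars] * (obs - colMeans(boot[, vars]))
}))

# One row per bootstrap model; 5%, 50% and 95% quantiles per variable
apply(contribs, 2, quantile, probs = c(0.05, 0.5, 0.95))
```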


plot() gives BD profiles for all target classes

In a multiclass use case, one often needs the Break Down profile for only one target class, but plot() produces profiles for ALL target classes. How can plot() be restricted to the required target class? In the dummy code below, which is similar to my actual use case, I need to see the BD plot for "DF lrnr exp.alpha" only; the other two should not be shown. This kind of selection is needed when the number of target classes and/or variables is large. Here is the dummy code:
library(DALEX)
library(DALEXtra)
library(tidyverse)
library("mlr3verse")

df <- data.frame(
  w = c(34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37),
  x = c('a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c'),
  y = c(TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE),
  z = c('alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi')
)

df_task <- TaskClassif$new(id = "my_df", backend = df, target = "z")
df_lrn <- lrn("classif.rpart", predict_type = "prob")
df_lrn$train(df_task)

df_lrn_exp <- explain_mlr3(df_lrn,
                           data = df[, -4],
                           y = df$z,
                           label = "DF lrnr exp")
df_BD <- predict_parts(df_lrn_exp, df[3, ], type = 'break_down')
plot(df_BD, max_features = 5, add_contributions = TRUE)
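Until plot() offers such an option, one workaround is to subset the result before plotting. The sketch makes three assumptions. First, the predict_parts() result is a data frame whose class includes "break_down". Second, it has a `label` column whose values end in the class name, as "DF lrnr exp.alpha" suggests. Third, the numbers can be made up. Base R's `[` method for data frames keeps the extra class, so plot() should still dispatch to the break_down method. We have not verified that the plotting code accepts such a subset.

```r
# Toy stand-in for a multiclass break_down result (made-up values)
bd <- data.frame(
  label = rep(c("DF lrnr exp.alpha", "DF lrnr exp.delta", "DF lrnr exp.phi"), each = 2),
  variable_name = rep(c("w", "x"), 3),
  contribution = c(0.10, -0.05, 0.02, 0.03, -0.12, 0.02),
  stringsAsFactors = FALSE
)
class(bd) <- c("break_down", "data.frame")

# Keep only the rows for the class of interest; the class attribute survives
bd_alpha <- bd[bd$label == "DF lrnr exp.alpha", ]
class(bd_alpha)
```

With a real object, the same idea would read `plot(df_BD[df_BD$label == "DF lrnr exp.alpha", ])`.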

bug in plotting distributions on R3.6

It occurs when the variable column is a factor instead of character (default for R3.6, changed in R4.0).

library(DALEX)
model_lm <- lm(m2.price~., data = apartments)
explainer_lm <- DALEX::explain(model_lm, data = apartments, y = apartments$m2.price)
mp <- iBreakDown::local_attributions(explainer_lm, apartments[3,], keep_distributions=TRUE)

mp$variable <- as.factor(mp$variable)

plot(mp, plot_distributions = TRUE)

Potential fix: add stringsAsFactors=TRUE to all of the results and fix vorder in plot_break_down_distributions
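Whichever default is chosen, the plotting code can be made independent of it by normalising the column type first. The small base-R illustration below shows why the two cases behave differently. The variable names are borrowed from the apartments data, and the sketch does not touch plot_break_down_distributions() itself.

```r
# The same strings stored as a factor (the R 3.6 default in data.frame())
# and as a character vector (the R 4.0 default)
as_fct <- data.frame(variable = c("surface", "floor", "intercept"), stringsAsFactors = TRUE)
as_chr <- data.frame(variable = c("surface", "floor", "intercept"), stringsAsFactors = FALSE)

# Factor codes follow the alphabetical levels, not the row order
codes <- as.integer(as_fct$variable)

# Normalising before any ordering logic makes both cases identical
as_fct$variable <- as.character(as_fct$variable)
same <- identical(as_fct$variable, as_chr$variable)
```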

iBreakDown article goes to arxiv

Ask someone to review the iBreakDown article
Upload the iBreakDown article to arXiv
Add a link to it in README.md, CITATION and maybe DESCRIPTION

generating SHAP values.

Hello,
First of all, thank you so much for your contribution to making millions of people like me better educated about "black box" models.
I have a quick question regarding SHAP value generation with the DALEX/iBreakDown packages.
xgb <- DALEX::explain(xgb_03, data = testing[, 1:345], y = testing$TRANSITIONED == "YES", label = "for Member L00274353401")

ive_xgb <- iBreakDown::shap(xgb, new_observation = filter(testing[, 1:345], MEMBER == 'TX00274353401'))

When I run this code, the first object is created without problems, but the second one (ive_xgb) is still running after three hours. Do you have any suggestions?
Thank you
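A rough cost estimate helps explain why shap() can run for hours on the full `testing` data. Assume that each of the B random orders re-scores all n rows of the explainer data once per variable, plus one baseline pass. The work then grows as B × (p + 1) × n rows. The row counts below are hypothetical: p = 344 assumes 345 columns minus one ID column, and n = 10000 is made up. Passing a sample of rows as `data` (as in the plotting question earlier, with 500 rows and B = 5), lowering `B`, or dropping identifier columns such as MEMBER each shrinks the cost linearly.

```r
# Rough number of rows the model has to score, under the assumption above
shap_rows_scored <- function(n, p, B) B * (p + 1) * n

shap_rows_scored(n = 500,   p = 344, B = 5)    # 862500
shap_rows_scored(n = 10000, p = 344, B = 25)   # 86250000
```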

Error-message: subscript out of bounds

When I try to fit an XGBoost model on the famous Diabetes dataset, I get the message "Subscript out of bounds". See the code below.

library(tidyverse)
library(Hmisc)
library(xgboost)
library(iBreakDown)
library(tictoc)
library(recipes)

Load dataset

#Diabetes <- read_csv("https://www.kaggle.com/saurabh00007/diabetescsv/diabetes.csv")
Diabetes <- read_csv("diabetes.csv")

Summarise dataset

d <- describe(Diabetes)
plot(d)

Data Pre-processing, bring outliers back to values within certain range

Diabetes_Recept <- recipe(Outcome ~ ., data = Diabetes) %>%
step_range(Pregnancies, min = 0, max = 10) %>%
step_range(Glucose, min = 80, max = 150) %>%
step_range(BloodPressure, min = 50, max = 100) %>%
step_range(SkinThickness, min = 10, max = 50) %>%
step_range(Insulin, min = 10, max = 200) %>%
step_range(Age, min = 20, max = 70) %>%
step_range(BMI, min = 20, max = 55)

Diabetes_prep <- prep(x = Diabetes_Recept,
training = Diabetes)

Diabetes_bake <- bake(object = Diabetes_prep,
new_data = Diabetes)

Prepare for modeling

Y.train <- Diabetes$Outcome
features <- select(Diabetes_bake, -Outcome)
X.train <- features %>% data.matrix()

Fit Xgboost Model

tic()
set.seed(12)

param <- list(objective = "binary:logistic",  # For classification
              eval_metric = "auc",            # AUC is used for classification
              max_depth = 4,
              eta = 0.3,                      # Learning rate
              subsample = 0.8,
              colsample_bytree = 0.8,
              min_child_weight = 2,
              scale_pos_weight = sum(Y.train == 0) / sum(Y.train == 1),
              max_delta_step = 8)

XGB_Model <- xgboost(data = X.train, label = Y.train, params = param, nrounds = 100, verbose = FALSE)

toc()

Look at the shap plots

xgb.plot.shap(data = X.train,
model = XGB_Model,
top_n = 8,
n_col = 2,
ylab = "Probability of Diabetes")

Make explain object

# objective = "binary:logistic" already makes predict() return probabilities,
# so no second logistic transform is applied here
predict_logit <- function(model, x) {
  predict(model, x)
}

Explainer_XGB <- DALEX::explain(model = XGB_Model,
label="Extreme Gradient Boosting",
data = X.train,
predict_function = predict_logit,
y = Diabetes$Outcome)

predictions <- predict(XGB_Model, newdata= X.train, type="prob")

case1 <- as.matrix(X.train[1,])

Explain model outcomes on individual case level

After running the next command I get the error message

explain1 <- break_down(x = Explainer_XGB,
new_observation = case1,
interactions = FALSE)

plot(explain1,
max_features = 5,
vcolors = c("green", "red", "purple") )
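A likely culprit is `case1 <- as.matrix(X.train[1,])`, though we have not confirmed it against xgboost. Indexing one row of a matrix drops it to a named vector, and `as.matrix()` then turns that vector into a p × 1 column without column names. The model expects a 1 × p row with the training column names. The base-R sketch below shows the difference with a small toy matrix. `drop = FALSE` keeps the row shape, the same pattern as `model_matrix[1,,drop=FALSE]` in the shap issue further down.

```r
# Toy feature matrix with named columns
X <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("Glucose", "BMI", "Age")))

bad  <- as.matrix(X[1, ])     # 3 x 1: names moved to rownames, no colnames
good <- X[1, , drop = FALSE]  # 1 x 3: a single row with the original colnames

dim(bad)
dim(good)
```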

`order` in `local_interactions()`

How to set the fixed order of variables with interactions?
This doesn't work:

library("randomForest")
explain_rf_v6 <- archivist::aread("pbiecek/models/9b971")

library("DALEX")
johny_d <- archivist::aread("pbiecek/models/e3596")

library("iBreakDown")
ibd_rf <- local_interactions(explain_rf_v6,
                             johny_d, 
                             order = c("age:class", "gender", "fare", "parch", "sibsp", "embarked" ))

It would be helpful to have an example in the documentation.

[shap] NA in variable_value column

library("xgboost")
library("DALEX")

model_matrix <- model.matrix(status == "fired" ~ . -1, HR)
data <- xgb.DMatrix(model_matrix, label = HR$status == "fired")

params <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
               objective = "binary:logistic", eval_metric = "auc")

model_HR <- xgb.train(params, data, nrounds = 50)

explainer_HR <- explain(model_HR,
                        data = model_matrix,
                        y = HR$status == "fired",
                        verbose = FALSE)

library(iBreakDown)
# this works
break_down(explainer_HR, model_matrix[1,,drop=FALSE])

# this has NA in variable_value (and rownames)
shap(explainer_HR, model_matrix[1,,drop=FALSE])

DALEXverse 0.19.8 release summer 2019

Integration

  • readability: vignettes
  • readability: NEWS
  • readability: DESCRIPTION
  • consistency: pkgdown website
  • consistency: entry at DrWhy.AI webpage

assigned: @pbiecek

Code review

  • consistency: names of functions
  • consistency: names of files
  • consistency: names of variables in functions (local and global)
  • length: functions
  • readability: code (comments, constructions)

assigned: @maksymiuks

Feature review

  • readability: documentation (title, description, details)
  • readability: examples (relevant, complete, with comments)
  • reproducibility: tests (code coverage)
  • links to functions: \code

assigned: @kasiapekala

Choose colors with vcolors in plot.break_down

In the plot function you can specify the colours of the bars with the vcolors argument.
I want to colour risk-increasing factors "red", risk-lowering factors "green" and the model's prediction "purple".

Therefore I specified these three colours. 90% of the time I get the desired effect, but sometimes the features that increase the probability are coloured green instead of red.
Is it possible to label the colours, e.g. "higher score = red", "lower score = green" and "prediction = purple"?
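The symptom is consistent with colours being matched by position rather than by name. Suppose one sign level is absent, for example when an observation has no negative contributions. Then an unnamed colour vector shifts by one slot. The base-R sketch below illustrates the difference. It assumes bars are coded "-1" (lowers), "1" (raises) and "X" (prediction), which is not verified against iBreakDown's internals. In ggplot2, `scale_fill_manual(values = ...)` accepts a named vector and matches by name, so a package-side fix could take that route.

```r
# Assumed sign codes: "-1" lowers, "1" raises, "X" is the prediction bar
unnamed <- c("green", "red", "purple")
named   <- c("-1" = "green", "1" = "red", "X" = "purple")

present <- c("1", "X")   # e.g. an observation with no negative contributions

# Positional matching shifts the colours when a level is missing
by_position <- setNames(unnamed[seq_along(present)], present)
# Matching by name stays correct
by_name <- named[present]

by_position
by_name
```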

which LICENSE?

Hi,
could you please add a license for this project? This would be very much helpful for usage in projects.

Enhancement of label-functionality

It would be nice if the plot function of iBreakDown had two extra arguments:

  1. An argument to add a subtitle, which could be used for a case-specific label (for example, the ID and name of the case), instead of having to override the model label.

  2. An argument to specify user-friendly variable labels, which could then be used in the Break Down plot instead of the technical variable names used in the model. This is especially useful when communicating with end users about the explanation of a prediction (which is one of the goals of explaining models).

Error in describe(bd_rf) : could not find function "describe"

@AdamIzdebski, could you check whether pkgdown::build_site() and devtools::check() work after your changes?
I get the following issue:

Reading 'vignettes/vignette_iBreakDown_description.Rmd'
Error in describe(bd_rf) : could not find function "describe"
In addition: Warning message:
`chr_along()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session. 
Error in describe(bd_rf) : could not find function "describe"

Sizing of plotD3 in Shiny

Hello,

I'm trying to use an iBreakDown plot in a Shiny app and I'm unable to increase the size of the plot. According to the r2d3 documentation, the plot should fill its container, so I'm wondering if there's a bug on the iBreakDown side rather than the r2d3 side.

Below, I've tried setting the container height of both the d3Output and the column within the fluidRow. I also tried setting scale_height = TRUE.

library(shiny)
library(r2d3)
library(DALEX)
library(iBreakDown)

ui <- fluidPage(
  fluidRow(
    column(12, style = "height:800px;",
           d3Output("d3", height = "800px")
           )
    )
  )

server <- function(input, output) {
  
  output$d3 <- renderD3({

    titanic <- na.omit(titanic)
    set.seed(1313)
    titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1,2,6,9)]
    model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare,
                             data = titanic_small, family = "binomial")
    explain_titanic_glm <- explain(model_titanic_glm,
                                   data = titanic_small[, -4],  # drop 'survived', the 4th of the 4 selected columns
                                   y = titanic_small$survived == "yes",
                                   label = "glm")
    bd_glm <- local_attributions(explain_titanic_glm, titanic_small[1, ])
    plotD3(bd_glm) # also tried adding 'scale_height=TRUE'
    
  })
}

shinyApp(ui = ui, server = server)

As for my info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] iBreakDown_0.9.9 DALEX_0.4.7      r2d3_0.2.3       shiny_1.3.2     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       compiler_3.5.3   pillar_1.3.1     later_0.8.0     
 [5] plyr_1.8.4       tools_3.5.3      digest_0.6.19    jsonlite_1.6    
 [9] tibble_2.1.1     gtable_0.2.0     pkgconfig_2.0.2  rlang_0.3.4     
[13] rstudioapi_0.10  yaml_2.2.0       xfun_0.5         dplyr_0.8.0.1   
[17] knitr_1.22       htmlwidgets_1.3  grid_3.5.3       tidyselect_0.2.5
[21] glue_1.3.1       R6_2.4.0         ggplot2_3.1.0    purrr_0.3.2     
[25] magrittr_1.5     scales_1.0.0     promises_1.0.1   htmltools_0.3.6 
[29] assertthat_0.2.1 mime_0.6         xtable_1.8-3     colorspace_1.4-1
[33] httpuv_1.5.0     lazyeval_0.2.2   munsell_0.5.0    crayon_1.3.4 

Error when explaining random forest model

I get the following error when using iBreakDown in combination with a random forest model:
Error in yhat.default(x, data) : (list) object cannot be coerced to type 'double'

library(tidyverse)
library(parsnip)  # rand_forest(), set_engine() and fit() come from parsnip, not the core tidyverse
library(DALEX)
library(iBreakDown)
library(randomForest)

setwd("~/2019 Diabetes")

Diabetes <- read_csv("diabetes.csv")

# Treat the 0/1 outcome as a factor for classification
Diabetes <- Diabetes %>%
  mutate(Outcome = as.factor(Outcome))

set.seed(57974)

RF_Fit <-
  rand_forest(mode = "classification", mtry = 5, trees = 100) %>%
  set_engine("randomForest") %>%
  fit(Outcome ~ ., data = Diabetes)

Then I explain individual cases with iBreakDown:

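# 'y' must be lowercase and refer to the Outcome column, not the whole data frame
Explainer_Rf_Diabetes <- DALEX::explain(RF_Fit,
                                        data = Diabetes[, 1:8],
                                        y = Diabetes$Outcome == "1")

# First patient, without the target column
person1 <- Diabetes %>%
  slice(1) %>%
  select(-Outcome)

Cp_Rf1 <- break_down(Explainer_Rf_Diabetes,
                     new_observation = person1)

plot(Cp_Rf1)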

Problem with predict function

library(DALEX2)
library(ggplot2)
library(breakDown2)

head(HR)
new_observation <- HR_test[1,]
new_observation

library(nnet)
m_glm <- multinom(status ~ . , data = HR, probabilities = TRUE, model = TRUE)

# predict.multinom() documents type = "probs"; "prob" only works via partial matching
p_fun <- function(object, newdata) {predict(object, newdata = newdata, type = "probs")}

bd_glm <- local_attributions(m_glm,
                            data = HR_test,
                            new_observation =  new_observation,
                            keep_distributions = TRUE,
                            predict_function = p_fun)

#bd_glm
plot(bd_glm)
plot(bd_glm, start_baseline = TRUE)
plot(bd_glm, plot_distributions = TRUE)

Printing bd_glm causes an error.
Plotting bd_glm gives overlapping figures.

Plotting the distributions works well.

Hidden last contribution label in the break_down plot

@hbaniecki Thank you for pointing us to the docs for iBreakDown, etc.!
I don't know if this is helpful, but I ran into this minor issue a while ago with binary classification models.
For positive cases the prediction is displayed fine, but for negative cases the prediction label becomes unreadable.

Adding vcolors to the generic plot() call worked for me:

bd_glm <- variable_attribution(explainer_glm, 
                               new_observation = x_test[1,],
                               type="break_down")

bd_glm2 <- variable_attribution(explainer_glm,
                                new_observation = x_test[30,],
                                type="break_down")
bd_colors <- c("#f05a71","#4378bf", "#8bdcbe", "#ffa58c")

p1 <- plot(bd_glm2, vcolors = bd_colors)
p2 <- plot(bd_glm, vcolors = bd_colors)


Thanks for your reply!

Originally posted by @marcjermaine-pontiveros in ModelOriented/DALEX#176 (comment)
