ModelOriented / iBreakDown

Break Down with interactions for local explanations (SHAP, BreakDown, iBreakDown)

Home Page: https://ModelOriented.github.io/iBreakDown/

License: GNU General Public License v3.0

R 99.20% CSS 0.80%

xai iml breakdown interpretability shapley

ibreakdown's Introduction

Model Agnostic Local Attributions

Overview

The iBreakDown package is a model-agnostic tool for explaining predictions from black-box ML models. The Break Down Table shows the contribution of every variable to the final prediction. The Break Down Plot presents variable contributions in a concise graphical way. SHAP (SHapley Additive exPlanations) values are calculated as the average over random Break Down profiles. The package works for binary classifiers as well as regression models.

iBreakDown is the successor of the breakDown package. It is faster (complexity O(p) instead of O(p^2)). It supports variable interactions and interactive explanations with D3.js visualizations. It is imported and used to compute model explanations in multiple packages, e.g. DALEX, modelStudio and arenar.

The methodology behind the iBreakDown package is described in the arXiv paper and in the Explanatory Model Analysis book. The package is part of the DrWhy.AI universe.

Installation

# the easiest way to get iBreakDown is to install it from CRAN:
install.packages("iBreakDown")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("ModelOriented/iBreakDown")

Learn more

Find more examples in the EMA book: https://ema.drwhy.ai/.

This version also works with D3: see an example and demo.

Acknowledgments

Work on this package was financially supported by the NCN Opus grant 2016/21/B/ST6/02176.

ibreakdown's Issues

delete create.R as not needed

plotD3.break_down plots within tabs apart from active tab do not load/size correctly

I have an R Markdown document with some tabs, each of which contains a breakdown plot. However, on all non-active tabs the plot is rendered shifted to the left of the screen, cutting off the y-axis.

I note that this issue occurs because the ability of the plot to re-render on window resize has been overwritten. Perhaps a simple-ish fix would be to provide a boolean argument that allows window resizing where required?

description() needs at least 4 features to work

add max_vars as an alias to max_features in plots

https://modeloriented.github.io/ingredients/reference/plot.feature_importance_explainer.html

remove binder as not needed/not used

Code review for v 1.0.0

What to check:

is documentation sufficient to understand function's parameters
is the description in DESCRIPTION up to date and meaningful
are examples easy to understand and follow
are descriptions in vignettes easy to understand and follow
is the README.md file easy to understand, consistent with other descriptions
is the R code easy to understand and follow
R code should be modular (long functions are bad)
are function names and variable names consistent

add default title to break_down plot

The plot function for break_down objects has no title.

We should:

add a default title: Break Down profile
add a default subtitle: created for the XXX model, where XXX is extracted from the explainer (explainer$label). Note that the plot function may take multiple explainers, in which case all labels should be included.

Here is an example

library("DALEX")
library("iBreakDown")
titanic <- na.omit(titanic)
titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1,2,6,9)]
model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare, data = titanic_small, family = "binomial")
# survived is the 4th column of titanic_small, so drop it from the explainer data
explain_titanic_glm <- explain(model_titanic_glm, data = titanic_small[, -4], y = titanic_small$survived == "yes")
bd_glm <- break_down(explain_titanic_glm, titanic_small[1, ])
plot(bd_glm, max_features = 3)
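
The subtitle rule proposed above could be sketched as follows. This is a minimal base R illustration, not the package's implementation; `make_subtitle` is a hypothetical helper, and it assumes the labels have already been collected from each explainer's `label` field.

```r
# Hypothetical helper: build a default subtitle from one or more model labels.
make_subtitle <- function(labels) {
  labels <- unique(as.character(labels))   # each model is named once
  paste0("created for the ", paste(labels, collapse = ", "), " model")
}

make_subtitle("glm")                  # "created for the glm model"
make_subtitle(c("glm", "glm", "rf"))  # "created for the glm, rf model"
```

Deduplicating the labels keeps the subtitle short when several rows of the plotted object come from the same explainer.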

parallel backend for iBreakDown

some statistics can be calculated in parallel

Local attributions plot too wide due to unrounded model values

For regressions, the table returned by local_attributions() contains values that are not rounded (left column):

                                             contribution
rf  model: intercept                                4.085
rf  model: joy = 0                                 -0.539
rf  model: negative = 0.5                          -0.380
rf  model: disgust = 0.214285714285714             -0.684

This would not be an issue on its own; however, the long labels take up so much width that the main plot is narrowed:

Is there any way to round the decimal values in the y-axis?
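
Until rounding is handled inside the package, one possible workaround is to shorten the numbers embedded in the labels before plotting. The sketch below uses only base R; `round_label_values` is a hypothetical helper, and whether the labels of a given iBreakDown object can be overwritten this way before calling plot() is an assumption that should be checked against the object's structure.

```r
# Hypothetical helper: round every decimal number found inside label strings.
round_label_values <- function(labels, digits = 3) {
  pattern <- "-?[0-9]+\\.[0-9]+"                 # matches numbers with a decimal part
  matches <- gregexpr(pattern, labels)
  regmatches(labels, matches) <- lapply(
    regmatches(labels, matches),
    function(found) as.character(round(as.numeric(found), digits))
  )
  labels
}

labels <- c("joy = 0", "negative = 0.5", "disgust = 0.214285714285714")
round_label_values(labels)
# "joy = 0" "negative = 0.5" "disgust = 0.214"
```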

pkgdown site is not working

lots of errors, please correct

plotting SHAP object

Hello,
In the last line of the following code, I am trying to plot the top 30 highest-contributing predictors. I added max_features = 30, but the output did not change. Any suggestions, please?
testing_rand <- testing[sample(nrow(testing), 500), ]
xgb <- DALEX::explain(xgb_03, data = testing_rand[, 1:345], y = testing_rand$TRANSITIONED == "YES", label = "for Member FLU1533123501")
ive_xgb <- iBreakDown::shap(xgb, new_observation = filter(testing_rand[, 1:345], MEMBER == 'FLU1533123501'), B = 5)
plot(ive_xgb, max_features = 30)

Support for multiple observations

The functions local_attributions() and local_interactions() break when more than one observation is passed via the new_observation parameter. Moreover, the two functions behave differently.
local_attributions() returns a broken plot:

local_interactions() throws an error:
Error in data.frame(variable = variable, contribution = contribution, :
arguments imply differing number of rows: 10, 2108, 1

How about, at least, printing a warning that more than one observation was provided?

library("DALEX")
library("randomForest")
titanic <- na.omit(titanic)
model_titanic_rf <- randomForest(survived == "yes" ~ gender + age + class + embarked +
                                   fare + sibsp + parch,  data = titanic)
explain_titanic_rf <- explain(model_titanic_rf, 
                              data = titanic[,-9],
                              y = titanic$survived == "yes", 
                              label = "Random Forest v7")
library("iBreakDown")
rf_la <- local_attributions(explain_titanic_rf, titanic)
plot(rf_la)

rf_li <- local_interactions(explain_titanic_rf, titanic)
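
The warning suggested above could look roughly like this. It is a base R sketch under the assumption that falling back to the first row is acceptable; `check_single_observation` is a hypothetical helper and not part of iBreakDown.

```r
# Hypothetical guard: warn and keep only the first row when several are passed.
check_single_observation <- function(new_observation) {
  n <- nrow(new_observation)
  if (!is.null(n) && n > 1) {
    warning("More than one observation was provided; only the first row is used.",
            call. = FALSE)
    new_observation <- new_observation[1, , drop = FALSE]
  }
  new_observation
}

# Three rows go in, one row comes out (with a warning).
single <- suppressWarnings(check_single_observation(data.frame(a = 1:3, b = 4:6)))
nrow(single)  # 1
```

Calling such a guard at the top of both functions would also make their behavior consistent with each other.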

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=pl_PL.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=pl_PL.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=pl_PL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] iBreakDown_0.9.4    randomForest_4.6-14 titanic_0.1.0       DALEX_0.3.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0        pillar_1.3.1      compiler_3.5.2    plyr_1.8.4        remotes_2.0.2     prettyunits_1.0.2 tools_3.5.2      
 [8] testthat_2.0.1    digest_0.6.18     pkgbuild_1.0.2    pkgload_1.0.2     memoise_1.1.0     tibble_2.0.1      gtable_0.2.0     
[15] lattice_0.20-38   pkgconfig_2.0.2   rlang_0.3.1       Matrix_1.2-15     cli_1.0.1         rstudioapi_0.9.0  curl_3.3         
[22] yaml_2.2.0        withr_2.1.2       dplyr_0.8.0.1     fs_1.2.6          desc_1.2.0        devtools_2.0.1    rprojroot_1.3-2  
[29] grid_3.5.2        tidyselect_0.2.5  glue_1.3.0        R6_2.4.0          processx_3.2.1    sessioninfo_1.1.1 ggplot2_3.1.0    
[36] purrr_0.3.0       callr_3.1.1       magrittr_1.5      usethis_1.4.0     backports_1.1.3   scales_1.0.0      ps_1.3.0         
[43] assertthat_0.2.0  colorspace_1.4-0  labeling_0.3      lazyeval_0.2.1    munsell_0.5.0     crayon_1.3.4

Wrong values in shap sign column

The sign column should be recomputed from the new values of the contribution column.

iBreakDown/R/break_down_uncertainty.R

Lines 156 to 163 in 28fc66e

extracted_contributions <- sapply(result, function(chunk) {
  chunk[order(chunk$label, chunk$variable), "contribution"]
})
result_average <- result[[1]]
result_average <- result_average[order(result_average$label, result_average$variable),]
result_average$contribution <- rowMeans(extracted_contributions)
result_average$B <- 0
result <- c(list(result_average), result)
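
The problem can be reproduced on toy data. The sketch below is base R only and uses a plain numeric sign column for simplicity; the actual encoding of the sign column in the package may differ, so treat the last assignment as an illustration of the idea rather than a patch.

```r
# Two toy "paths" with contributions for the same two variables.
chunks <- list(
  data.frame(variable = c("age", "fare"), contribution = c(0.2, -0.1)),
  data.frame(variable = c("age", "fare"), contribution = c(-0.4, -0.3))
)

contributions <- sapply(chunks, function(chunk) chunk$contribution)

average <- chunks[[1]]
average$sign <- sign(average$contribution)       # sign of the first path: 1, -1
average$contribution <- rowMeans(contributions)  # averages are about -0.1 and -0.2

stale_sign <- average$sign                       # still 1, -1: no longer matches
average$sign <- sign(average$contribution)       # recomputed: -1, -1
```

After averaging, "age" has a negative mean contribution while the copied sign still says positive, which is the mismatch described in this issue.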

Aggregate local_interactions to estimate shap with interactions

Hi,
Thanks for the package! I was wondering how the variable order is set when calculating the local interactions, and whether there is a way to randomize that order so that the contributions can be measured repeatedly for different orders (giving an estimate of the contributions closer to what SHAP would output).
I tried passing different orders of variables to local_interactions(..., order =), but it does not change anything, so I don't know whether I am missing a step.

Script example:

# get the variable names and interactions
tmp <- colnames(X)
tmp <- combn(tmp, m = 2)
tmp <-unlist(lapply(asplit(tmp, MARGIN = 2), paste, collapse = ':'))
varN <- c(colnames(X), tmp)

# create different orders
var_orders <- list()
for (i in 1:5){
    set.seed(i)
    var_orders[[i]] <- sample(varN)
}

# get the contributions for different orders
res <- list()
i <- 1
for (vo in var_orders){
    res[[i]] <- local_interactions(new_observation = X[1,],x = explain_rf, interaction_preference = 10, var_orders = vo)
    i <- i+1
}
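
The averaging idea behind this question can be sketched without the package. The example below is a base R illustration for single variables only (no interaction terms); `predict_toy`, `background`, `new_obs` and `profile_for_order` are hypothetical names, and the toy function stands in for a fitted model. It is not how local_interactions() works internally.

```r
# Toy model with an interaction, standing in for a fitted black box.
predict_toy <- function(data) data$x1 + 2 * data$x2 + data$x1 * data$x2

set.seed(1)
background <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
new_obs <- data.frame(x1 = 1, x2 = 2)

# One Break Down-style profile: fix variables one by one in the given order
# and record how the mean prediction changes after each step.
profile_for_order <- function(var_order) {
  current <- background
  previous_mean <- mean(predict_toy(current))
  contributions <- setNames(numeric(length(var_order)), var_order)
  for (v in var_order) {
    current[[v]] <- new_obs[[v]]
    new_mean <- mean(predict_toy(current))
    contributions[v] <- new_mean - previous_mean
    previous_mean <- new_mean
  }
  contributions[sort(var_order)]   # common ordering so profiles can be averaged
}

variables <- colnames(background)
profiles <- replicate(25, profile_for_order(sample(variables)))
shap_estimate <- rowMeans(profiles)   # average contribution per variable
```

Each profile sums to the prediction for new_obs minus the mean prediction on the background data, so the averaged estimate keeps that property as well.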

Write a short article about iBreakDown with plotD3 for RBloggers

After #23

move DALEX to Suggests in the CRAN version v1.2

will fix ModelOriented/DALEX#195

prepare vignettes

one for classification and one for regression

add support for travis and codecov

Request: Order the break_down like the old prediction_breakdown

I think it improves readability if the columns are ordered either ascending or descending

Error when passing model/data to break_down function

From break_down() examples:
This throws a note:

library("iBreakDown")
library("DALEX")
library("randomForest")
set.seed(1313)

model <- randomForest(status ~ . , data = HR)
new <- HR_test[1,]

explainer_rf <- explain(model,
                        data = HR[1:1000,1:5],
                        y = HR$status[1:1000])

Please note that 'y' is a factor.[...]

This works:

break_down(explainer_rf, new)

This throws an error:

break_down(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)

Error in break_down.default(x = model, data = HR[1:1000, 1:5], new_observation = new, :
promise already under evaluation: recursive default argument reference or earlier problems?

This throws an error:

local_attributions(x=model, data = HR[1:1000,1:5], predict_function = predict, new_observation = new)

Error in colMeans(yhatpred) : 'x' must be numeric

plot() incompatible with old objects

I cannot use the plot function with old break_down objects (created about half a year ago).

Please find an example old object in the attachment.
old_bd.zip

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.0.0  iBreakDown_1.0.1 dplyr_0.8.5      ggplot2_3.3.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4       rstudioapi_0.11  magrittr_1.5     tidyselect_1.0.0 munsell_0.5.0   
 [6] colorspace_1.4-1 R6_2.4.1         rlang_0.4.5      tools_3.6.1      grid_3.6.1      
[11] packrat_0.5.0    gtable_0.3.0     DALEX_1.0.1      withr_2.1.2      assertthat_0.2.1
[16] digest_0.6.25    tibble_2.1.3     lifecycle_0.2.0  crayon_1.3.4     purrr_0.3.3     
[21] farver_2.0.3     glue_1.3.2       labeling_0.3     compiler_3.6.1   pillar_1.4.3    
[26] scales_1.1.0     pkgconfig_2.0.3

Significant number rounding doesn't seem right

library("DALEX")
library("randomForest")
apartments_rf_model <- randomForest(m2.price ~ construction.year + surface + floor +
                                      no.rooms, data = apartments)
explainer_rf <- explain(apartments_rf_model,
                        data = apartments_test[,2:5],
                        y = apartments_test$m2.price)

iBreakDown::break_down(explainer_rf, new_observation = apartments_test[1,2:5])
apartments_test[1,2:5]

This example shows wrong observation values in break_down object.

iBreakDown/R/local_attributions.R

Line 307 in 0e91507

nice_format <- function(x) {

This formatting could be done in print and plot functions (if really needed), not in local_attributions (and probably round instead of signif).
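
The difference between the two functions is easy to see on values of the kind found in this dataset. The numbers below are illustrative inputs chosen for this sketch, not output copied from the package.

```r
values <- c(1976, 131, 3, 5)   # e.g. a construction year, a surface, a floor, rooms

signif(values, 2)   # 2000  130    3    5   keeps 2 significant digits
round(values, 2)    # 1976  131    3    5   keeps 2 decimal places
```

With signif(), a construction year of 1976 is displayed as 2000, which matches the kind of wrong observation values reported above; round() leaves whole numbers untouched.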

plotD3 - support for multiple models

something is wrong with displaying D3 plots

library("DALEX2")
library("breakDown2")
library("randomForest")
set.seed(1313)
model <- randomForest(status ~ . , data = HR)
new_observation <- HR_test[1,]

explainer_rf <- explain(model,
                        data = HR[1:1000,1:5],
                        y = HR$status[1:1000])

bd_rf <- local_attributions(explainer_rf,
                            new_observation)
plotD3(bd_rf)

session info

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=Polish_Poland.1250    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] breakDown2_0.9.1    reticulate_1.10     randomForest_4.6-14 iBreakDown_0.9.3   
[5] DALEX2_0.9         

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0       rstudioapi_0.9.0 magrittr_1.5     tidyselect_0.2.5
 [5] munsell_0.5.0    lattice_0.20-38  colorspace_1.4-0 R6_2.3.0        
 [9] rlang_0.3.1      plyr_1.8.4       dplyr_0.8.0.1    tools_3.5.2     
[13] grid_3.5.2       gtable_0.2.0     htmltools_0.3.6  digest_0.6.18   
[17] yaml_2.2.0       lazyeval_0.2.1   assertthat_0.2.0 tibble_2.0.1    
[21] crayon_1.3.4     Matrix_1.2-15    purrr_0.2.5      ggplot2_3.1.0   
[25] htmlwidgets_1.3  glue_1.3.0       compiler_3.5.2   pillar_1.3.1    
[29] r2d3_0.2.3       scales_1.0.0     jsonlite_1.6     pkgconfig_2.0.2

Word cummulative does not exist. Change it to cumulative.

The whole package really (use search).

local_attributions fails for classification: incorrect number of subscripts on matrix

For classification, local_attributions() returns the error:

Error in contribution[nrow(contribution), ] <- cummulative[nrow(contribution),  : 
  incorrect number of subscripts on matrix

One hint for the root cause might be the warning message thrown by the explainer - it tries to calculate numeric residuals which of course it cannot do:

      DALEX.explainer <- DALEX::explain(
        model = model_object,
        data = features,
        y = training.set$.outcome == TARGET.VALUE,
        label = paste(model_object$method, " model"),
        colorize = TRUE
      )
  
  A new explainer has been created!  
Warning message:
In mean.default(residuals) :
  argument is not numeric or logical: returning NA

Reproducible example:

random.case <- structure(list(anger = 0.166666666666667, anticipation = 0, disgust = 0.166666666666667, 
    fear = 0.166666666666667, joy = 0, negative = 0.25, positive = 0.0833333333333333, 
    sadness = 0.0833333333333333, surprise = 0.0833333333333333, 
    trust = 0), class = "data.frame", row.names = c(NA, -1L))

training.set <- structure(list(.outcome = structure(c(3L, 4L, 5L, 4L, 4L, 5L, 
5L, 4L, 3L, 3L, 3L, 5L, 4L, 3L, 3L, 1L, 4L, 3L, 4L, 5L, 3L, 2L, 
5L, 5L, 5L), .Label = c("1", "2", "3", "4", "5"), class = "factor"), 
    anger = c(0, 0.0434782608695652, 0, 0, 0, 0.1, 0, 0.037037037037037, 
    0.0192307692307692, 0, 0, 0, 0, 0.0673076923076923, 0.181818181818182, 
    0.0408163265306122, 0, 0, 0, 0.0285714285714286, 0.0526315789473684, 
    0.0952380952380952, 0, 0.0441176470588235, 0), anticipation = c(0.333333333333333, 
    0.217391304347826, 0.125, 0.15, 0.2, 0.2, 0.217391304347826, 
    0.111111111111111, 0.173076923076923, 0.166666666666667, 
    0.111111111111111, 0.157894736842105, 0.214285714285714, 
    0.115384615384615, 0.0909090909090909, 0.0408163265306122, 
    0, 0.166666666666667, 0, 0.114285714285714, 0.184210526315789, 
    0.0476190476190476, 0.133333333333333, 0.102941176470588, 
    0.176470588235294), disgust = c(0, 0, 0, 0, 0, 0, 0, 0.0185185185185185, 
    0.0192307692307692, 0.0833333333333333, 0.0740740740740741, 
    0, 0, 0.0288461538461538, 0, 0.0204081632653061, 0, 0, 0.111111111111111, 
    0, 0, 0.0952380952380952, 0, 0.0294117647058824, 0), fear = c(0, 
    0.0434782608695652, 0, 0.05, 0, 0, 0, 0.0185185185185185, 
    0, 0, 0, 0, 0, 0.0673076923076923, 0, 0.0408163265306122, 
    0, 0.0833333333333333, 0.111111111111111, 0, 0.0263157894736842, 
    0.0952380952380952, 0, 0.0294117647058824, 0), joy = c(0, 
    0.130434782608696, 0.166666666666667, 0.15, 0.233333333333333, 
    0.2, 0.173913043478261, 0.166666666666667, 0.0961538461538462, 
    0.166666666666667, 0.037037037037037, 0.210526315789474, 
    0.214285714285714, 0.0961538461538462, 0.181818181818182, 
    0.0204081632653061, 0.333333333333333, 0.0833333333333333, 
    0.222222222222222, 0.2, 0.105263157894737, 0.0952380952380952, 
    0.2, 0.147058823529412, 0.176470588235294), negative = c(0, 
    0.0869565217391304, 0.0833333333333333, 0.1, 0, 0, 0, 0.0555555555555556, 
    0.0769230769230769, 0.166666666666667, 0.0740740740740741, 
    0.0526315789473684, 0.0714285714285714, 0.105769230769231, 
    0.181818181818182, 0.204081632653061, 0, 0.166666666666667, 
    0.222222222222222, 0.0285714285714286, 0.105263157894737, 
    0.19047619047619, 0, 0.102941176470588, 0.0294117647058824
    ), positive = c(0.333333333333333, 0.217391304347826, 0.291666666666667, 
    0.4, 0.3, 0.3, 0.347826086956522, 0.333333333333333, 0.326923076923077, 
    0.25, 0.259259259259259, 0.315789473684211, 0.285714285714286, 
    0.240384615384615, 0.181818181818182, 0.244897959183673, 
    0.333333333333333, 0.25, 0.222222222222222, 0.4, 0.342105263157895, 
    0.238095238095238, 0.4, 0.235294117647059, 0.352941176470588
    ), sadness = c(0.333333333333333, 0.0434782608695652, 0.0416666666666667, 
    0, 0, 0, 0, 0.0185185185185185, 0.0576923076923077, 0, 0.0740740740740741, 
    0, 0, 0.0480769230769231, 0.0909090909090909, 0.142857142857143, 
    0, 0, 0.111111111111111, 0, 0.0526315789473684, 0.0952380952380952, 
    0, 0.0441176470588235, 0.0294117647058824), surprise = c(0, 
    0.0434782608695652, 0.0833333333333333, 0.05, 0.0666666666666667, 
    0, 0.0434782608695652, 0.037037037037037, 0.0192307692307692, 
    0, 0.111111111111111, 0.0526315789473684, 0, 0.0865384615384615, 
    0, 0.0408163265306122, 0, 0, 0, 0.0285714285714286, 0.0526315789473684, 
    0, 0.0666666666666667, 0.0735294117647059, 0.0294117647058824
    ), trust = c(0, 0.173913043478261, 0.208333333333333, 0.1, 
    0.2, 0.2, 0.217391304347826, 0.203703703703704, 0.211538461538462, 
    0.166666666666667, 0.259259259259259, 0.210526315789474, 
    0.214285714285714, 0.144230769230769, 0.0909090909090909, 
    0.204081632653061, 0.333333333333333, 0.25, 0, 0.2, 0.0789473684210526, 
    0.0476190476190476, 0.2, 0.191176470588235, 0.205882352941176
    )), row.names = c(NA, 25L), class = "data.frame")

library("caret")
library("dplyr")

model.rf <- caret::train(
  form = .outcome ~ .,
  data = training.set,
  method = "rf", 
  trControl = caret::trainControl(
    method = "repeatedcv", number = 5, repeats = 5)
)

target <- training.set$.outcome
features <- training.set %>% select(-.outcome)

TARGET.VALUE <- "1"

# model_object was undefined in this snippet; model.rf is the fitted model
DALEX.explainer <- DALEX::explain(
        model = model.rf,
        data = features,
        y = target == TARGET.VALUE,
        label = paste(model.rf$method, " model"),
        colorize = TRUE
  )

DALEX.attribution <- DALEX.explainer %>%
        iBreakDown::local_attributions(random.case)

Baseline is ignored?

I'm running the following code:


set.seed(17)
x1 <- runif(1000, -10, 10)
x2 <- runif(1000, -10, 10)
y <- 0.05*x1^2 + 0.05*x2^2

v2_df <- data.frame(x = x1,
                    y = x2,
                    z = y)

true_model <- function(model, newdata) {
  0.05*newdata[, 1]^2 + 0.05*newdata[, 2]^2
}

library(DALEX)
v2_explainer <- explain(list(), data = v2_df[, -3], predict_function = true_model,
                        label = "double_quadratic")

library(iBreakDown)
ibd_expl_1 <- local_attributions(v2_explainer, data.frame(x = -6, y = -6))
ibd_expl_2 <- local_attributions(v2_explainer, data.frame(x = -6, y = -6), baseline = 0)

ibd_plot_1 <- plot(ibd_expl_1, baseline = 0)
ibd_plot_2 <- plot(ibd_expl_2, baseline = 0)

ibd_plot_1
ibd_plot_2

I'm getting the same plot in both cases, and neither of them starts at 0.

iBreakDown goes to CRAN

After #22 and #23
set version to 1.0.0
and send iBreakDown to CRAN

examples for SER

library("titanic")
head(titanic_train)


titanic_small <- titanic_train
titanic_small <- titanic_small[,c("Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked")]
titanic_small$Survived <- factor(titanic_small$Survived)
titanic_small$Sex <- factor(titanic_small$Sex)
titanic_small$Embarked <- factor(titanic_small$Embarked)
titanic_small <- na.omit(titanic_small)
titanic_train[760,]



library("randomForest")
rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + 
                         Parch + Fare + Embarked, 
data = titanic_small)
rf_model


library("breakDown2")
library("DALEX2")
predict_fuction <- function(m,x) predict(m, x, type = "prob")[,2]
rf_explain <- explain(rf_model, data = titanic_small,
                      y = titanic_small$Survived == "1", label = "RF",
                      predict_function = predict_fuction)

# plot D3 explainers
library("breakDown2")
rf_la <- local_attributions(rf_explain, titanic_small[2,])
rf_la
plotD3(rf_la)


rf_la <- local_attributions(rf_explain, titanic_small[2,], order = 2:8)
rf_la
plotD3(rf_la)

rf_la <- local_attributions(rf_explain, titanic_small[2,], 
                            order = c("Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"))
rf_la
plotD3(rf_la)

rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("Age", "Sex", "Pclass", "SibSp"))
rf_la
plotD3(rf_la, max_features = 10)

rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("Sex", "Age", "Pclass", "SibSp"))
rf_la
plotD3(rf_la, max_features = 10)


rf_la <- local_attributions(rf_explain, titanic_small[7,], 
                            order = c("SibSp", "Age", "Sex", "Pclass"))
rf_la
plotD3(rf_la, max_features = 10)


rf_la <- local_interactions(rf_explain, titanic_small[7,])
rf_la

rf_la <- local_interactions(rf_explain, titanic_small[7,], interaction_preference  = 50)
rf_la
plotD3(rf_la)
## 
# Model level Uncertainty 

# create 10 bootstrap samples

models <- lapply(1:10, function(i) {
  titanic_B <- titanic_small[sample(1:nrow(titanic_small), replace = TRUE),]
  
  rf_model <- randomForest(Survived ~ Pclass + Sex + Age + SibSp + 
                             Parch + Fare + Embarked, 
                           data = titanic_B)
  rf_model
})


attributions <- lapply(models, function(rf_model) {
  # titanic_B exists only inside the bootstrap loop above, so use titanic_small here
  rf_explain <- explain(rf_model, data = titanic_small,
                        y = titanic_small$Survived == "1", label = "RF",
                        predict_function = predict_fuction)
  local_attributions(rf_explain, titanic_small[7,])
})

local_interactions(rf_explain, titanic_small[7,])

plotD3(attributions[[1]], max_features = 10)
plotD3(attributions[[2]], max_features = 10)
plotD3(attributions[[3]], max_features = 10)

plot() gives BD profiles for all target classes

In a multiclass use case, one needs the Break Down profile for only one target class; however, plot() produces profiles for ALL target classes. How can plot() be controlled to show only the required target class? Below is dummy code similar to my actual use case. I need to see the BD plot for "DF lrnr exp.alpha" only; the two others should not be shown. This kind of selection is needed when the number of target classes and/or the number of variables is large. Here is the dummy code:
library(DALEX)
library(DALEXtra)
library(tidyverse)
library("mlr3verse")

df = data.frame(w = c(34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37),
                x = c('a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c'),
                y = c(TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE),
                z = c('alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi')
)

df_task <- TaskClassif$new(id = "my_df", backend = df, target = "z")
df_lrn <- lrn("classif.rpart", predict_type = "prob")
df_lrn$train(df_task)

df_lrn_exp <- explain_mlr3(df_lrn,
                           data = df[,-4],
                           y = df$z,
                           label = "DF lrnr exp")
df_BD <- predict_parts(df_lrn_exp, df[3,], type = 'break_down')
plot(df_BD, max_features = 5, add_contributions = TRUE)

bug in plotting distributions on R3.6

It occurs when the variable column is a factor instead of character (default for R3.6, changed in R4.0).

library(DALEX)
model_lm <- lm(m2.price~., data = apartments)
explainer_lm <- DALEX::explain(model_lm, data = apartments, y = apartments$m2.price)
mp <- iBreakDown::local_attributions(explainer_lm, apartments[3,], keep_distributions=TRUE)

mp$variable <- as.factor(mp$variable)

plot(mp, plot_distributions = TRUE)

Potential fix: add stringsAsFactors=TRUE to all of the results and fix vorder in plot_break_down_distributions
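
The version dependence comes from the default of stringsAsFactors in data.frame(), which changed from TRUE to FALSE in R 4.0. Setting it explicitly gives the same column type on every R version, as this small base R sketch shows; the variable names are taken from the example above and the ordering trick at the end is a suggestion, not the package's fix.

```r
as_factor <- data.frame(variable = c("surface", "floor"), stringsAsFactors = TRUE)
as_char   <- data.frame(variable = c("surface", "floor"), stringsAsFactors = FALSE)

class(as_factor$variable)   # "factor"
class(as_char$variable)     # "character"

# A factor built with default levels sorts them, which can reorder a plot.
# Explicit levels preserve the row order instead.
ordered_variable <- factor(as_char$variable, levels = unique(as_char$variable))
levels(ordered_variable)    # "surface" "floor"
```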

iBreakDown article goes to arxiv

Please ask someone for review of the iBreakDown article
upload iBreakDown article to arxiv
add link to it in README.md, CITATION and maybe DESCRIPTION

generating SHAP values.

Hello,
First of all, thank you so much for your contribution to making millions of people like me more educated about "black box" models.
I have a quick question regarding SHAP value generation using the DALEX/iBreakDown packages.
xgb <- DALEX::explain(xgb_03, data = testing[, 1:345], y = testing$TRANSITIONED == "YES", label = "for Member L00274353401")

ive_xgb <- iBreakDown::shap(xgb, new_observation = filter(testing[, 1:345], MEMBER == 'TX00274353401'))

When I run this code, the first object is created without problems, but the second object (ive_xgb) is still being computed after three hours. Do you have any suggestions?
Thank you
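
A rough cost estimate may explain the running time. The sketch assumes that each of the B random orderings re-predicts the whole background data once per variable; this is an assumption about the implementation, not something confirmed in this thread. The row count of 100000 and B = 25 are hypothetical values for illustration, while 345 variables, 500 rows and B = 5 come from the code in the plotting question above.

```r
# Hypothetical cost model: single-row predictions needed for one explanation.
estimate_predictions <- function(n_rows, n_variables, B) {
  n_rows * n_variables * B
}

estimate_predictions(n_rows = 100000, n_variables = 345, B = 25)  # 862500000
estimate_predictions(n_rows = 500,    n_variables = 345, B = 5)   # 862500
```

Under this assumption, passing a random subsample of the data to the explainer and lowering B reduces the work by three orders of magnitude in this example.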

Error-message: subscript out of bounds

When I try to fit an XGBoost model on the famous Diabetes dataset, I get the message "Subscript out of bounds". See the code below.

library(tidyverse)
library(Hmisc)
library(xgboost)
library(iBreakDown)
library(tictoc)
library(recipes)

Load dataset

#Diabetes <- read_csv("https://www.kaggle.com/saurabh00007/diabetescsv/diabetes.csv")
Diabetes <- read_csv("diabetes.csv")

Summarise dataset

d <- describe(Diabetes)
plot(d)

Data Pre-processing, bring outliers back to values within certain range

Diabetes_Recept <- recipe(Outcome ~ ., data = Diabetes) %>%
step_range(Pregnancies, min = 0, max = 10) %>%
step_range(Glucose, min = 80, max = 150) %>%
step_range(BloodPressure, min = 50, max = 100) %>%
step_range(SkinThickness, min = 10, max = 50) %>%
step_range(Insulin, min = 10, max = 200) %>%
step_range(Age, min = 20, max = 70) %>%
step_range(BMI, min = 20, max = 55)

Diabetes_prep <- prep(x = Diabetes_Recept,
training = Diabetes)

Diabetes_bake <- bake(object = Diabetes_prep,
new_data = Diabetes)

Prepare for modeling

Y.train <- Diabetes$Outcome
features <- select(Diabetes_bake, -Outcome)
X.train <- features %>% data.matrix()

Fit Xgboost Model

tic()
set.seed(12)

param <- list(objective = "binary:logistic", # For classification
eval_metric = "auc", # auc is used for classification
max_depth = 4,
eta = 0.3, # Learning rate
subsample = 0.8,
colsample_bytree = 0.8,
min_child_weight = 2,
scale_pos_weight = sum(Y.train == 0) / sum(Y.train == 1),
max_delta_step = 8)

XGB_Model <- xgboost(data = X.train, label = Y.train, params = param, nround = 100, verbose = F)

toc()

Look at the shap plots

xgb.plot.shap(data = X.train,
model = XGB_Model,
top_n = 8,
n_col = 2,
ylab = "Probability of Diabetes")

Make explain object

predict_logit <- function(model, x) {
raw_x <- predict(model, x)
exp(raw_x)/(1 + exp(raw_x))
}

Explainer_XGB <- DALEX::explain(model = XGB_Model,
label="Extreme Gradient Boosting",
data = X.train,
predict_function = predict_logit,
y = Diabetes$Outcome)

predictions <- predict(XGB_Model, newdata= X.train, type="prob")

case1 <- as.matrix(X.train[1,])

Explain model outcomes on individual case level

After running the next command I get the error message

explain1 <- break_down(x = Explainer_XGB,
new_observation = case1,
interactions = FALSE)

plot(explain1,
max_features = 5,
vcolors = c("green", "red", "purple") )
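
One possible cause, offered here as a hypothesis rather than a confirmed diagnosis, is the shape of case1. Selecting a single row of a matrix drops it to a vector, and as.matrix() then turns that vector into a one-column matrix instead of a one-row observation. The base R sketch below shows the difference on a small stand-in matrix.

```r
# Small stand-in for X.train: 2 observations, 3 named features.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

as_column <- as.matrix(m[1, ])      # row dropped to a vector, then a 3 x 1 matrix
as_row    <- m[1, , drop = FALSE]   # stays a 1 x 3 matrix with column names

dim(as_column)        # 3 1
dim(as_row)           # 1 3
colnames(as_column)   # NULL: feature names are now row names
colnames(as_row)      # "a" "b" "c"
```

Code that looks up features by column name or position would fail on the 3 x 1 version, so building the observation with drop = FALSE is worth trying first.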

`order` in `local_interactions()`

How to set the fixed order of variables with interactions?
This doesn't work:

library("randomForest")
explain_rf_v6 <- archivist::aread("pbiecek/models/9b971")

library("DALEX")
johny_d <- archivist::aread("pbiecek/models/e3596")

library("iBreakDown")
ibd_rf <- local_interactions(explain_rf_v6,
                             johny_d, 
                             order = c("age:class", "gender", "fare", "parch", "sibsp", "embarked" ))

It would be helpful to have an example in the documentation.

plot shapley values with plot.local_attributions_uncertainty() function

currently local_attributions_uncertainty supports the order argument
If order = "average" then shapley values shall be presented

[shap] NA in variable_value column

library("xgboost")
library("DALEX")

model_matrix <- model.matrix(status == "fired" ~ . -1, HR)
data <- xgb.DMatrix(model_matrix, label = HR$status == "fired")

params <- list(max_depth = 2, eta = 1, silent = 1, nthread = 2,
               objective = "binary:logistic", eval_metric = "auc")

model_HR <- xgb.train(params, data, nrounds = 50)

explainer_HR <- explain(model_HR,
                        data = model_matrix,
                        y = HR$status == "fired",
                        verbose = FALSE)

library(iBreakDown)
# this works
break_down(explainer_HR, model_matrix[1,,drop=FALSE])

# this has NA in variable_value (and rownames)
shap(explainer_HR, model_matrix[1,,drop=FALSE])

Differently calculated diffs in local_attributions and local_interactions

iBreakDown/R/local_attributions.R

Line 126 in 43b6e0b

mean((average_yhats[[i]] - baseline_yhat)^2)

iBreakDown/R/local_interactions.R

Line 143 in 43b6e0b

diffs_1d <- average_yhats - baseline_yhat

DALEXverse 0.19.8 release summer 2019

Integration

assigned: @pbiecek

Code review

consistency: names of functions
consistency: names of files
consistency: names of variables in functions (local and global)
length: functions
readability: code (comments, constructions)

assigned: @maksymiuks

Feature review

readability: documentation (title, description, details)
readability: examples (relevant, complete, with comments)
reproducibility: tests (code coverage)
links to functions: \code

assigned: @kasiapekala

prepare tests

increase the code coverage to 95% or more

Choose colors with vcolors in plot.break_down

In the plot function you can specify the colour of the bars with the "vcolors" argument.
I want to colour risk-increasing factors "red", risk-lowering factors "green", and the prediction of the local model "purple".

Therefore I specified these three colors. 90% of the time I get the desired effect. Sometimes, however, the features that raise the probability turn green instead of red.
Is it possible to label the colors, like "higher-score = red", "lower-score = green" and "prediction = purple", or something like that?
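
The labelled mapping asked for here could look roughly like the base R sketch below. It is a hypothetical helper, not an existing iBreakDown argument or function; the names `label_colors` and `color_for` are made up for illustration.

```r
# Hypothetical helper (not part of iBreakDown): choose a bar color by name,
# based on the sign of each contribution.
label_colors <- c(higher = "red", lower = "green", prediction = "purple")

color_for <- function(contribution,
                      is_prediction = rep(FALSE, length(contribution))) {
  # Zero contributions are treated as "higher" in this sketch
  key <- ifelse(is_prediction, "prediction",
                ifelse(contribution >= 0, "higher", "lower"))
  unname(label_colors[key])
}

# Two variable contributions followed by the final prediction bar
bar_colors <- color_for(c(0.12, -0.05, 0.40), c(FALSE, FALSE, TRUE))
```

Because colors are looked up by name rather than by position, a positive contribution cannot end up with the "lower" color.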

which LICENSE?

Hi,
could you please add a license for this project? This would be very helpful for using it in other projects.

Enhancement of label-functionality

It would be nice if the plot function of iBreakDown had two extra arguments:

An argument to add a subtitle, which can be used for a case-specific label (for example the identification and name of the case), instead of having to override the model label.
An argument to specify user-friendly variable labels, which can then be used in the Break Down plot instead of the technical variable names used in the model. This is especially useful when communicating with end users about the explanation of a prediction (which is one of the goals of explaining models).
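
The second request amounts to a lookup from technical names to display labels. A minimal base R sketch of that idea follows; `variable_labels` and `relabel` are hypothetical names and not part of the package.

```r
# Hypothetical lookup: technical variable name -> label shown to end users
variable_labels <- c(age   = "Age (years)",
                     fare  = "Ticket price",
                     sibsp = "Siblings/spouses aboard")

relabel <- function(variables, labels) {
  out <- unname(labels[variables])
  missing_label <- is.na(out)
  out[missing_label] <- variables[missing_label]  # keep the original name when no label is given
  out
}

display_names <- relabel(c("age", "gender", "fare"), variable_labels)
```

Variables without an entry in the lookup keep their technical name, so a partial label list is enough.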

Error in describe(bd_rf) : could not find function "describe"

@AdamIzdebski would you check if pkgdown::build_site() and devtools::check() works after your changes?
I have the following issue:

Reading 'vignettes/vignette_iBreakDown_description.Rmd'
Error in describe(bd_rf) : could not find function "describe"
In addition: Warning message:
`chr_along()` is deprecated as of rlang 0.2.0.
This warning is displayed once per session. 
Error in describe(bd_rf) : could not find function "describe"

Use feature instead of variable in DALEXverse descriptions (?)

Replicate figure 5 from the iBreakDown paper

add function that adds error bars with explanation level uncertainty

Change github repository description

to something like "Model agnostic tool for decomposition of predictions from black boxes." (from DESCRIPTION file)

Sizing of plotD3 in Shiny

Hello,

I'm attempting to use an iBreakdown plot in a Shiny app and I'm finding I'm unable to increase the size of the plot. According to r2d3 documentation, the plot should fill the area so I'm wondering if there's a bug on the 'iBreakDown' side rather than the 'r2d3' side.

Below, I've tried setting the container height of both the d3Output as well as the column within the fluidRow. I also tried setting scale_height=TRUE.

library(shiny)
library(r2d3)
library(DALEX)
library(iBreakDown)

ui <- fluidPage(
  fluidRow(
    column(12, style = "height:800px;",
           d3Output("d3", height = "800px")
           )
    )
  )

server <- function(input, output) {
  
  output$d3 <- renderD3({

    titanic <- na.omit(titanic)
    set.seed(1313)
    titanic_small <- titanic[sample(1:nrow(titanic), 500), c(1,2,6,9)]
    model_titanic_glm <- glm(survived == "yes" ~ gender + age + fare,
                             data = titanic_small, family = "binomial")
    explain_titanic_glm <- explain(model_titanic_glm,
                                   data = titanic_small[,-9],
                                   y = titanic_small$survived == "yes",
                                   label = "glm")
    bd_glm <- local_attributions(explain_titanic_glm, titanic_small[1, ])
    plotD3(bd_glm) # also tried adding 'scale_height=TRUE'
    
  })
}

shinyApp(ui = ui, server = server)

As for my info:

R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] iBreakDown_0.9.9 DALEX_0.4.7      r2d3_0.2.3       shiny_1.3.2     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1       compiler_3.5.3   pillar_1.3.1     later_0.8.0     
 [5] plyr_1.8.4       tools_3.5.3      digest_0.6.19    jsonlite_1.6    
 [9] tibble_2.1.1     gtable_0.2.0     pkgconfig_2.0.2  rlang_0.3.4     
[13] rstudioapi_0.10  yaml_2.2.0       xfun_0.5         dplyr_0.8.0.1   
[17] knitr_1.22       htmlwidgets_1.3  grid_3.5.3       tidyselect_0.2.5
[21] glue_1.3.1       R6_2.4.0         ggplot2_3.1.0    purrr_0.3.2     
[25] magrittr_1.5     scales_1.0.0     promises_1.0.1   htmltools_0.3.6 
[29] assertthat_0.2.1 mime_0.6         xtable_1.8-3     colorspace_1.4-1
[33] httpuv_1.5.0     lazyeval_0.2.2   munsell_0.5.0    crayon_1.3.4

Error when explaining random forest model

I get the following error when using iBreakDown in combination with a Random Forest model:
Error in yhat.default(x, data) : (list) object cannot be coerced to type 'double'

library(tidyverse)
library(DALEX)
library(iBreakDown)
library(randomForest)
library(parsnip)  # provides rand_forest(), set_engine() and fit() used below

setwd("~/2019 Diabetes")

Diabetes <- read_csv("diabetes.csv")

Diabetes <- Diabetes %>%
  mutate(Outcome = as.factor(Outcome))

set.seed(57974)

RF_Fit <-
  rand_forest(mode = "classification", mtry = 5, trees = 100) %>%
  set_engine("randomForest") %>%
  fit(Outcome ~ ., data = Diabetes)

Then I explain individual cases with iBreakDown:

# explain() expects a lowercase `y` holding the target vector
Explainer_Rf_Diabetes <- DALEX::explain(RF_Fit,
                                        data = Diabetes[, 1:8],
                                        y = Diabetes$Outcome == "1")

person1 <- Diabetes %>%
  slice(1) %>%
  select(-Outcome)

Cp_Rf1 <- break_down(Explainer_Rf_Diabetes,
                     new_observation = person1)

plot(Cp_Rf1)
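
One plausible cause, not confirmed in this thread, is that predict() for this model returns a data frame of class probabilities rather than a numeric vector. The base R sketch below shows the general workaround of wrapping predict() so that it returns a plain numeric vector. The toy model class, the column name `.pred_1`, and the helper `predict_positive` are assumptions made for illustration; such a wrapper would be passed through a `predict_function` argument like the one used elsewhere in these issues.

```r
# Toy stand-in for a model whose predict() returns a data frame of
# class probabilities (hypothetical class and column names).
toy_model <- structure(list(), class = "toy_prob_model")
predict.toy_prob_model <- function(object, newdata, ...) {
  p <- plogis(newdata$x)
  data.frame(.pred_0 = 1 - p, .pred_1 = p)
}

# Wrapper that extracts the positive-class column as a numeric vector,
# which is the shape an explainer can average over.
predict_positive <- function(model, newdata) {
  as.numeric(predict(model, newdata)[[".pred_1"]])
}

probs <- predict_positive(toy_model, data.frame(x = c(-1, 0, 2)))
```

The wrapper keeps one value per row of `newdata`, so averages of predictions stay well defined.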

Problem with predict function

library(DALEX2)
library(ggplot2)
library(breakDown2)

head(HR)
new_observation <- HR_test[1,]
new_observation

library(nnet)
m_glm <- multinom(status ~ . , data = HR, probabilities = TRUE, model = TRUE)

p_fun <- function(object, newdata){predict(object, newdata=newdata, type="prob")}

bd_glm <- local_attributions(m_glm,
                            data = HR_test,
                            new_observation =  new_observation,
                            keep_distributions = TRUE,
                            predict_function = p_fun)

#bd_glm
plot(bd_glm)
plot(bd_glm, start_baseline = TRUE)
plot(bd_glm, plot_distributions = TRUE)

Printing bd_glm causes an error.
plot(bd_glm) overlaps the figures.

Plotting distributions works well.

Hidden last contribution label in the break_down plot

@hbaniecki Thank you for pointing us to the docs for iBreakDown, etc.!
I don't know if this is helpful but I got this minor issue a while ago for binary classification models.
The prediction for the positive cases works fine, but for negative cases the prediction becomes unreadable.

Adding vcolors to the generic plot function worked for me:

bd_glm <- variable_attribution(explainer_glm, 
                               new_observation = x_test[1,],
                               type="break_down")

bd_glm2 <- variable_attribution(explainer_glm,
                                new_observation = x_test[30,],
                                type="break_down")
bd_colors <- c("#f05a71","#4378bf", "#8bdcbe", "#ffa58c")

p1 <- plot(bd_glm2, vcolors = bd_colors)
p2 <- plot(bd_glm, vcolors = bd_colors)

Thanks for your reply!

Originally posted by @marcjermaine-pontiveros in ModelOriented/DALEX#176 (comment)

extracted_contributions <- sapply(result, function(chunk) {
  chunk[order(chunk$label, chunk$variable), "contribution"]
})
result_average <- result[[1]]
result_average <- result_average[order(result_average$label, result_average$variable),]
result_average$contribution <- rowMeans(extracted_contributions)
result_average$B <- 0
result <- c(list(result_average), result)