modeloriented / kernelshap Goto Github PK
View Code? Open in Web Editor NEWEfficient R implementation of SHAP
Home Page: https://modeloriented.github.io/kernelshap/
License: GNU General Public License v2.0
Efficient R implementation of SHAP
Home Page: https://modeloriented.github.io/kernelshap/
License: GNU General Public License v2.0
I'm trying to calculate SHAP values for a rf with 2300 trees and I keep getting this message.
Error in kernelshap_one(x = X[i, , drop = FALSE], v1 = v1[i, , drop = FALSE], :
task 1 failed - "no applicable method for 'predict' applied to an object of class "ranger""
The demo with the diamonds dataset works fine so I don't know why it doesn't work with my own data.
The rf syntax looks like this:
rf= ranger(OverOrdAltiAltFactor~.,
data= df,
num.trees = 2300,
probability = TRUE)
The syntax for SHAP calculation:
shaps= kernelshap(rf, sample_big, bg_X= sample_small, parallel=TRUE)
If I dont use "probability= TRUE" wirh ranger, I get an error that "predictions must be a vector, matrix, data.frame or 2D array".
I can't share the data unfortunately.
In some situations, X
contains non-feature columns. An example is a model
lm(y ~ . - x1, data = X)
.
Its predict function will launch an error if "x1" is dropped from X
. Thus, if we wish to use kernelshap
on this model, we would need to calculate SHAP values also for column "x1", which is unnecessary.
Thus, it would be nice to have an argument feature_names = colnames(X)
to reduce the feature set in such cases.
Does this also work for the calculation of asymmetric shapley values, that allow their calculation by incorporating causal information with domain knowledge? The github for a package implementing asymmetric shapley values is here:
https://github.com/nredell/shapFlex
This is made by using what the researchers have found in this paper:
https://proceedings.neurips.cc/paper/2020/file/0d770c496aa3da6d2c3f2bd19e7b9d6b-Paper.pdf
There is also a talk by the researchers explaining the theory on youtube:
MacOS does not recognize text in math Latex. Need to replace by textrm. Fixed in #89
The core function requires a nested loop to update A and b. The inner loop can be replaced. This makes the algo easier to read.
The result of kernelshap()
applied to a multiresponse model containts an unnamed list S
of shap values. It would be nice if the elements would be named according to the prediction function output. Similar for SE
.
SHAP kernel weights put at least 0.75 of the mass to vectors z with sum(z) in {1, p-1}.
The current sampling strategy works by drawing vectors z from this distribution. Depending on p, >=75% of the draws will fall in this category. Not much mass is left for all other cases. This appears to be very wasteful as there are only 2p distinct z vectors with sum(z) in {1, p-1}.
Much better seems a hybrid between the exact approach (listing all possible z vectors and weighting them accordingly) and the sampling approach (drawing z vectors from the right distribution):
Weight corrections as in the exact case will be necessary to weight up the exact part and (uniformly) weight down the random part.
The Python implementation even goes a step further and also tries to enumerate all z with sum(z) in {2, p-2} etc. While we could also do this, we can prefer the much simpler approach outlined here and rely on the iterative sampling logic of Covert and Lee (2021). A drawback of also enumerating the cases with sum(z) in {2, p-2} etc. is that the sample size m must be relatively large.
Comparing this hybrid case with the Python logic, we can say: Python uses a larger m, and we use a smaller m but iterate until SHAP values are sufficiently precise. Both makes sense.
I'm trying to calculate SHAP-values of random forests from the ranger package. The random forests have between 1000 and 8000 trees and the datasets between 5000 and 37000 rows. The only model I managed to calculate SHAP-values for is a model that had 850 trees. The rest all crashed. I find it odd that none of the tutorials or vignettes says this could happen so I am wondering if there is something wrong with my models or whether this is expected. I cannot post the data for privacy issues (government data) but the variables are all numerical except one binary variable. One group of datasets have 8 variables (there are datasets for different years) and the other 32. The dependent variable is a 3-level factor. I'm using a 32GB Macbook pro and R 4.1.2.
This is the code I am using:
ntree=850
set.seed(199)
model <- ranger(Satisfaction ~ .,
data = df20,
importance = 'permutation',
num.trees = ntree20,
mtry=3,
probability = TRUE,
case.weights = class_weights[df20$Satisfaction])
set.seed(199)
xvars=colnames(df20[-9])
X= df20[sample(nrow(df20),500), xvars]
bg_X= df20[sample(nrow(df20),200),]
shap_rf= kernelshap(model, X, bg_X = bg_X)
Hello,
I want to try to run kernelshap with the example related to mlr3, ranger package but it reports an error as in the title, what should I do about this?
I am proposing to remove the ks_extract()
function. It allows to extract objects like S
from a kernelshap object x
. However, the usual extractors x$S
or x[["S"]]
will suffice.
Thanks for this awesome package!
I was wondering if it might be possible to have an option for parallel processing? I'm thinking specifically of the big for
loop:
for (i in seq_len(n)) res[[i]] <- kernelshap_one(...)
which could easily be replaced with foreach
:
res <- foreach(i = seq_len(n)) %dopar% kernelshap_one(...)
The downside is, you'd lose the progress bar. (There are some methods for doing parallel progress bars in R, but they all kind of suck in my experience.) The upside is, you can speed up the slowest part of the task by a factor of n_cores
. Perhaps a logical parallel
argument could allow users to toggle between the two?
Thank you for developing so kind package! But, when I apply this function, there was a problem for suppressing message.
The scrits like:
library(kernelshap)
ks <- kernelshap(model_final,X=training_data,bg_X = training_data, verbose = FALSE)
output:
1/1 [==============================] - 0s 21ms/step
1/1 [==============================] - 0s 18ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 2ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
19/19 [==============================] - 0s 1ms/step
I try 2 ways to suppress this messages, but those ways do not work.
suppressMessages({ ks <- kernelshap(model_final,X=training_data,bg_X = training_data, verbose = FALSE) })
invisible(capture.output(kernelshap(model_final,X=training_data,bg_X = training_data, verbose = FALSE)))
See title
Currently, the background dataset is not allowed to have only a single row. This is not ideal as there are certain important cases where this is explicitly wanted:
In such cases, the algorithm will run extremely fast.
This confused me for a bit because I assumed the L1 penalty was part of the optimization itself (i.e., just another constraint in Eq. 6 of the Covert & Lee paper). But a closer look at the shap code reveals that this is not the case. They just fit a lasso for feature selection then remove some column(s) from Z and run the regular pipeline.
We could implement this with glmnet
. We'd need an l1
argument, with possible values NULL
(the default, no regularization), "aic"
, "bic"
(for adaptive feature selection), or some integer less than p to set a target number of variables. Then within kernelshap_one
, we'd have something like the following (after Z
and vz
are defined):
# Optional feature selection
if (!is.null(l1)) {
# Fit lasso
fit <- glmnet(x = Z, y = (vz - v0_ext), weights = w, intercept = FALSE)
if (l1 %in% c('aic', 'bic')) {
# Compute information criteria for adaptive feature selection
y_hat <- predict.glmnet(fit, newx = Z, s = fit$lambda)
eps <- y_hat - (vz - v0_ext)
rmse <- sqrt(colMeans(eps^2))
ll <- sapply(seq_len(length(f$lambda)), function(l) {
sum(dnorm(eps[, l], sd = rmse[l], log = TRUE))
})
aic <- 2 * fit$df - 2 * ll
bic <- log(n) * fit$df - 2 * ll
k <- ifelse(crit == 'aic', which.min(aic), which.min(bic))
keep <- predict.glmnet(fit, s = fit$lambda[k], type = 'nonzero')$s1
} else {
# Or select some fixed number of features. This can be tricky when
# no value of lambda delivers precisely the target number of nonzeros
if (l1 %in% fit$df) {
k <- max(which(fit$df == l1))
keep <- predict.glmnet(fit, s = fit$lambda[k], type = 'nonzero')$s1
} else {
k <- min(which(fit$df >= l1))
beta <- abs(as.numeric(coef.glmnet(fit, s = fit$lambda[k])))
keep <- order(beta, decreasing = TRUE)[1:l1]
}
}
Z <- Z[, keep]
}
This updated Z
is then used to define A
and b
. Of course, we'd need to set the beta
for dropped features to 0 prior to output.
What do you think? I can create a pull request if you like.
kernelshap involves working with different matrices of different shapes.
We should consistently use feature names for the "p" dimension and prediction names (if available) for the "K" dimension. This would make the code easier to read.
I try to explain long short-term memory (LSTM) model with keras package, but does not work. The codes are:
`library(keras)
library(kernelshap)
set.seed(123)
x <- matrix(runif(100), ncol = 5)
y <- runif(20)
n_samples <- 20
n_timesteps <- 2
n_features <- 5
x_samples <- array(0, dim = c(n_samples, n_timesteps, n_features))
y_samples <- array(0, dim = c(n_samples, 1))
for (i in 1:(n_samples - n_timesteps + 1)) {
x_samples[i,,] <- x[i:(i + n_timesteps - 1),]
y_samples[i,] <- y[i + n_timesteps - 1]
}
model <- keras_model_sequential()
model %>%
layer_lstm(units = 32, input_shape = c(n_timesteps, n_features)) %>%
layer_dense(units = 1) %>%
compile(loss = 'mean_squared_error',optimizer = optimizer_adam())
model %>% fit(x_samples, y_samples, epochs = 20, batch_size = 1, verbose = 0)
ks <- kernelshap(model, X=x_samples, bg_X = x_samples, verbose = F)`
Error in kernelshap.default(model, X = x_samples, bg_X = x_samples, verbose = F) :
is.matrix(X) || is.data.frame(X) is not TRUE
kernelshap()
calculates predictions on the explainer data X
. For some applications, it would be convenient to return the X
,
Thus I am proposing to add a matrix of predictions
to the output of kernelshap()
.
Just to make it easier to follow the documentation
I would like to reduce the dependency footprint of "kernelshap". One candidate is MASS::ginv()
, which is the only function imported from "MASS". Thus, if MASS allows, I would love to use ginv()
directly, without MASS.
The package contains a large number of ks_*()
extraction functions. In order to not clutter the memory, they should be replaced by a single ks_extract(x, what = "S")
function.
Test result of "kernelshap" has two warns due to orphaned dependency "dorng":
https://cran.r-project.org/web/packages/doRNG/index.html
We used it to produce reproducible results under parallel computing. Since kernelshap's results are quite unaffected by randomness, we can replace it by %dopar%
from "doFuture". "dofuture" has a very stable maintainer.
With paired sampling, 2m on-off vectors are drawn, with simple sampling only m. This makes a direct comparison very unfair.
To fix this inconsistency, we modify paired sampling by drawing only m/2 on-off vectors and then add their complements. To ensure identical results for the (default) paired strategy, the "auto" values of m are doubled.
This will leave most applications of kernelshap() unchanged, but fixes the inconsistency.
A pain-point in using {kernelshap} is the manual preparation of the background data bg_X
. Most applications have a relatively large explanation data X
and a model m
, but no background data. It would be convenient to derive the background data automatically from X
, so that a SHAP analysis would start with:
shp <- kernelshap(m, X)
Suggestion: Set the default bg_X = NULL
. In this case, use this logic here:
nrow(X) <= 200
-> bg_X = X
nrow(X) < 20
-> warning("X is to small to be used as background data. Please pass a larger background data via 'bg_X'.")nrow(X) > 200
-> message("X is too large to be used as background data. We randomly select 200 rows from it.") and do subsamplingIf bg_X = NULL
and the user wants to pass a vector of case weights bg_w
, the latter would need to correspond to X
(we need to subsample correspondingly)
Ping @pbiecek
I am dealing with a large dataset (27000 x 140) and when I first try without parallel computation of kernalSHAP, there is no movement in the progress bar (it stays at 0). When I turn on parallel processing using what you have described on the home page of your repo, I get the error:
Error in unserialize(socklist[[n]]) : error reading from connection
What should I do? I am using windows, core i7 12700k.
I think we can adapt the existing code for exact solutions to extend the approach beyond
A
, which only depends on p
. So a simple utils function will do the trick. (Fun fact, the formula for off-diagonal values in Covert & Lee's Appx. A is actually wrong. You can verify this by comparing it with brute force results as in your existing crossprod(Z) / n_Z
when Z
is exact.)# Here's the proposed utils function
off_diag <- function(p) {
S <- 1:(p - 1)
probs <- (p - 1) / (choose(p, S) * S * (p - S))
probs <- probs / sum(probs)
c_pr <- (0.5 * S^2 - 0.5 * S) / choose(p, 2)
sum(probs * c_pr)
}
# Then A is just given by
if (exact) {
A <- diag(0.5, nrow = p)
A[A == 0] <- off_diag(p)
...
}
b
without any extra rows beyond the Z
-vectors. The trick is just to use those weights that are currently computed as part of create_exact_Z
, but deliver them along with the matrix Z
. This can be done outside the kernelshap_one
loop to avoid unnecessary computation.# Another utils function, this one to replace create_exact_Z
complete_Z <- function(p) {
# Create Z
z <- do.call(expand.grid, replicate(p, 0:1, simplify = FALSE))
z <- z[2:(nrow(z) - 1), ]
z <- as.matrix(z)
dimnames(z) <- NULL
rs <- rowSums(z)
# Compute weights
S <- 1:(p - 1)
probs <- (p - 1) / (choose(p, S) * S * (p - S))
probs <- probs / sum(probs)
w <- probs / as.vector(table(rs))
w <- w[rs]
out <- list('Z' = z, 'w' = w)
return(out)
}
# Then we can resume the code above
if (exact) {
A <- diag(0.5, nrow = p)
A[A == 0] <- off_diag(p)
cz <- complete_Z(p)
Z <- cz$Z
w <- cz$w
b <- crossprod(Z, w * (vz - v0_ext))
}
Of course, the value functions depend on the data, so b
will need to be computed within the kernelshap_one
loop. Everything else can be done outside though, and requires only pred_fun
per sample, which is less than the current create_exact_Z
implementation. Also allows for arbitrary
Just a thought, anyway! Happy to create a pull request from the current branch if you're interested.
Currently, there are four sampling strategies implemented:
exact = TRUE
exact = "auto"
paired_sampling = TRUE
(and exact = FALSE)paired_sampling = FALSE
(and exact = FALSE)For the user, it is relatively difficult to see how these arguments interact. Furthermore, additional strategies might be added in the future.
A clearer interface would be to replace the two arguments "exact" and "paired_sampling" by a single one, "sampling_strategy":
sampling_strategy = "auto"
(same as exact = auto)sampling_strategy = "exact"
(same as exact = TRUE)sampling_strategy = "paired"
(same as exact = FALSE with paired_sampling = TRUE)sampling_strategy = "simple"
(same as exact = FALSE with paired_sampling = FALSE)Ping @dswatson
See title.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.