
xai2shiny's Introduction

xai2shiny

R build status Coverage Status

Overview

The xai2shiny R package creates a Shiny application for explainers (adapters for machine learning models created with the DALEX package). Turn your model into an interactive application containing the model's predictions, performance and many XAI methods with just one function. Furthermore, with xai2shiny you can easily export your application to the cloud and share it with others.

Installation

# Install the development version from GitHub:
devtools::install_github("ModelOriented/xai2shiny")

Example

The usage example below is based on the titanic dataset and includes GLM and random forest models. The final application is created using the script below. First, the explainers need to be created:

library("xai2shiny")
library("ranger")
library("DALEX")

# Creating ML models
model_rf <- ranger(survived ~ .,
                   data = titanic_imputed,
                   classification = TRUE, 
                   probability = TRUE)
model_glm <- glm(survived ~ .,
                 data = titanic_imputed,
                 family = "binomial")

# Creating DALEX explainers
explainer_rf <- explain(model_rf,
                        data = titanic_imputed[,-8],
                        y = titanic_imputed$survived)

explainer_glm <- explain(model_glm,
                         data = titanic_imputed[,-8],
                         y = titanic_imputed$survived)

Then all that is left to do is run:

xai2shiny::xai2shiny(explainer_glm, explainer_rf, 
                     directory = './',
                     selected_variables = c('gender', 'age'),
                     run = FALSE)

In the xai2shiny call above, apart from the explainers, the following arguments were provided:

  • directory - the location where the whole xai2shiny directory with the required files (the app and the explainers) is created,
  • selected_variables - a vector of variables selected at app start-up (used for observation modification and local explanations),
  • run - whether to run the app immediately after it is created (a sketch of launching the written app manually follows this list).
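
Since run = FALSE only writes the files to disk, the generated application can be started by hand later. A minimal sketch, assuming the default xai2shiny folder created inside the chosen directory (it contains app.R plus the saved explainers):

# Launch the previously generated application
shiny::runApp("./xai2shiny")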

Cloud deployment

The application can then be deployed to the cloud. Only three steps are needed to enjoy your new xai2shiny application in the cloud.

  1. If you don't have an account on DigitalOcean, create one here and get $100 free credit.
  2. Create an SSH key if you don't have one yet.
  3. Deploy the SSH key to DigitalOcean

And that's it: you are ready to get back to R and deploy your application. To create a new cloud instance (called a droplet by DigitalOcean) running Docker on Ubuntu with all prerequisites installed, just run:

xai2shiny::cloud_setup(size)
  • size - the desired RAM size for the droplet, defaults to 1GB. It can be modified later through DigitalOcean's website.

Now that your droplet is set up, just deploy the created xai2shiny application with one function (see the end-to-end sketch after the argument list):

deploy_shiny(droplet = <your_droplet_id>, path = './xai2shiny', packages = "ranger")
  • droplet - the droplet object or the droplet's ID, which can be read from running analogsea::droplets(),
  • path - the path to the xai2shiny application,
  • packages - the packages used to create or run the models; they will be installed on the droplet.
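
Putting the pieces together, a minimal end-to-end sketch of the deployment (assuming size is the RAM amount in GB, as the 1GB default suggests; the droplet ID is illustrative and comes from the analogsea::droplets() listing):

# 1. Create a Docker-ready droplet with all prerequisites installed
xai2shiny::cloud_setup(size = 1)

# 2. Look up the new droplet's ID
analogsea::droplets()

# 3. Deploy the generated app folder and install the packages the models need
xai2shiny::deploy_shiny(droplet = 12345678,        # illustrative ID from the listing above
                        path = './xai2shiny',
                        packages = "ranger")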

And that's it: the xai2shiny application is running and will automatically open in your default web browser. Now all that's left is to share it!

Functionality

The main function, xai2shiny, creates the Shiny app.R file and runs it, converting your models into an interactive application.

At the moment it supports the following functionalities for multiple models in one application (the sketch after this list shows the equivalent plain DALEX calls):

  1. Model prediction
  2. Model performance (with text descriptions of measures)
  3. Local explanations: (with text descriptions)
    • Break Down plot
    • SHAP values plot
    • Ceteris Paribus plot
  4. Global explanations:
    • Feature importance plots
    • Partial Dependence plots
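
These panels mirror the standard DALEX explanation functions, so presumably the app wraps the calls below. A minimal sketch of computing the same explanations directly with DALEX, reusing explainer_glm and the data from the example above:

library("DALEX")
obs <- titanic_imputed[1, -8]

predict(explainer_glm, obs)                              # model prediction
model_performance(explainer_glm)                         # performance measures
predict_parts(explainer_glm, obs, type = "break_down")   # Break Down
predict_parts(explainer_glm, obs, type = "shap")         # SHAP values
predict_profile(explainer_glm, obs)                      # Ceteris Paribus profiles
model_parts(explainer_glm)                               # feature importance
model_profile(explainer_glm, variables = "age")          # partial dependence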

Acknowledgments

Work on this package was financially supported by the Polish National Science Centre under Opus Grant number 2017/27/B/ST6/0130.

xai2shiny's People

Contributors

adamoso, mckraqs


Forkers

han-tun, abson-dev

xai2shiny's Issues

more verbose xai2shiny::xai2shiny

After the directory is created there is no message saying where it was created or whether the process was successful.
Be more verbose and let the user know that the process was OK and where the output is
(see the verbose parameter in DALEX::explain).
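
A hypothetical sketch (not the package's current behaviour) of the kind of confirmation message the issue asks for, modelled on the messages printed by DALEX::explain; directory stands for the argument documented in the README above:

directory <- "./"                                  # the directory passed to xai2shiny()
app_dir   <- file.path(directory, "xai2shiny")

message("  -> app.R and explainer files written to : ",
        normalizePath(app_dir, mustWork = FALSE))
message("  -> xai2shiny app created successfully")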

Error running xai2shiny

Hi @Adamoso it's me again! I've updated your package to the most recent version and now I'm getting the following error:

Error in withSpinner(uiOutput("textPred"), hide.ui = FALSE) : 
  unused argument (hide.ui = FALSE)

Reproducible example:

library(xai2shiny) # devtools::install_github("ModelOriented/xai2shiny")
library(lares) # devtools::install_github("laresbernardo/lares")
ignore <- c("PassengerId","Ticket","Cabin")
model <- h2o_automl(dft, Survived, ignore = ignore, quiet = FALSE)
explainer <- h2o_explainer(model$datasets$test, model = model$model, y = "Survived", ignore = ignore)
xai2shiny(explainer)
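
The hide.ui argument was added to shinycssloaders::withSpinner() in shinycssloaders 1.0.0, so this error usually points to an older installed version. A hedged check and possible fix, assuming that is the cause:

# withSpinner() gains hide.ui only from shinycssloaders 1.0.0 onwards;
# updating the package should let the generated app parse again.
if (packageVersion("shinycssloaders") < "1.0.0") {
  install.packages("shinycssloaders")
}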

AUC in regression models

I have tested the code with two regression models: xgboost and glm. The results include an AUC plot, which is meaningless for regression.
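
DALEX explainers record the detected task in explainer$model_info$type, so one possible fix direction is to branch on it and only show ROC/AUC for classification. An illustrative sketch, not the package's current code, where explainer is any DALEX explainer:

# Skip the ROC/AUC panel for regression explainers and fall back to the
# default regression measures (RMSE, R2, MAD) instead.
if (identical(explainer$model_info$type, "classification")) {
  plot(DALEX::model_performance(explainer), geom = "roc")
} else {
  DALEX::model_performance(explainer)
}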

Error rendering - missing comma

Error running xai2shiny function, something about a missing comma?

# Load libraries

# Data transformations
suppressMessages(library(tidyverse))
suppressMessages(library(data.table))

# Saving data to disk
suppressMessages(library(feather))
suppressMessages(library(arrow))
suppressMessages(library(here))

# Feature engineering
suppressMessages(library(recipes))
suppressMessages(library(yardstick))

# Machine learning
suppressMessages(library(tidymodels))
suppressMessages(library(themis))

# Explainer
suppressMessages(library(DALEX))

dir <- "/home/paulc/projects_Paul/31_user_journey"

# Load models
lasso_model <- readRDS(paste0(dir, "/R/models/02_glmnet/", "final_model_smote.Rds"))
ranger_model <- readRDS(paste0(dir, "/R/models/03_randomForest/", "final_model_smote.Rds"))

# Read data
path <- paste0(dir, "/R/data/processed/", "data_final.parquet")
df.data <- setDT(read_parquet(path))

# Delete col_to_del
col_to_del <- c("username", "user_id", "start_activity",
                "end_activity", "cohort", "min_to_purchase",
                "token_bonus_ratio", "first_purchase")
df.data[, (col_to_del) := NULL]

# Split the data into training and testing sets
set.seed(2020)
train_test_split <- df.data %>%
  initial_split(prop = 0.8, strata = label_fct)

# Set recipe
recipie_num <- training(train_test_split) %>%
  recipe(label_fct ~ .) %>%                          # Formula
  step_mutate(label_fct = as.factor(label_fct)) %>%
  step_normalize(all_predictors()) %>%
  step_smote(label_fct) %>%
  prep()

# Create the final data
df.train <- as.data.frame(juice(recipie_num))
df.test <- as.data.frame(bake(recipie_num, new_data = testing(train_test_split)))

# Binary variable for explainer
df.testing_original <- testing(train_test_split)
yTest <- as.integer(ifelse(df.testing_original$label_fct == "yes", 1, 0))

df.test <- df.test %>%
  select(-label_fct)

custom_predict <- function(object, newdata) {
  pred <- predict(object, newdata, type = "prob")
  response <- pred$.pred_yes
  return(response)
}

lasso_explainer <- DALEX::explain(model = lasso_model,
                                  data = df.test,
                                  y = yTest,
                                  predict_function = custom_predict,
                                  label = "Lasso",
                                  colorize = FALSE)
#> Preparation of a new explainer is initiated
#> -> model label : Lasso
#> -> data : 20514 rows 5 cols
#> -> target variable : 20514 values
#> -> predict function : custom_predict
#> -> predicted values : numerical, min = 0.1982451 , mean = 0.35857 , max = 0.9535754
#> -> model_info : package parsnip , ver. 0.1.3 , task classification ( default )
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.9535754 , mean = -0.3346352 , max = 0.8017549
#> A new explainer has been created!

ranger_explainer <- DALEX::explain(model = ranger_model,
                                   data = df.test,
                                   y = yTest,
                                   predict_function = custom_predict,
                                   label = "Random Forest",
                                   colorize = FALSE)
#> Preparation of a new explainer is initiated
#> -> model label : Random Forest
#> -> data : 20514 rows 5 cols
#> -> target variable : 20514 values
#> -> predict function : custom_predict
#> -> predicted values : numerical, min = 0.2152264 , mean = 0.3666207 , max = 0.9012534
#> -> model_info : package parsnip , ver. 0.1.3 , task classification ( default )
#> -> residual function : difference between y and yhat ( default )
#> -> residuals : numerical, min = -0.9012534 , mean = -0.3426858 , max = 0.7847736
#> A new explainer has been created!
library(xai2shiny)
xai2shiny(lasso_explainer, ranger_explainer)
#> Loading required package: shiny
#> Error in parse(file, keep.source = FALSE, srcfile = src, encoding = enc) :
#> /tmp/RtmpdRkELf/xai2shiny/app.R:11:1: unexpected ','
#> 10: library(parsnip)
#> 11: ,
#> ^
#> Possible missing comma at:
#> 30: if(!is.null(header)) tags$li(class="header",header),
#> ^
#> Possible extra comma at:
#> 127: column(width = 3, uiOutput("pdpvariable"),),
#> ^
#> Possible missing comma at:
#> 153: nulls <- sapply(obs, function(x) length(x) == 0)
#> ^
#> Error in sourceUTF8(fullpath, envir = new.env(parent = sharedEnv)): Error sourcing /tmp/RtmpdRkELf/xai2shiny/app.R
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: CentOS Linux 7 (Core)
#>
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] shiny_1.5.0 xai2shiny_0.1.0 DALEX_2.0
#> [4] themis_0.1.2 workflows_0.2.0 tune_0.1.1
#> [7] rsample_0.0.8 parsnip_0.1.3 modeldata_0.0.2
#> [10] infer_0.5.3 dials_0.0.9 scales_1.1.1
#> [13] broom_0.7.0 tidymodels_0.1.1.9000 yardstick_0.0.7
#> [16] recipes_0.1.13 here_0.1 arrow_1.0.0.20200728
#> [19] feather_0.3.5 data.table_1.12.8 forcats_0.5.0
#> [22] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
#> [25] readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
#> [28] ggplot2_3.3.2 tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#> [1] readxl_1.3.1 mlr_2.17.1 backports_1.1.10
#> [4] fastmatch_1.1-0 plyr_1.8.6 shinydashboard_0.7.1
#> [7] splines_4.0.2 listenv_0.8.0 digest_0.6.26
#> [10] foreach_1.5.0 htmltools_0.5.0 fansi_0.4.1
#> [13] magrittr_1.5 checkmate_2.0.0 BBmisc_1.11
#> [16] unbalanced_2.0 doParallel_1.0.15 globals_0.13.0
#> [19] modelr_0.1.8 gower_0.2.2 colorspace_1.4-1
#> [22] blob_1.2.1 rvest_0.3.5 haven_2.3.1
#> [25] xfun_0.17 crayon_1.3.4 jsonlite_1.7.1
#> [28] survival_3.1-12 iterators_1.0.12 glue_1.4.2
#> [31] gtable_0.3.0 ipred_0.9-9 shape_1.4.4
#> [34] DBI_1.1.0 Rcpp_1.0.5 xtable_1.8-4
#> [37] GPfit_1.0-8 bit_1.1-15.2 lava_1.6.8
#> [40] prodlim_2019.11.13 glmnet_4.0-2 httr_1.4.1
#> [43] sourcetools_0.1.7 FNN_1.1.3 ellipsis_0.3.1
#> [46] pkgconfig_2.0.3 ParamHelpers_1.14 nnet_7.3-14
#> [49] dbplyr_1.4.4 tidyselect_1.1.0 rlang_0.4.8
#> [52] DiceDesign_1.8-1 later_1.1.0.1 munsell_0.5.0
#> [55] cellranger_1.1.0 tools_4.0.2 cli_2.1.0
#> [58] generics_0.0.2 ranger_0.12.1 evaluate_0.14
#> [61] fastmap_1.0.1 yaml_2.2.1 knitr_1.30
#> [64] bit64_0.9-7 fs_1.4.2 shinycssloaders_1.0.0
#> [67] RANN_2.6.1 future_1.19.1 whisker_0.4
#> [70] mime_0.9 xml2_1.3.2 compiler_4.0.2
#> [73] rstudioapi_0.11 reprex_0.3.0 lhs_1.0.2
#> [76] stringi_1.5.3 highr_0.8 lattice_0.20-41
#> [79] Matrix_1.2-18 shinyjs_2.0.0 vctrs_0.3.4
#> [82] pillar_1.4.6 lifecycle_0.2.0 furrr_0.1.0
#> [85] httpuv_1.5.4 R6_2.4.1 promises_1.1.1
#> [88] renv_0.12.0-12 codetools_0.2-16 MASS_7.3-51.6
#> [91] assertthat_0.2.1 rprojroot_1.3-2 shinyWidgets_0.5.4
#> [94] ROSE_0.0-3 withr_2.3.0 parallel_4.0.2
#> [97] hms_0.5.3 grid_4.0.2 rpart_4.1-15
#> [100] timeDate_3043.102 class_7.3-17 rmarkdown_2.3
#> [103] parallelMap_1.5.0 pROC_1.16.2 lubridate_1.7.9
Created on 2020-10-22 by the reprex package (v0.3.0)

finish TODOS

line 23

# TODO: create observation based on average data for each variable
chosen_observation <- data[1,-8]
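
A minimal sketch of what the TODO could look like, assuming numeric columns are summarised by their mean and factor or character columns by their most frequent level (data stands for the explainer's data, as in the snippet above):

# Hypothetical helper: build a single "average" observation from the data
# instead of simply taking its first row.
average_observation <- function(data) {
  as.data.frame(lapply(data, function(col) {
    if (is.numeric(col)) {
      mean(col, na.rm = TRUE)
    } else {
      names(sort(table(col), decreasing = TRUE))[1]   # most frequent level
    }
  }), stringsAsFactors = FALSE)
}

chosen_observation <- average_observation(data[, -8])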
