Comments (5)

cregouby commented on June 25, 2024

The issue comes from the character variables left in the Titanic dataset when it is prepared as follows:

library(titanic)                                # provides titanic_train
titanic <- titanic_train
titanic$Survived <- factor(titanic$Survived, labels = c("no", "yes"))
titanic$gender <- factor(titanic$Sex)           # note: Sex itself stays a character column
titanic <- na.omit(titanic)                     # drop rows with any NA
titanic <- titanic[titanic$Embarked != "", ]    # drop rows with an empty port of embarkation
titanic$Embarked <- factor(titanic$Embarked)    # factor only the remaining ports
names(titanic) <- tolower(names(titanic))

Summarising the result with skimr gives:

Skim summary statistics
 n obs: 712 
 n variables: 13 

── Variable type:character ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n min max empty n_unique
    cabin       0      712 712   0  15   529      134
     name       0      712 712  13  82     0      712
      sex       0      712 712   4   6     0        2
   ticket       0      712 712   3  18     0      541

── Variable type:factor ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n n_unique                   top_counts ordered
 embarked       0      712 712        3 S: 554, C: 130, Q: 28, NA: 0   FALSE
   gender       0      712 712        2    mal: 453, fem: 259, NA: 0   FALSE
 survived       0      712 712        2     no: 424, yes: 288, NA: 0   FALSE

── Variable type:integer ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    variable missing complete   n   mean     sd p0    p25 p50    p75 p100     hist
       parch       0      712 712   0.43   0.85  0   0      0   1       6 ▇▂▁▁▁▁▁▁
 passengerid       0      712 712 448.59 258.68  1 222.75 445 677.25  891 ▇▇▇▇▇▇▇▇
      pclass       0      712 712   2.24   0.84  1   1      2   3       3 ▅▁▁▃▁▁▁▇
       sibsp       0      712 712   0.51   0.93  0   0      0   1       5 ▇▃▁▁▁▁▁▁

── Variable type:numeric ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable missing complete   n  mean    sd   p0   p25   p50 p75   p100     hist
      age       0      712 712 29.64 14.49 0.42 20    28     38  80    ▂▃▇▆▃▂▁▁
     fare       0      712 712 34.57 52.94 0     8.05 15.65  33 512.33 ▇▁▁▁▁▁▁▁

After removal of the character variables,

titanic <- na.omit(titanic[,sapply(titanic, class) != 'character']) 

the provided example works fine now.
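A note on the filter above: sapply(titanic, class) only simplifies to a character vector when every column has exactly one class. A column such as a POSIXct date carries two classes, and sapply() then returns a list instead. A slightly more defensive variant, sketched here on a small made-up data frame rather than the real Titanic data, tests each column with is.character() through vapply(), which always yields exactly one logical per column:

```r
# Toy data frame standing in for the Titanic data (values are made up).
df <- data.frame(
  age    = c(22, 38, NA),
  ticket = c("A/5 21171", "PC 17599", "113803"),
  when   = as.POSIXct(c("2019-01-01", "2019-01-02", "2019-01-03"), tz = "UTC"),
  stringsAsFactors = FALSE  # needed on R < 4.0 to keep 'ticket' as character
)

# One logical per column, regardless of how many classes a column has.
is_chr <- vapply(df, is.character, logical(1))

# Keep only the non-character columns, then drop incomplete rows.
df_clean <- na.omit(df[, !is_chr, drop = FALSE])
print(names(df_clean))
```

The same idea applied to the thread's data would be na.omit(titanic[, !vapply(titanic, is.character, logical(1)), drop = FALSE]).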

Maybe it is worth adding this character-class filter to the data_set <- explainers[[1]]$data assignment in the renderMainPage function or, at least, testing for character columns there and raising an explicit error message.
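Both options suggested here (filtering, or failing with an explicit message) can be sketched as one small check run on the explainer data, for example right after data_set <- explainers[[1]]$data. This is an illustration only: the function name check_explainer_data and its drop_character argument are hypothetical, they are not part of modelDown or DALEX, and the validation the maintainers actually added may look different.

```r
# Hypothetical helper (not part of modelDown or DALEX): check the data
# attached to an explainer before any plots are generated.
check_explainer_data <- function(data, drop_character = FALSE) {
  is_chr <- vapply(data, is.character, logical(1))
  if (!any(is_chr)) {
    return(data)  # nothing to do
  }
  bad <- paste(names(data)[is_chr], collapse = ", ")
  if (drop_character) {
    # Option 1: filter the character columns out, with a warning.
    warning("Dropping character column(s) from explainer data: ", bad,
            call. = FALSE)
    return(data[, !is_chr, drop = FALSE])
  }
  # Option 2: fail early with an explicit message.
  stop("Explainer data contains character column(s): ", bad,
       ". Convert them to factors or remove them.", call. = FALSE)
}

# Usage sketch on a toy data frame.
df <- data.frame(age = c(22, 38), sex = c("male", "female"),
                 stringsAsFactors = FALSE)
cleaned <- suppressWarnings(check_explainer_data(df, drop_character = TRUE))
print(names(cleaned))
```

Failing by default and dropping only on request keeps the user aware that columns were removed from the explanation.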

Thanks for this wonderful package anyway!

from modeldown.

kromash commented on June 25, 2024

@cregouby Thank you for your suggestion. We added validation for the variables that come with the explainer dataset.


pbiecek commented on June 25, 2024

Is it still an issue?
I've updated the


cregouby commented on June 25, 2024

I updated modelDown, DALEX, breakDown and factorMerger to the latest GitHub releases. This now results in an error during the step that prints [1] "Generating variable_response...": Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), : 'from' must be a finite number
Here is the traceback:

 Error in seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE),  : 
  'from' must be a finite number 
stop("'from' must be a finite number") 
seq.default(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), 
    length = grid.resolution) 
seq(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE), length = grid.resolution) 
FUN(X[[i]], ...) 
lapply(pred.var, function(x) {
    if (is.factor(train[, x, drop = TRUE])) {
        levels(train[, x, drop = TRUE])
    } ... 
pred_grid(train = train, pred.var = pred.var, grid.resolution = grid.resolution, 
    quantiles = quantiles, probs = probs, trim.outliers = trim.outliers) 
partial.default(explainer$model, pred.var = variable, train = explainer$data, 
    ..., = predictor_pdp, recursive = FALSE) 
partial(explainer$model, pred.var = variable, train = explainer$data, 
    ..., = predictor_pdp, recursive = FALSE) 
variable_response(explainer, variable_name, type = type) at generator.R#17
FUN(X[[i]], ...) 
lapply(explainers, function(explainer) {
    variable_response(explainer, variable_name, type = type)
}) at generator.R#17
FUN(X[[i]], ...) 
lapply(types, function(type) {
    lapply(explainers, function(explainer) {
        variable_response(explainer, variable_name, type = type)
    }) ... at generator.R#16
make_variable_plot(variable_name, types, explainers, img_folder, 
    options) at generator.R#35
FUN(X[[i]], ...) 
lapply(variables, make_variable_plot_model, explainers, img_folder, 
    options) at generator.R#51
generator_env$generator(explainers, options, file.path(output_folder, 
FUN(X[[i]], ...) 
lapply(modules_names, function(module_name) {
    print(paste("Generating ", module_name, "...", sep = ""))
    generator_path <- system.file("extdata", "modules", module_name, 
        "generator.R", package = "modelDown") ... 
generateModules(modules, output_folder, explainers, options) 
modelDown(explain_titanic_rf, explain_titanic_gbm, explain_titanic_svm, 
    explain_titanic_knn, device = "svg", output_folder = "modelDown_Titanic_example") at modelDown_example2.R#55
eval(ei, envir) 
eval(ei, envir) 
withVisible(eval(ei, envir)) 
source("~/R/Interpretability/modelDown_example2.R", echo = TRUE)
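The last frames show pdp's pred_grid() building a grid with seq(from = min(y), to = max(y)) for every predictor that is not a factor. One way to get exactly this message is a character column reaching that branch: min() and max() of a character vector return strings, which seq() cannot turn into finite numbers. The sketch below reproduces the error without modelDown; the sample values mimic the Titanic name column, and length.out = 51 is an arbitrary grid size.

```r
# A character column, like 'name' in the Titanic data.
y <- c("Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley")

# Same call shape as in the traceback: min()/max() of a character vector
# return strings, and seq() rejects them as non-finite 'from'/'to'.
err <- tryCatch(
  suppressWarnings(  # silence the "NAs introduced by coercion" warning
    seq(from = min(y, na.rm = TRUE), to = max(y, na.rm = TRUE),
        length.out = 51)
  ),
  error = function(e) e
)
print(conditionMessage(err))
```

With a factor column the same code path would use levels() instead, which is why converting or dropping the character columns avoids the error.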

Here is the corresponding sessionInfo() output:

R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/
LAPACK: /usr/lib/x86_64-linux-gnu/

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] kableExtra_1.1.0    caret_6.0-81        ggplot2_3.1.0       lattice_0.20-38     gbm_2.1.5           e1071_1.7-1         randomForest_4.6-14 modelDown_0.1.1    
 [9] DALEX_0.2.9         titanic_0.1.0      

loaded via a namespace (and not attached):
  [1] readxl_1.3.1       hnp_1.2-6          plyr_1.8.4         lazyeval_0.2.2     sp_1.3-1           splines_3.5.2      AlgDesign_1.1-7.3  digest_0.6.18     
  [9] foreach_1.4.4      htmltools_0.3.6    gdata_2.18.0       magrittr_1.5       cluster_2.0.7-1    ROCR_1.0-7         openxlsx_4.1.0     recipes_0.1.4     
 [17] readr_1.3.1        gower_0.2.0        gmodels_2.18.1     xts_0.11-2         tseries_0.10-46    colorspace_1.4-1   rvest_0.3.2        ggrepel_0.8.0     
 [25] haven_2.1.0        xfun_0.5           dplyr_0.8.0.1      crayon_1.3.4       ALEPlot_1.1        survival_2.43-3    zoo_1.8-5          iterators_1.0.10  
 [33] glue_1.3.1         gtable_0.2.0       ipred_0.9-8        webshot_0.5.1      questionr_0.7.0    car_3.0-2          quantmod_0.4-13    abind_1.4-6       
 [41] scales_1.0.0       mvtnorm_1.0-10     DBI_1.0.0          GGally_1.4.0       miniUI_0.1.1.1     Rcpp_1.0.1         plotROC_2.2.1      viridisLite_0.3.0 
 [49] xtable_1.8-3       spData_0.3.0       units_0.6-2        foreign_0.8-71     spdep_1.0-2        proxy_0.4-23       stats4_3.5.2       lava_1.6.5        
 [57] prodlim_2018.04.18 httr_1.4.0         yaImpute_1.0-31    gplots_3.0.1.1     RColorBrewer_1.1-2 factoextra_1.0.5   pkgconfig_2.0.2    reshape_0.8.8     
 [65] nnet_7.3-12        deldir_0.1-16      labeling_0.3       tidyselect_0.2.5   rlang_0.3.2        reshape2_1.4.3     later_0.8.0        munsell_0.5.0     
 [73] cellranger_1.1.0   tools_3.5.2        generics_0.0.2     factorMerger_0.3.6 fdrtool_1.2.15     evaluate_0.13      stringr_1.4.0      ModelMetrics_1.2.2
 [81] knitr_1.22         auditor_0.3.2      zip_2.0.1          pdp_0.7.0          caTools_1.17.1.2   purrr_0.3.2        packrat_0.5.0      nlme_3.1-137      
 [89] whisker_0.3-2      mime_0.6           xml2_1.2.0         compiler_3.5.2     rstudioapi_0.10    curl_3.3           klaR_0.6-14        tibble_2.1.1      
 [97] stringi_1.4.3      highr_0.8          forcats_0.4.0      Matrix_1.2-16      classInt_0.3-1     pillar_1.3.1       LearnBayes_2.15.1  combinat_0.0-8    
[105] data.table_1.12.0  bitops_1.0-6       httpuv_1.5.0       agricolae_1.3-0    R6_2.4.0           promises_1.0.1     KernSmooth_2.23-15 gridExtra_2.3     
[113] rio_0.5.16         codetools_0.2-16   boot_1.3-20        MASS_7.3-51.1      gtools_3.8.1       assertthat_0.2.1   withr_2.1.2        expm_0.999-4      
[121] hms_0.4.2          quadprog_1.5-5     grid_3.5.2         rpart_4.1-13       timeDate_3043.102  coda_0.19-2        class_7.3-15       rmarkdown_1.12    
[129] breakDown_0.2.0    carData_3.0-2      TTR_0.23-4         ggpubr_0.2         sf_0.7-3           shiny_1.2.0        lubridate_1.7.4   


pbiecek commented on June 25, 2024

Thank you for this detailed investigation. We will handle character variables somehow ;-)


