Comments (3)

agosiewska commented on May 27, 2024

I believe this comes down to how DALEX treats the datasets in the explainer. Could you please prepare a reproducible example and share your session info?

from rsafe.

jacekkotowski commented on May 27, 2024

I attached the rendered HTML and the Rmd file with my analysis; my session info is at the bottom.
timeseries_modelling_xgboost_short _2922_06_23a.zip

Is it OK to simply ignore, in the output, the variables that did not take part in modelling, and apply the data transformation to the remaining variables as they are?
Or do these excluded variables affect all the breakpoints in the other variables?

My session info:

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Polish_Poland.1250  LC_CTYPE=Polish_Poland.1250   
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C                  
[5] LC_TIME=C                     
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shiny_1.7.1

loaded via a namespace (and not attached):
  [1] colorspace_2.0-3   ellipsis_0.3.2     class_7.3-20       timetk_2.8.0      
  [5] base64enc_0.1-3    fs_1.5.2           rstudioapi_0.13    listenv_0.8.0     
  [9] furrr_0.3.0        farver_2.1.0       dials_0.1.1        DT_0.23           
 [13] prodlim_2019.11.13 fansi_1.0.3        lubridate_1.8.0    codetools_0.2-18  
 [17] splines_4.1.3      R.methodsS3_1.8.1  doParallel_1.0.17  cachem_1.0.6      
 [21] knitr_1.39         polyclip_1.10-0    jsonlite_1.8.0     workflows_0.2.6   
 [25] pROC_1.18.0        R.oo_1.24.0        yardstick_0.0.9    ggforce_0.3.3     
 [29] tune_0.2.0         clipr_0.8.0        compiler_4.1.3     assertthat_0.2.1  
 [33] Matrix_1.4-1       fastmap_1.1.0      cli_3.3.0          later_1.3.0       
 [37] tweenr_1.0.2       htmltools_0.5.2    tools_4.1.3        gtable_0.3.0      
 [41] glue_1.6.2         dplyr_1.0.9        Rcpp_1.0.8.3       jquerylib_0.1.4   
 [45] styler_1.7.0       DiceDesign_1.9     vctrs_0.4.1        iterators_1.0.14  
 [49] parsnip_0.2.1      timeDate_3043.102  gower_1.0.0        xfun_0.31         
 [53] globals_0.15.0     mime_0.12          miniUI_0.1.1.1     lifecycle_1.0.1   
 [57] pacman_0.5.1       future_1.26.1      MASS_7.3-57        zoo_1.8-10        
 [61] scales_1.2.0       ipred_0.9-12       promises_1.2.0.1   parallel_4.1.3    
 [65] yaml_2.3.5         ggplot2_3.3.6      sass_0.4.1         rpart_4.1.16      
 [69] corrplot_0.92      foreach_1.5.2      lhs_1.1.5          hardhat_0.2.0     
 [73] lava_1.6.10        repr_1.1.4         rlang_1.0.2        pkgconfig_2.0.3   
 [77] rsample_0.1.1      evaluate_0.15      lattice_0.20-45    purrr_0.3.4       
 [81] recipes_0.2.0      htmlwidgets_1.5.4  tidyselect_1.1.2   parallelly_1.31.1 
 [85] plyr_1.8.7         magrittr_2.0.3     R6_2.5.1           generics_0.1.2    
 [89] DBI_1.1.2          pillar_1.7.0       withr_2.5.0        xts_0.12.1        
 [93] survival_3.3-1     DALEX_2.4.2        nnet_7.3-17        tibble_3.1.7      
 [97] future.apply_1.9.0 crayon_1.5.1       xgboost_1.6.0.1    utf8_1.2.2        
[101] rmarkdown_2.14     grid_4.1.3         data.table_1.14.2  reprex_2.0.1      
[105] digest_0.6.29      xtable_1.8-4       R.cache_0.15.0     tidyr_1.2.0       
[109] httpuv_1.6.5       R.utils_2.11.0     GPfit_1.0-8        munsell_0.5.0     
[113] finetune_0.2.0     skimr_2.1.4        bslib_0.3.1  

agosiewska commented on May 27, 2024

Thank you. By a reproducible example I meant a toy example that is simple and fast to run. This .Rmd takes a long time to compute, and when I decreased the number of trees in xgboost to speed the script up, I got an error:

> bike_rf_rs <-
+   bike_rf_wkfl %>%
+     finetune::tune_sim_anneal(
+     resamples = bike_folds,
+    param_info = xgboost_set,
+       metrics = bike_metrics,
+          iter = 30,
+       initial = 10)

>  Generating a set of 10 initial parameter results
√ Initialization complete

Error in UseMethod("mutate") : 
  no applicable method for 'mutate' applied to an object of class "NULL"

Anyway, if you pass the data frame with all columns (bike_all) to the DALEX explainer, SAFE will compute transformations for all of them.
However, as long as you don't use interactions in SAFE (and the script shows you don't), you can ignore the transformations for columns not used by the model, because they are calculated for each variable independently.

Variable filtering should perhaps be a feature in a future version of SAFE. For now, I would suggest filtering out unused variables before passing the data to the explainer.
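A minimal toy sketch of that suggestion, in R. It assumes `mtcars` and a plain `lm()` model as stand-ins for `bike_all` and the tuned xgboost workflow, and reads the model's predictors from its formula (for an xgboost booster or a tidymodels workflow you would have to obtain the predictor names differently). The DALEX and rSAFE calls are guarded so the filtering step runs with base R alone; the `safe_extraction()` arguments shown are assumed from the rSAFE documentation rather than taken from this thread.

```r
# Toy stand-ins (assumption): mtcars plays bike_all, lm() plays the xgboost model
toy_all <- mtcars
toy_all$row_id <- seq_len(nrow(toy_all))  # a column the model never uses

toy_model <- lm(mpg ~ wt + hp, data = toy_all)

# Predictor names taken from the model formula, with the response dropped
model_vars <- all.vars(formula(toy_model))[-1]

# Keep only the predictors the model actually uses
explainer_data <- toy_all[, model_vars, drop = FALSE]
print(names(explainer_data))

# Build the explainer and run SAFE only if both packages are installed
if (requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("rSAFE", quietly = TRUE)) {
  explainer <- DALEX::explain(toy_model,
                              data = explainer_data,
                              y = toy_all$mpg,
                              label = "lm_toy",
                              verbose = FALSE)
  # No interactions requested: each variable is transformed independently
  safe_extractor <- rSAFE::safe_extraction(explainer,
                                           interactions = FALSE,
                                           verbose = FALSE)
  print(safe_extractor)
}
```

Printing `names(explainer_data)` shows only `wt` and `hp`; `row_id` never reaches the explainer, so SAFE has no transformation to compute for it.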
