Comments (3)
I believe it is a matter of how DALEX treats the datasets in the explainer. Could you please prepare a reproducible example and share your session info?
I attached a rendered HTML file and the Rmd file with my analysis; the session info is at the bottom.
timeseries_modelling_xgboost_short _2922_06_23a.zip
Is it OK to simply ignore, in the output, the variables that did not take part in modelling, and to do the data transformation with the existing variables as they are?
Or do these excluded variables affect the break points of all the variables?
My session info:
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250
[3] LC_MONETARY=Polish_Poland.1250 LC_NUMERIC=C
[5] LC_TIME=C
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.7.1
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 class_7.3-20 timetk_2.8.0
[5] base64enc_0.1-3 fs_1.5.2 rstudioapi_0.13 listenv_0.8.0
[9] furrr_0.3.0 farver_2.1.0 dials_0.1.1 DT_0.23
[13] prodlim_2019.11.13 fansi_1.0.3 lubridate_1.8.0 codetools_0.2-18
[17] splines_4.1.3 R.methodsS3_1.8.1 doParallel_1.0.17 cachem_1.0.6
[21] knitr_1.39 polyclip_1.10-0 jsonlite_1.8.0 workflows_0.2.6
[25] pROC_1.18.0 R.oo_1.24.0 yardstick_0.0.9 ggforce_0.3.3
[29] tune_0.2.0 clipr_0.8.0 compiler_4.1.3 assertthat_0.2.1
[33] Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0 later_1.3.0
[37] tweenr_1.0.2 htmltools_0.5.2 tools_4.1.3 gtable_0.3.0
[41] glue_1.6.2 dplyr_1.0.9 Rcpp_1.0.8.3 jquerylib_0.1.4
[45] styler_1.7.0 DiceDesign_1.9 vctrs_0.4.1 iterators_1.0.14
[49] parsnip_0.2.1 timeDate_3043.102 gower_1.0.0 xfun_0.31
[53] globals_0.15.0 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1
[57] pacman_0.5.1 future_1.26.1 MASS_7.3-57 zoo_1.8-10
[61] scales_1.2.0 ipred_0.9-12 promises_1.2.0.1 parallel_4.1.3
[65] yaml_2.3.5 ggplot2_3.3.6 sass_0.4.1 rpart_4.1.16
[69] corrplot_0.92 foreach_1.5.2 lhs_1.1.5 hardhat_0.2.0
[73] lava_1.6.10 repr_1.1.4 rlang_1.0.2 pkgconfig_2.0.3
[77] rsample_0.1.1 evaluate_0.15 lattice_0.20-45 purrr_0.3.4
[81] recipes_0.2.0 htmlwidgets_1.5.4 tidyselect_1.1.2 parallelly_1.31.1
[85] plyr_1.8.7 magrittr_2.0.3 R6_2.5.1 generics_0.1.2
[89] DBI_1.1.2 pillar_1.7.0 withr_2.5.0 xts_0.12.1
[93] survival_3.3-1 DALEX_2.4.2 nnet_7.3-17 tibble_3.1.7
[97] future.apply_1.9.0 crayon_1.5.1 xgboost_1.6.0.1 utf8_1.2.2
[101] rmarkdown_2.14 grid_4.1.3 data.table_1.14.2 reprex_2.0.1
[105] digest_0.6.29 xtable_1.8-4 R.cache_0.15.0 tidyr_1.2.0
[109] httpuv_1.6.5 R.utils_2.11.0 GPfit_1.0-8 munsell_0.5.0
[113] finetune_0.2.0 skimr_2.1.4 bslib_0.3.1
Thank you. By reproducible example I meant a toy example that is simple and fast to run. This .Rmd takes a long time to compute, and when I decreased the number of trees in xgboost to speed the script up, I got an error:
> bike_rf_rs <-
+ bike_rf_wkfl %>%
+ finetune::tune_sim_anneal(
+ resamples = bike_folds,
+ param_info = xgboost_set,
+ metrics = bike_metrics,
+ iter = 30,
+ initial = 10)
> Generating a set of 10 initial parameter results
√ Initialization complete
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "NULL"
Anyway, if you pass the data frame with all columns (bike_all) to DALEX::explain(), SAFE will compute transformations for all of them.
However, as long as you don't use interactions in SAFE (I saw in the script that you don't), you can ignore the transformations for columns not used by the model, because the transformations are calculated for each variable independently.
Variable filtering should perhaps be a feature in a future version of SAFE. For now, I would suggest filtering out the unused variables before feeding the data into the explainer.
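A minimal sketch of that filtering step, on toy data rather than the bike data from the attached script. The `mtcars` data set, the `lm` model, and the names `all_data`, `used_vars`, and `explainer_data` are assumptions made for illustration only; with a tidymodels workflow the list of used predictors would have to be obtained differently, for example from the recipe.

```r
# Toy data: mtcars stands in for the full table (bike_all in the thread).
all_data <- mtcars

# Hypothetical model that uses only three of the available predictors.
model <- lm(mpg ~ wt + hp + qsec, data = all_data)

# Names of the predictors the model was actually fitted on.
used_vars <- attr(terms(model), "term.labels")

# Keep only those columns, so nothing is computed for unused variables.
explainer_data <- all_data[, used_vars, drop = FALSE]

# Build the explainer on the filtered data (skipped if DALEX is not installed).
if (requireNamespace("DALEX", quietly = TRUE)) {
  explainer <- DALEX::explain(
    model,
    data = explainer_data,
    y = all_data$mpg,
    verbose = FALSE
  )
}
```

The resulting explainer only sees the columns the model uses, so any SAFE step run on it should produce transformations for those columns alone.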