modeloriented / xspliner Goto Github PK

formula <- log(y) ~
    xs(x, method_opts = list(type = type)) * z + xf(t) + w ^ 2 + I(z ^ 2)
extract_formula_var_names(formula_4, data) # gives c("y", "x", "z", "t", "w")
get_formula_raw_components(formula_terms) # gives
    c("xs(x, method_opts = list(type = type))", "z", "xf(t)", "w", "I(z^2)")

Find out what logic should be implemented here to match the z variable.
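
As a starting point for that logic, here is a minimal base R sketch of what `stats::terms()` and `all.vars()` report for the example formula. The names `example_formula`, `term_labels`, `symbols` and `bare_z` are illustrative only and are not part of xspliner.

```r
# Formula from the example above; `type` is a parameter, not a data column.
example_formula <- log(y) ~
  xs(x, method_opts = list(type = type)) * z + xf(t) + w ^ 2 + I(z ^ 2)

# terms() parses the formula without evaluating xs() or xf().
term_labels <- attr(stats::terms(example_formula), "term.labels")

# all.vars() returns every symbol, including parameters such as `type`.
symbols <- all.vars(example_formula)

# Terms that are exactly the bare variable `z`, as opposed to I(z^2)
# or the interaction of z with xs(x, ...).
bare_z <- term_labels[term_labels == "z"]
```

Comparing term labels by exact equality is one possible way to tell the raw `z` term apart from terms that merely mention `z`.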

Basic fixes

add authors (Przemysław Biecek)
lower package version
rename functions
complete NEWS file

Which parameters should be available from pdp::partial function?

Add vignettes or Rmd instructions

mgcv::gam uses global environment for model object

As a result, prediction uses the global environment instead of the formula environment.

Shouldn't I use glm instead of gam in final model build?

I think I should. There is no good reason to use mgcv::gam in final model when I don't use splines there.
There is one drawback, though: two gam models are easy to compare with each other.

Maybe I should add a parameter for choosing one?
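
A minimal sketch of such a parameter, assuming a hypothetical helper `fit_final_model()` with an `engine` argument (neither exists in xspliner). It only relies on `stats::glm()` and `mgcv::gam()`, which share the `formula`, `family` and `data` arguments.

```r
# Hypothetical helper: fit the final model with either glm or gam.
fit_final_model <- function(formula, data, family = stats::gaussian(),
                            engine = c("glm", "gam")) {
  engine <- match.arg(engine)
  if (engine == "gam") {
    # mgcv is only needed when the gam engine is requested.
    if (!requireNamespace("mgcv", quietly = TRUE)) {
      stop("Package 'mgcv' is required for engine = \"gam\".")
    }
    return(mgcv::gam(formula, family = family, data = data))
  }
  stats::glm(formula, family = family, data = data)
}

final_model <- fit_final_model(mpg ~ wt + hp, data = mtcars)
```

Defaulting to "glm" reflects the reasoning above; "gam" stays available for model comparison.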

Add variables extracting from blackbox

When we use the formula y ~ . in xspliner, the full formula is built automatically from all variables in data. But the black box could have been built on a smaller set of variables.
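
One possible way to recover the predictors, sketched under the assumption that the black box supports `stats::terms()` (as lm and glm fits do; many other model classes do not). The helper name `get_blackbox_predictors()` is hypothetical.

```r
# Hypothetical helper: names of the predictors a fitted model was built on.
get_blackbox_predictors <- function(model) {
  model_terms <- stats::terms(model)
  # Drop the response, then collect the remaining variable names.
  all.vars(stats::delete.response(model_terms))
}

# The black box uses only two of the columns available in mtcars.
black_box <- stats::lm(mpg ~ wt + hp, data = mtcars)
predictors <- get_blackbox_predictors(black_box)

# Build the surrogate formula from those predictors instead of y ~ .
surrogate_formula <- stats::reformulate(predictors, response = "mpg")
```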

DALEX integrations

Integrate code with DALEX (result of explain function)

Within approx_with_splines the data needs to be renamed in order to recalculate the gam. Allow keeping the original predictor names.

Extract useful functions from previous code version.

Temporarily move them into deprecated.R and do not export them. Once they are adapted to the new library version, they will be moved into the existing scripts.

Approx as monotonic spline.
Approx with partially constant function.
factorMerger approx

Add option for automatic monotonicity

It can be considered which approximation is better.
See: approx_with_monotonic_spline
Maybe a parameter: monotonic = c("increase", "decrease", "automatic", "none")
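
A minimal sketch of how the "automatic" option could pick a direction by checking which monotone fit is better. It swaps in base R isotonic regression (`stats::isoreg()`) for the package's own monotonic spline, so it illustrates the decision only; `resolve_monotonicity()` is a hypothetical name.

```r
# Hypothetical helper: turn the `monotonic` argument into a concrete choice.
resolve_monotonicity <- function(x, y,
                                 monotonic = c("increase", "decrease",
                                               "automatic", "none")) {
  monotonic <- match.arg(monotonic)
  if (monotonic != "automatic") {
    return(monotonic)
  }
  y_sorted <- y[order(x)]
  # isoreg() fits a non-decreasing step function; negating y yields the
  # non-increasing fit. Compare the squared errors of both.
  sse_increase <- sum((y_sorted - stats::isoreg(y_sorted)$yf)^2)
  sse_decrease <- sum((-y_sorted - stats::isoreg(-y_sorted)$yf)^2)
  if (sse_increase <= sse_decrease) "increase" else "decrease"
}

direction <- resolve_monotonicity(x = 1:20, y = -(1:20), monotonic = "automatic")
```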

Add factorMerger for xf options

Based on #6

Add plotting transition comparing for many models

If models are passed to compare_with, add an option to plot their PDPs as well, all on one plot.

Add general way for passing xs and xf approximations

Currently it is not general. Only pdp can be used for type = "pdp".
How to extend this?

Move each testing code into examples

There is a lot of code that is used just for testing.
Moving it over regularly would provide good material for usage examples.

Include only formula variables in its Environment

Description: During formula preprocessing the parent.frame environment is used, so it can become huge (for example a large global environment).
Let's keep only the variables that are used as parameters in the raw formula.

Idea how to solve: use all.vars function to get the names.
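
A rough sketch of the all.vars idea, assuming a hypothetical helper `trim_formula_env()`. It copies only the objects the formula references (and that are not columns of data) into a fresh, small environment.

```r
# Hypothetical helper: replace a formula's environment with a slim copy.
trim_formula_env <- function(formula, data = NULL) {
  source_env <- environment(formula)
  # Names found in data are resolved from data, so they are not copied.
  needed <- setdiff(all.vars(formula), names(data))
  needed <- needed[vapply(needed, exists, logical(1), envir = source_env)]
  # The parent is baseenv(), so functions on the search path stay reachable.
  slim_env <- new.env(parent = baseenv())
  for (name in needed) {
    assign(name, get(name, envir = source_env), envir = slim_env)
  }
  environment(formula) <- slim_env
  formula
}

make_formula <- function() {
  k <- 5
  big_unused <- rnorm(1e4)  # would otherwise travel with the formula
  y ~ xs(x, spline_opts = list(k = k))
}

slim_formula <- trim_formula_env(make_formula(),
                                 data = data.frame(x = 1, y = 1))
```

Here only `k` ends up in the new environment; `big_unused` is left behind.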

Should xs and xf be one function?

It could be based on variable type.
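
A minimal sketch of type-based dispatch, assuming a hypothetical helper `choose_transition()` that returns the name of the function to use. Which types count as qualitative is an assumption here.

```r
# Hypothetical helper: decide between xs and xf from the variable type.
choose_transition <- function(x) {
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    "xf"  # qualitative variable
  } else if (is.numeric(x)) {
    "xs"  # quantitative variable
  } else {
    stop("Unsupported variable type: ", class(x)[1])
  }
}

choices <- vapply(iris, choose_transition, character(1))
```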

Make functionality work without passing data (raw variables from env)

The functionality is based on iterating over all variables that could possibly be found in data. If no data is passed, it cannot be determined which variables are "data sourced" and which are parameters. Find a way to distinguish data from parameters.

See:

x <- rnorm(10)
y <- rnorm(10)
oko <- 10
get_formula_details(y ~ xs(x, spline_opts = list(k = oko)))

Idea? Assume that data variables are vectors of the same length. It doesn't cover all cases, but it covers a large part of them.
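
The same-length assumption could be sketched as follows. `split_formula_vars()` is a hypothetical helper and it assumes a two-sided formula whose response is a plain vector in the formula environment.

```r
# Hypothetical helper: split formula symbols into data-like and parameter-like.
split_formula_vars <- function(formula, env = environment(formula)) {
  vars <- all.vars(formula)
  response <- all.vars(formula[[2]])[1]
  n_obs <- length(get(response, envir = env))
  # A symbol counts as data when it is an atomic vector as long as the response.
  is_data <- vapply(vars, function(v) {
    value <- get(v, envir = env)
    is.atomic(value) && length(value) == n_obs
  }, logical(1))
  list(data = vars[is_data], parameters = vars[!is_data])
}

x <- rnorm(10)
y <- rnorm(10)
oko <- 10
var_split <- split_formula_vars(y ~ xs(x, spline_opts = list(k = oko)))
```

The heuristic misclassifies a parameter that happens to be a vector of the same length as the data, and it breaks down when there is only one observation.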

When calling xp_gam xs and xf functions are passed for global environment

Add stats for models comparison

Consider "pseudo r-squared" https://christophm.github.io/interpretable-ml-book/global.html#theory-4
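
As I read the linked chapter, this measure is 1 - SSE / SST, where SSE compares surrogate predictions with black box predictions and SST is the spread of the black box predictions around their mean. A minimal sketch with a hypothetical function name:

```r
# Hypothetical helper: R-squared of a surrogate with respect to a black box.
pseudo_r_squared <- function(blackbox_pred, surrogate_pred) {
  sse <- sum((surrogate_pred - blackbox_pred)^2)
  sst <- sum((blackbox_pred - mean(blackbox_pred))^2)
  1 - sse / sst
}

perfect <- pseudo_r_squared(c(1, 2, 3), c(1, 2, 3))
mean_only <- pseudo_r_squared(c(1, 2, 3), rep(2, 3))
```

A value close to 1 means the surrogate reproduces the black box well; a value near 0 means it does no better than predicting the mean.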

Include estimated coefficients in plot

Currently only the bare transformation is plotted, which makes interpretation inaccurate in terms of scale.
It would be great to add such a flag to the plot method.

Create a pred function for the ALE method that returns 'link' every time

Add type = c("classification", "regression") parameter, also quantitatives which specifies which variable should be used with xf.

Automatic decision if xs or xf should be used or raw variable

It was implemented in the previous version.
The idea is to compare the performance of:
lm(y ~ pred_var) and lm(y ~ approx(pred_var)) and choose better option.

Important:
Make decision rule general (passed as parameter?).

NOTE:
How to do it without recalculating response approximation?

Idea:
It can be a parameter passed to xs and xf, for example choose = "automatic", that performs the decision in the backend.
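
A minimal sketch of the comparison with the decision rule passed as a parameter. The names `choose_predictor_form()`, `approx_fun` and `score` are hypothetical; the default score (R-squared) is only one possible rule.

```r
# Hypothetical helper: keep the raw variable or use its approximation.
choose_predictor_form <- function(y, pred_var, approx_fun,
                                  score = function(model) {
                                    summary(model)$r.squared
                                  }) {
  transformed <- approx_fun(pred_var)
  raw_fit <- stats::lm(y ~ pred_var)
  approx_fit <- stats::lm(y ~ transformed)
  # Ties go to the raw variable, which is the simpler option.
  if (score(approx_fit) > score(raw_fit)) "transformed" else "raw"
}

set.seed(1)
pred_var <- seq(-3, 3, length.out = 50)
y <- pred_var^2 + rnorm(50, sd = 0.1)
decision <- choose_predictor_form(y, pred_var, approx_fun = function(v) v^2)
```

Because `approx_fun` is supplied by the caller, an approximation that was already computed can be reused instead of being recalculated.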

Use link function only when passed

By default it should be NULL, and the link should be extracted from the family parameter.
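
A small sketch of that default, using the `link` element that `stats` family objects carry. The helper name `resolve_link()` is hypothetical.

```r
# Hypothetical helper: use `link` when given, otherwise take it from `family`.
resolve_link <- function(family = stats::gaussian(), link = NULL) {
  # Accept both a family object and a family generator such as binomial.
  if (is.function(family)) {
    family <- family()
  }
  if (is.null(link)) family$link else link
}

default_link <- resolve_link(stats::binomial())
explicit_link <- resolve_link(stats::binomial(), link = "probit")
```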

What graphics should be available for the solution?

The plots should be plot s3 methods.

Ideas based on case:

factorMerger graphics (when used on variable)

For quantitative:

data points
pdp (ale)
pdp (ale) approximation (when used on variable)
pdp (ale) derivative on separate axis

For qualitative

Factor Merger

For xspliner
Comparison on

probs, responses (heatmaps?)

Specify which variables should not be transformed.

Some variables, mainly integer ones, have few unique values, so GAM cannot approximate them with splines. In this case we get an error and the algorithm stops. It would be great to be able to specify which variables should not be transformed.
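
Besides a user-supplied list, such variables could be detected up front. A minimal sketch, assuming a hypothetical helper `find_low_cardinality()` and an arbitrary threshold of 10 unique values (the right threshold depends on the spline basis size):

```r
# Hypothetical helper: numeric columns with too few unique values for a spline.
find_low_cardinality <- function(data, min_unique = 10) {
  is_num <- vapply(data, is.numeric, logical(1))
  n_unique <- vapply(data[is_num], function(col) length(unique(col)),
                     integer(1))
  names(n_unique)[n_unique < min_unique]
}

example_data <- data.frame(
  a = 1:20,                         # enough unique values
  b = rep(1:2, 10),                 # only two unique values
  c = factor(rep(c("u", "v"), 10))  # not numeric, ignored here
)
skip_vars <- find_low_cardinality(example_data)
```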

Add link function parameter.

It's actually quite simple with using pdp package.
When link is passed for continuous response variable right in the formula, then we just use it (transformed with link) in pdp, and for spline estimation.

When we use the family parameter (as in glm), pdp returns output transformed with the link. After that we use the variable to estimate the spline.

After all we pass raw formula link or family in final model (probably it shouldn't be mgcv::gam, glm is enough).

usage of xs on factor


xspliner's Issues
