ModelOriented / xspliner
Explain black box with GLM
Home Page: https://ModelOriented.github.io/xspliner/
Existing one is based on mgcv::gam. Should it be extended?
What can be added?
Ideas:
In some cases the result is not monotonic (this may depend on grid.resolution). Check the Boston data and the rm variable.
When passing a y ~ x + z + t formula, it could be useful to automatically consider a spline transformation for each of the predictors.
In this case, a common method and common spline parameters need to be defined.
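Here is a minimal sketch of the idea. It assumes a plain additive formula with bare variable names on the right-hand side, so transformations such as w ^ 2 and the y ~ . shortcut are not handled. The helper name wrap_all_in_xs and the choice to pass common arguments as a single string are hypothetical. The helper only writes xs() into the formula and never calls it.

```r
# Hypothetical helper (not part of xspliner): wrap every predictor of a plain
# additive formula in xs(), optionally appending common spline arguments.
wrap_all_in_xs <- function(formula, common_args = NULL) {
  response <- deparse(formula[[2]])     # keeps transformations such as log(y)
  predictors <- all.vars(formula[[3]])  # bare variable names on the right-hand side
  extra <- if (is.null(common_args)) "" else paste0(", ", common_args)
  wrapped <- paste0("xs(", predictors, extra, ")")
  # reformulate() parses the term strings into a formula without evaluating xs()
  out <- reformulate(wrapped, response = response)
  environment(out) <- environment(formula)  # keep the caller's formula environment
  out
}

f <- wrap_all_in_xs(y ~ x + z + t)
print(f)  # y ~ xs(x) + xs(z) + xs(t)

g <- wrap_all_in_xs(y ~ x + z, common_args = "spline_opts = list(k = 5)")
print(g)
```

Passing one shared argument string keeps the sketch short. A real option would more likely accept a list and apply it per variable.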
See:
formula <- log(y) ~
  xs(x, method_opts = list(type = type)) * z + xf(t) + w ^ 2 + I(z ^ 2)
extract_formula_var_names(formula, data) # gives c("y", "x", "z", "t", "w")
get_formula_raw_components(formula_terms) # gives
# c("xs(x, method_opts = list(type = type))", "z", "xf(t)", "w", "I(z^2)")
Find out what logic should be implemented here to match the z variable, which appears in more than one component (z and I(z^2)).
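One possible approach uses only base R. terms() splits the formula into raw components without evaluating them. all.vars() then lists the variable names inside each component, so every component that uses z can be found. This illustrates the idea only; it does not show how extract_formula_var_names or get_formula_raw_components are actually implemented.

```r
f <- log(y) ~ xs(x, method_opts = list(type = type)) * z + xf(t) + w ^ 2 + I(z ^ 2)

# terms() only parses the formula, so xs(), xf() and `type` need not exist here.
components <- attr(terms(f), "term.labels")

# Variable names used inside each raw component.
vars_per_component <- lapply(components, function(label) all.vars(str2lang(label)))

# Components that involve z: the bare term, I(z^2) and the xs(...):z interaction.
z_components <- components[vapply(vars_per_component, function(v) "z" %in% v, logical(1))]
print(z_components)
```

all.vars() also reports type for the xs() component, even though it is a parameter rather than a data column. That is the same data-versus-parameter ambiguity described further below.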
As a result, prediction uses the global environment instead of the formula environment.
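The following is a minimal base-R illustration of the intended behaviour, independent of xspliner internals. A formula created inside a function carries that function's environment. model.frame(), which lm() and predict() rely on, looks up non-data symbols in that environment rather than in the global one. make_formula and deg are names made up for the example.

```r
# A formula built inside a function captures that function's environment,
# so a parameter such as `deg` lives there, not in the global environment.
make_formula <- function() {
  deg <- 2
  y ~ poly(x, deg)
}
f <- make_formula()
d <- data.frame(x = 1:10, y = (1:10)^2)

# Look helper values up in environment(f), not in globalenv():
deg_value <- eval(quote(deg), envir = environment(f))

fit <- lm(f, data = d)                               # `deg` is found via environment(f)
pred <- predict(fit, newdata = data.frame(x = 11:12)) # prediction uses the same environment
print(pred)
```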
I think I should. There is no good reason to use mgcv::gam for the final model when it contains no splines.
Actually there is one con: two gam models are easy to compare.
Maybe I should add a parameter for choosing between them?
When the formula y ~ . is used in xspliner, the full formula is built automatically from all variables in the data. However, the black box (bb) could have been built on a smaller set of variables.
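The sketch below restricts the y ~ . expansion to the variables the black box actually uses. It assumes the black box exposes a terms object, which holds for lm() and many formula-interface models but not for all of them. Here an lm() fit stands in for the black box.

```r
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
bb <- lm(y ~ x1 + x2, data = d)  # stand-in black box that ignores x3

all_labels <- attr(terms(y ~ ., data = d), "term.labels")  # every column except y
bb_labels  <- attr(terms(bb), "term.labels")                # columns the black box used

# Keep only the predictors the black box was trained on.
f_restricted <- reformulate(intersect(all_labels, bb_labels), response = "y")
print(f_restricted)  # y ~ x1 + x2
```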
Integrate the code with DALEX (the result of the explain function).
Check the ALEPlot article.
Temporarily move them into deprecated.R and do not export them. Once they are adapted to the new library version, they will be moved back into the current scripts.
Consider which approximation is better.
See: approx_with_monotonic_spline
Maybe a parameter: monotonic = c("increase", "decrease", "automatic", "none")
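This sketch shows how such a monotonic option could be resolved. "automatic" picks the direction from the sign of the rank correlation. One substitution is named plainly here: isotonic regression (stats::isoreg) stands in for a monotonic spline, so this is not approx_with_monotonic_spline. fit_monotonic is a hypothetical name.

```r
# Hypothetical helper: resolve monotonic = c("increase", "decrease",
# "automatic", "none") and fit a monotone approximation.
# stats::isoreg (isotonic regression) stands in for a monotonic spline.
fit_monotonic <- function(x, y, monotonic = c("increase", "decrease", "automatic", "none")) {
  monotonic <- match.arg(monotonic)
  if (monotonic == "none") return(y)  # no constraint requested
  if (monotonic == "automatic") {
    # choose the direction from the sign of the rank correlation
    monotonic <- if (cor(x, y, method = "spearman") >= 0) "increase" else "decrease"
  }
  sgn <- if (monotonic == "increase") 1 else -1
  o <- order(x)
  fitted <- numeric(length(y))
  # isoreg fits a non-decreasing step function; negating y handles "decrease"
  fitted[o] <- sgn * isoreg(x[o], sgn * y[o])$yf
  fitted
}

x <- 1:6
y <- c(1, 3, 2, 4, 6, 5)
fit_monotonic(x, y, "automatic")  # 1.0 2.5 2.5 4.0 5.5 5.5
```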
Based on #6
If models are passed to compare_with, add an option to plot their PDPs on the same plot as well.
Currently this is not general: only the pdp package can be used, with type = "pdp".
How can this be extended?
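One possible generalization is sketched below under a few assumptions. A named list maps each type to a function that computes the response curve, so new methods can be registered without touching the calling code. response_methods and approx_response are hypothetical names. The pdp entry is a hand-written partial dependence (the average prediction with the variable fixed at each grid value), not a call to the pdp package.

```r
# Hypothetical registry: type name -> function(model, data, var, grid) returning
# one response value per grid point.
response_methods <- list(
  pdp = function(model, data, var, grid) {
    vapply(grid, function(v) {
      data[[var]] <- v                      # fix the variable at one grid value
      mean(predict(model, newdata = data))  # average prediction over observations
    }, numeric(1))
  }
)

approx_response <- function(model, data, var, type = "pdp", grid_size = 10) {
  method <- response_methods[[type]]
  if (is.null(method)) stop("Unknown type: ", type)
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = grid_size)
  data.frame(x = grid, yhat = method(model, data, var, grid))
}

set.seed(1)
d <- data.frame(x1 = runif(30), x2 = runif(30))
d$y <- 2 * d$x1 + 0.5 * d$x2 + rnorm(30, sd = 0.1)
fit <- lm(y ~ x1 + x2, data = d)
res <- approx_response(fit, d, "x1")
```

Adding another method would then be a single assignment to response_methods, for example an ALE implementation.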
There is a lot of code that is used only for testing.
Moving it into examples regularly would provide good usage examples.
Description: During formula preprocessing, the parent.frame environment is used, so it can become huge (e.g. a large global environment).
Let's use only the variables passed as parameters in the raw formula.
Idea for a solution: use the all.vars function to get their names.
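Here is a minimal sketch of the all.vars idea. It copies only the symbols the formula refers to, and that the data does not supply, into a small fresh environment. with_minimal_env is a hypothetical name. Parenting the new environment on globalenv() is an assumption, made so that functions such as xs() still resolve through the search path.

```r
# Hypothetical helper (not xspliner's API): keep only the objects the formula
# refers to, instead of the whole parent.frame().
with_minimal_env <- function(formula, data = NULL) {
  source_env <- environment(formula)
  # all.vars() lists symbols used in the formula, skipping function names
  # (xs, list) and argument names (spline_opts, k).
  needed <- setdiff(all.vars(formula), names(data))
  small_env <- new.env(parent = globalenv())
  for (name in needed) {
    if (exists(name, envir = source_env)) {
      assign(name, get(name, envir = source_env), envir = small_env)
    }
  }
  environment(formula) <- small_env
  formula
}

make_formula <- function() {
  oko <- 10             # parameter referenced by the formula
  big <- seq_len(1e6)   # unrelated object that should not be carried along
  y ~ xs(x, spline_opts = list(k = oko))
}
f <- with_minimal_env(make_formula(), data = data.frame(x = 1:3, y = 1:3))
ls(environment(f))  # "oko"
```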
It could be based on variable type.
The functionality is based on iterating over all variables that may be found in the data. If no data is passed, it is not determined which variables come from the data and which are parameters. Find a way to distinguish data from parameters.
See:
x <- rnorm(10)
y <- rnorm(10)
oko <- 10
get_formula_details(y ~ xs(x, spline_opts = list(k = oko)))
Idea: assume that data variables are vectors of the same length. This does not cover all cases, but it covers most of them.
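A minimal sketch of this heuristic follows. It assumes the first variable returned by all.vars() is the response, and that data variables are atomic vectors of the response's length. Anything else counts as a parameter. split_data_and_params is a hypothetical name, and the usage mirrors the example above.

```r
# Hypothetical helper: variables that are atomic vectors with the same length
# as the response are treated as data; everything else as parameters.
split_data_and_params <- function(formula, env = environment(formula)) {
  vars <- all.vars(formula)
  values <- lapply(vars, get, envir = env)
  n <- length(values[[1]])  # assumption: the first variable is the response
  is_data <- vapply(values, function(v) is.atomic(v) && length(v) == n, logical(1))
  list(data = vars[is_data], params = vars[!is_data])
}

x <- rnorm(10)
y <- rnorm(10)
oko <- 10
res <- split_data_and_params(y ~ xs(x, spline_opts = list(k = oko)))
res  # data: "y" "x"; params: "oko"
```

A parameter that happens to be a vector of the same length as the data would be misclassified. This is one of the cases the heuristic does not cover.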
Consider "pseudo r-squared" https://christophm.github.io/interpretable-ml-book/global.html#theory-4
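The linked chapter measures how well a surrogate replicates the black box with an R-squared. It computes this against the black box predictions rather than the observed response: R2 = 1 - SSE / SST. Here SSE sums the squared differences between surrogate and black box predictions, and SST sums the squared deviations of the black box predictions from their mean. Below is a small sketch of that measure; surrogate_r2 is a hypothetical name.

```r
# R-squared of a surrogate measured against the black box predictions:
# R2 = 1 - SSE / SST, where
#   SSE = sum((surrogate - blackbox)^2)
#   SST = sum((blackbox - mean(blackbox))^2)
surrogate_r2 <- function(blackbox_pred, surrogate_pred) {
  sse <- sum((surrogate_pred - blackbox_pred)^2)
  sst <- sum((blackbox_pred - mean(blackbox_pred))^2)
  1 - sse / sst
}

surrogate_r2(c(1, 2, 3, 4), c(1, 2, 3, 4))  # 1: perfect replication
surrogate_r2(c(1, 2, 3, 4), rep(2.5, 4))    # 0: no better than the mean
```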
Currently only the bare transformation is plotted, which makes the interpretation inaccurate at the scale level.
It would be great to add a flag for this to the plot method.
It was implemented in the previous version.
The idea is to compare the performance of:
lm(y ~ pred_var)
and lm(y ~ approx(pred_var))
and choose the better option.
Important:
Make the decision rule general (passed as a parameter?).
NOTE:
How can this be done without recalculating the response approximation?
Idea:
It could be a parameter passed to xs and xf, for example choose = "automatic", which performs the decision in the backend.
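This sketches the comparison of lm(y ~ pred_var) with lm(y ~ approx(pred_var)). It assumes the approximated values have already been computed, so the response approximation is not recalculated. The decision rule is a function argument, with AIC (lower is better) as an assumed default. choose_transformation is a hypothetical name; it only suggests what could sit behind an option like choose = "automatic".

```r
# Hypothetical helper: `approx_values` are precomputed values of approx(pred_var);
# `rule` scores a fitted model, lower is better (AIC by default, BIC also works).
choose_transformation <- function(y, pred_var, approx_values, rule = AIC) {
  bare <- lm(y ~ pred_var)
  transformed <- lm(y ~ approx_values)
  if (rule(transformed) < rule(bare)) "transformed" else "bare"
}

set.seed(1)
x <- seq(0, 3, length.out = 50)
y_exp <- exp(x) + rnorm(50, sd = 0.1)
y_lin <- 2 * x + rnorm(50, sd = 0.1)

choice_exp <- choose_transformation(y_exp, x, exp(x))  # expected: "transformed"
choice_lin <- choose_transformation(y_lin, x, x^3)     # expected: "bare"
```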
By default it should be NULL, and the link should be extracted from the family parameter.
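In base R, family objects such as binomial() already store the link name and the link functions, so a NULL default can fall back on them. resolve_link is a hypothetical helper for illustration.

```r
# Hypothetical helper: use `link` when given, otherwise take it from `family`.
resolve_link <- function(link = NULL, family = gaussian()) {
  if (!is.null(link)) return(link)
  if (is.function(family)) family <- family()  # allow family = binomial, as glm() does
  family$link
}

resolve_link(family = binomial())   # "logit"
resolve_link(family = poisson)      # "log"
resolve_link("probit", binomial())  # "probit"

# Family objects also carry the transformation itself and its inverse:
binomial()$linkfun(0.5)  # 0
binomial()$linkinv(0)    # 0.5
```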
The plots should be S3 plot methods.
Ideas based on case:
For quantitative:
For qualitative
For xspliner
Comparison on
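Here is a minimal sketch of an S3 plot method. The class name curve_comparison and its constructor are hypothetical. Base graphics (matplot) are used only to keep the example self-contained.

```r
# Hypothetical constructor for a comparison of response curves on a shared grid.
new_curve_comparison <- function(x, curves) {
  structure(list(x = x, curves = curves), class = "curve_comparison")
}

# S3 method: plot() dispatches here for objects of class "curve_comparison".
plot.curve_comparison <- function(x, ...) {
  curves <- as.matrix(as.data.frame(x$curves))
  styles <- seq_len(ncol(curves))
  matplot(x$x, curves, type = "l", lty = styles, col = styles,
          xlab = "x", ylab = "response", ...)
  legend("topleft", legend = colnames(curves), lty = styles, col = styles)
  invisible(x)
}

grid <- seq(0, 1, length.out = 20)
cmp <- new_curve_comparison(grid, list(black_box = grid^2, surrogate = grid))
plot(cmp)
```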
Some variables, mainly integer ones, have few unique values, so GAM cannot approximate them with splines. In this case an error is raised and the algorithm stops. It would be great to be able to specify which variables should not be transformed.
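This sketch detects such variables up front, so they can be left untransformed instead of stopping the algorithm. vars_too_few_values is a hypothetical name. The threshold k = 10 is an assumption meant to match the default basis dimension of a one-dimensional mgcv s() term.

```r
# Hypothetical helper: numeric columns with fewer unique values than the spline
# basis dimension k; these could be kept untransformed.
vars_too_few_values <- function(data, k = 10) {
  is_num <- vapply(data, is.numeric, logical(1))
  n_unique <- vapply(data, function(col) length(unique(col)), integer(1))
  names(data)[is_num & n_unique < k]
}

set.seed(1)
d <- data.frame(
  a = rnorm(50),                   # continuous: many unique values
  b = rep(1:3, length.out = 50),   # integer with only 3 unique values
  c = letters[rep(1:5, 10)]        # not numeric: never a spline candidate
)
vars_too_few_values(d)  # "b"
```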
It's actually quite simple using the pdp package.
When a link is applied to a continuous response variable directly in the formula, we just use the response transformed with that link, both in pdp and for spline estimation.
When the family parameter is used (as in glm), pdp returns output transformed with the link. After that, we use this variable to estimate the spline.
Finally, we pass the raw formula link or the family to the final model (which probably shouldn't be mgcv::gam; glm is enough).
It may be easy after #38 is done.
Current ideas: