xspliner's People

Contributors

krystian8207, pbiecek

xspliner's Issues

What graphics should be available for the solution?

The plots should be implemented as S3 plot methods.

Ideas based on case:

  • factorMerger graphics (when used on variable)

For quantitative:

  • data points
  • pdp (ale)
  • pdp (ale) approximation (when used on variable)
  • pdp (ale) derivative on separate axis

For qualitative:

  • Factor Merger

For xspliner, comparison on:

  • probs, responses (heatmaps?)

Extract useful functions from previous code version.

Temporarily move them into deprecated.R and do not export them. Once adapted to the new version of the library, they will be moved into the current scripts.

  1. Approx as monotonic spline.
  2. Approx with partially constant function.
  3. factorMerger approx

Specify which variables should not be transformed.

Some variables, mainly integer ones, have few unique values, so GAM cannot approximate them with splines. In this case we get an error and the algorithm stops. It would be great to be able to specify which variables should not be transformed.
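
One way to find candidates for such an exclusion list is to count unique values per column. This is a minimal sketch in base R; the helper name `low_cardinality_vars` and the `min_unique` threshold are hypothetical and not part of xspliner:

```r
# Hypothetical helper: names of numeric columns with too few unique
# values to be worth approximating with a spline.
low_cardinality_vars <- function(data, min_unique = 10) {
  is_low <- vapply(
    data,
    function(col) is.numeric(col) && length(unique(col)) < min_unique,
    logical(1)
  )
  names(data)[is_low]
}

df <- data.frame(
  cyl = rep(c(4, 6, 8), length.out = 30),  # 3 unique values
  wt = seq(1, 5, length.out = 30),         # 30 unique values
  grp = rep(c("a", "b"), 15)               # not numeric, ignored
)
skip <- low_cardinality_vars(df)
```

The result could serve as a default for a user-facing "do not transform" argument, which the user should still be able to override.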

Basic fixes

  • add authors (Przemysław Biecek)
  • lower package version
  • rename functions
  • complete NEWS file

Include estimated coefficients in plot

Currently only the bare transformation is plotted, which makes interpretation inaccurate at the scale level.
It would be great to add a flag for this to the plot method.

Automatic decision if xs or xf should be used or raw variable

It was implemented in the previous version.
The idea is to compare the performance of:
lm(y ~ pred_var) and lm(y ~ approx(pred_var)) and choose better option.

Important:
Make the decision rule general (passed as a parameter?).

NOTE:
How to do it without recalculating response approximation?

Idea:
It can be a parameter passed to xs and xf, for example choose = "automatic", that performs the decision in the backend.
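
The comparison described above could look roughly like the sketch below. All names are hypothetical: `choose_predictor` is not an xspliner function, `approx_fun` stands in for the already computed response approximation (so nothing is recalculated here), and the AIC-based default rule is only an assumption to show how the decision rule can be passed as a parameter:

```r
# Hypothetical sketch: decide between the raw and the transformed predictor.
# `approx_fun` stands in for the already-fitted response approximation.
# `better` is the pluggable decision rule; the AIC default is an assumption.
choose_predictor <- function(y, pred_var, approx_fun,
                             better = function(raw_fit, approx_fit) {
                               stats::AIC(approx_fit) < stats::AIC(raw_fit)
                             }) {
  approx_var <- approx_fun(pred_var)
  raw_fit <- stats::lm(y ~ pred_var)
  approx_fit <- stats::lm(y ~ approx_var)
  if (better(raw_fit, approx_fit)) "transformed" else "raw"
}

set.seed(1)
x <- seq(-3, 3, length.out = 100)
y <- x^2 + rnorm(100, sd = 0.1)
decision_quadratic <- choose_predictor(y, x, function(v) v^2)
decision_identity <- choose_predictor(y, x, identity)
```

With the strict inequality, ties (such as the identity transformation) fall back to the raw variable.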

Add variables extracting from blackbox

When we use the formula y ~ . in xspliner, the full formula is built automatically from all variables in data. But the black box (bb) could have been built on a smaller set of variables.
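
For black boxes that store a terms object (as lm, glm and many formula-based models do), the training variables can be read back from the model. This is a sketch under that assumption; `model_vars` is a hypothetical helper, and models without a terms method would need separate handling:

```r
# Fit a stand-in "black box" on a subset of the columns of mtcars.
black_box <- stats::lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical helper: variables the model was actually trained on.
model_vars <- function(model) all.vars(stats::terms(model))

bb_vars <- model_vars(black_box)
predictors <- bb_vars[-1]  # drop the response (first element)
```

Expanding y ~ . against `predictors` instead of all columns of data would keep the surrogate on the same variable set as the black box.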

Define S3 summary method

The existing one is based on mgcv::gam. Should it be extended?
What can be added?

Ideas:

  • performance comparison with bare gam and bare blackbox?

Include only formula variables in its Environment

Description: During formula preprocessing the parent.frame environment is used, so the formula's environment can become huge (for example, a huge global env).
Let's keep only the variables used as parameters of the raw formula.

Idea how to solve: use all.vars function to get the names.
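
A minimal sketch of that idea: `all.vars` returns the variable names used in the formula (function names such as xs are skipped), and only those objects are copied into a fresh environment. The helper `slim_formula_env` and its `parent` argument are hypothetical; the enclosing environment is set to the global environment here so that functions like xs can still be resolved, which is an assumption about how the formula is evaluated later:

```r
# Hypothetical helper: rebind a formula to a small environment holding
# only the objects its variables refer to.
slim_formula_env <- function(formula, parent = globalenv()) {
  source_env <- environment(formula)
  vars <- all.vars(formula)
  # Keep only names that actually exist; data columns may be absent here.
  found <- vars[vapply(vars, exists, logical(1), envir = source_env)]
  environment(formula) <- list2env(
    mget(found, envir = source_env, inherits = TRUE),
    parent = parent
  )
  formula
}

make_formula <- function() {
  unused <- rnorm(1e5)  # would be kept alive by the default environment
  x <- rnorm(10)
  y <- rnorm(10)
  k_val <- 5
  y ~ xs(x, spline_opts = list(k = k_val))
}
slim <- slim_formula_env(make_formula())
kept <- ls(environment(slim))
```

After the call, `kept` holds only the names used in the formula, and `unused` is no longer reachable through it.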

Package crashes on duplicated variables

See:

formula <- log(y) ~
    xs(x, method_opts = list(type = type)) * z + xf(t) + w ^ 2 + I(z ^ 2)
extract_formula_var_names(formula, data) # gives c("y", "x", "z", "t", "w")
get_formula_raw_components(formula_terms) # gives
#   c("xs(x, method_opts = list(type = type))", "z", "xf(t)", "w", "I(z^2)")

Find out what logic should be implemented here to match the z variable.

Make functionality work without passing data (raw variables from env)

The functionality is based on iterating over all variables that should possibly be found in data. If you pass no data, it is not determined which variables are "data sourced" and which are parameters. Find a way to distinguish data from parameters.

See:

x <- rnorm(10)
y <- rnorm(10)
oko <- 10
get_formula_details(y ~ xs(x, spline_opts = list(k = oko)))

Idea? Assume that data variables are vectors of the same length. It doesn't cover all cases, but a large part of them.
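
The same-length assumption can be sketched on the example above. The helper `split_data_and_params` is hypothetical; it uses the response (the first name returned by `all.vars` for a two-sided formula) as the reference length:

```r
x <- rnorm(10)
y <- rnorm(10)
oko <- 10

# Hypothetical helper: treat a variable as data if it is an atomic vector
# of the same length as the response; everything else is a parameter.
split_data_and_params <- function(formula, env = environment(formula)) {
  vars <- all.vars(formula)
  n <- length(get(vars[1], envir = env))  # vars[1] is the response
  is_data <- vapply(vars, function(v) {
    value <- get(v, envir = env)
    is.atomic(value) && length(value) == n
  }, logical(1))
  list(data = vars[is_data], params = vars[!is_data])
}

parts <- split_data_and_params(y ~ xs(x, spline_opts = list(k = oko)))
# parts$data is c("y", "x"); parts$params is "oko"
```

The rule misclassifies a parameter that happens to have the same length as the response, for instance any scalar parameter when the response has length one.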

Add link function parameter.

It's actually quite simple using the pdp package.
When a link is applied to a continuous response variable right in the formula, we just use the response (transformed with the link) in pdp and for spline estimation.

When we use the family parameter (as in glm), pdp returns output transformed with the link. After that we use this output to estimate the spline.

Finally, we pass the raw formula link or family to the final model (it probably doesn't need to be mgcv::gam; glm is enough).

Add option for automatic monotonicity

We could compare which approximation is better.
See: approx_with_monotonic_spline
Maybe a parameter: monotonic = c("increase", "decrease", "automatic", "none")
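
A rough sketch of what the "automatic" option could do. It swaps in isotonic regression (`stats::isoreg`) as a simple stand-in for approx_with_monotonic_spline, fits both directions, and keeps the one with the smaller residual sum of squares. The function name `choose_monotonic_direction` is hypothetical:

```r
# Hypothetical sketch of monotonic = "automatic": fit an increasing and a
# decreasing isotonic regression and keep the direction with the smaller
# residual sum of squares.
choose_monotonic_direction <- function(x, y) {
  y_sorted <- y[order(x)]
  inc_fit <- stats::isoreg(y_sorted)   # best non-decreasing fit to y
  dec_fit <- stats::isoreg(-y_sorted)  # non-decreasing fit to -y
  sse_inc <- sum((y_sorted - inc_fit$yf)^2)
  sse_dec <- sum((-y_sorted - dec_fit$yf)^2)
  if (sse_inc <= sse_dec) "increase" else "decrease"
}

x <- 1:20
direction_up <- choose_monotonic_direction(x, x^2)
direction_down <- choose_monotonic_direction(x, -x)
```

The same comparison could be made with the monotonic spline fits themselves; isotonic regression is used here only because it is available in base R.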
