drsimonj / pipelearner
Tidy machine learning pipelines
I have another issue. This looks to be triggered by the contains("rsquare") part.
results %>%
add_rsquare() %>%
select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>%
mutate(source = gsub("rsquare_", "", source)) %>%
ggplot(aes(cv_pairs.id, rsquare, color = source)) +
geom_point() +
labs(x = "Fold", y = "R Squared")
Error in .p(.x[[i]], ...) : argument ".y" is missing, with no default
traceback()
20: .p(.x[[i]], ...)
19: isTRUE(.p(.x[[i]], ...))
18: some(.x, identical, .y)
17: contains("rsquare")
16: eval(expr, envir, enclos)
15: eval(x$expr, data, x$env)
14: FUN(X[[i]], ...)
13: lapply(x, lazy_eval, data = data)
12: lazyeval::lazy_eval(args, names_list)
11: select_vars_(names(.data), dots)
10: select_.data.frame(.data, .dots = lazyeval::lazy_dots(...))
9: select_(.data, .dots = lazyeval::lazy_dots(...))
8: select(., cv_pairs.id, contains("rsquare"))
7: function_list[[i]](value)
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: results %>% add_rsquare() %>% select(cv_pairs.id, contains("rsquare")) %>%
gather(source, rsquare, contains("rsquare")) %>% mutate(source = gsub("rsquare_",
"", source)) %>% ggplot(aes(cv_pairs.id, rsquare, color = source))
What do you think is causing this?
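The traceback hints at a likely cause: frame 18 shows contains("rsquare") resolving to some(.x, identical, .y), which is purrr code rather than the dplyr select helper. Older purrr releases exported their own contains(), so attaching purrr after dplyr would mask the helper. A minimal sketch of a workaround, assuming that masking is indeed the cause, is to namespace the helper explicitly. The data frame below is made up for illustration and only stands in for the output of add_rsquare().

```r
library(dplyr)

# Hypothetical stand-in for the output of add_rsquare()
results <- data.frame(
  cv_pairs.id   = c("01", "02"),
  rsquare_train = c(0.81, 0.79),
  rsquare_test  = c(0.74, 0.77),
  other         = c(1, 2)
)

# dplyr::contains() is unambiguous even if another attached package
# also exports a function called contains()
picked <- results %>%
  select(cv_pairs.id, dplyr::contains("rsquare"))

names(picked)
```

The same namespacing would apply to the contains() call inside gather().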
Currently, predict.pipelearner assumes that calling predict on each fit will return a single vector of values. However, this isn't always the case. For example, with default settings, predict.rpart on a classification tree returns a matrix of predicted probabilities (one column per class).
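To make the difference concrete, here is a small sketch using rpart directly on iris (not pipelearner itself). It assumes the response is a factor, so that rpart fits a classification tree.

```r
library(rpart)

# Classification tree: Species is a factor, so method = "class" is inferred
fit <- rpart(Species ~ ., data = iris)

# Default prediction for a classification tree: class probabilities,
# one row per observation and one column per class
probs <- predict(fit, iris)
dim(probs)

# Asking for classes instead gives a single factor vector
classes <- predict(fit, iris, type = "class")
length(classes)
```

Any code that expects one value per observation would need the second form, or would have to pick a column from the first.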
pl <- d %>% pipelearner(rpart, am ~ .,
minsplit = c(2, 20),
maxdepth = c(2, 5),
xval = c(5, 10))
pl %>%
learn() %>%
mutate(
minsplit = map_dbl(params, "minsplit"),
maxdepth = map_dbl(params, "maxdepth"),
xval = map_dbl(params, "xval"),
accuracy_train = pmap_dbl(list(fit, train, target), accuracy),
accuracy_test = pmap_dbl(list(fit, test, target), accuracy)
) %>% select(minsplit, maxdepth, xval, contains("accuracy"))
Error in select(., minsplit, maxdepth, xval, contains("accuracy")) :
unused arguments (minsplit, maxdepth, xval, contains("accuracy"))
R 3.3.2 / Windows 7
The current version of purrr (0.2.3) throws the following error:
`cross_d()` is deprecated; please use `cross_df()` instead.
As I was going through your example, I ran into this issue:
> results <- d %>%
pipelearner(lm, visib ~ .) %>%
learn_cvpairs(k = 10) %>%
learn()
Error: 'as_tibble' is not an exported object from 'namespace:tibble'
traceback()
12: stop(gettextf("'%s' is not an exported object from 'namespace:%s'",
name, getNamespaceName(ns)), call. = FALSE, domain = NA)
11: getExportedValue(pkg, name)
10: tibble::as_tibble
9: pipelearner.data.frame(., lm, visib ~ .)
8: pipelearner(., lm, visib ~ .)
7: function_list[[i]](value)
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: d %>% pipelearner(lm, visib ~ .) %>% learn_cvpairs(k = 10) %>%
learn()
It turned out that I needed a new version of tibble. I thought I got that when I installed tidyverse but I guess I was incorrect.
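For anyone hitting the same message, a quick check along these lines shows whether the installed tibble exports as_tibble() at all. This is only a diagnostic sketch; it assumes tibble is installed and does not say which version pipelearner requires.

```r
# Which tibble version is installed?
installed_version <- packageVersion("tibble")

# Does that version export as_tibble()?
has_as_tibble <- "as_tibble" %in% getNamespaceExports("tibble")

if (!has_as_tibble) {
  message('tibble ', installed_version,
          ' does not export as_tibble(); try install.packages("tibble")')
}
```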
Currently, predict.pipelearner() applies the default predict() to each fit. However, sometimes this needs to be adjusted, for example so that predict produces probabilities or classes.
In general, a user will want to predict values and score/evaluate their fits after learning all models via learn(). There are many possible functions for doing this. However, pipeable functions could be written that take the tibble coming from learn() together with a function that takes the relevant columns (e.g., test, target, and fit) and outputs the predicted values. It would then be the user's responsibility to write a function that accepts these arguments.
For example:
pl %>% learn() %>%
pl_predict("test_hat", FUN = function(test, target, fit) {
# etc... to produce vectors of fitted values
}) %>%
pl_score("test_rsqr", FUN = function(test, target, test_hat) {
# etc...
})
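A rough sketch of how such helpers might look is given below. pl_predict and pl_score do not exist in pipelearner; they are hypothetical names taken from the proposal above, and the tibble called learned is a hand-built stand-in for the output of learn(). The helpers read the argument names of FUN to decide which columns to pass, row by row.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: add a list-column `name` by calling FUN once per row.
# The argument names of FUN select which columns it receives.
pl_predict <- function(.data, name, FUN) {
  args <- .data[names(formals(FUN))]
  mutate(.data, !!name := pmap(args, FUN))
}

# Hypothetical helper: same idea, but FUN must return one number per row
pl_score <- function(.data, name, FUN) {
  args <- .data[names(formals(FUN))]
  mutate(.data, !!name := pmap_dbl(args, FUN))
}

# Stand-in for the tibble returned by learn(): one row per fitted model
learned <- tibble(
  target = "mpg",
  test   = list(mtcars[17:32, ], mtcars[1:16, ]),
  fit    = list(lm(mpg ~ wt, data = mtcars[1:16, ]),
                lm(mpg ~ wt, data = mtcars[17:32, ]))
)

scored <- learned %>%
  pl_predict("test_hat", FUN = function(test, target, fit) {
    predict(fit, newdata = test)
  }) %>%
  pl_score("test_rsqr", FUN = function(test, target, test_hat) {
    # Squared correlation between observed and predicted values
    cor(test[[target]], test_hat)^2
  })
```

Matching columns by argument name keeps the helpers generic, at the cost of requiring the user's function to use the exact column names.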
The functions in resamplr also use modelr::resample objects and include all cross validation methods from scikit-learn.
It looks like learn_cvpairs can be written the same way as learn_models with arbitrary cross validation functions as long as the cross validation function returns a df with train, test, and .id columns (which is the case with resamplr).
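As an illustration of the data frame shape described above, the sketch below uses modelr::crossv_kfold() as a stand-in, since it also returns train, test, and .id columns holding resample objects. It does not use resamplr or pipelearner, and how learn_cvpairs would accept such a function is left open.

```r
library(modelr)
library(purrr)

set.seed(1)

# One row per fold, with columns train, test and .id
folds <- crossv_kfold(mtcars, k = 4)
names(folds)

# Fit one model per training resample
fits <- map(folds$train, ~ lm(mpg ~ wt, data = .x))

# Score each model on its matching test resample
rmse_test <- map2_dbl(fits, folds$test, modelr::rmse)
```

Any cross validation function returning the same three columns could be swapped in for crossv_kfold() here.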
I ran into an issue using gam from the mgcv package with pipelearner. To illustrate, lm can be used like this:
iris %>%
lm(Sepal.Length ~ Sepal.Width, .)
and so can be piped into pipelearner:
iris %>%
pipelearner(lm, Sepal.Length ~ Sepal.Width) %>%
learn()
However, gam requires "data = " explicitly. The analogous expression to lm fails:
iris %>%
gam(Sepal.Length ~ s(Sepal.Width), .)
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found
iris %>%
pipelearner(gam, Sepal.Length ~ s(Sepal.Width)) %>%
learn()
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found
This works:
iris %>%
gam(Sepal.Length ~ s(Sepal.Width), data = .)
Is there a syntax which will allow pipelearner to run gam, or does pipelearner require modification?
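One possible workaround, sketched here without pipelearner, is a thin wrapper that takes the formula and data positionally, as lm does, and forwards the data to gam by name. The second positional argument of mgcv::gam is family, not data, which is why the unnamed form fails. The wrapper name gam_fd is made up for this example.

```r
library(mgcv)

# Hypothetical wrapper: accepts (formula, data) positionally, like lm(),
# and passes data to gam() as a named argument
gam_fd <- function(formula, data, ...) {
  mgcv::gam(formula, data = data, ...)
}

fit <- gam_fd(Sepal.Length ~ s(Sepal.Width), iris)
class(fit)
```

If pipelearner passes the data as the second positional argument, then pipelearner(gam_fd, Sepal.Length ~ s(Sepal.Width)) might work in place of gam, but that is an assumption about its internals and is untested here.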