drsimonj / pipelearner

134.0 134.0 26.0 247 KB

Tidy machine learning pipelines

R 100.00%

pipelearner's Issues

Issue running example code

I have another issue. It looks to be triggered by the contains("rsquare") part.

results %>%
  add_rsquare() %>%
  select(cv_pairs.id, contains("rsquare")) %>%
  gather(source, rsquare, contains("rsquare")) %>%
  mutate(source = gsub("rsquare_", "", source)) %>%
  ggplot(aes(cv_pairs.id, rsquare, color = source)) +
  geom_point() +
  labs(x = "Fold", y = "R Squared")

Error in .p(.x[[i]], ...) : argument ".y" is missing, with no default

traceback()
20: .p(.x[[i]], ...)
19: isTRUE(.p(.x[[i]], ...))
18: some(.x, identical, .y)
17: contains("rsquare")
16: eval(expr, envir, enclos)
15: eval(x$expr, data, x$env)
14: FUN(X[[i]], ...)
13: lapply(x, lazy_eval, data = data)
12: lazyeval::lazy_eval(args, names_list)
11: select_vars_(names(.data), dots)
10: select_.data.frame(.data, .dots = lazyeval::lazy_dots(...))
9: select_(.data, .dots = lazyeval::lazy_dots(...))
8: select(., cv_pairs.id, contains("rsquare"))
7: function_list[[i]](value)
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: results %>% add_rsquare() %>% select(cv_pairs.id, contains("rsquare")) %>%
       gather(source, rsquare, contains("rsquare")) %>% mutate(source = gsub("rsquare_",
       "", source)) %>% ggplot(aes(cv_pairs.id, rsquare, color = source))

What do you think is causing this?
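
One possible cause, judging from the traceback: frame 18, some(.x, identical, .y), matches the body of the older purrr helper that was also called contains(), not the dplyr select helper. If purrr was attached after dplyr, its contains() would mask the one select() expects. A minimal sketch of a workaround, assuming that diagnosis is right, is to namespace-qualify the helper. The data frame below is made up for illustration and only mimics the column names used above.

```r
library(dplyr)

# Made-up stand-in for the output of add_rsquare(); column names are assumptions.
results <- data.frame(
  cv_pairs.id   = c("01", "02", "03"),
  rsquare_train = c(0.81, 0.79, 0.83),
  rsquare_test  = c(0.74, 0.70, 0.77),
  n_obs         = c(30, 30, 30)
)

# Qualifying the helper with dplyr:: avoids picking up a same-named
# function from another attached package.
picked <- results %>%
  dplyr::select(cv_pairs.id, dplyr::contains("rsquare"))

names(picked)
```

The same qualification would apply to the contains() call inside gather().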

predict shouldn't expand results

Currently, predict.pipelearner assumes that calling predict on each fit returns a single vector of values. This isn't always the case. For example, predict.rpart on a classification tree returns, by default, a matrix of predicted class probabilities rather than a vector.
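
To make the shape difference concrete, here is a small sketch using rpart directly (not pipelearner). The use of mtcars and the am column is only for illustration.

```r
library(rpart)

d <- mtcars
d$am <- factor(d$am)  # a factor response gives a classification tree

fit <- rpart(am ~ ., data = d, method = "class")

# Default prediction for a classification tree: one column of
# probabilities per class, one row per observation.
probs <- predict(fit, newdata = d)

# Asking for classes instead returns a single factor vector.
classes <- predict(fit, newdata = d, type = "class")

dim(probs)
length(classes)
```

Any code that assumes one value per row would need to handle the two-dimensional result separately.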

Error

pl <- d %>% pipelearner(rpart, am ~ .,
                        minsplit = c(2, 20),
                        maxdepth = c(2, 5),
                        xval     = c(5, 10))
pl %>%
  learn() %>% 
  mutate(
    minsplit = map_dbl(params, "minsplit"),
    maxdepth = map_dbl(params, "maxdepth"),
    xval     = map_dbl(params, "xval"),
    accuracy_train = pmap_dbl(list(fit, train, target), accuracy),
    accuracy_test  = pmap_dbl(list(fit, test,  target), accuracy)
  ) %>% select(minsplit, maxdepth, xval, contains("accuracy"))
Error in select(., minsplit, maxdepth, xval, contains("accuracy")) : 
  unused arguments (minsplit, maxdepth, xval, contains("accuracy"))
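
An "unused arguments" error from select() usually suggests that a select() from some other attached package is being called instead of the dplyr one (MASS, for example, also exports a select()). That is a guess from the error text, not something the report confirms. A short sketch of how one might check, and how to call the dplyr version explicitly:

```r
library(dplyr)

# List every attached package that defines something called `select`.
# If more than one shows up, the first on the search path wins.
find("select")

# Calling the dplyr version explicitly sidesteps any masking.
d <- mtcars[1:5, ]
out <- d %>% dplyr::select(mpg, dplyr::contains("a"))
names(out)
```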

Environment: R 3.3.2 on Windows 7.

Fix deprecated purrr function

The current version of purrr (0.2.3) throws the following error:

`cross_d()` is deprecated; please use `cross_df()` instead.
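
For context, cross_d() and its replacement cross_df() build a data frame with one row per combination of the supplied values, which is how a hyperparameter grid gets expanded. As far as I know, later purrr releases have deprecated cross_df() as well, so the sketch below deliberately swaps in base R's expand.grid() to show the same idea without depending on a purrr version. The parameter names are borrowed from the rpart example elsewhere in this list.

```r
# Hypothetical hyperparameter lists, for illustration only.
params <- list(minsplit = c(2, 20), maxdepth = c(2, 5))

# Base R: one row per combination of the list elements.
grid <- expand.grid(params, stringsAsFactors = FALSE)

nrow(grid)
names(grid)
```

This is not a drop-in replacement inside pipelearner itself, and the row order of expand.grid() may differ from the purrr functions.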

Version limits required for dependencies - tibble

While going through your example, I ran into this issue:

results <- d %>%
  pipelearner(lm, visib ~ .) %>%
  learn_cvpairs(k = 10) %>%
  learn()

Error: 'as_tibble' is not an exported object from 'namespace:tibble'

traceback()
12: stop(gettextf("'%s' is not an exported object from 'namespace:%s'",
        name, getNamespaceName(ns)), call. = FALSE, domain = NA)
11: getExportedValue(pkg, name)
10: tibble::as_tibble
9: pipelearner.data.frame(., lm, visib ~ .)
8: pipelearner(., lm, visib ~ .)
7: function_list[[i]](value)
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(expr, envir, enclos)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: d %>% pipelearner(lm, visib ~ .) %>% learn_cvpairs(k = 10) %>%
       learn()

It turned out that I needed a newer version of tibble. I thought I had got that when I installed tidyverse, but apparently not.
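
A quick way to check for this kind of problem before running the example is to ask whether the installed package actually exports the function. The helper below, exports_symbol(), is a hypothetical name and not part of pipelearner or tibble.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and exports `name`.
exports_symbol <- function(pkg, name) {
  requireNamespace(pkg, quietly = TRUE) &&
    name %in% getNamespaceExports(pkg)
}

# FALSE here would reproduce the situation described above.
exports_symbol("tibble", "as_tibble")

# The installed version can be inspected the same way.
if (requireNamespace("tibble", quietly = TRUE)) {
  print(utils::packageVersion("tibble"))
}
```

On the package side, the usual remedy would be a minimum version for tibble in the DESCRIPTION Imports field, which is what the issue title asks for.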

predict needs to be adjustable

Currently, predict.pipelearner() applies the default predict() to each fit. Sometimes this needs to be adjusted, for example to make predict return probabilities or classes.
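
One way this could work is to forward extra arguments through to predict(). The sketch below is not pipelearner code: predict_each() is a hypothetical helper, and logistic regressions on mtcars stand in for the fitted models.

```r
# Hypothetical helper: forwards extra arguments to predict() for each fit.
predict_each <- function(fits, newdata, ...) {
  lapply(fits, function(fit) predict(fit, newdata = newdata, ...))
}

fits <- list(
  glm(am ~ wt, data = mtcars, family = binomial),
  glm(am ~ hp, data = mtcars, family = binomial)
)

# Default for glm: predictions on the scale of the linear predictor.
link_scale <- predict_each(fits, mtcars)

# Same fits, adjusted to return probabilities.
prob_scale <- predict_each(fits, mtcars, type = "response")

range(prob_scale[[1]])
```

Using `...` keeps the wrapper agnostic about which arguments a given model's predict method accepts.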

Feature: wrapper functions to "predict" and "score"

In general, a user will want to predict values and score/evaluate their fits after learning all models via learn(). There are many possible functions for doing this. However, pipeable functions could be written that take the tibble coming from learn() together with a function that accepts the relevant columns (e.g., test, target, and fit) and outputs the predicted values. It would then be the user's responsibility to write a function that accepts these arguments.

For example:

pl %>%
  learn() %>%
  pl_predict("test_hat", FUN = function(test, target, fit) {
    # etc... to produce vectors of fitted values
  }) %>%
  pl_score("test_rsqr", FUN = function(test, target, test_hat) {
    # etc...
  })
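
A rough sketch of how the two proposed verbs might be implemented follows. Everything here is hypothetical: pl_predict() and pl_score() do not exist in pipelearner, the `from` argument is an addition needed to say which prediction column to score, and a base data frame with list columns stands in for the tibble returned by learn().

```r
# Hypothetical verbs. `results` is assumed to have list columns `fit` and
# `test` plus a character column `target`.
pl_predict <- function(results, name, FUN) {
  results[[name]] <- Map(FUN, results$test, results$target, results$fit)
  results
}

pl_score <- function(results, name, FUN, from) {
  results[[name]] <- unlist(
    Map(FUN, results$test, results$target, results[[from]])
  )
  results
}

# Stand-in for the output of learn(): two lm fits and a shared test set.
train <- mtcars[1:20, ]
test  <- mtcars[21:32, ]
results <- data.frame(target = c("mpg", "mpg"), stringsAsFactors = FALSE)
results$fit  <- list(lm(mpg ~ wt, data = train), lm(mpg ~ hp, data = train))
results$test <- list(test, test)

scored <- pl_predict(results, "test_hat", function(test, target, fit) {
  predict(fit, newdata = test)
})
scored <- pl_score(scored, "test_rsqr", from = "test_hat",
  FUN = function(test, target, test_hat) {
    y <- test[[target]]
    1 - sum((y - test_hat)^2) / sum((y - mean(y))^2)
  })

scored$test_rsqr
```

Because both functions take the results as their first argument and return them, they could be chained with %>% as in the proposal.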

Add additional cross validation methods

The functions in resamplr also use modelr::resample objects and include all cross validation methods from scikit-learn.

It looks like learn_cvpairs could be written the same way as learn_models, accepting arbitrary cross validation functions, as long as the cross validation function returns a data frame with train, test, and .id columns (which is the case with resamplr).
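
To illustrate the contract described above, here is a sketch of a user-supplied splitter that returns one row per fold with train, test, and .id columns. simple_kfold() is a hypothetical name, and plain data frames are used in place of modelr::resample objects to keep the example dependency-free.

```r
# Hypothetical k-fold splitter. Plain data frames stand in for
# modelr::resample objects here.
simple_kfold <- function(data, k = 5) {
  fold <- sample(rep_len(seq_len(k), nrow(data)))  # random fold label per row
  out <- data.frame(.id = as.character(seq_len(k)), stringsAsFactors = FALSE)
  out$train <- lapply(seq_len(k), function(i) data[fold != i, , drop = FALSE])
  out$test  <- lapply(seq_len(k), function(i) data[fold == i, , drop = FALSE])
  out[c("train", "test", ".id")]
}

set.seed(1)
cv <- simple_kfold(mtcars, k = 4)

names(cv)
sapply(cv$test, nrow)
```

Any function with the same return shape (leave-one-out, stratified folds, time-series splits) could in principle be passed in its place.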

Using pipelearner in functions where "data =" is required

I ran into an issue using gam from the mgcv package with pipelearner. To illustrate, lm can be used like this:

iris %>%
  lm(Sepal.Length ~ Sepal.Width, .)

and so can be piped into pipelearner:

iris %>%
  pipelearner(lm, Sepal.Length ~ Sepal.Width) %>%
  learn()

However, gam requires "data = " explicitly. The analogous expression to lm fails:

iris %>%
  gam(Sepal.Length ~ s(Sepal.Width), .)
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found

iris %>%
  pipelearner(gam, Sepal.Length ~ s(Sepal.Width)) %>%
  learn()
Error in eval(expr, envir, enclos) : object 'Sepal.Length' not found

This works:

iris %>%
  gam(Sepal.Length ~ s(Sepal.Width), data = .)

Is there a syntax which will allow pipelearner to run gam, or does pipelearner require modification?
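
One possible workaround, offered as a sketch: the second formal argument of gam() is family rather than data, so a data frame passed positionally in that slot is not treated as the data. A thin wrapper that takes data as its second argument, as lm() does, gets around this. gam_data() is a hypothetical name, and the idea that pipelearner passes the data positionally is inferred from the error above, not confirmed.

```r
library(mgcv)

# Hypothetical wrapper: puts `data` in the second position, as lm() does,
# and hands it to gam() by name.
gam_data <- function(formula, data, ...) {
  mgcv::gam(formula, data = data, ...)
}

# Positional call now works the same way as with lm().
fit <- gam_data(Sepal.Length ~ s(Sepal.Width), iris)
class(fit)
```

If the inference about positional passing holds, pipelearner(gam_data, Sepal.Length ~ s(Sepal.Width)) would be the corresponding call, though that has not been tested here.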
