Giter VIP home page Giter VIP logo

mlr3pipelines's Introduction

mlr3pipelines

Package website: release | dev

Dataflow Programming for Machine Learning in R.

tic CRAN StackOverflow Mattermost

What is mlr3pipelines?

Watch our “WhyR 2020” Webinar Presentation on Youtube for an introduction! Find the slides here.

WhyR 2020 mlr3pipelines

mlr3pipelines is a dataflow programming toolkit for machine learning in R utilising the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, mlr3pipelines is about defining singular data and model manipulation steps as “PipeOps”:

pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))

These pipeops can then be combined together to define machine learning pipelines. These can be wrapped in a GraphLearner that behave like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations

Feature Overview

Single computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

  • Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
  • Task subsampling for speed and outcome class imbalance handling
  • mlr3 Learner operations for prediction and stacking
  • Simultaneous path branching (data going both ways)
  • Alternative path branching (data going one specific way, controlled by hyperparameters)
  • Ensemble methods and aggregation of predictions

Documentation

The easiest way to get started is reading some of the vignettes that are shipped with the package, which can also be viewed online:

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimum working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Citing mlr3pipelines

If you use mlr3pipelines, please cite our JMLR article:

@Article{mlr3pipelines,
  title = {{mlr3pipelines} - Flexible Machine Learning Pipelines in R},
  author = {Martin Binder and Florian Pfisterer and Michel Lang and Lennart Schneider and Lars Kotthoff and Bernd Bischl},
  journal = {Journal of Machine Learning Research},
  year = {2021},
  volume = {22},
  number = {184},
  pages = {1-7},
  url = {https://jmlr.org/papers/v22/21-0281.html},
}

Similar Projects

A predecessor to this package is the mlrCPO-package, which works with mlr 2.x. Other packages that provide, to varying degree, some preprocessing functionality or machine learning domain specific language, are the caret package and the related recipes project, and the dplyr package.

mlr3pipelines's People

Contributors

mb706 avatar pfistfl avatar sumny avatar mllg avatar pat-s avatar berndbischl avatar zzawadz avatar dandls avatar coorsaa avatar be-marc avatar github-actions[bot] avatar ja-thomas avatar prockenschaub avatar vpolisky avatar web-flow avatar jakob-r avatar zackbarry avatar kant avatar rustylongbow avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.