Gestalt

A helper library for data science pipelines

"Something that is made of many parts and yet is somehow more than or different from the combination of its parts"

Machine learning (ML) has achieved considerable success in multiple fields. Much of the pipeline surrounding ML can be generalised to the point where the 'engineering' of data flows and pipelines is taken care of, leaving the human expert free to work on the most difficult aspects:

  1. Pre-processing the data
  2. Engineering features
  3. Selecting appropriate features

The goals of Gestalt

The goal of Gestalt is to remove the cumbersome parts of building a meta-learner data science pipeline (i.e. the process of building a generalised stacker across folds).
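A generalised stacker across folds is usually built with out-of-fold stacking: each base model is fitted on all but one fold and predicts the held-out fold, so the meta-learner is trained only on predictions for rows the base models never saw. The following is a minimal sketch of that idea using scikit-learn. It is not gestalt's GeneralisedStacking API; the helper name out_of_fold_stack and its parameters are hypothetical. Because rows are selected by integer index, the same helper accepts a dense NumPy array or a scipy.sparse CSR matrix and reuses identical folds for every model.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor


def out_of_fold_stack(base_models, X, y, X_test, n_splits=5, seed=0):
    """Hypothetical helper (not gestalt's API): build meta-learner features.

    Returns an (n_train, n_models) matrix of out-of-fold predictions and an
    (n_test, n_models) matrix of test predictions averaged over the folds.
    """
    # A fixed random_state makes every split() call yield the same folds,
    # so all base models are evaluated on identical train/validation splits.
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    n_train, n_test = X.shape[0], X_test.shape[0]
    oof = np.zeros((n_train, len(base_models)))
    test_preds = np.zeros((n_test, len(base_models)))
    for j, model in enumerate(base_models):
        for train_idx, valid_idx in kf.split(np.arange(n_train)):
            # Integer row indexing works for NumPy arrays and CSR matrices alike
            model.fit(X[train_idx], y[train_idx])
            oof[valid_idx, j] = model.predict(X[valid_idx])
            test_preds[:, j] += model.predict(X_test) / n_splits
    return oof, test_preds


# Usage on synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=200)
X_test = rng.normal(size=(50, 5))

base = [Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4, random_state=0)]
oof, test_feats = out_of_fold_stack(base, X, y, X_test)

# The meta-learner is trained only on out-of-fold predictions
meta = Ridge(alpha=1.0).fit(oof, y)
final = meta.predict(test_feats)

# The same folds can drive a model on sparse input
oof_sparse, _ = out_of_fold_stack([Ridge(alpha=1.0)], sparse.csr_matrix(X), y,
                                  sparse.csr_matrix(X_test))
```

Averaging each base model's per-fold test predictions is one common choice; refitting each base model on the full training set before predicting the test set is another.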

The current roadmap for this module is as follows:

  1. Build a proof of concept (POC) for the GeneralisedStacking class using pandas - Done
  2. Build stackers for regression and classification problems (binary and multiclass) - Done
  3. Create a plugin wrapper to expose other algorithms (say, R libraries) to the stacking process - Done
  4. Support scipy.sparse data so that both dense and sparse models can be run on the same folds - Done
  5. Create an example of Bayesian encoding as a transformer that runs across folds in line with the stacker.
  6. Create a hyperparameter autotuning class for the set of base models used in the meta-learner.
  7. Add loads of tests and documentation.
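Roadmap item 5 does not say which Bayesian encoding is meant. One common choice, assumed here, is smoothed target encoding: each category's mean target is shrunk toward the overall mean, with m acting as a pseudo-count. For a binary target this is the posterior mean under a Beta prior worth m pseudo-observations. The sketch computes the encoding out of fold, so a row's own target never leaks into its feature, which is what "runs across folds in line with the stacker" requires. The helpers smoothed_means and oof_target_encode are hypothetical names, not part of gestalt.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold


def smoothed_means(cats, y, prior, m):
    """Per-category target mean shrunk toward `prior`; `m` acts as a pseudo-count."""
    stats = pd.DataFrame({"cat": cats, "y": y}).groupby("cat")["y"].agg(["sum", "count"])
    return (stats["sum"] + m * prior) / (stats["count"] + m)


def oof_target_encode(cats, y, cats_test, n_splits=5, m=10.0, seed=0):
    """Hypothetical helper: out-of-fold smoothed target encoding.

    Each training row is encoded with statistics from the other folds only, so its
    own target never leaks into its feature; test rows use the whole training set.
    """
    cats = pd.Series(cats).reset_index(drop=True)
    y = pd.Series(y, dtype=float).reset_index(drop=True)
    encoded = np.empty(len(cats))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in kf.split(np.arange(len(cats))):
        prior = y.iloc[train_idx].mean()
        means = smoothed_means(cats.iloc[train_idx].values, y.iloc[train_idx].values, prior, m)
        # Categories unseen in the training folds fall back to the prior
        encoded[valid_idx] = cats.iloc[valid_idx].map(means).fillna(prior).values
    prior = y.mean()
    full_means = smoothed_means(cats.values, y.values, prior, m)
    test_encoded = pd.Series(cats_test).map(full_means).fillna(prior).values
    return encoded, test_encoded


# Usage: "a" always has target 1, "b" always 0, "z" never appears in training
cats = np.array(["a", "a", "b", "b", "c"] * 20)
y = np.array([1, 1, 0, 0, 1] * 20)
enc, enc_test = oof_target_encode(cats, y, np.array(["a", "b", "z"]))
# enc_test is approximately [0.92, 0.12, 0.6]
```

The encoded column can then be appended to the feature matrix used on the same folds as the base models.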

As with all alpha OSS projects, things are under constant development, and there are already places where refactoring some of the core code would make sense (e.g. a class to remove the cumbersome _predict_# and _fit_# methods).

Install

pip install -U git+https://github.com/mpearmain/gestalt

Example

The examples directory contains a set of examples showing various use cases. To run the R wrapper for ranger, you need rpy2 and the ranger R library installed locally. NOTE: You can even run Vowpal Wabbit inside the stacker with the VWClassifier and VWRegressor scikit-learn wrappers, installed with pip install vowpalwabbit.
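Running R models or Vowpal Wabbit inside the stacker depends on exposing them through the same fit/predict interface as scikit-learn models. The following is a minimal sketch of such a plugin wrapper. ExternalRegressor, lstsq_fit and lstsq_predict are hypothetical names, not gestalt's classes, and the "external" backend here is plain NumPy least squares standing in for, say, an rpy2 call. Once wrapped, the model works with fold-based scikit-learn tooling such as cross_val_predict.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_predict


class ExternalRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical plugin wrapper exposing any backend via fit/predict.

    fit_fn(X, y) returns a fitted state; predict_fn(state, X) returns predictions.
    """

    def __init__(self, fit_fn=None, predict_fn=None):
        # Stored unchanged so scikit-learn's get_params/clone work across folds
        self.fit_fn = fit_fn
        self.predict_fn = predict_fn

    def fit(self, X, y):
        self.state_ = self.fit_fn(np.asarray(X), np.asarray(y))
        return self

    def predict(self, X):
        return np.asarray(self.predict_fn(self.state_, np.asarray(X)))


def lstsq_fit(X, y):
    # Stand-in "external" model: ordinary least squares with an intercept column
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def lstsq_predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Usage: out-of-fold predictions from the wrapped model
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.0, 0.5, -1.0])
model = ExternalRegressor(fit_fn=lstsq_fit, predict_fn=lstsq_predict)
oof = cross_val_predict(model, X, y, cv=5)
```

Keeping the backend calls in two plain functions means swapping in another library only changes those functions, not the wrapper.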

Contributors

mpearmain
