
tinyautoml's Introduction

TinyAutoML Logo

TinyAutoML is a Machine Learning Python 3.9 library designed as an extension of Scikit-Learn.
It builds an adaptable and auto-tuned pipeline to handle binary classification tasks.



In a few words, your data goes through two main preprocessing steps.
The first is scaling and non-stationarity correction, followed by Lasso feature selection.
Finally, one of the three MetaModels is fitted on the transformed data.
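
For intuition, the preprocessing stages are conceptually similar to the scikit-learn pipeline sketched below. This is a rough analogy, not the library's internal implementation: the non-stationarity correction step is omitted, and the final classifier simply stands in for a MetaModel.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV, LogisticRegression

# Scaling, then Lasso-based feature selection, then a classifier on top
# (the non-stationarity correction step is left out of this analogy)
conceptual_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('feature_selection', SelectFromModel(LassoCV())),
    ('classifier', LogisticRegression()),  # stands in for a MetaModel
])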


Latest News!

  • Logging format changed from default to [TinyAutoML]
  • Added a GitHub Actions workflow for CI and for updating the README.md
  • Added parallel computation of LassoFeatureSelector -> LassoFeatureSelectionParallel
  • New example notebook based on VIX index directional forecasting

⚡️ Quick start

First, let's install and import the library!

  • Install the latest release using pip
%pip install TinyAutoML
import os
os.chdir('..')  # Only needed for the GitHub CI environment; you don't have to run this
from TinyAutoML.Models import *
from TinyAutoML import MetaPipeline

MetaModels

MetaModels inherit from the MetaModel Abstract Class. They all implement ensemble methods and therefore are based on EstimatorPools.

When training EstimatorPools, you are faced with a choice: either do parameter tuning on entire pipelines, with the estimators on top, or reuse the same pipeline for every estimator and only train the top. The first case is what we will call comprehensiveSearch.

Moreover, as we will see in detail later, those EstimatorPools can be shared across MetaModels.

They are all initialised with the following minimum arguments:

MetaModel(comprehensiveSearch: bool = True, parameterTuning: bool = True, metrics: str = 'accuracy', nSplits: int=10)
  • nSplits corresponds to the number of splits used for cross-validation
  • The other parameters are self-explanatory

They need to be put in the MetaPipeline wrapper to work.
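
For instance, wrapping a fully parameterised MetaModel looks like this (the arguments shown are simply the defaults listed above):

from TinyAutoML import MetaPipeline
from TinyAutoML.Models import BestModel

# Any MetaModel is wrapped in a MetaPipeline before fitting
mp = MetaPipeline(BestModel(comprehensiveSearch=True,
                            parameterTuning=True,
                            metrics='accuracy',
                            nSplits=10))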

There are 3 MetaModels

1- BestModel: selects the best-performing model of the pool

best_model = MetaPipeline(BestModel(comprehensiveSearch = False, parameterTuning = False))

2- OneRulerForAll: implements stacking, using a RandomForestClassifier as the final estimator by default. The user is free to pass another classifier through the ruler argument

orfa_model = MetaPipeline(OneRulerForAll(comprehensiveSearch=False, parameterTuning=False))

3- DemocraticModel: implements soft and hard voting through the voting argument

democratic_model = MetaPipeline(DemocraticModel(comprehensiveSearch=False, parameterTuning=False, voting='soft'))

As of release v0.2.3.2 (13/04/2022) there are 5 models on which these MetaModels rely in the EstimatorPool:

  • Random Forest Classifier
  • Logistic Regression
  • Gaussian Naive Bayes
  • Linear Discriminant Analysis
  • XGBoost
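
For intuition, these roughly correspond to the following scikit-learn and xgboost classifiers. This is a sketch of equivalents with default settings, not the library's internal configuration; the dictionary keys mirror the names returned by get_scores further below.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

# Rough equivalents of the pool's five estimators
pool_estimators = {
    'random forest classifier': RandomForestClassifier(),
    'Logistic Regression': LogisticRegression(),
    'Gaussian Naive Bayes': GaussianNB(),
    'LDA': LinearDiscriminantAnalysis(),
    'xgb': XGBClassifier(),
}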

We'll use the breast_cancer dataset from sklearn as an example:

import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
 
X = pd.DataFrame(data=cancer.data, columns=cancer.feature_names)
y = cancer.target

cut = int(len(y) * 0.8)

X_train, X_test = X[:cut], X[cut:]
y_train, y_test = y[:cut], y[cut:]

Let's train a BestModel first and reuse its Pool for the other MetaModels

best_model.fit(X_train,y_train)
[TinyAutoML] Training models...
[TinyAutoML] The best estimator is random forest classifier with a cross-validation accuracy (in Sample) of 1.0





MetaPipeline(model=BestModel(comprehensiveSearch=False, parameterTuning=False))

We can now extract the pool

pool = best_model.get_pool()

And use it when fitting the other MetaModels to skip the fitting of the underlying models:

orfa_model.fit(X_train,y_train,pool=pool)
democratic_model.fit(X_train,y_train,pool=pool)
[TinyAutoML] Training models...
[TinyAutoML] Training models...





MetaPipeline(('model', Democratic Model))

Great! Let's look at the results with the scikit-learn classification_report:

orfa_model.classification_report(X_test,y_test)
              precision    recall  f1-score   support

           0       0.89      0.92      0.91        26
           1       0.98      0.97      0.97        88

    accuracy                           0.96       114
   macro avg       0.93      0.94      0.94       114
weighted avg       0.96      0.96      0.96       114

Looking good! What about the roc_curve?

democratic_model.roc_curve(X_test,y_test)

(ROC curve plot)

Let's see how the estimators of the pool are doing individually:

best_model.get_scores(X_test,y_test)
[('random forest classifier', 1.0),
 ('Logistic Regression', 0.9473684210526315),
 ('Gaussian Naive Bayes', 0.956140350877193),
 ('LDA', 0.9473684210526315),
 ('xgb', 0.956140350877193)]

What's next?

You can repeat the same steps with comprehensiveSearch set to True if you have the time and want to improve your results. You can also try other rulers for OneRulerForAll, as in the sketch below.
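
For example, a more thorough run could look like the following. The custom ruler simply illustrates the ruler argument mentioned earlier, and the snippet assumes the X_train and y_train from the example above:

from sklearn.ensemble import GradientBoostingClassifier
from TinyAutoML import MetaPipeline
from TinyAutoML.Models import OneRulerForAll

# Full parameter search over the pool, with a custom stacking classifier
tuned_orfa = MetaPipeline(OneRulerForAll(comprehensiveSearch=True,
                                         parameterTuning=True,
                                         ruler=GradientBoostingClassifier()))
tuned_orfa.fit(X_train, y_train)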

tinyautoml's People

Contributors

g0bel1n, thomktz


tinyautoml's Issues

High level API

MetaPipeline(BestModel(comprehensiveSearch = False, parameterTuning = False)) is too verbose

It would be nice to have a high-level API that could do either

best_model(comprehensiveSearch = False, parameterTuning = False)

or

metapipeline(kind="best_model", ...)

function & parameter names open to debate
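
A minimal sketch of what such a factory could look like (names and signature purely hypothetical, not part of the current API):

from TinyAutoML import MetaPipeline
from TinyAutoML.Models import BestModel, DemocraticModel, OneRulerForAll

# Hypothetical convenience factory around MetaPipeline
def metapipeline(kind="best_model", **kwargs):
    models = {
        "best_model": BestModel,
        "one_ruler_for_all": OneRulerForAll,
        "democratic": DemocraticModel,
    }
    return MetaPipeline(models[kind](**kwargs))

# Equivalent to MetaPipeline(BestModel(comprehensiveSearch=False, parameterTuning=False))
best_model = metapipeline("best_model", comprehensiveSearch=False, parameterTuning=False)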
