

SklearncomPYre

Facilitating beautifully efficient comparisons of machine learning classifiers and regression models.

Created by



Summary    •    Install    •    How To Use    •    Credits    •    Related    •    License   •    Contribute


Summary

SklearncomPYre harnesses the power of scikit-learn, combining it with pandas dataframes and matplotlib plots for easy, breezy, and beautiful machine learning exploration.

Looking to do the same in R? Check out caretcompaR!

Function 1: split()

The function splits the input samples X and target values y (class labels in classification, real numbers in regression) into training, validation, and test sets according to the specified proportions.

It returns four X arrays (training, validation, combined training and validation, and test sets) and the four corresponding y arrays.

Inputs:

  • X data set, type: Array-like
  • y data set, type: Array-like
  • proportion of training data, type: float
  • proportion of test data, type: float
  • proportion of validation data, type: float

Outputs:

  • X train set, type: Array-like
  • y train, type: Array-like
  • X validation set, type: Array-like
  • y validation, type: Array-like
  • X train and validation set, type: Array-like
  • y train and validation, type: Array-like
  • X test set, type: Array-like
  • y test, type: Array-like
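As a rough illustration of what such a split involves, here is a minimal sketch in plain NumPy. It assumes one random shuffle of the row indices followed by slicing by proportion. The function name three_way_split, its argument order (training, validation, test), and the seed parameter are hypothetical and may not match how SklearncomPYre's split() is actually implemented; only the order of the eight return values follows the outputs listed above.

```python
import numpy as np


def three_way_split(X, y, train_prop, val_prop, test_prop, seed=0):
    """Hypothetical sketch of a proportional train/validation/test split.

    Returns X_train, y_train, X_val, y_val, X_train_val, y_train_val,
    X_test, y_test (the output order documented for split()).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    if not np.isclose(train_prop + val_prop + test_prop, 1.0):
        raise ValueError("proportions must sum to 1")

    # Shuffle row indices once, so every X row stays paired with its y value.
    idx = np.random.RandomState(seed).permutation(len(X))
    n_train = int(round(train_prop * len(X)))
    n_val = int(round(val_prop * len(X)))

    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    train_val_idx = idx[:n_train + n_val]  # training and validation rows combined
    test_idx = idx[n_train + n_val:]       # whatever remains is the test set

    return (X[train_idx], y[train_idx],
            X[val_idx], y[val_idx],
            X[train_val_idx], y[train_val_idx],
            X[test_idx], y[test_idx])
```

Shuffling indices rather than X and y separately is what keeps each output X set aligned with its y set.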

Function 2: train_test_acc_time()

The purpose of this function is to compare different sklearn regressors or classifiers in terms of training and test accuracies and the time it takes to fit and predict. Its inputs are a dictionary of models, the training input samples Xtrain (input features), the test input samples Xtest, the training target values ytrain, and the test target values ytest (continuous or categorical).

The function outputs a beautiful dataframe with training & test scores, model variance, and the time it takes to fit and predict using different models.

Inputs:

  • Dictionary of ML classifiers or regressors.
  • X train set, type: Array-like
  • y train set, type: Array-like
  • X test set, type: Array-like
  • y test set, type: Array-like

Outputs:

  • Dataframe with 7 columns: (1) regressor or classifier name, (2) training accuracy, (3) test accuracy, (4) model variance, (5) time it takes to fit, (6) time it takes to predict and (7) total time. The dataframe will be sorted by test score in descending order.
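To make the seven columns concrete, here is a minimal sketch of the kind of loop such a function could run, assuming every model follows the scikit-learn fit/predict/score interface. The function name compare_models and the exact column labels are hypothetical, and reading "model variance" as the training score minus the test score is an assumption, not something this README defines. Timings use time.perf_counter.

```python
import time

import pandas as pd

# Hypothetical column labels for the seven outputs described above.
COLUMNS = ["Models", "Train Accuracy", "Test Accuracy", "Variance",
           "Fit Time (s)", "Predict Time (s)", "Total Time (s)"]


def compare_models(models, X_train, y_train, X_test, y_test):
    """Hypothetical sketch: fit each model, then tabulate scores and timings."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_time = time.perf_counter() - start

        start = time.perf_counter()
        model.predict(X_test)
        predict_time = time.perf_counter() - start

        # For sklearn estimators, .score() is accuracy for classifiers
        # and R^2 for regressors.
        train_score = model.score(X_train, y_train)
        test_score = model.score(X_test, y_test)
        rows.append([name, train_score, test_score,
                     train_score - test_score,  # assumption: "variance" = score gap
                     fit_time, predict_time, fit_time + predict_time])

    # Passing columns explicitly keeps the column order fixed.
    result = pd.DataFrame(rows, columns=COLUMNS)
    return result.sort_values("Test Accuracy", ascending=False).reset_index(drop=True)
```

With the iris dictionary from the How To Use section, compare_models(dictionary, X_train, y_train, X_val, y_val) would return one row per model, with the best test score first.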

Function 3: comparison_viz()

The purpose of this function is to visualize the output of train_test_acc_time() for easy communication and interpretation. The user can choose to visualize a comparison of either accuracies or times. It takes a dataframe with 7 columns, i.e., model name, training and test scores, model variance, fit time, predict time, and total time.

Outputs a beautiful matplotlib bar chart comparison of different models' training and test scores or the time it takes to fit and predict.

Inputs:

  • Dataframe with 7 columns: (1) regressor or classifier name, (2) training accuracy, (3) test accuracy, (4) model variance, (5) time it takes to fit, (6) time it takes to predict and (7) total time. Type: pandas.DataFrame
  • Choice of 'accuracy' or 'time', with the default being 'accuracy' if no string is given. Type: string

Outputs:

  • Bar chart comparing the models' accuracies or times, saved to the root directory. Type: png
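For a sense of how such a chart can be produced, here is a hedged sketch that uses pandas' built-in matplotlib plotting. The function name plot_comparison, the column labels it reads, and the default file names are hypothetical, chosen to match a dataframe shaped like the one described above; the real comparison_viz() may be implemented differently.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_comparison(result: pd.DataFrame, choice: str = "accuracy", path=None):
    """Hypothetical sketch: grouped bar chart of scores or times, saved as PNG."""
    if choice == "accuracy":
        cols, ylabel = ["Train Accuracy", "Test Accuracy"], "Score"
    elif choice == "time":
        cols, ylabel = ["Fit Time (s)", "Predict Time (s)"], "Time (s)"
    else:
        raise ValueError("choice must be 'accuracy' or 'time'")

    # One group of bars per model, one bar per selected column.
    ax = result.set_index("Models")[cols].plot.bar(rot=0)
    ax.set_ylabel(ylabel)
    ax.set_title("Model comparison: " + choice)

    path = path or "comparison_{}.png".format(choice)
    ax.figure.savefig(path, bbox_inches="tight")
    plt.close(ax.figure)  # free the figure once it is written
    return path
```

Returning the file path makes the output easy to check in a unit test, which is one way to test a function whose only output is an image.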

Install

Please use the following command to install the package:
pip install git+https://github.com/UBC-MDS/SklearncomPYre.git

Once installed, load the package using the following commands:

from SklearncomPYre.train_test_acc_time import train_test_acc_time
from SklearncomPYre.comparison_viz import comparison_viz
from SklearncomPYre.split import split

Dependencies

  • Python==3.6.8
  • matplotlib==3.0.1
  • numpy==1.15.4
  • pandas==0.20.3
  • scikit-learn==0.20.2
  • scipy==1.2.0

How To Use

Here is an example of how you can use SklearncomPYre:

# Example usage

# Import libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Importing SklearncomPYre
from SklearncomPYre.train_test_acc_time import train_test_acc_time
from SklearncomPYre.comparison_viz import comparison_viz
from SklearncomPYre.split import split

# Loading the handy iris dataset
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# Setting up a dictionary of classifiers to test

dictionary = {
    'knn': KNeighborsClassifier(),
    'LogRegression': LogisticRegression(),
    'RForest': RandomForestClassifier()}

# Let's start by using the SklearncomPYre function split().

# Splitting the dataset into 40% training, 20% validation, and 40% test sets.

X_train, y_train, X_val, y_val, X_train_val, y_train_val, X_test, y_test = split(X, y, 0.4, 0.2, 0.4)

# Now, let's train some models and compare them in a pandas dataframe using train_test_acc_time().

result = train_test_acc_time(dictionary, X_train, y_train, X_val, y_val)
print(result)

# Next, let's take a look at some plots with comparison_viz().

# Our plots will be saved to the working directory.

comparison_viz(result, "accuracy")
comparison_viz(result, "time")

Credits

Related

Where does this package fit in?

This package provides functions to help make the early stages of model selection and exploration easier to cycle through and meaningfully compare.

Our idea for this package was to facilitate the comparison of machine learning classifiers and regression models. Our inspiration came from UBC MDS DSCI 573 lab assignments, where we learned to combine Python's scikit-learn with pandas to produce interpretable comparisons of train and test accuracies and time efficiencies across models.

We are not currently aware of any packages that combine scikit-learn and pandas for efficient and interpretable model-to-model comparisons. We expect that this combination is used in practice, and after using it while learning machine learning techniques in our UBC MDS coursework, we thought it would be a good combination of tools to formally package together.

We are aware of a new package, sklearn-pandas, that combines the powers of scikit-learn and pandas, but it is tailored towards full-cycle machine learning functionality (feature selection, transformations, inputting and outputting pandas dataframes, etc.) rather than towards facilitating model-to-model comparisons via dataframes.

License

MIT License

Contribute

Interested in contributing? See our Contributing Guidelines and Code of Conduct.

sklearncompyre's People

Contributors

birinder1469, jessimk, talhaadnan100


sklearncompyre's Issues

Enhancements to train_test_acc_time.py

  • The first column is titled 'Classifier'; since we also handle regressors, 'Models' may be a better name for it.
  • The sequence of columns ought to be 'Models', 'Train Accuracy', 'Test Accuracy', 'Variance', 'Fit Time (s)', 'Predict Time (s)', 'Total Time (s)'

Thanks!

Updating install info on the README

Once all our functions and tests are up and running, we need to make sure to add this to our README:

  • Your packages should now work, and another user could download and install them from GitHub and use them in their own software development using:

  • pip install git+PACKAGE_URL.git

🐛 fixing indexing bug in comparison_viz()

    if all(isinstance(n, str) for n in [comparison.iloc[:,1][x] for x in np.arange(comparison.shape[0])]) != True:
        raise TypeError("Comparison Models column must only contain type string")

I think comparison.iloc[:,1][x] should be comparison.iloc[:,0][x], so that it checks the first column and not the second.

@talhaadnan100 any chance you can take a look and change if necessary?
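A minimal sketch of the corrected check, assuming the model names sit in the first column (position 0) of the comparison dataframe. The helper name check_model_names is hypothetical. Iterating over the column directly also removes the per-row [x] lookup and the != True comparison.

```python
import pandas as pd


def check_model_names(comparison: pd.DataFrame) -> None:
    """Hypothetical helper: every entry in the first column must be a string."""
    names = comparison.iloc[:, 0]  # position 0 = model-name column
    if not all(isinstance(n, str) for n in names):
        raise TypeError("Comparison Models column must only contain type string")
```

With the original iloc[:, 1], a valid dataframe would fail this check, because column 1 holds numeric training accuracies rather than names.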

Feedback of Milestone 1

The outputs of the first two functions are of the same type, so their unit tests may be very similar. Consider something different so that you can write different tests.
The first two functions feel like just one line of code (a call to the score function); maybe add some extensions.

Feedback of Milestone 2

  • Usage needs to indicate how to use the package and give example code. Only installation instructions are provided.

Milestone 1

Very good job, guys! The explanation and documentation are very clear. Here is some feedback:

  • There are grammar errors; please review the grammar or use grammar-checking software, such as Grammarly, before you submit.

  • For the test case of comparison_viz(): you should also consider what happens if the input frame has the correct size but contains meaningless words.

  • For the test case of comparison_viz(): there is no output test. Can you figure out a way to test the output?

  • For the test case of split(): what if the input X and y have different lengths? Additionally, please test that each output X set corresponds to the correct y set.
    Please think about extreme cases for the unit tests.
