
pycaret-demo-dphi's Introduction


An open-source, low-code machine learning library in Python

🎉🎉🎉 PyCaret 3.4 is now available. 🎉🎉🎉

pip install --upgrade pycaret

Docs • Tutorials • Blog • LinkedIn • YouTube • Slack


Welcome to PyCaret

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the experiment cycle considerably and makes you more productive.

Compared with other open-source machine learning libraries, PyCaret is a low-code alternative that can replace hundreds of lines of code with just a few. This makes experiments much faster and more efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, LightGBM, CatBoost, Optuna, Hyperopt, Ray, and a few more.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise. PyCaret was also inspired by the caret library in the R programming language.

🚀 Installation

🌐 Option 1: Install via PyPI

PyCaret is tested and supported on 64-bit systems with:

  • Python 3.9, 3.10 and 3.11
  • Ubuntu 16.04 or later
  • Windows 7 or later
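Before installing, it can help to confirm that the active interpreter matches the requirements above. The following is a minimal sketch using only the standard library; the helper names `is_supported_python` and `is_64bit_interpreter` are hypothetical (not part of PyCaret), and the version bounds are simply copied from the list above, so adjust them if the supported range changes.

```python
import sys

# Supported interpreter range, copied from the list above. Update it if the
# supported versions change. (Hypothetical helpers, not part of PyCaret.)
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 11)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's (major, minor) version is in the supported range."""
    major_minor = tuple(version_info[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def is_64bit_interpreter() -> bool:
    """Return True on a 64-bit Python build (sys.maxsize exceeds 2**32 only there)."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {current}: supported version = {is_supported_python()}, "
          f"64-bit = {is_64bit_interpreter()}")
```

Comparing `(major, minor)` tuples keeps the check independent of the patch release, so 3.11.0 and 3.11.9 are treated the same.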

You can install PyCaret with Python's pip package manager:

# install pycaret
pip install pycaret

PyCaret's default installation does not install the optional dependencies. Depending on your use case, you may want one or more extras:

# install analysis extras
pip install pycaret[analysis]

# install models extras
pip install pycaret[models]

# install tuner extras
pip install pycaret[tuner]

# install mlops extras
pip install pycaret[mlops]

# install parallel extras
pip install pycaret[parallel]

# install test extras
pip install pycaret[test]

# install dev extras
pip install pycaret[dev]

# install multiple extras together
pip install pycaret[analysis,models]

Check out all optional dependencies. To install everything, including all the optional dependencies:

# install full version
pip install pycaret[full]
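Extras are combined inside a single pair of square brackets, separated by commas. Some shells, zsh in particular, treat square brackets as glob characters, so the requirement is often quoted. The sketch below is a hypothetical helper (not part of PyCaret or pip) that builds such a command string for POSIX shells such as bash and zsh; the set of extra names is copied from the list above.

```python
import shlex

# Extra names as listed above; "full" installs all optional dependencies.
KNOWN_EXTRAS = {"analysis", "models", "tuner", "mlops", "parallel", "test", "dev", "full"}


def pip_install_command(extras=()) -> str:
    """Build a POSIX-shell-safe `pip install` command for pycaret with the given extras."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"Unknown extras: {sorted(unknown)}")
    # Deduplicate while keeping the caller's order.
    ordered = list(dict.fromkeys(extras))
    requirement = f"pycaret[{','.join(ordered)}]" if ordered else "pycaret"
    # shlex.quote wraps the requirement in single quotes when it contains
    # characters such as '[' that the shell might otherwise interpret.
    return f"pip install {shlex.quote(requirement)}"


print(pip_install_command(["analysis", "models"]))  # pip install 'pycaret[analysis,models]'
```

A plain `pycaret` requirement contains no special characters, so it is left unquoted.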

📄 Option 2: Build from Source

Install the development version of the library directly from the source. The API may be unstable. It is not recommended for production use.

pip install git+https://github.com/pycaret/pycaret.git@master --upgrade

📦 Option 3: Docker

Docker creates virtual environments with containers that keep a PyCaret installation separate from the rest of the system. The PyCaret Docker image comes with Jupyter Notebook pre-installed. It can share resources with its host machine (access directories, use the GPU, connect to the Internet, etc.). The PyCaret Docker images are always tested with the latest major releases.

# default version
docker run -p 8888:8888 pycaret/slim

# full version
docker run -p 8888:8888 pycaret/full

๐Ÿƒโ€โ™‚๏ธ Quickstart

1. Functional API

# Classification Functional API Example

# loading sample dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup
from pycaret.classification import *
s = setup(data, target = 'Purchase', session_id = 123)

# model training and selection
best = compare_models()

# evaluate trained model
evaluate_model(best)

# predict on hold-out/test set
pred_holdout = predict_model(best)

# predict on new data
new_data = data.copy().drop('Purchase', axis = 1)
predictions = predict_model(best, data = new_data)

# save model
save_model(best, 'best_pipeline')

2. OOP API

# Classification OOP API Example

# loading sample dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup
from pycaret.classification import ClassificationExperiment
s = ClassificationExperiment()
s.setup(data, target = 'Purchase', session_id = 123)

# model training and selection
best = s.compare_models()

# evaluate trained model
s.evaluate_model(best)

# predict on hold-out/test set
pred_holdout = s.predict_model(best)

# predict on new data
new_data = data.copy().drop('Purchase', axis = 1)
predictions = s.predict_model(best, data = new_data)

# save model
s.save_model(best, 'best_pipeline')
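Both examples above pass session_id = 123 to setup. In PyCaret, session_id controls the randomness of the experiment, much like random_state in scikit-learn, so reusing the same value makes the train/hold-out split and other random steps repeatable across runs. The sketch below illustrates only the underlying idea with a seeded shuffle from Python's standard library. It is not how PyCaret splits data internally, `holdout_split` is a hypothetical helper, and the 0.7 train fraction is just an illustrative value.

```python
import random


def holdout_split(rows, train_size=0.7, seed=None):
    """Shuffle row indices with a seeded RNG and split them into train/hold-out lists.

    Hypothetical helper illustrating the role of a seed such as session_id;
    not PyCaret's internal implementation.
    """
    indices = list(range(len(rows)))
    rng = random.Random(seed)  # local RNG: leaves the global random state untouched
    rng.shuffle(indices)
    cut = int(round(len(rows) * train_size))
    train = [rows[i] for i in indices[:cut]]
    holdout = [rows[i] for i in indices[cut:]]
    return train, holdout


rows = list(range(10))
first = holdout_split(rows, seed=123)
second = holdout_split(rows, seed=123)
print(first == second)  # True: same seed, same split
```

With `seed=None` the split changes from run to run, which is why fixing session_id matters when results need to be reproduced later.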

📁 Modules

  • Classification: Functional API, OOP API
  • Regression: Functional API, OOP API
  • Time Series: Functional API, OOP API
  • Clustering: Functional API, OOP API
  • Anomaly Detection: Functional API, OOP API

👥 Who should use PyCaret?

PyCaret is an open-source library that anybody can use. In our view, the ideal target audience of PyCaret is:

  • Experienced Data Scientists who want to increase productivity.
  • Citizen Data Scientists who prefer a low-code machine learning solution.
  • Data Science Professionals who want to build rapid prototypes.
  • Data Science and Machine Learning students and enthusiasts.

🎮 Training on GPUs

To train models on the GPU, simply pass use_gpu = True in the setup function. There is no change in the use of the API; however, in some cases, additional libraries have to be installed. The following models can be trained on GPUs:

  • Extreme Gradient Boosting
  • CatBoost
  • Light Gradient Boosting Machine (requires a GPU-enabled installation)
  • Logistic Regression, Ridge Classifier, Random Forest, K Neighbors Classifier, K Neighbors Regressor, Support Vector Machine, Linear Regression, Ridge Regression and Lasso Regression (require cuML >= 0.15)
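Because use_gpu = True only helps when the corresponding libraries are installed, a quick standard-library check can show which of them are importable before calling setup. A minimal sketch, assuming the usual top-level package names (`xgboost`, `catboost`, `lightgbm`, `cuml`); `installed_gpu_libraries` is a hypothetical helper, not part of PyCaret.

```python
import importlib.util

# Top-level import names of the optional libraries mentioned above (assumed).
GPU_CAPABLE_LIBRARIES = {
    "Extreme Gradient Boosting": "xgboost",
    "CatBoost": "catboost",
    "Light Gradient Boosting Machine": "lightgbm",
    "cuML-backed models": "cuml",
}


def installed_gpu_libraries():
    """Map each library description to whether its package can be imported.

    This only checks that the package is installed; it cannot tell whether
    the installed build has GPU support or whether a GPU is present.
    """
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in GPU_CAPABLE_LIBRARIES.items()
    }


for label, found in installed_gpu_libraries().items():
    print(f"{label}: {'installed' if found else 'not installed'}")
```

`importlib.util.find_spec` locates a package without importing it, so the check stays fast even when heavy libraries are present.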

🖥️ PyCaret Intel sklearnex support

You can apply Intel optimizations to machine learning algorithms to speed up your workflow. To train models with Intel optimizations, use the sklearnex engine. The API usage does not change; however, Intel sklearnex must be installed:

pip install scikit-learn-intelex

🤝 Contributors

๐Ÿ“ License

PyCaret is completely free and open-source and licensed under the MIT license.

ℹ️ More Information

Important links:

  • ⭐ Tutorials: tutorials developed and maintained by core developers
  • 📋 Example Notebooks: example notebooks created by the community
  • 📙 Blog: official blog by the creator of PyCaret
  • 📚 Documentation: API docs
  • 📺 Videos: video resources
  • ✈️ Cheat sheet: community cheat sheet
  • 📢 Discussions: community discussion board on GitHub
  • 🛠️ Release Notes: release notes


pycaret-demo-dphi's Issues

Saved Model Cannot Be Used for Prediction When Saved with model_only=True

Hi Team,
I saved a lot of models with model_only set to True. I reopened my notebook, cleansed the data and executed the "setup" method. Then I loaded all the saved models and tried to run the predict method, only to receive the error below. I was under the impression that if I saved a model with model_only=True, I would only need to run setup again and the model could then be used for everything without any hassle, which doesn't seem to be the case.

ERROR


ValueError Traceback (most recent call last)
Input In [71], in <cell line: 1>()
----> 1 unseen_predictions_tuned_model = predict_model(tuned_model, data=data_unseen)
2 unseen_predictions_tuned_model.head()

File ~\AppData\Roaming\Python\Python38\site-packages\pycaret\classification.py:2126, in predict_model(estimator, data, probability_threshold, encoded_labels, raw_score, drift_report, round, verbose, drift_kwargs)
2047 def predict_model(
2048 estimator,
2049 data: Optional[pd.DataFrame] = None,
(...)
2056 drift_kwargs: Optional[dict] = None,
2057 ) -> pd.DataFrame:
2059 """
2060 This function predicts Label and Score (probability of predicted
2061 class) using a trained model. When data is None, it predicts label and
(...)
2123
2124 """
-> 2126 return pycaret.internal.tabular.predict_model(
2127 estimator=estimator,
2128 data=data,
2129 probability_threshold=probability_threshold,
2130 encoded_labels=encoded_labels,
2131 raw_score=raw_score,
2132 drift_report=drift_report,
2133 round=round,
2134 verbose=verbose,
2135 ml_usecase=MLUsecase.CLASSIFICATION,
2136 drift_kwargs=drift_kwargs,
2137 )

File ~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\tabular.py:9116, in predict_model(estimator, data, probability_threshold, encoded_labels, drift_report, raw_score, round, verbose, ml_usecase, display, drift_kwargs)
9113 probability_threshold = estimator.probability_threshold
9114 estimator = get_estimator_from_meta_estimator(estimator)
-> 9116 pred = np.nan_to_num(estimator.predict(X_test_))
9118 try:
9119 score = estimator.predict_proba(X_test_)

File ~\AppData\Roaming\Python\Python38\site-packages\sklearn\utils\metaestimators.py:119, in _IffHasAttrDescriptor.__get__.<locals>.<lambda>(*args, **kwargs)
116 attrgetter(self.delegate_names[-1])(obj)
118 # lambda, but not partial, allows help() to work with update_wrapper
--> 119 out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
120 # update the docstring of the returned function
121 update_wrapper(out, self.fn)

File ~\AppData\Roaming\Python\Python38\site-packages\sklearn\pipeline.py:408, in Pipeline.predict(self, X, **predict_params)
406 for _, name, transform in self._iter(with_final=False):
407 Xt = transform.transform(Xt)
--> 408 return self.steps[-1][-1].predict(Xt, **predict_params)

File ~\AppData\Roaming\Python\Python38\site-packages\pycaret\internal\meta_estimators.py:151, in CustomProbabilityThresholdClassifier.predict(self, X, **predict_params)
149 if not hasattr(self.classifier_, "predict_proba"):
150 return self.classifier_.predict(X, **predict_params)
--> 151 pred = self.classifier_.predict_proba(X, **predict_params)
152 if pred.shape[1] > 2:
153 raise ValueError(
154 f"{self.class.name} can only be used for binary classification."
155 )

File ~\AppData\Roaming\Python\Python38\site-packages\sklearn\ensemble\_forest.py:673, in ForestClassifier.predict_proba(self, X)
671 check_is_fitted(self)
672 # Check data
--> 673 X = self._validate_X_predict(X)
675 # Assign chunk of trees to jobs
676 n_jobs, _, _ = _partition_estimators(self.n_estimators, self.n_jobs)

File ~\AppData\Roaming\Python\Python38\site-packages\sklearn\ensemble\_forest.py:421, in BaseForest._validate_X_predict(self, X)
417 """
418 Validate X whenever one tries to predict, apply, predict_proba."""
419 check_is_fitted(self)
--> 421 return self.estimators_[0]._validate_X_predict(X, check_input=True)

File ~\AppData\Roaming\Python\Python38\site-packages\sklearn\tree\_classes.py:396, in BaseDecisionTree._validate_X_predict(self, X, check_input)
394 n_features = X.shape[1]
395 if self.n_features_ != n_features:
--> 396 raise ValueError("Number of features of the model must "
397 "match the input. Model n_features is %s and "
398 "input n_features is %s "
399 % (self.n_features_, n_features))
401 return X

ValueError: Number of features of the model must match the input. Model n_features is 233 and input n_features is 235.

There has been no change in the source data, and all the steps executed were the same.

Please advise.

Thanks in advance,
Abhinav
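An error like this means the estimator was fitted on a different set of columns than the transformed data it now receives (233 versus 235). With model_only=True, PyCaret saves only the trained estimator instead of the entire pipeline, so the preprocessing produced by the new setup call has to reproduce exactly the same columns; saving with the default model_only=False keeps preprocessing and estimator together. One way to narrow down a discrepancy is to compare the two lists of feature names. The sketch below is a hypothetical, standard-library-only helper; how you obtain the two lists depends on your library versions and is left as an assumption, and the column names are made up for illustration.

```python
from typing import Iterable, List, Tuple


def compare_feature_sets(trained: Iterable[str], incoming: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) feature names.

    missing:    columns the model was trained on but the new data lacks
    unexpected: columns present in the new data that the model never saw
    """
    trained_list = list(trained)
    incoming_list = list(incoming)
    trained_set, incoming_set = set(trained_list), set(incoming_list)
    missing = [name for name in trained_list if name not in incoming_set]
    unexpected = [name for name in incoming_list if name not in trained_set]
    return missing, unexpected


# Hypothetical column names, e.g. after one-hot encoding a categorical column
trained_cols = ["Price", "Store_A", "Store_B"]
incoming_cols = ["Price", "Store_A", "Store_B", "Store_C", "Store_D"]
missing, unexpected = compare_feature_sets(trained_cols, incoming_cols)
print(f"model expects {len(trained_cols)} features, data has {len(incoming_cols)}")
print("missing:", missing)
print("unexpected:", unexpected)
```

Lists rather than bare sets are returned so the output keeps the original column order, which makes the extra or missing columns easier to trace back to a preprocessing step.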
