
explainit's Introduction

Explainit



What is Explainit?

Explainit is a modern, enterprise-ready business intelligence web application that reuses existing frameworks to manage and serve dashboards for the machine learning project lifecycle.

Features

Explainit allows ML platform teams to:

  • Analyze drift in the existing data stack (features and targets).
  • Prepare a concise summary of productionized data.
  • Perform quality checks on the data to provide a feature overview.
  • Analyze in depth the relationship between features and the target.

Who is Explainit for?

Explainit helps ML platform teams with DevOps experience monitor productionized batch data. Explainit can also help these teams build towards an explainability/monitoring platform that improves collaboration between engineers and data scientists.

Explainit is likely not the right tool if you:

  • Are in an organization that’s just getting started with ML and is not yet sure what the business impact of ML is.
  • Rely primarily on unstructured data.

Quick Concepts on Drift

What is Model Drift?

Model drift (also known as model decay) refers to the degradation of a model’s predictive power due to changes in the environment or in feature distributions, and thus in the relationships between variables.

Types of Model Drift

There are three main types of model drift:

  • Concept drift
  • Data drift
  • Upstream data changes

Concept drift is a type of model drift where the relationship between the input and target changes over time. It usually occurs when real-world environments change in contrast to the training data the model learned from. For example, the behaviour of customers can change over time, lowering the accuracy of a model trained on historic customer datasets.
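To make the idea concrete, here is a minimal sketch of concept drift using a made-up threshold rule. Nothing in it comes from Explainit: the functions, thresholds, and data are all illustrative assumptions. The inputs keep the same distribution, but the rule linking input to label shifts, so a model fitted to the old rule loses accuracy.

```python
import random

def label_old(x: float) -> int:
    """Historic relationship (assumed): positive when x exceeds 0.5."""
    return int(x > 0.5)

def label_new(x: float) -> int:
    """Changed relationship (assumed): positive when x exceeds 0.7."""
    return int(x > 0.7)

def model(x: float) -> int:
    """A stand-in 'model' that learned the historic threshold."""
    return int(x > 0.5)

def accuracy(xs, labeler) -> float:
    """Share of inputs where the model agrees with the given labeling rule."""
    hits = sum(model(x) == labeler(x) for x in xs)
    return hits / len(xs)

rng = random.Random(0)
xs = [rng.random() for _ in range(10_000)]  # input distribution never changes

acc_before = accuracy(xs, label_old)  # labels follow the rule the model learned
acc_after = accuracy(xs, label_new)   # labels follow the changed rule
print(f"accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

The drop in accuracy comes entirely from the changed input-to-target relationship, which is why concept drift cannot be detected by looking at feature distributions alone.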

Data drift is a type of model drift where the properties of the independent variables change. Examples of data drift include changes in the data due to seasonality, changes in consumer preferences, and the addition of new products.
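One common way to quantify data drift for a numeric feature is the Population Stability Index (PSI). The sketch below is a generic, standard-library illustration and is not taken from Explainit; the function name psi, the bin count, and the synthetic samples are assumptions made for this example.

```python
import math
import random

def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the range of the reference sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range production values land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps to avoid log(0) and division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))

rng = random.Random(42)
reference = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]     # same distribution
shifted = [rng.gauss(1, 1) for _ in range(5000)]  # mean moved by one standard deviation

psi_same = psi(reference, same)
psi_shifted = psi(reference, shifted)
print(f"PSI (no drift): {psi_same:.3f}, PSI (shifted): {psi_shifted:.3f}")
```

A widely used rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the use case.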

Upstream data changes refer to operational changes in the data pipeline. An example of this is when a feature is no longer being generated, resulting in missing values. Another example is a change in measurement units (e.g., miles to kilometers).
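Both examples of upstream changes can be caught with simple checks. The following is a minimal sketch under stated assumptions: the helper names, the feature names ("distance", "income"), and the values are hypothetical and not part of Explainit. It flags a feature that has stopped being generated and a feature whose scale has changed.

```python
from statistics import median

def missing_rate(values):
    """Fraction of entries that are None."""
    return sum(v is None for v in values) / len(values)

def median_ratio(reference, production):
    """Ratio of the production median to the reference median, ignoring None."""
    ref = median(v for v in reference if v is not None)
    prod = median(v for v in production if v is not None)
    return prod / ref

# Hypothetical reference data: distance recorded in miles, income populated.
reference = {
    "distance": [10.0, 12.0, 8.0, 15.0, 11.0],
    "income": [40.0, 52.0, 61.0, 45.0, 58.0],
}

# Hypothetical production data: distance now arrives in kilometers,
# and income is no longer generated upstream.
production = {
    "distance": [v * 1.609344 for v in reference["distance"]],
    "income": [None] * 5,
}

income_missing = missing_rate(production["income"])
distance_ratio = median_ratio(reference["distance"], production["distance"])
print(f"income missing rate: {income_missing:.0%}")
print(f"distance median ratio: {distance_ratio:.3f}")
```

A median ratio close to a known conversion factor (here roughly 1.609, miles to kilometers) is a strong hint that the unit changed rather than the underlying behaviour.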

Installation guide

Install the Explainit Package:

pip install explainit

Run the App

To generate the dashboards inside the application, first import the build function.

from explainit.app import build

After importing the function, load the data that will be passed to the application to generate the dashboards. This example uses the Default Loan dataset.

import pandas as pd

ref_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/reference_data.csv", index_col=None)
prod_data = pd.read_csv("https://raw.githubusercontent.com/katonic-dev/explainit/master/examples/data/production_data.csv", index_col=None)

Once you have both the reference and production datasets, pass them to the build function along with the target column name and the target column type (cat for a categorical column, num for a numerical column).

build(
  reference_data=ref_data,
  production_data=prod_data,
  target_col_name="bad_loan",
  target_col_type="cat",
  host="127.0.0.1",
  port=8050
)

If you want to run the application on a server other than localhost, specify the host and port.
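Before calling build, it can help to confirm that the two datasets line up. The following is a minimal sketch and not part of Explainit: check_build_inputs is a hypothetical helper, the column names are made up for illustration, and the expectation that both datasets share the same columns is an assumption rather than a documented requirement.

```python
def check_build_inputs(ref_columns, prod_columns, target_col_name, target_col_type):
    """Return a list of problems; an empty list means the inputs look consistent.

    Hypothetical pre-flight helper, not an Explainit API.
    """
    problems = []

    # The README states the target type should be "cat" or "num".
    if target_col_type not in ("cat", "num"):
        problems.append(
            f"target_col_type must be 'cat' or 'num', got {target_col_type!r}"
        )

    # The target column has to exist in both datasets.
    for label, columns in (("reference", ref_columns), ("production", prod_columns)):
        if target_col_name not in columns:
            problems.append(
                f"target column {target_col_name!r} is missing from the {label} data"
            )

    # Assumption: both datasets are expected to carry the same columns.
    mismatched = sorted(set(ref_columns) ^ set(prod_columns))
    if mismatched:
        problems.append(f"columns not present in both datasets: {mismatched}")

    return problems

# Illustrative column names only; the real dataset may differ.
ref_columns = ["loan_amnt", "annual_inc", "bad_loan"]
prod_columns = ["loan_amnt", "bad_loan"]

ok = check_build_inputs(ref_columns, ref_columns, "bad_loan", "cat")
issues = check_build_inputs(ref_columns, prod_columns, "bad_loan", "category")
print(ok)
print(issues)
```

With pandas DataFrames, the column lists can be obtained with list(ref_data.columns) and list(prod_data.columns).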

App Snapshots

[Image: landing page of the Explainit dashboard]


Contributor Guide

Interested in contributing? Check out our CONTRIBUTING.md to find resources around contributing along with a detailed guide on how to set up a development environment.

QnA

Q. What exactly is the scope of the app?

A. With this app, users can calculate dataset drift, target drift, and data quality metrics to better understand production (real-world) data alongside training (reference) data and make informed decisions.

Q. What does the input data look like?

A. The input data consists of your reference (training) data and your production (inference) data. The reference data serves as the baseline against which the distribution of the production data is compared. Both datasets should be passed as pandas DataFrames.

Q. What outputs does the app produce?

A. The app produces statistical information about the complete data (features and target) for drift analysis, distribution plots for each feature to help understand the data, the contribution of each feature to the target, and correlation metrics.

Q. What decisions can the user make by using the app?

A. With the drift information from the app, users can decide to:

  • Look for quality data for the use case.
  • Make changes to existing models or train new models for production.
  • Update domain-specific concepts to better reflect the real world in new models.

explainit's People

Contributors

shailesh-katonic, subhrajit-katonic, vinaynaman


explainit's Issues

[DOC-FIX]: Update documentation for routes

  • Add how to use app on custom endpoint ROUTE to run/deploy app.
  • Add background to target_behaviour_based_on_features graph.
  • Update Histograms, Scatter, bar charts color (remove red color)
  • Update the data_summary table param index -> Metrics
  • Capitalize data_summary table params.
  • Update the additional pie graphs title in tab-1
  • Capitalize feature_summary graphs legend
  • Highlight the Graphs Title
  • Update the Correlation table title to Correlation Table in Correlation Tab
  • Capitalize heatmaps title
  • Update target_name behaviour based on feature_name heading to Target "target_name" behaviour based on "feature_name" Feature
  • Update stats-metrics Column in data drift tab to Feature
  • Update How it Works? section
  • Improve documentation wording.

[DOC-FIX]: Update the app workflow

  • Update the architecture diagram
  • Update the dataset labels current to production, cur to prod in getting-started, README, FAQ, examples
  • Update build function input parameters in README
  • Update the vertical spacing in FAQ

[FR]: Add the graph description with proper axes parameters

Add a description for each graph (e.g., hover details or a separate section) covering:

  • Drift Detection graphs interpretation
  • Data Quality Management metrics, graph interpretation
  • Different correlation metrics and their significance (how each helps in understanding the data)

Update Axes parameters:

  • Distribution graphs for features, Target

Update Statistical Metric:

  • Add/Update drift graph and drift/no-drift parameter colors based on the drift severity (e.g., drift-score low -> light color, too-low -> dark color, etc.)

Update Drift graph:

  • Give more context to the scatter plot (num) and the significance of standard deviation (in terms of data distribution, a more general/mathematical depiction, etc.)
  • Explain how well a pie chart suits categorical features in terms of data distribution (or suggest an alternative)

[DOC-FIX]: Update Documentation typos, examples

  • Added kde plot for small histograms
  • Updated color palette for small histograms for training & testing data
  • Update README.md for scikit-learn package installation in example.
  • Update punctuation in README, FAQ
  • Update example in getting-started guide
  • Update input params in getting-started guide
  • Update output in getting-started guide
  • Update FAQ.md URL in README.md
  • Update release URL in README.md
  • Update CONTRIBUTING.md URL in README.md
  • Add dependent pkgs for getting-started examples

Update Explainit in README

  • Updated Python Version badge: Explaintit -> Explainit
  • Added Introduction to Explainit in README.md
  • Updated main function name to build
  • Updated FAQ section
  • Added Contribution guide in README.md
  • Updated Python Version Support on setup.cfg

[BUG]: Index column issue in Generating Graphs.

  • fix integer index has no attribute tolist issue in additional_num_graphs.py
  • fix the pip install command in docs (remove $ sign)
  • Update all screenshot in docs
  • Update target drift graph layout

Base Feature Enhancement

  • Statistical Metrics Rows
  • Distribution & Drift Graphs
  • Target Drift Main Graph
  • Target Drift based on Individual Feature
  • Data Summary
  • Feature Summary
  • Correlation Graphs
  • Components Integration

[FR]: Add unique value threshold parameter

  • Add unique value threshold parameter in build function to separate num, cat features from all features.
  • Add Doc-string in build function
  • Fix typo in getting-started.md
  • Update Doc-strings in correlation.py
