
deepinsights's Introduction

Short Note about insightsR

insightsR is an open-source web application that provides automated insights for tabular data. Insights are generated statistically, based on the target column (dependent variable) selected by the user.

The user only needs to provide a dataset in CSV format, select a numeric target column, and click the 'Generate Insights' button in the left pane. The tool takes care of the rest. In short, just two inputs are required from the end user: (1) the dataset and (2) the target column.
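As a rough illustration of the two inputs, the sketch below loads a small CSV with pandas and lists the numeric columns that could serve as a target. The inline data and the column names (`region`, `units`, `sales`) are made up for this example, and this is not necessarily how insightsR itself reads the file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for an uploaded file.
csv_text = """region,units,sales
north,10,120.5
south,7,88.0
east,12,140.25
"""

df = pd.read_csv(io.StringIO(csv_text))

# Only numeric columns are valid choices for the target.
numeric_targets = df.select_dtypes(include="number").columns.tolist()
print(numeric_targets)
```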

Insights provided and how to interpret them

Once the user uploads the data to be analysed and selects the target column, insightsR pre-processes the data as its first step. Pre-processing removes columns that are entirely null, columns with more than 75% null values, and highly correlated columns. It also identifies date-time columns and splits each one into multiple columns, which helps in providing better insights.
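The pre-processing steps described above could be sketched roughly as follows with pandas. This is an illustration, not the tool's actual code: the function name `preprocess`, the 0.95 correlation cut-off, and the choice of date parts are assumptions, and the sketch expects date columns to be parsed already (for example with `parse_dates` in `pd.read_csv`) rather than detecting them.

```python
import numpy as np
import pandas as pd


def preprocess(df, target, null_limit=0.75, corr_limit=0.95):
    """Illustrative pre-processing; corr_limit is an assumed value."""
    out = df.copy()
    features = [c for c in out.columns if c != target]

    # 1. Drop columns that are entirely null or more than 75% null.
    sparse = [c for c in features if out[c].isna().mean() > null_limit]
    out = out.drop(columns=sparse)

    # 2. Split each datetime column into numeric parts, then drop the original.
    for col in out.select_dtypes(include="datetime").columns:
        out[f"{col}_year"] = out[col].dt.year
        out[f"{col}_month"] = out[col].dt.month
        out[f"{col}_day"] = out[col].dt.day
        out[f"{col}_dayofweek"] = out[col].dt.dayofweek
        out = out.drop(columns=col)

    # 3. Drop the later column of each highly correlated numeric pair.
    numeric = [c for c in out.select_dtypes(include="number").columns if c != target]
    corr = out[numeric].corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [c for c in upper.columns if (upper[c] > corr_limit).any()]
    return out.drop(columns=redundant)
```

Keeping only the upper triangle of the correlation matrix means each correlated pair is checked once, so one column of the pair survives.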

After pre-processing, insightsR statistically analyses the dataset using machine learning algorithms and provides the following insights:

  • Top contributing features for the target column. For example, if the target is 'Sales', important features might be price, seasonality, etc. insightsR reports these highly important features along with their % of contribution.
  • Data visualizations for the top 5 contributing features. insightsR groups numeric features into multiple buckets, which gives a different perspective by showing the data as bucketed categories.
  • Deeper insights based on the identified top contributors. For a sample picked from the top contributors, insightsR first reports whether the contribution is positive or negative and by what %. Knowing which features brought the mean target value down or up is an immensely useful insight. It then details how each feature contributed within the sample.
  • Finally, a simulated scenario: what happens when the contributor's data is held constant and every other column remains as is? Does the target value improve or degrade? This shows whether the contributing feature has really impacted the target value on its own, or whether it is just one part of a larger combination of contributing features.
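The insights listed above can be approximated with a few lines of scikit-learn and pandas. The sketch below is an assumption-laden illustration rather than the tool's own fast.ai-based implementation: it swaps in a `RandomForestRegressor`, uses its built-in feature importances as the "% of contribution", uses quantile buckets for numeric features, and runs the what-if scenario by pinning one feature to a single value. The helper names and the synthetic `price`/`noise` data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def contribution_percent(model, feature_names):
    """Each feature's share of the model's importance, in percent."""
    share = 100 * model.feature_importances_ / model.feature_importances_.sum()
    return pd.Series(share, index=feature_names).sort_values(ascending=False)


def bucket_numeric(series, buckets=4):
    """Group a numeric column into quantile buckets for plotting."""
    return pd.qcut(series, q=buckets, duplicates="drop")


def simulate_constant(model, X, feature, value):
    """Mean predicted target before and after pinning one feature to `value`."""
    pinned = X.copy()
    pinned[feature] = value  # every other column stays as is
    return model.predict(X).mean(), model.predict(pinned).mean()


# Synthetic data: the target depends on `price`; `noise` is irrelevant.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "price": rng.uniform(1, 10, 300),
    "noise": rng.normal(size=300),
})
y = 3 * X["price"] + rng.normal(scale=0.5, size=300)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

shares = contribution_percent(model, X.columns)
buckets = bucket_numeric(X["price"])
before, after = simulate_constant(model, X, "price", X["price"].max())

print(shares.round(1))
print(buckets.value_counts().sort_index())
print(f"mean predicted target: {before:.2f} -> {after:.2f}")
```

If pinning a feature barely moves the mean prediction, that feature probably matters only in combination with others; a large shift suggests it drives the target on its own.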

Based on the insights provided, the user can identify areas for improvement, explore what would happen if a change were made, and devise a plan to achieve the target.

Installation

Prerequisite: Python (~3.7.9) is installed on the system. Execute the following commands in a terminal (Terminal/cmd):

git clone https://github.com/Vinothsuku/insightsR.git

cd insightsR

pip install -r requirements.txt

streamlit run insightsr.py

Open a browser and go to: http://localhost:8501

(Optional, suggested) Create a virtual environment first and then execute the commands above.

Blog

I have written a detailed blog here.

Try out online

Hosted online. Feel free to try it out with any dataset in CSV format.

Constraints

Currently the tool supports only datasets in CSV format and provides insights only for a target column with continuous values (regression type), for example total sales, price, cost, income, or salary.

Credits

fast.ai is the base for this tool. I have put it together as an automation effort (a basic AutoML-style tool) and as a web application that anyone can use with ease.

deepinsights's People

Contributors

vinothsuku
