Data Visualization

What is Data Visualization

According to sas.com, Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns.

With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.

The article attached here explains the history of data visualization, its current trends, how it is used, and how it works.

SearchBusiness explains data visualization as it relates to data science. In their definition, data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and pull insights from. The main goal of data visualization is to make it easier to identify patterns, trends and outliers in large data sets. The term is often used interchangeably with others, including information graphics, information visualization and statistical graphics.

They provide examples of visualizations, some business use cases, and a discussion of its importance. Read the details here.

How will I use Data Visualization in my ML career?

Data visualization is a very important skill for any ML practitioner. It is a tool for communicating your findings, and what the data entails, to stakeholders. A dashboard is a good use case of this skill: with a dashboard, you communicate a lot of information to business stakeholders without using ML jargon that they sometimes cannot relate to.

Data visualization helps uncover insights from data. It is carried out after business understanding and data gathering, and before feature selection and engineering. The output of a data visualization task can be a dashboard illustrating insights, or a report in the form of a slide deck communicating what the data entails. Data visualization also helps data scientists, analysts, and others build intuition about the results of a model.

Why is it important to understand Data Visualization before starting my ML career?

As stated by Jason Brownlee (PhD), Data Visualization is an important skill in applied statistics and machine learning. Statistics does indeed focus on quantitative descriptions and estimations of data. Data visualization provides an important suite of tools for gaining a qualitative understanding. This can be helpful when exploring and getting to know a dataset and can help with identifying patterns, corrupt data, outliers, and much more. With a little domain knowledge, data visualizations can be used to express and demonstrate key relationships in plots and charts that are more visceral to yourself and stakeholders than measures of association or significance.

It is very important that machine learning engineers, data scientists, analysts, and others understand in detail what data visualization is about, how to complete a data visualization task for any kind of dataset, and how to present insights and findings and make better judgements using data visualization.

How Data Visualization helps to solve a problem

Visualization helps you to see "what's happening in your data"
Visualization can help you decide which analysis to use
Visualization is probably the language the business speaks
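
A classic illustration of the first two points is Anscombe's quartet: small datasets whose summary statistics are nearly identical but whose plots look very different. The sketch below uses sets I and II of the quartet. The helper name `describe` and the plot layout are illustrative choices, not something prescribed by this guide.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Anscombe's quartet, sets I and II (they share the same x values)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])


def describe(x, y):
    """Return the mean of y and the Pearson correlation between x and y."""
    return y.mean(), np.corrcoef(x, y)[0, 1]


mean1, r1 = describe(x, y1)
mean2, r2 = describe(x, y2)
print(f"set I : mean(y)={mean1:.2f}, r={r1:.3f}")
print(f"set II: mean(y)={mean2:.2f}, r={r2:.3f}")

# The numbers above are almost the same; the scatter plots are not.
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, y, title in zip(axes, (y1, y2), ("Set I", "Set II")):
    ax.scatter(x, y)
    ax.set_title(title)
    ax.set_xlabel("x")
axes[0].set_ylabel("y")
fig.tight_layout()
# fig.savefig("anscombe.png")  # or plt.show() in an interactive session
```

Set I looks roughly linear while set II follows a curve, which suggests that a straight-line model suits the first but not the second, even though the summary numbers alone would not tell you so.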

Examples of how you probably used Data Visualization today without realizing it

Pictures, they say, speak louder than words. Almost everybody has seen a meme or two. Without anyone saying anything, you are already making inferences from the picture. That is an example of data visualization. Making a chart is not enough, though; making a quality (clear, precise, neat) chart is what matters. In essence, the chart must communicate something that can drive action.

What differentiates beginners from experts in this competency?

The use and mastery of visualization tools. An expert should know how to use tools like Tableau or Power BI to turn business questions into actionable insights.
Understanding and being able to use libraries like seaborn and matplotlib to derive insights from data is good. Going a step further, such as designing an interactive dashboard or building a dashboard that updates automatically, is IMO what marks an expert.

Tools/Libraries to execute data visualization tasks

  1. Matplotlib: a base visualization library for Python that makes data come to life. It is simple and easy to learn and use.
  2. Seaborn: a high-level Python visualization library based on matplotlib. It offers additional chart types and attractive color palettes.
  3. Plotly: provides graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST. For Python, Plotly Express (a module in plotly) is the most popular for its high-level interface, and graph objects (another module in plotly) for its vast features and customisation capabilities. Closely related is Dash, a Python framework well known for delivering visualisation apps and dashboards. The winning point for Plotly is that it produces interactive charts, something you cannot easily achieve with matplotlib and seaborn.
  4. Bokeh: an interactive visualization library for modern web browsers. It provides elegant, concise construction of versatile graphics, and affords high-performance interactivity over large or streaming datasets. Bokeh can help anyone who would like to quickly and easily make interactive plots, dashboards, and data applications. See a live notebook here to practice with Bokeh.
  5. d3.js: a JavaScript library for producing dynamic, interactive data visualizations in web browsers.
  6. Power BI: a data visualization and business intelligence tool that converts data from different data sources into interactive dashboards and BI reports. The Power BI suite provides multiple software products, connectors, and services: Power BI Desktop, the SaaS-based Power BI service, and mobile Power BI apps available for different platforms. This set of services is used by business users to consume data and build BI reports. This tutorial covers all the important concepts in Power BI and provides a foundational understanding of how to use it.
  7. Tableau: a business intelligence tool for visually analyzing data. Users can create and distribute interactive, shareable dashboards that depict the trends, variations, and density of the data in the form of graphs and charts. Tableau can connect to files, relational sources, and Big Data sources to acquire and process data. The software allows data blending and real-time collaboration. It is used by businesses, academic researchers, and many government organizations for visual data analysis. It is also positioned as a leader among Business Intelligence and Analytics Platforms in the Gartner Magic Quadrant.
  8. Altair: a statistical visualization library for Python.

Components and key topics in Data Visualization

  1. Uni-variate Analysis
  2. Bi-variate Analysis
  3. Multi-variate Analysis
  4. Model Visualization*

The article here explains the different kinds of charts used to explore data and provides some examples.

  1. Uni-variate Analysis

Univariate analysis explores each variable in a data set separately. It looks at the range of values as well as their central tendency, and it describes the pattern of response to the variable. Descriptive statistics describe and summarize data; univariate descriptive statistics describe individual variables. Learn about univariate analysis here.
The article walks through the univariate analysis part of a data visualization process using the iris dataset, the "hello world" dataset of machine learning.
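
As a minimal sketch of what a univariate pass can look like, the example below summarizes one numeric column and plots its distribution. It does not reproduce the linked article: the data is synthetic (a hypothetical stand-in for a column such as petal length), and the `summarize` helper is an illustrative name.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def summarize(values):
    """Return basic univariate descriptive statistics for a 1-D array."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "min": values.min(),          # range of values: lower end
        "max": values.max(),          # range of values: upper end
        "mean": values.mean(),        # central tendency
        "median": median,             # central tendency, robust to outliers
        "std": values.std(ddof=1),    # sample standard deviation
        "iqr": q3 - q1,               # spread of the middle 50%
    }


# Synthetic stand-in for one numeric column (an assumption for illustration)
rng = np.random.default_rng(0)
petal_length = rng.normal(loc=3.8, scale=1.2, size=150)

stats = summarize(petal_length)
for name, value in stats.items():
    print(f"{name:>6}: {value:.2f}")

# A histogram shows the shape; a box plot shows spread and potential outliers.
fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(petal_length, bins=20)
ax_hist.set_title("Distribution")
ax_box.boxplot(petal_length)
ax_box.set_title("Spread and outliers")
fig.tight_layout()
# fig.savefig("univariate.png")  # or plt.show() in an interactive session
```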

  2. Bi-variate Analysis

According to Wikipedia, bivariate analysis is one of the simplest forms of quantitative analysis. It involves the analysis of two variables for the purpose of determining the empirical relationship between them. Bivariate analysis can be helpful in testing simple hypotheses of association. The notebook attached here is a Kaggle notebook that explains the art of bivariate analysis and provides various questions to solidify the understanding of what has been taught in the notebook. Do well to complete the tasks.
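
A simple way to look at the relationship between two numeric variables is a scatter plot combined with a correlation coefficient and a fitted line. The sketch below assumes two hypothetical variables, `hours` and `score`, generated synthetically with a linear relationship plus noise; it is not taken from the linked notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Two synthetic, hypothetical variables with a linear relationship plus noise
rng = np.random.default_rng(42)
hours = rng.uniform(0, 10, size=200)
score = 5.0 * hours + rng.normal(0, 5, size=200)

# Pearson correlation measures the strength of the linear association
r = np.corrcoef(hours, score)[0, 1]

# Least-squares straight line: score is approximately slope * hours + intercept
slope, intercept = np.polyfit(hours, score, 1)
print(f"r={r:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")

fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(hours, score, alpha=0.6, label="observations")
grid = np.linspace(hours.min(), hours.max(), 50)
ax.plot(grid, slope * grid + intercept, color="black", label="fitted line")
ax.set_xlabel("hours")
ax.set_ylabel("score")
ax.legend()
fig.tight_layout()
# fig.savefig("bivariate.png")  # or plt.show() in an interactive session
```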

  3. Multi-variate Analysis

Multivariate (multidimensional) visualization is the visualization of datasets that have more than three variables. In this article, Dipanjan Sarkar (GDE) explains the art of effective visualization for multi-dimensional data: The Art of Effective Visualization of Multi-dimensional Data.
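
One common way to fit more than two variables into a single chart is to map the extra variables to visual channels such as color and marker size. The sketch below is only an illustration of that idea, with four synthetic, hypothetically named variables; it is not taken from the linked article.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Four synthetic variables (names are hypothetical, for illustration only)
rng = np.random.default_rng(7)
n = 120
length = rng.normal(5.8, 0.8, n)   # shown on the x axis
width = rng.normal(3.0, 0.4, n)    # shown on the y axis
depth = rng.normal(3.7, 1.7, n)    # shown as color
weight = rng.uniform(0.1, 2.5, n)  # shown as marker size

# Rescale weight to marker areas between 20 and 80 points^2
sizes = 20 + 60 * (weight - weight.min()) / (weight.max() - weight.min())

fig, ax = plt.subplots(figsize=(6, 4))
points = ax.scatter(length, width, c=depth, s=sizes, cmap="viridis", alpha=0.8)
fig.colorbar(points, ax=ax, label="depth (color)")
ax.set_xlabel("length")
ax.set_ylabel("width")
ax.set_title("Four variables in one scatter plot (size = weight)")
fig.tight_layout()
# fig.savefig("multivariate.png")  # or plt.show() in an interactive session
```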

  4. Model Visualization

Here we have charts like

  1. Clustering: The most common algorithm used is k-means. k-means starts by choosing k random centers, where k is a number you set yourself. Then, all data points are assigned to the closest center based on their Euclidean distance. Next, new centers are calculated and the assignments of the data points are updated. This process continues until the clusters do not change between iterations.
  2. Dendrograms: A dendrogram is a tree-like network structure. It consists of a root node that gives rise to several nodes connected by edges or branches. The last nodes of the hierarchy are called leaves.
  3. Heatmaps: A heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colors.
  4. DBSCAN (Density-Based Spatial Clustering of Applications with Noise): clusters data points if they are sufficiently dense. It identifies clusters and expands them by scanning neighborhoods.
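
The k-means steps described in item 1 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a production implementation: the function names `assign` and `kmeans` are made up for this example, initial centers are drawn from the data points, the loop stops when the centers stop moving (a stand-in for "clusters do not change"), and the demo data is synthetic. In practice, a library implementation such as scikit-learn's `KMeans` is usually preferable.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def assign(points, centers):
    """Label each point with the index of its closest center (Euclidean)."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: random centers, assign, recompute, repeat until stable."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose k random data points as the initial centers
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its closest center
        labels = assign(points, centers)
        # Step 3: recompute each center as the mean of its assigned points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign(points, centers)


# Synthetic demo data: two blobs (an assumption for illustration)
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(8.0, 1.0, size=(50, 2)),
])
centers, labels = kmeans(data, k=2)

# Visualize the model: points colored by cluster, centers marked with crosses
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(data[:, 0], data[:, 1], c=labels, cmap="viridis", alpha=0.7)
ax.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=100)
ax.set_title("k-means clusters")
fig.tight_layout()
# fig.savefig("kmeans.png")  # or plt.show() in an interactive session
```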

What Differentiates a Beginner from an Expert in this Competency

  • Beginners

    Beginners have the skills to

    • Use Matplotlib and Seaborn to build charts that communicate insights
    • Use the Plotly library to create interactive visualizations
    • Present charts as a slide deck
  • Experts

    Experts can use more complex tools and derive more hidden insights. Experts can

    • Use Tableau/Power BI to create dashboards that automatically update as new data comes in
    • Create a web app using Streamlit, Plotly Dash, or D3.js to present a data viz project
    • Create attractive and interactive plots that tell a useful story about the data

Charts Explanations

This section contains links that explain how various charts relate to analysis, how to interpret the charts, and when to use which chart.

  1. BoxPlot Explained
  2. What is a Violinplot?
  3. Linecharts in Detail

Learn More

  1. A step-by-step guide to Data Visualizations in Python
  2. Data Visualization with Python
  3. Exploring Data with python from scratch
  4. Kaggle Notebooks: Detailed exploratory data analysis with python
  5. Visual data analysis in Python
  6. Step by Step Guide to learn Tableau

