Data Visualization with Pandas

Introduction

In this lesson, we look at data visualization using Pandas and Matplotlib, two modules that we have already seen and used. Pandas uses Matplotlib under the hood for data visualization and provides convenient functions for plotting data directly from DataFrames.

Objectives

You will be able to:

Understand the relationship between pandas and matplotlib plots and their attributes
Plot data from single variables using scatter plots, histograms, line plots, boxplots, and KDE plots in pandas
Plot multi-dimensional data using scatter matrix and parallel coordinate plots

Styling a Plot

Before we dive into data visualization in Pandas, it is a good idea to get a quick introduction to Matplotlib's style package. Matplotlib comes with a number of predefined styles for customizing plots. These styles generally change the look of plots by changing color maps, line styles, backgrounds, and so on. Because Pandas is built on Matplotlib for visualizations, choosing a style changes the look of our Pandas graphs as well, as we shall see below.

We can use plt.style.available to see a list of the predefined styles available in Matplotlib. The %matplotlib notebook magic below renders the plots as interactive figures inside Jupyter notebooks.

import matplotlib.pyplot as plt
%matplotlib notebook
plt.style.available

['seaborn-dark',
 'seaborn-darkgrid',
 'seaborn-ticks',
 'fivethirtyeight',
 'seaborn-whitegrid',
 'classic',
 '_classic_test',
 'fast',
 'seaborn-talk',
 'seaborn-dark-palette',
 'seaborn-bright',
 'seaborn-pastel',
 'grayscale',
 'seaborn-notebook',
 'ggplot',
 'seaborn-colorblind',
 'seaborn-muted',
 'seaborn',
 'Solarize_Light2',
 'seaborn-paper',
 'bmh',
 'seaborn-white',
 'dark_background',
 'seaborn-poster',
 'seaborn-deep']

This gives us the list of available styles. To use a style, we simply call plt.style.use(<style name>). Let's use ggplot for now and see how it changes the default style. Feel free to try other styles and see how they affect the look and feel of the plots!

plt.style.use('ggplot')
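
plt.style.use() changes the style globally for the rest of the session. As a minimal sketch, a style can also be applied to a single figure with plt.style.context(). The helper name pick_style and its fallback to 'default' are illustrative assumptions rather than part of the lesson, and the Agg backend line is only there so the snippet runs as a plain script (leave it out inside a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; only needed outside a notebook
import matplotlib.pyplot as plt


def pick_style(preferred, fallback="default"):
    """Return `preferred` if this Matplotlib install ships it, else `fallback`."""
    # pick_style is a hypothetical helper, not a Matplotlib function
    return preferred if preferred in plt.style.available else fallback


style_name = pick_style("ggplot")

# The style applies only inside the `with` block;
# the previous global settings are restored afterwards.
with plt.style.context(style_name):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
plt.close(fig)
```

This keeps experiments with different styles from leaking into later plots.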

Create a dataset for visualization

Pandas offers excellent built-in visualization features. They are particularly useful for exploratory analysis of data stored in a Pandas Series or DataFrame.

Let's build a synthetic temporal DataFrame with the following steps:

Create a DataFrame with three columns: A, B and C.
For each column, generate 365 random numbers (one for each day of the year) using np.random.randn().
Using NumPy's cumsum (cumulative sum) method, cumulatively sum the generated random numbers in each column.
Offset column B by +25 and column C by -25 with respect to column A, which remains unchanged.
Using pd.date_range, set the index to every day in 2018 (starting from 1st January).

We shall also set a seed for controlling the randomization, allowing us to reproduce the data.

It is always a good idea to set a random seed when dealing with probabilistic outputs.
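
A quick way to see what the seed does is to draw random numbers twice, resetting the seed in between. This is a minimal sketch; the variable names first, second and same are illustrative.

```python
import numpy as np

np.random.seed(777)
first = np.random.randn(5)

np.random.seed(777)  # reset to the same seed before drawing again
second = np.random.randn(5)

# Both draws start from the same generator state, so they match exactly
same = np.array_equal(first, second)
print(same)
```

Without the second call to np.random.seed(), the two arrays would differ because the generator keeps advancing.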

Let's give this a go:

import pandas as pd
import numpy as np

# Fix the seed so the random data is reproducible
np.random.seed(777)

# Three random walks: A is unshifted, B is offset by +25 and C by -25
data = pd.DataFrame({'A': np.random.randn(365).cumsum(),
                     'B': np.random.randn(365).cumsum() + 25,
                     'C': np.random.randn(365).cumsum() - 25},
                    index=pd.date_range('1/1/2018', periods=365))
data.head()

	A	B	C
2018-01-01	-0.468209	25.435990	-22.997943
2018-01-02	-1.291034	26.479220	-22.673404
2018-01-03	-1.356414	25.832356	-21.669027
2018-01-04	-2.069776	26.456703	-21.408310
2018-01-05	-1.163425	25.864281	-22.685208

This is great. Now we have a dataset with three columns that we can treat as time series. Let's inspect our data visually. To plot this data, we can simply call the .plot() method on the DataFrame.

data.plot()

<matplotlib.axes._subplots.AxesSubplot at 0x1a208d2cf8>

This is sweet. We didn't have to define our canvas, axes, or labels. This is where pandas really shines. By default, DataFrame.plot() is a simple wrapper around plt.plot() that draws line plots. So when we call data.plot(), we get a line graph of all the columns in the DataFrame, with labels.
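
Because the plot is drawn by Matplotlib, the object returned by .plot() is a regular Matplotlib Axes that can be customized further. The sketch below assumes a smaller, hypothetical DataFrame named demo and illustrative title and label text. The Agg backend line is only needed when running as a script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; only needed outside a notebook
import numpy as np
import pandas as pd

np.random.seed(777)
# `demo` is a hypothetical stand-in for the lesson's `data` DataFrame
demo = pd.DataFrame({'A': np.random.randn(365).cumsum(),
                     'B': np.random.randn(365).cumsum() + 25},
                    index=pd.date_range('1/1/2018', periods=365))

ax = demo.plot()  # pandas returns the Matplotlib Axes it drew on

# Standard Matplotlib Axes methods work on the returned object
ax.set_title('Synthetic series for 2018')
ax.set_ylabel('Cumulative sum')

n_lines = len(ax.get_lines())  # one line per DataFrame column
```

Holding on to the returned Axes is a common way to mix the convenience of pandas plotting with Matplotlib's finer control.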

Also notice how this plot differs in look and feel. This is because of the style we applied earlier. The %matplotlib notebook magic also makes the plots interactive. Try clicking, dragging, and zooming on the plot above to see how this works.

Try changing to a different style and see which one you prefer.

Scatter Plots

DataFrame.plot() allows us to draw a number of different kinds of plots. We select the plot type by passing it to the kind parameter. Here is the complete list from the documentation:

kind : str

‘line’ : line plot (default)
‘bar’ : vertical bar plot
‘barh’ : horizontal bar plot
‘hist’ : histogram
‘box’ : boxplot
‘kde’ : Kernel Density Estimation plot
‘density’ : same as ‘kde’
‘area’ : area plot
‘pie’ : pie plot
‘scatter’ : scatter plot
‘hexbin’ : hexbin plot

Let's create a scatter plot of the A and B columns of data. We pass "scatter" to the kind parameter to change the plot type. Also note that putting a semicolon at the end of a plotting call suppresses the extra text output in the notebook.

data.plot('A', 'B', kind='scatter');
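
The same kind parameter selects the other plot types in the list. As a minimal sketch, the snippet below draws a histogram and a boxplot. The DataFrame demo and the bins and alpha values are illustrative choices, and the Agg backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; only needed outside a notebook
import numpy as np
import pandas as pd

np.random.seed(777)
# `demo` is a hypothetical stand-in for the lesson's `data` DataFrame
demo = pd.DataFrame({'A': np.random.randn(365).cumsum(),
                     'B': np.random.randn(365).cumsum() + 25})

# Overlaid histograms of both columns; alpha makes the overlap visible
ax_hist = demo.plot(kind='hist', bins=20, alpha=0.5)

# One box per column
ax_box = demo.plot(kind='box')
```

Each call creates its own figure, so the two plots do not overwrite each other.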

We can also choose the plot kind by calling the methods DataFrame.plot.<kind> instead of passing the kind argument, as we shall see below. Let's now create another scatter plot with points varying in color and size. We'll perform the following steps:

Use data.plot.scatter and pass in columns A and C.
Set the color c and size s of the data points to change based on the value of column B.
Choose the color palette by passing a string to the colormap parameter.

A complete list of colormaps is available in the official Matplotlib documentation.

Let's see this in action:

data.plot.scatter('A', 'C',
                  c='B',
                  s=data['B'],
                  colormap='viridis');
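
Passing the raw values of B as s works here because they happen to fall in a usable range. For other data, one option is to rescale the column into a fixed range of marker sizes first. The sketch below assumes a min-max rescale into 10 to 200 (an arbitrary, illustrative range) and uses a hypothetical DataFrame named demo built the same way as data. The Agg backend line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; only needed outside a notebook
import numpy as np
import pandas as pd

np.random.seed(777)
# `demo` is a hypothetical stand-in built like the lesson's `data`
demo = pd.DataFrame({'A': np.random.randn(365).cumsum(),
                     'B': np.random.randn(365).cumsum() + 25,
                     'C': np.random.randn(365).cumsum() - 25})

# Min-max rescale B into marker sizes between 10 and 200 (illustrative range)
b = demo['B']
sizes = 10 + 190 * (b - b.min()) / (b.max() - b.min())

# Color still follows the raw B values; only the marker size is rescaled
ax = demo.plot.scatter('A', 'C', c='B', s=sizes, colormap='viridis')
```

Rescaling keeps every marker visible and avoids negative sizes if the column dips below zero.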
