Set theory and probability theory are foundational statistical concepts. In this lesson, you'll learn more formal ways to represent statistical distributions.
Descriptive statistics are used to describe a distribution of data: in particular, measures of centrality (e.g., mean, median) and measures of spread (e.g., variance, standard deviation).
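As a quick illustration of these four measures, here is a minimal sketch using Python's built-in `statistics` module on a small, made-up dataset. It treats the dataset as a complete population, so it uses `pvariance` and `pstdev`; for a sample you would typically use `variance` and `stdev`, which divide by n - 1 instead of n.

```python
import statistics

# A small, made-up dataset used purely for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of centrality
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data

# Measures of spread (population versions: `data` is treated as the whole distribution)
variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the variance

print(f"mean={mean}, median={median}, variance={variance}, std_dev={std_dev}")
```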
In most real-world data science contexts, you will not have access to complete information about a distribution of data. Instead, you will have a sample. To make claims about the complete population, you'll need to perform inference (i.e., "inferential statistics") using the available sample data.
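The sketch below simulates this situation under an explicit assumption: a hypothetical population of 100,000 normally distributed values (mean 50, standard deviation 10) that, unlike in practice, we can inspect in full. Drawing a random sample of 100 shows that the sample mean lands near, but not exactly on, the population mean; the standard error gives a rough sense of how large that gap tends to be.

```python
import random
import statistics

random.seed(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 100,000 values drawn from a normal distribution with
# mean 50 and standard deviation 10. In real work you rarely get to see this.
population = [random.gauss(50, 10) for _ in range(100_000)]

# A random sample of 100 values is all the analyst actually observes.
sample = random.sample(population, k=100)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# Sample standard deviation divides by n - 1 (Bessel's correction).
sample_sd = statistics.stdev(sample)

# Standard error of the mean: a rough gauge of how far a sample mean
# typically lands from the population mean.
standard_error = sample_sd / len(sample) ** 0.5

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error about {standard_error:.2f})")
```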
To understand how to make these inferences, you'll first need a deeper understanding of different kinds of distributions, how they relate to the underlying data types being represented (discrete vs. continuous), and how we represent them formally using mathematical notation.
In particular, we'll look at ways of representing probability distributions using the Probability Mass Function (for discrete data) and the Probability Density Function (for continuous data), as well as the Cumulative Distribution Function, which can describe either kind of distribution.
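To make these three functions concrete, here is a small sketch that uses a fair six-sided die as an assumed discrete example and the standard normal distribution (via `statistics.NormalDist`, available in Python 3.8+) as a continuous one. Note that a PDF value is a density, not a probability; probabilities for continuous data come from areas under the curve, such as differences of CDF values.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete example (assumed for illustration): a fair six-sided die.
# PMF: maps each possible outcome x to P(X = x).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def die_cdf(x):
    """CDF: P(X <= x), the running total of the PMF up to and including x."""
    return sum(p for face, p in die_pmf.items() if face <= x)

# Continuous example: the standard normal distribution (mean 0, sd 1).
standard_normal = NormalDist(mu=0, sigma=1)

print(die_pmf[3])              # P(X = 3) → 1/6
print(die_cdf(3))              # P(X <= 3) → 1/2
print(standard_normal.pdf(0))  # density at 0, about 0.399 (not a probability)
print(standard_normal.cdf(0))  # P(Z <= 0) → 0.5

# For continuous data, probabilities come from areas under the PDF,
# which the CDF gives directly: P(-1 <= Z <= 1) is about 0.683.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(within_one_sd)
```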
We'll also dig into some of the specific distributions that real-world data often follow, including the Bernoulli and Binomial distributions (for discrete data) and the Normal distribution (for continuous data). We'll conclude by introducing skewness and kurtosis, two measures that help quantify how "un-normal" a given distribution is.
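The following sketch illustrates these ideas with hand-written helper functions; the names `bernoulli_pmf`, `binomial_pmf`, `skewness`, and `excess_kurtosis` are hypothetical, not a library API. It assumes the simple population (moment-based) formulas for skewness and excess kurtosis; other tools may use bias-corrected sample formulas that give slightly different values on small datasets.

```python
import math
from statistics import NormalDist, fmean, pstdev

def bernoulli_pmf(k, p):
    """P(X = k) for one trial with success probability p, where k is 0 or 1."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def skewness(data):
    """Population skewness: the mean of the cubed z-scores (0 for symmetric data)."""
    mu, sigma = fmean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)

def excess_kurtosis(data):
    """Population excess kurtosis: mean of the z-scores to the 4th power, minus 3 (0 for a normal)."""
    mu, sigma = fmean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 4 for x in data) / len(data) - 3

# Probability of exactly 2 heads in 4 fair coin flips
print(binomial_pmf(2, n=4, p=0.5))  # → 0.375

# Roughly normal data should have skewness and excess kurtosis near 0
normal_data = NormalDist(mu=0, sigma=1).samples(10_000, seed=1)
print(skewness(normal_data), excess_kurtosis(normal_data))

# A long right tail produces positive skewness
right_skewed = [1, 1, 1, 2, 2, 3, 10]
print(skewness(right_skewed))
```

A Bernoulli distribution is simply the Binomial case with n = 1, which is why `binomial_pmf(k, 1, p)` and `bernoulli_pmf(k, p)` agree.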
In this section, we build on the idea of descriptive statistics to lay the foundation for inferential statistics.