This lesson summarizes the topics we'll be covering in section 27 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what is covered in this section
- Understand and explain why the section will help you to become a data scientist
In the last section, we looked at how to use linear algebra to solve a linear regression using OLS. However, towards the end of the section we discovered the computational inefficiency of an OLS approach as your number of features grows.
In this section, we're going to see how you can apply a "gradient descent" to solve a linear regression. Along the way we'll also look at cost functions and will provide a foundation in calculus that will be valuable to you throughout your career as a data scientist.
We're going to start off by introducing derivatives - the "instantaneous rate of change of a function" or (more graphically) the "slope of a curve". We'll start off by looking at how to calculate the slope of a curve for a straight line, and then we'll explore how to calculate the rate of change for more complex (non-linear) functions.
We then learn about the chain rule - a key tool used for calculating derivatives for the more complicated functions we'll encounter in machine learning.
Now that we know how to calculate the slope of a curve - and, by extension, to find a local minima (low point) or maxima (high point) where the curve is flat (the slope of the curve is zero), we'll look at the idea of a gradient descent to step from some random point on a cost curve to find the local optima to solve for a given linear equation. We'll also look at how best to select the step sizes for descending the cost function, and how to use partial derivatives to optimize both slope and offset to more effectively solve a linear regression using gradient descent.
Just as we used solving a linear regression using OLS as an excuse to introduce you to linear algebra - one of the foundational elements of mathematics underpinning machine learning, we're now using the idea of gradient descent to introduce enough calculus to both understand and have good intuitions about many of the machine learning models that you're going to learn throughout the rest of the course.