Gradient Descent
Gradient descent is an optimization algorithm that minimizes functions. For a given function J defined by a set of parameters, gradient descent starts with an initial set of parameter values and iteratively moves toward a set of parameter values that minimize the function, taking steps in the negative direction of the function's gradient.
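Written as an update rule (a sketch of the idea; the symbols $\theta$ for the parameters and $\alpha$ for the learning rate are notation chosen here, not taken from the original figures):

$$\theta := \theta - \alpha \, \nabla_{\theta} J(\theta)$$

Each iteration moves the parameters a small distance in the direction in which J decreases fastest.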
Simple Linear Regression using Least Squares
Linear regression uses ordinary least squares to model the linear relationship between a dependent variable (target) and one or more independent variables (predictors). Simple linear regression (SLR) is a model with a single independent variable.
Ordinary least squares (OLS) is a non-iterative method that fits a model such that the sum of squared differences between observed and predicted values is minimized:

$$SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2, \qquad \hat{y}_i = b_0 + b_1 x_i$$
The linear model parameters using OLS:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
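As a minimal sketch of these closed-form estimates (the small data set and variable names below are illustrative assumptions, not taken from the article):

```python
# Minimal sketch: simple linear regression via the closed-form OLS estimates.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative data, not from the article
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2),  b0 = y_mean - b1 * x_mean
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
b0 = y_mean - b1 * x_mean

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```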
Simple Linear Regression using Gradient Descent
Gradient descent finds the linear model parameters iteratively by minimizing the mean squared error of the line defined by intercept $b_0$ and slope $b_1$:

$$J(b_0, b_1) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (b_0 + b_1 x_i) \right)^2$$
If we minimize the function J, we will get the best line for our data: lines that fit the data better result in lower overall error. To run gradient descent on this error function, we first need to compute its gradient. The gradient will act like a compass and always point us downhill. To compute it, we will need to differentiate the error function. Since our function is defined by two parameters ($b_0$ and $b_1$), we need a partial derivative for each:

$$\frac{\partial J}{\partial b_0} = \frac{2}{N} \sum_{i=1}^{N} -\left( y_i - (b_0 + b_1 x_i) \right)$$

$$\frac{\partial J}{\partial b_1} = \frac{2}{N} \sum_{i=1}^{N} -x_i \left( y_i - (b_0 + b_1 x_i) \right)$$
The learning rate controls how large a step we take downhill during each iteration. If we take too large a step, we may step over the minimum; if we take steps that are too small, we will need many iterations to arrive at the minimum.
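To make the effect of the learning rate concrete, here is a small illustrative sketch (the toy function $f(x) = x^2$ and the step sizes are assumptions, not from the article) showing how a large step overshoots while a very small step converges slowly:

```python
# Illustrative sketch: gradient descent on f(x) = x^2, whose gradient is 2x.
def gradient_step(x, learning_rate):
    return x - learning_rate * 2 * x

for learning_rate in (1.1, 0.01):
    x = 5.0
    for _ in range(20):
        x = gradient_step(x, learning_rate)
    # learning_rate = 1.1 overshoots and diverges; 0.01 creeps slowly toward 0.
    print(f"learning rate {learning_rate}: x after 20 steps = {x:.4f}")
```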
Finally, each parameter is updated by stepping in the direction opposite its partial derivative, scaled by the learning rate $\alpha$. For the slope:

$$b_1 := b_1 - \alpha \frac{\partial J}{\partial b_1}$$

And this is the final equation for updating the intercept iteratively:

$$b_0 := b_0 - \alpha \frac{\partial J}{\partial b_0}$$
Now we have everything needed to run gradient descent. First we initialize the search to start at any pair of $b_0$ and $b_1$ values (for example, both zero), then let the algorithm step them downhill toward the line of best fit.
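The loop below is a minimal sketch of the procedure, assuming the cost and partial derivatives given above; the data set, learning rate, and iteration count are illustrative assumptions, not values from the article:

```python
# Minimal sketch: fitting y = b0 + b1*x with batch gradient descent.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative data, not from the article
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def step(b0, b1, learning_rate):
    n = len(xs)
    # Partial derivatives of J(b0, b1) = (1/N) * sum((y - (b0 + b1*x))^2)
    grad_b0 = (2.0 / n) * sum(-(y - (b0 + b1 * x)) for x, y in zip(xs, ys))
    grad_b1 = (2.0 / n) * sum(-x * (y - (b0 + b1 * x)) for x, y in zip(xs, ys))
    return b0 - learning_rate * grad_b0, b1 - learning_rate * grad_b1

b0, b1 = 0.0, 0.0                 # start the search at an arbitrary point
for _ in range(1000):
    b0, b1 = step(b0, b1, learning_rate=0.05)

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
```

With a suitable learning rate and enough iterations, the result should approach the closed-form OLS estimates computed earlier.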
Non-linear Gradient Descent
The concept of minimizing a cost function to tune parameters applies not only to linear models but also to non-linear models. For example, the following is the cost function of logistic regression (cross-entropy), where $h_\theta(x) = 1 / (1 + e^{-\theta^\top x})$ is the sigmoid hypothesis:

$$J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log h_\theta(x_i) + (1 - y_i) \log\left( 1 - h_\theta(x_i) \right) \right]$$
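Gradient descent handles this cost the same way. As an illustrative sketch (the one-feature data set, learning rate, and iteration count are assumptions, not from the article), using the fact that the gradient of the cross-entropy cost reduces to a (prediction minus label) form:

```python
import math

# Minimal sketch: logistic regression on one feature (plus an intercept)
# trained with batch gradient descent on the cross-entropy cost.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # illustrative data, not from the article
ys = [0, 0, 0, 1, 1, 1]               # binary labels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

theta0, theta1 = 0.0, 0.0
learning_rate = 0.1
n = len(xs)
for _ in range(5000):
    # Gradient of the cross-entropy cost: (1/N) * sum((h(x) - y) * feature)
    grad0 = sum(sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)) / n
    grad1 = sum((sigmoid(theta0 + theta1 * x) - y) * x for x, y in zip(xs, ys)) / n
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(f"theta0 = {theta0:.3f}, theta1 = {theta1:.3f}")
print(f"P(y=1 | x=2.25) = {sigmoid(theta0 + theta1 * 2.25):.3f}")
```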
Convergence
How do we determine when the search has found a solution? This is typically done by looking for small changes in error from iteration to iteration (i.e., where the gradient is near zero).
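A minimal sketch of such a stopping rule on the toy function $f(x) = x^2$ (the tolerance, learning rate, and starting point are illustrative assumptions):

```python
# Sketch of a stopping criterion: stop when the cost barely changes.
x, learning_rate, tolerance = 5.0, 0.1, 1e-12
previous_value = x * x
iterations = 0
while True:
    x -= learning_rate * 2 * x        # gradient step: f'(x) = 2x
    current_value = x * x
    iterations += 1
    if abs(previous_value - current_value) < tolerance:
        break                         # converged: the error is no longer changing much
    previous_value = current_value

print(f"converged to x = {x:.6f} after {iterations} iterations")
```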
Convexity
Not all problems have a single (global) minimum. A problem can also have local minima in which a gradient search can get stuck.