Optimization
Explore the different variations of the gradient descent algorithm.
Optimization is the selection of the best element (with regard to some criterion) from a set of available alternatives.
In the simplest case, an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function.
In the case of machine learning, optimization refers to minimizing the loss function by systematically updating the network weights. Mathematically, this is expressed as:
$$w^{*} = \arg\min_{w} L(w)$$

given a loss function $L(w)$ and weights $w$.
Intuitively, it can be thought of as descending a high-dimensional landscape. If we could project it onto a 2D plot, the height of the landscape would be the value of the loss function, and the horizontal axis would be the values of our weights $w$. Ultimately, the goal is to reach the bottom of the landscape by iteratively exploring the space around us.
Gradient descent
Gradient descent follows the local slope of the landscape; we essentially introduce physics and the law of gravity into the mix. Calculus provides an elegant way to compute that slope: the derivative of the loss function with respect to the weights at the current point, also known as the gradient.
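To make this concrete, here is a minimal sketch of vanilla gradient descent on a toy quadratic loss. The loss function, learning rate, and number of steps are illustrative choices for this example, not values prescribed by the lesson.

```python
import numpy as np

def loss(w):
    # Toy loss L(w) = ||w||^2, whose minimum sits at w = 0.
    return np.sum(w ** 2)

def gradient(w):
    # Gradient of L(w) = ||w||^2 is 2w.
    return 2 * w

def gradient_descent(w, learning_rate=0.1, num_steps=50):
    # Repeatedly step against the local slope of the landscape.
    for _ in range(num_steps):
        w = w - learning_rate * gradient(w)
    return w

w0 = np.array([3.0, -2.0])   # arbitrary starting point on the landscape
w_min = gradient_descent(w0)
print(loss(w_min))           # approaches 0, the bottom of the landscape
```

Each iteration applies the update $w \leftarrow w - \eta \, \nabla L(w)$, where the learning rate $\eta$ controls how large a step we take downhill.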
...