Optimizations and Learning Rate

Explore different optimization methods and how to adjust the learning rate.

Here, we will only discuss gradient-based optimization methods, which are the ones most commonly used in GANs. Different gradient methods have their own strengths and weaknesses; there is no universal optimization method that suits every problem, so we should choose the one that fits the practical problem at hand.

Types of optimization methods

Let’s have a look at some now:

  1. SGD (calling optim.SGD with momentum=0 and nesterov=False): It is fast and works well for shallow networks. However, it can be very slow for deeper networks and may not even converge for them:

$$\theta_{t+1} = \theta_t - \eta \nabla J(\theta_t)$$

In this equation, $\theta_t$ denotes the parameters at iteration step $t$, $\eta$ is the learning rate, and $\nabla J$ is the gradient of the objective function $J$.
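As a rough sketch, this is how plain SGD can be configured in PyTorch; the toy model, data, and learning rate below are illustrative assumptions, not values from the text:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy model and objective (illustrative only).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.MSELoss()

# momentum=0 and nesterov=False reduce optim.SGD to the vanilla update:
# theta_{t+1} = theta_t - eta * grad J(theta_t)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0, nesterov=False)

x = torch.randn(64, 10)  # toy inputs
y = torch.randn(64, 1)   # toy targets

for step in range(100):
    optimizer.zero_grad()           # clear gradients from the previous step
    loss = criterion(model(x), y)   # forward pass; the loss plays the role of J
    loss.backward()                 # compute the gradient of J w.r.t. the parameters
    optimizer.step()                # apply theta <- theta - lr * gradient
```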

  2. Momentum (calling optim.SGD with a momentum argument larger than 0 and nesterov=False): It is one of the most commonly used optimization methods. It combines the update of the previous step with the gradient at the current step so that it takes a smoother trajectory than SGD. The training speed of Momentum is often ...