Artificial intelligence has seen a huge rise over the past few decades. Much of this rise is due to data-driven intelligence, also known as machine learning. Machine learning models, in general, rely on optimization algorithms to minimize the error on the given data. In layman's terms, we can picture a machine learning model as a mountaineer trying to reach the ground, with the optimization algorithm giving the mountaineer a direction to follow.
In machine learning, optimization is the process of improving a model's performance by minimizing its error. Gradient descent is one such method used to train machine learning models: it works by repeatedly moving in the direction of the negative gradient. This method has a significant drawback: it assumes that following the negative gradient always leads to a lower error, so it can get stuck in local minima. Using our analogy, the mountaineer cannot see behind a peak and may end up following a path that never reaches the ground.
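To make this concrete, here is a minimal sketch of plain gradient descent on a one-dimensional function. The example function, learning rate, and step count are illustrative choices for this sketch, not taken from any particular library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step opposite to the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # approaches 3.0
```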
The conjugate gradient method is an optimization technique that also builds on the gradient principle. It minimizes the error by recasting the problem as a linear system. Hence, the method is essentially solving the equation

$$Ax = b$$

The function being minimized can be represented as a sum of quadratic and linear terms:

$$f(x) = \frac{1}{2}x^{T}Ax - b^{T}x$$

The matrix $A$ is assumed to be symmetric and positive-definite. Building on these assumptions, the equation to be solved can be treated as the condition that the gradient of $f(x)$ vanishes:

$$\nabla f(x) = Ax - b = 0$$
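As a rough sketch of how this works in practice, the following hand-written conjugate gradient loop solves a small symmetric positive-definite system $Ax = b$. The matrix, right-hand side, and tolerance below are illustrative values chosen for the example, not part of any library API.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=100):
    """Solve Ax = b for a symmetric positive-definite matrix A."""
    x = x0.astype(float)
    r = b - A @ x      # residual; equals the negative gradient of f at x
    p = r.copy()       # initial search direction
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)   # exact step size along the direction p
        x = x + alpha * p
        r = r - alpha * Ap
        beta = (r @ r) / rr     # keeps the next direction A-conjugate to p
        p = r + beta * p
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # symmetric positive-definite
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, x0=np.zeros(2)))  # close to np.linalg.solve(A, b)
```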
The conjugate gradient method has proven powerful for optimizing systems of equations. However, it is not considered the best choice for machine learning models. Firstly, driving the training error to its exact minimum tends to overfit, because the goal of machine learning is to generalize rather than to fit specific data perfectly. Secondly, machine learning problems are usually set in stochastic settings, and optimization algorithms like conjugate gradient, which rely on exact gradient and line-search computations, do not cope well with noisy gradient estimates.
Note: You can read more about optimization algorithms in this Answer.
The following code finds the minimum of the function $20x^2 + 50x$ using the `numpy` and `scipy` libraries.
```python
import numpy as np
from scipy import optimize

args = (20, 50)  # values for a and b

def function_to_minimize(x, *args):
    a, b = args
    return a*x**2 + b*x  # function to be optimized (ax^2 + bx)

x0 = np.asarray(0)  # initial x value

result = optimize.fmin_cg(function_to_minimize, x0, args=args, disp=False)
print("The value for x is : ", round(result[0], 4))
```
Line 4: We set the values for the inputs `a` and `b`.
Line 6–8: The conjugate gradient function in `scipy.optimize` requires the function to be minimized as a parameter. We define that function here.
Line 10: We initialize the starting value for `x`.
Line 12–13: We call the `scipy` conjugate gradient function, which returns the value of `x` that minimizes the function, and print the result.
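As a quick sanity check, the quadratic $20x^2 + 50x$ has its minimum where the derivative $40x + 50$ is zero, that is, at $x = -1.25$, so the printed value should be close to $-1.25$.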
The conjugate gradient algorithm is a powerful method for applications involving systems of linear equations. Despite their drawbacks, gradient descent-based methods remain the preferred choice for most machine learning problems.