What is the conjugate gradient method?

Artificial intelligence has seen a huge rise over the past few decades. Much of this rise is due to data-driven intelligence, also known as machine learning. Machine learning models, in general, rely on optimization algorithms to minimize the error on the given data. In layman’s terms, we can picture a machine learning model as a mountaineer trying to reach the ground, with optimization algorithms giving the mountaineer a direction.

Optimization

In machine learning, optimization is the process of improving a model's performance by minimizing its error. Gradient descent is one such method used to train machine learning models. It works by repeatedly moving in the direction of the negative gradient. This method has a significant drawback: it assumes that following the negative gradient always leads to lower error, so it can get stuck in local minima. Using our analogy above, the mountaineer cannot see behind a peak and may end up following a path that does not lead to the ground.
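To make the idea concrete, here is a minimal gradient descent loop in Python. The example function, learning rate, and starting point are illustrative choices, not values from the text above.

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # step against the gradient
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # approaches 3.0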

Conjugate gradient

The conjugate gradient method is an optimization technique that builds on the gradient principle. It minimizes the error by casting the problem as a linear system. Hence, the method essentially solves the equation Ax = b under the following assumptions:

  • The function being minimized can be represented as the sum of a quadratic term and a linear term.

  • The matrix A is symmetric, i.e., A^T = A (and, in the standard form of the method, positive definite).

Building on these assumptions, the equation to be solved can be treated as Ax - b = 0. The symmetry makes it easier to minimize the function. Unlike gradient descent, the conjugate gradient method moves along mutually conjugate (A-orthogonal) directions, never revisiting a direction it has already minimized along. This avoids the zigzag motion and converges more smoothly than the gradient descent (or ascent) method.
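As an illustration, the sketch below solves a small symmetric positive definite system Ax = b with a textbook conjugate gradient loop written in numpy. The matrix, vector, tolerance, and iteration limit are chosen here only for demonstration.

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-8, max_iter=100):
    # Solve A x = b for symmetric positive definite A
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    r = b - A @ x          # residual (negative gradient of the quadratic)
    p = r.copy()           # first search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # next direction, conjugate to the previous ones
        rs_old = rs_new
    return x

# Example: a small symmetric positive definite system (illustrative values)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # ~[0.0909, 0.6364], matching np.linalg.solve(A, b)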

Figure: Difference of motion between gradient ascent and conjugate gradient

Limitations

The conjugate gradient method has proven powerful for solving systems of equations. However, it is not considered the best choice for machine learning models. Firstly, the method tends to overfit, because the goal of machine learning is to generalize rather than to fit specific data exactly. Secondly, machine learning problems are usually posed in stochastic settings, and optimization algorithms like SGD (stochastic gradient descent) perform better in these settings.

Note: You can read more about optimization algorithms in this Answer.

Code

The following code finds the value of x that minimizes the simple function Ax^2 + bx with the conjugate gradient method, using the numpy and scipy libraries.

import numpy as np
from scipy import optimize

args = (20, 50)  # values for A and b

def function_to_minimize(x, *args):
    a, b = args
    return a*x**2 + b*x  # function to be minimized (Ax^2 + bx)

x0 = np.asarray(0)  # initial x value

result = optimize.fmin_cg(function_to_minimize, x0, args=args, disp=False)
print("The value for x is:", round(result[0], 4))

Code explanation

  • Line 4: We set the values for the inputs A and b.

  • Lines 6–8: The conjugate gradient routine in scipy.optimize requires the function to be minimized as a parameter. We define that function here.

  • Line 10: We initialize the starting value for x. This value is updated during the optimization process.

  • Lines 12–13: We call the scipy conjugate gradient function, fmin_cg, which outputs the value of x that minimizes our function Ax^2 + bx.
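As a quick sanity check (not part of the original code), the quadratic ax^2 + bx has its minimum at x = -b/(2a); with a = 20 and b = 50 this gives x = -1.25, which is the value the program should print (up to rounding).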

Conclusion

The conjugate gradient algorithm is a powerful method for applications involving systems of equations. Despite their drawbacks, gradient descent based methods are considered preferable for most machine learning problems.
