Gradient with respect to matrices

Many machine learning objectives, such as the loss function of a linear regression, can be written compactly using matrices and vectors, which makes them easier to read and manipulate. It is therefore worthwhile to understand how gradients are computed when matrices are involved.

The gradient of a matrix with respect to a vector (or another matrix) can be computed in the same way as the Jacobian of a vector-valued function. The Jacobian can be thought of as a multi-dimensional tensor that collects all the partial derivatives. For example, the gradient of an $m \times n$ matrix $A$ with respect to a $p \times q$ matrix $B$ is an $(m \times n) \times (p \times q)$ Jacobian $J$ whose entries are given as follows:

$$J_{ijkl} = \frac{\partial A_{ij}}{\partial B_{kl}}, \qquad 1 \le i \le m,\; 1 \le j \le n,\; 1 \le k \le p,\; 1 \le l \le q$$
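To make the four-index Jacobian concrete, here is a minimal numerical sketch. It assumes an illustrative map $A = CB$ with a fixed matrix $C$, which is not taken from the lesson, and the helper name `matrix_jacobian` is hypothetical. The helper approximates each partial derivative $\partial A_{ij} / \partial B_{kl}$ by a central difference and stores it in a tensor whose shape is the shape of $A$ followed by the shape of $B$.

```python
import numpy as np


def matrix_jacobian(f, B, eps=1e-6):
    """Approximate the Jacobian of a matrix-valued function f at B.

    Returns J with shape A.shape + B.shape, where
    J[i, j, k, l] approximates dA[i, j] / dB[k, l].
    """
    B = np.asarray(B, dtype=float)
    A = np.asarray(f(B))
    J = np.zeros(A.shape + B.shape)
    for idx in np.ndindex(*B.shape):
        step = np.zeros_like(B)
        step[idx] = eps
        # Central difference with respect to the single entry B[k, l].
        J[(Ellipsis,) + idx] = (f(B + step) - f(B - step)) / (2 * eps)
    return J


rng = np.random.default_rng(0)
C = rng.normal(size=(2, 3))  # assumed fixed matrix, for illustration only
B = rng.normal(size=(3, 4))

# A = C @ B is a 2 x 4 matrix, so the Jacobian has shape (2, 4, 3, 4).
J = matrix_jacobian(lambda X: C @ X, B)

# For this particular map, dA[i, j] / dB[k, l] = C[i, k] if j == l, else 0.
J_exact = np.einsum("ik,jl->ijkl", C, np.eye(4))

print(J.shape)
print(np.allclose(J, J_exact, atol=1e-6))
```

Because the assumed map is linear in $B$, the finite-difference result should agree with the closed form up to rounding error. For a nonlinear map the same helper still applies, but the approximation error then depends on the step size `eps`.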
