Auto-differentiation

This lesson will cover auto-differentiation, the core component of deep learning libraries.

We'll cover the following...

Background

First pioneered by the seminal work of Rumelhart and Hinton in 1986, the majority of current machine learning optimization methods use derivatives. So, there is a pressing need for their efficient calculation.

Manual calculations

Most of the early machine learning researchers and scientists $-$ for example, Bottou, 1998 for Stochastic Gradient Descent $-$ had to go through a slow, laborious process of manual calculation of analytical derivatives, which is prone to error.

Using computer program

Programming-based solutions are less laborious, but calculating these derivatives in a program can also be tricky. We can categorize them into three paradigms:

Symbolic differentiation
Numeric differentiation
Auto differentiation

The first and second methods are prone to errors, including:

Calculating higher-derivatives is tricky due to long and complex expressions for symbolic differentiation and rounding-off errors $—$ meaning less accurate results $—$ in numeric differentiation.
Numeric differentiation uses discretization, which results in loss of accuracy.
Symbolic differentiation can lead to inefficient code.
Both are slow to calculate the partial derivatives, a key feature of gradient-based optimization algorithms.

Automatic differentiation

Automatic differentiation (also known as autodiff) addresses all the issues above and is a key feature of modern ML/DL libraries.

Chain rule

Autodiff centers around the concept of the chain rule, the fundamental rule in calculus used to calculate derivatives of the composed functions.

For example,

y = 2x^2 +4

and,

x = 3 w

Introduction

JAX Programming Model

Linear Algebra

Random Variables and Distributions

JAX Ecosystem

Project: GAN Using the JAX ecosystem

Appendix