As a reference, we’ll provide a glossary of some common technical terms used in the course.
JAX
Some common JAX terms are presented here. Note that their JAX-specific meanings may differ slightly from how the same terms are used elsewhere.
- Asynchronous Dispatch: JAX's behavior of returning control to Python before a computation has actually finished; the result acts as a future that is only waited on when its value is needed (see the sketches after this list).
- Device: A generic term for a CPU, GPU, or TPU.
- `DeviceArray`: JAX's analog of the `numpy.ndarray`.
- `jaxpr`: JAX Expressions (jaxprs) are the intermediate representation of a traced computation graph (see the sketches after this list).
- JIT: Just-In-Time compilation. It is performed in JAX using XLA.
- Pytrees: Tree-like structures built out of container-like Python objects (see the sketches after this list).
- `static`: A value treated as a compile-time constant during JIT compilation and hence not traced.
- TPU: Tensor Processing Unit, Google's custom accelerator for machine learning workloads.
- Tracer: An object that stands in for an array and records the sequence of operations performed by a Python function.
- Transformation: A higher-order function that takes a function as input and returns a transformed function, such as `jit()`, `grad()`, and `pmap()`.
- VJP: Vector-Jacobian Product, the counterpart of the JVP, used for reverse-mode auto-differentiation.
- XLA: Accelerated Linear Algebra, a domain-specific compiler for linear algebra. It is used for JAX's JIT compilation.
- Weak type: A JAX data type having the same type promotion semantics as Python scalars.
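To make a few of these terms concrete, here is a minimal sketch of JIT compilation, asynchronous dispatch, and jaxpr inspection; the function `f` and the array shape are arbitrary choices for illustration:

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.dot(x, x.T)

f_jit = jax.jit(f)  # JIT: compile f with XLA on first call

x = jnp.ones((1000, 1000))

# Asynchronous dispatch: this call returns almost immediately,
# handing back a result while the device keeps computing.
y = f_jit(x)

# block_until_ready() waits for the actual values; use it when
# timing JAX code.
y.block_until_ready()

# jaxpr: the intermediate representation JAX traces f into.
print(jax.make_jaxpr(f)(x))
```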
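Pytrees and transformations can be sketched just as briefly; the nested `params` dict below is a made-up example of a pytree:

```python
import jax
import jax.numpy as jnp

# Pytree: a nested structure of containers (here, a dict of dicts)
# whose leaves are arrays.
params = {
    "layer1": {"w": jnp.ones((3, 3)), "b": jnp.zeros(3)},
    "layer2": {"w": jnp.ones((3, 1)), "b": jnp.zeros(1)},
}

# tree_map applies a function to every leaf of a pytree.
scaled = jax.tree_util.tree_map(lambda leaf: 0.1 * leaf, params)

# Transformations are higher-order functions: grad() takes a
# function and returns a new function that computes its gradient.
loss = lambda x: jnp.sum(x ** 2)
grad_loss = jax.grad(loss)
print(grad_loss(jnp.arange(3.0)))  # gradient of sum(x^2) is 2x: [0. 2. 4.]
```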
Theory
- Adam: A widely used algorithm for stochastic gradient-based optimization.
- Auto-differentiation: A technique for computing exact derivatives of numerical programs by systematically applying the chain rule (see the sketches after this list).
- Batch Normalization: A technique that enables faster training of deep neural networks by rescaling and centering intermediate activations to have zero mean and unit variance.
- Convolution: A mathematical operation expressing how the shape of one signal is modified by another signal.
- Cumulative Distribution Function (CDF): The probability that a random variable takes a value less than or equal to a given value.
- Gaussian (Normal) distribution: The famous bell-shaped probability distribution, used to model many real-world phenomena.
- Gradient Clipping: A technique for stabilizing the training of deep neural networks by rescaling gradients whose norm exceeds a threshold (see the sketches after this list).
- Hessian: The matrix of second-order partial derivatives of a scalar-valued function.
- Jacobian: The matrix of first-order partial derivatives of a vector-valued function.
- JVP: Jacobian-Vector Product, used to implement forward-mode auto-differentiation.
- Kullback-Leibler (KL) Divergence: A commonly used, asymmetric measure of how one probability distribution differs from another.
- Poisson distribution: A discrete probability distribution used to model the number of times an event occurs in a fixed interval.
- PRNG: Pseudo-Random Number Generator, a key component of JAX and other numerical computation libraries (see the sketches after this list).
- Probability Density Function (PDF): The derivative of the CDF at a given point; its value can intuitively be treated as a measure of probability around that point.
- Probability Mass Function (PMF): The discrete counterpart of the PDF.
- Transposed Convolution: An operation that goes in the opposite direction of a regular convolution, producing an upsampled output.
- Wasserstein GAN (WGAN): A type of GAN that uses the Wasserstein loss.
- Wasserstein Loss: A loss function based on the optimal transport problem (hence also known as the earth-mover distance).
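To tie the auto-differentiation terms together, here is a minimal sketch; the function `g` and the vectors are arbitrary choices for illustration:

```python
import jax
import jax.numpy as jnp

def g(x):
    return jnp.array([x[0] ** 2, x[0] * x[1]])

x = jnp.array([2.0, 3.0])
v = jnp.array([1.0, 0.0])

# JVP (forward mode): product of the Jacobian of g with a tangent vector.
y, jvp_out = jax.jvp(g, (x,), (v,))

# VJP (reverse mode): product of a cotangent vector with the Jacobian.
y, vjp_fn = jax.vjp(g, x)
vjp_out, = vjp_fn(jnp.array([1.0, 0.0]))

# The Jacobian and Hessian as full matrices.
J = jax.jacfwd(g)(x)                           # 2x2 Jacobian of g at x
H = jax.hessian(lambda x: jnp.sum(x ** 2))(x)  # Hessian of a scalar function
print(J, H, sep="\n")
```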
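JAX's PRNG is explicit rather than stateful, which is worth a short sketch of its own:

```python
import jax

# Randomness in JAX is driven by an explicit key, not hidden global state.
key = jax.random.PRNGKey(0)

# Splitting a key yields independent keys; reusing the same key
# reproduces exactly the same numbers.
key, subkey = jax.random.split(key)
sample = jax.random.normal(subkey, shape=(3,))
print(sample)
```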
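Finally, a sketch of gradient clipping by global norm. The helper below and its `max_norm` threshold are illustrative assumptions, not a library API; in practice a tested implementation such as `optax.clip_by_global_norm` would be preferred:

```python
import jax
import jax.numpy as jnp

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a pytree of gradients if its global norm exceeds max_norm.
    Illustrative helper, not a library function."""
    global_norm = jnp.sqrt(sum(
        jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads)))
    scale = jnp.minimum(1.0, max_norm / (global_norm + 1e-6))
    return jax.tree_util.tree_map(lambda g: g * scale, grads)

grads = {"w": jnp.array([3.0, 4.0]), "b": jnp.array([0.0])}
print(clip_by_global_norm(grads))  # global norm is 5.0, so grads shrink ~5x
```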