In many real-world applications, we need a way to compare probability distributions. Ordinary distance metrics are not well suited to this task, so we turn to a different kind of measure.
Divergence measures are normally used for this purpose, and the Kullback-Leibler (KL) divergence is the most commonly used of them.
KL divergence is a way of measuring the deviation between two probability distributions. In the case of discrete distributions, KL divergence is defined as:

$$D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

And for continuous distributions:

$$D_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$

Where $P$ is the true distribution and $Q$ is the distribution used to approximate it ($p$ and $q$ denote their density functions in the continuous case). The symbol $\|$ separates the two distributions, and $D_{KL}(P \,\|\, Q)$ is read as "the divergence of $P$ from $Q$".
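For example, take two small discrete distributions (the numbers here are only illustrative) and use natural logarithms, as the NumPy implementation below does:

$$P = (0.5,\ 0.5), \qquad Q = (0.9,\ 0.1)$$

$$D_{KL}(P \,\|\, Q) = 0.5 \ln\frac{0.5}{0.9} + 0.5 \ln\frac{0.5}{0.1} \approx -0.294 + 0.805 \approx 0.511 \text{ nats}$$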
Note: Some authors also call this quantity relative entropy, but here we'll follow the terminology used in Convex Optimization by Stephen Boyd and Lieven Vandenberghe, Cambridge University Press, 2004.
Even though KL divergence gives us a way to measure how far apart two probability distributions are, it is not a metric. Unlike a metric, it is not symmetric (in general, $D_{KL}(P \,\|\, Q) \neq D_{KL}(Q \,\|\, P)$), and it does not satisfy the triangle inequality.
KL divergence has several common use cases. In data compression, short codes replace the most frequently appearing words in a data file such as a .txt file; approximating the data's true distribution with a well-known probability distribution makes this work much more manageable, and KL divergence is a good way to measure how close that well-known distribution is to the true one (see the sketch below).
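The following is a minimal sketch of that idea with made-up word frequencies (the arrays p and q are illustrative, not taken from any real file): the expected extra code length incurred by building a code for an assumed distribution q, when the data actually follows p, equals the KL divergence measured in bits.

import numpy as np

# Hypothetical word frequencies in a text file (true distribution p)
# and the distribution q the compressor assumes when assigning code lengths.
p = np.asarray([0.7, 0.2, 0.1])
q = np.asarray([0.5, 0.3, 0.2])

optimal_bits = -np.sum(p * np.log2(p))      # average bits per word with a code built for p
assumed_bits = -np.sum(p * np.log2(q))      # average bits per word with a code built for q
extra_bits = assumed_bits - optimal_bits    # penalty for assuming the wrong distribution

print("Extra bits per word:", extra_bits)
print("KL divergence (base 2):", np.sum(p * np.log2(p / q)))  # same value, up to floating-point rounding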
Variational inference casts approximation as an optimization problem, using KL divergence to measure how close a tractable distribution is to the intractable distribution we want to approximate.
The above-mentioned use of KL divergence makes it a natural loss term in variational autoencoders, where we need to push the learned latent distribution toward a known prior distribution (typically a standard normal).
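As a rough sketch of that loss term (assuming a diagonal Gaussian encoder; the names mu and log_var are illustrative and not tied to any particular library), the KL divergence between $N(\mu, \sigma^2)$ and the standard normal prior $N(0, 1)$ has a simple closed form:

import numpy as np

def vae_kl_loss(mu, log_var):
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, 1),
    # summed over the latent dimensions (sigma^2 = exp(log_var)).
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1)

# Example latent mean and log-variance from a hypothetical encoder
mu = np.asarray([0.3, -0.1])
log_var = np.asarray([-0.5, 0.2])
print("KL term of the VAE loss:", vae_kl_loss(mu, log_var))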
We'll implement KL divergence using NumPy in the code below:
import numpy as np

def kl_divergence(p, q):
    """
    Parameters:
    p: true distribution
    q: known distribution
    ---------------------
    Returns:
    KL divergence of p and q distributions.
    """
    return np.sum(np.where(p != 0, p * np.log(p / q) - p + q, 0))

p = np.asarray([0.4629, 0.2515, 0.9685])
q = np.asarray([0.1282, 0.8687, 0.4996])

print("KL divergence between p and q is: ", kl_divergence(p, q))  # this should return 0.7372678653853546
print("KL divergence between p and p is: ", kl_divergence(p, p))  # this should return 0
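Two quick notes on this implementation. First, reusing p, q, and kl_divergence from above, swapping the arguments gives a different value, which illustrates the earlier point that KL divergence is not symmetric:

print("KL divergence between q and p is: ", kl_divergence(q, p))  # roughly 0.77, not equal to kl_divergence(p, q)

Second, the extra - p + q terms make this the generalized KL divergence, which stays meaningful even when p and q do not sum to 1, as is the case for the example arrays above.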
Note: The KL divergence between identical probability distributions is 0, as the second print statement confirms.