The softmax function is a mathematical function that converts a vector of real values into a vector of probabilities that sum to 1. Each value in the original vector is converted to a number between 0 and 1.
The formula of the softmax function is shown below:

$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$
As shown above, the softmax function accepts a vector z of length K. For each value in z, it applies the standard exponential function to the value and then divides the result by the sum of the exponentials of all values in z.
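Two properties follow directly from this definition. The short derivation below is only a sanity check of the claim that the outputs behave like probabilities; the notation $\sigma(\vec{z})_i$ for the i-th output is an assumption made here for illustration.

```latex
% Each output is positive and (for K > 1) smaller than 1, because
% e^{z_i} > 0 and the denominator is e^{z_i} plus other positive terms.
0 < \sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} < 1

% The outputs sum to 1, because the numerators add up to the denominator.
\sum_{i=1}^{K} \sigma(\vec{z})_i
  = \frac{\sum_{i=1}^{K} e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
  = 1
```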
Consider the following vector:
z = [5, 2, 8]
First, let's calculate the exponential of each value in z.

$$e^{z_1} = e^{5} \approx 148.4$$
$$e^{z_2} = e^{2} \approx 7.4$$
$$e^{z_3} = e^{8} \approx 2981.0$$
Next, we can calculate the sum of the exponentials:

$$\sum_{j=1}^{3} e^{z_j} = e^{z_1} + e^{z_2} + e^{z_3} = 148.4 + 7.4 + 2981.0 = 3136.8$$
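Finally, we can calculate the softmax equivalent for each value in z, as shown below:

$$\sigma(\vec{z})_1 = \frac{148.4}{3136.8} \approx 0.0473$$
$$\sigma(\vec{z})_2 = \frac{7.4}{3136.8} \approx 0.0024$$
$$\sigma(\vec{z})_3 = \frac{2981.0}{3136.8} \approx 0.9503$$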
So, we end up with a vector of probabilities:
Softmax(z) = [0.0473, 0.0024, 0.9503]
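The hand calculation can be checked with a few lines of Python. This is a minimal sketch that mirrors the three steps above; the variable names `exponentials`, `exp_sum`, and `probabilities` are chosen here for illustration.

```python
import math

z = [5, 2, 8]

# Step 1: exponential of each value in z
exponentials = [math.exp(value) for value in z]

# Step 2: sum of the exponentials
exp_sum = sum(exponentials)

# Step 3: divide each exponential by the sum
probabilities = [e / exp_sum for e in exponentials]

print([round(e, 1) for e in exponentials])   # exponentials, 1 decimal place
print(round(exp_sum, 1))                     # sum of exponentials
print([round(p, 4) for p in probabilities])  # softmax values, 4 decimal places
```

The code works at full floating-point precision, while the worked example rounds the intermediate values. For this vector, both give the same results at the precision shown.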
The code below shows how to implement the softmax function in Python:
    import math

    # softmax function
    def softmax(z):
        # vector to hold exponential values
        exponents = []
        # vector to hold softmax probabilities
        softmax_prob = []
        # sum of exponentials
        exp_sum = 0

        # for each value in the input vector
        for value in z:
            # calculate the exponential
            exp_value = math.exp(value)
            # append to exponent vector
            exponents.append(exp_value)
            # add to exponential sum
            exp_sum += exp_value

        # for each exponential value
        for value in exponents:
            # calculate softmax probability
            probability = value / exp_sum
            # append to probability vector
            softmax_prob.append(probability)

        return softmax_prob

    # define vector
    z = [5, 2, 8]
    # find softmax
    result = softmax(z)
    print(result)
In the code above:
- We import the math library.
- We define a softmax function that accepts a vector as a parameter.
- We use a for-loop to iterate over each value in the given vector. For each value, we calculate its exponential with the math.exp() function and append the result to exponents. The sum of exponentials, exp_sum, is also updated in each iteration of the loop.
- We use a second for-loop to find the probability corresponding to each exponential value by dividing the value by exp_sum. Each probability is appended to softmax_prob.
- We define a vector z containing three values and pass it to the softmax function. The vector returned by the function is then printed.
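One caveat about the implementation above: math.exp() raises an OverflowError for large inputs (roughly above 709), so a vector such as [1000, 1001, 1002] cannot be processed. A common workaround, which is not part of the original code, is to subtract the largest value in z from every element before exponentiating. The common factor cancels in the division, so the result is mathematically unchanged. The sketch below assumes a non-empty input, and `stable_softmax` is a hypothetical name used only for illustration.

```python
import math

def stable_softmax(z):
    """Softmax with the maximum subtracted first (illustrative helper)."""
    # Largest input value; assumes z is non-empty
    z_max = max(z)
    # Every shifted value is <= 0, so math.exp() cannot overflow
    exponentials = [math.exp(value - z_max) for value in z]
    exp_sum = sum(exponentials)
    # Dividing by the sum cancels the shift
    return [e / exp_sum for e in exponentials]

# Same vector as in the worked example
print(stable_softmax([5, 2, 8]))

# Inputs that would overflow without the shift
print(stable_softmax([1000, 1001, 1002]))
```

For [5, 2, 8] this should agree with the original function up to floating-point rounding. The second call only succeeds because of the shift.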