How to implement the softmax function in Python

Overview

The softmax function is a mathematical function that converts a vector of real values into a vector of probabilities that sum to 1. Each value in the original vector is converted to a number between 0 and 1.

The formula of the softmax function is shown below:

$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

As shown above, the softmax function accepts a vector z of length K. For each element $z_i$, it computes the standard exponential $e^{z_i}$ and then divides it by the sum of the exponentials of all K elements of z.
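Read term by term, the formula maps onto a couple of lines of Python. The sketch below is a minimal illustration that uses only the standard `math` module; the function name `softmax_formula` is hypothetical and chosen only for this example. It computes the denominator once and reuses it for every element.

```python
import math


def softmax_formula(z):
    """Return sigma(z) as a list, following the formula term by term."""
    # Denominator: the sum of e^{z_j} over all K elements of z (computed once)
    denominator = sum(math.exp(z_j) for z_j in z)
    # Numerator for each i is e^{z_i}; divide it by the shared denominator
    return [math.exp(z_i) / denominator for z_i in z]


probabilities = softmax_formula([1.0, 2.0, 3.0])
print(probabilities)
print(sum(probabilities))  # should be 1, up to floating-point rounding
```

Larger inputs receive larger probabilities, because the exponential is strictly increasing.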

Example

Consider the following vector:

z = [5, 2, 8]

First, let’s calculate the exponential of each value in z.

$e^{z_1} = e^{5} \approx 148.4$

$e^{z_2} = e^{2} \approx 7.4$

$e^{z_3} = e^{8} \approx 2981.0$
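These three exponentials can be checked with Python's built-in `math.exp`. A minimal sketch, rounding to one decimal place only for display:

```python
import math

z = [5, 2, 8]

# e^{z_i} for each element of z, rounded to one decimal place for display
exponentials = [round(math.exp(value), 1) for value in z]
print(exponentials)  # [148.4, 7.4, 2981.0]
```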

Next, we can calculate the sum of the exponentials:

$\sum_{j=1}^{K} e^{z_j} = e^{z_1} + e^{z_2} + e^{z_3}$

$\sum_{j=1}^{K} e^{z_j} \approx 148.4 + 7.4 + 2981.0 = 3136.8$
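The same sum in Python, as a small sketch. It adds the unrounded exponentials and rounds only the final result, which still gives 3136.8 here:

```python
import math

z = [5, 2, 8]

# Sum the unrounded exponentials, then round only the final result
exp_sum = sum(math.exp(value) for value in z)
print(round(exp_sum, 1))  # 3136.8
```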

Finally, we divide each exponential by this sum to get the softmax value for each element of z, as shown below:

$\sigma(\vec{z})_1 = \frac{148.4}{3136.8} \approx 0.0473$

$\sigma(\vec{z})_2 = \frac{7.4}{3136.8} \approx 0.0024$

$\sigma(\vec{z})_3 = \frac{2981.0}{3136.8} \approx 0.9503$

So, we end up with a vector of probabilities:

Softmax(z) = [0.0473, 0.0024, 0.9503]
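As a quick check, the whole worked example fits in a few lines of Python. This sketch rounds to four decimal places only so the output can be compared with the vector above:

```python
import math

z = [5, 2, 8]
exp_sum = sum(math.exp(value) for value in z)

# Divide each exponential by the shared sum, rounding to four decimals for display
probabilities = [round(math.exp(value) / exp_sum, 4) for value in z]
print(probabilities)  # [0.0473, 0.0024, 0.9503]
```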

Code

The code below shows how to implement the softmax function in Python:

```python
import math

# softmax function
def softmax(z):

    # vector to hold exponential values
    exponents = []

    # vector to hold softmax probabilities
    softmax_prob = []

    # sum of exponentials
    exp_sum = 0

    # for each value in the input vector
    for value in z:

        # calculate the exponent
        exp_value = math.exp(value)

        # append to exponent vector
        exponents.append(exp_value)

        # add to exponential sum
        exp_sum += exp_value

    # for each exponential value
    for value in exponents:

        # calculate softmax probability
        probability = value / exp_sum

        # append to probability vector
        softmax_prob.append(probability)

    return softmax_prob

# define vector
z = [5, 2, 8]

# find softmax
result = softmax(z)
print(result)
```

Explanation

In the code above:

  • Line 1: We import the math module, which provides math.exp().
  • Line 4: We define the softmax function, which accepts a vector (a Python list of numbers) as its parameter.
  • Lines 7-13: We initialize two empty lists, exponents for the exponential of each value and softmax_prob for the resulting probabilities, plus a running total, exp_sum, for the sum of all exponentials.
  • Lines 16-25: A for loop iterates over each value in z. For each value, it computes the exponential with math.exp(), appends the result to exponents, and adds it to exp_sum.
  • Lines 28-34: A second for loop divides each exponential value by exp_sum to obtain the corresponding probability and appends it to softmax_prob, which the function then returns.
  • Lines 39-43: We define the vector z = [5, 2, 8], pass it to the softmax function, and print the returned list of probabilities.
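One limitation of the loop-based code above: math.exp() raises OverflowError once its argument exceeds about 709 (the natural logarithm of the largest double-precision float), so an input such as [1000, 1001, 1002] makes the function crash. A common remedy is to subtract the maximum element from every value before exponentiating. This leaves the result unchanged, because for any constant $c$, $\frac{e^{z_i - c}}{\sum_j e^{z_j - c}} = \frac{e^{-c} e^{z_i}}{e^{-c} \sum_j e^{z_j}} = \frac{e^{z_i}}{\sum_j e^{z_j}}$. The sketch below assumes a non-empty list of numbers; the name `stable_softmax` is hypothetical, and this is one way to write it, not the only one.

```python
import math


def stable_softmax(z):
    """Softmax that subtracts max(z) first so math.exp never overflows."""
    shift = max(z)
    # After the shift every exponent is <= 0, so each exponential is at most 1
    exponents = [math.exp(value - shift) for value in z]
    exp_sum = sum(exponents)  # at least 1, because the largest term is e^0 = 1
    return [value / exp_sum for value in exponents]


# Shifting every input by the same constant leaves the probabilities unchanged
print(stable_softmax([5, 2, 8]))
print(stable_softmax([1000, 1001, 1002]))  # no OverflowError
```

The cost of this version is one extra pass over z to find the maximum; in exchange, the second call above succeeds where the loop-based softmax would raise OverflowError.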