Exponential Linear Unit (ELU) is an activation function that improves a model's accuracy and reduces training time. It is mathematically represented as follows:

$$
\text{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha \,(e^{x} - 1), & x \le 0
\end{cases}
$$

In the formula above, $x$ is the input and $\alpha$ is a hyperparameter (commonly set to 1.0) that controls the value the function saturates to for large negative inputs.
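As a quick numerical check (assuming the common default $\alpha = 1.0$, which the code below also uses):

$$
\text{ELU}(2) = 2,
\qquad
\text{ELU}(-2) = 1.0 \cdot (e^{-2} - 1) \approx -0.86
$$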
The activation function ReLU became popular because it mitigates the vanishing gradient problem: the gradients of saturating activation functions, such as the sigmoid, become very small, making it difficult to train deeper models.
At the same time, however, ReLU introduced a problem of its own, called the dying ReLU problem: a neuron can get stuck outputting 0 for every input, at which point its gradient is also 0 and it stops learning.
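The following minimal NumPy sketch shows why ELU helps here; the printed values are simply the standard derivatives of ReLU and ELU (this comparison is an illustration, not part of the original text).

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (x > 0).astype(float)

def elu_grad(x, alpha=1.0):
    # Derivative of ELU: 1 for positive inputs, alpha * exp(x) otherwise.
    return np.where(x > 0, 1.0, alpha * np.exp(x))

x = np.array([-3.0, -1.0, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 1. 1.] -> no gradient flows for negative inputs
print(elu_grad(x))   # approx. [0.05 0.37 1. 1.] -> small but nonzero gradient
```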
In contrast, ELU (like batch normalization) produces negative values that push the mean activation closer to 0, which improves the training speed.
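A small sketch illustrating this mean shift on synthetic, zero-mean inputs (the random inputs and the approximate output means are illustrative assumptions, not taken from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # zero-mean inputs

relu_out = np.maximum(x, 0.0)
elu_out = np.where(x > 0, x, np.exp(x) - 1.0)  # alpha = 1.0

# ReLU discards all negative values, so its output mean is pushed well above 0;
# ELU keeps bounded negative outputs, so its mean stays closer to 0.
print(f"mean ReLU output: {relu_out.mean():.2f}")  # roughly 0.40
print(f"mean ELU output:  {elu_out.mean():.2f}")   # roughly 0.16
```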
Even though Parametric ReLU and Leaky ReLU also produce negative values, they are not smooth functions. ELU, by contrast, is a smooth function for negative values that gradually saturates, which makes it more robust to noise.
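The difference is easiest to see in the limit of large negative inputs (the 0.01 slope for Leaky ReLU is a common default, assumed here purely for illustration):

$$
\lim_{x \to -\infty} \text{ELU}(x) = \lim_{x \to -\infty} \alpha\,(e^{x} - 1) = -\alpha,
\qquad
\lim_{x \to -\infty} \text{LeakyReLU}(x) = \lim_{x \to -\infty} 0.01\,x = -\infty
$$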
Here, we implement ELU in Python:
```python
import numpy as np
import matplotlib.pyplot as plt

# Initializing the constant (the ELU hyperparameter)
alpha = 1.0

def ELU(x):
    # Identity for positive inputs, exponential saturation otherwise
    if x > 0:
        return x
    return alpha * (np.exp(x) - 1)

# Evenly spaced inputs between -5.0 and 5.0
x = np.linspace(-5.0, 5.0)

# Apply ELU to each input value
result = []
for i in x:
    result.append(ELU(i))

# Plot the activation function and save the figure
plt.plot(x, result)
plt.title("ELU activation function")
plt.xlabel("Input")
plt.ylabel("Output")
plt.grid(True)
plt.savefig('output/elu_plot.png')
```
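The Python loop above works fine for producing the plot, but the same computation can also be expressed as a single vectorized operation over the whole array. A sketch of that alternative (a rewrite for illustration, not part of the original code):

```python
import numpy as np

def elu_vectorized(x, alpha=1.0):
    # Elementwise ELU: x where positive, alpha * (exp(x) - 1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-5.0, 5.0)
result = elu_vectorized(x)
```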
In the code above, we use np.linspace to generate evenly spaced numbers between -5.0 and 5.0, apply ELU to each of them, and plot and save the resulting curve.
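For reference, np.linspace returns 50 evenly spaced points by default; the optional num argument controls how many points are generated:

```python
import numpy as np

print(np.linspace(-5.0, 5.0, num=5))  # [-5.  -2.5  0.   2.5  5. ]
```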