Rectified Linear Unit (ReLU) is an activation function in neural networks. It is a popular choice among developers and researchers because it tackles the vanishing gradient problem. However, ReLU has a drawback: neurons that only ever receive negative inputs output zero and stop updating, a behavior known as the dying ReLU problem.
Note: You can learn more about this behavior of ReLU here.
Researchers have proposed multiple solutions to this problem. Some of them are mentioned below:

- Leaky ReLU
- Parametric ReLU
- Exponential Linear Unit (ELU)
In this Answer, we discuss Parametric ReLU.
The mathematical representation of Parametric ReLU is as follows:

$$
f(y_i) =
\begin{cases}
y_i, & \text{if } y_i > 0 \\
a_i \, y_i, & \text{if } y_i \leq 0
\end{cases}
$$
Here, $y_i$ is the input to the activation function on the $i^{th}$ channel, and $a_i$ is a learnable coefficient that controls the slope of the function for negative inputs.
Note: When $a_i$ is equal to zero, the function behaves like ReLU, whereas when $a_i$ is fixed to a small number (such as $0.01$), the function behaves like Leaky ReLU.
The above equation can also be represented as follows:

$$
f(y_i) = \max(0, y_i) + a_i \min(0, y_i)
$$
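As a quick illustration, here is a minimal NumPy sketch of this second form. The function name `parametric_relu` and the coefficient array `a` are our own for this example, not part of any specific library.

```python
import numpy as np

def parametric_relu(y, a):
    """Parametric ReLU: max(0, y) + a * min(0, y), one coefficient per channel.

    y : array of shape (batch, channels), inputs to the activation
    a : array of shape (channels,), learnable negative-slope coefficients
    """
    return np.maximum(0, y) + a * np.minimum(0, y)

y = np.array([[-2.0, 3.0], [1.5, -0.5]])  # two samples, two channels
a = np.array([0.25, 0.25])                # example coefficient values
print(parametric_relu(y, a))
# [[-0.5    3.   ]
#  [ 1.5   -0.125]]
```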
Using Parametric ReLU adds very little training overhead to the neural network. This is because the number of extra parameters to learn equals the number of channels, which is negligible compared to the number of weights the model needs to learn. Unlike Leaky ReLU, Parametric ReLU can give a considerable rise in the accuracy of a model.
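To make this overhead concrete, the sketch below uses PyTorch (purely as an illustration; the layer sizes are arbitrary choices for this example) to count the extra parameters a channel-wise Parametric ReLU adds next to a convolution's weights.

```python
import torch.nn as nn

# A convolution with 64 input and 128 output channels, 3x3 kernels
conv = nn.Conv2d(64, 128, kernel_size=3)
# Channel-wise Parametric ReLU: one learnable coefficient per output channel
prelu = nn.PReLU(num_parameters=128)

conv_params = sum(p.numel() for p in conv.parameters())
prelu_params = sum(p.numel() for p in prelu.parameters())

print(f"Convolution weights and biases: {conv_params}")  # 128*64*3*3 + 128 = 73,856
print(f"Extra PReLU coefficients:       {prelu_params}") # 128
```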
If the coefficient is shared across all the channels of a layer, the number of extra parameters drops to just one per layer.
In this section, we compare the performance of Parametric ReLU with that of Leaky ReLU.
Here, we plot Leaky ReLU with a fixed coefficient of $0.01$ alongside Parametric ReLU.
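The following matplotlib sketch recreates a comparable figure: Leaky ReLU with a fixed slope of $0.01$ next to Parametric ReLU with an illustrative coefficient of $0.25$ (the actual value is learned during training and will vary).

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 200)
leaky = np.where(x > 0, x, 0.01 * x)   # Leaky ReLU, fixed slope 0.01
prelu = np.where(x > 0, x, 0.25 * x)   # Parametric ReLU, illustrative learned slope 0.25

plt.plot(x, leaky, label="Leaky ReLU (slope = 0.01)")
plt.plot(x, prelu, label="Parametric ReLU (slope = 0.25)")
plt.axhline(0, color="gray", linewidth=0.5)
plt.axvline(0, color="gray", linewidth=0.5)
plt.xlabel("Input")
plt.ylabel("Output")
plt.title("Leaky ReLU vs. Parametric ReLU")
plt.legend()
plt.show()
```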