Beyond the Sigmoid
Explore the ReLU activation function and discover when to use which activation function.
There is no such thing as a perfect replacement for the sigmoid. Different activation functions work well in different circumstances, and researchers keep coming up with new ones. That being said, one activation function has proven so broadly useful that it’s become a default of sorts. Let’s discuss it in the next section.
Enter the ReLU
The go-to replacement for the sigmoid these days is the rectified linear unit or ReLU. Compared with the sigmoid, the ReLU is surprisingly simple. Here’s a Python implementation of it:
def relu(z):
    if z <= 0:
        return 0
    else:
        return z
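As a quick sanity check, here’s how that function behaves on a handful of sample inputs (the specific values below are just an illustration, not part of the original lesson):

for z in [-3, -0.5, 0, 0.5, 3]:
    print(z, "->", relu(z))

# Output:
# -3 -> 0
# -0.5 -> 0
# 0 -> 0
# 0.5 -> 0.5
# 3 -> 3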
And the following diagram illustrates what it looks like:
The ReLU is composed of two straight segments. Taken together, however, they add up to a nonlinear function, which is exactly what a good activation function should be.
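One quick way to convince yourself of that: a linear function f always satisfies f(a + b) = f(a) + f(b), and the ReLU doesn’t. The small check below reuses the relu function defined above and is my own illustration, not part of the original lesson:

a, b = -1.0, 1.0
print(relu(a) + relu(b))   # 1.0
print(relu(a + b))         # 0.0 -- a different result, so the ReLU is not linear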
The ReLU may be simple, but it’s all the better for it. Computing its gradient is easy, which results in fast training. However, the ReLU’s most useful feature is its gradient of 1 for positive inputs. When backpropagation passes through a ReLU with a positive input, the global gradient is multiplied by 1, so it does not change at all. That detail alone solves the problem of vanishing gradients for ...
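To make that concrete, here’s a rough sketch comparing how a gradient scales as it passes through a chain of ReLUs versus a chain of sigmoids. The helper functions and the choice of ten layers evaluated at z = 2 are assumptions for illustration, not from the original lesson:

import numpy as np

def relu_gradient(z):
    # The ReLU's derivative: 1 for positive inputs, 0 otherwise.
    return 1.0 if z > 0 else 0.0

def sigmoid_gradient(z):
    # The sigmoid's derivative, which never exceeds 0.25.
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

# Multiply the local gradients of ten stacked activations, all evaluated at z = 2:
relu_chain = np.prod([relu_gradient(2.0) for _ in range(10)])
sigmoid_chain = np.prod([sigmoid_gradient(2.0) for _ in range(10)])

print(relu_chain)     # 1.0 -- the gradient passes through unchanged
print(sigmoid_chain)  # roughly 1e-10 -- the gradient has all but vanished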