Learning to choose the right hyperparameters is one of the best ways to get the most out of our machine learning and deep learning models. In this article, we'll explore five different hyperparameters: the number of epochs, the number of hidden layers, the learning rate, the loss function, and the activation function.
Tuning hyperparameters is an integral part of deep learning, and we should understand its significance before building any models. Done well, it lets us extract the maximum performance from our models and gives us the leverage we need to build top-performing ones.
Let's discuss each hyperparameter individually before putting them into practice.
Epochs are one of the easiest hyperparameters to tune. We already know that the longer we train a network, the more accurate it becomes on the training data. However, if we keep training past a certain point, the model starts to overfit: the extra epochs become counterproductive and the validation accuracy begins to drop.
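As a minimal sketch of tuning this in Keras (the model, X_train, and y_train names below are placeholders, not part of this article's app), we can set a generous upper bound on epochs and let early stopping end training once the validation loss stops improving:

from tensorflow.keras.callbacks import EarlyStopping

# `model`, `X_train`, and `y_train` are assumed to exist already (placeholders for this sketch).
# Stop once the validation loss has not improved for 5 consecutive epochs,
# and roll back to the best weights seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(X_train, y_train,
                    validation_split=0.2,   # hold out 20% of the data for validation
                    epochs=100,             # upper bound; early stopping usually ends sooner
                    callbacks=[early_stop])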
We don't need any hidden layers at all if our data is linearly separable. We need to gauge how complex our data is and decide how many hidden layers to use. Adding more layers can improve performance, but the extra complexity can also lead to overfitting. It's best to stick to one or two hidden layers unless the problem clearly demands more.
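As an illustration (the layer sizes and the two-feature input shape below are arbitrary assumptions, not values from this article), a small Keras network with two hidden layers looks like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Two hidden layers are a reasonable starting point for non-linear data;
# add more only if the problem clearly calls for it.
model = Sequential([
    Dense(16, activation='relu', input_shape=(2,)),  # first hidden layer
    Dense(8, activation='relu'),                     # second hidden layer
    Dense(1, activation='sigmoid'),                  # output layer for binary classification
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])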
To understand the trade-off of different learning rates, let's go back to the basics and visualize gradient descent. The following diagrams show a few steps of gradient descent along a one-dimensional loss curve, with three different values of lr. The red cross marks the starting point, and the green cross marks the minimum:
When we set a large value for lr, gradient descent tries to minimize the loss with substantial steps. This can make quick progress at first, and it is sometimes used for large, sparse datasets, where even a run that never fully converges can still uncover useful patterns. The downside is that big steps can overshoot the minimum, causing the loss to bounce around it or even diverge. The opposite extreme is batch gradient descent with small steps: each update is computed over the whole training set at once, so every step is small but well informed.
Using a smaller value for lr means taking smaller, more careful steps and is often preferred: each update makes less progress, but the descent is more stable and more likely to settle precisely into the minimum, especially on smaller datasets.
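As a hedged sketch (the SGD optimizer and the 0.01 value are illustrative assumptions, and model refers to the network sketched earlier), the learning rate is usually set on the optimizer when compiling the model:

from tensorflow.keras.optimizers import SGD

# A larger learning_rate takes bigger, riskier steps;
# a smaller one takes smaller, more stable steps.
optimizer = SGD(learning_rate=0.01)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])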
The goal of a loss function is to evaluate how "good" the model's predictions are. There is no one-size-fits-all loss function; it is usually picked based on the machine learning problem we're trying to solve, the features we're using, and so on.
There are two broad categories, depending on the learning task we're dealing with: regression losses and classification losses. Mean squared error is a good loss function for regression, whereas categorical cross-entropy is quite handy for classification.
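In Keras, the loss is passed as a string (or a loss object) when compiling; the two calls below are illustrative, reusing the model placeholder from the earlier sketch rather than this article's app:

# Regression: mean squared error
model.compile(optimizer='adam', loss='mse')

# Multi-class classification with one-hot labels: categorical cross-entropy
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])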
A neuron's activation function maps the neuron's combined input to its output, often squashing it into a fixed range, and in doing so determines whether the neuron is relevant or should effectively be ignored. In other words, the activation function decides how the neurons combine inputs to form the final output.
We typically use the sigmoid activation function for the output layer of a binary classifier. Sigmoid squashes any real input into a value between 0 and 1, which we can read as a probability; if the output is greater than 0.5 we predict class 1, otherwise class 0.
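A small NumPy sketch of sigmoid and the 0.5 decision threshold (the example inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([-2.0, 0.0, 3.0])
probs = sigmoid(logits)              # roughly [0.12, 0.5, 0.95]
preds = (probs > 0.5).astype(int)    # threshold the output, not the input
print(probs, preds)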
The hyperbolic tangent (tanh) activation function is similar to the sigmoid function. It takes any real value as input and outputs values in the range of -1 to 1. Just like the sigmoid activation function, tanh has an S-shaped curve, but it ranges between "off" (an output of -1) and "on" (an output of 1), so its outputs are centered around 0.
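The same kind of sketch for tanh, which ships with NumPy (inputs again arbitrary):

import numpy as np

# tanh maps any real input into (-1, 1) and is zero-centered.
x = np.array([-2.0, 0.0, 3.0])
print(np.tanh(x))   # roughly [-0.96, 0.0, 1.0]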
ReLU is one of the most straightforward and efficient activation functions in deep learning. It passes positive inputs through unchanged and outputs 0 for the rest, so at any given time only some of the neurons are activated, making the network sparse and cheap to compute.
ReLU has drawbacks, though: it is not differentiable at 0, and ReLU neurons can "die", becoming inactive for all inputs and never recovering. This happens more easily when training with high learning rates, and it reduces the model's capacity to learn.
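A quick NumPy sketch of ReLU on arbitrary example inputs:

import numpy as np

# ReLU keeps positive inputs as-is and zeroes out the rest,
# which is what makes the resulting activations sparse.
x = np.array([-2.0, 0.0, 3.0])
print(np.maximum(0.0, x))   # [0. 0. 3.]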
We can use softmax for multi-class classification: it returns a probability for each class, and the predicted class is the one with the highest probability. It's typically used in the last layer of a neural network.
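A numerically stable softmax in NumPy (the class scores are made up for illustration):

import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # the resulting values are positive and sum to 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

class_scores = np.array([1.0, 2.0, 0.5])
print(softmax(class_scores))   # the largest score gets the highest probability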
Let's run the application given below to tune these hyperparameters on non-linearly separable data, without writing any code.
# A utility function that plots the training loss and validation loss from
# a Keras history object.
import streamlit as st
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns

def plot(history):
    plt.clf()
    plt.plot(history.history['loss'], label='Training set',
             color='blue', linestyle='-')
    plt.plot(history.history['val_loss'], label='Validation set',
             color='green', linestyle='--')
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.xlim(0, len(history.history['loss']))
    plt.legend()
    plt.title("Training vs. Validation (loss)", fontsize=10)
    plt.show()