Loss
Understand the steps in computing the appropriate loss, and explore the binary cross-entropy loss.
Defining the appropriate loss
We already have a model, and now we need to define an appropriate loss for it. A binary classification problem calls for the binary cross-entropy (BCE) loss, which is sometimes known as log loss.
The BCE loss requires the predicted probabilities ($\hat{y}$), as returned by the sigmoid function, and the true labels ($y$) for its computation. For each data point $i$ in the training set, it starts by computing the error corresponding to the point's true class.
If the data point belongs to the positive class ($y=1$), we would like our model to predict a probability close to one, right? A perfect prediction of one would result in the logarithm of one, which is zero. It makes sense; a perfect prediction means zero loss. The error goes like this:
What if the data point belongs to the negative class ($y=0$)? Then, we cannot simply use the predicted probability. Why not? Because the model outputs the probability of a point belonging to the positive class, not the negative one. Luckily, the latter can be easily computed:
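$$P(y_i = 0) = 1 - P(y_i = 1) = 1 - \hat{y}_i$$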
And thus, the error associated with a data point belonging to the negative class goes like this:
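$$\text{error}_i = \log(1 - \hat{y}_i) \quad \text{for } y_i = 0$$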
Once all errors are computed, they are aggregated into a loss value.
Binary cross-entropy loss
For the binary cross-entropy loss, we simply take the average of the errors and invert their sign. This results in the following equation:
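$$\text{BCE}(y) = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \,\right]$$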
Let us assume we have two dummy data points, one for each class. Then, let us pretend our model made predictions for them: 0.9 and 0.2. These predictions are not bad, since the model predicts a 90% probability of being positive for an actual positive and only a 20% probability of being positive for an actual negative. What does this look like in code?
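Here is a minimal sketch in Python using NumPy; the variable names (dummy_labels, dummy_predictions) and the manual averaging are illustrative assumptions rather than any particular framework's loss function.

```python
import numpy as np

# Two dummy data points: the first belongs to the positive class (1),
# the second to the negative class (0).
dummy_labels = np.array([1.0, 0.0])

# Predicted probabilities of belonging to the positive class.
dummy_predictions = np.array([0.9, 0.2])

# Error for the positive point: the log of its predicted probability.
positive_error = np.log(dummy_predictions[dummy_labels == 1.0])

# Error for the negative point: the log of the probability of the
# negative class, that is, log(1 - predicted probability).
negative_error = np.log(1.0 - dummy_predictions[dummy_labels == 0.0])

# Aggregate: average all errors and invert the sign.
all_errors = np.concatenate([positive_error, negative_error])
bce_loss = -all_errors.mean()

print(bce_loss)  # roughly 0.1643
```

The small loss value reflects the fact that both predictions are fairly good; a model that assigned a low probability to the actual positive, for instance, would incur a much larger loss.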