Optimizing BCE Loss
Learn how to minimize BCE loss using gradient descent.
Optimization
Logistic regression aims to learn a parameter vector by minimizing a chosen loss function. While the squared loss might appear to be a natural choice, it is not convex in the model parameters. Fortunately, we have the flexibility to consider alternative loss functions that are convex. One such loss function is the binary cross-entropy (BCE) loss, denoted as $L_{BCE}$, which is convex. For a single example with target label $y \in \{0, 1\}$ and predicted probability $\hat{y}$, the BCE loss can be defined as:

$$L_{BCE}(y, \hat{y}) = -\big[\, y \log(\hat{y}) + (1 - y)\log(1 - \hat{y}) \,\big]$$
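To make the convexity claim concrete, here is a small numerical sketch (our own illustration, using an assumed single training example with $x = 1$, $y = 1$ and the model $\hat{y} = \sigma(wx)$) that compares the two losses as functions of a single weight $w$. Negative discrete second differences indicate non-convex regions; they appear for the squared loss but not for the BCE loss.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative single training example: x = 1, y = 1
x, y = 1.0, 1.0

# Evaluate both losses on a grid of weight values
w = np.linspace(-6, 6, 601)
y_hat = sigmoid(w * x)
squared_loss = (y - y_hat) ** 2   # squared loss as a function of w
bce_loss = -np.log(y_hat)         # BCE loss as a function of w (y = 1)

# Discrete second differences approximate curvature;
# negative values reveal non-convex regions
print("min curvature, squared loss:", np.diff(squared_loss, 2).min())
print("min curvature, BCE loss:", np.diff(bce_loss, 2).min())

The squared-loss curve flattens out for confident wrong predictions, producing regions of negative curvature, whereas the BCE loss keeps non-negative curvature everywhere.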
Explanation of BCE loss
Let’s delve into the explanation of the BCE loss. For a single example in a dataset with a target label $y$, if $y = 1$ and the prediction $\hat{y} \approx 1$, the loss $-\log(\hat{y}) \approx 0$. Conversely, if $y = 1$ but $\hat{y} \approx 0$, the loss becomes significantly large. Similarly, we can evaluate the pairs $(y = 0, \hat{y} \approx 0)$, which gives a loss close to zero, and $(y = 0, \hat{y} \approx 1)$, which gives a large loss. The code snippet provided below illustrates the computation of the BCE loss for a single example:
import numpy as np

def BCE_loss(y, y_hat):
    """
    Compute the binary cross-entropy (BCE) loss for a given
    target label and predicted probability.

    Args:
        y: Target label (0 or 1)
        y_hat: Predicted probability

    Returns:
        BCE loss value
    """
    if y == 1:
        return -np.log(y_hat)
    else:
        return -np.log(1 - y_hat)

# Iterate over different combinations of y and y_hat
for y in [0, 1]:
    for y_hat in [0.0001, 0.99]:
        # Compute and print the BCE loss for each combination
        print(f"y = {y}, y_hat = {y_hat}, BCE_loss = {BCE_loss(y, y_hat)}")
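Running this snippet confirms the behavior described above: when the prediction agrees with the label (for example, $y = 1$ with $\hat{y} = 0.99$, or $y = 0$ with $\hat{y} = 0.0001$), the loss is close to zero, while a confidently wrong prediction such as $y = 1$ with $\hat{y} = 0.0001$ yields a loss of about $9.21$, and $y = 0$ with $\hat{y} = 0.99$ yields about $4.61$.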
By utilizing the BCE loss, we can effectively capture the dissimilarity between the target labels and predicted probabilities, enabling convex optimization during the parameter estimation process of logistic regression.
Minimizing BCE loss
To minimize the BCE loss, we need to find the model parameters (that is, the weights $\mathbf{w}$ and the bias $b$) that result in the smallest BCE loss value. Over a training set of $n$ examples $(\mathbf{x}_i, y_i)$ with predictions $\hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$, the BCE loss is defined as:

$$L_{BCE}(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\big[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\big]$$
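Before working through the update rule, here is a minimal sketch of what gradient-descent minimization of this loss can look like in code. The synthetic dataset, the learning rate of 0.5, and the iteration count are illustrative assumptions rather than part of the lesson; the gradient expressions follow from differentiating the average BCE loss with a sigmoid prediction.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, y_hat, eps=1e-12):
    # Average BCE loss over the dataset; eps guards against log(0)
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Illustrative synthetic dataset: n examples, d features
rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d))
true_w, true_b = np.array([2.0, -1.0]), 0.5
y = (sigmoid(X @ true_w + true_b) > 0.5).astype(float)

# Full-batch gradient descent on w and b (assumed hyperparameters)
w, b = np.zeros(d), 0.0
learning_rate = 0.5
for step in range(201):
    y_hat = sigmoid(X @ w + b)
    # Gradients of the average BCE loss with respect to w and b
    grad_w = X.T @ (y_hat - y) / n
    grad_b = np.mean(y_hat - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    if step % 50 == 0:
        print(f"step {step}: BCE loss = {bce_loss(y, y_hat):.4f}")

Because the loss is convex in $\mathbf{w}$ and $b$, these updates steadily decrease the loss toward its global minimum rather than getting stuck in a poor local one.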
...