Loss
Understand the steps in computing the appropriate loss, and explore the binary cross-entropy loss.
Defining the appropriate loss
We already have a model, and now we need to define an appropriate loss for it. A binary classification problem calls for the binary cross-entropy (BCE) loss, which is sometimes known as log loss.
The BCE loss requires the predicted probabilities ($\hat{y}$), as returned by the sigmoid function, and the true labels ($y$) for its computation. For each data point $i$ in the training set, it starts by computing the error corresponding to the point's true class.
If the data point belongs to the positive class ($y=1$), we would like our model to predict a probability close to one, right? A perfect prediction of one would result in the logarithm of one, which is zero. It makes sense; a perfect prediction means zero loss. The error goes like this:
What if the data point belongs to the negative class ($y=0$)? Then, we cannot simply use the predicted probability. Why not? Because the model outputs the probability of a point belonging to the positive class, not the negative one. Luckily, the latter can be easily computed:
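$$P(y_i = 0) = 1 - P(y_i = 1) = 1 - \hat{y}_i$$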
And thus, the error associated with a data point belonging to the negative class goes like this:
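$$\text{error}_i = \log(1 - \hat{y}_i) \quad \text{for } y_i = 0$$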
Once all errors are computed, they are aggregated into a loss value.
Binary cross-entropy loss
For the binary cross-entropy loss, we simply take the average of the errors and invert their sign. This results in the following equation:
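$$\text{BCE}(y) = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \,\right]$$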
Let us assume we have two dummy data points, one for each class. Then, let us pretend our model made predictions for them: 0.9 and 0.2. These predictions are not bad, since the model predicts a 90% probability of being positive for an actual positive and only a 20% probability of being positive for an actual negative. What does this look like in code?
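Here is a minimal sketch in Python using NumPy; the variable names (dummy_labels, dummy_predictions) and the manual averaging are illustrative assumptions rather than any particular framework's loss function.

```python
import numpy as np

# Two dummy data points: the first belongs to the positive class (1),
# the second to the negative class (0).
dummy_labels = np.array([1.0, 0.0])

# Predicted probabilities of belonging to the positive class.
dummy_predictions = np.array([0.9, 0.2])

# Error for the positive point: the log of its predicted probability.
positive_error = np.log(dummy_predictions[dummy_labels == 1.0])

# Error for the negative point: the log of the probability of the
# negative class, that is, log(1 - predicted probability).
negative_error = np.log(1.0 - dummy_predictions[dummy_labels == 0.0])

# Aggregate: average all errors and invert the sign.
all_errors = np.concatenate([positive_error, negative_error])
bce_loss = -all_errors.mean()

print(bce_loss)  # roughly 0.1643
```

The small loss value reflects the fact that both predictions are fairly good; a model that assigned a low probability to the actual positive, for instance, would incur a much larger loss.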