
Multiclass Formulation

Explore how to adapt binary logistic regression to multiclass problems by applying the one-vs-all approach and the softmax function. Understand how to generate normalized class probabilities and implement these techniques with practical examples using the Iris dataset.

In the previous lessons, we mastered logistic regression as a powerful binary classification model, focusing on predicting the probability $p(y=1 \mid \mathbf{x})$. However, most real-world problems involve multiple classes (e.g., classifying 10 types of objects, or 3 species of Iris flowers).

This lesson addresses the question: How do we adapt a binary classification algorithm to handle three or more classes?

We will explore two fundamental approaches to multiclass extension:

  1. The one-vs-all (one-vs-rest) strategy: This classic algorithm breaks the $C$-class problem into $C$ separate binary logistic regression models, providing a simple, natural extension.

  2. The softmax function: We will introduce the softmax function, which acts as a powerful generalization of the sigmoid. Softmax takes the scores from all models and transforms them into a valid, normalized probability distribution over all classes, ensuring that the probabilities sum to 1 (see the sketch after this list).
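As a quick preview, here is a minimal NumPy sketch of the softmax transformation; the score values are made up purely for illustration:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score before exponentiating: a standard
    # numerical-stability trick that leaves the result unchanged.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative per-class scores
probs = softmax(scores)
print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution
```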

Multiclass Extension

The logistic regression model offers two significant advantages: the ability to learn probabilities and a natural extension to multiple classes. Let’s explore how we can extend a binary classification model to a multiclass classification model with $C$ classes using one-vs-all (one-vs-rest).

Algorithm

For each class $j$ in the dataset:

  1. Set the labels of class $j$ to 1, indicating positive instances.

  2. Set the labels of all other classes to 0, representing negative instances.

  3. Apply logistic regression on the modified dataset, treating it as a binary classification problem with class $j$ as the positive class (1) and all other classes as the negative class (0). Save the corresponding model parameters $\mathbf{w}_j$.

  4. Predict the probability $\hat{y}_{tj} = \sigma(\mathbf{w}_j^T \phi(\mathbf{x}_t))$ that a sample $\mathbf{x}_t$ belongs to class $j$, and assign the class with the highest predicted probability (see the sketch below).
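
Putting the four steps together, here is a minimal sketch of one-vs-all on the Iris dataset. It uses scikit-learn's binary `LogisticRegression` as the per-class model; the variable names and the train/test split are illustrative assumptions, not the lesson's own code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

classes = np.unique(y_train)
models = []
for j in classes:
    # Steps 1-2: relabel class j as 1 and all other classes as 0
    y_binary = (y_train == j).astype(int)
    # Step 3: fit a binary logistic regression and save its parameters
    models.append(LogisticRegression(max_iter=1000).fit(X_train, y_binary))

# Step 4: score each test sample under every binary model,
# then assign the class with the highest predicted probability
probs = np.column_stack([m.predict_proba(X_test)[:, 1] for m in models])
y_pred = classes[np.argmax(probs, axis=1)]
print("Accuracy:", (y_pred == y_test).mean())
```

Note that the per-class probabilities in `probs` come from independent binary models, so each row need not sum to 1; the softmax function introduced above is what turns such raw scores into a single normalized distribution.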