
Multiclass Formulation

Explore how to adapt binary logistic regression to multiclass problems by applying the one-vs-all approach and the softmax function. Understand how to generate normalized class probabilities and implement these techniques with practical examples using the Iris dataset.

In the previous lessons, we mastered logistic regression as a powerful binary classification model, focusing on predicting the probability $p(y=1 \mid \mathbf{x})$. However, most real-world problems involve multiple classes (e.g., classifying 10 types of objects, or 3 species of Iris flowers).

This lesson addresses the question: How do we adapt a binary classification algorithm to handle three or more classes?

We will explore two fundamental approaches to multiclass extension:

  1. The one-vs-all (one-vs-rest) strategy: This classic algorithm breaks the $C$-class problem into $C$ separate binary logistic regression models, providing a simple, natural extension.

  2. The softmax function: We will introduce the softmax function, which acts as a powerful generalization of the sigmoid. Softmax takes the scores from all models and transforms them into a valid, normalized probability distribution over all classes, ensuring that the probabilities sum to 1 (see the sketch after this list).
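As a quick preview, here is a minimal NumPy sketch of the softmax transformation; the score values are made up purely for illustration:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score before exponentiating: a standard
    # numerical-stability trick that leaves the result unchanged.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative per-class scores
probs = softmax(scores)
print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution
```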

Multiclass Extension

The logistic regression model offers two significant advantages: the ability to learn probabilities and a natural extension to multiple classes. Let’s explore how we can extend a binary classification model to a multiclass classification model with $C$ classes using one-vs-all (one-vs-rest).

Algorithm

For each class $j$ in the dataset:

  1. Set the labels of class $j$ to 1, indicating positive instances.

  2. Set the labels of all other classes to 0, representing negative instances.

  3. Apply logistic regression on the modified dataset, treating it as a binary classification problem with class $j$ as the positive class (1) and all other classes as the negative class (0). Save the corresponding model parameters $\mathbf{w}_j$.

  4. Predict the probability $\hat{y}_{tj} = \sigma(\mathbf{w}_j^T \phi(\mathbf{x}_t))$ that a sample $\mathbf{x}_t$ belongs to class $j$, and assign the class with the highest predicted probability (see the sketch below).
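
Putting the four steps together, here is a minimal sketch of one-vs-all on the Iris dataset. It uses scikit-learn's binary `LogisticRegression` as the per-class model; the variable names and the train/test split are illustrative assumptions, not the lesson's own code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

classes = np.unique(y_train)
models = []
for j in classes:
    # Steps 1-2: relabel class j as 1 and all other classes as 0
    y_binary = (y_train == j).astype(int)
    # Step 3: fit a binary logistic regression and save its parameters
    models.append(LogisticRegression(max_iter=1000).fit(X_train, y_binary))

# Step 4: score each test sample under every binary model,
# then assign the class with the highest predicted probability
probs = np.column_stack([m.predict_proba(X_test)[:, 1] for m in models])
y_pred = classes[np.argmax(probs, axis=1)]
print("Accuracy:", (y_pred == y_test).mean())
```

Note that the per-class probabilities in `probs` come from independent binary models, so each row need not sum to 1; the softmax function introduced above is what turns such raw scores into a single normalized distribution.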