Kernel Logistic Regression

Learn how to implement kernel logistic regression along with its derivation.


We can kernelize logistic regression just like other linear models by observing that the parameter vector $\bold w$ is a linear combination of the feature vectors (the columns of $\Phi(X)$), that is:

$$\bold w = \Phi(X)\, \bold a$$

Here, $\bold a$ is the dual parameter vector; in this case, the loss function depends on $\bold a$ rather than $\bold w$.
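Substituting $\bold w = \Phi(X)\bold a$ into the linear model's score makes the role of the kernel explicit (this short derivation follows directly from the definitions above):

$$z_i = \bold w^T \phi(\bold x_i) = \bold a^T \Phi(X)^T \phi(\bold x_i) = \sum_{j=1}^{n} a_j\, \phi(\bold x_j)^T \phi(\bold x_i) = \sum_{j=1}^{n} a_j\, k(\bold x_j, \bold x_i)$$

Here, $k(\bold x_j, \bold x_i) = \phi(\bold x_j)^T \phi(\bold x_i)$ denotes the kernel function, so predictions only ever require kernel evaluations between pairs of data points.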

Minimizing BCE Loss

To minimize the binary cross-entropy (BCE) loss, we need to find the model parameters $\bold a$ that result in the smallest loss value. The BCE loss is defined as:

$$\begin{align*} L_{BCE}(\bold{a}) &= \sum_{i=1}^n L_i\\ L_i &= -\left(y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)\right) \\ \hat{y}_i &= \sigma(z_i)=\frac{1}{1+e^{-z_i}} \\ z_i &= \bold a^T \Phi(X)^T\phi(\bold{x}_i) \end{align*}$$
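As a point of reference for the implementation later on, here is a minimal NumPy sketch of how this loss could be evaluated once the Gram matrix is available. The helper names `sigmoid` and `bce_loss`, and the `eps` clipping, are illustrative choices, not part of the derivation:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid; clipping z guards against overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def bce_loss(a, K, y, eps=1e-12):
    # a: (n,) dual parameters, K: (n, n) Gram matrix, y: (n,) labels in {0, 1}.
    z = K @ a                                   # z_i = a^T Phi(X)^T phi(x_i)
    y_hat = np.clip(sigmoid(z), eps, 1 - eps)   # clip to avoid log(0)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```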

To minimize $L_{BCE}$, we need to compute its gradient with respect to the parameters $\bold a$ as follows:

$$\begin{align*} \frac{\partial L_{BCE}}{\partial \bold{a}} &= \sum_{i=1}^{n}\frac{\partial L_i}{\partial \hat{y}_i}\, \frac{\partial \hat{y}_i}{\partial z_i}\, \frac{\partial z_i}{\partial \bold{a}}\\ &= -\sum_{i=1}^{n} \left[ y_i \frac{1}{\hat{y}_i} - (1 - y_i) \frac{1}{1 - \hat{y}_i} \right] \hat{y}_i (1 - \hat{y}_i)\, \Phi(X)^T\phi(\bold{x}_i)\\ &= -\sum_{i=1}^{n} \left[ y_i-\hat{y}_i \right] \Phi(X)^T\phi(\bold{x}_i)\\ &= \sum_{i=1}^{n} \left[ \hat{y}_i-y_i \right] \Phi(X)^T\phi(\bold{x}_i)\\ &= K(\hat{\bold y}-\bold y) \end{align*}$$
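In matrix form, the gradient is simply the Gram matrix times the residual vector. Continuing the sketch above (and reusing its hypothetical `sigmoid` helper), the gradient and a plain gradient-descent loop could look like this; the learning rate `lr` and iteration count are assumed hyperparameters:

```python
def bce_gradient(a, K, y):
    # Gradient of the BCE loss w.r.t. the dual parameters: K (y_hat - y).
    y_hat = sigmoid(K @ a)
    return K @ (y_hat - y)

def gradient_descent(K, y, lr=0.01, n_iter=1000):
    # Plain full-batch gradient descent on the dual parameter vector a.
    a = np.zeros(K.shape[0])
    for _ in range(n_iter):
        a -= lr * bce_gradient(a, K, y)
    return a
```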

Here, $K$ is the Gram matrix, a square matrix containing the dot products of all pairs of data points after mapping them to a higher-dimensional feature space using a kernel function:

$$K = \Phi(X)^T\Phi(X)$$

and

$$\Phi(X) = \begin{bmatrix}\phi(\bold x_1) & \phi(\bold x_2) & \dots & \phi(\bold x_n)\end{bmatrix}$$

Note: The entries of the Gram matrix are dot products in the feature space, computed using kernel functions without ever forming the feature vectors themselves, which makes the approach computationally efficient.
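For instance, with an RBF kernel the feature space is infinite-dimensional, so the feature vectors cannot be computed explicitly at all, yet the Gram matrix remains cheap to build. A minimal sketch follows; the RBF kernel and its `gamma` parameter are illustrative assumptions, and any valid kernel could be used instead:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    # Matrix of pairwise RBF kernel values: K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    # X1: (n, d), X2: (m, d) -> K: (n, m); no explicit feature map is ever formed.
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)
```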

Implementation

The code below implements kernel logistic regression for binary classification:
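The interactive code widget is not reproduced here; the following is a self-contained sketch of one way such an implementation could look, combining the pieces derived above. The class name, the RBF kernel, the hyperparameters, and the synthetic dataset are illustrative assumptions rather than the course's exact code:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid; clipping z guards against overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def rbf_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - x_j||^2), computed without explicit feature maps.
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)

class KernelLogisticRegression:
    """Kernel logistic regression trained by full-batch gradient descent on the dual parameters."""

    def __init__(self, gamma=1.0, lr=0.01, n_iter=2000):
        self.gamma = gamma      # RBF kernel width (assumed kernel choice)
        self.lr = lr            # learning rate for gradient descent
        self.n_iter = n_iter    # number of gradient-descent iterations

    def fit(self, X, y):
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)              # Gram matrix K = Phi(X)^T Phi(X)
        self.a = np.zeros(X.shape[0])                 # dual parameter vector a
        for _ in range(self.n_iter):
            y_hat = sigmoid(K @ self.a)               # predicted probabilities on training data
            self.a -= self.lr * (K @ (y_hat - y))     # gradient step: grad = K (y_hat - y)
        return self

    def predict_proba(self, X):
        # z = a^T Phi(X_train)^T phi(x), evaluated via kernels against the training points.
        K_test = rbf_kernel(X, self.X_train, self.gamma)
        return sigmoid(K_test @ self.a)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)

# Illustrative usage on a small synthetic dataset (assumed, not from the course).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # labels that are not linearly separable
    model = KernelLogisticRegression(gamma=1.0, lr=0.01, n_iter=2000).fit(X, y)
    print("Training accuracy:", np.mean(model.predict(X) == y))
```

At prediction time, the score for a new point only requires kernel evaluations against the training points, mirroring $z = \bold a^T\Phi(X)^T\phi(\bold x)$ from the derivation above.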
