
Kernel Logistic Regression

Explore how kernel logistic regression enhances standard logistic regression to handle complex, non-linearly separable data. Learn to implement this method using kernel functions, the Gram matrix, and gradient descent, enabling probabilistic classification for nonlinear problems.


We can kernelize logistic regression just like other linear models by observing that the parameter vector $\bold w$ is a linear combination of the feature vectors $\Phi(X)$, that is:

$$\bold w = \Phi(X) \bold a$$

Here, $\bold a$ is the dual parameter vector, and the loss function now depends on $\bold a$ rather than on $\bold w$.
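Substituting $\bold w = \Phi(X)\bold a$ into the linear model gives $\bold w^T \phi(\bold x) = \bold a^T \Phi(X)^T \phi(\bold x)$, so predictions only require inner products between feature vectors, which is exactly what the Gram matrix $K = \Phi(X)^T \Phi(X)$ collects. Below is a minimal NumPy sketch of this idea; the RBF kernel, the value of `gamma`, and the synthetic data are illustrative assumptions, not part of the lesson's formulas:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """RBF (Gaussian) kernel between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)

# n training points with d features, and a dual parameter vector a of length n
X = np.random.randn(100, 2)
a = np.zeros(100)

# Gram matrix K[i, j] = phi(x_i)^T phi(x_j); then z = K a gives all logits at once,
# without ever forming Phi(X) explicitly
K = rbf_kernel(X, X)
z = K @ a
```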

Minimizing the BCE Loss

To minimize the BCE loss, we need to find the dual parameters $\bold a$ that result in the smallest loss value. The BCE loss is defined as:

$$
\begin{align*}
L_{BCE}(\bold{a}) &= \sum_{i=1}^n L_i \\
L_i &= -\big(y_i\log(\hat y_i) + (1-y_i)\log(1-\hat y_i)\big) \\
\hat{y}_i &= \sigma(z_i) = \frac{1}{1+e^{-z_i}} \\
z_i &= \bold a^T \Phi(X)^T \phi(\bold{x}_i)
\end{align*}
$$
...
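Since $z_i = \bold a^T \Phi(X)^T \phi(\bold x_i) = (K\bold a)_i$, the loss and its gradient with respect to $\bold a$ can be written entirely in terms of the Gram matrix: the gradient is $K(\hat{\bold y} - \bold y)$. The sketch below shows one way to minimize the loss by plain gradient descent, assuming $K$ has already been computed (as in the earlier sketch) and that the labels `y` are 0/1; the learning rate, iteration count, and the `eps` guard against $\log(0)$ are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(a, K, y, eps=1e-12):
    """BCE loss for kernel logistic regression with dual parameters a."""
    y_hat = sigmoid(K @ a)          # z_i = a^T Phi(X)^T phi(x_i) = (K a)_i
    return -np.sum(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))

def bce_gradient(a, K, y):
    """Gradient of the BCE loss with respect to a: K (y_hat - y)."""
    y_hat = sigmoid(K @ a)
    return K @ (y_hat - y)

def fit(K, y, lr=0.01, n_iters=1000):
    """Plain gradient descent on the dual parameters."""
    a = np.zeros(K.shape[0])
    for _ in range(n_iters):
        a -= lr * bce_gradient(a, K, y)
    return a

# Illustrative usage with the Gram matrix K from the earlier sketch and 0/1 labels y:
# a_hat = fit(K, y)
# probs = sigmoid(K @ a_hat)   # predicted probabilities for the training points
```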