Kernel Logistic Regression

Learn how to implement kernel logistic regression along with its derivation.


We can kernelize logistic regression just like other linear models by observing that the parameter vector $\bold w$ is a linear combination of the feature vectors $\Phi(X)$, that is:

$$\bold w = \Phi(X) \bold a$$

Here, $\bold a$ is the dual parameter vector, and the loss function now depends on $\bold a$.
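To see why this substitution kernelizes the model, note that the scores $\Phi(X)^T \bold w = \Phi(X)^T \Phi(X) \bold a = K \bold a$ depend on the features only through inner products, i.e., the Gram matrix $K$. The sketch below (an illustrative NumPy example with a made-up explicit feature map, not part of the original lesson) verifies this equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 samples, 3 raw features (sizes are illustrative assumptions).
X = rng.normal(size=(5, 3))

def feature_map(x):
    # A simple explicit feature map: raw features plus their squares.
    return np.concatenate([x, x**2])

# Phi(X): columns are the feature vectors phi(x_i), shape (d, n).
Phi = np.stack([feature_map(x) for x in X], axis=1)

a = rng.normal(size=5)   # dual parameters, one per training sample
w = Phi @ a              # w = Phi(X) a: primal weights as a combination of features

# Scores two ways: primal (phi(x_i)^T w) vs. dual (K a).
K = Phi.T @ Phi          # Gram matrix K[i, j] = phi(x_i)^T phi(x_j)
primal_scores = Phi.T @ w
dual_scores = K @ a

print(np.allclose(primal_scores, dual_scores))  # True
```

Because the dual scores need only $K$, we can swap in any kernel function $k(\bold x_i, \bold x_j)$ without ever computing $\phi$ explicitly.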

Minimizing the BCE Loss

To minimize the BCE loss, we need to find the model parameters $\bold a$ that result in the smallest loss value. The BCE loss is defined as:

$$\begin{align*} L_{BCE}(\bold{a})&=\sum_{i=1}^n L_i\\ L_i &= -(y_i\log(\hat y_i)+(1-y_i)\log(1-\hat y_i)) \\ \hat{y}_i&=\sigma(z_i)=\frac{1}{1+e^{-z_i}} \\ z_i&=\bold a^T \Phi(X)^T\phi(\bold{x}_i) \end{align*}$$ ...
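The definitions above translate directly into code: the scores for all training points are $\bold z = K\bold a$, where $K = \Phi(X)^T\Phi(X)$ is the Gram matrix. Here is a minimal sketch of the loss computation, using an RBF kernel and a tiny synthetic dataset purely as illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(a, K, y):
    """BCE loss in the dual: z = K a, y_hat = sigmoid(z).
    K is the Gram matrix with K[i, j] = phi(x_i)^T phi(x_j)."""
    z = K @ a
    y_hat = sigmoid(z)
    # Clip to avoid log(0); a purely numerical safeguard.
    y_hat = np.clip(y_hat, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Tiny illustrative example (data and kernel choice are assumptions).
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
y = np.array([0, 0, 0, 1, 1, 1])

# RBF kernel: K[i, j] = exp(-0.5 * ||x_i - x_j||^2)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists)

a = np.zeros(6)  # with a = 0, every y_hat = 0.5, so the loss is n * log(2)
print(bce_loss(a, K, y))  # ≈ 4.1589
```

With all dual parameters at zero, every prediction is $0.5$ and the loss equals $n\log 2$, a useful sanity check before running an optimizer such as gradient descent on $\bold a$.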