Kernel Linear Regression

Learn to implement kernel linear regression for a single target.

Single target example

It’s possible to reformulate generalized linear regression to incorporate the kernel trick. For example, the loss function $L(\bold w)$ for generalized linear regression with a single target is as follows:

$$L(\bold w)= \|\phi(X) \bold w-\bold y\|_2^2 + \lambda \|\bold w\|_2^2$$

Note:

$$\bold w^T\bold w = \|\bold w\|_2^2$$
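
To make this concrete, here is a minimal NumPy sketch of the loss. The feature map `phi` below (a degree-2 polynomial expansion) is a hypothetical example, not one prescribed by the lesson:

```python
import numpy as np

def phi(X):
    # Hypothetical feature map: append squared features to the raw inputs.
    return np.hstack([X, X ** 2])

def ridge_loss(w, X, y, lam):
    # L(w) = ||phi(X) w - y||_2^2 + lambda * ||w||_2^2
    residual = phi(X) @ w - y
    return residual @ residual + lam * (w @ w)
```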

Setting the derivative of the loss with respect to $\bold w$ to $\bold 0$ results in the following:

$$\begin{align*} & \phi(X)^T(\phi(X)\bold w-\bold y)+\lambda \bold w = \bold 0 \\ & \bold w = -\frac{1}{\lambda}\phi(X)^T(\phi(X)\bold w-\bold y) \\ & \bold w = \phi(X)^T\bold a \end{align*}$$

Here, $\bold a=-\frac{1}{\lambda}(\phi(X)\bold w-\bold y)$.
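
As a quick sanity check, the sketch below fits $\bold w$ with the primal closed form and verifies numerically that it equals $\phi(X)^T\bold a$; the random data and the identity-like feature matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 3))  # stands in for phi(X): n=5 samples, 3 features
y = rng.normal(size=5)
lam = 0.5

# Primal closed form: w = (phi(X)^T phi(X) + lambda I)^{-1} phi(X)^T y
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(3), Phi.T @ y)

# a = -(1/lambda)(phi(X) w - y); at the optimum, w = phi(X)^T a
a = -(Phi @ w - y) / lam
assert np.allclose(w, Phi.T @ a)
```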

Reparameterization

We can now reparameterize the loss function with the parameter vector $\bold a$ by replacing $\bold w$ with $\phi(X)^T\bold a$. Writing $K=\phi(X)\phi(X)^T$ for the Gram matrix, with entries $K_{ij}=k(\bold x_i,\bold x_j)=\phi(\bold x_i)^T\phi(\bold x_j)$, the loss becomes the following:

$$\begin{align*} L(\bold a)&= \|\phi(X) \phi(X)^T\bold a-\bold y\|_2^2 + \lambda \|\phi(X)^T\bold a\|_2^2 \\ &= \|\phi(X) \phi(X)^T\bold a-\bold y\|_2^2 + \lambda \bold a^T \phi(X)\phi(X)^T\bold a \\ &= \|K\bold a-\bold y\|_2^2 + \lambda \bold a^T K\bold a \end{align*}$$
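
Evaluated in code, this loss needs only the Gram matrix, never $\phi(X)$ itself. A minimal sketch, assuming `K`, `a`, and `y` are NumPy arrays:

```python
def kernel_loss(a, K, y, lam):
    # L(a) = ||K a - y||_2^2 + lambda * a^T K a
    residual = K @ a - y
    return residual @ residual + lam * (a @ K @ a)
```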

Closed-form solution

Setting the derivative of the loss $L(\bold a)$ with respect to $\bold a$ to $\bold 0$ results in the following:

$$K^T(K\bold a - \bold y)+\lambda K \bold a = \bold 0$$

Because the Gram matrix $K$ is symmetric, that is, $K^T=K$, the above equation can be rewritten as follows:

$$\begin{align*} & K(K\bold a - \bold y)+\lambda K \bold a = \bold 0 \\ & K(K\bold a - \bold y + \lambda \bold a) = \bold 0 \\ & (K + \lambda I)\bold a = \bold y \\ & \bold a = (K + \lambda I)^{-1} \bold y \end{align*}$$

Canceling $K$ in the third step assumes $K$ is invertible; in any case, $\bold a = (K + \lambda I)^{-1}\bold y$ satisfies the stationarity condition, and $K + \lambda I$ is guaranteed to be invertible for $\lambda > 0$ because $K$ is positive semi-definite.
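
In code, fitting reduces to a single linear solve. The sketch below applies `np.linalg.solve` to $(K + \lambda I)\bold a = \bold y$ rather than forming the inverse explicitly, which is cheaper and more numerically stable:

```python
import numpy as np

def fit_dual(K, y, lam):
    # Solve (K + lambda I) a = y for the dual coefficients a.
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)
```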

Prediction

Once $\bold a$ is computed, the prediction $\hat y_t$ on an input vector $\bold x_t$ can be made as follows:

$$\begin{align*} \hat y_t &= \bold w^T \phi(\bold x_t)\\ &= \bold a^T \phi(X) \phi(\bold x_t) \\ &= \begin{bmatrix}a_1 & a_2 & \dots & a_n\end{bmatrix} \begin{bmatrix}\phi(\bold x_1)^T\phi(\bold x_t) \\ \phi(\bold x_2)^T\phi(\bold x_t) \\ \vdots \\ \phi(\bold x_n)^T\phi(\bold x_t)\end{bmatrix} \\ &= \begin{bmatrix}a_1 & a_2 & \dots & a_n\end{bmatrix} \begin{bmatrix}k(\bold x_1,\bold x_t) \\ k(\bold x_2,\bold x_t) \\ \vdots \\ k(\bold x_n,\bold x_t)\end{bmatrix} \end{align*}$$
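
Prediction therefore requires only kernel evaluations between the training points and the test point. A sketch, using an RBF kernel as an illustrative choice of $k$:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    # Example kernel: k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    diff = x1 - x2
    return np.exp(-gamma * diff @ diff)

def predict_one(a, X_train, x_t):
    # y_hat_t = a^T [k(x_1, x_t), ..., k(x_n, x_t)]
    k_vec = np.array([rbf_kernel(x_i, x_t) for x_i in X_train])
    return a @ k_vec
```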

Implementation

We now implement generalized linear regression for a single target using the kernel trick.
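
The lesson's original code isn't reproduced here, so the following is a self-contained sketch that assembles the pieces above, again assuming an RBF kernel and NumPy as illustrative choices:

```python
import numpy as np

class KernelRidgeRegression:
    """Kernel linear regression (single target) via the dual closed form."""

    def __init__(self, lam=1.0, gamma=1.0):
        self.lam = lam      # regularization strength lambda
        self.gamma = gamma  # RBF kernel width parameter

    def _kernel(self, A, B):
        # Pairwise RBF kernel matrix: K[i, j] = exp(-gamma * ||A_i - B_j||^2)
        sq_dists = (
            np.sum(A ** 2, axis=1)[:, None]
            + np.sum(B ** 2, axis=1)[None, :]
            - 2 * A @ B.T
        )
        return np.exp(-self.gamma * sq_dists)

    def fit(self, X, y):
        # a = (K + lambda I)^{-1} y
        self.X_train = X
        K = self._kernel(X, X)
        self.a = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def predict(self, X_new):
        # y_hat = K(X_new, X_train) a
        return self._kernel(X_new, self.X_train) @ self.a

# Example usage on a toy nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

model = KernelRidgeRegression(lam=0.1, gamma=0.5).fit(X, y)
y_hat = model.predict(X)
```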
