
Generalized Linear Regression

Learn to implement closed-form solutions, vectorization, and visualization for generalized linear regression.

We’ve previously learned that while standard linear models are powerful, many real-world relationships are non-linear. The generalized linear model (GLM) framework solves this by introducing a basis function $\phi(\mathbf{x})$ that transforms the input features into a higher-dimensional space, allowing a linear model to fit a complex, non-linear curve to the data.

In this lesson, we move from conceptual understanding to practical implementation by exploring closed-form solutions for training generalized linear models.

Single target

The input features $\mathbf{x}_i \in \mathbb{R}^d$ are vectors where each data point has $d$ distinct, real-valued features (e.g., size, age). The target variable $y_i \in \mathbb{R}$ is a single, continuous, real-valued number (e.g., house price) that the model aims to predict, making this a single-target regression problem.

The model $f_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})$ is a generalized linear model (GLM). It achieves non-linear modeling by first applying a basis function $\phi(\mathbf{x})$ (the mapping) to transform the input features, and then making the prediction via a linear dot product with the learned parameters $\mathbf{w}$.
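As a concrete illustration, here is a minimal NumPy sketch of this prediction step, assuming a polynomial basis $\phi(x) = (1, x, x^2)$ for a one-dimensional input; the basis choice and the function names (`phi`, `predict`) are illustrative, not prescribed by the lesson.

```python
import numpy as np

def phi(x):
    """Polynomial basis: map a scalar input x to (1, x, x^2)."""
    return np.array([1.0, x, x**2])

def predict(w, x):
    """GLM prediction f_w(x) = w^T phi(x)."""
    return w @ phi(x)

# Example: with w = (2, -1, 0.5), the model predicts
# f_w(3) = 2 - 1*3 + 0.5*9 = 3.5
w = np.array([2.0, -1.0, 0.5])
print(predict(w, 3.0))  # 3.5
```

Note that the prediction is still linear in $\mathbf{w}$; the non-linearity in $x$ comes entirely from the basis function.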

Try this quiz to review what you’ve learned so far.

1. In the context of the function $f_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})$, if $\mathbf{x} \in \mathbb{R}^d$, $\phi(\mathbf{x}) \in \mathbb{R}^m$, and $\mathbf{w} \in \mathbb{R}^k$, then what is $k$?

A. $k = d$

B. $k = m$

Answer: B. The dot product $\mathbf{w}^T\phi(\mathbf{x})$ is only defined when $\mathbf{w}$ and $\phi(\mathbf{x})$ have the same dimension, so $k = m$.

The function $f_{\mathbf{w}}(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x})$ successfully defines the structure of our generalized linear model (GLM) for any given input $\mathbf{x}$. However, this model structure is useless until we determine the ideal values for the parameter vector $\mathbf{w}$. These parameters must be chosen so that the model’s predictions best match the true target values in our training dataset $D$.

To quantify how well a given set of parameters $\mathbf{w}$ performs, we use a loss function $L(\mathbf{w})$. This function measures the total error between the model’s predictions and the actual observed values across all $n$ data points. The $\mathbf{w}$ that provides the best fit is the one that minimizes this loss.

The optimal parameters $\mathbf{w}^*$ can be determined by minimizing a regularized squared loss as follows:

$$\mathbf{w}^* = \argmin_{\mathbf{w}}\bigg\{\sum_{i=1}^n \big(\mathbf{w}^T\phi(\mathbf{x}_i) - y_i\big)^2 + \lambda\, \mathbf{w}^T\mathbf{w}\bigg\}$$

Here, $\sum_{i=1}^n (\mathbf{w}^T\phi(\mathbf{x}_i) - y_i)^2$ is the squared error (or data loss) term, and $\lambda \mathbf{w}^T\mathbf{w}$ is the L2 regularization term. Their sum, $\sum_{i=1}^n (\mathbf{w}^T\phi(\mathbf{x}_i) - y_i)^2 + \lambda \mathbf{w}^T\mathbf{w}$, is the complete regularized squared loss function, denoted by $L(\mathbf{w})$.
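Because this loss is quadratic in $\mathbf{w}$, setting its gradient to zero yields the standard closed-form (ridge regression) solution $\mathbf{w}^* = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{y}$, where $\Phi$ is the $n \times m$ design matrix whose $i$-th row is $\phi(\mathbf{x}_i)^T$. The NumPy sketch below implements this directly; the function names and the toy data are illustrative, not from the lesson.

```python
import numpy as np

def fit_glm(Phi, y, lam):
    """Closed-form ridge solution: w* = (Phi^T Phi + lam*I)^(-1) Phi^T y.

    Phi : (n, m) design matrix with rows phi(x_i)^T
    y   : (n,) target vector
    lam : regularization strength lambda >= 0
    """
    m = Phi.shape[1]
    # Solve the linear system rather than forming an explicit inverse,
    # which is cheaper and numerically more stable.
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)

def loss(w, Phi, y, lam):
    """Regularized squared loss L(w) from the equation above."""
    residuals = Phi @ w - y
    return residuals @ residuals + lam * (w @ w)

# Toy example: noisy quadratic data with basis phi(x) = (1, x, x^2).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 2 - x + 0.5 * x**2 + rng.normal(scale=0.3, size=50)
Phi = np.column_stack([np.ones_like(x), x, x**2])

w_star = fit_glm(Phi, y, lam=0.1)
print(w_star)                        # close to (2, -1, 0.5)
print(loss(w_star, Phi, y, lam=0.1))
```

The same `fit_glm` works for any basis: only the construction of `Phi` changes when you swap in a different $\phi$.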

Suppose we want to predict the price of a house based on its size, number of bedrooms, and age. We can use generalized linear regression to model the relationship between these input features and the target variable (the house price). The model can be defined as: ...