A generalized linear model is a statistical model that extends the general linear model. The general linear model is applied to continuous response variables with continuous and/or categorical predictors. It includes the conventional regression models used for response variables with normally distributed errors.
On the other hand, a generalized linear model (GLM) allows the response variable to follow a non-normal distribution, such as a binomial or Poisson distribution.
GLM generalizes the relation between the response variable and the predictors to a linear, additive relation on the scale of a link function.
GLM uses several components to generalize the relationship between the response variable and the predictors to a linear and additive relation. One of the main components is a link function. In addition to that, certain assumptions are made for this model.
The generalized linear model is used for non-linear, non-normal data. It makes the following assumptions:
The data needs to be random and independent.
Random variables should follow the same probability distribution.
The response variable follows an exponential family distribution, such as the normal, binomial, or Poisson distribution.
The response and explanatory variables do not need to have a linear relationship. Instead, a linear relationship is established between the transformed mean of the response variable (after applying the link function) and the explanatory variables.
We can also use transformed explanatory variables to build the GLM, such as the log or square of the original variable.
Error variance of the response variable can vary with the independent variables.
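As a minimal sketch of the point about transformed explanatory variables, the Python snippet below builds one row of a design matrix from a single positive predictor. The function name `design_row` and the choice of columns are illustrative assumptions, not part of any library. The idea is only that the model stays linear in the coefficients even when the columns are non-linear functions of x.

```python
import math

def design_row(x):
    """Return the columns [1, x, log(x), x^2] for one positive observation x.

    The column layout is a hypothetical choice made for illustration.
    """
    if x <= 0:
        raise ValueError("log(x) requires x > 0")
    return [1.0, x, math.log(x), x ** 2]

# Each transformed column simply gets its own coefficient in the linear predictor.
rows = [design_row(x) for x in (1.0, 2.0, 4.0)]
for row in rows:
    print(row)
```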
To develop a linear relationship between the response variable and the predictors, GLM uses three components:
A linear predictor
A link function
A probability distribution
Well-known conventional statistical models, such as linear, logistic, and Poisson regression, are special cases of this framework.
A linear predictor, also known as the systematic component, is the linear combination of the explanatory variables (x1, x2, x3, ..., xi) and the regression coefficients (β0, β1, ..., βi):

$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i$$
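The linear combination described above can be sketched in a few lines of Python. The function name `linear_predictor` and the example coefficients are assumptions made for illustration only.

```python
def linear_predictor(x, beta0, betas):
    """Return eta = beta0 + sum_j betas[j] * x[j] for one observation."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    return beta0 + sum(b * xj for b, xj in zip(betas, x))

# Two explanatory variables with made-up coefficients.
eta = linear_predictor([2.0, 3.0], beta0=1.0, betas=[0.5, -1.0])
print(eta)  # 1.0 + 0.5*2.0 - 1.0*3.0 = -1.0
```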
Probability distribution, also known as the random component, refers to the distribution that the response variable is assumed to follow.
The link function is usually written as g(μ), where μ is the mean of the response variable, and it is set equal to the linear predictor: g(μ) = η. It specifies how the mean of the response variable is related to the linear combination of explanatory variables. The choice of link function depends on the probability distribution of the response variable.
| Probability Distribution | Link Function |
| --- | --- |
| Normal | Identity |
| Binomial | Logit (its inverse is the sigmoid) |
| Poisson | Log |
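To make the pairings in the table concrete, here is a small sketch of the three link functions together with their inverses. The dictionary name `LINKS` and the function names are hypothetical; real GLM libraries organize this differently.

```python
import math

def identity(mu):
    return mu

def logit(mu):
    # Log odds of a probability strictly between 0 and 1.
    return math.log(mu / (1.0 - mu))

def log_link(mu):
    # Defined for a strictly positive mean.
    return math.log(mu)

def inv_identity(eta):
    return eta

def inv_logit(eta):
    # Sigmoid: maps any real number back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    return math.exp(eta)

# Distribution name -> (link g, inverse link g^-1), mirroring the table.
LINKS = {
    "normal": (identity, inv_identity),
    "binomial": (logit, inv_logit),
    "poisson": (log_link, inv_log),
}

for name, (g, g_inv) in LINKS.items():
    mu = 0.3
    print(name, g(mu), g_inv(g(mu)))  # the round trip recovers mu up to rounding
```

The link maps the mean onto the scale of the linear predictor, and the inverse link maps a linear predictor back to a valid mean.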
Various models are used in GLM according to the probability distribution of the response variable. Numerous probability distributions and link functions are available for this purpose. However, only the three most common models are discussed in this answer: linear, logistic, and Poisson regression.
Probability distribution: In linear regression, the response variable follows a normal distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor.
Simple linear regression is used for one predictor. In the case of two or more predictors, multiple regression is used.
Link function: The identity function is used as the link function, so the mean of the response variable is modeled directly by the linear predictor:

$$g(\mu) = \mu = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i$$
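For the normal distribution with the identity link, maximum likelihood estimation coincides with ordinary least squares. The sketch below fits the one-predictor case in closed form. The function name `fit_simple_linear` and the toy data are assumptions made for illustration.

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of mu = beta0 + beta1 * x for a single predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers those values.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta0, beta1 = fit_simple_linear(xs, ys)
print(beta0, beta1)
```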
Probability distribution: In logistic regression, the response variable follows a binomial distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor.
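Link function: The logit link function, also known as the log odds, is used. It maps a probability μ between 0 and 1 onto the whole real line, and its inverse (the sigmoid function) maps the linear predictor back to a probability between 0 and 1:

$$g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i$$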
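As a hedged illustration of the binomial case with a logit link, the sketch below fits a one-predictor logistic regression with Newton-Raphson steps. The function names, the fixed iteration count, and the toy data are all assumptions; a production fit would add a convergence check and handle perfectly separable data.

```python
import math

def sigmoid(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(xs, ys, n_iter=25):
    """Fit logit(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = sigmoid(b0 + b1 * x)
            w = mu * (1.0 - mu)      # binomial variance at the current fit
            g0 += y - mu             # score for the intercept
            g1 += (y - mu) * x       # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy binary outcomes that are not perfectly separable by x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
probs = [sigmoid(b0 + b1 * x) for x in xs]
print(b0, b1)
print(probs)  # fitted probabilities, each strictly between 0 and 1
```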
Probability distribution: In Poisson regression, the response variable is a count that follows a Poisson distribution.
Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables can also be used in the linear combination, such as the log or square of a predictor.
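Link function: The log link function is used, which keeps the predicted mean positive:

$$g(\mu) = \log(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_i x_i$$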
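The Poisson case with a log link can be sketched the same way. The snippet below is a minimal Newton-Raphson fit for a single predictor. The function name `fit_poisson`, the starting values, the fixed iteration count, and the toy counts are assumptions for illustration, not a reference implementation.

```python
import math

def fit_poisson(xs, ys, n_iter=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = math.log(sum(ys) / len(ys))  # start at the log of the mean count
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)  # inverse log link
            g0 += y - mu                # score for the intercept
            g1 += (y - mu) * x          # score for the slope
            h00 += mu                   # Poisson variance equals the mean
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy counts that grow roughly exponentially with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(xs, ys)
fitted = [math.exp(b0 + b1 * x) for x in xs]
print(b0, b1)
print(fitted)  # fitted means are always positive under the log link
```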
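In summary, GLM uses its three components (a linear predictor, a link function, and a probability distribution) to generalize the relationship between the predictors and the response variable. Linear, logistic, and Poisson regression are familiar special cases.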