What is a generalized linear model (GLM)?

A generalized linear model (GLM) is a statistical model that extends the general linear model. The general linear model is applied to continuous response variables with continuous and/or categorical predictors. It includes the conventional regression models used for response variables with a normal distribution (a continuous probability distribution for a real-valued random variable, characterized by its mean and standard deviation), such as linear regression, which models the relation between a response variable and one or more explanatory variables as linear.

In contrast, a GLM allows the response variable to follow a non-normal distribution, such as a binomial distribution (a discrete probability distribution over independent trials, each with two outcomes, success or failure, and a constant probability of success). The following image displays a response outcome with an exponential distribution:

GLM generalizes the relation between the response variables and predictors to a linear additive relation that looks similar to the diagram below:

GLM uses several components to generalize the relationship between the response variable and the predictors to a linear and additive relation. One of the main components is a link function. In addition to that, certain assumptions are made for this model.

Assumptions of GLM

The generalized linear model is used for data that is non-linear, heteroscedastic (the variance changes with the mean, for example, increasing as the mean increases), and not normally distributed. It rests on the following assumptions:

  • The data needs to be random and independent.

  • Random variables should follow the same probability distribution.

  • The response variable follows a distribution from the exponential family, such as a binomial or a Poisson distribution. (The exponential distribution, which is the probability distribution of the time between events in a Poisson point process, is one member of this family.)

  • The response and explanatory variables need not have a linear relationship. Instead, a linear relationship is assumed between the transformed expected response (after applying the link function) and the explanatory variables.

  • We can also use transformed explanatory variables to build the GLM, such as the log or square of the original variable.

  • Error variance of the response variable can vary with the independent variables.
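As a rough illustration of the last assumption, the sketch below shows how the variance of the response depends on its mean for three common distributions. The function name `response_variance` and the default `sigma2` are hypothetical and chosen only for this example; the binomial case assumes a single trial.

```python
def response_variance(family, mu, sigma2=1.0):
    """Return Var(Y) as a function of the mean mu for a few families.

    sigma2 is an assumed constant variance used only for the normal case.
    """
    if family == "normal":
        return sigma2             # constant: does not depend on the mean
    if family == "binomial":
        return mu * (1.0 - mu)    # single trial with success probability mu
    if family == "poisson":
        return mu                 # variance equals the mean
    raise ValueError(f"unknown family: {family}")


# The Poisson variance grows with the mean, so the data is heteroscedastic.
for mean in (1.0, 4.0, 9.0):
    print(mean, response_variance("poisson", mean))
```

Only the normal case has a variance that stays the same as the mean changes, which is why non-normal responses call for a GLM rather than ordinary linear regression.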

How does GLM work?

To develop a linear relationship between the response variable and the predictors, GLM uses three components:

  • A linear predictor

  • A link function

  • A probability distribution

Well-known conventional statistical models, such as linear, logistic, and Poisson regression, arise as particular choices of these components.

Linear predictor

A linear predictor, also known as the systematic component, is the linear combination of the explanatory variables (x_1, x_2, ..., x_p) and the regression coefficients (β_0, β_1, ..., β_p):

$$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$
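The linear predictor can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: the function name `linear_predictor` and the coefficient and feature values are hypothetical.

```python
def linear_predictor(coefficients, features):
    """Return eta = beta_0 + beta_1*x_1 + ... + beta_p*x_p.

    coefficients[0] is the intercept beta_0; the remaining entries
    pair up one-to-one with the feature values.
    """
    if len(coefficients) != len(features) + 1:
        raise ValueError("need one coefficient per feature plus an intercept")
    intercept, slopes = coefficients[0], coefficients[1:]
    return intercept + sum(b * x for b, x in zip(slopes, features))


# Hypothetical values chosen only for illustration.
beta = [1.0, 2.0, -0.5]   # beta_0, beta_1, beta_2
x = [3.0, 4.0]            # x_1, x_2
eta = linear_predictor(beta, x)
print(eta)  # 1.0 + 2.0*3.0 - 0.5*4.0 = 5.0
```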

Probability distribution

The probability distribution, also known as the random component, is the distribution that the response variable Y follows. Distributions that Y can follow include the normal, binomial, multinomial (a discrete probability distribution over independent trials with more than two possible outcomes, each with a constant probability), and Poisson distributions.

Link function

The link function is usually written as g. It specifies how the mean of the response variable, μ, is related to the linear predictor η, so that g(μ) = η. It is chosen according to the probability distribution of the response variable.

Probability Distribution    Link Function
Normal                      Identity
Binomial                    Logit (its inverse is the sigmoid)
Poisson                     Log
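The pairings above can be sketched in Python using only the standard library. The dictionary `LINKS` and the helper `mean_from_eta` are hypothetical names used for illustration. Each entry stores a link function g together with its inverse, which maps a linear predictor η back to a mean μ.

```python
import math


def identity(value):
    return value


def logit(mu):
    """Log odds of a probability mu in (0, 1)."""
    return math.log(mu / (1.0 - mu))


def sigmoid(eta):
    """Inverse of the logit: maps any real eta to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


# (link g, inverse link) for each distribution discussed in this answer.
LINKS = {
    "normal": (identity, identity),
    "binomial": (logit, sigmoid),
    "poisson": (math.log, math.exp),
}


def mean_from_eta(family, eta):
    """Apply the inverse link to turn a linear predictor into a mean."""
    _, inverse_link = LINKS[family]
    return inverse_link(eta)


print(mean_from_eta("normal", 2.5))    # 2.5
print(mean_from_eta("binomial", 0.0))  # 0.5
print(mean_from_eta("poisson", 0.0))   # 1.0
```

Applying a link and then its inverse returns the original mean, which is what lets a GLM fit a straight line on the η scale and still report predictions on the scale of the response.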

Different models used in GLM

Different models arise in GLM depending on the probability distribution of the response variable. Many probability distributions and link functions are available, but this answer covers only three of the most common ones.

Linear regression

  • Probability distribution: The response variable Y follows a continuous normal distribution.

  • Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables, such as log(x), can also be used in the linear combination:

    $$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

Simple linear regression is used for one predictor. In the case of two or more predictors, multiple regression is used.

  • Link function: The identity function is used as the link function, so the mean of the response equals the linear predictor:

    $$g(\mu) = \mu = \eta$$
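For a normal response with the identity link, the maximum likelihood estimates of the coefficients coincide with ordinary least squares. The sketch below fits a simple linear regression with the closed-form least squares formulas. The function name `fit_simple_linear_regression` and the data are hypothetical and chosen so that the points lie exactly on a line.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_simple_linear_regression(xs, ys)

# With the identity link, the predicted mean is the linear predictor itself.
predicted_mean = b0 + b1 * 4.0
print(b0, b1, predicted_mean)  # 1.0 2.0 9.0
```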

Binary logistic regression

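  • Probability distribution: The response variable Y follows a binomial distribution.

  • Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables, such as log(x), can also be used in the linear combination:

    $$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

  • Link function: The logit link function, also known as the log odds, is used. It maps the success probability μ, which lies between 0 and 1, onto the whole real line. Its inverse, the sigmoid function, maps the linear predictor back to a probability between 0 and 1:

    $$g(\mu) = \log\left(\frac{\mu}{1 - \mu}\right) = \eta$$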
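A binary logistic regression with one predictor can be fitted with Newton's method, sketched below using only the standard library. This is a minimal illustration under stated assumptions, not a production implementation: the function name `fit_logistic`, the fixed iteration count, and the data are hypothetical, and the data is deliberately not perfectly separable so that the estimates stay finite.

```python
import math


def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(xs, ys, iterations=25):
    """Fit P(Y=1) = sigmoid(b0 + b1*x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)          # binomial variance of one trial
            g0 += y - p                # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                   # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g and take the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical binary outcomes that overlap, so the classes are not separable.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)
probabilities = [sigmoid(b0 + b1 * x) for x in xs]
print(b0, b1)
print(probabilities)
```

The fitted probabilities always fall strictly between 0 and 1 because they pass through the sigmoid, even though the linear predictor itself is unbounded.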

Poisson regression

  • Probability distribution: The response variable Y follows a Poisson distribution.

  • Linear predictor: Predictors form a linear combination with the parameters. Predictor variables can be continuous or categorical. In addition, transformed variables, such as log(x), can also be used in the linear combination:

    $$\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

  • Link function: The log link function is used, which keeps the predicted mean count positive:

    $$g(\mu) = \log(\mu) = \eta$$
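A Poisson regression with one predictor can be fitted in the same way, again as a hedged sketch rather than a definitive implementation. The function name `fit_poisson`, the starting values, the fixed iteration count, and the count data are all hypothetical.

```python
import math


def fit_poisson(xs, ys, iterations=25):
    """Fit E[Y] = exp(b0 + b1*x) by Newton's method."""
    # Start from the intercept-only fit: the log of the average count.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)  # inverse of the log link
            g0 += y - mu                # gradient of the log-likelihood
            g1 += (y - mu) * x
            h00 += mu                   # Poisson variance equals the mean
            h01 += mu * x
            h11 += mu * x * x
        # Solve the 2x2 system H * step = g and take the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts that grow roughly exponentially with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(xs, ys)
fitted_means = [math.exp(b0 + b1 * x) for x in xs]
print(b0, b1)
print(fitted_means)
```

Because the mean is obtained by exponentiating the linear predictor, every fitted mean is positive, and each unit increase in x multiplies the expected count by exp(b1).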

Conclusion

GLM combines its three components (a linear predictor, a link function, and a probability distribution) to extend the linear relationship between predictors and response variables to responses that are not normally distributed.

Copyright ©2024 Educative, Inc. All rights reserved