Introduction to Linear Model Analysis

Let’s get a brief overview of linear model analysis for group comparisons.

R packages

We’ll use the following R packages in this chapter:

  • ggplot2
  • ggfortify
  • SMPracticals
  • arm
  • DAAG

Linear analysis

In the previous chapter, we conducted a simple analysis of Darwin’s maize data using R as a calculator to work out confidence intervals ‘by hand’. This is a simple way to learn about analysis and is good for demystifying the process, but it is inefficient. Instead, we want to take advantage of the more sophisticated functions that R provides for linear-model analysis. We will explore those functions by repeating and extending the analysis of Darwin’s maize data.

A linear model analysis for comparing groups

R uses the general lm() function to fit linear models. We began our analysis of Darwin’s maize data by estimating the grand mean height of all 30 plants, ignoring the pollination treatments. We can do the same with the lm() function, as shown below:

ls0 <- lm(formula = height ~ 1, data = darwin)

Usually, we want to fit more than one model. Let’s create an R object for each model and give it a name that makes the models easy to compare, extract information from, and so on. For example, here we’ve called the model above ls0, short for least-squares model 0, because linear-model analyses use the least squares method.

The first argument of the lm() function, the model formula, specifies that we want to analyze the response variable (height) as a function of an explanatory variable, using the tilde (~) symbol. To start with the simplest possible model, we ignored the pollination-type treatment. Instead, 1 indicates that we want to estimate an ‘intercept’. It’s important to note that we need to have something to the right of the tilde; we can’t just leave it blank. When nothing else is included in the linear-model formula apart from the 1, the intercept is the grand mean.
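To make the last point concrete, here is a minimal sketch showing that the intercept of an intercept-only model matches the mean of the response. The data frame toy and its values are made up for illustration; they are not Darwin’s measurements.

```r
# Toy data frame with made-up values (NOT Darwin's maize data)
toy <- data.frame(height = c(12, 15, 18, 21, 24))

# Intercept-only model: the 1 asks lm() to estimate a single intercept
toy_ls0 <- lm(formula = height ~ 1, data = toy)

# Extract the fitted intercept and compute the grand mean directly
intercept <- unname(coef(toy_ls0))
grand_mean <- mean(toy$height)

# The two values agree (up to floating-point tolerance)
all.equal(intercept, grand_mean)
```

The same comparison should hold for ls0 and mean(darwin$height), assuming the darwin data frame is loaded as in the previous chapter.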

Note: The lm() function has a data argument for specifying the name of the data frame, which saves us from having to use the with() function or the dataframe$variable notation.
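As a rough illustration of the note above, the three ways of pointing lm() at a variable can be compared side by side. The data frame toy is again a made-up example, and the object names are chosen here only for illustration.

```r
# Toy data frame with made-up values (NOT Darwin's maize data)
toy <- data.frame(height = c(12, 15, 18, 21, 24))

# 1. Using the data argument of lm()
fit_data <- lm(height ~ 1, data = toy)

# 2. Using with() to evaluate the call inside the data frame
fit_with <- with(toy, lm(height ~ 1))

# 3. Using the dataframe$variable notation
fit_dollar <- lm(toy$height ~ 1)

# All three fits give the same intercept estimate
same_with <- all.equal(unname(coef(fit_data)), unname(coef(fit_with)))
same_dollar <- all.equal(unname(coef(fit_data)), unname(coef(fit_dollar)))
same_with
same_dollar
```

The data argument tends to be the tidiest of the three, since the formula stays free of data frame names.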

The display() function, written by statisticians Andrew Gelman, Jennifer Hill, and colleagues, is located in the arm package. This function provides a concise summary of some of the key outputs of linear models. The display() function is a simplified alternative to the base R summary() function, which you can use instead if you don’t have the arm package installed.
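One possible way to handle both cases is sketched below: call display() when the arm package is available and fall back to summary() otherwise. The model is fitted to a made-up data frame (toy) rather than Darwin’s data, and the printed output is not reproduced here.

```r
# Toy data frame with made-up values (NOT Darwin's maize data)
toy <- data.frame(height = c(12, 15, 18, 21, 24))
toy_ls0 <- lm(height ~ 1, data = toy)

if (requireNamespace("arm", quietly = TRUE)) {
  # Concise summary from the arm package
  arm::display(toy_ls0)
} else {
  # Base R fallback when arm is not installed
  print(summary(toy_ls0))
}

# The coefficient table from summary() holds the estimate and its standard error
coef_table <- coef(summary(toy_ls0))
estimate <- unname(coef_table[1, "Estimate"])
std_error <- unname(coef_table[1, "Std. Error"])
```

For an intercept-only model, the standard error of the intercept is the standard deviation of the response divided by the square root of the sample size, which ties this output back to the ‘by hand’ calculations.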
