Introduction to Linear Model Analysis
Let’s get a brief overview of the linear model analysis for group comparisons.
R packages
We’ll use the following R packages in this chapter:
ggplot2
ggfortify
SMPracticals
arm
DAAG
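A minimal setup sketch, assuming the packages above are installed from CRAN. It attaches them and loads the darwin data frame. The explicit data() call is an assumption about how the data is accessed: it copies SMPracticals' darwin into the workspace so that no other attached package can mask it.

```r
# Install once if needed (uncomment the next line):
# install.packages(c("ggplot2", "ggfortify", "SMPracticals", "arm", "DAAG"))

library(ggplot2)      # plotting
library(ggfortify)    # autoplot() methods for fitted model objects such as lm fits
library(SMPracticals) # source of the darwin maize data set
library(arm)          # display() for concise model summaries
library(DAAG)

# Copy Darwin's maize data into the workspace and inspect its structure
data(darwin, package = "SMPracticals")
str(darwin)
```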
Linear analysis
In the previous chapter, we conducted a simple analysis of Darwin’s maize data using R as a calculator to work out confidence intervals ‘by hand’. This is a simple way to learn about analysis and is good for demystifying the process, but it is inefficient. Instead, we want to take advantage of the more sophisticated functions R provides for linear-model analysis. We will explore those functions by repeating and extending the analysis of Darwin’s maize data.
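To illustrate the difference, the sketch below computes a 95% confidence interval for the grand mean height twice: once ‘by hand’ as mean ± t × standard error, and once with confint() on a fitted linear model. It assumes the darwin data comes from the SMPracticals package; the variable names (ci_hand, ci_lm, and so on) are purely illustrative.

```r
data(darwin, package = "SMPracticals")
y <- darwin$height
n <- length(y)

# 'By hand': mean plus/minus the t critical value times the standard error
se      <- sd(y) / sqrt(n)
t_crit  <- qt(0.975, df = n - 1)
ci_hand <- mean(y) + c(-1, 1) * t_crit * se

# With a fitted linear model, confint() carries out the same arithmetic
ls0   <- lm(height ~ 1, data = darwin)
ci_lm <- confint(ls0)   # 1 x 2 matrix: row "(Intercept)", columns "2.5 %" and "97.5 %"

ci_hand
ci_lm
```

The two intervals agree. The advantage of the model-based route is that the same confint() call keeps working unchanged once explanatory variables are added to the formula.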
A linear model analysis for comparing groups
R uses the general lm() function to fit linear models. We began our analysis of Darwin’s maize data by estimating the grand mean height of all 30 plants, ignoring the pollination treatments. We can do the same with the lm() function:
ls0 <- lm(formula = height ~ 1, data = darwin)
Usually, we want to fit more than one model, so it helps to store each fitted model as a named R object; that makes it easy to compare the models, extract information from them, and so on. For example, here we’ve called the model above ls0, short for least-squares model 0, because linear model analyses use the method of least squares.
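To make ‘least squares’ concrete, here is a small sketch showing that the intercept lm() estimates is the value that minimizes the residual sum of squares (RSS). The helper function rss() and the name b_hat are hypothetical, introduced only for this illustration; the data is assumed to come from SMPracticals.

```r
data(darwin, package = "SMPracticals")   # Darwin's maize data (30 plants)

ls0 <- lm(formula = height ~ 1, data = darwin)

# Residual sum of squares for any candidate value b0 of the intercept
rss <- function(b0) sum((darwin$height - b0)^2)

b_hat <- unname(coef(ls0))   # the least-squares estimate from lm()

# Moving away from b_hat in either direction increases the RSS
c(at_estimate = rss(b_hat), minus_1 = rss(b_hat - 1), plus_1 = rss(b_hat + 1))

# A general-purpose one-dimensional minimizer lands on (essentially) the same value
optimize(rss, interval = range(darwin$height))$minimum

# deviance() of an lm fit is its residual sum of squares
deviance(ls0)
```

Because RSS(b0) = RSS(ȳ) + n(b0 − ȳ)², shifting the estimate by 1 in either direction adds exactly n = 30 to the RSS.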
The first argument of the lm() function, the model formula, uses the tilde (~) symbol to specify that we want to analyze the response variable (height) as a function of whatever appears on the right-hand side. To start with the simplest possible model, we ignore the pollination-type treatment; instead, the 1 indicates that we want to estimate only an ‘intercept’. Note that something must appear to the right of the tilde; it can’t be left blank. When the linear-model formula contains nothing apart from the 1, the intercept is the grand mean.
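A quick check of that last claim, assuming the darwin data from SMPracticals: the single coefficient of the intercept-only model matches the grand mean calculated directly with mean().

```r
data(darwin, package = "SMPracticals")

ls0 <- lm(formula = height ~ 1, data = darwin)

coef(ls0)               # a single coefficient, labelled "(Intercept)"
mean(darwin$height)     # the grand mean computed directly

# Compare the two values, ignoring the coefficient's name
all.equal(unname(coef(ls0)), mean(darwin$height))   # TRUE
```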
Note: Look at how the lm() function has an argument (data) for specifying the name of the data frame, which saves us having to use the with() function or the dataframe$variable notation.
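For comparison, the sketch below fits the same intercept-only model in the three ways the note mentions: with the data argument, inside with(), and with dataframe$variable notation. The object names fit_data, fit_with and fit_dollar are illustrative, and the data is assumed to come from SMPracticals.

```r
data(darwin, package = "SMPracticals")

# Three equivalent ways of pointing lm() at the height column
fit_data   <- lm(height ~ 1, data = darwin)     # data argument
fit_with   <- with(darwin, lm(height ~ 1))      # evaluate the call inside the data frame
fit_dollar <- lm(darwin$height ~ 1)             # dataframe$variable notation

# All three give the same intercept estimate
sapply(list(fit_data, fit_with, fit_dollar), coef)
```

The estimates are identical; the data argument simply keeps the formula short, and unlike the dollar notation it keeps plain variable names in the coefficient labels once explanatory variables are added.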
The display() function comes from the arm package. It provides a concise summary of some of the key outputs of linear models. The display() function is a simplified alternative to the base R summary() function, which you can use instead if you don’t have the arm package installed.
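A minimal sketch of that choice in code: it uses arm::display() when the arm package is available and falls back to summary() otherwise, then pulls a few of the same numbers out with base R alone. It assumes the darwin data from SMPracticals; the exact layout of display() output is best checked against the arm documentation.

```r
data(darwin, package = "SMPracticals")

ls0 <- lm(formula = height ~ 1, data = darwin)

# Concise summary from arm if it is installed; otherwise fall back to base R
if (requireNamespace("arm", quietly = TRUE)) {
  arm::display(ls0)
} else {
  print(summary(ls0))
}

# Core numbers extracted directly with base R
coef(summary(ls0))   # estimate, standard error, t value, p-value
sigma(ls0)           # residual standard deviation
```

For the intercept-only model, the residual standard deviation is just the sample standard deviation of height, and the intercept's standard error is that value divided by the square root of 30.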