Interaction Model

Let’s look at the interaction model in multiple regression.

Let’s now quantify the relationship between our outcome variable y and the two explanatory variables using one type of multiple regression model known as an interaction model.

We’ll write out the equations of the two regression lines using the values from a regression table. Before we do this, however, let’s briefly review regression with a categorical explanatory variable x.

Recall how we fit a regression model for countries’ life expectancies as a function of which continent the country was in. In other words, we had a numerical outcome variable y = lifeExp and a categorical explanatory variable x = continent, which had five levels: Africa, the Americas, Asia, Europe, and Oceania. Let’s redisplay the regression table:

Life Expectancy as a Function of Continent

term               estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept              54.8       1.02      53.45        0      52.8      56.8
continentAmericas      18.8       1.80      10.45        0      15.2      22.4
continentAsia          15.9       1.65       9.68        0      12.7      19.2
continentEurope        22.8       1.70      13.47        0      19.5      26.2
continentOceania       25.9       5.33       4.86        0      15.4      36.5

Recall our interpretation of the estimate column. The continent Africa was the “baseline for comparison” group, so the intercept term corresponds to the mean life expectancy for all countries in Africa, i.e., 54.8 years. The other four values of estimate are offsets relative to the baseline group. For example, the offset for the Americas is +18.8 relative to Africa. In other words, the average life expectancy for countries in the Americas is 18.8 years higher, so the mean life expectancy for all countries in the Americas is 54.8 + 18.8 = 73.6 years. The same interpretation holds for Asia, Europe, and Oceania.
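The baseline-plus-offset arithmetic can be sketched in a few lines of R. The numbers are copied by hand from the estimate column of the table; the variable names (intercept, offsets, continent_means) are ours and are only for illustration.

```r
# Estimates copied by hand from the regression table
intercept <- 54.8  # mean life expectancy for the baseline group, Africa
offsets <- c(Americas = 18.8, Asia = 15.9, Europe = 22.8, Oceania = 25.9)

# Each group mean is the baseline mean plus that group's offset
continent_means <- c(Africa = intercept, intercept + offsets)
continent_means
```

The Americas entry reproduces the 54.8 + 18.8 calculation from the text, and the remaining entries apply the same rule to Asia, Europe, and Oceania.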

Going back to our multiple regression model for teaching score using age and gender, we generate the regression table using the same two-step approach. This approach fits the model using the lm() linear model function, and then we apply the get_regression_table() function. This time, however, our model formula won’t be of the form y ~ x, but rather of the form y ~ x1 * x2. In other words, our two explanatory variables x1 and x2 are separated by a * sign:

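# Load the package that provides get_regression_table():
library(moderndive)
# Fit regression model (evals_ch6 is the data frame prepared earlier):
score_model_interaction <- lm(score ~ age * gender, data = evals_ch6)
# Get regression table:
get_regression_table(score_model_interaction)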

Looking at the regression table output of the code above, there are four rows of values in the estimate column. While it’s not immediately apparent, we can use these four values to write out the equations of both lines. Female instructors are the “baseline for comparison” group because the word female comes alphabetically before male. Thus, intercept is the intercept for only the female instructors.
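Two facts behind this table can be checked in base R without the course data. This is a small sketch: the variable names term_labels and gender_levels are ours, and the three-element gender vector is made up for the demonstration.

```r
# The * operator expands to both main effects plus their interaction.
# Together with the intercept, that gives four rows when gender has two levels.
term_labels <- attr(terms(score ~ age * gender), "term.labels")
term_labels    # "age" "gender" "age:gender"

# factor() sorts levels alphabetically by default; lm() treats the first
# level as the baseline, which is why female is the comparison group.
gender_levels <- levels(factor(c("male", "female", "male")))
gender_levels  # "female" "male"
```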

Similarly, the estimate for age is the slope for age for only the female instructors. Thus, the red regression line in the code output below has an intercept of 4.883 and a slope for age of -0.018. Remember that for this data, while the intercept has a mathematical interpretation, it has no practical interpretation because instructors can’t have zero age.
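To make the female instructors’ line concrete, here is a minimal sketch that plugs the two values quoted above into the equation of a line. The helper predict_female() and the example ages are hypothetical and chosen only for illustration.

```r
# Intercept and slope for female instructors, as quoted from the regression table
female_intercept <- 4.883
female_slope <- -0.018

# Hypothetical helper: fitted teaching score for a female instructor of a given age
predict_female <- function(age) {
  female_intercept + female_slope * age
}

# Illustrative ages within a plausible range for instructors
fitted_scores <- predict_female(c(30, 40, 50))
fitted_scores
```

Each additional year of age lowers the fitted score by 0.018, so a ten-year gap corresponds to a drop of 0.18.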

What about the intercept and slope for age of the male instructors in the blue line in the following figure? This is where our notion of offsets comes into play once again.
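The offset idea can be sketched on simulated data, without using the course’s actual estimates. Everything in the data frame sim is made up for illustration, including the “true” lines used to generate it. The coefficient names gendermale and age:gendermale are the ones lm() produces when gender is a factor with levels female and male.

```r
set.seed(1)
n <- 200

# Simulated (not real) instructors
sim <- data.frame(
  age = runif(n, 30, 70),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Made-up lines per group, plus noise
sim$score <- ifelse(sim$gender == "female",
                    5.0 - 0.020 * sim$age,
                    4.5 - 0.005 * sim$age) + rnorm(n, sd = 0.1)

fit <- lm(score ~ age * gender, data = sim)
b <- coef(fit)

# Baseline group (female): read the intercept and slope directly
female_line <- c(intercept = unname(b["(Intercept)"]),
                 slope = unname(b["age"]))

# Other group (male): baseline values plus the corresponding offsets
male_line <- c(intercept = unname(b["(Intercept)"] + b["gendermale"]),
               slope = unname(b["age"] + b["age:gendermale"]))

rbind(female = female_line, male = male_line)
```

In an interaction model, each group’s reconstructed line matches the line we would get by fitting a simple regression to that group alone, which is a handy way to check the offset arithmetic.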

The value for ...