Simple Linear Regression for a Numerical Explanatory Variable

Perform linear regression for a numerical variable in R and learn the principles behind it.

Recall from algebra that the equation of a line is $y = a + b \cdot x$. (Note that the $\cdot$ symbol is equivalent to the * “multiply by” mathematical symbol. We’ll use the $\cdot$ symbol in the rest of this course as it’s more succinct.) The line is defined by two coefficients, $a$ and $b$. The intercept coefficient $a$ is the value of $y$ when $x = 0$. The slope coefficient $b$ for $x$ is the increase in $y$ for every increase of one in $x$. This is also called the rise over run.

However, when defining a regression line, we use a slightly different notation, i.e., the equation of the regression line is $\hat{y} = b_0 + b_1 \cdot x$. The intercept coefficient is $b_0$, so $b_0$ is the value of $\hat{y}$ when $x = 0$. The slope coefficient for $x$ is $b_1$, i.e., the increase in $\hat{y}$ for every increase of one in $x$. Why do we put a hat on top of the $y$? It’s a form of notation commonly used in regression to indicate that $\hat{y}$ is a fitted value, representing the value of $y$ on the regression line for a given $x$ value.
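To make the notation concrete, here is a minimal sketch in R that evaluates the regression line equation for a few values of $x$. The coefficient values and the helper name `fitted_value` are hypothetical, chosen only for illustration; they are not estimates from any data.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- 2    # intercept: the fitted value when x = 0
b1 <- 0.5  # slope: change in the fitted value per one-unit increase in x

# Hypothetical helper: fitted value on the line for a given x
fitted_value <- function(x) b0 + b1 * x

x <- c(0, 1, 2, 3)
y_hat <- fitted_value(x)

y_hat        # fitted values for each x; the first one equals b0
diff(y_hat)  # consecutive differences; each one equals b1
```

Because `x` increases by one at each step, every difference between consecutive fitted values equals the slope, which is the “increase of one in $x$” interpretation in code form.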

Recall that the regression line has a positive slope $b_1$ corresponding to our explanatory $x$ variable bty_avg. This is because, as instructors tend to have higher bty_avg scores, they also tend to have higher teaching evaluation scores. However, what is the numerical value of the slope $b_1$? What about the intercept $b_0$?

We can obtain the values of the intercept $b_0$ and the slope for bty_avg, $b_1$, by outputting a linear regression table. This is done in two steps:

  1. We first fit the linear regression model using the lm() function and save it in score_model.

  2. We get the regression table by applying the get_regression_table() function from the moderndive package to score_model.
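The two steps above can be sketched as follows. This is a minimal sketch under two assumptions that the text does not state: the data frame is taken to be `evals` from the moderndive package, and the outcome column holding the teaching evaluation scores is taken to be `score`. Adjust both names to match your own data.

```r
library(moderndive)  # get_regression_table(); also assumed to supply the evals data

# Step 1: fit the linear regression model and save it in score_model.
# Assumption: the data frame is `evals` and the outcome column is `score`.
score_model <- lm(score ~ bty_avg, data = evals)

# Step 2: produce the regression table from the fitted model.
get_regression_table(score_model)
```

In the resulting table, the estimate for the intercept corresponds to $b_0$ and the estimate for bty_avg corresponds to $b_1$. The formula `score ~ bty_avg` reads as “outcome explained by explanatory variable”, which mirrors $\hat{y} = b_0 + b_1 \cdot x$.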
