Regression Analysis

Learn about regression analysis and the implementation of linear models in R.

The concept of regression

Regression analysis is a statistical method for examining the relationship between a dependent variable and one or more independent variables. In linear regression, each coefficient estimates how much the dependent variable is expected to change when the corresponding independent variable increases by one unit, with any other independent variables held constant. The analysis also lets us test whether the relationship is statistically significant.
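For a single independent variable, this idea can be written in standard notation. The symbols here are conventional and not defined elsewhere in this lesson: y_i is the dependent variable, x_i the independent variable, β0 the intercept, β1 the slope, and ε_i a random error assumed to average zero at every value of x. Comparing the expected value of y at x + 1 with its value at x shows that the slope is exactly the expected change per one-unit increase:

$$
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
$$

$$
\mathbb{E}[y \mid x + 1] - \mathbb{E}[y \mid x]
  = \bigl(\beta_0 + \beta_1 (x + 1)\bigr) - \bigl(\beta_0 + \beta_1 x\bigr)
  = \beta_1
$$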

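To do that, regression analysis describes the relationship with a straight line, called the regression line, that fits the data as closely as possible. The usual fitting method, ordinary least squares, chooses the line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line.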
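As a sketch of what "as closely as possible" means, the code below computes the least-squares slope and intercept by hand with the textbook formulas (slope = covariance of x and y divided by the variance of x; intercept = mean of y minus slope times mean of x) and compares them with lm(), which is introduced later in this lesson. It uses R's built-in cars dataset (speed and stopping distance) purely as example data, since the data behind the chart in this lesson is not given.

```r
# Example data: the built-in `cars` dataset (speed in mph, dist in ft)
x <- cars$speed
y <- cars$dist

# Least-squares slope: covariance of x and y divided by the variance of x
slope <- cov(x, y) / var(x)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
intercept <- mean(y) - slope * mean(x)

# lm() fits the same line by ordinary least squares
model <- lm(dist ~ speed, data = cars)

# The hand-computed and lm() coefficients should match
c(intercept = intercept, slope = slope)
coef(model)
```

Both printouts show an intercept of about -17.58 and a slope of about 3.93: each extra mph of speed is associated with roughly 3.93 ft of additional stopping distance in this dataset.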

For example, the scatter plot below shows a set of data points. Can we claim that there is a relationship between the two variables?

Distribution of data points
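The data behind this chart is not given, so as a stand-in the sketch below draws a comparable scatter plot from R's built-in cars dataset and computes the Pearson correlation coefficient. The correlation is only a quick numerical hint of a linear relationship, not the formal test discussed next.

```r
# Scatter plot of the built-in `cars` data (a stand-in for the chart above)
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Distribution of data points")

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship
r <- cor(cars$speed, cars$dist)
r
```

For this dataset r is about 0.81, a strong positive correlation, which matches the upward trend in the plot.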

There seems to be a relationship, but an impression from a chart is not enough; we need a statistical test to support it. As mentioned, regression analysis provides that test.

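As with other statistical tests, regression analysis has a null and an alternative hypothesis. The null hypothesis states that there is no linear relationship between the variables, that is, the slope of the regression line is zero; the alternative hypothesis states that the slope is not zero. The test computes the probability (the p-value) of observing a relationship at least as strong as the one in our data if the null hypothesis were true. A small p-value, conventionally below 0.05, leads us to reject the null hypothesis.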
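A minimal sketch of this test, again using the built-in cars dataset as example data and the lm() function described in the next section: the slope's t statistic is its estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the model's residual degrees of freedom. The code recomputes the p-value that summary() reports so the steps are visible.

```r
# Fit the model on the built-in `cars` data (example only)
model <- lm(dist ~ speed, data = cars)

# Coefficient table: estimate, standard error, t value and p-value
coef_table <- summary(model)$coefficients
coef_table

# t statistic for the slope: estimate divided by its standard error
t_stat <- coef_table["speed", "Estimate"] / coef_table["speed", "Std. Error"]

# Two-sided p-value: probability of |t| at least this large if the true slope is 0
p_value <- 2 * pt(abs(t_stat), df = df.residual(model), lower.tail = FALSE)
p_value

# Reject the null hypothesis at the conventional 5% level?
p_value < 0.05
```

The p-value is far below 0.05, so for this dataset we reject the null hypothesis that speed and stopping distance are unrelated.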

Application of the linear model

We use the lm() function to fit a linear regression model in R. It takes a formula that defines the dependent and independent variables, along with the dataset. The formula uses the ~ operator: the variable on the left of ~ is the dependent variable, and the one on the right is the independent variable. The function returns a fitted model object containing the coefficients, that is, the slope for each independent variable and the intercept of the regression line, which together describe the relationship. The syntax looks like this:

# Apply the regression analysis and store it in a variable
regression_analysis <- lm(<dependent variable> ~ <independent variable>, data = <dataset>)
# See the details of the
...