Regression Analysis
Learn about regression analysis and the implementation of linear models in R.
The concept of regression
Regression analysis is a statistical method for examining the relationship between a dependent variable and one or more independent variables. It estimates how much the dependent variable changes, on average, when an independent variable increases by one unit, and it lets us test whether that relationship is statistically significant.
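To do that, linear regression describes the relationship with a straight line, called the regression line, that fits the data as closely as possible. In ordinary least squares, the method R's lm() function uses, this means choosing the line that minimizes the sum of squared differences between the observed values of the dependent variable and the values the line predicts.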
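For a single independent variable, the least squares line has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point (mean of x, mean of y). The sketch below checks this by hand against lm() (introduced in the next section). The vectors x and y are hypothetical data made up for illustration, not data from this lesson.

```r
# Hypothetical example data: x is the independent variable, y the dependent one
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Closed-form ordinary least squares estimates for one predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Fit the same line with lm() for comparison
fit <- lm(y ~ x)

# Both should give the same intercept and slope
print(c(intercept = intercept, slope = slope))
print(coef(fit))
```

The hand computation is only meant to show what the regression line is; in practice lm() does this work and also handles several independent variables.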
For example, the chart below plots each data point by its values on the two variables. Can we claim that there is a relationship between the variables?
It looks like there is a relationship, but appearance alone is not enough: we need a statistical test to check it, and the regression model provides one.
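As with other hypothesis tests, regression analysis has a null and an alternative hypothesis. The null hypothesis states that there is no relationship between the variables (the slope of the regression line is zero); the alternative hypothesis states that there is one (the slope is not zero). We then compute the probability, called the p-value, of observing a relationship at least as strong as the one in our data if the null hypothesis were true.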
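In R, summary() of a fitted lm model reports, for each coefficient, a t statistic and the two-sided p-value for the null hypothesis that the coefficient is zero. The sketch below simulates hypothetical data with set.seed() and rnorm() (made up purely for illustration), then recomputes the slope's p-value by hand from the t distribution to show what that number means.

```r
# Simulate hypothetical data where y really does depend on x
set.seed(42)
x <- rnorm(50)
y <- 1 + 0.5 * x + rnorm(50)

fit <- lm(y ~ x)
coef_table <- summary(fit)$coefficients

# Row "x" holds the slope; columns are Estimate, Std. Error, t value, Pr(>|t|)
estimate <- coef_table["x", "Estimate"]
std_error <- coef_table["x", "Std. Error"]

# t statistic for H0: slope = 0, with n - 2 residual degrees of freedom
t_stat <- estimate / std_error

# Two-sided p-value: probability of a |t| at least this large if H0 were true
p_value <- 2 * pt(-abs(t_stat), df = fit$df.residual)

print(coef_table)
cat("Slope p-value computed by hand:", p_value, "\n")
```

A small p-value (a common threshold is 0.05) is taken as evidence against the null hypothesis, that is, as evidence that the variables are related.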
Application of the linear model
We use the lm() function to fit a linear regression model in R. It takes a formula that defines the dependent and independent variables, along with the dataset that contains them. The ~ operator specifies the relationship: the variable on the left of ~ is the dependent variable, and the one on the right is the independent variable. The function returns a fitted model object containing the coefficients of the regression line, the slope and the intercept, which describe the relationship.
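As a concrete sketch, R's built-in mtcars dataset lets us regress fuel efficiency (mpg, miles per gallon) on car weight (wt, in 1000 lbs). The dataset and variables are our choice for illustration, not part of the lesson. coef() extracts the intercept and slope, and predict() confirms the slope's meaning: predictions for weights one unit apart differ by exactly the slope.

```r
# mpg is the dependent variable, wt the independent variable
model <- lm(mpg ~ wt, data = mtcars)

# Named vector with "(Intercept)" and "wt" (the slope)
print(coef(model))

# Predicted mpg for cars weighing 3000 and 4000 lbs (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(3, 4))
predicted <- predict(model, newdata = new_cars)

# The difference between the two predictions equals the slope
print(diff(predicted))
```

Here the slope is negative (about -5.34), so each additional 1000 lbs of weight is associated with roughly 5.3 fewer miles per gallon.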
The basic syntax is as follows:
# Apply the regression analysis and store it in a variable
# (dependent_variable, independent_variable, and dataset are placeholders)
regression_analysis <- lm(dependent_variable ~ independent_variable, data = dataset)
# See the details of the
...