What is Analysis of Variance (ANOVA)?

Introduction

Analysis of Variance (ANOVA) is a statistical hypothesis test for checking whether the means of three or more groups differ. It works by comparing the variation between groups with the variation within groups: if the between-group variation is large relative to the within-group variation, at least one group mean likely differs from the others. ANOVA is closely related to regression, where it is used to separate systematic variation (explained by the variables in the model) from random variation (error), which helps us identify which variables affect the data the most.

Calculating ANOVA

The test statistic for ANOVA is the F ratio:

F = \frac{MST}{MSE}

where,

MST = Mean square of treatment

MSE = Mean square of the error
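As a rough illustration of the formula above, the F ratio can be computed by hand and compared with what aov reports. This sketch assumes the standard one-way definitions, which the text does not spell out: MST is the between-group sum of squares divided by (k - 1), and MSE is the within-group sum of squares divided by (N - k), where k is the number of groups and N the number of observations. The variable names are illustrative only.

```r
# Manual one-way F statistic on the built-in PlantGrowth data
y <- PlantGrowth$weight   # dependent variable
g <- PlantGrowth$group    # grouping factor (three levels)

k <- nlevels(g)           # number of groups
n <- length(y)            # total number of observations
grand_mean <- mean(y)

group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group (treatment) sum of squares and mean square
sst <- sum(group_sizes * (group_means - grand_mean)^2)
mst <- sst / (k - 1)

# Within-group (error) sum of squares and mean square;
# ave(y, g) returns each observation's own group mean
sse <- sum((y - ave(y, g))^2)
mse <- sse / (n - k)

f_manual <- mst / mse

# Cross-check against the F value reported by aov()
f_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]
print(c(manual = f_manual, aov = f_aov))
```

The two printed values should agree, which shows that aov is carrying out the same MST/MSE calculation.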

Types of ANOVA

There are two main types of ANOVA.

  1. One-way ANOVA: This tests whether the mean of one dependent variable differs across three or more groups defined by a single independent variable (factor).
  2. Two-way ANOVA: This extends the one-way ANOVA to two independent variables, testing the effect of each one and, optionally, their interaction.

Performing ANOVA

To execute ANOVA in R, we can simply use the built-in aov function.

One-way ANOVA

# load the built-in dataset
plant <- PlantGrowth
# preview the first rows
head(plant)
# one-way ANOVA: weight (dependent) explained by group (independent)
owanova <- aov(weight ~ group, data = plant)
# print the ANOVA table
summary(owanova)

Results

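The group term is marked with one star: its p-value (about 0.016) is below 0.05 but above 0.01. This means the treatment group has a statistically significant effect on plant weight at the 5% level, so at least one group mean differs from the others. The stars indicate significance levels, not the strength of a correlation, and the test does not tell us which groups differ.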
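To make the one-star result more concrete, the p-value can be pulled out of the summary table and mapped to a star code. This is a minimal sketch: significance_stars is a hypothetical helper written for this example (it is not part of R), and its cut points are assumed to match the "Signif. codes" legend that summary prints under the table.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table
tab <- summary(fit)[[1]]
p_value <- tab[["Pr(>F)"]][1]   # first row is the group term

# Hypothetical helper: map a p-value to R's usual star codes
significance_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                   labels = c("***", "**", "*", ".", " "),
                   include.lowest = TRUE))
}

print(p_value)
print(significance_stars(p_value))
```

Reading the p-value directly avoids relying on the star annotation alone, which only says which significance band the value falls into.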

Code explanation

Line 2: We copy R's built-in PlantGrowth dataset into a variable. It records the dried weight of plants grown under three conditions: a control and two treatments.

Line 4: We use the head function to preview the first few rows of the dataset.

Line 6: To conduct ANOVA, we use the aov function. For a one-way ANOVA, the function takes a formula with the dependent variable on the left of ~ and the independent variable on the right. We use weight as our dependent variable and group as our independent variable. This fits the ANOVA model and returns the result.

Line 8: We use the summary function to print the ANOVA table.

Two-way ANOVA

# load the built-in dataset
tooth <- ToothGrowth
# preview the first rows
head(tooth)
# two-way ANOVA: len (dependent) by supp and dose (independent), with interaction
twanova <- aov(len ~ supp * factor(dose), data = tooth)
# print the ANOVA table
summary(twanova)

Results

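Both supplement type and dose are marked with three stars (p < 0.001), so each has a highly significant effect on tooth length. Their interaction is marked with one star (p < 0.05), which suggests that the effect of dose depends on the supplement type, although this evidence is weaker than for the two main effects. The stars indicate significance levels rather than the strength of a correlation.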
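One way to get a feel for the combined supplement-and-dose term is to look at the mean tooth length in each supplement/dose cell. The sketch below is only a descriptive aid, not part of the ANOVA itself; cell_means and gap are names chosen for this example.

```r
# Mean tooth length for every supplement/dose combination
cell_means <- with(ToothGrowth,
                   tapply(len, list(supp = supp, dose = dose), mean))
print(round(cell_means, 2))

# Difference between the two supplement types at each dose
gap <- cell_means["OJ", ] - cell_means["VC", ]
print(round(gap, 2))
```

If the gap between the two supplement types changes from one dose to the next, the two variables are not acting independently, which is what the interaction term in the two-way ANOVA tests.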

Code explanation

Line 2: We copy R's built-in ToothGrowth dataset into a variable. It records tooth growth in guinea pigs that were given vitamin C at three dose levels (dose) by one of two delivery methods (supp).

Line 4: We use the head function to preview the first few rows of the dataset.

Line 6: To conduct ANOVA, we use the aov function. For a two-way ANOVA, the formula needs three variables: one dependent and two independent. We use len as our dependent variable and supp and dose as our independent variables. Because dose is stored as a number, we convert it to a factor so that each dose level is treated as a group. The * operator tells aov to test both variables and their interaction. This fits the ANOVA model and returns the result.

Line 8: We use the summary function to print the ANOVA table.
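The * in the two-way formula is shorthand in R's formula syntax: a * b expands to a + b + a:b, where a:b is the interaction term. The following sketch, which uses illustrative variable names, fits the model both ways and also fits an additive model (using + only) for comparison.

```r
tooth <- ToothGrowth
tooth$dose <- factor(tooth$dose)   # treat each dose level as a group

# Shorthand and expanded forms of the same two-way model
fit_star <- aov(len ~ supp * dose, data = tooth)
fit_long <- aov(len ~ supp + dose + supp:dose, data = tooth)

# Additive model: main effects only, no interaction term
fit_additive <- aov(len ~ supp + dose, data = tooth)

tab_star <- summary(fit_star)[[1]]
tab_long <- summary(fit_long)[[1]]
tab_additive <- summary(fit_additive)[[1]]

print(rownames(tab_star))       # supp, dose, supp:dose, Residuals
print(rownames(tab_additive))   # supp, dose, Residuals
```

Using + instead of * is a reasonable choice when there is no reason to expect the two variables to influence each other's effect.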

Copyright ©2024 Educative, Inc. All rights reserved