Chi-Squared Test
Explore how to use the chi-squared test in R to analyze relationships between categorical variables like smoking and lung cancer or preferences by age. Learn to create and interpret contingency tables, calculate test statistics, and understand p-values for hypothesis testing and goodness of fit.
We use the chi-squared test to determine whether the frequencies we observe in a sample differ significantly from the frequencies we would expect under a null hypothesis. A common use is to investigate whether two categorical variables are related.
Suppose we want to find the relationship between smoking and lung cancer. We would have four categories:
- Smokers with lung cancer
- Smokers without lung cancer
- Nonsmokers with lung cancer
- Nonsmokers without lung cancer
The chi-squared test compares the number of people observed in each of these categories with the number we would expect if smoking and lung cancer were unrelated.
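As a minimal sketch of this comparison, the four categories can be arranged in a 2x2 contingency table in R, and the expected counts derived from its row and column totals. The counts below are invented purely for illustration and are not real data; the names `observed` and `expected` are assumptions made for this example.

```r
# Hypothetical counts, invented for illustration only (not real data).
observed <- matrix(
  c(60, 140,
    30, 270),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    smoking     = c("smoker", "nonsmoker"),
    lung_cancer = c("yes", "no")
  )
)

# Expected count for each cell if the two variables were unrelated:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(observed)
print(expected)
```

The expected table keeps the same row and column totals as the observed table; only the way the counts are spread across the cells changes.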
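To calculate the test statistic, we take the difference between the observed and expected frequency in each category, square it, and divide it by that category's expected frequency. We then sum these values over all categories. The formula is as follows:
$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$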
The notations are as follows:
-
...
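The calculation of the test statistic can be sketched by hand in R and checked against the built-in `chisq.test()` function. This is an illustrative sketch, not a definitive implementation: the counts are hypothetical (not real data), and the variable names are assumptions made for this example. Yates' continuity correction, which `chisq.test()` applies to 2x2 tables by default, is switched off with `correct = FALSE` so that the two results are directly comparable.

```r
# Hypothetical 2x2 table of counts, invented for illustration only.
# Rows: smoker, nonsmoker. Columns: lung cancer yes, no.
observed <- matrix(c(60, 140,
                     30, 270), nrow = 2, byrow = TRUE)

# Expected counts if the row and column variables were unrelated.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Test statistic: squared difference divided by the expected count,
# computed per cell and then summed over all cells.
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-squared distribution.
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in test without the continuity correction, for comparison.
builtin <- chisq.test(observed, correct = FALSE)

print(chi_sq)
print(p_value)
print(builtin)
```

A small p-value suggests that the observed counts are unlikely if the two variables were unrelated.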