Binary Data and The Wells Dataset
Let’s get a brief overview of binary data and the wells dataset.
R packages
We’ll use the following R packages in this chapter:
ggplot2
arm
ggfortify
Sleuth3
Binary data
One of the most important uses of GLMs is the analysis of binary data. Binary data are an extreme form of binomial count data in which the binomial denominator is equal to one, so that every trial produces a value of either 1 or 0. Binary data can therefore be analyzed in a similar way to binomial counts: we use a GLM with a binomial distribution and the same choice of link functions, which keep predictions from going below zero or above one. However, despite the shared distribution and link functions, the constrained nature of binary data means that the analysis differs in some ways from the analysis of binomial counts.
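To make the link between binary data and binomial counts concrete, here is a minimal sketch in base R. It uses simulated data, not the wells dataset: the variable names (x, y), the sample size, and the coefficients are all assumptions chosen purely for illustration.

```r
# Simulated binary data. All names and values are illustrative assumptions.
set.seed(42)
n <- 200
x <- runif(n, min = 0, max = 10)        # hypothetical explanatory variable
p <- plogis(-2 + 0.5 * x)               # success probability via the inverse logit
y <- rbinom(n, size = 1, prob = p)      # binary response: every trial is 0 or 1

# Binomial GLM with the logit link, fitted directly to the 0/1 response
fit_binary <- glm(y ~ x, family = binomial(link = "logit"))

# The same model written as binomial counts with a denominator of one:
# column 1 holds successes, column 2 holds failures
fit_counts <- glm(cbind(y, 1 - y) ~ x, family = binomial(link = "logit"))

# Predictions on the response scale are probabilities between 0 and 1
pred <- predict(fit_binary, type = "response")
range(pred)

# The two formulations should give the same coefficient estimates
all.equal(coef(fit_binary), coef(fit_counts))
```

The predictions stay inside the unit interval because the link function maps the unbounded linear predictor onto a probability.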
For one thing, the ratio of the residual deviance to the residual DF cannot be used to diagnose overdispersion or underdispersion. Given that R's default set of residual-checking plots is also of little (if any) use when applied to a binomial GLM fitted to binary data, this leaves us without any means of model checking in the base distribution of R. Luckily, the arm package provides the binnedplot() function for plotting binned residuals, which gives us a way to check the model.
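One check that does work for a binary response is a binned residual plot, which the arm package draws with binnedplot(). The sketch below shows the idea on simulated data (again, not the wells dataset): observations are grouped into bins by fitted probability, and the raw residuals are averaged within each bin. The data, the choice of 20 bins, and the equal-sized binning rule are assumptions made for illustration and are not necessarily what binnedplot() does internally.

```r
# Simulated binary data. All names and values are illustrative assumptions.
set.seed(1)
n <- 500
x <- runif(n, min = 0, max = 10)
y <- rbinom(n, size = 1, prob = plogis(-2 + 0.5 * x))
fit <- glm(y ~ x, family = binomial)

fitted_p  <- fitted(fit)       # fitted probabilities
raw_resid <- y - fitted_p      # raw (response-scale) residuals

# Group observations into 20 equal-sized bins ordered by fitted probability
n_bins <- 20
bins <- cut(rank(fitted_p, ties.method = "first"),
            breaks = n_bins, labels = FALSE)

# Average fitted value and average residual within each bin
binned <- data.frame(
  mean_fitted = as.vector(tapply(fitted_p, bins, mean)),
  mean_resid  = as.vector(tapply(raw_resid, bins, mean)),
  n           = as.vector(table(bins))
)
head(binned)

# If arm is installed, draw its binned residual plot for the same model
if (requireNamespace("arm", quietly = TRUE)) {
  arm::binnedplot(fitted_p, raw_resid)
}
```

For a model that fits reasonably well, the binned averages should scatter around zero with no obvious trend across the fitted values.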
An example of the wells dataset
Our example dataset for a binary GLM is stored in the file Data_Binary_Wells.txt:
wells <- read.table("Data_Binary_Wells.txt", header = TRUE)
The example concerns an area of Bangladesh where many wells used for drinking water have been contaminated by naturally occurring arsenic.