Pennies Activity
Learn about bootstrapping through the pennies activity.
Needed packages
Let’s load all the packages needed to run the programs.
library(tidyverse)
library(moderndive)
library(infer)
We’ll begin with a hands-on tactile activity.
What was the average year on US pennies in 2019?
Try to imagine all the pennies being used in the United States in 2019. That’s a lot of pennies! Now, say we’re interested in the average year of minting of all these pennies. One way to compute this value would be to gather up all the pennies being used in the US, record the year of each, and compute the average. However, this would be nearly impossible! Instead, let’s collect a sample of 50 pennies from a local bank in downtown Northampton, Massachusetts, USA, as seen in the figure below:
An image of these 50 pennies can be seen in the figure below. For each of the 50 pennies, starting from the top left, progressing row by row, and ending at the bottom right, we assigned an ID as an identification variable and recorded the year of minting.
The moderndive package contains this data on our 50 sampled pennies in the pennies_sample data frame:
pennies_sample
The pennies_sample data frame has 50 rows, one for each penny, and two variables. The first variable, ID, corresponds to the ID labels, whereas the second variable, year, corresponds to the year of minting saved as a numeric variable, also known as a double (dbl).
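As a quick check of this description, the following sketch inspects the structure of the data frame. It assumes the dplyr and moderndive packages are installed; glimpse() comes from dplyr and is only one of several ways to look at a data frame.

```r
library(dplyr)
library(moderndive)

# Compact overview: one line per variable, showing its type and first values
glimpse(pennies_sample)

# Number of rows (pennies) and columns (variables)
dim(pennies_sample)
```

If the description above holds, dim() should report 50 rows and 2 columns, and glimpse() should list year as a dbl.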
Based on these 50 sampled pennies, what can we say about all the US pennies in 2019? Let’s study some properties of our sample by performing an exploratory data analysis (EDA). Let’s first visualize the distribution of the year of these 50 pennies using our data visualization tools. We use a histogram to visualize the year distribution, because year is a numerical variable.
ggplot(pennies_sample, aes(x = year)) +
  geom_histogram(binwidth = 10, color = "white")
Observe that the distribution is slightly left-skewed: most pennies fall between 1980 and 2010, with only a few pennies older than 1970. What’s the average year for the 50 sampled pennies? Eyeballing the histogram, it appears to be around 1990. Let’s now compute this value exactly using our data wrangling tools.
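The visual impression of the histogram can also be checked numerically. The sketch below counts the pennies in the year ranges mentioned above and compares the mean with the median; in a left-skewed distribution the mean often sits below the median, though this is a rule of thumb rather than a guarantee. The names year_summary, n_before_1970, and n_1980_to_2010 are made up for this illustration.

```r
library(dplyr)
library(moderndive)

# Summarize how the minting years are spread across the sample
year_summary <- pennies_sample %>%
  summarize(
    n_before_1970  = sum(year < 1970),                  # the "few" old pennies
    n_1980_to_2010 = sum(year >= 1980 & year <= 2010),  # where most pennies fall
    median_year    = median(year),
    mean_year      = mean(year)
  )

year_summary
```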
pennies_sample %>%
  summarize(mean_year = mean(year))
Therefore, if we’re willing to assume that pennies_sample is a representative sample of all US pennies, a good guess of the average year of minting of all US pennies would be 1995. This should all start sounding similar to what we did previously.
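To use the sample mean as a single-number guess, it can help to pull it out of the summary table as a plain number. This is a minimal sketch, assuming dplyr and moderndive are installed; x_bar is a hypothetical name chosen here, not one used elsewhere in the text.

```r
library(dplyr)
library(moderndive)

# Sample mean of the minting year, extracted as a single numeric value
x_bar <- pennies_sample %>%
  summarize(mean_year = mean(year)) %>%
  pull(mean_year)

# Rounded to a whole year for reporting
round(x_bar)
```

Keeping the unrounded value in x_bar and rounding only for display avoids carrying rounding error into any later calculation.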
Here our population is