Pennies Activity

Learn about bootstrapping through the pennies activity.

Needed packages

Let’s load all the packages needed to run the code in this section.

library(tidyverse)
library(moderndive)
library(infer)
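Loading any of these three packages fails if it is not installed. The sketch below assumes you have internet access to CRAN. It checks which of the packages are missing and installs only those:

```r
# Packages used in this activity
pkgs <- c("tidyverse", "moderndive", "infer")

# requireNamespace() returns FALSE (quietly) for packages that are not installed
is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_available]

# Install from CRAN only if something is missing
if (length(to_install) > 0) {
  install.packages(to_install)
}
```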

We’ll begin with a hands-on tactile activity.

What was the average year on US pennies in 2019?

Try to imagine all the pennies being used in the United States in 2019. That’s a lot of pennies! Now, say we’re interested in the average year of minting of all these pennies. One way to compute this value would be to gather up all the pennies being used in the US, record the year of each, and compute the average. However, this would be nearly impossible! Instead, let’s collect a sample of 50 pennies from a local bank in downtown Northampton, Massachusetts, USA, as seen in the figure below:

Collecting a sample of 50 US pennies from a local bank

An image of these 50 pennies can be seen in the figure below. For each of the 50 pennies, starting from the top left, progressing row by row, and ending at the bottom right, we assigned an identification variable ID and marked the year of minting.

50 US pennies labeled

The moderndive package contains this data on our 50 sampled pennies in the pennies_sample data frame:

pennies_sample

The pennies_sample data frame has 50 rows, one for each penny, and two variables. The first variable, ID, corresponds to the ID labels, whereas the second variable, year, corresponds to the year of minting saved as a numeric variable, also known as a double (dbl).
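To confirm this structure yourself, you can print a compact column-by-column summary with `glimpse()`, which the tidyverse makes available. It should report 50 rows and list `year` as `<dbl>`. The checks after it test the same facts in code:

```r
library(tidyverse)
library(moderndive)

# One line per column: name, type, and the first few values
glimpse(pennies_sample)

# Programmatic versions of the same facts
nrow(pennies_sample)              # number of sampled pennies
is.double(pennies_sample$year)    # TRUE if year is stored as a double
```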

Based on these 50 sampled pennies, what can we say about all the US pennies in 2019? Let’s study some properties of our sample by performing an exploratory data analysis (EDA). First, let’s visualize the distribution of the year of these 50 pennies using our data visualization tools. Since year is a numerical variable, we use a histogram.

ggplot(pennies_sample, aes(x = year)) +
  geom_histogram(binwidth = 10, color = "white")
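The histogram groups years into bins 10 years wide. The sketch below gives a rough numerical counterpart by counting pennies per calendar decade. The decade boundaries (1970, 1980, and so on) are our own choice. They need not match the bin edges `geom_histogram()` picks by default, so the counts may differ slightly from the bar heights.

```r
library(tidyverse)
library(moderndive)

# Round each year down to the start of its decade, e.g. 1987 becomes 1980,
# then count how many pennies fall in each decade
decade_counts <- pennies_sample %>%
  mutate(decade = floor(year / 10) * 10) %>%
  count(decade)

decade_counts
```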

Observe the slightly left-skewed distribution: most pennies fall between 1980 and 2010, with only a few pennies older than 1970. What’s the average year for the 50 sampled pennies? Eyeballing the histogram, it appears to be around 1990. Let’s now compute this value exactly using our data wrangling tools.
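One quick numerical check on the shape just described uses a rule of thumb: in a left-skewed distribution, the few unusually old values tend to pull the mean below the median. The sketch below compares the two. It also counts the pennies older than 1970 and computes the share minted from 1980 through 2010. These cutoffs are taken from the description above.

```r
library(tidyverse)
library(moderndive)

year_summary <- pennies_sample %>%
  summarize(
    mean_year = mean(year),
    median_year = median(year),
    # Number of pennies minted before 1970
    n_before_1970 = sum(year < 1970),
    # Proportion minted from 1980 through 2010 (mean of a logical vector)
    prop_1980_2010 = mean(year >= 1980 & year <= 2010)
  )

year_summary
```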

pennies_sample %>%
  summarize(mean_year = mean(year))

Therefore, if we’re willing to assume that pennies_sample is a representative sample of all US pennies, a good guess of the average year of minting of all US pennies would be roughly 1995. This should start to sound similar to what we did previously.
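This "good guess" is the sample mean of the minting years. Write $x_1, x_2, \ldots, x_n$ for the years of the $n = 50$ sampled pennies. The sample mean is then the quantity that `mean(year)` computes:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_{50}}{50}
$$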

Here our population is all the pennies being used in the US. Its size $N$ is a value we don’t know and probably never will. The population parameter of interest is now the population mean year of all these pennies, a value denoted ...