Promotions Activity

Learn how hypothesis testing works through an introductory activity on promotions.

Introduction and needed packages

Now that we’ve studied confidence intervals, let’s study another commonly used method for statistical inference: hypothesis testing. Hypothesis tests allow us to take a sample of data from a population and make inferences about the plausibility of competing hypotheses. For example, in the upcoming promotions activity, we’ll use data collected in a psychology study in the 1970s to investigate whether gender-based discrimination in promotion rates existed in the banking industry at the time of the study.

Recall that one general framework applies to all confidence intervals, and that the infer package was designed around this framework. While the specifics may change slightly for different types of confidence intervals, the general framework stays the same.

We believe that this approach is much better for long-term learning than focusing on specific details for specific confidence intervals using theory-based approaches. As we’ll now see, we prefer this general framework for hypothesis tests as well.

library(tidyverse)
library(infer)
library(moderndive)
library(nycflights13)
library(ggplot2movies)

Let’s start with an activity studying the effect of gender on promotions at a bank.

Does gender affect promotions at a bank?

Say we’re working at a bank in the 1970s and we’re submitting our résumé to apply for a promotion. Will our gender affect our chances of getting promoted? To answer this question, we’ll focus on data from a study published in the Journal of Applied Psychology in 1974. This data is also used in the OpenIntro series of statistics textbooks.

To begin the study, 48 bank supervisors were asked to assume the role of a hypothetical director of a bank with multiple branches. Every one of the bank supervisors was given a résumé and asked whether or not the candidate on the résumé was fit to be promoted to a new position in one of their branches.

However, the 48 résumés were identical in all respects except one: the name of the applicant at the top. Of the supervisors, 24 were randomly given résumés with stereotypically male names, while the other 24 were randomly given résumés with stereotypically female names. Because only (binary) gender varied from résumé to résumé, the researchers could isolate the effect of this variable on promotion rates.
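
To make the idea of random assignment concrete, here is a minimal sketch in base R. It is only an illustration of how 24 male and 24 female names could be shuffled across 48 supervisors, not the procedure the researchers actually used. The names supervisor_id, name_gender, and assignment, as well as the seed, are hypothetical.

```r
# Illustrative sketch only: not the original study's procedure.
set.seed(76)  # arbitrary seed, used only to make the shuffle reproducible

# 48 supervisors, identified 1 to 48 (hypothetical IDs)
supervisor_id <- 1:48

# Shuffle 24 "male" and 24 "female" labels across the supervisors
name_gender <- sample(rep(c("male", "female"), each = 24))

assignment <- data.frame(supervisor_id, name_gender)

# Each group still has exactly 24 résumés; only who gets which is random
table(assignment$name_gender)
```

Because chance alone decides which supervisor sees which name, any systematic difference in promotion decisions between the two groups can be attributed to the name rather than to differences among supervisors.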

Note: Though this study was conducted at a time when more nuanced views of gender were not as prevalent, this example presents ideas still relevant today about how we could study discrimination in the workplace.

The moderndive package contains the data on the 48 applicants in the promotions data frame. Let’s explore this data by looking at six randomly selected rows:

promotions %>%
  sample_n(size = 6) %>%
  arrange(id)

The variable id acts as an identification variable for all 48 rows. The decision variable indicates whether the applicant was selected for promotion or not. The gender variable indicates the gender of the name used on the résumé. Recall that this data doesn’t pertain to 24 actual men and 24 actual women, but rather 48 identical résumés. Out of these 48 identical résumés, 24 were assigned stereotypically male names and 24 were assigned stereotypically female names.

Let’s perform an exploratory data analysis of the relationship between the two categorical variables decision and gender. Recall that one way we can visualize such a relationship is by using a stacked barplot.

ggplot(promotions, aes(x = gender, fill = decision)) +
  geom_bar() +
  labs(x = "Gender of name on résumé")

The plot above suggests that résumés with female names were much less likely to be selected for promotion. Let’s quantify these promotion rates by computing the proportion of résumés selected for promotion in each group, using the dplyr package for data wrangling. Note the use of the tally() function here, which is a shortcut for summarize(n = n()) to get counts.

promotions %>%
  group_by(gender, decision) %>%
  tally()

Of the 24 résumés with male names, 21 were selected for promotion, for a proportion of 21/24 = 0.875 = 87.5%. On the other hand, of the 24 résumés with female names, 14 were selected for promotion, for a proportion of 14/24 = 0.583 = 58.3%. Comparing these two rates of promotion, résumés with male names were selected for promotion at a rate 0.875 - 0.583 = 0.292, or 29.2 percentage points, higher than résumés with female names. This suggests an advantage for résumés with male names.
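
The arithmetic above can be checked with a short base R sketch. To keep it self-contained, it rebuilds the data from the counts stated in the text rather than loading promotions from moderndive. The names promotions_rebuilt, prop_promoted, and diff_in_props are hypothetical, as is the assumption that the decision values are labeled "promoted" and "not".

```r
# Counts taken from the text: 21 of 24 male-named résumés and
# 14 of 24 female-named résumés were selected for promotion.
# The labels "promoted" and "not" are assumed for illustration.
promotions_rebuilt <- data.frame(
  gender   = rep(c("male", "female"), each = 24),
  decision = c(rep(c("promoted", "not"), times = c(21, 3)),
               rep(c("promoted", "not"), times = c(14, 10)))
)

# Proportion promoted within each gender group
prop_promoted <- tapply(promotions_rebuilt$decision == "promoted",
                        promotions_rebuilt$gender, mean)

# Difference in proportions, male minus female
diff_in_props <- unname(prop_promoted["male"] - prop_promoted["female"])
round(diff_in_props, 3)
# 0.292
```

The male-minus-female order is a choice. Reversing it would flip the sign of the difference but not its size.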

The question is, however: Does this provide conclusive evidence that there’s gender discrimination in promotions at banks? Can a difference in promotion rates of 29.2% ...