Exploratory Data Analysis of One Categorical Explanatory Variable

Learn about analyzing categorical data to make observations that will help in regression.

The data on the 142 countries can be found in the gapminder data frame included in the gapminder package. However, to keep things simple, let’s filter() for only those observations/rows corresponding to the year 2007. Additionally, let’s select() only the subset of the variables we’ll consider. We’ll save this data in a new data frame called gapminder2007, as follows:

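library(gapminder)  # provides the gapminder data frame
library(dplyr)      # provides filter(), select(), glimpse(), and the %>% pipe

# Keep only the 2007 rows and the four variables of interest
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent, gdpPercap)
gapminder2007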

Let’s perform the first common step in an exploratory data analysis, which is looking at the raw data values. We’ll do this by using the glimpse() command for exploring data frames:

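# Print each variable's name, type, and first few values
glimpse(gapminder2007)
# Open the help file describing the gapminder data frame
?gapminder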

Observe that Rows: 142 indicates that there are 142 rows/observations in gapminder2007, where each row corresponds to one country. In other words, the observational unit is an individual country. Furthermore, observe that the variable continent is of type <fct>, which stands for factor and is R’s way of encoding categorical variables.
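The observations above can also be checked directly in code. The following is a minimal sketch, assuming the gapminder and dplyr packages are installed; the names continent_levels and continent_counts are illustrative and not part of the original lesson.

```r
library(gapminder)  # provides the gapminder data frame
library(dplyr)      # provides filter(), select(), count(), n_distinct()

# Rebuild the 2007 subset so this block runs on its own
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent, gdpPercap)

# Confirm that continent is stored as a factor
class(gapminder2007$continent)

# List the categories (levels) that the factor can take
continent_levels <- levels(gapminder2007$continent)
continent_levels

# Each row is one country, so the row count should match
# the number of distinct countries
nrow(gapminder2007)
n_distinct(gapminder2007$country)

# Count how many countries fall into each continent
continent_counts <- gapminder2007 %>% count(continent)
continent_counts
```

Using count() here is one convenient option; base R's table(gapminder2007$continent) would give the same tallies.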

A full description of all the variables included in gapminder can be found by reading the associated help file, which can be accessed by executing the ?gapminder command, as demonstrated above. However, let’s fully describe only the four variables we selected in gapminder2007:

  1. country: This is an identification variable used to distinguish the 142 countries in the dataset. Although it holds text labels, glimpse() reports it as type <fct> because it is stored as a factor.

  2. lifeExp: This is a numerical variable giving the country’s life expectancy at birth, in years. It is the outcome variable 𝑦 of interest.

  3. continent: This is a categorical variable with five levels. Here, levels correspond to the possible categories: Africa, Asia, Americas, Europe, and Oceania. This is the explanatory variable 𝑥 ...