Exploratory Data Analysis of One Categorical Explanatory Variable
Learn about analyzing categorical data to make observations that will help in regression.
The data on the 142 countries can be found in the gapminder data frame included in the gapminder package. However, to keep things simple, let’s filter() for only those observations/rows corresponding to the year 2007. Additionally, let’s select() only the subset of the variables we’ll consider. We’ll save this data in a new data frame called gapminder2007, as follows:
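library(gapminder)
library(dplyr)  # provides %>%, filter(), select(), and glimpse()

# Keep only the 2007 rows and the four variables of interest
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent, gdpPercap)
gapminder2007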
Let’s perform the first common step in an exploratory data analysis, which is looking at the raw data values. We’ll do this by using the glimpse() command for exploring data frames:
glimpse(gapminder2007)
?gapminder
Observe that Rows: 142 indicates that there are 142 rows/observations in gapminder2007, where each row corresponds to one country. In other words, the observational unit is an individual country. Furthermore, observe that the variable continent is of type <fct>, which stands for factor and is R’s way of encoding categorical variables.
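To make the idea of a factor concrete, here is a minimal base R sketch. It does not use the gapminder package. Instead, it builds a small stand-in data frame, toy, which is a hypothetical name introduced only for illustration. It holds five countries and their continents, so the row count and the continent counts differ from those of gapminder2007.

```r
# Toy stand-in for gapminder2007: five countries and their continents.
# Illustrative subset only; the real data frame has 142 rows.
toy <- data.frame(
  country = c("Afghanistan", "Albania", "Algeria", "Argentina", "Australia"),
  continent = factor(c("Asia", "Europe", "Africa", "Americas", "Oceania")),
  stringsAsFactors = FALSE
)

nrow(toy)                 # number of rows/observations
is.factor(toy$continent)  # checks whether continent is stored as a factor
levels(toy$continent)     # the possible categories (sorted alphabetically by default)
table(toy$continent)      # number of rows falling in each level
```

Running the same four calls on gapminder2007 should report the row count and the continent levels that glimpse() summarizes.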
A full description of all the variables included in gapminder can be found by reading the associated help file, which can be accessed by executing the ?gapminder command, as demonstrated above. However, let’s fully describe only the four variables we selected in gapminder2007:
country: This is an identification variable holding text labels (stored as a factor in gapminder) used to distinguish the 142 countries in the dataset.
lifeExp: This is a numerical variable giving that country’s life expectancy at birth. This is the outcome variable of interest.
continent: This is a categorical variable with five levels. Here, levels correspond to the possible categories: Africa, Asia, Americas, Europe, and Oceania. This is the explanatory variable...