EDA for a Numerical Explanatory Variable

Learn about analyzing numerical data to make observations that will help in regression.

Typing out all these summary statistic functions in summarize() would be long and tedious. Instead, let’s use the convenient skim() function from the skimr package. This function takes in a data frame, skims it, and returns the commonly used summary statistics. Let’s take our evals_ch5 data frame, select() only the outcome variable score and the explanatory variable bty_avg, and pipe the result into the skim() function:

evals_ch5 %>%
  select(score, bty_avg) %>%
  skim()
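
To see why the summarize() route gets tedious, here is a minimal sketch of computing a few of the same statistics by hand for a single variable. It assumes only the dplyr package; the toy_evals data frame and its values are hypothetical stand-ins for evals_ch5, made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for evals_ch5 (values are made up).
toy_evals <- data.frame(
  score   = c(4.7, 4.1, 3.9, NA, 4.5),
  bty_avg = c(5.0, 3.0, 4.5, 6.0, 2.5)
)

# The long way: one summarize() entry per statistic, for one variable only.
score_summary <- toy_evals %>%
  summarize(
    n_missing     = sum(is.na(score)),         # count of missing values
    complete_rate = mean(!is.na(score)),       # proportion of non-missing values
    mean          = mean(score, na.rm = TRUE), # average of the non-missing values
    sd            = sd(score, na.rm = TRUE),   # standard deviation
    p0            = min(score, na.rm = TRUE)   # 0th percentile, i.e. the minimum
  )

score_summary
```

Every statistic needs its own line, and the whole block would have to be repeated for bty_avg. A single skim() call reports these values for every selected column at once.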

For the numerical variables score and bty_avg, skim() returns:

  • n_missing: This is the number of missing values.

  • complete_rate: This is the proportion of values that are non-missing (complete), so 1 means no values are missing.

  • mean: This is the average.

  • sd: This is the standard deviation.

  • p0: This is the 0th percentile, the value at which 0% of the observations are smaller than it ...