EDA for a Numerical Explanatory Variable
Learn about analyzing numerical data to make observations that will help in regression.
Typing out all these summary statistic functions in summarize() would be long and tedious. Instead, let's use the convenient skim() function from the skimr package. This function takes in a data frame, skims it, and returns commonly used summary statistics. Let's take our evals_ch5 data frame, select() only the outcome variable score (the teaching score) and the explanatory variable bty_avg, and pipe them into the skim() function:
evals_ch5 %>% select(score, bty_avg) %>% skim()
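For contrast, here is a minimal sketch of the manual summarize() approach for a single variable. It assumes the dplyr package is installed and uses a small made-up data frame, toy_evals, as a hypothetical stand-in for evals_ch5 (the values are invented for illustration only). The output column names are chosen to mirror the labels skim() uses:

```r
library(dplyr)

# Hypothetical toy data standing in for evals_ch5; values are made up.
toy_evals <- tibble(
  score   = c(4.7, 4.1, 3.9, NA, 4.5),
  bty_avg = c(5.0, 3.0, 4.5, 6.0, 2.5)
)

# Compute a few summary statistics for score by hand.
manual_summary <- toy_evals %>%
  summarize(
    n_missing     = sum(is.na(score)),         # count of missing values
    complete_rate = mean(!is.na(score)),       # proportion of non-missing values
    mean          = mean(score, na.rm = TRUE), # average of observed values
    sd            = sd(score, na.rm = TRUE),   # standard deviation of observed values
    p0            = min(score, na.rm = TRUE)   # 0th percentile (the minimum)
  )

manual_summary
```

Repeating this for every variable and every statistic quickly gets verbose, which is why the single skim() call above is convenient.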
For the numerical variables score and bty_avg, skim() returns:
n_missing: The number of missing values.
complete_rate: The proportion of non-missing (complete) values.
mean: The average.
sd: The standard deviation.
p0: The 0th percentile, that is, the value at which 0% of the observations are smaller than it ...