EDA for a Numerical Explanatory Variable

Learn about analyzing numerical data to make observations that will help in regression.

For the numerical variables score (the teaching score) and bty_avg, the summary returns:

n_missing: This is the number of missing values.
complete_rate: This is the proportion of non-missing (complete) values, between 0 and 1.
mean: This is the average.
sd: This is the standard deviation.
p0: The 0th percentile is the value at which 0% of the observations are smaller than it (the minimum value).
p25: The 25th percentile is the value at which 25% of the observations are smaller than it (the 1st quartile).
p50: The 50th percentile is the value at which 50% of the observations are smaller than it (the 2nd quartile and more commonly called the median).
p75: The 75th percentile is the value at which 75% of the observations are smaller than it (the 3rd quartile).
p100: The 100th percentile is the value at which 100% of the observations are smaller than or equal to it (the maximum value).
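
The statistics listed above can be reproduced by hand, which makes it easier to see what each column means. The following is a minimal sketch in base R, not the lesson's own code: the data frame toy and the helper summarize_numeric() are hypothetical names, and the values are made up for illustration rather than taken from the course dataset.

```r
# Hypothetical toy data; these values are NOT from the course dataset.
toy <- data.frame(
  score   = c(4.7, 4.1, 3.9, 4.8, NA, 4.3),
  bty_avg = c(5.0, 3.0, 4.5, 7.0, 2.5, 6.0)
)

# Hypothetical helper: summarize one numeric vector with the
# statistics described in the list above.
summarize_numeric <- function(x) {
  # Percentiles are computed on the non-missing values only.
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1),
                na.rm = TRUE, names = FALSE)
  data.frame(
    n_missing     = sum(is.na(x)),      # count of missing values
    complete_rate = mean(!is.na(x)),    # proportion of non-missing values
    mean          = mean(x, na.rm = TRUE),
    sd            = sd(x, na.rm = TRUE),
    p0   = q[1],                        # minimum
    p25  = q[2],                        # 1st quartile
    p50  = q[3],                        # median
    p75  = q[4],                        # 3rd quartile
    p100 = q[5]                         # maximum
  )
}

# Apply the helper to every column and stack the results,
# giving one row per variable.
summary_tbl <- do.call(rbind, lapply(toy, summarize_numeric))
print(summary_tbl)
```

Note that complete_rate is computed as a proportion, so a column with one missing value out of six has a complete_rate of 5/6, not 5. If the skimr package is installed, calling skimr::skim(toy) should report the same columns for numeric variables.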

Looking at this output, we can see how the values of both variables are distributed. For example, the mean teaching score was 4.17 out of 5, whereas the ...
