Theory-Based Hypothesis Tests

Learn about theory-based hypothesis tests.

We now present an example of a traditional theory-based method to conduct hypothesis tests. This is similar to what we did when we showed a theory-based method for constructing confidence intervals that involved mathematical formulas. This method relies on probability models, probability distributions, and a few assumptions to construct the null distribution. This is in contrast to the approach we’ve been using throughout the course, where we relied on computer simulations to construct the null distribution.

These traditional theory-based methods have been used for decades, mostly because researchers didn’t have access to computers that could run thousands of calculations quickly and efficiently. Now that computing power is much cheaper and more accessible, simulation-based methods are more feasible. However, researchers in many fields continue to use theory-based methods, so we make it a point to include an example here.

As we’ll show, any theory-based method is ultimately an approximation to the simulation-based method. The theory-based method we’ll focus on is known as the two-sample t-test for testing differences in sample means. However, the test statistic we’ll use won’t be the difference in sample means $\bar{x}_1 - \bar{x}_2$, but rather the related two-sample t-statistic. The data we’ll use will once again be the movies_sample data of action and romance movies.
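To make the idea of "approximation" concrete, here is a minimal sketch that runs both kinds of test on the same data and compares the p-values. It rests on several assumptions: the ratings are simulated with `rnorm()` and are not the movies_sample data, the theory-based test is base R's `t.test()` (Welch's version by default), and the simulation-based test is a hand-rolled permutation test in base R rather than the workflow used elsewhere in the course.

```r
set.seed(76)  # arbitrary seed so the sketch is reproducible

# Hypothetical ratings for two groups (NOT the movies_sample data)
action  <- rnorm(32, mean = 5.3, sd = 1.4)
romance <- rnorm(36, mean = 6.3, sd = 1.6)

# Theory-based approach: two-sample t-test (Welch's version by default)
theory <- t.test(action, romance)

# Simulation-based approach: permutation test on the difference in means
ratings  <- c(action, romance)
n_action <- length(action)
obs_diff <- mean(action) - mean(romance)

perm_diffs <- replicate(5000, {
  shuffled <- sample(ratings)  # shuffle ratings across the two groups
  mean(shuffled[1:n_action]) - mean(shuffled[-(1:n_action)])
})

# Two-sided p-value: share of shuffled differences at least as extreme
perm_p <- mean(abs(perm_diffs) >= abs(obs_diff))

theory$p.value  # theory-based p-value
perm_p          # simulation-based p-value
```

With samples of this size, the two p-values would typically land close to each other, which is the sense in which the theory-based method approximates the simulation-based one.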

Two-sample t-statistic

A common task in statistics is the process of standardizing a variable. By standardizing different variables, we make them more comparable. For example, say we’re interested in studying the distribution of temperature recordings from Portland, Oregon, USA, and comparing it to that of the temperature recordings in Montreal, Quebec, Canada. US temperatures are generally recorded in degrees Fahrenheit, and Canadian temperatures are generally recorded in degrees Celsius. How can we make them comparable? One approach is to convert degrees Fahrenheit into degrees Celsius or vice versa. Another approach is to convert them both to a common standardized scale, like kelvins.
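The two approaches in the temperature example can be sketched in a few lines of R. The readings below are made up for illustration, and the helper names `fahrenheit_to_celsius()` and `celsius_to_kelvin()` are hypothetical, not functions from any package.

```r
# Hypothetical helpers using the standard conversion formulas
fahrenheit_to_celsius <- function(f) (f - 32) * 5 / 9
celsius_to_kelvin     <- function(c) c + 273.15

# Made-up readings, for illustration only
portland_f <- c(50, 68, 86)  # degrees Fahrenheit
montreal_c <- c(10, 20, 30)  # degrees Celsius

# Approach 1: convert one city's readings to the other's scale
portland_c <- fahrenheit_to_celsius(portland_f)

# Approach 2: convert both cities to a common scale (kelvins)
portland_k <- celsius_to_kelvin(portland_c)
montreal_k <- celsius_to_kelvin(montreal_c)

portland_c
portland_k
montreal_k
```

Once both sets of readings are on the same scale, their distributions can be compared directly.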

One common method for standardizing a variable from probability and statistics theory is to compute the $z$-score:

$$z = \frac{x - \mu}{\sigma}$$

Here, $x$ represents one value of a variable, $\mu$ represents the mean of that variable, and $\sigma$ represents the standard deviation of that variable. We first subtract the mean $\mu$ from each value of $x$ and then divide $x - \mu$ by the standard deviation $\sigma$. These operations have the effect of “re-centering” our variable around 0 and rescaling it so that its values are in what are known as standard units. Therefore, every value that our variable can take has a corresponding $z$-score that gives how many standard units away that value is from the mean $\mu$. The ...
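As a small illustration of the standardization described above, the following sketch computes $z$-scores for a made-up vector of values. Two assumptions are worth flagging: the data are hypothetical, and the sample mean from `mean()` and the sample standard deviation from `sd()` (which divides by $n - 1$) stand in for $\mu$ and $\sigma$.

```r
# Made-up values of a variable, for illustration only
temps <- c(50, 55, 61, 48, 66)

mu    <- mean(temps)  # sample mean, standing in for mu
sigma <- sd(temps)    # sample standard deviation, standing in for sigma

# Subtract the mean, then divide by the standard deviation
z <- (temps - mu) / sigma
z

# After standardizing, the values are centered at 0 with standard deviation 1
mean(z)
sd(z)

# Base R's scale() performs the same centering and rescaling
z_scaled <- as.numeric(scale(temps))
all.equal(z, z_scaled)
```

Each entry of `z` gives how many standard units the corresponding value lies above (positive) or below (negative) the mean.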
