...


Frequentist vs. Bayesian Statistics

Learn to distinguish between frequentist and Bayesian statistics through several examples coded in Python.

Types of statistics

Frequentist and Bayesian statistics are two approaches to statistical inference used to draw conclusions about a population based on sample data.

Frequentist statistics is based on repeated sampling, where the probability of an event is determined by the relative frequency of that event occurring in a large number of independent samples. In this approach, probability is treated as the long-run relative frequency of an event and is not associated with any individual outcome. In frequentist statistics, statistical inference is based on hypothesis testing, where a null hypothesis is assumed to be valid until sufficient evidence is found to reject it in favor of an alternative hypothesis.
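
To make the idea of probability as a long-run relative frequency concrete, here is a minimal sketch (not part of the lesson's worked example) that simulates repeated coin flips with an assumed true probability of heads and prints how the observed frequency settles toward that value as the number of trials grows:

# Simulating repeated coin flips to illustrate long-run relative frequency
import numpy as np

np.random.seed(42)
true_p = 0.5  # assumed probability of heads (illustrative value)
for n in [10, 100, 1000, 100000]:
    flips = np.random.binomial(n=1, p=true_p, size=n)  # 1 = heads, 0 = tails
    print(f"{n} flips -> relative frequency of heads = {flips.mean():.4f}")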

Bayesian statistics, on the other hand, is based on subjective probability, where the probability of an event is determined by an individual’s belief or degree of confidence in that event occurring. In this approach, probability is considered a measure of an individual’s uncertainty about an event and can be updated as new information becomes available. In Bayesian statistics, statistical inference is based on updating our prior beliefs about an event in light of new evidence, using Bayes’ theorem.
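
As a quick illustration of this updating process, the following sketch applies Bayes’ theorem to a made-up coin example: we start with a prior belief that a coin is biased toward heads and update that belief after observing a single flip that lands heads. All the probabilities below are assumptions chosen only for illustration:

# Updating a prior belief with Bayes' theorem (illustrative numbers only)
prior_biased = 0.5          # prior belief that the coin is biased toward heads
p_heads_if_biased = 0.7     # assumed P(heads | biased)
p_heads_if_fair = 0.5       # P(heads | fair)

# P(heads) under the current beliefs (the evidence term in Bayes' theorem)
p_heads = prior_biased * p_heads_if_biased + (1 - prior_biased) * p_heads_if_fair
# Posterior belief after observing one heads
posterior_biased = (prior_biased * p_heads_if_biased) / p_heads
print(f"Belief that the coin is biased after seeing heads: {posterior_biased:.3f}")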

Both frequentist and Bayesian statistics have their strengths and weaknesses, and which approach is used depends on the problem being addressed and the goals of the analysis. We'll see them in detail in subsequent sections.

Frequentist statistics

Frequentist statistics is a branch of statistics that uses probability theory and assumes that a fixed set of underlying probability distributions can explain all observations. It provides inferences and predictions about a population based on data from a sample. Frequentist statistics aims to infer the population parameters from the sample data. It uses maximum likelihood estimation (MLE), which estimates the parameters of a population from the sample data; confidence intervals, which provide a range of values in which the population parameter is likely to lie; and hypothesis testing, which determines whether a given hypothesis should be rejected based on the evidence from the sample data. Frequentist statistics has been widely used for many decades and is one of the most popular approaches to statistical inference. It can also calculate the probability of observing a particular outcome in the sample data given the population parameters.

Example

Let’s consider an example. The municipal corporation has assigned us the task of using statistics to suggest a standard door size for all public buildings. To do so, we first have to find the average height of the population in the municipality.


The ideal way is to collect the heights of all the people from the municipality and calculate their mean (average) height. However, this is not feasible. So let’s assemble a sample of 1000 people to find the population’s mean height. This is done in Python as follows:

# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
# Generating a sample of 1000 heights
np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)
# Printing the first few heights
print("The first five heights from the sample of 1000 people are as follows.")
print(pop_1000[:5])
figure(figsize=(8, 6), dpi=80)
plt.hist(pop_1000)
plt.xlabel("Heights in cm")
plt.ylabel("Frequency of Heights")
plt.title("Distribution of Heights for a sample of 1000 people")
plt.savefig('output/graph.png')

In the code above:

  • Lines 2–4: We import the required libraries.
  • Lines 6–7: We generate a sample of 1000 heights using a uniform distribution.
  • Lines 9–10: We print the first few heights from the sample.
  • Lines 11–16: We plot the distribution of the sample and save the figure.

From the sample generated above, we use MLE to calculate the mean height of the population. For simplicity, this works out to be the same as the sample mean. Because the mean estimated using MLE comes from a sample, we then use a confidence interval to provide a range of values in which the population mean is likely to lie.
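
As an optional aside, the following sketch checks this claim numerically on the same simulated heights: assuming a normal model for the heights, it scans candidate values of the mean and keeps the one that maximizes the log-likelihood, which coincides with the sample mean.

# Verifying that the maximum likelihood estimate of the mean equals the sample mean
import numpy as np
import scipy.stats as stats

np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)

candidates = np.linspace(160, 190, 3001)   # candidate values for the mean
sigma = pop_1000.std()                     # treat the spread as fixed for simplicity
log_likelihoods = [stats.norm.logpdf(pop_1000, loc=mu, scale=sigma).sum()
                   for mu in candidates]
mle_mean = candidates[np.argmax(log_likelihoods)]
print(f"MLE of the mean: {mle_mean:.2f}, sample mean: {pop_1000.mean():.2f}")

The lesson's own code below then computes the sample mean and constructs the confidence interval around it.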

# Importing the required libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
# Generating a sample of 1000 heights and computing its mean
np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)
population_mean = pop_1000.mean()
# Calculating the confidence interval
import scipy.stats as stats
alpha = 0.05
confidence_level = 1 - alpha
z_score = stats.norm.ppf(1 - (alpha / 2))
sample_std = stats.tstd(pop_1000)
margin_of_error = z_score * (sample_std / (len(pop_1000) ** 0.5))
confidence_interval = (population_mean - margin_of_error, population_mean + margin_of_error)
print(f"The average height of the population calculated using the sample is {population_mean:.2f}.\nHowever, we are not 100% sure about this. We are {confidence_level*100}% sure that the population mean lies between {confidence_interval[0]:.2f} and {confidence_interval[1]:.2f}.")

In the code above:

  • Lines 2–8: We import the required libraries, create the sample space, and compute the sample mean.
  • Lines 10–13: We import scipy.stats, set the significance level, and calculate the z-score.
  • Lines 14–17: We compute the sample standard deviation and the margin of error, construct the confidence interval, and print the result.
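
Hypothesis testing, the third frequentist tool mentioned earlier, can be sketched on the same sample. In the minimal example below, the hypothesized population mean of 174 cm is an arbitrary value chosen purely for illustration:

# A one-sample t-test against an arbitrary hypothesized mean of 174 cm
import numpy as np
import scipy.stats as stats

np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)

t_stat, p_value = stats.ttest_1samp(pop_1000, popmean=174)
alpha = 0.05
print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis that the population mean is 174 cm.")
else:
    print("Fail to reject the null hypothesis that the population mean is 174 cm.")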
...