Frequentist vs. Bayesian Statistics
In this lesson, we distinguish between frequentist and Bayesian statistics using several coding examples in Python.
Types of statistics
Frequentist and Bayesian statistics are two approaches to statistical inference, both used to draw conclusions about a population based on sample data.
Frequentist statistics is based on repeated sampling, where the probability of an event is determined by the relative frequency of that event occurring in a large number of independent samples. In this approach, probability is interpreted as the long-run relative frequency of an event and is not necessarily associated with any individual event. In frequentist statistics, statistical inference is based on hypothesis testing, where a null hypothesis is assumed to be true until sufficient evidence is found to reject it in favor of an alternative hypothesis.
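The hypothesis-testing workflow can be sketched with a short, hypothetical example: we test whether a population's mean height equals 170 cm based on a simulated sample. The sample data, the null value of 170 cm, and the 0.05 significance level are all illustrative assumptions, not part of this lesson's running example.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of 50 height measurements in cm (illustrative only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=172, scale=8, size=50)

# Null hypothesis: the population mean height is 170 cm
t_stat, p_value = stats.ttest_1samp(sample, popmean=170)

# Reject the null hypothesis only if the p-value falls
# below the chosen significance level
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f}: reject the null hypothesis.")
else:
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis.")
```

Note that the conclusion is phrased as "fail to reject" rather than "accept": in the frequentist framework, a large p-value only means the data provide insufficient evidence against the null hypothesis.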
Bayesian statistics, on the other hand, is based on subjective probability, where the probability of an event is determined by an individual's belief or degree of confidence in that event occurring. In this approach, probability is considered a measure of an individual's uncertainty about an event and can be updated as new information becomes available. In Bayesian statistics, statistical inference is based on updating our prior beliefs about an event in light of new evidence, using Bayes' theorem.
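This updating step can be illustrated with a minimal, self-contained sketch of Bayes' theorem. The coin scenario and all the numbers below (the prior of 0.8 and the two likelihoods) are hypothetical values chosen purely for illustration.

```python
# Two hypotheses about a coin, with subjective prior beliefs
prior = {"fair": 0.8, "biased": 0.2}

# Likelihood of observing heads under each hypothesis
likelihood_heads = {"fair": 0.5, "biased": 0.7}

# Bayes' theorem: P(H | data) = P(data | H) * P(H) / P(data)
evidence = sum(likelihood_heads[h] * prior[h] for h in prior)
posterior = {h: likelihood_heads[h] * prior[h] / evidence for h in prior}

# After observing heads, belief shifts slightly toward the biased coin
print(posterior)
```

The posterior then serves as the prior for the next observation, so beliefs are refined incrementally as evidence accumulates.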
Both frequentist and Bayesian statistics have their strengths and weaknesses, and which approach is used depends on the problem being addressed and the goals of the analysis. We'll see them in detail in subsequent sections.
Frequentist statistics
Frequentist statistics is a branch of statistics that uses probability theory and assumes that a fixed set of underlying probability distributions can explain all observations. It provides inferences and predictions about a population based on data from a sample. Frequentist statistics aims to infer the population parameters from the sample data, using tools such as maximum likelihood estimation (MLE) and confidence intervals.
Example
Let’s consider an example. The municipal corporation has assigned us the task of using statistics to suggest a standard door size for all public buildings. To do so, we first have to find the average height of the population in the municipality. This is shown in the illustration below.
The ideal way is to collect the heights of all the people from the municipality and calculate their mean (average) height. However, this is not feasible. So let’s assemble a sample of 1000 people to find the population’s mean height. This is done in Python as follows:
```python
# Importing the required modules
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

# Sampling the heights of 1000 people
np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)

# Printing the result
print("A sample from the heights of a thousand people is as follows.")
print(pop_1000[:5])

# Plotting the distribution of the sample
figure(figsize=(8, 6), dpi=80)
plt.hist(pop_1000)
plt.xlabel("Heights in cm")
plt.ylabel("Frequency of Heights")
plt.title("Distribution of Heights for a sample of 1000 people")
plt.savefig('output/graph.png')
```

In the code above:

- Lines 2–4: We import the required libraries.
- Lines 7–8: We generate the sample of 1000 heights using a uniform distribution.
- Lines 11–12: We print the first few heights from the sample.
- Lines 15–20: We plot the distribution of the sample space.
From the data of the sample space generated above, we use maximum likelihood estimation (MLE) to calculate the mean height for the population. For simplicity, this is the same as the sample mean. Because the mean height calculated using MLE comes from a sample, we then use a confidence interval to provide a range of values in which the population mean is likely to lie.
```python
# Importing the required modules
import numpy as np
import scipy.stats as stats

# Sampling the heights of 1000 people
np.random.seed(42)
pop_1000 = np.round(np.random.uniform(low=160, high=190, size=(1000,)), 2)
sample_mean = pop_1000.mean()

# Calculating the confidence interval
alpha = 0.05
confidence_level = 1 - alpha
z_score = stats.norm.ppf(1 - (alpha / 2))
sample_std = stats.tstd(pop_1000)
margin_of_error = z_score * (sample_std / (len(pop_1000) ** 0.5))
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)

print(f"The average height of the population calculated using the sample is {sample_mean:.2f}.\n"
      f"However, we are not 100% sure about this. We are {confidence_level * 100:.0f}% sure that "
      f"the value of the population mean is between {confidence_interval[0]:.2f} and {confidence_interval[1]:.2f}.")
```

In the code above:

- Lines 2–3: We import the required libraries.
- Lines 6–8: We create the sample space and compute the sample mean, which is the MLE of the population mean.
- Lines 11–13: We set the significance level and calculate the z-score.
- Lines 14–16: We estimate the margin of error, which is used to compute the confidence interval.
- Lines 18–20: We print the estimated mean along with the confidence interval.
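As a supplementary sketch (not part of the lesson's original code), we can also see the frequentist logic at work by watching the margin of error shrink as the sample size grows; the sample sizes below are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as stats

np.random.seed(42)
alpha = 0.05
z_score = stats.norm.ppf(1 - alpha / 2)

# The margin of error shrinks roughly as 1 / sqrt(n)
margins = []
for n in [100, 1000, 10000]:
    sample = np.random.uniform(low=160, high=190, size=n)
    margin = z_score * stats.tstd(sample) / (n ** 0.5)
    margins.append(margin)
    print(f"n = {n:>6}: margin of error = {margin:.3f} cm")
```

This is why larger samples yield narrower confidence intervals: the estimate of the population mean becomes more precise as more independent observations are collected.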