Confidence Interval for the Median

Learn, calculate, and compare the confidence interval of the median using the classical and bootstrapping approaches.

We'll cover the following...

Conventional approach
Bootstrap approach
Bootstrapped vs. conventional confidence intervals
Summary

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Assumes x_mean, x_lower, x_upper, x_boot_mean, mean_of_x_boot_means, x_boot_lower,
# x_boot_upper, and the helper some_aesthetics() are already defined.
sns.displot(x_boot_mean, bins=50, kde=False, height=5, aspect=3, alpha=0.5) # distribution plot
# Sample mean and its conventional CI
plt.axvline(x = np.mean(x_mean), lw=8, color='red',label='The Mean')
plt.axvline(x = x_lower, lw=5, color='red', alpha=0.5,label='Conventional CI \nestimate for mean') # CI lower
plt.axvline(x = x_upper, lw=5, color='red', alpha=0.5) # CI upper
# Mean of bootstrapped means (the mean of means) and its bootstrapped CI
plt.axvline(mean_of_x_boot_means, lw=4, ls='dashed', color='yellow',label='Bootstrapped Mean')
plt.axvline(x_boot_lower, lw=2, ls='dashed', color='yellow',label='Bootstrapped CI \nestimate for mean') # CI lower
plt.axvline(x_boot_upper, lw=2, ls='dashed', color='yellow') # CI upper
# Text annotations next to the vertical lines
plt.text(np.mean(x_mean)+0.3,80,'The Mean/s',rotation=90,verticalalignment='center',weight="bold")
plt.text(x_lower-0.7,70,'2.5th percentile/s',rotation=90,verticalalignment='center')
plt.text(x_upper+0.5,70,'97.5th percentile/s',rotation=90,verticalalignment='center')
plt.title("Distribution plot of bootstrapped means and the\n computed CIs (95%) for the mean/s.\n\n")
plt.ylim(0,100)
plt.xlim(100,130)
some_aesthetics('Bootstrapped Means','Frequency/Count')

Both the mean and the median are often presented as descriptive statistics. The median is the central value of the data: the value for which we expect half of the (possible or observed) values to be smaller and the other half to be larger. The bootstrap becomes especially helpful when we need to quantify our uncertainty around statistics that have no straightforward formula, or whose formulas rest on unreasonably strict assumptions. The median is one such statistic.

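The standard error (S.E.) of the median is ~25% greater than the standard error of the mean, and the formula can be written as:

$$S.E._{\text{median}} = 1.2533 \times S.E._{\text{mean}} = 1.2533 \times \frac{s}{\sqrt{n}}$$

The above equation is a function of the mean's standard error (S.E.) and uses a multiplier of 1.2533 (approximately $\sqrt{\pi/2}$). Here, $s$ is the sample standard deviation and $n$ is the number of observations. Furthermore, it requires these assumptions to work: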

$n$ is large (a large number of observations).
The sample of measurements is drawn from a normally distributed population.
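
To make the conventional approach concrete, here is a minimal sketch that turns the 1.2533 multiplier described above into a 95% interval. The function name `conventional_median_ci`, the `z = 1.96` critical value, and the simulated sample are illustrative assumptions, not part of the lesson's own code.

```python
import numpy as np

def conventional_median_ci(x, z=1.96):
    """Approximate CI for the median using S.E.(median) = 1.2533 * S.E.(mean).

    Only reasonable when n is large and the population is roughly normal.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    se_mean = x.std(ddof=1) / np.sqrt(n)  # standard error of the mean
    se_median = 1.2533 * se_mean          # multiplier from the text
    median = np.median(x)
    return median - z * se_median, median, median + z * se_median

# Hypothetical, normally distributed sample (an assumption for illustration)
rng = np.random.default_rng(0)
sample = rng.normal(loc=115, scale=10, size=500)

lower, median, upper = conventional_median_ci(sample)
print(f"median = {median:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric around the sample median by construction, which is one reason it can mislead when the data are skewed.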

What if these assumptions are impractical? The second assumption is strict—many distributions are not normal. The median is much more helpful when we suspect a non-normally distributed population. The mean, median, and mode of a normal distribution ...
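
When the normality assumption is doubtful, a percentile bootstrap is one way to get an interval for the median without the formula. The sketch below is a hedged illustration: the function name `bootstrap_median_ci`, the number of resamples, and the skewed simulated sample are all assumptions made for the example.

```python
import numpy as np

def bootstrap_median_ci(x, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median (no normality assumption)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Draw n_boot resamples of size len(x), with replacement
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot_medians = np.median(x[idx], axis=1)  # one median per resample
    lower, upper = np.percentile(
        boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return lower, np.median(x), upper

# Hypothetical right-skewed sample (an assumption for illustration)
rng = np.random.default_rng(1)
skewed = rng.exponential(scale=10, size=300)

lower, median, upper = bootstrap_median_ci(skewed)
print(f"median = {median:.2f}, 95% bootstrap CI = ({lower:.2f}, {upper:.2f})")
```

Unlike the conventional interval, the percentile interval does not have to be symmetric around the sample median, so it can follow the skew in the data.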
