Confidence Interval for the Median
Learn, calculate, and compare the confidence interval of the median using the classical and bootstrapping approaches.
In the case of estimating the confidence around the sample mean, the bootstrapping procedure may not be beneficial since our sample mean has nicer distributional properties, as shown in the following distribution plot of APM.
sns.displot(x_boot_mean, bins=50, kde=False, height=5, aspect=3, alpha=0.5)  # distribution plot

# x mean
plt.axvline(x=np.mean(x_mean), lw=8, color='red', label='The Mean')
plt.axvline(x=x_lower, lw=5, color='red', alpha=0.5, label='Conventional CI \nestimate for mean')  # CI lower
plt.axvline(x=x_upper, lw=5, color='red', alpha=0.5)  # CI upper

# Mean of bootstrapped means (the mean of means)
plt.axvline(mean_of_x_boot_means, lw=4, ls='dashed', color='yellow', label='Bootstrapped Mean')
plt.axvline(x_boot_lower, lw=2, ls='dashed', color='yellow', label='Bootstrapped CI \nestimate for mean')  # CI lower
plt.axvline(x_boot_upper, lw=2, ls='dashed', color='yellow')  # CI upper

# Text annotations for the mean and the percentile lines
plt.text(np.mean(x_mean) + 0.3, 80, 'The Mean/s', rotation=90, verticalalignment='center', weight="bold")
plt.text(x_lower - 0.7, 70, '2.5th percentile/s', rotation=90, verticalalignment='center')
plt.text(x_upper + 0.5, 70, '97.5th percentile/s', rotation=90, verticalalignment='center')

plt.title("Distribution plot of bootstrapped means and the\n computed CIs (95%) for the mean/s.\n\n")
plt.ylim(0, 100)
plt.xlim(100, 130)
some_aesthetics('Bootstrapped Means', 'Frequency/Count')
Both the mean and the median are often presented as descriptive statistics. The median is the central value of the data: a value for which we expect half of the (possible or observed) values to be smaller and the other half to be larger. The bootstrap becomes much more helpful when we need to quantify the uncertainty around statistics that have no straightforward formula, or whose formulas rest on unreasonably strict assumptions. The median is one such statistic.
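To make the idea concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the median. The function name `bootstrap_median_ci`, its parameters, and the log-normal sample are assumptions made for illustration; they are not the lesson's own code or data.

```python
import numpy as np

def bootstrap_median_ci(x, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the median (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Resample with replacement: each row holds the indices of one
    # bootstrap sample of the same size as the original data.
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boot_medians = np.median(x[idx], axis=1)
    # The CI bounds are the alpha/2 and 1 - alpha/2 percentiles
    # of the bootstrapped medians.
    lower, upper = np.percentile(
        boot_medians, [100 * alpha / 2, 100 * (1 - alpha / 2)]
    )
    return np.median(x), lower, upper

# Hypothetical right-skewed data, for illustration only.
rng = np.random.default_rng(42)
sample = rng.lognormal(mean=4.7, sigma=0.4, size=200)

med, lo, hi = bootstrap_median_ci(sample)
print(f"median = {med:.2f}, 95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
```

The percentile method is only one of several bootstrap interval constructions. It is used here because it needs no distributional assumption beyond the sample being representative of the population.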
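For normally distributed data, the standard error (S.E.) of the median is about 25% greater than the standard error of the mean, and the formula can be written as:
$$\text{S.E.}_{\text{median}} = 1.2533 \times \text{S.E.}_{\text{mean}} = 1.2533 \times \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation and $n$ is the number of observations.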
The above equation is a function of the mean's standard error (S.E.) and uses a constant multiplier of 1.2533, which is approximately $\sqrt{\pi/2}$. Furthermore, it requires these assumptions to work:
The sample size n is large (a large number of observations).
The sample of measurements is drawn from a normally distributed population.
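As a rough sketch of the classical approach described above, the snippet below computes the median's standard error from the mean's standard error and builds a normal-approximation interval around the sample median. The function name `classical_median_ci`, the critical value `z = 1.96` for a 95% interval, and the simulated normal sample are assumptions for illustration only.

```python
import numpy as np

def classical_median_ci(x, z=1.96):
    """Classical CI for the median, valid only under the stated
    assumptions (large n, normally distributed population)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    se_mean = x.std(ddof=1) / np.sqrt(n)   # S.E. of the mean: s / sqrt(n)
    se_median = 1.2533 * se_mean           # S.E. of the median
    med = np.median(x)
    # Normal-approximation interval: median +/- z * S.E.(median)
    return med, med - z * se_median, med + z * se_median, se_median

# Hypothetical normally distributed data, for illustration only.
rng = np.random.default_rng(0)
sample = rng.normal(loc=115, scale=20, size=500)

med, lower, upper, se_median = classical_median_ci(sample)
print(f"median = {med:.2f}, S.E. = {se_median:.3f}, "
      f"95% CI = [{lower:.2f}, {upper:.2f}]")
```

If the data are clearly non-normal, the interval from this formula should be treated with caution, which is the motivation for the bootstrap alternative.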
What if these assumptions are impractical? The second assumption is strict—many distributions are not normal. The median is much more helpful when we suspect a non-normally distributed ...