KDE Plots

Learn how to plot, design, and interpret KDE plots for data visualizations.

Overview

A kernel density estimation (KDE) plot shows an estimate of the probability density function of the data. KDE is a nonparametric method: it builds the estimate directly from the observations, so we don’t assume that the data follow any particular distribution.
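To make the idea concrete, here is a minimal sketch of a Gaussian KDE written from scratch. It is only an illustration of the general technique, not seaborn's implementation. The function name gaussian_kde, the sample values, and the bandwidth of 300 are all assumptions chosen for the example.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples`
    by averaging one Gaussian kernel per observation."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each observation contributes a bell-shaped bump centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical body masses in grams (illustrative values, not the penguins data).
masses = [3200, 3400, 3500, 3650, 3700, 3900, 4100, 4700, 5400, 6000]
density = gaussian_kde(masses, bandwidth=300)

# A valid density has total area 1; check with a simple Riemann sum.
step = 10
area = sum(density(x) * step for x in range(0, 10000, step))
print(round(area, 3))

# The estimate is higher where observations cluster.
print(density(3600) > density(6500))
```

No distribution is assumed anywhere in this sketch: the shape of the curve comes entirely from where the observations sit.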

Univariate KDE plot

We have already imported the required libraries and saved the penguins dataset in the DataFrame penguins_df (after removing the null values). We call the sns.kdeplot() function and pass the variable 'body_mass_g' as x to see its distribution as a smooth curve. The plot below peaks between 3,000 and 4,000, so most penguins have a body_mass_g in that range (the column name indicates grams), and very few fall in the 6,000–7,000 range.

sns.kdeplot(data=penguins_df, x='body_mass_g')
plt.savefig('output/graph.png')

The bandwidth of a KDE plot can be adjusted with the bw_adjust parameter, which defaults to 1 and multiplicatively scales the automatically chosen bandwidth. Bandwidth acts as a smoothing factor: values below 1 produce spikier curves, whereas values above 1 produce smoother curves.

sns.kdeplot(data=penguins_df, x='body_mass_g', bw_adjust=0.08)  # low bandwidth
plt.savefig('output/graph.png')
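The smoothing effect can also be checked numerically. The sketch below is a rough illustration that assumes a hand-written Gaussian KDE and made-up sample values. It counts the local peaks of the estimated curve for a narrow and a wide bandwidth. Note that it sets the bandwidth directly, whereas bw_adjust in seaborn scales a bandwidth that is chosen automatically. The helper names kde_curve and count_peaks are hypothetical.

```python
import math

def kde_curve(samples, bandwidth, grid):
    """Evaluate a Gaussian KDE of `samples` at every point in `grid`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def count_peaks(curve):
    """Count grid points that are strictly higher than both neighbours."""
    return sum(
        1
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    )

# Hypothetical body masses in grams (illustrative values, not the penguins data).
masses = [3200, 3400, 3500, 3650, 3700, 3900, 4100, 4700, 5400, 6000]
grid = range(2000, 7501, 10)

narrow = count_peaks(kde_curve(masses, 30, grid))   # small bandwidth: spiky
wide = count_peaks(kde_curve(masses, 600, grid))    # large bandwidth: smooth
print(narrow, wide)
```

With the narrow bandwidth, nearly every observation produces its own spike, so the peak count is high. With the wide bandwidth, the bumps merge into a much smoother curve with far fewer peaks.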

It’s important to be careful about the bandwidth ...