Histograms and Density Plots

Learn about graphical methods to analyze the basic properties of your data: the histogram and the density plot.

Visual inspection remains one of the most powerful ways to understand data. We can get a graphical intuition of the moments of the distribution of our data by selecting the right type of chart. Of all the possible charts that we could choose to explore our data, one will almost always be an excellent choice: the histogram. In this lesson, we go a bit deeper into this type of chart, together with its counterpart, the density plot.

Histograms

The histogram is the most common way to represent the unconditional distribution of a sampled random variable. Its beauty lies in its simplicity. At a glance, we can locate the center of the data (the mean), gauge its spread (the variance), judge the symmetry of its tails, and spot any potential outliers.
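As a rough companion to reading these properties off a chart, the sketch below computes the corresponding numbers with pandas. The sample values are invented for illustration and are not the lesson's dataset.

```python
import pandas as pd

# Hypothetical sample of daily average temperatures (°C), invented for illustration.
sample = pd.Series([12.1, 13.4, 15.0, 15.2, 16.8, 17.1, 17.9, 18.3, 21.5, 24.0])

mean = sample.mean()      # center of the distribution
variance = sample.var()   # spread (sample variance, ddof=1)
skewness = sample.skew()  # symmetry: a positive value indicates a longer right tail

print(f"mean={mean:.2f}, variance={variance:.2f}, skewness={skewness:.2f}")
```

A histogram of the same sample should tell the same story visually: where the bars are centered, how widely they spread, and which tail is longer.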

The idea behind a histogram is easy to grasp: The values in our sample are grouped into intervals, which we call bins. The histogram will then show how many data points fall within each of these bins.
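To make the binning step concrete, here is a minimal sketch that counts points per bin with NumPy's numpy.histogram, without drawing anything. The sample values, the number of bins, and the range are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sample of daily average temperatures (°C), invented for illustration.
sample = np.array([12.1, 13.4, 15.0, 15.2, 16.8, 17.1, 17.9, 18.3, 21.5, 24.0])

# Split the range [12, 24] into 4 equal-width bins and count the points in each.
counts, edges = np.histogram(sample, bins=4, range=(12, 24))

# Every bin is half-open [left, right), except the last one, which also
# includes its right edge.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} points")
```

The heights of the bars in a histogram of this sample are exactly these counts.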

But don’t be fooled! Histograms may be easy to interpret, but generating a good one is not a quick one-liner. It requires some thought and a lot of trial and error. In the following section, we’ll see why.

Creating histograms

We concluded the previous section by saying that generating a good histogram is not a quick one-liner. While that is true, it is also true that we can get a histogram in one line of Python code. The classic way to create one is with the Matplotlib library and, in particular, its pyplot module.

The matplotlib.pyplot.hist function’s only required argument is an array or sequence of arrays, x. We’ll go into optional arguments a bit further down. The pandas library also allows us to plot a histogram of a series by calling the method hist. Under the hood, pandas uses Matplotlib by default, so we’ll get essentially the same result one way or another. In fact, with default settings, the only difference that we’ll see between the two is that the pandas implementation draws a grid behind the bars, whereas matplotlib.pyplot.hist does not. Let’s use Matplotlib to plot a histogram of our average temperatures:
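The lesson's temperature dataset is not reproduced here, so the following sketch uses synthetic values as a stand-in. The series name avg_temp, the distribution parameters, and the output file name are all assumptions. It draws the same data with Matplotlib and with pandas side by side, using Axes.hist, the method that matplotlib.pyplot.hist wraps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show() instead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the lesson's data: 365 hypothetical daily
# average temperatures (°C). Not the real dataset.
rng = np.random.default_rng(seed=0)
avg_temp = pd.Series(rng.normal(loc=15, scale=8, size=365), name="avg_temp")

fig, (ax_mpl, ax_pd) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Matplotlib: x is the only required argument; the default is 10 equal-width bins.
counts, edges, patches = ax_mpl.hist(avg_temp)
ax_mpl.set_title("Matplotlib hist")

# pandas: Series.hist draws through Matplotlib and enables the grid by default.
avg_temp.hist(ax=ax_pd)
ax_pd.set_title("pandas Series.hist")

fig.savefig("temperature_histograms.png")
```

Both panels should show the same bars. With default settings, the visible difference is the grid in the pandas panel.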
