How to Draw a Histogram Plot

In this lesson, we will learn how to represent the distribution of numerical data using a histogram.

The histogram is an important graph in statistics and data analysis. It helps people quickly understand the distribution of data. To draw a histogram, we follow the steps outlined below:

  • Step 1: Bin the range of your data.
  • Step 2: Divide the entire range of values into their corresponding bins.
  • Step 3: Count how many values fall into each bin.
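
The three steps above can be sketched by hand with NumPy. This is only an illustration: the sample values and the choice of four bins are assumptions made for the example, and in practice hist() does this work for us.

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
values = np.array([1.0, 2.5, 3.0, 4.2, 4.8, 6.1, 7.7, 8.0, 9.5, 10.0])

# Steps 1 and 2: split the range of the data into 4 equal-width bins.
num_bins = 4
edges = np.linspace(values.min(), values.max(), num_bins + 1)

# Step 3: count the values in each bin. Every bin is half-open,
# [left, right), except the last one, which also includes its right edge.
counts = np.zeros(num_bins, dtype=int)
for i in range(num_bins):
    left, right = edges[i], edges[i + 1]
    if i == num_bins - 1:
        in_bin = (values >= left) & (values <= right)
    else:
        in_bin = (values >= left) & (values < right)
    counts[i] = in_bin.sum()

# np.histogram follows the same convention, so the counts should agree.
np_counts, np_edges = np.histogram(values, bins=num_bins)
print(edges)
print(counts)
```

Here the edges come out as 1, 3.25, 5.5, 7.75, and 10, and the counts as 3, 2, 2, and 3.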

What is hist()?

The function in Matplotlib that we can use to draw a histogram is hist(). Below are some of the important parameters that we may need:

  • x: Our input values, either a single list/array or multiple sequences of arrays.
  • bins: If bins is an integer, it defines the number of equal-width bins within the range. If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin.
  • histtype: Sets the style of the histogram. The default value is bar. step generates an unfilled line plot, and stepfilled generates a filled line plot.
  • density: Either True or False. The default is False. If True, the histogram is normalized to form a probability density, so the total area of the bars is 1.
  • cumulative: Either True or -1. If True, each bin gives the count in that bin plus all bins for smaller values. If -1, the direction of accumulation is reversed.
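
To make the bins and cumulative parameters more concrete, here is a minimal sketch. The sample values and bin edges are hypothetical, picked so the counts are easy to check by hand, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values, chosen so the counts are easy to verify.
values = np.array([0.5, 1.5, 1.7, 2.2, 2.9, 3.1, 3.3, 3.8])

fig, axes = plt.subplots(ncols=3)

# bins as an integer: 4 equal-width bins between the min and max.
n_int, edges_int, _ = axes[0].hist(values, bins=4)

# bins as a sequence: these are the bin edges themselves (widths may differ).
n_seq, edges_seq, _ = axes[1].hist(values, bins=[0, 1, 2, 4])

# cumulative=True: each bin holds its own count plus all bins to its left.
n_cum, _, _ = axes[2].hist(values, bins=[0, 1, 2, 4], cumulative=True)

print(n_seq)  # counts per bin
print(n_cum)  # running totals
plt.close(fig)
```

With these values, the per-bin counts are 1, 2, and 5, and the cumulative version gives 1, 3, and 8.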

Plotting a histogram by using hist()

Below is a simple example of a histogram, where we pass 2,000 random data points to hist() at line 7.

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(42)
data = rng.randn(2000)
fig, axe = plt.subplots(dpi=800)
axe.hist(data)
fig.savefig("output/img.png")
plt.close(fig)

Changing the style of the histogram

The image and code below demonstrate how different parameters can affect the style of a histogram.

Line 8 changes the number of bins. Line 10 normalizes the histogram to form a probability density. Line 12 changes the color of the histogram to red. Line 14 changes the histtype to step.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(2000)
fig, axe = plt.subplots(nrows=2, ncols=2, dpi=800)
plt.tight_layout()

axe[0][0].hist(data, bins=30)
axe[0][0].set_title("set bins=30")
axe[0][1].hist(data, density=True)
axe[0][1].set_title("normalized")
axe[1][0].hist(data, color="r")
axe[1][0].set_title("set color as red")
axe[1][1].hist(data, histtype='step')
axe[1][1].set_title("step")
fig.savefig("output/output.png")
plt.close(fig)

Drawing more than one histogram in a chart

Sometimes, we need to compare the distribution of data from different data sets. Drawing multiple histograms in the same chart can help us better understand the data. The following image shows three normally distributed sets of data:
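
One way such a chart could be drawn is sketched below. The means, standard deviations, sample sizes, and styling options are all assumptions made for illustration, not the settings behind the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(42)  # fixed seed so the figure is reproducible

# Three hypothetical normal samples with different means and spreads.
samples = {
    "mu=0, sigma=0.8": rng.normal(0, 0.8, 1000),
    "mu=-2, sigma=1": rng.normal(-2, 1, 1000),
    "mu=3, sigma=2": rng.normal(3, 2, 1000),
}

fig, axe = plt.subplots()
# Shared settings: transparency keeps overlapping areas visible, and
# density=True puts all three histograms on a comparable scale.
options = dict(bins=40, alpha=0.4, density=True, histtype="stepfilled")
for label, sample in samples.items():
    axe.hist(sample, label=label, **options)
axe.legend()
# fig.savefig("histograms.png")  # uncomment to write the image to disk
plt.close(fig)
```

Calling hist() several times on the same axes layers the histograms on top of each other, so a low alpha value is what keeps the ones underneath readable.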

Drawing a curve to fit the histogram

Sometimes, we need to draw a curve to fit the histogram. Drawing a curve requires some of the data that is returned by hist(). The following values are returned by hist():

  • n: The values of the histogram bins (counts, or densities when density is True).
  • bins: The edges of the bins.

There is a third return value, patches, but in this example we only need n and bins.
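
Before fitting a curve, it can help to look at what n and bins actually contain. The following is a small sketch under assumed settings (3,000 standard-normal samples, 40 bins, a fixed seed); the centers array is a helper introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
data = rng.normal(0, 1, 3000)

fig, axe = plt.subplots()
n, bins, patches = axe.hist(data, bins=40, density=True)

# There is one more edge than there are bins.
print(len(n), len(bins))

# With density=True, bar height times bar width sums to 1.
area = np.sum(n * np.diff(bins))
print(area)

# Midpoint of each bin, useful when a curve should line up with the bars.
centers = 0.5 * (bins[:-1] + bins[1:])
plt.close(fig)
```

Because bins holds edges rather than midpoints, a curve evaluated at bins has one more point than there are bars; evaluating it at centers is an alternative when the two need to match in length.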

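import matplotlib.pyplot as plt
import numpy as np

sigma = 1  # standard deviation of the normal distribution
mu = 0     # mean of the normal distribution
fig, axe = plt.subplots(dpi=800)
data = np.random.normal(mu, sigma, 3000)
# density=True normalizes the bars so they share a scale with the density curve
n, bins, _ = axe.hist(data, bins=40, density=True)
# Normal probability density evaluated at the bin edges:
# y = 1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma) ** 2)
y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
     np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
axe.plot(bins, y, '--', color='r')
fig.savefig("output/output.png")
plt.close(fig)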

Histograms help us visualize the distribution of data. If we want to know the proportion of categorical data in relation to an overall value, however, then the pie chart is what we would use.