How to Draw a Histogram Plot
In this lesson, we will learn how to represent the distribution of numerical data using a histogram.
The histogram is an important graph in statistics and data analysis. It helps people quickly understand the distribution of data. To draw a histogram, we follow the steps outlined below:
- Step 1: Bin the range of your data.
- Step 2: Divide the entire range of values into their corresponding bins.
- Step 3: Count how many values fall into each bin.
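The three steps can be sketched by hand with NumPy before turning to any plotting function. This is only an illustration: the sample data, the seed, and the choice of 10 bins are assumptions made for the example, not values from the lesson.

```python
import numpy as np

# Hypothetical sample data: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
data = rng.standard_normal(1000)

# Step 1: determine the range of the data that will be binned.
low, high = data.min(), data.max()

# Step 2: divide that range into equal-width bins.
# 10 bins need 11 edges (the left edge of each bin plus the final right edge).
num_bins = 10
edges = np.linspace(low, high, num_bins + 1)

# Step 3: count how many values fall into each bin.
counts, _ = np.histogram(data, bins=edges)

print("bin edges:", edges)
print("counts per bin:", counts)
```

Because the edges span the full range of the data, every value lands in exactly one bin, so the counts add up to the number of data points.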
What is hist()?
The function in Matplotlib that we can use to draw a histogram is `hist()`. Below are some of the important parameters that we may need:
- `x`: Our input values, either a single list/array or multiple sequences of arrays.
- `bins`: If `bins` is an integer, it defines the number of equal-width bins within the range. If `bins` is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin.
- `histtype`: Sets the style of the histogram. The default value is `bar`. `step` generates a line plot that is unfilled by default. `stepfilled` generates a line plot that is filled by default.
- `density`: Set to True or False. The default is False. If True, the histogram is normalized to form a probability density.
- `cumulative`: Set to True or -1. If True, a histogram is computed where each bin gives the count in that bin plus all bins for smaller values. If -1, the direction of accumulation is reversed.
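To make these parameters concrete, here is a small sketch that passes `bins` both as an integer and as a sequence, and inspects the values `hist()` returns when `density` and `cumulative` are switched on. The data, the seed, the bin choices, and the `Agg` backend (used so that no display is needed) are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 2,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
data = rng.standard_normal(2000)

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(12, 3))

# bins as an integer: 20 equal-width bins, which means 21 bin edges.
counts, edges, _ = axes[0].hist(data, bins=20)

# bins as a sequence: explicit edges at -4, -3, ..., 4.
# density=True rescales the bars so that their total area is 1.
density, density_edges, _ = axes[1].hist(
    data, bins=np.arange(-4, 5), density=True, histtype="stepfilled"
)

# cumulative=True: each bin holds its own count plus the counts of all bins to its left.
cum_counts, _, _ = axes[2].hist(data, bins=20, cumulative=True, histtype="step")

print("number of edges for bins=20:", len(edges))
print("area under the density histogram:", (density * np.diff(density_edges)).sum())
print("last cumulative bin:", cum_counts[-1])

plt.close(fig)
```

The area of the density histogram is the sum of bar height times bar width, and the last cumulative bin equals the total number of data points.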
Plotting a histogram by using hist()
Below is a simple example of a histogram, where we pass 2,000 random data points to `hist()` at line 7.
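    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.RandomState(42)  # seeded generator, so the plot is reproducible
    data = rng.randn(2000)  # 2,000 samples from a standard normal distribution
    fig, axe = plt.subplots(dpi=800)
    axe.hist(data)  # draw the histogram with the default settings
    fig.savefig("output/img.png")
    plt.close(fig)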
Changing the style of the histogram
The image and code below demonstrate how different parameters can affect the style of a histogram.
- Line 8 changes the number of bins.
- Line 10 normalizes the histogram to form a probability density.
- Line 12 changes the `color` of the histogram to red.
- Line 14 changes the `histtype` to `step`.
    import matplotlib.pyplot as plt
    import numpy as np

    data = np.random.randn(2000)

    fig, axe = plt.subplots(nrows=2, ncols=2, dpi=800)

    axe[0][0].hist(data, bins=30)
    axe[0][0].set_title("set bins=30")
    axe[0][1].hist(data, density=True)
    axe[0][1].set_title("normalized")
    axe[1][0].hist(data, color="r")
    axe[1][0].set_title("set color as red")
    axe[1][1].hist(data, histtype='step')
    axe[1][1].set_title("step")
    fig.tight_layout()  # called after the titles are set so the layout leaves room for them
    fig.savefig("output/output.png")
    plt.close(fig)
Drawing more than one histogram in a chart
Sometimes, we need to compare the distribution of data from different data sets. Drawing multiple histograms in the same chart can help us better understand the data. The following image shows three normally distributed sets of data:
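A minimal sketch of overlaying three normally distributed data sets on one axes might look like the following. The means, standard deviations, seed, transparency, and the use of shared bin edges are all assumptions chosen for illustration, as is the `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical data sets with different means and spreads.
datasets = {
    "mean=-3, std=1.5": rng.normal(-3, 1.5, 2000),
    "mean=0, std=1": rng.normal(0, 1, 2000),
    "mean=3, std=0.8": rng.normal(3, 0.8, 2000),
}

# Compute one common set of 40 bins over all the data, so bars are directly comparable.
all_values = np.concatenate(list(datasets.values()))
edges = np.histogram_bin_edges(all_values, bins=40)

fig, axe = plt.subplots()
counts_by_label = {}
for label, values in datasets.items():
    # alpha < 1 keeps overlapping histograms visible through each other.
    counts, _, _ = axe.hist(values, bins=edges, alpha=0.5, label=label)
    counts_by_label[label] = counts
axe.legend()

plt.close(fig)
```

Calling `hist()` several times on the same axes draws each histogram on top of the previous one. Sharing the bin edges is a design choice here: without it, each data set would get its own bin widths and the bar heights would be harder to compare.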
Drawing a curve to fit the histogram
Sometimes, we need to draw a curve to fit the histogram. Drawing a curve requires some of the data that is returned by `hist()`. The following values are returned by `hist()`:
- `n`: The values of the histogram bins.
- `bins`: The edges of the bins.
There is another return value, `patches`, but in this example we only need `n` and `bins`.
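    import matplotlib.pyplot as plt
    import numpy as np

    sigma = 1  # standard deviation of the distribution
    mu = 0     # mean of the distribution
    fig, axe = plt.subplots(dpi=800)
    data = np.random.normal(mu, sigma, 3000)
    # density=True puts the bar heights on the same scale as a probability density
    n, bins, _ = axe.hist(data, bins=40, density=True)
    # evaluate the normal probability density at each bin edge
    y = ((1 / (np.sqrt(2 * np.pi) * sigma)) *
         np.exp(-0.5 * (1 / sigma * (bins - mu))**2))
    axe.plot(bins, y, '--', color='r')  # dashed red curve over the histogram
    fig.savefig("output/output.png")
    plt.close(fig)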
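For reference, the expression assigned to `y` in the code above is the probability density function of a normal distribution with mean `mu` and standard deviation `sigma`, evaluated at each bin edge `x`:

```latex
f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right)
```

Because the histogram is drawn with `density=True`, its bar heights are on the same scale as this curve, which is why the two can be compared on the same axes.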
Histograms help us visualize the distribution of data. If we want to know the proportion of categorical data in relation to an overall value, however, then the pie chart is what we would use.