Histograms and Probability Density Function

In this lesson, we will learn about representing data using histograms and probability density functions.

Representing data

One of the most common ways to represent a data set is to draw a histogram. For a histogram, you count how many data points fall within a certain interval, for example, how many data points lie between 5 and 6. These intervals are called bins. The bar graph of the number of data points in each bin is called a histogram. The function to compute and plot a histogram is called hist() and is part of the matplotlib.pyplot module of the matplotlib package. The simplest way of plotting a histogram is to let hist() decide what bins to use; the number of bins is set with the bins keyword argument and defaults to 10. In what follows, nbin denotes the number of bins.
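To make the counting concrete, here is a minimal sketch using a small made-up data set (the values and bin edges are illustrative assumptions, not part of the lesson). It counts the points between 5 and 6 by hand, then uses NumPy's histogram() function to count every bin at once without plotting anything.

```python
import numpy as np

# Hypothetical data, chosen only for illustration
data = [4.2, 5.1, 5.5, 5.9, 6.3, 6.8, 7.0]

# Count the data points in a single bin: 5 <= x < 6
in_bin = sum(1 for x in data if 5 <= x < 6)
print(in_bin)  # 3

# Count all bins at once; the edges 4, 5, 6, 7, 8 define four bins
counts, edges = np.histogram(data, bins=[4, 5, 6, 7, 8])
print(counts)  # [1 3 2 1]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 7.0 is counted in the bin from 7 to 8.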

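hist() also works out where to put the limits of the bins: by default, the bins have equal width and span the range from the smallest to the largest data value. The hist() function creates a histogram graph and returns a tuple of three items: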

  1. The first item is an array of length nbin with the number of data points in each bin.
  2. The second item is an array of length nbin+1 with the limits of the bins.
  3. The third item is a list of objects that represent the bars of the histogram; this is the least used item.

Let’s draw a histogram:
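The sketch below is one way to do this. The data set is an assumption made for illustration: 1000 random values drawn from a normal distribution with mean 5 and standard deviation 1, generated with a fixed seed so the result is repeatable.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example data: 1000 normally distributed values
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=5.0, scale=1.0, size=1000)

# Let hist() choose the bins (10 by default) and keep the returned tuple
counts, edges, bars = plt.hist(data)
plt.xlabel("value")
plt.ylabel("number of data points")

print(len(counts))        # 10 bin counts
print(len(edges))         # 11 bin limits
print(int(counts.sum()))  # 1000, every data point lands in some bin
```

When running this as a script, call plt.show() after these lines to display the figure; notebook environments usually render it automatically.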
