Histogram and Density Plot
Learn how to create histograms and density plots with ggplot2.
Introduction to histograms in ggplot2
A histogram is a data visualization technique used in statistics and data science that summarizes a numeric variable by grouping its data points into user-specified ranges, called bins. The height of each bar shows how many data points fall within that bin's range of values, so we can easily read off how the data are spread across those ranges. Like bar graphs, histograms use rectangular bars to display data.
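To make the binning idea concrete, here is a minimal base R sketch. The values and bin edges are made up for illustration and do not come from any dataset. cut() assigns each value to a bin, and table() counts the values in each bin. Those counts are the bar heights a histogram would draw.

```r
# Hypothetical example values, made up for illustration
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.6, 5.9)

# Assign each value to a bin: (1,2], (2,3], (3,4], (4,5], (5,6]
# (cut() uses right-closed intervals by default)
bins <- cut(x, breaks = 1:6)

# Number of data points in each bin = height of each histogram bar
counts <- table(bins)
print(counts)  # counts per bin: 1, 2, 3, 1, 1
```

Every data point lands in exactly one bin, so the bar heights always add up to the number of observations.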
Note: Unlike the bars of a bar chart, adjacent bars of a histogram have no gaps between them, because the bins cover a continuous range of values.
Bar charts compare values across the categories of a discrete or categorical variable, whereas a histogram depicts the frequency distribution of a continuous variable in a dataset. So the x-axis variable of a histogram is continuous (i.e., numeric), whereas for a bar chart it is categorical. For both charts, the y-axis variable is numeric.
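As a rough sketch of this difference in ggplot2, assuming the ggplot2 and palmerpenguins packages are installed (the penguins dataset is introduced properly later in this lesson): geom_bar() counts the penguins in each category of the categorical species column, while geom_histogram() splits the continuous bill_length_mm column into adjacent numeric bins. The binwidth = 2 value is an arbitrary choice for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Bar chart: one bar per category of a categorical variable (species)
bar_chart <- ggplot(penguins, aes(x = species)) +
  geom_bar()

# Histogram: a continuous variable split into adjacent numeric bins
# (binwidth = 2 mm is an arbitrary illustrative choice)
histogram <- ggplot(penguins, aes(x = bill_length_mm)) +
  geom_histogram(binwidth = 2)

print(bar_chart)
print(histogram)  # warns that 2 rows with missing bill lengths were removed
```

The bar chart has one bar per species, while the number of histogram bars depends on the bin width chosen for the numeric variable.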
Histograms show how a variable's values are distributed across intervals: where most values fall, whether the distribution is symmetric or skewed, and roughly where its center (such as the median) lies. This information can be of interest to organizations for decision-making. Histograms also help spot outliers or gaps in the dataset.
Note: A histogram commonly plots the frequency, i.e., the number of observations in each bin. If required, a histogram can instead show each bin's percentage of the total or its density.
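The three scales are related by simple formulas. For a bin with count c, out of n observations in total and with bin width w, the percentage is 100 × c / n and the density is c / (n × w), so the bar areas sum to 1. Here is a minimal base R sketch that checks this using hist() with plot = FALSE, which returns the counts and densities without drawing anything. The values are made up for illustration.

```r
# Hypothetical example values, made up for illustration
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.6, 5.9)

# Compute the histogram without plotting it, using bins of width 1
h <- hist(x, breaks = 1:6, plot = FALSE)

n <- length(x)
bin_width <- diff(h$breaks)

frequency    <- h$counts                   # number of points per bin
percentage   <- 100 * h$counts / n         # share of all points per bin
density_vals <- h$counts / (n * bin_width) # scaled so bar areas sum to 1

print(data.frame(mid = h$mids, frequency, percentage, density_vals))
```

In ggplot2, the density scale can be requested by mapping aes(y = after_stat(density)) inside geom_histogram().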
This lesson assumes you have a basic understanding of histograms and their use in data analysis.
Next, let’s understand how to build a histogram in ggplot2.
Basic histogram in ggplot2
Let’s start by making a basic histogram. We’ll load the palmerpenguins package to access the penguins dataset and print a few rows of this dataset with the code below:
library(palmerpenguins)
head(penguins)
- Line 1: We use the library() function to load the required palmerpenguins R package.
- Line 2: We use the head() function to print the first few rows of the penguins dataset.
Now, we’ll use this dataset to plot the distribution of the penguins’ bill lengths (the bill_length_mm column).
ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm))
- Line 1: We initialize a new ggplot object with the ggplot() function and pass it the penguins dataset. Using the + operator, we add a layer to the ggplot object.
- Line 2: