Histogram and Density Plot

Learn how to create histograms and density plots with ggplot2.

Introduction to histograms in ggplot2

A histogram is a data visualization technique in statistics and data science that summarizes data by grouping data points into user-specified ranges, called bins. The height of each bar shows how many data points fall within that bin's range of values. Like bar graphs, histograms use rectangular blocks to display data.
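To make the idea of bins concrete, here is a minimal base R sketch that counts how many values fall into each range. The values and the bin edges are made up for illustration and are not taken from the lesson's dataset.

```r
# Illustrative values (hypothetical, chosen only for this sketch)
values <- c(32.1, 35.4, 36.7, 38.2, 39.5, 41.1,
            43.8, 45.0, 46.2, 48.9, 50.3, 55.6)

# Bin edges: three bins, each 10 units wide
breaks <- seq(30, 60, by = 10)

# Assign each value to a bin; right = FALSE makes the bins [30,40), [40,50), [50,60)
bins <- cut(values, breaks = breaks, right = FALSE)

# Count the values per bin; these counts are the bar heights of a histogram
counts <- table(bins)
print(counts)  # 5 values in [30,40), 5 in [40,50), 2 in [50,60)
```

Changing the bin edges changes the counts, which is why the choice of bins affects how a histogram looks.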

Note: Unlike the bars of a vertical bar chart, the bars of a histogram don’t have gaps between them.

Bar charts show the graphical comparison of variables (discrete or categorical), whereas a histogram depicts the frequency distribution of variables in a dataset. The x-axis variable for a histogram is continuous (i.e., numeric), whereas, for bar graphs, it is categorical. The dependent variable for the y-axis is numerical for both charts.
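The contrast can be sketched with the penguins dataset used later in this lesson: a categorical column goes to geom_bar(), and a continuous column goes to geom_histogram(). The choice of the species column, the object names, and the bins = 30 setting are assumptions made for this illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Categorical x (species): geom_bar() counts the rows in each category
bar_plot <- ggplot(penguins) +
  geom_bar(aes(x = species))

# Continuous x (bill_length_mm): geom_histogram() counts the rows in each bin.
# na.rm = TRUE silently drops rows with a missing bill length.
hist_plot <- ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm), bins = 30, na.rm = TRUE)
```

Printing either object, for example with print(bar_plot), draws the chart. In both charts, the y-axis shows a count.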

Histograms are used to examine how the values of a dataset variable are distributed across intervals and roughly where central values such as the median lie, which could be of interest to organizations for decision-making. They also help spot outliers or gaps in the dataset.

Note: A histogram commonly uses the frequency of occurrences observed in the data for plotting. If required, it is possible to use a percentage of the total or density instead of frequency for building a histogram.
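As a hedged sketch of the alternatives mentioned in the note, ggplot2's after_stat() helper (available in ggplot2 3.3.0 and later) can map the y-axis to the computed density or to a proportion of the total instead of the raw count. The object names and the bins = 30 setting are assumptions made for this illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Density on the y-axis: the bar areas sum to 1.
# na.rm = TRUE silently drops rows with a missing bill length.
density_plot <- ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm, y = after_stat(density)),
                 bins = 30, na.rm = TRUE)

# Proportion of the total on the y-axis: the bar heights sum to 1
proportion_plot <- ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm, y = after_stat(count / sum(count))),
                 bins = 30, na.rm = TRUE)
```

The shape of the histogram is the same in both cases; only the scale of the y-axis changes.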

This lesson assumes you have a basic understanding of histograms and their use in data analysis. Next, let’s understand how to build a histogram in ggplot2.

Basic histogram in ggplot2

Let’s start by making a basic histogram. We’ll load the palmerpenguins package, which provides the penguins dataset, and print the first few rows of this dataset with the code below:

library(palmerpenguins)
head(penguins)
  • Line 1: We use the library() function to load the required palmerpenguins R package.
  • Line 2: We use the head() function to print the first few rows of the penguins dataset.

Now, with the ggplot2 package loaded (library(ggplot2)), we’ll use this dataset to plot the distribution of penguin bill lengths.

ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm))
  • Line 1: We initialize a new ggplot object with the ggplot() function and pass the name of the penguins dataset. Using the + operator, we add a layer to the ggplot object.
  • Line 2:
...