Dataset References

Learn more about some important datasets that are useful for building data visualizations in ggplot2.

Reference Datasets for ggplot2 visualizations

The R datasets package is one of several packages maintained by the R Core Team and included with the base R installation. By calling the data() function without any arguments, we can list the datasets available in all currently attached packages, including the built-in ones.

data()

Let’s look at the available built-in datasets in the ggplot2 package using the code below:

data(package="ggplot2")

Note: We can replace ggplot2 in the above command with any installed package (for example, MASS) to list the datasets that ship with that package.
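The listing returned by data() can also be captured as a regular object instead of only being read in the viewer. The following is a minimal sketch that uses the datasets package, since it is always installed with base R. The variable names ds and catalog are illustrative and not part of the lesson.

```r
# data() returns an object whose `results` element is a character matrix
# with the columns "Package", "LibPath", "Item", and "Title".
ds <- data(package = "datasets")$results

# Keep only the dataset name and its short description.
catalog <- as.data.frame(ds[, c("Item", "Title")])

# Show the first ten entries of the catalog.
head(catalog, n = 10)
```

Swapping "datasets" for "ggplot2" or "MASS" should work the same way, provided that package is installed.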

Together, the base R installation and ggplot2 offer several useful built-in datasets. Let’s familiarize ourselves with some of them. We’ll load each dataset and print its first ten rows to get an idea of the variables it contains.

The mpg dataset

This is one of the most popular datasets in the data science community. The mpg dataset is a built-in dataset from the ggplot2 package. It consists of a subset of the fuel economy data provided by the EPA.

This dataset contains fuel economy data for 38 popular car models from 1999 to 2008.

Note: We can browse and download this dataset from the official website of the US Department of Energy.

library(ggplot2)  # mpg ships with ggplot2, so attach the package first
head(mpg, n = 10)
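Beyond the first rows, a few quick checks help confirm the description above. This is a small sketch that assumes ggplot2 is installed. The names n_models and year_range are illustrative.

```r
library(ggplot2)

# Number of rows (observations) and columns (variables) in mpg.
dim(mpg)

# Compact overview of every variable and its type.
str(mpg)

# Count the distinct car models covered by the dataset.
n_models <- length(unique(mpg$model))
n_models

# Earliest and latest model year present in the data.
year_range <- range(mpg$year)
year_range
```

The str() output is a convenient place to see which variables are categorical (such as class and drv) and which are numeric (such as cty and hwy) before mapping them to aesthetics in a plot.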

The mtcars dataset

The mtcars (Motor Trend Car Road Tests) dataset is another commonly used dataset for data science projects. This dataset provides the fuel consumption data collected for 32 automobiles and ten attributes of automotive ...