Join Data Frames

Learn how to join two data frames so that related information ends up in a single, more descriptive data frame.

Another common data transformation task is joining or merging two different datasets. For example, in the flights data frame, the variable carrier lists the carrier code for the different flights. While the corresponding airline names for UA and AA might be somewhat easy to guess (United and American Airlines), which airlines have the codes VX, HA, and B6? This information is provided in a separate data frame for airlines.

print(airlines)

We see that in airlines, carrier is the carrier code, while name is the full name of the airline company. Using this table, we can see that VX, HA, and B6 correspond to Virgin America, Hawaiian Airlines, and JetBlue, respectively. However, wouldn’t it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by joining the flights and airlines data frames.
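To make the idea concrete, here is a minimal sketch of such a join. It makes two assumptions that are not part of the lesson: it uses small toy data frames (flights_toy and airlines_toy, with made-up flight numbers) instead of the real flights and airlines data, and it uses base R's merge() function, which may not be the join function the lesson itself goes on to use.

```r
# Toy stand-ins for the flights and airlines data frames.
# The flight numbers are hypothetical; the carrier codes and
# airline names are the ones mentioned in the text.
flights_toy <- data.frame(
  flight  = c(1, 2, 3, 4),
  carrier = c("VX", "HA", "B6", "VX"),
  stringsAsFactors = FALSE
)
airlines_toy <- data.frame(
  carrier = c("VX", "HA", "B6"),
  name    = c("Virgin America", "Hawaiian Airlines", "JetBlue"),
  stringsAsFactors = FALSE
)

# Match rows of the two data frames on the shared carrier column,
# so each flight gains the full airline name.
flights_joined <- merge(flights_toy, airlines_toy, by = "carrier")
print(flights_joined)
```

Each row of flights_toy is paired with the row of airlines_toy that has the same carrier code, so the result has one row per flight plus a name column. Note that merge() sorts the result by the matching column by default, so the row order may differ from the original.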

Note that the values in the variable carrier in the flights data frame match the values in the variable carrier in the airlines data frame. In this case, we can use the variable carrier as a key variable to match the rows of the two data frames. The key variables are almost always identification variables that uniquely identify ...