Join Data Frames
Learn about joining data frames and how to join multiple data frames to get more descriptive data.
Another common data transformation task is joining or merging two different datasets. For example, in the flights data frame, the variable carrier lists the carrier code for the different flights. While the corresponding airline names for UA and AA might be somewhat easy to guess (United and American Airlines), which airlines have the codes VX, HA, and B6? This information is provided in a separate data frame, airlines.
print(airlines)
We see that in airlines, carrier is the carrier code, while name is the full name of the airline company. Using this table, we can see that VX, HA, and B6 correspond to Virgin America, Hawaiian Airlines, and JetBlue, respectively. However, wouldn't it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by joining the flights and airlines data frames.
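As a rough illustration of what such a join produces, the sketch below builds two small stand-in tables and combines them on the carrier column they share. The names flights_small, airlines_small, and flights_joined, the flight numbers, and the shortened airline names are hypothetical placeholders rather than the real data, and base R's merge() is used here only so the example runs without extra packages; the lesson itself may use a different join function.

```r
# Toy stand-ins for the lesson's data frames.
# The rows below are hypothetical, not taken from the real flights/airlines data.
flights_small <- data.frame(
  flight  = c(1545, 1714, 725, 1141, 11, 51),
  carrier = c("UA", "AA", "B6", "UA", "VX", "HA"),
  stringsAsFactors = FALSE
)

airlines_small <- data.frame(
  carrier = c("UA", "AA", "VX", "HA", "B6"),
  name    = c("United Airlines", "American Airlines", "Virgin America",
              "Hawaiian Airlines", "JetBlue"),
  stringsAsFactors = FALSE
)

# Match each flight to its airline using the shared column `carrier`.
# Every flight row gains the `name` of the airline with the same code.
flights_joined <- merge(flights_small, airlines_small, by = "carrier")

print(flights_joined)
```

Each flight keeps its own row, and the airline name is repeated for every flight with the same carrier code. If the dplyr package is loaded, inner_join(flights, airlines, by = "carrier") should perform the equivalent join on the full tables.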
Note that the values in the variable carrier in the flights data frame match the values in the variable carrier in the airlines data frame. In this case, we can use the variable carrier as a key variable to match the rows of the two data frames. The key variables are almost always identification variables that uniquely identify ...