Merge 2—Implementation

Learn how to use merge() for implementing joins on pandas objects.

The merge() function

The pandas library provides full-featured, high-performance, in-memory JOIN operations highly similar to relational databases like SQL. The merge() function serves as the entry point for performing the standard database JOIN operations.

So what is the difference between merge and JOIN? JOIN refers to the SQL-like operations that combine rows from two or more DataFrames based on a related column. On the other hand, although conveying a similar meaning to JOIN, merge is essentially the name of the pandas function for performing various types of joins (i.e., merge()).

To reduce confusion, we’ll use JOIN (instead of merge) to describe the various methods of combining DataFrames and reserve the use of merge to the merge() function.

Note: The merge() function can be applied to both DataFrames and named Series objects. However, we’ll focus primarily on DataFrames because named Series objects can easily be structured as single-column DataFrames.

As an example, a basic line of code for an INNER JOIN of two DataFrames, df1 and df2, on a common key (the 'id' column) looks like this:

Press + to interact
# Perform merge of two DataFrames with inner join
df_output = pd.merge(df1, df2,
how='inner',
on='id')

The above example shows that the how parameter specifies the JOIN we want to perform. The table summarizes the options for how in the merge() function along with the matching SQL ...

Get hands-on with 1400+ tech skills courses.