How to merge DataFrames using the merge() method in Python

The merge() method in Python is used to combine two or more DataFrames based on their common columns. The merge() method works similar to the JOIN operations in SQL. Merging multiple DataFrames is useful when we want to visualize different chunks of information from different DataFrames all in one place. It enables us to bring information together for further processing and analysis.

Syntax

pd.merge(left, right, how='inner', on=None)
Syntax of the merge() method

Parameters

The merge() method takes the following parameters:

  • left: This is a DataFrame.

  • right: This is another DataFrame. The left DataFrame is merged with the right DataFrame and a new DataFrame is returned.

Note: Nothing happens to the left and right DataFrames, i.e., they do not change.

  • how: This parameter specifies the type of merge to be performed. The following are the types of merges available within the merge() method:

    • left: This type of merge returns all the rows from the left DataFrame. It also returns the rows from the right DataFrame that are in common with those of the left DataFrame. If there are no rows in the right DataFrame that are in common (rows with similar values) with those of the left DataFrame, NaN is returned. The merge is performed on a common column or columns.

    Note: The number of rows and columns within the merged DataFrame may vary, i.e., it can increase, decrease, or stay the same. The type of merge being performed determines whether the number of rows and columns will change.

    • right: This type of merge returns all the rows from the right DataFrame. It also returns the rows from the left DataFrame that are in common with those of the right DataFrame. If there are no rows in the left DataFrame that are in common with those of the right DataFrame, NaN is returned. The merge is performed on a common column or columns.
    • outer: This type of merge returns all the rows from the left and right DataFrames. This also includes the rows from the left and right DataFrames that are in common with each other. If no common rows are found, NaN is returned. The merge is performed on a common column or columns.
    • inner: This is the default merge type. It returns only the rows that are common between the left and right DataFrames. If there are no common rows, an empty DataFrame is returned. The merge is performed on a common column or columns.
    • cross: This type of merge returns the Cartesian product of the rows from the left and right DataFrames, i.e., it combines and returns each row from the left DataFrame with each row from the right DataFrame. We don’t need to specify any common column or columns when using this merge type.
  • on: This parameter represents the column or columns to perform the merge on. These columns should be present in both the left and right DataFrames. Its default value is None, i.e., no common column or columns are used to perform the merge() operation.

Merge types
Merge types

Code example

Let’s look at an example where we will create two DataFrames and then perform the merge() operation on them.

import pandas as pd
left_dataframe = pd.DataFrame({
'id': [1, 2, 3, 4],
'name': ['Leo', 'Jacob', 'James', 'Mason'],
'age': [18, 20, 23, 19],
'course': ['english', 'persian', 'arts', 'chemistry']
})
print('')
right_dataframe = pd.DataFrame({
'id': [1, 2, 3, 4],
'name': ['William', 'Lucas', 'Henry', 'Elio'],
'age': [18, 21, 26, 25],
'course': ['english', 'persian', 'Literature', 'Physics']
})
print('')
print('Left data frame')
print(left_dataframe)
print('')
print('Right data frame')
print(right_dataframe)
print('')
print('Left merge')
print(pd.merge(left_dataframe, right_dataframe, how='left', on='course'))
print('')
print('Right merge')
print(pd.merge(left_dataframe, right_dataframe, how='right', on='course'))
print('')
print('Outer merge')
print(pd.merge(left_dataframe, right_dataframe, how='outer', on='course'))
print('')
print('Inner merge')
print(pd.merge(left_dataframe, right_dataframe, how='inner', on=['age', 'course']))
print('')
print('Cross merge')
print(pd.merge(left_dataframe, right_dataframe, how='cross'))
print('')

Code explanation

  • Line 1: We import the required library, i.e., pandas.

  • Lines 3–8: We create the left DataFrame and name it left_dataframe. This DataFrame has columns named id, name, age, and course.

  • Lines 11–16: We create the right DataFrame and name it right_dataframe. This DataFrame has columns named id, name, age, and course.

  • Line 20: We print the left_dataframe object.

  • Line 24: We print the right_dataframe object.

  • Line 28: We merge the left_dataframe with the right_dataframe using the left merge method. We perform the merge() operation on the course column.

  • Line 32: We merge the left_dataframe with the right_dataframe using the right merge method. We perform the merge() operation on the course column.

  • Line 36: We merge the left_dataframe with the right_dataframe using the outer merge method. We perform the merge() operation on the course column.

  • Line 40: We merge the left_dataframe with the right_dataframe using the inner merge method. We perform the merge() operation on the age and course columns.

  • Line 44: We merge the left_dataframe with the right_dataframe using the cross merge method. As discussed previously, we do not need to provide any common column or columns when using this merge type.

Test yourself

Let’s test out understanding of the merge() method in Python by solving this multiple-choice question.

Q

Imagine we work for a supermarket with two separate databases—one for customer information and another for orders. The customer database includes customer IDs, names, addresses, and other details, while the order database includes order IDs, customer IDs, product details, and order dates. Which method would we use to get a list of all the customers who have placed orders?

A)

left merge

B)

right merge

C)

inner merge

D)

outer merge

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved