What is the merge function in Pandas?

The merge function in Pandas joins a data frame or a series with another data frame or series. The join operation is similar to that in databases. Merging is based on column headings or indexes.

Syntax

The syntax of the merge function is as follows:

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Parameters

The table below describes the parameters:

Parameter	Description
`right`	Refers to the object to merge with. Can be either a data frame or series.
`how`	Type of merge to be performed. Options include ‘left’, ‘right’, ‘outer’, ‘inner’, and ‘cross’. It is ‘inner’ by default.
`on`	The column name(s) to merge on. It should be present in both data frames. If nothing is provided, the merge is performed on all columns common in the two data frames.
`left_on`	Column name(s) to join on in the left data frame.
`right_on`	Column name(s) to join on in the right data frame.
`left_index`	The index or column name(s) in the left data frame to be used as the join key(s).
`right_index`	The index or column name(s) in the right data frame to be used as the join key(s).
`sort`	Sort the join keys lexicographically in the resultant data frame. It is `False` by default.
`suffixes`	Added to the end of common columns after the merge. By default, ‘_x’ is used for columns from the first data frame and ‘_y’ for columns from the second data frame.
`copy`	To create a copy of the data frame. It is `True` by default.
`indicator`	If `True`, adds a column to the output data frame called “_merge” with information on the source of each row. It is `False` by default.
`validate`	Validates the type of the merge. It is an optional parameter.

Different types of merge

We can use the how parameter to specify the type of merge. There are several different forms of merges based on the database join operation:

left: Use all columns from the left data frame and the common ones between the left and right data frame.
right: Use all columns from the right data frame and the common ones between the left and right data frame.
outer: Use the union of all columns present in both the data frames.
inner: Use the intersection (common) of columns in both the data frames.
cross: Creates a Cartesian product of both data frames.

The illustration below summarizes different forms of merge operations:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
      'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
      'value': [5, 6, 7, 8]})
print("First Dataframe")
print(df1)
print('\n')
print("Second Dataframe")
print(df2)
print('\n')
# Default merge-Suffixes are added by default
print("Merged Dataframe")
print(df1.merge(df2, left_on='lkey', right_on='rkey'))
print('\n')
# Custom suffixes
print("Dataframe with Suffixes added")
print(df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right')))
print('\n')

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
print("First Dataframe")
print(df1)
print('\n')
print("Second Dataframe")
print(df2)
print('\n')
# Inner merge
print("Inner merge")
print(df1.merge(df2, how='inner', on='a'))
print('\n')
# Left merge
print("Left merge")
print(df1.merge(df2, how='left', on='a'))
print('\n')
# Right merge
print("Right merge")
print(df1.merge(df2, how='right', on='a'))
print('\n')
# Outer merge
print("Outer merge")
print(df1.merge(df2, how='outer'))
print('\n')

Free Resources

Learn in-demand tech skills in half the time

PRODUCTS

Mock Interview

New

Courses

Skill Paths

Projects

Assessments

What is the merge function in Pandas?

Syntax

Parameters

Different types of merge

Return value

Examples