The merge
function in Pandas joins a data frame or a series with another data frame or series. The join operation is similar to that in databases. Merging is based on column headings or indexes.
The syntax of the merge
function is as follows:
DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
The table below describes the parameters:
Parameter | Description |
---|---|
right |
Refers to the object to merge with. Can be either a data frame or series. |
how |
Type of merge to be performed. Options include ‘left’, ‘right’, ‘outer’, ‘inner’, and ‘cross’. It is ‘inner’ by default. |
on |
The column name(s) to merge on. It should be present in both data frames. If nothing is provided, the merge is performed on all columns common in the two data frames. |
left_on |
Column name(s) to join on in the left data frame. |
right_on |
Column name(s) to join on in the right data frame. |
left_index |
The index or column name(s) in the left data frame to be used as the join key(s). |
right_index |
The index or column name(s) in the right data frame to be used as the join key(s). |
sort |
Sort the join keys lexicographically in the resultant data frame. It is False by default. |
suffixes |
Added to the end of common columns after the merge. By default, ‘_x’ is used for columns from the first data frame and ‘_y’ for columns from the second data frame. |
copy |
To create a copy of the data frame. It is True by default. |
indicator |
If True , adds a column to the output data frame called “_merge” with information on the source of each row. It is False by default. |
validate |
Validates the type of the merge. It is an optional parameter. |
We can use the how
parameter to specify the type of merge. There are several different forms of merges based on the database join operation:
left: Use all columns from the left data frame and the common ones between the left and right data frame.
right: Use all columns from the right data frame and the common ones between the left and right data frame.
outer: Use the union of all columns present in both the data frames.
inner: Use the intersection (common) of columns in both the data frames.
cross: Creates a Cartesian product of both data frames.
The illustration below summarizes different forms of merge operations:
The merge
function returns a data frame with the merged values.
The code snippet below shows how the merge
function can be used in Pandas:
import pandas as pdimport numpy as npdf1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],'value': [1, 2, 3, 5]})df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],'value': [5, 6, 7, 8]})print("First Dataframe")print(df1)print('\n')print("Second Dataframe")print(df2)print('\n')# Default merge-Suffixes are added by defaultprint("Merged Dataframe")print(df1.merge(df2, left_on='lkey', right_on='rkey'))print('\n')# Custom suffixesprint("Dataframe with Suffixes added")print(df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=('_left', '_right')))print('\n')
The following example uses different types of merges:
import pandas as pdimport numpy as npdf1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})print("First Dataframe")print(df1)print('\n')print("Second Dataframe")print(df2)print('\n')# Inner mergeprint("Inner merge")print(df1.merge(df2, how='inner', on='a'))print('\n')# Left mergeprint("Left merge")print(df1.merge(df2, how='left', on='a'))print('\n')# Right mergeprint("Right merge")print(df1.merge(df2, how='right', on='a'))print('\n')# Outer mergeprint("Outer merge")print(df1.merge(df2, how='outer'))print('\n')
Free Resources