Merging Data

This lesson explains how to merge different data sets.

Merge

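To merge the rows of two DataFrames based on a column they have in common, use the pandas merge(df1, df2) function. By default, it returns a new DataFrame that keeps only the rows whose values in the common column(s) appear in both DataFrames, together with the columns of both. merge() takes two DataFrames per call, so merging more than two means chaining calls.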

In short, two things need to be in common for the DataFrames to be merged:

  • The column names

  • At least some of the row values in those columns
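The two conditions above can be checked directly before merging. The following is a minimal sketch; the frames left and right and their value columns are illustrative names, not part of the lesson. It also suggests that naming the common column explicitly through the on parameter gives the same result as letting merge() detect it.

```python
import pandas as pd

# Illustrative frames (hypothetical names, not from the lesson)
left = pd.DataFrame({'pointer': ['A', 'B', 'C'], 'value_left': [0, 1, 2]})
right = pd.DataFrame({'pointer': ['B', 'C', 'D'], 'value_right': [3, 4, 5]})

# Condition 1: column names present in both DataFrames
shared_columns = left.columns.intersection(right.columns).tolist()
print("Shared columns:", shared_columns)

# Condition 2: values of the shared column present in both DataFrames
shared_values = sorted(set(left['pointer']) & set(right['pointer']))
print("Shared values:", shared_values)

# With no 'on' argument, merge() uses the shared column names as keys;
# passing on='pointer' states the same choice explicitly.
implicit = pd.merge(left, right)
explicit = pd.merge(left, right, on='pointer')
print(implicit.equals(explicit))
```

Passing on explicitly is a reasonable habit when DataFrames share several column names but only some of them should act as keys.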

import pandas as pd

# First DataFrame: 'pointer' is the column shared with df2
df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
print("First DataFrame")
print(df1)

# Second DataFrame: its 'pointer' column contains no 'A'
df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D'],
                    'value_df2': [6, 7, 8, 9]})
print("\nSecond DataFrame")
print(df2)

print("\nMerged DataFrame")
print(pd.merge(df1, df2))  # Merging two DataFrames on the common 'pointer' column

The two DataFrames have pointer as their common column, and within pointer the values B, C, and D occur in both. The output shows that the DataFrames are merged on those common values and that a new DataFrame is returned based on them. The A rows of df1 have no match in df2 and are dropped, while B, which occurs twice in each DataFrame, produces four rows, one for each pairing.
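To see how repeated values in the common column affect the result, the merged rows can be counted per pointer value. This is a small sketch that reuses the lesson's df1 and df2; the names merged and rows_per_key are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D'],
                    'value_df2': [6, 7, 8, 9]})

merged = pd.merge(df1, df2)

# Number of merged rows for each 'pointer' value
rows_per_key = merged['pointer'].value_counts().to_dict()
print(rows_per_key)

# Total rows in the merged DataFrame
print(len(merged))
```

Each occurrence of a value in df1 is paired with each occurrence of the same value in df2, so the count for a value is the product of its counts in the two DataFrames.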

This is the most basic type of merge, known as an inner merge, and it is what merge() performs by default. There are three more ways to merge DataFrames, each selected by passing specific parameters to the merge() function.
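As a rough preview, the sketch below compares merge types through the how parameter of merge(). It assumes the three additional types referred to are the ones pandas calls 'left', 'right', and 'outer'; the frames left and right are illustrative and not taken from the lesson.

```python
import pandas as pd

# Illustrative frames (hypothetical names, not from the lesson)
left = pd.DataFrame({'pointer': ['A', 'B', 'C'], 'value_left': [0, 1, 2]})
right = pd.DataFrame({'pointer': ['B', 'C', 'D'], 'value_right': [3, 4, 5]})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    # 'how' selects which DataFrame's key values are kept in the result
    merged = pd.merge(left, right, on='pointer', how=how)
    row_counts[how] = len(merged)
    print(f"\nhow='{how}'")
    print(merged)

print(row_counts)
```

Rows that have no partner in the other DataFrame are kept by the non-inner types, with NaN filling the columns that come from the other side.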

Left merge

The left merge returns a DataFrame, which has all rows of the DataFrame placed on ...
