Merging Data
This lesson explains how to merge different data sets.
Merge
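To merge the rows of two DataFrames based on a common column between them, use the pandas merge(df1, df2, ...) function. merge() takes exactly two DataFrames at a time, so merging more than two means chaining calls.
This returns a new DataFrame that contains the columns of both inputs, but only the rows whose values in the common column(s) appear in both DataFrames.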
In short, two things need to be in common for the DataFrames to be merged:
- The column names
- The row values in those columns
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
print("First DataFrame")
print(df1)

df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D'],
                    'value_df2': [6, 7, 8, 9]})
print("\nSecond DataFrame")
print(df2)

print("\nMerged DataFrame")
print('\n', pd.merge(df1, df2))  # Merging two DataFrames
The two DataFrames share pointer as their common column, and within pointer the values B, C, and D appear in both, as the printed DataFrames show.
The output shows that the two DataFrames are merged on the rows whose pointer values they share, and that a new DataFrame is returned from those rows. The rows of df1 with pointer A are dropped because A does not appear in df2, and each B row of df1 is paired with each B row of df2.
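The two conditions above can be checked directly. The sketch below reuses df1 and df2 from the example and names the key column explicitly with the on parameter of pd.merge, which should give the same result as the bare pd.merge(df1, df2) call here because pointer is the only shared column. The variable names common_columns and merged are illustrative, not part of the lesson.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D'],
                    'value_df2': [6, 7, 8, 9]})

# Condition 1: the column names the two DataFrames share.
common_columns = [col for col in df1.columns if col in df2.columns]
print(common_columns)  # ['pointer']

# Condition 2 is applied by the merge itself: only rows whose 'pointer'
# value appears in both DataFrames survive. Naming the key with on= makes
# the choice of column explicit instead of relying on the default.
merged = pd.merge(df1, df2, on='pointer')

print(merged.columns.tolist())               # ['pointer', 'value_df1', 'value_df2']
print(len(merged))                           # 6 (B: 2 x 2 pairs, C: 1, D: 1)
print(sorted(merged['pointer'].unique()))    # ['B', 'C', 'D'], no 'A'
```

The row order of the merged result can differ between pandas versions, so the checks here look at the row count and the set of key values rather than at row positions.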
This is the most basic type of merge, known as an inner merge, and it is what merge() does by default. There are three more ways to merge DataFrames, each selected by passing a specific parameter value to the merge() function.
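As a rough preview of how the merge type is selected, the sketch below passes different values to the how parameter of pd.merge. It assumes the three additional merge types the lesson has in mind are 'left', 'right', and 'outer', alongside the default 'inner'; the row_counts dictionary is only there to make the differences easy to compare.

```python
import pandas as pd

df1 = pd.DataFrame({'pointer': ['A', 'B', 'C', 'B', 'A', 'D'],
                    'value_df1': [0, 1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'pointer': ['B', 'C', 'B', 'D'],
                    'value_df2': [6, 7, 8, 9]})

row_counts = {}
for how in ['inner', 'left', 'right', 'outer']:
    # how= decides which DataFrame's keys are kept when a key is missing
    # on the other side; unmatched cells are filled with NaN.
    result = pd.merge(df1, df2, on='pointer', how=how)
    row_counts[how] = len(result)
    print(f"\nhow='{how}': {len(result)} rows")
    print(result)

print(row_counts)  # {'inner': 6, 'left': 8, 'right': 6, 'outer': 8}
```

With this data, 'left' and 'outer' keep the two A rows of df1 (with NaN in value_df2), while 'right' matches 'inner' because every pointer value in df2 also appears in df1.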
Left merge
The left merge returns a DataFrame that has all rows of the DataFrame placed on ...