How to compare two DataFrames in pandas

Overview

The compare method in pandas shows the differences between two DataFrames. It compares them element by element and presents the differing values side by side.

The compare method can only compare DataFrames that have identical row and column labels, and therefore the same shape. If the labels differ, it raises a ValueError.
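
The labeling requirement above can be seen in a small sketch. The DataFrames left and right are made up for illustration, and trimming both to their shared row labels is only one possible workaround, not something the compare method does for us.

```python
import pandas as pd

# Hypothetical data: "right" has one extra row, so the row labels differ.
left = pd.DataFrame({"Name": ["dom", "celeste"], "Age": [10, 14]})
right = pd.DataFrame({"Name": ["dom", "celeste", "abhi"], "Age": [10, 14, 17]})

try:
    left.compare(right)
    error_message = None
except ValueError as exc:
    # pandas refuses to compare DataFrames whose labels do not match.
    error_message = str(exc)

print(error_message)

# One possible workaround (an assumption, not part of compare itself):
# restrict both DataFrames to the row labels they share before comparing.
common = left.index.intersection(right.index)
aligned_diff = left.loc[common].compare(right.loc[common])
print(aligned_diff.empty)  # the shared rows are identical, so nothing differs
```

Whether trimming to shared labels is appropriate depends on the data. Rows that exist in only one DataFrame are silently left out of such a comparison.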

Syntax

DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False)

Parameters

The compare method accepts the following parameters:

  • other: This is the DataFrame to compare with.
  • align_axis: This determines how the result is laid out. With 1 (the default), the differing values from the two DataFrames are placed side by side in columns. With 0, they are stacked in alternating rows.
  • keep_shape: This is a boolean parameter with the default value False. By default, compare drops every row and column in which all values are equal in the two DataFrames. Setting this to True keeps all rows and columns.
  • keep_equal: This is another boolean parameter with the default value False. By default, compare shows positions that hold equal values in the two DataFrames as NaN. Setting this to True shows the actual values instead.
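
As a rough illustration of the align_axis parameter described above, the following sketch compares the same pair of DataFrames with both settings. The variable names by_columns and by_rows are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    [["dom", 10], ["chibuge", 15], ["celeste", 14]], columns=["Name", "Age"]
)
df1 = pd.DataFrame(
    [["dom", 11], ["abhi", 17], ["celeste", 14]], columns=["Name", "Age"]
)

# Default (align_axis=1): "self" and "other" appear as paired columns.
by_columns = df.compare(df1)
print(by_columns)

# align_axis=0: "self" and "other" appear as alternating rows instead.
by_rows = df.compare(df1, align_axis=0)
print(by_rows)

print(by_columns.shape, by_rows.shape)
```

Both results hold the same differences. The column layout tends to be easier to scan when few columns differ, while the row layout keeps the original column names intact.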

Example

import pandas as pd
data = [['dom', 10], ['chibuge', 15], ['celeste', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
data1 = [['dom', 11], ['abhi', 17], ['celeste', 14]]
df1 = pd.DataFrame(data1, columns=['Name', 'Age'])
print("Dataframe 1 -- \n")
print(df)
print("-"*5)
print("Dataframe 2 -- \n")
print(df1)
print("-"*5)
print("Dataframe difference -- \n")
print(df.compare(df1))
print("-"*5)
print("Dataframe difference keeping equal values -- \n")
print(df.compare(df1, keep_equal=True))
print("-"*5)
print("Dataframe difference keeping same shape -- \n")
print(df.compare(df1, keep_shape=True))
print("-"*5)
print("Dataframe difference keeping same shape and equal values -- \n")
print(df.compare(df1, keep_shape=True, keep_equal=True))

Explanation

  • Line 1: We import the pandas module.
  • Lines 2–3: We construct a pandas DataFrame called df from the list called data. df has two columns: Name and Age.
  • Lines 4–5: We construct another pandas DataFrame called df1 from the list called data1. df1 has the same two columns: Name and Age.
  • Lines 6–11: We print df and df1.
  • Line 13: We use compare to obtain the difference between df and df1. The last row is identical in both DataFrames, so it is dropped, and the equal Name values in the first row are shown as NaN.
  • Line 16: We call compare with keep_equal set to True. Equal values in the remaining rows are now shown instead of NaN. The identical last row is still dropped.
  • Line 19: We call compare with keep_shape set to True. The identical last row is no longer dropped, and its values are shown as NaN.
  • Line 22: We call compare with both keep_shape and keep_equal set to True. No row is dropped, and equal values are shown as they are instead of NaN.
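
The result of compare is itself a DataFrame, so it can be inspected in code rather than only printed. The sketch below rebuilds df and df1 from the example and looks at the result's two-level column labels. The names diff, age_changes, and changed_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["dom", 10], ["chibuge", 15], ["celeste", 14]], columns=["Name", "Age"]
)
df1 = pd.DataFrame(
    [["dom", 11], ["abhi", 17], ["celeste", 14]], columns=["Name", "Age"]
)

diff = df.compare(df1)

# Each original column is split into a "self" and an "other" sub-column.
print(diff.columns.tolist())

# Selecting the top-level label returns just that column's pair of values.
age_changes = diff["Age"]
print(age_changes)

# The index of the result lists the rows that differ somewhere.
changed_rows = diff.index.tolist()
print(changed_rows)
```

Here, "self" refers to the DataFrame on which compare is called (df) and "other" to the DataFrame passed in (df1).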
