Checking if DataFrames are Equal

Explore various techniques for debugging pandas.

The first technique we’ll explore is checking whether two DataFrames are equal. This is especially useful after serializing and deserializing data and, unfortunately, is a little more difficult than it should be. The equals method returns a single Boolean that tells us whether two DataFrames are equal, but when it returns False, it gives no indication of where they differ, so diagnosing the problem is hard.
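
To make the limitation concrete, here is a minimal sketch that is separate from the Dirty Devil data. It assumes a toy DataFrame with hypothetical columns (when and cfs) and round-trips it through an in-memory CSV. The equals method reports that the two DataFrames differ but says nothing about why, so the sketch follows up with a manual dtype comparison, which is only one of several ways to start looking.

```python
import io

import pandas as pd

# Hypothetical toy data, used only for illustration.
original = pd.DataFrame({
    'when': pd.to_datetime(['2021-01-01', '2021-01-02']),
    'cfs': [1.5, 2.0],
})

# Serialize to CSV in memory, then read it back.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# equals only answers yes or no; here the answer is no.
print(original.equals(restored))

# One manual way to dig in: list the columns whose dtypes changed.
# read_csv does not parse dates unless asked, so 'when' is no longer datetime.
mismatched = [col for col in original.columns
              if original[col].dtype != restored[col].dtype]
print(mismatched)
```

In this sketch the values look the same when printed, yet the round trip changed the type of the when column, which is exactly the kind of difference that equals detects but does not explain.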

Let’s step through an example with our Dirty Devil data:

import pandas as pd

url = 'https://github.com/mattharrison/datasets/raw/master'\
    '/data/dirtydevil.txt'
df = pd.read_csv(url, skiprows=lambda num: num < 34 or num == 35,
                 sep='\t')


def to_denver_time(df_, time_col, tz_col):
    return (df_
        .assign(**{tz_col: df_[tz_col].replace('MDT', 'MST7MDT')})
        .groupby(tz_col)
        [time_col]
        .transform(lambda s: pd.to_datetime(s)
            .dt.tz_localize(s.name, ambiguous=True)
            .dt.tz_convert('America/Denver'))
    )


def tweak_river(df_):
    return (df_
        .assign(datetime=to_denver_time(df_, 'datetime', 'tz_cd'))
        .rename(columns={'144166_00060': 'cfs',
                         '144167_00065': 'gage_height'})
    )


dd = tweak_river(df)
print(dd)
...