Debugging Chains

Explore various techniques for debugging pandas.

In this section, we’ll explore debugging chains of operations on DataFrames or Series. Almost universally, pandas code is a bit messy, and a long chain can look like more of the same. We get it. Still, chaining produces less code. And because pandas is an in-memory library that works by copying data anyway, the objection that a chain makes extra copies is a moot point. That leaves the debugging complaint, so let’s address it.

We’re going to work through a “tweak” function that cleans up the fuel economy data.

Here is our tweak function:

# Import pandas library
import pandas as pd
# Read vehicles.csv file from GitHub and store it in a DataFrame named autos
autos = pd.read_csv('https://github.com/mattharrison/datasets/raw/'
                    'master/data/vehicles.csv.zip')
# Define a function to convert a datetime column to a specified timezone
def to_tz(df_, time_col, tz_offset, tz_name):
    return (df_
        .groupby(tz_offset)
        [time_col]
        .transform(lambda s: pd.to_datetime(s)
            .dt.tz_localize(s.name, ambiguous=True)
            .dt.tz_convert(tz_name))
    )
# Define a function to tweak the autos DataFrame
def tweak_autos(autos):
    # Define a list of columns to keep
    cols = ['city08', 'comb08', 'highway08', 'cylinders',
            'displ', 'drive', 'eng_dscr', 'fuelCost08',
            'make', 'model', 'trany', 'range', 'createdOn',
            'year']
    # Return a modified DataFrame with the specified columns and modifications
    return (autos
        [cols]
        .assign(cylinders=autos.cylinders.fillna(0).astype('int8'),
                displ=autos.displ.fillna(0).astype('float16'),
                drive=autos.drive.fillna('Other').astype('category'),
                automatic=autos.trany.str.contains('Auto'),
                speeds=autos.trany.str.extract(r'(\d)+').fillna('20')
                    .astype('int8'),
                offset=autos.createdOn
                    .str.extract(r'\d\d:\d\d ([A-Z]{3}?)')
                    .replace('EDT', 'EST5EDT'),
                str_date=(autos.createdOn.str.slice(4,19) + ' ' +
                          autos.createdOn.str.slice(-4)),
                createdOn=lambda df_: to_tz(df_, 'str_date',
                                            'offset', 'America/New_York'),
                ffs=autos.eng_dscr.str.contains('FFS')
               )
        .astype({'highway08': 'int8', 'city08': 'int16',
                 'comb08': 'int16', 'fuelCost08': 'int16',
                 'range': 'int16', 'year': 'int16',
                 'make': 'category'})
        .drop(columns=['trany', 'eng_dscr'])
    )
# Print the tweaked autos DataFrame
print(tweak_autos(autos))

Say we come across this tweak_autos function, and we want to understand what it does. First of all, realize that it’s written like a recipe, step by step:

  • Pull out the columns listed in cols.
  • Create various columns (assign).
  • Convert column types (astype).
  • Drop extra columns that are no longer needed after we’ve created new columns from them (drop).
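To make the recipe easier to see, here is a minimal sketch of the same four steps on a tiny, made-up DataFrame. The toy data and the name tweak_toy are hypothetical stand-ins for illustration only; they aren’t part of the lesson’s dataset or code.

```python
import pandas as pd

# Hypothetical toy data standing in for the vehicles dataset
toy = pd.DataFrame({
    'make': ['Ford', 'Honda', 'Ford'],
    'city08': [19, 28, 15],
    'cylinders': [6.0, None, 8.0],
    'trany': ['Automatic 4-spd', 'Manual 5-spd', 'Automatic 6-spd'],
    'unused': ['a', 'b', 'c'],
})

def tweak_toy(df):
    cols = ['make', 'city08', 'cylinders', 'trany']
    return (df
        [cols]                                             # 1. pull out columns
        .assign(                                           # 2. create columns
            cylinders=df.cylinders.fillna(0).astype('int8'),
            automatic=df.trany.str.contains('Auto'),
        )
        .astype({'city08': 'int16', 'make': 'category'})   # 3. convert types
        .drop(columns=['trany'])                           # 4. drop leftovers
    )

result = tweak_toy(toy)
print(result.dtypes)
```

Each line of the chain corresponds to one bullet in the recipe, which is what lets us inspect the steps one at a time.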

Those who don’t support chaining say there’s no way to debug this. We have a few ways to debug the chain. The first is by using comments. We comment out all of the operations and then go through them one at a time. This comes in really handy to ...
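As a rough sketch of the commenting technique, the example below runs a small chain with its last two steps commented out, so the result shows the data as it stands after assign. The toy DataFrame and the name tweak_toy are hypothetical, chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical toy data standing in for the vehicles dataset
toy = pd.DataFrame({
    'make': ['Ford', 'Honda', 'Ford'],
    'city08': [19, 28, 15],
    'cylinders': [6.0, None, 8.0],
    'trany': ['Automatic 4-spd', 'Manual 5-spd', 'Automatic 6-spd'],
})

def tweak_toy(df):
    cols = ['make', 'city08', 'cylinders', 'trany']
    return (df
        [cols]
        .assign(
            cylinders=df.cylinders.fillna(0).astype('int8'),
            automatic=df.trany.str.contains('Auto'),
        )
        # Later steps are commented out while we inspect the result so far.
        # Uncomment them one at a time and rerun to move down the chain.
        # .astype({'city08': 'int16', 'make': 'category'})
        # .drop(columns=['trany'])
    )

partial = tweak_toy(toy)
print(partial)
print(partial.dtypes)
```

Because the chain is wrapped in parentheses, Python accepts comment lines between the steps, so turning a step on or off is a one-character change.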