Debugging Chains
Explore various techniques for debugging pandas.
In this section, we’ll explore debugging chains of operations on DataFrames or Series. Almost universally, pandas code is a bit messy. We get it. Chaining produces less code. Because pandas is an in-memory library that works by copying data, this argument is a moot point. Let’s address the debugging complaint.
We’re going to see a “tweak” function that analyzes the fuel economy data.
Here is our tweak function:
    # Import pandas library
    import pandas as pd

    # Read vehicles.csv file from GitHub and store it in a DataFrame named autos
    autos = pd.read_csv(
        'https://github.com/mattharrison/datasets/raw/'
        'master/data/vehicles.csv.zip')

    # Define a function to convert a datetime column to a specified timezone
    def to_tz(df_, time_col, tz_offset, tz_name):
        return (df_
            .groupby(tz_offset)
            [time_col]
            .transform(lambda s: pd.to_datetime(s)
                .dt.tz_localize(s.name, ambiguous=True)
                .dt.tz_convert(tz_name))
        )

    # Define a function to tweak the autos DataFrame
    def tweak_autos(autos):
        # Define a list of columns to keep
        cols = ['city08', 'comb08', 'highway08', 'cylinders',
                'displ', 'drive', 'eng_dscr', 'fuelCost08',
                'make', 'model', 'trany', 'range', 'createdOn',
                'year']
        # Return a modified DataFrame with the specified columns and modifications
        return (autos
            [cols]
            .assign(
                cylinders=autos.cylinders.fillna(0).astype('int8'),
                displ=autos.displ.fillna(0).astype('float16'),
                drive=autos.drive.fillna('Other').astype('category'),
                automatic=autos.trany.str.contains('Auto'),
                speeds=autos.trany.str.extract(r'(\d)+').fillna('20').astype('int8'),
                offset=autos.createdOn
                    .str.extract(r'\d\d:\d\d ([A-Z]{3}?)')
                    .replace('EDT', 'EST5EDT'),
                str_date=(autos.createdOn.str.slice(4, 19) + ' ' +
                          autos.createdOn.str.slice(-4)),
                createdOn=lambda df_: to_tz(df_, 'str_date', 'offset',
                                            'America/New_York'),
                ffs=autos.eng_dscr.str.contains('FFS'))
            .astype({'highway08': 'int8', 'city08': 'int16',
                     'comb08': 'int16', 'fuelCost08': 'int16',
                     'range': 'int16', 'year': 'int16',
                     'make': 'category'})
            .drop(columns=['trany', 'eng_dscr'])
        )

    # Print the tweaked autos DataFrame
    print(tweak_autos(autos))
Say we come across this tweak_autos function, and we want to understand what it does. First of all, realize that it’s written like a recipe, step by step:
- Pull out the columns listed in cols.
- Create various columns (assign).
- Convert column types (astype).
- Drop extra columns that are no longer needed after we’ve created new columns from them (drop).
Those who don’t support chaining say there’s no way to debug this. We have a few ways to debug the chain. The first is by using comments. We comment out all of the operations and then go through them one at a time. This comes in really handy to ...
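As a minimal sketch of the commenting technique, the example below uses a small hypothetical DataFrame (toy) and a cut-down function (tweak_toy) in place of the real vehicles data and tweak_autos. The column values and the extra column are made up for illustration. The last two steps of the chain are commented out so that we can inspect the intermediate result before they run.

```python
import pandas as pd

# Hypothetical miniature stand-in for the vehicles data (not the real dataset)
toy = pd.DataFrame({
    'city08': [19, 9, 23],
    'cylinders': [4.0, 12.0, None],
    'trany': ['Manual 5-spd', 'Manual 5-spd', 'Automatic 4-spd'],
    'extra': ['a', 'b', 'c'],
})

def tweak_toy(df):
    cols = ['city08', 'cylinders', 'trany']
    return (df
        [cols]
        .assign(
            cylinders=df.cylinders.fillna(0).astype('int8'),
            automatic=df.trany.str.contains('Auto'),
        )
        # The remaining steps are commented out while we debug.
        # Uncomment them one at a time and rerun to see each effect.
        # .astype({'city08': 'int8'})
        # .drop(columns=['trany'])
    )

# Inspect the intermediate result after the first two steps only
partial = tweak_toy(toy)
print(partial)
print(partial.dtypes)
```

Because the whole chain sits inside parentheses, Python ignores the commented-out lines, so each step can be switched back on without restructuring the function. In this intermediate state, the trany column is still present and city08 has not yet been converted.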