Filtering

Filter DataFrames for values that fit certain conditions.

Chapter Goals:

  • Understand how to filter a DataFrame based on filter conditions
  • Write code to filter a dataset of MLB statistics

A. Filter conditions

In the Data Manipulation section, we used relational operations on NumPy arrays to create filter conditions. Each filter condition returned a boolean array marking the locations of the elements that pass the filter.
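As a quick reminder of the NumPy behavior, here is a minimal sketch. The array name `hr` and the mask name `over_40` are illustrative and do not come from the earlier section; the values are the home run counts used in this chapter.

```python
import numpy as np

# Illustrative array of home run counts
hr = np.array([31, 39, 43, 38, 39])

# A relational operation on the array is a filter condition.
# The result is a boolean array with the same shape as hr.
over_40 = hr > 40
print(over_40)

# True appears only at the positions whose value exceeds 40
print(over_40.dtype)
```

The boolean array has one entry per element of `hr`, so it records where the condition holds without changing the original array.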

In pandas, we can also create filter conditions for DataFrames. Specifically, we can apply relational operations to a DataFrame's column features. Each operation returns a boolean Series marking the DataFrame rows that pass the filter.

The code below demonstrates how to use relational operations as filter conditions.

import pandas as pd
# Small sample of MLB batting statistics
df = pd.DataFrame({
  'playerID': ['bettsmo01', 'canoro01', 'cruzne02', 'ortizda01', 'cruzne02'],
  'yearID': [2016, 2016, 2016, 2016, 2017],
  'teamID': ['BOS', 'SEA', 'SEA', 'BOS', 'SEA'],
  'HR': [31, 39, 43, 38, 39]})
print('{}\n'.format(df))
# True for rows whose playerID is 'cruzne02'
cruzne02 = df['playerID'] == 'cruzne02'
print('{}\n'.format(cruzne02))
# True for rows with more than 40 home runs
hr40 = df['HR'] > 40
print('{}\n'.format(hr40))
# True for rows whose teamID is anything other than 'BOS'
notbos = df['teamID'] != 'BOS'
print('{}\n'.format(notbos))

In the code above, we created filter conditions for df based on the columns labeled 'playerID', 'HR', and 'teamID'. The boolean Series outputs have True for the rows that pass ...
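To make the meaning of the True values concrete, the sketch below counts the rows that pass each condition and then selects them. It assumes standard pandas boolean indexing (`df[mask]`) and `Series.sum()`; the variable names `passing_rows` and `num_not_bos` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
  'playerID': ['bettsmo01', 'canoro01', 'cruzne02', 'ortizda01', 'cruzne02'],
  'yearID': [2016, 2016, 2016, 2016, 2017],
  'teamID': ['BOS', 'SEA', 'SEA', 'BOS', 'SEA'],
  'HR': [31, 39, 43, 38, 39]})

hr40 = df['HR'] > 40
notbos = df['teamID'] != 'BOS'

# True counts as 1 when summed, so this is the number of passing rows
num_not_bos = int(notbos.sum())
print(num_not_bos)

# Indexing the DataFrame with a boolean Series keeps only the True rows
passing_rows = df[hr40]
print('{}\n'.format(passing_rows))
```

Only one row has more than 40 home runs, so `passing_rows` holds the single 2016 entry for 'cruzne02', while three rows belong to a team other than 'BOS'.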