GroupBy—Transform and Filter

Learn how to apply transform and filter functions on grouped data.

Overview of transform

Having seen how to aggregate grouped data, we'll now switch gears and learn how to apply transform functions. In general, the transform operation applies a function to each group and assigns the result back to the rows of that group: a scalar result is broadcast to every row in the group, while a result with the same length as the group is assigned row by row.
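To make the contrast with aggregation concrete, here is a minimal sketch. It assumes a small DataFrame built by hand from the Ethnicity and Rating values of the five rows in this lesson's dataset preview, so the numbers describe only those rows and not the full dataset.

```python
import pandas as pd

# Assumption: only the five preview rows, typed in by hand
df = pd.DataFrame({
    "Ethnicity": ["Caucasian", "Asian", "Asian", "Asian", "Caucasian"],
    "Rating": [283, 483, 514, 681, 357],
})

# Aggregation: one value per group, indexed by the group labels
group_means = df.groupby("Ethnicity")["Rating"].mean()

# Transform: the group mean is broadcast to every row of its group,
# and the result keeps the original row index
row_means = df.groupby("Ethnicity")["Rating"].transform("mean")

print(group_means)
print(row_means)
```

The aggregated result has one entry per ethnicity, while the transformed result has one entry per original row.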

When used on grouped data, transform functions perform group-wise computations and return an output indexed the same way as the original DataFrame. Let’s use the credit card dataset to illustrate how this works on grouped data.

Preview of the Credit Card Dataset

ID  Income   Limit  Rating  Cards  Age  Education  Gender  Student  Married  Ethnicity  Balance
1   14.891   3606   283     2      34   11         Male    No       Yes      Caucasian  333
2   106.025  6645   483     3      82   15         Female  Yes      Yes      Asian      903
3   104.593  7075   514     4      71   11         Male    No       No       Asian      580
4   148.924  9504   681     3      36   11         Female  No       No       Asian      964
5   55.882   4897   357     2      68   16         Male    No       No       Caucasian  331

Note: The “ID” column, whose values start with 1, refers to the identification numbers of customers. It is different from the actual index of a DataFrame that begins with 0.
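The distinction in the note can be checked with a short sketch. It assumes a DataFrame built by hand from three columns of the five preview rows and relies on the default integer index that pandas assigns when none is specified.

```python
import pandas as pd

# Assumption: three columns of the five preview rows, typed in by hand
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Ethnicity": ["Caucasian", "Asian", "Asian", "Asian", "Caucasian"],
    "Rating": [283, 483, 514, 681, 357],
})

index_labels = df.index.tolist()  # default index labels start at 0
customer_ids = df["ID"].tolist()  # customer IDs start at 1

# The first row has index label 0 but customer ID 1
first_customer_id = df.loc[0, "ID"]

print(index_labels)
print(customer_ids)
print(first_customer_id)
```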

Custom functions

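One example of a transform function is min-max scaling of the Rating column values within each category of the Ethnicity column. The equation for min-max scaling is shown below:

$$x_{\text{scaled}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Here, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the group that $x$ belongs to.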

To make our code more readable, we can first write a function that returns the min-max scaled values and then call it from a lambda function passed into transform().

# Function to return min-max scaled values
def min_max_scaling(x):
    return (x - x.min()) / (x.max() - x.min())

# Apply the transform function with a lambda function
# (df is assumed to hold the credit card dataset previewed above)
df_grouped = df.groupby('Ethnicity')[['Rating']].transform(
    lambda x: min_max_scaling(x))
print(df_grouped)

The transformed output has the same number of rows and the same index as the original DataFrame, unlike the output of aggregate functions, which reduce each group to a single row. What do the row values in this transformed output mean? The value in each row is the Rating value rescaled using the minimum and maximum Rating of the group that the row belongs to.
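As a rough check of this interpretation, the sketch below applies the same scaling to a hand-built DataFrame that holds only the five preview rows. That is an assumption made for illustration: the group minima and maxima here come from those five rows alone, so the scaled values will differ from the output for the full dataset.

```python
import pandas as pd

def min_max_scaling(x):
    # Scale a group's values so its minimum maps to 0 and its maximum to 1
    return (x - x.min()) / (x.max() - x.min())

# Assumption: only the five preview rows, typed in by hand
df = pd.DataFrame({
    "Ethnicity": ["Caucasian", "Asian", "Asian", "Asian", "Caucasian"],
    "Rating": [283, 483, 514, 681, 357],
})

# Passing the function directly is equivalent to wrapping it in a lambda
scaled = df.groupby("Ethnicity")[["Rating"]].transform(min_max_scaling)

# Place the scaled values next to the originals for comparison
df["Rating_scaled"] = scaled["Rating"]
print(df)
```

Within each ethnicity, the row with the lowest Rating maps to 0 and the row with the highest maps to 1. For example, the Asian rating of 514 becomes (514 - 483) / (681 - 483), since 483 and 681 are the Asian minimum and maximum in this subset. A group whose values are all identical would produce NaN, because the denominator is zero.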

Let’s look at the first row of the DataFrame to illustrate the concept above. From the grouped min-max scaling output above, ...
