GroupBy—Transform and Filter
Learn how to apply transform and filter functions on grouped data.
Overview of transform
Having seen how to aggregate grouped data, we'll now switch gears and learn how to apply transform functions. In general, the transform operation applies a function to each group, and the resulting scalar value is then assigned to each row of every group.
When used on grouped data, transform functions perform group-wise computations and return an output indexed the same way as the original DataFrame. Let’s use the credit card dataset to illustrate how this works on grouped data.
Preview of the Credit Card Dataset
ID | Income | Limit | Rating | Cards | Age | Education | Gender | Student | Married | Ethnicity | Balance |
1 | 14.891 | 3606 | 283 | 2 | 34 | 11 | Male | No | Yes | Caucasian | 333 |
2 | 106.025 | 6645 | 483 | 3 | 82 | 15 | Female | Yes | Yes | Asian | 903 |
3 | 104.593 | 7075 | 514 | 4 | 71 | 11 | Male | No | No | Asian | 580 |
4 | 148.924 | 9504 | 681 | 3 | 36 | 11 | Female | No | No | Asian | 964 |
5 | 55.882 | 4897 | 357 | 2 | 68 | 16 | Male | No | No | Caucasian | 331 |
Note: The “ID” column, whose values start with 1, refers to the identification numbers of customers. It is different from the actual index of a DataFrame that begins with
0
.
Custom functions
One example of a transform function is the min-max scaling of the Rating
column values based on each category in the Ethnicity
column. The equation for min-max scaling is shown below:
To make our code more readable, we can first write a function that returns the min-max scaled values before passing them into transform()
as a Lambda function.
# Function to return min-max scaled valuesdef min_max_scaling(x):return (x - x.min()) / (x.max() - x.min())# Apply the transform function with lambda functiondf_grouped = df.groupby('Ethnicity')[['Rating']].transform(lambda x: min_max_scaling(x))print(df_grouped)
The transformed output keeps the same shape as the original DataFrame, which differs from the aggregate functions where data dimensions are reduced. What do the row values in this transformed output mean? The value in each row is the new Rating
value that was scaled based on the minimum and maximum values of the specific group they belong to.
Let’s look at the first row of the DataFrame to illustrate the concept above. From the grouped min-max scaling output above, ...
Get hands-on with 1400+ tech skills courses.