Aggregations
Explore how to apply custom aggregation functions and perform multiple aggregations on pandas DataFrames. Learn to use groupby, pivot_table, and pd.crosstab methods to calculate statistics like percentages, minimums, maximums, and averages by categories such as country. Understand how to manage hierarchical and flat columns in aggregation outputs to improve data analysis clarity.
Using a custom aggregation function
Now that we’re done with insights on age and employment status by country, let’s look at another important question: “What is the percentage of Emacs users by country?”
We’ll need a function that takes a group (in this case, a Series) holding one country’s respondents’ IDE preferences and returns the percentage that chose Emacs:
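A minimal sketch of such a function follows. The sample answers and the exact match on the string 'Emacs' are assumptions made for illustration; the real survey data may label the IDE differently.

```python
import pandas as pd

def per_emacs(ser):
    """Return the percentage of entries in `ser` that equal 'Emacs'."""
    # Count the matching rows, divide by the group size, scale to percent.
    return (ser == 'Emacs').sum() / len(ser) * 100

# Hypothetical IDE answers for one country's respondents.
ide = pd.Series(['Emacs', 'Vim', 'Emacs', 'PyCharm'])
emacs_pct = per_emacs(ide)  # 2 of the 4 answers are 'Emacs'
```

Because the function receives a whole Series and returns a single number, it can be passed anywhere pandas accepts an aggregation function.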
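When we need to calculate a percentage in pandas, we can use the mean method: the mean of a Boolean Series is the fraction of its values that are True, so multiplying it by 100 gives the percentage. The following code is equivalent to the above: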
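Here is one way the mean-based version might look, computed next to the explicit count-and-divide form so the two can be compared. The sample Series is hypothetical.

```python
import pandas as pd

def per_emacs(ser):
    # (ser == 'Emacs') is a Boolean Series; its mean is the fraction of True values.
    return (ser == 'Emacs').mean() * 100

# Hypothetical IDE answers for one country's respondents.
ide = pd.Series(['Emacs', 'Vim', 'Emacs', 'PyCharm'])

by_mean = per_emacs(ide)
# The same quantity computed by counting matches and dividing by the group size.
by_count = (ide == 'Emacs').sum() / len(ide) * 100
```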
We’re now ready to pivot. In this case, we still want country in the index, but we only want a single column, the Emacs percentage. So we don’t provide a columns parameter:
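The pivot described here can be sketched as follows. The DataFrame and its column names (`country`, `ide`) are hypothetical stand-ins for the survey data; only the `pivot_table` call pattern is the point.

```python
import pandas as pd

# Hypothetical survey rows: one respondent per row.
df = pd.DataFrame({
    'country': ['US', 'US', 'US', 'US', 'DE', 'DE', 'DE', 'DE'],
    'ide': ['Emacs', 'Vim', 'PyCharm', 'Emacs', 'Emacs', 'Vim', 'Vim', 'Vim'],
})

def per_emacs(ser):
    """Percentage of entries in `ser` that equal 'Emacs'."""
    return (ser == 'Emacs').mean() * 100

# No `columns` argument: the result has one row per country and a single
# column (named after `values`) holding the Emacs percentage.
result = df.pivot_table(index='country', values='ide', aggfunc=per_emacs)

# The same numbers via groupby, returned as a Series instead of a DataFrame.
by_group = df.groupby('country')['ide'].agg(per_emacs)
```

In this toy data, two of the four US rows and one of the four DE rows are Emacs, so the two countries get different percentages.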
Using pd.crosstab is a little more complicated because it expects a “cross-tabulation” of two columns, one of which goes in the index and the other goes in columns. To get a “column” for the cross-tabulation, we’ll assign a ...