Aggregations

Explore how to apply custom aggregation functions and perform multiple aggregations on pandas DataFrames. Learn to use groupby, pivot_table, and pd.crosstab methods to calculate statistics like percentages, minimums, maximums, and averages by categories such as country. Understand how to manage hierarchical and flat columns in aggregation outputs to improve data analysis clarity.

Using a custom aggregation function

Now that we’re done with insights on age and employment status by country, let’s look at another important question: “What is the percentage of Emacs users by country?”

We’ll need a function that takes a group (in this case, a Series) holding the IDE preferences of one country’s respondents and returns the percentage that chose Emacs:

Python 3.8
def per_emacs(ser):
    return ser.str.contains('Emacs').sum() / len(ser) * 100

When we need to calculate a percentage in pandas, we can use the mean method, because the mean of a Boolean Series is the fraction of its values that are True. The following code is equivalent to the above:

Python 3.8
def per_emacs(ser):
    return ser.str.contains('Emacs').mean() * 100
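
As a quick check of that equivalence, here is a minimal sketch that runs both versions on a small made-up Series. The names per_emacs_sum, per_emacs_mean, and ides are hypothetical and exist only for this illustration, and the sample data contains no missing values (with missing values the two versions may differ, since len counts them).

```python
import pandas as pd

# Hypothetical names so both versions can be compared side by side.
def per_emacs_sum(ser):
    # Count the True values, then divide by the group size.
    return ser.str.contains('Emacs').sum() / len(ser) * 100

def per_emacs_mean(ser):
    # The mean of a Boolean Series is the fraction of True values.
    return ser.str.contains('Emacs').mean() * 100

# Made-up sample data: two of the four entries mention Emacs.
ides = pd.Series(['Emacs', 'VS Code', 'PyCharm', 'Emacs / Vim'])

pct_sum = per_emacs_sum(ides)
pct_mean = per_emacs_mean(ides)
print(pct_sum, pct_mean)
```

Both calls give 50.0 here, because two of the four strings contain "Emacs".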

We’re now ready to pivot. In this case, we still want country in the index, but we only want a single column, the Emacs percentage. So we don’t provide a columns parameter:

Python 3.8
print(jb2
    .pivot_table(index='country_live', values='ide_main', aggfunc=per_emacs)
)
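
Because jb2 is not defined on this page, the following sketch repeats the same pivot on a tiny made-up DataFrame. The toy frame and its values are assumptions for illustration only; the column names country_live and ide_main are taken from the call above. A groupby version is included as a cross-check, on the assumption that it is an acceptable alternative for a single-column result.

```python
import pandas as pd

def per_emacs(ser):
    # Percentage of entries in the group that mention Emacs.
    return ser.str.contains('Emacs').mean() * 100

# Hypothetical stand-in for jb2: six respondents from two countries.
toy = pd.DataFrame({
    'country_live': ['Canada', 'Canada', 'India', 'India', 'India', 'India'],
    'ide_main': ['Emacs', 'PyCharm', 'VS Code', 'Emacs', 'PyCharm', 'Vim'],
})

# No columns parameter: one row per country, one column (ide_main).
result = toy.pivot_table(index='country_live', values='ide_main',
                         aggfunc=per_emacs)
print(result)

# Cross-check: the same aggregation written with groupby, returning a Series.
by_group = toy.groupby('country_live')['ide_main'].agg(per_emacs)
print(by_group)
```

In this toy data, one of two Canadian respondents and one of four Indian respondents chose Emacs, so the values are 50 and 25 percent. The pivot_table result is a DataFrame with a single ide_main column, while the groupby result is a Series.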

Using pd.crosstab is a little more complicated because it expects a “cross-tabulation” of two columns, one of which goes in the index and the other goes in columns. To get a “column” for the cross-tabulation, we’ll assign a ...