Aggregations

Learn about different types of aggregations.

Using a custom aggregation function

Now that we’ve finished exploring age and employment status by country, let’s look at another important question: “What is the percentage of Emacs users by country?”

We’ll need a function that takes a group (in this case, a Series) of one country’s responses about IDE preference and returns the percentage that chose Emacs:

def per_emacs(ser):
    return ser.str.contains('Emacs').sum() / len(ser) * 100

Because the mean of a Boolean Series is the fraction of True values, we can calculate a percentage in pandas with the mean method. The following code is equivalent to the above:

def per_emacs(ser):
    return ser.str.contains('Emacs').mean() * 100
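As a quick check of that equivalence, here is a minimal sketch that runs both versions on a small made-up Series. The sample values and the names ide, per_emacs_sum, and per_emacs_mean are assumptions for illustration and are not part of the survey data.

```python
import pandas as pd

# Hypothetical sample responses; the course's jb2 data is not reproduced here.
ide = pd.Series(['Emacs', 'Vim', 'PyCharm', 'Emacs'])

def per_emacs_sum(ser):
    # Count the matches, divide by the group size, scale to a percentage.
    return ser.str.contains('Emacs').sum() / len(ser) * 100

def per_emacs_mean(ser):
    # The mean of a Boolean Series is the fraction of True values.
    return ser.str.contains('Emacs').mean() * 100

sum_result = per_emacs_sum(ide)
mean_result = per_emacs_mean(ide)
print(sum_result, mean_result)
```

Two of the four sample responses contain Emacs, so both functions report 50 percent. If a column has missing values, the two versions may differ, because mean skips missing entries while len counts them.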

We’re now ready to pivot. In this case, we still want country in the index, but we only want a single column, the Emacs percentage. So we don’t provide a columns parameter:

print(jb2
    .pivot_table(index='country_live', values='ide_main', aggfunc=per_emacs)
)
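To try the pivot without the survey data, the following sketch builds a tiny stand-in DataFrame. Only the column names country_live and ide_main come from the lesson; the rows and the names survey, pivoted, and grouped are invented for illustration. It also computes the same numbers with groupby as a cross-check.

```python
import pandas as pd

# Hypothetical stand-in for jb2; the rows are made up.
survey = pd.DataFrame({
    'country_live': ['USA', 'USA', 'India', 'India', 'India', 'Germany'],
    'ide_main': ['Emacs', 'PyCharm', 'VS Code', 'Vim', 'Emacs', 'PyCharm'],
})

def per_emacs(ser):
    # Percentage of responses in the group that mention Emacs.
    return ser.str.contains('Emacs').mean() * 100

# No columns parameter: the result has one column, ide_main, indexed by country.
pivoted = survey.pivot_table(index='country_live', values='ide_main',
                             aggfunc=per_emacs)

# The same aggregation with groupby returns a Series instead of a DataFrame.
grouped = survey.groupby('country_live')['ide_main'].agg(per_emacs)

print(pivoted)
```

With this sample, one of two USA responses and one of three India responses mention Emacs, and Germany has none. The pivot_table result keeps the ide_main column name even though its values are now percentages.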

Using pd.crosstab is a little more complicated because it expects a “cross-tabulation” of two columns, one of which goes in the index and the other goes in columns. To get a “column” for the ...