Aggregations
Learn about different types of aggregations.
Using a custom aggregation function
Now that we’re done with insights on the age of employment status by country, let’s look at another important question: “What is the percentage of Emacs users by country?”
We’ll need a function that takes a group (in this case, a Series) of one country’s responses about IDE preference and returns the percentage that chose Emacs:
def per_emacs(ser):
    return ser.str.contains('Emacs').sum() / len(ser) * 100
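When we need to calculate a percentage in pandas, we can use the mean method: the mean of a Boolean Series is the fraction of values that are True. As long as the group has no missing values, the following code is equivalent to the above: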
def per_emacs(ser):
    return ser.str.contains('Emacs').mean() * 100
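As a quick sanity check, here is a minimal sketch that runs both versions on a tiny, made-up Series. The names per_emacs_sum and per_emacs_mean and the sample answers are assumptions for illustration only; they are not part of the survey data.

```python
import pandas as pd

def per_emacs_sum(ser):
    # Count the True values, then divide by the group size
    return ser.str.contains('Emacs').sum() / len(ser) * 100

def per_emacs_mean(ser):
    # The mean of a Boolean Series is the fraction of True values
    return ser.str.contains('Emacs').mean() * 100

# Hypothetical answers (no missing values), not taken from the survey
ide = pd.Series(['Emacs', 'PyCharm', 'VS Code', 'Emacs'])

pct_sum = per_emacs_sum(ide)
pct_mean = per_emacs_mean(ide)
print(pct_sum, pct_mean)  # both are 50.0: two of the four answers contain 'Emacs'
```

Both functions agree here because every value is present. If a group contained missing answers, the two denominators could differ, so it is worth checking for missing values before relying on the shortcut.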
We’re now ready to pivot. In this case, we still want country in the index, but we only want a single column, the Emacs percentage. So we don’t provide a columns parameter:
print(jb2.pivot_table(index='country_live', values='ide_main', aggfunc=per_emacs))
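To see what this call produces without loading the full survey, here is a small sketch on a hypothetical DataFrame named toy that stands in for jb2. Only the column names country_live and ide_main come from the lesson; the rows are invented.

```python
import pandas as pd

def per_emacs(ser):
    # Percentage of answers in the group that mention Emacs
    return ser.str.contains('Emacs').mean() * 100

# Hypothetical stand-in for jb2 with the same two column names
toy = pd.DataFrame({
    'country_live': ['India', 'India', 'Germany',
                     'Germany', 'Germany', 'Germany'],
    'ide_main': ['Emacs', 'PyCharm', 'VS Code',
                 'Emacs', 'PyCharm', 'Vim'],
})

# Country in the index, one column holding the Emacs percentage
result = toy.pivot_table(index='country_live', values='ide_main',
                         aggfunc=per_emacs)
print(result)

# The same numbers via groupby, returned as a Series instead of a DataFrame
by_group = toy.groupby('country_live')['ide_main'].agg(per_emacs)
print(by_group)
```

In this toy data, Germany comes out at 25.0 (one Emacs answer out of four) and India at 50.0 (one out of two). The groupby version is shown only to make clear that pivot_table is applying per_emacs once per country.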
Using pd.crosstab is a little more complicated because it expects a “cross-tabulation” of two columns, one of which goes in the index and the other goes in columns. To get a “column” for the ...