
Grouping

Explore the concept of grouping in pandas to segment data into meaningful sets for calculation and analysis. Understand how to manage hierarchical indexes, group by custom functions, and handle categorical data efficiently. This lesson equips you to confidently apply grouping methods for effective data manipulation and analysis using pandas.

What is grouping?

Grouping divides data into distinct sets so that a calculation can be performed on each set separately, which supports a more thorough analysis.
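As a minimal sketch of that idea, the snippet below splits a small table into groups and computes one value per group. The `df` DataFrame and its `team` and `score` columns are made up for illustration and are not part of the lesson's data.

```python
import pandas as pd

# Hypothetical toy data, not the survey data used in this lesson.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 5, 15, 40],
})

# Split the rows by team, then compute the mean score within each group.
mean_by_team = df.groupby("team")["score"].mean()
print(mean_by_team)
```

The result has one row per distinct value of `team`, and that value becomes the index label.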

Grouping by hierarchy

We just saw how awkward hierarchical columns can be, but they are sometimes useful. Now we'll see how to create hierarchical indexes. Suppose someone asks for the minimum and maximum age for each country and editor. We want both the country and the editor in the index, so we pass in a list of the columns we want in the index:

Python 3.8
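# jb2 is the DataFrame used earlier in this course.
# String names ('min', 'max') avoid the warning newer pandas raises for builtins.
print(jb2.pivot_table(index=['country_live', 'ide_main'],
                      values='age', aggfunc=['min', 'max']))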
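To see what a hierarchical index looks like without the full dataset, here is a small self-contained sketch. The `toy` DataFrame is a hypothetical stand-in for `jb2`: the column names match the lesson, but the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for jb2; column names match the lesson, values are made up.
toy = pd.DataFrame({
    "country_live": ["Canada", "Canada", "Canada", "India", "India"],
    "ide_main": ["VS Code", "VS Code", "PyCharm", "PyCharm", "PyCharm"],
    "age": [21, 30, 40, 18, 50],
})

# Passing a list to index= produces a two-level (hierarchical) row index.
result = toy.pivot_table(index=["country_live", "ide_main"],
                         values="age", aggfunc=["min", "max"])
print(result)

# The index has one level per column named in index=.
print(result.index.names)

# A tuple with one label per level selects a single row.
print(result.loc[("Canada", "VS Code")])
```

Each row of `result` is identified by a (country, editor) pair, which is why selecting a row with `.loc` takes a tuple.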

Here is the same jb2 query written with groupby:

Python 3.8
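print(jb2
      .groupby(by=['country_live', 'ide_main'])  # one group per (country, editor) pair
      [['age']]                                  # keep only the age column
      .agg(['min', 'max'])                       # string names instead of the builtins
)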

If you look carefully, you’ll note that the results ...