Advanced Indexing 1
Learn how to use advanced indexing techniques to work effectively with MultiIndex DataFrames
Motivation
Although we deal with single-indexed datasets in most situations, it’s helpful to understand how to work effectively with MultiIndex DataFrames using advanced indexing methods. MultiIndex DataFrames enable us to perform sophisticated data analysis and manipulation, especially when dealing with higher-dimensional data.
MultiIndex DataFrames may seem slightly complicated to work with, but this is due to our infrequent interaction with them. Let’s begin with a quick refresher. A pandas MultiIndex DataFrame is a DataFrame that has a hierarchical index (aka MultiIndex), which means that there are multiple index levels on either the row or column axis. In simpler terms, it’s a DataFrame with multiple columns acting as a row identifier or multiple rows acting as a column identifier.
Example of a MultiIndex DataFrame
store_name | fruit_name | 2021 | 2021 | 2021 | 2022 | 2022 | 2022 |
| | unit_price | unit_cost | kg | unit_price | unit_cost | kg |
Midtown | Cavendish Banana | 3.95 | 1.87 | 1 | 4 | 1.95 | 1 |
Downtown | Cavendish Banana | 4.25 | 1.87 | 1 | 4 | 1.95 | 1 |
Midtown | Spain Black Plum | 4.9 | 2.5 | 0.4 | 5 | 2.6 | 0.5 |
Midtown | Salustiana Orange | 5.25 | 3.55 | 0.5 | 5.4 | 3.6 | 0.6 |
Downtown | Salustiana Orange | 5.9 | 3.55 | 0.5 | 6 | 3.6 | 0.6 |
In the table above, the DataFrame has a MultiIndex on both the row and column axes. The row MultiIndex has two levels: store_name and fruit_name. The column MultiIndex also has two levels: one for the year (2021 and 2022) and another for the column names (unit_price, unit_cost, and kg).
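As a rough sketch, the table above could be built in pandas as follows. The row level names come from the table, but the column level names `year` and `measure` are assumptions made for illustration, since the table does not name them.

```python
import pandas as pd

# Row MultiIndex: (store_name, fruit_name) pairs, in the order shown in the table
row_index = pd.MultiIndex.from_tuples(
    [
        ("Midtown", "Cavendish Banana"),
        ("Downtown", "Cavendish Banana"),
        ("Midtown", "Spain Black Plum"),
        ("Midtown", "Salustiana Orange"),
        ("Downtown", "Salustiana Orange"),
    ],
    names=["store_name", "fruit_name"],
)

# Column MultiIndex: every (year, column name) combination.
# The level names "year" and "measure" are hypothetical labels.
col_index = pd.MultiIndex.from_product(
    [[2021, 2022], ["unit_price", "unit_cost", "kg"]],
    names=["year", "measure"],
)

# Values copied row by row from the table
data = [
    [3.95, 1.87, 1.0, 4.0, 1.95, 1.0],
    [4.25, 1.87, 1.0, 4.0, 1.95, 1.0],
    [4.90, 2.50, 0.4, 5.0, 2.60, 0.5],
    [5.25, 3.55, 0.5, 5.4, 3.60, 0.6],
    [5.90, 3.55, 0.5, 6.0, 3.60, 0.6],
]

df = pd.DataFrame(data, index=row_index, columns=col_index)

# Both axes report two index levels
print(df.index.nlevels, df.columns.nlevels)
print(df)
```

Here `from_tuples` is used for the rows because not every store sells every fruit, while `from_product` suits the columns because each year carries the same three column names.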
Sorting MultiIndex
For MultiIndex DataFrames to be indexed and sliced effectively, it’s recommended that we first sort the indexes using sort_index(). For example, we can sort our DataFrame based on the two index levels of store_name and fruit_name in ascending order. ...
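A minimal sketch of such a sort is shown here. To keep it short, it assumes a reduced version of the example table that holds only the unit_price column for each year; the variable names `df` and `sorted_df` are illustrative.

```python
import pandas as pd

# Reduced version of the example table: unit_price only, for 2021 and 2022
row_index = pd.MultiIndex.from_tuples(
    [
        ("Midtown", "Cavendish Banana"),
        ("Downtown", "Cavendish Banana"),
        ("Midtown", "Spain Black Plum"),
        ("Midtown", "Salustiana Orange"),
        ("Downtown", "Salustiana Orange"),
    ],
    names=["store_name", "fruit_name"],
)
col_index = pd.MultiIndex.from_tuples([(2021, "unit_price"), (2022, "unit_price")])

df = pd.DataFrame(
    [[3.95, 4.0], [4.25, 4.0], [4.90, 5.0], [5.25, 5.4], [5.90, 6.0]],
    index=row_index,
    columns=col_index,
)

# Sort the rows by store_name first, then by fruit_name, both ascending.
# sort_index() returns a new DataFrame and leaves df unchanged.
sorted_df = df.sort_index(level=["store_name", "fruit_name"], ascending=True)

print(df.index.is_monotonic_increasing)         # row order before sorting
print(sorted_df.index.is_monotonic_increasing)  # row order after sorting
print(sorted_df)
```

Calling `df.sort_index()` with no arguments sorts by all row levels in order, so passing `level` explicitly mainly documents the intent.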