Ordered Numerical Data
Learn how to handle and retrieve information from ordered sequences of numerical data.
Introduction
When handling numerical features, one of the first things we’ll intuitively do is get a sense of how the values are ordered. Beyond the basic sort_values() and sort_index() methods that we’re already familiar with, we can retrieve information from sorted data with the nlargest(), nsmallest(), and rank() methods. We’ll use a subset of the credit card dataset to illustrate these pandas methods.
Preview of Subset of Credit Cards Dataset
ID | Limit | Rating | Cards | Age | Gender | Balance |
1 | 3606 | 283 | 2 | 34 | Male | 333 |
2 | 6645 | 483 | 3 | 82 | Female | 903 |
3 | 7075 | 514 | 4 | 71 | Male | 580 |
4 | 9504 | 681 | 3 | 36 | Female | 964 |
5 | 4897 | 357 | 2 | 68 | Male | 331 |
6 | 8047 | 569 | 4 | 77 | Male | 1151 |
Note: The ID column, whose values start at 1, holds the customers’ identification numbers. It’s different from the actual index of the DataFrame, which begins at 0.
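To make the note concrete, here is a minimal sketch that rebuilds the six preview rows by hand. The lesson doesn't show how df is loaded, so constructing it from a dictionary is an assumption made only for illustration.

```python
import pandas as pd

# Rows copied from the preview table; the lesson's actual df is assumed
# to be loaded elsewhere, so this hand-built frame is only a stand-in.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
    "Rating": [283, 483, 514, 681, 357, 569],
    "Cards": [2, 3, 4, 3, 2, 4],
    "Age": [34, 82, 71, 36, 68, 77],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Balance": [333, 903, 580, 964, 331, 1151],
})

print(df.index.tolist())  # default positional index, starting at 0
print(df["ID"].tolist())  # customer IDs, an ordinary column starting at 1
```

The customer with ID 1 therefore sits at index 0, which is worth keeping in mind when reading the index labels in later outputs.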
Largest n
The nlargest() method retrieves the first n rows ordered by one or more columns in descending order. Because the rows are sorted in descending order of those columns, these first n rows hold the n largest values. Columns that aren’t specified in the nlargest() call aren’t used for the ordering, although they still appear in the returned rows.
Let’s say we want to obtain information on the five credit card holders with the largest credit limits, as recorded in the Limit column.
# Obtain top 5 customers with largest credit limit
df_largest = df.nlargest(n=5, columns='Limit')
print(df_largest)
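The same call can be tried on a small, self-contained frame. The sketch below assumes only the six preview rows and keeps three of the columns for brevity; the real dataset is larger, so its output will differ.

```python
import pandas as pd

# Stand-in for the lesson's df: the six preview rows, three columns only.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
    "Balance": [333, 903, 580, 964, 331, 1151],
})

# Keep the five rows with the largest Limit, sorted in descending order.
df_largest = df.nlargest(n=5, columns="Limit")
print(df_largest)
```

On these rows, the customer with ID 1 has the smallest limit and is left out. The returned rows keep their original index labels rather than being renumbered from 0.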
The n parameter takes an integer indicating the number of rows to return, while the columns parameter defines the columns to order by. In the example above, we chose to order by just one column. We can apply nlargest() to more than one column by passing a list of column names into columns
...
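As a rough illustration of ordering by more than one column, the sketch below passes a list to columns. The choice of Cards followed by Limit is hypothetical and not taken from the lesson, and the frame is again a hand-built stand-in containing only the preview rows.

```python
import pandas as pd

# Stand-in for the lesson's df: the six preview rows, three columns only.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
    "Cards": [2, 3, 4, 3, 2, 4],
})

# Order by Cards first; rows that tie on Cards are then ordered by Limit.
df_multi = df.nlargest(n=3, columns=["Cards", "Limit"])
print(df_multi)
```

The order of names in the list matters: the first column drives the ranking, and later columns only break ties among rows that share the same value in the earlier ones.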