Ordered Numerical Data

Learn how to handle and retrieve information from ordered sequences of numerical data.

Introduction

When working with numerical features, it often helps to understand how the values are ordered. Beyond the basic sort_values() and sort_index() methods that we’re already familiar with, we can retrieve order-based information with the nlargest(), nsmallest(), and rank() methods. We’ll use a subset of the credit card dataset to illustrate these pandas methods.

Preview of Subset of Credit Cards Dataset

ID  Limit  Rating  Cards  Age  Gender  Balance
1   3606   283     2      34   Male    333
2   6645   483     3      82   Female  903
3   7075   514     4      71   Male    580
4   9504   681     3      36   Female  964
5   4897   357     2      68   Male    331
6   8047   569     4      77   Male    1151

Note: The ID column, whose values start at 1, holds the customers’ identification numbers. It’s distinct from the index of the DataFrame, which begins at 0.
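To make the distinction between the ID column and the index concrete, here is a minimal sketch that rebuilds only the six preview rows shown above as a DataFrame. It assumes the default index that pandas assigns when none is given; the full dataset contains more rows and may be loaded differently.

```python
import pandas as pd

# Only the six rows from the preview table; the real dataset is larger.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
    "Rating": [283, 483, 514, 681, 357, 569],
    "Cards": [2, 3, 4, 3, 2, 4],
    "Age": [34, 82, 71, 36, 68, 77],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Balance": [333, 903, 580, 964, 331, 1151],
})

# The default index counts from 0, while the ID column counts from 1.
index_labels = df.index.tolist()
customer_ids = df["ID"].tolist()
print(index_labels)
print(customer_ids)
```

Keeping ID as an ordinary column means it travels with each row, so a customer can still be identified after the rows are reordered.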

Largest n

The nlargest() method returns the first n rows of a DataFrame ordered by one or more columns in descending order. Because the ordering is descending, these first n rows are the rows with the n largest values in those columns. Columns that aren’t specified in the nlargest() call are still returned, but they aren’t used for ordering.
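One way to check this description is to compare nlargest() with an explicit descending sort followed by head(). The sketch below uses only the ID, Limit, and Rating values from the six preview rows, which is an assumption made to keep the example small; the Limit values contain no ties, so both approaches should pick the same customers.

```python
import pandas as pd

# Three columns from the six preview rows, enough to illustrate the ordering.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
    "Rating": [283, 483, 514, 681, 357, 569],
})

# The three rows with the largest Limit values, in descending order.
top3 = df.nlargest(n=3, columns="Limit")

# The same selection written as a descending sort plus head().
top3_sorted = df.sort_values("Limit", ascending=False).head(3)

print(top3)
print(top3["ID"].tolist() == top3_sorted["ID"].tolist())
```

Note that Rating appears in the result even though it plays no part in the ordering.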

Let’s say we want to obtain information on the top five credit card holders based on the size of their credit limit in the Limit column.

# Obtain top 5 customers with largest credit limit
df_largest = df.nlargest(n=5, columns='Limit')
print(df_largest)

The n parameter takes an integer that sets the number of rows to return, while the columns parameter defines the column or columns to order by. In the example above, we chose to order by just one column. We can apply nlargest() to more than one column by passing a list of column names into columns ...
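As a rough illustration of passing a list to columns, the sketch below orders the six preview rows by Cards first and uses Limit to break ties. The choice of these two columns and of n=3 is an assumption made for this example, not something prescribed by the lesson.

```python
import pandas as pd

# ID, Cards, and Limit values from the six preview rows.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Cards": [2, 3, 4, 3, 2, 4],
    "Limit": [3606, 6645, 7075, 9504, 4897, 8047],
})

# Order by Cards (descending); rows with equal Cards are ordered by Limit.
top_cards = df.nlargest(n=3, columns=["Cards", "Limit"])
print(top_cards)
```

Two customers hold four cards and two hold three, so the second column decides which of the three-card customers makes the cut: the one with the higher credit limit.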