...

Modify Categorical Data

Learn the various techniques for modifying categorical data.

Methods for modifying categorical data

Once our DataFrame columns are correctly encoded with dtype=category, pandas provides a set of methods for inspecting and changing their categories. We’ll explore some common ones using the Education column from the credit card dataset.

Note: These methods live on the .cat accessor of a categorical Series (a DataFrame column is a Series), so we call them with a cat. prefix, e.g., df['Education'].cat.add_categories(). The accessor is only available when the Series has the category dtype.
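To make the note about the .cat accessor concrete, here is a minimal sketch. It assumes a small hypothetical Series standing in for the Education column (the values are illustrative and not taken from the credit card dataset) and checks whether the accessor is available before and after conversion to the category dtype.

```python
import pandas as pd

# Hypothetical stand-in for the Education column (illustrative values only)
education = pd.Series([1, 2, 2, 3, 1], name="Education")

# A plain integer Series does not expose the .cat accessor,
# so hasattr() reports False here
has_cat_before = hasattr(education, "cat")

# After conversion to the category dtype, the accessor becomes available
education = education.astype("category")
has_cat_after = hasattr(education, "cat")

print("Accessor available before conversion:", has_cat_before)
print("Accessor available after conversion:", has_cat_after)
```

If a call such as cat.add_categories() fails with an AttributeError, checking the column's dtype first is usually a good starting point.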

View category properties

Besides printing the entire column, we can check the categorical properties of the Education column with the categories and ordered attributes, as shown below:

# View categories of Education column
print(df['Education'].cat.categories)
# View whether the categories are ordered or not (boolean output)
print('Categories are ordered:', df['Education'].cat.ordered)

The output lists every category in the column, followed by a boolean indicating whether the categories are ordered.
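As a rough illustration of how the ordered attribute can differ, the sketch below builds two categorical versions of a hypothetical Series (the values and variable names are assumptions, not part of the credit card dataset): one with the default unordered categories and one with an explicitly ordered CategoricalDtype.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical education levels (illustrative values only)
levels = pd.Series([2, 1, 3, 2])

# Default conversion: categories are inferred from the data and left unordered
unordered = levels.astype("category")

# Explicit dtype: same categories, but declared as ordered (1 < 2 < 3)
ordered_dtype = CategoricalDtype(categories=[1, 2, 3], ordered=True)
ordered = levels.astype(ordered_dtype)

print(unordered.cat.categories)
print("Unordered version is ordered:", unordered.cat.ordered)
print("Ordered version is ordered:", ordered.cat.ordered)
```

Both versions hold the same values and categories; only the ordered flag differs, which matters for comparisons and sorting.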

Add categories

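We can use the add_categories() method to append new categories to the end of the existing list of categories. The values already in the column are left unchanged, and the method returns a new Series, so we assign the result back to the column. For example, we can add an education level 21 to the list of education levels with the following code: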

# Append new category (education level 21) to list
df['Education'] = df['Education'].cat.add_categories([21])
# View categories
print(df['Education'].cat.categories)
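The behavior of add_categories() can be checked on a small example. The sketch below assumes a hypothetical four-row Series in place of the real Education column; it shows that the new category is appended to the end of the list, that no existing value takes the new category, and that adding a category that already exists raises a ValueError.

```python
import pandas as pd

# Hypothetical stand-in for the Education column (illustrative values only)
education = pd.Series([1, 2, 2, 3], dtype="category")

# add_categories() returns a new Series; the original is not modified
extended = education.cat.add_categories([21])

# The new category is appended after the existing ones
cats = list(extended.cat.categories)

# No row holds the new category yet, so its count is zero
count_21 = int((extended == 21).sum())

# Adding a category that is already present is rejected
try:
    extended.cat.add_categories([21])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(cats)
print("Rows with level 21:", count_21)
print("Duplicate category rejected:", duplicate_rejected)
```

Because the new category has no rows yet, it only becomes visible in the data once values are assigned to it.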
...