Modify Categorical Data
Learn the various techniques for modifying categorical data.
Methods for modifying categorical data
Once our DataFrame columns are correctly encoded with dtype=category, there are numerous methods we can apply to them. We'll explore some common ones using the Education column from the credit card dataset.
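If a column is not yet categorical, a minimal way to convert it is astype("category"). The sketch below assumes a small hypothetical DataFrame standing in for the credit card dataset; its Education codes are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the credit card dataset (illustrative values only)
df = pd.DataFrame({"Education": [1, 2, 2, 3, 1, 4]})

# Convert the integer column to the categorical dtype
df["Education"] = df["Education"].astype("category")

print(df["Education"].dtype)  # category
```

The categories are inferred from the unique values in the column, so here they are 1, 2, 3 and 4.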
Note: When working with Series objects (DataFrame columns are themselves Series), we prefix these methods with cat. for them to work, e.g., cat.add_categories(). The prefix gives us access to the methods provided by the .cat accessor of Series objects, which is available only when the Series has the category dtype.
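The accessor restriction is easy to check on a toy Series. The sketch below uses a hypothetical helper, has_cat_accessor() (not part of pandas), which simply tries to access .cat and reports whether it succeeded.

```python
import pandas as pd


def has_cat_accessor(series: pd.Series) -> bool:
    """Hypothetical helper: return True if .cat can be used on this Series."""
    try:
        series.cat  # pandas raises AttributeError for non-categorical dtypes
    except AttributeError:
        return False
    return True


codes = pd.Series([1, 2, 2, 3])          # plain int64 Series
education = codes.astype("category")     # same values, categorical dtype

print(has_cat_accessor(codes))      # False
print(has_cat_accessor(education))  # True
print(education.cat.categories.tolist())  # [1, 2, 3]
```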
View category properties
Besides printing the entire column, another way to check the categorical properties of the Education column is with the categories and ordered attributes, as shown below:
```python
# View categories of Education column
print(df['Education'].cat.categories)

# View whether the categories are ordered or not (boolean output)
print('Categories are ordered:', df['Education'].cat.ordered)
```
The output above shows the complete list of categories and a boolean indicating whether the categories are ordered.
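To see both attributes vary, the following sketch builds two small hypothetical Series from the same made-up values: one with inferred (unordered) categories and one with an explicit ordered pd.CategoricalDtype. Only the ordered one supports comparisons such as max().

```python
import pandas as pd

# Illustrative Education codes, not taken from the real dataset
values = [2, 1, 3, 2]

# Categories inferred from the data; unordered by default
unordered = pd.Series(values, dtype="category")

# Categories declared explicitly, with a meaningful order
ordered = pd.Series(
    values, dtype=pd.CategoricalDtype(categories=[1, 2, 3, 4], ordered=True)
)

print(unordered.cat.categories.tolist(), unordered.cat.ordered)  # [1, 2, 3] False
print(ordered.cat.categories.tolist(), ordered.cat.ordered)      # [1, 2, 3, 4] True

# max() relies on the category order, so it works only when ordered=True
print(ordered.max())  # 3
```

Calling unordered.max() instead would raise a TypeError, because pandas refuses to rank categories that have no declared order.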
Add categories
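We can use the add_categories() method to append new categories to the end of the list of categories. Because add_categories() returns a new object rather than modifying the column in place, we assign the result back to the column. For example, we can add education level 21 with the following code: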
```python
# Append new category (education level 21) to list
df['Education'] = df['Education'].cat.add_categories([21])

# View categories
print(df['Education'].cat.categories)
```
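Adding a category changes only the set of allowed values, not the data itself. The sketch below illustrates this on a hypothetical Series standing in for the Education column: the new category shows up with a count of zero, and trying to add a category that already exists raises a ValueError.

```python
import pandas as pd

# Hypothetical Education codes standing in for the real column
education = pd.Series([1, 2, 2, 3], dtype="category")

# add_categories returns a new Series; reassign to keep the change
education = education.cat.add_categories([21])
print(education.cat.categories.tolist())  # [1, 2, 3, 21]

# The stored values are unchanged; category 21 simply has no occurrences
print(education.value_counts().loc[21])  # 0

# Adding a category that is already present is rejected
try:
    education.cat.add_categories([21])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```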