Other Features and Properties
Discover some other features, properties, and caveats of handling categorical data.
Introduction
Having covered the essential operations and methods around categorical data, let's wrap up this chapter by going over some other noteworthy features and properties.
Unioning categories
To combine multiple categorical variables with different categories, we must first create a common set of categories for them. We can do so with the union_categoricals() function, which generates a union of the categories being combined. It works with Series, Categorical, and CategoricalIndex objects, and the output of the union operation is a Categorical object.
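As a minimal sketch of the union operation described above, the snippet below combines two small category-encoded Series objects. The Series names and values are made up for illustration and do not come from the lesson's dataset; the import path pandas.api.types is where pandas exposes union_categoricals().

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Two hypothetical category-encoded Series with partly different categories
first = pd.Series(["b", "c"], dtype="category")
second = pd.Series(["a", "b"], dtype="category")

# The result is a plain Categorical, even though the inputs are Series
combined = union_categoricals([first, second])

print(type(combined).__name__)
print(list(combined))
print(list(combined.categories))
```

By default, the combined categories follow their order of appearance in the inputs. Passing sort_categories=True sorts them instead.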
When we refer to data types, there are two concepts at play:
DataFrame columns (or Series objects) can have different dtypes. When dealing with a categorical variable, we can encode it with dtype='category'. For such situations, we’ll use the term dtype.
For categorical variables that are already encoded with dtype='category', the category elements can have different data types too. For example, the Education column contains integer values indicating the different education levels, whereas the Ethnicity column contains string values indicating the various ethnicity groups. For such situations, we’ll use the term data type.
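The distinction between the two terms can be illustrated with a small sketch. The DataFrame below is a hypothetical stand-in for the lesson's data: the column names Education and Ethnicity come from the text, but the values are invented placeholders.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical sample; the values are placeholders, not the real dataset
df = pd.DataFrame(
    {
        "Education": [1, 2, 3, 2],
        "Ethnicity": ["Group A", "Group B", "Group A", "Group C"],
    }
).astype("category")

# dtype: both columns are encoded as category
print(df.dtypes)

# data type: the category elements themselves differ between the columns
education_categories = df["Education"].cat.categories
ethnicity_categories = df["Ethnicity"].cat.categories
print(education_categories.dtype)  # an integer data type
print(ethnicity_categories.dtype)  # a string-like data type

education_is_integer = is_integer_dtype(education_categories)
ethnicity_is_integer = is_integer_dtype(ethnicity_categories)
```

Here .cat.categories returns the Index of categories, whose own dtype attribute reports the data type of the category elements.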
An important thing to note is that for union_categoricals() to work, all the categories must have the same data type. For example, we can successfully combine two category-encoded Series objects with categories of the same integer data type.
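The following sketch shows both sides of this requirement, using invented values: a union of two Series objects whose categories are integers, and an attempted union of integer and string categories, which pandas rejects with a TypeError.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Hypothetical Series whose categories share the same integer data type
levels_a = pd.Series([1, 2, 3], dtype="category")
levels_b = pd.Series([3, 4], dtype="category")

combined = union_categoricals([levels_a, levels_b])
print(list(combined))
print(list(combined.categories))

# Hypothetical Series with string categories, to show the failing case
labels = pd.Series(["x", "y"], dtype="category")

mixed_failed = False
try:
    union_categoricals([levels_a, labels])
except TypeError as err:
    # pandas refuses to union categories of different data types
    mixed_failed = True
    print(f"TypeError: {err}")
```

If categories with different data types really do need to be combined, one option is to convert them to a common data type first, for example by casting the integer values to strings before encoding them as categories.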