Categorical Data in pandas
Learn about the categorical data type in pandas.
What is categorical data?
In pandas, category is a hybrid data type. It often looks and behaves like a string, but it is stored as an array of integer codes, each pointing to an entry in a short list of unique values (the categories). This representation lets us give the data a specific order and store it more efficiently.
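To make the integer-backed storage concrete, here is a minimal sketch. The sample values and the variable name sizes are made up for illustration. It shows the unique categories and the integer code stored for each row:

```python
import pandas as pd

# Hypothetical sample data, not taken from the lesson
sizes = pd.Series(["small", "large", "medium", "small", "large"], dtype="category")

# The unique values; when inferred from the data they are sorted
print(list(sizes.cat.categories))   # ['large', 'medium', 'small']

# One integer code per row, indexing into the categories above
print(sizes.cat.codes.tolist())     # [2, 0, 1, 2, 0]
```

The Series still displays the original strings, but only the small integer codes are stored per row.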
Benefits of categorical data
Categorical data is a good fit when we know that a column is limited to a few distinct values. We can indicate that data is categorical when we load it. Categorical values have a few benefits:
- They use less memory than strings.
- They can speed up operations such as sorting and grouping, because pandas works on integer codes instead of strings.
- They can have an ordering.
- They support operations on the categories themselves, such as renaming or reordering them.
- They enforce membership on values.
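The following sketch illustrates three of these benefits: memory use, ordering, and membership. The data and variable names are hypothetical, and the exact byte counts depend on the pandas version and platform, so the example only compares them:

```python
import pandas as pd

# Hypothetical data: 30,000 rows drawn from only three distinct values
as_text = pd.Series(["low", "medium", "high"] * 10_000, dtype=object)
as_cat = as_text.astype("category")

# Memory: deep=True also counts the Python string objects
text_bytes = as_text.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)
print(cat_bytes < text_bytes)          # True

# Ordering: declare low < medium < high so comparisons are meaningful
ordered = as_cat.cat.set_categories(["low", "medium", "high"], ordered=True)
print(ordered.min(), ordered.max())    # low high
print(int((ordered > "low").sum()))    # 20000

# Membership: assigning a value outside the categories is rejected
rejected = False
try:
    ordered.iloc[0] = "extreme"
except (TypeError, ValueError):
    rejected = True
print(rejected)                        # True
```

The savings grow with the number of rows and shrink as the number of distinct values approaches the number of rows.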
Categories are not limited to strings; we can also convert numbers or datetime values to categorical data.
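As a brief sketch of non-string categories, the example below converts integers and datetimes with made-up sample values. The names ratings and dates are illustrative only:

```python
import pandas as pd

# Integers as categories (hypothetical ratings)
ratings = pd.Series([3, 1, 2, 3, 1]).astype("category")
print(ratings.cat.categories.tolist())   # [1, 2, 3]

# Datetimes as categories (hypothetical dates)
dates = pd.Series(
    pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"])
).astype("category")
print(len(dates.cat.categories))         # 2
```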
How to create a category in a series
To create a categorical Series, we pass dtype="category" into the Series constructor. Alternatively, we can call the astype("category") method on an existing Series.
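Both approaches can be sketched as follows. The list colors and the variable names are hypothetical sample data, not the lesson's original example:

```python
import pandas as pd

# Hypothetical sample values
colors = ["red", "green", "blue", "green"]

# Option 1: request the category dtype in the constructor
from_constructor = pd.Series(colors, dtype="category")

# Option 2: convert an existing Series
from_astype = pd.Series(colors).astype("category")

print(from_constructor.dtype)   # category
print(from_astype.dtype)        # category
```

Either route yields a Series with the same values; astype is convenient when the data has already been loaded with another dtype.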