Categorical Data in pandas

What is categorical data?

In pandas, category is a hybrid data type. Its values often look and behave like strings, but they are stored internally as an array of integer codes, each pointing into a list of unique categories. This representation stores repetitive data compactly and lets us give the categories a specific order.
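A minimal sketch of the two parts described above, the unique categories and the integer codes. The colors Series is made-up sample data, not something from the original text.

```python
import pandas as pd

# Hypothetical sample data: a few repeated color labels.
colors = pd.Series(["red", "green", "red", "blue"], dtype="category")

# The unique labels, sorted because they were inferred from the data.
print(colors.cat.categories)

# One integer code per row, each an index into the categories above.
print(colors.cat.codes.tolist())  # [2, 1, 2, 0]
```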

Benefits of categorical data

Categorical data is a good fit when a column holds only a few distinct values that repeat many times. We can declare such a column as categorical when we load the data or convert it afterwards. Categorical values have a few benefits:

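  • They use less memory than strings when values repeat.
  • They can speed up operations such as grouping and sorting.
  • They can have an ordering.
  • They support operations on the categories themselves, such as renaming them.
  • They enforce membership: every value must be one of the declared categories.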
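The benefits listed above can be illustrated with a short sketch. The size labels and variable names are invented for the example, and the exact memory figures will vary with the pandas version, so only the relative sizes are compared.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Memory: many repeated strings versus the same data as a category.
sizes_str = pd.Series(["small", "medium", "large"] * 10_000)
sizes_cat = sizes_str.astype("category")
mem_str = sizes_str.memory_usage(deep=True)
mem_cat = sizes_cat.memory_usage(deep=True)
print(mem_cat < mem_str)  # True

# Ordering: declare the categories in a meaningful order.
size_type = CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
ordered = pd.Series(["medium", "small", "large"], dtype=size_type)
print(ordered.min(), ordered.max())    # small large
print((ordered > "small").tolist())    # [True, False, True]

# Operations on categories: rename every category in one step.
upper = ordered.cat.rename_categories(str.upper)
print(upper.tolist())                  # ['MEDIUM', 'SMALL', 'LARGE']

# Membership: assigning a value that is not a declared category fails.
rejected = False
try:
    ordered.iloc[0] = "huge"
except (TypeError, ValueError):
    rejected = True
print(rejected)  # True
```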

Categories are not limited to strings; we can also convert numbers or datetime values to categorical data.
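As a brief illustration of non-string categories, the sketch below converts integers and datetime values. The ratings and dates are hypothetical sample data.

```python
import pandas as pd

# Integers as categories.
ratings = pd.Series([3, 1, 3, 2, 1]).astype("category")
print(ratings.cat.categories.tolist())  # [1, 2, 3]

# Datetime values as categories; the categories keep their datetime type.
dates = pd.Series(
    pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-01"])
).astype("category")
print(len(dates.cat.categories))  # 2
```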

How to create a category in a series

To create a categorical Series, we pass dtype="category" to the Series constructor. Alternatively, we can call the astype("category") method on an existing Series.
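Both approaches might look like the following sketch. The letters and the variable names are placeholders chosen for the example.

```python
import pandas as pd

# Option 1: set the dtype in the Series constructor.
by_constructor = pd.Series(["a", "b", "a", "c"], dtype="category")

# Option 2: convert an existing Series with astype.
by_astype = pd.Series(["a", "b", "a", "c"]).astype("category")

print(by_constructor.dtype)  # category
print(by_astype.dtype)       # category
```

Either way, the result holds the same values with the category dtype, so the choice mostly depends on whether the Series already exists.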
