Text Data Types
Explore the general concepts behind handling and manipulating text data types in pandas.
Importance of working with text
In today’s increasingly online world, we’re generating vast amounts of data at an unprecedented rate. Text data in particular is growing rapidly and becoming more important.
From social media posts to online reviews, news articles, and product descriptions, text data is ubiquitous and offers insights into consumer behavior, sentiment, and preferences. Therefore, working with text data is a crucial skill for data practitioners.
Handling text data, however, can be challenging due to its unstructured nature. Unlike structured data, which is typically stored in tables with predefined columns, text data can take different forms and contain a wide range of information.
Fortunately, pandas comes with numerous capabilities that can help us work with text data effectively.
Text data types
There are two ways to store text data in pandas:
The object data type (NumPy array)
The StringDtype extension type
Note: The terms dtype and data type refer to the same concept because dtype is short for data type. They are used to describe the type of data that is stored in a DataFrame or Series. To keep things clear and standardized, we’ll use data type when describing general concepts and the keyword dtype when referring to pandas code.
It’s generally recommended to store text data using StringDtype because it makes explicit that a column holds only text, whereas an object column can hold any mix of Python objects. Even so, the object data type remains the default type when inferring a list of strings, for backward compatibility with older pandas versions.
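As a minimal sketch of the two storage options, the snippet below builds the same Series twice, once with each data type. The sample strings are hypothetical placeholders, not part of the lesson's dataset. Both dtypes are requested explicitly through the dtype keyword, so the result does not rely on whatever pandas would infer by default.

```python
import pandas as pd

# Hypothetical placeholder strings, used only for illustration.
values = ["alpha", "beta", "gamma"]

# Store the text with the NumPy-backed object data type.
as_object = pd.Series(values, dtype="object")

# Store the text with the StringDtype extension type ("string" is its alias).
as_string = pd.Series(values, dtype="string")

print(as_object.dtype)
print(as_string.dtype)

# An existing object Series can be converted with astype.
converted = as_object.astype("string")
print(isinstance(converted.dtype, pd.StringDtype))
```

Passing dtype="string" and passing pd.StringDtype() should be interchangeable here; the string alias is simply shorter to type.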
Suppose we have the following mock dataset of three webcam products and their corresponding retail prices: