Preprocessing Using pandas
Learn to write a simple text preprocessing pipeline that normalizes text.
The preprocessing module of the RecordLinkage package does a good job of preparing string attributes for matching. Its main clean function is a composition of several steps. Are these same steps always the best? Let’s understand and gain control over text preprocessing by building our version of clean.
The clean function
To preprocess strings, recordlinkage uses the str methods of pandas. We use the following examples to showcase the default preprocessing steps:
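As a minimal sketch of what such a pipeline might look like, the snippet below chains pandas str methods into a clean-style function. The sample strings, the helper name simple_clean, and the exact regular expressions are assumptions made for illustration. They approximate typical normalization steps (lowercasing, dropping bracketed text, replacing separators, removing symbols) and are not the lesson's original examples or the library's actual implementation.

```python
import pandas as pd

# Hypothetical sample strings; not the lesson's original examples.
names = pd.Series(["  Mary-Ann O'Neil (CEO) ", "JOHN_SMITH, Jr.", None])


def simple_clean(s: pd.Series) -> pd.Series:
    """Approximate a clean-style pipeline using only pandas str methods."""
    s = s.str.lower()                                  # lowercase everything
    s = s.str.replace(r"\(.*?\)", "", regex=True)      # drop text in round brackets
    s = s.str.replace(r"[-_]", " ", regex=True)        # hyphen/underscore -> space
    s = s.str.replace(r"[^ a-z0-9]+", "", regex=True)  # remove remaining symbols
    s = s.str.replace(r"\s+", " ", regex=True)         # collapse repeated spaces
    return s.str.strip()                               # trim both ends


cleaned = simple_clean(names)
print(cleaned.tolist())
```

Because each step is its own line, it can be reordered, dropped, or swapped for a different pattern when the defaults do not suit the data. Missing values pass through the str methods unchanged, so they stay missing in the result.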