Preprocessing Using pandas

Learn to write a simple text preprocessing pipeline that normalizes text.

The preprocessing module of the recordlinkage package does a good job of preparing string attributes for matching. Its main function, clean, is a composition of several steps. But are those steps always the best choice? Let’s understand and gain control over text preprocessing by building our own version of clean.

The clean function

To preprocess strings, recordlinkage uses the str methods of pandas. We use the following examples to showcase the default preprocessing steps:
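As a rough illustration, here is a minimal sketch of a clean-style pipeline built only from pandas str methods. It assumes the steps are lowercasing, dropping bracketed text, turning hyphens and underscores into spaces, removing other punctuation, and tidying whitespace. The function name simple_clean, the regular expressions, and the sample names are hypothetical choices for this sketch; they approximate this kind of cleaning and are not the library's actual implementation.

```python
import pandas as pd


def simple_clean(s: pd.Series) -> pd.Series:
    """Normalize a Series of strings using only pandas str methods.

    Hypothetical helper for illustration; not the recordlinkage implementation.
    """
    s = s.str.lower()                                       # case-fold
    s = s.str.replace(r"[\(\[].*?[\)\]]", "", regex=True)   # drop (...) and [...] with their content
    s = s.str.replace(r"[-_]", " ", regex=True)             # hyphens/underscores become spaces
    s = s.str.replace(r"[^ a-z0-9]+", "", regex=True)       # remove remaining punctuation
    s = s.str.replace(r"\s+", " ", regex=True)              # collapse repeated whitespace
    return s.str.strip()                                    # trim leading/trailing spaces


# Made-up sample values, including a missing entry
names = pd.Series(["Mary-Ann  O'Neil", "Bob (Robert) SMITH", "  jean_luc Picard ", None])
cleaned = simple_clean(names)
print(cleaned)
```

With these inputs, the first three values become "mary ann oneil", "bob smith", and "jean luc picard", while the missing entry stays missing. Because each step is a separate line, it is easy to reorder, drop, or replace a step when the defaults do not suit the data.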
