Text Normalization

Learn how to perform numeric digit normalization and handle contractions using Python.

We'll cover the following...

Numeric digit normalization
Handling contractions

Numeric digit normalization

In text data, numbers can appear in many formats, which complicates analysis and modeling. For example, “two pizzas” and “2 pizzas” refer to the same quantity but are written differently. Numeric digit normalization converts these different representations into one standardized format so that algorithms treat them as equivalent. This keeps numbers consistent across the text and makes the data easier to analyze and understand.

Common approaches for performing numeric digit normalization include:

Converting words to digits: This approach involves converting numeric words to their corresponding digits. For example, “five” would be transformed into “5.” This ensures that numeric words are consistently represented as digits, making them compatible with calculations and comparisons.
Converting digits to words: This technique involves converting numeric digits to words, which can enhance text readability. For instance, “10” could be transformed into “ten.”
Removing numeric separators: Numeric digits might be separated by commas, spaces, or other symbols. Removing ...
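
The approaches above can be sketched with Python's built-in re module. This is a minimal illustration, not a complete implementation: the lookup table and the function names (words_to_digits, digits_to_words, remove_separators) are hypothetical, the table only covers zero through ten, and the separator example assumes commas are used as thousands separators.

```python
import re

# Hypothetical lookup table for illustration; it only covers zero through ten.
WORD_TO_DIGIT = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "ten": "10",
}
DIGIT_TO_WORD = {digit: word for word, digit in WORD_TO_DIGIT.items()}

# Match whole number words only, so "one" inside "someone" is left alone.
_WORD_PATTERN = re.compile(
    r"\b(" + "|".join(WORD_TO_DIGIT) + r")\b", re.IGNORECASE
)
# Match standalone 0-10, skipping digits that belong to a longer number
# such as "100", "3.5", or "1,000".
_DIGIT_PATTERN = re.compile(r"(?<![\d.,])\b(10|\d)\b(?![.,]?\d)")
# Match a comma that sits between a digit and a group of exactly three digits.
_SEPARATOR_PATTERN = re.compile(r"(?<=\d),(?=\d{3}\b)")


def words_to_digits(text):
    """Replace number words (zero to ten) with their digits."""
    return _WORD_PATTERN.sub(lambda m: WORD_TO_DIGIT[m.group(0).lower()], text)


def digits_to_words(text):
    """Replace standalone digits (0 to 10) with their number words."""
    return _DIGIT_PATTERN.sub(lambda m: DIGIT_TO_WORD[m.group(0)], text)


def remove_separators(text):
    """Remove commas used as thousands separators inside numbers."""
    return _SEPARATOR_PATTERN.sub("", text)


print(words_to_digits("Two pizzas and five drinks"))
print(digits_to_words("I ordered 10 pizzas"))
print(remove_separators("Revenue was 1,234,567 dollars"))
```

Larger numbers, compound number words such as "twenty-one," and separators other than commas would need a fuller mapping or a dedicated library, which this sketch leaves out.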
