Reducing and Aggregating Terms

Learn about stemming and lemmatization, two techniques for simplifying words to their root form.

Stemming vs. lemmatization

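Stemming and lemmatization are two techniques used in natural language processing (NLP) to reduce words to their base or root form. This simplifies text processing and analysis by grouping together different forms of the same word. Although they serve a similar purpose, they differ in approach: stemming applies heuristic rules that strip affixes from a word, so the result (the stem) is not always a valid word, while lemmatization uses a vocabulary and morphological analysis to return the word's dictionary form (its lemma).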
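To make the difference concrete, here is a minimal sketch in Python. It assumes a hypothetical hand-written suffix list for the stemmer and a tiny hypothetical lemma dictionary (`LEMMAS`) for the lemmatizer; neither mirrors any library's API, and a real project would more likely use an established implementation such as NLTK's `PorterStemmer` or `WordNetLemmatizer`.

```python
# Hypothetical toy lemma dictionary. Real lemmatizers consult a full lexicon
# and usually take the word's part of speech into account.
LEMMAS = {
    "better": "good",
    "ran": "run",
    "running": "run",
    "studies": "study",
}

# Hypothetical suffix rules, checked in order (longer suffixes first).
SUFFIXES = ("ing", "ies", "es", "ed", "s")


def stem(word: str) -> str:
    """Rule-based: strip the first matching suffix if at least 3 letters remain.

    No dictionary is consulted, so the result may not be a real word.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Dictionary-based: return the listed base form, or the word itself if unknown."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ["studies", "running", "better", "ran"]:
    print(f"{w:8} stem={stem(w):8} lemma={lemmatize(w)}")
```

With these toy rules, stemming turns "studies" into "stud" and "running" into "runn", which are not dictionary words, and leaves the irregular forms "better" and "ran" unchanged. The lookup returns "study", "run", "good", and "run". This reflects the usual trade-off: rule-based stemming is fast and needs no lexicon, while lemmatization is more accurate but only as good as the dictionary it consults.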

Understanding stemming

Consider the following words:

  • Walked

  • Walking

  • Walker

  • Walk

These are derivatives of “walk.” When calculating word frequency on text containing these words, we may not want them counted as four distinct words. Rather, it might make more sense to count them as four instances of “walk.” Stemming is the process of reducing each word to its stem.

Here’s an example of how stemming works.
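The sketch below, in Python, is one way to illustrate this. It uses a hypothetical `simple_stem` function with a deliberately small suffix list, not a real stemming algorithm such as Porter's, and compares word counts before and after stemming.

```python
from collections import Counter

# Hypothetical toy suffix list; real stemmers use many more, carefully ordered rules.
SUFFIXES = ("ing", "ed", "er", "s")


def simple_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix if at least 3 letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["Walked", "Walking", "Walker", "Walk"]

stems = [simple_stem(w) for w in words]
print(stems)  # ['walk', 'walk', 'walk', 'walk']

# Without stemming, each form is counted as a separate word.
raw_counts = Counter(w.lower() for w in words)
print(raw_counts)

# With stemming, all four forms collapse into a single key.
stem_counts = Counter(stems)
print(stem_counts)  # Counter({'walk': 4})
```

Before stemming, the counter holds four separate keys with a count of 1 each; after stemming, it holds a single key, `walk`, with a count of 4. Such a short suffix list will also over-stem some words (for example, "press" becomes "pres"), which is why practical stemmers rely on larger rule sets.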
