Stopword Removal

Learn why stopwords negatively affect our NLP research and how to remove them.

Stopwords

In our project, we use words from each novel to identify interesting discussion groups. Words like “and,” “the,” or “that” appear in almost every text, so they tell us nothing about what sets one novel apart from another. We need to remove these types of words from consideration.
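To see why such words get in the way, here is a minimal base R sketch that counts word frequencies in a single sentence. The sample sentence and the three-word stopword list are made up for illustration; they are not part of the project data.

```r
# Hypothetical sample sentence; any text would do.
txt <- "The captain said that the ship and the crew were ready and that the storm had passed"

# Lowercase the text and split on whitespace to get individual words.
words <- strsplit(tolower(txt), "\\s+")[[1]]

# Count how often each word occurs, most frequent first.
counts <- sort(table(words), decreasing = TRUE)
head(counts, 3)

# A tiny hand-picked stopword list (an assumption for this sketch).
myStopwords <- c("the", "and", "that")

# Keep only the words that are not in the stopword list.
kept <- words[!words %in% myStopwords]
kept
```

The three most frequent words here are all stopwords, and once they are filtered out, only words that describe the scene remain.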

Introduction to stopword removal

Stopwords make sentences pleasant to read and sometimes clarify the context of associated words. For the most part, though, they aren’t important for natural language processing. Removing these connecting words is an important part of text mining, and the step is called stopword removal.

Let’s look at a simple example:

library(tm, quietly = TRUE)  # load the tm text-mining package
myText <- "Stopwords are nice words for humans.
They make sentences pleasant to read and sometimes clarify
the context of associated words, but for the most part,
they aren't important for natural language processing.
An important part of text mining is removing these
extraneous words; it's called stopword removal."
removeWords(myText, stopwords("english"))  # drop words found in tm's English stopword list
  • Line 3: The ...
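One detail worth checking in the output: as far as we can tell, `removeWords` matches case-sensitively on a plain character string, and the list returned by `stopwords("english")` is lowercase, so capitalized words at the start of a sentence (such as "They" or "An") can survive. The sketch below shows one possible way to handle this by lowercasing first and then tidying the leftover spaces with base R. The `sampleText` string and the variable names are assumptions for illustration.

```r
library(tm, quietly = TRUE)

# Short sample text (hypothetical, chosen for this sketch).
sampleText <- "They make sentences pleasant to read. An important part of text mining is removing these words."

# Lowercase first so capitalized words such as "They" match the lowercase stopword list.
lowered <- tolower(sampleText)

# Remove the English stopwords that ship with tm.
noStops <- removeWords(lowered, stopwords("english"))

# removeWords leaves gaps where words used to be; collapse and trim the extra spaces.
cleaned <- trimws(gsub("\\s+", " ", noStops))
cleaned
```

Lowercasing discards capitalization that might matter for other tasks, such as spotting proper names, so whether to apply it depends on what the later analysis needs.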
