How to remove stop words with NLTK library in Python

What exactly are stop words?

Stop words are words that occur very frequently in a language or corpus. For many NLP tasks, they add little or no useful information to the text containing them. Words like a, they, the, is, and an are usually considered stop words.

Let’s take the title of this article as an example:

How to remove stop words with NLTK library in Python

Words like how, to, with, and in do not clearly state the topic of the article. However, keywords like remove, stop words, NLTK, library, and Python give a much clearer idea of what to expect from this article.

Interestingly, some of these keywords are part of the tags for this article :)

Text: "How to remove stop words with NLTK library in Python"
Tokens: ['how', 'to', 'remove', 'stop', 'words', 'with', 'nltk', 'library', 'in', 'python']
Text without stop words: "remove stop words nltk library python"
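
The example above can be reproduced with a short sketch along the following lines. It assumes the tokens come from lowercasing the text and splitting on whitespace, which is a simplification rather than necessarily the tokenizer used for the example. The helper names remove_stop_words and load_english_stop_words are hypothetical and chosen only for illustration; nltk.download and stopwords.words are the NLTK calls involved.

```python
def remove_stop_words(text, stop_words):
    """Return (tokens, filtered_text), dropping tokens found in stop_words."""
    # Simplification: lowercase and split on whitespace instead of
    # using an NLTK tokenizer.
    tokens = text.lower().split()
    stop_set = set(stop_words)  # set lookups are faster than list lookups
    kept = [token for token in tokens if token not in stop_set]
    return tokens, " ".join(kept)


def load_english_stop_words():
    """Return NLTK's English stop word list, downloading it if needed."""
    # Imported lazily so remove_stop_words() works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return stopwords.words("english")
```

Calling remove_stop_words with the article title and load_english_stop_words() should give the tokens and the filtered text shown above, provided how, to, with, and in are all present in the downloaded list.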

Specializing

Sometimes you may need to add or remove words from your list of stop words.

For example, imagine you’re trying to classify food magazines based on what kinds of foods are the focus. Because every magazine is about food, you would expect the word food (and similar words) to be mentioned a lot in all of them. Such words do not help tell one category from another, so they provide no valuable information for this task.

Hence, food behaves like a stop word here, and you may consider adding it to your list of stop words.

Luckily, stopwords.words('english') returns a regular Python list, which we can easily modify. Keep in mind that this only changes the list in memory; it does not change the stop words you downloaded to your disk.
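
As a minimal sketch of this idea, the snippet below adds and removes words on a copy of a stop word list. The function name customize_stop_words is hypothetical, and the short list base is only a stand-in for the much longer list returned by stopwords.words('english'), so that the snippet runs without downloading anything.

```python
def customize_stop_words(base_words, add=(), remove=()):
    """Return a new list: base_words plus `add`, minus `remove`."""
    custom = list(base_words)  # copy, so the caller's list is untouched
    for word in add:
        if word not in custom:  # avoid duplicate entries
            custom.append(word)
    removed = set(remove)
    return [word for word in custom if word not in removed]


# Stand-in for stopwords.words('english'); the real list is much longer.
base = ["a", "the", "is", "not"]

# Treat "food" as a stop word and keep "not" as a meaningful word.
food_stop_words = customize_stop_words(base, add=["food"], remove=["not"])
print(food_stop_words)  # ['a', 'the', 'is', 'food']
```

Working on a copy keeps the original list available, which is convenient when different tasks need different stop word lists.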

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)
