Stemming and Lemmatization
Learn about stemming and lemmatization techniques, and how to apply them using Python.
We'll cover the following...
Stemming
Stemming is a text preprocessing technique that simplifies words by stripping prefixes and suffixes, yielding base forms for effective processing and storage. For instance, the word “running” becomes “run” once we’ve performed stemming. However, while performing this technique, it’s important to note that it can result in inaccuracies and semantic loss, as we’ll get to see. Because of this, even though stemming has advantages like reducing vocabulary, it requires careful application.
Press + to interact
main.py
reviews.csv
review_id,review_text,rating,author_name1,"I have to go to the store.",5,"john smith"2,"This product is amazing.",4,"jane doe"3,"This is the best movie I have ever seen.",5,"alex johnson"4,"The customer support was terrible.",2,"emily thompson"5,"The food was delicious.",4,"michael brown"6,"I had a wonderful experience at this hotel.",5,"sophia miller"7,"The shopping experience was very disappointing.",1,"william davis"8,"The spelling in this book is horrible.",3,"olivia wilson"9,"The performance of the actor was impressive.",4,"daniel anderson"10,"The product description is inaccurate.",2,"emily thompson"
Let’s review the code line by line:
Lines 1–2: We import the
pandas
library andSnowballStemmer
class fromnltk.stem
. ...
Access this course and 1400+ top-rated courses and projects.