What is lexicon-based sentiment analysis?

Step 1: Tokenization

Tokenization splits a sentence into individual tokens, with each punctuation mark treated as its own token. For example, take the sentence "The food is good." After tokenization, it becomes:

Tokenization
The
food
is
good
.
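
As a minimal sketch of this step, the split can be done with a regular expression from Python's standard library. The function name tokenize is an assumption made for illustration, and real tokenizers handle many more cases (contractions, URLs, emoji) than this pattern does.

```python
import re

def tokenize(sentence):
    # \w+ matches a run of letters or digits (a word);
    # [^\w\s] matches a single punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("The food is good.")
print(tokens)  # ['The', 'food', 'is', 'good', '.']
```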

Step 2: Cleaning the data

In this step, all special characters and punctuation marks (e.g., "(", ".", ",", "!") are removed.

In the given example, the "." is removed, which leaves us with: The food is good

Cleaned
The
food
is
good
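
One way to sketch the cleaning step is to drop every token made up entirely of punctuation characters. The helper name clean is hypothetical, and string.punctuation covers only ASCII punctuation, so other symbols would need extra handling.

```python
import string

def clean(tokens):
    # Drop tokens that consist only of ASCII punctuation characters.
    return [
        t for t in tokens
        if not all(ch in string.punctuation for ch in t)
    ]

cleaned = clean(["The", "food", "is", "good", "."])
print(cleaned)  # ['The', 'food', 'is', 'good']
```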

Step 3: Removing stop words

Stop words are common words such as "and", "are", "the", "was", and "is". After we remove the stop words in our example ("The" and "is"), only "food" and "good" are left.

Stop Words Removal
food
good
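
A minimal sketch of stop word removal, assuming a tiny hand-written stop word set built from the examples above. The names STOP_WORDS and remove_stop_words are illustrative only, and a practical list would be much longer.

```python
# Tiny illustrative stop word set; not a complete list.
STOP_WORDS = {"and", "are", "the", "was", "is"}

def remove_stop_words(tokens):
    # Compare in lowercase so that "The" matches "the".
    return [t for t in tokens if t.lower() not in STOP_WORDS]

remaining = remove_stop_words(["The", "food", "is", "good"])
print(remaining)  # ['food', 'good']
```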

Step 4: Classification

Now, each remaining word is classified as negative, positive, or neutral and is given a score from -1 to 1:

negative means -1
positive means 1
neutral means 0

In our example, "food" is neutral, so its score is 0, and "good" is positive, so its score is 1.

Words	Score
food	neutral (0)
good	positive (1)
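
The lookup can be sketched with a small dictionary acting as the lexicon. The LEXICON contents below are hypothetical ("bad" is added only for illustration), and the sketch assumes that any word missing from the lexicon is treated as neutral.

```python
# Hypothetical toy lexicon: word -> score (-1 negative, 1 positive).
LEXICON = {"good": 1, "bad": -1}

def score_words(tokens):
    # Words not found in the lexicon are assumed to be neutral (0).
    return {t: LEXICON.get(t.lower(), 0) for t in tokens}

scores = score_words(["food", "good"])
print(scores)  # {'food': 0, 'good': 1}
```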

License: Creative Commons-Attribution-ShareAlike 4.0 (CC-BY-SA 4.0)
