N-Grams for Text Classification
Explore how to use n-grams as features in text classification, including sentiment analysis. Understand their benefits: capturing local context, improving interpretability, and suiting small datasets. Learn to implement n-gram models in Python, covering data preprocessing, vectorization, and the training and evaluation of a Naive Bayes classifier to build effective text classifiers.
Introduction
In text classification, we can use n-grams as features for training a machine learning model. A good use case is sentiment analysis: when classifying reviews as positive or negative, bigrams (2-grams) or trigrams (3-grams) can help the classifier identify phrases that convey sentiment more accurately than single words. N-grams also require minimal preprocessing, so we might favor them over text representation techniques such as bag-of-words (BoW), TF-IDF, or word embeddings when resources or time are limited. If such constraints don’t exist, we can combine n-grams with those techniques to yield better outcomes during further analysis.
Reasons for choosing n-grams
Here are a few other reasons why we might choose n-grams over other techniques during text preprocessing:
Interpretability: N-grams are human-readable because they represent word sequences, making it easier to understand which phrases or patterns influence the classification decision. This is especially ...