Introduction to N-Grams

Learn what n-grams are and how to implement bigrams using Python.

Overview

In text preprocessing, n-grams are sequences of n adjacent items, such as words or characters, extracted from text data. They help capture linguistic relationships and context that are lost when each item is treated in isolation. By keeping adjacent items together, n-grams let models pick up on associations between neighboring elements. This is particularly useful for sentiment analysis tasks, where capturing phrases such as “not good” is crucial for understanding negation: taken as separate words, “not” and “good” lose the reversal of meaning that the pair expresses. N-grams can also enhance text classification by accounting for the co-occurrence of words and improve the accuracy of machine translation by taking word sequences into account. Here are common types of n-grams represented in a table:
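To make the idea of extracting adjacent items concrete, here is a minimal sketch in plain Python. The helper name `extract_ngrams`, the sample sentence, and the whitespace tokenization are assumptions made for illustration; they are not taken from the lesson, and a real pipeline would typically handle punctuation and casing as well.

```python
def extract_ngrams(tokens, n):
    """Return all contiguous n-item sequences from a list of tokens.

    Hypothetical helper for illustration only.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n across the token list, one position at a time.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Naive whitespace tokenization (an assumption; punctuation is not handled).
words = "the movie was not good".split()

unigrams = extract_ngrams(words, 1)  # single words
bigrams = extract_ngrams(words, 2)   # pairs of adjacent words
trigrams = extract_ngrams(words, 3)  # triples of adjacent words

print(bigrams)
# [('the', 'movie'), ('movie', 'was'), ('was', 'not'), ('not', 'good')]

# The same helper works at the character level.
char_bigrams = extract_ngrams(list("good"), 2)
print(char_bigrams)
# [('g', 'o'), ('o', 'o'), ('o', 'd')]
```

Note that the bigram `('not', 'good')` keeps the negation together as a single feature, which the individual words "not" and "good" would not do on their own.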
