What is an n-gram representation?

import re

def n_gram(text, n=1):
    text = text.lower()
    text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
    gram_tokens = [token for token in text.split(" ") if token != ""]
    ngrams = zip(*[gram_tokens[i:] for i in range(n)])
    return [" ".join(ngram) for ngram in ngrams]

def unigram(text):
    print("Unigram")
    print(n_gram(text, 1))

def bigram(text):
    print("Bigram")
    print(n_gram(text, 2))

def trigram(text):
    print("Trigram")
    print(n_gram(text, 3))

if __name__ == "__main__":
    text = "Educative is the best platform"
    unigram(text)
    bigram(text)
    trigram(text)
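
To make the result concrete, the sketch below restates the n_gram() logic so that it runs on its own and prints the lists for the sample sentence. The lists in the comments were worked out by hand from the five tokens of that sentence; treat this as an illustration, not as captured output from the original page.

```python
import re

def n_gram(text, n=1):
    # Same logic as the listing above, repeated so this snippet is self-contained.
    text = text.lower()
    text = re.sub(r'[^a-zA-Z0-9\s]', ' ', text)
    gram_tokens = [token for token in text.split(" ") if token != ""]
    ngrams = zip(*[gram_tokens[i:] for i in range(n)])
    return [" ".join(ngram) for ngram in ngrams]

text = "Educative is the best platform"
print(n_gram(text, 1))  # ['educative', 'is', 'the', 'best', 'platform']
print(n_gram(text, 2))  # ['educative is', 'is the', 'the best', 'best platform']
print(n_gram(text, 3))  # ['educative is the', 'is the best', 'the best platform']
```

A text with k tokens yields k - n + 1 n-grams, which is why the five-word sentence gives five unigrams, four bigrams, and three trigrams.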

Explanation

Line 1: We import the re module, which provides regular expression support.
Line 3: We define the n_gram() function. It returns the list of n-grams for the given text and the given value of n.
Line 4: The text is converted to lowercase.
Line 5: Every character that is not a letter, a digit, or whitespace is replaced with a space.
Line 6: The tokens are generated by splitting the text on the space character and discarding empty strings.
Lines 7–8: The token list is zipped with copies of itself that start one position later each, so every tuple holds n consecutive tokens. Each tuple is joined with spaces, and the n-grams are returned as a list.
Lines 10–12: We define the unigram() function. It prints the unigram representation of the text by calling n_gram() with n=1.
Lines 14–16: We define the bigram() function. It prints the bigram representation of the text by calling n_gram() with n=2.
Lines 18–20: We define the trigram() function. It prints the trigram representation of the text by calling n_gram() with n=3.
Line 22: The __main__ guard runs the following lines only when the file is executed as a script.
Line 23: We define the sample text.
Line 24: We call the unigram() function.
Line 25: We call the bigram() function.
Line 26: We call the trigram() function.
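
The zip step in n_gram() is the least obvious part of the code, so here is a small sketch that unpacks it for n=3. The names shifted, windows, and too_long are introduced only for this illustration and do not appear in the original code.

```python
tokens = ["educative", "is", "the", "best", "platform"]
n = 3

# Build n views of the token list, each starting one position later.
shifted = [tokens[i:] for i in range(n)]
# shifted[0] == ['educative', 'is', 'the', 'best', 'platform']
# shifted[1] == ['is', 'the', 'best', 'platform']
# shifted[2] == ['the', 'best', 'platform']

# zip(*shifted) groups the items found at the same index in every view and
# stops at the shortest view, so only complete windows of n tokens are produced.
windows = list(zip(*shifted))
trigrams = [" ".join(window) for window in windows]
print(trigrams)  # ['educative is the', 'is the best', 'the best platform']

# If n exceeds the number of tokens, one of the views is empty and zip yields nothing.
too_long = list(zip(*[tokens[i:] for i in range(6)]))
print(too_long)  # []
```

Because zip stops at the shortest input, no explicit bounds check is needed: incomplete windows at the end of the text are dropped automatically.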

License: Creative Commons-Attribution NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA 4.0)
