Preserve Phrases with N-grams
Learn to use the tm package to create n-grams from a document.
We'll cover the following
N-grams are phrases
Tokenization can be adjusted to respect lines, sentences, and paragraphs as well as words. But what about phrases? For example, “Frankenstein's monster” and “philosopher’s stone” are both phrases characteristic of Mary Shelley’s writing. Neither of them would be broken out by the tokenization strategies we’ve discussed so far. Instead, they require a strategy called n-grams.
Most frequent phrases
For our work with Mary Shelley, it’ll be helpful to know a list of the most frequent phrases. The following code produces this list:
Get hands-on with 1400+ tech skills courses.