Using a File Folder as a Corpus
Learn how to use files and folders as a SimpleCorpus.
The documentation for the tm package is nearly 60 pages long and dives straight into the mechanics of NLP. Rather than trying to absorb the full depth of the package in one go, let's break it into smaller, related components. The tm package can be divided into these main topics:
Corpora and sources
Metadata
Preprocessing: Cleaning, stopwords, and stemming
Tokenizing: Words, n-grams, weighting
Statistics: Term frequency
Visualization
In this lesson, ...