Using the File Folder as Corpus

Learn how to use files and folders as a SimpleCorpus.

The documentation for tm is nearly 60 pages long and dives straight into the mechanics of NLP. Rather than trying to absorb the whole package in one go, let’s break it down into a few related, manageable components. The tm package can be divided into these main topics:

  • Corpora and sources

  • Metadata

  • Preprocessing: Cleaning, stopwords, and stemming

  • Tokenizing: Words, n-grams, weighting ...
