Using a Suitable Corpus Class
Learn about the different types of corpora in the tm package and plug-in packages for efficient text mining and NLP analysis in R.
Let’s take a deeper look at the corpus classes that come with the tm package and those provided through its plug-in packages.
Corpus
Corpus is a convenience function that creates either a SimpleCorpus or a VCorpus, depending on the arguments provided. For example, a SimpleCorpus can’t hold XML documents, so if we used Corpus with an XML source, it would create a VCorpus instead. Here is an example of Corpus:
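A minimal sketch of such a call, assuming the tm package is installed. The two sentences in docs are made-up sample texts used only for illustration:

```r
# Assumes the tm package is installed: install.packages("tm")
library(tm)

# Hypothetical sample texts, for illustration only
docs <- c("Text mining turns raw text into data.",
          "The tm package provides several corpus classes.")

# VectorSource is one of the sources for which Corpus builds a SimpleCorpus
corp <- Corpus(VectorSource(docs))

# The first line of the structure output lists the object's classes
str(corp)
class(corp)   # "SimpleCorpus" "Corpus"
```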
This is a simple example. At the top of the structure output, the line listing the object’s classes shows that it is a SimpleCorpus. If the source had been anything other than DataframeSource, DirSource, or VectorSource, Corpus would have created a VCorpus instead.
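The following sketch checks this rule by building a corpus from each of the three sources and from URISource, a tm source that is not on that list. The sample texts, the temporary directory, and the file names are made up for illustration, and the file:// URIs assume POSIX-style absolute paths (Linux or macOS):

```r
library(tm)

docs <- c("First sample document.", "Second sample document.")

# VectorSource: a plain character vector
from_vector <- Corpus(VectorSource(docs))

# DataframeSource: needs a 'doc_id' column followed by a 'text' column
df <- data.frame(doc_id = c("d1", "d2"), text = docs, stringsAsFactors = FALSE)
from_df <- Corpus(DataframeSource(df))

# DirSource: every file in a directory (a hypothetical temporary one here)
dir_path <- file.path(tempdir(), "tm_corpus_demo")
dir.create(dir_path, showWarnings = FALSE)
writeLines(docs[1], file.path(dir_path, "doc1.txt"))
writeLines(docs[2], file.path(dir_path, "doc2.txt"))
from_dir <- Corpus(DirSource(dir_path))

# URISource is not one of the three, so Corpus falls back to a VCorpus
# (assumes POSIX-style paths, so "file://" + "/tmp/..." is a valid URI)
uris <- paste0("file://", file.path(dir_path, c("doc1.txt", "doc2.txt")))
from_uri <- Corpus(URISource(uris))

# SimpleCorpus for the first three, VCorpus for uri
sapply(list(vector = from_vector, df = from_df, dir = from_dir, uri = from_uri),
       function(x) class(x)[1])
```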
Here is the Corpus command with all arguments defined:
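As a sketch, here is a call with both readerControl components written out, assuming a VectorSource over two made-up texts. Passing the source’s own reader (as returned by reader(src)) together with language = "en" reproduces the default arguments, so the result is still a SimpleCorpus:

```r
library(tm)

docs <- c("First sample document.", "Second sample document.")
src <- VectorSource(docs)

# x is the source; readerControl spells out the default reader and language
corp <- Corpus(src,
               readerControl = list(reader = reader(src),
                                    language = "en"))

class(corp)   # still a SimpleCorpus, because the defaults were used
```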
x is a source object. readerControl is a list of two components: reader and language. The reader function constructs a text document from the files identified by x....
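To see both readerControl components at work, here is a sketch with a hypothetical custom reader, upper_reader, that upper-cases each text. It assumes tm’s reader interface (a function taking elem, language, and id and returning a text document) and the behavior of recent tm versions, in which Corpus builds a SimpleCorpus only when the source’s default reader is used. Supplying any other reader should therefore produce a VCorpus:

```r
library(tm)

# Hypothetical German sample texts
docs <- c("Erstes Dokument.", "Zweites Dokument.")

# Hypothetical custom reader: receives elem, language, and id,
# and returns a PlainTextDocument with the text upper-cased
upper_reader <- function(elem, language, id) {
  PlainTextDocument(toupper(elem$content), id = id, language = language)
}

vc <- Corpus(VectorSource(docs),
             readerControl = list(reader = upper_reader, language = "de"))

class(vc)                   # a VCorpus, because the reader is not the default
content(vc[[1]])            # "ERSTES DOKUMENT."
meta(vc[[1]], "language")   # "de"
```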