Understanding Metadata in Text Analysis
Learn how metadata is used by the tm package during natural language processing.
Metadata
Metadata is data about data: information that describes the corpus and its content, such as a document's author or timestamp.
A corpus contains two types of metadata: corpus metadata and document-level metadata. Here’s how to list the document metadata:
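library(tm)

# Read every file in the "data" directory as a document source
docDir <- DirSource(directory = "data")

# Build a simple corpus from that source
newSimpleCorpus <- SimpleCorpus(docDir)

# List the metadata of the second document in the corpus
meta(newSimpleCorpus[[2]])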
The output lists the metadata for the second document in the corpus.
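To make the distinction between corpus metadata and document-level metadata concrete, here is a minimal sketch that reads both from the same corpus. It assumes the tm package is installed. The two sample sentences and the names texts, exampleCorpus, corpusMeta, and docMeta are hypothetical, chosen only so the example runs without a "data" directory.

```r
library(tm)

# Hypothetical in-memory documents, used here instead of the "data" directory
texts <- c(
  "Metadata describes the data it accompanies.",
  "A corpus is a collection of text documents."
)
exampleCorpus <- SimpleCorpus(VectorSource(texts))

# Corpus metadata: describes the collection as a whole
corpusMeta <- meta(exampleCorpus, type = "corpus")
print(corpusMeta)

# Document-level metadata: tags attached to a single document
docMeta <- meta(exampleCorpus[[2]])
print(names(docMeta))
```

The type argument selects which level meta reads when it is called on the corpus, while calling meta on a single document extracted with [[ ]] returns that document's own tags.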
The meta function ...