Understanding Metadata in Text Analysis

Learn how metadata is used by the tm package during natural language processing.

Metadata

Metadata is information about a corpus and its contents, such as a document's author or timestamp. In short, metadata is data about data.

A corpus contains two types of metadata: corpus metadata and document-level metadata. Here’s how to list the document metadata:

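library(tm)
# Point tm at the folder that holds the text files
docDir <- DirSource(directory = "data")
# Build a corpus from every file in that folder
newSimpleCorpus <- SimpleCorpus(docDir)
# List the metadata of the second document
meta(newSimpleCorpus[[2]])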

This is a list of the metadata for the second document in the corpus.
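
To see both types of metadata side by side, here is a minimal sketch. It assumes a small in-memory corpus built with VectorSource instead of the "data" folder; the two sample sentences and the names exampleCorpus, docMeta, docLanguage, and corpusMeta are hypothetical and used only for illustration.

```r
library(tm)

# Assumption: two made-up documents held in memory, so no "data" folder is needed
exampleTexts <- c("The first sample document.",
                  "The second sample document.")
exampleCorpus <- SimpleCorpus(VectorSource(exampleTexts))

# Document-level metadata: every tag stored for the second document
docMeta <- meta(exampleCorpus[[2]])
print(docMeta)

# A single document-level tag, selected by name
docLanguage <- meta(exampleCorpus[[2]], "language")
print(docLanguage)

# Corpus metadata: information about the corpus as a whole
corpusMeta <- meta(exampleCorpus, type = "corpus")
print(corpusMeta)
```

Passing a tag name returns just that one field, while type = "corpus" switches from the metadata of individual documents to the metadata of the corpus itself.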

The meta function ...
