...


Analyzing Textual Comparisons with Document-Term Matrices

Learn the significance of document-term matrices in text mining.

Why use document-term matrices?

The following code lists the tokens and their frequencies:

# Display the most frequent tokens ------------------------
shelleyText |>
  removePunctuation() |>
  removeWords(stopwords("english")) |>
  removeWords(c("I")) |>
  removeNumbers() |>
  stripWhitespace() |>
  Boost_tokenizer() |>
  vapply(paste, "", collapse = " ") |>
  table() |>
  sort(decreasing = TRUE) |>
  head(n = 10)
  • Line 2: We use the pipe (|>) operator to pass the shelleyText data through a series of text-cleaning functions.

  • Line 8: This step performs the tokenization, with Boost_tokenizer() breaking the cleaned text into individual words, or tokens.

  • Line 9: Here, vapply() pastes (concatenates) the words within each token into a single string, separated by a space.
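To see the cleaning-and-counting steps in action without the full text, here is a minimal, self-contained sketch that applies the same pipeline to an inline sample string (sampleText below is a made-up stand-in for shelleyText, which is loaded earlier in the lesson):

```r
library(tm, quietly = TRUE)

# A stand-in for shelleyText, purely for illustration
sampleText <- "The ice was here, the ice was there, the ice was all around."

sampleText |>
  removePunctuation() |>
  removeWords(stopwords("english")) |>
  stripWhitespace() |>
  Boost_tokenizer() |>
  table() |>
  sort(decreasing = TRUE) |>
  head(n = 3)
```

Because the cleaning steps strip punctuation and common stop words before counting, the token ice should dominate the resulting frequency table.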

It’s important to note that this code doesn’t make use of a DTM. Here’s similar code that creates one:

library(tm, quietly = TRUE)
docDir <- DirSource(directory = "data", pattern = "mws_.+txt")
newCorpus <- Corpus(docDir)
DTmatrix <- DocumentTermMatrix(newCorpus,
                               control = list(tolower = TRUE,
                                              stopwords = TRUE,
                                              stripWhitespace = TRUE,
                                              removePunctuation = TRUE,
                                              removeNumbers = TRUE,
                                              tokenize = "Boost"))
inspect(DTmatrix)
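If the mws_*.txt files aren’t at hand, the same construction can be sketched against a tiny in-memory corpus (the two sentences below are made up for illustration). Once the DTM exists, tm helpers such as findFreqTerms() and as.matrix() expose its contents:

```r
library(tm, quietly = TRUE)

# Toy two-document corpus standing in for the files in "data"
docs <- c("The monster fled across the frozen ice.",
          "The creature spoke of misery upon the ice.")
toyCorpus <- Corpus(VectorSource(docs))

toyDTM <- DocumentTermMatrix(toyCorpus,
                             control = list(tolower = TRUE,
                                            stopwords = TRUE,
                                            removePunctuation = TRUE,
                                            removeNumbers = TRUE))

inspect(toyDTM)                      # rows = documents, columns = terms
findFreqTerms(toyDTM, lowfreq = 2)   # terms appearing at least twice
as.matrix(toyDTM)                    # dense matrix of term counts
```

Here "ice" is the only term shared by both toy documents, so it is the only term findFreqTerms() reports at a minimum frequency of two.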

    ...