Installation and Documentation
Learn about installing the text mining package and its documentation.
Introduction to the tm package
R is an excellent programming language for statistics and matrix manipulation. Given data organized in a table or matrix, R can return a wealth of insights and visualizations. However, data from real-life applications is rarely stored in clean tables with well-ordered rows and columns. It is messy and must be cleaned first, a process often referred to as data wrangling.
Human language is a prime example of messy data. Concepts aren’t tagged, and context is fluid. There are no standardized rules and no reliable indicators to help a computer understand what is being said and how to separate the information from the presentation.
This is where natural language processing (NLP) comes in. NLP is a collection of tools and techniques for converting human language into a format useful to computers. We could do this by hand, but it would be painfully slow. Instead, it’s easier to use a framework. In this part of the course, we’ll use a package called tm, which is short for text mining.
Installation process
To use tm, we’ll need to install it in our copy of R.
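As a minimal sketch, assuming an R session with internet access to CRAN, the installation can be run from the R console. The requireNamespace() guard is our own addition, not something the package requires. It only skips the download when tm is already present.

```r
# Install tm from CRAN only if it is not already available.
# Assumption: a CRAN mirror is reachable from this machine.
if (!requireNamespace("tm", quietly = TRUE)) {
  install.packages("tm")
}

# Load the package into the current session.
library(tm)

# Report the installed version as a quick sanity check.
packageVersion("tm")
```

Once the package loads without errors, help(package = "tm") lists its documented functions, and browseVignettes("tm") opens any longer guides that ship with it.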