R data analysis packages
Data analysis is a multifaceted process that spans tasks such as data cleaning, transformation, and visualization. R is a powerful programming language for carrying out these tasks. However, to take full advantage of R, we need packages that supplement its built-in functions.
This lesson provides instructions on accessing the following practical and popular libraries used in this course: ggplot2, tidyr, readr, stringr, dplyr, readxl, and tidyverse.
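Before a library can be loaded, it has to be installed once. The following is a minimal sketch, assuming the packages come from CRAN; the variable names course_pkgs and missing_pkgs are made up for this illustration. It only reports which packages are missing, and the actual installation call is left commented out.

```r
# Packages used in this course
course_pkgs <- c("ggplot2", "tidyr", "readr", "stringr",
                 "dplyr", "readxl", "tidyverse")

# Names of all packages currently installed on this machine
installed <- rownames(installed.packages())

# Course packages that are not installed yet
missing_pkgs <- setdiff(course_pkgs, installed)
print(missing_pkgs)

# Uncomment to install the missing packages (needs an internet connection):
# install.packages(missing_pkgs)
```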
ggplot2
The ggplot2 library is a popular R plotting tool that has inspired similar libraries in other programming languages, such as Python. It enables the plotting of graphs in various forms. Some of the plot formats are listed here:
- Line plots
- Bar plots
- Scatter plots
- Histograms
- Density plots
The ggplot2 library also allows customizing a wide range of features associated with graphs, including line color, title, and element size.
Run the following code to load the library:
library(ggplot2) # Load ggplot2 library
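As a rough illustration of the plot types and customizations mentioned above, here is a small scatter plot sketch. It uses mtcars, a data set that ships with R, purely as sample data; the color, size, and labels are arbitrary choices.

```r
library(ggplot2)

# mtcars ships with R; it is used here only as sample data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", size = 3) +   # scatter plot with custom color and size
  labs(
    title = "Fuel efficiency vs. weight",       # custom title and axis labels
    x = "Weight (1000 lbs)",
    y = "Miles per gallon"
  )

print(p)  # draw the plot
```

Swapping geom_point() for geom_line(), geom_bar(), geom_histogram(), or geom_density() gives the other plot formats from the list, although some of them expect a different aesthetic mapping.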
tidyr
The tidyr library provides functions that help us do the following:
- Convert data frames into long or wide forms
- Combine or split columns
- Deal with missing values (NA)
- Unnest nested data, such as list-columns
We frequently rely on the tidyr package to facilitate data cleaning and manipulation, which often make up a large part of data analysis when data is untidy.
Run the following code to load the library:
library(tidyr) # Load tidyr library
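The reshaping and missing-value handling described above can be sketched as follows. The scores data frame is hypothetical example data, and replacing the missing score with 0 is an arbitrary choice made only for this sketch.

```r
library(tidyr)

# Hypothetical example data: one row per student, one column per exam
scores <- data.frame(
  student = c("Ann", "Ben"),
  math    = c(90, NA),
  physics = c(85, 70)
)

# Wide to long: one row per student-subject pair
long <- pivot_longer(scores, cols = c(math, physics),
                     names_to = "subject", values_to = "score")

# Replace the missing score with 0 (arbitrary choice for this sketch)
long_filled <- replace_na(long, list(score = 0))

# Long back to wide: one column per subject again
wide <- pivot_wider(long_filled, names_from = subject, values_from = score)
```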
readr
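The readr package allows us to read rectangular data from text files, including:
- CSV
- TSV
- Fixed-width files
- Other files with delimiters
- Plain text files, read line by line
Excel files are handled by the readxl package described later in this lesson, and JSON files require a separate package, such as jsonlite.
Base R can also read delimited text files, but the readr functions are typically faster and behave more consistently. For example, they return tibbles and do not convert strings to factors.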
Run the following code to load the library:
library(readr) # Load readr library
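Here is a minimal sketch of reading a delimited file with readr. To keep it self-contained, it first writes a tiny made-up CSV to a temporary file; the file contents and the variable names path and people are assumptions for illustration only.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("name,age", "Ann,31", "Ben,27"), path)

# Read it back; show_col_types = FALSE silences the column type message
people <- read_csv(path, show_col_types = FALSE)
print(people)
```

For other delimiters, read_tsv() and read_delim() are used in the same way.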
stringr
The stringr library focuses on the manipulation of strings. Here are some functionalities that stringr provides:
- Slice and dice or concatenate strings
- Search/find/replace patterns in strings
- Create new string patterns
- Modify letter cases
Run the following code to load the library:
library(stringr) # Load stringr library
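A few of the string operations listed above might look like this in practice. The input vector x is made up for the example.

```r
library(stringr)

x <- c("data analysis", "R packages")

sliced   <- str_sub(x, 1, 4)             # slice: first four characters of each string
joined   <- str_c(x, collapse = " | ")   # concatenate everything into one string
found    <- str_detect(x, "pack")        # search: does each string contain "pack"?
replaced <- str_replace(x, "a", "A")     # replace the first "a" in each string
upper    <- str_to_upper(x)              # change letter case
```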
dplyr
The dplyr library makes data frame manipulation easier, especially when combined with the pipe operator %>%.
The %>% operator, which originates in the magrittr package and is made available by dplyr, is one of the most popular operators in data analysis thanks to its readable structure. It passes the result of one step as the first argument of the next, so we can work on a specific piece of data by sequentially adding the steps we need to execute.
The dplyr package provides the following functions for tabular data:
- select(): Select columns
- mutate(): Create new columns or modify existing ones
- summarize(): Summarize values
- filter(): Filter rows
Run the following code to load the library:
library(dplyr) # Load dplyr library
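To show how steps are chained with %>%, here is a short sketch that uses the mtcars data set shipped with R. The weight threshold and the kpl column (an approximate conversion from miles per gallon to kilometers per liter) are arbitrary choices for illustration.

```r
library(dplyr)

result <- mtcars %>%
  select(mpg, cyl, wt) %>%                    # keep three columns
  filter(wt < 4) %>%                          # keep the lighter cars
  mutate(kpl = mpg * 0.425) %>%               # create a new column (approx. km per liter)
  group_by(cyl) %>%                           # one group per cylinder count
  summarize(mean_kpl = mean(kpl), n = n())    # one summary row per group

print(result)
```

Each line takes the output of the previous line as its input, so the pipeline reads from top to bottom in the order the steps are executed.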
readxl
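The readxl package allows us to do the following:
- Read data from Excel files (.xls and .xlsx)
- List the sheets of a workbook and read a specific sheet or cell range
Note that readxl only reads Excel files. Saving data to Excel files requires a separate package, such as writexl.
The syntax of its main function, read_excel(), closely resembles that of the readr reading functions.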
Run the following code to load the library:
library(readxl) # Load readxl library
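The following sketch reads one sheet from an example workbook that is bundled with readxl itself, so no external file is needed. In real use, path would point to your own Excel file.

```r
library(readxl)

# Path to an example workbook that is installed together with readxl
path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
sheets <- excel_sheets(path)
print(sheets)

# Read one sheet by name into a tibble
cars <- read_excel(path, sheet = "mtcars")
print(head(cars))
```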
tidyverse
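The tidyverse library is a collection of packages that make data manipulation and analysis easier, making it an essential library for data science. Loading it attaches its core packages, which provide powerful functions for practices such as data import, wrangling, and visualization. Each package mentioned above is a component of tidyverse. Of these, ggplot2, tidyr, readr, stringr, and dplyr are core packages attached by library(tidyverse), while readxl is installed with tidyverse but must be loaded separately with library(readxl).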
Run the following code to load the library:
library(tidyverse) # Load the tidyverse library
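As a quick check of what the tidyverse contains, the sketch below lists its member packages with tidyverse_packages() and compares them with the packages from this lesson. The variable names are made up for the example.

```r
library(tidyverse)

# All packages that belong to the tidyverse
pkgs <- tidyverse_packages()

# Packages covered in this lesson
course_pkgs <- c("ggplot2", "tidyr", "readr", "stringr", "dplyr", "readxl")

# TRUE for every lesson package that is part of the tidyverse
in_tidyverse <- course_pkgs %in% pkgs
print(setNames(in_tidyverse, course_pkgs))
```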
Remember that in addition to the libraries listed above, there are many others out there, some of which may offer similar functionalities. For the sake of simplicity, we will only use the listed ones in this course. Feel free to explore the other options available and find the libraries that best suit your needs.