Dealing With Missing Values in R

Learn how to drop, impute, and replace null values in R.

We'll cover the following...

Strategies to deal with null values
- Drop null values
- Imputation of null values
  - Direct imputation
  - Imputation based on clusters
How to choose the imputation method

Drop null values

The first strategy is to drop the null values (represented as NA in R) from the dataset. This is appropriate only when the share of null values is low enough that removing them does not distort the data.

Dropping a row discards every value in it, so if we remove too many rows because of nulls, we also lose the valid values those rows hold in other columns.
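
One way to judge whether the share of nulls is low enough is to measure it before dropping anything. The following is a minimal sketch using the built-in airquality dataset and base R functions only; the variable names are illustrative, and no fixed cutoff is implied, since what counts as "low" depends on the analysis.

```r
# Share of missing values in each column of the built-in airquality dataset
na_share_by_column <- colMeans(is.na(airquality))
print(round(na_share_by_column, 3))

# Share of rows that contain at least one missing value;
# these are the rows that dropping nulls would remove
na_share_of_rows <- mean(!complete.cases(airquality))
print(round(na_share_of_rows, 3))
```

If the row share is large, dropping may cost more valid data than it is worth.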

We can delete rows with nulls using the na.omit() function, which drops every row that contains at least one null value. It takes the dataset as input and returns a copy of the dataset without those rows.

na.omit(<data>) # Syntax structure
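
As a minimal sketch of this behavior, the toy data frame below (its name, columns, and values are made up for illustration) has a null in two of its four rows. Only the two complete rows should survive the call.

```r
# Hypothetical toy data: rows 2 and 3 each contain one NA
scores <- data.frame(
  id    = 1:4,
  score = c(10, NA, 30, 40),
  group = c("a", "b", NA, "d")
)

cleaned <- na.omit(scores)  # Keeps only rows with no NA in any column
print(cleaned)

# na.omit() records the positions of the dropped rows in the "na.action" attribute
print(as.integer(attr(cleaned, "na.action")))
```

The original data frame is left unchanged; na.omit() returns a new object, so the result has to be assigned to keep it.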

The example below applies na.omit() to the built-in airquality dataset and measures how many rows are lost.

# This exercise uses the built-in `airquality` dataset
library(dplyr)  # Provides the %>% pipe operator used at the end
print('------- Preview of the `airquality` dataset: --')
print(head(airquality))  # Show the first six rows
print('------- The number of null values in the `airquality` dataset: --')
nulls <- sum(is.na(airquality))  # Count the NA values across all columns
print(nulls)
print('------- The number of rows in the `airquality` dataset: --')
nrow1 <- nrow(airquality)  # Number of rows before dropping anything
print(nrow1)
print('--- The number of null values after removing the rows with nulls: -----')
airquality1 <- na.omit(airquality)  # Drop every row that contains at least one NA
nulls1 <- sum(is.na(airquality1))  # Should be 0 now
print(nulls1)
nrow2 <- nrow(airquality1)  # Number of rows that remain
print('------ The number of rows after removing the rows with nulls: ------')
print(nrow2)
print('------ The ratio of the removed rows to the size of the original dataset: --------')
print((nrow1 - nrow2) / nrow1)  # Fraction of rows lost (multiply by 100 for a percentage)
print('-------- The number of rows after removing the nulls using the pipeline structure: --------')
airquality2 <- airquality %>% na.omit()  # Same operation written with the pipe
print(nrow(airquality2))  # Matches nrow2
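
Because na.omit() checks every column, it can remove rows whose only null sits in a column the analysis never uses. The sketch below shows one possible alternative in base R, under the assumption that only the Ozone column matters for the task at hand; the variable name ozone_known is illustrative.

```r
# Keep rows where Ozone is present, regardless of NAs in other columns
ozone_known <- airquality[!is.na(airquality$Ozone), ]

# Compare how many rows each approach keeps
print(nrow(ozone_known))           # Rows kept by the column-specific filter
print(nrow(na.omit(airquality)))   # Rows kept when every column is checked
```

The column-specific filter keeps at least as many rows as na.omit(), but the result may still contain nulls in the other columns.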
