Dealing With Missing Values in R
Learn how to drop, impute, and replace null values in R.
Strategies to deal with null values
Missing values can cause errors or lead to unexpected results in our code. They can also make the data harder to understand. In R, missing values are represented as NA; this lesson refers to them as null values. By dealing with null values appropriately, we help ensure that our code runs smoothly and that our results are accurate and meaningful.
There are different methods to deal with null values in data analytics, such as removing them or replacing them with appropriate values.
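As a minimal sketch of the unexpected results mentioned above, the snippet below uses a made-up vector (`temps` is a hypothetical name, not part of any dataset in this lesson). Many base R functions propagate a missing value into their result unless told to ignore it.

```r
# Hypothetical vector with one missing value (NA)
temps <- c(21.5, NA, 19.0, 23.5)

# Summaries such as mean() return NA when any input is missing
mean_with_na <- mean(temps)

# na.rm = TRUE tells mean() to skip the missing values
mean_without_na <- mean(temps, na.rm = TRUE)

# is.na() flags each missing element; sum() counts the flags
n_missing <- sum(is.na(temps))

print(mean_with_na)
print(mean_without_na)
print(n_missing)
```

The first result is itself missing, which is the kind of silent surprise that makes it worth handling null values before analysis.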
Drop null values
The first method is to drop the null values from the dataset. We should prefer this method only when the share of null values is low enough that removing them does not distort the data.
If we remove too many rows because they contain null values, we also discard the valid values stored in the other columns of those rows.
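One way to judge whether dropping is reasonable is to measure how much data would be affected first. The sketch below uses the built-in `airquality` dataset and base R only. The 5% cutoff (`threshold`) is an arbitrary value chosen for illustration, not a rule from this lesson.

```r
# Share of missing values in each column
na_share_by_column <- colMeans(is.na(airquality))

# Share of rows that contain at least one missing value;
# complete.cases() returns TRUE for rows without any NA
na_share_of_rows <- mean(!complete.cases(airquality))

# Hypothetical cutoff, for illustration only
threshold <- 0.05
safe_to_drop <- na_share_of_rows <= threshold

print(round(na_share_by_column, 3))
print(round(na_share_of_rows, 3))
print(safe_to_drop)
```

Looking at the per-column shares as well as the per-row share helps show whether the missing values are concentrated in one or two columns or spread across the data.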
We can delete data with nulls using the na.omit() function. This function drops every row that includes at least one null value. It takes the dataset as input and returns the dataset without the rows that contain null values.
na.omit(<data>) # Syntax structure
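To make the row-wise behavior concrete, here is a small sketch on a made-up data frame (`scores` and its columns are hypothetical). A row is dropped if any of its columns is missing, even when the other columns hold valid values.

```r
# Hypothetical data frame: rows 2 and 3 each have one missing value
scores <- data.frame(
  id    = c(1, 2, 3, 4),
  math  = c(90, NA, 75, 60),
  music = c(85, 70, NA, 95)
)

# na.omit() keeps only the rows with no missing values in any column
cleaned <- na.omit(scores)

print(cleaned)
print(nrow(cleaned))
```

Only the rows with `id` 1 and 4 remain, so the valid `music` value in row 2 and the valid `math` value in row 3 are lost along with the missing ones.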
Let’s look at the example below to understand the concept better.
# We use the `airquality` dataset in this exercise
library(dplyr)
print('------- Preview of the `airquality` dataset: --')
print(head(airquality)) # Preview of the dataset
print('------- The number of null values in the `airquality` dataset: --')
nulls <- sum(is.na(airquality)) # Count the null values in the data
print(nulls) # Print the number of null values
print('------- The number of rows in the `airquality` dataset: --')
nrow1 <- nrow(airquality) # Store the number of rows of the data frame
print(nrow1) # Print the number of rows of the data frame
print('--- The number of null values after removing the rows with nulls: -----')
airquality1 <- na.omit(airquality) # Remove the rows that contain null values
nulls1 <- sum(is.na(airquality1)) # Count the null values that remain
print(nulls1) # Print the number of null values
nrow2 <- nrow(airquality1)
print('------ The number of rows after removing the rows with nulls: ------')
print(nrow2) # Print the number of rows of the data frame after removing nulls
print('------ The ratio of the removed rows to the size of the original dataset: --------')
print((nrow1-nrow2) / nrow1) # Calculate the share of the lost rows
print('-------- The number of rows after removing the nulls using the pipeline structure: --------')
nulls2 <- airquality %>% na.omit() # Apply the na.omit function in a pipeline
print(nrow(nulls2)) # Check the number of rows
Line 6: We find the total number of rows that include null ...