Dealing With Missing Values in R
Learn how to drop, impute, and replace null values in R.
Strategies to deal with null values
Missing values, which R represents as NA and which we refer to as null values here, can cause errors or lead to unexpected results in our code. They can also make the data harder to understand. By dealing with null values appropriately, we make sure that our code runs smoothly and that our data stays accurate and meaningful.
There are different methods to deal with null values in data analytics, such as removing them or replacing them with appropriate values.
Drop null values
The first method is to drop the null values in datasets. We should prefer this method only when the percentage of null values is low enough that removing them does not disrupt the dataset.
If we remove too many rows because they have null values, we also lose the valid values those rows hold in other columns.
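Before dropping anything, it can help to measure how much of the data is actually missing. The following is a minimal sketch on a made-up data frame (the name df and its columns are assumptions for illustration only). It uses base R's is.na(), colMeans(), and complete.cases().

```r
# Hypothetical toy data frame; the column names are made up for illustration.
df <- data.frame(
  age    = c(25, NA, 31, 40, NA),
  income = c(50000, 62000, NA, 58000, 61000),
  city   = c("Oslo", "Lima", "Pune", "Kyiv", "Baku")
)

# Percentage of missing values in each column.
pct_missing_by_col <- colMeans(is.na(df)) * 100

# Percentage of rows that contain at least one missing value.
# complete.cases() returns TRUE for rows without any NA.
pct_rows_with_na <- mean(!complete.cases(df)) * 100

print(pct_missing_by_col)
print(pct_rows_with_na)
```

If the share of affected rows is high, dropping them is probably not the right strategy for that dataset.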
We can delete data with nulls using the na.omit() function. This function drops every row that includes at least one null value. It takes the dataset as input and returns the dataset without those rows.
na.omit(<data>) # Syntax structure
Let’s see the examples below to understand the concept better.
- Line 6: We find the total number of rows that include null values using the is.na() function.
- Line 12: We remove all rows with nulls using the na.omit() function.
- Line 19: We calculate the percentage of rows removed from the original dataset. It is calculated by dividing the number of removed rows by the number ...
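The three steps above can be sketched as follows. This is not the original example, so the line numbers in the list do not apply to it. The scores data frame and all variable names are made up for illustration, and the percentage assumes the denominator is the number of rows in the original dataset.

```r
# Hypothetical toy data frame; names and values are made up for illustration.
scores <- data.frame(
  student = c("A", "B", "C", "D", "E", "F", "G", "H"),
  math    = c(90, NA, 75, 88, 67, 92, NA, 81),
  physics = c(85, 79, 80, 91, 73, 88, 70, 77)
)

# Count the rows that include at least one missing value.
# is.na() marks each missing cell; rowSums() counts them per row.
rows_with_na <- sum(rowSums(is.na(scores)) > 0)

# Remove every row that contains a missing value.
clean_scores <- na.omit(scores)

# Percentage of rows removed, relative to the original row count (assumed).
removed_rows <- nrow(scores) - nrow(clean_scores)
pct_removed  <- removed_rows / nrow(scores) * 100

print(rows_with_na)
print(pct_removed)
```

Comparing the removed percentage against the size of the dataset is one way to judge whether dropping rows is acceptable here.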