Pandas DataFrame Operations - Dealing With Missing and Duplicates
8. Dealing With Missing Values
One key difference between synthetic data and real-world data is that real data is rarely clean and homogeneous. A particular issue we need to tackle when working with real data is missing values. And the problem is not only that values are missing: different data sources may also mark missing values in different ways.
The two flavors in which we are likely to encounter missing or null values are:
- None: A Python object that is often used to represent missing data in Python. Because None is a Python object, a NumPy array that contains it must have the data type ‘object’ (i.e., it becomes an array of Python objects).
- NaN (Not a Number): A special floating-point value that is used to represent missing data. Because NaN is a floating-point value, an array containing it keeps a native numeric type, so, unlike with None’s object array, we can perform mathematical operations on it. However, keep in mind that arithmetic involving NaN always produces NaN, regardless of the operation.
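The short NumPy sketch below illustrates the two flavors: an array holding None falls back to the ‘object’ dtype, while an array holding NaN stays a native float array. It also shows NaN propagating through arithmetic. The variable names are illustrative only.

```python
import numpy as np

# An array holding None must fall back to dtype 'object' (Python objects)
vals_none = np.array([1, None, 3, 4])
print(vals_none.dtype)  # object

# An array holding NaN stays a native floating-point array
vals_nan = np.array([1, np.nan, 3, 4])
print(vals_nan.dtype)  # float64

# NaN propagates: any arithmetic involving NaN yields NaN
print(1 + np.nan)  # nan
print(0 * np.nan)  # nan
```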
The difference shows up as soon as we do arithmetic. Performing an aggregation such as a sum on the array that contains None raises a TypeError at run time, whereas the same operation on the array containing NaN executes without errors (although the result is itself NaN).
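As a minimal sketch of this behavior, the example below sums both arrays. Summing the object array falls back to Python's `+` operator, and `int + None` is undefined, so a TypeError is raised. The float array sums without error but returns NaN. The last line uses NumPy's NaN-aware `np.nansum`, which treats NaN as zero; it is included only as a point of comparison.

```python
import numpy as np

vals_none = np.array([1, None, 3, 4])   # dtype=object
vals_nan = np.array([1, np.nan, 3, 4])  # dtype=float64

# Summing the object array uses Python's +, which fails on int + None
raised = False
try:
    vals_none.sum()
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Summing the float array succeeds, but NaN propagates into the result
total = vals_nan.sum()
print(total)  # nan

# NaN-aware aggregation skips NaN by treating it as zero
print(np.nansum(vals_nan))  # 8.0
```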