Assertion Functions

Discover how to use assertion functions to test data integrity in pandas.

Assertion functions

An important part of ensuring data integrity in analysis and modeling processes is the use of assertions. Assertions allow us to set up checks to confirm that our code behaves as expected. The pandas library provides a module named testing that comes with assertion functions for comparing pandas objects with one another.

This module is useful for unit tests and data quality checks because it lets us catch errors early, before they cause problems down the line. There are numerous assertion functions available, but we’ll focus on the commonly used ones:

Overview of Commonly Used Assertion Functions

Assertion Function                Description
assert_frame_equal()              Checks that left and right DataFrames are equal
assert_series_equal()             Checks that left and right Series objects are equal
assert_index_equal()              Checks that left and right indexes are equal
assert_extension_array_equal()    Checks that left and right ExtensionArrays are equal
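As a minimal sketch of the three functions in the overview other than assert_frame_equal(), the snippet below compares a pair of equal Series, indexes, and ExtensionArrays. The sample values and the variable names (left_arr, right_arr, checks_passed) are illustrative assumptions, not part of the lesson's own examples.

```python
import pandas as pd
from pandas.testing import (
    assert_series_equal,
    assert_index_equal,
    assert_extension_array_equal,
)

# Two Series with the same values, dtype, index, and name
assert_series_equal(
    pd.Series([1, 2, 3], name="x"),
    pd.Series([1, 2, 3], name="x"),
)

# Two indexes with the same labels
assert_index_equal(pd.Index(["a", "b"]), pd.Index(["a", "b"]))

# Two ExtensionArrays (nullable integer arrays, chosen for illustration)
left_arr = pd.array([1, 2, None], dtype="Int64")
right_arr = pd.array([1, 2, None], dtype="Int64")
assert_extension_array_equal(left_arr, right_arr)

# Reached only if none of the checks above raised an AssertionError
checks_passed = True
print("All three checks passed")
```

Each call raises an AssertionError on a mismatch and otherwise completes without raising, so reaching the final print() means every comparison succeeded.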

DataFrame equality

The following example shows how assert_frame_equal() can be used to check the equality of two DataFrames:

# Import pandas and the assertion function
import pandas as pd
from pandas.testing import assert_frame_equal
# Generate pair of DataFrames (equal) to check
df_left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df_right = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# Check equality for df pair
print(' === Assertion Check === ')
assert_frame_equal(df_left, df_right)

The example above shows that when the DataFrames are equal, assert_frame_equal() produces no output of its own: only the header from our print() call appears, and no exception is raised. This lack of further output indicates that the assertion check has passed. On the other hand, if the DataFrames differ in their values, we’ll get an AssertionError, as shown in the example below:

# Import pandas and the assertion function
import pandas as pd
from pandas.testing import assert_frame_equal
# Generate pair of DataFrames (unequal) to check
df_left = pd.DataFrame({'A': [9999, 2], 'B': [3, 4]}) # Change value 1 to 9999
df_right = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# Check equality for df pair
print(' === Assertion Check === ')
assert_frame_equal(df_left, df_right)
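In a data quality check we may prefer to capture the failure rather than let it stop the script. The sketch below wraps the same unequal comparison in a try/except block and stores the message. The variable name error_message is an assumption made for illustration, and the exact wording of the message can vary between pandas versions.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same unequal pair as in the example above
df_left = pd.DataFrame({'A': [9999, 2], 'B': [3, 4]})
df_right = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

try:
    assert_frame_equal(df_left, df_right)
    error_message = None  # No difference found
except AssertionError as err:
    # Keep the report so it can be logged or inspected later
    error_message = str(err)
    print(error_message)
```

Catching the exception this way is one option among several. In a unit test we would usually let the AssertionError propagate so that the test runner reports the failure.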

A useful feature of these assertion checks is that the AssertionError displays clear information on where the inequality ...