Accessors and Operations

Learn the accessors and operations for handling sparse arrays.

Introduction

Having learned how sparse data can be represented as SparseArray objects in pandas, let’s now look at the accessors and operations we can apply to these sparse arrays. We’ll use a sparse dataset of movie ratings, scored from 1 to 5 by different viewers, where NaN means that the viewer hasn’t rated the movie yet:

Movie Ratings By Viewers


           Movie 1   Movie 2   Movie 3   Movie 4   Movie 5   Movie 6
Viewer 1   NaN       3.0       NaN       5.0       3.0       NaN
Viewer 2   NaN       NaN       3.0       NaN       NaN       3.0
Viewer 3   2.0       1.0       1.0       NaN       NaN       1.0
Viewer 4   5.0       NaN       NaN       NaN       NaN       5.0
Viewer 5   NaN       NaN       NaN       2.0       NaN       NaN
Viewer 6   2.0       NaN       NaN       NaN       NaN       NaN
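The code in this lesson refers to the ratings table as df. As a minimal sketch, assuming that variable name and float columns with NaN for missing ratings, the table could be built like this:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Ratings from the table; rows are viewers, columns are movies
ratings = {
    "Movie 1": [nan, nan, 2.0, 5.0, nan, 2.0],
    "Movie 2": [3.0, nan, 1.0, nan, nan, nan],
    "Movie 3": [nan, 3.0, 1.0, nan, nan, nan],
    "Movie 4": [5.0, nan, nan, nan, 2.0, nan],
    "Movie 5": [3.0, nan, nan, nan, nan, nan],
    "Movie 6": [nan, 3.0, 1.0, 5.0, nan, nan],
}
viewers = [f"Viewer {i}" for i in range(1, 7)]
df = pd.DataFrame(ratings, index=viewers)
print(df)
```

Only 13 of the 36 cells hold a rating, which is what makes a sparse representation worthwhile here.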

Accessors

A Series or DataFrame that holds sparse data supports the .sparse accessor for sparse-specific methods and attributes. It’s similar to the other accessors we have seen before, such as .str for string data and .dt for datetime data. First, let’s convert the original DataFrame into a fully sparse representation:

import pandas as pd
# Convert df (the ratings DataFrame) to sparse representation for all columns
df_sparse = df.copy()
for col in df_sparse.columns:
    # The default fill value for float data is NaN
    df_sparse[col] = pd.arrays.SparseArray(df_sparse[col])
# View dtypes
print(df_sparse.dtypes)

We can then use the .sparse accessor to read attributes such as the fill value and the non-fill values of a sparse column, and the density of a DataFrame (i.e., the proportion of values that are not the fill value).

# Get fill value of a DataFrame column
print('Fill value of Movie 1 col:', df_sparse['Movie 1'].sparse.fill_value)
# Get non-fill values of a DataFrame column
print('Non-fill values of Movie 1 col:', df_sparse['Movie 1'].sparse.sp_values)
# Get density of Sparse DataFrame
print('Density:', df_sparse.sparse.density)

In the example above, the fill_value and sp_values attributes are read through the Series.sparse accessor, so they describe the SparseArray behind a single column (i.e., an array with SparseDtype). The density attribute, in contrast, is read through the DataFrame.sparse accessor, so it describes the entire sparse DataFrame. This works because pandas provides the .sparse accessor for DataFrames as well as for Series.
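To make the column-level versus DataFrame-level distinction concrete, here is a small sketch on a hypothetical two-column frame (the names df_small, a, and b are made up for illustration). It reads density and npoints from the Series.sparse accessor and density from the DataFrame.sparse accessor:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column sparse frame; NaN is the (default) fill value
df_small = pd.DataFrame({
    "a": pd.arrays.SparseArray([1.0, np.nan, np.nan, 4.0]),
    "b": pd.arrays.SparseArray([np.nan, np.nan, np.nan, 2.0]),
})

# Column level: 2 stored values out of 4
col_density = df_small["a"].sparse.density
col_npoints = df_small["a"].sparse.npoints

# DataFrame level: 3 stored values out of 8 cells
frame_density = df_small.sparse.density

print(col_density, col_npoints, frame_density)
```

The column density is 0.5, while the DataFrame density is 0.375 because it accounts for both columns.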

The DataFrame.sparse accessor also lets us perform conversions to other formats. For instance, the following code shows how to convert a sparse DataFrame into a sparse SciPy COO (Coordinate Format) matrix:

# Convert df to sparse representation for all columns
df_sparse = df.copy()
# Ensure every sparse array has fill value of 0 in order to convert to COO
for col in df_sparse.columns:
    df_sparse[col] = pd.arrays.SparseArray(df_sparse[col], fill_value=0)
# Convert to SciPy COO matrix
coo_matrix = df_sparse.sparse.to_coo()
print(f'SciPy COO matrix:\n{coo_matrix}\n')
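One detail worth checking when choosing a fill value of 0: NaN is not equal to 0, so NaN entries count as non-fill values and are stored explicitly. The sketch below, on a small hypothetical stand-in for the ratings table (df_demo), compares that with replacing NaN by 0 first. Treating 0 as "not rated" is an assumption that only holds because real ratings start at 1.

```python
import numpy as np
import pandas as pd

# Small stand-in for the ratings table (hypothetical values)
df_demo = pd.DataFrame({
    "Movie 1": [np.nan, 2.0, 5.0],
    "Movie 2": [3.0, np.nan, np.nan],
})

# fill_value=0 on data that still contains NaN: the NaN is a stored value
with_nan = pd.arrays.SparseArray(df_demo["Movie 1"], fill_value=0)
print("Stored values with NaN kept:", with_nan.npoints)

# Assumption: 0 is a safe marker for "not rated" because ratings start at 1
df_zero = df_demo.fillna(0)
for col in df_zero.columns:
    df_zero[col] = pd.arrays.SparseArray(df_zero[col], fill_value=0)

# Requires SciPy to be installed
coo = df_zero.sparse.to_coo()
print("Shape:", coo.shape)
print("Stored entries:", coo.nnz)
```

After the fillna step, the COO matrix stores only the actual ratings rather than every cell.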

The COO representation is a sparse matrix format for efficiently storing ...