
Optimizing Manipulations Using Cython

Explore how to enhance pandas data manipulation by using Cython to compile functions for native speed improvements. Understand when to use apply in string operations, how to optimize with typed Cython code, and leverage vectorized methods and regular expressions for efficient data processing.


The previous example uses apply, a method we’ve been avoiding because it’s slow. Let’s set strings aside for a minute and look at making the apply operation faster with Cython.
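For reference, here is a minimal sketch of the kind of row-wise apply being optimized. The previous example isn’t reproduced in this section, so the DataFrame and its `low`/`high` columns are hypothetical. The `between` function has the same body as `between_cy` below: it draws a random integer between a row’s two values.

```python
import random

import pandas as pd

# Hypothetical data: each row holds an inclusive lower and upper bound.
df = pd.DataFrame({"low": [1, 10, 100], "high": [5, 20, 200]})


def between(row):
    # Unpack the row's two values (low, high) into random.randint,
    # which returns an integer N with low <= N <= high.
    return random.randint(*row.values)


# apply with axis=1 calls the Python function once per row and builds a
# Series for every row; that per-row overhead is what makes it slow.
result = df.apply(between, axis=1)
print(result)
```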

Cython is a superset of Python that compiles to C, which is then built into a native extension module. To enable it in Jupyter, install the Cython package and then load the extension with the following line magic:

```python
%load_ext Cython
```

Then you can write Cython in any cell that starts with the `%%cython` cell magic. As a first step, we’re going to “cythonize” the between function:

```python
%%cython
import random

def between_cy(row):
    # Untyped Cython: compiled, but still operates on Python objects
    return random.randint(*row.values)
```
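To reproduce a timing comparison like the one described next outside of Jupyter’s `%timeit`, a small `timeit`-based helper can time any row function passed through apply. This is a sketch: the `time_apply` helper and the sample data are hypothetical, and it uses a pure-Python `between` so it runs without Cython. In a notebook, you would pass `between_cy` (after running the `%%cython` cell) to compare the two.

```python
import random
import timeit

import pandas as pd


def between(row):
    # Pure-Python version: draw an integer in [low, high].
    return random.randint(*row.values)


def time_apply(func, df, number=3):
    """Return the best per-call time (seconds) of df.apply(func, axis=1)."""
    timer = timeit.Timer(lambda: df.apply(func, axis=1))
    # repeat() runs `number` calls per trial; keep the fastest trial
    # and divide by `number` to get seconds per apply call.
    return min(timer.repeat(repeat=3, number=number)) / number


# Hypothetical benchmark data: 2,000 rows of (low, high) bounds.
df = pd.DataFrame({"low": range(2_000), "high": range(10, 2_010)})

baseline = time_apply(between, df)
print(f"apply(between): {baseline * 1000:.1f} ms per call")
# In Jupyter, after compiling the %%cython cell, compare with:
# time_apply(between_cy, df)
```

Taking the minimum of several trials is a common way to reduce noise from other processes.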

When we benchmark this, it’s no faster than our current code, because the untyped Cython function still works with Python objects. Adding C types to the Cython code can give us a speed increase. We’ll try that here:

```python
%%cython
import random

cpdef int between_cy3(int x, int y):
    # Typed signature: x and y arrive as C ints, and the result is a C int
    return random.randint(x, y)
```
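Because `between_cy3` takes two ints rather than a row, it can’t be handed to `apply(axis=1)` as-is. One way to call a scalar function like this is to iterate over the two columns together. The sketch below uses a pure-Python stand-in, `between_py3` (a hypothetical name), so it runs without Cython, and the data is hypothetical. For comparison, it also shows a swapped-in, fully vectorized alternative using NumPy’s `Generator.integers`, which is not the approach this section takes; `endpoint=True` makes the upper bound inclusive, matching `random.randint`.

```python
import random

import numpy as np
import pandas as pd


def between_py3(x: int, y: int) -> int:
    # Pure-Python stand-in for the typed between_cy3: it takes two scalars
    # instead of a whole row, so no per-row Series is built.
    return random.randint(x, y)


# Hypothetical data with inclusive (low, high) bounds per row.
df = pd.DataFrame({"low": [1, 10, 100], "high": [5, 20, 200]})

# Option 1: loop over both columns together and call the scalar function.
# Swap in between_cy3 here once the %%cython cell has been compiled.
looped = pd.Series(
    [between_py3(x, y) for x, y in zip(df["low"], df["high"])],
    index=df.index,
)

# Option 2 (vectorized alternative): one NumPy call draws every value at once.
rng = np.random.default_rng(seed=42)
vectorized = pd.Series(
    rng.integers(df["low"].to_numpy(), df["high"].to_numpy(), endpoint=True),
    index=df.index,
)
print(looped.tolist(), vectorized.tolist())
```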

Because we’re ...