Speed up Column Operations

In this lesson, you will learn which method is fastest for column operations.

In data processing, it is very common to apply an operation to an entire column. pandas provides several ways to do this, listed below, but their performance varies greatly. The gap becomes more pronounced as the amount of data grows, and it can have a real impact on productivity. In this lesson, we compare the running time of these methods on the same task.

  • Iteration with iloc.
  • Iteration with .iterrows().
  • The apply() function.
  • Vectorization, as in NumPy.

The task is simple: calculate the square root of every value in column a.
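A minimal sketch of the four approaches, assuming a DataFrame whose column a holds 1,000 non-negative random floats. The data and the function names (sqrt_iloc, sqrt_iterrows, sqrt_apply, sqrt_vectorized) are hypothetical and are not taken from the lesson's own code. Each function returns a Series, so the results can be checked against one another.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical sample data: 1,000 non-negative random floats in column "a".
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(1_000)})


def sqrt_iloc(df):
    # Look up each row by position with .iloc (building a row Series), then select "a".
    out = [math.sqrt(df.iloc[i]["a"]) for i in range(len(df))]
    return pd.Series(out, index=df.index, name="a")


def sqrt_iterrows(df):
    # iterrows() yields (index, row) pairs and builds a Series for every row.
    out = [math.sqrt(row["a"]) for _, row in df.iterrows()]
    return pd.Series(out, index=df.index, name="a")


def sqrt_apply(df):
    # Series.apply calls math.sqrt once per element.
    return df["a"].apply(math.sqrt)


def sqrt_vectorized(df):
    # np.sqrt processes the whole column in a single call to compiled code.
    return np.sqrt(df["a"])


results = [f(df) for f in (sqrt_iloc, sqrt_iterrows, sqrt_apply, sqrt_vectorized)]
for other in results[1:]:
    # All four approaches should produce the same Series.
    pd.testing.assert_series_equal(results[0], other)
```

math.sqrt is passed to apply() rather than np.sqrt because, for NumPy ufuncs, recent pandas versions may call the function on the whole Series at once, which would quietly turn the apply() case into a vectorized one.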

Here are the results on my PC; they vary greatly across environments. Vectorization is the fastest method and iterrows() is the slowest: vectorization is about 9,000 times faster than iterrows(). We therefore strongly recommend that you use vectorization whenever possible.
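Vectorized pandas code does not have to call NumPy directly. In the small sketch below (the four-value column is made up for illustration), each expression computes the square root of column a as one whole-column operation; the last one works on the underlying NumPy array and therefore drops the pandas index.

```python
import numpy as np
import pandas as pd

# Made-up example column of perfect squares.
df = pd.DataFrame({"a": [0.0, 1.0, 4.0, 9.0]})

# Each expression handles the whole column at once instead of row by row.
via_ufunc = np.sqrt(df["a"])             # NumPy ufunc on a Series, keeps the index
via_operator = df["a"] ** 0.5            # pandas arithmetic operator
via_method = df["a"].pow(0.5)            # equivalent Series.pow method
via_array = np.sqrt(df["a"].to_numpy())  # plain ndarray, no index

print(via_ufunc.tolist())  # [0.0, 1.0, 2.0, 3.0]
```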

Running time of iterrows is 8.340519215s
Running time of iloc is 0.8993627980s
Running time of apply() is 0.032458498000s
Running time of Vectorize is 0.0009303759999s
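A minimal timing harness that prints results in the same format as the output above might look like the following. It is a sketch under stated assumptions: the 10,000-row random column, timing with time.perf_counter, keeping the best of several runs, and the helper names time_once and benchmark are illustrative choices, not the code that produced the numbers above.

```python
import math
import time

import numpy as np
import pandas as pd


def time_once(func, df):
    # Elapsed wall-clock seconds for one call, via a high-resolution monotonic clock.
    start = time.perf_counter()
    func(df)
    return time.perf_counter() - start


def benchmark(df, repeat=1):
    # Each entry computes sqrt of column "a" with one of the four approaches.
    # The row-by-row versions return plain lists, which is enough for timing.
    methods = {
        "iterrows": lambda d: [math.sqrt(row["a"]) for _, row in d.iterrows()],
        "iloc": lambda d: [math.sqrt(d.iloc[i]["a"]) for i in range(len(d))],
        "apply()": lambda d: d["a"].apply(math.sqrt),
        "Vectorize": lambda d: np.sqrt(d["a"]),
    }
    # Keep the fastest of `repeat` runs to reduce noise from other processes.
    return {
        name: min(time_once(func, df) for _ in range(repeat))
        for name, func in methods.items()
    }


if __name__ == "__main__":
    # Assumed data size: 10,000 rows; raise it to widen the gaps between methods.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"a": rng.random(10_000)})
    for name, seconds in benchmark(df, repeat=3).items():
        print(f"Running time of {name} is {seconds:.6f}s")
```

Taking the minimum of repeated runs, rather than the mean, is a common way to filter out interference from other processes; absolute timings will still differ from machine to machine.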
