Speed up Column Operations
In this lesson, you will learn which method is best for column operations.
In data processing, it is very common to apply an operation to every value in certain columns. pandas provides many ways to do this, and some of the options are listed below. However, their performance varies greatly, and with large amounts of data the gap becomes more pronounced and has a greater impact on productivity. In this lesson, we compare the running time of different methods on the same task.
- Iteration with iloc.
- Iteration with iterrows().
- The apply() function.
- Vectorization, as in NumPy.
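To make "vectorization, as in NumPy" concrete, here is a minimal sketch that computes square roots two ways on the same small array: a Python loop calling math.sqrt once per element, and a single np.sqrt call over the whole array. The array values are illustrative only.

```python
import math

import numpy as np

# Illustrative input values.
values = np.array([1.0, 4.0, 9.0, 16.0])

# Loop version: the Python interpreter calls math.sqrt once per element.
looped = np.array([math.sqrt(v) for v in values])

# Vectorized version: one NumPy call processes the whole array,
# and the per-element loop runs in compiled code.
vectorized = np.sqrt(values)

print(looped)      # [1. 2. 3. 4.]
print(vectorized)  # [1. 2. 3. 4.]
```

Both produce the same result; the difference is where the loop runs, which is what makes the vectorized form faster on large inputs.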
The task is simple: calculate the square root (sqrt) of every value in column a.
Here are the results on my PC; they vary greatly across environments. As you can see, Vectorize is the fastest method and iterrows() is the slowest: Vectorize is about 9000 times faster than iterrows(). We therefore strongly recommend using vectorized operations whenever possible.
Running time of iterrows is 8.340519215s
Running time of iloc is 0.8993627980s
Running time of apply() is 0.032458498000s
Running time of Vectorize is 0.0009303759999s
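The "about 9000 times" figure can be checked from these timings: dividing each running time by the vectorized one gives its slowdown factor. This small calculation uses only the numbers reported in this lesson.

```python
# Timings reported above, in seconds; they come from one specific machine.
timings = {
    "iterrows": 8.340519215,
    "iloc": 0.8993627980,
    "apply()": 0.032458498000,
    "Vectorize": 0.0009303759999,
}

baseline = timings["Vectorize"]
# Slowdown factor of each method relative to the vectorized version.
ratios = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:,.0f}x the vectorized time")
```

On these numbers, iterrows() comes out just under 9000 times slower, iloc just under 1000 times, and apply() roughly 35 times.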
You can run the comparison yourself; it may take a few seconds to finish.
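A minimal sketch of such a benchmark follows. It is not the lesson's original code: the 10,000-row DataFrame of random values, the random seed, and the helper names (make_frame, sqrt_iloc, run_benchmark, and so on) are assumptions for illustration, so absolute timings will differ from those above. The apply() variant uses Series.apply with math.sqrt so that the function really is called once per element.

```python
import math
import time

import numpy as np
import pandas as pd


def make_frame(n_rows=10_000, seed=0):
    # Assumed test data: n_rows random non-negative floats in column "a".
    rng = np.random.default_rng(seed)
    return pd.DataFrame({"a": rng.random(n_rows) * 100})


def sqrt_iterrows(df):
    # iterrows() yields (index, Series) pairs; building each row Series is slow.
    return pd.Series([math.sqrt(row["a"]) for _, row in df.iterrows()], index=df.index)


def sqrt_iloc(df):
    # Positional lookup of a single cell on every iteration.
    col = df.columns.get_loc("a")
    return pd.Series([math.sqrt(df.iloc[i, col]) for i in range(len(df))], index=df.index)


def sqrt_apply(df):
    # Series.apply calls the Python function once per element.
    return df["a"].apply(math.sqrt)


def sqrt_vectorized(df):
    # A single NumPy ufunc call over the whole column.
    return np.sqrt(df["a"])


def run_benchmark(df):
    methods = {
        "iterrows": sqrt_iterrows,
        "iloc": sqrt_iloc,
        "apply()": sqrt_apply,
        "Vectorize": sqrt_vectorized,
    }
    expected = np.sqrt(df["a"].to_numpy())
    results = {}
    for name, func in methods.items():
        start = time.perf_counter()
        out = func(df)
        results[name] = time.perf_counter() - start
        # Sanity check: every method must produce the same values.
        if not np.allclose(out.to_numpy(), expected):
            raise ValueError(f"{name} produced different results")
    return results


if __name__ == "__main__":
    frame = make_frame()
    for name, seconds in run_benchmark(frame).items():
        print(f"Running time of {name} is {seconds:.6f}s")
```

Increasing n_rows in make_frame makes the gap between the methods grow, at the cost of a longer run for iterrows() and iloc.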