Efficiency Boosts
Learn about various strategies and techniques to improve pandas efficiency.
Introduction
In today’s world of ever-expanding datasets and increasingly complex data transformations, efficiency is paramount. While pandas is a handy tool, its versatility can sometimes lead to inefficiencies that slow down computations, particularly when working with large datasets.
In this lesson, we’ll turn our focus toward techniques that can significantly enhance the performance of pandas operations. While there are numerous techniques available, we’ll focus on two essential ones:
Optimize row and column operations
Speed up filtering and querying
We’ll work with an online retail transaction dataset, as shown below:
Online Retail Transactions Data
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 1/12/2010 8:26 | 2.55 | 17850 | United Kingdom |
536365 | 71053 | WHITE METAL LANTERN | 6 | 1/12/2010 8:26 | 3.39 | 17850 | United Kingdom |
536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 1/12/2010 8:26 | 2.75 | 17850 | United Kingdom |
536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 1/12/2010 8:26 | 3.39 | 17850 | United Kingdom |
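To follow along without the full file, the sketch below rebuilds only the four sample rows shown above as a DataFrame. The lesson doesn't name the dataset file or show how it is loaded, so this inline construction is an assumption for illustration. It also assumes the dates are day-first, so that 1/12/2010 means 1 December 2010.

```python
import pandas as pd

# Rebuild the four sample rows from the table inline
# (assumption: the real file name/loading code isn't given in the lesson)
transactions = pd.DataFrame(
    {
        "InvoiceNo": ["536365"] * 4,  # identifiers kept as strings
        "StockCode": ["85123A", "71053", "84406B", "84029G"],
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "CREAM CUPID HEARTS COAT HANGER",
            "KNITTED UNION FLAG HOT WATER BOTTLE",
        ],
        "Quantity": [6, 6, 8, 6],
        # Assumption: dates are day/month/year, so this parses as 1 Dec 2010
        "InvoiceDate": pd.to_datetime(["1/12/2010 8:26"] * 4, format="%d/%m/%Y %H:%M"),
        "UnitPrice": [2.55, 3.39, 2.75, 3.39],
        "CustomerID": [17850] * 4,
        "Country": ["United Kingdom"] * 4,
    }
)

# Inspect the column types pandas assigned
print(transactions.dtypes)
```

Parsing `InvoiceDate` with an explicit format up front avoids ambiguous day/month guesses and gives a proper datetime column for later filtering.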
Optimize row and column operations
A common way to perform row and column operations is with iterator functions, which walk through a DataFrame one row at a time. They are convenient when each row needs custom logic. However, they don't reduce memory use, because the DataFrame is already fully loaded. They also run a Python-level loop, which is generally much slower than a vectorized column operation on large datasets. The two iterator functions that are specifically for row operations are:
iterrows(): Iterates over the rows ...
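As a rough illustration of the trade-off, the sketch below computes a per-line total (Quantity × UnitPrice) for the sample rows in three ways. It uses `iterrows()`, pandas' `itertuples()`, and a vectorized column expression. The `LineTotal` column name is hypothetical and chosen only for this example. All three produce the same numbers, but they differ in how much work happens in Python for each row.

```python
import pandas as pd

# Quantity and UnitPrice taken from the four sample rows above
df = pd.DataFrame(
    {
        "StockCode": ["85123A", "71053", "84406B", "84029G"],
        "Quantity": [6, 6, 8, 6],
        "UnitPrice": [2.55, 3.39, 2.75, 3.39],
    }
)

# 1) iterrows(): yields (index, Series) pairs. Each row is boxed into a new
#    Series, so dtypes are not preserved across columns, and it is slow.
totals_iterrows = [row["Quantity"] * row["UnitPrice"] for _, row in df.iterrows()]

# 2) itertuples(): yields lightweight namedtuples, preserves dtypes,
#    and is generally faster than iterrows()
totals_itertuples = [
    row.Quantity * row.UnitPrice for row in df.itertuples(index=False)
]

# 3) Vectorized: a single column-wise multiplication with no Python loop
#    ("LineTotal" is a hypothetical column name for this example)
df["LineTotal"] = df["Quantity"] * df["UnitPrice"]

print(df)
```

On four rows the difference is negligible. On a large transactions table, the vectorized expression is typically much faster than either loop, which is why row iteration is usually a last resort.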