What is the DataFrame.lazy() method in Polars?
The DataFrame.lazy() method
The DataFrame.lazy() method in Polars is used to initiate a lazy computation on a DataFrame. This means that the operations applied to the DataFrame will not be executed immediately but will be stored as a query plan and run only when a result is requested.
Syntax
Let’s see the syntax of the lazy() method:
DataFrame.lazy()
Return value
The DataFrame.lazy() method returns a LazyFrame object. The LazyFrame object is similar to a DataFrame object, but it’s lazily evaluated.
Code
We create a sample DataFrame with four columns (a, b, c, and d) and apply the lazy() method to it below:
import polars as pl

df = pl.DataFrame(
    {
        "a": [50, 100, 35, 87],
        "b": [9.2, 5.4, 2.5, 13.4],
        "c": [True, True, False, True],
        "d": [23, 65, 83, 91],
    }
)

lazy_frame = df.lazy()
print(lazy_frame)

# Another example of the lazy() method with filter
lazy_frame2 = df.lazy().filter(pl.col("a") == 100)
print(lazy_frame2)
Explanation
Here’s a step-by-step explanation of the provided code:
Lines 3–10: We create a DataFrame named df using the pl.DataFrame() constructor. The DataFrame has four columns (a, b, c, and d) with some data.
Line 12: We apply the lazy() method to the DataFrame df, creating a LazyFrame named lazy_frame. This LazyFrame represents a computation query or graph of deferred operations.
Line 13: We print the representation of the LazyFrame.
Lines 16–17: We apply the filter() method on the LazyFrame returned by the df.lazy() method.
Note: Check out the Answer on the filter() function for more information.
Note that printing a LazyFrame directly displays its query plan, not its data. To view the actual content, we need to execute the query. Some LazyFrame operations are given below:
Operations on LazyFrame
Once a LazyFrame is created, we can apply a range of operations to it. It’s important to note that the queued operations are not executed until we explicitly call a method that triggers them. Here are some of the methods that can be used to run or inspect a query:
fetch(): This executes the lazy operations on a small number of rows.
collect(): This executes the lazy operations on all the data.
describe_plan(): This prints the unoptimized query plan.
describe_optimized_plan(): This prints the optimized query plan.
show_graph(): This displays the (un)optimized query plan as a Graphviz graph.
Now, let’s take a look at the fetch() operation:
import polars as pl

df = pl.DataFrame(
    {
        "a": [50, 100, 35, 87],
        "b": [9.2, 5.4, 2.5, 13.4],
        "c": [True, True, False, True],
        "d": [23, 65, 83, 91],
    }
)

lazy_frame = df.lazy()
print(lazy_frame.fetch(2))

lazy_frame2 = df.lazy().filter(pl.col("a") == 100)
print(lazy_frame2.collect())
The fetch() method triggers the execution of the operations and displays a DataFrame containing the first two rows of the original DataFrame. On the other hand, the collect() method executes the query on the data and returns the result as a DataFrame object.
Conclusion
Using DataFrame.lazy() is a powerful feature in Polars that enables lazy evaluation of operations on DataFrames. This allows for deferred execution of computations, providing opportunities for optimization and parallelization, which can be crucial when dealing with large datasets. When working with complex queries, utilizing lazy operations can lead to more efficient and faster data processing.