What is DataFrame.filter in Polars?

Polars is a fast, efficient data manipulation library written in Rust. It is designed for high-performance operations on large datasets and is often faster than the pandas library.

Note: Learn more about the differences between the Polars and pandas libraries.

The filter() function

The DataFrame.filter() function applies filtering conditions to a DataFrame and returns the rows that satisfy them. It’s particularly useful when dealing with large datasets where we need to narrow the data down to the rows that meet our requirements.

Syntax

Here is the syntax of the DataFrame.filter() function:

DataFrame.filter(condition)
  • condition: An expression that evaluates to a Boolean value for each row. It can be built from comparisons, logical operators, arithmetic operations, and column references, and it determines which rows are included in the filtered data.

Return value

The function returns a new DataFrame that contains only the rows satisfying the given condition. The original DataFrame is not modified.

Code

Here is an example that uses the DataFrame.filter() method to filter a DataFrame in Polars:

import polars as pl
df = pl.DataFrame(
    {
        "Students": ["Joseph", "Danial", "Ema", "John"],
        "Calculus": [98, 85, 92, 67],
        "Data structures": [91, 89, 92, 55],
        "Operating system": [96, 88, 91, 62],
    }
)
# Filtering rows based on a single condition
print(df.filter(pl.col("Students") == "Danial"))

# Filtering rows based on conditions combined with AND (&)
print(df.filter((pl.col("Calculus") > 90) & (pl.col("Data structures") > 90) & (pl.col("Operating system") > 90)))

# Filtering rows based on conditions combined with OR (|)
print(df.filter((pl.col("Calculus") < 60) | (pl.col("Data structures") < 60) | (pl.col("Operating system") < 60)))

Code explanation

  • Line 1: We import the polars library as pl.

  • Lines 2–9: We define a DataFrame named df that holds the students’ scores in the calculus, data structures, and operating system courses.

  • Line 11: We use the df.filter() function to print only the score report of the student named Danial.

  • Line 14: We combine multiple conditions using the & operator to find the students who scored more than 90 in all courses.

  • Line 17: We combine multiple conditions using the | operator to find the students who scored less than 60 in any of the courses.

Copyright ©2024 Educative, Inc. All rights reserved