What is the Polars library in Python?

Polars is a fast DataFrame library implemented in Rust, with bindings for Python. It provides high-performance data manipulation and analysis capabilities similar to those found in libraries such as Pandas and Apache Spark. Polars is designed to handle large datasets efficiently while offering a familiar API for everyday data manipulation tasks. This explanation covers the key features and provides code examples to illustrate its usage.

Importing the Polars library

After installing it (for example, with pip install polars), you can import the Polars library in your Python script or notebook:

import polars as pl

Creating a DataFrame

The central data structure in Polars is the DataFrame, which represents a two-dimensional table with labeled columns. You can create a DataFrame from various data sources, including Python lists, NumPy arrays, or CSV files.

Here's an example of creating a DataFrame from a Python dictionary:

data = {
    "column1": [1, 2, 3],
    "column2": ["foo", "bar", "baz"],
}
df = pl.DataFrame(data)
print(df)

Basic operations

Polars supports a wide range of operations for data manipulation and analysis. Let's explore some of the commonly used operations.

Selecting columns

You can select specific columns from a DataFrame using the select method:

df_selected = df.select(["column1", "column2"])
print(df_selected)

Filtering rows

To filter rows based on certain conditions, you can use the filter method:

df_filtered = df.filter(pl.col("column1") > 1)
print(df_filtered)

Grouping and aggregating

Polars enables you to group a DataFrame by one or more columns and perform aggregations on each group:

df_grouped = df.group_by("column2").agg(pl.col("column1").sum())
print(df_grouped)

Sorting

Sorting can be done using the sort method:

df_sorted = df.sort("column1")
print(df_sorted)

Joining DataFrames

Polars supports various join types, such as inner, left, and full outer joins, selected with the how parameter:

df1 = pl.DataFrame({
    "key": ["Alpha", "Beta", "Gamma"],
    "value": [10, 20, 30],
})
df_new = pl.DataFrame({
    "key": ["Beta", "Gamma", "Delta"],
    "value": [40, 50, 60],
})
df_join = df1.join(df_new, on="key", how="inner")
print(df_join)

Performing arithmetic operations

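Polars allows you to perform arithmetic on columns using expressions. It does not support pandas-style assignment such as df["column3"] = ...; instead, you add or replace columns with with_columns, which returns a new DataFrame:

df = df.with_columns((pl.col("column1") * 2).alias("column3"))
print(df)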

Reading and writing data

You can read data from various file formats, including CSV, Parquet, and Arrow IPC (Feather), using the read_csv, read_parquet, and read_ipc functions. Similarly, you can write a DataFrame in these formats using its write_csv, write_parquet, and write_ipc methods.

Copyright ©2024 Educative, Inc. All rights reserved