Polars is a fast DataFrame library implemented in Rust with bindings for Python. It provides high-performance data manipulation and analysis capabilities similar to those found in libraries like pandas and Apache Spark. Polars aims to handle large datasets efficiently while providing a familiar API for data manipulation tasks. This explanation covers the key features and provides code examples to illustrate its usage.
You can import the Polars library in your Python script or notebook:
import polars as pl
The central data structure in Polars is the DataFrame, which represents a two-dimensional table with labeled columns. You can create a DataFrame from various data sources, including Python lists, NumPy arrays, or CSV files.
Here's an example of creating a DataFrame from a Python dictionary:
data = {"column1": [1, 2, 3], "column2": ["foo", "bar", "baz"]}
df = pl.DataFrame(data)
print(df)
Polars supports a wide range of operations for data manipulation and analysis. Let's explore some of the commonly used operations.
You can select specific columns from a DataFrame using the select method:
# select returns a new DataFrame containing only the named columns
df_selected = df.select(["column1", "column2"])
print(df_selected)
To filter rows based on certain conditions, you can use the filter method:
df_filtered = df.filter(pl.col("column1") > 1)
print(df_filtered)
Polars enables you to group your DataFrame based on one or more columns and perform aggregations:
# Aggregations are written as expressions; older Polars releases spell group_by as groupby
df_grouped = df.group_by("column2").agg(pl.col("column1").sum())
print(df_grouped)
Sorting can be done using the sort method:
df_sorted = df.sort("column1")
print(df_sorted)
Polars supports various join operations, such as inner, left, and outer (full) joins:
df1 = pl.DataFrame({"key": ["Alpha", "Beta", "Gamma"], "value": [10, 20, 30]})
df_new = pl.DataFrame({"key": ["Beta", "Gamma", "Delta"], "value": [40, 50, 60]})
# An inner join keeps only the keys present in both frames ("Beta" and "Gamma")
df_join = df1.join(df_new, on="key", how="inner")
print(df_join)
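Polars allows you to perform arithmetic operations on columns. New columns are added with the with_columns method rather than by item assignment, and because column2 holds strings, this example derives the new column from column1 alone:
# Double column1 and store the result as column3
df = df.with_columns((pl.col("column1") * 2).alias("column3"))
print(df)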
You can read data from various file formats, including CSV, Parquet, and Arrow IPC, using the read_csv, read_parquet, and read_ipc functions. Similarly, you can write a DataFrame in these formats using its write_csv, write_parquet, and write_ipc methods.