How to create an object in Koalas

Koalas is a Python package for data science and big data work. It implements the pandas DataFrame API on top of Apache Spark, which makes life easier for data scientists who regularly work with large datasets.

pandas itself is widely used in data science. The key difference between pandas and Spark is that pandas is a single-node DataFrame implementation, whereas Spark is a distributed engine that is widely used for big data processing.

The Koalas package lets users with pandas experience start working with Spark right away. Additionally, it provides a single codebase that works with both Spark and pandas.
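To make the "single codebase" idea concrete, here is a minimal sketch. The function name total_per_group and the column names group and value are hypothetical and chosen only for illustration. The function uses only pandas-style calls, and it is run here on a plain pandas DataFrame so that the snippet works without Spark:

```python
import pandas as pd


def total_per_group(df):
    """Sum the 'value' column per 'group' using only pandas-style API calls."""
    return df.groupby("group")["value"].sum().sort_index()


# Small illustrative dataset (hypothetical column names).
pdf = pd.DataFrame({"group": ["a", "b", "a"], "value": [1, 2, 3]})

# Run the function on a plain pandas DataFrame.
pandas_totals = total_per_group(pdf)
print(pandas_totals.to_dict())
```

Assuming Koalas and a working Spark installation are available, passing ks.from_pandas(pdf) to the same function should return a Koalas series with the same totals, since groupby, sum, and sort_index are part of the pandas API that Koalas implements.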

Object creation in Koalas

There are several ways to create a Koalas object. Let’s explore them below.

Using a series

Pass a list of values to create a Koalas series, as shown below:

# import the relevant libraries
import pandas as pd
import numpy as np
import databricks.koalas as ks
from pyspark.sql import SparkSession

series = ks.Series([1,2,3,4,5,6,7,8])

Using dictionaries

Create a DataFrame from a dictionary that maps column names to lists of values:

koalas_df = ks.DataFrame(
    {'unit': [1, 2, 3, 4, 5, 6],
     'hundred': [100, 200, 300, 400, 500, 600],
     'english': ["one", "two", "three", "four", "five", "six"]})

This creates a Koalas DataFrame named koalas_df.

Converting pandas DataFrame to Koalas DataFrame

We can convert a pandas DataFrame to a Koalas DataFrame as shown:

df = pd.DataFrame(
    {'unit': [1, 2, 3, 4, 5, 6],
     'hundred': [100, 200, 300, 400, 500, 600],
     'english': ["one", "two", "three", "four", "five", "six"]})
# convert the pandas DataFrame to a Koalas DataFrame
koalas_df = ks.from_pandas(df)
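The conversion also works in the other direction through the to_pandas() method of a Koalas DataFrame. The sketch below wraps both steps in a hypothetical helper, round_trip, which is not part of Koalas. As an assumption made only so that the snippet runs on machines without Spark, the helper returns its input unchanged when Koalas cannot be imported:

```python
import pandas as pd

try:
    # Koalas needs a working Spark installation, so treat it as optional here.
    import databricks.koalas as ks
except ImportError:
    ks = None


def round_trip(pandas_df):
    """Convert a pandas DataFrame to Koalas and back to pandas.

    Hypothetical helper: returns the input unchanged if Koalas is missing.
    """
    if ks is None:
        return pandas_df
    koalas_df = ks.from_pandas(pandas_df)  # pandas -> Koalas
    return koalas_df.to_pandas()           # Koalas -> pandas


pdf = pd.DataFrame({"unit": [1, 2, 3], "english": ["one", "two", "three"]})
restored = round_trip(pdf)
print(list(restored.columns))
```

Note that to_pandas() collects all of the data onto a single machine, so it is best suited to results that are small enough to fit in memory.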
