PySpark

In this lesson, we will set up a Docker job for running PySpark in the code widget, the single-page application (SPA), and the live application.

from pyspark.sql import SparkSession
from dotenv import load_dotenv

def create_spark_session():
    """Create a Spark session."""
    # Load environment variables from a .env file, if one is present
    _ = load_dotenv()
    return (
        SparkSession
        .builder
        .appName("helloworld")
        .master("local[5]")  # run Spark locally with 5 worker threads
        .getOrCreate()
    )

spark = create_spark_session()
print('Session Started')
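The load_dotenv() call loads environment variables from a .env file into the process environment. If your Spark setup depends on such variables, you could read them with os.getenv. The following is only a sketch; the SPARK_APP_NAME variable is illustrative and not part of the lesson's setup.

import os
from dotenv import load_dotenv
from pyspark.sql import SparkSession

# Load key=value pairs from a .env file into the environment (if the file exists)
load_dotenv()

# SPARK_APP_NAME is a hypothetical variable used only for illustration
app_name = os.getenv("SPARK_APP_NAME", "helloworld")

spark = (
    SparkSession
    .builder
    .appName(app_name)    # application name, possibly taken from the .env file
    .master("local[5]")   # run Spark locally with 5 worker threads
    .getOrCreate()
)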

Docker job for Code Widget

Let’s see what each field in the above job means:

Select Docker job type

This is where we select the kind of Docker job we are creating.

Default

Job name

This is a reference name for the job. You can use any descriptive name.

PySparkCodeWidget

Input file name

The name of the input file that the run script will execute in the widget. Since our run script calls python3 main.py, we name the file main.py.

main.py

Run script

This is the script that runs when we execute the code in the widget. This field is mandatory. A minimal sketch of the file it executes is shown after these fields.

python3 main.py
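To see how the input file name and run script fit together: the code in the widget runs as main.py via the python3 main.py run script. Below is a minimal sketch of such a file, assuming only that the image has PySpark installed; it simply confirms the installation and is not part of the lesson's code.

from pyspark.sql import SparkSession

# A minimal main.py that the run script (python3 main.py) could execute
spark = (
    SparkSession
    .builder
    .appName("helloworld")
    .master("local[5]")
    .getOrCreate()
)

print(spark.version)  # print the Spark version to confirm PySpark is available

spark.stop()          # shut down the local Spark session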

Select Docker job

After creating the Docker job for the code widget, select it as shown below.

Let’s run the PySpark code in the code widget

from pyspark.sql import SparkSession
from dotenv import load_dotenv

def create_spark_session():
    """Create a Spark session."""
    _ = load_dotenv()
    return (
        SparkSession
        .builder
        .appName("helloworld")
        .master("local[5]")
        .getOrCreate()
    )

spark = create_spark_session()
print('Session Started')
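If the session starts successfully, a quick sanity check is to build a tiny DataFrame and display it. The rows and column names below are illustrative only, a sketch rather than part of the lesson's code.

# Assumes `spark` was created by create_spark_session() above
df = spark.createDataFrame(
    [("Alice", 1), ("Bob", 2)],  # two in-memory rows, purely illustrative
    ["name", "id"],              # column names for the DataFrame
)
df.show()     # prints the rows as a small formatted table

spark.stop()  # release local Spark resources when finished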