Spark Clusters

Distributing workloads in Spark clusters.

Spark environment

A Spark environment is a cluster of machines with a single driver node and zero or more worker nodes. The driver machine is the master node in the cluster and is responsible for coordinating the workloads performed across the cluster.
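
As a minimal sketch, a PySpark application connects to such a cluster by creating a SparkSession on the driver. The cluster manager URL below (spark://cluster-host:7077) is a hypothetical placeholder, not an address from this course.

```python
# Minimal sketch: start a Spark application, assuming PySpark is installed
# and a standalone cluster manager is reachable at the hypothetical URL below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cluster-demo")              # application name shown in the Spark UI
    .master("spark://cluster-host:7077")  # hypothetical cluster manager URL
    .getOrCreate()
)
```

The SparkSession object itself lives on the driver node, which then schedules tasks on the worker nodes.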

Driver and worker nodes

In general, workloads are distributed across the worker nodes when performing operations on Spark DataFrames. However, when working with plain Python objects, such as lists or dictionaries, those objects are instantiated on the driver node, as the sketch below illustrates. ...
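
The following sketch contrasts the two cases. It reuses the `spark` session from the earlier snippet; the data and column names are illustrative only.

```python
# This list lives entirely in the driver's memory as a plain Python object.
scores = [("alice", 82), ("bob", 91), ("carol", 77)]

# Converting it to a Spark DataFrame distributes the rows across the workers,
# so transformations such as filter() execute on the worker nodes.
df = spark.createDataFrame(scores, ["name", "score"])
high = df.filter(df.score > 80)

# collect() brings the results back to the driver as ordinary Python objects.
print(high.collect())
```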
