Architecture
Get insights into the architecture of Spark.
Spark design
Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Spark has the same leader-worker architecture as MapReduce: the leader process coordinates and distributes work among the worker processes. These two kinds of processes are formally called the driver and the executor.
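As a minimal sketch of where these two processes come from, the hypothetical PySpark snippet below (assuming PySpark is installed; the application name, master URL, and memory value are illustrative) starts a SparkSession, which launches the driver, while the chosen master or cluster manager supplies the executors.

```python
from pyspark.sql import SparkSession

# Starting a SparkSession launches the driver process; the cluster manager
# (or local mode, as here) then provides the executor processes.
spark = (
    SparkSession.builder
    .appName("architecture-demo")           # illustrative name
    .master("local[4]")                      # driver plus 4 local worker threads
    .config("spark.executor.memory", "2g")   # per-executor memory on a real cluster
    .getOrCreate()
)

print(spark.sparkContext.master)  # shows where work will be distributed
spark.stop()                      # shuts down the driver and releases executors
```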
Driver
The driver is the leader process that manages the execution of a Spark job. It is responsible for maintaining the overall state of the Spark application, responding to the user's program or input, and analyzing, distributing, and scheduling work among the executor processes. The driver is, in essence, the heart of a Spark application and maintains all application-related information throughout the application's lifetime.
The Spark driver converts Spark operations into DAG computations, then schedules and distributes them as tasks across the Spark executors.
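To make that flow concrete, here is a rough sketch (again assuming a local PySpark install; the names evens, squares, and dag-demo are made up for this example) in which the driver records lazy transformations as a lineage, and only the final action causes it to break the DAG into tasks and hand them to executors.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("dag-demo").getOrCreate()
rdd = spark.sparkContext.parallelize(range(1, 1_000_001))

# Transformations are lazy: the driver only records them as a lineage (DAG);
# no data is processed yet.
evens = rdd.filter(lambda x: x % 2 == 0)
squares = evens.map(lambda x: x * x)

# The action below is what makes the driver schedule tasks on the executors.
total = squares.reduce(lambda a, b: a + b)
print(total)  # sum of squares of the even numbers up to one million

# toDebugString shows the lineage the driver has built for this RDD.
print(squares.toDebugString().decode("utf-8"))

spark.stop()
```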