Architecture
Get insights into the architecture of Spark.
Spark design
Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Spark has the same leader-worker architecture as MapReduce: the leader process coordinates and distributes work among the worker processes. These two kinds of processes are formally called the driver and the executor.
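As a minimal sketch of where these two processes come from, the hypothetical PySpark snippet below (assuming PySpark is installed; the application name, master URL, and memory value are illustrative) starts a SparkSession, which launches the driver, while the chosen master or cluster manager supplies the executors.

```python
from pyspark.sql import SparkSession

# Starting a SparkSession launches the driver process; the cluster manager
# (or local mode, as here) then provides the executor processes.
spark = (
    SparkSession.builder
    .appName("architecture-demo")           # illustrative name
    .master("local[4]")                      # driver plus 4 local worker threads
    .config("spark.executor.memory", "2g")   # per-executor memory on a real cluster
    .getOrCreate()
)

print(spark.sparkContext.master)  # shows where work will be distributed
spark.stop()                      # shuts down the driver and releases executors
```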
Driver
The driver is the leader process that manages the execution of a Spark job. It is responsible for maintaining the overall state of the Spark application, responding to the user's program or input, and analyzing, distributing, and scheduling work among the executor processes. The driver is, in essence, the heart of a Spark application and maintains all application-related information throughout the application's lifetime.
The Spark driver converts Spark operations into DAG computations, then schedules and distributes them as tasks across the Spark executors.
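To make that flow concrete, here is a rough sketch (again assuming a local PySpark install; the names evens, squares, and dag-demo are made up for this example) in which the driver records lazy transformations as a lineage, and only the final action causes it to break the DAG into tasks and hand them to executors.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("dag-demo").getOrCreate()
rdd = spark.sparkContext.parallelize(range(1, 1_000_001))

# Transformations are lazy: the driver only records them as a lineage (DAG);
# no data is processed yet.
evens = rdd.filter(lambda x: x % 2 == 0)
squares = evens.map(lambda x: x * x)

# The action below is what makes the driver schedule tasks on the executors.
total = squares.reduce(lambda a, b: a + b)
print(total)  # sum of squares of the even numbers up to one million

# toDebugString shows the lineage the driver has built for this RDD.
print(squares.toDebugString().decode("utf-8"))

spark.stop()
```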