Architecture

This lesson describes the architecture of Spark.

Architecture

Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark has a master-slave architecture in which one process, the master, coordinates and distributes work among the slave processes. In Spark, these two kinds of processes are formally called:

  • Driver
  • Executor
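
The division of labor between a coordinating process and its workers can be sketched with a toy model in plain Python. This is only an analogy and does not use Spark's API: the names driver, executor_task, and split_into_partitions are hypothetical, and a local thread pool stands in for executor processes that would normally run on separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver-side step (hypothetical helper): cut the input into chunks."""
    # Ceiling division, with a floor of 1 so empty input does not yield a zero step.
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def executor_task(partition):
    """Executor-side step: process one partition independently of the others."""
    return sum(x * x for x in partition)


def driver(data, num_executors=3):
    """Coordinate the job: plan the tasks, hand them out, combine the results."""
    partitions = split_into_partitions(data, num_executors)
    # Threads are a stand-in for executors; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(executor_task, partitions))
    return sum(partial_results)


if __name__ == "__main__":
    # Sum of squares of 1..10, computed partition by partition.
    print(driver(list(range(1, 11))))
```

In this sketch only the driver function sees the whole job. Each call to executor_task sees a single partition, which mirrors how the coordinating process plans and distributes work while the workers carry it out.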

Driver

The driver is the master process that manages the execution of a Spark job. It is responsible for maintaining the overall state of the Spark application, responding to the user’s program or input, and analyzing, distributing, and scheduling work ...