Architecture

Spark is a distributed, parallel data-processing framework that bears many similarities to the traditional MapReduce framework. Like MapReduce, Spark uses a master-slave architecture, in which one process, the master, coordinates and distributes work among the slave processes. In Spark, these two kinds of processes are formally called:

Driver
Executor

Driver

The driver is the master process that manages the execution of a Spark job. It is responsible for maintaining the overall state of the Spark application, responding to a user’s program or input, and analyzing, distributing, and scheduling work ...
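
The division of labor between the driver and the executors can be illustrated with a small toy model. The sketch below is not the Spark API: it uses only the Python standard library, and the names `driver`, `executor_task`, and `num_partitions` are hypothetical, chosen for illustration. It also assumes worker threads as stand-ins for executors, whereas real Spark executors are separate processes running on the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def executor_task(partition):
    """Work done by one toy 'executor': sum the squares in its partition."""
    return sum(x * x for x in partition)


def driver(data, num_partitions=4):
    """Toy 'driver': split a job into tasks, schedule them, combine results."""
    # Analyze: split the input into partitions, one task per partition.
    partitions = [data[i::num_partitions] for i in range(num_partitions)]

    # Distribute and schedule: hand each task to a pool of worker threads
    # (an assumption standing in for executors on cluster nodes).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(executor_task, partitions))

    # Track overall state: combine the partial results into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(driver(list(range(10))))  # sum of squares of 0..9 -> 285
```

The point of the sketch is only the shape of the interaction: a single coordinating process decides how the work is split and where it runs, while the workers each handle their own partition and report back.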
