Components and Architecture
Get introduced to Spark's main components and cluster managers.
Core components
Behind the scenes, Spark consists of a core component on top of which different libraries sit. This is by design: the creators of Spark chose this architecture so that modules providing different functionality could keep being added.
This type of architecture resembles a “Plugin Architecture,” in which features can be developed and incorporated over time.
Let’s take a brief look at each component.
Spark Core
The nucleus of Spark contains the basic but fundamental functionality for scheduling application execution, managing memory, interacting with storage systems, recovering from faults, and so on.
Spark Core is the home of the Resilient Distributed Dataset (RDD) data structure, an in-memory, fault-tolerant, and immutable collection of elements representing partitioned data. Besides raw data, an RDD can also contain more complex data, such as Scala, Python, or Java programmatic constructs (for example, classes or data structures).
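To make the ideas of "immutable" and "partitioned" more concrete, here is a minimal single-process sketch in plain Python. It is a toy model only, not Spark's API: the ToyRDD class and the Reading record are hypothetical names invented for this illustration, the method names merely echo the RDD operations of the same name, and the sketch does not model distribution or fault recovery at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical user-defined record, standing in for a 'programmatic construct'."""
    sensor: str
    value: float


class ToyRDD:
    """Hypothetical, single-process stand-in for an RDD (NOT Spark's API)."""

    def __init__(self, partitions):
        # Tuples fix the partition layout at construction time,
        # so an existing ToyRDD can never be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, items, num_partitions):
        """Split items into num_partitions contiguous chunks."""
        items = list(items)
        size = -(-len(items) // num_partitions)  # ceiling division
        return cls(items[i * size:(i + 1) * size] for i in range(num_partitions))

    def map(self, fn):
        """Apply fn to every element, partition by partition, into a NEW ToyRDD."""
        return ToyRDD(tuple(fn(x) for x in p) for p in self._partitions)

    def num_partitions(self):
        return len(self._partitions)

    def collect(self):
        """Flatten all partitions back into one list."""
        return [x for p in self._partitions for x in p]


# Raw data: six integers spread over three partitions.
numbers = ToyRDD.parallelize(range(1, 7), 3)
doubled = numbers.map(lambda x: x * 2)  # 'numbers' itself is left untouched

# More complex data: instances of a user-defined class.
readings = ToyRDD.parallelize([Reading("a", 1.5), Reading("b", 2.5)], 2)
values = readings.map(lambda r: r.value).collect()
```

Note that map never changes the collection it is called on; it builds a new one. In this sketch, that is all "immutable" means: transformations derive new collections rather than updating existing ones.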
RDDs have been a part of Spark since the 1.0 version and allow ...