Detailed Design of Spark
Let's learn how Spark utilizes its driver and workers.
Spark can read input data from any Hadoop-supported storage source, such as HDFS or HBase, as well as the local file system.
Cluster manager
A Spark cluster can run multiple applications at once. When a user starts an application while others are already running on the same cluster of machines, each application needs resources allocated for its tasks. This is where the cluster manager comes in: the driver uses the cluster manager (an external service) to acquire resources on a cluster of machines for the application. The cluster manager also monitors the cluster, detecting failed workers and replacing them with new ones, which greatly reduces the fault-handling complexity that would otherwise have to be built into Spark itself.
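As a sketch of how an application is pointed at a cluster manager: the master URL passed to `spark-submit` selects which manager the driver talks to. The host names and ports below are placeholders, and `app.py` is a hypothetical application script.

```shell
# Spark standalone cluster manager (placeholder host/port)
spark-submit --master spark://master-host:7077 app.py

# Apache Mesos (placeholder host/port)
spark-submit --master mesos://master-host:5050 app.py

# Hadoop YARN (master location is read from the Hadoop configuration)
spark-submit --master yarn app.py
```

In each case the driver registers with the named cluster manager, which then grants it executors on the worker machines.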
The cluster managers that Spark can use include Apache Mesos, Hadoop YARN, and Spark’s own standalone cluster manager. The one allocation option available on all of them is static partitioning of resources: each application is given a fixed maximum amount of resources and holds on to them for the duration of its execution. Under static partitioning, the following resource allocations can be controlled:
The number of executors an application gets
The number of cores per executor
The executor memory
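The three allocations above map to `spark-submit` flags. This is a sketch with placeholder values; note that `--num-executors` applies on YARN and Kubernetes, while the standalone manager caps total cores via `--total-executor-cores` instead.

```shell
# Request 4 executors, each with 2 cores and 4 GiB of memory (placeholder values).
spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  app.py
```

With static partitioning, these resources are held by the application from launch until it finishes, even if some executors sit idle.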