
Anatomy of a Spark Application


This lesson explains the constituents of a Spark job.


Anatomy of a Spark Application

In this lesson, we’ll formally look at the various components of a Spark job. A Spark application consists of one or more jobs. A Spark job, however, is much broader in scope than a MapReduce job. Each job is made up of a directed acyclic graph (DAG) of stages, where a stage is roughly equivalent to a map or reduce phase in MapReduce. The Spark runtime splits each stage into tasks, which execute in parallel on partitions of an RDD across the cluster. The relationship among these concepts is depicted below:
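As a rough sketch of this hierarchy in code (not the lesson's own example; the bucketing expression and names are made up for illustration), a single action on a DataFrame that requires a shuffle produces one job with two stages, each of which runs as parallel tasks over the partitions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("anatomy-sketch")
  .master("local[*]")
  .getOrCreate()

val df = spark.range(0, 1000)                       // a DataFrame of 0..999
val counts = df.groupBy(col("id") % 10).count()     // groupBy forces a shuffle

// The collect() action submits one job. The shuffle splits it into two
// stages: one computing partial counts, one merging them. Each stage runs
// as parallel tasks, one task per partition.
counts.collect()
```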

A single Spark application can run one or more Spark jobs serially or in parallel. Cached RDDs produced by one job can be made available to a subsequent job without any disk I/O in between, which makes certain computations extremely fast. A job always executes in the context of a Spark application; the spark-shell, for example, is itself an instance of a Spark application.
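For instance, here is a minimal sketch (assuming the preconfigured spark session that spark-shell provides) of two jobs within one application sharing a cached intermediate result:

```scala
import org.apache.spark.sql.functions.col

// Build an intermediate DataFrame and mark it for in-memory caching.
val base = spark.range(0, 1000000).withColumn("squared", col("id") * col("id"))
base.cache()

base.count()                                  // job 1: materializes and caches the partitions
base.filter(col("squared") > 100).count()     // job 2: reads the cached partitions, no disk I/O
```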

Let’s look at an example to better understand jobs, stages, and tasks. Consider the example below; it creates two DataFrames, each consisting of the integers from 0 to 9. Next, we transform one of the DataFrames into multiples of 3 by multiplying each element by 3. Finally, we ...
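The snippet below is a hedged reconstruction of the steps described so far, written in Scala for spark-shell. The final step is elided in the text above, so the join-and-count shown here is only an assumption, included so the example has an action that triggers a job:

```scala
val df1 = spark.range(10).toDF("num")               // integers 0 to 9
val df2 = spark.range(10).toDF("num")               // integers 0 to 9
val multiplied = df2.selectExpr("num * 3 as num")   // multiples of 3: 0, 3, ..., 27

// Assumed final step (not in the original text): join the two DataFrames
// on the shared column and count the matching rows.
val matches = df1.join(multiplied, "num")
println(matches.count())
```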
