Driver Program Design and Project Structure

Let’s dive into the design and code implementation of our application, focusing on Spark batch development.

Design

Before the coding process starts, it is pretty helpful to understand the building blocks of an application. This can act as a guideline and a model to come back to later while we implement the code.

This lesson focuses on the backbone of the batch application: the parts that run in the driver process. These parts provide the following functionality:

  • All types of application configurations.
  • Validations.
  • Parsing of input parameters (jobs’ arguments).
  • Error handling.
  • Bootstrapping of application components.
  • Creation and coordination of the parts of the application that run in a distributed fashion on the worker nodes (the classes that implement the Spark API and contain the main logic of the batch jobs).
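To make these responsibilities concrete, the following is a minimal sketch of how a driver entry point might tie them together. It is not the application's actual code: the names DriverSketch, BatchJob, and wordCount, the --job and --input parameters, and the exit codes are all hypothetical, and the job is a plain-Java stand-in so that the example runs without a Spark dependency.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical driver skeleton; none of these names come from the course project.
final class DriverSketch {

    /** Stand-in for a class that would wrap the Spark API and hold the job logic. */
    interface BatchJob {
        String run(Map<String, String> params);
    }

    /** Parses arguments of the form --key=value into a map. */
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            if (!arg.startsWith("--") || !arg.contains("=")) {
                throw new IllegalArgumentException("Expected --key=value but got: " + arg);
            }
            int eq = arg.indexOf('=');
            params.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return params;
    }

    /** Validates that every required parameter is present and non-empty. */
    static void validate(Map<String, String> params) {
        for (String required : new String[] {"job", "input"}) {
            String value = params.get(required);
            if (value == null || value.isEmpty()) {
                throw new IllegalArgumentException("Missing required parameter: --" + required);
            }
        }
    }

    /** Bootstraps the component that implements the requested job. */
    static BatchJob bootstrap(String jobName) {
        Map<String, BatchJob> registry = new HashMap<>();
        // Assumption: a real job would build a Spark session and submit work here.
        registry.put("wordCount", p -> "wordCount finished for " + p.get("input"));
        BatchJob job = registry.get(jobName);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job: " + jobName);
        }
        return job;
    }

    /** Coordinates parsing, validation, bootstrapping, and error handling. */
    static int execute(String[] args) {
        try {
            Map<String, String> params = parseArgs(args);
            validate(params);
            BatchJob job = bootstrap(params.get("job"));
            System.out.println(job.run(params));
            return 0; // success
        } catch (IllegalArgumentException e) {
            System.err.println("Invalid invocation: " + e.getMessage());
            return 1; // bad arguments or configuration
        } catch (RuntimeException e) {
            System.err.println("Job failed: " + e.getMessage());
            return 2; // failure while running the job
        }
    }

    public static void main(String[] args) {
        System.exit(execute(args));
    }
}
```

In this sketch, all validation happens before any job is created, so an invalid invocation fails fast in the driver and never reaches the worker nodes.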

We start by familiarizing ourselves with the application’s technology stack, but instead of listing the several technologies used (languages, APIs, libraries, and so on), we’ll use a technology stack diagram.

The diagram shows the technologies on which the application is built, but it is not exhaustive: it includes only the essential ones. For the complete list, inspect the dependencies declared in the project's Maven pom.xml file.

Note: A technology stack diagram, in general terms, refers to a set of components that compose a logical platform representing a software solution or product. The diagram can also be expanded to represent cloud services or other products.

Technology stack

The following technology stack diagram depicts the technologies chosen for the different components that live in the various layers of the batch application:
