Ingestion Job: Part I

Let’s build a Spark batch job for the batch template application that ingests data, transforms it, and persists the result into an in-memory database.

Spark in action

Now that the application template has been fully set up in the previous lessons, we can develop a functional Job example.

The Job uses the Java Spark API within the Spark component classes, as covered in the previous lessons.

Business domain

The business domain of the application is market operations, such as sales and purchases, for a fictitious company called Market Analytics Solution Inc. This company provides retailers with a solution for processing large volumes of sales and purchase data across many kinds of goods.

The Job for this lesson, the IngesterJob, offers functionality to ingest raw data as presented in files of different formats. It then transforms this data into meaningful information and persists it to a database for further processing, potentially by other Jobs downstream in a batch workflow.

Requirements

The business analyst in this company has been kind enough to provide us with the following refined requirements for the Job at hand:


  1. The IngesterJob needs to process sales records in JSON format and persist them into a database table. Each record contains the following fields:

    Seller_Id,Date,Product,Quantity

Note: An ingestion format example is attached as well (see next subsection).

  2. The DBA team has modeled a database with a SALES table, which has the following structure:

    ID|SELLER_ID|DATE|PRODUCT|QUANTITY
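
To make the two requirements concrete, here is a minimal plain-Java sketch of how the four ingested fields could map onto the five SALES columns. It is not the Job's actual Spark code. The class names (SaleRecord, SalesRow, SalesMapper), the field types, the sample values, and the counter-based ID are all assumptions made for illustration; the requirements only name the fields and columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one ingested record: Seller_Id,Date,Product,Quantity.
// Field types are assumptions; the requirements only name the fields.
final class SaleRecord {
    final String sellerId;
    final String date; // kept as text because no date format is specified
    final String product;
    final int quantity;

    SaleRecord(String sellerId, String date, String product, int quantity) {
        this.sellerId = sellerId;
        this.date = date;
        this.product = product;
        this.quantity = quantity;
    }
}

// Hypothetical model of one SALES row: ID|SELLER_ID|DATE|PRODUCT|QUANTITY.
final class SalesRow {
    final long id;
    final String sellerId;
    final String date;
    final String product;
    final int quantity;

    SalesRow(long id, SaleRecord source) {
        this.id = id;
        this.sellerId = source.sellerId;
        this.date = source.date;
        this.product = source.product;
        this.quantity = source.quantity;
    }

    // Renders the row in the same pipe-delimited layout used to describe the table.
    String toPipeDelimited() {
        return id + "|" + sellerId + "|" + date + "|" + product + "|" + quantity;
    }
}

// Maps ingested records to table rows.
// Assumption: IDs come from a simple counter; a real database may generate them instead.
final class SalesMapper {
    private long nextId = 1;

    SalesRow toRow(SaleRecord record) {
        return new SalesRow(nextId++, record);
    }

    List<SalesRow> toRows(List<SaleRecord> records) {
        List<SalesRow> rows = new ArrayList<>();
        for (SaleRecord record : records) {
            rows.add(toRow(record));
        }
        return rows;
    }
}

final class SalesMappingDemo {
    public static void main(String[] args) {
        SalesMapper mapper = new SalesMapper();
        // Sample values are made up for this sketch.
        SalesRow row = mapper.toRow(new SaleRecord("seller-1", "2020-01-15", "laptop", 2));
        System.out.println(row.toPipeDelimited()); // prints 1|seller-1|2020-01-15|laptop|2
    }
}
```

The only structural difference between the input and the table is the ID column, so the mapping amounts to carrying the four fields over unchanged and attaching an identifier.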

Ingestion file example

The following is an extract for an ingestion file in JSON format:
