Ingestion Job: Part I
Let’s build a Spark batch job for the batch template application: one that ingests data, applies some processing (transformation), and persists the results into an in-memory database.
Spark in action
Now that the application template has been fully set up in the previous lessons, we can develop a functional Job example. The Job implements the Java Spark API in the Spark component classes, as taught in the previous sections.
Business domain
The business domain of the application is market operations, such as sales and purchases, for a fictitious company called Market Analytics Solution Inc. This company provides retailers with a solution to process vast volumes of sales and purchase data covering many kinds of goods.
The Job for this lesson, the IngesterJob, offers functionality to ingest raw data presented in files of different formats. It then transforms this data into meaningful information and persists it to a database for further processing, potentially by other Jobs downstream in a batch workflow.
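The ingest, transform, and persist steps just described can be sketched end to end with the Java Spark API. Everything below is illustrative: the input path, column names, table name, and H2 JDBC URL are assumptions, and in the template the session and persistence wiring would live in the Spark component classes rather than in a main method.

```java
import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class IngesterJobSketch {
    public static void main(String[] args) {
        // Local session for illustration only; the template's component
        // classes would normally provide and configure this.
        SparkSession spark = SparkSession.builder()
                .appName("IngesterJob-sketch")
                .master("local[*]")
                .getOrCreate();

        // Ingest: read raw JSON records (one object per line).
        Dataset<Row> raw = spark.read().json("data/sales.json");

        // Transform: keep the columns of interest (names are assumptions).
        Dataset<Row> sales = raw.select("Seller_Id", "Date", "Product", "Quantity");

        // Persist: append into an in-memory H2 database via JDBC
        // (URL and table name are hypothetical).
        Properties props = new Properties();
        props.setProperty("driver", "org.h2.Driver");
        sales.write().mode(SaveMode.Append)
                .jdbc("jdbc:h2:mem:marketdb", "SALES", props);

        spark.stop();
    }
}
```

Running this sketch requires Spark and the H2 driver on the classpath; it is meant only to show the shape of the pipeline, not the final component layout.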
Requirements
The business analyst at this company has kindly provided the following refined requirements for the Job at hand:
- The IngesterJob needs to process records in JSON format representing sales and persist them into a database table. Each record contains the following fields: Seller_Id, Date, Product, Quantity. Note: An ingestion file example is attached as well (see next subsection).
- The DBA team has modeled a database with a SALES table, which has the following structure: ID | SELLER_ID | DATE | PRODUCT | QUANTITY
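To make the mapping between an ingested record and a SALES row concrete, here is a small plain-Java sketch. The Sale class and its hand-rolled field extraction are illustrative assumptions, not part of the template; the real Job would rely on Spark's JSON data source rather than parsing strings itself. The ID column is omitted because the requirements suggest the database generates it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative value object mirroring the SALES columns (minus the ID).
class Sale {
    final String sellerId;
    final String date;
    final String product;
    final int quantity;

    Sale(String sellerId, String date, String product, int quantity) {
        this.sellerId = sellerId;
        this.date = date;
        this.product = product;
        this.quantity = quantity;
    }

    // Naive extraction from a flat, single-line JSON object, for
    // illustration only; Spark's JSON reader does this properly.
    static Sale fromJson(String json) {
        return new Sale(
                field(json, "Seller_Id"),
                field(json, "Date"),
                field(json, "Product"),
                Integer.parseInt(field(json, "Quantity")));
    }

    private static String field(String json, String name) {
        Matcher m = Pattern
                .compile("\"" + name + "\"\\s*:\\s*\"?([^\",}]+)\"?")
                .matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("Missing field: " + name);
        }
        return m.group(1).trim();
    }
}
```

For example, `Sale.fromJson("{\"Seller_Id\":\"S-01\",\"Date\":\"2020-01-15\",\"Product\":\"Laptop\",\"Quantity\":3}")` yields an object whose four fields line up one-to-one with the SELLER_ID, DATE, PRODUCT, and QUANTITY columns.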
Ingestion file example
The following is an extract from an ingestion file in JSON format:
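For illustration, records with invented values might look like this, assuming one JSON object per line (the JSON Lines layout, which Spark's JSON reader handles natively):

```json
{"Seller_Id": "S-01", "Date": "2020-01-15", "Product": "Laptop", "Quantity": 3}
{"Seller_Id": "S-02", "Date": "2020-01-15", "Product": "Monitor", "Quantity": 1}
{"Seller_Id": "S-01", "Date": "2020-01-16", "Product": "Keyboard", "Quantity": 7}
```

The field names come from the requirements above; the seller IDs, dates, products, and quantities are made up for the example.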