Batch Pipelines and Other Types of Jobs
Get introduced to the other types of jobs that make up the batch application.
The batch pipeline application pattern
The batch application developed in the previous lessons is built on several design patterns, with the template method pattern as a core piece.
These implementation patterns, however, are ultimately recipes for giving concrete shape to a broader, more abstract design pattern, which we call the batch pipeline pattern.
The idea behind having an underlying pattern for the application as a whole, and for every job within it, is to make our lives easier. It provides an ordered flow of execution and, at the same time, divides the processing into clearly defined, separate steps, each with specific functions and responsibilities.
Let’s illustrate this concept with a diagram:
In the diagram, the Ingestion or Reading stage components are only responsible for consuming or loading information in bulk for further processing.
The Transformations or Processing stage components are then in charge of modifying, transforming, or applying whatever type of processing might be needed to convert raw ingested data into meaningful information.
Note: “Meaningful” here is relative to the business domain of the application.
Lastly, the Extracting or Writing stage components persist or store the results of both previous stages in a durable medium or, depending on the complexity of the solution, stream them to another system.
Each batch job is then a specific implementation of the batch pipeline pattern.
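To make the relationship between the template method and the three stages concrete, here is a minimal Java sketch. The class names BatchPipelineJob and OrderTotalsJob, the "quantity,unitPrice" input format, and the in-memory sink are hypothetical illustrations, not the course's actual code: a final run() method fixes the order read → process → write, and each concrete job only fills in the three stage methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: the template method run() fixes the order of the three stages.
abstract class BatchPipelineJob<I, O> {

    // Template method: final so subclasses cannot reorder or skip stages.
    public final List<O> run() {
        List<I> input = read();            // Ingestion / Reading stage
        List<O> output = process(input);   // Transformations / Processing stage
        write(output);                     // Extracting / Writing stage
        return output;
    }

    protected abstract List<I> read();
    protected abstract List<O> process(List<I> input);
    protected abstract void write(List<O> output);
}

// Hypothetical concrete job: turns raw "quantity,unitPrice" lines into order totals.
class OrderTotalsJob extends BatchPipelineJob<String, Double> {
    private final List<String> source;
    private final List<Double> sink = new ArrayList<>();

    OrderTotalsJob(List<String> source) { this.source = source; }

    @Override
    protected List<String> read() {
        // Load everything in bulk; a real job might read a file or a table instead.
        return new ArrayList<>(source);
    }

    @Override
    protected List<Double> process(List<String> lines) {
        List<Double> totals = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            int quantity = Integer.parseInt(parts[0].trim());
            double unitPrice = Double.parseDouble(parts[1].trim());
            totals.add(quantity * unitPrice);
        }
        return totals;
    }

    @Override
    protected void write(List<Double> totals) {
        // Stand-in for a durable store or a downstream system.
        sink.addAll(totals);
    }

    List<Double> written() { return sink; }
}

public class PipelineDemo {
    public static void main(String[] args) {
        OrderTotalsJob job = new OrderTotalsJob(List.of("2,10.0", "3,1.5"));
        job.run();
        System.out.println(job.written()); // [20.0, 4.5]
    }
}
```

Because run() is final, every new job type gets the same ordered flow for free, and each stage method stays small and focused on a single responsibility.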
Following the pattern ensures, to a certain degree, that no step does too much or oversteps its purpose. This is crucial when dealing with huge volumes of data, and it also reduces the likelihood of hard-to-trace errors. In terms of performance, isolating the operations makes it easier to fine-tune each one and to pinpoint potential bottlenecks or general system slowness.
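One practical payoff of isolating the stages is that each can be measured on its own. The sketch below is a hypothetical Java helper (TimedPipeline is not part of the course code) that runs the read, process, and write stages as separate functions and records how long each one takes, assuming wall-clock time from System.nanoTime() is a good-enough first signal for spotting a slow stage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper: runs each stage in isolation and records its duration.
class TimedPipeline<I, O> {
    // LinkedHashMap keeps the stages in execution order: read, process, write.
    private final Map<String, Long> timingsNanos = new LinkedHashMap<>();

    // Measures a single stage and returns its result.
    private <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        T result = work.get();
        timingsNanos.put(stage, System.nanoTime() - start);
        return result;
    }

    public O run(Supplier<List<I>> reader,
                 Function<List<I>, O> processor,
                 Consumer<O> writer) {
        List<I> input = time("read", reader);
        O output = time("process", () -> processor.apply(input));
        time("write", () -> { writer.accept(output); return null; });
        return output;
    }

    public Map<String, Long> timings() { return timingsNanos; }
}

public class TimingDemo {
    public static void main(String[] args) {
        TimedPipeline<Integer, Long> pipeline = new TimedPipeline<>();
        List<Long> store = new ArrayList<>();
        Long sum = pipeline.run(
            () -> List.of(1, 2, 3, 4),                                 // read
            nums -> nums.stream().mapToLong(Integer::longValue).sum(), // process
            store::add);                                               // write
        System.out.println("sum = " + sum); // sum = 10
        // Per-stage durations; the slowest entry is the first bottleneck candidate.
        pipeline.timings().forEach((stage, ns) ->
            System.out.printf("%s: %d ns%n", stage, ns));
    }
}
```

For serious tuning a profiler or a metrics library would give more reliable numbers, but even this coarse split shows whether time goes into loading, transforming, or persisting the data.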