Lambda architecture is a big data processing model that organizations use to combine a batch pipeline with a real-time data pipeline.
The incoming data can be from various sources like application logs, clickstreams, etc. The data is simultaneously sent to the batch layer and the speed layer.
The batch layer is responsible for managing the master dataset.
The master dataset is the source of truth and lives forever. Even if there’s loss of data in other layers, the results can be recomputed by running through the master dataset. The batch layer also precomputes the data into batch views.
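As a minimal sketch of this idea, the Python snippet below treats the master dataset as an append-only list of raw events and derives a batch view (page view counts) from it. The event shape and the names `master_dataset`, `append_event`, and `compute_batch_view` are assumptions made for illustration; a real deployment would use a distributed store and a batch processing framework rather than in-memory structures.

```python
from collections import Counter

# Hypothetical append-only master dataset: raw events are only ever added.
master_dataset = []

def append_event(event):
    """Record a raw event; existing records are never updated or deleted."""
    master_dataset.append(event)

def compute_batch_view(events):
    """Recompute the batch view (page -> view count) from scratch."""
    return Counter(e["page"] for e in events)

append_event({"page": "/home", "ts": 1})
append_event({"page": "/docs", "ts": 2})
append_event({"page": "/home", "ts": 3})

batch_view = compute_batch_view(master_dataset)

# If the view is lost or corrupted, it can be rebuilt from the raw events alone.
rebuilt_view = compute_batch_view(master_dataset)
```

Because the view is a pure function of the raw events, rebuilding it only requires rerunning the same computation over the master dataset.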
The speed layer takes care of the data that is yet to be indexed by the batch layer, i.e., recently arrived data. It complements the batch layer by indexing the new data; thus, the speed layer reduces the latency of user queries on the latest data.
The serving layer combines the results generated by the batch and speed layers to answer user queries.
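The way a query is answered from both layers can be sketched roughly as follows. This is an illustrative Python example, not a reference implementation: the page-count data, the names `batch_view`, `realtime_view`, `on_new_event`, and `query`, and the choice of simple addition as the merge step are all assumptions. Addition works here only because counts are additive; other metrics would need a different merge rule.

```python
from collections import Counter

# Hypothetical batch view: counts precomputed up to the last batch run.
batch_view = Counter({"/home": 120, "/docs": 45})

# Hypothetical real-time view: counts for events that arrived after that run.
realtime_view = Counter()

def on_new_event(event):
    """Speed layer: update the real-time view incrementally as events arrive."""
    realtime_view[event["page"]] += 1

def query(page):
    """Serving layer: merge the batch and real-time views for one page."""
    return batch_view[page] + realtime_view[page]

# Two events arrive after the last batch run.
on_new_event({"page": "/home"})
on_new_event({"page": "/pricing"})

home_total = query("/home")        # batch count plus one recent event
pricing_total = query("/pricing")  # seen only by the speed layer so far
```

A page that has not yet been processed by the batch layer is still visible to queries, because the real-time view covers the gap until the next batch run.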
It can be difficult to maintain and debug two different technology stacks and code bases, one for batch processing and one for real-time processing.