What is lambda architecture?

Lambda architecture is a big data processing model that organizations use to combine a batch processing pipeline with a real-time (stream) processing pipeline.

The lambda architecture

Incoming data component

Incoming data can come from various sources, such as application logs and clickstreams. Each record is sent simultaneously to both the batch layer and the speed layer.
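
To make the fan-out concrete, here is a minimal in-memory sketch in Python. The `Event`, `BatchLayer`, `SpeedLayer`, and `ingest` names are hypothetical, and plain Python lists stand in for what would really be a distributed log or storage system. The only point it illustrates is that every incoming record reaches both layers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical clickstream event: which URL was viewed and when (epoch seconds)."""
    url: str
    timestamp: int


@dataclass
class BatchLayer:
    # Hypothetical: an in-memory list standing in for the master dataset.
    master_dataset: list = field(default_factory=list)

    def receive(self, event: Event) -> None:
        self.master_dataset.append(event)


@dataclass
class SpeedLayer:
    # Hypothetical: recently arrived events awaiting real-time processing.
    recent: list = field(default_factory=list)

    def receive(self, event: Event) -> None:
        self.recent.append(event)


def ingest(event: Event, batch: BatchLayer, speed: SpeedLayer) -> None:
    """Send the same incoming event to both layers."""
    batch.receive(event)
    speed.receive(event)


batch, speed = BatchLayer(), SpeedLayer()
for e in [Event("/home", 1), Event("/cart", 2)]:
    ingest(e, batch, speed)
print(len(batch.master_dataset), len(speed.recent))  # 2 2
```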

Batch layer

The batch layer is responsible for managing the master dataset, also called the data lake. Data stored in the master dataset has the following properties:

  1. Data is raw, i.e., unprocessed.
  2. Data is immutable, i.e., new data is appended to the dataset rather than used to update existing records.
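
A minimal sketch of an append-only master dataset, assuming a hypothetical `Fact` record that stores a user's location, and an in-memory list in place of real distributed storage. A change of location is recorded as a new timestamped fact rather than as an update to the old one:

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fact:
    """Hypothetical raw record: a user's location as observed at a point in time."""
    user_id: str
    location: str
    timestamp: int


class MasterDataset:
    """Append-only store: facts can be added and read, never modified or deleted."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def append(self, fact: Fact) -> None:
        self._facts.append(fact)

    def __iter__(self) -> Iterator[Fact]:
        # Hand out a read-only snapshot so callers cannot mutate the store.
        return iter(tuple(self._facts))

    def __len__(self) -> int:
        return len(self._facts)


ds = MasterDataset()
ds.append(Fact("alice", "Paris", 100))
# Alice moves: instead of updating her record, a new fact is appended.
ds.append(Fact("alice", "Berlin", 200))
print([f.location for f in ds])  # ['Paris', 'Berlin']
```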

The master dataset is the source of truth and is kept indefinitely. Even if data is lost in other layers, the results can be recomputed by running through the master dataset. The batch layer also precomputes the data into batch views, i.e., query results computed ahead of time over the entire master dataset.
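
Recomputing a batch view can be sketched as a pure function over the whole master dataset. This is an illustration only: it assumes hypothetical page-view events and a per-URL count as the batch view, whereas a production batch layer would run a distributed job over the data lake.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """Hypothetical raw event stored in the master dataset."""
    url: str
    timestamp: int


def compute_batch_view(master_dataset: list[PageView]) -> dict[str, int]:
    """Recompute page-view counts per URL from the full master dataset.

    The function only reads the immutable raw data, so it can be rerun
    at any time to rebuild a view that was lost or computed incorrectly.
    """
    return dict(Counter(view.url for view in master_dataset))


master = [PageView("/home", 1), PageView("/cart", 2), PageView("/home", 3)]
batch_view = compute_batch_view(master)
print(batch_view)  # {'/home': 2, '/cart': 1}
```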

Speed layer

The speed layer handles data that has not yet been indexed by the batch layer, i.e., recently arrived data. Because a batch run over the full master dataset takes time, the batch views always lag behind the incoming data. The speed layer complements the batch layer by indexing this new data as it arrives, which reduces the latency of user queries on the latest data.
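
The speed layer can be sketched as an incremental counter over events newer than the last batch run. `PageView`, `SpeedLayer`, `batch_cutoff`, and `expire` are hypothetical names, and the sketch assumes the batch layer reports the timestamp up to which it has processed data. Once a batch run absorbs an event, the speed layer drops it so the event is not counted twice.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """Hypothetical clickstream event."""
    url: str
    timestamp: int


class SpeedLayer:
    """Incrementally counts page views that the batch layer has not processed yet."""

    def __init__(self, batch_cutoff: int) -> None:
        # Hypothetical: latest timestamp already covered by the batch views.
        self.batch_cutoff = batch_cutoff
        self._recent = []  # events newer than batch_cutoff
        self.realtime_view = Counter()  # URL -> count of recent views

    def process(self, event: PageView) -> None:
        # Update the real-time view immediately, one event at a time.
        if event.timestamp > self.batch_cutoff:
            self._recent.append(event)
            self.realtime_view[event.url] += 1

    def expire(self, new_cutoff: int) -> None:
        """Forget events that a finished batch run now covers (timestamp <= new_cutoff)."""
        self.batch_cutoff = new_cutoff
        kept = []
        for event in self._recent:
            if event.timestamp <= new_cutoff:
                self.realtime_view[event.url] -= 1
            else:
                kept.append(event)
        self._recent = kept
        self.realtime_view = +self.realtime_view  # unary + drops zero counts


speed = SpeedLayer(batch_cutoff=10)
for e in [PageView("/home", 5), PageView("/home", 12), PageView("/cart", 15)]:
    speed.process(e)
print(dict(speed.realtime_view))  # {'/home': 1, '/cart': 1}
speed.expire(new_cutoff=12)
print(dict(speed.realtime_view))  # {'/cart': 1}
```

Keeping the raw recent events (not just the counts) is what lets `expire` remove exactly the events a batch run has absorbed while keeping anything that arrived after the new cutoff.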

Serving layer

The serving layer combines the results generated by the batch and speed layers to answer user queries.
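
A minimal sketch of how the serving layer might answer a query, assuming hypothetical per-URL page-view counts as both the batch view and the real-time view. Because the speed layer only covers data the batch layer has not processed yet, the two counts can simply be added.

```python
def query(url: str, batch_view: dict[str, int], realtime_view: dict[str, int]) -> int:
    """Total page views for `url`: precomputed batch count plus recent real-time count."""
    # Hypothetical views: both map URL -> number of page views.
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


# Hypothetical data: the batch view covers older events, the real-time view
# covers only events that arrived after the last batch run.
batch_view = {"/home": 2, "/cart": 1}
realtime_view = {"/cart": 1, "/checkout": 1}

print(query("/cart", batch_view, realtime_view))      # 2
print(query("/checkout", batch_view, realtime_view))  # 1
print(query("/home", batch_view, realtime_view))      # 2
```

Summing works here because counts are additive. For other kinds of views, such as a user's latest location, the merge logic would have to differ, for example by preferring the more recent value.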

Advantages of lambda architecture

  1. Scalability: each component in the architecture can be scaled independently.
  2. High availability: the combination of the batch and speed layers ensures that queries never go unanswered.
  3. Real-time results: the speed layer makes recently arrived data available to queries with low latency.
  4. Fault tolerance: if there are problems in other layers, the results can be recomputed by running through the master dataset.

Disadvantages of lambda architecture

It can be difficult to maintain and debug two different technology stacks and code bases, one for batch processing and one for stream processing.
