Architecture of Apache Druid
Get to know the architecture of Apache Druid.
We'll cover the following...
As a case study of a distributed system, we will discuss the architecture of Apache Druid, a popular OLAP database.
High-level architecture of Druid
Druid is an OLAP database. For systems that need support for a large volume of data, Druid has to be deployed as a distributed system. In a distributed environment, Druid has multiple nodes that work together to achieve the common goal of efficient analytical data querying.
In a distributed Druid system we have five types of nodes, as depicted in the following table:
Type | Role |
MiddleManagers | Ingest data into the Druid system |
Historical nodes | Store the queryable data |
Coordinators | Manage data availability in a Druid cluster by rebalancing and replication of data in Historical nodes |
Overlords | Manage data ingestion by assigning ingestion tasks to MiddleManagers |
Brokers | Handle queries from external systems or clients |
The above five node types are internal to the Druid architecture. Apart from them, Druid also takes support from some external systems.
-
Deep storage: To ensure fault tolerance, Druid uses deep storage. Any data in the system is stored in deep storage like AWS S3. This ensures data durability, as well as data transfer between different node types.
-
Metadata ...