Cloud Data Architectures: Data Lake and Data Mesh
Understand two types of cloud data architectures: data lake and data mesh.
This lesson reviews two popular cloud data architecture frameworks: data lake and data mesh.
Data lake
A data lake is a popular data architecture comparable to a data warehouse. It’s a storage repository that holds a large amount of data, but unlike a data warehouse, where data is structured, a data lake keeps data in its raw format. The following table summarizes this and other differences:
Data Lake vs. Data Warehouse
Topic | Data Lake | Data Warehouse |
Data Format | Stores unstructured, semi-structured, and structured data in its raw format. | Stores only structured data, after transformation. |
Schema | Schema-on-read: The schema is defined after data is stored, when the data is read. | Schema-on-write: The schema is defined before data is stored. |
Use Case | | |
Data Quality | Data is in its raw format without cleaning, so data quality is not ensured. | Data is highly curated, resulting in higher data quality. |
Cost | Both storage and operational costs are lower. | Storing data in the data warehouse is usually more expensive and time-consuming. |
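The schema row of the table can be made concrete with a small sketch. The Python below is illustrative only: the sample events, the `SCHEMA` mapping, and the function names (`warehouse_write`, `lake_write`, `lake_read`) are hypothetical and do not belong to any real data lake or warehouse product. It assumes a warehouse rejects records that fail the schema at load time, while a lake stores every record as is and applies the schema only when the data is read.

```python
import json

# Hypothetical raw events as they might arrive from a source system.
RAW_EVENTS = [
    '{"user_id": 1, "amount": "19.99", "note": "first order"}',
    '{"user_id": 2, "amount": "5.00"}',
    '{"user_id": "n/a", "amount": "oops"}',
]

# Hypothetical schema: field name -> type used to parse the value.
SCHEMA = {"user_id": int, "amount": float}


def apply_schema(record: dict) -> dict:
    """Keep only the schema's fields and cast each to its declared type."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


def warehouse_write(raw_lines):
    """Schema-on-write: validate before storing; set aside rows that do not fit."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            table.append(apply_schema(json.loads(line)))
        except (KeyError, ValueError, TypeError):
            rejected.append(line)
    return table, rejected


def lake_write(raw_lines):
    """Schema-on-read, write side: store every line untouched."""
    return list(raw_lines)


def lake_read(stored_lines):
    """Schema-on-read, read side: apply the schema only when data is queried."""
    rows = []
    for line in stored_lines:
        try:
            rows.append(apply_schema(json.loads(line)))
        except (KeyError, ValueError, TypeError):
            continue  # the bad row stays in the lake; this reader skips it
    return rows


warehouse_table, warehouse_rejected = warehouse_write(RAW_EVENTS)
lake_files = lake_write(RAW_EVENTS)
lake_rows = lake_read(lake_files)
```

In this sketch both paths end up with the same two clean rows, but only the lake still holds all three original records, including the malformed one and the extra `note` field. A different reader could later apply a different schema to the same stored data.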
The following diagram illustrates the key components of a data lake:
Ingestion layer: The ingestion layer collects raw data and loads it into the data lake. The raw data is not modified in this layer.
Storage layer: A data lake uses object storage to store data. Object storage stores data with metadata ...