
Cloud Data Architectures: Data Lake and Data Mesh


Understand two types of cloud data architectures: data lake and data mesh.

This lesson reviews two popular cloud data architecture frameworks: data lake and data mesh.

Data lake

A data lake is a popular data architecture often compared to a data warehouse. It is a storage repository that holds large amounts of data. Unlike a data warehouse, where data is structured, a data lake keeps data in its raw format. Apart from the format, the following table summarizes other differences:

Data Lake vs. Data Warehouse

Data format
  • Data lake: Stores unstructured, semi-structured, and structured data in its raw format.
  • Data warehouse: Stores only structured data, after it has been transformed.

Schema
  • Data lake: Schema-on-read. The schema is applied when the data is read, not when it is stored.
  • Data warehouse: Schema-on-write. The schema is defined before the data is written.

Use case
  • Data lake: Data exploration, because unstructured data opens more possibilities for analysis and ML algorithms. A data lake also serves as a landing zone before data is loaded into a data warehouse.
  • Data warehouse: Reporting, because reporting tools and dashboards prefer highly coherent data.

Data quality
  • Data lake: Data is stored raw, without cleaning, so data quality is not guaranteed.
  • Data warehouse: Data is highly curated, resulting in higher data quality.

Cost
  • Data lake: Both storage and operational costs are lower.
  • Data warehouse: Storing data is usually more expensive and time-consuming.
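To make the schema-on-read versus schema-on-write distinction concrete, here is a minimal Python sketch. `WarehouseTable`, `LakeZone`, and `ORDER_SCHEMA` are hypothetical names invented for illustration. Real warehouses and lakes enforce (or defer) schemas inside their own storage and query engines, not with code like this.

```python
import json
from typing import Any

# Hypothetical schema for illustration: column name -> expected Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


class WarehouseTable:
    """Schema-on-write: a record must match the schema before it is stored."""

    def __init__(self, schema: dict[str, type]) -> None:
        self.schema = schema
        self.rows: list[dict[str, Any]] = []

    def write(self, record: dict[str, Any]) -> None:
        # Reject the record at write time if it does not fit the predefined schema.
        for column, expected in self.schema.items():
            if not isinstance(record.get(column), expected):
                raise ValueError(f"column {column!r} must be {expected.__name__}")
        # Keep only the schema's columns (the "transformed", structured form).
        self.rows.append({c: record[c] for c in self.schema})


class LakeZone:
    """Schema-on-read: raw payloads are stored as-is; a schema is applied on read."""

    def __init__(self) -> None:
        self.objects: list[str] = []

    def write(self, raw: str) -> None:
        # No validation or transformation: the raw payload lands unchanged.
        self.objects.append(raw)

    def read(self, schema: dict[str, type]) -> list[dict[str, Any]]:
        # Parse and cast each payload at read time; skip payloads that don't fit.
        rows = []
        for raw in self.objects:
            try:
                record = json.loads(raw)
                rows.append({c: t(record[c]) for c, t in schema.items()})
            except (ValueError, KeyError, TypeError):
                continue
        return rows


if __name__ == "__main__":
    lake = LakeZone()
    lake.write('{"order_id": 1, "amount": "19.99", "note": "gift"}')
    lake.write("free-text log line")  # accepted even though it is not JSON
    print(lake.read(ORDER_SCHEMA))  # [{'order_id': 1, 'amount': 19.99}]

    warehouse = WarehouseTable(ORDER_SCHEMA)
    warehouse.write({"order_id": 1, "amount": 19.99})
    try:
        warehouse.write({"order_id": 2, "amount": "oops"})
    except ValueError as err:
        print("rejected:", err)  # rejected: column 'amount' must be float
```

The sketch mirrors the trade-off in the table. The warehouse pays the validation cost up front and keeps only conforming rows. The lake accepts anything cheaply and leaves each reader to decide how to interpret the raw payloads. That flexibility is also why lake data quality is not guaranteed.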

The following diagram illustrates the key components of a data lake:

Data lake architecture
  • Ingestion layer: The ingestion layer collects raw data and loads it into the data lake. The raw data is not modified in this layer.

  • Processing layer: A data lake uses object storage to store data. Object storage stores data with metadata ...