...

Storage and Infrastructure

Learn about two components in the data engineering life cycle: storage and infrastructure.

Ingestion, transformation, and visualization are three distinct stages of the data life cycle, each of which moves data from one place to another. In this lesson, we look at the two remaining components: storage and infrastructure. They are key to the success of the data life cycle because, unlike the other stages, they span the entire life cycle and act as the backbone that supports business flows.

Storage

In many ways, how data is stored determines how it is used. For example, data in a data warehouse is typically consumed by batch processes and analytics, while frameworks such as Apache Kafka support real-time use cases. Such frameworks offer more than storage: they also function as ingestion and query systems. Generally speaking, there are four standard storage systems.
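To make the batch versus real-time contrast more concrete, here is a minimal, in-memory sketch of the idea behind a Kafka-style log: records are appended with increasing offsets and retained, so the same structure acts as storage, as an ingestion point, and as something consumers can read (or replay) from. `AppendOnlyLog`, `RunningTotal`, and `batch_total` are hypothetical names chosen for illustration; this is not Kafka's API, and a real broker adds partitioning, replication, and durable on-disk storage.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for a Kafka-style log; not Kafka's API."""

    records: list = field(default_factory=list)

    def append(self, record: dict) -> int:
        # Ingestion: each appended record gets the next offset (0, 1, 2, ...).
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> list:
        # Storage: records are retained, so a consumer can read or replay from any offset.
        return self.records[offset:]


def batch_total(records: list) -> float:
    # Batch style: one computation over the full, already-stored data set.
    return sum(r["amount"] for r in records)


class RunningTotal:
    """Real-time style: update the result as soon as new records arrive."""

    def __init__(self) -> None:
        self.offset = 0  # next offset this consumer has not processed yet
        self.total = 0.0

    def poll(self, log: AppendOnlyLog) -> float:
        # Process only records appended since the last poll, advancing the offset as we go.
        for record in log.read_from(self.offset):
            self.total += record["amount"]
            self.offset += 1
        return self.total


if __name__ == "__main__":
    log = AppendOnlyLog()
    consumer = RunningTotal()
    for amount in (10.0, 25.5, 4.5):
        log.append({"amount": amount})
        print("running total:", consumer.poll(log))
    print("batch total:", batch_total(log.read_from(0)))
```

Both approaches reach the same total; what differs is when the work happens. The batch function waits until the data is stored and then processes it in one pass, while the consumer updates its result on every poll.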

Data warehouse

A traditional data warehouse is a central data hub for reporting and analytics. Data in a data warehouse is generally structured and formatted for analytical purposes. Data flows regularly into the warehouse from transactional systems and other sources.
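As a rough illustration of data flowing into the warehouse on a regular schedule, the sketch below uses Python's built-in `sqlite3` as a stand-in warehouse and loads a hypothetical list of transactional orders into an analytics-friendly table. It assumes an incremental load keyed on a monotonically increasing `order_id` (a "high-water mark"), so each scheduled run only picks up rows added since the previous one; real pipelines might instead rely on timestamps or change data capture. All table, column, and function names here are illustrative.

```python
import sqlite3

# Hypothetical transactional (OLTP) source: one row per order, as written by the application.
source_orders = [
    {"order_id": 1, "created_at": "2024-01-01T09:15:00", "qty": 2, "unit_price": 5.0},
    {"order_id": 2, "created_at": "2024-01-01T11:40:00", "qty": 1, "unit_price": 12.0},
    {"order_id": 3, "created_at": "2024-01-02T08:05:00", "qty": 3, "unit_price": 4.0},
]


def create_warehouse(conn: sqlite3.Connection) -> None:
    # Analytics-friendly table: a derived date column and a precomputed revenue measure.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, revenue REAL)"
    )


def load_batch(conn: sqlite3.Connection, orders: list, last_loaded_id: int) -> int:
    """Load orders newer than the high-water mark and return the new mark.

    Assumes order_id only ever increases in the source system.
    """
    new_rows = [
        (o["order_id"], o["created_at"][:10], o["qty"] * o["unit_price"])
        for o in orders
        if o["order_id"] > last_loaded_id
    ]
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", new_rows)
    conn.commit()
    return max((row[0] for row in new_rows), default=last_loaded_id)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_warehouse(conn)
    mark = load_batch(conn, source_orders, last_loaded_id=0)  # first scheduled run
    source_orders.append(
        {"order_id": 4, "created_at": "2024-01-02T17:30:00", "qty": 1, "unit_price": 8.0}
    )
    mark = load_batch(conn, source_orders, last_loaded_id=mark)  # next run: only order 4
    query = (
        "SELECT order_date, SUM(revenue) FROM fact_orders "
        "GROUP BY order_date ORDER BY order_date"
    )
    for row in conn.execute(query):
        print(row)
```

Reshaping the data at load time (extracting a date, precomputing revenue) is what makes the warehouse table "formatted for analytical purposes": downstream queries can aggregate with a plain `GROUP BY` instead of re-deriving these values from raw transactions.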

A typical data warehouse has three tiers. The bottom tier is the database server, where data is loaded and stored. The middle tier sits on top of it: an analytics engine that transforms data for analytical use, commonly following the OLAP (online analytical processing) approach. The top tier is the front-end client, such as reporting or visualization tools, through which users access the data.

Three tiers of a traditional data warehouse
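The three tiers can be sketched in a few lines under heavy simplifications: an in-memory `sqlite3` database stands in for the bottom-tier database server, a `GROUP BY` roll-up over user-chosen dimensions stands in for the OLAP-style middle tier, and a plain-text formatter plays the role of the front-end client. The `sales` table, its rows, and the function names are hypothetical; a production OLAP engine would add cubes, pre-aggregation, drill-down, and much more.

```python
import sqlite3


def bottom_tier() -> sqlite3.Connection:
    # Bottom tier (hypothetical data): the database server where data is loaded and stored.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("EU", "widget", "Q1", 100.0),
            ("EU", "gadget", "Q1", 50.0),
            ("US", "widget", "Q1", 80.0),
            ("US", "widget", "Q2", 120.0),
        ],
    )
    return conn


ALLOWED_DIMENSIONS = {"region", "product", "quarter"}


def roll_up(conn: sqlite3.Connection, dimensions: list) -> list:
    # Middle tier: an OLAP-style roll-up that sums revenue along the chosen dimensions.
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be a non-empty subset of {sorted(ALLOWED_DIMENSIONS)}")
    cols = ", ".join(dimensions)
    # Safe to interpolate: every column name was checked against the allow-list above.
    query = f"SELECT {cols}, SUM(revenue) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(query).fetchall()


def top_tier(rows: list) -> str:
    # Top tier: the front-end client, reduced here to a plain-text report.
    return "\n".join(" | ".join(str(value) for value in row) for row in rows)


if __name__ == "__main__":
    conn = bottom_tier()
    print(top_tier(roll_up(conn, ["region"])))
    print()
    print(top_tier(roll_up(conn, ["region", "quarter"])))
```

Validating dimension names against an allow-list matters here because column names cannot be passed as bound `?` parameters in SQLite, so the query text has to be built dynamically.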
