Storage and Infrastructure
Learn about two components in the data engineering life cycle: storage and infrastructure.
Ingestion, transformation, and visualization are three separate stages in the data life cycle, each moving data from one place to another. In this lesson, we look at the two remaining components: storage and infrastructure. Unlike the stages above, they run across the entire life cycle and act as the backbone that supports business flows, which makes them key to its success.
Storage
In many ways, how data is stored determines how it is used. For example, data in a data warehouse is typically used by batch processes and analytics, while frameworks like Apache Kafka facilitate real-time use cases. Such frameworks not only offer storage capabilities but also function as ingestion and query systems. Generally speaking, there are four standard storage systems.
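To make the idea of one system serving storage, ingestion, and querying more concrete, here is a minimal sketch of an in-memory append-only log. It is a toy model only and does not use Apache Kafka or its API; the class ToyLog and its methods append and read_from are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """In-memory append-only log (hypothetical stand-in for a streaming topic)."""

    records: list = field(default_factory=list)

    def append(self, record):
        # Ingestion: a producer writes a record and receives its offset.
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset):
        # Query: a consumer reads every record at or after the given offset.
        return self.records[offset:]


log = ToyLog()
log.append({"user": "a", "action": "click"})
log.append({"user": "b", "action": "view"})

# The consumer remembers how far it has read (its checkpoint).
checkpoint = len(log.records)

log.append({"user": "a", "action": "purchase"})

# Only records that arrived after the checkpoint are returned.
new_events = log.read_from(checkpoint)
print(new_events)
```

Because the log keeps every record and consumers track their own position, the same structure can feed a near-real-time consumer and be replayed later from an earlier offset.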
Data warehouse
A traditional data warehouse is a central data hub for reporting and analytics. Data in the data warehouse is generally structured and formatted for analytical purposes. It flows into the warehouse from transactional systems and other sources on a regular basis.
A typical data warehouse has three tiers. The bottom tier is the database server, where data is loaded and stored. The middle tier is the analytics engine, where data is transformed for analytical use; a common approach at this tier is OLAP (online analytical processing). The top tier is the front-end client, through which users access their reporting and visualization tools.
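The three tiers can be sketched in a few lines, assuming an in-memory SQLite database stands in for the warehouse's database server. The table sales, its columns, the sample rows, and the functions revenue_by_region and render_report are all hypothetical and exist only to illustrate the layering.

```python
import sqlite3

# Bottom tier: the database server, where data is loaded and stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "widget", 100.0),
        ("EU", "gadget", 50.0),
        ("US", "widget", 70.0),
    ],
)


# Middle tier: the analytics engine, which transforms stored data for analysis.
def revenue_by_region(connection):
    rows = connection.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


# Top tier: the front-end client, which presents results to users.
def render_report(totals):
    return "\n".join(f"{region}: {total:.2f}" for region, total in totals.items())


totals = revenue_by_region(conn)
report = render_report(totals)
print(report)
```

In a real warehouse each tier is usually a separate system; keeping them as separate functions here simply mirrors that separation of responsibilities.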
Note: OLAP uses multidimensional database structures, known as cubes, to answer complex business questions by preaggregating underlying datasets.
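As a rough illustration of pre-aggregation, the sketch below builds a tiny cube over three dimensions (region, product, quarter) and stores the total for every combination of dimension values, including rollups. The sample facts, the ALL marker, and the build_cube function are hypothetical; real OLAP engines use far more compact storage and richer query languages.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker meaning "aggregated over every value of this dimension"

# Each fact is (region, product, quarter, amount); the values are made up.
facts = [
    ("EU", "widget", "Q1", 100),
    ("EU", "widget", "Q2", 120),
    ("EU", "gadget", "Q1", 50),
    ("US", "widget", "Q1", 70),
]


def build_cube(rows):
    cube = defaultdict(int)
    for *dims, amount in rows:
        # For each dimension, keep the concrete value or roll it up to ALL,
        # so every fact contributes to 2**len(dims) pre-aggregated cells.
        for key in product(*[(value, ALL) for value in dims]):
            cube[key] += amount
    return cube


cube = build_cube(facts)

# Business questions become simple lookups instead of scans over the facts.
eu_total = cube[("EU", ALL, ALL)]
widget_q1_total = cube[(ALL, "widget", "Q1")]
grand_total = cube[(ALL, ALL, ALL)]
print(eu_total, widget_q1_total, grand_total)
```

The trade-off is that the cube grows quickly with the number of dimensions, which is why the work is done ahead of time rather than per query.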
Data lake
Another comparable storage type is the data lake. Rather than only storing structured data, a data lake is capable of storing both structured and unstructured ...