OLTP and OLAP Databases
Learn what OLTP and OLAP databases are.
We'll cover the following
In the data world, OLAP and OLTP have become increasingly well known terms. In this lesson, we will break down these terms and try to understand what they mean.
OLAP databases
OLAP stands for online analytical processing.
Imagine we have a large volume of data with many attributes. We need to analyze that data quickly. Also, consumers like product managers or data scientists require insights from the data.
An example of such data is user interaction on an application or website. Product managers might decide on certain features that help business development based on the data.
Typically, OLAP databases are suitable for such a use case. These databases enable fast querying on a very large volume of data.
OLTP databases
The other side of the coin is the OLTP databases.
OLTP stands for online transactional processing.
For example, think of a payment system that stores the users’ balances. Every time users send or receive money, the balances must be consistently updated. In such transactional use cases, an OLTP database is a great fit.
The MySQL database is widely used as an OLTP database because of its robust support for transactions. On the other hand, a popular OLAP database is Apache Druid.
In this chapter, we will introduce Apache Druid and discuss its architecture.
Apache Druid
What is Apache Druid? We can quickly answer this from its documentation.
Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics (“OLAP” queries) on large data sets. Most often, Druid powers use cases where real-time ingestion, fast query performance, and high uptime are important.
Let’s break it down.
-
Real-time analytics: It is important to gather insights from data to drive business in today’s systems. With the availability of internet access worldwide, the amount of such data is just huge. Still, there are use-cases in which the insight is required as fast as possible, i.e, in real-time. Druid supports fast insight gathering from a large volume of data.
-
Slice-and-dice: Imagine a dataset with millions of rows and tens of columns. Each of those columns is one of the two types—a dimension or a metric. A dimension is like an attribute of the data, for example, country, city, date, some form of id, etc. On the flip side, a metric is a value of the data that can be aggregated among the rows. For example, the amount of money spent, count of clicks, etc. By the term slice-and-dice, it means that we can choose any random set of dimensions, and expect the aggregation of the metrics based on those dimensions. This is the same as running
group_by
queries in MySQL, but Druid provides this functionality very efficiently, even with a huge column set.
In short, Apache Druid is an OLAP database that allows for fast insight generation on a large volume of data, even with many dimensions upon which we would need to run group_by
queries randomly.
Key takeaways
-
OLTP and OLAP databases are suitable for specific usecases. It’s not wise to use one type for the other.
-
Apache Druid can efficiently serve OLAP usecases with a very large volume of data.
Get hands-on with 1400+ tech skills courses.