Introduction to Bigtable

Learn about Bigtable and the motivation behind its creation.

From the dark ages to a renaissance for databases

With the advent of hyperscale services such as worldwide search, online shopping, and messaging, the deficiencies of traditional databases (based on the relational data model) became apparent. These deficiencies fall into two classes: scalability challenges and performance challenges.

Traditional databases are optimized for read-heavy workloads, where the data schema is known at write time and does not change frequently. Additionally, most implementations of relational DB engines run either on a single beefy server or on a group of servers located physically close together. Such a setup relies on vertical scaling (also known as "scaling up," which means adding capabilities such as extra CPUs or RAM to an existing machine) for improvements, and there are hard limits to such scaling. Application workloads were approaching those limits, both in raw data size and in the IOPS (input/output operations per second) that database systems could deliver with good throughput and latency.

These deficiencies pushed organizations into a multi-decade quest to research and develop custom database systems. The guiding insight was that many applications do not need the full feature set of the relational model, so inventing a new, simpler model could yield highly scalable and highly performant database systems. In this chapter, we will focus on one such system designed by Google, known as Bigtable.

The need for Bigtable

While traditional relational databases work well for many data problems, they fall short for important use cases that demand data-size scalability and high read/write performance. Some of those use cases are:

  • Fraud detection: It relies on detection rules and algorithms applied to transaction information, customer information, time of day, location, etc., all of which must be evaluated instantly and at a large scale. Typically, most of the data is read infrequently, but when it is needed, we might have to read most of it in near real time. Such workloads are not a good fit for traditional databases.

  • Time-series data: This concerns data such as cumulative CPU and memory usage across several thousand servers in a data center (see the row-key sketch after this list).

  • ...
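To make the time-series use case concrete, the sketch below shows one common row-key design for such data in a Bigtable-style store, where rows are kept sorted lexicographically by key. This is a minimal illustration in plain Python (no Bigtable client); the server_id#reversed_timestamp scheme, the make_row_key helper, and the sample values are illustrative assumptions, not APIs from this course.

```python
import time

def make_row_key(server_id: str, sample_ts: float) -> str:
    """Build a sortable row key of the form server_id#reversed_timestamp."""
    # Reversing the timestamp makes newer samples sort before older ones,
    # since Bigtable-style stores keep rows in lexicographic key order.
    reversed_ms = 2**63 - int(sample_ts * 1000)  # milliseconds
    return f"{server_id}#{reversed_ms:020d}"

# Hypothetical samples: CPU usage for one server, collected once a minute.
now = time.time()
keys = [make_row_key("server-0042", now - 60 * i) for i in range(3)]
for key in sorted(keys):
    print(key)  # the newest sample prints first
```

With this layout, "the latest N readings for a server" becomes a short sequential scan under that server's key prefix rather than a query over the whole table, which is exactly the access pattern that strains a traditional relational setup.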
