Google's Bigtable
Learn how to design a distributed database that can use automatic sharding to scale horizontally for better performance and storage space utilization.
Motivation
Google developed Bigtable in response to the limitations of traditional relational databases when dealing with hyperscale services and diverse data needs. Relational databases were optimized for static schemas and read-heavy workloads, and they struggled to handle the growing data sizes, dynamic structures, and real-time processing required by applications such as fraud detection, IoT data management, and financial transactions.
Traditional databases faced issues in both scalability and performance. They relied on
Note: The CAP theorem tells us why it is challenging to have a strongly consistent and highly available system under common faults such as network partitioning.
Common Uses of Bigtable
Features | Bigtable Characteristics |
Single-row transactions | Bigtable supports single-row transactions, allowing users to execute atomic read-modify-write operations on data stored under a single row key. |
Client interface | Bigtable has a client interface for batch writing over row keys, but it does not allow transactions across row keys. |
Integer counters | Cells can be used as integer counters in Bigtable. |
MapReduce jobs | Bigtable can be used in MapReduce jobs as both an input source and an output target through a set of wrappers. |
Writing scripts | Clients can also write scripts in Sawzall (a language created by Google) to guide server-side data processing. |
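To make the single-row transaction and integer counter features more concrete, here is a minimal in-memory sketch. It is not the Bigtable client API: the class `ToyTable`, its methods, and the row and column names are all hypothetical, and a per-row lock is simply one way to illustrate atomic read-modify-write on a single row key.

```python
import threading
from collections import defaultdict


class ToyTable:
    """Toy in-memory table (hypothetical; not the real Bigtable API)."""

    def __init__(self):
        self._rows = defaultdict(dict)               # row key -> {column: value}
        self._locks = defaultdict(threading.Lock)    # one lock per row key
        self._locks_guard = threading.Lock()         # protects the lock map itself

    def _lock_for(self, row_key):
        with self._locks_guard:
            return self._locks[row_key]

    def read_modify_write(self, row_key, column, fn, default=None):
        """Atomically apply fn to one cell; atomicity is scoped to a single row."""
        with self._lock_for(row_key):
            old = self._rows[row_key].get(column, default)
            new = fn(old)
            self._rows[row_key][column] = new
            return new

    def increment(self, row_key, column, delta=1):
        """Treat a cell as an integer counter."""
        return self.read_modify_write(row_key, column, lambda v: v + delta, default=0)

    def read(self, row_key, column):
        with self._lock_for(row_key):
            return self._rows[row_key].get(column)


def hammer(table, n):
    # Each worker bumps the same counter cell n times.
    for _ in range(n):
        table.increment("com.example/index", "stats:visits")


table = ToyTable()
threads = [threading.Thread(target=hammer, args=(table, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(table.read("com.example/index", "stats:visits"))  # 8000
```

Because every update to a row goes through that row's lock, no increments are lost even with eight concurrent writers. Nothing in this sketch coordinates updates across different row keys, which mirrors the limitation listed in the table: writes can be batched across rows, but they are not transactional across rows.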
Requirements
Bigtable’s functional requirements include wide applicability to many use cases, high performance, user-controlled data locality, the ability to do continuous updates, and atomic row operations. The non-functional requirements encompass storage and performance scalability, high read/write rates, availability, and durability.
Note: In this chapter on Bigtable, we focus on the system’s original architecture, design choices, and trade-offs introduced in this paper: Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., & Gruber, R. E. (2006). Bigtable: A Distributed Storage System for Structured Data. Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, WA, USA, 205-218. The system may have evolved since then.
Bigtable’s data model
A data model and an associated API are the cornerstones of any database. In this lesson, we will learn how Bigtable uses key-value stores to provide the abstraction of a table, along with the associated table-management and data-manipulation operations.
Bigtable is a