Google's Bigtable

Learn how to design a distributed database that can use automatic sharding to scale horizontally for better performance and storage space utilization.

Motivation

Google developed Bigtable in response to the limitations of traditional relational databases when dealing with hyperscale services and diverse data needs. Relational databases were optimized for static schemas and read-heavy workloads, struggling to handle growing data sizes, dynamic structures, and real-time processing required for applications like fraud detection, IoT data management, and financial transactions.

Traditional databases faced issues in both scalability and performance. They relied on vertical scalingVertical scaling, also known as scaling up, refers to scaling by providing additional capabilities (for example, additional CPUs or RAM) to an existing device., leading to the need for expensive hardware upgrades. Additionally, relational databases suffered from slower response times due to increased table quantities, data volume, and complex joins, making them less suitable for modern data-intensive applications. In response, Google created Bigtable, a wide-column storeWide-column databases, also known as column family databases, are a type of NoSQL database that provide faster writes and reads and also perform well for sparse tables. These stores feature characteristics of both regular relational databases and of key-value stores. Its design is based on a persistent, sparse matrix, multi-dimensional mapping (row-value, column-value, and timestamp) in a tabular format enabling huge scalability (petabytes of data). designed to efficiently handle massive amounts of data, varied column structures, and high-performance read-and-write operations. By addressing these deficiencies, Bigtable aimed to provide a highly scalable and performant solution tailored to the demands of Google’s diverse projects, such as web indexing, Google Earth, and Google Finance.

Note: The CAP theorem tells us why it is challenging to have a strongly consistent and highly available system under common faults such as network partitioning.

Common Uses of Bigtable

Features

Bigtable Characteristics

Single row transactions

Bigtable enables single row transactions, allowing users to execute atomic read-modify-write operations on data that is stored in a single row key.

Client interface

Bigtable has a client interface for batch writing over row keys, but it does not allow transactions across row keys.

Integer counters

Cells can be used as integer counters in Bigtable.

MapReduce jobs

Bigtable can be used in MapReduce jobs as both an input source and an output target due to a set of wrappers.

Writing scripts

Clients can also write Sawzall scripts (a language created by Google) to guide server-side data processing.

Requirements

Bigtable’s functional requirements include wide applicability to many use cases, high performance, user-controlled data locality, the ability to do continuous updates, and atomic row operations. The non-functional requirements encompass storage and performance scalability, high read/write rates, availability, and durability.

Note: In this chapter on Bigtable, we focus on the system’s original architecture, design choices, and trade-offs introduced in this paperChang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., & Gruber, R. E. (2006). Bigtable: A Distributed Storage System for Structured Data. Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, WA, USA, 205-218.. The system may have evolved since then.

Bigtable’s data model

A data model and an associated API are the cornerstones of any database. In this lesson, we will learn how Bigtable uses key-value stores to provide an abstraction of a table, associated table, and data manipulation operations.

Bigtable is a sparseIt does not need to store an entry in every cell ...