Databases in Distributed Systems

Shard records, replicate nodes, uphold consistency. System Design Interview Fast-Track.

We'll cover the following...

Overview
File storage
Database
Relational databases
- Why relational databases?
Why non-relational (NoSQL) databases?
- Types of NoSQL databases
Data replication
- Replication
  - Synchronous vs. asynchronous replication
Partitioning
- Vertical partitioning
- Horizontal partitioning
  - Key-range based partitioning
  - Hash-based partitioning
Conclusion

Overview

This lesson provides an overview of the challenges associated with storing data using simple file storage and highlights the advantages of using databases as a more efficient and scalable solution. It explores the two primary categories of databases, relational and non-relational, discussing their characteristics and intended use cases. The lesson also delves into the importance of data replication and partitioning techniques in achieving high availability, scalability, and performance.

Let's start with simple file storage in the following section.

File storage

The simplest and most convenient way to store an application's data is in a plain file. However, this approach has limitations: no concurrency management, limited access control, poor scalability, and inefficient search.

Primarily, databases are divided into the following two categories:

Relational databases are also called SQL databases because the primary language used to interact with them is SQL (Structured Query Language). SQL operations include inserting, updating, deleting, and retrieving data.
Non-relational databases are also called NoSQL (Not only SQL) databases because SQL is not the only language used to interact with them.

They differ in terms of their intended use case, the type of information they hold, and the storage method they employ.

Relational databases

Relational databases store data in structured schemas, organizing it into tables with unique keys for each row. Data entities are represented as instances (tuples) and attributes, with instances stored in rows and attributes in columns. The tables within a database can be linked using primary and foreign keys, allowing connections between tuples in different tables.
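As a minimal sketch of how primary and foreign keys link tuples across tables, the following uses Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and the helper function names are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_schema(conn):
    """Create two hypothetical tables linked by a foreign key."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               item TEXT NOT NULL
           )"""
    )


def orders_with_customer(conn):
    """Join each order row to its customer row through the foreign key."""
    query = (
        "SELECT c.name, o.item FROM orders AS o "
        "JOIN customers AS c ON c.id = o.customer_id ORDER BY o.id"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'keyboard')")
print(orders_with_customer(conn))
```

With the foreign key enforced, an order that points at a customer id that does not exist is rejected by the database instead of being stored as a dangling reference.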

Relational databases provide atomicity, consistency, isolation, and durability (ACID) properties to maintain the integrity of the database.

ACID is a big hammer by design: it is generic enough to apply to all problems. If a specific application only needs to guard against a few anomalies, a custom solution can deliver higher performance, at the cost of added complexity.
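To make atomicity concrete, here is a small sketch using Python's sqlite3 module. The accounts table, the CHECK constraint, and the transfer and balances helpers are assumptions made for illustration. The transfer credits the destination first, so a failed debit shows that the earlier credit is rolled back with it.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    """Return a mapping of account id to current balance."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

print(transfer(conn, 1, 2, 30), balances(conn))   # succeeds
print(transfer(conn, 1, 2, 500), balances(conn))  # fails, balances unchanged
```

The second transfer would overdraw account 1, so the debit violates the constraint and the credit already applied to account 2 is undone. The application code never has to repair a half-finished transfer itself.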

Why relational databases?

One of the greatest strengths of a relational database is its abstraction of ACID transactions and the related programming semantics, which makes it convenient for the programmer to use. Let’s explore some important features of relational databases.

Flexibility: In the context of SQL, data definition language (DDL) is a language used to create and modify the structure of database objects. It allows us to modify the database, including its tables and columns, rename tables, and make other changes. DDL even allows us to modify the schema while other queries are running and the database server is up.
Reduced redundancy: ...
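The flexibility point above can be sketched with a few DDL statements, again using Python's sqlite3 module. The users table, its columns, and the column_names helper are hypothetical. SQLite is an embedded database that supports only a subset of ALTER TABLE, so this shows the schema changing around existing data rather than a live multi-client server.

```python
import sqlite3


def column_names(conn, table):
    """List a table's column names; the name is field 1 of each table_info row."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")

# DDL changes the schema in place; the existing row is kept.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")
conn.execute("ALTER TABLE users RENAME TO members")
conn.commit()

print(column_names(conn, "members"))
print(conn.execute("SELECT name, email FROM members").fetchall())
```

The row inserted before the schema change is still readable afterward, with the new email column holding NULL until it is populated.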
