Databases in Distributed Systems

Learn about databases, their types, and how data replication and partitioning are handled in them.

Overview

This lesson provides an overview of the challenges associated with storing data using simple file storage and highlights the advantages of using databases as a more efficient and scalable solution. It explores the two primary categories of databases, relational and non-relational, discussing their characteristics and intended use cases. The lesson also delves into the importance of data replication and partitioning techniques in achieving high availability, scalability, and performance.

Let's start with simple file storage in the following section.

File storage

The simplest and most convenient way to store an application's data is in a plain file. However, this approach has limitations: no concurrency management, limited access rights, and difficulty scaling and searching.
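As a rough illustration of the search and concurrency limitations, the following minimal sketch stores contacts in a plain text file. The file name, the record layout, and the helper names `append_record` and `find_phone` are assumptions made for this example only.

```python
import os
import tempfile


def append_record(path, name, phone):
    # Append one "name,phone" line. Nothing here coordinates concurrent
    # writers, so two processes appending at once are not managed.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name},{phone}\n")


def find_phone(path, name):
    # Linear scan: every lookup reads the file from the beginning,
    # which becomes slow as the file grows.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record_name, phone = line.rstrip("\n").split(",", 1)
            if record_name == name:
                return phone
    return None


# Hypothetical file location used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "contacts.txt")
append_record(path, "alice", "555-0100")
append_record(path, "bob", "555-0101")
print(find_phone(path, "bob"))
```

There is no index, no locking, and no per-record access control in this sketch; each of those would have to be built by hand on top of the file.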

The limitations of simple file storage can be addressed using databases, which the following sections explore.

Database

A database is an organized collection of data that can be easily managed and accessed. Databases are created to make it easier to store, retrieve, modify, and delete data in connection with different data-processing procedures.

Primarily, databases are divided into the following two categories:

  • Relational databases are also called SQL databases because the primary language used to interact with these databases is SQL (Structured Query Language). The SQL operations include insertion, deletion, and retrieval of data.

  • Non-relational databases are also called NoSQL (Not only SQL) because SQL is not the only primary language for interacting with such databases.

They differ in terms of their intended use case, the type of information they hold, and the storage method they employ.

Relational databases, like phone books that record contact numbers and addresses, are organized and have predetermined schemas. Non-relational databases, like file directories that store anything from a person’s contact information to shopping preferences, are unstructured, scattered, and feature a dynamic schema.
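The contrast between a predetermined schema and a dynamic one can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module for the relational side and a plain list of dictionaries as a stand-in for a document-style store. The table and field names are made up for the example.

```python
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT NOT NULL, phone TEXT NOT NULL)")
conn.execute(
    "INSERT INTO contacts (name, phone) VALUES (?, ?)", ("alice", "555-0100")
)

# Writing to a column that is not in the schema is rejected.
try:
    conn.execute(
        "INSERT INTO contacts (name, favorite_color) VALUES (?, ?)",
        ("bob", "green"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Document-style side (stand-in): each record carries its own fields.
documents = [
    {"name": "alice", "phone": "555-0100"},
    {"name": "bob", "favorite_color": "green", "cart": ["book", "pen"]},
]
fields = [sorted(doc) for doc in documents]

print(schema_rejected)
print(fields)
```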

Relational databases

Relational databases store data in structured schemas, organizing it into tables with unique keys for each row. Data entities are represented as instances (tuples) and attributes, with instances stored in rows and attributes in columns. The tables within a database can be linked using primary and foreign keys, allowing connections between tuples in different tables.
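A minimal sketch of linking two tables through primary and foreign keys, using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their columns are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Each row in customers is identified by its primary key.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
# Each order points at a customer row through a foreign key.
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (10, 1, 'book')")

# Follow the foreign key to connect tuples in the two tables.
rows = conn.execute(
    """
    SELECT customers.name, orders.item
    FROM orders
    JOIN customers ON customers.id = orders.customer_id
    """
).fetchall()

# An order that references a missing customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders (id, customer_id, item) VALUES (11, 99, 'pen')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows)
print(orphan_rejected)
```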

Relational databases provide atomicity, consistency, isolation, and durability (ACID) properties to maintain the integrity of the database.

ACID is like a big hammer by design: it's generic enough to cover all problems. If a specific application only needs to guard against a few anomalies, there’s a window of opportunity to use a custom solution for higher performance, at the cost of added complexity.
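To make atomicity concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the balances are assumptions for illustration. The transfer either applies both updates or neither.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    # Hypothetical helper: credit dst, then debit src, as one transaction.
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the earlier credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to `bob` runs first and succeeds, but it is undone when the debit violates the constraint, so no partial transfer is ever visible.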

Why relational databases?

...