Cost Benefit Analysis of Centralised and Distributed Database

Learn when to use horizontal sharding instead of vertical sharding and vice versa.

Which database sharding approach is best for your organization?

For your organization and business to grow, prevent downtime, and reduce latency, you must scale your resources accordingly. You can scale these resources through a combination of adjustments to network bandwidth, CPU and physical memory requirements, and hard disk adjustments. Both horizontal sharding and vertical sharding involve adding resources to your computing infrastructure; your business stakeholders must decide which is right for your organization.

The following sections explains the pros and cons of no-sharding vs sharding.

Pros and cons of centralized database (No-Sharding)

Advantages

Data maintenance such as updating and taking backups of a centralized database.
Centralized database has more data consistency than a distributed database.
Is more efficient for businesses having a small amount of data to store.

Disadvantages

Centralized database takes time when data is retrieved because the number of users accessing the centralized database is higher than the distributed database.
Database is inaccessible in the event of failures.

Advantages and disadvantages of distributed database

Advantages

Accessing data in a distributed database is easier and fast as data is retrieved from the nearest database shard or the one frequently used.
Data not required by local applications is not available to unauthorized users.
Provides materialized view for frequent data access.
Intensive transactions consisting of queries can be divided into subqueries which can be processed in a parallel fashion.

Disadvantages

Sometimes data is required from multiple sites, which takes more time than expected.
As relations are divided (vertically/horizontally) among different nodes; therefore, expensive operations such as joins and unions are needed to reconstruct complete relations.
Maintaining consistency of data across sites is difficult and requires extra measures.
Updation and backups in distributed database take time to synchronize data

Assume the distribution of both tables on different sites as

The Store table has 10,000 tuples stored at site A
The Product table has 100,000 tuples stored at site B
The Sales has 1 Million tuples store at site A.

Now assume that the following query needs to be processed.

Select Store_key from (Store JOIN Sales JOIN Product)
where Region= 'East' AND Brand='Wolf'

Assume every stored tuple is 25 bytes long. Furthermore, estimated cardinalities of certain intermediate results are:

Number of Wolf brand =10
Number of East region is 100,000

Communication assumptions are:

Data rate $=$ 50,000 bit per second
Access delay = 0.1 second

Now let’s consider the following three strategies for processing the above query.

Let

$a=$ total access delay

$b=$ ...

Create a free account to access the full course.

By signing up, you agree to Educative's Terms of Service and Privacy Policy

Introduction

Abstractions

Non-functional System Characteristics

Back-of-the-Envelope Calculations

Building Blocks

Domain Name System (DNS)

Sequencer

Rate Limiter

Distributed Cache

Blob Store

Content Delivery Network (CDN)

Load Balancers

Key-Value Store

Distributed Messaging Queue

Pub-sub

Distributed Task Scheduler

Distributed Search

Distributed Logging

Distributed Monitoring

Monitoring Server Side Errors

Monitoring Client Side Errors

Databases

Sharded Counters

Concluding Building Blocks

Design YouTube

Design Quora

Design Google Maps

Designing a Proximity Server like Yelp

Design Uber

Design Twitter

Newsfeed System

Design Instagram

Design URL Shortening Service / TinyURL

Design a Web Crawler

Design WhatsApp

Design Typeahead Suggestion

Design Collaborative Document Editing Service / Google Docs

Spectacular Failures

Concluding Remarks

Appendix: System Design Interviews

All content below this will likely go away

Design Exercises

Archived temporary lessons

Design Resource Allocator for a Large Datacenter

Design Zoom

Continuous Monitoring using Data Processing

Design Live Commenting at Facebook

Security

For Noor: Placeholder for Illustration Making

Appendix

Backup of our Lessons

Caching Billions of Tiny Objects on Flash

Design Quora

Copy-Design YouTube

Identity & Access Management

Copy of CDN (02-03-2022)

Cost Benefit Analysis of Centralised and Distributed Database

Which database sharding approach is best for your organization?

Pros and cons of centralized database (No-Sharding)

Advantages

Disadvantages

Advantages and disadvantages of distributed database

Advantages

Disadvantages

Query optimization and processing speed in a distributed database