Cost Benefit Analyses

Learn when to use horizontal sharding instead of vertical sharding and vice versa.

Which database sharding approach is best?

Both horizontal and vertical sharding involve adding resources to our computing infrastructure, and our business stakeholders must decide which approach suits our organization. To grow the business, prevent downtime, and reduce latency, we must scale our resources accordingly. We can do this through a combination of adjustments to CPU, physical memory, disk storage, and network bandwidth.
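The two approaches differ in what gets split. The following minimal Python sketch makes that concrete. It assumes a hypothetical users table and uses simple modulo placement for horizontal sharding, which is only one of several possible placement schemes. Horizontal sharding splits rows, so every shard keeps all columns. Vertical sharding splits columns, so every shard keeps all rows plus the key.

```python
from typing import Any

Row = dict[str, Any]

def horizontal_shard(rows: list[Row], key: str, num_shards: int) -> list[list[Row]]:
    """Split rows across shards; every shard keeps all columns."""
    shards: list[list[Row]] = [[] for _ in range(num_shards)]
    for row in rows:
        # Modulo placement on an integer key (assumption: real systems often
        # use consistent hashing or key ranges instead).
        shards[row[key] % num_shards].append(row)
    return shards

def vertical_shard(rows: list[Row], key: str, column_groups: list[list[str]]) -> list[list[Row]]:
    """Split columns across shards; every shard keeps all rows plus the key."""
    return [
        [{key: row[key], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]

# Hypothetical users table used only for illustration.
users = [
    {"id": 1, "name": "Ana", "email": "ana@example.com", "bio": "Runner"},
    {"id": 2, "name": "Bo", "email": "bo@example.com", "bio": "Chess player"},
    {"id": 3, "name": "Cy", "email": "cy@example.com", "bio": "Painter"},
]

by_rows = horizontal_shard(users, "id", num_shards=2)
by_columns = vertical_shard(users, "id", [["name", "email"], ["bio"]])
print([[r["id"] for r in shard] for shard in by_rows])  # [[2], [1, 3]]
print(by_columns[1][0])  # {'id': 1, 'bio': 'Runner'}
```

Horizontal sharding spreads load when the number of rows is the bottleneck. Vertical sharding helps when some columns are large or accessed separately. The cost is a join on the key whenever a full row is needed.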

The following sections explain the pros and cons of not sharding (a centralized database) versus sharding (a distributed database).

Pros and cons of a centralized database

Advantages

Data maintenance, such as updates and backups, is easy in a centralized database.
Centralized databases provide stronger consistency and simpler ACID transactions than distributed databases.
Centralized databases provide a much simpler programming model for end programmers than distributed databases.
A centralized database is more efficient for businesses whose data is small enough to fit on a single node.
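On a single node, an ACID transaction is cheap to obtain. The following sketch uses Python's built-in sqlite3 module with a hypothetical accounts table to illustrate atomicity. Both updates of a transfer commit together, or neither does. No cross-site coordination is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move an amount atomically between two hypothetical accounts."""
    # Using the connection as a context manager commits on success and
    # rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, 1, 2, 30)  # succeeds: balances become 70 and 80
try:
    # The debit violates the CHECK constraint, so the credit is rolled back too.
    transfer(conn, 2, 1, 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 70, 2: 80}
```

Achieving the same guarantee when the two accounts live on different shards requires a distributed commit protocol. This is part of why the centralized model is simpler for programmers.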

Disadvantages

A centralized database can slow down, increasing latency for end users, when the number of queries per second it receives approaches the limits of a single node.
A centralized database is a single point of failure, so the probability that it becomes inaccessible is much higher.
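One way to see why latency climbs as load nears a single node's limit is a simple queueing model. The sketch below assumes, purely for illustration, that the node behaves like an M/M/1 queue (Poisson arrivals, exponential service times). Under that model, the mean response time is $W = 1/(\mu - \lambda)$, where $\mu$ is the node's capacity in queries per second and $\lambda$ is the arrival rate. The capacity value below is hypothetical.

```python
def mm1_mean_response_time(arrival_qps: float, capacity_qps: float) -> float:
    """Mean time (seconds) a query spends queued plus in service in an M/M/1 model."""
    if arrival_qps >= capacity_qps:
        # Once load reaches capacity, the queue grows without bound.
        return float("inf")
    return 1.0 / (capacity_qps - arrival_qps)

capacity = 1000.0  # hypothetical single-node capacity in queries per second
for load in (500, 900, 990, 999):
    ms = mm1_mean_response_time(load, capacity) * 1000
    print(f"{load} QPS -> {ms:.1f} ms")
```

Under this model, going from 50% to 99.9% utilization raises mean latency from 2 ms to 1,000 ms. Real systems deviate from M/M/1, but the sharp rise near capacity is the qualitative point.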

Advantages and disadvantages of a distributed database

Advantages

Data access in a distributed database can be fast, because data is retrieved from the nearest shard or from the shard where it is most frequently used.
Data with different levels of distribution transparency can be stored in separate places.
Query-intensive transactions can be divided into multiple optimized subqueries that are processed in parallel.
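Splitting a query into parallel subqueries is often described as scatter-gather. The following minimal sketch assumes three hypothetical shards, each holding part of a Sales-like table as (store_key, amount) rows. The sketch computes per-store totals. Each shard computes a partial aggregate, and the partial results are then merged. A thread pool stands in for the network calls that a real system would make to each shard.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds some (store_key, amount) rows.
SHARDS = [
    [(1, 20), (2, 35)],
    [(1, 15), (3, 40)],
    [(2, 10), (3, 5)],
]

def subquery_total_by_store(shard: list[tuple[int, int]]) -> dict[int, int]:
    """Partial aggregate computed locally on one shard."""
    totals: dict[int, int] = {}
    for store_key, amount in shard:
        totals[store_key] = totals.get(store_key, 0) + amount
    return totals

def total_by_store(shards: list[list[tuple[int, int]]]) -> dict[int, int]:
    # Scatter: run one subquery per shard in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(subquery_total_by_store, shards))
    # Gather: merge the partial results into the final answer.
    merged: dict[int, int] = {}
    for partial in partials:
        for store_key, subtotal in partial.items():
            merged[store_key] = merged.get(store_key, 0) + subtotal
    return merged

print(total_by_store(SHARDS))  # {1: 35, 2: 45, 3: 45}
```

This split works well for aggregates such as sums and counts, where partial results combine cheaply. Operations that need rows from several shards at once, such as joins, do not decompose as neatly.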

Disadvantages

When a query needs data from multiple sites, retrieving it can take longer than expected.
Because relations are partitioned (vertically or horizontally) across different nodes, operations such as joins must reconstruct complete relations by fetching data from several nodes, which makes these operations much more expensive and complex.
Maintaining data consistency across sites is difficult and requires extra measures.
Updates and backups in a distributed database take time because data must be synchronized across sites.
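Reconstructing partitioned relations, as described above, means different things for the two kinds of partitioning. The sketch below uses hypothetical fragments. Vertical fragments are rejoined on the primary key that every fragment keeps. Horizontal fragments are combined by a union. In a real deployment, each fragment read from another node is a network transfer. That transfer is the extra cost the list above refers to.

```python
# Hypothetical vertical fragments of a Product relation held on two nodes.
# Each fragment keeps the primary key (product_key) so rows can be rebuilt.
node_1 = {101: ("Kettle", "Wolf"), 102: ("Toaster", "Acme")}  # key -> (name, brand)
node_2 = {101: (40,), 102: (25,)}                              # key -> (price,)

def reconstruct_vertical(frag_a: dict, frag_b: dict) -> dict:
    """Rebuild full tuples by joining two vertical fragments on the key."""
    # In a real system, reading frag_b from another node costs a network transfer.
    return {key: frag_a[key] + frag_b[key] for key in frag_a.keys() & frag_b.keys()}

# Hypothetical horizontal fragments: same columns, disjoint sets of rows.
site_a = {1: ("Store 1", "East")}
site_b = {2: ("Store 2", "West")}

def reconstruct_horizontal(*fragments: dict) -> dict:
    """Rebuild a horizontally partitioned relation as the union of its fragments."""
    merged: dict = {}
    for fragment in fragments:
        merged.update(fragment)  # assumes the fragments hold disjoint keys
    return merged

print(reconstruct_vertical(node_1, node_2)[101])  # ('Kettle', 'Wolf', 40)
print(reconstruct_horizontal(site_a, site_b))
```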

Assume that the three tables are distributed across sites as follows:

The Store table has 10,000 tuples stored at site A.
The Product table has 100,000 tuples stored at site B.
The Sales table has 1 million tuples stored at site A.

Now assume that the following query needs to be processed.

SELECT Store_key FROM Store NATURAL JOIN Sales NATURAL JOIN Product
WHERE Region = 'East' AND Brand = 'Wolf';

This query joins the Store, Sales, and Product tables. It then returns the Store_key values of the joined rows where the region is East and the brand is Wolf.
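To see what the query computes, the following sketch runs it against tiny in-memory tables using Python's sqlite3 module. The schema is an assumption. The lesson names only the three tables and the Store_key, Region, and Brand columns, so Product_key, Amount, and all rows below are hypothetical. The join is written with NATURAL JOIN, which matches tables on their shared column names: Store_key between Store and Sales, and Product_key between Sales and Product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star-schema columns and rows, for illustration only.
    CREATE TABLE Store   (Store_key INTEGER PRIMARY KEY, Region TEXT);
    CREATE TABLE Product (Product_key INTEGER PRIMARY KEY, Brand TEXT);
    CREATE TABLE Sales   (Store_key INTEGER, Product_key INTEGER, Amount INTEGER);

    INSERT INTO Store   VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO Product VALUES (10, 'Wolf'), (11, 'Acme');
    INSERT INTO Sales   VALUES (1, 10, 5), (2, 10, 7), (3, 11, 2), (1, 11, 4);
""")

rows = conn.execute("""
    SELECT Store_key
    FROM Store NATURAL JOIN Sales NATURAL JOIN Product
    WHERE Region = 'East' AND Brand = 'Wolf'
""").fetchall()
print(rows)  # [(1,)]
```

On one node this is a routine query. In the distributed setting above, Store and Sales sit at site A while Product sits at site B. Answering the query therefore requires moving data between sites, and the choice of what to move drives the cost.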

Assume every stored tuple is 200 bits (25 bytes) long. Furthermore, the estimated cardinalities of certain intermediate results are:

Number of Wolf brand products: 10
Number of Sales tuples for East region stores: 100,000
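The tuple size makes it easy to convert each cardinality into the volume of data that would cross the network if shipped. The sketch below does that arithmetic using only the figures stated above. It also includes a pure transmission-time helper. The data rate is a parameter, because its value is not given in this section. The helper ignores any per-message delays or protocol overhead.

```python
TUPLE_BITS = 200  # from the lesson: every stored tuple is 200 bits (25 bytes)

cardinalities = {
    "Store table (site A)": 10_000,
    "Product table (site B)": 100_000,
    "Sales table (site A)": 1_000_000,
    "Wolf brand products": 10,
    "East region intermediate result": 100_000,
}

def size_bits(tuples: int, tuple_bits: int = TUPLE_BITS) -> int:
    """Volume of a relation or intermediate result, in bits."""
    return tuples * tuple_bits

def transfer_seconds(bits: int, data_rate_bps: float) -> float:
    """Pure transmission time at a given data rate; ignores access delay."""
    return bits / data_rate_bps

for name, n in cardinalities.items():
    print(f"{name}: {size_bits(n):,} bits")
```

The volumes differ by five orders of magnitude. Shipping the 10 Wolf products (2,000 bits) costs far less than shipping the whole Sales table (200,000,000 bits), whatever the actual data rate is.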

Communication assumptions are:

Data rate

...

