Keeps application running

Spark degrades gracefully when memory is insufficient: instead of failing, the application keeps running with reduced performance.

For instance, when partitions don’t fit in memory, Spark can either recompute them on demand or spill them to disk.
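The recompute-on-demand behavior can be illustrated with a toy model. The sketch below is not Spark code: `PartitionCache`, its `capacity` limit, and the `compute` function are hypothetical names chosen for illustration, and the "no eviction, just skip caching" policy is an assumption made to keep the example short. It only shows that a partition which does not fit in memory can be rebuilt when it is needed again, at the cost of extra work.

```python
from typing import Callable, Dict, List


class PartitionCache:
    """Toy cache that keeps at most `capacity` partitions in memory (hypothetical)."""

    def __init__(self, capacity: int, compute: Callable[[int], List[int]]):
        self.capacity = capacity
        self.compute = compute  # stands in for the recipe that builds a partition
        self.in_memory: Dict[int, List[int]] = {}
        self.computations = 0   # counts how often a partition had to be built

    def get(self, partition_id: int) -> List[int]:
        # Serve from memory when the partition is cached.
        if partition_id in self.in_memory:
            return self.in_memory[partition_id]
        # Otherwise build it (again) instead of failing.
        data = self.compute(partition_id)
        self.computations += 1
        # Cache it only if there is still room.
        if len(self.in_memory) < self.capacity:
            self.in_memory[partition_id] = data
        return data


cache = PartitionCache(capacity=2, compute=lambda pid: [pid * 10 + i for i in range(3)])
first_pass = [cache.get(pid) for pid in range(4)]
second_pass = [cache.get(pid) for pid in range(4)]
```

Both passes return the same data, but the second pass has to rebuild the two partitions that never fit in the cache. The application stays correct and simply does more work.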

Increasing performance

Wide dependencies cause more data to be exchanged between nodes than narrow dependencies do, so performance can be improved significantly by reducing the number of wide dependencies or the amount of data that needs to be shuffled. One way to do this is by pre-aggregating data on each node before the shuffle, also known as map-side reduction.
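The effect of map-side reduction can be sketched in plain Python with a word count. This is a simulation under stated assumptions, not Spark code: the lists in `partitions` stand in for data held on different nodes, and the function names are hypothetical. The number of records each function returns stands in for the amount of data that would cross the network during the shuffle.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def shuffle_without_combiner(partitions: List[List[str]]) -> List[Record]:
    # Every word becomes its own (word, 1) record that is sent over the network.
    return [(word, 1) for part in partitions for word in part]


def shuffle_with_combiner(partitions: List[List[str]]) -> List[Record]:
    # Each partition is pre-aggregated locally, so it sends one record per distinct word.
    records: List[Record] = []
    for part in partitions:
        records.extend(Counter(part).items())
    return records


def reduce_side(records: List[Record]) -> Dict[str, int]:
    # Final aggregation after the shuffle; identical for both variants.
    totals: Counter = Counter()
    for word, count in records:
        totals[word] += count
    return dict(totals)


partitions = [
    ["a", "b", "a", "a"],  # data on a hypothetical node 1
    ["b", "b", "c"],       # data on a hypothetical node 2
]
naive = shuffle_without_combiner(partitions)
combined = shuffle_with_combiner(partitions)
```

Here `naive` holds 7 records and `combined` holds 4, while `reduce_side` produces the same totals for both. In Spark's RDD API, `reduceByKey` applies this kind of local combining before the shuffle, whereas `groupByKey` does not.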

Note: As explained previously, the map-side reduction is a capability provided in the MapReduce framework through ...
