Preventing Disaster

Learn about load shedding, services checking their own response times, queuing time, and processing time.

We'll cover the following:

Load shedding
Service self checks
Queuing time vs. processing time
TIME_WAIT and the bogons
Summary of health checks

Load shedding

We can see that the best thing to do under high load is to turn away work we can’t complete in time. This is called “load shedding,” and it’s the most important way to control incoming demand. When a socket’s listen queue is full, load is shed very quickly, and a quick rejection is better than a slow timeout.
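The listen queue sheds load inside the kernel. A service can apply the same idea one level up by capping how many requests it works on at once and rejecting the excess immediately instead of letting it wait. The sketch below is a minimal illustration under that assumption; the names `LoadShedder`, `max_in_flight`, and `handle` are hypothetical, and the `(status, body)` tuple stands in for a real HTTP response.

```python
import threading


class LoadShedder:
    """Hypothetical admission gate: at most `max_in_flight` requests run at once."""

    def __init__(self, max_in_flight: int) -> None:
        # Each permit in the semaphore represents one request slot.
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def try_admit(self) -> bool:
        # blocking=False returns False at once when no slot is free,
        # so the caller is rejected instead of queued.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(shedder: LoadShedder, work):
    """Return an (HTTP status, body) pair; 503 means "too busy, try later"."""
    if not shedder.try_admit():
        return 503, "Service Unavailable"
    try:
        return 200, work()
    finally:
        # Always give the slot back, even if the work raises.
        shedder.release()
```

With `LoadShedder(1)`, a second request that arrives while the first still holds the only slot gets a 503 immediately rather than waiting behind it. The cap itself is an assumption; in practice it would be tuned to what the service can finish within its SLA.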

More generally, we want to shed load as early as possible so we can avoid tying up resources at several tiers before rejecting the request. Load balancers near the network edge are the ideal place. A good health check on the first tier of services can inform the load balancer when response times are too high, that is, higher than the service’s SLA allows. The load balancer also needs to be configured to send back an HTTP 503 (Service Unavailable) response code when all instances fail their health checks. That’s a quick response to the caller that says “too busy, try later.”
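Here is one way such a health check and the load balancer’s fallback might fit together. It is a sketch, not any particular load balancer’s behavior: we assume the service records its own response times in a sliding window and reports unhealthy when the 95th percentile exceeds the SLA, and we model the balancer as a function that picks the first healthy instance or answers 503 when none are healthy. `LatencyHealthCheck`, `health_endpoint`, `route`, the window size, and the percentile are all hypothetical choices.

```python
import math
import time
from collections import deque


class LatencyHealthCheck:
    """Hypothetical self-check: unhealthy when recent latency exceeds the SLA."""

    def __init__(self, sla_seconds: float, window: int = 100) -> None:
        self.sla_seconds = sla_seconds
        # Only the most recent `window` response times are kept.
        self._samples = deque(maxlen=window)

    def record(self, elapsed_seconds: float) -> None:
        self._samples.append(elapsed_seconds)

    def timed(self, fn):
        # Run a request handler and record how long it took.
        start = time.perf_counter()
        try:
            return fn()
        finally:
            self.record(time.perf_counter() - start)

    def is_healthy(self) -> bool:
        if not self._samples:
            return True  # no evidence of trouble yet
        ordered = sorted(self._samples)
        # Nearest-rank 95th percentile of the window.
        idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
        return ordered[idx] <= self.sla_seconds


def health_endpoint(check: LatencyHealthCheck):
    """What the service returns when the load balancer polls its health URL."""
    if check.is_healthy():
        return 200, "OK"
    return 503, "response time above SLA"


def route(instances: dict):
    """Load-balancer side: pick a healthy instance, or 503 if all checks fail."""
    healthy = [name for name, check in instances.items() if check.is_healthy()]
    if not healthy:
        return 503, None  # "too busy, try later"
    return 200, healthy[0]  # a real balancer would rotate among healthy instances
```

Using a high percentile rather than the mean means a handful of very slow requests can mark an instance unhealthy even when most requests are fast, which matches the goal of keeping callers within the SLA.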

Service self checks