Facebook, WhatsApp, Instagram, Oculus Outage - 2021-10-04
Learning from a major Facebook outage.
On October 4, 2021, at 15:39 UTC, the social network Facebook and its subsidiaries (Messenger, Instagram, WhatsApp, Mapillary, and Oculus) suffered a global outage that lasted about six hours. The media covered the failure prominently (for example, the NYT reported: “Gone in Minutes, Out for Hours: Outage Shakes Facebook”). According to one estimate, the outage cost Facebook about $100 million in lost revenue, and many billions more through the decline in the company’s stock price.
Let’s look at the sequence of events that caused this global problem.
Sequence of Events
- A routine maintenance system needed to determine the spare capacity available on Facebook’s backbone network.
- Due to a configuration error, the maintenance system disconnected all the data centers from each other on the backbone network. A separate automated configuration review tool failed to catch the error.
- Facebook’s authoritative Domain Name System (DNS) servers had a health-check rule: if a server cannot reach Facebook’s internal data centers, it stops replying to client DNS queries by withdrawing its network routes.
- Once the network routes to Facebook’s authoritative DNS servers were withdrawn, the cached mappings of human-readable names to IP addresses soon timed out at all public DNS resolvers. (When a client resolves www.facebook.com, the DNS resolver first goes to one of the root DNS
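
The interaction between the health-check rule and resolver caching described in the events above can be sketched with a toy simulation. This is only an illustrative model, not Facebook's actual implementation: the class names `AuthoritativeDNS` and `CachingResolver`, the 300-second TTL, and the address `192.0.2.10` (taken from a range reserved for documentation) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AuthoritativeDNS:
    """Toy authoritative DNS server (hypothetical model, not Facebook's code)."""
    records: dict
    route_advertised: bool = True

    def health_check(self, can_reach_datacenters: bool) -> None:
        # Rule from the text: withdraw the route when the internal
        # data centers cannot be reached.
        self.route_advertised = can_reach_datacenters

    def query(self, name: str):
        # With the route withdrawn, the server is unreachable from outside.
        if not self.route_advertised:
            return None
        return self.records.get(name)


class CachingResolver:
    """Toy public resolver that caches answers for a fixed TTL (assumed value)."""

    def __init__(self, upstream: AuthoritativeDNS, ttl_seconds: int):
        self.upstream = upstream
        self.ttl = ttl_seconds
        self.cache = {}  # name -> (ip, expiry_time)

    def resolve(self, name: str, now: int):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # answer still valid in the cache
        ip = self.upstream.query(name)
        if ip is None:
            self.cache.pop(name, None)  # expired and cannot be refreshed
            return None
        self.cache[name] = (ip, now + self.ttl)
        return ip


dns = AuthoritativeDNS({"www.facebook.com": "192.0.2.10"})
resolver = CachingResolver(dns, ttl_seconds=300)

before = resolver.resolve("www.facebook.com", now=0)        # fetched and cached
dns.health_check(can_reach_datacenters=False)               # backbone is down
during_ttl = resolver.resolve("www.facebook.com", now=100)  # served from cache
after_ttl = resolver.resolve("www.facebook.com", now=400)   # cache expired

print(before, during_ttl, after_ttl)
```

In this sketch the name keeps resolving for a short while after the route is withdrawn, because the resolver still holds a cached answer. Once the TTL expires, the resolver cannot refresh the entry and resolution fails, which mirrors how the outage spread to public resolvers shortly after the routes disappeared.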