
Facebook, WhatsApp, Instagram, Oculus Outage - 2021-10-04 COPY

Learning from a major Facebook outage.

On October 4, 2021, at 15:39 UTC, the social network Facebook and its subsidiaries (Messenger, Instagram, WhatsApp, Mapillary, Oculus) experienced a global outage that lasted about six hours. The popular media covered the failure prominently (for example, the NYT headline read: “Gone in Minutes, Out for Hours: Outage Shakes Facebook”). According to one estimate, the outage cost Facebook about $100 million in lost revenue, and many billions more in market value as the company's stock declined.

Let's now look at the sequence of events that caused this global problem.

Sequence of Events

  • A routine maintenance system needed to assess the spare capacity of Facebook’s backbone network.
  • Due to a configuration error, the maintenance system disconnected all of Facebook’s data centers from each other on the backbone network. An automated configuration review tool, which should have caught this error, missed it.
  • Facebook’s authoritative Domain Name System (DNS) servers had a health-check rule: if a server could not reach Facebook’s internal data centers, it stopped answering client DNS queries by withdrawing its network routes.
  • When the network routes to Facebook’s authoritative DNS servers were withdrawn, the cached mappings of human-readable names to IP addresses soon timed out at every public DNS resolver. (When a client resolves www.facebook.com, the DNS resolver first goes to one of the root DNS
...
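The configuration review tool in the second step is the kind of safeguard meant to reject a change before it partitions the backbone. The sketch below is a hypothetical illustration, not Facebook's actual tool: it models the backbone as an undirected graph of data centers and links, and approves a change only if a breadth-first search still reaches every data center afterwards. The function names `datacenters_connected` and `audit_change` and the data center names are made up for this example.

```python
from collections import deque


def datacenters_connected(datacenters, links):
    """Return True if every data center can reach every other one.

    `links` is a set of (a, b) tuples, each an undirected backbone link
    between two data centers in `datacenters`.
    """
    if not datacenters:
        return True
    adjacency = {dc: set() for dc in datacenters}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Breadth-first search from an arbitrary data center.
    start = next(iter(datacenters))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen == set(datacenters)


def audit_change(datacenters, links, links_to_disable):
    """Approve a change only if the backbone stays connected after it.

    Links to disable must use the same (a, b) order as in `links`,
    because removal is a plain set difference.
    """
    return datacenters_connected(datacenters, links - links_to_disable)


dcs = {"dc-a", "dc-b", "dc-c"}
links = {("dc-a", "dc-b"), ("dc-b", "dc-c"), ("dc-a", "dc-c")}
print(audit_change(dcs, links, {("dc-a", "dc-b")}))  # True: traffic can still go via dc-c
print(audit_change(dcs, links, links))               # False: every link is taken down
```

A real audit would work on far richer topology and policy data; the point here is only that "does every data center still reach the others?" is a cheap check to run before applying a change.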
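The health-check rule in the third step can be modelled in a few lines. This is a minimal sketch under simplifying assumptions: `AuthoritativeDnsServer`, `health_check`, and the server name are hypothetical, a Boolean flag stands in for the real route announcement, and `192.0.2.10` is a documentation-only placeholder address, not Facebook's.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthoritativeDnsServer:
    """Hypothetical authoritative DNS node with the health-check rule from the text."""
    name: str
    records: Dict[str, str]        # hostname -> IP address
    route_advertised: bool = True  # stands in for the node's route announcement

    def health_check(self, can_reach_datacenters: bool) -> None:
        # If the data centers are unreachable, withdraw the route so clients
        # stop sending queries here; re-advertise once they are reachable
        # again (the recovery behavior is an assumption).
        self.route_advertised = can_reach_datacenters

    def query(self, hostname: str) -> Optional[str]:
        # With the route withdrawn, queries never arrive: modelled as no answer.
        if not self.route_advertised:
            return None
        return self.records.get(hostname)


server = AuthoritativeDnsServer("dns-1", {"www.facebook.com": "192.0.2.10"})
print(server.query("www.facebook.com"))  # 192.0.2.10
server.health_check(can_reach_datacenters=False)  # the backbone is down
print(server.query("www.facebook.com"))  # None
```

Note that the DNS server object itself is still running and still holds its records; the rule only makes it unreachable. That is how a backbone failure inside Facebook became a DNS failure visible to the whole Internet.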
