Incorrect BGP changes led to a 6-hour outage of Facebook, Instagram and WhatsApp

Facebook has experienced the largest outage in its history, as a result of which all of the company's services, including Facebook.com, Instagram.com and WhatsApp, were unavailable for about 6 hours, from 18:39 (MSK) on Monday to 0:28 (MSK) on Tuesday. The source of the failure was a change to the BGP settings on the backbone routers that carry traffic between data centers, which led to a cascading loss of connectivity between Facebook's data centers and the rest of the global network. From the outside, it looked as if someone had unplugged the cables from all of Facebook's data centers.


Notably, the failure also disrupted internal information and communication systems. Staff, most of whom were working remotely, could not connect to the infrastructure or contact their colleagues, which significantly complicated the recovery work, since the key network engineers were also working remotely. Moreover, there were problems with gaining physical access, since employee ID cards and the building access control system were tied to centralized services and also stopped working.

The failure also affected the system for exchanging information between domain registrars (Facebook's domains are served by its own registrar): some major registrars, including GoDaddy, listed the Facebook.com domain as available for sale. This exposed a new layer of potential problems, namely the possibility of attacking registrars in order to hijack domains.

Moreover, it is still not clear whether the changes made to the BGP settings were an accidental error or the result of an attack or other malicious activity. In its published statement, Facebook limited itself to asserting that it has no evidence that user data was compromised. It is noteworthy that, by coincidence, the outage occurred a few hours after CBS aired an interview with Frances Haugen about abuses at Facebook.

An analysis by Cloudflare showed that at the time of the failure, BGP route announcements stopped for the subnets containing Facebook's DNS servers, which made it impossible to resolve IP addresses for domains such as Facebook.com and Instagram.com. The TTL parameter, which determines how long the domain data may be cached, was set to 5 minutes, so third-party DNS servers quickly stopped returning address information. Routing to the company's remaining IP addresses continued to work, but was of little use without DNS to map domain names to those addresses.
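
The effect of the short TTL can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Authoritative` and `CachingResolver` classes, the `simulate` function and the address `192.0.2.10` (taken from a range reserved for documentation) are hypothetical and do not describe Facebook's or any real resolver's implementation. Real resolvers may behave differently, for example by retrying or by serving stale records.

```python
from dataclasses import dataclass, field
from typing import Optional

TTL_SECONDS = 300  # 5 minutes, the TTL value mentioned in the analysis


@dataclass
class Authoritative:
    """Hypothetical stand-in for an authoritative DNS server."""
    records: dict
    reachable: bool = True  # becomes False once the routes to it are withdrawn

    def query(self, name: str) -> Optional[str]:
        if not self.reachable:
            return None  # no route to the server: the query gets no answer
        return self.records.get(name)


@dataclass
class CachingResolver:
    """Hypothetical third-party resolver that caches answers for TTL_SECONDS."""
    upstream: Authoritative
    cache: dict = field(default_factory=dict)  # name -> (address, expires_at)

    def resolve(self, name: str, now: float) -> Optional[str]:
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # record is still fresh, answer from the cache
        address = self.upstream.query(name)
        if address is None:
            self.cache.pop(name, None)  # expired and cannot be refreshed
            return None
        self.cache[name] = (address, now + TTL_SECONDS)
        return address


def simulate() -> list:
    # Assumed example record; 192.0.2.10 is a documentation-only address.
    auth = Authoritative({"facebook.com": "192.0.2.10"})
    resolver = CachingResolver(auth)
    results = [resolver.resolve("facebook.com", now=0)]        # fetched and cached
    auth.reachable = False                                      # routes withdrawn
    results.append(resolver.resolve("facebook.com", now=120))  # within the TTL
    results.append(resolver.resolve("facebook.com", now=301))  # TTL has expired
    return results


if __name__ == "__main__":
    print(simulate())
```

In this model the resolver keeps answering from its cache for up to five minutes after the authoritative server becomes unreachable, and then has nothing left to return, which matches the quick disappearance of address information described above.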

/Media reports.