IP address change on a WAN router caused the latest global Microsoft 365 outage

Microsoft has detailed, in a preliminary report, the causes of the January 25 outage that affected several of its Azure services worldwide for more than five hours. As the Redmond company had initially suspected, the failure did originate in a change made to a WAN router.

“As part of a planned change to update the IP address on a WAN router, a command given to the router caused it to send messages to all other routers on the WAN, which caused them all to recalculate their adjacency and forwarding tables. During this recalculation process, the routers were unable to properly forward the packets passing through them,” the company explains.

The issue impacted service in waves, resulting in customers experiencing network connectivity issues when attempting to connect to resources hosted in Azure regions, as well as other Microsoft services, including Microsoft 365 and Power Platform. The incident even affected Azure Government cloud services that depended on the Azure public cloud.

Microsoft began investigating the outage at 7:05 UTC and found by 8:10 UTC that the network was beginning to recover on its own. However, the automated systems responsible for maintaining the health of the WAN had paused because of the outage's impact on the network. This led to further network problems (packet loss) from 9:35 UTC until those systems were manually restarted. The WAN did not return to normal operation until 12:43 UTC.
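The intervals between these milestones can be worked out with a few lines of Python. This is only an illustrative sketch: the four times come from the article, while the dictionary keys and the `utc` and `elapsed` helpers are names made up for this example.

```python
from datetime import timedelta


def utc(hhmm: str) -> timedelta:
    """Turn an 'H:MM' UTC time of day into an offset from midnight."""
    hours, minutes = hhmm.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


# Times as reported in the article; the key names are illustrative only.
TIMELINE = {
    "investigation_started": utc("7:05"),
    "recovery_observed": utc("8:10"),
    "packet_loss_resumed": utc("9:35"),
    "wan_back_to_normal": utc("12:43"),
}


def elapsed(start: str, end: str) -> timedelta:
    """Time between two named milestones on the same day."""
    return TIMELINE[end] - TIMELINE[start]


print(elapsed("investigation_started", "recovery_observed"))   # 1:05:00
print(elapsed("recovery_observed", "packet_loss_resumed"))     # 1:25:00
print(elapsed("investigation_started", "wan_back_to_normal"))  # 5:38:00
```

The last figure, 5 hours 38 minutes, is consistent with the "more than five hours" reported for the outage as a whole.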

Following this incident, Microsoft says it will now block the automatic execution of high-impact commands. All commands will also have to follow the guidelines for safe configuration changes, which the change made to this WAN router evidently did not. A final investigation report should be published in the coming days.
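To give a rough idea of what such a safeguard could look like, here is a minimal Python sketch. It is not Microsoft's implementation, which the report summarized here does not describe. The `Impact` levels, the `Command` type, the `execute` function and the example command strings are all hypothetical, and the sketch assumes that each command has already been classified by impact.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Command:
    text: str       # placeholder description, not a real router command
    impact: Impact  # assumed to be classified ahead of time


class BlockedCommand(Exception):
    """Raised when automation tries to run a high-impact command."""


def execute(command: Command, *, automated: bool) -> str:
    # Hypothetical policy: automation may never run high-impact commands;
    # they have to be issued through a deliberate manual step.
    if automated and command.impact is Impact.HIGH:
        raise BlockedCommand(f"refusing to auto-run: {command.text}")
    return f"executed: {command.text}"


change_ip = Command("update router IP address", Impact.HIGH)
show_status = Command("show interface status", Impact.LOW)

print(execute(show_status, automated=True))
print(execute(change_ip, automated=False))
try:
    execute(change_ip, automated=True)
except BlockedCommand as err:
    print(err)
```

In this sketch the guard only rejects the combination of automated execution and high impact, which is the one restriction the article attributes to Microsoft. Any review or approval workflow would be a further assumption.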
