Microsoft 365 Outage Causes Widespread Service Disruption
On November 25, 2024, a Microsoft 365 outage disrupted numerous services for users worldwide. The outage, which began approximately six hours before the initial report, affected core applications including Exchange Online, Microsoft Teams, and SharePoint Online. Thousands of user reports on Downdetector indicated the scale of the problem.
Impact and Affected Services
The initial reports indicated problems accessing Exchange Online and Microsoft Teams calendars. The disruption then spread to other services, including OneDrive, Purview, Copilot, and both the web and desktop versions of Outlook.
According to Microsoft’s own incident report (MO941162), the outage prevented users from accessing Exchange Online via various methods: Outlook on the web, the Outlook desktop client, REST (Representational State Transfer), and EAS (Exchange ActiveSync). Furthermore, the company acknowledged that some users encountered difficulties with Microsoft Fabric, Microsoft Bookings, and Microsoft Defender for Office 365.
Microsoft’s Response and Root Cause
Microsoft swiftly acknowledged the issue, stating, “We’re investigating an issue impacting users attempting to access Exchange Online or functionality within Microsoft Teams calendar.”
Initially, the company attributed the outage to “a recent change,” but later provided a more detailed explanation. It deployed a fix to the affected infrastructure and began restarting impacted systems. Progress updates reported the fix reaching approximately 60% and later 90% of affected environments. Microsoft also manually restarted a subset of machines that were in an unhealthy state.
Later updates revealed the root cause: “a change that caused an influx of retry requests routed through servers, impacting service availability.”
To mitigate this, Microsoft implemented optimizations to improve the infrastructure’s processing capabilities. These changes offered some relief, but the company said it would take further action to fully restore service.
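The retry influx described above is a common failure pattern: when many clients retry failed requests immediately and at the same time, the extra load can keep a struggling service from recovering. A widely used client-side countermeasure is exponential backoff with jitter. The Python sketch below illustrates that general technique only. It is not Microsoft’s fix, and the function names, parameters, and default values are hypothetical choices made for this example.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a jittered delay in seconds for a zero-based retry attempt."""
    # The upper bound doubles with each attempt but never exceeds the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Pick uniformly in [0, ceiling] so clients do not retry in lockstep.
    return rng.uniform(0, ceiling)


def call_with_retries(operation, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random):
    """Call operation(); on an exception, wait a jittered delay and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # Give up once the retry budget is spent.
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
```

Capping the number of attempts bounds the extra load each client can add during an outage, and the random delay spreads retries out over time instead of sending them in synchronized waves.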
Microsoft’s Office service health and Microsoft 365 network health status pages initially showed no issues with network health, ISP availability, or customer network infrastructure, which illustrates how difficult it can be to pinpoint the source of such a widespread outage. The incident also recalled a large-scale outage in July 2024, which affected multiple Microsoft 365 and Azure services and was later attributed to a distributed denial-of-service (DDoS) attack.
Ongoing Recovery and Future Implications
Microsoft reported that the patch had reached 90% of affected environments and that service availability was recovering, but cautioned that it could not yet estimate when service would be fully restored. Targeted server restarts were underway to address routing service issues, with priority given to customers in their business hours.
The incident underscores how heavily businesses and individuals rely on Microsoft 365 services and how much even a temporary disruption can cost them. The detailed root cause analysis and subsequent mitigation efforts suggest a focus on improving the resilience and scalability of the Microsoft 365 infrastructure to prevent similar incidents in the future.
Microsoft’s ongoing monitoring and follow-up actions will be crucial to the long-term stability and reliability of its services. The outage is a reminder that even seemingly robust systems have vulnerabilities and that organizations need solid contingency plans.