Last week, if you recall, I wrote about Microsoft’s Office 365 outage and how it impacted our organization. Today Microsoft issued a public statement regarding what occurred. To me, this statement is another example of Microsoft’s shift toward a more open relationship with its customers, which is greatly appreciated and welcomed. Outages like the one last week will still happen, however rarely, but the cloud providers we rely on must keep an open line of communication with their customers if they want to keep their business and their confidence.
Microsoft’s Office 365 service experienced a rather lengthy outage earlier this week. On Monday and Tuesday, users in North America were faced with Lync and Exchange outages that lasted for several hours.
Microsoft said that the Lync and Exchange outages were unrelated, but another breakdown in the Service Health Dashboard meant that those who were affected were not being notified of the outage. It was a double hit for Microsoft: not only were core features offline, but the mechanism to alert users of the outage was failing as well.
Lync Online’s drop-off was caused by a brief loss of connectivity. When connectivity was restored, the backlog caused a significant spike in traffic that overloaded the remaining servers and disrupted the service for some customers.
The Exchange issue was the result of a directory failure that caused a directory partition to stop responding to authentication requests. Microsoft said that this was a unique failure, which was the reason for the extended downtime on that platform.
As you would expect, Microsoft said that the issues have been fixed and that it has learned from this experience how to avoid such scenarios in the future. Office 365 has been stable for the most part, and the platform has not previously had downtime of this length.
One final comment on this situation concerns where Microsoft needs to improve. If you recall from my previous post about this incident, when I called Microsoft Support during the outage I was greeted with a message that basically said, “the engineers are aware of the outage and have no further information and they are working on it.” Other than offering to send a text message with updates (which they did), there was no other contact with customers during the outage that I am aware of. This is not good enough. Customers need to hear from customer support, especially during times like this.