Wednesday, February 29, 2012

Microsoft’s Official Response to the Windows Azure Outage of 2/29/2012

Bill Laing, Corporate VP, Server and Cloud, added the following post to the Windows Azure blog on 2/29/2012:

I lead the engineering organization responsible for the Windows Azure service, and I want to update you on the service disruption we had over the past day. First, let me apologize for any inconvenience this disruption has caused our customers. Our focus over the past day has been to resolve the Windows Azure Compute service disruption. As always, we communicate the status of incidents through the Windows Azure Service Dashboard and update that status hourly or as the situation changes.

Yesterday, February 28th, 2012, at 5:45 PM PST, Windows Azure operations became aware of an issue impacting the compute service in a number of regions. The issue was quickly triaged and determined to be caused by a software bug. While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year. Once we discovered the issue, we immediately took steps to protect customer services that were already up and running, and began creating a fix for the issue. The fix was successfully deployed to most of the Windows Azure sub-regions, and we restored Windows Azure service availability to the majority of our customers and services by 2:57 AM PST, Feb 29th.
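The statement doesn't say which time calculation failed. Purely as an illustration of the general class of leap-year bug, and not Microsoft's actual code, the following Python sketch shows how naively adding one year to February 29 produces an invalid date, and one way to guard against it. The function names and the clamp-to-February-28 policy are assumptions chosen for the example.

```python
import calendar
from datetime import date


def naive_add_one_year(d: date) -> date:
    # Hypothetical buggy pattern: bump the year field and keep month/day.
    # For date(2012, 2, 29) this raises ValueError, because 2013 has no Feb 29.
    return d.replace(year=d.year + 1)


def safe_add_one_year(d: date) -> date:
    # Hypothetical fix: clamp Feb 29 to Feb 28 when the target year is not a leap year.
    target = d.year + 1
    if d.month == 2 and d.day == 29 and not calendar.isleap(target):
        return date(target, 2, 28)
    return d.replace(year=target)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    try:
        naive_add_one_year(leap_day)
    except ValueError as exc:
        print("naive calculation failed:", exc)
    print("safe calculation:", safe_add_one_year(leap_day))  # safe calculation: 2013-02-28
```

Clamping to February 28 is only one possible policy; rolling forward to March 1 is another, and the right choice depends on what the computed date is used for.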

However, some sub-regions and customers are still experiencing issues and, as a result, may be experiencing a loss of application functionality. We are actively working to address these remaining issues. Customers should refer to the Windows Azure Service Dashboard for the latest status. Windows Azure Storage was not impacted by this issue.

We will post an update on this situation, including details on the root cause analysis at the end of this incident. However, our current priority is to restore functionality for all of our customers, sub-regions and services.
We sincerely apologize for any inconvenience this has caused.

According to Pingdom, my OakLeaf Systems Azure Table Services Sample Project (Tools v1.4 with Azure Storage Analytics) was down for five minutes or less. The next OakLeaf Uptime Report, expected to post on 3/3/2012, will confirm the downtime during the period in question. My SQL Azure Reporting Systems Preview Demo service indicated no problems during intermittent tests on 2/29/2012.
