Wednesday, April 21, 2010

Windows Azure Live DNS Brief Outage at All Data Centers on 4/21/2010 at ~ 2:30 PM PDT

I received the following alert from Pingdom on 4/21/2010:

PingdomAlert DOWN:
  Azure Tables (oakleaf.cloudapp.net) is down since 04/21/2010 01:48:21PM.

and from Mon.itor.us:

image

Following is a screen capture of the Azure service dashboard at about 3:00 PM PDT:

Notice that the dashboard reports the DNS failure for all data centers, which indicates that geolocating storage and compute replicas in multiple data centers would not have prevented the outage.
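
A DNS failure of this kind shows up on the client side as a name that no longer resolves, regardless of which data center hosts the service. The following is only a minimal sketch of such a check in Python, not the method Pingdom or Mon.itor.us uses; the `resolves` helper and its `resolver` parameter are hypothetical names introduced here for illustration.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the check can be exercised without a network (an assumption of this
    sketch, not part of any monitoring product).
    """
    try:
        # Port is None because only name resolution matters here.
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
```

Calling `resolves("oakleaf.cloudapp.net")` during an outage like this one would be expected to return `False` even though the compute and storage instances behind the name might still be running.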

Here’s the mouse-over message that appeared in the Status History section after the live DNS operation was reported restored (and confirmed by me for the OakLeaf test harness) at about 3:00 PM PDT:

image 

Pingdom reports the outage duration was ~40 minutes:

PingdomAlert UP:
  Azure Tables (oakleaf.cloudapp.net) is UP again at 04/21/2010 02:28:21PM after 40m of downtime.
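
The 40-minute figure can be checked directly against the two alert timestamps. This is a small sketch in Python that assumes both timestamps are in the same time zone and follow the month/day/year, 12-hour format shown in the alerts; the variable names are mine.

```python
from datetime import datetime

# Timestamp layout as it appears in the Pingdom alerts above.
FMT = "%m/%d/%Y %I:%M:%S%p"

down = datetime.strptime("04/21/2010 01:48:21PM", FMT)
up = datetime.strptime("04/21/2010 02:28:21PM", FMT)

# Difference between the UP and DOWN alerts, in minutes.
minutes = (up - down).total_seconds() / 60
print(f"{minutes:.0f}m of downtime")  # prints "40m of downtime"
```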

The outage resulted in a few #FAIL tweets with the #Azure hashtag.

Here’s the mouse-over message for the AppFabric Service Bus problem in the South Central US data center:

image 

and the Project “Dallas” portal and content in the same data center:

image

I’ll update this post if the Windows Azure team, the data center team, or both provide an explanation for the outage.
