Sunday, July 10, 2011

Scott Guthrie Reports Some Windows Azure Customers Are Storing 25 to 50 Petabytes of Data

Updated 7/10/2011 to reflect elimination of data ingress charges for the Windows Azure Platform as of 7/1/2011 and my admission to the Microsoft SQL Azure Federations Product Evaluation Program.

• Updated 6/11/2011 7:30 AM PDT: Added 9-second audio excerpt, as well as bandwidth and typical transaction charges plus upload times for 25 to 50 petabytes.

• Updated 6/10/2011 2:30 PM PDT: Added storage cost of 25 to 50 petabytes and ScottGu’s updated bio.


At 00:36:33 in the live stream archive of his Windows Azure and Cloud Computing keynote to the Norwegian Developers Conference (NDC) 2011 on 6/8/2011, Scott Guthrie (@scottgu) said:

But some of our customers we have in Azure today, for example, are doing 25 to 50 petabytes. That’s kind of their target range, of sorts, which is a lot of storage.

You aren’t kidding, Scott; that is a lot of storage. At Microsoft’s current storage price of US$0.15 per GB/month, that’s US$0.15 * 1,000,000 = US$150,000 per PB-month, or billings of US$3.75 to 7.5 million per month.
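The arithmetic can be checked with a few lines of Python (rates are the June 2011 prices quoted above; decimal units, so 1 PB = 1,000,000 GB):

```python
# Monthly Windows Azure storage billing at the June 2011 rate of
# US$0.15 per GB/month, using decimal units (1 PB = 1,000,000 GB).
PRICE_PER_GB_MONTH = 0.15
GB_PER_PB = 1_000_000

cost_per_pb_month = PRICE_PER_GB_MONTH * GB_PER_PB   # US$150,000 per PB-month
low_bill = 25 * cost_per_pb_month                    # 25 PB
high_bill = 50 * cost_per_pb_month                   # 50 PB
print(f"US${low_bill:,.0f} to US${high_bill:,.0f} per month")
```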

Click here and then click Open to play the above excerpt of the live stream from a 9-second *.mp3 file stored in my Windows Live SkyDrive account.

•• The one-time cost of data ingress at the then-current bandwidth price of US$0.10 per GB for all data centers except Asia East and Asia Southeast would have been US$0.10 * 1,000,000 = US$100,000 per PB, or one-time billings of US$2.5 to 5 million.

The Windows Azure Team posted Announcing Free Ingress for all Windows Azure Customers starting July 1st, 2011 on 6/22/2011:

Today we’re pleased to announce a change in pricing for the Windows Azure platform that will provide significant cost savings for customers whose cloud applications experience substantial inbound traffic, and customers interested in migrating large quantities of existing data to the cloud. For billing periods that begin on or after July 1, 2011, all inbound data transfers for both peak and off-peak times will be free. …

Assuming table storage with the maximum 1 MB per row and 100 rows per batch transaction, uploading the data results in 10 transactions/GB = 10,000,000 transactions/PB. At US$0.01 per 10,000 transactions, the minimum charge would be US$10 per PB, which is inconsequential. Smaller row sizes would increase transaction charges substantially.
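The transaction math works out the same way in code (maximum 1 MB entity, 100-entity batch transactions, US$0.01 per 10,000 transactions):

```python
# Table storage transaction charges for a bulk upload, assuming the
# maximum 1 MB per entity and 100 entities per batch transaction.
MB_PER_GB = 1_000
GB_PER_PB = 1_000_000
ROWS_PER_BATCH = 100
PRICE_PER_10K_TRANSACTIONS = 0.01

transactions_per_gb = MB_PER_GB // ROWS_PER_BATCH        # 10 batches per GB
transactions_per_pb = transactions_per_gb * GB_PER_PB    # 10,000,000 per PB
charge_per_pb = transactions_per_pb / 10_000 * PRICE_PER_10K_TRANSACTIONS
print(f"US${charge_per_pb:,.2f} per PB")                 # inconsequential
```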

Note that storage accounts currently have a “strict limit” of 100 TB. Therefore, 25 to 50 PB of storage would require 250 to 500 accounts.
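Given the 100 TB per-account cap, the account count follows directly:

```python
import math

TB_PER_ACCOUNT = 100   # current "strict limit" per storage account
TB_PER_PB = 1_000      # decimal units

for pb in (25, 50):
    accounts = math.ceil(pb * TB_PER_PB / TB_PER_ACCOUNT)
    print(f"{pb} PB -> {accounts} storage accounts")
```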

See the Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity post of 7/8/2010 to the Windows Azure Storage Team blog for more information about storage billing.

The current scalability target for storage bandwidth is 3 gigabits/second, and the target for storage transactions is 5,000 entities/second, according to the Windows Azure Storage Abstractions and their Scalability Targets post of 5/10/2010 to the Windows Azure Storage Team blog. Uploading 5,000 1-MB table rows/second would require a bandwidth of 5 GB/second. Allowing 10 bits/byte for protocol overhead, 3 gigabits/second ~ 300 megabytes/second, so the best case for uploading a PB would take a minimum of 3,333,333 seconds = 926 hours = 38.6 days. A more likely T3 connection (44.735 Mbps) would require ~223,538,000 seconds = 62,000 hours = 2,583 days per PB.
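The upload-time estimates above use a rule of thumb of 10 bits per byte to allow for protocol overhead; a small helper function (illustrative only) reproduces them:

```python
def upload_seconds(petabytes: float, line_rate_bps: float,
                   bits_per_byte: float = 10) -> float:
    """Rough best-case transfer time for a bulk upload.

    10 bits/byte allows for protocol overhead; pass 8 for raw line rate.
    """
    total_bytes = petabytes * 1e15          # decimal PB
    return total_bytes / (line_rate_bps / bits_per_byte)

# 3 gigabit/s scalability target: ~3,333,333 s ~ 926 hours ~ 38.6 days per PB
best_case = upload_seconds(1, 3e9)
# T3 line at 44.735 Mbps: ~223,538,000 s ~ 62,000 hours per PB
t3_case = upload_seconds(1, 44.735e6)
```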

Alex Williams posted What Would 10 Petabytes Look Like? [Infographic] to the ReadWriteCloud blog on 1/15/2011. Phillip Elmer-DeWitt provided more petabyte background in a What's 12 petabytes to Apple? post of 4/7/2011 to the CNNMoney site.

For more details about new Windows Azure and SQL Azure features, see my New Migration Paths to the Microsoft Cloud cover story for Visual Studio Magazine’s June 2011 issue and Michael Desmond’s Windows Azure Q&A with Roger Jennings of 6/10/2011.


Scott also discussed SQL Azure and said at 00:42:18:

We also do autosharding as part of SQL Azure, which means that from a scale-out perspective, we can handle super-high loads, and we do all of that kind of load-balancing and scale-out work for you.

Today SQL Azure supports up to 50 GB of relational storage for a database, but you can have any number of databases. In the future, you’ll see us support hundreds of Gigabytes and Terabytes [that] you can take advantage of.

Scott jumped the gun a bit on the autosharding front. Autosharding is a component of the SQL Azure Federations program, which Cihan Biyikoglu, who’s a senior program manager for SQL Azure, described in his Federations Product Evaluation Program Now Open for Nominations! post of 5/13/2011:

Microsoft SQL Azure Federations Product Evaluation Program nomination survey is now open. To nominate an application and get access to the preview of this technology, please fill out this survey.

Let me take a second to explain the program details: the preview program is a great fit for folks who would like to get experience with the federations programmability model. As part of the preview, you get access to very early bits on scale-minimized versions of SQL Azure. This minimized version does not provide high availability or high performance and scale just yet, as it runs under a different configuration. Those properties will come when we deploy to the public cluster. However, you can exercise the programmability model and develop a full-fledged application. There may still be minor changes to the surface, but they will be small, incremental changes at this point. Participants will also get a chance to provide the development team with detailed feedback on the federations technology before it appears in SQL Azure. The preview program is available to only a limited set of customers. Customers who are selected for the program will receive communication once the program is kicked off in May and June 2011.

I was admitted to the Microsoft SQL Azure Federations Product Evaluation Program under an NDA in early July 2011.

Of course, you might not need to autoshard SQL Azure when it supports “hundreds of Gigabytes and Terabytes”.

For more details about sharding SQL Azure databases and SQL Azure Federations, see my Build Big-Data Apps in SQL Azure with Federation cover story for Visual Studio Magazine’s March 2011 issue.
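For readers unfamiliar with the idea, the range partitioning that Federations performs over a federation key can be sketched in a few lines of Python. This is an illustrative toy, not the actual SQL Azure Federations API; the boundary values and member names are hypothetical:

```python
# Toy range-based shard routing, in the spirit of SQL Azure Federations'
# range partitioning over a federation key. Names/ranges are hypothetical.
import bisect


class RangeShardMap:
    def __init__(self, boundaries, members):
        # boundaries: sorted low boundary of each federation member,
        # e.g. [0, 100_000, 200_000]; members: one name per range.
        assert len(boundaries) == len(members)
        self.boundaries = boundaries
        self.members = members

    def member_for(self, key):
        # Route to the member whose range contains the key: the
        # rightmost boundary that is <= key.
        i = bisect.bisect_right(self.boundaries, key) - 1
        if i < 0:
            raise KeyError(f"key {key} is below the first range boundary")
        return self.members[i]


shards = RangeShardMap([0, 100_000, 200_000],
                       ["member_0", "member_1", "member_2"])
```

With this map, a query for customer 150,000 would be routed to `member_1`; splitting a hot range simply inserts a new boundary and member.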

Note: The NDC 2011 site provided an updated bio:

Scott Guthrie is corporate vice president of Microsoft's Azure Application Platform team, and runs the development teams responsible for delivering Microsoft’s Windows Azure, AppFabric, and Web Server Technologies and Tools. [Emphasis added.]

A founding member of the .NET project, Guthrie has played a key role in the design and development of Visual Studio and the .NET Framework since 1999. Today, Guthrie directly manages the development teams that build ASP.NET, WCF, Workflow, IIS, AppFabric, WebMatrix and the Visual Studio Tools for Web development.

Guthrie graduated with a degree in computer science from Duke University.

It appears from the preceding bio that Scott doesn’t run SQL Azure development teams. Note that the second paragraph doesn’t include the Windows Azure development team (probably an error). Scott’s official Microsoft Executive Biography was last updated on 2/15/2008, so it’s woefully out-of-date and doesn’t include his new Windows Azure responsibilities.