OakLeaf Systems: Windows Azure and Cloud Computing Posts for 7/25/2011+

A compendium of Windows Azure, SQL Azure Database, AppFabric, Windows Azure Platform Appliance and other cloud-computing articles.

• Updated 7/25/2011 5:00 PM with new articles marked • by Rob Hirschfeld, Windows Azure Team, Martin Tantow, Marcelo Lopez Ruiz and Alex Williams.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

To use the above links, first click the post’s title to display the single article you want to navigate.

Azure Blob, Drive, Table and Queue Services

No significant articles today.

SQL Azure Database and Reporting

Lynn Langit (@llangit) posted SQL Server Developer Tools (Juneau)–first look on 7/25/2011:

I have been preparing a talk (for an internal Microsoft product training) on the new SQL Server Developer Tools (or SSDT – code named ‘Juneau’), based on the SQL Server vNext CTP 3 (code named ‘Denali’).

I encourage you to try this out. To do so, you’ll need to have Visual Studio 2010 with SP1 installed. Then you’ll need to download and install a version of SQL Server Denali CTP 3. AFTER that you can download and install the CTP for Juneau.

Start by taking a look at the new node in the Visual Studio Server Explorer > SQL Server (as shown below with the [new] local instance, a SQL Azure instance, a SQL Denali instance, and a SQL Server 2008 R2 instance). Right click on any database in this node to try out the new ‘Create new project’ (for off-line database development) or the ‘Schema Compare’ features.

Of particular interest to SQL Azure developers is that the Juneau tools are version-aware. What this means is that you can target your development to SQL Azure (or other versions) and the tools will provide you with warnings, highlights, etc…in your T-SQL code that are specific to that particular version.

There are a couple of good talks from TechEd North America on the topic of Juneau as well. The first one covers the new features around database lifecycle management.

The second video covers future features (the bits don’t seem to be released to the public yet) in Entity Framework and Juneau.

As I work on my talk I am wondering how those of you who have been using ‘Data Dude’ are finding SSDT? Drop me a note via this blog.

Also, I’ll publish the presentation after I give it.

Tim Anderson (@timanderson) asked “Tasty Visual Studio upgrade or 'One step forward, two steps back'?” in a deck for his Microsoft previews 'Juneau' SQL Server tools article of 7/25/2011 for The Register:

Microsoft has released a third preview of SQL Server 2011, codenamed "Denali" and including the "Juneau" toolset.

In the Denali database engine there are new features that support high availability and improve the performance of data warehousing queries. Then there's FileTable, a special table type that is also published as a Windows network share, and which enables file system access to data managed by SQL Server.

For business intelligence, Denali includes tabular modelling, which means in-memory databases that support business intelligence analysis, and a new interactive visualisation and reporting client called Project Crescent.

Interestingly, Project Crescent is built with Silverlight rather than HTML5, despite Microsoft's new-found commitment to all things HTML.

Alongside these new database features Microsoft is introducing a new set of tools delivered as add-ins to Visual Studio, and which work with older versions of SQL Server as well as Denali. This SQL Server Development Tools (SSDT) collection, codenamed "Juneau", is an ambitious project that builds on what was done for the existing Visual Studio 2008 and 2010 database projects – although, curiously, some features currently found in database projects are missing in Juneau.

The database being a key part of most business applications, the goal of Juneau is to integrate database development with application development, and to bring capabilities like testing, debugging, version control, refactoring, dependency checking, and deployment to the database. In addition, the query and design tools should mean that developers rarely need to switch to SQL Server Management Studio.

Juneau previews database changes and warns of the consequences

Juneau provides a new SQL Server project type in Visual Studio for database design and debugging. A SQL Server database project stores the entire database schema as the Transact-SQL (T-SQL) scripts that are required to create it, including code that runs in the database, such as stored procedures and triggers. You can import the schema from an existing database, or you can build it from scratch.

Since it's code, the schema is amenable to code-oriented operations such as versioning and refactoring, and it enables features such as Go to Definition.

Juneau uses a new testing and debugging method that works with a local, single-user database that's created automatically for each database project. When you debug the project, the database schema is applied to this local instance.

There are some limitations to this method, such as no support for full-text indexes. You can, however, debug SQL CLR, which is custom .NET code running in the database – SQL Server 2011 uses .NET Framework 4.0.

Unlike the old Visual Studio database projects, Juneau has a visual table designer. That said, some actions are not really visual and simply generate a template line of T-SQL for you to edit. The T-SQL editor, however, stays automatically in sync with the visual designer and includes smart code completion based on the underlying schema. Apparently this uses the actual parser from the SQL Server database engine under the covers.

Compare and contrast

A key part of the SSDT collection is a tool called Schema Compare. This takes the schema in your project, compares it to a target database, and reports any differences. For example, if a database admin has made some changes since the schema was imported to the project, Schema Compare will highlight them. The tools are also able to generate update scripts that apply the project schema to the target, and warn you of consequences such as data loss.

The SSDT Juneau tools always use a two-stage process to amend an existing database. Rather than apply the changes immediately, it previews the update, warns you about what will break, and offers the choice of a generated script or immediate update. This buffered approach is a strong feature, and will help to prevent mistakes.
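The compare-then-preview flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Juneau is implemented: the in-memory schema shape (table name mapped to column names and types), the `Change` record, and the `compare` and `preview` functions are all hypothetical names, and every type change is conservatively flagged as a possible data-loss risk.

```python
from dataclasses import dataclass

# Hypothetical in-memory schema: table name -> {column name: SQL type}.
Schema = dict[str, dict[str, str]]


@dataclass
class Change:
    action: str      # add_table, drop_table, add_column, drop_column, alter_column
    target: str      # "Table" or "Table.Column"
    data_loss: bool  # True when applying the change could destroy stored data


def compare(project: Schema, target: Schema) -> list[Change]:
    """Stage 1: list the changes needed to make `target` match `project`."""
    changes = []
    for table in sorted(project.keys() - target.keys()):
        changes.append(Change("add_table", table, False))
    for table in sorted(target.keys() - project.keys()):
        changes.append(Change("drop_table", table, True))
    for table in sorted(project.keys() & target.keys()):
        want, have = project[table], target[table]
        for col in sorted(want.keys() - have.keys()):
            changes.append(Change("add_column", f"{table}.{col}", False))
        for col in sorted(have.keys() - want.keys()):
            changes.append(Change("drop_column", f"{table}.{col}", True))
        for col in sorted(want.keys() & have.keys()):
            if want[col] != have[col]:
                # Assumption: treat every type change as potentially lossy.
                changes.append(Change("alter_column", f"{table}.{col}", True))
    return changes


def preview(changes: list[Change]) -> list[str]:
    """Stage 2: describe the update, with warnings, without applying anything."""
    lines = []
    for change in changes:
        prefix = "WARNING (possible data loss): " if change.data_loss else ""
        lines.append(f"{prefix}{change.action} {change.target}")
    return lines
```

Keeping the comparison separate from the preview mirrors the two-stage approach: nothing touches the target until someone has read the warnings and chosen between a script and an immediate update.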

Another notable feature is deployment. The Publish Database tool lets you specify a target which can be an on-premise SQL Server or SQL Azure, hosted on Microsoft's cloud. [Emphasis added.]

Be forewarned: not all the features of Juneau are in the current preview. One missing piece – so far, at least – is the application project and database project integration, which will link SQL Server projects to the Entity Framework for object-relational mapping.

While Juneau's general approach looks strong, it is puzzling that some of the best parts of the old database projects – such as the Schema Dependency Viewer and Data Compare tool – are not in the preview version. As one developer on Microsoft's discussion forum put it, "It feels like 1 step forward, 2 steps backwards."

As of today, Juneau's visual design tools are weak, there's no visual query designer, and no database diagramming or visual modelling tool. Some of these gaps may be filled by the time the tools are released – or, failing that, in a future update. If so, the new SSDT could be an excellent set of tools – although of course you're tied to SQL Server.

Although the Juneau tools currently install into Visual Studio 2010, Microsoft says they will be part of the next version of Visual Studio. They can also be used standalone, though they still use the Visual Studio shell. Expect more detail on this at the Build conference in September.

You can find more information about Juneau and download the preview here.

MarketPlace DataMarket and OData

• The Windows Azure (Marketplace) Team (@WindowsAzure) announced Available Now: Windows Azure Marketplace Content & Tools Update in a 7/25/2011 post:

We are very excited today to announce our latest content release, which makes some great new data available on the Windows Azure Marketplace, as well as the release of our new Offer Submission page!

Are you looking to publish your application or data on the Marketplace? Check out https://marketplace.windowsazure.com/publishing for all the information you need including samples, documentation and to access the new Offer Submission page!

Also available today is BranchInfo™ 2011 from our friends at RPM Consulting. BranchInfo 2011 is a historical database of every bank branch location in the U.S, containing branch level information carefully address-matched and geocoded by institution and site across a 5 year timeframe. This allows analysts to focus on current office locations, and the capability to trace and project the development and ownership of branches over time. Users can also drill down into one mile trade areas for each branch, to determine deposits and share, market potential, and competition.

RPM Consulting has also just released MarketBank™ 2011, providing estimated usage and balances of financial products and services among America’s households. Data are provided at the block group level, and aggregated to branch trade areas and other standard and custom business and community geographies. Estimates are based upon a proprietary model utilizing data from the Federal Reserve Bank’s Survey of Consumer Finances, and other industry data to market populations as defined by ESRI 2010/2015 Updated Demographics.

Windows Azure AppFabric: Apps, Access Control, WIF and Service Bus

Vittorio Bertocci (@vibronet) described Using the Windows Azure Access Control Service in iOS Applications in a 7/25/2011 post:

Nossir, there is nothing wrong with my blog. The screenshot above is indeed taken from a Mac desktop, and the one you see on the center is actually an instance of the iPhone emulator. What you should notice, however, is the list of Identity Providers it shows, so strangely similar to what we have shown on Windows Phone 7 and ACS… but I better start from the beginning.

A couple of months ago the Windows Azure Platform Evangelism team, in which I worked until recently, released a toolkit for taking advantage of the Windows Azure platform services from Windows Phone 7 applications. The toolkit featured various integration points with ACS, as explained at length here.

At about the same time, the team (and Wade specifically) released a version of the same toolkit tailored to iOS developers. That first iOS version integrated with the core Windows Azure services, but didn’t take advantage of ACS.
Well, today we are releasing a new version of the Windows Azure toolkit for iOS featuring ACS integration!

Dear iOS friends landing on this blog for the first time: of course I understand you may not be familiar with the Windows Azure Access Control Service. You can get a quick introduction here; however, for the purpose of providing context for this post let me just say the following: ACS is a fully cloud-hosted service which helps you add sign-in capabilities to your application from many user sources such as Windows Live ID, Facebook, Google, Yahoo, arbitrary OpenID providers, local Active Directory instances, and many more. Best of all, it allows you to do so without having to learn each and every API or SDK; the integration code is the same for everybody, and extremely straightforward. All communications are done via open protocols, hence you can easily take advantage of the service from any platform, as this very post demonstrates. Try it!

I am no longer on the Evangelism team, but the ACS work for this deliverable largely took place while I was still on it: recording the screencast and writing this blog post provides nice closure. Thanks to Wade for having patiently prepped & provided a Mac already perfectly configured for the recording! Also, for driving the entire project, IMO one of the coolest things we’ve done with ACS so far.

And now, for something completely different:

The Release

As usual, you’ll find everything in DPE’s GitHub repository: https://github.com/microsoft-dpe
There will be four main entries you’ll want to pay attention to:

watoolkitios-lib
This is a library of Objective-C snippets which can help you to perform a number of common tasks when using Windows Azure. For the specific ACS case, you’ll find code for listing identity providers, acquiring and handling tokens, invoking the ACS management APIs, and so on.

watoolkitios-doc
As expected, some documentation.

watoolkitios-samples
A sample application which demonstrates how to put the various snippets together

cloudreadypackages
Those are a set of ready-to-go packages that can be directly uploaded and launched in Windows Azure, without requiring you to have access to Visual Studio or the Windows Azure SDK: all you need is to deploy them via the portal (which works on Mac, too). The packages can be used as test backend for your iOS applications.
The packages take advantage of the technique described here to allow changes in the config settings even after deploy time. Which is a great segue for…

The ACS config tool for iOS
In the Windows Azure Toolkit for Windows Phone 7 we included some Visual Studio templates which contain all the necessary logic for wiring up a phone application to ACS and configuring ACS to issue tokens for that app. In iOS/Xcode there’s no direct equivalent of those templates, but we still wanted to shield the developer from many of the low-level details of using Windows Azure. To that end, we created a tool which can automatically configure the application, ACS and Windows Azure.

Using the ACS Configuration Tool for iOS in the Toolkit

If you want to see the tool in action, check out the webcast; here I will give you a few glimpses, just to whet your appetite.

This is a classic wizard, and it opens with a classic welcome page. We don’t like surprises, hence we announce what the tool is going to do. Let’s click next.

The first screen gathers info about the Windows Azure storage account you want to use; nothing to do with ACS yet. Next.

The next screen gathers the certificate used for doing SSL with the cloud package. Again, no ACS yet. Next.

Ahh, NOW we are talking business. Just like in the toolkit for Windows Phone 7 we offered the possibility of using the membership provider or ACS, here we do the same: depending on which option you pick, the way in which the user will be prompted for credentials and how calls will be secured will differ accordingly. Here we go the ACS way, of course.

I would say this is the key screen in the entire process. Here we prompt the developer to provide the ACS namespace they want to use with their iOS application, and the management key we need to modify the namespace settings accordingly. If you are unsure about how to obtain those values, a helpful link points to a document which will briefly explain how to navigate the ACS portal to get those.

In this wizard we try to strike a balance between showing you the power of the services we use and keeping the experience simple. As we did for the WP7 toolkit, here we apply some defaults (Google, Yahoo! and Windows Live ID as identity providers, pass-through rules for all) that will show how ACS works without offering too many knobs and levers to operate. If you are unhappy with the defaults, you can always go directly to the portal and modify the settings accordingly. For example, you may add a Facebook app as an identity provider, and that will show up automatically in the phone application without any changes to the code.
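The reason a newly added provider can appear without code changes is that the sign-in menu is built from data fetched at run time rather than from a hard-coded list. The Python sketch below illustrates that pattern only; the sample feed, its `Name` and `LoginUrl` field names, and the `identity_provider_menu` function are assumptions made for illustration, not the toolkit's actual Objective-C code or a verified description of the ACS response.

```python
import json

# Hypothetical sample of an identity-provider feed; the field names and
# URLs are assumptions, not a verified ACS payload.
SAMPLE_FEED = """
[
  {"Name": "Windows Live ID", "LoginUrl": "https://example.invalid/login/live"},
  {"Name": "Google", "LoginUrl": "https://example.invalid/login/google"},
  {"Name": "Yahoo!", "LoginUrl": "https://example.invalid/login/yahoo"}
]
"""


def identity_provider_menu(feed_json: str) -> list[tuple[str, str]]:
    """Turn the feed into (display name, login URL) pairs for a sign-in list."""
    providers = json.loads(feed_json)
    return [(provider["Name"], provider["LoginUrl"]) for provider in providers]


if __name__ == "__main__":
    for name, login_url in identity_provider_menu(SAMPLE_FEED):
        print(f"{name}: {login_url}")
```

Because the list is data-driven, adding an entry to the feed changes what the user sees without touching `identity_provider_menu`.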

The final screen of the wizard informs you that it has enough info to start the automatic configuration process. First it will generate a ServiceConfiguration.cscfg file, which you’ll use for configuring the Windows Azure backend (your cloudready package) via the portal. Then the wizard will reach out directly to the ACS management endpoint, and will add all the settings as specified.

As soon as you hit Save the wizard will ask you for a location for the cscfg file, then it will contact ACS and show you a bar as it progresses through the configuration. Pretty neat!

Above you can see the generated ServiceConfiguration.cscfg. Of course the entire point of generating the file is so that you don’t have to worry about the details, but if you are curious you can poke around. You’ll mainly find the connection strings for the Windows Azure storage and the settings for driving the interaction with ACS.
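For readers who have never opened one, a service configuration file is a small XML document with one `Role` element per role and a list of name/value settings. The Python sketch below generates a minimal file of that shape. The setting names (`DataConnectionString`, `AcsNamespace`), the role name, and the placeholder values are hypothetical and are not the ones the wizard emits.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows Azure service configuration files.
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"


def build_cscfg(service_name, role_name, settings, instances=1):
    """Return a minimal ServiceConfiguration document as a string."""
    ET.register_namespace("", NS)  # emit the namespace as the default one
    root = ET.Element(f"{{{NS}}}ServiceConfiguration", {"serviceName": service_name})
    role = ET.SubElement(root, f"{{{NS}}}Role", {"name": role_name})
    ET.SubElement(role, f"{{{NS}}}Instances", {"count": str(instances)})
    config = ET.SubElement(role, f"{{{NS}}}ConfigurationSettings")
    for name, value in settings.items():
        ET.SubElement(config, f"{{{NS}}}Setting", {"name": name, "value": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical setting names and placeholder values, for illustration only.
    print(build_cscfg(
        "MyBackend",
        "WebRole",
        {
            "DataConnectionString": "DefaultEndpointsProtocol=https;"
                                    "AccountName=myaccount;AccountKey=PLACEHOLDER",
            "AcsNamespace": "mynamespace",
        },
    ))
```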

All you need to do is to navigate (via the Windows Azure management portal) to the hosted service you are using for your backend, hit Configure and paste in the autogenerated ServiceConfiguration.cscfg.

The next step in the screencast shows how to run the sample application, already properly configured, in Xcode. If you hit the play button, you’ll be greeted by the screen with which I opened the post.

The rest is business as usual: the application follows the same pattern as the ACS phone sample and labs: an initial selection driven by browser based sign in protocols to obtain and cache the token from ACS (a SWT) and subsequent web service calls secured via OAuth. Below a Windows Live ID prompt, followed by the first screen of the app upon successful authentication.
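To make the token step more concrete, here is a Python sketch of checking a Simple Web Token (SWT) on the service side. It assumes the usual SWT convention: form-encoded name/value pairs followed by an `HMACSHA256` entry holding a URL-encoded, base64 HMAC-SHA256 signature of everything before it, plus an `ExpiresOn` claim in seconds since the epoch. The function names, key, and claim values are hypothetical, and a real backend would also check the issuer and audience.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, quote, unquote

SIGNATURE_MARKER = "&HMACSHA256="


def sign_swt(unsigned_token: str, key: bytes) -> str:
    """Append an HMAC-SHA256 signature to a form-encoded claim string."""
    digest = hmac.new(key, unsigned_token.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return unsigned_token + SIGNATURE_MARKER + quote(signature, safe="")


def validate_swt(token: str, key: bytes, now=None) -> dict:
    """Check signature and expiry; return the decoded claims or raise ValueError."""
    unsigned, marker, encoded_signature = token.rpartition(SIGNATURE_MARKER)
    if not marker:
        raise ValueError("token has no HMACSHA256 signature")
    provided = base64.b64decode(unquote(encoded_signature))
    expected = hmac.new(key, unsigned.encode("utf-8"), hashlib.sha256).digest()
    if not hmac.compare_digest(provided, expected):
        raise ValueError("signature mismatch")
    claims = dict(parse_qsl(unsigned))
    current_time = time.time() if now is None else now
    if int(claims["ExpiresOn"]) <= current_time:
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    unsigned = ("role=Gold&Issuer=https%3A%2F%2Fexample.invalid%2F"
                "&ExpiresOn=2000000000")
    token = sign_swt(unsigned, key)
    print(validate_swt(token, key, now=1_500_000_000))
```

Comparing the decoded signature bytes with `hmac.compare_digest`, rather than comparing strings, avoids both timing leaks and false mismatches caused by differences in URL-encoding style.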

Well, that’s it folks! I know that Wade and the gang will keep an eye on the GitHub repository: play with the code, let them know what you like and what you don’t like, branch the code and add the improvements you want, go crazy!

Vittorio Bertocci (@vibronet) reported New in ACS: Portal in Multiple Languages, a New Rule Type… and Wave Bye-Bye to Quotas in a 7/25/2011 post:

Big news in ACSland today! There are a few new key features that, I am sure, many of you will welcome with a big smile.

As usual, for the full scoop take a look at the announcement and the release notes; here I’ll just give you a few highlights & customarily lighthearted commentary.

The Portal Comes in 11 Languages

Riding the wave of the general localization effort sweeping the Windows Azure portal, the ACS portal can now entertain users in 10 extra languages, such as Japanese, Chinese (simplified and traditional), Korean, Russian, Portuguese, Spanish, German, French and even Italian.

Switching is pretty trivial, to the point that I am daring to switch to Chinese without (too much) fear of not being able to revert to English. Just pick the language you want in the dropdown on the top right corner, and the UI will switch immediately. Also note the URL (in my case it moved to https://windows.azure.com/Default.aspx?lang=zh-Hans).

From that moment on, everything will be localized accordingly: for example if I invoke the management portal for one namespace, I get the HRD page localized accordingly:

And of course, the portal itself is now fully localized:

Note that I can override the language settings directly from the ACS portal, as highlighted in the image above.
Biographic note: I always have a lot of fun checking out the Italian versions of the software I use. The reason is that everybody has a different threshold for what should be translated and what should remain in its original formulation (why translate IP to “provider di identita’” but leave RP as “relying party”? Or even, why keep “provider” but translate “identity”?), and for expats like myself that threshold is often 0 (as in “do not translate at all”). Mismatches in expectations lead to those “benign violations” that McGraw claims constitute the basis of humor. But I digress: ignore my pet peeves, I am sure that having the portal available in multiple languages will be of enormous help for making ACS even easier to use. Good job guys!

Quotas Are No More

Ah, this one is as simple as it will be appreciated, I have not the slightest doubt about it.
Some of you occasionally stumbled on quotas: deliberate restrictions which capped the maximum number of entities (rules, trusted IPs, RPs, etc etc) that could be created within a given namespace. Well, rejoice: those restrictions are now all gone. Have fun!

Rules Accept Up To 2 Input Claims

Here I risk throwing myself into a somewhat lengthy explanation, which I know many of my colleagues will deem unnecessary (as in “why does he always take hours to get to the point?!”). In order to preempt their complaints, here are the sheer facts about the new rules:

From this release on, you have the option of specifying up to two claims as input for claims transformation rules. If claims triggering both input conditions are present (logical AND), then the rule will trigger. The input claims must both be from the same identity provider, as there is no flow that would allow ACS to gather claims from multiple sources at once; alternatively, they can mix one identity provider and ACS itself.

That’s all very straightforward. When you create your rule, specify your input claim conditions as usual; you’ll have the chance of adding a second input claim, by clicking on “Add a second input claim” as shown above.

That opens up a new area in the UI, where you can specify the details of the second input claim. It’s that easy! Note that only newly created rules will allow a second input claim, and that rules created via the Generate command won’t have the second input claim either.

One application of this new rule type is pretty obvious: you can express logic which depends on more than one factor (two, in fact) in the input token. As in “you get to be in the ‘Gold’ role only if you are in the group ‘Managers’ AND in the group ‘Partners’”, which was impossible to express before the new rule type was introduced. Unless you enlist the administrator of the IP in the process and convince them to add the rule in THEIR system, directly at the origin, but that would be cheating.
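The “Gold” example can be modeled with a toy rule evaluator. The Python sketch below only illustrates the AND semantics described above; the `Rule` class, the `apply_rules` function, and the claim type names are hypothetical and do not reflect the ACS rules engine or its management API.

```python
from dataclasses import dataclass
from typing import Optional

Claim = tuple[str, str]  # (claim type, claim value)


@dataclass(frozen=True)
class Rule:
    first_input: Claim
    output: Claim
    second_input: Optional[Claim] = None  # the new, optional second condition


def apply_rules(rules: list[Rule], input_claims: list[Claim]) -> list[Claim]:
    """Emit a rule's output claim only when ALL of its input claims are present."""
    present = set(input_claims)
    outputs = []
    for rule in rules:
        needed = {rule.first_input}
        if rule.second_input is not None:
            needed.add(rule.second_input)
        if needed <= present:  # logical AND over the configured inputs
            outputs.append(rule.output)
    return outputs


if __name__ == "__main__":
    gold = Rule(
        first_input=("group", "Managers"),
        second_input=("group", "Partners"),
        output=("role", "Gold"),
    )
    print(apply_rules([gold], [("group", "Managers"), ("group", "Partners")]))
    print(apply_rules([gold], [("group", "Managers")]))
```

A rule with a single input behaves exactly as before, since the set of needed claims then contains just one element.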

Another application is slightly less obvious: it is the chance of composing the current input with decisions taken in former iterations. I know, that’s not especially clear. That’s why I am throwing myself in the lengthy explanation in this other post, which is totally optional.

That’s it folks! Once again, don’t rely on this unreliable blog and read for yourself about the news in the announcement and the release notes. I am sure you’ll surprise us with real creative uses of those new features now at your disposal!

The Identity and Access Control Team posted Announcing July 2011 update to Access Control Service 2.0 on 7/25/2011:

Windows Azure AppFabric Access Control Service (ACS) 2.0 received a service update. All customers with ACS 2.0 namespaces automatically received this update, which primarily contained bug fixes in addition to a few new features and service changes:

Localization in eleven languages

The ACS management portal is now available in 11 languages. Newly-supported languages include Japanese, German, Traditional Chinese, Simplified Chinese, French, Italian, Spanish, Korean, Russian, and Brazilian Portuguese. Users can choose their desired language from the language chooser in the upper-right corner of the portal.

Rules now support up to two input claims

The ACS 2.0 rules engine now supports a new type of rule that allows up to two input claims to be configured, instead of only one input claim. Rules with two input claims can be used to reduce the overall number of rules required to perform complex user authorization functions. For more information on rules with two input claims, see http://msdn.microsoft.com/en-us/library/gg185923.aspx.

Encoding is now UTF-8 for all OAuth 2.0 responses

In the initial release of ACS 2.0, the character encoding set for all HTTP responses from the OAuth 2.0 endpoint was US-ASCII. In the July 2011 update, the character encoding of HTTP responses is now set to UTF-8 to support extended character sets.
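A short Python check shows why the encoding change matters for extended character sets. The `encodable` helper and the sample name are hypothetical; the point is simply that text containing non-ASCII characters cannot be represented in US-ASCII, while UTF-8 handles it and leaves plain ASCII text byte-for-byte unchanged.

```python
def encodable(text: str, charset: str) -> bool:
    """Return True if `text` can be represented in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    display_name = "José Müller 田中"          # hypothetical claim value
    print(encodable(display_name, "us-ascii"))  # False
    print(encodable(display_name, "utf-8"))     # True
    # ASCII-only text produces identical bytes under both encodings.
    print("client_id".encode("us-ascii") == "client_id".encode("utf-8"))  # True
```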

Quotas Removed

The previous quotas on configuration data have been removed in this update. This includes removal of all limitations on the number of identity providers, relying party applications, rule groups, rules, service identities, claim types, delegation records, issuers, keys, and addresses that can be created in a given ACS namespace.

Please use the following resources to learn more about this release:

Vittorio Bertocci's blog post

Release Notes

MSDN Documentation

CodePlex Site

For any questions or feedback please visit the Security for the Windows Azure Platform forum.

If you have not signed up for Windows Azure AppFabric and would like to start using these new capabilities, be sure to take advantage of our free trial offer. Just click on the image below and get started today!

The Windows Azure AppFabric Team (@Azure_AppFabric) posted Announcing the Windows Azure AppFabric July release on 7/25/2011:

Today we are excited to announce several updates and enhancements made to the Windows Azure AppFabric Management Portal and the Access Control service.

Management Portal

Localization

As announced on the Windows Azure blog: Windows Azure Platform Management Portal Updates Now Available, the Windows Azure Platform Management Portal now supports localization in 11 languages. The newly supported languages are Japanese, German, Traditional Chinese, Simplified Chinese, French, Italian, Spanish, Korean, Russian, and Brazilian Portuguese.

Users can choose their desired language from the language chooser in the top left pane of the Portal.

You can read about additional enhancements in the blog post: Windows Azure Platform Management Portal Updates Now Available.

Co-admin support

Customers can grant access to additional users (Co-Administrators) on the Windows Azure Management Portal as documented here: How to Setup Multiple Administrator Accounts.

These Co-administrators will now have access to the AppFabric section of the portal.

For any questions or feedback regarding the Management Portal please visit the Managing Services on the Windows Azure Platform forum.

Access Control

The following updates have been made to all ACS 2.0 namespaces.

Rules now support up to two input claims

The ACS 2.0 rules engine now supports a new type of rule that allows up to two input claims to be configured, instead of only one input claim. Rules with two input claims can be used to reduce the overall number of rules required to perform complex user authorization functions. For more information on rules with two input claims, see http://msdn.microsoft.com/en-us/library/gg185923.aspx.

Encoding is now UTF-8 for all OAuth 2.0 responses

In the initial release of ACS 2.0, the character encoding set for all HTTP responses from the OAuth 2.0 endpoint was US-ASCII. In the July 2011 release, the character encoding of HTTP responses is now set to UTF-8 to support extended character sets.

Quotas Removed

The previous quotas on configuration data have been removed in this release. This includes removal of all limitations on the number of identity providers, relying party applications, rule groups, rules, service identities, claim types, delegation records, issuers, keys, and addresses that can be created in a given ACS namespace.

Please use the following resources to learn more about this release:

Vittorio Bertocci's blog post

Release Notes

MSDN Documentation

CodePlex Site

For any questions or feedback regarding the Access Control service please visit the Security for the Windows Azure Platform forum.

If you have not signed up for Windows Azure AppFabric and would like to start using these new capabilities, be sure to take advantage of our free trial offer. Just click on the image below and get started today!

• Martin Tantow (@mtantow) asserted The Battle For The Social Cloud Has Just Started in a 7/25/2011 post to the CloudTimes blog:

The new technology trend next year will be about strong connections between various electronic gadgets, PC’s and post-PC devices. It will be about the cloud dominion using shared interfaces, wireless communications, cloud-based database storage, which will highlight connections without using cords.

This is what Google mobile apps is all about. With the launch of OS X Lion last Tuesday, Apple is further developing the App Store and iCloud. This is also what Microsoft is aiming for with its forthcoming launch of Windows 8, where the Redmond company is focused on creating a single, consolidated ID across Windows Phone 7, Office, Xbox and Skype.

Google’s mobile apps for Android and iOS already showcase these features and are continuously developing its web capabilities using their strong social network partners.

Google mobile apps have strong social networking features; however, everyone must understand that it isn’t anything like just another Twitter or Facebook account. With Google mobile apps it is not just viewing public profiles or just making friends on social network sites, rather it is about creating valid identities across varying cloud based services. Multiple devices like Facebook, Twitter, Amazon, Apple, Google and Microsoft will now have the capability to provide more substantial information like Social Security information, Driver’s license, car keys, passport and credit cards.

According to Edd Dumbill of O’Reilly Radar, this new feature will now have the capacity to provide user identity, sharing information, communication with other users, annotation capabilities and notification alerts. Dumbill said, Google will be “The social backbone of the web,” which is now an important part of the entire technology ecosystem.

Eric Schmidt, Google’s chairman now talks about Facebook as an identity drawer instead of a social network site. This is why Google+ was developed and started over a year ago to be at par with this feature of Facebook. According to Schmidt, mobile cloud computing is about customization and personalization, communication and identity.

Meanwhile, Facebook continues to get the upper hand on this with their new identity machines like Spotify for login credentials, Gawker Media for the communication threads, Bing for a more customized search and an integrated contact details management for Windows Phone 7.

Google+ webapps hopes to deliver the same if not exceed its performance in this area through its strong web partners who are all willing to take advantage of Facebook’s millions of accounts. They can also take advantage of Google’s ability to get access from their own mobile applications through Android.

These social network giants continue to battle for the cloud, now not just with their newest product applications but with their extended users and partners. The faceoff between these rivals was first seen in their video chat offerings: Facebook's is powered by Skype, while Google+ offers Hangouts. Microsoft is one perfect example of the link between these two giants. Skype, which happens to be one of Microsoft’s newest partnerships in providing Skype’s voice and video chat facilities, is also pushing partnerships with Google and Baidu.

Google’s strategy in all this is to have social platforms like Twitter, LinkedIn, Tumblr and Quora defend it against the solid partnership of Microsoft and Facebook, which is effectively putting pressure on their products.

It was actually their way of getting user’s attention that is captured in their clever ads that says, “What G+ is all about (psst!!! It’s not social).” Google is seriously building and “fixing collaborations and sharing across apps and across platforms,” Vincent Wong of Google said. Other than just the social networking features’ post and updates, Google focuses more on their strengths in Gmail, Documents, Calendar, Web, Reader and Photos. Wong said “That’s almost everything you use on your computer!”

Google’s strong alliances are evidenced by the release of the Google+ iOS app. Google plans to share its users’ content to Twitter and LinkedIn, and Google+ for iPhone will work like a notification machine that pulls everyone toward their own set of extended circles.

The Definition of the Future eStore

At the rate Google is pursuing its business strategy, sharing pictures, videos and links is already easy; with the Google+ mobile apps, sharing office documents, news, shopping discounts and maps will become possible too.

Google will become a virtual mobile wallet system once the identity system matures into a strong purchasing system. For example, if Silicon Valley were to host a basketball tournament on the public cloud, we would soon see a Final Four of Google, Apple with Twitter, Facebook with Microsoft, and Amazon.

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

No significant articles today.

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Wade Wegner (@WadeWegner) filled in a few more gaps in the ACS story with Windows Azure Toolkit for iOS Now Supports the Access Control Service in a 7/25/2011 post:

Today we released an update to our Windows Azure Toolkit for iOS that provides some significant enhancements – in particular, we now provide support for using the Windows Azure Access Control Service (ACS) from an iOS application. You can get all the bits here:

(updated) watoolkitios-lib

(updated) watoolkitios-samples

(updated) watoolkitios-doc

(new) watoolkitios-configutility

(new) cloudreadypackages

We first released this toolkit on May 6th, and since then we’ve released two minor updates and even accepted a merge request from the community. This toolkit has been a real pleasure to work on. Not only has it been fun to break out of the traditional Microsoft stack and learn about new languages and environments, but it’s also been great to introduce a lot of Objective-C and iOS developers to the power of Windows Azure.

There are three key aspects to version 1.2 of the iOS toolkit:

Cloud Ready Packages for Devices

Configuration Tool

Support for ACS

These three pieces are incredibly important when trying to develop iOS applications that use Windows Azure; consequently, let me try and explain each of these components and how they help to make development easier.

Cloud Ready Packages for Devices

One of the biggest challenges for an iOS developer using Windows Azure today is the inability to create a package that can be deployed to Windows Azure. To make this easier, we have pre-built four Cloud Ready Packages for Devices so that you – the iOS developer – don’t have to set up Windows 7 and run CSPACK. Instead, you simply have to download the most appropriate cloud ready package, update the .CSCFG file, then deploy through the Windows Azure Portal.

We have four “flavors” of the Cloud Ready Packages:

ACS + APNS – this version allows you to use the Access Control Service and register your certificate for the Apple Push Notification Service

ACS – this version allows you to use the Access Control Service

Membership + APNS – this version allows you to use a simple membership store in Windows Azure table storage for users and register your certificate for the Apple Push Notification Service

Membership – this version allows you to use a simple membership store in Windows Azure table storage for users

For more information on how to use and deploy these packages, take a look at this video on deploying the Cloud Ready Packages for Devices.

Configuration Tool

Along with the CSPKG you need a CSCFG to deploy your application to Windows Azure. The CSCFG file is an XML document that describes elements of your application to Windows Azure so that it is able to run your application correctly.

In Visual Studio we have tools that make it easy to update the CSCFG file without having to open up the XML, but of course you cannot do this on a Mac. To make this easier, we created a tool that you can use on the Mac to walk through and generate the CSCFG file with all the appropriate details. Once created, you can use this CSCFG file along with the downloaded CSPKG file to deploy your application.

In addition to creating the CSCFG file, the configuration tool will also update ACS with all the appropriate settings so that you can build & run your application quickly. For all the details, please take a look at Vittorio Bertocci’s post on Using the Windows Azure Access Control Service in iOS Applications.
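Because the CSCFG file is plain XML, it can also be edited with a short script on any platform. The following is only a minimal sketch in Python, assuming the standard ServiceConfiguration namespace; the service name, role name and setting names in the sample are hypothetical placeholders, not the ones shipped in the Cloud Ready Packages.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows Azure service configuration (.cscfg) files.
CSCFG_NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"

# Hypothetical sample; a real package ships its own role and setting names.
SAMPLE_CSCFG = f"""<ServiceConfiguration serviceName="SampleService" xmlns="{CSCFG_NS}">
  <Role name="WebRole">
    <Instances count="1" />
    <ConfigurationSettings>
      <Setting name="DataConnectionString" value="UseDevelopmentStorage=true" />
      <Setting name="AcsNamespace" value="" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>"""


def update_settings(cscfg_xml, role, new_values):
    """Return the configuration XML with the named settings of one role replaced."""
    ET.register_namespace("", CSCFG_NS)  # keep the default namespace on output
    root = ET.fromstring(cscfg_xml)
    ns = {"c": CSCFG_NS}
    remaining = dict(new_values)
    path = f"c:Role[@name='{role}']/c:ConfigurationSettings/c:Setting"
    for setting in root.findall(path, ns):
        name = setting.get("name")
        if name in remaining:
            setting.set("value", remaining.pop(name))
    if remaining:
        # Refuse to silently ignore a setting that the file does not declare.
        raise KeyError(f"settings not found in role {role!r}: {sorted(remaining)}")
    return ET.tostring(root, encoding="unicode")


updated = update_settings(SAMPLE_CSCFG, "WebRole", {"AcsNamespace": "iostest-walkthrough"})
print(updated)
```

The returned string can be written to a .cscfg file and uploaded with the matching CSPKG. Unlike the configuration tool, this sketch does not touch ACS itself.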

Support for ACS

Everything I’ve described above is designed to make it easier for an iOS developer to quickly and easily use the Access Control Service. To use the library for authenticating to ACS, it’s really quite simple:
// Create a client for your ACS namespace and realm.
NSLog(@"Initializing the Access Control Client...");
WACloudAccessControlClient *acsClient = [WACloudAccessControlClient accessControlClientForNamespace:@"iostest-walkthrough" realm:@"uri:wazmobiletoolkit"];

// Present the sign-in UI; the completion handler reports whether authentication succeeded.
[acsClient showInViewController:self.viewController allowsClose:NO withCompletionHandler:^(BOOL authenticated) {
    if (!authenticated)
    {
         NSLog(@"Error authenticating");
    }
    else
    {
         // On success, retrieve the shared access token.
         NSLog(@"Creating the authentication token...");
         WACloudAccessToken *token = [WACloudAccessControlClient sharedToken];
         /* Do something with the token here! */
    }
}];
I’ll post more walkthroughs and documentation shortly.

As always, please let me know what you think of the release! Your feedback is important to us, especially as it pertains to prioritizing future features and capabilities.

See the Windows Azure AppFabric: Apps, Access Control, WIF and Service Bus section above for more details of ACS support.

Cory Fowler (@SyntaxC4) described Configure Windows Azure Diagnostics–One Config To Rule Them All! in a 7/25/2011 post:

One thing each developer strives for is maintaining less code. I don’t know why anyone would code something more than once, or have to modify code when it could be placed in a configuration file, which is much simpler to modify, not to mention easier to share between projects.

At first, I considered creating a base class for RoleEntryPoint which contained my Diagnostics Configuration for my Windows Azure Projects. Although this would have done a decent job of facilitating my Diagnostics Configuration, it didn’t quite sit right with me: it would need either to load a large set of configurations from the Cloud Service Configuration file, which would need to be added to every project, or to hardcode a number of defaults which I deemed to be a good baseline set. Neither of these options was very appealing.

Configuration Settings could have been handled by creating a new Project Template within Visual Studio but this still seems like a larger effort than is needed.

I started thinking about how a Windows Azure VM Role would configure its diagnostics and started searching around the web for a solution. I’m glad I did, as I ended up finding this Golden Nugget. Enter the Windows Azure Diagnostics Configuration File:

Web Role Placement: Role [Root] Bin Directory
Worker Role Placement: Role Root
VM Role Placement: %ProgramFiles%\Windows Azure Integration Components\<VersionNumber>\Diagnostics

Windows Azure Diagnostics Configuration File

Schema: %ProgramFiles%\Windows Azure SDK\<VersionNumber>\schemas\DiagnosticsConfig201010.xsd
<DiagnosticMonitorConfiguration xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"
      configurationChangePollInterval="PT1M"
      overallQuotaInMB="4096">

   <DiagnosticInfrastructureLogs bufferQuotaInMB="1024"
      scheduledTransferLogLevelFilter="Verbose"
      scheduledTransferPeriod="PT1M" />

   <Logs bufferQuotaInMB="1024"
      scheduledTransferLogLevelFilter="Verbose"
      scheduledTransferPeriod="PT1M" />

   <Directories bufferQuotaInMB="1024" 
      scheduledTransferPeriod="PT1M">

   
      
      <CrashDumps container="wad-crash-dumps" directoryQuotaInMB="256" />

      <FailedRequestLogs container="wad-frq" directoryQuotaInMB="256" />

      <IISLogs container="wad-iis" directoryQuotaInMB="256" />

      
      
      <DataSources>
         <DirectoryConfiguration container="wad-panther" directoryQuotaInMB="128">
            
            <Absolute expandEnvironment="true" path="%SystemRoot%\system32\sysprep\Panther" />
         </DirectoryConfiguration>

         <DirectoryConfiguration container="wad-custom" directoryQuotaInMB="128">
            
            <LocalResource name="MyLoggingLocalResource" relativePath="logs" />
         </DirectoryConfiguration>

      </DataSources>
   </Directories>

   <PerformanceCounters bufferQuotaInMB="512" scheduledTransferPeriod="PT1M">
      
      <PerformanceCounterConfiguration 
         counterSpecifier="\Processor(_Total)\% Processor Time" sampleRate="PT5S" />
   </PerformanceCounters>

   <WindowsEventLog bufferQuotaInMB="512"
      scheduledTransferLogLevelFilter="Verbose"
      scheduledTransferPeriod="PT1M">
      
      <DataSource name="System!*" />
   </WindowsEventLog>

</DiagnosticMonitorConfiguration>
For more information, see the Windows Azure Library Article: Using the Windows Azure Diagnostics Configuration File.

I would suggest reading the Section: “Installing the Diagnostics Configuration File” which outlines how the diagnostics configuration file is handled alongside the DiagnosticManager configurations.
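One thing worth checking before deploying a file like the one above is that the individual bufferQuotaInMB values fit within overallQuotaInMB. The following is a minimal sketch in Python, assuming that the per-buffer quotas are expected to sum to no more than the overall quota; it uses an abbreviated copy of the configuration above with only the quota attributes kept.

```python
import xml.etree.ElementTree as ET

WAD_NS = "http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration"

# Abbreviated copy of the configuration file: only the quota attributes are kept.
CONFIG = f"""<DiagnosticMonitorConfiguration xmlns="{WAD_NS}"
      configurationChangePollInterval="PT1M" overallQuotaInMB="4096">
   <DiagnosticInfrastructureLogs bufferQuotaInMB="1024" />
   <Logs bufferQuotaInMB="1024" />
   <Directories bufferQuotaInMB="1024" />
   <PerformanceCounters bufferQuotaInMB="512" />
   <WindowsEventLog bufferQuotaInMB="512" />
</DiagnosticMonitorConfiguration>"""


def buffer_quotas(config_xml):
    """Return (overall quota, {buffer element name: quota}) in MB."""
    root = ET.fromstring(config_xml)
    overall = int(root.get("overallQuotaInMB"))
    quotas = {}
    for child in root:
        quota = child.get("bufferQuotaInMB")
        if quota is not None:
            local_name = child.tag.split("}", 1)[-1]  # strip the "{namespace}" prefix
            quotas[local_name] = int(quota)
    return overall, quotas


overall, quotas = buffer_quotas(CONFIG)
allocated = sum(quotas.values())
print(f"allocated {allocated} MB of {overall} MB")  # allocated 4096 MB of 4096 MB
if allocated > overall:
    print("warning: buffer quotas exceed the overall quota")
```

With the values shown in the post, the five buffers use exactly the 4096 MB overall quota, so any increase to one buffer would need a matching decrease elsewhere.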

Mary Jo Foley (@maryjofoley) reported Microsoft updates Azure toolkit for Apple's iOS to support federation with Facebook, Google and more on 7/25/2011 (see also the Windows Azure AppFabric: Apps, Access Control, WIF and Service Bus section):

Microsoft rolled out an updated version (1.2) of its Windows Azure toolkit for iOS on July 25.

Included in the latest update is support for the Azure Access Control Service (ACS). According to a blog post by Wade Wegner, an Azure Technical Evangelist, this ACS support “makes it extremely easy for an Objective-C and iOS developer to leverage ACS to provide identity federation to existing identity providers (such as Live ID, Facebook, Yahoo!, Google, and ADFS),” or Active Directory Federation Services.

ACS is one of the elements of Windows Azure AppFabric.

Today’s 1.2 release is the fourth update to the toolkit since Microsoft initially delivered it in early May. Microsoft also has released similar toolkits for Windows Phones and for Android phones.

Other new features in the 1.2 iOS toolkit include:

A number of “Cloud Ready Packages” that can be deployed to Azure, eliminating the need for developers to run Windows and/or Visual Studio to create the package being deployed to Microsoft’s cloud OS

A polished sample app that now supports ACS

A native OSX application that can both create the .CSCFG file with the correct values and configure the Access Control Service

Last week, Microsoft made available an alpha of an Azure toolkit for social-game developers.

More details about the Access Control Services features are in the Windows Azure AppFabric: Apps, Access Control, WIF and Service Bus section above.

Bill Wilder described his Talk: Architecture Patterns for Scalability and Reliability in Context of Azure Platform in a 7/21/2011 post (missed when posted):

I spoke last night to the Boston .NET Architecture Study Group about Architecture Patterns for Scalability and Reliability in Context of the Windows Azure cloud computing platform.

The deck is attached at the bottom, after a few links of interest for folks who want to dig deeper.

Command Query Responsibility Segregation (CQRS):

I’m a big fan of Bertrand Meyer’s work, and I just learned that CQRS is based on his earlier Command-Query Separation (CQS) principle

Martin Fowler has an entry on CQRS (recently added, I will now read this)

CQRS on Windows Azure (MSDN Magazine article)

.NET Rocks podcast: Episode 639 Udi Dahan Clarifies CQRS (That same podcast episode is also included in the Azure Top 40 feed that I curate: Azure Top 40 http://bit.ly/azuretop40)

http://abdullin.com/cqrs/

Sharding is hard:

Foursquare down for 11 hours from imbalanced shards: http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/ (though the High Scalability blog says it was longer)

NoSQL:

NoSQL and the Windows Azure Platform whitepaper from Microsoft (which I found to be a very good read)

CAP Theorem:

Wikipedia

HA MySQL (interesting links)

PowerPoint slide deck used during my talk:

DotNetArchStudyGroup_ArchitecturePatternsForScalabilityAndReliability_BillWilder

Janie Chang posted a description of Microsoft Research’s forthcoming Excel DataScope application as From Excel to the Cloud on 6/13/2011 (missed when posted):

On June 15, Microsoft’s Washington (D.C.) Innovation and Policy Center will host thought leaders, policymakers, analysts, and press for Microsoft Research’s D.C. TechFair 2011. The event showcases projects from Microsoft Research facilities around the world and provides a strategic forum for researchers to discuss with a broader community their work in advancing the state of the art in computing. Microsoft researchers and attendees alike will have an opportunity to exchange ideas on how technology and the policies concerning those technologies can improve our future.

Roger Barga, architect in the Cloud Research Engagement team within the eXtreme Computing Group (XCG), looks forward to the D.C. TechFair, where he will demonstrate Excel DataScope, a Windows Azure cloud service for researchers that simplifies exploration of large data sets.

The audience for the event—members of the Obama administration and staff, members of the U.S. Congress and staff, representatives of prominent think tanks, academics, and members of the media—also will get an opportunity to explore cutting-edge research projects in other areas, such as natural user interfaces, the environment, healthcare, and privacy and security.

Opportunities in Big Data

“Big data” is a term that refers to data sets whose size makes them difficult to manipulate using conventional systems and methods of storage, search, analysis, and visualization.

“Scientists tend to talk about big data as a problem,” Barga says, “but it’s an ideal opportunity for cloud computing. How large data sets can be addressed in the cloud is one of the important technology shifts that will emerge over the next several years. Microsoft Research’s Cloud Research Engagement projects push the frontiers of client and cloud computing by making investments in projects such as Excel DataScope to support researchers in the field.”

As one of XCG’s cloud-research projects, Excel DataScope offers data analytics as a service on Windows Azure. Users can upload data, extract patterns from data stored in the cloud, identify hidden associations, discover similarities, and forecast time series. The benefits of Excel DataScope go beyond access to computing resources: Users become productive almost immediately because Microsoft Excel acts as an easy-to-use interface for the service.

“Excel is a leading tool for data analysis today,” Barga explains. “With 500,000,000 licensed users, there are incredible numbers of people already comfortable with Excel. In fact, the spreadsheet itself is a fine metaphor for manipulating data. It’s friendly, and it allows different data types, so it’s a good technology ramp to the cloud for data analysts.”

The Excel DataScope research ribbon adds new algorithms and analytics that users can execute in Windows Azure.

The project enables the use of Excel on the cloud through an add-in that displays as a research ribbon in the spreadsheet’s toolbar. The ribbon provides seamless access to computing and storage on Windows Azure, where users can share data with collaborators around the world, discover and download related data sets, or sample from extremely large—terabyte-sized—data sets in the cloud. The Excel research ribbon also provides new data-analytics and machine-learning algorithms that execute transparently on Windows Azure, leveraging dozens or even hundreds of CPU cores.

“All Excel DataScope does,” Barga explains, “is start up an analytics algorithm in the cloud. You get to visualize the results and never have to move the actual data out of the cloud. We don’t want a data analyst to learn much more than the names of the algorithms and what they do. Users should just think that Excel has a new capability which opens up great opportunities for extracting new insights out of massive data sets.”

Opportunities for Access

While the cloud puts massively scalable computing resources into the hands of users, Barga notes that there are performance differences between cloud-based computers and supercomputers. Supercomputers and high-performance computing clusters are designed to share data at high frequencies and with low latency; in this respect, cloud computing is slower. Another key differentiator is storage services. High-performance clusters have storage arrays that provide high-speed, high-bandwidth pathways to storage. This is not the case in clouds, where storage often resides separately from computing nodes, with multiple routers or hops widening the distance between them.

Barga believes the highly available nature of cloud computing mitigates these differences and changes the game when it comes to opportunities for accessing computing resources.

“Our observation,” he muses, “is that while a cloud may be slower in some regards, you get the computing resources when you want and as much as you want. Many of the major data labs in the country, who have some of the biggest iron around, have wait times of weeks for jobs in the queue, so, in terms of elapsed time, your cloud job could have run ages ago, and your report could be written up by now.”

Successful university research groups and companies are interested in Microsoft’s Cloud Research Engagement initiatives because, while their labs have plenty of processing power, when there are situations that require fast decision making, such as pandemics or a new crop virus, it’s hard to secure enough CPU cycles on short notice. The cloud, therefore, is a game-changer even for groups that already have many computing resources.

Opportunities in Analysis

In the same way that Excel was the logical interface choice for the project, the research team selected algorithms to include on the research ribbon based on popularity.

“It turns out there’s a fairly consistent set of tasks,” Barga says, “to making sense out of data, whether in the social sciences, engineering, or oceanography. You need clustering, for example, to see how the data groups together. You want to look for outliers and run regression analysis to understand how the data trends. We felt if we implemented the top two dozen or so algorithms, we would have a good starter set. It’s extensible, so people can add their own analytics over time. That’s when things will get really exciting.”

Barga and his colleagues want users to write their own algorithms for the cloud, then upload and register the code on the service. Once that happens, the next time the user logs into Excel DataScope, the algorithm will appear on the research ribbon. When users begin to publish algorithms into a shared workspace, things get even more interesting.

“That’s the vision: for users to publish high-value or specialized algorithms in a viewable library that others can access to try out, install, and make part of their working set of algorithms,” Barga says. “This is where it gets exciting, when experts in particular domains start contributing algorithms that unlock the value of data.”

Opportunities for Collaboration

The ability to share both data and algorithms has been one of the project’s design goals. Excel DataScope includes the notion of security-enhanced workspaces, where users can upload data sets to share with research colleagues anywhere in the world. This opens opportunities for cross-discipline collaboration that is nothing less than transformative.

For example, say that an expert in oceanography works within a particular discipline and with data specific to an area of study. Understanding and predicting the effects of an oil spill, however, requires knowledge from multiple disciplines, such as ocean chemistry, biology, and ecology. Simulations of complex oceanographic and atmospheric models require mining, searching, and analysis of huge data sets in near-real time, across disciplines, as never before. The ability to collaborate and extract insight from large data sets is part of a shift from traditional paradigms of theory, experimentation, and computation to data-driven discovery.

Barga (pictured at right) and colleagues from the Cloud Research Engagement team are busy publicizing the resources they offer to researchers in the field. The response they receive from talks and introductory videos validates the research community’s interest in data-driven discovery.

“We hear a lot of excitement,” he says, “and there are always researchers in the back of the room who want to know whether we have algorithms or data sets for a particular domain. We’ll say, ‘Sorry, we don’t have those,’ but they’ll say: ‘No, that’s good. We have expertise in that area, and we’d love to build a library of algorithms, contribute data, and make it available to the rest of the world,’ which is very encouraging—not to mention very cool.”

Democratizing Access to Data

During the D.C. TechFair, Barga will discuss the value of cloud computing in the context of the Open Government Directive, which makes data available to the public through websites such as data.gov.

“The government is contributing data sets,” he explains, “and we’d like to engage with scientists who are willing to add other data sets to data.gov. But data by itself is not enough. We’d like to see people proposing analytics associated with those data sets, so that when you go to data.gov, you also find useful algorithms to run against the data. We’d like to talk to scientists who want to extract insights or craft policies based on that data.”

At the moment, relatively few scientists or organizations can perform analysis of big data, either because of a lack of knowledge or a lack of resources. This distances data from the people who need to make decisions. The team plans to expand the Excel DataScope initiative later this year to include more of the research community and to release a programming guide that will explain how to write algorithms that scale out on Azure.

Barga sees the project as revolutionary: It democratizes access to data and, consequently, insights into data. He envisions a future data market in which users can find data sets and, with a few mouse clicks, select algorithms that the system identifies as relevant to the selected data, then start analyzing.

“We have architected a pathway from Excel to the cloud,” he says. “We have an infrastructure that sets the stage for a future where a decision maker can select from huge data sets and ask about trends in healthcare, poverty, or education. When we get there, it will have a huge economic impact, not just for scientific research, but also for businesses and for the country. This is a dialogue we’re trying to open and the space we are trying to move into with this project.”

Jared Jackson presented a 00:02:57 Cloud Data Analytics from Excel Webcast on 6/12/2011 (missed when posted):

In this video, Jared Jackson, researcher in the eXtreme Computing Group at Microsoft Research, provides an overview of the features of Excel DataScope, a tool that enables researchers and data analysts to seamlessly access the resources of the cloud, via Windows Azure, from the familiar interface of Microsoft Excel. By using Excel DataScope, you can download and use extremely large data collections; extract insight by using machine learning and data analytics libraries; and take action exploring the data, sharing the data, and publishing the data to the cloud.

Visual Studio LightSwitch and Entity Framework 4.1+

The ADO.NET Team announced EF 4.1 Update 1 Released on 7/25/2011:

Back in April we announced the release of Entity Framework 4.1 (EF 4.1); today we are releasing Entity Framework 4.1 – Update 1. This is a refresh of the EF 4.1 release that includes a small number of bug fixes and some new types to support the upcoming migrations preview.

What’s in Update 1?

Update 1 includes a small set of changes including:

Bug fix to remove the need to specify ‘Persist Security Info=True’ in the connection string when using SQL authentication. In the EF 4.1 release ‘Persist Security Info’ was required for Code First to be able to create a database for a connection using SQL Authentication. The update includes a fix to remove this requirement. Note that ‘Persist Security Info’ is still required if you construct a DbContext using a DbConnection instance that has already been opened and closed.

Introduction of new types to facilitate design-time tools for Code First. Update 1 introduces a set of types to make it easier for design time tools to interact with derived DbContexts:

DbContextInfo can be used to instantiate and interact with a derived context, as well as to determine information about the origin of the connection string, etc.

IDbContextFactory<TContext> is used to let DbContextInfo know how to construct derived DbContext types that do not expose a default constructor. If your context does not expose a default constructor, then an implementation of IDbContextFactory should be included in the same assembly as your derived context type.

How Do I Get Update 1?

Entity Framework 4.1 – Update 1 is available in a couple of places:

Download the stand alone installer
Note: This is a complete install of EF 4.1 and does not require a previous installation of the original EF 4.1 RTM.

Add or upgrade the ‘EntityFramework’ NuGet package
Note: If you have previously run the EF 4.1 stand alone installer you will need to upgrade or remove the installation before using the updated NuGet package. This is because the installer will add the EF 4.1 assembly to the Global Assembly Cache (GAC). When available the GAC’d version of the assembly will be used at runtime.
Note: The NuGet package only includes the EF 4.1 runtime and does not include the Visual Studio item templates for using DbContext with Model First and Database First development.

Getting Started with EF 4.1

There are a number of resources to help you get started with EF 4.1:

ADO.NET Entity Framework page on the MSDN Data Developer Center
There is lots of great new content on this site, including ‘Getting Started’ videos for the new features in EF 4.1

MSDN Documentation

ADO.NET Entity Framework Forum

Code First walkthrough

Model First / Database First walkthrough

Support

This release can be used in a live operating environment subject to the terms in the License Terms. The ADO.NET Entity Framework Forum can be used for questions relating to this release.

Windows Azure Infrastructure and DevOps

Bruce Kyle reported Windows Azure Supports NIST Standards Acceleration for Cloud Computing on 7/25/2011:

Microsoft is participating in the National Institute of Standards and Technology (NIST) initiative to jumpstart the adoption of cloud computing standards, called Standards Acceleration to Jumpstart the Adoption of Cloud Computing (SAJACC). The goal is to formulate a roadmap for adoption of high-quality cloud computing standards.

One way they do this is by providing working examples to show how key cloud computing use cases can be supported by interfaces implemented by various cloud services available today.

NIST works with industry, government agencies, and academia. They use an open and ongoing process of collecting and generating cloud system specifications. The hope is to have these resources serve to both accelerate the development of standards and reduce technical uncertainty during the interim adoption period before many cloud computing standards are formalized.

By using the Windows Azure Service Management REST APIs, we are able to manage services and run simple operations, including simple CRUD operations, and to handle simple authentication and authorization using certificates. Our service management components are built with RESTful principles and support multiple languages and runtimes, including Java, PHP and .NET, as well as IDEs including Eclipse and Visual Studio.
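As a rough illustration of what such a call looks like from a language other than .NET, here is a minimal sketch in Python that builds a List Hosted Services request. It assumes the management endpoint and the x-ms-version header value current at the time of writing; the subscription ID is a placeholder, the PEM path passed to send is hypothetical, and the sketch itself never contacts the network.

```python
import ssl
import urllib.request

MANAGEMENT_HOST = "https://management.core.windows.net"
API_VERSION = "2011-06-01"  # sent as the x-ms-version request header


def list_hosted_services_request(subscription_id):
    """Build (but do not send) a GET request that lists hosted services."""
    url = f"{MANAGEMENT_HOST}/{subscription_id}/services/hostedservices"
    return urllib.request.Request(url, headers={"x-ms-version": API_VERSION}, method="GET")


def send(request, pem_path):
    """Send the request, authenticating with a management certificate.

    pem_path is assumed to point at a PEM file holding the certificate and its private key.
    """
    context = ssl.create_default_context()
    context.load_cert_chain(pem_path)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode("utf-8")


# Placeholder subscription ID; nothing is sent over the network here.
req = list_hosted_services_request("00000000-0000-0000-0000-000000000000")
print(req.get_method(), req.full_url)
```

Calling send(req, "management.pem") would perform the request with certificate authentication, provided the certificate has been uploaded to the subscription.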

For more information, see Windows Azure Supports NIST Use Cases using Java.

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

• Martin Tantow (@mtantow) posted Piston Cloud Computing Takes on Private Cloud Security on 7/25/2011:

Piston Cloud Computing recently raised $4.5 million in funding to supply software in the crowded private cloud computing space. Earlier this year, Piston participated for several months in NASA’s Nebula project, and for a while its software has been the beta favored by early users for testing. Joshua McKenty, CEO and co-founder of Piston, said the company will announce a new set of products in the near future; in the meantime, it remains a small group of developers working on its cloud platform.

It is a fact that user experience varies even among the top tablet PCs. To level the field, take a close look at the strengths and weaknesses of Android/Honeycomb, Apple iOS and RIM’s QNX OS. McKenty, one of the founders of OpenStack, was the brains behind Nebula. OpenStack has worked with Rackspace as well as NASA on service-for-a-fee cloud infrastructure and hopes to become the standard for cloud-based environments preferred by users and suppliers.

On the NASA project, McKenty said, “Most of what we did at NASA around security couldn’t be released.” This is understandable, since any leak from the security domain would compromise the entire system; the government gave specific instructions that no such information should enter the public domain.

In general, an IT platform should operate as a business within a business: the company provides resources and expects a good return on its investment. McKenty said, “Some aspects of security are being addressed (in OpenStack) and others are not. Security is a logical place to build a start-up, open source business.”

Piston’s products will be offered much like the components of OpenStack, which is built on Python. The company hopes to strike a balance between being a major contributor to OpenStack and reserving some important code for Piston as a value-added feature of its product.

According to McKenty, OpenStack’s Project Policy Board is the governing body for Piston. He said that in the last survey Piston ranked third among contributors of OpenStack code, and that he would be glad to see it reach number one.

On adding security to big data deployments that are already set up, McKenty stated that although the cloud is the perfect place for huge data sets, it is very complex to add security once the cloud platform is already in place. In addition, security added after the data is in place may not be as efficient as security built in at the same time. “You have to take the right architectural approach to keep them both performing well,” McKenty said.

Another cloud player based on OpenStack is Cloud.com, which combines it with its own Java-based extensions to create CloudStack. Cloud.com was recently acquired by Citrix. Chris Pinkham, Amazon’s former VP of engineering, has a new project to create a generic edition of Amazon’s services, to avoid being tied to Amazon’s virtual machine API. To date, Piston has partnered with Hummer Winblad, Divergent Ventures, and True Ventures (see quick announcement here) for its funding.

Cloud Security and Governance

Jon Shende prefixed his Risk and Its Impact on Security Within the Cloud - Part 1 article of 7/25/2011 for the Cloud Security Journal with “The effect of people and processes on cloud technologies”:

These days when we hear the term "cloud computing" there is an understanding that we are speaking about a flexible, cost-effective, and proven delivery platform that is being utilized or will be utilized to provide IT services over the Internet. As end users or researchers of all things "cloud" we expect to hear about how quickly processes, applications, and services can be provisioned, deployed and scaled, as needed, regardless of users' physical locations.

When we think of the typical traditional IT security environment, we have to be cognizant of the potential for an onslaught of attacks, be they zero-day exploits, ever-evolving malware engines, or the increase in attacks via social engineering. The challenge for any security professional is to develop and ensure as secure an IT system as possible.

Thoughts on Traditional Security and Risk
Common discussions within the spectrum of IT security are risks, threats and vulnerability, and an awareness of the impact of people and processes on technologies. Having had opportunities to work on data center migrations as well as cloud services infrastructures, a primary question of mine has been: what then of the cloud and cloud security and the related risk derived from selected services being outsourced to a third-party provider?

ISO 27005 defines risk as a "potential that a given threat will exploit vulnerabilities of an asset or group of assets and thereby cause harm to the organization."

In terms of an organization, risk can be mitigated, transferred or accepted. Calculating risk usually involves:

Calculating the value of an asset

Giving it a weight of importance in order to prioritize its ranking for analysis

Conducting a vulnerability analysis

Conducting an impact analysis

Determining its associated risk.
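
As a rough illustration of the steps above, the following Python sketch ranks assets by a simple risk score. The scoring formula (asset value × weight × vulnerability likelihood × impact), the 0-to-1 scales, and the sample assets are all assumptions made for illustration; they are not taken from ISO 27005 or from the article.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    value: float          # estimated asset value (hypothetical units)
    weight: float         # importance weight used for ranking, 0.0 to 1.0
    vulnerability: float  # assumed likelihood of exploitation, 0.0 to 1.0
    impact: float         # assumed fraction of value lost if exploited, 0.0 to 1.0


def risk_score(asset: Asset) -> float:
    """Assumed formula: weighted expected loss for a single asset."""
    return asset.value * asset.weight * asset.vulnerability * asset.impact


def rank_by_risk(assets):
    """Return the assets sorted from highest to lowest risk score."""
    return sorted(assets, key=risk_score, reverse=True)


# Hypothetical sample data, not drawn from any real assessment.
assets = [
    Asset("build server", 50_000, 0.5, 0.25, 0.25),
    Asset("customer database", 500_000, 1.0, 0.25, 0.5),
    Asset("public web site", 100_000, 0.5, 0.5, 0.5),
]

for asset in rank_by_risk(assets):
    print(f"{asset.name}: {risk_score(asset):,.2f}")
```

A multiplicative score is only one possible choice; an organization could just as well use a qualitative matrix or a different weighting, depending on how it decides to mitigate, transfer or accept each risk.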

As a security consultant, I also like the balanced scorecard as proposed by Robert Kaplan and David Norton, especially when aimed at demonstrating compliance with policies that will protect my organization from loss.

Cloud Security and Risk
In terms of cloud security, one key point to remember is that there is an infrastructure somewhere that supports and provides cloud computing services. In other words the same mitigating factors that apply to ensure security within a traditional IT infrastructure will apply to a cloud provider's infrastructure.

All this is well and good within the traditional IT environment, but how then can we assess, or even forecast for and/or mitigate risk when we are working with a cloud computing system? Some argue that "cloud authorization systems are not robust enough with as little as a password and username to gain access to the system, in many private clouds; usernames can be very similar, degrading the authorization measures" (Curran, Carlin 2011).

We have had the arguments that the concentrated IT security capabilities at a cloud service provider (CSP) can be beneficial to a cloud service customer (CSC); however, businesses are in the realm of business to ensure a profit from their engagements. One study by P. McFedries (2008) found that "disciplined companies achieved on average an 18% reduction in their IT budget from cloud computing and a 16% reduction in data center power costs."

To mitigate this concern, a CSC will need to ensure that their CSP defines the cloud environment as the customer moves beyond their "protected" traditional perimeter. Both organizations need to ensure that all high risk security impact to the customer organization meets or exceeds the customer organization's security policy and requirements and their proposed mitigation measures. As part of a "cloud policy" a CSC security team should identify and understand any cloud-specific security risks and their potential impact to the organization.

Additionally a CSP should leverage their economies of scale when it comes to cloud security (assets, personnel, experience) to offer a CSC an amalgamation of security segments and security subsystem boundaries. Any proficient IT Security practitioner then can benefit from the advantage of leveraging a cloud provider's security model. However, when it applies to business needs the 'one size fits all' cloud security strategy will not work.

Of utmost importance when looking to engage the services of a cloud provider is gaining a clear picture of how the provider will ensure the integrity of data to be held within their cloud service/s. That said, all the security in the world would not prevent the seizure of equipment by government agencies investigating a crime. Such a seizure can interrupt business operations or even totally halt business for an innocent CSC sharing a server that hosts the VM of an entity under investigation. One way to manage the impact on a CSC function within the cloud, as suggested by Chen, Paxson and Katz (2010), is the concept of "mutual auditability."

The researchers further went on to state that CSPs and CSCs will need to develop a mutual trust model, "in a bilateral or multilateral fashion." The outcome of such a model will allow a CSP "in search and seizure incidents to demonstrate to law enforcement that they have turned over all relevant evidence, and prove to users that they turned over only the necessary evidence and nothing more."

Is it then feasible for a CSC to calculate the risk associated with such an event and ensure that there is a continuity plan in place to mitigate such an incident? That will depend on the business impacted.

Another cause for concern is that cloud computing introduces a shared resource environment in which an attacker can exploit covert and side channels.

Risks such as this need to be acknowledged and addressed when documenting the CSP-CSC Service Level Agreement (SLA). This of course may be in addition to demands with respect to concerns for Availability, Integrity, Security, Privacy and Reliability. Would a CSC feel assured that their data is safe when a CSP provides assurance that they follow the traditional static based risk assessment models?

I argue not, since we are working within a dynamic environment. According to Kaliski, Ristenpart, Tromer, Shacham, and Savage (2009) "neighbouring content is more at risk of contamination, or at least compromise, from the content in nearby containers."

So how then should we calculate risk within the Cloud? According to Kaliski and Pauley of the EMC Corporation, "just as the cloud is "on-demand," increasingly, risk assessments applied to the cloud will need to be "on-demand" as well."

The suggestion by Kaliski and Pauley was to implement a risk as a service model that integrates an autonomic system, which must be able to effectively measure its environment as well as "adjust its behaviour based on goals and the current context".

Of course this is a theoretical model and further research will have to be conducted to gather data points and "an autonomic manager that analyses risks and implements changes".

For now, I believe that if we can utilize a portion of a static risk assessment, define specific controls and control objectives, and map these to the controls within a CSP or define them during the SLA process, a CSC can then observe control activities that manage and/or mitigate risk to their data housed at the CSP.

Traditional governance and compliance requirements should also still apply to the CSP, e.g., there must be a third-party auditor for the CSP cloud services and these services should have industry recognized security certificates where applicable.

Conclusion
Some things that a CSC needs to be cognizant of with regard to cloud security, in addition to traditional IT security measures with a CSP, are:

The ability of the CSP to support dynamic data operation for cloud data storage applications while ensuring the security and integrity of data at rest

Have a process in place to challenge the cloud storage servers to ensure the correctness of the cloud data with the ability of original files being able to be recovered by interacting with the server (Wang 2011)

Encryption-on-demand ability or other encryption metrics that meet an industry standard, e.g., NIST

A privacy-preserving public auditing system for data storage security in Cloud Computing (W. L. Wang 2010)

Cloud application security policies automation

Cloud model-driven security process, broken down in the following steps: policy modelling, automatic policy generation, policy enforcement, policy auditing, and automatic update (Lang 2011)
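
The idea of challenging a storage server (the second item above) can be illustrated with a much-simplified spot check. The sketch below is not the scheme from the cited Wang et al. papers, which rely on more elaborate cryptographic constructions. It assumes the client precomputes a few digests over randomly chosen blocks, each salted with a fresh nonce, before uploading the data, and later asks the server to recompute one of them. All function and class names here are hypothetical.

```python
import hashlib
import secrets

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(nonce: bytes, block: bytes) -> str:
    """Digest of a block salted with a per-challenge nonce."""
    return hashlib.sha256(nonce + block).hexdigest()


def make_challenges(data: bytes, count: int):
    """Client side, before upload: precompute (index, nonce, expected) tokens."""
    blocks = split_blocks(data)
    challenges = []
    for _ in range(count):
        index = secrets.randbelow(len(blocks))
        nonce = secrets.token_bytes(16)
        challenges.append((index, nonce, digest(nonce, blocks[index])))
    return challenges


class StorageServer:
    """Hypothetical stand-in for the provider's storage service."""

    def __init__(self, data: bytes):
        self.blocks = split_blocks(data)

    def respond(self, index: int, nonce: bytes) -> str:
        # The server cannot answer without holding the block itself,
        # because the nonce is unknown until the challenge arrives.
        return digest(nonce, self.blocks[index])


def verify(server: StorageServer, challenge) -> bool:
    """Client side: send one challenge and compare the answer."""
    index, nonce, expected = challenge
    return secrets.compare_digest(server.respond(index, nonce), expected)


original = b"important customer records"
challenges = make_challenges(original, count=3)

honest = StorageServer(original)
corrupted = StorageServer(b"X" * len(original))

print(all(verify(honest, c) for c in challenges))
print(any(verify(corrupted, c) for c in challenges))
```

Each precomputed token should be used only once, and a spot check like this only detects corruption in the blocks that happen to be sampled; it also does not address the recovery of original files or the dynamic data operations mentioned in the first item.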

Works Cited

Carlin, Sean, and Kevin Curran. "Cloud Computing Security." International Journal of Ambient Computing and Intelligence, 2011: 38-46.

Lang, Ulrich. Model-driven cloud security. IBM, 2011.

Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. Hey, You, Get Off of My Cloud! Exploring Information Leakage in Third-Party Compute Clouds. CCS 2009, ACM Press, 2009.

Wang, Wang, Li, Ren. Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing. IEEE INFOCOM, 2010.

Wang, Wang, Li, Ren, Lou. "Enabling Public Verifiability and Data Dynamics for Storage Security in Cloud Computing." Chicago, 2011.

Yanpei Chen, Vern Paxson, Randy H. Katz. What's New About Cloud Computing Security? Berkeley: University of California at Berkeley, 2010.

Marcia Savage reported ISACA releases cloud computing governance guide to the SearchCloudSecurity.com site on 7/25/2011:

ISACA released a new guide designed to help organizations understand how to implement effective cloud computing governance.

The guide, IT Control Objectives for Cloud Computing, aims to help readers understand cloud computing and how to build the relevant controls and governance around their cloud environments. It also provides guidance for companies considering cloud services.

Governance becomes more critical than ever for organizations utilizing cloud services, according to the guide from ISACA, a non-profit global organization focused on information systems assurance and security, and enterprise governance. Companies need to implement a cloud computing governance program to effectively manage increasing risk and multiple regulations, and ensure continuity of critical business processes in the cloud, according to ISACA.

In these economic times, executive management is excited about the potential for the cloud to reduce costs and increase the value of IT, but “getting that value is part of a good governance program,” said Jeff Spivey, international vice president of ISACA, and president of Security Risk Management Inc., a consulting firm based in Charlotte, N.C. “And making sure when you are getting the value, that you’re also managing the risk as opposed to jumping blindly off the cliff and hoping there’s water down there,” he added.

The cloud computing governance guide outlines how COBIT and other IT governance tools developed by ISACA can help organizations in managing cloud environments. Spivey said COBIT can be applied to a number of different scenarios, including cloud technologies. ISACA is accepting public comment on the latest version, COBIT 5, through July 31.

The ISACA guide is complementary to work from the Cloud Security Alliance by focusing on ISACA’s strength in governance, said Spivey, who was a founding member of the CSA.

As the cloud evolves and companies increasingly adopt cloud services, there’s still a lot of ambiguity around the topic and a need for guidance, he said. The ISACA guide can help organizations make sure they have the right controls in place and assist others contemplating the cloud to understand its complexities, he added.

IT Control Objectives for Cloud Computing, the third in ISACA’s IT Control Objectives series, is available at www.isaca.org/ITCOcloud. The first book in the series focused on Sarbanes-Oxley.

Full disclosure: I’m a paid contributor to SearchCloudComputing.TechTarget.com, a sister publication.

Cloud Computing Events

• Marcelo Lopez Ruiz (@mlrdev) reported on 7/25/2011 that he’ll present datajs at DevCon5 on 7/27/2011 at 11:45 AM to 1:00 PM:

Later this week I'll be speaking at DevCon5 in New York. We'll look at how the browser landscape is evolving and where it's going, and present some of the work we've done in layering conventions over REST in producing OData, as well as the work we've been doing in datajs to leverage the increase in capabilities.

Hope to see any readers of this blog at the conference - drop me a comment if you want to meet up during/after the conference, or catch me on Twitter at @mlrdev.

Marcelo’s session is named HTML5 and the future JavaScript platform.

O’Reilly Media’s Open Source Convention (OSCON, @oscon) began 7/25/2011 with the following Cloud Computing track:

Open source played a vital role in making cloud technologies possible. It underpins much cloud infrastructure, has driven down costs, and led innovation in cloud management. Yet the cloud model of software as a service presents direct challenges to the freedom and principles of open source.

9:00am Monday, 07/25/2011

Learning Puppet - A Tutorial for Beginners

Location: D137/138

Garrett Honeycutt (Puppet Labs)

Puppet is an enterprise systems management platform that standardizes the way you deploy and manage infrastructure in the enterprise and the cloud. By the end of the tutorial we’ll produce a simple Puppet architecture that can manage a few services and applications as well as discuss best practices and common design patterns. Read more.

1:30pm Monday, 07/25/2011

Getting Started with Chef

Location: D137/138

Joshua Timberman (Opscode, Inc.), Aaron Peterson (Opscode, Inc.)

Chef is a powerful open source system integration framework, built to bring the benefits of configuration management to the entire infrastructure. This tutorial will cover key concepts and how to get started using Chef to manage systems and integrate them together to build fully automated infrastructure. Read more.

1:30pm Tuesday, 07/26/2011

Google App Engine Workshop

Location: Portland 255

wesley chun (Google)

Google App Engine is an application development and cloud-hosting platform that lets users create apps to run in Google's datacenters. In this 3-part tutorial, we'll give a 1-hour intro talk on cloud computing and App Engine, a 90-100 minute introductory codelab to get your feet wet with App Engine development, and finally conclude with about a half-hour intro to some of App Engine's newest features! Read more.

11:30am Wednesday, 07/27/2011

Stratos - an Open Source Cloud Platform

Location: Portland 251

Paul Fremantle (WSO2)

Cloud is the biggest user of Open Source, but also a threat - people are building their apps on Cloud Platforms that are closed. Stratos is an Apache Licensed project for a Cloud Platform-as-a-Service. We will take a deep dive into this multi-tenant, elastic, metered cloud runtime that includes Tomcat, ESB, Registry and more. This will be a detailed session aimed at developers and infra experts. Read more.

1:40pm Wednesday, 07/27/2011

Introduction to OpenStack

Location: Portland 251

Eric Day (Rackspace Cloud), James Turnbull (Puppet Labs)

The OpenStack project was launched last summer during OSCON by Rackspace, NASA, and a number of other cloud technology leaders in an effort to build a fully-open cloud computing platform. It is a collection of scalable, secure, standards-based projects consisting of compute, storage, images, and more. This session will introduce the projects, the principles behind it, and how to get started. Read more.

2:30pm Wednesday, 07/27/2011

The Open Compute Project

Location: Portland 251

Amir Michael (Facebook)

A behind the scenes view as to why and how Facebook implemented the Open Compute Project, an open community focused on data center design, and the resulting radical reduction in data center power consumption the project offers. Read more.

4:10pm Wednesday, 07/27/2011

Using OpenStack APIs: Present And Future

Location: Portland 251

Wade Minter (TeamSnap), Michael Mayo (Rackspace)

OpenStack is an effort to build a completely open, community driven, enterprise-level cloud computing and storage platform. Not only is the technology open, but the APIs are as well. This session will show how to leverage the power of the current compute and storage APIs, as well as look down the road to future releases. Read more.

5:00pm Wednesday, 07/27/2011

Fog, or How I Learned to Stop Worrying and Love the Cloud

Location: Portland 251

Wesley Beary (Engine Yard)

Cloud computing scared the crap out of me - the quirks and nightmares of provisioning cloud computing, dns, storage, etc on AWS, Terremark, Rackspace, etc - until I took the bull by the horns. Come see me demonstrate tools and examples that will allow you to skip the headaches and cut straight to the cloud. Read more.

10:40am Thursday, 07/28/2011

Freeing the Cloud, One Service at a Time

Location: D139/140

Francois Marier (Catalyst IT)

An approach to building freedom-respecting online services and a presentation of Libravatar, a federated clone of the Gravatar profile image hosting service. Read more.

11:30am Thursday, 07/28/2011

From Inception to Acquisition: One Startup's Journey through the Cloud

Location: D139/140

Patrick Lightbody (BrowserMob)

Launched in December 2008, BrowserMob set out to change the way load testing is done - all using the cloud and open source. Learn from the founder how he built a high performance testing product, and how the operational support the cloud provided and speed to market of open source enabled the company to not only profit from day one, but to be acquired within a year and a half of its launch. Read more.

1:40pm Thursday, 07/28/2011

Utility and Automation: Low Overhead Operations with Amazon and Puppet

Location: D139/140

James Loope (Janrain)

This session will demonstrate an example scenario from Janrain and discuss the implications, benefits, and pitfalls of moving to a utility cloud computing architecture from a traditional co-located hosting environment. Read more.

2:30pm Thursday, 07/28/2011

Achieving Hybrid Cloud Mobility with OpenStack and XCP

Location: D139/140

Paul Voccio (Rackspace), Ewan Mellor (Citrix Systems, Inc.)

There are many challenges to being able to move virtual machines to and from your datacenter and public cloud hosting service providers (in other words to obtain hybrid cloud mobility). In this session, members of the OpenStack and Xen.org communities discuss the open source and open standards approach that they are taking and include some of the challenges they face. Read more.

4:10pm Thursday, 07/28/2011

Dropping ACID: Eating Data in a Web 2.0 Cloud World

Location: D139/140

Stewart Smith (Percona)

Those who cannot remember the past are condemned to repeat it. This is part survey, part critique of the various Atomicity, Consistency, Isolation and Durability models available from various modern databases and data stores used in modern Web and Cloud environments. Read more.

5:00pm Thursday, 07/28/2011

Going Green with Linux

Location: D139/140

Matthew Garrett (Red Hat)

Real clouds look fluffy but mass up to a million tonnes. Virtual clouds look cheap but consume the output of 10 nuclear power stations. Real life factors can seriously influence your data center requirements. How can Linux help you? Read more.

10:00am Friday, 07/29/2011

Manage Distributed Systems with Zookeeper

Location: Portland 251

Tom Hanlon (Cloudera)

Is your application distributed? How have you chosen to deal with the implications of this distribution? In this session we will introduce and explore ZooKeeper. Originally developed at Yahoo and used by HBase, ZooKeeper is a wonderful tool. ZooKeeper is straightforward and provides an interface allowing for easy configuration and use. Read more.

11:00am Friday, 07/29/2011

Managing Open Source Releases of a Cloud Platform

Location: Portland 251

Adam Kalsey (Tropo)

Tropo's platform for voice, SMS, and IM is a hosted cloud service, and we've opened the source of the core platform. Hear the lessons learned from running a cloud service and a parallel open source project. We did a lot wrong, and we got many things right. We'll discuss what we've learned about product management, release management, marketing, and third party licensing. Read more.

A feed of videos of OSCON 2011 is available here.

Other Cloud Computing Platforms and Services

• Alex Williams reported Big Data Demands Leading to Greater NoSQL Acceptance by Corporate Developers in a 7/25/2011 post to the ServicesANGLE blog:

NoSQL is gaining acceptance by corporate enterprise developers – arguably even more so than the overall developer community.

About 56% of enterprise developers report at least some use of NoSQL and 63% plan to use it in the next two years. That’s according to Evans Data Corp.’s annual developer survey, which found that the big driver for the interest is the need to manage big data. NoSQL is generally viewed as a next generation database environment. It is non-relational, distributed, open-source and horizontally scalable.

Evans Data CEO Janel Garvin said to Integration Developer News that NoSQL is “considerably stronger” in the enterprise segment than within the general developer population where 43% expect to use NoSQL.

This defies the conventional belief that the enterprise community is behind in terms of NoSQL adoption. Its strength though is testament to a growing interest by IT operations in application development and working with developers to provide tools for creating services where traditional methods fail. The failures of traditional architecture are becoming increasingly apparent as the capability to scale is hindered by the inability to manage big data loads and the new generation of applications that increasingly requires software that is decoupled from the hardware. The movement is heralding an acceptance for such technologies as Hadoop, which have historically been used to manage large-scale Web oriented environments.

The Evans Data survey also pointed to other similar trends that show how IT is going through a major transformation.

According to the survey, about 40% of all North American developers are now working on apps for wireless devices with 73% planning to extend enterprise apps to mobile devices within the next 12 months.

“Extending the enterprise to mobile clients is one of the hottest areas of development in the mobile space today,” Garvin told IDN. “Enterprises have to accommodate a large variety of clients today – and most of them are mobile. Enterprise messaging and collaboration tools are considered the most important enterprise apps for mobile devices currently,” she added.

In terms of the cloud, developers say they are still reluctant to move beyond test and development as security is still a major concern. Instead, they are opting to keep sensitive data stored on-premise while doing transactions in the cloud.

The services providers of the world need to increase their investment in developers who understand NoSQL and even to a larger extent, big data. These are the people who will help IT operations with managing information and providing leading edge perspectives about IT transformation.

• Rob Hirschfeld (@zehicle) announced #Crowbar source released, includes #OpenStack Cloud install (#apache2 #opschef #Dell #cloud) in a 7/25/2011 post:

I’m delighted to announce that my team at Dell has opened the Crowbar source under the Apache 2 license. This action is part of the broader Dell OpenStack Cloud Solution (details) which includes OpenStack install packages, Crowbar, reference hardware architectures, and services/consulting to support deployments.

There are two important components to this news:

Dell is officially offering our OpenStack Solution and helping advance the community’s ability to implement OpenStack quickly and consistently.

Dell is releasing the Crowbar code (which is included in the solution) as open source.

Both are significant items; however, my focus here is on the Crowbar release.

Crowbar started as a Dell OpenStack installer project and then grew beyond that in scope. Now it can be extended to work with other vendors’ kits and other solutions bits.

We are contributing Crowbar to the community because we believe that everyone benefits by sharing in the operational practices that Crowbar embodies. These are rooted in Opscode Chef (which Crowbar tightly integrates with) and the cloud & hyper-scale proven DevOps practices that are reflected in our deployment model.

Where to get it?

Code: Github (Apache 2 license)

Community: Dell Crowbar listserv

Information: wiki on Crowbar Github, Crowbar User’s Guide and RobHirschfeld.com web site.

What’s included?

A comprehensive set of barclamps to set up an OpenStack cloud.

Crowbar UI and Remote APIs to make it easy to set up your cloud

Automated testing scripts for community members doing continuous integration with OpenStack.

Build scripts so you can create your own Crowbar install ISO

Switch discovery so you can create Chef Cookbooks that are network aware.

Open source Chef server that powers much of Crowbar’s functionality

What’s not included?

Non-open source license components (BIOS+RAID config) that we could not distribute under the Apache 2 license. We are working to address this and include them in our release. They are available in the Dell Licensed version of Crowbar.

Dell Branded Components (skin + overview page). Crowbar has an OpenSource skin with identical functionality.

Pre-built ISOs with install images (you must download the open source components yourself, we cannot redistribute them to you as a package)

Important notes:

Crowbar uses Chef Server as its database and relies on cookbooks for node deployments. It is installed (using Chef Solo) automatically as part of the Crowbar install.

Crowbar has a modular architecture so individual components can be removed, extended, and added. These components are known individually as barclamps.

Each barclamp has its own Chef configuration, UI sub-component, deployment configuration, and documentation.

On the project roadmap:

Hadoop support

Additional operating system support (specifically RHEL)

Barclamp version repository

Network configuration

We’d like suggestions! Please comment!

Sites for more information: Joseph George, Barton George (launch day), Dell

Rob works for Dell, Inc as a Senior Cloud Solutions Architect.

Todd Hoff asked Is NoSQL a Premature Optimization that's Worse than Death? Or the Lady Gaga of the Database World? in a 7/25/2011 post to his High Scalability blog:

Michael Stonebraker sure knows how to stir up a storm. Unlike for others, that doesn't make him a troll in my mind, he's way too accomplished in the field to be that, but he does have a bit of Barnum & Bailey in him, which serves to get the discussion flowing, and that's a good thing. A lot of previously hidden wisdom and passion unlocks, which we'll try to capture here.

This disturbance in the force is over OldSQL vs NoSQL vs NewSQL. Warning, these are not crisp categories, there's leakage all over the place, watch your step:

OldSQL (Oracle, MySQL, etc) refers to what some want to term legacy relational databases like MySQL, that don't scale out horizontally with aplomb.

NoSQL (CouchDB, Redis, Cassandra, HBase, MongoDB, Riak, Neo4j, etc) refers to, well, a collection of technologies that aren't OldSQL, these often are designed to scale out horizontally, aren't on ACID, and use schemaless non-relational datamodels.

NewSQL (Xeround, Clustrix, NimbusDB, GenieDB, ScaleBase, VoltDB) are databases that preserve SQL, the relational model, ACID, schemas, and are scalable, though not necessarily horizontally (which I don't quite understand). Sharding should be transparent. The general pitch is once you have ACIDy SQL goodness and elasticity, all on commodity hardware, then there's no reason to use NoSQL.

OK, got it? Then you might be the only one...

The disturbance first started with this article by Derrick Harris, which gets a lot of mileage out of a few quotes by Stonebraker. The short of it is:

Facebook trapped in MySQL ‘fate worse than death’ by Derrick Harris.

On Hacker News. On Reddit. On Slashdot. On Quora.

Facebook is operating a huge, complex MySQL implementation equivalent to ‘a fate worse than death,’ and the only way out is ‘bite the bullet and rewrite everything.’

A large amount of work and skill is required to make MySQL fit its purposes. Life would be easier if Facebook was built initially on something designed for its purposes.

That prompted a doozy of a response post by Facebook Database Engineer Domas Mituzas, who feels MySQL suits their purposes just fine, given Facebook actually knows what their purposes are:

Stonebraker trapped in Stonebraker ‘fate worse than death’ by Domas Mituzas

On Hacker News

Building the web that lasts is a completely different task from what academia people imagine building the web is.

Disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data.

We balance the workload so that the I/O subsystem delivers the long tail as efficiently as possible.

The average write transaction at FB is at around 5ms timing, so argument about transactional cost is not that important, as RPC times add up to that quite a bit, and multi-cluster dbms wouldn't reduce the RPC costs.

There’s some operational overhead in handling sharding and availability of MySQL deployments; at large scale it becomes a somewhat constant cost, whereas operational efficiency gains are linear.

Then the stream split in this post by Bob Warfield, which riffs on the idea of NoSQL being premature optimization:

NoSQL is a Premature Optimization and Redux by Bob Warfield.

On Reddit

Starting with NoSQL because you think you might someday have enough traffic and scale to warrant it is a premature optimization, and as such, should be avoided by smaller and even medium sized organizations.

NoSQL technologies require more investment than Relational to get going with. There is no particular advantage to NoSQL until you reach scales that require it. In fact it is the opposite, given Point 1. If you are fortunate enough to need the scaling, you will have the time to migrate to NoSQL and it isn’t that expensive or painful to do so when the time comes.

Jeremy Zawodny did not care for this take, and a memo to some of his commenters, if anyone has the chops to have an opinion on all of this, it's Jeremy. The short of it:

NoSQL is What? by Jeremy Zawodny.

On Hacker News

NoSQL exists because a lot of people find it useful. Features like more domain friendly modeling, built-in sharding, replica sets, and data structure capabilities are other important features independent of scale. Scaling is just one aspect to consider.

Choose the best tool for the job, not all jobs. SQL isn't cheaper and easier for many requirements. Gross generalizations don't apply. It’s closed-minded to argue that everyone should just start with SQL because it’s easy and worry about the real issues later.

What does Lady Gaga have to do with all this?

In a parallel subject continuum, I think this comment by Slappy198s in Lady Gaga's solo piano performance of "The Edge Of Glory" on Howard Stern, nicely summarizes the zeitgeist:

You know, there's a difference between not liking someone's music and not recognizing their talent. If you can't recognize the fact that Lady GaGa is, in fact, extremely talented in many ways, then you may want to try to look at her with less of a bias. There's plenty of artists I can't stand, but still respect their talent.

Even if you don't like Lady Gaga's schtick, that is a great performance. I get the feeling a lot of SQL people don't recognize the talent of NoSQL, whereas NoSQL people are generally “use the best tool for the job” types who have no problem with you using SQL if that works for you. Which leads to...

SQL is not the Default Option So it Can't Be Premature Optimization

Jeremy does a good job saying why NoSQL isn't a case of premature optimization. It's not just about scale, it's about solving a problem in a way that is sensible for your job.

There's a deeper fundamental problem I have with the characterization of NoSQL as premature optimization, and it's the same one I have when this phrase is used by programmers. Usually it's just code for "I don't like your design choices; I would do it another way, so whatever you are doing is wrong." This is a common tactic in the rough and tumble world of design reviews. It's dismissive. The problem is the person making the assertion probably doesn't have the same experiences as the person doing the implementation and probably hasn't thought through the problem as deeply, so it's often arrogance to say you know what is premature optimization for another's problem.

The arrogance here is seeing a SQL product as the default choice with any deviation requiring a justification. This is not the case. There is no default choice. You don't have to make excuses for choosing something different. You just have to solve your particular problem. That's it.

cogman10 has a good sense of it:

Picking Tech A over Tech B is NOT a premature optimization. Would the author claim that "Using InnoDB is a premature optimization because MySQL is better supported!" It is called planning, you do that whenever you write a new application.

Use the database that best matches your data. If some non-relational database is a perfect match for the data you want to store, by all means use it. Don't give two shits about people like the author that think SQL is the one and only query language. (hell, I wish that SQL would die in flames, but it is heavily built into current business models. Not because it is the best, but because it is common.)

Insisting on the wrong tech is not premature optimization. It is stupidity.

Errr also makes some good points:

NoSQL isn’t just something that you replace your SQL database with when it reaches a certain scale. If that was the only case where NoSQL was useful then it would be just an optimization, but in a lot of cases it’s easier, cheaper and more efficient to implement it as a NoSQL database right away.

You’re hardly an expert if you’re going to devote more time, effort and money to fit something into a SQL database when all that can be achieved much easier using a NoSQL database.

Is Facebook Screwed?

I'll bet on Domas Mituzas knowing what he's doing, so it seems highly unlikely that Facebook is screwed. He's looking from the long view, the perspective of someone who provides a working service for 700+ million people, accessing uncountable bytes of old and new data, that will ideally last forever. That's a completely different perspective from someone coming from a place of building relatively simple and specialized transactional systems.

Facebook can still do what they need doing, so there's no reason to switch, but it's reasonable to ask if Facebook would be in a better place if they started today with a different technology? Would they prefer to use their new HBase stack, for example? It may not be as clear cut as you think. Take a moment to browse MySQL at Facebook, Facebook's own page on how they use MySQL. What's clear is they are MySQL experts and they deal with other stone cold experts like Percona. They know MySQL backwards and forwards and can make it do what they want. Would you change something that's working to use a new technology that you don't know?

At Facebook it's not a matter of MySQL or else either. With a policy of using the best technology for the job and deep experience in other stacks, Facebook could switch stacks if they wanted. Not every product at Facebook uses MySQL. Take Facebook's New Real-Time Messaging System which uses HBase, for example.

Don't believe the switch would be too hard and that they are locked in. Facebook uses APIs that would allow incrementally moving data over to another backend datastore over time. That they aren't doing this means they are satisfied enough with their highly tuned memcached + highly tuned MySQL + highly automated operations approach to stand pat.
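The idea of moving data incrementally behind a stable API can be sketched in a few lines. This is a minimal illustration only, not Facebook's code: the `MigratingStore` class is hypothetical, and plain dictionaries stand in for the old and new datastores.

```python
class MigratingStore:
    """Facade that moves records from an old backend to a new one on access.

    Hypothetical sketch: both backends are assumed to behave like dicts.
    """

    def __init__(self, old_backend, new_backend):
        self.old = old_backend  # stand-in for the existing datastore
        self.new = new_backend  # stand-in for the replacement datastore

    def get(self, key):
        # Prefer the new backend; otherwise read the old one and copy forward.
        if key in self.new:
            return self.new[key]
        value = self.old[key]  # raises KeyError if the key is in neither
        self.new[key] = value  # lazy, per-record migration
        return value

    def put(self, key, value):
        # Writes go to the new backend only.
        self.new[key] = value
        # Drop any stale copy so the old backend never holds outdated data.
        self.old.pop(key, None)


old = {"user:1": "alice", "user:2": "bob"}
new = {}
store = MigratingStore(old, new)

store.get("user:1")            # copied into the new backend on first read
store.put("user:2", "robert")  # written to the new backend, removed from the old
```

Callers only ever see `get` and `put`, so the backend can change underneath them one record at a time.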

As Mituzas hints at, when operating at this scale, all the programs created to manage the system completely dwarf all the other code, regardless of the starting point of the technology, and that's what Facebook has mastered, so the reports of Facebook's impending infrastructure death may be premature.

Related Articles

Facebook on HighScalability

Facebook on InfoQ

MySQL at Facebook - Facebook's own MySQL page.

MySQL Performance Blog

Lessons Learned from Migrating 2+ Billion Documents at Craigslist by Jeremy Zawodny, Craigslist

The NewSQL Movement by Klint Finley

Pierce Wetter with a good comment on why NoSQL is preoptimization.

Replacing Datacenter Oracle with Global Apache Cassandra on AWS by Adrian Cockcroft

Rethinking Databases for the Cloud on the Cloud Computing news group.

What we talk about when we talk about NewSQL by Matthew Aslett

VoltDB Decapitates Six SQL Urban Myths And Delivers Internet Scale OLTP In The Process

Is VoltDB really as scalable as they claim? by Baron Schwartz.

Jason Ouellette posted an Introducing Force.com book chapter to the InformIT Web site on 7/25/2011:

This chapter introduces the concepts, terminology, and technology components of the Force.com platform and its context in the broader Platform as a Service (PaaS) landscape. The goal is to provide context for exploring Force.com within a corporate software development organization.

If any of the following sentences describe you, this chapter is intended to help:

You have read about cloud computing or PaaS and want to learn how Force.com compares to other technologies.

You want to get started with Force.com but need to select a suitable first project.

You have a project in mind to build on Force.com and want to learn how you can leverage existing development skills and process.

This chapter consists of three sections:

Force.com in the Cloud Computing Landscape: Learn about PaaS and Force.com's unique features as a PaaS solution.

Inside a Force.com Project: Examine how application development with Force.com differs from other technologies in terms of project selection, technical roles, and tools.

Sample Application: A sample business application is referenced throughout this book to provide a concrete basis for discussing technical problems and their solutions. In this chapter, the sample application's requirements and use cases are outlined, as well as a development plan, mapped to chapters of the book.

Force.com in the Cloud Computing Landscape

Phrases like "cloud computing" and "Platform as a Service" have many meanings put forth by many vendors. This section provides definitions of the terms to serve as a basis for understanding Force.com and comparing it with other products in the market. With this background, you can make the best choice for your projects, whether that is Force.com, another PaaS product, or your own in-house infrastructure.

Platform as a Service (PaaS)

The platform is infrastructure for the development of software applications. The functionality of a platform's infrastructure differs widely across platform vendors, so this section focuses on a handful of the most established vendors. The suffix "as a Service" (aaS) means that the platform exists "in the cloud," accessible to customers via the Internet. Many variations exist on this acronym, including SaaS (Software as a Service), IaaS (Infrastructure as a Service), and so forth.

PaaS is a category within the umbrella of cloud computing. "Cloud computing" is a phrase to describe the movement of computing resources away from physical data centers or servers in a closet in your company and into the network, where they can be provisioned, accessed, and deprovisioned instantly. You plug a lamp into an electrical socket to use the electrons in your region's power grid. Running a diesel generator in your basement is usually not necessary. You trust that the power company is going to provide that service, and you pay the company as you use the service.

Cloud computing as a general concept spans every conceivable configuration of infrastructure, well outside the scope of this book. The potential benefits are reduced complexity and cost versus a traditional approach. The traditional approach is to invest in infrastructure by acquiring new infrastructure assets and staff or redeploying or optimizing existing investments. Cloud computing provides an alternative.

Many companies provide PaaS products. The following subsections introduce the mainstream PaaS products and include brief descriptions of their functionality. Consult the Web sites of each product for further information.

Amazon Web Services

Amazon Web Services refers to a family of cloud computing products. The most relevant to PaaS is Elastic Beanstalk, a platform for running Java applications that provides load balancing, auto-scaling, and health monitoring. The platform is actually built on several other Amazon Web Services products that can be independently configured by advanced users, with the most significant being Elastic Compute Cloud (EC2). EC2 is a general-purpose computing platform, not limited to running Java programs. You can provision virtual instances of Windows or Linux machines at will, loading them with your own custom operating-system image or one prebuilt by Amazon or the community. These instances run until you shut them down, and you are billed for usage of resources such as CPU, disk, and network.

A raw machine with an OS on it is a great start, but to build a business application requires you to install, manage access to, maintain, monitor, patch and upgrade, back up, plan to scale, and generally care and feed in perpetuity an application platform on the EC2 instance. Many of these tasks are still required of Amazon's higher-level Elastic Beanstalk offering. If your organization has the skills to build on .NET, J2EE, LAMP, or other application stacks, plus the OS, database administration, and IT operations experience, Amazon's virtual servers in the cloud could be a strong alternative to running your own servers in-house.

Amazon provides various other products that complement Elastic Beanstalk and EC2. These include Simple Queue Service for publish-and-subscribe-style integration between applications, SimpleDB for managing schemaless data, and Simple Storage Service, a content repository.

Microsoft Azure

Azure consists of two products. The first is Windows Azure, an operating system that can utilize Microsoft's data centers for general computation and storage. It is a combination of infrastructure and platform designed to take existing and new .NET-based applications and run them in the cloud, providing similar features for scalability and elasticity as Amazon Web Services. Most Azure applications are developed in C# using Microsoft Visual Studio, although other languages and tools are supported. The second part is SQL Azure, a hosted version of Microsoft SQL Server. The cost of these products is based on resource consumption, defined as a combination of CPU, network bandwidth, storage, and number of transactions.

Google App Engine

App Engine is a platform designed for hosting Web applications. App Engine is like having an unlimited number of servers in the cloud working for you, preconfigured with a distributed data store and Python or Java-based application server. It's much like Amazon's Elastic Beanstalk but focused on providing a higher-level application platform. It lacks the configurable lower-level services like EC2 to provide an escape hatch for developers requiring more control over the infrastructure. App Engine includes tools for managing the data store, monitoring your site and its resource consumption, and debugging and logging.

App Engine is free for a set amount of storage and page views per month. Applications requiring more storage or bandwidth can purchase it by setting a maximum daily dollar amount they're willing to spend, divided into five buckets: CPU time, bandwidth in, bandwidth out, storage, and outbound email.

Force.com

Force.com is targeted toward corporate application developers and independent software vendors. Unlike the other PaaS offerings, it does not expose developers directly to its own infrastructure. Developers do not provision CPU time, disk, or instances of running operating systems. Instead, Force.com provides a custom application platform centered around the relational database, one resembling an application server stack you might be familiar with from working with .NET, J2EE, or LAMP.

Although it integrates with other technologies using open standards such as SOAP and REST, the programming languages and metadata representations used to build applications are proprietary to Force.com. This is unique among the PaaS products but not unreasonable when examined in depth. Force.com operates at a significantly higher level of abstraction than the other PaaS products, promising dramatically higher productivity to developers in return for their investment and trust in a single-vendor solution.

To extend the reach of Force.com to a larger developer community, Salesforce and VMware provide a product called VMforce. VMforce brings some of the features of the Force.com platform to Java developers. It consists of development tools from the Salesforce community and virtualized computing resources from VMware. With VMforce, you can create hybrid applications that use Force.com for data and services, but are built with Java standard technologies such as Spring. Along the same lines, Salesforce's acquisition of Heroku is expected to extend Force.com features to Ruby developers.

Force.com is free for developers. Production applications are priced primarily by storage used and number of unique users.

Facebook

Facebook is a Web site for connecting with your friends, but it also provides developers with ways to build their own socially aware applications. These applications leverage the Facebook service to create new ways for users to interact while online. The Facebook platform is also accessible to applications not built inside Facebook, exposing the "social graph" (the network of relationships between users) where permitted.

Much of the value of Facebook as a platform stems from its large user base and consistent yet extensible user experience. It is a set of services for adding social context to applications. Unlike Force.com and App Engine, for example, Facebook has no facility to host custom applications.

Force.com as a Platform

Force.com is different from other PaaS solutions in its focus on business applications. Force.com is a part of Salesforce.com, which started as a SaaS Customer Relationship Management (CRM) vendor. But Force.com is not CRM. It provides the infrastructure commonly needed for any business application, customizable for the unique requirements of each business through a combination of code and configuration. This infrastructure is delivered to you as a service on the Internet.

Because you are reading this book, you have probably developed a few business applications in your time. Consider the features you implemented and reimplemented in multiple applications, the unglamorous plumbing, wiring, and foundation work. Some examples are security, user identity, logging, profiling, integration, data storage, transactions, workflow, collaboration, and reporting. This infrastructure is essential to your applications but expensive to develop and maintain. Business application developers do not code their own relational database kernels, windowing systems, or operating systems. This is basic infrastructure, acquired from software vendors or the open-source community and then configured to meet user requirements. What if you could do the same for your application infrastructure? This is the premise of Force.com.

The following subsections list differentiating architectural features of Force.com with brief descriptions.

Multitenancy

Multitenancy is an abstract concept, an implementation detail of Force.com, but one with tangible benefits for developers. Figure 1-1 shows a conceptual view of multitenancy. Customers access shared infrastructure, with metadata and data stored in the same logical database.

Figure 1-1 Multitenant architecture

The multitenant architecture of Force.com consists of the following features:

Shared infrastructure: Every customer (or tenant) of Force.com shares the same infrastructure. They are assigned an independent logical environment within the Force.com platform.

At first, some might be uncomfortable with the thought of handing their data to a third party where it is co-mingled with that of competitors. Salesforce's whitepaper on its multitenant technology includes the technical details of how it works and why your data is safe from loss or spontaneous appearance to unauthorized parties.

NOTE

The whitepaper is available at http://wiki.developerforce.com/index.php/Multi_Tenant_Architecture.

Single version: Only one version of the Force.com platform is in production. The same platform is used to deliver applications of all sizes and shapes, used by 1 to 100,000 users, running everything from dog-grooming businesses to the Japanese national post office.

Continuous, zero-cost improvements: When Force.com is upgraded to include new features or bug fixes, the upgrade is enabled in every customer's logical environment with zero to minimal effort required.

Salesforce can roll out new releases with confidence because it maintains a single version of its infrastructure and can achieve broad test coverage by leveraging tests, code, and configurations from its production environment. You, the customer, are helping maintain and improve Force.com in a systematic, measurable way as a side effect of simply using it. This deep feedback loop between Force.com and its users is something impractical to achieve with on-premise software.
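The shared-infrastructure idea, in which many tenants live in one logical database and each sees only its own rows, can be illustrated with a toy example. This is a sketch under simplifying assumptions and not a description of how Salesforce implements multitenancy (the whitepaper covers that). The `records` table, the `tenant_id` column, and the tenant names are all hypothetical.

```python
import sqlite3

# One shared in-memory database holds every tenant's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (tenant_id TEXT NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO records (tenant_id, name) VALUES (?, ?)",
    [("acme", "Invoice 1"), ("acme", "Invoice 2"), ("globex", "Order 7")],
)


def records_for(tenant_id):
    """Return only the rows that belong to one tenant."""
    rows = conn.execute(
        "SELECT name FROM records WHERE tenant_id = ? ORDER BY name",
        (tenant_id,),
    )
    return [name for (name,) in rows]
```

Every read goes through `records_for`, so a tenant identifier is always part of the query and one tenant's rows never appear in another tenant's results.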

Relational Database

The heart of Force.com is the relational database provided as a service. The relational database is the most well-understood and widely used way to store and manage business data. Business applications typically require reporting, transactional integrity, summarization, and structured search, and implementing those on nonrelational data stores requires significant effort. Force.com provides a relational database to each tenant, one that is tightly integrated with every other feature of the platform. There are no Oracle licenses to purchase, no tablespaces to configure, no JDBC drivers to install, no ORM to wrangle, no DDL to write, no queries to optimize, and no replication and backup strategies to implement. Force.com takes care of all these tasks.

Application Services

Force.com provides many of the common services needed for modern business application development. These are the services you might have built or integrated repeatedly in your past development projects. They include logging, transaction processing, validation, workflow, email, integration, testing, reporting, and user interface.

These services are highly customizable with and without writing code. Although each service can be valued as an individual unit of functionality, their unification offers tremendous value. All the features of Force.com are designed, built, and maintained by a single responsible party, Salesforce. Salesforce provides documentation for these features as well as support staff on-call, training and certification classes, and accountability to its customers for keeping things running smoothly. This is in contrast to many software projects that end up as a patchwork of open-source, best-of-breed tools and libraries glued together by you, the developer, asked to do more with fewer people, shorter timelines, and cheaper, often unsupported tools.

Declarative Metadata

Almost every customization configured or coded within Force.com is readily available as simple XML with a documented schema. At any point in time, you can ask Force.com for this metadata via a set of Web services. The metadata can be used to configure an identical environment or managed with your corporate standard source control system. It is also helpful for troubleshooting, allowing you to visually compare the state of two environments. Although a few features of Force.com are not available in this declarative metadata form, Salesforce's stated product direction is to provide full coverage.
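Because the metadata is plain XML, comparing two environments can be as simple as a text diff. The sketch below uses only Python's standard `difflib`. The XML fragments are illustrative stand-ins, not output retrieved from Force.com, and `diff_metadata` is a hypothetical helper.

```python
import difflib


def diff_metadata(xml_a, xml_b, name_a="env_a", name_b="env_b"):
    """Return a unified diff of two metadata documents as a list of lines."""
    return list(
        difflib.unified_diff(
            xml_a.splitlines(),
            xml_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )


# Illustrative metadata for the same object in two environments.
sandbox = """<CustomObject>
  <label>Invoice</label>
  <fields><fullName>Amount__c</fullName><type>Currency</type></fields>
</CustomObject>"""

production = """<CustomObject>
  <label>Invoice</label>
  <fields><fullName>Amount__c</fullName><type>Number</type></fields>
</CustomObject>"""

changes = diff_metadata(sandbox, production, "sandbox", "production")
print("\n".join(changes))
```

The same files could equally be committed to source control, where the version control system's own diff tooling does this comparison.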

Programming Language

Force.com has its own programming language, called Apex. It allows developers to script interactions with other platform features, including the user interface. Its syntax is a blend of Java and database stored procedure languages like T-SQL, and code can be written using a Web browser or a plug-in to the Eclipse IDE.

Other platforms take a different approach. Google's App Engine simultaneously restricts and extends existing languages such as Python so that they play nicely in a PaaS sandbox. This offers obvious benefits, such as leveraging the development community, ease of migration, and skills preservation. One way to understand Apex is as a domain-specific language. Force.com is not a general-purpose computing platform to run any Java or C# program you want to run. Apex is kept intentionally minimalistic, designed with only the needs of Force.com developers in mind, built within the controlled environment of Salesforce R&D. Although it won't solve every programming problem, Apex's specialized nature leads to some advantages in learning curve, code conciseness, ease of refactoring, and ongoing maintenance costs.

Force.com Services

Force.com can be divided into four major services: database, business logic, user interface, and integration. Technically, many more services are provided by Force.com, but these are the high-level categories that are most relevant to new Force.com developers.

Database

Force.com is built around a relational database. It allows the definition of custom tables containing up to 800 fields each. Fields contain strongly typed data using any of the standard relational database data types, plus rich types such as currency values, picklists, formatted text, and phone numbers. Fields can contain validation rules to ensure data is clean before being committed, and formulas to derive values, like cells in a spreadsheet. Field history tracking provides an audit log of changes to chosen fields.
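On Force.com, validation rules and formula fields are configured on the platform and need no hand-written code. As a rough analogy only, the two concepts look like this in ordinary Python; the `LineItem` class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    quantity: int
    unit_price: float

    def validate(self):
        """Validation rules: collect errors before the record is saved."""
        errors = []
        if self.quantity <= 0:
            errors.append("quantity must be positive")
        if self.unit_price < 0:
            errors.append("unit_price must not be negative")
        return errors

    @property
    def total(self):
        """Formula field: derived from other fields, like a spreadsheet cell."""
        return self.quantity * self.unit_price
```

A record with a non-empty `validate()` result would be rejected before being committed, while `total` is recomputed whenever it is read.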

Custom tables can be related to each other, allowing the definition of complex data schemas. Tables, rows, and columns can be configured with security constraints. Data and metadata are protected against accidental deletion through a "recycling bin" metaphor. The database schema is often modifiable instantly, without manual migration. Data is imported from files or other sources with free tools, and APIs are provided for custom data-loading solutions.

Data is queried via a SQL-like language called SOQL (Salesforce Object Query Language). Full-text search is available through SOSL (Salesforce Object Search Language).

Business Logic

Apex is the language used to implement business logic on Force.com. It allows code to be structured into classes and interfaces, and it supports object-oriented behaviors. It has strongly typed collection objects and arrays modeled after Java.

Data binding is a first-class concept in Apex, with the database schema automatically imported as language constructs. Data manipulation statements, trigger semantics, batch processing, and transaction boundaries are also part of the language.

The philosophy of test-driven development is hard-wired into the Force.com platform. Methods are annotated as tests and run from a provided test harness or test API calls. Test methods are automatically instrumented by Force.com and output timing information for performance tuning. Force.com prevents code from being deployed into production that does not have adequate unit test coverage.

User Interface

Force.com provides two approaches for the development of user interfaces: Page Layouts and Visualforce. Page Layouts are inferred from the data model, including validation rules, and then customized using a WYSIWYG editor. Page Layouts feature the standard Salesforce look-and-feel. For many applications, Page Layouts can deliver some or all of the user interface with no development effort.

Visualforce allows developers to build custom user interfaces. It consists of a series of XML markup tags called components with their own namespace. As with JSP, ASP.NET, Velocity, and other template processing technologies, the components serve as containers to structure data returned by the Controller, a class written in Apex. To the user, the resulting Web pages might look nothing like Salesforce, or adopt its standard look-and-feel. Visualforce components can express the many types and styles of UIs, including basic entry forms, lists, multistep wizards, Ajax, Adobe Flex, mobile applications, and content management systems. Developers can create their own components to reuse across applications.

User interfaces in Visualforce are public, private, or some blend of the two. Private user interfaces require a user to log in before gaining access. Public user interfaces, called Sites, can be made available to anonymous users on the Internet.

Integration

In the world of integration, more options are usually better, and standards support is essential. Force.com supports a wide array of integration technologies, almost all of them based on industry-standard protocols and message formats. You can integrate other technologies with Force.com using an approach of configuration plus code. Here are some examples:

Apex Web Services allows control of data, metadata, and process from any platform supporting SOAP over HTTP, including JavaScript. This makes it possible to write composite applications that combine Force.com with technology from other vendors in many interesting and powerful ways. Force.com's Web services API has evolved over many years, spanning more than 20 versions with full backward compatibility.

The Force.com database is accessible via Representational State Transfer (REST) calls. This integration method is much lighter weight than Web Services, allowing Web applications to query and modify data in Force.com with simple calls accessible to any development language.

Business logic developed in Apex can be exposed as a Web service, accessible with or without a Force.com user identity. Force.com generates the WSDL from your Apex code. Additionally, Force.com converts WSDL to Apex bindings to allow access to external Web services from within the platform.

You can create virtual email inboxes on Force.com and write code to process the incoming email. Sending email from Force.com is also supported.

Force.com provides an API for making HTTP requests, including support for client-side certificates, SSL, proxies, and HTTP authentication. With this, you can integrate with Web-based resources, everything from static Web pages to REST services returning JSON.

Salesforce-to-Salesforce (S2S) is a publish-and-subscribe model of data sharing between multiple Force.com environments. If the company you need to integrate with already uses Force.com and the data is supported by S2S, integration becomes a relatively simple configuration exercise. There is no code to maintain and there are no message formats to manage. Your data is transported within the Force.com environment from one tenant to another.

If your requirements dictate a higher-level approach to integration, software vendors like IBM's Cast Iron Systems and Informatica offer adapters to Force.com to read and write data and orchestrate complex transactions spanning disparate systems. …
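To make the REST option in the list above more concrete, the following sketch builds (but does not send) a request that carries a SOQL query. Several details are assumptions not stated in the excerpt and should be checked against the Force.com REST API documentation: the `/services/data/<version>/query` path shape, the `Bearer` authorization header, the instance URL, the API version, and the token are all placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(instance_url, api_version, soql, access_token):
    """Build an HTTP GET request for a SOQL query without sending it.

    The URL layout and header format are assumptions for illustration.
    """
    query_string = urlencode({"q": soql})
    url = f"{instance_url}/services/data/{api_version}/query?{query_string}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


req = build_query_request(
    "https://example.my.salesforce.com",     # placeholder instance URL
    "v22.0",                                 # placeholder API version
    "SELECT Id, Name FROM Account LIMIT 5",  # SOQL query text
    "PLACEHOLDER_TOKEN",                     # placeholder access token
)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, and a real call would also need a valid session obtained through the platform's authentication flow.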


Derrick Harris (@derrickharris) announced on 7/19/2011 the availability of his Infrastructure Q2: Big data and PaaS gain more momentum report for GigaOm Pro (missed when posted, requires annual subscription). From the Summary:

Big data and Platform-as-a-Service offerings highlighted the second quarter, suggesting that we can expect to see a shift in enterprise IT practices around application development and analytics very soon. On the PaaS front, we saw new projects like DotCloud and Cloud Foundry gain incredible momentum in just a few short months. We also saw Heroku, Google App Engine and Microsoft Windows Azure mature in meaningful ways. All of this means that developers — even of the enterprise variety — won’t be able to avoid PaaS for much longer.

The big-data activity ranged from major new Hadoop vendors to heavy investment in flash storage that will speed the serving of data to processing engines. Even more interesting use cases for Hadoop and other big-data tools popped up to demonstrate that with the right technology running the right algorithms, we can use data to power an endless variety of applications. In other areas, we saw an uptick in cloud-computing plans from large vendors.

OpenStack continued to mature and pick up both contributors and users, and Facebook caught our eye by launching an open-source project around the designs for its specialized servers and data centers.

It wasn’t all great news, though, as the second quarter kicked off with the weeklong Amazon Web Services outage. That event exposed architectural flaws on AWS’ side as well as among a large number of its customers. Actually, the good name of cloud computing came out of the incident relatively untarnished, as the media and the companies involved focused on how to resolve the problems. …

Derrick’s summary continues with a list of the companies mentioned in the report.

Larry Dignan asked Big data vs. traditional databases: Can you reproduce YouTube on Oracle's Exadata? in a 7/8/2011 post to ZDNet’s Behind the Lines blog (missed when posted):

Increasing data requirements, especially the unstructured information such as video, are going to relegate relational databases to the enterprise scrap heap as an emerging breed of vendors chips away at traditional software powers.

That’s the overview from Cowen & Co. analyst Peter Goldmacher. In a 75-page report, Goldmacher walks through the database landscape and concludes that the consensus view that the growth of data will boost traditional database vendors is dead wrong. Goldmacher said:

We believe the vast majority of data growth is coming in the form of data sets that are not well suited for traditional relational database vendors like Oracle. Not only is the data too unstructured and/or too voluminous for a traditional RDBMS, the software and hardware costs required to crunch through these new data sets using traditional RDBMS technology are prohibitive. To capitalize on the Big Data trend, a new breed of Big Data companies has emerged, leveraging commodity hardware, open source and proprietary technology to capture and analyze these new data sets. We believe the incumbent vendors are unlikely to be a major force in the Big Data trend primarily due to pricing issues and not a lack of technical know-how.

Oracle doesn’t buy Goldmacher’s take. On Oracle’s most recent conference call, executives talked up big data and how it will benefit the company.

The crux of Goldmacher’s argument that big data will crush traditional database companies revolves around cost. Emerging big data players can price better than large database players like Oracle that have margins to protect. In other words, Oracle would have to charge 9x more than the blended average of big data vendors to solve data conundrums.

Over time, this price differential as well as the growth of corporate unstructured data will mean so-called big data players win. That means the likes of big fish like Oracle and IBM and middle-tier players—HP Vertica, EMC Greenplum and Teradata—will have to deal with the likes of Infobright, 1010 Data, Splunk and Cloudera.

To illustrate this point, Goldmacher did an interesting exercise where [he] outlined how to replicate YouTube on proprietary enterprise systems. Here’s what happens to costs when YouTube meets Oracle Exadata machines.

First the assumptions: Goldmacher estimated that YouTube consumption (user uploads of 48 hours of video a minute and 3 billion video views a day, along with roughly 45 petabytes of viewed videos a day) would require at least 9 full-rack Exadata machines at $1.5 million each. There would be at least 18 Exadata machines to handle spikes. Those machines would add up to 14 Exalogic devices to serve data at $1.1 million per system. The software stack under Oracle would include WebLogic middleware, Oracle databases, Exadata optimized storage and Oracle as operating system. The open source comparison included JBoss middleware, MySQL, Hadoop and Red Hat Enterprise Linux as the OS.

The bottom line looks like this:

In a nutshell, the Oracle Exadata capital expenses for hardware and software total $589.4 million compared to an open source and commodity hardware cost of $104.2 million. Annual expenses (staff and support) are $99 million for Oracle Exadata and $15.1 million for an open source stack. The personnel costs are based on the nine-engineer staff of the original YouTube team.
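
To put those totals side by side, here is a minimal Python sketch that works out the cost ratios from the four figures quoted above. The three-year horizon and the flat annual expense are illustrative assumptions of mine, not part of Goldmacher's report, and the names `ORACLE`, `OPEN_SOURCE` and `total_cost` are hypothetical.

```python
# Figures in millions of US dollars, as quoted in the article.
ORACLE = {"capex": 589.4, "annual": 99.0}
OPEN_SOURCE = {"capex": 104.2, "annual": 15.1}


def total_cost(stack, years):
    """Up-front capital expense plus `years` of annual expense.

    Assumes the annual expense stays flat (an assumption, not from the report).
    """
    return stack["capex"] + years * stack["annual"]


# How many times more the Oracle stack costs, up front and per year.
capex_ratio = ORACLE["capex"] / OPEN_SOURCE["capex"]
annual_ratio = ORACLE["annual"] / OPEN_SOURCE["annual"]

YEARS = 3  # hypothetical planning horizon, chosen only for illustration
oracle_total = total_cost(ORACLE, YEARS)
open_total = total_cost(OPEN_SOURCE, YEARS)

print(f"Capital expense ratio: {capex_ratio:.1f}x")
print(f"Annual expense ratio:  {annual_ratio:.1f}x")
print(f"{YEARS}-year totals: ${oracle_total:.1f}M vs ${open_total:.1f}M")
```

On these numbers the up-front gap is a bit under 6x and the annual gap is about 6.6x, so the difference widens the longer the system runs.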

Here’s a look at the hardware involved:

The open source hardware stack consists of HP server racks and storage with Cisco Nexus switches.

But hardware is fairly simple. The beauty of Oracle’s integrated hardware/software stacks—at least for the company—is the licensing and maintenance revenue stream.

Goldmacher noted:

At first glance, total core hardware costs of roughly $155M, just roughly 5% of Google’s current CapEx seem reasonable. This line of thinking lasts until Oracle presents the bill for its software: a not-insignificant $400M for database and Exadata storage licenses alone, bringing the total upfront investment to $570M.

Here’s a look at the software costs:

And the open source side.

Now there are a few caveats. Goldmacher didn’t create assumptions for in-memory databases like Membase because support pricing wasn’t readily available. But overall, you get the picture. Big data may mean some large headaches for established relational database players looking to preserve chunky profit margins.

US$12.9 million for Red Hat Enterprise Linux support, Datameer Hadoop subscriptions, and JBoss licenses makes me question whether the “open source side” is really open source.