Monday, August 15, 2011

Windows Azure and Cloud Computing Posts for 8/15/2011+

A compendium of Windows Azure, SQL Azure Database, AppFabric, Windows Azure Platform Appliance and other cloud-computing articles.


Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table and Queue Services

No significant articles today.


<Return to section navigation list>

SQL Azure Database and Reporting

See the ScaleBase, Inc. (@SCLBase) claimed ScaleBase 1.0 “Transforms Scalability Model for MySQL Databases Allowing for Enterprise-Level High Availability” in a deck for its ScaleBase Delivers Transparent Scaling to MySQL Databases press release of 8/15/2011 article in the Other Cloud Computing Platforms and Services section below.


Bruce Kyle reported Hadoop Connectors Coming for Parallel Data Warehouse, SQL Server in an 8/15/2011 post to the US ISV Evangelism blog:

    Microsoft will soon release two new Hadoop connectors to help customers exploit the benefits of unstructured data in both SQL and non-SQL environments.

    Connectors will include:

    • Hadoop to SQL Server Parallel Data Warehouse (PDW) for large data volumes.
    • Hadoop to SQL Server 2008 R2 or SQL Server ‘Denali’ software.

    Microsoft brings over a decade of Big Data expertise to the market. For instance, we use it at Bing to deliver the best search results (over 100 PB of data). Over the years Microsoft has invested steadily in unstructured data, including support for binary files, FILESTREAM in SQL Server, semantic search, FileTable, StreamInsight and geospatial data types.

    Microsoft understands that customers are working with unstructured data in different environments such as Hadoop; we are committed to providing these customers with interoperability to enable them to move data between their Hadoop and SQL Server environments.

    The announcement was made on the SQL Server team blog post Parallel Data Warehouse News and Hadoop Interoperability Plans.

    About Hadoop

    The Apache Hadoop software library is a framework that supports distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop is highly scalable and can support petabytes of data. One of its key attractions is cost: through the use of commodity servers, Hadoop dramatically reduces the cost of analyzing large data volumes. As an example, the New York Times used Hadoop to process 4 TB of images, producing up to 11 million PDF files in 24 hours for only $240 in computational cost.

    Neither Bruce nor the SQL Server Team mentioned Hadoop Connectors for future versions of SQL Azure.


    Erik Ejlskov Jensen (@ErikEJ) described a Major update to SQL Server Compact 3.5 SP2 available in an 8/15/2011 post:

    A major update to SQL Server Compact 3.5 SP2 has just been released, disguised as a “Cumulative Update Package”. Microsoft knowledgebase article 2553608 describes the update. The update contains the following product enhancements:

    Support for Windows Embedded CE 7.0

    The update contains updated device components. This expands the supported device platforms to this impressive list: Pocket PC 2003 Software, Windows CE, Windows Mobile 5.0, Windows Mobile 6, Windows Mobile 6.1, Windows Mobile 6.5 Professional, Windows Mobile 6.5 Standard, Windows Embedded CE 7.0. [Emphasis Erik’s].

    Support for Merge Replication with SQL Server “Denali” CTP3

    The update contains new Server Tools that support Merge Replication with the next version of SQL Server, code-named “Denali”. The replication components also work with Windows Embedded CE 7.0. [Link added.]

    For a list of fixes in the Cumulative Updates released for SQL Server Compact 3.5 SP2, see my blog post here.

    It is nice to see that the 3.5 SP2 product, with its full range of device support and synchronization technologies, is kept alive and kicking.

    NOTE: Currently, the only download available is the desktop runtime; I will update this blog post and tweet (@ErikEJ) when the other downloads are available.


    <Return to section navigation list>

    MarketPlace DataMarket and OData

    MSDN’s Data Developer Center posted a new Open Data Protocol Q&A topic recently:

    Open Data Protocol

    Q: What is the Open Data Protocol?

    A: The Open Data Protocol (OData) is a web protocol for querying and updating data. OData applies web technologies such as HTTP, Atom Publishing Protocol (AtomPub) and JSON to provide access to information from a variety of applications, services, and stores. OData emerged organically, based on the experience of implementing AtomPub clients and servers in a variety of products over the past several years. OData is being used to expose and access information from a variety of sources, including but not limited to relational databases, file systems, content management systems, and traditional web sites. Microsoft has released OData under the Open Specification Promise (OSP) to allow anyone to freely interoperate with OData implementations. We intend to work with others in the community to move the features of OData into future versions of AtomPub or other appropriate standards.

    There is a growing list of products that implement OData. Microsoft supports the Open Data Protocol in SharePoint Server 2010, Excel 2010 (through SQL Server PowerPivot for Excel), Windows Azure Storage, SQL Server 2008 R2, Visual Studio 2008 SP1. Support in other Microsoft products is currently underway.

    The Open Data Protocol was previously talked about in three ways:

    • “Astoria” Protocol
    • ADO.NET Data Services Protocol
    • “Our conventions/extensions to AtomPub”

    Please see http://odata.org for more information.

    Q: What are Microsoft’s aspirations for the protocol?

    A: Customers have consistently given us feedback that we need to take what is already a standards-based approach and open it even further. We are excited about the many clients and services that use this protocol and look forward to working with the community on it. We intend to work with others in the community to move the features of OData into future versions of AtomPub or other appropriate standards.

    Q: How is the Open Data Protocol (OData) being released?

    A: OData is defined as a set of open extensions/conventions to AtomPub documented and released under the OSP (Open Specification Promise).

    Q: Which Microsoft products support the Open Data Protocol (OData)?

    A: There is a growing list of products that implement the Open Data Protocol. Microsoft supports OData in SharePoint Server 2010, Excel 2010 (through SQL Server PowerPivot for Excel), Windows Azure Storage, SQL Server 2008 R2, Visual Studio 2008 SP1 and the .NET Framework. Microsoft provides client libraries for .NET, Silverlight and AJAX. Support in other Microsoft products is currently underway. Client libraries are also available for .NET, Silverlight, Windows Phone 7, PHP, AJAX, JavaScript, Ruby, Objective-C and Java.

    Q: How do I create a service that uses the Open Data Protocol (OData)?

    A: Since the Open Data Protocol is fully specified, you can implement it on any HTTP server using any language. On the .NET Framework, WCF Data Services provides a framework that allows developers to create OData services in .NET. Likewise, in Java there is a server library called OData4J. As noted above, client libraries are available for .NET, Silverlight, Windows Phone 7, PHP, AJAX, JavaScript, Ruby, Objective-C and Java.
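    As a concrete illustration (not part of the MSDN Q&A), a minimal WCF Data Services service that exposes an Entity Framework model might look like the sketch below; NorthwindEntities is a placeholder for whatever data model you expose, and the class would be hosted as an .svc endpoint:

    using System.Data.Services;
    using System.Data.Services.Common;

    public class ProductsService : DataService<NorthwindEntities>
    {
        // Called once per service to configure which entity sets are visible and with what rights.
        public static void InitializeService(DataServiceConfiguration config)
        {
            config.SetEntitySetAccessRule("Products", EntitySetRights.AllRead);
            config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
        }
    }

    Browsing to the service's .svc URL then returns an AtomPub service document listing the exposed entity sets.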

    Q: How is the Open Data Protocol (OData) related to AtomPub?

    A: OData adds the following to AtomPub:

    • A convention for representing structured data
    • A resource addressing scheme and URL syntax
    • A set of common query options (filter, sort, etc.)
    • Schema describing resource structure, links and metadata
    • Payload formats and semantics for batch and “unit of work” requests
    • Alternate representations of resource content (JSON)

    Since OData is based on AtomPub, it is possible for OData clients and services to be written with minimal extra code, allowing them to work with AtomPub (and GData) data.
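    To see the addressing scheme and query options in action, here is a minimal, hypothetical sketch using the WCF Data Services client library; the Product type and the service URL are placeholders:

    using System;
    using System.Data.Services.Client;
    using System.Linq;

    public class Product
    {
        public int ProductID { get; set; }      // key by convention (<TypeName>ID)
        public string ProductName { get; set; }
        public decimal UnitPrice { get; set; }
    }

    class ODataQuerySample
    {
        static void Main()
        {
            var context = new DataServiceContext(new Uri("http://example.com/ProductsService.svc"));

            // LINQ operators are translated into OData query options on the request URL,
            // roughly .../Products?$filter=UnitPrice gt 20M&$orderby=ProductName
            var query = context.CreateQuery<Product>("Products")
                               .Where(p => p.UnitPrice > 20m)
                               .OrderBy(p => p.ProductName);

            foreach (var product in query)
            {
                Console.WriteLine(product.ProductName);
            }
        }
    }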

    Q: Will the Open Data Protocol (OData) be standardized?

    A: We are making the OData specification available under Microsoft’s Open Specification Promise (OSP) so third parties, including open source projects, can build Open Data Protocol clients and services. We intend to work with others in the community, including Google, to move the features of OData into future versions of AtomPub or other appropriate standards. We encourage Google (GData) to join us in these conversations.

    Q: How should I think about OData vs. GData?

    A: There is no OData vs. GData. Both are based on Atom and JSON. Both protocols support our goal of an open data protocol for the Web. We intend to work with others in the community, including Google, to move the features of OData into future versions of AtomPub or other appropriate standards. We encourage Google (GData) to join us in these conversations.

    Q: Where can I learn more about OData?

    A: To learn more about OData visit http://odata.org.


    <Return to section navigation list>

    Windows Azure AppFabric: Apps, WF, Access Control, WIF and Service Bus

    Ron Jacobs (@ronljacobs) explained How to create a Custom Activity Designer with Windows Workflow Foundation (WF4) in an 8/14/2011 post:

    The Windows Workflow Foundation (WF4) - Custom Activity Designer sample demonstrates how you can build an activity and an activity designer. It includes three projects:

    1. MyActivityLibrary - The activity library project
    2. MyActivityLibrary.Design - The activity designer project
    3. TestDesigner - A rehosted designer project useful for testing the activity


    Step 1: Create the Activity

    The first step is to build your activity. Don't create a designer until you are satisfied with the interface to your activity in terms of arguments and properties. This sample includes a native activity named MyActivity, which simply returns a string containing the activity values.

    The activity includes an InArgument and two properties, including an enumerated value, so you can see how to use these with your activity designer.

    public sealed class MyActivity : NativeActivity<string>
    {
        public MyEnum Option { get; set; }
        public bool TestCode { get; set; }

        [DefaultValue(null)]
        public InArgument<string> Text { get; set; }

        protected override void Execute(NativeActivityContext context)
        {
            this.Result.Set(
                context,
                string.Format(
                    "Text is {0}, TestCode is {1}, Option is {2}",
                    context.GetValue(this.Text),
                    this.TestCode,
                    this.Option));
        }
    }

    Step 2: Add the Design Project

    Visual Studio uses a naming convention to locate an associated designer assembly. Since our activity assembly is named MyActivityLibrary.dll, Visual Studio will attempt to load MyActivityLibrary.Design.dll.

    1. Select File / Add / New Project
    2. Choose the Activity Designer Library project template
    3. Name the project MyActivityLibrary.Design


    Step 3: Add RegisterMetadata method

    The activity designer class should include a method to register the metadata for that designer.

    public partial class MyActivityDesigner
    {
        public MyActivityDesigner()
        {
            this.InitializeComponent();
        }

        public static void RegisterMetadata(AttributeTableBuilder builder)
        {
            builder.AddCustomAttributes(typeof(MyActivity), new DesignerAttribute(typeof(MyActivityDesigner)));
            builder.AddCustomAttributes(typeof(MyActivity), new DescriptionAttribute("My sample activity"));
        }
    }

    Step 4: Add Metadata Class

    Add a class that implements System.Activities.Presentation.Metadata.IRegisterMetadata. This class will be invoked at runtime to add attributes to the activity class. In the sample, I've added a static method called RegisterAll(), which registers all of the activities contained in this library. This method is called from the test designer.

    public sealed class MyActivityLibraryMetadata : IRegisterMetadata
    {
        public void Register()
        {
            RegisterAll();
        }

        public static void RegisterAll()
        {
            var builder = new AttributeTableBuilder();
            MyActivityDesigner.RegisterMetadata(builder);
            // TODO: Other activities can be added here
            MetadataStore.AddAttributeTable(builder.CreateTable());
        }
    }
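    The TestDesigner project itself isn't reproduced in this post. As a rough sketch (not taken from the sample), a rehosted designer might call RegisterAll and host the WorkflowDesigner like this, assuming a WPF window whose XAML contains a ContentControl named designerHost:

    using System.Activities.Core.Presentation;
    using System.Activities.Presentation;
    using System.Windows;

    public partial class MainWindow : Window
    {
        public MainWindow()
        {
            InitializeComponent();

            // Register the standard WF 4.0 designers plus our custom activity metadata.
            new DesignerMetadata().Register();
            MyActivityLibraryMetadata.RegisterAll();

            // Load an instance of the custom activity into a rehosted designer.
            var designer = new WorkflowDesigner();
            designer.Load(new MyActivity());
            designerHost.Content = designer.View;
        }
    }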

    Step 5: Add an activity image to your Design Project

    Your designer should include a 16x16 Toolbox image. The sample application includes the image QuestionMark.png in the activity library. The Build Action for this file should be set to Resource.


    Step 6: Add the ToolboxBitmap

    To support a Toolbox bitmap you will need to add the activity image to your activity library project as well and set its Build Action to Embedded Resource. The sample application includes the file QuestionMark.png as a linked file from the design project.


    Next, go back to your activity class and add the ToolboxBitmap attribute:

    // TODO: Be sure the build action for your bitmap is set to Embedded Resource
    [ToolboxBitmap(typeof(MyActivity), "QuestionMark.png")]
    public sealed class MyActivity : NativeActivity<string>

    Step 7: Create The Designer

    The XAML in the Activity Designer Library template is for a simple designer. This sample includes a designer with support for Expand/Collapse and an activity image as well as a drop down list with enumerated values.

    Here is the collapsed view where you would show only the most important values

    [Screenshot: the collapsed designer view]

    And this is the expanded view where you can add more commonly accessed properties. Remember that the property grid allows access to other properties that are not included on the design surface.

    [Screenshot: the expanded designer view]

    Step 8: Configure the Designer project for debugging

    There are two options for debugging your designer project. The recommended approach is to debug with a re-hosted designer application such as the one included with this application. Set the project properties as shown for debugging support.

    [Screenshot: debug settings for the rehosted test designer]

    To test with Visual Studio it is best to use the Experimental Instance option. Debugging with Visual Studio can take a long time so it is best to test by starting without debugging.


    And that’s it. Now all you need to do is manage the WPF side of things. Check the sample to see how it’s done.


    Sahil Malik explained Integrating AppFabric and SharePoint 2010 in an 8/15/2011 post to his Winsmarts blog:

    Two months ago, I kick-started a whole new series of articles on SharePoint 2010 with the “Cloudy SharePoint: Office 365 and Azure” article [for Code Magazine]. In that article, I emphasized how Azure will put you out of a job, unless of course you choose to learn it :).

    Carrying that theme further, I am glad to see my next article is now online. In this article, I talk about a rather common scenario that needs a solution in SharePoint, which is “Session State in SharePoint”. Even though the title is session state, really I am talking about solving a core need using AppFabric and SharePoint. And for good measure, I show Windows Server AppFabric; the code for Azure AppFabric is identical. So even if you are not ready to jump on the Azure bandwagon today (let's say you have a behind-the-times boss :)), these are concepts you can use TODAY!

    Excerpt from the beginning of the article -

    The title of this article is a misnomer, but I still picked this title because it is indeed the problem we are trying to solve. The problem is that session state, especially in-process session state, is just evil. It makes your application less predictable, less reliable, less scalable, and locks you out of possibilities such as Windows Azure.

    Not just Windows Azure, it also makes it somewhat less suitable for a load balanced stateless environment. It is thus for a good reason that SharePoint discourages the use of session state. The usual solution for the lack of session state we rely on is for the browser to maintain session information, an approach that leads to things like bloated viewstate. It could be argued that most Web Forms-based architectures, including SharePoint, suffer from bloated viewstate. So that isn’t an ideal solution either. The reality is, as evil as session state may be, we do need it. It’s a necessary evil.

    But let’s step back for a moment and examine the real problem we are trying to solve. We are trying to have stateful information persist across a stateless protocol, and we want to do it in a scalable form. Out of process session state seems to solve that problem, but that introduces a few more challenges of its own. It introduces the additional server hop, and it is a server-side solution only. And it introduces potential additional complexity for the administrator and setup issues.

    I hope you enjoy reading, “Session State in SharePoint 2010”. More fun stuff to follow. w00t! Happy SharePointing.
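    For readers who haven't used the caching API the article builds on, here is a minimal sketch of the Windows Server AppFabric cache client (assuming the Microsoft.ApplicationServer.Caching client assemblies are referenced and a cache named "default" exists; the article's actual SharePoint integration differs):

    using System;
    using Microsoft.ApplicationServer.Caching;

    class CacheSample
    {
        static void Main()
        {
            // Reads cache host settings from the dataCacheClient section of app.config/web.config.
            var factory = new DataCacheFactory();
            DataCache cache = factory.GetCache("default");

            // Store "session" data out of process with a 20-minute expiration.
            cache.Put("cart:jane", new[] { "flour", "yeast" }, TimeSpan.FromMinutes(20));

            // Retrieve it later, possibly from another web front end.
            var cart = (string[])cache.Get("cart:jane");
            Console.WriteLine(string.Join(", ", cart));
        }
    }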


    Leandro Boffi (@leandroboffi) described Windows Azure & WF 4.0 - Creating a Workflow-Driven WorkerRole in an 8/12/2011 post:

    Worker roles are a great resource we can use to execute background tasks like integration, migration and scheduling, but they are also very useful for running asynchronous tasks triggered by the user. All these kinds of processes are usually composed of a set of steps, and we usually hardcode the flow in which those steps are executed. In very trivial cases this is not a problem, but in real and complex scenarios this “flow” [management] code grows, which complicates the maintainability of our code as well as our ability to respond to change.

    Yes, as you can imagine, using Workflow is a very nice and powerful way to avoid all the extra code you need to coordinate the process, keeping your focus on writing the code that carries out the functionality you need.

    In this post I'll show you how to create a very simple Workflow-Driven WorkerRole that reads a message from a queue and sends an email.

    If you want to download the code click here.

    Creating Custom Activities

    To accomplish our objective we will need to create a few custom activities, because the functionality we need is not included in WF 4.0 out of the box (you can see a list of the provided activities here). I will quickly show how to create one of them without much detail, because that is not the subject of this post. We will create three different custom activities:

    1. ReadMessageFromQueue: This will be the activity we are going to use to read the message from the queue.

    2. DeleteMessageFromQueue: This activity will delete the message after it has been processed.

    3. SendEmail: As you can guess, this activity will be responsible for sending the email.

    First of all, we need to create a new WorkerRole project called “SubscriptionEmailSender” (I'm not going to detail this; I'm guessing that this is not your first worker role). Once we have that, we will add an “Activities” folder, where we are going to add our custom activities.


    To create a custom activity, we just need to right-click the “Activities” folder, select Add New Item and use the Code Activity template.


    This template will add a class to our project that extends the type System.Activities.CodeActivity. Just like a “Command” pattern, this forces us to provide an implementation for a method called “Execute” that will be called when the activity is executed.


    We will create the ReadMessageFromQueue activity first. Basically, we need to read and retrieve a message from a queue, but note that the Execute method is a void method; to change that, we need to make this class extend System.Activities.CodeActivity<TResult> instead. We also need two input arguments: the name of the queue we want to read and the Azure storage account we are going to use to do that.
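    The original post shows this activity only as a screenshot, so the following is a hedged sketch of what ReadMessageFromQueue might look like; the argument names match the ones used later in the workflow, but the details are an approximation:

    using System.Activities;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    // The original sample also decorates the class with a [Designer] attribute
    // pointing to a XAML activity designer (not shown here).
    public sealed class ReadMessageFromQueue : CodeActivity<CloudQueueMessage>
    {
        public InArgument<CloudStorageAccount> Account { get; set; }
        public InArgument<string> QueueName { get; set; }

        protected override CloudQueueMessage Execute(CodeActivityContext context)
        {
            // Input argument values are read through the activity context.
            CloudStorageAccount account = context.GetValue(this.Account);
            string queueName = context.GetValue(this.QueueName);

            CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference(queueName);
            queue.CreateIfNotExist();

            // Returns null when the queue is empty; the flowchart's decision handles that case.
            return queue.GetMessage();
        }
    }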


    Notice that to read the argument values you need to use the context.GetValue(argument) method. I've also added the Designer attribute, which allows me to use a nice XAML designer for the activity that you'll see in the solution. Build, and it's done.

    Creating the Workflow

    Once we have all the activities we need, we can start creating the workflow. To do that, right-click the WorkerRole project and Add New Item, using the “Activity” template.


    In WF 4.0 there is no separate Workflow concept; everything is an activity, so we call an activity composed of other activities a “Workflow”. In our case, our workflow needs to do the following:

    1. Read message from the “newsubscriptions” queue.

    2. If no message is present, wait 10 seconds and go to step 1.

    3. If a message is present, send a welcome message to the email address (the email address will be the message text).

    4. Delete the message.

    5. Go to step 1.

    Let's start. In the workflow designer, click the “Arguments” tab in the lower-left corner and add an argument named “Account” of type CloudStorageAccount.


    This means that our workflow will receive as an argument the CloudStorageAccount that we are going to use to work with the queue. Whenever we talk about an “if” in a workflow we are talking about a flowchart, and that's the first activity we need to drop onto our workflow designer.


    We also need to create a variable at the Flowchart level named “Message”, where we will store the retrieved message.


    Drop the ReadMessageFromQueue activity from the toolbox onto the flowchart as the initial activity, tie the start icon to it, and configure these values in the Properties window:

    • Account: Account (this makes reference to the Account argument we set in the first step)

    • QueueName: “newsubscriptions”

    • Result: Message (this means that the result will be put in the Message variable we set in the previous step)

    Now drop a FlowchartDecision activity, tie the ReadMessageFromQueue activity to this one and set these values in the property window:

    • Condition: IsNothing(Message)

    • FalseLabel: “Message!”

    • TrueLabel: “No Message”

    To continue, drop a Delay activity, tie the “No Message” endpoint of the FlowchartDecision activity to it, and tie it back to the ReadMessageFromQueue activity. Also set this value in the Properties window:

    • Duration: 00:00:10

    Now drop the SendEmail activity, tie the “Message!” endpoint of the FlowchartDecision to it, and configure these values in the Properties window:

    • Message: "Welcome to WF & Azure happiness"
    • Subject: “Welcome “ + Message.AsString
    • To: Message.AsString

    To finish, drop the DeleteMessageFromQueue activity, tie the SendEmail activity to it, tie it back to the ReadMessageFromQueue, and set these values in the Properties window:

    • QueueName: “newsubscriptions”
    • Message: Message
    • Account: Account

    You should have something like this:

    [Screenshot: the completed workflow flowchart]

    Writing the WorkerRole

    Now that we have the workflow completed, we just need to execute it from the WorkerRole, and this is all the code we need to do that:
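    The worker role code also appears only as a screenshot in the original post; a rough sketch of what it might look like follows (the workflow class name SubscriptionWorkflow and the DataConnectionString setting are assumptions):

    using System.Activities;
    using System.Collections.Generic;
    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        public override void Run()
        {
            var account = CloudStorageAccount.Parse(
                RoleEnvironment.GetConfigurationSettingValue("DataConnectionString"));

            // WorkflowInvoker runs the flowchart synchronously; because the workflow loops
            // forever, this call never returns and keeps the role instance alive.
            WorkflowInvoker.Invoke(
                new SubscriptionWorkflow(),
                new Dictionary<string, object> { { "Account", account } });
        }
    }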


    As you can see, the only thing we need to do is call WorkflowInvoker, providing the instance of the workflow that we want to execute and the arguments it is expecting. As our workflow has no end, we don't need the typical while(true) loop to keep the process alive.

    Conclusion

    That's all. It wasn't hard at all, right? To close this post I leave some points to think about:

    1. We just write code to fit our functional requirements; we completely avoid the process-management code.
    2. All the tasks we've created are highly reusable.
    3. In a complex process, having a graphical map of it is really useful.
    4. If something changes in the process, you just need to change the workflow without touching a line of code.
    5. The workflow XAML file could be stored in Azure blob storage, which would let you change the process without redeploying. In my next post I'll show how to do that.

    <Return to section navigation list>

    Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

    No significant articles today.


    <Return to section navigation list>

    Live Windows Azure Apps, APIs, Tools and Test Harnesses

    Greg Oliver explained Installing Windows Features in a Windows Azure Role Instance in an 8/15/2011 post to the Cloud Comments blog:

    In this posting I'll talk about my experiences and final solution to what is usually a simple thing with a Windows Server box: installing Windows features.

    If your Azure solution requires a Windows Feature, you might be tempted to set up a VHD and then upload to a VM Role. While this will work, for various reasons it’s not the optimal solution if a worker or web role can do the job. And, as is frequently the case these days, startup tasks can be used to handle this requirement quite nicely.

    Initially, I experimented with techniques such as OCSetup and ServerManagerCmd. I used Remote Desktop to try out scripting these options before settling on PowerShell. OCSetup doesn’t give great feedback and has no option for querying status of existing features, and ServerManagerCmd is deprecated. I needed to learn a little bit about PowerShell, but as it’s the technology of the future I decided it was worth the effort.

    I used a couple of excellent resources to learn what I needed to know:

    1. Powershell.com has a free eBook entitled Mastering Powershell. Here’s the link: http://powershell.com/cs/blogs/ebook/default.aspx.
    2. Microsoft’s own Scripting Guy, Ed Wilson, published an excellent series of videos for the beginner: http://technet.microsoft.com/en-us/scriptcenter/dd742419.aspx.

    The particular Windows Features that I needed to add are “Ink Support” and “Ink Handwriting Recognition”, where “Ink Support” requires a reboot and is a prerequisite for “Ink Handwriting Recognition”. Fortunately, the startup task that you can specify in your CSDEF file gets run every time a role instance boots (not only when the instance is created.)

    Two scripts are needed in a PowerShell scenario, one CMD file and one PS1 file:

    1. Startup.CMD
      powershell -command "Set-ExecutionPolicy Unrestricted" 2>> err.out
      powershell .\startup.ps1 2>> err.out
    2. Startup.PS1
      Import-Module Servermanager
      $ink = Get-WindowsFeature "IH-Ink-Support"
      $hwr = Get-WindowsFeature "IH-Handwriting"
      if (!$ink.Installed) {
          Add-WindowsFeature -name "IH-Ink-Support" -Restart
      }
      if (!$hwr.Installed) {
          Add-WindowsFeature -name "IH-Handwriting"
      }

    Here, Get-WindowsFeature returns an object with various properties. The one we care about is “.Installed”, which returns a boolean value indicating the installation status of the feature. For the non-C crowd, the exclamation point, in PowerShell speak, indicates “not”.

    Just a few notes:

    And finally, as mentioned before (but always important to remember), don't create a command file in Visual Studio. Doing so inserts a couple of byte-order characters at the front that prevent the command interpreter from being able to read it. Start your command file with Notepad, add it to your project, and then you can edit it in VS.


    Steve Marx (@smarx) described Running ClamAV (Antivirus Software) in Windows Azure in an 8/15/2011 post:

    Somewhat regularly, someone asks what antivirus options they have in Windows Azure, usually for the purpose of scanning files uploaded by end-users. The answer I always give is that though Windows Azure doesn't include any built-in functionality for performing virus scanning, there's nothing stopping application developers from building this into their applications.

    To show how this might be done, I put together an application today that accepts a file upload, stores it in blob storage, and asynchronously (in a worker role) scans it for viruses. It uses ClamAV, an open-source antivirus engine, to detect potential viruses. You can try it yourself at http://antivirus.cloudapp.net.

    BIG DISCLAIMER: I'm not an expert on viruses, virus scanners, or security in general. The purpose of this blog post is not to demonstrate security best practices or to make recommendations. This post and accompanying code are meant to be purely educational. [Emphasis Steve's.]

    Okay, with that out of the way, let’s move on to the fun stuff.

    The Web Front-End

    My virus scanning app has a simple front-end that accepts a file upload, and then displays the results of the virus scan once it's complete. (The actual scan is handled by the back-end.) The front-end is an ASP.NET MVC 3 web role (using the new template in the 1.4 tools release).

    The main page is a form (prettied up a bit by Formly) that either accepts a file or uses the “EICAR test” instead. That second option is really cool. The EICAR test is a tiny, well-known sequence of 68 bytes that is completely harmless but should be flagged by most antivirus engines as harmful. This makes it perfect for testing an antivirus solution, and so I included it as an option in my app so users can verify the scanning is working without having to find and upload a piece of true malware.

    Here’s the controller action that handles the form submission:

    [HttpPost]
    public ActionResult Submit(string method)
    {
        var guid = Guid.NewGuid().ToString();
        var blob = incomingContainer.GetBlobReference(guid);
        if (method == "file")
        {
            blob.Properties.ContentType = Request.Files[0].ContentType;
            blob.UploadFromStream(Request.Files[0].InputStream);
        }
        else
        {
            string eicar = @"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*";
            blob.UploadText(eicar);
        }
        incomingQueue.AddMessage(new CloudQueueMessage(guid));
        return RedirectToAction("show", new { id = guid });
    }

    It uploads either the file submitted or the EICAR test file (yup, that’s it right there in the code) to a blob in the “incoming” directory and then puts a message on a queue so the worker role knows to scan this blob later. Finally, it redirects the browser to a “show” page that simply polls for the blob’s status and shows the results of the scan when they’re ready.

    That page uses Superagent to make the AJAX polling easier (cleaner syntax than native jQuery, quite a nice project), and it polls the following action in the controller:

    [HttpPost]
    public ActionResult Check(string id)
    {
        try
        {
            var blob = publicContainer.GetBlobReference(id);
            blob.FetchAttributes();
            return Json(new { done = true, quarantined = blob.Metadata["quarantined"] == "true", url = blob.Uri.AbsoluteUri });
        }
        catch (Exception e)
        {
            return Json(new { done = false });
        }
    }

    This code looks for the blob to show up in the “public” container. What will show up there is either the original blob (safe to be shared and downloaded by other users), or a replacement blob with the text “File quarantined: possible infection,” and a piece of metadata specifying that it's been quarantined. The above method simply returns the completion and quarantine status to the client, where some JavaScript displays the right thing (using the jQuery tmpl plugin).

    The Back-End

    The worker role is where the antivirus stuff happens. Inside the worker role, I’ve included ClamAV (including the required Visual Studio 2005 C++ redistributable binaries) as “Content.” ClamAV fetches the most up-to-date virus signatures at runtime using a command called “freshclam.” Code in OnStart() creates a directory within a local storage resource, tells “freshclam” to download the signature database to that location, and launches “clamd” (ClamAV as a daemon) configured to use that local storage resource:

    var clamPath = RoleEnvironment.GetLocalResource("clamav").RootPath;
    Directory.CreateDirectory(clamPath);
    Directory.CreateDirectory(Path.Combine(clamPath, "db"));
    File.WriteAllText(Path.Combine(clamPath, "clamd.conf"),
        string.Format("TCPSocket 3310\nMaxThreads 2\nLogFile {0}\nDatabaseDirectory {1}",
            Path.Combine(clamPath, "clamd.log"), Path.Combine(clamPath, "db")));
    
    FreshClam(false);
    Process.Start(Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot\clamav\clamd.exe"),
        "-c " + Path.Combine(clamPath, "clamd.conf"));

    The implementation of FreshClam(…) is as follows:

    private void FreshClam(bool notify)
    {
        var clamPath = RoleEnvironment.GetLocalResource("clamav").RootPath;
        var args = string.Format("--datadir={0}", Path.Combine(clamPath, "db"));
        if (notify)
        {
            args += string.Format(" --daemon-notify={0}", Path.Combine(clamPath, "clamd.conf"));
        }
        var proc = Process.Start(Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot\clamav\freshclam.exe"), args);
        proc.WaitForExit();
        if (proc.ExitCode != 0)
        {
            throw new Exception("freshclam failed");
        }
    }

    From then on, the code enters a fairly standard loop, checking for new blobs to scan, scanning them, and then moving them to either the “public” or “quarantine” container depending on the results of the scan. Every fifteen minutes, FreshClam is executed again to pull down any new virus signatures:

    while (true)
    {
        if (DateTime.UtcNow - lastRefresh > TimeSpan.FromMinutes(15))
        {
            FreshClam(true);
            lastRefresh = DateTime.UtcNow;
        }
        var msg = q.GetMessage();
        if (msg != null)
        {
            var name = msg.AsString;
            var blob = incomingContainer.GetBlobReference(name);
            var result = new ClamClient("127.0.0.1", 3310).SendAndScanFile(blob.DownloadByteArray()).Result;
            if (result == ClamScanResults.Clean)
            {
                publicContainer.GetBlobReference(name).CopyFromBlob(blob);
                blob.Delete();
            }
            else
            {
                var publicBlob = publicContainer.GetBlobReference(name);
                publicBlob.Metadata["quarantined"] = "true";
                publicBlob.UploadText("File quarantined: possible infection.");
                quarantineContainer.GetBlobReference(name).CopyFromBlob(blob);
                blob.Delete();
            }
            q.DeleteMessage(msg);
        }
        else
        {
            Thread.Sleep(TimeSpan.FromSeconds(1));
        }
    }

    ClamClient comes from the nClam library, which, thanks to NuGet, is as easy to acquire as install-package nclam.

    I’m only using a single instance of the worker role, because I don’t need high availability. This also lets me not worry about using a unique port number for each instance of “clamd” when running locally. If you try to run this locally with more worker role instances, you’ll probably want to declare an internal endpoint and use the port provided by the runtime instead of hardcoding port 3310.
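    A sketch of that endpoint lookup might look like the following fragment (assuming an internal endpoint named "clamd" declared in the service definition; this is not part of Steve's sample):

    using Microsoft.WindowsAzure.ServiceRuntime;

    // Look up the port assigned to this instance's internal "clamd" endpoint instead of hardcoding 3310.
    // The same port would also need to be passed to clamd's TCPSocket setting when writing clamd.conf.
    var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["clamd"].IPEndpoint;
    var result = new ClamClient(endpoint.Address.ToString(), endpoint.Port)
        .SendAndScanFile(blob.DownloadByteArray()).Result;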

    Try it Out, and Download the Code

    You can try the solution at http://antivirus.cloudapp.net.

    You’ve seen most of the interesting code right here in this post, but if you want to see all the details, you can download the full Visual Studio solution here: http://cdn.blog.smarx.com/files/AntiVirus_source.zip

    Note that I didn’t include the ClamAV or VS redistributable binaries, primarily to keep the zip file size down. If you want to actually run this application, you’ll need to get those binaries and put them in the “clamav” folder within the worker role.


    The Windows Azure Team (@WindowsAzure) posted a Real World Windows Azure: Interview with Dom Alcocer, Marketing Manager at General Mills case study summary on 8/15/2011:

    MSDN: Tell us about General Mills.

    Alcocer: With more than 33,000 employees, General Mills is one of the world’s leading food companies. Our brands include Betty Crocker, Cheerios, Nature Valley and Pillsbury.

    MSDN: What led the company to develop Gluten Freely?

    Alcocer: Gluten, a protein naturally found in certain grains common in a modern diet, can cause health problems for a small but growing number of people who have sensitivities to it, including those with celiac disease. We realized that people with celiac disease, and others who have some kind of gluten sensitivity or who just want to live a gluten-free lifestyle, were having particular difficulty in simply locating all of the great foods that they need to make this diet come to life for them. They couldn’t find places to buy the products, and they had to spend a lot of time sifting and sorting through the Internet to find recipes and other information about gluten-free diets.

    In order to assist these customers, we wanted to create a direct-to-consumer online channel that would allow them to buy gluten-free products directly from General Mills.

    MSDN: What is Gluten Freely?

    Alcocer: Gluten Freely is a cloud-based consumer business channel for gluten-free products and related information. The site gives consumers access to a broad range of resources—including recipes, blogs, community forums, and medical facts about gluten—along with coupons and discounts for our online store where they can choose from a selection of more than 400 gluten-free products, which are then shipped directly to the consumer’s door. We sell both our own products along with non–General Mills products in an effort to provide the largest possible selection to consumers.

    MSDN: Why did you choose to build Gluten Freely on Windows Azure?

    Alcocer: BrandJourney Venturing, the solution provider we worked with on this project, evaluated possible technology platforms and recommended we use a cloud-based solution for speed, flexibility, cost, and scalability reasons. In the end, we felt the best route to go was to run on Windows Azure.

    MSDN: What were some of the advantages of building on Windows Azure?

    Alcocer: Windows Azure allows us to add infrastructure based on consumer demand—essentially, one consumer at a time—instead of forcing us to buy and build servers and systems based on guesses of what we’ll need. The General Mills New Ventures group has been tasked with creating new ways of doing business. With Windows Azure, and by working with BrandJourney Venturing, we brought a new business idea to market about twice as fast and at about half the cost of using a more traditional IT development and brand agency model.

    In addition, Windows Azure gave us right out of the gate a platform for creating a solution where compliance and data integrity were protected. We have great confidence in the Microsoft data centers that run Windows Azure. This is critical to the success of the project, because we’re working with a lot of transactional information, including credit card data and customers’ personal information. The Windows Azure platform provides a level of security and robustness that supports a rock-solid experience for consumers.

    Dom Alcocer, Marketing Manager, New Ventures, General Mills Talks About Gluten Freely and Windows Azure

    MSDN: What other Microsoft technologies are you leveraging in this solution?

    Alcocer: Gluten Freely was designed using Visual Studio and the .NET Framework; product and customer data is stored in SQL Azure. The solution also supports federated identity with social networking activity, prompting users to create a link when users log on to sites like Facebook or Twitter. The company also uses Microsoft Dynamics CRM Online for our ecommerce analytics activities.

    MSDN: How long did take to develop Gluten Freely?

    Alcocer: Gluten Freely took about six months to develop and went live in March 2011.

    MSDN: What has the customer response been to Gluten Freely?

    Alcocer: We’ve gotten lots of great feedback from our fans on Facebook and followers on Twitter. In our first day, we had orders from more than 25 states; and of course, we now deliver to every state in the continental United States. With Windows Azure and the other Microsoft products, we were able to create and deploy a solution that met a growing market need. It gets great branding and advertising out there in a new way, and people love being able to go to one place to shop for gluten-free products and have them shipped right to their doors.

    Click here to read the full case study.

    Click here to read how others are using Windows Azure.


    Lori MacVittie (@lmacvittie) asserted #v11 #HTML5 will certainly have an impact on web applications, but not nearly as much as hoped on the #mobile application market as an introduction to her HTML5 Going Like Gangbusters But Will Anyone Notice? post of 8/15/2011 to F5’s DevCenter blog:

    There’s a war on the horizon. But despite appearances, it’s a war for interactive web application dominance, and not one that’s likely to impact very heavily the war between mobile and web applications.

    First we have a report by ABI Research indicating a surge in the support of HTML5 on mobile devices indicating substantially impressive growth over the next five years.

    More than 2.1 billion mobile devices will have HTML5 browsers by 2016, up from just 109 million in 2010, according to a new report by ABI Research. -- The HTML Boom is Coming. Fast. (June 22, 2011)

    Impressive, no? But browser support does not mean use, and a report issued the day before by yet another analytics firm indicates that HTML5 usage on mobile applications is actually decreasing.

    Mobile applications are commanding more attention on smartphones than the web, highlighting the need for strong app stores on handset platforms. For the first time since Flurry, a mobile analytics firm, has been reporting engagement time of apps and web on smartphones, software is used on average for 81 minutes per day vs 74 minutes of web use. -- Sorry HTML 5, mobile apps are used more than the web (June 21, 2011)

    [Chart: time spent in mobile apps versus the mobile web (Flurry)]

    What folks seem to be missing – probably because they lack a background in development – is that the war is not really between HTML5 and mobile applications. The two models are very different – from the way in which they are developed and deployed to the way they are monetized. On the one hand you have HTML5 which, like its HTMLx predecessors, can easily be developed in just about any text editor and deployed on any web server known to man. On the other hand you have operating system and often device-specific development platforms that can only be written in certain languages and deployed on specific, targeted platforms.

    There’s also a marked difference in the user interface paradigm, with mobile device development heavily leaning toward touch and gesture-based interfaces and all that entails. It might appear shallow on the surface, but from a design perspective there’s a different mindset in the interaction when relying on gestures as opposed to mouse clicks. Consider those gestures that require more than one finger – enlarging or shrinking an image, for example. That’s simply not possible with one mouse – and becomes difficult to replicate in a non-gesture-based interface. Similarly, there are often very few “key” commands on mobile device applications and games. Accessibility? Not right now, apparently.

    That’s to say nothing of the differences in the development frameworks; the ones that require specific environments and languages.

    The advantage of HTML5 is that it’s cross-platform, cross-environment, and highly portable. The disadvantage is that you have little or no access to and control over system-level, well, anything. If you want to write an SSL VPN client, for example, you’re going to have to muck around in the network stack. That’s possible in a mobile device development environment and understandably impossible in a web-only world. Applications that are impossible to realistically replicate in a web application world – think graphic-intense games and simulation systems – are possible in a mobile environment.

    MOBILE BROADENING ITS USE

    The one area in which HTML5 may finally gain some legs and make a race out of applications with mobile apps is in its ability to finally leverage offline storage. The assumption for web applications has been, in the past, always on. Mobile devices have connectivity issues, attenuation and loss of signal interrupts connection-oriented applications and games. And let’s not forget the increasing pressure of data transfer caps on wireless networks (T-Mobile data transfer cap angers smartphone users, Jan 2011; O2 signals the end of unlimited data tariffs for iPhone customers, June 2010) that are also hitting broadband customers, much to their chagrin. But that’s exactly where mobile applications have an advantage over HTML5 and web applications, and why HTML5 with its offline storage capabilities comes in handy.

    But that would require rework on the part of application developers to adapt existing applications to fit the new model. Cookies and frequent database updates via AJAX/JSON is not a reliable solution on a sometimes-on device. And if you’re going to rework an application, why not target the platform specifically? Deployment and installation has reached the point of being as simple as opening a web page – maybe more so given the diversity of browsers and add-on security that can effectively prevent a web application requiring scripting or local storage access from executing at all. Better tracking of application reach is also possible with mobile platforms – including, as we’ve seen from the Flurry data, how much time is spent in the application itself.

    If you were thinking that mobile is a small segment of the population, think again. Tablets – definitely falling into the mobile device category based on their development model and portability - may be the straw that breaks the laptop’s back.


    Our exclusive first look at its latest report on how consumers buy and use tablets reveals an increasing acceptance--even reliance--on tablets for work purposes. Of the 1,000 tablet users surveyed, 57 percent said they are using tablets to replace laptop functions. Compared with a year ago, tablet owners are much less likely to buy a new laptop or Netbook, as well.

    Tablets are also cutting into e-reader purchase plans to an ever greater degree.

    What's more surprising, given the newness of the tablet market, is that 46 percent of consumers who already have a tablet are planning to buy another one.

    -- Report: Multi-tablet households growing fast (June 2011)

    This is an important statistic, as it may – combined with other statistics respecting the downloads of applications from various application stores and markets – indicate a growing irrelevance for web-based applications and, subsequently, HTML5. Mobile applications, not HTML5, are the new hotness. The losers to HTML5 will likely be Flash and video-based technologies, both of which can be replaced using HTML5 mechanisms that come without the headaches of plug-ins that may conflict, require upgrades and often are subject to targeted attacks by miscreants.

    I argued earlier this year that the increasing focus on mobile platforms and coming-of-age of HTML5 would lead to a client-database model of application development. Recent studies and data indicate that’s likely exactly where we’re headed – toward a client-database model that leverages the same database-as-a-service via a RESTful API and merely mixes up the presentation and application logic tiers on the client – whether through mobile device development kits or HTML5.

    As mobile devices – tablets, smartphones and whatever might come next – continue to take more and more mindshare from both the consumer and enterprise markets we’ll see more and more mobile-specific support for applications. You’ll note popular enterprise applications aren’t simply being updated to leverage HTML5 even though there is plenty of uptake in the market of the nascent specification. Users want native mobile platform applications – and they’re getting them.
    That doesn’t mean HTML5 won’t be a game-changer for web applications – it likely will be – but it does likely mean it won’t be a game-changer for mobile applications.

    No significant articles today.


    <Return to section navigation list>

    Visual Studio LightSwitch and Entity Framework 4.1+

    No significant articles today.


    <Return to section navigation list>

    Windows Azure Infrastructure and DevOps

    JP Morgenthal (@jpmorgenthal) posted a Defining DevOps essay on 8/15/2011:

    imageRecently a member of the LinkedIn DevOps group started a discussion entitled, “Concise description of DevOps?” This member’s post focused on clarifying DevOps as a role mainly for the purposes of simplifying his recruiting efforts. The member pointed out that recruiters and vendors are starting to overload the term in an attempt to attract a wider pool of applicants even if the jobs they are recruiting for are primarily operations focused.

    Overloading of technical terms in the information technology industry is quite the norm, but it does sometimes play an integral role in confusing technology consumers and delaying progress through failure or mismatches. The more popular a term, the more it will be co-opted by various factions for their own purposes. DevOps is certainly not escaping this trend.

    Another member in the forum responded to the discussion by stating their regret for introducing the concept of DevOps as a role since management immediately saw the role as a consolidation of multiple roles, thus leading to higher productivity for lower costs. After all, if we can get one person to do the work of three that would certainly be a great advantage. Unfortunately, if management really believes this they should be shot for not recognizing the old adage, “if it seems too good to be true, it probably is!” This member is now attempting to undo the damage by redefining DevOps in his organization as a movement.

    I’ve provided my own beliefs on what DevOps is here. I would say based on this I am more closely aligned with the concept of DevOps as a movement than as a role, but then again I am only one opinion amidst thousands. I would say there’s broad agreement that DevOps incorporates collaboration across the various phases of the software development lifecycle, which, oddly enough, is not a widely-followed practice in many organizations. The primary cause I can discern for the lack of collaboration is a belief that the infrastructure should be organized to support the needs of the application. However, with the movement to cloud computing and the need to reduce disparate hardware stacks and lower the overhead of managing the data center, business is suddenly starting to see the impact of that decision on the bottom line and is reversing this trend. This requires that the application developers and testers understand the limits of the environment in which they will be deploying.

    A subset of DevOps professionals also believe that DevOps is the combination of development and operations in a single role. Here, however, the development effort is required for automation of the deployment and management of an operational environment. System administrators have been developing scripts for years to simplify the task of managing and operating an environment and now that effort is being recognized as an important skill. That is, those administrators that can also code scripts are more valuable than those that simply administer through human intervention alone.

    In light of this skill, these developers are also now being grouped into the DevOps movement. The only issue with this is that there is no clear delineation between application developers and the operational developers. Perhaps differentiation is unnecessary, except in the case of recruiting since an agile Java developer does not want to waste time interviewing for a position that ends up being Python scripts for deploying machine instances.

    One thing is for sure, the DevOps movement is very important to improving the quality and support of applications and infrastructure services. As I stated in my blog entry, “Thar Be Danger in That PaaS”: “PaaS is fraught with pitfalls and dangers that could cause your application to stop running at any point. Moreover, should this occur, the ability to identify and correct the problem may be so far out of your hands that only by spending an inordinate amount of time with your PaaS provider's support personnel could the problem be corrected.” PaaS is one area where support requires the collaborative efforts of a group that has both application and operational experience and can work with each other to uncover problems for their customers.

    That said, should you be hiring DevOps? Or should you be seeking System administrators with ability to program automation scripts and software engineers and architects with an understanding of infrastructure architecture?

    That’s why DevOps is a movement that’s important to the Windows Azure Platform’s success. For my take on the subject, see my March 2011 How DevOps brings order to a cloud-oriented world tip for the SearchCloudComputing.com blog.

    Full disclosure: I’m a paid contributor to the SearchCloudComputing.com blog.


    Eric Knorr asserted “So much has happened so fast in the crazy jumble known as cloud computing, it's time to sort out what's really going on” in a deck for his The (real) state of the cloud, 2011 article of 8/15/2011 for InfoWorld’s Cloud Computing blog:

    You can tell when an industry trend starts hitting the wall: Salespeople stop talking about it. A few sources have told me that, these days, when customers get a sales pitch that leads with "the cloud," they reply with a look that says: "If I hear 'the cloud' out of your mouth one more time, I'm going to kill you," or something to that effect.

    Ironically, just when the cloud "brand" is faltering, the real-world value of this fuzzy collection of technologies and techniques is clearer than ever.

    Just have a look at four distinct segments of the cloud in order of hotness: The private cloud is on fire -- for very good reasons that will be explained shortly. SaaS (software as a service), almost entirely a public cloud phenomenon, comes next. After that are the public IaaS (infrastructure as a service) offerings such as those from Amazon.com and Rackspace -- and finally commercial PaaS (platform as a service) environments for application development. Let's start at the top.

    The allure of the private cloud
    You can argue that the whole public cloud phenomenon began years ago when CIOs looked at Google's infinitely scalable, resilient infrastructure and said, "Me want some of that." In response, data center managers slapped their foreheads and explained through clenched teeth that Google's infrastructure was purpose-built for a single application -- utterly the wrong fit for the average data center.

    Meanwhile, the proliferation of server virtualization began to form the foundation for a different sort of private cloud. Virtualization's combination of vastly improved hardware utilization and quick provisioning has proven irresistible, to the point where roughly half of servers in midsize to large companies today act as physical hosts to VMs. That's a huge shift from dedicated resources to pooled resources. …

    Eric continues on page 2 with an analysis of SaaS with Office 365 as an example. Page 3 analyzes IaaS and PaaS as consumers of IaaS. Eric concludes:

As best as I can determine, PaaS is the area of the cloud that appeals least to enterprises. No big surprise there: For what particular reason would you have your developers create intellectual property on someone else's platform? Instead, enterprises tend to have a once-removed relationship with PaaS -- and hire outside development firms to build and deploy public-facing Web apps on PaaS platforms.

There are exceptions to this, of course, an obvious example being the many customers that use the Force.com platform to add to Salesforce's functionality. But for the most part, PaaS is the province of independent developers creating custom or commercial Web and/or mobile applications. The main effect of PaaS on enterprises, then, is that it provides a platform for a new generation of consumerized applications business users are flocking to -- and IT is desperately trying to manage.

    So there's your handy state of the cloud report, summary edition. One final note: I find myself agreeing with InfoWorld's David Linthicum when he says "It's official: 'Cloud computing' is now meaningless." The cloud nomenclature has always been frustratingly vague -- and vendors' tendencies to grandfather everything on earth into "the cloud" has made matters much, much worse. No wonder people are sick of it.

    Got ideas for some new terminology? I'd love to hear about it.

    Eric’s conclusion that enterprises will primarily use independent developers to implement Web sites with PaaS, such as Windows Azure, bodes well for .NET developers’ future.


    Rich Miller reported Microsoft Renews Big Lease, Yahoo Doesn’t in an 8/10/2011 post to DataCenterKnowledge.com (missed when published):

    Analysts who track the wholesale data center industry have been assessing the potential impact of large customers who have historically leased space but are now building their own data centers. Will these cloud-builders shift all their servers into their new facilities, leaving landlords to fill empty space once their leases expire?

    Not necessarily. A case in point: Microsoft will renew one of its largest wholesale data center leases. Industry sources indicate Microsoft will renew a lease for a 10 megawatt data center in northern Virginia that was originally scheduled to expire in increments between 2012 and 2017. The lease will be renewed for eight years.

    Need for Capacity

    Microsoft has been building its own data centers throughout North America. Last year Microsoft announced plans to invest up to $499 million in a major new data center project in southern Virginia. The company said that a 175-acre site near Boydton, Virginia would be the location of a state-of-the-art facility featuring IT-PACs, Microsoft’s air-cooled modular data centers. That announcement raised an obvious question: Would Microsoft still need a large chunk of leased space in the same state?

    Microsoft wouldn’t comment on the lease renewal, but confirmed that it is moving ahead full speed with the new data center in southern Virginia, citing stronger than anticipated demand for its cloud computing services – which in turn requires additional data center capacity. [Emphasis added.] …

    Read more: Rich continues with a “Yahoo Migrating to New Facilities” topic.

    Hadn’t heard much about Microsoft’s Southern Virginia data center recently.


    <Return to section navigation list>

    Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds


    No significant articles today.


    <Return to section navigation list>

    Cloud Security and Governance

    Todd Hoff asked and answered Should any cloud be considered one availability zone? The Amazon experience says yes in an 8/15/2011 post to his High Scalability blog:

Amazon has a very well-written account of their 8/8/2011 downtime: Summary of the Amazon EC2, Amazon EBS, and Amazon RDS Service Event in the EU West Region. Power failed, backup generators failed to kick in, there weren't enough resources for EBS volumes to recover, API servers were overwhelmed, a DNS failure caused failovers to alternate availability zones to fail, and a double fault occurred as the power event interrupted the repair of a different bug. All kinds of typical stuff that just seems to happen.

    Considering the previous outage, the big question for programmers is: what does this mean? What does it mean for how systems should be structured? Have we learned something that can't be unlearned?

    The Amazon post has lots of good insights into how EBS and RDS work, plus lessons learned. The short of the problem is large + complex = high probability of failure. The immediate fixes are adding more resources, more redundancy, more isolation between components, more automation, reduce recovery times, and build software that is more aware of large scale failure modes. All good, solid, professional responses. Which is why Amazon has earned a lot of trust.

    We can predict, however, problems like this will continue to happen, not because of any incompetence by Amazon, but because: large + complex make cascading failure an inherent characteristic of the system. At some level of complexity any cloud/region/datacenter could be reasonably considered a single failure domain and should be treated accordingly, regardless of the heroic software infrastructure created to carve out availability zones.

    Viewing a region as a single point of failure implies to be really safe you would need to be in multiple regions, which is to say multiple locations. Diversity as mother nature's means of robustness would indicate using different providers as a good strategy. Something a lot of people have been saying for a while, but with more evidence coming in, that conclusion is even stronger now. We can't have our cake and eat it too.

For most projects this conclusion doesn't really matter all that much. 100% uptime is extremely expensive and Amazon will usually keep your infrastructure up and working. Most of the time multiple Availability Zones are all you need. And you can always say hey, we're on Amazon, what can I do? It's the IBM defense.

All this diversity of course is very expensive and very complicated. Double the budget. Double the complexity. The problem of synchronizing data across datacenters. The problem of failing over and recovering properly. The problem of multiple APIs. And so on.

Another option is a retreat into radical simplicity. Complexity provides a lot of value, but it also creates fragility. Is there a way to become radically simpler?


    <Return to section navigation list>

    Cloud Computing Events

    Ben Kepes (@benkepes) will present a Creative Configurations Mixing and Matching Public, Private and Hybrid Clouds for Maximum Benefits Webinar in his Cloud U series for Rackspace on 8/25/2011 at 11:00 AM PDT:

    Meeting Description:

Perhaps one of the most contentious debates in the Cloud Computing world is that around Private Clouds. Many commentators remain adamant that Private Cloud does not, in fact, constitute a legitimate example of the Cloud. Others are more pragmatic and see Private Cloud as well as Hybrid approaches as logical stepping stones towards the Cloud.

In this webinar we will define three distinct delivery mechanisms: Public Cloud, Private Cloud and Hybrid Cloud, and show how any of the three may be the best approach for customers depending on the particulars of the use case.

    Have questions that you would like to ask the panel? Send questions to cloudU@rackspace.com or ask us on Twitter using the hashtag #cloudU.

    Download the whitepaper: Read it now

    See what else is coming from CloudU in 2011: Register for upcoming sessions or subscribe for updates when new whitepapers are published at the CloudU website.


    <Return to section navigation list>

    Other Cloud Computing Platforms and Services

    Lydia Leong (@CloudPundit) posted OpenStack, community, and commercialization to her CloudPundit blog on 8/15/2011:

I wrote, the other day, about Citrix buying Cloud.com, and I realized I forgot to make an important point about OpenStack versus the various commercial vendors vying for the cloud-building market; it’s worthy of a post on its own.

OpenStack is designed by the community, which is to say that it’s largely designed by committee, with some leadership that represents, at least in theory, the interests of the community and has some kind of coherent plan in mind. It is implemented by the community, which means that people who want to contribute simply do so. If you want something in OpenStack, you can write it and hope that your patches are included, but there’s no guarantee. If the community decides something should be included in OpenStack, they need some committers to agree to actually write it, and hope that they implement it well and do it in some kind of reasonable timeframe. [Link added.]

This is not the way that one normally deals with software vendors, of course. If you’re a potentially large customer and you’d like to use Product X but it doesn’t contain Feature Y that’s really important to you, you can normally say to the vendor, “I will buy X if you add Y within Z timeframe,” and you can even write that into your contract (usually withholding payment and/or preventing the vendor from recognizing the revenue until they do it).

But if you’re a potentially large customer that would happily adopt OpenStack if it just had Feature Y, you have minimal recourse. You probably don’t actually want to write Feature Y yourself, and even if you did, you would have no guarantee that you wouldn’t be maintaining a fork of the code; ditto if you paid some commercial entity (like one of the various ventures that do OpenStack consulting). You could try getting Feature Y through the community process, but that doesn’t really operate on the timeframe of business, nor have any guarantees that it’ll be successful, and also requires you to engage with the community in a way that you may have no interest in doing. And even if you do get it into the general design, you have no control over implementation timeframe. So that’s not really doable for a business that would like to work with a schedule.

    There are a growing number of OpenStack startups that aim to offer commercial distributions with proprietary features on top of the community OpenStack core, including Nebula and Piston (by Chris Kemp and Joshua McKenty, respectively, and funded by Kleiner Perkins and Hummer Winblad, respectively, two VCs who usually don’t make dumb bets). Commercial entities, of course, can deal with this “I need to respond to customer needs more promptly than the open source community can manage” requirement.

There are many, many entities, globally, telling us that they want to offer a commercial OpenStack distribution. Most of these are not significant forks per se (although some plan to fork entirely), but rather plans to pick a particular version of the open source codebase and work from there, in order to try to achieve code stability as well as add whatever proprietary features are their secret sauce. Over time, that can easily accrete into a fork, especially because the proprietary stuff can very easily clash with whatever becomes part of OpenStack’s own core, given how early OpenStack is in its evolution.

    Importantly, OpenStack flavors are probably not going to be like Linux distributions. Linux distributions differ mostly in which package manager they use, what packages are installed by default, and the desktop environment config out of the box — almost cosmetic differences, although there can be non-cosmetic ones (such as when things like virtualization technologies were supported). Successful OpenStack commercial ventures need to provide significant value-add and complete solutions, which, especially in the near term when OpenStack is still a fledgling immature project, will result in a fragmentation of what features can be expected out of a cloud running OpenStack, and possibly significant differences in the implementation of critical underlying functionality.

    I predict most service providers will pick commercial software, whether in the form of VMware, Cloud.com, or some commercial distribution of OpenStack. Ditto most businesses making use of cloud stack software to do something significant. But the commercial landscape of OpenStack may turn out to be confusing and crowded.


    Steven Spector posted OpenStack Conference – Call for Speakers on 8/3/2011 (missed when published):

    imageOn behalf of the OpenStack Conference Program Committee, I am pleased to initiate the Call for Speakers for the Fall 2011 OpenStack Conference in Boston, MA from October 5-7. This gathering of OpenStack developers, users, eco-system partners, open source enthusiasts, and cloud computing technologists presents speakers with the opportunity to actively participate in the shaping of the future of the OpenStack project. I look forward to joining with the community in Boston for an amazing 3 days of OpenStack goodness.

    PROCESS
    The Program Committee has listed a series of suggested topics for the overall agenda of the conference. Please review these topics and submit your talk under the various topic(s). Of course, if you have an idea for a topic not listed, please send your information as the Program Committee is always open to new ideas.

    To submit your request to be a speaker please send an email to stephen.spector@openstack.org with the Subject Line: OPENSTACK CONFERENCE SPEAKER SUBMISSION. In the email be sure to include your contact information as well as the topic you are interested in speaking on. Should you submit a new topic, please provide details on that topic.

    TOPICS
    The topics suggested by the Program Committee fit under two umbrella categories: technical or business. Each session is currently planned for 30 minutes.

    Technical
    Project Overview – NOVA
    Project Overview - SWIFT
    Project Overview – GLANCE
    Community Developer Tools
    OpenStack in the Data Center
    OpenStack Deployments
    DevOps
    Project Introductions – Information on the various eco-system projects not yet core in OpenStack

    Business
    Economics of OpenStack
    OpenStack Case Study
    Project Overview – NOVA
    Project Overview - SWIFT
    Project Overview – GLANCE
    Building an OpenStack Practice (Solution Providers)
    Public Cloud Hosting Best Practices
    Private Clouds Hosting Best Practices

The Program Committee anxiously awaits your speaker submission as we assemble the final OpenStack Conference Agenda. The deadline to submit your request is September 6, 2011, so that we can release the final agenda on September 14, 2011. All speakers selected will receive a complimentary pass to the event via a special registration code. If you have any further questions, please contact me.


    Karl Seguin (@karlseguin) explained How You Should Go About Learning NoSQL in an 8/15/2011 post to his OpenMyMind.net blog:

Yesterday I tweeted three simple rules to learning NoSQL. Today I'd like to expand on that. The rules are:

1: Use MongoDB. 2: Take 20 minutes to learn Redis. 3: Watch this video to understand Dynamo.

    Before we get going though, I want to talk about two different concepts which'll help us when we talk about specific technologies.

    Secondary Indexes and Joins

    First let's talk about secondary indexes. In the relational world, you can normally have as many indexes on a table as you want. Although a primary index (or primary key) always has to be unique, there really isn't any appreciable difference between a primary and secondary index (which is any other index which isn't your primary key). We're talking about this because some (though certainly not all!) NoSQL solutions don't offer secondary indexes. Your very first thought might be that this is insanity, but let's just see where it takes us.

Let's imagine that our relational databases didn't have secondary indexes; what would we do? It turns out that managing our own indexes isn't that difficult. Say we have a Scores table which has Id, LeaderboardId, UserId and Score columns. Our primary index goes on the Id column. However, we also want to get scores based on their LeaderboardId. The solution? We create a second table which has two columns: LeaderboardId and ScoreIds. In this case, we index the LeaderboardIds so that we can get all of the ScoreIds that belong to a given leaderboard. Whenever we add a new score, we push its Id onto the ScoreIds column. With this list, we can then fetch all the scores by their Id. Getting the scores which belong to a leaderboard takes two queries (both using an index): first get all the ScoreIds for a given LeaderboardId, then get all the matching Scores by Id. Storing a score would also be two queries.

    Obviously this doesn't work well with relational databases since we'd probably have to treat ScoreIds as a comma-delimited string. However, if the storage engine treated arrays as first class objects (so that we can push, remove and slice in constant time) it wouldn't be the most ridiculous approach in the world (although, there's no denying that a secondary index is better).
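Here is a minimal Python sketch of the hand-rolled secondary index Karl describes, using plain dictionaries to stand in for the two tables; the names and structure are hypothetical, not taken from any particular database:

```python
# Primary "table" keyed by score Id, plus a manually maintained
# secondary index mapping LeaderboardId -> list of ScoreIds.
scores_by_id = {}
score_ids_by_leaderboard = {}

def add_score(score_id, leaderboard_id, user_id, score):
    # Two writes: the score itself, then its Id pushed onto the index entry.
    scores_by_id[score_id] = {
        "leaderboard_id": leaderboard_id,
        "user_id": user_id,
        "score": score,
    }
    score_ids_by_leaderboard.setdefault(leaderboard_id, []).append(score_id)

def scores_for_leaderboard(leaderboard_id):
    # Two reads: the ScoreIds for the leaderboard, then each Score by Id.
    ids = score_ids_by_leaderboard.get(leaderboard_id, [])
    return [scores_by_id[i] for i in ids]

add_score(1, 10, "alice", 5000)
add_score(2, 10, "bob", 9000)
print(scores_for_leaderboard(10))
```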

The other thing we need to talk about is joins. While some NoSQL solutions support secondary indexes and some don't, they almost all agree that joins suck. Why? Because joins and sharding don't really work together. Sharding is the way that most NoSQL solutions scale. Keeping things simple, if we were to shard our above Scores example, all the scores for leaderboards 1, 3, 5, 7 and 9 might be on server 1, while server 2 contained all the scores for leaderboards 2, 4, 6, 8 and 10. Once you start to split your data around like this, joining just doesn't make sense. How do we grab the UserName (joined on Scores.UserId to Users.Id) when users are shared across different leaderboards?
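A toy illustration of the shard routing implied by that odd/even split; real systems typically use consistent hashing or a lookup table rather than a simple modulo, and the server names here are hypothetical:

```python
# Route a leaderboard to one of two shards, mirroring the odd/even split above.
SHARDS = {1: "server1", 0: "server2"}  # hypothetical server names

def shard_for(leaderboard_id):
    return SHARDS[leaderboard_id % 2]

for lid in range(1, 11):
    print("leaderboard", lid, "->", shard_for(lid))
```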

    So, how do we deal with a joinless world? First, NoSQL folk aren't afraid to denormalize. So, the simplest solution to our above problem is to simply stick the UserName within Scores. That won't always work though. The solution is to join within your application. First you grab all the scores, from these you extract the UserIds and then issue a 2nd query to get the UserNames. You are essentially adding complexity in your code so that you can scale horizontally (aka, on the cheap).
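A sketch of that application-level join using MongoDB's Python driver (pymongo); the database, collection and field names are hypothetical, and the connection call may vary by driver version:

```python
# "Join" in the application: one query for the scores, a second query
# for the user names, then stitch the results together in code.
from pymongo import MongoClient  # assumes a local mongod on the default port

db = MongoClient()["game"]

scores = list(db.scores.find({"leaderboard_id": 10}))
user_ids = list({s["user_id"] for s in scores})
users = {u["_id"]: u["name"]
         for u in db.users.find({"_id": {"$in": user_ids}})}

for s in scores:
    s["user_name"] = users.get(s["user_id"])  # denormalized at read time
```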

    MongoDB

With the above out of the way, we can talk about MongoDB. This is easily the first NoSQL solution you should use, for a couple of reasons. First, it's easy to get set up on any operating system. Go to this horrible download page (which could make Barry Schwartz write another book), download the right package, unzip, create c:/data/db (or /data/db), start up bin/mongod and you're done. You can connect either by running bin/mongo or by downloading a driver for your favorite programming language.

    The other nice thing about MongoDB is that it fully supports secondary indexes and is, aside from the lack of joins, not that different in terms of data modeling. The whole thing is pretty effortless, from setup to maintenance, from modeling to querying. It's also one of the more popular NoSQL solutions, so it's a relatively safe bet. A lot of NoSQL solutions are about solving specific problems. MongoDB is a general solution which can (and probably should) be used in 90% of the cases that you currently use an RDBMS.
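Because MongoDB supports secondary (and compound) indexes directly, the leaderboard lookup from earlier needs no hand-maintained index table. A hedged pymongo sketch, with hypothetical collection and field names:

```python
# Declare a compound secondary index and query against it.
from pymongo import MongoClient, DESCENDING

db = MongoClient()["game"]
db.scores.create_index([("leaderboard_id", 1), ("score", DESCENDING)])

top_ten = (db.scores.find({"leaderboard_id": 10})
           .sort("score", DESCENDING)
           .limit(10))
for doc in top_ten:
    print(doc["user_id"], doc["score"])
```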

MongoDB isn't perfect though. First, the website and online documentation are brutal. Thankfully, the official Google Group is very active and I wrote a free little ebook to help you get started. Secondly, once your working set no longer fits in memory, MongoDB seems to perform worse than relational databases (otherwise it's much faster). As a mixed blessing, MongoDB relies on MapReduce for analytics. It's much more powerful than SQL's aggregate capabilities, but it's currently single-threaded and doesn't scale like most of us would expect a NoSQL solution to. My final complaint is that, compared to other NoSQL solutions, MongoDB has average availability. It's in the same ballpark as your typical RDBMS setup.

    Finally, if you just want to try MongoDB quickly, you can always try out the online tutorial I wrote, you'll be connected to a real MongoDB instance!

    Redis

Redis is the most misunderstood NoSQL solution out there. That's a real shame considering you can absolutely 100% master it in about 30 minutes. You can download, install and master Redis in the time it'll take to download SQL Server from MSDN. People (including Redis people) often call Redis a key=>value store. I think the right way to think about Redis is as an in-memory data structure engine. WTF does that mean? It means Redis has 5 built-in data structures which can do a variety of things. It just so happens that the simplest data structure is a key-value pair (but there are 4 others, and they are awesome).

Now, Redis requires a pretty fundamental shift in how you think of your data. Oftentimes you'll use it to supplement another storage engine (like MongoDB) because some of your data will have been born to sit inside one of Redis' data structures, while other data will be like trying to force a square peg into a round hole. Like MongoDB, Redis is super easy to set up and play with. Windows users will want to use this port for testing. Unlike MongoDB, Redis doesn't support secondary indexes. However, one of its data structures, Lists, is perfectly suited for maintaining your own.

    Let's look at an example. We keep track of the total number of users and the number of unique (per day) users that log into our system. Tracking the total numbers is easy. We'll use the simplest String data structure, which is the key value pair. Our key will be the date, say "2011-08-15" (Redis keys don't have to be strings, any byte data will do). If we visit the String documentation we see that they support an INCR command. So, for every hit, all we do is redis.incr(Time.now.utc.strftime('%Y-%m-%d')). If we want to get the numbers for the past week, we can do redis.mget *Array.new(7){|i| (Time.now.utc - (86400 * i)).strftime('%Y-%m-%d') }.

For our unique users, we'll use a Set structure. On each hit we'll also call redis.sadd(Time.now.utc.strftime('%Y-%m-%d'), USER). We can get the count by using the scard command. We don't actually have to worry about duplicates; that's what Redis' Set takes care of for us. (At the end of the day we can turn our set into a simple string value to save space).
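Pulling the two snippets together as a Python sketch with redis-py; note that it uses separate key prefixes for the counter and the set, an adjustment on my part, since storing a String counter and a Set under the same date key would be rejected by Redis as a type clash:

```python
# Daily hit counter (a String) and daily unique visitors (a Set) in Redis.
import datetime
import redis

r = redis.Redis()  # assumes a local Redis on the default port

def record_hit(user_id):
    day = datetime.datetime.utcnow().strftime("%Y-%m-%d")
    r.incr("hits:" + day)              # total hits for the day
    r.sadd("uniques:" + day, user_id)  # the Set de-duplicates users for us

def unique_visitors(day):
    return r.scard("uniques:" + day)

def hits_for_last_week():
    today = datetime.datetime.utcnow()
    days = [(today - datetime.timedelta(days=i)).strftime("%Y-%m-%d")
            for i in range(7)]
    return r.mget(["hits:" + d for d in days])
```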

    Redis isn't a perfect solution though. First, sometimes your data just won't be a good fit. Secondly, it requires that all your data fits into memory (the VM doesn't really work great). Also, until Redis Cluster comes out, you're stuck with replication and manual failover. However, it's fast, well documented and when the model works out (which is often a matter of changing how you look at it) you can achieve amazing things with a few lines of code.

    I've blogged a bit about Redis Modeling (which I think is the biggest barrier to entry). The first blog post talks about using Redis with MongoDB. The second post looks at dealing with time-based sequences using Sorted Sets.

    Dynamo/Cassandra

The last topic I want to talk about is Cassandra and Dynamo. Dynamo is a set of patterns you can use to build a highly available storage engine. Cassandra is the most popular open source implementation. Now, I know the least about these, so I'm not going to go into any great detail. I will say that you really ought to watch this video which describes Dynamo. The video is given from the point of view of Riak, another open source Dynamo implementation, but it's pretty generic.

Dynamo is very infrastructure-oriented and might not seem as interesting or relevant to day-to-day programming. However, the above video is so good (if a little long) that I think it's well worth it if it makes you think about availability in a new light (as it did for me). Since watching the video I've been pestering the MongoDB folk to implement better availability; it just seems so right.

While I think you should download and play with MongoDB and Redis today, you can take your time with Cassandra and Riak. First, they are both harder to set up. Secondly, they both require changes to how you model your data (in a way that I think is more pervasive than with Redis, in that you'll probably only use Redis in specific, well-fitting situations). Finally, and this might just be ignorance on my part, I found the Ruby Cassandra driver to be an absolute nightmare. Java folk will probably have a better time (since Cassandra is written in Java).

    In other words, I think Dynamo is worth familiarizing yourself with because I think availability is important, but if you need to use a dynamo-solution, you'll know it (and won't need me to tell you).

    Conclusion

    There isn't much more to say other than you should just go ahead and have fun. NoSQL is a big world, and solutions vary in complexity and differentness. That's why I think MongoDB, which isn't very different, and Redis, which is different but very simple, are a great place to start.

    Check Karl’s About page to see a list of his MongoDB-related projects with brief descriptions: mogade, mongodb interactive tutorial, mongodb geospatial tutorial, and the little mongodb book


    Scalebase, Inc. (@SCLBase) claimed Scalebase 1.0 “Transforms Scalability Model for MySQL Databases Allowing for Enterprise-Level High Availability” in a deck for its ScaleBase Delivers Transparent Scaling to MySQL Databases press release of 8/15/2011:

Boston, Mass., August 15, 2011—ScaleBase, Inc. today announced the general availability of ScaleBase 1.0 for unlimited scalability of MySQL databases. ScaleBase 1.0 delivers MySQL performance and high availability, without the need to change a single line of application code. Users of MySQL can download and easily deploy the software by visiting http://www.scalebase.com/solution/download/.

ScaleBase utilizes two techniques for scaling: read-write splitting and transparent sharding (a technique for massively scaling out relational databases). The software enables MySQL to scale transparently, without forcing developers to change a single line of code or perform a long data migration process. This new technology is ideally suited for any application in which scalability, performance and speed are critical, including gaming, e-commerce, SaaS, machine-generated data, web 2.0, and more.
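ScaleBase itself works as a transparent layer in front of MySQL, but the read-write splitting idea it relies on can be illustrated with a deliberately naive Python sketch; the routing rule and server names below are hypothetical and are not ScaleBase's implementation:

```python
# Naive read-write splitting: SELECTs go to a replica, everything else to the master.
import random

class ReadWriteSplitter:
    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas

    def server_for(self, sql):
        # Crude routing rule for illustration only.
        if sql.lstrip().upper().startswith("SELECT"):
            return random.choice(self.replicas)
        return self.master

splitter = ReadWriteSplitter("mysql-master:3306",
                             ["mysql-replica-1:3306", "mysql-replica-2:3306"])
print(splitter.server_for("SELECT * FROM orders"))           # -> one of the replicas
print(splitter.server_for("INSERT INTO orders VALUES (1)"))  # -> the master
```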

    “MySQL users are constantly frustrated with the inability to easily scale their databases to meet growing demands,” said Doron Levari, CEO of ScaleBase. “ScaleBase puts an end to these frustrations with unlimited, easy-to-use, transparent scaling. With the increasing deluge of data flooding businesses today, the ability to have flexible systems that can grow organically over time is critical for companies of all sizes.”

    The ScaleBase Beta program began in January 2011, and the software was downloaded and tested by more than 500 users. ScaleBase has numerous Beta customer-use cases that illustrate the effectiveness of the technology and how it results in significantly faster application performance.

    Customer and Industry Feedback

    Matthew Aslett, Senior Analyst, Enterprise Software, The 451 Group (March 25, 2011 report)

    • “What differentiates ScaleBase is its ability to add scalability without the need to migrate to a new database architecture or make any changes to existing applications.”

    SolarEdge (ScaleBase Customer)

    • “Our application has massive data requirements that we weren’t able to meet with a single MySQL database,” said Amir Fishelov, Chief Software Architect of SolarEdge. “We evaluated building an internal sharding solution, and even evaluated some other sharding tools, but at the end of the day we chose ScaleBase for its transparent sharding solution. We’ve seen significantly faster application performance, and couldn’t be more pleased with the results. Our database scalability issues are over.”

    Paul Burns, President, Neovise

    • “Scaling up databases is inherently challenging. It gets expensive, performance and availability can suffer, and there still isn’t enough capacity for very large scale applications,” said Paul Burns, president of Neovise, an analyst firm focused on cloud computing. “ScaleBase addresses all these problems by transparently scaling out databases with its unique database load balancer. Customers can take advantage of low-cost cloud or commodity servers while at the same time increasing performance, scale and availability.”

    BuildFax (ScaleBase Customer)

    • “We have looked at every option we could find for scaling our existing MySQL database, including switching to NoSQL or NewSQL platforms. We chose ScaleBase because we believe that it is—hands down—the best solution for us.” Joe Masters Emison, VP, Research and Development for BuildFax. “With ScaleBase, we know we can plan to scale for the long-term, without burdening IT staff and database users. ScaleBase technology puts our minds at ease in regards to a critical component of our infrastructure.”

    Pricing & Availability

    ScaleBase is available for download immediately. Evaluation for 30 days is free without any obligation, and pricing details are available at http://www.scalebase.com/resources/pricing/. Users can register and download the software here: http://www.scalebase.com/solution/download/

    About ScaleBase

    ScaleBase is a privately-held company, founded in 2009, that has developed an innovative database scaling technology. The ScaleBase product is ideally suited for any application in which scalability, performance and speed are critical, including: gaming, e-commerce, SaaS, machine generated data and more. ScaleBase is an Authorized Amazon.com and Oracle Partner. The company is headquartered near Boston, Mass. Follow ScaleBase on Twitter.


    <Return to section navigation list>
