Thursday, February 02, 2012

Windows Azure and Cloud Computing Posts for 2/1/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.


• Updated 2/2/2012 with new articles marked •.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

Azure Blob, Drive, Table, Queue and Hadoop Services

• Cory Fowler (@SyntaxC4) described getting the Windows Azure Storage Emulator Account Name and Key in a 2/1/2012 post:

It’s almost funny to say, but I’m working on a legacy Windows Azure application in which the Windows Azure Storage Account Name and Account Key must be supplied instead of a Storage Account Connection String.

What’s the Difference? You need the Name & Key for a Connection String

While this is true in some scenarios, there is one distinct difference, especially when running in the local Storage Emulator. The following are considered Connection Strings:

<Setting name="DataConnectionString" value="usedevelopmentstorage=true" />


     <add key="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=myuniqueaccountname;AccountKey=superlongsecurestoragekey" />

The second connection string can easily be parsed to extract the AccountName and AccountKey, unlike the first, from which no useful information can be extracted. If you’re using the original Microsoft sample code for Windows Azure Storage Session State Management, which doesn’t follow the connection string convention, you’d be sadly out of luck. This is what you’ll find in its Web.config file:
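For a string like the second example, a minimal parser could look like the sketch below. It is not part of any SDK: the function names are hypothetical, and it simply splits on semicolons and then on the first "=" of each pair, because base64 account keys can end in "=". For usedevelopmentstorage=true it returns null, which is exactly the problem described above.

```javascript
// Hypothetical helper: parse "Key1=Value1;Key2=Value2;..." into an object.
// Keys are lower-cased because the examples above mix casing
// (e.g. "usedevelopmentstorage" vs. "DefaultEndpointsProtocol").
function parseConnectionString(connectionString) {
  const settings = {};
  for (const part of connectionString.split(';')) {
    if (!part.trim()) continue; // tolerate trailing semicolons
    const eq = part.indexOf('=');
    if (eq < 0) continue; // ignore malformed fragments
    // Split only on the first '=' since account keys may end with '='.
    const key = part.slice(0, eq).trim().toLowerCase();
    settings[key] = part.slice(eq + 1).trim();
  }
  return settings;
}

// Hypothetical helper: returns { accountName, accountKey }, or null when the
// string carries no explicit credentials (e.g. "usedevelopmentstorage=true").
function getAccountCredentials(connectionString) {
  const s = parseConnectionString(connectionString);
  if (s.accountname && s.accountkey) {
    return { accountName: s.accountname, accountKey: s.accountkey };
  }
  return null;
}
```

In the emulator case you still have to fall back to the emulator's published credentials, as described next.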

     <add key="AccountName" value="myuniqueaccountname"/>
     <add key="AccountKey" value="superlongsecurestoragekey"/>
Oh no, we’re doomed, right?

Not at all. Microsoft has provided the storage credentials for the local Storage Emulator on MSDN. Here’s a quick place to copy the default name and key:


Mariano Vazquez (@marianodvazquez) explained Browsing Blobs within a container using the Windows Azure Node SDK in a 1/30/2012 post to Southworks’ Node on Azure blog:

Over the last couple of weeks we have been working on a new, exciting project: a collaborative, real-time markdown editor that runs on a NodeJS server hosted on Windows Azure (you'll hear more about this soon). One of this app's features will be the ability to store .markdown files either on your local disk or in Azure Blob Storage. To achieve this, we investigated the best way to list the containers and blobs of a specific account and navigate them as if we were dealing with a hierarchical structure. We found out that this can be done, but it is not as easy as it sounds: you have to combine the prefix and delimiter options of the listBlobs operation and inspect the BlobPrefix element returned in the Blobs REST API response.

In the following sections, we explain how you can implement this functionality in your application. We start by showing the blobs in a simple flat list, and then demonstrate how to organize them in a more complex structure, as if you were navigating your local file system's directories.

To access blob storage, we used the Windows Azure SDK for Node.js (you can install this module in your Node application by running npm install azure). Also, don't forget to install the Windows Azure SDK for Node.js to emulate the Azure environment locally.

Basic scenario

This is all the code you need to list the containers in your storage account and show the blobs inside them. Because every blob is stored inside a container, you need to perform two separate calls: one to retrieve all containers in the account and another to retrieve the blobs inside a particular container.

var azure = require('azure');

function Home () {
    // Client for the Blob service of the configured storage account
    this.blobService = azure.createBlobService();
}

Home.prototype = {
    showContainers: function (req, res) {
        var self = this;
        self.blobService.listContainers(function (err, result) {
            // some code here to show the results
        });
    },

    showBlobs: function (req, res) {
        var self = this;
        var containerName = req.query['containerName'];
        if (!containerName) {
            // No container selected: list the containers instead
            self.showContainers(req, res);
        } else {
            self.blobService.listBlobs(containerName, function (err, result) {
                // some code here to show the results
            });
        }
    }
};


Below is a sample result of what you may get. In this case, we listed only the blob names. Notice that the listBlobs operation is returning every blob within the container, using flat blob listing (more info about flat blob listing here).

There's nothing wrong with the code above, and it might be sufficient for your business needs (in fact, it works great if all your blobs are direct children of the container). But what happens if your containers hold a considerable number of blobs, organized in a logical way, and you want to retrieve them lazily? If that's the case you're facing, keep reading.

Using directories approach

You can filter the results of the listBlobs operation by setting the prefix and delimiter parameters. The first one, as its name suggests, returns only the blobs whose names begin with the value specified. The delimiter has two purposes: to omit from the result those blobs whose names contain the delimiter (after the prefix), and to include a BlobPrefix element in the REST API response body. This element acts as a placeholder for all blobs whose names share the same substring up to the first appearance of the delimiter, and is used to simulate a directory hierarchy (the folders are listed there).
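To see the grouping without a storage account, the minimal sketch below mimics in memory how the service treats prefix and delimiter. The function simulateListing and the sample blob names are hypothetical; the real grouping happens server-side and comes back in the BlobPrefix element. Calling it again with a folder's prefix is what "opening" that folder lazily amounts to.

```javascript
// Hypothetical helper: mimics in memory how the Blob service applies the
// prefix and delimiter options to a flat list of blob names.
function simulateListing(blobNames, prefix = '', delimiter = '') {
  const blobs = [];
  const prefixes = [];
  for (const name of blobNames) {
    // prefix: keep only names that start with the given value
    if (!name.startsWith(prefix)) continue;
    const rest = name.slice(prefix.length);
    const idx = delimiter ? rest.indexOf(delimiter) : -1;
    if (idx >= 0) {
      // The name contains the delimiter after the prefix, so it is not
      // returned as a blob; everything up to and including the first
      // delimiter becomes a BlobPrefix entry (a "folder"), reported once.
      const folder = prefix + rest.slice(0, idx + delimiter.length);
      if (!prefixes.includes(folder)) prefixes.push(folder);
    } else {
      blobs.push(name); // a "file" at this level
    }
  }
  return { blobs, prefixes };
}

const names = ['readme.md', 'docs/intro.md', 'docs/api/blob.md', 'img/logo.png'];
// Root level: { blobs: ['readme.md'], prefixes: ['docs/', 'img/'] }
console.log(simulateListing(names, '', '/'));
// Inside "docs/": { blobs: ['docs/intro.md'], prefixes: ['docs/api/'] }
console.log(simulateListing(names, 'docs/', '/'));
```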

var azure = require('azure');

function Home () {
    this.blobService = azure.createBlobService();
}

// Turns the blobs returned by listBlobs into "file" nodes,
// using the last segment of each blob name as the display text.
function getFiles(collection){
    var items = [];
    for (var key in collection){
        var item = collection[key];
        var parts = item.name.split('/');
        var itemName = parts[parts.length - 1];
        items.push({ 'text': itemName, 'classes': 'file' });
    }
    return items;
}

// Turns the BlobPrefix element of the REST response into "folder" nodes.
function getFolders(containerName, collection){
    var items = [];
    // If BlobPrefix contains a single folder it is a plain JSON object;
    // otherwise it is an array of JSON objects. Normalize to an array.
    if (collection && !collection.length){
        var temp = collection;
        collection = [temp];
    }
    for (var key in collection){
        var item = collection[key];
        // Prefixes end with the delimiter (e.g. "docs/api/"), so strip it first
        var parts = item.Name.replace(/\/$/, '').split('/');
        var itemName = parts[parts.length - 1];
        items.push({ 'text': itemName, 'classes': 'folder' });
    }
    return items;
}

Home.prototype = {
    listBlobs: function(containerName, prefix, delimiter, callback){
        var self = this;
        self.blobService.listBlobs(containerName, { 'prefix': prefix, 'delimiter': delimiter }, function(err, result, resultCont, response){
            if (err) {
                return callback(err);
            }
            var files = getFiles(result);
            var folders = getFolders(containerName, response.body.Blobs.BlobPrefix);
            var children = folders.concat(files);
            // return the children (folders first, then files)
            callback(null, children);
        });
    }
};


This is what we're doing in the lines above:

  • First, we parse the result to access the blobs returned (by calling getFiles).
  • To generate the folders, we parse the BlobPrefix element (located inside the Response Body).
  • Lastly, we join these two collections into one.

We created this sample to demonstrate how you can use all of this in a real-world scenario. It shows you how to correctly parse the JSON returned by the listBlobs operation, and how to use this information to populate a control in the View, like a jQuery Treeview Plugin.

Enjoy coding!

• Josh Wills (@josh_wills) described the “practice of data science” in a Seismic Data Science: Reflection Seismology and Hadoop post of 1/25/2012 to Cloudera’s Developer Center blog:

When most people first hear about data science, it’s usually in the context of how prominent web companies work with very large data sets in order to predict clickthrough rates, make personalized recommendations, or analyze UI experiments. The solutions to these problems require expertise with statistics and machine learning, and so there is a general perception that data science is intimately tied to these fields. However, in my conversations at academic conferences and with Cloudera customers, I have found that many kinds of scientists, such as astronomers, geneticists, and geophysicists, are working with very large data sets in order to build models that do not involve statistics or machine learning, and that these scientists encounter data challenges that would be familiar to data scientists at Facebook, Twitter, and LinkedIn.

The Practice of Data Science

The term “data science” has been subject to criticism on the grounds that it doesn’t mean anything, e.g., “What science doesn’t involve data?” or “Isn’t data science a rebranding of statistics?” The source of this criticism could be that data science is not a solitary discipline, but rather a set of techniques used by many scientists to solve problems across a wide array of scientific fields. As DJ Patil wrote in his excellent overview of building data science teams, the key trait of all data scientists is the understanding “that the heavy lifting of [data] cleanup and preparation isn’t something that gets in the way of solving the problem: it is the problem.”

I have found a few more characteristics that apply to the work of data scientists, regardless of their field of research:

  1. Inverse problems. Not every data scientist is a statistician, but all data scientists are interested in extracting information about complex systems from observed data, and so we can say that data science is related to the study of inverse problems. Inverse problems arise in almost every branch of science, including medical imaging, remote sensing, and astronomy. We can also think of DNA sequencing as an inverse problem, in which the genome is the underlying model that we wish to reconstruct from a collection of observed DNA fragments. Real-world inverse problems are often ill-posed or ill-conditioned, which means that scientists need substantive expertise in the field to apply reasonable regularization conditions when solving the problem.
  2. Data sets that have a rich set of relationships between observations. We might think of this as a kind of Metcalfe’s Law for data sets, where the value of a data set increases nonlinearly with each additional observation. For example, a single web page doesn’t have very much value, but 128 billion web pages can be used to build a search engine. A DNA fragment in isolation isn’t very useful, but millions of them can be combined to sequence a genome. A single adverse drug event could have any number of explanations, but millions of them can be processed to detect suspicious drug interactions. In each of these examples, the individual records have rich relationships that enhance the value of the data set as a whole.
  3. Open-source software tools with an emphasis on data visualization. One indicator that a research area is full of data scientists is an active community of open source developers. The R Project is a widely known and used toolset that cuts across a variety of disciplines, and has even been used as a basis for specialized projects like Bioconductor. Astronomers have been using tools like AIPS for processing data from radio telescopes and IRAF for data from optical telescopes for more than 30 years. Bowtie is an open source project for performing very fast DNA sequence alignment, and the Crossbow Project combines Bowtie with Apache Hadoop for distributed sequence alignment processing.

We can use the term “data scientist” as a specialization of “scientist” in the same way that we use the term “theoretical physicist” as a specialization of “physicist.” Just as there are theoretical physicists that work within the various subdomains of physics, such as cosmology, optics, or particle physics, there are data scientists at work within every branch of science.

Data Scientists Who Find Oil: Reflection Seismology

Reflection seismology is a set of techniques for solving a classic inverse problem: given a collection of seismograms and associated metadata, generate an image of the subsurface of the Earth that generated those seismograms. These techniques are primarily used by exploration and production companies in order to locate oil and natural gas deposits, although they were also used to identify the location of the Chicxulub Crater that has been linked to the extinction of the dinosaurs.

Marine Seismic Survey

Seismic data is collected by surveying an area that is suspected to contain oil or gas deposits. Seismic waves are generated from a source, which is usually an air gun in marine surveys or a machine called a Vibroseis for land-based surveys. The seismic waves reflect back to the surface at the interfaces between rock layers, where an array of receivers records the amplitude and arrival times of the reflected waves as a time series, which is called a trace. The data that is generated from a single source is called a shot or shot record, and a modern seismic survey may consist of tens of thousands of shots and multiple terabytes of trace data.

Common Depth Point (CDP) Gather

In order to solve the inversion problem, we take advantage of the geometric relationships between traces that have different source and receiver locations but a common depth point (also known as a common midpoint). By comparing the time it took for the seismic waves to travel from the different source and receiver locations and experimenting with different velocity models for the waves moving through the rock, we can estimate the depth of the common subsurface point that the waves reflected off of. By aggregating a large number of these estimates, we can construct a complete image of the subsurface. As we increase the density and the number of traces, we can create higher quality images that improve our understanding of the subsurface geology.
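To make the geometry concrete, here is a minimal sketch of the classic x² vs. t² velocity analysis for a single CDP gather. It is not taken from Seismic Hadoop, and it assumes one flat reflector in a constant-velocity layer, where the two-way travel time at source-receiver offset x obeys t(x)² = t0² + x²/v². A least-squares line through the (x², t²) points has slope 1/v² and intercept t0², and the reflector depth follows as v·t0/2. The function name and input shape are hypothetical.

```javascript
// Hypothetical x^2-t^2 velocity analysis for one CDP gather, assuming a
// single flat reflector in a constant-velocity layer:
//   t(x)^2 = t0^2 + x^2 / v^2
// picks: [{ offset: metres, time: two-way travel time in seconds }, ...]
function velocityAnalysis(picks) {
  const n = picks.length;
  if (n < 2) throw new Error('need picks at two or more offsets');
  let sx = 0, sy = 0, sxx = 0, sxy = 0;
  for (const { offset, time } of picks) {
    const X = offset * offset; // x^2
    const Y = time * time;     // t^2
    sx += X; sy += Y; sxx += X * X; sxy += X * Y;
  }
  // Ordinary least-squares line Y = intercept + slope * X
  const slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
  const intercept = (sy - slope * sx) / n;
  if (!(slope > 0) || !(intercept > 0)) {
    throw new Error('picks are inconsistent with a single flat reflector');
  }
  const velocity = 1 / Math.sqrt(slope); // slope = 1 / v^2
  const t0 = Math.sqrt(intercept);       // zero-offset two-way time
  const depth = (velocity * t0) / 2;     // halve the two-way path
  return { velocity, t0, depth };
}
```

Real gathers contain many reflectors and noisy picks, which is why processing tools typically scan a range of trial velocities instead; the straight-line fit here only illustrates how offsets and travel times constrain velocity and depth.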

A 3D seismic image of Japan's southeastern margin

Additionally, seismic data processing has a long history of using open-source software tools that were initially developed in academia and were then adopted and enhanced by private companies. Both the Seismic Unix project, from the Colorado School of Mines, and SEPlib, from Stanford University, have their roots in tools created by graduate students in the late 1970s and early 1980s. Even the most popular commercial toolkit for seismic data processing, SeisSpace, is built on top of an open source foundation, the JavaSeis project.

Hadoop and Seismic Data Processing

Geophysicists have been pushing the limits of high-performance computing for more than three decades; they were early adopters of the first Cray supercomputers as well as the massively parallel Connection Machine. Today, the most challenging seismic data processing tasks are performed on custom compute clusters that take advantage of multiple GPUs per node, high-performance networking and storage systems for fast data access.

The data volume of modern seismic surveys and the performance requirements of the compute clusters mean that data from surveys not undergoing active processing is often stored offsite on tape. If a geophysicist wants to re-examine an older survey, or study the effectiveness of a new processing technique, he must file a request to move the data into active storage and then consume precious cluster resources in order to process the data.

Fortunately, Apache Hadoop has emerged as a cheap and reliable online storage system for petabyte-scale data sets. Even better, we can offload many of the most I/O-intensive steps of seismic data processing to the Hadoop cluster itself, freeing precious resources in the supercomputer cluster for the most difficult and urgent processing tasks.

Seismic Hadoop is a project that we developed at Cloudera to demonstrate how to store and process seismic data in a Hadoop cluster. It combines Seismic Unix with Crunch, the Java library we developed for creating MapReduce pipelines. Seismic Unix gets its name from the fact that it makes extensive use of Unix pipes in order to construct complex data processing tasks from a set of simple procedures. For example, we might build a pipeline in Seismic Unix that first applies a filter to the trace data, then edits some metadata associated with each trace, and finally sorts the traces by the metadata that we just edited:

sufilter f=10,20,30,40 | suchw key1=gx,cdp key2=offset,gx key3=sx,sx b=1,1 c=1,1 d=1,2 | susort cdp gx
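As a rough illustration of what each stage of that pipeline does, the sketch below applies the header edit and the sort to traces held in memory. It is hypothetical: it skips sufilter (the bandpass filter acts on sample data, not headers) and ignores Seismic Unix's integer header storage. Per suchw's formula, val(key1) = (a + b·val(key2) + c·val(key3)) / d with a defaulting to 0, the command sets gx = offset + sx and then cdp = (gx + sx) / 2, the source-receiver midpoint; susort cdp gx then orders traces by cdp, breaking ties on gx.

```javascript
// Hypothetical in-memory version of the pipeline above, operating on trace
// headers only. Header names follow Seismic Unix conventions.

// suchw key1=gx,cdp key2=offset,gx key3=sx,sx b=1,1 c=1,1 d=1,2
//   gx  = (0 + 1*offset + 1*sx) / 1
//   cdp = (0 + 1*gx     + 1*sx) / 2   (uses the gx value just computed)
function editHeaders(trace) {
  const gx = trace.offset + trace.sx;
  const cdp = (gx + trace.sx) / 2;
  return { ...trace, gx, cdp };
}

// susort cdp gx: order by cdp, then by gx within the same cdp
function sortTraces(traces) {
  return [...traces].sort((p, q) => p.cdp - q.cdp || p.gx - q.gx);
}

function pipeline(traces) {
  // The sufilter stage would run first; it is omitted in this sketch.
  return sortTraces(traces.map(editHeaders));
}
```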

Seismic Hadoop takes this same command and builds a Crunch pipeline that performs the same operations on a data set stored in a Hadoop cluster, replacing the local susort command with a distributed sort across the cluster using MapReduce. Crunch takes care of figuring out how many MapReduce jobs to run and which processing steps are assigned to the map phase and which are assigned to the reduce phase. Seismic Hadoop also takes advantage of Crunch’s support for streaming the output of a MapReduce pipeline back to the client in order to run the utilities for visualizing data that come with Seismic Unix.
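The idea behind replacing a local sort with a MapReduce sort can be simulated in a few lines. This is not Crunch's API, and the function and split points are hypothetical: each trace is routed to a "reducer" by cdp range, each reducer sorts only its own share, and concatenating the reducer outputs in partition order yields a globally sorted result. Partitioning on cdp alone keeps every trace of a given cdp on one reducer, so ties are broken by gx locally.

```javascript
// Hypothetical simulation of a range-partitioned MapReduce sort by (cdp, gx).
// splitPoints: ascending cdp boundaries, e.g. [100, 200] -> 3 reducers
function distributedSort(traces, splitPoints) {
  const partitions = Array.from({ length: splitPoints.length + 1 }, () => []);
  // Map + partition: send each trace to the reducer owning its cdp range
  for (const trace of traces) {
    let p = 0;
    while (p < splitPoints.length && trace.cdp >= splitPoints[p]) p++;
    partitions[p].push(trace);
  }
  // Reduce: each partition is sorted independently (in parallel on a cluster)
  const byKey = (a, b) => a.cdp - b.cdp || a.gx - b.gx;
  // Concatenating partitions in order gives a total order
  return partitions.flatMap(part => part.sort(byKey));
}
```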

Challenges to Solve Together

Talking to a geophysicist is a little bit like seeing into the future: the challenges they face today are the challenges that data scientists in other fields will be facing five years from now. There are two challenges in particular that I would like the broader community of data scientists and Hadoop developers to be thinking about:

  1. Reproducibility. Geophysicists have developed tools that make it easy to understand and reproduce the entire history of analyses performed on a particular data set. One of the most popular open source seismic processing toolkits, Madagascar, even chose as its home page. Reproducible research has enormous benefits in terms of data quality, transparency, and education, and all of the tools we develop should be built with reproducibility in mind.
  2. Dynamic and resource-aware scheduling of jobs on heterogeneous clusters. MR2 and YARN will unleash a Cambrian explosion of data-intensive processing jobs on Hadoop clusters. What was once only MapReduce jobs will now include MPI jobs, Spark queries, and BSP-style computations. Different jobs will have radically different resource requirements in terms of CPU, memory, disk, and network utilization, and we will need fine-grained resource controls, intelligent defaults, and robust mechanisms for recovering from task failures across all job types.

It is an unbelievably exciting time to be working on these big data problems. Join us and be part of the solution!

Josh is a data scientist for Cloudera and the creator of Crunch.

Andrew Brust (@andrewbrust) asserted “By responding to potential threats with thoughtfulness, and a zeal to add value, SQL and Big Data could be big business for Redmond” as a deck for his Big Data and SQL Server: Disruption or Harmony? article of 2/1/2012 for the Redmond Developer News site:

The SQL Server relational engine matured a long time ago. There have been advances, of course, in performance, fault tolerance and high availability, not to mention encryption, compression and file-system integration. And, yes, there's been support for XML, geospatial data and even a service broker. But these have been improvements on the margins. All these features are new flavors of icing; the cake has stayed the same.

And yet, over the 20-year lifetime of SQL Server, Microsoft has continued to add value to the product, even as the core features have gone into maintenance mode. The big story here has been business intelligence: BI capabilities started in SQL Server 7 with OLAP Services, and have been expanded in meaningful ways with every subsequent release. It's been a great strategy and it still is. But will it keep working?

Core capabilities can't stay static forever; eventually disruption comes along. Today that disruption is here, coming from Big Data, Hadoop and its MapReduce distributed-computing approach. As strong as SQL Server BI capabilities are (and they're getting even stronger with SQL Server 2012), if Microsoft can't embrace Big Data technology, SQL Server could find itself in a position of desperation, going from underdog to contender, to retiree. What's a poor enterprise software company to do?

Build Bridges, Don't Burn Them
Microsoft could ignore Hadoop, but that would be foolish. It could try to build a competitor, which it almost did with the Microsoft Research Dryad project, but I fear not too many people would have come to that party. Microsoft could just adopt Hadoop, plain vanilla, but that would most likely be a race to the bottom, and it wouldn't even win. Really, Microsoft must mix Hadoop into its bag of tricks and do what it has always done best: take a raw technology and make it approachable to the enterprise. I can't be sure yet, but I think that's what Microsoft has done, and it has enhanced the value of Windows Azure in the process.

With code name "Project Isotope," Microsoft has taken the step of implementing Hadoop on Windows. It's a no-slouch effort, too: Microsoft's distribution of Hadoop is being developed in concert with Hortonworks, a startup company founded and staffed by many former Hadoop team members at Yahoo! (where the open source project began). But what Microsoft has also done is integrate Hadoop into its BI stack, and that may be one of the smartest moves it's made in quite some time.

Microsoft has created an Excel add-in for Hive, which provides a SQL-like abstraction over Hadoop and MapReduce. The add-in is based on an ODBC driver, which in turn is compatible with PowerPivot, so business users can do meaningful analysis on Big Data, on their own terms. And because the same engine that drives PowerPivot has been implemented inside Analysis Services in SQL Server 2012, that product has access to Hadoop now, too. With that, Microsoft has joined the Big Data and Enterprise BI worlds. It has also tied together SQL Server and Hadoop.

Simplify and Succeed
Hadoop runs on Windows Server and on Windows Azure, with an installer that makes setup really easy. In addition to those configurations, Microsoft allows Windows Azure users to provision an entire Hadoop cluster from a Web portal, without any discrete installation steps at all. Once the cluster's up, customers can connect to its head node via Remote Desktop. The full Java-based command-line personality of Hadoop is there if you want it, but there are also Hive and JavaScript consoles in the browser. And then it only takes a minute or two to build out a connection from Microsoft BI tools or Excel, putting Hadoop to work for users fitting a number of profiles. It's classic Microsoft, mixed with equal parts open source and Java.

I've said before that Microsoft does some of its best work when it embraces standards from outside the company. That's what happened with jQuery and ASP.NET, and it may well be what happens with HTML5 and JavaScript in Windows 8. In the case of implementing Hadoop, Microsoft makes Windows Azure more valuable and more agnostic. It brings continued relevance to Windows Server and SQL Server. And it widens the reach and utility of Hadoop. By responding to a potential threat with thoughtfulness -- and a zeal to add value -- Big Data could be big business for Redmond.

Full disclosure: I’m a contributing editor for Visual Studio Magazine, which is published by 1105 Media who also own the Redmond Developer News site.

<Return to section navigation list>

SQL Azure Database, Federations and Reporting

• Herve Roggero (@hroggero) posted a Quick Review of Backup tools for SQL Azure on 2/1/2012:

The landscape of SQL Azure backups is changing rapidly. A few tools are becoming available at no charge and Microsoft is adding capabilities over time. Here is a quick update.

Microsoft Tools

Microsoft offers two primary backup mechanisms so far:

  • Export/Import feature available on the Azure Management portal
  • The COPY operation as part of its T-SQL CREATE DATABASE statement

These mechanisms do not offer a scheduling component and do not work together. To obtain a transactionally consistent backup, you first need to perform the COPY operation manually, then run the Export function.

Free Tools

Other tools on the market are a bit more comprehensive and available at no charge. Here are a few:

  • Enzo Backup for SQL Azure (Standard Edition)
  • Red Gate's SQL Azure Backup (backup only; no restore; no cloud backup)

These are the only two third-party products available at no charge I am aware of right now. Note that Red Gate's product provides a simple copy operation to a local SQL Server database, not really a backup/restore solution. However it does the job well if you want to get a local copy of your data. Enzo Backup is more comprehensive and offers many more functions, such as a built-in scheduler, cloud backup devices (in Blobs) and a restore capability.

Paid-For Tools

  • Enzo Backup for SQL Azure (Advanced Edition)
  • SQLAzureBackup (basic command-line BCP wrapper, no cloud backup)

Among the tools you can purchase, SQLAzureBackup seems somewhat limited in its capabilities, although you can use it to export your data locally and restore the database back into SQL Azure. Enzo Backup Advanced Edition is a more powerful flavor of the free version that leverages multithreading for faster operations.

Here is a link to an MSDN article that offers additional information on how to back up SQL Azure:

For the time being Enzo Backup appears to have significantly more capabilities than the other products, and it can be used at no charge by visiting Blue Syntax's website (

[Disclaimer: I am the author of Enzo Backup for SQL Azure; this blog intends to provide a quick overview of the current tools available on the market; please investigate the tools referenced here and visit the MSDN link provided to make an educated decision]

Benjamin Guinebertière explained how to Use SSIS to push data to SQL Azure | Utiliser SSIS pour pousser des données vers SQL Azure in a 1/31/2012 post. From the English version:

    The goal of this blog post is to show that it is as easy with SQL Azure as with SQL Server to push data with SSIS.

    In this sample, let’s use SQL Server 2008 R2, which is the latest SQL Server version in production (1).

    The scenario is to push some data from SQL Server 2008 R2 on premises to SQL Azure in the cloud. This is done with SQL Server 2008 R2 Integration Services (SSIS).

    If and when starting from an empty SQL Azure database, in order to create the schema, one would typically use SQL Azure Migration Wizard which has an option to generate schema only.

    Here are the steps to create a very simple SSIS package and execute it (in debug mode):



















    NB: the SSIS package needs an outbound access to port 1433 in order to connect to SQL Azure. The SQL Azure firewall must also have been opened for the public IP address the SSIS package will use to access the Internet.







    Here is a synopsis of what we just did. Data was copied from SQL Server 2008 R2 to SQL Azure through SSIS exactly as it would have been from SQL Server to SQL Server.


    This sample SSIS package was very simple, because its goal was to show that connectivity is seamless. Of course, in real life this package would contain many more shapes in order to copy several tables, transform data and so on.


    (1) SQL Server 2012 is still at Release Candidate 0, so I expect companies to have SQL Server 2008 R2 deployed.

    In my opinion, George Huey’s SQL Azure Migration Wizard is a simpler and more straightforward method of uploading data to SQL Azure. See my Using the SQL Azure Migration Wizard v3.3.3 with the AdventureWorksLT2008R2 Sample Database of 7/10/2010 for more details.

    <Return to section navigation list>

    MarketPlace DataMarket, Social Analytics and OData

    The Codename “Social Analytics” Team posted the promised Lab Bonus! Enhanced Sentiment Analysis for Twitter from Microsoft Research on 2/2/2012:

    What’s new?

    In this release, we enhanced our sentiment software by upgrading to the latest version of code from the same Microsoft Research team we used in prior releases. The major improvement in this version of their code is a new classifier specifically trained on Tweets. The sentiment analysis code we used in prior releases from Microsoft Research was trained on short sentences and paragraphs. We predict that the accuracy of sentiment analysis will improve in Social Analytics by using the classifier trained specifically on tweets for Twitter content items. We will continue to use the sentence and paragraph classifiers on all other content.

    The tweet classifier was trained on nearly 4 million tweets from over a year’s worth of English Twitter data. It is based on a study of how people express their moods on Twitter with mood-indicating hashtags. We mapped over 150 different mood-bearing hashtags to positive and negative affect, and used the hashtags as a training signal to learn which words and word pairs in a tweet are highly correlated with positive or negative affect.

    How we use the sentiment software in Social Analytics

    We use the MSR sentiment software to assess the tone of all content items as part of our enrichment process. When the assessment is complete, we store the resulting tone and the reliability of that assessment in two fields, CalculatedToneID and ToneReliability, respectively. Our API and sample Silverlight client UI expose a content item as either positive or negative only if the sentiment engine scores the item with a reliability percentage above a threshold we determine.

    Here is a simple explanation of the three fields related to sentiment in the ContentItem table:

    • CalculatedToneID: the sentiment (or tone) of the content item as determined by the sentiment software:
        3 = Neutral
        5 = Positive
        6 = Negative

    • ToneReliability: the reliability of the tone calculation as determined by the sentiment software. The reliability thresholds are currently 80% for positive sentiment and 90% for negative sentiment. If the reliability is below the threshold, CalculatedToneID is set to neutral.

    • ToneID: if a user sets the sentiment manually in our UI or through an API call, the tone they set is stored in this field. If ToneID is set, we show ToneID rather than CalculatedToneID in the UI and return it in API calls.
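    The display rules for these fields can be summarized in a few lines of code. This is a minimal sketch, not the Social Analytics API: it assumes the engine reports a raw positive or negative verdict plus a reliability between 0 and 1, and it treats a reliability exactly at the threshold as passing, which the post does not specify. The function names are hypothetical.

```javascript
// Tone codes used in the ContentItem table.
const Tone = Object.freeze({ NEUTRAL: 3, POSITIVE: 5, NEGATIVE: 6 });

// Current reliability thresholds (as fractions) per raw verdict.
const THRESHOLDS = { [Tone.POSITIVE]: 0.8, [Tone.NEGATIVE]: 0.9 };

// Hypothetical: derive CalculatedToneID from the engine's raw verdict,
// falling back to neutral when the reliability is below the threshold.
// Passing at exactly the threshold is an assumption.
function calculatedToneId(rawTone, reliability) {
  const threshold = THRESHOLDS[rawTone];
  if (threshold === undefined) return Tone.NEUTRAL;
  return reliability >= threshold ? rawTone : Tone.NEUTRAL;
}

// Hypothetical: a manually set ToneID takes precedence over CalculatedToneID.
function displayedToneId(item) {
  return item.ToneID != null ? item.ToneID : item.CalculatedToneID;
}
```

    The asymmetric thresholds mean a negative verdict needs more confidence than a positive one before it is surfaced.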

    For more details on this Microsoft Research project, check out ! [See post below]

    See My Microsoft Codename “Social Analytics” Windows Form Client Detects Anomaly in VancouverWindows8 Dataset and Querying Microsoft’s Codename “Social Analytics” OData Feeds with LINQPad for more details on sentiment analysis of tweets.

    Scott Counts, Munmun de Choudhury and Michael Gamon of Microsoft Research authored Affect detection in tweets: The case for automatic affect detection:

    Detecting affect in free text has a wide range of possible applications:

    • What are the positive and negative talking points of your customers?

    • What opinions are out there on products and services (on Twitter, Facebook, in product reviews, etc.)?

    • How do mood and sentiment trend over time, geography and populations?

    Similarly, there are different techniques to automatically detect affect: some systems use hand-curated word lists of positive and negative opinion terms, others use statistical models that are trained on opinion-heavy text. The challenge is to come up with a system that works reasonably well across various domains and types of content. In other cases, though, it would be better to use a classifier specific to a particular task, in which case the challenge is in creating, or finding, enough annotated text to train a classifier.


    Scott Counts

    Munmun De Choudhury

    Michael Gamon

    Recently we conducted a study based on the psychological literature in which we identified over 150 different mood hashtags that people use on Twitter. We mapped these hashtags into positive and negative affect and used them as a training signal to identify affect from the tweet text. We collected nearly four million tweets over a span of one year and trained a text classifier on this data.
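    The labeling step described above can be sketched like this: a tweet carrying a mood hashtag gets that hashtag's affect as its label. The four hashtags are a made-up stand-in for the study's list of more than 150. Skipping tweets with conflicting hashtags and stripping the mood hashtag from the text, so the classifier must learn from the remaining words, are assumptions; the post does not say how the researchers handled either case.

```javascript
// Hypothetical stand-in for the study's mood-hashtag lexicon.
const MOOD_HASHTAGS = new Map([
  ["#happy", "positive"],
  ["#excited", "positive"],
  ["#sad", "negative"],
  ["#angry", "negative"],
]);

// Turn raw tweets into { text, label } training examples.
// Tweets with no mood hashtag, or with conflicting ones, are skipped.
function labelTweets(tweets) {
  const examples = [];
  for (const tweet of tweets) {
    const tags = (tweet.toLowerCase().match(/#\w+/g) || [])
      .filter((tag) => MOOD_HASHTAGS.has(tag));
    const labels = new Set(tags.map((tag) => MOOD_HASHTAGS.get(tag)));
    if (labels.size !== 1) continue;
    // Remove only the mood hashtags, then tidy up the whitespace.
    const text = tweet
      .replace(/#\w+/g, (tag) => (MOOD_HASHTAGS.has(tag.toLowerCase()) ? "" : tag))
      .replace(/\s+/g, " ")
      .trim();
    examples.push({ text, label: [...labels][0] });
  }
  return examples;
}
```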

    How the classifier works:

    The classifier is trained on text with known affect (positive or negative). For each such text, words and word pairs are extracted and counted. At training time, the classification algorithm (a maximum entropy classifier) assigns numerical weights to the words and word pairs depending on how strongly they correlate with positive or negative opinion. At runtime, a new text is passed in and words and word pairs are extracted from it. These are passed into the classifier, the weights for the words/pairs are looked up and combined in a classifier-specific mathematical formulation, and the output is a prediction (positive or negative) and a probability.
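    The runtime half of that description can be sketched in a few lines. A two-class maximum entropy model has the same form as logistic regression: add up the weights of the features present and pass the sum through the logistic function to get a probability. The weights below are invented placeholders for what training would learn. Training itself is not shown, and this is not Microsoft Research's implementation.

```javascript
// Extract unigram (word) and bigram (word pair) counts from a text.
function extractFeatures(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const counts = new Map();
  const add = (feature) => counts.set(feature, (counts.get(feature) || 0) + 1);
  words.forEach((word, i) => {
    add(word);
    if (i + 1 < words.length) add(word + " " + words[i + 1]);
  });
  return counts;
}

// Binary maximum entropy (logistic) prediction: sum the weights of the
// features present, then squash the score into a probability.
function classify(text, weights, bias = 0) {
  let score = bias;
  for (const [feature, count] of extractFeatures(text)) {
    score += (weights[feature] || 0) * count;
  }
  const pPositive = 1 / (1 + Math.exp(-score));
  return pPositive >= 0.5
    ? { label: "positive", probability: pPositive }
    : { label: "negative", probability: 1 - pPositive };
}

// Hypothetical weights standing in for the output of training.
const WEIGHTS = { love: 1.5, good: 0.8, "not good": -2.0, boring: -1.2 };
```

    With these placeholder weights, "not good at all" scores 0.8 for "good" plus -2.0 for the pair "not good", so the word pair flips the prediction to negative; that is the reason for extracting pairs as well as single words.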

    [Figure: Training time]

    ©2012 Microsoft Corporation. All rights reserved.

    See also my Twitter Sentiment Analysis: A Brief Bibliography of 11/26/2011 and New Features Added to My Microsoft Codename “Social Analytics” WinForms Client Sample App of 11/21/2011.

    The Codename “Data Explorer” Team (@DataExplorer) described Consuming “Data Explorer” published data feeds in Visual Studio primer in a 2/1/2012 post:

    Part 1 – Consuming a Basic Open Feed

    In a previous post we discussed the Publishing capabilities of Microsoft Codename “Data Explorer”. After mashing up your data and generating interesting results, you can share your findings literally with the world at large. One of the formats available for publishing data is an OData feed, which allows programmatic access to the results.

    In this post we cover a basic example of consuming a published feed in Visual Studio (VS) using a simple console application along with some pointers to set it up. We are actively working on improving the ease of access to published results in “Data Explorer” from VS and other tools, so bear with us as we iron out the wrinkles and please reach out to us if you run into any issues. Our goal is to reduce the need for function calls by adding extra user interface support, but in the meantime, for those of you who like to tinker, here are some roll-back-your-sleeves-and-get-your-hands-dirty instructions.

    1. Ensuring the published mashup is set up to be consumable

    Currently, in order to consume a published OData feed using VS, resources need to have primary keys and include only supported data types (more on this below). For the purposes of this example, the mashup also needs to be published as visible to everyone, including unauthenticated users (meaning that no authentication mechanism is requested when trying to access the feed). We will explain how to access feeds secured via a feed key in a subsequent blog post.

    1.a. Verifying the mashup resources can be externally consumed as a feed

    First and foremost, all published resources that are to be consumed by VS need to have a primary key. You can ensure that a given table has a primary key by using the Table.AddKey() function. We are working on an “add key” button to the user interface to facilitate this, but in the meantime you can manually add a primary key using the formula bar.

    For example, let’s start with a very simple table of IDs, First and Last names, created by directly inputting the text into “Data Explorer” and converting to a table:

    • Create a new mashup (call it, say, “Basic publish to VS”) and add data by inputting text.

    • Use “To Table” after pressing “Done” to convert the free-form text into a table.

    • Take the default options (using spaces as delimiters and the first row as headers) to create the table.

    With a simple table in place, we can now start the process to make it consumable.

    The first step is to add a primary key. To invoke the Table.AddKey() function using the formula bar, click on the fx icon next to the Edit button and select “New Task”:

    In the new task, call the function on the TableFromText reference by typing it in as seen in the figure. This function call makes the “ID” column the primary key (since primary keys must be unique, remember that multiple columns can be used to create a key, thus ensuring uniqueness). Note that “TableFromText” happens to be the name of the previous task.
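    Table.AddKey() runs inside “Data Explorer”, so the sketch below only illustrates the uniqueness requirement it relies on: a key, whether one column or several, must identify every row exactly once. The function name and the sample rows are hypothetical.

```javascript
// Hypothetical helper: does the given set of columns uniquely identify every row?
function isUniqueKey(rows, keyColumns) {
  const seen = new Set();
  for (const row of rows) {
    // JSON-encode the key values so composite keys compare by value.
    const key = JSON.stringify(keyColumns.map((column) => row[column]));
    if (seen.has(key)) return false;
    seen.add(key);
  }
  return true;
}

// Made-up rows shaped like the ID / FirstName / LastName table above.
const people = [
  { ID: 1, FirstName: "Ann", LastName: "Lee" },
  { ID: 2, FirstName: "Bob", LastName: "Lee" },
  { ID: 3, FirstName: "Ann", LastName: "Kim" },
];
```

    Here "ID" alone works as a key; "LastName" alone does not, because two rows share "Lee", but "FirstName" together with "LastName" does.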

    Once you are sure that the table has a primary key, there are a couple of caveats regarding the data that will be exposed in the feed:

    • Currently, the set of supported data types between VS and “Data Explorer” is different, so in order to ensure VS consumption, the data published in “Data Explorer” (which supports a larger set of data types) needs to be adjusted to be consumed in VS. Among the data types not yet recognized in VS are Time, Duration, DateTimeOffset, and Geospatial. Attempting to import a feed with these data types will cause the entire VS import operation to fail, so make sure that the feed you wish to import into VS does not have any of these data types exposed.
    • Similarly, an exposed service operation that returns a keyed-table type will import correctly into VS, but the service operations will be ignored.
    • Exposed service operations that return tables without keys will fail the VS import, though, as will service operations that have parameters.
    • Lastly, for now tables with navigation properties will also fail the VS import, but we are actively working on a fix that should address this in the near future.
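    Before importing, it can help to scan a resource's property types for the ones listed above. This is a minimal sketch that assumes the types appear as EDM type names, for example in the feed's $metadata. The mapping of “Time, Duration, DateTimeOffset, and Geospatial” to the strings below is an assumption, as is the helper's name.

```javascript
// Types the post says VS does not yet recognize, written as EDM type names
// (the exact names a given feed uses are an assumption).
const UNSUPPORTED_EXACT = new Set(["Edm.Time", "Edm.Duration", "Edm.DateTimeOffset"]);
const UNSUPPORTED_PREFIXES = ["Edm.Geography", "Edm.Geometry"]; // geospatial types

// Given { propertyName: edmTypeName }, return the properties that would
// make the whole VS import fail.
function findUnsupportedProperties(properties) {
  return Object.entries(properties)
    .filter(([, type]) =>
      UNSUPPORTED_EXACT.has(type) ||
      UNSUPPORTED_PREFIXES.some((prefix) => type.startsWith(prefix)))
    .map(([name]) => name);
}
```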

    With the above in mind, and with a mashup that can be consumed by VS via feed, we are ready to publish and consume the data.

    1.b. Publishing the mashup

    When publishing the mashup, for this example we will make the published results visible to everyone, including unauthenticated users, to allow programmatic access (we will cover the more complex case of consuming authenticated feeds in a future post).

    Click on Publish and follow the link to access the publish page. From there, copy the full web address under “Data Feed” for use in programmatic access.

    This link to the feed is what will allow VS to access the data.

    2. Consuming a feed in Visual Studio

    Once the feed has been created in a consumable format, there are a few things to be aware of when programmatically accessing the feed via VS.

    2.a. Installing the Microsoft WCF Data Services October 2011 CTP

    In order to successfully add a service reference using the steps outlined below, be sure to install the community technical preview that included the updated version of the WCF Data Services library; click here for the Download Center page.

    2.b. Adding a service reference in Visual Studio

    In your VS project, right-click on References or on Service References to “Add Service Reference”.

    Enter the feed URL from the Data Explorer publish page into the Address box and click Go. Make sure the URL you enter matches the published feed URL exactly (for example, verify there are no trailing spaces). You should be able to see the published resources and click OK to add the service.

    2.c. Using the service reference in a console application

    Here is a sample console program to consume the above feed:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace ConsumingDEpublishedDataInVSPart1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Note: Make sure you replace the URI
                // with the feed URL used when adding a service reference.
                EntityCatalog catalog = new EntityCatalog(new Uri("<your published feed URL>"));

                foreach (EntityType0 element in catalog.Text1)
                { // Note: 'Text1' is the name of your mashup resource
                    Console.WriteLine(element.ID + " " +
                        element.FirstName + " " +
                        element.LastName + " ");
                }

                Console.WriteLine("Press any key to continue...");
                Console.ReadKey();
            }
        }
    }

    As you can see in the figure below, the two “EntityCatalog” references are unresolved (and there are a couple of other interesting service-specific items as well). Note that the resource name in our mashup (“Text1”) is used when accessing the elements in the feed. Note also that the column names (“ID”, “FirstName”, and “LastName”) are also used when accessing each element.

    Before this code can compile, we need to resolve the “EntityCatalog” references via a “using” statement.

    This resolves both the “EntityCatalog” and “EntityType0” references because they are names generated by the “Data Explorer” feed.

    As a quick aside, those names and their relationship to the mashup resources can be seen in the $metadata description of the “Data Explorer” feed.

    The above code, when compiled and run, will output the data in the table from our simple mashup.

    This pattern can of course be extended to extract data from more meaningful or complex tables. The metadata for the feed (which is accessible by adding “/$metadata” to the feed URL and opening it up in a browser) can be used to tailor the queries over the exposed feed, accessing data from different resources.
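    The generated client turns LINQ queries into OData request URLs, and you can also form those URLs by hand to explore a feed in the browser. The helpers below are a minimal sketch built on the standard OData URI conventions ($metadata plus system query options such as $filter, $select and $top). The function names and the example feed URL are hypothetical.

```javascript
// Hypothetical helper: the $metadata address for a published feed URL.
function metadataUrl(feedUrl) {
  return feedUrl.replace(/\/+$/, "") + "/$metadata";
}

// Hypothetical helper: query one resource (e.g. "Text1") with OData
// system query options, passed without their "$" prefix.
function queryUrl(feedUrl, resource, options = {}) {
  const parts = Object.entries(options).map(
    ([name, value]) => "$" + name + "=" + encodeURIComponent(value)
  );
  const base = feedUrl.replace(/\/+$/, "") + "/" + resource;
  return parts.length ? base + "?" + parts.join("&") : base;
}
```

    For example, `queryUrl("https://example.com/p/Feed", "Text1", { filter: "LastName eq 'Lee'", top: 10 })` asks for at most ten rows of the Text1 resource whose LastName is Lee.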

    We are still working hard to improve the experience of consuming “Data Explorer” feeds using VS but hopefully the above example can unlock some interesting new ways to work with your data.

    What do you think? Let us know your thoughts and stay tuned as we further integrate and simplify the user interface to more easily publish and consume your data with Microsoft Codename “Data Explorer”.

    3. Some troubleshooting FAQs

    While support for programmatic consumption using VS is still a work in progress, following the above steps should allow you to create a service reference and pull data from your feed. Here are some common issues with setting the reference up; please have a look to see if they help, or ask a question in our forum with more specific issues you find.

    3.a. I encounter an Add Service Reference Error. What’s up with that?

    Depending on the specific details, this Add Service Reference Error may mean that an incorrect feed URL was used. For instance, one common mistake may be to use the link to the “Data Explorer” publish page rather than the full feed URL (which is the publish page link plus “/Feed”). Double check that the address you use is the full feed link that is presented in the publish page, and not the publish page’s URL.
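    Both common mistakes, pasting the publish-page link and carrying trailing spaces along, can be caught with a small normalizer like the sketch below. It relies only on the rule stated above (feed URL = publish page link plus “/Feed”); the function name is hypothetical, and treating “/feed” case-insensitively is an assumption. Copying the Data Feed address straight from the publish page remains the safest route.

```javascript
// Hypothetical helper: turn a pasted publish-page link or feed link
// (possibly with stray whitespace or a trailing slash) into the feed URL.
function toFeedUrl(pasted) {
  const url = pasted.trim().replace(/\/+$/, "");
  return /\/Feed$/i.test(url) ? url : url + "/Feed";
}
```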

    3.b. Why does the service require me to enter a user name and password?

    This is most likely due to having selected “Make visible to workspace consumers only” instead of “Make visible to everyone, including unauthenticated users” when publishing the mashup. For this example, go back to “Data Explorer” and republish the mashup, making it visible to everyone. We will explain how to consume feeds protected by a feed key in a future blog post.

    3.c. Why do I get an “Unrecognized version 3.0” error?

    This error crops up if the WCF Data Services October 2011 CTP is not installed. Please see above and verify that this CTP is installed.

    4. Resources used in this post

    My (@rogerjenn) Mashup Big Data with Microsoft Codename “Data Explorer” - An Illustrated Tutorial and More Features and the Download Link for My Codename “Social Analytics” WinForms Client Sample App posts of December 2011, and Microsoft cloud service lets citizen developers crunch big data article of 1/14/2012 describe working in Visual Studio 2010 with Social Analytics data sources.

    You can open a live demo of my “Data Explorer” mashup by clicking here and read more about it in a Microsoft Pinpoint entry.

    The OData Team reported OData Service Validation Tool Update: 7 new rules on 1/30/2012:

    The OData Service Validation Tool was updated with 4 new rules and a couple of other changes:

    • 2 new common rules
    • 2 new entry rules
    • Added new rule targets for Link, Property and Value payload type (targets that we used to call Other)
    • Added Not Applicable test result category to all code rules

    This rule update brings the total number of rules in the validation tool to 140. You can see the list of the active rules here and the list of rules that are under development here.

    <Return to section navigation list>

    Windows Azure Access Control, Service Bus and Workflow

    Matias Woloski (@woloski) described Using Windows Azure Access Control Service (ACS) from a node express app in a 2/1/2012 post to his Node on Azure blog:

    In this article we will explain how to use Windows Azure Access Control Service (ACS) from a node.js application. You might be wondering what Windows Azure ACS provides that everyauth doesn't already. Windows Azure ACS provides single sign-on to applications written in different languages/platforms and also allows you to integrate with "enterprise Security Token Services" like Active Directory Federation Services (ADFS), SiteMinder or PingFederate, a common requirement for Software as a Service applications targeted to enterprise customers.

    To do the integration we decided to extend everyauth, which provided a good infrastructure to base our work on.

    If you want to know how it works and how it was implemented, read Implementing Windows Azure ACS with everyauth.

    If you want to know how to use Windows Azure ACS in your node app keep reading.

    How to configure your node application with everyauth and Windows Azure ACS

    We are assuming that you already created the node.js Azure Service and added a node web role (New-AzureService and Add-AzureNodeWebRole) using the SDK or you simply have a regular node app.

    Follow these steps to add support for everyauth and configure the parameters required by Windows Azure Access Control Service.

    1. Add everyauth to your app's package.json and execute npm install (we haven't sent a pull request to everyauth yet, so for now you can point to the darrenzully fork):

      "everyauth": ""
    2. Configure everyauth strategies. In this case we will configure only azureacs. You will need to create a Windows Azure ACS namespace. The only caveat when creating the namespace is setting the "Return URL". You will probably create one Relying Party for each environment (dev, qa, prod), and each of them will have a different "Return URL". For instance, dev will be http://localhost:port/auth/azureacs/callback and prod could be (notice the /auth/azureacs/callback path; that's where the module will listen for the POST with the token from ACS):

      var everyauth = require('everyauth');
      everyauth.debug = true; // set to true if you want to see the output of the steps

      everyauth.azureacs
        // ...your ACS namespace, realm and token-signing key settings go here...
        .homeRealm('') // if you want to use a default idp (like google/liveid)
        .tokenFormat('swt') // only swt supported for now
        .findOrCreateUser( function (session, acsUser) {
          // you could enrich the "user" entity by storing/fetching the user from a db
          return null;
        });
    3. Add the middleware to express or connect:

      var app = express.createServer(
          express.static(__dirname + "/public")
        , express.cookieParser()
        , express.session({ secret: 'azureacssample'})
        , express.bodyParser()
        , express.methodOverride()
        , everyauth.middleware()
      );
    4. Add the everyauth view helpers to express:

      everyauth.helpExpress(app);

    5. Render the login url, and user info/logout url on a view

      - if (!everyauth.loggedIn)
        h2 You are NOT Authenticated
        p To see how this example works, please log in using the following link:
        a(href='/auth/azureacs', style='border: 0px') Go to authentication server.
      - else
        h2 You are Authenticated
        h3 Azure ACS User Data
        p= JSON.stringify(everyauth.azureacs.user)
        a(href='/logout') Logout
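    The module handles the token for you. To show roughly what validating a Simple Web Token (SWT) from ACS involves, here is a minimal sketch that is not the everyauth code. It assumes the usual SWT layout: form-encoded claims ending in `&HMACSHA256=<signature>`, where the signature is an HMAC-SHA256 of everything before that marker, computed with the base64-encoded token-signing key. The function name is hypothetical. A real check should also compare Audience with your realm and Issuer with your ACS namespace.

```javascript
const crypto = require("crypto");

// Sketch: verify an SWT's HMAC-SHA256 signature and expiry, returning its
// claims as an object, or null if the token is invalid.
function validateSwt(token, signingKeyBase64, nowSeconds = Math.floor(Date.now() / 1000)) {
  const marker = "&HMACSHA256=";
  const index = token.lastIndexOf(marker);
  if (index < 0) return null;

  // The signature covers everything before the marker.
  const signedPart = token.slice(0, index);
  const signature = Buffer.from(decodeURIComponent(token.slice(index + marker.length)), "base64");
  const expected = crypto
    .createHmac("sha256", Buffer.from(signingKeyBase64, "base64"))
    .update(signedPart, "utf8")
    .digest();

  // Constant-time comparison; check lengths first because timingSafeEqual throws on a mismatch.
  if (signature.length !== expected.length || !crypto.timingSafeEqual(signature, expected)) {
    return null;
  }

  const claims = Object.fromEntries(new URLSearchParams(signedPart));
  // ExpiresOn is in seconds since the Unix epoch; reject missing or past values.
  if (!(Number(claims.ExpiresOn) > nowSeconds)) return null;
  return claims;
}
```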

    That's it, you can now run the app.

    Sample application

    If you want to test the experience you can try (we might take this down, so no warranties). If you want to run it locally, the code is @


    IMPORTANT: in production you should use HTTPS to avoid man-in-the-middle and impersonation attacks.

    Enjoy node and federated identity!

    <Return to section navigation list>

    Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

    No significant articles today.

    <Return to section navigation list>

    Live Windows Azure Apps, APIs, Tools and Test Harnesses

    • Yvonne Muench (@yvmuench) asked Need to Change Your Go-To-Market for SaaS? Here’s How One ISV Did It with Windows Azure in a 2/2/2012 post to the Windows Azure blog:

    In a previous post, I wrote about traditional ISVs creating spinoffs to market and sell their Software-as-a-Service (SaaS) apps. Recently I was surprised to see Glassboard from Sepia Labs, a spinoff of NewsGator, come across my desk. I’m familiar with NewsGator. Based in Denver, CO, they have 8 years of experience in SaaS, and multiple Windows Azure apps under their belt. Why would they go to the trouble of creating a spinoff for their latest Windows Azure app? Almost a third of their revenue is already subscription-based, and they are by no means new to the rigors of SLAs, uptime and support – issues that challenge many SaaS newcomers. I called JB Holston, CEO of NewsGator, and Walker Fenton, CEO of Sepia Labs, to find out. It turns out they have a lot in common with many traditional ISVs using Windows Azure and SaaS for the first time – they are trying to reach a new customer segment. And which customers you’re going after can change everything. Here’s their story…

    According to Holston, Facebook is defining the way people interact online. Consumers like what Facebook offers and want more of it in the corporate space. Both NewsGator and Glassboard take their cues from that reality, operating in the social computing space. NewsGator’s marquee offering, Social Sites 2010, is an enterprise social computing product that’s attached to SharePoint 2010. By all accounts it’s doing pretty well… per a Jan 11, 2012 TechCrunch article, NewsGator added one million new paid seats over the course of 2011, on top of its existing two million. Forrester’s Q3 2011 report rates them among the leaders in enterprise social platforms. Customers are Global 2000 enterprises worldwide, reached through a direct sales force about 35 strong, augmented by 50+ channel partners. It’s largely installed on premises, and sold most often through a perpetual license pricing strategy. Most of that should sound pretty familiar to other traditional ISVs.

    The newer Glassboard offering, which launched in Aug 2011, still operates in the social computing arena. Glassboard is about sharing privately with groups - essentially Facebook without the ads, with better privacy, and with very cool Bing maps integration for location.

    Watch this Channel 9 video for more on the technical reasons behind why they chose Windows Azure, including a demo of how different phone platforms process messages using notifications.

    However, beyond the common social computing focus of the two products, most aspects of the Glassboard opportunity are completely different from Social Sites 2010. Glassboard is completely cloud-based and integrates with SharePoint Online. But the differences go deeper than that:

    • WHO they target: Glassboard targets small businesses (SMBs) looking for private social collaboration they can easily use internally and with partners, suppliers and customers. Glassboard even extends into the consumer space, as the app is popular amongst teachers, families, and others who are uncomfortable with Facebook’s privacy policies. To encourage trial among these price-sensitive customers, they use a freemium pricing model, i.e., the app is free until you pass a certain threshold of use.
    • WHAT they offer: SMBs and consumers need turnkey, low touch, intuitive apps. That dictates a different design and engineering ethic. Fenton called it a “mobile first interface” that works on iPhones, Androids, and WP7. In fact, Fenton’s goal for his design center was “an app my Mom can download from an app store and use without asking anyone.” In other words, entirely self-provisioning. That’s a growing trend in enterprise apps, but an imperative in lower-end small business and consumer apps.
    • HOW they make it happen: Glassboard reaches SMBs and consumers in four ways, all via marketing:

    Well, when you put it that way, if the WHO, the WHAT and the HOW are different, a spinoff begins to make a ton of sense. Though NewsGator is successful with Social Sites 2010, they’d probably struggle with Sepia Labs’ Glassboard. Direct sales don’t work too well in reaching numerous and fragmented SMBs and consumers. Neither does a freemium pricing model get a commission-based sales force very excited. Glassboard called for a wholly new Go-To-Market strategy, namely a switch from sales-led to marketing-led demand generation. When Holston realized he had a team of folks within NewsGator that were passionate about this new category, he created a spinoff to release those passions. Walker Fenton had been responsible for the mobile group within NewsGator, so Holston put him in charge of Sepia Labs. He gave him some personnel, some office space as well as HR and accounting support from the parent company. He also gave him NewsGator’s legacy online services. Fenton continues to manage those under Sepia Labs and uses the revenue streams to partially fund new product development, like Glassboard. He makes it look easy. To learn more, follow the official Glassboard blog.

    • Mick Badran (@mickba) described Azure: Current IP Range of Data Centers in a 2/1/2012 post:

    With the ever-changing Azure space, chances are you’ve had services working a treat and then one day just fail.

    “Can’t connect…" etc.

    This has happened to me twice this week – with over 14 IP Address ranges defined in the client’s firewall rules.

    It appears that my service bus services were spun up or assigned another IP outside the ‘allowed range’.

    It gets frustrating at times as generally the process goes as follows:

    1. Fill out a form to request firewall changes. Include as much detail as possible.
    2. Hand to the client and they delegate to their security/ops team to implement.
    3. Confirmation comes back.
    4. Start up ServiceBus service
    5. It may work, or it may fail due to *another* IP address allocated in Windows Azure that isn’t on the ‘allowed list of ranges’.
    6. Fill out another form asking for another IP Address…

    By the 3rd iteration of this process, it all begins to look very unprofessional. (In comparison, these guys are used to tasks such as ‘Access to SQL Server XXX – here are the ports, there’s the machine and done’. Azure, on the other hand – ‘What IP addresses do you need? What ports?’… We need better information in this area.)

    Anyway – here’s the most up-to-date list as of 10/02/2011:

    • Marco Chiappetta asserted “Startup accelerator TechStars and the Microsoft BizSpark Team have joined forces to offer startups up to $60,000 in Cloud Services on Windows Azure” in a deck for his Microsoft and TechStars Team Up; Giving Away Free Cloud Services to Fledgling Startups article of 1/31/2012 for NetworkWorld’s Microsoft Subnet blog:

    Microsoft and TechStars, a prominent startup accelerator which operates programs in Boulder (Colorado), Boston, New York, Seattle and Texas, just announced a new program in which high-potential startups could earn $60,000 worth of compute and storage Cloud Services on Windows Azure over a two-year period. The goal of the alliance is not only to help fledgling startups fast-track their businesses, but also to get them using Microsoft’s cloud services.

    “Our passion is helping startups succeed around the world by providing funding and mentorship from the best and brightest Internet entrepreneurs and investors on the planet. The enhanced relationship with Microsoft will allow us to provide our founders with even more valuable support and services,” said David Cohen, founder and CEO of TechStars. “Access to technologies such as Windows Azure and other software and services from Microsoft through the BizSpark Plus program gives our companies a leg up in the all-encompassing race to scale and succeed.”

    Image Source: TechStars Website

    Many of the hottest startups today are building cloud applications or smart devices that heavily leverage the cloud. In doing so, some of the startups are able to drive user adoption and generate revenue more quickly and affordably than ever before, which in turn helps scale the businesses. Microsoft postulates that Windows Azure offers an ideal platform for the creation of the Web applications and services that many of these startups are building, and it is attempting to get them to use Azure early and often.

    “TechStars is a great partner with a proven track record of attracting world-class startups. We are excited at the opportunity to work together to help startups when they need it most, with the products, resources and connections they need the most,” said Dan’l Lewin, corporate vice president, Strategic and Emerging Business Development, Microsoft. “Working with entrepreneurs on the forefront of the cloud revolution is especially rewarding. Windows Azure is a powerful, integrated platform, making it easy for startups to get their services up and running quickly with minimal overhead. This is the first offer from our new BizSpark Plus program, and we look forward to supporting TechStars members across a number of markets and technology offerings.”

    Graduates and current enrollees in the TechStars program are eligible.

    BusinessWire (@BusinessWire) reported CopperEgg™ Expands Server Monitoring into Windows Azure Environments in a 2/1/2012 press release via the Azure Cloud on Ulitzer:

    CopperEgg, Corp., a real-time monitoring company, today announced the support of Windows Azure™ server monitoring with RevealCloud Pro™.

    RevealCloud Pro is a SaaS-based server monitoring service that provides a detailed, common view of all of your servers, running any OS, across public and private clouds, and virtual and physical environments. It provides real-time, in-depth insight into all the components participating in application and service delivery, which is critical for monitoring changing loads as well as identifying, isolating and addressing performance issues before service quality is impacted. RevealCloud Pro now supports server monitoring for Windows, Windows Azure, Hyper-V, Linux, Mac OSX and FreeBSD operating systems.

    Today users expect applications and services to perform seamlessly. If an application stops responding – even for a few seconds – customers get easily frustrated and often abandon their efforts, resulting in lost revenue and/or productivity. Server monitoring using RevealCloud Pro provides split-second monitoring of the operating and performance metrics for servers that support critical application services, no matter where they are located. This enables IT and development teams to accelerate time-to-market during the development phase, optimize deployments as they learn customer usage patterns, and identify early warning signs of service degradation.

    “Our customers upload large volumes of data in batches regularly and when that occurs, we need to scale up quickly so we can process the data efficiently. Using RevealCloud Pro we now have instant insight as to when our servers are reaching capacity,” said Carl Ryden, chief technology officer of PrecisionLender. “Then using their WebHooks combined with the Azure Management API, we’ve implemented automatic scaling of our servers so we can address load spikes without any intervention on our part. For 15 minutes worth of work and a couple dollars a month, we couldn’t have asked for a better server monitoring solution.”
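    PrecisionLender's setup isn't described beyond this quote. As a rough sketch only, the decision half of such a webhook handler might look like the function below. The alert payload fields (`metric`, `value`), the thresholds and the instance limits are all hypothetical, and the call that actually changes the instance count through the Windows Azure Management API is deliberately left out.

```javascript
// Hypothetical scaling policy applied to incoming monitoring alerts
// (CPU thresholds in percent; a floor of two instances is an assumed choice).
const POLICY = { scaleUpAbove: 80, scaleDownBelow: 20, minInstances: 2, maxInstances: 10 };

// Return the instance count to request, given an alert and the current count.
function targetInstanceCount(alert, currentInstances, policy = POLICY) {
  let target = currentInstances;
  if (alert.metric === "cpu" && alert.value > policy.scaleUpAbove) target += 1;
  else if (alert.metric === "cpu" && alert.value < policy.scaleDownBelow) target -= 1;
  // Clamp to the configured bounds so a burst of alerts can't run away.
  return Math.min(policy.maxInstances, Math.max(policy.minInstances, target));
}
```

    Stepping one instance at a time and clamping to fixed bounds keeps a noisy metric from over-provisioning; the handler would compare the result with the current count and call the management API only when they differ.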

    “RevealCloud Pro is quickly gaining traction with our customers deploying and managing services in the cloud, virtual environments and data centers,” said Scott Johnson, CEO of CopperEgg. “The addition of server monitoring for Windows Azure addresses the needs of developers building services on the Azure platform as well as hybrid environments, who are looking for a robust server monitoring solution that is quickly deployed, easy to use and provides instant insight into server health.”

    CopperEgg’s Server Monitoring – Key Benefits

    • Monitor servers across environments: including clouds, virtual and physical servers.
    • Install in 10 seconds: package for automatic deployment on new instances, so you never lose an instance.
    • Get real-time updates: get instant visibility into the performance and health of your systems and detect trends as they develop.
    • Customize alerts: set thresholds for alerts and direct them to individuals or groups as applicable for instant insight into the health of a system.
    • Tailor views: see your whole infrastructure from a sortable top-level systems view; or tag and group systems as needed for optimal server monitoring.
    • Enable server monitoring from any browser: PC, Mac, iPad, iPhone, Android or other mobile device.

    Server monitoring with CopperEgg’s RevealCloud Pro is available here with a $15 instant credit on a ‘pay for consumption’ basis.

    About CopperEgg

    CopperEgg provides real-time insight into the speed and availability of applications and services deployed on cloud, virtual and physical servers. Our SaaS-based server monitoring tools are designed to install in seconds with no configuration and immediately deliver fine-grained visibility into critical performance metrics, insight into developing trends, and split-second decision support for organizations of all sizes. CopperEgg is backed by Silverton Partners and based in Austin, Texas. For more information, visit:

    Himanshu Singh posted IFS 360 Scheduling – Always Optimal on Windows Azure on 2/1/2012:

    IFS has announced the immediate availability of IFS 360 Scheduling on Windows Azure as part of its strategy to leverage cloud computing, citing Windows Azure as a cost-efficient platform for innovative business solutions.

    IFS 360 Scheduling empowers organizations to deliver on-site customer service excellence. Optimizing resource plans to minimize travel and maximize productive work time for hundreds, or even thousands, of technicians is a complex and compute-intensive task, and IFS 360 Scheduling’s ‘always optimizing’ approach means the schedule is constantly up-to-date no matter what is happening out in the field – for organizations with tight deadlines to meet, minutes matter.

    Meeting tough customer expectations and providing consistent service demands a reliable and robust computing infrastructure. The speed of deployment on Windows Azure and on-demand scaling of resources echo the needs of these dynamic environments, for which resource scheduling is mission critical.

    You can read more on the IFS announcement on their website, more about the work of IFS 360 Scheduling here, and a recent blog posting by IFS’ CTO Dan Matthews on cloud computing.

    Himanshu Singh (pictured below) posted Real-World Windows Azure: Interview with Michael Ross, VP of Delivery at Aidmatrix on 2/1/2012:

    As part of the Real World Windows Azure interview series, I talked to Michael Ross, vice president for Delivery at Aidmatrix, about using Windows Azure to help power its humanitarian aid delivery solutions. Here’s what he had to say.

    Himanshu Kumar Singh: Tell me about the Aidmatrix Foundation.

    Michael Ross: Well, we create web-based supply-chain management (SCM) solutions that help our partners optimize distributing humanitarian relief. We started in 2001 and are headquartered in Dallas, with offices in Wisconsin, Washington D.C., Germany and India. More than 40,000 leading nonprofit, business, and government partners leverage our solutions to mobilize more than $1.5 billion in aid annually, worldwide. The donated goods, money and services impact the lives of more than 65 million people.

    HKS: How does Aidmatrix leverage technology to deliver humanitarian aid?

    MR: Increasingly, technology plays a vital role in enabling the rapid, targeted distribution of relief supplies to those in need. Our solutions help NGOs procure, manage and deliver humanitarian relief more efficiently by giving them real-time visibility into what inventory is on hand and what unmet needs still exist. In this way, NGOs can save money in their purchasing, be more efficient in their distribution, and be more responsive and transparent with their donors.

    HKS: Are there also technology challenges you face?

    MR: To perform optimally both in daily humanitarian relief and unplanned disasters, we need to ensure that our applications can consistently deliver the highest levels of stability and throughput performance. And because we work with so many different kinds of partner organizations, our solutions need to synchronize with business systems that run on diverse operating system platforms. They also need to flexibly scale to handle massive bursts in demand. Following several recent natural disasters, our applications have experienced usage spikes on the order of 1,000 times the standard rate.

    HKS: How much demand is there for technology for humanitarian aid from Aidmatrix?

    MR: In 2010, we noted a 20 percent year-over-year increase in the number of people who used our applications. And as more NGOs adopt technology as a strategic part of their global operational success, the demand will continue to grow.

    HKS: How has this increase in demand impacted your ability to meet global needs?

    MR: This rapid growth has only compounded the challenges that we face in deploying, configuring, and scaling online relief management solutions. For example, for several years, we hosted our applications in data centers in the United States, but in response to recent requests for services from humanitarian organizations in Europe, we’ve needed to locate additional server resources overseas. Setting up and maintaining data centers around the globe can be expensive and time-consuming. In a few cases, we’ve needed to pre-position assets and leave them turned off. Instead of paying for what we use, we’ve ended up investing up front in resources that we may or may not eventually need. But that’s all part of being ready for unplanned disasters.

    We also found that the time required to source and deploy new hardware impacted our agility. In mid-2010, we began a partnership with a large food bank network that has locations spread across the United Kingdom. After a few weeks, we noticed that the distance between our servers and end users caused the application to run slowly and even time out before people could complete their donations.

    HKS: What was the solution?

    MR: To empower more partners to help more people around the world, we needed the ability to scale applications at a moment’s notice while maintaining reliable performance. With an eye on keeping operating costs as low as possible, we began to investigate the advantages of moving to the cloud. In considering this change, we wanted to minimize the time needed to migrate existing applications and placed a priority on a cloud technology platform that would support the agile development of new functionality, along with the creation of new solutions to meet the unforeseen demands of future humanitarian crises.

    HKS: What led you to choose Windows Azure?

    MR: After evaluating several cloud services technologies, including Amazon Elastic Compute Cloud (EC2), we decided to adopt Windows Azure because, simply put, Windows Azure gives us all the tools we need to be more agile. It offers platform-as-a-service capabilities, so we don’t have to push out updates or worry about building out our own redundancy system. Plus, it incorporates a familiar development environment, so we can maximize productivity.

    HKS: Which of your solutions did you move to Windows Azure first?

    MR: The first solution we moved to Windows Azure was the Aidmatrix Program Metrics and Evaluation, which one of our U.S. partners uses to track the services delivered to clients at more than 1,000 locations. Moving the database component to Microsoft SQL Azure took five minutes, and we instantly gained high availability, along with reliable fault tolerance and security, at a fraction of what it would cost to build out those capabilities ourselves.

    In early 2011, working with Accenture and Avanade, we also moved our Online Warehouse solution, which provides end-to-end inventory management tools for tracking relief supplies, to Windows Azure. The conversion process for each of these applications took approximately six weeks to complete.

    We also recently used Windows Azure to configure and deploy a web portal application built by Microsoft to assist Second Harvest Japan, the Japanese food banking network. I deployed the application to the Microsoft data center in Hong Kong from home following the tsunami, and I didn’t need to worry about how many instances to deploy because I could quickly scale out server resources if needed.

    HKS: Are you using any other Windows Azure technologies?

    MR: We take advantage of several Windows Azure technologies to ensure reliable, scalable performance. For example, we will use Windows Azure Connect to enable data sharing between our Online Warehouse solution and the on-premises operational systems used by our partners, including Oracle and SAP enterprise resource planning systems. And we rely on Windows Azure storage services to collect data on session state management so our staff can monitor application performance and troubleshoot issues in near real time.

    HKS: What are some of the benefits you’ve seen from your move to Windows Azure?

    MR: One of the key benefits has been noticeable cost savings. Because we no longer need to purchase, set up, and maintain database management and web servers, we expect to save 20 percent on data center costs, which could mean up to $100,000 in savings. We will be able to redirect the time and money we save toward creating and enhancing applications that help save lives. And instead of spending valuable resources on server upkeep, we can redirect time and cost savings to higher-value tasks, like developing a new module for one of our applications.

    Another benefit is the global availability of Microsoft data centers and the pay-as-you-go model, which enables us to ensure cost-efficient dynamic scalability for our solutions. Our applications need to handle usage spikes of 1,000 times the normal load. With Windows Azure, we can scale up or down in a very agile and efficient way, which is essential for the kind of work we do.

    Finally, the growing ecosystem of ISVs that have adopted Windows Azure allows us to accelerate our own development lifecycle. Taking advantage of functionality from other ISVs speeds our development and ultimately increases the value of our solutions. This is a major benefit of working with a cloud provider like Microsoft that has a large and growing network of partners.

    Read the full case study. Learn how others are using Windows Azure.

    Avkash Chauhan (@avkashchauhan) described Handling two known issues with Windows Azure node.js SDK 0.5.2 in a 1/31/2012 post:

    My recent development work with Windows Azure Node.js SDK 0.5.2 helped me find two issues. Until the next SDK update is available, please use the workarounds given below to solve these problems.

    Issue 1: Exception while testing your Node.js application in the Compute Emulator

    After you install the latest available Node.js SDK (0.5.2) and launch your Node application in the Azure Emulator as shown below:

    >> Start-AzureEmulator -Launch

    You might see the following exception:

    >> Windows Azure Web Role Entry Point host has stopped working…

    Root Cause:

    The issue appears to be with the server.js.logs folder. When the web role is started, IISConfigurator changes the ACL for the web role directory (and all of its subdirectories). This process fails if server.js.logs is present, causing the IISConfigurator process to crash, which ultimately causes your role to show the error above.


    Please delete the server.js.logs folder before launching the emulator, and you will not see the problem. I have tested this workaround, and it works fine with SDK 0.5.2.
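    If you restart the emulator often, the manual deletion is easy to automate. The following is a minimal C# sketch, not part of the Node.js SDK: it assumes you pass it the root folder of your Node.js service (the path is yours to supply) and that every folder named server.js.logs beneath that root is safe to delete. The class and method names are hypothetical.

```csharp
using System.IO;

// Hypothetical helper, not part of the Windows Azure SDK for Node.js.
public static class ServerJsLogsCleaner
{
    // Deletes every folder named "server.js.logs" under serviceRoot (searching
    // all subfolders) and returns how many folders were removed.
    public static int DeleteLogFolders(string serviceRoot)
    {
        if (!Directory.Exists(serviceRoot))
            throw new DirectoryNotFoundException(serviceRoot);

        // Collect all matches first, then delete them.
        string[] logFolders = Directory.GetDirectories(
            serviceRoot, "server.js.logs", SearchOption.AllDirectories);

        int deleted = 0;
        foreach (string folder in logFolders)
        {
            // A nested match may already be gone if its parent was deleted first.
            if (Directory.Exists(folder))
            {
                Directory.Delete(folder, true); // true = also delete its contents
                deleted++;
            }
        }
        return deleted;
    }
}
```

    Calling ServerJsLogsCleaner.DeleteLogFolders(serviceRoot) just before Start-AzureEmulator -Launch has the same effect as deleting the folder by hand.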

    Issue 2: Unable to publish a MongoDB-enabled Node.js app to Windows Azure

    When you try to publish a MongoDB-enabled Node.js app using the “Publish-AzureService” PowerShell command, you might get an error that interrupts the publishing process:

    PS>> Publish-AzureService

    Publish-AzureService : Object reference not set to an instance of an object.


    Root Cause:

    When you try to publish the application, the publishing code looks for a Certificates node in the CSCFG file, and if this node is not present, it throws the exception shown above. Because the Certificates node is optional in the CSCFG file, the code should not depend on it. The product group (PG) could not reproduce this issue because their CSCFG had a Certificates node, so they did not notice the dependency. You can overcome this problem by defining empty Certificates nodes in the Role sections of the CSCFG file.


    So, to work around this issue, please add an empty Certificates node to the appropriate Role section of your CSCFG file.
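    If your CSCFG defines several roles, a short C# sketch using LINQ to XML can add the empty element wherever it is missing. It assumes the ServiceConfiguration namespace shown in the code and that appending Certificates as the last child of each Role satisfies the schema; check both against your own file. The class name is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for patching a ServiceConfiguration (.cscfg) document.
public static class CscfgFixer
{
    // Namespace used by .cscfg files (assumed; verify against your file's xmlns).
    static readonly XNamespace Sc =
        "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration";

    // Adds an empty <Certificates /> element to every <Role> that lacks one
    // and returns the number of roles that were changed.
    public static int AddEmptyCertificates(XDocument cscfg)
    {
        var rolesWithout = cscfg.Root
            .Elements(Sc + "Role")
            .Where(role => role.Element(Sc + "Certificates") == null)
            .ToList();

        foreach (var role in rolesWithout)
        {
            // Appended as the last child of the Role element.
            role.Add(new XElement(Sc + "Certificates"));
        }
        return rolesWithout.Count;
    }
}
```

    For example, load the file with XDocument.Load, call CscfgFixer.AddEmptyCertificates on it, and write it back with Save before running Publish-AzureService again. Roles that already have a Certificates node are left untouched.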


    Avkash Chauhan (@avkashchauhan) described NougakuDoCompanion: A “Ruby on Rails” companion for Windows Azure in a 1/31/2012 post:

    I was recently informed about a great “Ruby on Rails” companion package for running your Ruby on Rails application on Windows Azure.

    With this package you can:

    1. Run multiple instances of a Ruby application on Windows Azure
    2. Deploy your application virtually from any machine
    3. Connect to SQL Azure

    [Figure: NougakuDoCompanion application architecture]

    Companion Package information:


    Test results:

    • NougakuDo 1.0.5 tested with Rails 3.0.9
    • NougakuDo 1.1.7 tested with Rails 3.1.0
    • NougakuDo 1.1.9 tested with Rails 3.1.1


    Visual Studio LightSwitch and Entity Framework 4.1+

    • Beth Massi (@bethmassi) posted LightSwitch Community & Content Rollup–January 2012 on 2/2/2012:

    Last fall I started posting a rollup of interesting community happenings, content, samples and extensions popping up around Visual Studio LightSwitch. If you missed those rollups, you can check them out here:

    Looks like folks took a few well-deserved days to ramp back into the groove after the holidays (myself included). But there were still a lot of awesome things around LightSwitch this month, especially the number of submissions The Code Project had in the LightSwitch Star Contest: a lot of really interesting applications and some great case studies for LightSwitch as well as some for Azure. Check them out…

    “LightSwitch Star” Contest


    In November The Code Project launched the “LightSwitch Star” contest. They’re looking for apps that show off the most productivity in a business as well as apps that use extensions in a unique, innovative way. Prizes are given away each month. Soon they will announce the January winners as well as the two grand prize winners of an ASUS U31SD-DH31 Laptop!

    Check out all the submissions we had in January and make sure to log onto Code Project and vote for your favorites. Here’s a breakdown of the 13 apps that were submitted in January (see all 34 apps that have been submitted here). There are some very creative “business apps” here, like a campaign manager for Dungeons & Dragons!

    There are a lot of really interesting real-world LightSwitch production applications that were submitted: some departmental apps, a few personal apps, some enterprise apps, as well as a couple of start-up companies. There are also some great case studies here for Azure, in particular:

    Developer Center Updates
    Learn Page Updates

    In December I kicked off a series aimed at beginner developers just getting started with LightSwitch, which was featured in the MSDN Flash Newsletter. In January we updated the Learn page of the Developer Center to feature this series, and it’s been getting some great traffic!


    I also released the completed source code for the sample we built in the series: Beginning LightSwitch - Address Book Sample

    How Do I Videos – Learning Made Easier

    We also updated all the How Do I video pages to show the sequential list of videos so that you can easily get to the previous and next videos in the series. This makes it a lot easier to navigate through the more than 20 videos in the correct order. Just click into any video page like this one and you will see the video navigator at the bottom of the page.


    Notable Content this Month

    Here’s some more of the fun things the team and community released in January.

    Extensions released in January (see all 73 of them here!):

    The team also released a control extension sample that you can learn from. Check out the announcement on the team blog: Many-to-Many Control Released!

    Build your own extensions by visiting the LightSwitch Extensibility page on the LightSwitch Developer Center.

    Team Articles:
    Community Articles:
    Samples (see all of them here):
    LightSwitch Team Community Sites

    The Visual Studio LightSwitch Facebook Page has been increasing in activity thanks to you all. Become a fan! Have fun and interact with us on our wall. Check out the cool stories and resources.

    Also here are some other places you can find the LightSwitch team:

    LightSwitch MSDN Forums
    LightSwitch Developer Center
    LightSwitch Team Blog
    LightSwitch on Twitter (@VSLightSwitch, #VisualStudio #LightSwitch)

    Join Us!

    The community has been using the hashtag #LightSwitch on Twitter when posting stuff so it’s easier for me to catch it (although this is a common English word, so you may end up with a few weird ones ;-)). Join the conversation! And if I missed anything, please add a comment to the bottom of this post and let us know!

    Reminder: Visual Studio LightSwitch items are included in these posts because LightSwitch is the quickest, easiest and best route for the new breed of “citizen programmers” to get apps running on Windows Azure.

    Jan Van der Haegen (@janvanderhaegen) described LightSwitch achievements: Chapter two (Events) – Classic events: “The observer” on 2/1/2012:

    In this blog post series, we’re going to create a fun extension that will stimulate the users of our LightSwitch applications by rewarding them with points and achievements when they use the application…

    Also, in this particular post, I wrote my first VB.NET code ever, which seems to be the .NET language of choice for about a third of my readers… All comments on the VB.NET code will be considered positive feedback, no matter what you write or how you write it!

    Functional POV

    Conceptually, when talking about events in .NET, we are talking about a message that one object sends to zero-to-many other objects to let them know that something happened. There are four important elements here:

    • the sender (aka the source): the object that sends the message
    • the message: what happened
    • the timing: when did it happen
    • the event handlers (aka the listeners): the objects that want to take an action for the event

    Most events will find their use in the presentation layer of our applications, where we want to execute code when the user clicks a button, hovers the mouse cursor over an image, …

    Semantics POV

    It’s hard to talk code without a small example.

    using System;
    namespace ConsoleApplicationCSharp
    {
        //1. Define the event "signature", usually (but NOT required) has two arguments: sender & event arguments.
        public delegate void ChangedEventHandler(object sender, EventArgs e);
        public class ClassWithEvent
        {
            //2. Define the event.
            public event ChangedEventHandler Changed;
            public void InvokeTheEvent()
            {
                Console.WriteLine("Firing the event");
                // 3. Invoking the event (can only be done from inside the class that defined the event)
                if (Changed != null)
                    Changed(this, EventArgs.Empty);
            }
        }
        class Program
        {
            static void Main(string[] args)
            {
                var classWithEvent = new ClassWithEvent();
                //4. Adding an event handler to the event
                classWithEvent.Changed += new ChangedEventHandler(classWithEvent_Changed);
                // Fire the event so the handler actually runs
                classWithEvent.InvokeTheEvent();
            }
            static void classWithEvent_Changed(object sender, EventArgs e)
            {
                Console.WriteLine("The event has fired");
            }
        }
    }

    And the same example in VB.NET (I think…)

    Imports System
    Namespace ConsoleApplicationVB
        '1. Define the event "signature", usually (but NOT required) has two arguments: sender & event arguments.
        Public Delegate Sub ChangedEventHandler(sender As Object, e As EventArgs)
        Public Class ClassWithEvent
            '2. Define the event.
            Public Event Changed As ChangedEventHandler
            Public Sub InvokeTheEvent()
                Console.WriteLine("Firing the event")
                ' 3. Invoking the event (can only be done from inside the class that defined the event)
                RaiseEvent Changed(Me, EventArgs.Empty)
            End Sub
        End Class
        Module Program
            Sub Main()
                Dim classWithEvent As New ClassWithEvent
                ' 4. Adding an event handler
                AddHandler classWithEvent.Changed, AddressOf classWithEvent_Changed
                ' Fire the event so the handler actually runs
                classWithEvent.InvokeTheEvent()
            End Sub
            Private Sub classWithEvent_Changed(sender As Object, e As EventArgs)
                Console.WriteLine("The event has fired")
            End Sub
        End Module
    End Namespace

    Just as in the functional POV, the code samples reveal that when dealing with events, there are four important elements:

    1. What happened: the first step is to define the event “signature”, which happens in the form of a delegate. There are a lot of “standard” event signatures provided by the .NET Framework, and I would advise you to use them over custom events, but as the samples show, it’s really easy to roll your own. Event signatures will often have two arguments: sender (a reference to the source, the object that raised the event), and (a subclass of) EventArgs (additional information about what happened).
    2. The sender: we add a property to the class that exposes the event, and add the keyword “event” (“Event” in VB.NET). Conceptually, it’s OK to think of this property as a “collection of method references” that all match the signature from step 1. From a technical POV, what we actually do is add a property which is an empty delegate, aka a method pointer that has no value yet. On a traditional property, the compiler will generate two “subroutines”: “get” and “set”. When we add the keyword “event” to the property declaration, however, the compiler will add two “subroutines” called “add” and “remove”. In the generated “add” subroutine, the value (new delegate) will be combined with the existing value, as opposed to a normal property, where the “set” will replace the existing value; of course, the opposite happens in the “remove” “subroutine”.
    3. When it happened: raising the event is quite simple: do a null-reference check, then call the delegate (C#, more technical), or use the keyword RaiseEvent (VB.NET, more descriptive), which takes care of the null check for you. Conceptually, each method in our “collection of method references” from step 2 will be called. Raising the event can only be done from inside the class that defined it, although you will often see classes that expose this functionality to anyone (depending on the access modifier: private/public/…).
    4. The event handlers: these are the classes that subscribed to the event, in other words, told the sender they want to be notified when the event occurs, in the form of a method being called. To do this, use the VB.NET keyword AddHandler or the C# operator +=. Both are just syntax that, when compiled, results in the “add” subroutine from step 2 being called.
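    To make step 2 concrete, the sketch below writes the “add” and “remove” accessors by hand instead of letting the compiler generate them. It is a simplification: since C# 4 the compiler-generated accessors also make the update thread-safe, which this version does not. The class name is illustrative only.

```csharp
using System;

// Hand-written equivalent of a field-like event: a private delegate field
// plus explicit "add" and "remove" accessors.
public class ClassWithExplicitEvent
{
    // Backing field: a (possibly null) multicast delegate holding every handler.
    private EventHandler changed;

    public event EventHandler Changed
    {
        // Called by += (C#) or AddHandler (VB.NET): combine with existing handlers.
        add { changed = (EventHandler)Delegate.Combine(changed, value); }
        // Called by -= (C#) or RemoveHandler (VB.NET): take the handler back out.
        remove { changed = (EventHandler)Delegate.Remove(changed, value); }
    }

    public void InvokeTheEvent()
    {
        // Copy to a local first so the null check and the call see the same value.
        EventHandler handler = changed;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}
```

    Subscribers use it exactly like the Changed event in the earlier sample; only the class itself knows that the accessors are custom.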

    There are a lot more technical aspects of events that I could discuss, the most important ones being garbage collection (or more specifically: the prevention of GC on the event handlers) and thread safety, but let’s not forget that this is a LightSwitch blog, so let’s find out why this information is important for the Achievements extension that we are building in this blog post series…
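    The GC point in one small illustration: while a handler is attached, the publisher’s delegate holds a reference to the subscriber, so a short-lived subscriber attached to a long-lived publisher stays alive until it unsubscribes. A minimal C# sketch of the usual remedy, unsubscribing in Dispose; the Publisher and Subscriber classes are hypothetical.

```csharp
using System;

// Hypothetical long-lived publisher (think: an application-wide service).
public class Publisher
{
    public event EventHandler Changed;

    public void RaiseChanged()
    {
        EventHandler handler = Changed;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

// Hypothetical short-lived subscriber. While subscribed, the publisher's
// delegate references this object, so the GC cannot collect it.
public class Subscriber : IDisposable
{
    private readonly Publisher publisher;
    public int TimesNotified { get; private set; }

    public Subscriber(Publisher publisher)
    {
        this.publisher = publisher;
        publisher.Changed += OnChanged; // publisher now references this subscriber
    }

    private void OnChanged(object sender, EventArgs e)
    {
        TimesNotified++;
    }

    // Unsubscribing drops the publisher's reference, so this object can be collected.
    public void Dispose()
    {
        publisher.Changed -= OnChanged;
    }
}
```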

    LightSwitch POV

    I wanted to take you on this shallow dive into events in .NET because, for our achievements extension, where most of the code resides in the model layer rather than in the presentation layer, we’ll choose not to use this classic observer pattern and use the event mediator pattern instead. To justify exactly why we’ll go with a different pattern, it’s important to understand the conceptual and technical differences between the two…

    However, the achievements extension will have a little bit of graphical artwork to it, and for this code (/XAML) there’s one event in particular that could use some special attention: INotifyPropertyChanged.

    What’s INotifyPropertyChanged?

    INotifyPropertyChanged is an interface that exposes just one member: the PropertyChanged event. This event is extremely important in the MVVM pattern, because the bindings that we define in XAML recognize this event and will subscribe to it. A Binding instance will update its value (“refresh the view”) whenever the source raises the event.

    The viewmodel that implements this interface can use the PropertyChangedEventArgs in two ways:

    • Fill in the PropertyName: by doing this, the sender signifies to the listeners that one particular property has changed, and that all bindings on this property should update.
    • Leave the PropertyName blank: by doing this, the sender signifies to the listeners that the object changed so radically, that all bindings to any property on this object should update. It’s basically requesting a general “refresh” of the view.
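    The two ways of filling in PropertyChangedEventArgs can be illustrated with a small C# sketch. PersonViewModel and NameBindingListener are hypothetical; the listener is only a toy stand-in for a real Binding on the Name property and simply counts how often such a binding would refresh. As with real bindings, a null or empty PropertyName is treated as “all properties changed”.

```csharp
using System.ComponentModel;

// Hypothetical view model with two properties.
public class PersonViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;
    private string name = "";
    private int age;

    public string Name
    {
        get { return name; }
        set { name = value; Raise("Name"); }   // PropertyName filled in
    }

    public int Age
    {
        get { return age; }
        set { age = value; Raise("Age"); }
    }

    // PropertyName left empty: every binding on this object should update.
    public void RefreshAll() { Raise(string.Empty); }

    private void Raise(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}

// Toy stand-in for a Binding on "Name": counts how often it would refresh.
public class NameBindingListener
{
    public int Refreshes { get; private set; }

    public NameBindingListener(PersonViewModel source)
    {
        source.PropertyChanged += (sender, e) =>
        {
            // Null or empty PropertyName means "all properties changed".
            if (string.IsNullOrEmpty(e.PropertyName) || e.PropertyName == "Name")
                Refreshes++;
        };
    }
}
```

    Setting Age does not touch the Name listener, while RefreshAll reaches every listener regardless of which property it watches.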

    INotifyPropertyChanged has a brother called INotifyCollectionChanged, also recognized by bindings, useful for “ItemsSource” properties, …
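    ObservableCollection<T> (System.Collections.ObjectModel) is the framework’s ready-made implementation of INotifyCollectionChanged. This minimal sketch just records the Action of each CollectionChanged notification; the demo class and method names are illustrative.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class CollectionChangedDemo
{
    // ObservableCollection<T> raises CollectionChanged for each modification,
    // which is what lets an ItemsSource binding react to adds and removes.
    public static string[] RecordActions()
    {
        var items = new ObservableCollection<string>();
        var log = new List<string>();

        items.CollectionChanged += (sender, e) => log.Add(e.Action.ToString());

        items.Add("first achievement");     // Action = Add
        items.Add("second achievement");    // Action = Add
        items.Remove("first achievement");  // Action = Remove
        items.Clear();                      // Action = Reset

        return log.ToArray();
    }
}
```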

    An easy base class…

    If you are using EME (if you’re not: what the hell?), then there’s an easy base class included in the framework for your LightSwitch development convenience…

    The ExtensionsMadeEasy.ClientAPI.Utilities.Base.NotifyPropertyChangedBase (source code here; it has been included for a while, but I never got around to blogging about it) is a base class (based on the beautiful work by Rob Eisenberg) for your (/my) ViewModels. It exposes three (undocumented, sorry) methods:

    namespace LightSwitchApplication
    {
        public class SomeViewModel : ExtensionsMadeEasy.ClientAPI.Utilities.Base.NotifyPropertyChangedBase
        {
            private int myProperty;
            public int MyPublicProperty
            {
                get { return myProperty; }
                set
                {
                    myProperty = value;
                    base.Refresh(); //Refresh all bindings
                    base.OnPropertyChanged("MyPublicProperty"); //Classic approach
                    base.OnPropertyChanged(() => MyPublicProperty); //Approach with lambda expression
                }
            }
        }
    }

    Or, the VB equivalent (I think…)

    Public Class SomeViewModel
        Inherits ExtensionsMadeEasy.ClientAPI.Utilities.Base.NotifyPropertyChangedBase
        Private myProperty As Integer
        Public Property MyPublicProperty() As Integer
            Get
                Return myProperty
            End Get
            Set(ByVal value As Integer)
                myProperty = value
                MyBase.Refresh() 'Refresh all bindings
                MyBase.OnPropertyChanged("MyPublicProperty") 'Classic approach
                MyBase.OnPropertyChanged(Function() MyPublicProperty) 'Approach with lambda expression
            End Set
        End Property
    End Class

    The first method (Refresh) raises the PropertyChanged event without a PropertyName on the EventArgs, so all bindings on “SomeViewModel” will be refreshed.

    The second method raises the PropertyChanged event with a hardcoded PropertyName (“MyPublicProperty”) on the EventArgs, so all bindings on this property will be refreshed. The string must match the actual property name exactly…

    The third method does exactly the same as the second, but uses a lambda expression. The major advantage is that the property name is not hardcoded (i.e., not a string), which gives you compile-time safety: if you misspell the property name by accident, the code won’t compile (a mistake that has bitten me many times before), and the name is updated automagically if you decide to refactor your class a bit and rename the property.
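    The three methods can be sketched in a few lines. This is a hypothetical base class written for illustration, not the actual EME source; it shows one common way to get the name out of the lambda: () => MyPublicProperty is compiled into an expression tree whose body is a MemberExpression, and that member’s Name is the string to raise.

```csharp
using System;
using System.ComponentModel;
using System.Linq.Expressions;

// Hypothetical base class modeled on the three methods described above.
public abstract class NotifyPropertyChangedSketch : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Empty PropertyName: ask every binding on this object to refresh.
    public void Refresh()
    {
        OnPropertyChanged(string.Empty);
    }

    // Classic approach: the caller passes the property name as a string.
    public void OnPropertyChanged(string propertyName)
    {
        PropertyChangedEventHandler handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }

    // Lambda approach: read the property name from the expression tree.
    public void OnPropertyChanged<T>(Expression<Func<T>> property)
    {
        var member = property.Body as MemberExpression;
        if (member == null)
            throw new ArgumentException("Expected a property access such as () => MyProperty.", "property");
        OnPropertyChanged(member.Member.Name);
    }
}

// Usage, mirroring SomeViewModel above.
public class SampleViewModel : NotifyPropertyChangedSketch
{
    private int myProperty;
    public int MyPublicProperty
    {
        get { return myProperty; }
        set
        {
            myProperty = value;
            OnPropertyChanged(() => MyPublicProperty); // raises "MyPublicProperty"
        }
    }
}
```

    Renaming MyPublicProperty with a refactoring tool also renames the reference inside the lambda, which is exactly the safety the string overload cannot offer.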

    To each his own of course, but I definitely prefer the latter. Anyways, let’s see what this event mediator pattern is all about, and why it’s more suited for the model layer of our extension…


    Windows Azure Infrastructure and DevOps

    • Mike West (@MikeWestCloud) posted an Azure: 10 Things You Need to Know teaser for a Saugatuck Technology research report to the Information Management site on 2/1/2012:

    In 2010, Microsoft launched the Windows Azure platform. Since then, Saugatuck has been actively tracking Windows Azure through regular Microsoft briefings, interviews with ISVs and with SIs that have migrated their solutions or workloads to Windows Azure or that have built natively on Windows Azure.

    ISVs migrating to the Cloud and enterprise developers should understand the evolution and maturity of the Windows Azure platform, and the solutions and workloads it was designed for. Not all work well on Windows Azure. In a recently published premium Saugatuck research deliverable we provide an update and overview of the platform, and its suitability for Cloud solutions. In this Lens360 blog post, we highlight some of the takeaways from this 6-page Strategic Perspective.

    Overall, significant progress has already been made, and it is important to note several notable advances on the horizon for the Windows Azure platform:

    SQL Azure

    • 2012 will see SQL Server 2012 and SQL Azure move closer to parity, as the once-significant functionality gap continues to close.
    • 2012 will see SQL Server 2012 and SQL Azure share development tools.
    • New federated partitioning for SQL Azure provides the means to handle issues related to managing customers separately.[*]

    Windows Azure

    • System Center 2012 makes management of Azure solutions part of the overall systems management capability.

    Regulatory Certifications

    • Microsoft is currently in the process of completing the SSAE 16 audit.
    • Work is also under way to deliver on HIPAA Business Associate Agreement (BAA).
    • PCI certification, not in place today, is next on the roadmap.

    At the same time, however, Windows Azure presents significant challenges to enterprise developers or to ISVs with enterprise customers migrating to the Cloud because of persistent design constraints. While we don’t have the room in this blog post to flesh these issues out in full, we discuss these in some detail in the Strategic Perspective, and what to do about it.

    One area we do want to highlight is that despite considerable recent attention to achieving certifications, Windows Azure still lacks FISMA, GLBA, HIPAA, PCI, and SSAE 16 certification: a key constraint if an enterprise or an ISV needed to store medical information or consumer credit card data, for example. While we believe many of the key certifications are on the roadmap, it takes time to fully address all of the certification issues. Stay tuned.

    Complicating things further, Windows Azure is a rapidly moving target of functionality. The progress Microsoft has made in enhancing the platform continues at a fairly rapid pace. Therefore, it is important to understand the technical direction that the Windows Azure platform will take and continue to take as it evolves further. 2012 holds several very promising new additions to the Windows Azure platform, including development and testing capability.

    Fortunately for migrating ISVs and enterprise developers, leading SIs such as Accenture, Avanade, Cognizant and Wipro, for example, have partnered with Microsoft to enable Windows Azure ISV workloads and solutions, and in doing so have amassed deep knowledge of the platform, its architecture and its future directions. These SIs have also developed solutions that fill in some of the gaps in the platform today.

    Note: Ongoing Saugatuck subscription clients can access this premium research piece (1016MKT) by clicking here, and inputting your ID and password. Non-clients can purchase and download this premium research piece by clicking here.

    This blog originally appeared at Saugatuck Lens360.

    * SQL Azure Federations have little to do with “issues related to managing customers separately” (multi-tenancy.) SQL Azure Federations primarily provide the ability to horizontally scale SQL Azure data beyond the current 150-GB limit and/or increase compute and memory resources for a particular SQL Azure implementation.

    Andrew Kerry-Bedell (@AJKB) reported that growth in jobs requiring Windows Azure skills ranks #1 in his Top Linked In Profile IT Skills post of 1/30/2012:

    Having spent some recent days helping colleagues with Linked In, I took the time to review some of the lesser-used features in Linked In.

    Linked In Skills is one area that seems to have been overlooked by many people. Launched nine months ago, it is likely to come to the fore for those in the recruiting business seeking to screen early on for candidates with the right type of skills.

    Whilst skills like ‘consulting’ and ‘presentation skills’ are featured, they’re not much use and not very focused. What are useful are more specific IT and marketing related skills. In addition, Linked In helps by showing what the year on year growth is in the profiles featuring these skills, giving a good indication of the popularity of each individual skill.

    It’s interesting to see in decline … .

    There are no real surprises in Social Media, but in the Microsoft technology space it’s interesting that SharePoint is growing well, followed by their CRM and Dynamics products.

    It’s more interesting to me to see growth in jobs requiring Windows Azure skills ahead of that for AWS, although that might result from Windows Azure’s smaller base.

    Tiffany Trader reported Gartner Says Platform as a Service is on the Cusp of Several Years of Strategic Growth in a 2/1/2012 post to the HPC in the Cloud blog:

    Platform as a service (PaaS) is a core layer of the cloud computing architecture, and its evolution will affect the future of most users and vendors in enterprise software markets, according to Gartner, Inc.

    "With large and growing vendor investment in PaaS, the market is on the cusp of several years of strategic growth, leading to innovation and likely breakthroughs in technology and business use of all of cloud computing," said Yefim Natis, vice president and distinguished analyst at Gartner. "Users and vendors of enterprise IT software solutions that are not yet engaged with PaaS must begin building expertise in PaaS or face tough challenges from competitors in the coming years."

    PaaS is a common reference to the layer of cloud technology architecture that contains all application infrastructure services, which are also known as "middleware" in other contexts. PaaS is the middle layer of the end-to-end software stack in the cloud. It is the technology that intermediates between the underlying system infrastructure (operating systems, networks, virtualization, storage, etc.) and overlaying application software. The technology services that are part of a full-scope comprehensive PaaS include functionality of application containers (servers), application development tools, database management systems, integration middleware, portal products, business process management suites and others — all offered as a service.

    In the Gartner Special Report "PaaS 2012: Tactical Risks and Strategic Rewards," Gartner analysts said 2011 was a pivotal year for the PaaS market. As Gartner predicted last year in the report "PaaS Road Map: A Continent Emerging," the broad vendor adoption in 2011 amounted to a sound industry endorsement of PaaS as an alternative to the traditional middleware deployment models.

    In 2012, the PaaS market is at its early stage of growth and does not yet have well-established leaders, best use or business practices or dedicated standards. The adoption of PaaS offerings is still associated with some degree of uncertainty and risk.

    "However, PaaS products are likely to evolve into a major component of the overall cloud computing market, just as the middleware products — including application servers, database management systems (DBMSs), integration middleware and portal platforms — are the core foundation of the traditional software industry," Mr. Natis said. "The tension between the short-term risk and the long-term strategic imperative of PaaS will define the key developments in the PaaS market during the next two to three years."

    Some of the newly announced PaaS offerings will reach general availability late in 2012, and by the end of 2013, all major software vendors will have competitive production offerings in the PaaS market. By 2016, competition among the PaaS vendors will produce new programming models, new standards and new software market leaders. However, until then, users will continue to experience architectural changes to technologies, business models and vendor alignments in the PaaS market.

    As vendors continue to invest in PaaS services, and the major software vendors look to deliver comprehensive PaaS service portfolios, activity in all segments of PaaS will accelerate and the fast pace of growth and change in the PaaS market will create confusion, making user adoption decisions more difficult.

    "While there are clear risks associated with the use of services in the new and largely immature PaaS market, the risk of avoiding the PaaS market is equally high," said Mr. Natis. "The right strategy for most mainstream IT organizations and software vendors is to begin building familiarity with the new cloud computing opportunities by adopting some PaaS services now, albeit with the understanding of their limitations and with the expectation of ongoing change in the market offerings and use patterns."

    Additional information is available in the Gartner Special Report "PaaS 2012 — Tactical Risks and Strategic Rewards." The special report includes video commentary from Mr. Natis, as well as links to more than 30 related reports about the PaaS market.

    Lori MacVittie (@lmacvittie) asserted “It’s about operational efficiency and consistency, emulated in the cloud by an API to create the appearance of a converged platform” in an introduction to her The Cloud API is Pseudo-Consolidation of Infrastructure post of 2/1/2012 to F5’s DevCentral blog:

    In most cases, the use of the term “consolidation” implies the aggregation (and subsequently elimination) of like devices. Application delivery consolidation, for example, is used to describe a process of scaling up infrastructure that often occurs during upgrade cycles.

    Many little boxes are exchanged for a few larger ones as a means to simplify the architecture and reduce the overall costs (hard and soft) associated with delivering applications. Consolidation.

    But cloud has opened (or should have opened) our eyes to a type of consolidation in which like services are aggregated; a consolidation strategy in which we layer a thin veneer over a set of adjacent functionalities in order to provide a scalable and ultimately operationally consistent experience: an API. A cloud API consolidates infrastructure from an operational perspective. It is the bringing together of adjacent functionalities into a single “entity.” Through a single API, many infrastructure functions and services can be controlled – provisioning, monitoring, security, and load balancing (one part of application delivery) are all available through the same API. Certainly the organization of an API’s documentation segments services into similar containers of functionality, but if you’ve looked at a cloud API you’ll note that it’s all the same API; only the organization of the documentation makes it appear otherwise.

    This service-oriented approach allows for many of the same benefits as consolidation, without actually physically consolidating the infrastructure. Operational consistency is one of the biggest benefits.
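To make the idea concrete, here is a minimal Python sketch of a single API acting as a consolidation layer: four independent back-end services (compute, monitoring, firewall, load balancing) are reached through one `call()` entry point that always returns the same response format. The service classes, action names such as `RunInstance`, and the response envelope are hypothetical illustrations, not any cloud provider's actual API.

```python
from typing import Any, Callable, Dict, List, Set


# Hypothetical, independently built infrastructure services. Each exposes its
# own interface, much as separate devices in a data center do.
class ComputeService:
    def __init__(self) -> None:
        self.instances: Dict[str, str] = {}

    def launch(self, name: str, size: str) -> str:
        self.instances[name] = size
        return name


class MonitoringService:
    def cpu_percent(self, instance: str) -> float:
        return 0.0  # placeholder reading; a real service would sample metrics


class FirewallService:
    def __init__(self) -> None:
        self.open_ports: Dict[str, Set[int]] = {}

    def allow(self, instance: str, port: int) -> None:
        self.open_ports.setdefault(instance, set()).add(port)


class LoadBalancerService:
    def __init__(self) -> None:
        self.pools: Dict[str, List[str]] = {}

    def add_member(self, pool: str, instance: str) -> None:
        self.pools.setdefault(pool, []).append(instance)


class CloudAPI:
    """Single entry point and single response format over many services."""

    def __init__(self) -> None:
        self.compute = ComputeService()
        self.monitoring = MonitoringService()
        self.firewall = FirewallService()
        self.load_balancer = LoadBalancerService()
        # Each public action maps to whichever back end implements it; the
        # caller never needs to know which "device" does the work.
        self._actions: Dict[str, Callable[..., Any]] = {
            "RunInstance": self.compute.launch,
            "GetCpuPercent": self.monitoring.cpu_percent,
            "AuthorizeIngress": self.firewall.allow,
            "RegisterWithLoadBalancer": self.load_balancer.add_member,
        }

    def call(self, action: str, **params: Any) -> Dict[str, Any]:
        handler = self._actions.get(action)
        if handler is None:
            return {"ok": False, "action": action, "error": "UnknownAction"}
        try:
            result = handler(**params)
        except TypeError as exc:  # e.g. missing or unexpected parameters
            return {"ok": False, "action": action, "error": str(exc)}
        return {"ok": True, "action": action, "result": result}


if __name__ == "__main__":
    api = CloudAPI()
    # Provisioning, security and load balancing all go through the same call.
    print(api.call("RunInstance", name="web1", size="small"))
    print(api.call("AuthorizeIngress", instance="web1", port=80))
    print(api.call("RegisterWithLoadBalancer", pool="web", instance="web1"))
    print(api.call("GetCpuPercent", instance="web1"))
```

Every operation, whichever service ultimately performs it, is invoked and reported the same way; the underlying services still exist and are still managed, only now through one consistent interface.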


    The ability to consistently manage and monitor infrastructure through the same interface – whether API or GUI or script – is an important factor in data center efficiency. One of the reasons enterprises demand overarching data center-level monitoring and management systems like HP OpenView, CA and IBM Tivoli is to gain consistency and an aggregated view of the entire data center.

    It is no different in the consumer world, where a consistent interface greatly enhances the consumer’s ability to take advantage of underlying services. Convenience, too, plays a role here, as a single device (or API) is ultimately more manageable than having to use several devices to accomplish the same thing. Back in the day I carried a Blackberry, a mobile phone, and a PDA – each had a specific function and there was very little overlap among the three. Today, a single “smart”phone provides the functions of all three – and then some. The consistency of a single interface, a single foundation, is paramount to the success of such consumer devices. It is the platform, whether consumers realize it or not, that enables their highly integrated and operationally consistent experience.

    The same is true in the cloud, and ultimately in the data center. Cloud (pseudo) consolidates infrastructure the only way it can – through an API that ultimately becomes the platform analogous to an iPhone or Android-based device.

    Cloud does not eliminate infrastructure; it merely abstracts it into a consolidated API such that the costs to manage it are greatly reduced due to the multi-tenant nature of the platform. Infrastructure is still managed, it’s just managed through an API that simplifies and unifies the processes to provide a more consistent approach that is beneficial to the organization in terms of hard (hardware, software) and soft (time, administration) costs.

    The cloud and its requisite API provide the consolidation of infrastructure necessary to achieve greater cost savings and higher levels of consistency, both of which are necessary to scale operations in a way that makes IT able to meet the growing demand on its limited resources.

    Mary Jo Foley (@maryjofoley) “… speculates on the future of Microsoft's public cloud play” while asking Can Microsoft Save Windows Azure? in a 2/1/2012 article for

    Microsoft is slowly but surely working to make its Windows Azure cloud platform more palatable to the masses -- though without the benefit of roadmap leaks, it would be hard for most customers to know this.

    When Microsoft began cobbling together its Windows Azure cloud plans back in 2007, there was a grand architectural plan. In a nutshell, Microsoft wanted to recreate Windows so that Redmond could run users' applications and store their data across multiple Windows Server machines located in Microsoft's (plus a few partners') own datacenters. In the last five years, Microsoft has honed that vision but has never really deviated too far from its original roadmap.

    For Platform as a service (PaaS) purists -- and Microsoft-centric shops -- Windows Azure looked like a distributed-systems engineer's dream come true. For those unwilling or unable to rewrite existing apps or develop new ones that were locked into the Microsoft System Center- and .NET-centric worlds, it was far less appealing.

    How many external, paying customers are on Windows Azure? Microsoft officials won't say -- and that's typically a sign that there aren't many. My contacts tell me that even some of the big Azure wins that Microsoft trumpeted ended up trying Windows Azure for one project and then quietly slinking away from the platform. However, Windows Azure is no Windows Vista. Nor is it about to go the way of the Kin. But without some pretty substantial changes, it's not on track to grow the way Microsoft needs it to.

    This fact hasn't been lost on the Microsoft management. Starting last year, Microsoft began making a few customer- and partner-requested tweaks to Windows Azure around pricing. Then the 'Softies started getting a bit more serious about providing support for non-Microsoft development tools and frameworks for Windows Azure. Developer champion and .NET Corporate Vice President Scott Guthrie traded his red shirt for an Azure-blue one (figuratively -- still not yet literally) and moved to work on the Windows Azure application platform.

    Starting around March this year, Microsoft is slated to make some very noticeable changes to Windows Azure. That's when the company will begin testing with customers its persistent virtual machine that will allow users to run Windows Server, Linux(!), SharePoint and SQL Server on Windows Azure -- functionality for which many customers have been clamoring. This means that Microsoft will be, effectively, following in rival Amazon's footsteps and adding more Infrastructure as a Service components to a platform that Microsoft has been touting as pure PaaS.

    The first quarterly update to Windows Azure this year -- if Microsoft doesn't deviate from its late 2011 roadmap -- will include a number of other goodies, as well, such as the realization of some of its private-public cloud migration and integration promises. If you liked Microsoft's increased support for PHP, Java, Eclipse, Node.js, MongoDB and Hadoop from last year, take heart that the Windows Azure team isn't done improving its support for non-Microsoft technologies. Also on the Q1 2012 deliverables list is support for more easily developing Windows Azure apps not just on Windows, but also on Macs and Linux systems.

    Microsoft's new focus with Windows Azure is to allow users to start where they are rather than making them start over. That may sound like rhetoric, but it's actually a huge change, both positioning- and support-wise for Microsoft's public cloud platform. Not everyone -- inside or outside the company -- agrees that this is a positive. Hosting existing apps in the cloud isn't the same as re-architecting them so they take advantage of the cloud. It will be interesting to see whether users who are tempted by the "new" Windows Azure are happy with the functionality for which they've been clamoring.

    <Return to section navigation list>

    Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

    The TechNet Wiki published Bill Loeffler’s Wiki Page: Private Cloud Principles, Patterns, and Concepts on 1/30/2012. From the introduction:

    A key goal is to enable IT organizations to leverage the principles and concepts described in Reference Architecture for Private Cloud content set to offer Infrastructure as a Service (IaaS), allowing any workload hosted on this infrastructure to automatically inherit a set of Cloud-like attributes. Fundamentally, the consumer should have the perception of infinite capacity and continuous availability of the services they consume. They should also see a clear correlation between the amount of services they consume and the price they pay for these services.

    Achieving this requires virtualization of all elements of the infrastructure (compute [processing and memory], network, and storage) into a fabric that is presented to the container, or the virtual machine (VM). It also requires the IT organization to take a service provider’s approach to delivering infrastructure, necessitating a high degree of IT Service Management maturity. Moreover, most of the operational functions must be automated to minimize the variance as much as possible while creating a set of predictable models that simplify management.
    Finally, it is vital to ensure that the infrastructure is designed in a way that services, applications, and workloads can be delivered independently from where they are originally sourced or provided. Thus, one of the major goals is to enable portability between a customer’s private cloud and external public cloud platforms and providers.

    Therefore, this requires a strong service-quality-driven, consumer-oriented approach as opposed to a “feature”- or capability-oriented approach. Although this approach is not orthogonal to other approaches, it may seem counterintuitive at first. This documentation defines the process elements for planning, building, and managing a private cloud environment with a common set of best practices.

    This document is part of a collection of documents that comprise the Reference Architecture for Private Cloud document set. The Solution for Private Cloud is a community collaboration project. Please feel free to edit this document to improve its quality. If you would like to be recognized for your work on improving this document, please include your name and any contact information at the bottom of this page.

    Table of Contents
    • Principles
      • Achieve Business Value through Measured Continual Improvement
      • Perception of Infinite Capacity
      • Perception of Continuous Service Availability
      • Take a Service Provider’s Approach
      • Optimization of Resource Usage
      • Take a Holistic Approach to Availability Design
      • Minimize Human Involvement
      • Drive Predictability
      • Incentivize Desired Behavior
      • Create a Seamless User Experience
    • Concepts
      • Predictability
      • Favor Resiliency Over Redundancy
      • Homogenization of Physical Hardware
      • Pool Compute Resources
      • Virtualized Infrastructure
      • Fabric Management
      • Elastic Infrastructure
      • Partitioning of Shared Resources
      • Resource Decay
      • Service Classification
      • Cost Transparency
      • Consumption Based Pricing
      • Security and Identity
      • Multitenancy
    • Patterns
      • Resource Pooling
        • Service Management Partitions
        • Systems Management Partitions
        • Capacity Management Partitions
      • Physical Fault Domain
      • Upgrade Domain
      • Reserve Capacity
      • Scale Unit
      • Capacity Plan
      • Health Model
      • Service Class
      • Cost Model

    This Reference Architecture link opens a list of all wiki pages in the series to date.

    <Return to section navigation list>

    Cloud Security and Governance

    David Navetta (@DavidNavetta) asked Cyber Insurance: An Efficient Way to Manage Security and Privacy Risk in the Cloud? in a 2/1/2012 post to the InfoLaw Group blog:

    As organizations of all stripes increasingly rely on cloud computing services to conduct their business (with many organizations entering into cloud computing arrangements with multiple cloud providers), the need to balance the benefits and risks of cloud computing is more important than ever. This is especially true when it comes to data security and privacy risks. Cloud providers are sitting on reams of data from thousands of customers, including sensitive information such as personal information, trade secrets, and confidential and proprietary information. To criminals, Cloud providers are prime targets.

    At the same time, based in large part on the amount of risk aggregated by Cloud providers, most Cloud customers are unable to secure favorable contract terms when it comes to data security and privacy. While customers may enjoy some short-term cost benefits by going into the Cloud, they may be retaining more risk than they want (especially where Cloud providers refuse to accept that risk contractually). In short, the players in this industry are at an impasse. Cyber insurance may help solve the problem.

    A Short History of Cyber Insurance Coverage*

    *This section ended up longer than I anticipated. If you already have a base knowledge of cyber coverage or don’t want to bother with some historical background, please skip ahead to this section: "Where Privacy and Security Risk Breaks Down in Cloud Computing Contracts"

    In the early 2000s, just around the “DotCom Bust”, some insurers began developing a product designed to address the financial loss that might arise out of a data breach. This was a time when most “brick and mortar” companies were just beginning to leverage the economic potential of the Internet. At that time, insurers wanted to target the big “dotcom” companies like Amazon, Yahoo, eBay, Google, etc., and other companies pioneering e-commerce and online retailing. At some point, somebody dubbed this type of insurance “cyber insurance.”

    The early cyber policies included liability and property components. The liability coverages addressed claim expenses and liability arising out of a security breach of the insured’s computer systems (some early policies only covered “technical” security breaches, as opposed to policy violation-based security breaches). The property-related components covered business interruption and data asset loss/damage arising out of a data breach (during the holiday season many online retailers suddenly developed a taste for business interruption coverage after realizing just how negatively their business would be impacted by a denial of service attack). Additional first-party coverages included cyber-extortion coverage and crisis management/PR coverage.

    Unfortunately for the carriers, it was not easy to get people to understand the need for this coverage (and that is still a challenge today, but certainly a lesser challenge with all of the security and privacy news constantly streaming). Early on there were very few lawsuits and regulators were just beginning to consider enforcement of relatively new statutes like GLB and HIPAA.

    Two things changed that made cyber insurance much more relevant. One was a rather sudden event, and the other more gradual.

    First, in 2003, California passed SB1386, the world’s first breach notification law. The reality then (as now) is that companies suffer security breaches each and every day. Prior to SB1386, however, breaches of personal information simply went unreported. With SB1386 and the subsequent passage of breach notice laws in 45 other states (and now coming internationally), the risk profile changed for data breaches. Instead of burying the breaches, companies were required to incur significant direct expenses to investigate security breaches and comply with applicable breach notice laws, including the offering of credit monitoring to affected individuals (which is not legally required by existing breach notice laws, but is optionally provided by many companies or "suggested" by state regulators). As a result, the plaintiffs’ bar now had notice of security breaches and began filing class action lawsuits after big breaches (usually involving high-profile brand name organizations). As such, cyber insurance coverage went from coverage addressing a hypothetical risk of future lawsuits, to a coverage addressing real-life risk (and now we have lawsuits getting deeper into litigation and public settlements of these types of cases). Moreover, shortly after the passage of SB 1386 many cyber insurance policies began covering the direct costs associated with complying with breach notification laws, including attorney fees, forensic investigation expenses, printing and mailing costs, credit monitoring expenses and call center expenses. Breach notification costs are direct and almost unavoidable after a personal information breach. Regardless of lawsuit activity, a direct financial rationale for cyber insurance coverage now existed.

    The other change, which occurred more gradually over time but has had a significant impact on the frequency and magnitude of data breaches, was the rise of organized crime. In the early 2000s hacking was more of an exercise in annoyance or a means of bragging. Hackers at that time wanted their exploits talked about and known. They wanted credit for hacking into or bringing down a sophisticated company (or better yet a division of the Federal Government or military). As such, when an attack happened it was discovered and remediated, and that would be the end of it.

    True criminals, of course, are less interested in such notoriety. In fact, when trying to steal thousands or millions of records to commit identity theft or credit card fraud, it is much better NOT to be detected. Lingering on a company’s network taking information for months or years is a much more profitable endeavor. Recognizing that this type of crime is low risk (it can be performed from thousands of miles away in Eastern Europe with almost no chance of getting caught) and high reward, organized crime flooded into the space. And in this context the word “organized” is truly appropriate – these enterprises retain very smart IT-oriented people who use every tool possible to scale and automate their crimes. They leverage the communication tools on the Internet to fence their “goods,” creating, for example, wholesale and retail markets for credit cards, or “eBay”-like auction sites to hawk their illicit wares (e.g. valuable information). The change in orientation described above has essentially resulted in a relentless 24/7/365 crime machine constantly attacking and looking for new ways to attack, and always seeming to be one step ahead of those seeking to stop them. That is why we read about security and privacy breaches practically every day in the newspaper.

    Fast-forward to the present. Cyber insurance is a much more established market with more carriers entering on a regular basis. There are primary and excess markets available for big risks, and companies of all sizes are looking at cyber more as a mandatory purchase rather than discretionary. As the world continues to change at seemingly light-speed and cyber risks increase (with the advent of hacktivism, social media and the consumerization of IT/BYOD), the need for cyber is also growing. With competition pushing cyber insurance prices down, and significant security and privacy risk being retained by organizations, risk transfer is becoming very attractive (and from an overall big-picture systemic point of view, spreading risk is also attractive). Another area where cyber may help smooth out security and privacy risk is with cloud computing.

    Where Privacy and Security Risk Breaks Down in Cloud Computing Contracts

    As we have written extensively in the past, Cloud computing raises significant privacy and security risks that are often difficult to hammer out in a Cloud computing negotiation (to the extent a Cloud customer gets a chance to negotiate at all). The net result of these contract negotiation difficulties, and of many Cloud providers’ unwillingness to take on meaningful risk contractually, is that the risk is retained solely by the Cloud customer. The following examples outline the privacy and security-related Cloud issues that affect the Cloud customer's risk:

    • a Cloud provider failing to maintain reasonable security to prevent data breaches;
    • a Cloud provider failing to comply with privacy and security laws applicable to the Cloud customer;
    • a Cloud provider refusing to allow a Cloud customer to conduct its own independent forensic investigation of a data breach suffered by a Cloud provider;
    • potential conflicts of interest with respect to a Cloud provider’s handling of a data breach that may have been the fault of the Cloud provider, including failing to cooperate with its Cloud customers if that cooperation could adversely impact the Cloud provider;
    • the Cloud customer’s potential obligation to comply with breach notice laws, including absorbing expenses for legal fees, forensic investigators, printing and mailing, credit monitoring and maintaining a call center;
    • lawsuits and regulatory actions against the Cloud customer because of Cloud provider security and privacy breaches, and the legal fees, judgments, fines, penalties and settlement costs associated with them; and
    • Cloud providers seeking to leverage and data mine Cloud customer information being processed in the Cloud.

    The justification used by Cloud providers to avoid responsibility for these risks and the costs associated with them is essentially risk aggregation. Cloud providers maintain that, because they serve hundreds or thousands of customers on shared computing resources, a single attack could expose Cloud providers to liability from all of those customers at the same time. In fact, we already have one example involving a business interruption of a Cloud provider that demonstrates how multiple customers can be affected by a security breach. They also claim that independent forensic investigations by customers in the wake of a data breach are not possible because they cannot accommodate multiple customers at one time, and that even if they could, a forensic assessment would essentially expose each Cloud customer’s data to every Cloud customer conducting such an investigation.

    Cyber Insurance: Addressing Retained Risk in the Cloud

    So how does cyber insurance fit into this picture? As it currently stands, cyber insurance can be a very valuable tool for Cloud customers who are not able to get their providers to contractually take financial responsibility for security and privacy risk. Most cyber insurance policies cover data security and privacy breaches of not only the computer networks directly under the control of the insured, but also those computer networks operated by third parties for or on behalf of the insured. What this means in the Cloud context is that most cyber insurance policies may cover data breaches of the Cloud provider’s systems where the Cloud customer's/insured's data is stored and processed on those systems. This coverage will typically include most of the expenses listed above, including those direct expenses to comply with breach notice laws and costs to defend lawsuits and regulatory actions arising out of Cloud provider data breaches. As such, in the event a Cloud customer cannot get reasonable contract terms, assuming it has purchased the correct cyber coverage, it will have a fallback risk transfer and will not be retaining that risk solely on its own.

    Is there a catch? Not currently, except of course for the premium that must be paid and the fact that most cyber insurance policies have a self-insured retention that must be satisfied by the insured before the carrier is required to pay. However, longer-term problems may arise for the carriers.
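The effect of a self-insured retention on who actually pays can be shown with a little arithmetic. The Python sketch below assumes a deliberately simplified model: the insured absorbs each covered loss up to the retention, and the carrier pays the excess, capped at a policy limit if one is given. The policy limit, the function names and the dollar figures are hypothetical illustrations, not terms from any actual policy.

```python
from typing import Optional


def carrier_payment(covered_loss: float, retention: float,
                    limit: Optional[float] = None) -> float:
    """Carrier's share of one covered loss under a simplified SIR model.

    The insured absorbs everything up to the self-insured retention; the
    carrier pays the excess, capped at `limit` when one is given. The cap is
    an assumption of this sketch, not a term taken from the article.
    """
    if covered_loss < 0 or retention < 0:
        raise ValueError("amounts must be non-negative")
    excess = max(covered_loss - retention, 0.0)
    return excess if limit is None else min(excess, limit)


def insured_share(covered_loss: float, retention: float,
                  limit: Optional[float] = None) -> float:
    """Whatever the carrier does not pay stays with the insured."""
    return covered_loss - carrier_payment(covered_loss, retention, limit)


if __name__ == "__main__":
    # Hypothetical figures: a $100,000 retention and a $1,000,000 limit.
    for loss in (50_000, 500_000, 2_000_000):
        print(loss, carrier_payment(loss, 100_000, 1_000_000),
              insured_share(loss, 100_000, 1_000_000))
```

With these hypothetical figures, a $500,000 breach costs the insured $100,000 and the carrier $400,000, while a $2,000,000 breach leaves $1,000,000 with the insured, so both the retention and any limit matter when quantifying retained risk.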

    At this point, whether they like it or not, carriers whose cyber insurance policies cover security and privacy breaches of third-party service providers are already beginning to aggregate their risk when it comes to Cloud providers. Imagine a world with a relatively small number of Cloud providers serving a much larger customer base (to some degree we may already live in such a world considering the dominance of Google, Amazon, Rackspace and other big cloud players). Many insureds/Cloud customers are going to be dealing with this relatively small number of Cloud providers. For example, I am sure that most cyber insurance companies, if they were to check their books, would find that many of their insureds already use the same Cloud providers and/or other third-party service providers to store and process the insureds’ data. Further consolidation of Cloud providers, should it occur, will only increase the aggregation of risk.

    However, if cyber insurance becomes more widely adopted, the aggregation risk may be manageable. The entire purpose of insurance is to spread the risk across a wide community of insureds, and by doing so hopefully individual insureds that experience a breach are not catastrophically impacted. At the same time, carriers can build reserves and achieve reasonable profits. The long-term question is whether there are enough insureds purchasing cyber insurance to spread the risk and allow for the building of reserves to cover a breach of a major cloud provider that impacts a wide audience of insureds.

    We probably are not there yet, and unless demand increases, we may not get there. One thing that may happen, perhaps, is a push from the Cloud provider/customer community to somehow make cyber insurance more of a mandatory condition of doing business in the Cloud. Time will tell whether the cyber insurers view this aggregation issue as serious, and whether they will take steps to mitigate it (hopefully those steps will not involve narrowing the coverage). In the meantime, companies that are going deep into the Cloud should quantify the risk they are retaining and seriously consider cyber insurance coverage. The price may be right, and the peace of mind priceless.

    <Return to section navigation list>

    Cloud Computing Events

    Robert Cathey (@robertcathey) posted An Announcement, a Party, and Six Talks at Cloud Connect to the Cloudscaling blog on 2/2/2012:

    The week of February 13 is shaping up to be a milestone in the life of our company. There’s a lot going on, and we’d love for you to join us.

    Monday, February 13
    We’ve got a couple of big announcements on tap. The first is related to the next stage of our growth as the leading provider of open cloud solutions to support next-generation applications. The second is a large reference customer who is providing yet another proof point for the Cloudscaling approach to cloud building. Get all the details on Monday.

    Also on Monday, Randy Bias and David Bernstein will be lending their expertise to the Carrier Cloud Forum at Cloud Connect in Santa Clara. First, Randy will co-present on the main stage with one of our reference clients about a new open cloud deployment. Later that day, Randy and David will participate on panels featuring current cloud use cases and choosing cloud models that fit the unique requirements of carriers and service providers.

    Tuesday, February 14
    We know, it’s Valentine’s Day. But, before you go out on your date, come hear Randy talk about the love of Cloudscaling’s life: Open and Scalable Clouds. He’ll give his talk on the main stage of David Linthicum’s Architecture track. Randy will highlight the factors driving adoption of open clouds. He’ll examine the role open clouds play in supporting next-generation applications that leverage the power of web technologies, mobile computing and big data processing to drive new business initiatives. And you’ll still have time to pick up some candy and flowers afterwards.

    Wednesday, February 15
    Over in Scott Bils’ Organizational Readiness track, Cloudscaler Francesco Paola will give a lightning talk along with Toby Ford of AT&T. They’ll offer practical examples of organizational readiness issues in large, open cloud deployments. Cesco will talk about KT, and Toby about AT&T. They’re sharing the stage with Simon Wardley, a Cloudscaling advisor and prolific writer on the topic of cloud disruption, both in terms of business models and organizational preparedness. Later in that same track, Cesco will join a panel discussing how to train for, hire, and incentivize IT professionals with the right cloud skills.

    On Wednesday evening, we’re having a cocktail party to celebrate the big week. Join us from 6:00 until 8:00 pm in the Magnolia room at the Hyatt Santa Clara (attached to the Santa Clara Convention Center where Cloud Connect is taking place). We’ll have food, drinks, music and conversation about open clouds and what we’re hearing at Cloud Connect. (Check out the invitation to get a sneak preview of our new branding.)

    We’d like to thank our partner, Cloud Technology Partners (cloudTP), for co-sponsoring the event with us. They’ve been an important team member on several projects with us, and we’re thrilled they’re joining us.

    One more thing… we’re hearing that there might be an OpenStack event later that evening. If it comes through, we’ll be sure to let you know about that as well.

    Registering for Cloud Connect
    If you’re coming to the Carrier Cloud Forum or Cloud Connect, register here and use the discount code CLOUDSALING to take 25% off the day-of admission price.

    We hope you can join us for some or all of what will be a milestone week in our company’s young history.

    • Eric Nelson (@ericnel) reported Slides, Notes and Recordings for Week 1 and 2 of #6weeksazure are all now available on 2/2/2012:

    Head over to and click on Week 1 and Week 2 to get details.


    It is not too late to join us: there are, after all, still four more weeks to go, and we have recordings of the first two weeks.

    Sign up today!

    Note: This is primarily targeting UK companies – but we do not turn anyone away.

    David Pallmann described Upcoming February Sessions on Web and Cloud at Portland Cloud Intelligence and CloudFest Denver in a 2/1/2012 post:

    Here are some talks that my Neudesic colleagues and I are giving on web, cloud, and programming at Portland Cloud Intelligence (Feb 2) and CloudFest Denver (Feb 9):

    Development in the Cloud
    Stuart Celarier
    Portland Cloud Intelligence Thu Feb 2 1:00p
    View Walk-through
    What’s it like to develop for the cloud? In this session, you’ll see what the development experience is like for Windows Azure. We’ll start with “Hello, Cloud” in Visual Studio, running it first in the local simulation environment and then deploying to a data center in the cloud. We’ll progressively add more features to the application to illustrate the use of different cloud services. Both the local simulation environment and deployment to a cloud data center will be shown. We’ll also discuss the software development lifecycle and share best practices for cloud development. You’ll leave with an understanding of the differences and similarities between cloud development and enterprise development.

    Keeping an Eye on the Cloud: Azure Diagnostics
    Mike Erickson (@mgerickson)
    CloudFest Denver Thu Feb 9 4:10p
    When an application is running in Windows Azure, it can be difficult to know what is happening. In this session, we will cover the techniques you can implement to gather diagnostic information from the instances running in Azure. We will discuss how you can instrument your code to help you understand what is happening while it is running. We will also look at how you can connect to instances that are running in Azure.

    When Worlds Collide: HTML5 Meets the Cloud
    David Pallmann (@davidpallmann)
    Portland Cloud Intelligence Thu Feb 2 3:30p
    CloudFest Denver Thu Feb 9 1:20p
    View Presentation
    What does HTML5 mean for cloud computing, and vice-versa? When two revolutions are happening simultaneously you can expect interesting synergies and interactions. Even as cloud computing and social networks are profoundly transforming the back end, HTML5 and mobile devices are profoundly transforming the front end. When these worlds collide we can achieve something truly remarkable: rich, take-anywhere immersive experiences backed by on-demand, elastic services running at global scale. Both worlds are driving changes in the way we design software, and good solution architecture demands they be considered jointly. In this session you'll see how HTML5 and the cloud combine functionally and architecturally, illustrated by example.


    Other Cloud Computing Platforms and Services

    The HP Cloud Services Team announced on 2/2/2012 the Rollout of a New HP Cloud Identity Service on 2/7/2012:

    We’re excited to inform you of some important changes we are making to HP Cloud Services. On February 7th, we plan to roll out our new HP Cloud Identity Service. As with all maintenance and status information, we will post the exact timeframe and impact at

    The new Identity Service delivers consistent identity management and authentication processes across all HP Cloud Services in addition to providing an interface that is compatible with the OpenStack™ Keystone APIs. The Identity Service is an important enhancement because it will serve as a foundational layer that we’ll use to build and grow capabilities for all HP Cloud Services offerings.
    The HP Cloud Identity Service will make accessing your HP Cloud Services more efficient. With the HP Cloud Identity Service, you’ll have the ability to:

    • Access HP Cloud Compute, HP Cloud Object Storage, and all future services with one set of credentials instead of unique credentials for each service
    • Use a single API endpoint for token management and user authentication instead of individual endpoints for each service
    • Combine various Compute and Object Storage resources into a tenant or group of resources, allowing for easier resource management
    • Access a catalog of the HP Cloud Services available to you at any time
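    Since the announcement describes the Identity Service as compatible with the OpenStack Keystone APIs, here is a sketch of what "one set of credentials, one endpoint, one catalog" looks like in Keystone v2.0 terms: a JSON body for `POST /v2.0/tokens` that names a tenant, and a helper that reads the `serviceCatalog` from the reply. It assumes the Keystone v2.0 request and response shapes; the endpoint URL, credentials, tenant name and sample catalog are all hypothetical, since HP's actual values are not given here.

```python
import json

# HYPOTHETICAL endpoint: the announcement does not give HP's real URL.
IDENTITY_ENDPOINT = "https://identity.example.com/v2.0/tokens"


def build_token_request(username: str, password: str, tenant_name: str) -> str:
    """Return a Keystone v2.0-style JSON body for POST /v2.0/tokens.

    One set of credentials plus a tenant (a group of resources) yields a single
    token usable across every service listed in the catalog.
    """
    body = {
        "auth": {
            "passwordCredentials": {"username": username, "password": password},
            "tenantName": tenant_name,
        }
    }
    return json.dumps(body)


def endpoints_by_type(token_response: dict, interface: str = "publicURL") -> dict:
    """Map each service type in the returned catalog to its first endpoint URL."""
    catalog = token_response["access"]["serviceCatalog"]
    return {
        service["type"]: service["endpoints"][0][interface]
        for service in catalog
        if service.get("endpoints")  # skip services that list no endpoints
    }


if __name__ == "__main__":
    print(f"POST {IDENTITY_ENDPOINT}")
    print(build_token_request("alice", "s3cret", "alice-project"))
    # A trimmed, made-up reply shaped like a Keystone v2.0 token response.
    sample = {
        "access": {
            "token": {"id": "abc123"},
            "serviceCatalog": [
                {"type": "compute",
                 "endpoints": [{"publicURL": "https://compute.example.com/v1.1/123"}]},
                {"type": "object-store",
                 "endpoints": [{"publicURL": "https://objects.example.com/v1.0/123"}]},
            ],
        }
    }
    print(sample["access"]["token"]["id"])
    print(endpoints_by_type(sample))
```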

    What You Will Need to Do

    • If you are interacting directly with the API for HP Cloud Compute and HP Cloud Object Storage, you must change the authentication endpoint anywhere you have explicitly used it, including configuration files.
    • If you are using a CLI or a language binding to interact with HP Cloud Services, you will need to upgrade to the latest versions. Updated tools will be available on when the Identity Service is released.
    • Any third-party tools (e.g. Cyberduck, Euca2ools) will need to be configured to use the new authentication process.

    Our intro guide and step-by-step instructions further detail the new Identity Service and authentication steps for both HP Cloud Compute and HP Cloud Object Storage. If you have questions, don’t hesitate to engage live customer support, available Monday-Friday, 7am-7pm CST, by online chat, submitting a request at, or email.

    We think you’ll find that the HP Cloud Identity Service provides a more seamless experience with HP Cloud Services, and we look forward to sharing news of future improvements with you.

    The HP Cloud Services Team

    I have an account for testing the HP Cloud Services beta version. For more information about HP Cloud Services, see my A First Look at HP Cloud Services post of 12/7/2011.

    • David Linthicum (@DavidLinthicum) asserted “Amazon’s cloud services are quickly becoming the only cloud platform that businesses consider” in a deck for his Amazon Web Services: The new Microsoft Windows article of 2/2/2012 for InfoWorld’s Cloud Computing blog:

    The news hit this week: "As of the end of 2011, there are 762 billion objects in Amazon S3 (Simple Storage Service). We process over 500,000 requests per second for these objects at peak times," Amazon blogged. This represents an annual growth rate of 192 percent.

    Pretty good for a cloud company still growing a retail business. The growth can be attributed to Amazon Web Services' focus on building a successful cloud storage offering and the rapid adoption of AWS -- including the S3 cloud storage service -- as the cloud computing standard.

    More enterprises are seeking storage in the cloud as they rapidly fill up their on-premise storage and as the cloud storage providers become more secure, reliable, and cost competitive. AWS is typically first on business buyers' lists. In fact, I'm finding that more and more ask only for AWS; others are not even considered.

    As cloud computing continues its rapid growth, we could face the reality that we have a single cloud computing provider that's far ahead of the competition. Indeed, AWS -- all its components, not just S3 -- is becoming to the enterprise what Windows was to the desktop in the 1990s. Amazon could have a virtual lock on the market, with better third-party support, and become the de facto standard. Maybe it already is.

    Of course, the market has not yet played out. It's only beginning to grow into the hype of the last few years. I suspect that in the short term, AWS's success will benefit other cloud providers, as it validates the cloud as a viable option for business computing and as the overall cloud market continues to define itself.

    But in a few years, I do see a time when just a few providers will dominate the market. The investment in infrastructure it takes to operate a cloud computing service makes that inevitable. And there's no question that AWS will lead that pack, like Microsoft has in operating systems and office productivity tools and SAP has in ERP.

    Personally, I would rather see 100 cloud computing providers than just three or four -- that just seems healthier. But IT historically prefers a dominant provider for compatibility, integration, and other standardization reasons. And it now seems to be making a similar choice in the cloud.

    David doesn’t limit his judgment of Amazon’s triumph to IaaS, although I believe he should. The PaaS market leader has yet to be determined.

    Jeff Barr (@jeffbarr) posted New Elastic MapReduce Features: Metrics, Updates, VPC, and Cluster Compute Support (Guest Post) by Adam Gray on 1/31/2012:

    Today's guest blogger is Adam Gray. Adam is a Product Manager on the Elastic MapReduce Team.

    We’re always excited when we can bring features to our customers that make it easier for them to derive value from their data—so it’s been a fun month for the EMR team. Here is a sampling of the things we’ve been working on.

    Free CloudWatch Metrics
    Starting today, customers can view graphs of 23 job flow metrics within the EMR Console by selecting the Monitoring tab in the Job Flow Details page. These metrics are pushed to CloudWatch every five minutes at no cost to you and include information on:

    • Job flow progress including metrics on the number of map and reduce tasks running and remaining in your job flow and the number of bytes read and written to S3 and HDFS.
    • Job flow contention including metrics on HDFS utilization, map and reduce slots open, jobs running, and the ratio between map tasks remaining and map slots.
    • Job flow health including metrics on whether your job flow is idle, if there are missing data blocks, and if there are any dead nodes.

    Please watch this video to see how to view CloudWatch graphs in the EMR Console:

    You can also learn more from the Viewing CloudWatch Metrics section of the EMR Developer Guide.

    You can view the new metrics in the AWS Management Console:

    Further, through the CloudWatch Console, API, or SDK you can set alarms to be notified via SNS if any of these metrics go outside of specified thresholds. For example, you can receive an email notification whenever a job flow is idle for more than 30 minutes, HDFS Utilization goes above 80%, or there are five times as many remaining map tasks as there are map slots, indicating that you may want to expand your cluster size.
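    The three example alarms above can be expressed as parameter sets for the `put_metric_alarm` call in the AWS SDK for Python (boto3). This is a sketch, not the EMR team's configuration: it assumes the `AWS/ElasticMapReduce` namespace, a `JobFlowId` dimension, and the metric names `IsIdle`, `HDFSUtilization` and `RemainingMapTasksPerSlot`; the job flow ID and SNS topic ARN are placeholders. Building the dictionaries is kept separate from the AWS call so the thresholds can be checked without credentials.

```python
# Sketch: alarm definitions for an EMR job flow, shaped as keyword arguments
# for boto3's CloudWatch client put_metric_alarm(). Metric names are assumptions.
NAMESPACE = "AWS/ElasticMapReduce"
PERIOD_SECONDS = 300  # EMR pushes metrics to CloudWatch every five minutes


def emr_alarm_specs(job_flow_id: str, sns_topic_arn: str) -> list:
    common = {
        "Namespace": NAMESPACE,
        "Dimensions": [{"Name": "JobFlowId", "Value": job_flow_id}],
        "Period": PERIOD_SECONDS,
        "Statistic": "Average",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }
    return [
        # Idle for 30 minutes: six consecutive 5-minute periods with IsIdle >= 1.
        {**common, "AlarmName": f"{job_flow_id}-idle-30min",
         "MetricName": "IsIdle", "Threshold": 1.0,
         "ComparisonOperator": "GreaterThanOrEqualToThreshold",
         "EvaluationPeriods": 6},
        # HDFS utilization above 80 percent.
        {**common, "AlarmName": f"{job_flow_id}-hdfs-over-80",
         "MetricName": "HDFSUtilization", "Threshold": 80.0,
         "ComparisonOperator": "GreaterThanThreshold",
         "EvaluationPeriods": 1},
        # Five or more remaining map tasks per map slot: consider a bigger cluster.
        {**common, "AlarmName": f"{job_flow_id}-map-backlog",
         "MetricName": "RemainingMapTasksPerSlot", "Threshold": 5.0,
         "ComparisonOperator": "GreaterThanOrEqualToThreshold",
         "EvaluationPeriods": 1},
    ]


if __name__ == "__main__":
    import boto3  # needs AWS credentials; only required to create the alarms

    cloudwatch = boto3.client("cloudwatch")
    topic = "arn:aws:sns:us-east-1:123456789012:emr-alerts"  # placeholder ARN
    for spec in emr_alarm_specs("j-EXAMPLE", topic):        # placeholder job flow
        cloudwatch.put_metric_alarm(**spec)
```

    Expressing "idle for 30 minutes" as six consecutive five-minute evaluation periods lines up with the five-minute interval at which EMR publishes these metrics.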

    Please watch this video to see how to set EMR alarms through the CloudWatch Console:

    Hadoop 0.20.205, Pig 0.9.1, and AMI Versioning
    EMR now supports running your job flows using Hadoop 0.20.205 and Pig 0.9.1. To simplify the upgrade process, we have also introduced the concept of AMI versions. You can now provide a specific AMI version to use at job flow launch or specify that you would like to use our “latest” AMI, ensuring that you are always using our most up-to-date features. The following AMI versions are now available:

    • Version 2.0.x: Hadoop 0.20.205, Hive 0.7.1, Pig 0.9.1, Debian 6.0.2 (Squeeze)
    • Version 1.0.x: Hadoop 0.18.3 and 0.20.2, Hive 0.5 and 0.7.1, Pig 0.3 and 0.6, Debian 5.0 (Lenny)

    You can specify an AMI version when launching a job flow in the Ruby CLI using the --ami-version argument (note that you will have to download the latest version of the Ruby CLI):

    $ ./elastic-mapreduce --create --alive --name "Test AMI Versioning" --ami-version latest --num-instances 5 --instance-type m1.small

    Please visit the AMI Versioning section of the Elastic MapReduce Developer Guide for more information.
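    The same launch can be sketched against the EMR API with boto3's `run_job_flow`, where the CLI's `--ami-version`, `--alive`, `--num-instances` and `--instance-type` options correspond to the `AmiVersion` parameter and fields of the `Instances` structure. Treat this as an assumption-laden sketch rather than a tested recipe: current EMR may reject old AMI versions, `m1.small`, or the `latest` value, and may require IAM roles that the 2012 Ruby CLI did not ask for.

```python
def job_flow_request(name: str, ami_version: str, num_instances: int,
                     instance_type: str) -> dict:
    """Build run_job_flow() keyword arguments mirroring the Ruby CLI example."""
    if num_instances < 1:
        raise ValueError("num_instances must be at least 1")
    return {
        "Name": name,
        # e.g. "2.0" for the Hadoop 0.20.205 line or "1.0" for the older AMIs
        "AmiVersion": ami_version,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": num_instances,      # like --num-instances
            "KeepJobFlowAliveWhenNoSteps": True,  # like --alive
        },
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials

    emr = boto3.client("emr")
    request = job_flow_request("Test AMI Versioning", "latest", 5, "m1.small")
    response = emr.run_job_flow(**request)
    print(response["JobFlowId"])
```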

    S3DistCp for Efficient Copy between S3 and HDFS
    We have also made available S3DistCp, an extension of the open source Apache DistCp tool for distributed data copy, that has been optimized to work with Amazon S3. Using S3DistCp, you can efficiently copy large amounts of data between Amazon S3 and HDFS on your Amazon EMR job flow or copy files between Amazon S3 buckets. During data copy you can also optimize your files for Hadoop processing. This includes modifying compression schemes, concatenating small files, and creating partitions.

    For example, you can load Amazon CloudFront logs from S3 into HDFS for processing while simultaneously changing the compression format from Gzip (the Amazon CloudFront default) to LZO and combining all the logs for a given hour into a single file. Because Hadoop jobs process a few large, LZO-compressed files more efficiently than many small, Gzip-compressed files, this can improve performance significantly.

    Please see Distributed Copy Using S3DistCp in the Amazon Elastic MapReduce documentation for more details and code examples.
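    The hourly concatenation described above relies on S3DistCp's `--groupBy` option: files whose paths match a regular expression are combined according to the value captured by the expression's parentheses. The local sketch below imitates only that grouping step in plain Python, so the regular expression can be checked before a job flow is launched. It assumes CloudFront's `<distribution-id>.<YYYY-MM-DD-HH>.<unique-id>.gz` log naming; the bucket, distribution ID and paths are hypothetical.

```python
import re
from collections import defaultdict

# Assumed CloudFront log naming: <distribution-id>.<YYYY-MM-DD-HH>.<unique-id>.gz
# The capture group (the hour) is the value files are grouped on.
HOURLY = re.compile(r".*\.(\d{4}-\d{2}-\d{2}-\d{2})\.[^.]+\.gz$")

# Hypothetical S3DistCp arguments: Gzip logs in S3 -> one LZO file per hour in HDFS.
S3DISTCP_ARGS = [
    "--src", "s3://example-bucket/cloudfront/",
    "--dest", "hdfs:///local/cloudfront/",
    "--groupBy", HOURLY.pattern,
    "--outputCodec", "lzo",
]


def group_by_hour(keys, pattern=HOURLY):
    """Bucket keys by the pattern's first capture group (the --groupBy idea)."""
    groups = defaultdict(list)
    for key in keys:
        match = pattern.match(key)
        if match:  # this sketch simply skips keys that do not match
            groups[match.group(1)].append(key)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "cloudfront/E2EXAMPLE.2012-01-31-14.a1b2c3.gz",
        "cloudfront/E2EXAMPLE.2012-01-31-14.d4e5f6.gz",
        "cloudfront/E2EXAMPLE.2012-01-31-15.0a1b2c.gz",
    ]
    for hour, files in group_by_hour(sample).items():
        print(hour, len(files))  # each hour would become one output file
```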

    cc2.8xlarge Support
    Amazon Elastic MapReduce also now supports the new Amazon EC2 Cluster Compute instance, Cluster Compute Eight Extra Large (cc2.8xlarge). Like other Cluster Compute instances, cc2.8xlarge instances are optimized for high performance computing, giving customers very high CPU capabilities and the ability to launch instances within a high bandwidth, low latency, full bisection bandwidth network. cc2.8xlarge instances provide customers with more than 2.5 times the CPU performance of the first Cluster Compute instance (cc1.4xlarge), more memory, and more local storage at a very compelling cost. Please visit the Instance Types section of the Amazon Elastic MapReduce detail page for more details.

    In addition, we are pleased to announce an 18% reduction in Amazon Elastic MapReduce pricing for cc1.4xlarge instances, dropping the total per hour cost to $1.57. Please visit the Amazon Elastic MapReduce Pricing Page for more details.

    VPC Support
    Finally, we are excited to announce support for running job flows in an Amazon Virtual Private Cloud (Amazon VPC), making it easier for customers to:

    • Process sensitive data - Launching a job flow on Amazon VPC is similar to launching the job flow on a private network and provides additional tools, such as routing tables and Network ACLs, for defining who has access to the network. If you are processing sensitive data in your job flow, you may find these additional access control tools useful.
    • Access resources on an internal network - If your data is located on a private network, it may be impractical or undesirable to regularly upload that data into AWS for import into Amazon Elastic MapReduce, either because of the volume of data or because of its sensitive nature. Now you can launch your job flow on an Amazon VPC and connect to your data center directly through a VPN connection.

    You can launch Amazon Elastic MapReduce job flows into your VPC through the Ruby CLI by using the --subnet argument and specifying the subnet address (note that you will have to download the latest version of the Ruby CLI):

    $ ./elastic-mapreduce --create --alive --subnet "subnet-identifier"

    Please visit the Running Job Flows on an Amazon VPC section in the Elastic MapReduce Developer Guide for more information.
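    With boto3, the counterpart of `--subnet` is the `Ec2SubnetId` field of the `Instances` structure passed to `run_job_flow`. The sketch below builds that structure and rejects strings that do not look like a subnet ID (`subnet-` followed by 8 or 17 hex digits), which catches the literal `"subnet-identifier"` placeholder from the CLI example. The subnet ID, instance type and count are hypothetical, and the format check is an assumption about ID shape, not an AWS validation rule.

```python
import re

# Assumed subnet ID shape: "subnet-" + 8 (older) or 17 (newer) hex digits.
SUBNET_ID = re.compile(r"^subnet-[0-9a-f]{8}([0-9a-f]{9})?$")


def vpc_instances_config(subnet_id: str, instance_type: str = "m1.small",
                         count: int = 3) -> dict:
    """Instances block for boto3's EMR run_job_flow(), placed in a VPC subnet."""
    if not SUBNET_ID.match(subnet_id):
        raise ValueError(f"not a subnet ID: {subnet_id!r}")
    return {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": count,
        "KeepJobFlowAliveWhenNoSteps": True,  # like --alive
        "Ec2SubnetId": subnet_id,              # like --subnet
    }


if __name__ == "__main__":
    import boto3  # needs AWS credentials

    emr = boto3.client("emr")
    # A real call will likely also need an AMI version or release label and IAM roles.
    emr.run_job_flow(Name="VPC job flow",
                     Instances=vpc_instances_config("subnet-1a2b3c4d"))
```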
