Friday, February 10, 2012

Windows Azure and Cloud Computing Posts for 2/9/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.


• Updated 2/12/2012 with new articles marked

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

Azure Blob, Drive, Table, Queue and Hadoop Services, Big Data

• Roope Astala posted “Cloud Numerics” Example: Analyzing Demographics Data from Windows Azure Marketplace to the Microsoft Codename “Cloud Numerics” blog on 2/7/2012 (missed when published):

Imagine your sales business is ‘booming’ in cities X, Y, and Z, and you are looking to expand. Given that demographics provide regional indicators of sales potential, how do you find other cities with similar demographics? How do you sift through large sets of data to identify new cities and new expansion opportunities?

In this application example, we use “Cloud Numerics” to analyze demographics data such as average household size and median age between different postal (ZIP) codes in the United States.

This blog post demonstrates the following process for analyzing demographics data:

  1. We go through the steps of subscribing to the dataset in the Windows Azure Marketplace.
  2. We create a Cloud Numerics C# application for reading in data and computing correlation between different ZIP codes. We partition the application into two projects: one for reading in the data, and another one for performing the computation.
  3. Finally, we step through the writing of results to Windows Azure blob storage.
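As an aside, the correlation in step 2 is ordinary Pearson correlation over the demographic feature vectors of different ZIP codes. A minimal JavaScript sketch of the measure (the function name and sample values are mine for illustration, not part of the Cloud Numerics API):

```javascript
// Pearson correlation between two equally sized numeric vectors,
// e.g. demographic features (household size, median age, income, ...)
// of two ZIP codes.
function pearson(x, y) {
    var n = x.length, i;
    var sx = 0, sy = 0;
    for (i = 0; i < n; i++) { sx += x[i]; sy += y[i]; }
    var mx = sx / n, my = sy / n;

    var num = 0, dx2 = 0, dy2 = 0;
    for (i = 0; i < n; i++) {
        var dx = x[i] - mx, dy = y[i] - my;
        num += dx * dy;   // covariance numerator
        dx2 += dx * dx;   // variance of x
        dy2 += dy * dy;   // variance of y
    }
    return num / Math.sqrt(dx2 * dy2); // in [-1, 1]
}

// Hypothetical demographic vectors for two ZIP codes.
var zipA = [2.5, 34.1, 52000];
var zipB = [2.6, 33.8, 54000];
console.log(pearson(zipA, zipB)); // close to 1: similar profiles
```

Cloud Numerics does this across all ZIP-code pairs at once on a distributed matrix, which is why the example needs multiple compute nodes.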

Before You Begin
  • Create a Windows Live ID for accessing Windows Azure Marketplace
  • Create a Windows Azure subscription for deploying and running the application (if you do not have one already)
  • Install “Cloud Numerics” on your local computer where you develop your C# applications
  • Run the example application as detailed in the “Cloud Numerics” Lab Getting Started wiki page to validate your installation

Because of the memory usage, do not attempt to run this application on your local development machine. Additionally, you will need to run it on at least two compute nodes on your Windows Azure cluster.


You specify how many compute nodes are allocated when you use the Cloud Numerics Deployment Utility to configure your Windows Azure cluster. For details, see this section in the Getting Started guide. …

Roope continues with the following step-by-step sections:

  • Step 1: Subscribe to the Demographics Dataset at Windows Azure Marketplace
  • Step 2: Set Up the Cloud Numerics Project
  • Step 3: Add Service Reference to Dataset
  • Step 4: Create a Serial Reader for Windows Azure Marketplace Data
  • Step 5: Parallelize Reader Using IParallelReader Interface
  • Step 6: Add Method for Getting Mapping from Rows to Geographies
  • Step 7: Compute Correlation Matrix Using Distributed Data
  • Step 8: Select Interesting Subset of Cities and Write the Correlations into a Blob
  • Step 9: Putting it All Together

The resulting worksheet appears as follows:


Compare the above with the data pattern from the worksheet created by the Latent Semantic Indexing sample in my Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters post of 1/28/2012:


Roope also includes copyable source code for his Demographics Program.cs and DemographicsReader Class1.cs classes at the end of his post.

• Yanpei Chen, Archana Ganapathi, Rean Griffith, and Randy Katz recently posted The Case for Evaluating MapReduce Performance Using Workload Suites to GitHub. From the Abstract and Introduction:


MapReduce systems face enormous challenges due to increasing growth, diversity, and consolidation of the data and computation involved. Provisioning, configuring, and managing large-scale MapReduce clusters require realistic, workload-specific performance insights that existing MapReduce benchmarks are ill-equipped to supply.

In this paper, we build the case for going beyond benchmarks for MapReduce performance evaluations. We analyze and compare two production MapReduce traces to develop a vocabulary for describing MapReduce workloads. We show that existing benchmarks fail to capture rich workload characteristics observed in traces, and propose a framework to synthesize and execute representative workloads. We demonstrate that performance evaluations using realistic workloads give cluster operators new ways to identify workload-specific resource bottlenecks and to make workload-specific choices of MapReduce task schedulers.

We expect that once available, workload suites would allow cluster operators to accomplish previously challenging tasks beyond what we can now imagine, thus serving as a useful tool to help design and manage MapReduce systems.


MapReduce is a popular paradigm for performing parallel computations on large data. Initially developed by large Internet enterprises, MapReduce has been adopted by diverse organizations for business-critical analysis, such as click-stream analysis, image processing, Monte Carlo simulations, and others [1]. Open-source platforms such as Hadoop have accelerated MapReduce adoption.

While the computation paradigm is conceptually simple, the logistics of provisioning and managing a MapReduce cluster are complex. Overcoming the challenges involved requires understanding the intricacies of the anticipated workload; better knowledge about the workload enables better cluster provisioning and management. For example, one must decide how many and what types of machines to provision for the cluster. This decision is most difficult for a new deployment that lacks any knowledge about workload-cluster interactions, but it needs to be revisited periodically as production workloads evolve. Second, MapReduce configuration parameters must be fine-tuned to the specific deployment, and adjusted as resources are added to or decommissioned from the cluster, and as jobs are added to or deprecated from the workload. Third, one must implement an appropriate workload management mechanism, which includes but is not limited to job scheduling, admission control, and load throttling.

Workloads can be defined by a variety of characteristics, including computation semantics (e.g., source code), data characteristics (e.g., computation input/output), and real-time job arrival patterns. Existing MapReduce benchmarks, such as Gridmix [2], [3], Pigmix [4], and Hive Benchmark [5], test MapReduce clusters with a small set of “representative” computations, sized to stress the cluster with large datasets.

While we agree this is the correct initial strategy for evaluating MapReduce performance, we believe recent technology trends warrant an advance beyond benchmarks in our understanding of workloads. We observe three such trends:

  1. Job diversity: MapReduce clusters handle an increasingly diverse mix of computations and data types [1]. The optimal workload management policy for one kind of computation and data type may conflict with that for another. No single set of “representative” jobs is actually representative of the full range of MapReduce use cases.
  2. Cluster consolidation: The economies of scale in constructing large clusters make it desirable to consolidate many MapReduce workloads onto a single cluster [6], [7]. Cluster provisioning and management mechanisms must account for the non-linear superposition of different workloads. The benchmark approach of high-intensity, short-duration measurements can no longer capture the variations in workload superposition over time.
  3. Computation volume: The computation and data sizes handled by MapReduce clusters increase exponentially [8], [9] due to new use cases and the desire to perpetually archive all data. This means that a small misunderstanding of workload characteristics can lead to large penalties.

Given these trends, it is no longer sufficient to use benchmarks for cluster provisioning and management decisions. In this paper, we build the case for doing MapReduce performance evaluations using a collection of workloads, i.e., workload suites. To this effect, our contributions are as follows:

  • Compare two production MapReduce traces to both highlight the diversity of MapReduce use cases and develop a way to describe MapReduce workloads.
  • Examine several MapReduce benchmarks and identify their shortcomings in light of the observed trace behavior.
  • Describe a methodology to synthesize representative workloads by sampling MapReduce cluster traces, and then execute the synthetic workloads with low performance overhead using existing MapReduce infrastructure.
  • Demonstrate that using workload suites gives cluster operators new capabilities by executing a particular workload to identify workload-specific provisioning bottlenecks and inform the choice of MapReduce schedulers.
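As a rough sketch of the sampling idea in the third bullet (the record fields, function name, and uniform-sampling choice here are my own illustration, not the paper's actual framework):

```javascript
// Synthesize a smaller, representative workload by uniformly sampling
// jobs from a cluster trace, then replaying them in arrival order.
// Each trace record carries hypothetical per-job fields: input size,
// shuffle size, output size, and an arrival offset in seconds.
function synthesizeWorkload(trace, sampleCount, rng) {
    var workload = [];
    for (var i = 0; i < sampleCount; i++) {
        // Uniform sampling preserves the trace's mix of job sizes in expectation.
        var job = trace[Math.floor(rng() * trace.length)];
        workload.push({
            inputBytes: job.inputBytes,
            shuffleBytes: job.shuffleBytes,
            outputBytes: job.outputBytes,
            arrivalOffsetSec: job.arrivalOffsetSec
        });
    }
    // Replay in arrival order, keeping a time-varying pattern
    // rather than a single high-intensity burst.
    workload.sort(function (a, b) { return a.arrivalOffsetSec - b.arrivalOffsetSec; });
    return workload;
}
```

Executing each synthesized job as a data-size-matched MapReduce run is what lets the suite stress the cluster the way the original workload would.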

We believe MapReduce cluster operators can use the workload suites to accomplish a variety of previously challenging tasks, beyond just the two new capabilities demonstrated here. For example, operators can anticipate the workload growth in different data or computational dimensions, and provision the added resources just in time, instead of over-provisioning with wasteful extra capacity. Operators can also select highly specific configurations optimized for different kinds of jobs within a workload, instead of having uniform configurations optimized for a “common case” that may not exist. Operators can also anticipate the impact of consolidating different workloads onto the same cluster. Using the workload description vocabulary we introduce, operators can systematically quantify the superposition of different workloads across many workload characteristics. In short, once workload suites become available, we expect cluster operators to use them to accomplish innovative tasks beyond what we can now imagine.

In the rest of the paper, we build the case for using workload suites by looking at production traces (Section II) and examining why benchmarks cannot reproduce the observed behavior (Section III). We detail our proposed workload synthesis and execution framework (Section IV), demonstrate that it executes representative workloads with low overhead, and gives cluster operators new capabilities (Section V). Lastly, we discuss opportunities and challenges for future work (Section VI). …

Project documentation and repository downloads are available here.

It appears to me that this technique would be useful for use with Apache Hadoop on Windows Azure clusters, as well as MapReduce operations in Microsoft Codename “Cloud Numerics” and Microsoft Research’s Excel Cloud Data Analytics and Project “Daytona.”

Three of the authors are from the University of California at Berkeley’s Electrical Engineering and Computer Sciences department (my Alma Mater), and one is from Splunk, Inc.

Denny Lee (@dennylee) described Connecting Power View to Hadoop on Azure–An #awesomesauce way to view Big Data in the Cloud in a 2/10/2012 post:

The post Connecting PowerPivot to Hadoop on Azure – Self Service BI to Big Data in the Cloud provided the step-by-step details on how to connect PowerPivot to your Hadoop on Azure cluster. And while this is really powerful, one of the great features of SQL Server 2012 is Power View (formerly known as Project Crescent). With Power View, the SQL Server BI stack extends the concept of Self-Service BI (PowerPivot) to Self-Service Reporting.


Above is a screenshot of the Power View Mobile Hive Sample that is built on top of the PowerPivot workbook created in the Connecting PowerPivot to Hadoop on Azure blog post. In a different medium, the steps to create a Power View report with a Hadoop on Azure source can be seen in the YouTube video below.

Power View Report to Hadoop on Azure

Avkash Chauhan (@avkashchauhan) asked How many copies of your blob is stored in Windows Azure Blob Storage? in a 2/9/2012 post:

I was recently asked how secure content stored in Windows Azure Storage is, and how many copies of it are kept, so I decided to write this article.

Technically, there are six copies of your blob content, distributed across two separate data centers within the same continent, as described below:

When you store a blob in Azure Storage, it is replicated into two more copies in the same data center. At any given time you access only the primary copy; the other two copies exist for recovery purposes. So you have three copies of your blob data in the data center you selected when you created your storage account. After geo-replication, you have three more copies of the same blob in another data center within the same continent. This way you have six copies of your data in Windows Azure, which should give you confidence that your data is safe. However, if you delete a blob, it is not possible for the Windows Azure support team to get it back: synchronization and replication are so fast that once a blob is removed from the primary location, it is instantly gone from the other location as well. So it is your responsibility to handle your blob content carefully.

When you create, update, or delete data in your storage account, the transaction is fully replicated on three different storage nodes across three fault domains and upgrade domains inside the primary location, and then success is returned to the client. Then, in the background, the primary location asynchronously replicates the recently committed transaction to the secondary location. That transaction is then made durable by fully replicating it across three different storage nodes in different fault and upgrade domains at the secondary location. Because the updates are asynchronously geo-replicated, there is no change in existing performance for your storage account.

Learn more at:

Benjamin Guinebertière (@benjguin) described Analyzing 1 TB of IIS logs with Hadoop Map/Reduce on Azure with JavaScript | Analyse d’1 To de journaux IIS avec Hadoop Map/Reduce en JavaScript in a 2/9/2012 post. From the English version:

As described in a previous post, Microsoft has ported Apache Hadoop to Windows Azure (this will also be available on Windows Server). This is available as a private community technology preview for now.

This does not use Cygwin. One of the contributions Microsoft will propose in return to the open source community is the possibility to use JavaScript.

One of the goals of Hadoop is to work on large amounts of unstructured data. In this sample, we’ll use JavaScript code to parse IIS logs and get information from authenticated sessions.

The Internet Information Services (IIS) logs come from a web farm. It may be a web farm on premises or a Web Role on Windows Azure. The logs are copied and consolidated to Windows Azure blob storage; we have a little more than 1 TB of them. Here is how this looks from Windows Azure Storage Explorer:


and from the interactive JavaScript console:



1191124656300 bytes = 1.083321564 TB

Here is what the log files look like:

#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2012-01-06 09:09:05
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2012-01-06 09:09:05 W3SVC1273337584 RD00155D360166 GET /cuisine-francaise - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - 200 0 0 5734 321 3343
2012-01-06 09:09:12 W3SVC1273337584 RD00155D360166 GET /cuisine-francaise/huitres - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - 200 0 0 4922 346 890
2012-01-06 09:09:19 W3SVC1273337584 RD00155D360166 GET /cuisine-japonaise - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 3491 544 906
2012-01-06 09:09:22 W3SVC1273337584 RD00155D360166 GET /cuisine-japonaise/assortiment-de-makis - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 3198 557 671
2012-01-06 09:09:27 W3SVC1273337584 RD00155D360166 GET /blog - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 3972 544 2406
2012-01-06 09:09:30 W3SVC1273337584 RD00155D360166 GET /blog/marmiton - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 5214 519 718
2012-01-06 09:09:49 W3SVC1273337584 RD00155D360166 GET /ustensiles - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 6897 525 2859
2012-01-06 09:22:13 W3SVC1273337584 RD00155D360166 GET /Users/Account/LogOn ReturnUrl=%2Fustensiles 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 200 0 0 3818 555 1203
2012-01-06 09:22:26 W3SVC1273337584 RD00155D360166 POST /Users/Account/LogOn ReturnUrl=%2Fustensiles 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= 302 0 0 729 961 703
2012-01-06 09:22:27 W3SVC1273337584 RD00155D360166 GET /ustensiles - 80 Test0001 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 200 0 0 7136 849 1249
2012-01-06 09:22:30 W3SVC1273337584 RD00155D360166 GET / - 80 Test0001 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 200 0 0 3926 788 1031
2012-01-06 09:22:57 W3SVC1273337584 RD00155D360166 GET /cuisine-francaise - 80 Test0001 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 200 0 0 5973 795 1093
2012-01-06 09:23:00 W3SVC1273337584 RD00155D360166 GET /cuisine-francaise/gateau-au-chocolat-et-aux-framboises - 80 Test0001 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 200 0 0 8869 849 749
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 GET / - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - - 200 0 0 3687 364 1281
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 GET /Modules/Orchard.Localization/Styles/orchard-localization-base.css - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 1148 422 749
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 GET /Themes/Classic/Styles/Site.css - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 15298 387 843
2012-01-06 09:30:51 W3SVC1273337584 RD00155D360166 GET /Themes/Classic/Styles/moduleOverrides.css - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 557 398 1468
2012-01-06 09:30:51 W3SVC1273337584 RD00155D360166 GET /Core/Shapes/scripts/html5.js - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 1804 370 1015
2012-01-06 09:30:53 W3SVC1273337584 RD00155D360166 GET /Themes/Classic/Content/current.png - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 387 376 656
2012-01-06 09:30:57 W3SVC1273337584 RD00155D360166 GET /modules/orchard.themes/Content/orchard.ico - 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - - 200 0 0 1399 346 468
2012-01-06 09:31:54 W3SVC1273337584 RD00155D360166 GET /Users/Account/LogOn ReturnUrl=%2F 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - 200 0 0 4018 435 718
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 POST /Users/Account/LogOn ReturnUrl=%2F 80 - HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o= 302 0 0 709 1083 812
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 GET / - 80 Test0001 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 200 0 0 3926 935 906
2012-01-06 09:33:22 W3SVC1273337584 RD00155D360166 GET / - 80 Test0001 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 200 0 0 3926 935 1156

By loading this sample into Excel, one can see that a session ID can be derived from the .ASPXAUTH cookie, which is one of the cookies available in the cs(Cookie) field of the IIS logs.
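As a sketch of that extraction (the function name is mine; the separator handling follows the sample lines above, where IIS encodes spaces in the cookie field as '+'):

```javascript
// Extract the .ASPXAUTH value from an IIS cs(Cookie) field.
// In IIS logs, cookies appear as "name=value;+name=value" because
// the space after each ';' is encoded as '+'.
function extractAspxAuth(cookieField) {
    var parts = cookieField.split(";");
    for (var i = 0; i < parts.length; i++) {
        var j = parts[i].indexOf("ASPXAUTH=");
        if (j >= 0) {
            return parts[i].substring(j + "ASPXAUTH=".length);
        }
    }
    return null; // anonymous request: no authentication cookie
}

var sample = "__RequestVerificationToken_Lw__=abc123;+.ASPXAUTH=D5796612E924B604";
console.log(extractAspxAuth(sample)); // "D5796612E924B604"
```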


At the end of the processing, the goal is to obtain the following result in two flat file structures.

Session headers give a summary of what happened in the session. The fields are a dummy row ID, session ID, username, start date/time, end date/time, and number of visited URLs.

134211969 19251ab2b91cb3158e21c0c74f597a9872ed257d test2272g5x467 2012-01-28 20:06:08 2012-01-28 20:32:33 11
134213036 19251cd8a444c6642bbedc1ba5d848f26ad3c789 test1268gAx168 2012-02-02 20:01:47 2012-02-02 20:25:22 13
134213561 19252827f25750af10aaf89a9de3fc35ad15d97e test1987g4x214 2012-01-27 01:00:46 2012-01-27 01:06:26 5
134214566 19252bb73667cc04e5de2a6eebe5e8ba7cc77c4a test3333g4x681 2012-01-27 20:00:03 2012-01-27 20:03:23 12
134214866 19252bf03e7d962a41fde46127810339c587b0ae test1480hFx690 2012-01-27 18:18:51 2012-01-27 18:32:51 3
134215841 19253a4d1496dfea6e264ba7839d07ebd0a9662e test2467g6x109 2012-01-29 18:02:19 2012-01-29 18:13:10 11
134216451 19253b3c19f8a0f46fd44e6f979f3e8bedda7881 test3119hLx29 2012-02-02 18:04:17 2012-02-02 18:21:31 7
134216974 19253ff8924893dd72f6453568084e53985a8817 test2382g9x8 2012-02-01 01:07:55 2012-02-01 01:26:17 5
134217496 1925418002459ad897ed41b156f0e3eab78caa13 test3854g4x823 2012-01-27 02:06:38 2012-01-27 02:27:54 5

Session details give the list of URLs that were visited in a session. The fields are a dummy row ID, session ID, hit time, and URL.

134216699 19253ff8924893dd72f6453568084e53985a8817 01:07:55 /Core/Shapes/scripts/html5.js
134216781 19253ff8924893dd72f6453568084e53985a8817 01:41:01 /Modules/Orchard.Localization/Styles/orchard-localization-base.css
134216900 19253ff8924893dd72f6453568084e53985a8817 01:25:02 /Users/Account/LogOff
134217072 1925418002459ad897ed41b156f0e3eab78caa13 02:08:01 /Modules/Orchard.Localization/Styles/orchard-localization-base.css
134217191 1925418002459ad897ed41b156f0e3eab78caa13 02:27:54 /Users/Account/LogOff
134217265 1925418002459ad897ed41b156f0e3eab78caa13 02:06:38 /
134217319 1925418002459ad897ed41b156f0e3eab78caa13 02:26:14 /Themes/Classic/Styles/moduleOverrides.css
134217414 1925418002459ad897ed41b156f0e3eab78caa13 02:17:08 /Core/Shapes/scripts/html5.js
134217596 1925420f22e51f948314b2a6fa0c53fe4d002455 19:11:29 /blog
134217654 1925420f22e51f948314b2a6fa0c53fe4d002455 19:00:21 /cuisine-francaise/barbecue

Note that the two structures could be joined through the session ID later on, with Hive for instance, but this is beyond the scope of this post. Also note that the session ID is not the exact value of the .ASPXAUTH cookie but a SHA-1 hash of it, which is shorter, in order to reduce network traffic and produce a smaller result.
Here is the code I used to do that. I may write another blog post later on to comment further on that code.


IIS logs fields
0	date			2012-01-06
1	time 			09:09:05
2	s-sitename 		W3SVC1273337584
3	s-computername 	RD00155D360166
4	s-ip
5	cs-method 		GET
6	cs-uri-stem 	/cuisine-francaise
7	cs-uri-query 	-
8	s-port 			80
9	cs-username 	-
10	c-ip
11	cs-version		HTTP/1.1
12	cs(User-Agent)	Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0)
13	cs(Cookie)		- 
14	cs(Referer)
15	cs-host
16	sc-status		200
17	sc-substatus	0
18	sc-win32-status	0
19	sc-bytes		5734
20	cs-bytes		321
21	time-taken		3343

sample lines
2012-01-06 09:09:05 W3SVC1273337584 RD00155D360166 GET /cuisine-francaise - 80 - HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - 200 0 0 5734 321 3343
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 GET / - 80 Test0001 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 200 0 0 3926 935 906
2012-01-06 09:33:22 W3SVC1273337584 RD00155D360166 GET / - 80 Test0001 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 200 0 0 3926 935 1156

A cookie with authentication looks like the sample lines above; the interesting part is the .ASPXAUTH value, which is hashed to form the session ID.


 /* the goal is to have this kind of file at the end:

fffffff0a929d9fbbbbb0b4ffa744842f9188e01	D 20:07:53 /blog
fffffff0a929d9fbbbbb0b4ffa744842f9188e01	H test2573g2x403 2012-01-25 20:07:53 2012-01-25 20:33:43 7
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:09:41 /Users/Account/LogO
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:26:12 /blog/marmiton
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:16:58 /cuisine-francaise/barbecue
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:10:00 /blog/marmiton
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:11:24 /
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:27:50 /cuisine-japonaise/assortiment-de-makis
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:29:31 /cuisine-francaise/fondue-au-fromage
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:05:19 /cuisine-japonaise
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:31:32 /cuisine-francaise/dinde
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:04:41 /cuisine-francaise/fondue-au-fromage
fffffff7e3dbde467fb4a004c31b41e5fdb49116	H test3698g4x509 2012-01-27 18:04:41 2012-01-27 18:31:32 10
*/


var map = function (key, value, context) {
    var f; // fields
    var s, sessionID, generated;

    if (value === null || value === "") {
        return; // skip empty lines
    }
    if (value.charAt(0) === "#") {
        return; // skip IIS header lines (#Software, #Version, #Date, #Fields)
    }
    f = value.split(" ");
    if (f[9] === null || f[9] === "" || f[9] === "-") {
        return; // username is anonymous, skip the log line
    }
    s = extractSessionFromCookies(f[13]);
    if (!s) {
        return; // no authentication cookie, skip the log line
    }
    sessionID = Sha1.hash(s); // hash will create a shorter key
    generated = "M " + f[9] + " " + f[0] + " " + f[1] + " " + f[6];
    context.write(sessionID, generated);

    function extractSessionFromCookies(cookies) {
        var i, j, sessionID;

        var cookieParts = cookies.split(";");
        for (i = 0; i < cookieParts.length; i++) {
            j = cookieParts[i].indexOf("ASPXAUTH=");
            if (j >= 0) {
                sessionID = cookieParts[i].substring(j + "ASPXAUTH=".length);
            }
        }
        return sessionID;
    }
};

var reduce = function (key, values, context) {
    var generated;
    var minDate = null;
    var maxDate = null;
    var username = null;
    var currentDate, currentMinDate, currentMaxDate;
    var nbUrls = 0;
    var f;
    var currentValue;
    var firstChar;

    while (values.hasNext()) {
        currentValue =;
        firstChar = currentValue.substring(0, 1);

        if (firstChar == "M") {
            // record emitted by the map function: "M username date time url"
            f = currentValue.split(" ");

            if (username === null) {
                username = f[1];
            }

            currentDate = f[2] + " " + f[3];

            if (minDate === null) {
                minDate = currentDate;
                maxDate = currentDate;
            } else {
                if (currentDate < minDate) {
                    minDate = currentDate;
                } else if (currentDate > maxDate) {
                    maxDate = currentDate;
                }
            }

            nbUrls++; // count this visited URL
            context.write(key, "D " + f[3] + " " + f[4]); // D stands for details
        } else if (firstChar == "H") {
            // partial header from a previous combine pass: aggregate it
            f = currentValue.split(" ");

            if (username === null) {
                username = f[1];
            }

            currentMinDate = f[2] + " " + f[3];
            currentMaxDate = f[4] + " " + f[5];

            if (minDate === null) {
                minDate = currentMinDate;
                maxDate = currentMaxDate;
            } else {
                if (currentMinDate < minDate) {
                    minDate = currentMinDate;
                }
                if (currentMaxDate > maxDate) {
                    maxDate = currentMaxDate;
                }
            }

            nbUrls += parseInt(f[6], 10);
        } else if (firstChar == "D") {
            context.write(key, currentValue); // pass details through unchanged
        } else {
            context.write(key, "X" + firstChar + " " + currentValue); // unexpected record
        }
    }

    generated = "H " + username + " " + minDate + " " + maxDate + " " + nbUrls.toString(); // H stands for Header
    context.write(key, generated);
};

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysis", "map", "reduce");
    job.waitForCompletion(true);
};


/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */
/*  SHA-1 implementation in JavaScript | (c) Chris Veness 2002-2010                               */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */

var Sha1 = {};  // Sha1 namespace

/**
 * Generates SHA-1 hash of string
 * @param {String} msg                String to be hashed
 * @param {Boolean} [utf8encode=true] Encode msg as UTF-8 before generating hash
 * @returns {String}                  Hash of msg as hex character string
 */
Sha1.hash = function (msg, utf8encode) {
    utf8encode = (typeof utf8encode == 'undefined') ? true : utf8encode;

    // convert string to UTF-8, as SHA only deals with byte-streams
    if (utf8encode) msg = Utf8.encode(msg);

    // constants [§4.2.1]
    var K = [0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6];


    msg += String.fromCharCode(0x80);  // add trailing '1' bit (+ 0's padding) to string [§5.1.1]

    // convert string msg into 512-bit/16-integer blocks arrays of ints [§5.2.1]
    var l = msg.length / 4 + 2;  // length (in 32-bit integers) of msg + ‘1’ + appended length
    var N = Math.ceil(l / 16);   // number of 16-integer-blocks required to hold 'l' ints
    var M = new Array(N);

    for (var i = 0; i < N; i++) {
        M[i] = new Array(16);
        for (var j = 0; j < 16; j++) {  // encode 4 chars per integer, big-endian encoding
            M[i][j] = (msg.charCodeAt(i * 64 + j * 4) << 24) | (msg.charCodeAt(i * 64 + j * 4 + 1) << 16) |
                      (msg.charCodeAt(i * 64 + j * 4 + 2) << 8) | (msg.charCodeAt(i * 64 + j * 4 + 3));
        } // note running off the end of msg is ok 'cos bitwise ops on NaN return 0
    }

    // add length (in bits) into final pair of 32-bit integers (big-endian) [§5.1.1]
    // note: most significant word would be (len-1)*8 >>> 32, but since JS converts
    // bitwise-op args to 32 bits, we need to simulate this by arithmetic operators
    M[N - 1][14] = Math.floor(((msg.length - 1) * 8) / Math.pow(2, 32));
    M[N - 1][15] = ((msg.length - 1) * 8) & 0xffffffff;

    // set initial hash value [§5.3.1]
    var H0 = 0x67452301;
    var H1 = 0xefcdab89;
    var H2 = 0x98badcfe;
    var H3 = 0x10325476;
    var H4 = 0xc3d2e1f0;

    // HASH COMPUTATION [§6.1.2]

    var W = new Array(80); var a, b, c, d, e;
    for (var i = 0; i < N; i++) {

        // 1 - prepare message schedule 'W'
        for (var t = 0; t < 16; t++) W[t] = M[i][t];
        for (var t = 16; t < 80; t++) W[t] = Sha1.ROTL(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);

        // 2 - initialise five working variables a, b, c, d, e with previous hash value
        a = H0; b = H1; c = H2; d = H3; e = H4;

        // 3 - main loop
        for (var t = 0; t < 80; t++) {
            var s = Math.floor(t / 20); // seq for blocks of 'f' functions and 'K' constants
            var T = (Sha1.ROTL(a, 5) + Sha1.f(s, b, c, d) + e + K[s] + W[t]) & 0xffffffff;
            e = d;
            d = c;
            c = Sha1.ROTL(b, 30);
            b = a;
            a = T;
        }

        // 4 - compute the new intermediate hash value
        H0 = (H0 + a) & 0xffffffff;  // note 'addition modulo 2^32'
        H1 = (H1 + b) & 0xffffffff;
        H2 = (H2 + c) & 0xffffffff;
        H3 = (H3 + d) & 0xffffffff;
        H4 = (H4 + e) & 0xffffffff;
    }

    return Sha1.toHexStr(H0) + Sha1.toHexStr(H1) +
           Sha1.toHexStr(H2) + Sha1.toHexStr(H3) + Sha1.toHexStr(H4);
};

// function 'f' [§4.1.1]
Sha1.f = function (s, x, y, z) {
    switch (s) {
        case 0: return (x & y) ^ (~x & z);           // Ch()
        case 1: return x ^ y ^ z;                    // Parity()
        case 2: return (x & y) ^ (x & z) ^ (y & z);  // Maj()
        case 3: return x ^ y ^ z;                    // Parity()
    }
};

// rotate left (circular left shift) value x by n positions [§3.2.5]
Sha1.ROTL = function (x, n) {
    return (x << n) | (x >>> (32 - n));
};

// hexadecimal representation of a number 
//   (note toString(16) is implementation-dependant, and  
//   in IE returns signed numbers when used on full words)
Sha1.toHexStr = function (n) {
    var s = "", v;
    for (var i = 7; i >= 0; i--) { v = (n >>> (i * 4)) & 0xf; s += v.toString(16); }
    return s;
};

/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */
/*  Utf8 class: encode / decode between multi-byte Unicode characters and UTF-8 multiple          */
/*              single-byte character encoding (c) Chris Veness 2002-2010                         */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */

var Utf8 = {};  // Utf8 namespace

/**
 * Encode multi-byte Unicode string into utf-8 multiple single-byte characters
 * (BMP / basic multilingual plane only)
 * Chars in range U+0080 - U+07FF are encoded in 2 chars, U+0800 - U+FFFF in 3 chars
 * @param {String} strUni Unicode string to be encoded as UTF-8
 * @returns {String} encoded string
 */
Utf8.encode = function (strUni) {
    // use regular expressions & String.replace callback function for better efficiency
    // than procedural approaches
    var strUtf = strUni.replace(
      /[\u0080-\u07ff]/g,  // U+0080 - U+07FF => 2 bytes 110yyyyy, 10zzzzzz
      function (c) {
          var cc = c.charCodeAt(0);
          return String.fromCharCode(0xc0 | cc >> 6, 0x80 | cc & 0x3f);
      }
    );
    strUtf = strUtf.replace(
      /[\u0800-\uffff]/g,  // U+0800 - U+FFFF => 3 bytes 1110xxxx, 10yyyyyy, 10zzzzzz
      function (c) {
          var cc = c.charCodeAt(0);
          return String.fromCharCode(0xe0 | cc >> 12, 0x80 | cc >> 6 & 0x3F, 0x80 | cc & 0x3f);
      }
    );
    return strUtf;
};

/**
 * Decode utf-8 encoded string back into multi-byte Unicode characters
 * @param {String} strUtf UTF-8 string to be decoded back to Unicode
 * @returns {String} decoded string
 */
Utf8.decode = function (strUtf) {
    // note: decode 3-byte chars first as decoded 2-byte strings could appear to be 3-byte char!
    var strUni = strUtf.replace(
      /[\u00e0-\u00ef][\u0080-\u00bf][\u0080-\u00bf]/g,  // 3-byte chars
      function (c) {  // (note parentheses for precedence)
          var cc = ((c.charCodeAt(0) & 0x0f) << 12) | ((c.charCodeAt(1) & 0x3f) << 6) | (c.charCodeAt(2) & 0x3f);
          return String.fromCharCode(cc);
      }
    );
    strUni = strUni.replace(
      /[\u00c0-\u00df][\u0080-\u00bf]/g,                 // 2-byte chars
      function (c) {  // (note parentheses for precedence)
          var cc = (c.charCodeAt(0) & 0x1f) << 6 | c.charCodeAt(1) & 0x3f;
          return String.fromCharCode(cc);
      }
    );
    return strUni;
};

/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */
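One detail worth calling out in the reduce function above: it compares timestamps as plain strings. That is safe only because the "yyyy-MM-dd HH:mm:ss" layout sorts lexicographically in the same order as chronologically, so `<` and `>` suffice to track minDate and maxDate without any date parsing. A quick illustration (sample values taken from the output shown below):

```javascript
// ISO-style "yyyy-MM-dd HH:mm:ss" strings sort lexicographically
// in chronological order, so plain string comparison works.
var earlier = "2012-01-25 02:00:54";
var later = "2012-01-25 02:12:37";

console.log(earlier < later); // → true
console.log(later > earlier); // → true
```

This is also why the flat-file format keeps zero-padded, fixed-width date and time fields.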

This code will produce an intermediary flat file structure that looks like this (headers are after details):

00000e399c3e94f8f919314762998b784d178bd4        D 02:14:32 /Core/Shapes/scripts/html5.js
00000e399c3e94f8f919314762998b784d178bd4        D 02:00:54 /Users/Account/LogOff
00000e399c3e94f8f919314762998b784d178bd4        D 02:09:39 /Modules/Orchard.Localization/Styles/orchard-localization-base.css
00000e399c3e94f8f919314762998b784d178bd4        D 02:13:24 /Themes/Classic/Styles/moduleOverrides.css
00000e399c3e94f8f919314762998b784d178bd4        D 02:12:37 /
00000e399c3e94f8f919314762998b784d178bd4        H test3059g2x50 2012-01-25 02:00:54 2012-01-25 02:12:37 5
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        D 00:26:22 /Users/Account/LogOff
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        D 00:24:12 /
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        H test0118g5x29 2012-01-28 00:24:12 2012-01-28 00:26:22 2
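Notice how each line tags its payload with a leading "H" (header) or "D" (detail) after the tab-separated session key; splitting on the tab and checking that first character is all the follow-up jobs need. A self-contained sketch of that dispatch (`keepType` is a hypothetical helper, and the session IDs are shortened for readability):

```javascript
// Minimal illustration of the header/detail dispatch used by the
// second-stage jobs: split on tab, inspect the payload's first character.
var lines = [
    "00000e7f\tD 00:26:22 /Users/Account/LogOff",
    "00000e7f\tD 00:24:12 /",
    "00000e7f\tH test0118g5x29 2012-01-28 00:24:12 2012-01-28 00:26:22 2"
];

function keepType(line, type) {
    var l = line.split("\t");
    if (l.length < 2) return null;             // malformed line, skip
    return l[1].substring(0, 1) === type ? line : null;
}

var headers = lines.filter(function (line) { return keepType(line, "H") !== null; });
var details = lines.filter(function (line) { return keepType(line, "D") !== null; });

console.log(headers.length); // → 1
console.log(details.length); // → 2
```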

Then, two map-only jobs can split this intermediary output into headers only and details only. Here is iisLogsAnalysisToH.js:


// V120120a

var map = function (key, value, context) {
    var generated;
    var minDate;
    var maxDate;
    var username;
    var nbUrls;
    var l, f;
    var firstChar;
    var sessionID;

    if (!value) {
        return;
    }

    l = value.split("\t");
    if (l.length < 2) {
        return;
    }

    sessionID = l[0];

    firstChar = l[1].substring(0, 1);
    if (firstChar != "H") {
        return;
    }

    f = l[1].split(" ");

    username = f[1];

    minDate = f[2] + " " + f[3];
    maxDate = f[4] + " " + f[5];

    nbUrls = f[6];

    generated = sessionID + "\t" + username + "\t" + minDate + "\t" + maxDate + "\t" + nbUrls;
    context.write(key, generated);
};

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysisToH", "map", "");
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
};

and iisLogsAnalysisToD.js:
// V120120a

var map = function (key, value, context) {
    var generated;
    var hitTime;
    var Url;
    var l, f;
    var firstChar;
    var sessionID;

    if (!value) {
        return;
    }

    l = value.split("\t");
    if (l.length < 2) {
        return;
    }

    sessionID = l[0];

    firstChar = l[1].substring(0, 1);
    if (firstChar != "D") {
        return;
    }

    f = l[1].split(" ");

    hitTime = f[1];
    Url = f[2];

    generated = sessionID + "\t" + hitTime + "\t" + Url;
    context.write(key, generated);
};

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysisToD", "map", "");
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
};

Before executing the code, one needs to provision a cluster in order to have processing power. With Windows Azure, here is how this can be done:





In order to copy the data from blob storage to the Hadoop Distributed File System (HDFS), one way is to connect through Remote Desktop to the head node and issue a distcp command. Before that, one needs to configure Windows Azure Storage (ASV) in the console.






distcp automatically generates a map-only job that copies data from one location to another in a distributed way. This job can be tracked from the standard Hadoop console:


JavaScript code must be uploaded to HDFS before being executed:



Then the JavaScript code can be executed:


This code runs within a few hours on a 1x8CPU+32x2CPU cluster.

Once it is finished, the two remaining scripts can be run in parallel (or not):



Then one gets the result in HDFS folders that can be copied back to Windows Azure blobs through distcp, or exposed as HIVE tables and retrieved through SSIS into SQL Server or SQL Azure thanks to the ODBC driver for HIVE. This may be explained in a future blog post.

Here are just the HIVE commands to view the files as tables:

CREATE EXTERNAL TABLE iisLogsHeader (rowID STRING, sessionID STRING, username STRING, startDateTime STRING, endDateTime STRING, nbUrls INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' -- the intermediate files are tab-separated
LOCATION '/user/cornac/iislogsH';

Michelle Hart reported NEW! Wiki launched for Apache Hadoop on Windows Azure on 2/8/2012:

Although Apache Hadoop on Windows Azure is currently only available via CTP, you can get a jump start on learning all about it by visiting the Apache Hadoop on Windows wiki. This wiki also covers Apache Hadoop on Windows Server.

The wiki contains overview information about Apache Hadoop, as well as information about the Hadoop offerings on Windows and related Microsoft technologies, including Windows Azure. It also provides links to more detailed technical content from various sources and in various formats: How-to topics, Code Samples, Videos, and more.

Topics:

  • Hadoop Overview
  • Hadoop on Windows Overview
  • Apache Hadoop on Windows Server
  • Apache Hadoop on Windows Azure
  • Elastic Map Reduce on Windows Azure
  • Learning Hadoop
  • Hadoop on Windows
  • Hadoop Best Practices
  • Managing Hadoop
  • Developing with Hadoop
  • Using Hadoop with other BI Technologies

Content Types:

  • How To
  • Code Examples
  • Videos

Avkash Chauhan (@avkashchauhan) described the Internals of Hadoop Pig Operators as MapReduce Job in a 2/8/2012 post:

I was recently asked to show that Pig scripts are actually MapReduce jobs, so to explain it in a very simple way I created the following example:

  1. Read a text file using Pig Script
  2. Dump the content of the file

As you can see below, when the “dump” command was used, a MapReduce job was initiated:

2012-02-09 05:19:12,777 [main] INFO org.apache.pig.Main - Logging error messages to: c:\apps\dist\pig_1328764752777.log
2012-02-09 05:19:13,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://
2012-02-09 05:19:13,652 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at:
grunt> raw = load 'avkashwordfile.txt';

grunt> dump raw;
2012-02-09 05:19:46,542 [main] INFO - Pig features used in the script: UNKNOWN
2012-02-09 05:19:46,542 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-02-09 05:19:46,761 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: raw: Store(hdfs:// - scope-1 Operator Key: scope-1)
2012-02-09 05:19:46,776 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-02-09 05:19:46,823 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-02-09 05:19:46,823 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-02-09 05:19:46,995 [main] INFO - Pig script settings are added to the job
2012-02-09 05:19:47,026 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-02-09 05:19:48,308 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-02-09 05:19:48,339 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-02-09 05:19:48,839 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-02-09 05:19:48,870 [Thread-6] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-09 05:19:48,870 [Thread-6] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-02-09 05:19:48,886 [Thread-6] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-02-09 05:19:51,183 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201202082253_0006
2012-02-09 05:19:51,183 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://10.1
2012-02-09 05:20:15,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-02-09 05:20:16,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-02-09 05:20:21,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2012-02-09 05:20:30,932 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-02-09 05:20:30,932 [main] INFO - Script Statistics:

HadoopVersion PigVersion UserId StartedAt           FinishedAt          Features
0.8.1-SNAPSHOT           avkash 2012-02-09 05:19:46 2012-02-09 05:20:30 UNKNOWN


Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201202082253_0006 1 0 12 12 12 0 0 0 raw MAP_ONLY hdfs://

Successfully read 15 records (482 bytes) from: "hdfs://"

Successfully stored 15 records (183 bytes) in: "hdfs://"

Total records written : 15
Total bytes written : 183
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:

2012-02-09 05:20:30,948 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2012-02-09 05:20:30,979 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-02-09 05:20:30,979 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

<Return to section navigation list>

SQL Azure Database, Federations and Reporting

Sahil Malik (@sahilmalik) described Two Azure Success Stories in One Day in a 2/9/2012 post to his WinSmarts blog:

I’m doing a training on Azure, and right here in the class we had two success stories.

  1. There are a couple of people in the class who run third-party software for the banking industry. They are considering moving to Azure, and one of the biggest, most important pieces to move is their database. In a matter of minutes, we were able to move their entire database, including data, to a SQL Azure database. We did so by using
  2. We have another very interesting group of individuals in the class whose aim is to build a mobile application on the Android platform that can bring data from on-premise applications easily and securely. With very little effort we were able to author a service running in the cloud and bring the data into an Android emulator. This took only about 20-30 minutes. Tomorrow, I will help them enhance this to use the Azure Service Bus so we can securely pull the data from an on-premise application and have a more or less fully functional application.

Very exciting to say the least!

Cihan Biyikoglu (@cihangirb) explained Connection Pool Fragmentation: Use Federations and you won’t need to learn about these nasty problems that come with sharding! in a 2/8/2012 post:

Sharding has been around for a while, and I have seen quite a few systems that utilize SQL Azure with 100s of databases and 100s of compute nodes; I have tweeted and written about them in the past, like this case with Flavorus:

Ticketing Company Scales to Sell 150,000 Tickets in 10 Seconds by Moving to Cloud Computing Solution

James has certainly been a great partner to work with, and he is an amazingly talented guy who can do magic to pull some amazing results together in a few weeks. You need to read the case study for full details, but basically he got Jetstream to scale to sell 150,000 tickets in roughly 10 seconds with 550 SQL Azure databases and 750 compute nodes. And he did that in about three weeks’ time, testing included!

These types of systems are now common, but the scale problems at these levels certainly get complex. One of those that hits many customers is the problem of connection multiplexing, or simply connection pool fragmentation. Bear with me, this one takes a paragraph to explain:

Imagine a sharded system with M shards, N middle-tier servers, and a max of C concurrent requests per server. The number of connections you need to establish is M*N*C. That is because every middle-tier server may have to establish C connections to every shard, given that requests come in randomly distributed across the entire site. Now plug some numbers into the formula; I’ll be conservative and say 50 shards, 75 middle-tier servers, and 10 concurrent connections. Here is what you end up with:

M * N * C = 50 * 75 * 10 = 37,500 connections


Over 37K… That is a lot of connections! Here are some other numbers;

  • Each middle-tier server ends up with M connection pools holding C connections each in the worst case. That is 500 connections from each middle-tier machine.
  • Also, every shard maintains 750 connections from the middle-tier servers in the worst case. That is a lot of connections to maintain as well. A large number of connections is bad because it can cause you to become a victim of throttling … not a good thing …

Another fact of life is that load isn’t perfectly distributed, so what happens most times is that these 500 connections in the middle-tier server don’t get used enough to stay alive. SQL Azure terminates them after a while of being idle, and you end up with many of these idle connections dead in the pool. That means the next time you hit a dead connection to a shard from that app server, you have to re-establish a cold connection from the client all the way to the database, with a full login and a whole bunch of other handshakes that cost you orders of magnitude more in latency.

This issue can be referred to as connection pool fragmentation… The Connection Pooling documentation also makes reference to the issue (search for “fragmentation”). Push M, N, or C higher and things get much worse.

Federation cures this for apps. With federations, the connection string points to the root database. Since we ‘fool’ the middle tier into connecting to the root in the connection string, M=1 and the total number of connections from all middle-tier servers to SQL Azure is only N*C = 750. Compared to over 37K, that is a huge improvement!

Not only that, but each middle-tier server has only one connection pool to maintain, with only C=10 connections… Much better than 500. Member connections can get more complicated because of pooling in the gateway tier, but it is much, much better than 750 connections per shard as well. So this makes the problem go away! This is the magic of the USE FEDERATION statement. No more connection pool fragmentation.
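The arithmetic above is easy to sanity-check. A quick sketch using the post’s own numbers (50 shards, 75 middle-tier servers, 10 concurrent requests per server):

```javascript
// Worst-case connection counts from the formulas above.
var M = 50;  // shards
var N = 75;  // middle-tier servers
var C = 10;  // max concurrent requests per server

var sharded = M * N * C;   // each server may pool C connections to every shard
var federated = N * C;     // USE FEDERATION: every server connects only to the root (M = 1)

console.log(sharded);   // → 37500
console.log(federated); // → 750
```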

Another great benefit of USE FEDERATION is that it gives you the ability to keep your connection object and rewire the connection from the gateway to the database nodes in SQL Azure without a disconnect. With sharding, since the USE statement isn’t supported yet, the only way to connect to another shard is to disconnect and reconnect. With USE FEDERATION you can simply keep issuing more USE FEDERATION statements and never have to close your connection!

If you made it this far, thanks for reading all this… If you use federations, you can forget what you read. You just have to remember that you didn’t have to learn about connection pool fragmentation or the frequent disconnects and reconnects you face with sharding.

Herve Roggero (@hroggero) described Ways To Load Data In SQL Azure in a 2/7/2012 post:

This blog post provides links to a presentation and a sample application that show how to load data into SQL Azure using different techniques and tools. The presentation compares the following techniques: INSERT, BCP, INSERT BULK, SqlBulkCopy, and a few tools like SSIS and Enzo Data Copy.

The presentation contains load tests performed from a local machine with 4 CPUs, using 8 threads (leveraging the Task Parallel Library), against a SQL Azure database using the code provided below. The test loads 40,000 records. Note, however, that the test was not conducted in a highly controlled environment, so results may vary. Still, enough differences were found to show a trend and demonstrate the speed of the SqlBulkCopy API versus other programmatic techniques.

You can download the entire presentation deck here: presentation (PDF)


Enzo Data Copy and SSIS

The presentation deck also shows that the Enzo Data Copy wizard loads data efficiently with large tables, although it performs more slowly for very small databases. The reason Enzo Data Copy is fast with larger databases is its internal chunking algorithm and highly tuned parallel insert operations tailored for SQL Azure. In addition, Enzo Data Copy is designed to retry failed operations that could be the result of network connection issues or throttling; this resilience to connection issues ensures that large databases are more likely to be transferred successfully the first time. The Enzo Data Copy tool can be found here: Enzo Data Copy

In this test, with SSIS left with its default configuration, SSIS was 25% slower than the Enzo Data Copy Wizard with 2,000,000 records to transfer. The SSIS Package created was very basic; the UseBulkInsertWhenPossible property was set to true which controls the use of the INSERT BULK command. Note that a more advanced SSIS developer will probably achieve better results by tuning the SSIS load; the comparison is not meant to conclude that SSIS is slower than the Enzo Data Copy utility; rather it is meant to show that the utility compares with SSIS in load times with larger data sets. Also note that the utility is designed to be a SQL Server Migration tool; not a full ETL product.


Note about the source code

Note that the source code is provided as-is for learning purposes. In order to use the source code you will need to change the connection string to a SQL Azure database, create a SQL Azure database if you don’t already have one, and create the necessary database objects (the T-SQL commands to run can be found in the code).

The code is designed to give you control over how many threads you want to use and the technique to load the data. For example, this command loads 40,000 records (1,000 x 40) using 8 threads in batches of 1,000: ExecuteWithSqlBulkCopy(1000, 40, 8); while this command loads 40,000 records (500 x 80) using 4 threads in batches of 500: ExecuteWithPROC(500, 80, 4);

Here is the link to the source code: source code

Here is a sample output of the code:


On the whole, George Huey’s SQL Azure Migration Wizard and SQL Azure Federations Data Migration Wizard are the best bet for uploading data with BCP from SQL Server to SQL Azure. See my Loading Big Data into Federated SQL Azure Tables with the SQL Azure Federation Data Migration Wizard v1.2 article of 1/17/2012 for more details.

Herve Roggero (@hroggero) explained How To Copy A Schema Container To Another SQL Azure Database in a 2/6/2012 post (missed when posted):

This article is written to assist SQL Azure customers in copying a SCHEMA container from one SQL Azure database to another. Schema separation (or compressed shard) is a technique used by applications that hold multiple customer “databases” inside the same physical database, separated logically by SCHEMA containers. At times it may be necessary to copy a given SCHEMA container from one database to another. This can be difficult because you need to extract and import only the data contained in the tables found in that schema.

If your SQL Azure database has multiple schema containers in it, and you would like to copy the objects under one schema to another database, you can do so by following the steps below. The article uses the Enzo Backup tool, designed to help in achieving this complex task.

Example Overview

Let’s assume you have a database with multiple SCHEMA containers and you would like to copy one of them to another database. A SCHEMA container holds tables, foreign keys, default constraints, indexes and more. In the screenshot below, the selected database has 5 schema containers: the DBO schema and 4 custom SCHEMA containers. The tool allows you to copy the objects from any schema into another database.


Backing Up Your SCHEMA Container

The first step is to backup your schema. Enzo Backup gives you the options needed to backup a single SCHEMA container at a time.

  1. If you have registered your databases previously with the backup tool, select the database from the list of databases on the left pane, right-click on the SCHEMA container (i.e. your logical database) and select Backup to Blob (or file). Otherwise click on the Backup –> To Blob Device from the menu and enter the server credentials, and specify the name of the SCHEMA to backup from the Advanced screen.
  2. Type the name of your backup device
  3. Optionally select a cloud agent if you are saving to a Blob on the Advanced tab (note: a cloud agent must be deployed separately)
  4. Click Start


Restoring Your SCHEMA Container

Once the backup is complete, you can restore the backup device to the database server of your choice. Note that you could restore to a local SQL Server database or to another SQL Azure database server.

  1. Click on Backups on the left pane to view your backup devices and find your device
  2. Right-click on your backup device (file or blob)
  3. Enter the credentials of the server you are restoring to and the name of the database
  4. If the database does not exist, select the Create If… option
  5. Optionally, if you are using a Blob device and restoring to a SQL Azure database, check the Use Cloud Agent (note: a cloud agent must be deployed separately)
  6. Click Start


If for some reason the database you are restoring to is not empty, you may see a warning indicating that the database has existing objects. Click “Yes” to continue. When the operation is complete, you can inspect your database to verify the presence of your logical database.

  • You can restore additional SCHEMA containers on an existing database. You would simply need to repeat the above steps for each SCHEMA container.
  • If your intent was to “move” the SCHEMA container, you will need to clean up the original database. Once your SCHEMA container has been copied, and you have verified all your data is present in the destination database, you will need to manually drop all the objects in the source database before dropping the SCHEMA container.
  • This tool does not support backing up SQL Server databases. However, you can restore a backup device created with Enzo Backup to a local SQL Server database if desired.

<Return to section navigation list>

MarketPlace DataMarket, Social Analytics and OData

Doug Mahugh (@dmahugh) described Open Source OData Tools for MySQL and PHP Developers in a 2/9/2012 post to the Interoperability@Microsoft blog:

To enable more interoperability scenarios, Microsoft today released two open source tools that provide support for the Open Data Protocol (OData) for PHP and MySQL developers working on any platform.

The growing popularity of OData is creating new opportunities for developers working with a wide variety of platforms and languages. An ever increasing number of data sources are being exposed as OData producers, and a variety of OData consumers can be used to query these data sources via OData’s simple REST API.

In this post, we’ll take a look at the latest releases of two open source tools that help PHP developers implement OData producer support quickly and easily on Windows and Linux platforms:

  • The OData Producer Library for PHP, an open source server library that helps PHP developers expose data sources for querying via OData. (This is essentially a PHP port of certain aspects of the OData functionality found in System.Data.Services.)
  • The OData Connector for MySQL, an open source command-line tool that generates an implementation of the OData Producer Library for PHP from a specified MySQL database.

These tools are written in platform-agnostic PHP, with no dependencies on .NET.

OData Producer Library for PHP


Last September, my colleague Claudio Caldato announced the first release of the OData Producer Library for PHP, an open source, cross-platform PHP library available on CodePlex. The library has evolved in response to community feedback, and the latest build (Version 1.1) includes performance optimizations, finer-grained control of data query behavior, and comprehensive documentation.

OData can be used with any data source described by an Entity Data Model (EDM). The structure of relational databases, XML files, spreadsheets, and many other data sources can be mapped to an EDM, and that mapping takes the form of a set of metadata to describe the entities, associations and properties of the data source. The details of EDM are beyond the scope of this blog, but if you’re curious here’s a simple example of how EDM can be used to build a conceptual model of a data source.

The OData Producer Library for PHP is essentially an open source reference implementation of OData-relevant parts of the .NET framework’s System.Data.Services namespace, allowing developers on non-.NET platforms to more easily build OData providers. To use it, you define your data source through the IDataServiceMetadataProvider (IDSMP) interface, and then you can define an associated implementation of the IDataServiceQueryProvider (IDSQP) interface to retrieve data for OData queries. If your data source contains binary objects, you can also implement the optional IDataServiceStreamProvider interface to handle streaming of blobs such as media files.

Once you’ve deployed your implementation, the flow of processing an OData client request is as follows:

  1. The OData server receives the submitted request, which includes the URI to the target resource and may also include $filter, $orderby, $expand and $skiptoken clauses to be applied to the target resource.
  2. The OData server parses and validates the headers associated with the request.
  3. The OData server parses the URI to the resource, parses the query options to check their syntax, and verifies that the current service configuration allows access to the specified resource.
  4. Once all of the above steps are completed, the OData Producer for PHP library code is ready to process the request via your custom IDataServiceQueryProvider and return the results to the client.
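As a rough illustration of steps 1 through 3, here is a minimal sketch, in Python rather than the library's actual PHP API, of splitting the resource and query options out of an OData request URI. The real producer goes much further (it parses $filter into an expression tree); this sketch stops at extracting and validating the option names:

```python
from urllib.parse import urlsplit, parse_qs

# Query options this sketch accepts (a subset of OData's system query options)
SUPPORTED_OPTIONS = {"$filter", "$orderby", "$expand", "$top", "$skip", "$skiptoken"}

def parse_odata_request(uri):
    """Split an OData request URI into the target resource and its query options."""
    parts = urlsplit(uri)
    resource = parts.path.rstrip("/").rsplit("/", 1)[-1]
    options = {name: values[0] for name, values in parse_qs(parts.query).items()}
    unknown = set(options) - SUPPORTED_OPTIONS
    if unknown:
        raise ValueError("unsupported query options: %s" % sorted(unknown))
    return resource, options

resource, options = parse_odata_request(
    "http://example.com/service.svc/Customers?$filter=City eq 'Seattle'&$top=10")
print(resource, options)  # the producer would then build a query from these pieces
```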

These processing steps are the same in .NET as they are in the OData Producer Library for PHP, but in the .NET implementation a LINQ query is generated from the parsed request. PHP doesn’t support LINQ, so the producer instead provides hooks that can be used to generate a PHP expression from the parsed expression tree. For example, in the case of a MySQL data source, a MySQL query expression would be generated.

The net result is that PHP developers can offer the same querying functionality on Linux and other platforms as a .NET developer can offer through System.Data.Services. Here are a few other details worth noting:

  • In C#/.NET, the System.Linq.Expressions namespace contains classes for building expression trees, and the OData Producer Library for PHP has its own classes for this purpose.
  • The IDSQP interface in the OData Producer Library for PHP differs slightly from .NET’s IDSQP interface (due to the lack of support for LINQ in PHP).
  • System.Data.Services uses WCF to host the OData provider service, whereas the OData Producer Library for PHP uses a web server (IIS or Apache) and urlrewrite to host the service.
  • The design of Writer (to serialize the returned query results) is the same for both .NET and PHP, allowing serialization of either .NET objects or PHP objects as Atom/JSON.

For a deeper look at some of the technical details, check out Anu Chandy’s blog post on the OData Producer Library for PHP or see the OData Producer for PHP documentation available on Codeplex.

OData Connector for MySQL

The OData Producer for PHP can be used to expose any type of data source via OData, and one of the most popular data sources for PHP developers is MySQL. A new code generator tool, the open source OData Connector for MySQL, is now available to help PHP developers implement OData producer support for MySQL databases quickly and simply.

The OData Connector for MySQL generates code to implement the interfaces necessary to create an OData feed for a MySQL database. The syntax for using the connector is simple and straightforward:

php MySQLConnector.php /db=mysqldb_name /srvc=odata_service_name /u=db_user_name /pw=db_password /h=db_host_name

The MySQLConnector generates an EDMX file containing metadata that describes the data source, and then prompts the user either to continue with code generation or to stop and manually edit the metadata before the code generation step.

EDMX is the Entity Data Model XML format, and an EDMX file contains a conceptual model, a storage model, and the mapping between those models. In order to generate an EDMX from a MySQL database, the OData Connector for MySQL needs to be able to do database schema introspection, and it does this through the Doctrine DBAL (Database Abstraction Layer). You don’t need to understand the details of EDMX in order to use the OData Connector for MySQL, but if you’re curious see the .edmx File Overview article on MSDN.

If you’re familiar with EDMX and wish to have very fine-grained control of the exposed OData feeds, you can edit the metadata as shown in the diagram, but this step is not necessary. You can also set access rights for specific entities in the DataService::InitializeService method after the code has been generated, as described below.

If you stopped the process to edit the EDMX, one additional command is needed to complete the generation of code for the interfaces used by the OData Producer Library for PHP:

php MySQLConnector.php /srvc=odata_service_name

Note that the generated code will expose all of the tables in the MySQL database as OData feeds. In a typical production scenario, however, you would probably want to fine-tune the interface code to remove entities that aren’t appropriate for OData feeds. The simplest way to do this is to use the DataServiceConfiguration object in the DataService::InitializeService method to set the access rights to NONE for any entities that should not be exposed. For example, you may be creating an OData provider for a CMS, and you don’t want to allow OData queries against the table of users, or tables that are only used for internal purposes within your CMS.
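The whitelisting idea can be sketched as follows. This is Python for illustration only; the class and method names below are hypothetical stand-ins for the producer's DataServiceConfiguration API, not its actual signatures:

```python
ALL_READ, NONE = "ALL_READ", "NONE"  # stand-ins for EntitySetRights values

class ServiceConfigSketch:
    """Toy model of setting per-entity-set access rights in InitializeService."""
    def __init__(self):
        self._rights = {}

    def set_entity_set_access_rule(self, entity_set, rights):
        self._rights[entity_set] = rights

    def exposed_sets(self, all_sets):
        # Entity sets are readable by default; anything marked NONE is never served
        return [s for s in all_sets if self._rights.get(s, ALL_READ) != NONE]

config = ServiceConfigSketch()
config.set_entity_set_access_rule("users", NONE)         # internal CMS table
config.set_entity_set_access_rule("cms_internal", NONE)  # internal-only data
print(config.exposed_sets(["posts", "comments", "users", "cms_internal"]))
# -> ['posts', 'comments']
```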

For more detailed information about working with the OData Connector for MySQL, refer to the user guide available on the project site on Codeplex.

These tools are open-source (BSD license), so you can download them and start using them immediately at no cost, on Linux, Windows, or any PHP platform. Our team will continue to work to enable more OData scenarios, and we’re always interested in your thoughts. What other tools would you like to see available for working with OData?

<Return to section navigation list>

Windows Azure Access Control, Service Bus and Workflow

• Edo van Asseldonk (@EdoVanAsseldonk) posted Azure webrole federated by ACS on 2/9/2012:

If you deploy a Web Role secured with ACS, and you have two or more instances of your Web Role, you probably have seen this error message:

Key not valid for use in specified state

The problem is that the cookie created when you log in with ACS is, by default, encrypted with the machine key. If the cookie is created by one of the instances, it cannot be read by the other instances, because their machine keys are different.
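To see why mismatched machine keys break the cookie, here is a small Python sketch. The real cookie protection is more involved than a bare HMAC, but the failure mode is the same: a cookie protected with instance A's key fails validation on instance B.

```python
import hashlib
import hmac
import os

def protect_cookie(payload, machine_key):
    """Append an HMAC computed with this instance's machine key."""
    return payload + hmac.new(machine_key, payload, hashlib.sha256).digest()

def validate_cookie(cookie, machine_key):
    """Recompute the HMAC with this instance's key and compare."""
    payload, mac = cookie[:-32], cookie[-32:]
    expected = hmac.new(machine_key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(mac, expected)

key_a, key_b = os.urandom(32), os.urandom(32)     # each instance has its own key
cookie = protect_cookie(b"session-token", key_a)  # issued by instance A
print(validate_cookie(cookie, key_a))  # True:  the same instance accepts it
print(validate_cookie(cookie, key_b))  # False: another instance rejects it
```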

There is a solution to this: encrypt the cookie using a certificate that is known to all instances.

How to
I'll show you how a self-signed certificate can be used to encrypt the ACS cookie.

Create a certificate
First, on your dev machine open IIS and create a self-signed certificate. This certificate can be used on your Azure development environment as well as on the staging and production environment.

Give rights on certificate
To be able to use it on your development machine do the following:

  1. Go to MMC to view the certificates in the Personal store of Local Computer. Right click on the created certificate and choose [All Tasks] [Manage Private Keys...].
  2. Add the username 'Network Service' with read rights. Network Service is the account under which the Azure simulation environment is executing. It now has rights to read the self-signed certificate.

Add certificate to config
Next we have to use this certificate in our Web Role. Open the ServiceDefinition.csdef file. Add the following code under the WebRole node:

<Certificates>
  <Certificate name="CookieEncryptionCertificate" storeLocation="LocalMachine" storeName="My" />
</Certificates>

Now your definition file should look something like this:

<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="AutofacTestAzure" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="[name]" vmsize="Small">
    <Certificates>
      <Certificate name="CookieEncryptionCertificate" storeLocation="LocalMachine" storeName="My" />
    </Certificates>
  </WebRole>
</ServiceDefinition>

Add the following to each .cscfg file where you want to use the certificate:

<Certificates>
  <Certificate name="CookieEncryptionCertificate" thumbprint="[thumbprint]" thumbprintAlgorithm="sha1" />
</Certificates>

You can find the [thumbprint] value by viewing the Details of the certificate in IIS. After you paste it into the XML, make sure you remove the invisible character at the start of the thumbprint value. For example, with a thumbprint value of "5a 0a e8 80 f4 ...", an invisible character sits just before the "5a"; delete it.
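If you script your configuration, you can strip the stray character programmatically. This Python sketch keeps only hexadecimal digits, which drops the spaces and any invisible Unicode mark (the `\u200e` below stands in for the invisible character pasted from the certificate dialog):

```python
import re

def clean_thumbprint(raw):
    """Keep only hex digits: drops spaces and invisible marks such as U+200E."""
    return re.sub(r"[^0-9a-fA-F]", "", raw).upper()

print(clean_thumbprint("\u200e5a 0a e8 80 f4"))  # -> 5A0AE880F4
```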

Use certificate in code

We can now make some changes in code, so the certificate will be used to encrypt the cookie.

Add this code to the Application_Start event in the global.asax:

FederatedAuthentication.ServiceConfigurationCreated += FedAuthServiceConfigCreated;

So it will look like this:

protected void Application_Start()
{
    FederatedAuthentication.ServiceConfigurationCreated += FedAuthServiceConfigCreated;
    // ...the rest of your startup code...
}

Add this method to the global.asax.cs:

private void FedAuthServiceConfigCreated(
    object sender, ServiceConfigurationCreatedEventArgs e)
{
    // Use the <serviceCertificate> to protect the cookies
    // that are sent to the client.
    var readOnlyCookieTransformList = new ReadOnlyCollection<CookieTransform>(
        new CookieTransform[]
        {
            new DeflateCookieTransform(),
            new RsaEncryptionCookieTransform(e.ServiceConfiguration.ServiceCertificate)
        });
    var sessionSecurityTokenHandler =
        new SessionSecurityTokenHandler(readOnlyCookieTransformList);
    e.ServiceConfiguration.SecurityTokenHandlers.AddOrReplace(sessionSecurityTokenHandler);
}

If you press F5 now, you'll see everything works locally.

Time to get things working in the cloud

There's only one thing we have to do to get this working in the cloud, and that's to upload the certificate to our Hosted Service.

  1. First, go to the created certificate in IIS and export it to a .pfx file.
  2. Next go to the management portal and import the certificate to the Certificates of the Hosted Service.

That's all. You can upload your Web Role to Azure and everything works.

Alik Levin (@alikl) described ASP.NET: Authentication With SWT Token Using Windows Azure ACS and WIF Custom Token Handler in a 2/7/2012 post:

This is the first part of an overall scenario that should answer the following question:

How can I flow the security context of the end user through tiers, between an ASP.NET web app and the downstream REST WCF service?


I uploaded sample code to the MSDN Code Gallery that shows how to use an SWT token issued by Windows Azure Access Control Service (ACS). The bits are here:

The plan is next to add another Visual Studio Project to the solution based on the following walkthrough:

The idea is simple: create one relying party in Windows Azure ACS and share the issued SWT token between the ASP.NET app and the REST service. The challenge is that WIF does not come with a ready-to-use SWT token implementation or SWT token handler. To solve this, the code sample implements a custom token handler, cannibalized from the following sample:


<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Traffic Manager, Connect, RDP and CDN

Mike Washam (@MWashamMS) published Windows Azure PowerShell Cmdlets 2.2.1 to CodePlex on 2/8/2012:

Recommended download: Windows Azure PowerShell Cmdlets 2.2.1 (Binary Install); application, 2927K, uploaded Wed, 45 downloads

Other available download: Windows Azure PowerShell Cmdlets 2.2.1 (Source); source code, 3045K, uploaded Wed, 13 downloads

Release Notes

  • Fixed a bug in Update-Deployment
  • Updated documentation and branding
  • Added support for Windows Azure Traffic Manager [Emphasis added]
  • Improvements in diagnostics

<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Wade Wegner (@WadeWegner) and Steve Marx (@smarx) posted Episode 70 - Windows Azure Demos with Steve Marx on 2/10/2012:

Join Wade and Steve each week as they cover the Windows Azure platform. You can follow and interact with the show at @CloudCoverShow.

In this episode, we are very sad to bid Steve Marx farewell as he looks for new challenges outside of Microsoft. Fortunately, we're able to share some of his best moments on the Cloud Cover show and review some of the best demos he's built over the years.

In the news:

You can stay in touch with Steve Marx through his blog at

David Makogon (@dmakogon) posted Windows Azure ISV Blog Series: sociobridge, by ReedRex (written by Naoki Sato, @satonaoki) on 2/10/2012:

The purpose of the Windows Azure ISV blog series is to highlight some of the accomplishments from the ISVs we’ve worked with during their Windows Azure application development and deployment. Today’s post, written by Naoki Sato, Windows Azure developer evangelist at Microsoft Japan, is about Windows Azure ISV ReedRex Co. LTD and how they’re using Windows Azure to deliver their CMS service for Facebook pages.

ReedRex recently launched a new SaaS offering called sociobridge, a content management system (CMS) service specifically for Facebook pages. sociobridge is distributed by Dentsu Razorfish, a top Japanese ad company jointly established by Dentsu and Razorfish.

Company and ad agency staff can use sociobridge to do the following:

  • Create and maintain Facebook pages
  • Schedule posts
  • Monitor walls, comments, and likes

sociobridge consists of two multi-tenant web applications written in ASP.NET MVC with the Facebook C# SDK. One Web Role runs the subscriber portal, a content management application for subscribers (customer companies and ad agencies). The other runs the application runtime, which hosts public Facebook pages for end users.

There are also Worker Role instances, which are responsible for Facebook monitoring and other background processing.

sociobridge uses Windows Azure Storage (Table and Blob) as a persistent data store, because Windows Azure Storage provides higher scalability at much lower cost.

Information about subscriptions (tenants), user accounts, scheduled posts, poll results, statistics, and more is stored in Table Storage. There are only two tables in Windows Azure Table Storage:

  • Subscription metadata
  • Subscriber data

In Table Storage, tables are partitioned to support load balancing across storage nodes. A table’s entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value.

Relational database systems, such as SQL Azure, provide transactional support, with properties often referred to as ACID:

  • Atomic: Everything in a transaction succeeds, or it is rolled back.
  • Consistent: A transaction cannot leave the database in an inconsistent state.
  • Isolated: Transactions cannot interfere with each other.
  • Durable: The results of a transaction are stored permanently.

Unlike relational database systems, Windows Azure Table Storage generally does not support multiple operations (such as Insert, Update, and Delete Entity) within a transaction. But if all entities involved in the operations have the same partition key value (and therefore belong to the same partition), you can perform these operations in a single batch update known as an “entity group transaction” or EGT. More details about EGTs may be found here.

To leverage the schema-less data model and EGTs of Table Storage, different types of data are stored in a single data table. The subscription ID is used as the partition key, so each subscription’s data is stored in a different partition, and table access is spread over many storage servers for scalability. The row key is a combination of the data type and a unique row ID.

Every table query from the subscriber portal is carefully designed to specify the partition key (subscription ID), so the query is limited to a single partition and a full table scan over many partitions never happens.

There are transactions spanning two or more types of data within a single subscription’s data, but never transactions spanning two or more subscriptions’ data. This table design therefore enables atomic transactions using EGTs, because all data in a single transaction reside in a single partition of a single table.
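The single-partition rule behind EGTs can be sketched as follows. Python is used for illustration only; the dictionaries stand in for table entities with the subscription ID as the PartitionKey and a type-prefixed RowKey, as in the design described above:

```python
def entity_group_transaction(operations):
    """Accept a batch only if every entity shares one partition key (the EGT rule)."""
    partition_keys = {op["PartitionKey"] for op in operations}
    if len(partition_keys) != 1:
        raise ValueError("EGT requires a single partition, got: %s" % sorted(partition_keys))
    return len(operations)  # all operations in the batch commit or fail together

batch = [
    {"PartitionKey": "subscription-42", "RowKey": "post|0001"},   # a scheduled post
    {"PartitionKey": "subscription-42", "RowKey": "stats|0001"},  # its statistics row
]
print(entity_group_transaction(batch))  # -> 2 (valid: one partition, one table)
```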

Currently, Windows Azure Storage Analytics provides metrics data per storage account, so with this table design it is not possible to estimate the cost of table storage per subscriber. The design was chosen anyway because of the benefits described above and the quota on the number of storage accounts.

Staff in a customer company and/or an ad agency create and maintain Facebook pages in the subscriber portal. When definitions and metadata of Facebook pages are created or modified, the actual web content of the Facebook pages is generated in advance and stored in Blob Storage.

A Subscriber may create multiple custom tabs in a single Facebook page. Each custom tab has its own Blob container, and web content of each tab is stored there. Permissions of Blob containers are set to public, because Facebook pages are inherently public.

At runtime:

  1. An end user browses a Facebook page (included in Facebook using an HTML inline frame).
  2. The web browser sends an HTTP POST request to the application runtime on the Web Role.
  3. The application runtime queries Table Storage for the Blob Storage URL.
  4. The application runtime returns the Blob Storage URL (using HTTP 302 Found).
  5. The web browser is redirected to the web content in Blob Storage.

The purpose of this architecture is to reduce the workload of the Web Role by avoiding dynamic web content generation, and to achieve much higher scalability when processing the huge number of requests from Facebook users.
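The lookup-and-redirect flow above can be sketched in a few lines. This is Python for illustration; a plain dict stands in for the Table Storage query, while the real application runtime is an ASP.NET handler:

```python
def handle_tab_request(subscription_id, tab_id, url_table):
    """Resolve the pre-generated blob URL for a tab and redirect the browser to it."""
    blob_url = url_table.get((subscription_id, tab_id))  # the Table Storage lookup
    if blob_url is None:
        return 404, None
    return 302, blob_url  # 302 Found: the Location header points at Blob Storage

url_table = {
    ("sub-1", "home"): "https://example.blob.core.windows.net/home/index.html",
}
print(handle_tab_request("sub-1", "home", url_table))     # -> (302, blob URL)
print(handle_tab_request("sub-1", "missing", url_table))  # -> (404, None)
```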

Facebook pages are required to handle HTTP POST requests, but the “Get Blob” operation in the Blob Service REST API must be an HTTP GET request, as its name suggests. Because of this, direct access to Blob Storage does not work, and the application runtime is needed to handle the HTTP POST.

The partition key for Blob Storage is the combination of container name and blob name, so access to different Facebook pages (blobs) is spread over different partitions.

According to “Windows Azure Storage Abstractions and their Scalability Targets”, the target throughput of a single Blob is up to 60 MB/sec. Performance tests revealed that the throughput of the sociobridge application runtime (on one Small instance) is about 200 requests/sec.

At a rough estimate, if only one Facebook page is hosted and its size is less than 300 KB, the application runtime becomes the bottleneck; if it is greater than 300 KB, Blob Storage becomes the bottleneck. These bottlenecks can be removed by adding more Web Role instances and by leveraging the Windows Azure CDN, respectively. Note that usually multiple Facebook pages (blobs) are hosted, so blob access will scale out.
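The 300 KB crossover point follows directly from the two numbers above, as this quick calculation shows:

```python
BLOB_THROUGHPUT_BYTES_PER_SEC = 60 * 1024 * 1024  # single-blob target: 60 MB/sec
WEB_ROLE_REQUESTS_PER_SEC = 200                   # measured on one Small instance

# Page size at which the bottleneck shifts from the Web Role to Blob Storage
crossover_kb = BLOB_THROUGHPUT_BYTES_PER_SEC / WEB_ROLE_REQUESTS_PER_SEC / 1024
print(crossover_kb)  # -> 307.2, i.e. roughly 300 KB per page
```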

sociobridge was designed to utilize the power of Windows Azure, especially Windows Azure Storage. The sophisticated table design (with the tenant ID as a partition key) and the usage of EGT (entity group transaction) will be helpful in designing your multi-tenant application!

For additional information about scalability of Windows Azure Storage, refer to these blog posts:

Special thanks to Takekazu Omi, Lead Architect of sociobridge at ReedRex, for help on this post.

Liam Cavanagh (@liamca) continued his series with What I Learned Building a Startup on Microsoft Cloud Services: Part 3 – Choosing a Hybrid of Services on 2/9/2012:

I am the founder of a startup called Cotega and also a Microsoft employee within the SQL Azure group where I work as a Program Manager. This is a series of posts where I talk about my experience building a startup outside of Microsoft. I do my best to take my Microsoft hat off and tell both the good parts and the bad parts I experienced using Azure.

In Part 2, I talked about keeping startup costs low and how I decided to use the BizSpark program, which provided me with two free Windows Azure VMs. Unfortunately, I knew that I would need at least three VMs (two to ensure constant availability of the monitoring service, plus one for the administrative web site). That would be an extra $80/month for the extra CPU, which I would prefer not to pay if I did not have to.

One solution that I came up with was to combine the web site and the monitoring service on a single Windows Azure machine. What I mean is creating a single Windows Azure project that contains both the worker role (the Cotega monitoring service) and the web role (the project that contains the MVC-based web UI), and deploying both to the same VM. I figured the UI would not be that busy most of the time, so it should not affect the resources required by the worker role to monitor users’ databases. If you are interested in how I did this, I wrote a blog post on how to combine a Worker Role and an MVC4 Web Role into a single Azure instance.

This solution seemed really promising because not only would the Cotega service be highly available (because of the two VMs), but the UI would be highly available as well.

In the end, I decided that this was not a great solution for my needs. I found that I rarely deployed updates to the monitoring service, but I was frequently deploying updates to the UI. Since it could take 5-20 minutes to deploy an update to Windows Azure, it was frustrating to get updates in the service so that I could test them. Also, I knew that when users started using the system I would not want to keep updating the Cotega monitoring service every time the UI was updated because that would require additional staging machines (meaning doubling the costs).

So I started looking around for a hosting company to host my web site. I ended up working with (who by the way has been amazing to work with and has had great support). Using I was easily able to deploy my MVC application through FTP which would allow me to see changes deployed within seconds. It was a trade-off to have a potentially reduced availability of the web UI for a lower cost, but at this point in my startup development, it was something I was willing to accept.

This is what the architecture of the service looked like at this point:

Cotega Architecture

I should also note that if you are willing to host your web UI on a separate machine, there are now multiple ways to quickly deploy updates to a Windows Azure Web Role. A lot of people also use Remote Desktop to connect to the Windows Azure machine and simply replace the DLLs, although this is only useful for testing, since a reboot of a Windows Azure machine resets the DLLs to their originally deployed state.

Mary Jo Foley (@maryjofoley) asserted “Microsoft is moving ahead with plans to host its ERP products on Azure, but isn’t rushing users to make the jump” in a deck for her Microsoft's plan to bring its ERP users slowly but surely to the cloud article of 2/9/2012 for ZDNet’s All About Microsoft blog:

Last year, Microsoft officials said the company had decided to make all four of its Dynamics ERP products available on its Windows Azure cloud, starting with Dynamics NAV.

As the annual Microsoft Convergence conference approaches, the Softies are readying a few updates about the coming ERP-cloud transition.

First, Dynamics NAV 7 (the codename of the next version of the Dynamics NAV product) will be available hosted on Windows Azure toward the end of calendar 2012. The next Microsoft ERP product to get the Azure treatment will be Dynamics GP. (Microsoft execs aren’t providing a target date at this time for the Azure-hosted GP version, though I’d think 2013 would make sense, given the cadence of the Dynamics ERP products).

Microsoft’s strategy around moving its ERP products to the cloud isn’t to force them down customers’ throats, emphasized Mike Ehrenberg, a Microsoft technical fellow and chief technology officer for Microsoft Business Solutions. (Some of Microsoft’s ERP partners beg to differ, but that’s what the Softies are saying.) …

Read more asserted “Partnership could save parents 25%” in a deck for its Microsoft, CJ Fallon introduce cloud service for textbooks article of 2/9/2011:

Microsoft Ireland has partnered with textbook publisher CJ Fallon to deliver school books via the Windows Azure Programme. By the beginning of the 2012 school year parents will be able to save up to 25% on the cost of schoolbooks through CJ Fallon's eReader platform.

Orla Sheridan, consumer channels group director, Microsoft Ireland, said: "The partnership between Microsoft and CJ Fallon is a great example of how cloud applications like Windows Azure can enable all sectors to transition to the cloud. Through this service, CJ Fallon is cutting costs for parents, making lesson planning for teachers easier and supporting the move of the Irish education system to an online forum. This is a great initiative and a great use of the Microsoft Windows Azure Programme."

The eReader is available free of charge on the CJ Fallon website and can be used to ‘unlock' specific titles for usage based on a flexible licensing model. Schools are given direct access to a licence manager to enable them to target specific resources at individual users or class groups.

Brian Gilsenan, CEO of CJ Fallon Publishers, said: "We've many of our books already available through the cloud and are set to bring all of them online by September 2012. As well as the distribution of school books to students through the cloud, we will be launching a new personalised MyCJFallon service in coming months. Through this teachers will be able to access and save all of their favourite CJ Fallon resources which are provided to support all of our major titles, from eBooks, animations, audio, video and interactives, to their own profile via the CJ Fallon website. We hope this platform will be beneficial for teachers, students and parents alike."

For more on CJ Fallon's eReader tool visit

<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

Beth Massi (@bethmassi) posted LightSwitch Screen Tips: Scrollbars and Splitters on 2/9/2012:

Oftentimes in business applications we have a lot of information to present on a screen, and depending on the user’s screen resolution, that information may scroll off the page. Visual Studio LightSwitch’s screen templates automatically apply a vertical scrollbar to screens when this happens; sometimes, however, we would rather allow the user to resize sections of the screen as they see fit. It’s common in screen design to use splitter controls for this. Splitters are a useful UI feature: the width or height of a control on the screen can be modified to show more or less information. In this post I’ll show you how to enable splitter controls in LightSwitch, as well as how to control scrollbar behavior.

Create Your Screen

First create a screen with all the information you want to display. For this example I will use the ContactDetails screen we created in the Address Book Sample. In this example, Contacts have many Addresses, PhoneNumbers, and EmailAddresses. By default, LightSwitch places the Contact fields at the top of the screen with a Tab Control of three DataGrids representing the children.

Here is what the screen looks like:


And here is the content tree for the screen:


Scrollbars in LightSwitch

Imagine that over time our contacts get more and more email addresses, phone numbers, and/or addresses. If the user’s screen resolution is low (or the size of the application shell is not maximized) then LightSwitch will place a vertical scrollbar on the screen by default (I added the green box for clarity ;-))


This is controlled by selecting the topmost Rows Layout and under the Appearance properties you will see “Vertical Scroll Enabled” checked. Notice there is also a “Horizontal Scroll Enabled” property you can use to enable horizontal scrollbars when needed. All group controls in LightSwitch have these properties (i.e. Rows Layout, Columns Layout, Table Layout)


Leaving this checked, however, means that the user cannot see all the email addresses and the First & Last name fields at the same time. There are a couple of things we can do here. One is to disable vertical scrolling of the screen; once we do that, LightSwitch will automatically place the scrollbar inside the grid itself instead.


But what if we aren’t using a DataGrid (or list control) below the list of fields? Or what if we want the user to choose how many rows they need to view at a time? In those cases, we can allow the user to resize the panels of information using a splitter.

Adding a Horizontal Splitter

In order to provide this functionality, we can place a splitter between the list of fields at the top and the tab control below. While the application is running in debug mode, click the “Design Screen” at the top-right of the shell to open Customization Mode. Select the nested Rows Layout control for Contact and under the sizing properties you will see a check box “Is Row Resizable”.


Check that and then click Save to save your changes. You will now see a splitter control that you can use to resize the top and bottom panels.


In this case you probably also want to set a minimum and maximum height on this Rows Layout panel so that the user cannot use the splitter to wipe the grid completely off the screen. Right now, if you drag the splitter off the screen, the grid will completely disappear. To prevent this, set the MinHeight and MaxHeight properties. You can also enable a scrollbar in the top panel when needed by checking “Vertical Scroll Enabled”.


Adding Vertical Splitters

You can also add vertical splitters in a similar way. Instead of displaying DataGrids in separate tabs, say we want to display them side-by-side. Open customization mode again and change the Tabs Layout to a Columns Layout.


Then select each DataGrid and in the Sizing properties, check “Is Column Resizable”.


Now you can resize all of the columns containing the DataGrids. The DataGrids will automatically put a horizontal scrollbar at the bottom so users can see all the fields as necessary.


Another Example

Splitters and scrollbars really start helping you out when you have sections of information and/or commands that you want to allow the user to resize. Here I modified the Contoso Construction example to allow resizing of the tab control of commands on the left and the information on the right.


Wrap Up

There are a lot of things you can do with the LightSwitch Screen designer in order to provide completely customized layouts. Controlling scrollbars and adding splitters is just another way you can achieve what you want. For more tips and tricks on customizing screens see:

The ADO.NET Team reported EF 4.3 Released on 2/9/2012:

Over the last six months we’ve released a series of previews of our Code First Migrations work. Today we are making the first fully supported go-live release of our migrations work available as part of EF 4.3.

What Changed Between EF 4.2 and EF 4.3

The notable changes between EF 4.2 and EF 4.3 include:

  • New Code First Migrations Feature. This is the primary new feature in EF 4.3 and allows a database created by Code First to be incrementally changed as your Code First model evolves.
  • Removal of EdmMetadata table. If you allow Code First to create a database by simply running your application (i.e. without explicitly enabling Migrations) the creation will now take advantage of improvements to database schema generation we have implemented as part of Migrations.
  • Bug fix for GetDatabaseValues. In earlier releases this method would fail if your entity classes and context were in different namespaces. This issue is now fixed and the classes don’t need to be in the same namespace to use GetDatabaseValues.
  • Bug fix to support Unicode DbSet names. In earlier releases you would get an exception when running a query against a DbSet that contained some Unicode characters. This issue is now fixed.
  • Data Annotations on non-public properties. Code First will not include private, protected, or internal properties by default. Even if you manually included these members in your model, using the Fluent API in previous versions of Code First would ignore any Data Annotations on these members. This is now fixed and Code First will process the Data Annotations once the private, protected, or internal properties are manually included in the model.
  • More configuration file settings. We’ve enabled more Code First related settings to be specified in the App/Web.config file. This gives you the ability to set the default connection factory and database initializers from the config file. You can also specify constructor arguments to be used when constructing these objects. More details are available in the EF 4.3 Configuration File Settings blog post.
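The linked blog post describes these settings in detail; as a rough sketch of the shape they take (the context and initializer type names below are hypothetical, and the exact element names should be checked against that post), an App.config using them might look something like this:

```xml
<configuration>
  <configSections>
    <!-- Added automatically by the EntityFramework NuGet package -->
    <section name="entityFramework"
             type="System.Data.Entity.Internal.ConfigFile.EntityFrameworkSection, EntityFramework" />
  </configSections>
  <entityFramework>
    <!-- Default connection factory used by contexts that don't specify a connection,
         with a constructor argument passed via the parameters element -->
    <defaultConnectionFactory
        type="System.Data.Entity.Infrastructure.SqlConnectionFactory, EntityFramework">
      <parameters>
        <parameter value="Data Source=.\SQLEXPRESS; Integrated Security=True; MultipleActiveResultSets=True" />
      </parameters>
    </defaultConnectionFactory>
  </entityFramework>
  <appSettings>
    <!-- Database initializer for a (hypothetical) context type -->
    <add key="DatabaseInitializerForType MyApp.BlogContext, MyApp"
         value="MyApp.DropCreateBlogInitializer, MyApp" />
  </appSettings>
</configuration>
```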
What Changed from EF 4.3 Beta 1

Aside from some minor bug fixes the changes to Code First Migrations since EF 4.3 Beta 1 include:

  • Enable-Migrations will now scaffold a code-based migration if the database has already been created. If you use Code First to create a database and then later enable migrations, a code-based migration will be scaffolded that represents the objects that have already been created in the database. (See the Enabling Migrations section of the Code-Based Migrations Walkthrough for more details).
    You can use the –EnableAutomaticMigrations flag to prevent a code-based migration from being scaffolded, and have the creation of these objects treated as an automatic migration.
  • Added $InitialDatabase constant. This can be used in place of “0” when specifying a source or target migration. For example, migrating down to an empty database can be performed with the Update-Database –TargetMigration:$InitialDatabase command.
  • Renamed Update-Database.exe to migrate.exe. The command line tool for applying migrations was originally named to be consistent with the PowerShell command. We’ve now changed it to be more consistent with other command line names.
  • Updated migrate.exe so it can be invoked from the NuGet tools folder. In Beta 1 you needed to copy migrate.exe to the same directory as the assembly that contained migrations; now you can invoke it directly from the tools folder. You will need to specify the directory containing your assembly in the /StartUpDirectory parameter. For example:
    C:\MyProject\packages\EntityFramework.4.3.0\tools>migrate.exe MyAssembly /StartUpDirectory:C:\MyProject\MyProject\bin\debug
  • Fixed errors when running migrations commands from Package Manager Console. A number of folks reported errors when using the migrations commands in certain scenarios. We’ve fixed the underlying bugs causing these issues.
  • Fixed –Script issues. There were a number of bugs in the scripting functionality of EF 4.3 Beta 1 that prevented you generating a script starting from a migration other than an empty database. These bugs are now fixed.
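Taken together, a typical Package Manager Console session using these commands looks roughly like the following transcript (the migration name is hypothetical, and the comments are mine):

```powershell
PM> Enable-Migrations
# Scaffolds Migrations\Configuration.cs; if the database already exists,
# an InitialCreate code-based migration is scaffolded as well.

PM> Add-Migration AddBlogRating
# Scaffolds a code-based migration for pending model changes.

PM> Update-Database
# Applies pending migrations to the database.

PM> Update-Database -Script -SourceMigration:$InitialDatabase
# Generates a SQL script from an empty database instead of applying changes.

PM> Update-Database -TargetMigration:$InitialDatabase
# Migrates back down to an empty database.
```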
Getting Started

You can get EF 4.3 by installing the latest version of the EntityFramework NuGet package.


These existing walkthroughs provide a good introduction to using the Code First, Model First & Database First workflows available in Entity Framework:

There are two new walkthroughs that cover the new Code First Migrations feature. One focuses on the no-magic workflow that uses a code-based migration for every change. The other looks at using automatic migrations to avoid having lots of code in your project for simple changes.

Upgrading from ‘EF 4.3 Beta 1’

If you have EF 4.3 Beta 1 installed you can upgrade to the new package by running Update-Package EntityFramework in Package Manager Console.

You may need to close and re-open Visual Studio after upgrading the package; this is required to unload the old migrations commands.

EF 5.0 (Enum support is coming… finally!)

We’ve been working on a number of features that required updates to some assemblies that are still part of the .NET Framework. These features include enums, spatial data types and some significant performance improvements.

As soon as the next preview of the .NET Framework 4.5 is available we will be shipping EF 5.0 Beta 1, which will include all these new features.


This release can be used in a live operating environment, subject to its license terms. The ADO.NET Entity Framework Forum can be used for questions relating to this release.

<Return to section navigation list>

Windows Azure Infrastructure and DevOps

Wely Lau (@wely_live) continued his series with An Introduction to Windows Azure (Part 2) in a 2/9/2012 post to Red Gate Software’s ACloudyPlace blog:

This is the second article of a two-part introduction to Windows Azure. In Part 1, I discussed the Windows Azure data centers and examined the core services that Windows Azure offers. In this article, I will explore additional services available as part of Windows Azure which enable customers to build richer, more powerful applications.

Additional Services
1. Building Block Services

‘Building block services’ were previously branded ‘Windows Azure AppFabric’. The main objective of building block services is to enable developers to build connected applications. The three services under this category are:

(i) Caching Service

Generally, accessing RAM is much faster than accessing disk, including storage and databases. For that reason, Microsoft has developed an in-memory, distributed caching service to deliver low-latency, high-performance access, namely Windows Server AppFabric Caching. However, activities such as installation and management, and hardware requirements such as investing in clustered servers, still have to be handled by the end user.

Windows Azure Caching Service is a managed, distributed, in-memory caching service built on top of the Windows Server AppFabric Caching Service. Developers no longer have to install and manage the caching service or clusters. All they need to do is create a namespace, specify the region, and define the cache size. Everything gets provisioned automatically in just a few minutes.

Creating new Windows Azure Caching Service

Additionally, the Azure Caching Service comes with a .NET client library and session providers for ASP.NET, which allow developers to use it quickly in their applications.
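As a hedged sketch of what wiring the ASP.NET session state provider to a cache namespace involved at the time (the namespace, authorization token, and exact type names below are assumptions based on the AppFabric Caching SDK, so verify them against the SDK documentation), a web.config fragment might read:

```xml
<configSections>
  <section name="dataCacheClients"
           type="Microsoft.ApplicationServer.Caching.DataCacheClientsSection, Microsoft.ApplicationServer.Caching.Core"
           allowLocation="true" allowDefinition="Everywhere" />
</configSections>
<dataCacheClients>
  <dataCacheClient name="default">
    <hosts>
      <!-- Cache endpoint for your (placeholder) namespace -->
      <host name="mynamespace.cache.windows.net" cachePort="22233" />
    </hosts>
    <securityProperties mode="Message">
      <!-- Authentication token copied from the management portal -->
      <messageSecurity authorizationInfo="[authentication token]" />
    </securityProperties>
  </dataCacheClient>
</dataCacheClients>
<system.web>
  <sessionState mode="Custom" customProvider="AzureCacheSessionProvider">
    <providers>
      <add name="AzureCacheSessionProvider"
           type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache"
           cacheName="default" dataCacheClientName="default" />
    </providers>
  </sessionState>
</system.web>
```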

(ii) Access Control Service

Third Party Authentication

With the trend for federated identity / authentication becoming increasingly popular, many applications have relied on authentication from third party identity providers (IdPs) such as Live ID, Yahoo ID, Google ID, and Facebook.

One of the challenges developers face when dealing with different IdPs is that they use different standard protocols (OAuth, WS-Trust, WS-Federation) and web tokens (SAML 1.1, SAML 2.0, SWT).

Multiple ID Authentication

Access Control Service (ACS) allows application users to authenticate using multiple IdPs. Instead of dealing with different IdPs individually, developers just need to deal with ACS and let it take care of the rest.

AppFabric Access Control Services …

Wely continues with details of the Service Bus, Data Services, Networking, and the Windows Azure Marketplace.

Full disclosure: I’m a paid contributor to Red Gate Software’s ACloudyPlace blog.

<Return to section navigation list>

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

Kristian Nese (@KristianNese) posted a link to his Getting started with Services in VMM 2012 document on 2/9/2012:

Dear community,

If you’re not familiar with Services in VMM 2012, I have created a tiny document where I’m trying to explain it a bit in my own words and with visual guidance.

Hopefully some of you will find it useful.

Download / view it from here: Getting started with Services in VMM 2012

It’s by no means a tiny document.

Kristian Nese (@KristianNese) posted a link to a System Center 2012 RC Integration Packs document on 2/9/2012:

Finally available.

Now it’s time to explore the glue in the System Center 2012 family. System Center Orchestrator will force the technologist to perform a mind shift. Start focusing on best practices and create automated processes so that you can finally watch all the YouTube videos you want, read the newspaper, and drink all the coffee you can get your hands on! :-)

Read more about the release here:

Nancy Gohring (@idgnancy) asserted “Microsoft appeared interested in maintaining support for Hyper-V in OpenStack, but developers decided to remove the buggy code” in a deck for her OpenStack removes Hyper-V support in next release article of 2/1/2012 for ComputerWorld (missed when published):

Despite Microsoft's stated commitment to Hyper-V in OpenStack, buggy code designed to support the hypervisor will be removed from the next version of the stack, developers decided on Wednesday.

An OpenStack developer wrote a patch that removes the Hyper-V support code, and two members of the core OpenStack team have approved the patch. That means the code will be removed when the next version of OpenStack, called Essex, is released in the second quarter. The code would have allowed a service provider to build an OpenStack cloud using Hyper-V.

Microsoft announced in late 2010 that it had contracted with a company to build support for Hyper-V in OpenStack. "But they never really finished it and the company hasn't supported it since then," said Joshua McKenty, CEO of Piston Cloud Computing, in an interview earlier this week. McKenty was the technical architect of NASA's Nebula cloud platform, which spun off into OpenStack, and is involved in the OpenStack community.

Developers working on Essex suggested late last week dropping the Hyper-V support code. The code is "broken and unmaintained," Thierry Carrez, a developer handling release management for OpenStack, wrote in a newsgroup when suggesting that it be dropped.

After reports surfaced that the code might be removed, Microsoft sounded interested in figuring out a way to retain it. "Microsoft is committed to working with the community to resolve the current issues with Hyper-V and OpenStack," Microsoft said in a statement on Tuesday. The company did not reply to a request for comment about Wednesday's decision to remove the code.

The move impacts very few people--McKenty doesn't know of any OpenStack clouds being built on Hyper-V. But it indicates that few cloud providers are using Windows Server in their OpenStack deployments, which could be a concern for Microsoft, noted James Staten, a Forrester Research analyst, earlier this week.

<Return to section navigation list>

Cloud Security and Governance

Ed Moyle asked Are cloud providers HIPAA business associates? in a 2/10/2012 post to TechTarget’s SearchCloudSecurity blog:

As the use of cloud computing becomes more prevalent in health care, organizations that must comply with HIPAA face a number of compliance challenges, including the question of whether they should consider cloud service providers as HIPAA business associates. It matters because business associates have certain privacy and security requirements under the law that other third parties don’t -- and in turn, covered entities have specific requirements when it comes to business associates. Since guidance is tough to come by and consensus isn’t yet established, the decision can be a complex one.

The HIPAA privacy rule defines a business associate as “a person or entity that performs certain functions or activities that involve the use or disclosure of protected health information on behalf of, or provides services to, a covered entity.” That seems clear (i.e., disclosing PHI to a vendor means they’re a business associate) until you examine the specifics of “disclosure.” For example, some cloud service models only require storage of PHI; does mere storage constitute “disclosure” in the manner intended? Other vendors might back up the data automatically; is that “disclosure”? How about debugging, troubleshooting or monitoring? The list of ambiguous scenarios is a mile long. …

Ed continues with “Making the HIPAA business associate call” and “Strategizing around HIPAA business associate status” sections.

Full disclosure: I’m a paid contributor to TechTarget’s blog.

<Return to section navigation list>

Cloud Computing Events

Richard Conway (@azurecoder) reported on 2/9/2012 about Windows Azure and High Performance Compute presentation at a New York City user group:

Thanks to all those that came to the third meeting of the user group. We had a fantastic talk by Vas Kapsalis, with a demo by Antonio Zurlo in New York. Great evening. The room was packed with about 80 people or so and a good time was had by all, especially me as I got to the beer keg first!

We’re fairly expert in HPC and Azure these days and would love to hear from anyone who attended the user group with an interest. We’re currently testing a managed grid in Azure to allow adoption of HPC by Joe Schmo (anyone know him?) in the street rather than large organisations with even larger IT budgets! We’re all about elasticity for the masses. We’re not called Elastacloud for nothing!

Anyway, if you missed the meeting, here are Vas’s slides. For those that are interested, the Tip of the Month covered how to use WIF and certificates to encrypt cookies on more than one role instance. Down with DPAPI! You can find the slides here if you’re interested. Happy trails and all that!

Jim O’Neil (@jimoneil) will present a How Can My Business Benefit from Windows Azure [Mar 8] seminar on 3/8/2012 from 8:30 AM to 12:00 PM at the Microsoft Waltham, MA office:

Regular readers of this blog know I often dive pretty deeply into the bits and bytes of developing cloud applications – roles, queues, SQL Azure, REST, ACS, Service Bus – stuff that makes your head explode. If you’re into all that, awesome, but this event isn’t for you! It may be for your boss though or anyone else that has a reaction similar to the woman on the right whenever “cloud computing” enters the conversation.

Will it save me money? Can it make my business more nimble? and more scalable? How do I leverage my existing investment in on-premises infrastructure? Why does Windows Azure’s Platform-as-a-Service approach make better business sense than the ubiquitous Infrastructure-as-a-Service options?

If you’re a business or technical decision maker and these types of questions have been weighing heavily on your mind as you drive your own company’s cloud strategy, then set aside Thursday, March 8th, and register now for this 1/2 day free seminar, and you’ll leave on, ahem, Cloud Nine!

[Register here:] Windows Azure: Focus on the application. Not your infrastructure

Thursday, March 8, 2012
8:30 a.m. – 12 noon

Microsoft, 201 Jones Road, Waltham MA 02451

Richard Conway (@azurecoder) posted ADFS and ACS: Slides from the Jan 10th UKWAUG meeting on 2/5/2012 (missed when published):

These are very belated slides from the January meeting of the UK Windows Azure Users Group by Microsoft Evangelist Steve Plank. Planky gave a great talk on integrating ACS with ADFS in the cloud, illustrating the messages and workflow involved. With ACS, these integration points with typical web scenarios become tremendously easier. You can get the slides here.

Planky’s after-the-break talk was on Azure Connect and common usage scenarios. You can get the slides here. Once again, thanks to all those that turned up in support of the group, and I look forward to seeing new and old faces on Tuesday night.

<Return to section navigation list>

Other Cloud Computing Platforms and Services

No significant articles today.

<Return to section navigation list>