Thursday, November 01, 2012

Recent Articles about SQL Azure Labs and Other Added-Value Windows Azure SaaS Previews: A Bibliography

I’ve been concentrating my original articles for the past 12 months or so on SQL Azure Labs, HDInsight (formerly Apache Hadoop on Windows Azure) and Windows Azure SQL Database Federations previews, which I call added-value offerings. I use the term added-value because Microsoft doesn’t charge for their use, other than Windows Azure compute, storage and bandwidth costs or Windows Azure SQL Database monthly charges and bandwidth costs for some of the applications, such as Codename “Cloud Numerics” and SQL Azure Federations.

Updated 11/1/2012 with updates and change of name from Apache Hadoop on Windows Azure to HDInsight, new Codename “Cloud Numerics” features, new Windows Azure Mobile Services tutorials, pending Project “Austin” (StreamInsight for Windows Azure) articles and brief descriptions of the added-value offerings.

Updated 6/30/2012 with my Analyzing 'big data' with Microsoft [Codename] Cloud Numerics article for and revised status of Codenames “Data Hub,” “Data Transfer” and “Social Analytics” to discontinued.

‡ Updated 6/1/2012 with Table of Contents and addition of Amazon Elastic MapReduce content.


  1. Windows Azure Marketplace DataMarket plus Codenames “Data Hub” and “Data Transfer” from SQL Azure Labs
  2. HDInsight (formerly Apache Hadoop on Windows Azure) from the SQL Server Team
  3. StreamInsight Project Codename “Austin” from the SQL Server Team
  4. Codename “Cloud Numerics” from SQL Azure Labs
  5. Codename “Social Analytics” from SQL Azure Labs
  6. Codename “Data Explorer” from SQL Azure Labs
  7. Mobile Services and SQL Azure Federations from the Windows Azure SQL Database Team

The following tables list my articles in reverse chronological order of their publication date on the OakLeaf, or (marked •) and Red Gate Software’s (marked ••) blogs. Dates usually are the date of their last update, if updated; otherwise, the publication date. Blank dates are for articles submitted but not yet published (titles might change).

I’ll update this post as I write other articles in the same genre. Dates for updated items will be bold.

Windows Azure Marketplace DataMarket plus Codenames “Data Hub” and “Data Transfer” from SQL Azure Labs

The SQL Azure Labs Team describes Microsoft Codename "Data Hub" as follows:

Data drives your business. Having the right data at the right time gives you and your business a competitive advantage. Often the data you need already exists within your enterprise. You just need to find it. "Data Hub" enables your enterprise to curate and publish its data on a private data marketplace, making it easy to discover and leverage.

‡ Update 6/30/2012: Shoshanna Budzianowski (@shoshe) of the Codename “Data Transfer” team reported the shutdown of this SQL Azure Labs project in a Thanks for using the Data Transfer Lab post on 6/2/2012.

Date Link
5/21/2012 Using the Windows Azure Marketplace DataMarket (and Codename “Data Hub”) Add-In for Excel (CTP3)
5/19/2012 Accessing the US Air Carrier Flight Delay DataSet on Windows Azure Marketplace DataMarket and “DataHub”
5/15/2012 Free Private Data from Silos for Internal Use with Microsoft CodeName “Data Hub”
5/11/2012 Creating An Incremental SQL Azure Data Source for OakLeaf’s U.S. Air Carrier Flight Delays Dataset
5/15/2012 Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData
5/12/2012 Five Months of U.S. Air Carrier Flight Delay Data Available on the Windows Azure Marketplace DataMarket
5/11/2012 Creating a Private Data Marketplace with Microsoft Codename “Data Hub”
11/30/2011 Test-Drive SQL Azure Labs’ New Codename “Data Transfer” Web UI for Copying *.csv Files to SQL Azure Tables or Azure Blobs

HDInsight (formerly Apache Hadoop on Windows Azure) from the SQL Server Team

‡‡‡ Microsoft’s Matt Winkler described the Windows Azure HDInsight Service as follows on 10/24/2012:

This morning we made some big announcements about delivering Hadoop for Windows Azure users. Windows Azure HDInsight Service is the easiest way to deploy, manage and scale Hadoop based solutions. This release includes:

  • Hadoop updates that ensure the latest stable versions of:
    • HDFS and Map/Reduce
    • Pig
    • Hive
    • Sqoop
  • Increased availability of the preview service
  • A local, developer installation of Microsoft HDInsight Server
  • An SDK for writing Hadoop jobs using .NET and Visual Studio

Community Contributions

As part of our ongoing commitment to Apache™ Hadoop®, the team has been actively working to submit our changes to Apache™. You can follow the progress of this work by following branch-1-win for check-ins related to HDFS and Map/Reduce. We’re also contributing patches to other projects, including Hive, Pig and HBase. This set of components is just the beginning, with monthly refreshes ahead we’ll be adding additional projects, such as HCatalog.

HDInsight is available for on-premise deployment as Microsoft HDInsight Server.

Date Link
6/1/2012 Creating an Interactive Hadoop on Azure Hive Table from an Amazon Elastic MapReduce Hive Output File (Updated 6/2/2012)
5/31/2012 Executing an Elastic MapReduce Hive Workflow from the AWS Management Console (Updated 6/4/2012)
5/1/2012 Big data buzz gets louder with Apache Hadoop and Hive
4/14/2012 Using Excel 2010 and the Hive ODBC Driver to Visualize Hive Data Sources in Apache Hadoop on Windows Azure
4/17/2012 Using Data from Windows Azure Blobs with Apache Hadoop on Windows Azure CTP
4/3/2012 Importing Windows Azure Marketplace DataMarket DataSets to Apache Hadoop on Windows Azure’s Hive Databases
4/2/2012 Introducing Apache Hadoop Services for Windows Azure
3/24/2012 Examining the state of PaaS in the year of ‘big data’
2/6/2012 Introducing Microsoft Research’s Excel Cloud Data Analytics
10/16/2011 Ted Kummert at PASS Summit: Hadoop-based Services for Windows Azure CTP to Release by End of 2011

StreamInsight Project Codename “Austin” from the SQL Server Team

‡‡‡ The SQL Server StreamInsight Team described StreamInsight Project Codename “Austin” as follows on 5/24/2011:

Project Codename “Austin” will make Microsoft StreamInsight’s complex event processing capabilities available as a service on the Windows Azure Platform. This allows Microsoft’s customers and partners to build event-driven applications where the analysis of the events is performed in the Cloud. Such a deployment becomes relevant in scenarios where

  • event data needs to be collected from globally distributed assets or equipment such as connected cars or oil platforms
  • event data is already “born” in the cloud, like clickstream data
  • event-processing results need to be consolidated and made globally available.

Instead of pulling data into an on-premise analytics environment and then possibly distributing it again, it can be processed in the Cloud using StreamInsight’s event-driven computation framework, providing cloud computing benefits for many application scenarios in verticals such as manufacturing, oil & gas, utilities, health care and web analytics.

Project Codename “Austin” offers the same capabilities for declarative event processing to derive insight from real-time and historical event data as Microsoft StreamInsight does on premises. To facilitate migrations from on premise applications to the Cloud, Project Codename “Austin” adopts the existing .NET and LINQ-based development experience that Microsoft StreamInsight provides for on premise solutions. A StreamInsight instance in the Cloud should appear just like an on-premise instance. In addition, Project Codename “Austin” will adopt a cloud-based deployment and servicing experience.

The latest update to Codename “Austin” is the third CTP dated August 2012.

Date Link
•• Pending Move Complex Event Processing to the Cloud with the StreamInsight Service for Windows Azure CTP, Part 2
•• Pending Move Complex Event Processing to the Cloud with the StreamInsight Service for Windows Azure CTP, Part 1

Codename “Cloud Numerics” from SQL Azure Labs

The SQL Azure Labs team describes Microsoft Codename "Cloud Numerics" as follows:

The Microsoft Codename "Cloud Numerics" lab is a numerical and data analytics library for data scientists, quantitative analysts, and others who write C# applications in Visual Studio. It enables these applications to be scaled out, deployed, and run on Windows Azure.

Ronnie Hoogerwerf (@rhoogerw) announced Microsoft Codename “Cloud Numerics” Lab Refresh on 10/18/2012. This post is a repeat of an 8/2/2012 post about v0.2 August 2012 update, reported here, with minor edits which caused it to reappear with a new publish date:

We are announcing a refresh of the Microsoft Codename "Cloud Numerics" Lab. We want to thank everyone who participated in the initial lab, we amassed and used your feedback to make improvements and add exciting features. Your participation is what makes this lab a success. Thank you.

Here’s what is new in the refresh:

Improved user experience: through more actionable exception messages, a refactoring of the probability distribution function APIs, and better and more actionable feedback in the deployment utility. In addition, the deployment process time has decreased and the installer supports installation on a on-premises Windows HPC Cluster. All up, this refresh provides for a more efficient way of writing and deploying “Cloud Numerics” applications to Windows Azure. [Emphasis added.]

More scale-out enabled functions: more algorithms are enabled to work on distributed arrays. This significantly increases the breadth and depth of big data algorithms that can be developed using “Cloud Numerics” Lab. Scale-out functionality was added in the following areas: Fourier transforms, linear algebra, descriptive statistics, pattern recognition, random sampling, similarity measures, set operations, and matrix math.

Array indexing and manipulation: a large part of any data analytics application concerns handling and preparing data to be in the right shape and have the right content. With this refresh “Cloud Numerics” adds advanced array indexing enabling users to easily and efficiently set and extract subsets of arrays and to apply Boolean filters.

Sparse data structures and algorithms: much of the real-world big data sets are sparse, i.e., not every field in a table has a value. With this refresh of the lab we introduce a distributed sparse matrix structure to hold these datasets and introduce core sparse linear algebra functions enabling scenarios such as document classification, collaborative filtering, etc.

Apply/Sweep framework: in addition to the built-in parallelism the “Cloud Numerics” Lab, this refresh now exposes a set of APIs to enable embarrassingly parallel patterns. The Apply framework enables applying arbitrary serializable .NET code to each element of an array or to each row or column of an array. The framework also provides a set of expert level interfaces to define arbitrary array splits. The Sweep framework performs as its name implies —this framework enables distributed parameter sweeps across a set of nodes allowing for better execution times.

Improved IO functionality: we added more parallel readers to enable out of the box data ingress from Windows Azure storage and introduced parallel writers. [Emphasis added.]

Documentation: we introduced detailed mathematical descriptions of more than half of the algorithms using print-quality formulae and best-of-web equation rendering that help clarify algorithm mathematical definition and method behavior. In addition, we updated the “Getting Started” wiki, and we added conceptual documentation for the “Cloud Numerics” help that includes the programming model, the new Apply framework, IO, and so on.

Stay tuned for upcoming blog posts:

  • F#: We’ll be distributing a F# add-in for “Cloud Numerics” soon. The add-in exposes the “Cloud Numerics” APIs in a more functional manner, introduces operators, such as matrix multiply, and F# style constructors for and indexing on “Cloud Numerics” arrays.
  • Text analytics using sparse data structures

Do you want to learn more about Microsoft Codename “Cloud Numerics” Lab? Please visit us on our SQL Azure Labs home page, take a deeper look at the Getting Started material and Sign Up to get access to the installer. Let us know what you think by sending us email at

The “Cloud Numerics” refresh depends on the newly released Azure SDK 1.7 and Microsoft HPC Server R2 SP4. It does not provide support for the Visual Studio 2012 RC. [Emphasis added.]
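
The parameter-sweep pattern that the refresh notes attribute to the Sweep framework can be illustrated with a minimal Python sketch. This is not the “Cloud Numerics” API, which is a .NET library; the `simulate` model, the `sweep` helper and the parameter grid are hypothetical names chosen for illustration, and a local thread pool stands in for the cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(rate, years):
    """Hypothetical model: growth of 1.0 compounded at `rate` for `years`."""
    return (1.0 + rate) ** years


def sweep(func, grid, max_workers=4):
    """Evaluate func once per combination of the values in grid.

    Each combination is independent of the others, which is what makes
    the pattern embarrassingly parallel. Threads are an assumption here;
    a real deployment would distribute the tasks across nodes.
    """
    names = list(grid)
    combos = [dict(zip(names, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with combos.
        results = list(pool.map(lambda params: func(**params), combos))
    return list(zip(combos, results))


results = sweep(simulate, {"rate": [0.01, 0.05], "years": [1, 10]})
best_params, best_value = max(results, key=lambda pair: pair[1])
```

Because no task depends on another, adding workers shortens the wall-clock time of the sweep without changing its results, which appears to be the benefit the announcement describes.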

Date Link
6/25/2012• Analyzing 'big data' with Microsoft [Codename] Cloud Numerics
4/3/2012 Analyze Years of Air Carrier Flight Arrival Delays in Minutes with the Windows Azure HPC Scheduler
3/26/2012 Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics”
1/30/2012 Deploying “Cloud Numerics” Sample Applications to Windows Azure HPC Clusters
3/17/2012 Introducing Microsoft Codename “Cloud Numerics” from SQL Azure Labs

Codename “Social Analytics” from SQL Azure Labs

The SQL Azure Labs team reported the availability of Microsoft Codename "Social Analytics" on 10/25/2011:

As the popularity of the social web continues to grow it has become increasingly important for businesses to keep their finger on the pulse of the social web. Social information provides businesses with new insights, and the social web provides a means to connect with customers and respond quickly to customer concerns or comments. Microsoft Codename "Social Analytics" allows you to easily integrate social information into your business applications.

Aggregate social media content from many sources including Twitter, Facebook, blogs and forums.

raw social data by assessing the sentiment and by tying conversations together.

Include rich social media content in your web applications through straightforward APIs.

‡ Update 6/30/2012: The Codename “Social Analytics” Team reported in a Microsoft Codename "Social Analytics" - Lab Phase is Complete blog post of 6/21/2012 that the project was discontinued. The OData data source was no longer available from the Windows Azure Marketplace DataMarket as of 6/30/2012. I am in the process of modifying my Microsoft Social Analytics Windows Form Client sample project to use data saved before the data source shutdown. See my My “Big Data in the Cloud” Cover Article for Visual Studio Magazine’s July Issue post for more details.

Date Link
2/22/2012•• Track Consumer Engagement and Sentiment with Microsoft Codename “Social Analytics”
2/17/2012 Twitter Sentiment Analysis: A Brief Bibliography
12/26/2011 More Features and the Download Link for My Codename “Social Analytics” WinForms Client Sample App
12/16/2011 Use OData to Execute RESTful CRUD Operations on Big Data in the Cloud
12/1/2011 Microsoft tests Social Analytics experimental cloud
11/23/2011 New Features Added to My Microsoft Codename “Social Analytics” WinForms Client Sample App
11/19/2011 My Microsoft Codename “Social Analytics” Windows Form Client Detects Anomaly in VancouverWindows8 Dataset
11/15/2011 Microsoft Codename “Social Analytics” ContentItems Missing CalculatedToneId and ToneReliability Values
11/4/2011 Problems Browsing Codename “Social Analytics” Collections with Popular OData Browsers
11/5/2011 Using the Microsoft Codename “Social Analytics” API with Excel PowerPivot and Visual Studio 2010
11/1/2011 SQL Azure Labs Unveils Codename “Social Analytics” Skunkworks Project

Codename “Data Explorer” from SQL Azure Labs

‡‡‡ The SQL Azure Labs team describes Microsoft Codename "Data Explorer" as follows:

Gain new insights from your data

Have you ever had trouble finding data you needed? Or combining data from different, incompatible sources? How about sharing the results with others in a web-friendly way? If so, we want you to try Microsoft Codename “Data Explorer” Cloud service.

With "Data Explorer" you can:

  • Identify the data you care about from the sources you work with (e.g. Excel spreadsheets, files, SQL Server databases).

  • Discover relevant data and services via automatic recommendations from the Windows Azure Marketplace.

  • Enrich your data by combining it and visualizing the results.

  • Collaborate with your colleagues to refine the data.

  • Publish the results to share them with others or power solutions.

In short, we help you harness the richness of data on the Web to generate new insights.

Date Link
1/24/2012• Microsoft cloud service lets citizen developers crunch big data
12/30/2011 Problems with Microsoft Codename “Data Explorer” - Aggregate Values and Merging Tables - Solved
12/27/2011 Microsoft Codename “Data Explorer” Cloud Version Fails to Save Snapshots of Codename “Social Analytics” Data
12/27/2011 Mashup Big Data with Microsoft Codename “Data Explorer” - An Illustrated Tutorial
10/12/2011 Ted Kummert at PASS Summit: “Data Explorer” Creates Mashups from Big Data, DataMarket and Excel Sources

Mobile Services and Federations from the Windows Azure SQL Database Team

‡‡‡ The Windows Azure Team describes Windows Azure Mobile Services as follows:

Windows Azure Mobile Services is a Windows Azure service offering designed to make it easy to create highly-functional mobile apps using Windows Azure. Mobile Services brings together a set of Windows Azure services that enable backend capabilities for your apps. Mobile Services provides the following backend capabilities in Windows Azure to support your apps:

  • Simple provisioning and management of tables for storing app data.
  • Integration with notification services to deliver push notifications to your app.
  • Integration with well-known identity providers for authentication.
  • Granular control for authorizing access to tables.
  • Supports scripts to inject business logic into data access operations.
  • Integration with other cloud services.
  • Supports the ability to scale a mobile service instance.
  • Service monitoring and logging.

and Windows Azure SQL Database Federations thusly:

Federations in SQL Database are a way to achieve greater scalability and performance from the database tier of your application through horizontal partitioning. One or more tables within a database are split by row and portioned across multiple databases (Federation members). This type of horizontal partitioning is often referred to as ‘sharding’. The primary scenarios in which this is useful are where you need to achieve scale, performance, or to manage capacity.

SQL Database can deliver scale, performance, and additional capacity through federation, and can do so dynamically with no downtime; client applications can continue accessing data during repartitioning operations with no interruption in service.
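
The range-based horizontal partitioning described in the quotation above can be sketched in a few lines of Python. This is a toy in-memory model under stated assumptions, not the SQL Database implementation: the `Federation` class, its method names and the `customer_id` key are all hypothetical, and plain dictionaries stand in for the Federation member databases.

```python
from bisect import bisect_right


class Federation:
    """Toy model of range-based sharding across federation members."""

    def __init__(self):
        # boundaries[i] is the lowest key held by members[i + 1].
        # A single member initially covers the whole key range.
        self.boundaries = []
        self.members = [{}]

    def member_index(self, key):
        # The number of boundaries <= key is the index of the member
        # whose range contains the key.
        return bisect_right(self.boundaries, key)

    def insert(self, key, row):
        self.members[self.member_index(key)][key] = row

    def get(self, key):
        return self.members[self.member_index(key)].get(key)

    def split_at(self, boundary):
        """Split the member covering `boundary` into two members."""
        if boundary in self.boundaries:
            raise ValueError("boundary already exists")
        i = self.member_index(boundary)
        old = self.members[i]
        low = {k: v for k, v in old.items() if k < boundary}
        high = {k: v for k, v in old.items() if k >= boundary}
        self.members[i:i + 1] = [low, high]
        self.boundaries.insert(i, boundary)


fed = Federation()
for customer_id in (5, 150, 420, 999):
    fed.insert(customer_id, {"customer_id": customer_id})

# Repartition twice; lookups by key keep working after each split.
fed.split_at(200)
fed.split_at(500)
sizes = [len(member) for member in fed.members]
```

Routing every read and write through `member_index` is the design choice that lets callers stay unaware of how many members exist, which loosely mirrors the claim that client applications keep accessing data during repartitioning.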

Date Link
10/31/2012• Windows Azure Mobile Services creates backends for Windows 8, iPhone
10/17/2012 Windows Azure Mobile Services Preview Walkthrough–Part 6: Authentication with Third-Party Identity Providers (under construction)
9/22/2012 Windows Azure Mobile Services Preview Walkthrough–Part 5: Distributing Your App From the Windows Store
9/15/2012 Windows Azure Mobile Services Preview Walkthrough–Part 4: Customizing the Windows Store App’s UI
9/22/2012 Windows Azure Mobile Services Preview Walkthrough–Part 3: Pushing Notifications to Windows 8 Users (C#)
9/9/2012 Windows Azure Mobile Services Preview Walkthrough–Part 2: Authenticating Windows 8 App Users (C#)
9/8/2012 Windows Azure Mobile Services Preview Walkthrough–Part 1: Windows 8 ToDo Demo Application (C#)
4/5/2012 Split root table with T-SQL in SQL Azure Federations
4/5/2012 Manage, query SQL Azure Federations using T-SQL
3/28/2012 Tips for deploying SQL Azure Federations
1/18/2012 Upload Big Data to SQL Azure Federated Databases with BCP Automatically
1/17/2012 Loading Big Data into Federated SQL Azure Tables with the SQL Azure Federation Data Migration Wizard v1.2
1/11/2012 Creating a SQL Azure Federation in the Windows Azure Platform Portal
1/8/2012 Generating Big Data for Use with SQL Azure Federations and Apache Hadoop on Windows Azure Clusters
7/1/2011 Sharding relational databases in the cloud