Windows Azure and Cloud Computing Posts for 1/4/2012+
A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.
•• Updated 1/7/2012 1:30 PM PST with new articles marked •• by the SQL Server Team
• Updated 1/6/2012 11:30 AM PST with new articles marked • by Jim O’Neil, Avkash Chauhan, Ilan Rabinovitch, Ashlee Vance, Beth Massi, Mary Jo Foley, Maarten Balliauw, Ralph Squillace, SearchCloudComputing, Himanshu Singh, Larry Franks, Cihan Biyikoglu, Andrew Brust and Me.
Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:
- Windows Azure Blob, Drive, Table, Queue and Hadoop Services
- SQL Azure Database and Reporting
- Marketplace DataMarket, Social Analytics and OData
- Windows Azure Access Control, Service Bus, EAI Bridge and Cache
- Windows Azure VM Role, Virtual Network, Connect, RDP and CDN
- Live Windows Azure Apps, APIs, Tools and Test Harnesses
- Visual Studio LightSwitch and Entity Framework v4+
- Windows Azure Infrastructure and DevOps
- Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds
- Cloud Security and Governance
- Cloud Computing Events
- Other Cloud Computing Platforms and Services
Azure Blob, Drive, Table, Queue and Hadoop Services
• Andrew Brust (@andrewbrust) offered Hadoop on Azure: First Impressions in his Redmond Roundup post of 1/6/2012:
In the last Roundup, I mentioned that I was admitted to the early preview CTP of Microsoft's Project "Isotope" Hadoop on Azure initiative. At the time, I had merely provisioned a Hadoop cluster. At this point, I've run queries from the browser, logged in over remote desktop to the cluster's head node, tooled around and connected to Hadoop data from four different Microsoft BI tools. So far, I'm impressed.
I've been saying for a while that Hadoop and Big Data aren't so far off in mission from enterprise BI tools. The problem is that the two technology areas have thus far existed in rather mutually-isolated states, when they should be getting mashed up. While a number of BI vendors have been pursuing that goal, it actually seems to be the entire underpinning of Microsoft's Hadoop initiative. That does lead to some anomalies. For instance, Microsoft's Hadoop distribution seems to exclude the Hbase NoSQL database, which is rather ubiquitous in the rest of the Hadoop world. But in general, the core Hadoop environment seems faithful and robust, provisioning a cluster is really simple and connectivity from Excel, PowerPivot, Analysis Services tabular mode and Reporting Services seems solid and smooth. I like what I'm seeing and am eager to see more. …
I’m a bit mystified by the missing Hbase, also. Andrew continues with observations about Nokia/Windows Phone 7 and Ultrabooks running Windows 7.
• Avkash Chauhan (@avkashchauhan) described Using Windows Azure Blob Storage (asv://) for input data and storing results in Hadoop Map/Reduce Job on Windows Azure in a 1/5/2012 post:
Microsoft[’s] distribution [of] Apache Hadoop [has] direct connectivity to cloud storage, i.e., Windows Azure Blob storage or Amazon S3. Here we will learn how to connect your Windows Azure Storage directly from your Hadoop Cluster.
To learn how to connect you[r] Hadoop Cluster to Windows Azure Storage, please read the following blog first:
After reading the above blog, please setup your Hadoop configuration to connect with Azure Storage and verify that connection to Azure Storage is working. Now, before running [a] Hadoop Job please be sure to understand the correct format to use asv:// as below:
When using input or output string using Azure storage you must use the following format:
Input
asv://<container_name>/<symbolic_folder_name>
Example: asv://hadoop/input
Output
asv://<container_name>/<symbolic_folder_name>
Example:asv://hadoop/output
Note: If you use asv://<only_container_name> (i.e., just the container with no folder), the job will return an error.
Let’s verify at Azure Storage that we do have some data in proper location
The contents of the file helloworldblo[b].txt are as below:
This is Hello World I like Hello World Hello Country Hello World Love World World is Love Hello World
Now let’s run a simple WordCount Map/Reduce Job, using HelloWorldBlob.txt as the input file and storing the results in Azure Storage as well.
Job Command:
call hadoop.cmd jar hadoop-examples-0.20.203.1-SNAPSHOT.jar wordcount asv://hadoop/input asv://hadoop/output
Once the job completes, the following screenshot shows the results output:
Opening part-r-00000 shows the results as below:
- Country 1
- Hello 5
- I 1
- Love 2
- This 1
- World 6
- is 2
- like 1
Finally the Azure HeadNode WebApp shows the following final output about the Hadoop Job: …
See the original post for the detailed output.
• Avkash Chauhan (@avkashchauhan) added yet another member to his series with Apache Hadoop on Windows Azure: Connecting to Windows Azure Storage from Hadoop Cluster on 1/5/2012:
Microsoft[’s] distribution [of] Apache Hadoop [has] direct connectivity to cloud storage, i.e., Windows Azure Blob storage or Amazon S3. Here we will learn how to connect your Windows Azure Storage directly from your Hadoop Cluster.
As you know Windows Azure Storage access need[s the] following two things:
- Azure Storage Name
- Azure Storage Access Key
Using [the] above information we create the following Storage Connection Strings:
- DefaultEndpointsProtocol=https;
- AccountName=<Your_Azure_Blob_Storage_Name>;
- AccountKey=<Azure_Storage_Key>
Now we just need to set up the above information inside the Hadoop cluster configuration. To do that, please open C:\Apps\Dist\conf\core-site.xml and include the following two parameters related to Azure Blob Storage access from the Hadoop cluster:
<property>
  <name>fs.azure.buffer.dir</name>
  <value>/tmp</value>
</property>
<property>
  <name>fs.azure.storageConnectionString</name>
  <value>DefaultEndpointsProtocol=https;AccountName=<YourAzureBlobStoreName>;AccountKey=<YourAzurePrimaryKey></value>
</property>
The above configuration sets up Azure Blob Storage access within the Hadoop setup.
ASV:// => https://<Azure_Blob_Storage_name>.blob.core.windows.net
Now let’s try to list the blobs in your specific container:
c:\apps\dist>hadoop fs -lsr asv://hadoop/input
-rwxrwxrwx 1 107 2012-01-05 05:52 /input/helloworldblob.txt
Let’s verify at Azure Storage that the results we received above are correct as below:
So, for example, if you want to copy a file from the Hadoop cluster to Azure Storage, you would use the following command:
hadoop fs -copyFromLocal <Filename> asv://<Target_Container_Name>/<Blob_Name_or_samefilename>
Example:
c:\Apps>hadoop.cmd fs -copyFromLocal helloworld.txt asv://filefromhadoop/helloworldblob.txt
This will upload the helloworld.txt file to the container named “filefromhadoop” as blob “helloworldblob.txt”.
c:\Apps>hadoop.cmd fs -copyToLocal asv://hadoop/input/helloworldblob.txt helloworldblob.txt
This command will download the helloworldblob.txt blob from Azure Storage and make it available locally on the Hadoop cluster.
Please see [the original article] to learn more about [the] “Hadoop fs” command:
• Ashlee Vance (@valleyhack) asserted “The startup's competitions lure PhDs and whiz kids to solve companies' data problems” in a deck for her Kaggle's Contests: Crunching Numbers for Fame and Glory article of 1/4/2012 for Bloomberg Business Week:
A couple years ago, Netflix held a contest to improve its algorithm for recommending movies. It posted a bunch of anonymized information about how people rate films, then challenged the public to best its own Cinematch algorithm by 10 percent. About 51,000 people in 186 countries took a crack at it. (The winner was a seven-person team that included scientists from AT&T Labs.) The $1 million prize was no doubt responsible for much of the interest. But the fervor pointed to something else as well: The world is full of data junkies looking for their next fix.
In April 2010, Anthony Goldbloom, an Australian economist, decided to capitalize on that urge. He founded a company called Kaggle to help businesses of any size run Netflix-style competitions. The customer supplies a data set, tells Kaggle the question it wants answered, and decides how much prize money it’s willing to put up. Kaggle shapes these inputs into a contest for the data-crunching hordes. To date, about 25,000 people—including thousands of PhDs—have flocked to Kaggle to compete in dozens of contests backed by Ford, Deloitte, Microsoft, and other companies. The interest convinced investors, including PayPal co-founder Max Levchin, Google Chief Economist Hal Varian, and Web 2.0 kingpin Yuri Milner, to put $11 million into the company in November.
The startup’s growth corresponds to a surge in Silicon Valley’s demand for so-called data scientists, who are able to pull business and technical insights out of mounds of information. Big Web shops like Facebook and Google use these scientists to refine advertising algorithms. Elsewhere, they’re revamping how retailers promote goods and helping banks detect fraud.
Big companies have sucked up the majority of the information all-stars, leaving smaller outfits scrambling. But Goldbloom, who previously worked at the Reserve Bank of Australia and the Australian Treasury, contends there are plenty of bright data geeks willing to work on tough problems. “There is not a lack of talent,” he says. “It’s just that the people who tend to excel at this type of work aren’t always that good at communicating their talents.”
One way to find them, Goldbloom believes, is to make Kaggle into the geek equivalent of the Ultimate Fighting Championship. Every contest has a scoreboard. Math and computer science whizzes from places like IBM and the Massachusetts Institute of Technology tend to do well, but there are some atypical participants, including glaciologists, archeologists, and curious undergrads. Momchil Georgiev, for instance, is a senior software engineer at the National Oceanic and Atmospheric Administration. By day he verifies weather forecast data. At night he turns into “SirGuessalot” and goes up against more than 500 people trying to predict what day of the week people will visit a supermarket and how much they’ll spend. (The sponsor is dunnhumby, an adviser to grocery chains like Tesco.) “To be honest, it’s gotten a little bit addictive,” says Georgiev.
Eric Huls, a vice-president at Allstate, says many of his company’s math whizzes have been drawn to Kaggle. “The competition format makes Kaggle unique compared to working within the context of a traditional company,” says Huls. “There is a good deal of pride and prestige that comes with objectively having bested hundreds of other people that you just can’t find in the workplace.”
Allstate decided to piggyback on Kaggle’s appeal and last July offered a $10,000 prize to see if it could improve the way it prices automobile insurance policies. In particular, the company wanted to examine if certain characteristics of a car made it more likely to be involved in an accident that resulted in a bodily injury claim. Allstate turned over two years’ worth of data that included variables like a car’s horsepower, size, and number of cylinders, and anonymized accident histories. “This is not a new problem, but we were interested to see if the contestants would approach it differently than we have traditionally,” Huls says. “We found the best models in the competition did improve upon the models we built internally.” …
• Avkash Chauhan (@avkashchauhan) continued his series with Apache Hadoop on Windows Azure: How Hadoop cluster was setup on Windows Azure on 1/4/2012:
Once you provide the following information to set up your Hadoop cluster in Azure:
- Cluster DNS Name
- Type of Cluster
- Small – 4 Nodes – 2TB diskspace
- Medium – 8 Nodes – 4 TB diskspace
- Large – 16 nodes – 8 TB diskspace
- Extra Large – 32 Nodes – 16 TB diskspace
- Cluster login name and Password
The cluster setup process configure[s] your cluster depend[ing] on your settings, and finally you get your cluster ready to accept Hadoop Map/Reduce Jobs.
If you want to understand how the head node and worker nodes were set up internally, here is some information for you:
[The] head node is actually a running Windows Azure web role. You will find Head Node details … below:
- Isotope HeadNode JobTracker 9010
- Isotope HeadNode JobTrackerWebApp 50030 ← Hadoop Map/Reduce Job Tracker
- Isotope HeadNode NameNode 9000
- Isotope HeadNode NameNodeWebApp 50070 ← Namenode Management
- ODBC/HiveServer running on Port 10000
- FTP Server running on Port 2226
- IsotopeJS is also running at 8443 as Interactive JavaScript Console.
About [the] Worker Node, which is actually a worker role having [an] endpoint directly communicating with HeadNode WebRole, here are some details important to you:
Isotope WorkerNode – Create X instances depend[ing] on your cluster setup
For example, a Small cluster uses 4 nodes; in that case the worker nodes will look like:
- IsotopeWorkerNode_In_0
- IsotopeWorkerNode_In_1
- IsotopeWorkerNode_In_2
- IsotopeWorkerNode_In_3
Each WorkerNode gets its own IP address and port, and the following two ports are used for the individual task tracker on each node and for HDFS management:
- http://<WorkerNodeIPAddress_X>:50060/tasktracker.jsp - Task Tracker
- http://<WorkerNodeIPAddress_X>:50075/ - HDFS
If you remote[ly] login to your clus[t]er and check the name node summary using http://localhost:50070/dfshealth.jsp you will see the exact same worker node IP Address as described here:
If you look at C:\Resources\<GUID>.IsotopeHeadNode_IN_0.xml, you will learn more about these details. This XML file is the same one you find on any Web or Worker Role, and the configuration in the XML will help you a lot in this regard.
CloudTweaks asserted “Open source “Big Data” cloud computing platform powers millions of compute-hours to process exabytes of data for Amazon.com, AOL, Apple, eBay, Facebook, foursquare, HP, IBM, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Twitter, Yahoo!, and more” in an introduction to an Apache Hadoop v1.0: Open Source “Big Data” Cloud Computing Platform Powers Millions… news release on 1/4/2012:
The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache™ Hadoop™ v1.0, the Open Source software framework for reliable, scalable, distributed computing. The project’s latest release marks a major milestone six years in the making, and has achieved the level of stability and enterprise-readiness to earn the 1.0 designation.
A foundation of Cloud computing and at the epicenter of “big data” solutions, Apache Hadoop enables data-intensive distributed applications to work with thousands of nodes and exabytes of data. Hadoop enables organizations to more efficiently and cost-effectively store, process, manage and analyze the growing volumes of data being created and collected every day. Apache Hadoop connects thousands of servers to process and analyze data at supercomputing speed.
“This release is the culmination of a lot of hard work and cooperation from a vibrant Apache community group of dedicated software developers and committers that has brought new levels of stability and production expertise to the Hadoop project,” said Arun C. Murthy, Vice President of Apache Hadoop. “Hadoop is becoming the de facto data platform that enables organizations to store, process and query vast torrents of data, and the new release represents an important step forward in performance, stability and security.
“Originating with technologies developed by Yahoo, Google, and other Web 2.0 pioneers in the mid-2000s, Hadoop is now central to the big data strategies of enterprises, service providers, and other organizations,” wrote James Kobielus in the independent Forrester Research, Inc. report, “Enterprise Hadoop: The Emerging Core Of Big Data” (October 2011).
Dubbed a “Swiss army knife of the 21st century” and named “Innovation of the Year” by the 2011 Media Guardian Innovation Awards, Apache Hadoop is widely deployed at organizations around the globe, including industry leaders from across the Internet and social networking landscape such as Amazon Web Services, AOL, Apple, eBay, Facebook, Foursquare, HP, LinkedIn, Netflix, The New York Times, Rackspace, Twitter, and Yahoo!. Other technology leaders such as Microsoft and IBM have integrated Apache Hadoop into their offerings. Yahoo!, an early pioneer, hosts the world’s largest known Hadoop production environment to date, spanning more than 42,000 nodes. [Microsoft emphasis added.]
“Achieving the 1.0 release status is a momentous achievement from the Apache Hadoop community and the result of hard development work and shared learnings over the years,” said Jay Rossiter, senior vice president, Cloud Platform Group at Yahoo!. “Apache Hadoop will continue to be an important area of investment for Yahoo!. Today Hadoop powers every click at Yahoo!, helping to deliver personalized content and experiences to more than 700 million consumers worldwide.”
“Apache Hadoop is in use worldwide in many of the biggest and most innovative data applications,” said Eric Baldeschwieler, CEO of Hortonworks. “The v1.0 release combines proven scalability and reliability with security and other features that make Apache Hadoop truly enterprise-ready.”
“Gartner is seeing a steady increase in interest in Apache Hadoop and related “big data” technologies, as measured by substantial growth in client inquiries, dramatic rises in attendance at industry events, increasing financial investments and the introduction of products from leading data management and data integration software vendors,” said Merv Adrian, Research Vice President at Gartner, Inc. “The 1.0 release of Apache Hadoop marks a major milestone for this open source offering as enterprises across multiple industries begin to integrate it into their technology architecture plans.”
Apache Hadoop v1.0 reflects six years of development, production experience, extensive testing, and feedback from hundreds of knowledgeable users, data scientists, systems engineers, bringing a highly stable, enterprise-ready release of the fastest-growing big data platform.
It includes support for:
- HBase (sync and flush support for transaction logging)
- Security (strong authentication via Kerberos)
- Webhdfs (RESTful API to HDFS)
- Performance enhanced access to local files for HBase
- Other performance enhancements, bug fixes, and features
- All version 0.20.205 and prior 0.20.2xx features
“We are excited to celebrate Hadoop’s milestone achievement,” said William Lazzaro, Director of Engineering at Concurrent Computer Corporation. “Implementing Hadoop at Concurrent has enabled us to transform massive amounts of real-time data into actionable business insights, and we continue to look forward to the ever-improving iterations of Hadoop.”
“Hadoop, the first ubiquitous platform to emerge from the ongoing proliferation of Big Data and noSQL technologies, is set to make the transition from Web to Enterprise technology in 2012,” said James Governor, co-founder of RedMonk, “driven by adoption and integration by every major vendor in the commercial data analytics market. The Apache Software Foundation plays a crucial role in supporting the platform and its ecosystem.”
Availability and Oversight
As with all Apache products, Apache Hadoop software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. Apache Hadoop release notes, source code, documentation, and related resources are available at http://hadoop.apache.org/.
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world’s most popular Web server software. Through the ASF’s meritocratic process known as “The Apache Way,” more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation’s official user conference, trainings, and expo.
For more information, visit http://www.apache.org/.
Source: Apache
<Return to section navigation list>
SQL Azure Database and Reporting
•• Christina Storm of the SQL Azure Team (@chrissto) recommended that you Get your SQL Server database ready for SQL Azure! in a 1/6/2012 post to the Data Platform Insider blog:
One of our lab project teams was pretty busy while the rest of us were taking a break between Christmas and New Year’s here in Redmond. On January 3rd, their new lab went live: Microsoft Codename "SQL Azure Compatibility Assessment". This lab is an experimental cloud service targeted at database developers and admins who are considering migrating existing SQL Server databases into SQL Azure databases and want to know how easy or hard this process is going to be. SQL Azure, as you may already know, is a highly available and scalable cloud database service delivered from Microsoft’s datacenters. This lab helps in getting your SQL Server database cloud-ready by pointing out schema objects which are not supported in SQL Azure and need to be changed prior to the migration process. So if you are thinking about the cloud again coming out of a strong holiday season where some of your on-premises databases were getting tough to manage due to increased load, this lab may be worth checking out.
There are two steps involved in this lab:
- You first need to generate a .dacpac file from the database you’d like to check on with SQL Server Data Tools (SSDT) CTP4. SQL Server 2005, 2008, 2008 R2, 2012 (CTP or RC0) are supported.
- Next, you upload your .dacpac to the lab cloud service, which returns an assessment report, listing the schema objects that need to change before you can move that database to SQL Azure.
You find more information on the lab page for this project and in the online documentation. A step-by-step video tutorial will walk you through the process. Of course, we would love to hear feedback from you!
And we’re always interested in suggestions or ideas for other things you’d like to see on http://www.sqlazurelabs.com. You can use the feedback buttons on that page and send us a note, or visit http://www.mygreatwindowsazureidea.com/ and enter a new idea into any of the specific voting forums that are available there. There’s a dedicated voting forum for SQL Azure and many of its subareas. You can also place a vote for ideas that were posted if you spot something that matters to you. We appreciate your input! Follow @SQLAzureLabs on Twitter for news about new labs.
• Cihan Biyikoglu updated his Introduction to Fan-out Queries: Querying Multiple Federation Members with Federations in SQL Azure post of 12/29/2011 with code fixes on 1/4/2012:
Happy 2012 to all of you! 2011 has been a great year. We now have federations live in production with SQL Azure. So let’s chat about fanout querying.
Many applications will need to query all data across federation members. With version 1 of federations, SQL Azure does not yet provide much help on the server side for this scenario. What I’d like to do in this post is show you how you can implement this and what tricks you may need when post-processing the results from your fan-out queries. Let’s drill in.
Fanning-out Queries
Fan-out simply means taking the query to execute and executing that on all members. Fan-out queries can handle cases where you’d like to union the results from members or when you want to run additive aggregations across your members such as MAX or COUNT.
The mechanics of executing these queries over multiple members are fairly simple. The Federations-Utility sample application demonstrates how simple fan-out can be done on its Fan-out Query Utility page. The application code is available for viewing here. The app fans out a given query to all federation members of a federation. Here is the deployed version;
http://federationsutility-scus.cloudapp.net/FanoutQueryUtility.aspx
There is help available for how to use it on the page. The action starts with the button click event for the “Submit Fanout Query” button. Let’s quickly walk through the code first;
- First the connection is opened and three federation properties are initialized for constructing the USE FEDERATION statement: federation name, federation key name and the minimum value to take us to the first member.
49: // open connection
50: cn_sqlazure.Open();
51:
52: //get federation properties
53: str_federation_name = txt_federation_name.Text.ToString();
54:
55: //get distribution name
56: cm_federation_key_name.Parameters["@federationname"].Value = str_federation_name;
57: str_federation_key_name = cm_federation_key_name.ExecuteScalar().ToString();
58:
59: cm_first_federation_member_key_value.Parameters["@federationname"].Value = str_federation_name;
60: cm_next_federation_member_key_value.Parameters["@federationname"].Value = str_federation_name;
61:
62: //start from the first member with the absolute minimum value
63: str_next_federation_member_key_value = cm_first_federation_member_key_value.ExecuteScalar().ToString();
- In the loop, the app constructs and executes the USE FEDERATION routing statement using the above three properties and through each iteration connect to the next member in line.
67: //construct command to route to next member
68: cm_routing.CommandText = string.Format("USE FEDERATION {0}({1}={2}) WITH RESET, FILTERING=OFF", str_federation_name , str_federation_key_name , str_next_federation_member_key_value);
69:
70: //route to the next member
71: cm_routing.ExecuteNonQuery();
- Once the connection to the member is established, the query is executed through the DataAdapter.Fill method. The great thing about the DataAdapter.Fill method is that it automatically appends or merges the rows into the DataSet.DataTables, so as we iterate over the members there is no additional work to do in the DataSet.
76: //get results into dataset
77: da_adhocsql.Fill(ds_adhocsql);
- Once the execution of the query in the current member is complete, the app grabs the range_high of the current federation member. Using this value, in the next iteration the app navigates to the next federation member. The value is discovered through “select cast(range_high as nvarchar) from sys.federation_member_distributions”
86: //get the value to navigate to the next member
87: str_next_federation_member_key_value = cm_next_federation_member_key_value.ExecuteScalar().ToString();
- The condition for the loop is defined at line #89. It simply expresses looping until the range_high value returns NULL.
89: while (str_next_federation_member_key_value != String.Empty);
Fairly simple!
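[Pulling the numbered fragments above together, the fan-out loop amounts to something like the following sketch. It is reconstructed from the snippets and the sys.federation_member_distributions query quoted above, not copied from the sample’s source, so treat the connection string, query text and hard-coded federation metadata as placeholders:]

using System.Data;
using System.Data.SqlClient;

// fanoutQueryText is the query entered on the page; rootDatabaseConnectionString
// points at the federation root database (both placeholders).
using (var cn = new SqlConnection(rootDatabaseConnectionString))
{
    cn.Open();

    string federationName = "Blogs_Federation";
    string federationKeyName = "id";   // looked up from the federation metadata in the sample
    string nextMemberKeyValue = "-9223372036854775808";   // minimum bigint routes to the first member

    var results = new DataSet();
    do
    {
        // Route the connection to the member covering the current key value.
        var routing = cn.CreateCommand();
        routing.CommandText = string.Format(
            "USE FEDERATION {0}({1}={2}) WITH RESET, FILTERING=OFF",
            federationName, federationKeyName, nextMemberKeyValue);
        routing.ExecuteNonQuery();

        // Run the fan-out query in this member; Fill appends the rows to the DataSet.
        new SqlDataAdapter(fanoutQueryText, cn).Fill(results);

        // range_high of this member is where the next member starts (NULL on the last member).
        var next = cn.CreateCommand();
        next.CommandText = "SELECT CAST(range_high AS nvarchar) FROM sys.federation_member_distributions";
        nextMemberKeyValue = next.ExecuteScalar().ToString();
    }
    while (nextMemberKeyValue != string.Empty);
}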
What can you do with this tool? You can use the tool to do schema deployments to federations; to deploy schema to all my federation members, I simply put a DDL statement in the query window… and it will run it in all federation members;
DROP TABLE language_code_tbl;
CREATE TABLE language_code_tbl(
id bigint primary key,
name nchar(256) not null,
code nchar(256) not null);
I can also maintain reference data in federation members with this tool; Simply do the necessary CRUD to get the data into a new shape or simply delete and reinsert language_codes;
TRUNCATE TABLE language_code_tbl
INSERT INTO language_code_tbl VALUES(1,'US English','EN-US')
INSERT INTO language_code_tbl VALUES(2,'UK English','EN-UK')
INSERT INTO language_code_tbl VALUES(3,'CA English','EN-CA')
INSERT INTO language_code_tbl VALUES(4,'SA English','EN-SA')
Here is how to get the database names and database ids for all my federation members;
SELECT db_name(), db_id()
Here is what the output looks like from the tool;
Here are more ways to gather information from all federation members: This will capture information on connections to all my federation members;
SELECT b.member_id, a.*
FROM sys.dm_exec_sessions a CROSS APPLY sys.federation_member_distributions b…
Or I can do maintenance with stored procedures kicked off in all federation members. For example, here is how to update statistics in all my federation members;
EXEC sp_updatestats
For querying user data: Well, I can do queries that ‘Union All’ the results for me. Something like the following query where I get blog_ids and blog_titles for everyone who blogged about ‘Azure’ from my blogs_federation. By the way, you can find the full schema at the bottom of the post under the title ‘Sample Schema’.
SELECT b.blog_id, b.blog_title
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE blog_entry_title LIKE '%Azure%'
I can also do aggregations as long as grouping involves the federation key. That way I know all data that belongs to each group is only in one member. For the following query, my federation key is blog_id and I am looking for the count of blog entries about ‘Azure’ per blog.
SELECT b.blog_id, COUNT(be.blog_entry_title), MAX(be.created_date)
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE be.blog_entry_title LIKE '%Azure%'
GROUP BY b.blog_id
However, if I would like to get a grouping that does not align with the federation key, there is work to do. Here is an example: I’d like to get the count of blog entries about ‘Azure’ that are created between Aug (8th month) and Dec (12th month) of the year.
SELECT DATEPART(mm, be.created_date), COUNT(be.blog_entry_title)
FROM blogs_tbl b JOIN blog_entries_tbl be
ON b.blog_id=be.blog_id
WHERE DATEPART(mm, be.created_date) between 8 and 12
AND be.blog_entry_title LIKE '%Azure%'
GROUP BY DATEPART(mm, be.created_date)the output contains a whole bunch of counts for Aug to Oct (8 to 10) form different members. Here is how it looks like;
How about something like the DISTINCT count of languages used for blog comments across our entire dataset? Here is the query;
SELECT DISTINCT bec.language_id
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc ON bec.language_id=lc.language_id
‘Order By’ and ‘Top’ clauses have similar issues. Here are a few examples. With ‘Order by’ I don’t get a fully sorted output. The ordering issue is very common given that most apps do fan-out queries in parallel across all members. Here is a TOP 5 query returning many more than 5 rows;
SELECT TOP 5 COUNT(*)
FROM blog_entry_comments_tbl
GROUP BY blog_id
ORDER BY 1
I added the CROSS APPLY to sys.federation_member_distributions to show you the results are coming from various members. So the second column in the output below is the RANGE_LOW of the member the result is coming from.
The bad news is there is post-processing to do on this result before I can use it. The good news is there are many options that ease this type of processing. I’ll list a few, and I will explore them in more detail in the future;
- LINQ to DataSet offers a great option for querying datasets. Some examples here.
- ADO.Net DataSets and DataTables offer a number of options for processing group-bys and aggregates. For example, DataColumn.Expression allows you to add aggregate expressions to your DataSet for this type of processing. Or you can use DataTable.Compute for processing a rollup value.
There are others; some went as far as sending the resulting DataSet back to SQL Azure for post-processing using a second T-SQL statement. Certainly costly, but they argue T-SQL is more powerful.
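[As a concrete illustration of the first option above, here is a hedged LINQ to DataSet sketch (not from Cihan’s post) that re-applies the TOP 5 / ORDER BY of the earlier example on the client after the fan-out results have been merged; the column position is an assumption based on that query’s shape:]

using System;
using System.Data;
using System.Linq;   // LINQ to DataSet: add a reference to System.Data.DataSetExtensions

// ds_adhocsql is the DataSet filled by the fan-out loop above; for the earlier
// "SELECT TOP 5 COUNT(*) ... ORDER BY 1" example its single table holds one unnamed
// count column (plus the RANGE_LOW column if the CROSS APPLY is included).
DataTable merged = ds_adhocsql.Tables[0];

// Re-apply the ORDER BY / TOP 5 across the rows gathered from *all* members.
var top5 = merged.AsEnumerable()
                 .OrderBy(row => row.Field<int>(0))
                 .Take(5)
                 .ToList();

foreach (DataRow row in top5)
{
    Console.WriteLine(row.Field<int>(0));
}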
So far we saw that it is fairly easy to iterate through the members and repeat the same query: this is fan-out querying. We saw a few sample queries with ADO.Net that don’t need any additional work to process. Once grouping and aggregates are involved, there is some processing to do.
The post wouldn’t be complete without mentioning a class of aggregations that is harder to calculate: the non-additive aggregates. Here is an example; this query calculates a distinct count of languages in blog entry comments per month.
SELECT DATEPART(mm,created_date), COUNT(DISTINCT bec.language_id)
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc
ON bec.language_id=lc.language_id
GROUP BY DATEPART(mm,created_date)
The processing of DISTINCT is the issue here. Given this query, I cannot do any post-processing to calculate the correct distinct count across all members. These queries lend themselves to centralized processing, so all comment language ids first need to be grouped across all members, and then the distinct count can be calculated on that resultset.
You would rewrite the TSQL portion of the query like this;
SELECT DISTINCT DATEPART(mm,created_date), bec.language_id
FROM blog_entry_comments_tbl bec JOIN language_code_tbl lc
ON bec.language_id=lc.language_id
The output looks like this;
With the results of this query, I can do a DISTINCT COUNT on the grouping of months. If we take the DataTable from the above query, in TSQL terms the post-processing query would look like;
SELECT month, COUNT(DISTINCT language_id)
FROM DataTable
GROUP BY month
So to wrap up, fan-out queries are the way to execute queries across your federation members. With fan-out queries, there are two parts to consider: the part you execute in each federation member, and the final phase where you collapse the results from all members into a single result set. Does this sound familiar? If you were thinking map/reduce, you got it right! The great thing about fan-out querying is that it can be done completely in parallel. All the member queries are executed by separate databases in SQL Azure. The downside is that in certain cases you now need to consider writing two queries instead of one, so there is some added complexity for some queries. In the future I hope we can reduce that complexity. Please let me know if you have feedback about that and everything else in this post; just comment on this blog post or contact me through the link that says ‘contact the author’.
Thanks and happy 2012!
Sample Schema
Here is the schema I used for the sample queries above.
-- Connect to BlogsRUs_DB
CREATE FEDERATION Blogs_Federation(id bigint RANGE)
GO
USE FEDERATION blogs_federation (id=-1) WITH RESET, FILTERING=OFF
GO
CREATE TABLE blogs_tbl(
blog_id bigint not null,
user_id bigint not null,
blog_title nchar(256) not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1,
primary key (blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE blog_entries_tbl(
blog_id bigint not null,
blog_entry_id bigint not null,
blog_entry_title nchar(256) not null,
blog_entry_text nchar(2000) not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1,
blog_style bigint null,
primary key (blog_entry_id,blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE blog_entry_comments_tbl(
blog_id bigint not null,
blog_entry_id bigint not null,
blog_comment_id bigint not null,
blog_comment_title nchar(256) not null,
blog_comment_text nchar(2000) not null,
user_id bigint not null,
created_date datetimeoffset not null DEFAULT getdate(),
updated_date datetimeoffset not null DEFAULT getdate(),
language_id bigint not null default 1,
primary key (blog_comment_id,blog_entry_id,blog_id)
)
FEDERATED ON (id=blog_id)
GO
CREATE TABLE language_code_tbl(
language_id bigint primary key,
name nchar(256) not null,
code nchar(256) not null
)
GO
Avkash Chauhan (@avkashchauhan) recommended that you Assess your SQL Server to SQL Azure migration using SQL Azure Compatibility Assessment Tool by SQL Azure Labs in a 1/4/2012 post:
The SQL Azure team announced today the release of a new experimental cloud service, “SQL Azure Compatibility Assessment”. This tool was created by SQL Azure Labs, and if you are considering moving your SQL Server databases to SQL Azure, you can use this assessment service to check whether your database schema is compatible with SQL Azure grammar. This service is very easy to use and does not require an Azure account.
"SQL Azure Compatibility Assessment" is an experimental cloud service. It’s aimed at database administrators who are considering migrating their existing SQL Server databases to SQL Azure. This service is super easy to use with just a Windows Live ID. Here are the steps:
- Generate a .dacpac file from your database using SQL Server Data Tools (SSDT) CTP4. You can either run SqlPackage.exe or import the database into an SSDT project and then build it to generate a .dacpac. SQL Server 2005, 2008, 2008 R2, 2012 (CTP or RC0) are all supported.
- Upload your .dacpac to the "SQL Azure Compatibility Assessment" cloud service and receive an assessment report, which lists the schema objects that are not supported in SQL Azure and that need to be fixed prior to migration.
Learn more about Microsoft Codename "SQL Azure Compatibility Assessment" in this great video:
Learn Technical details about "SQL Azure Compatibility Assessment" tool:
SqlPackage.exe is a command line utility that automates the following database development tasks:
- Extraction: creating a database snapshot (.dacpac) file from a live SQL Azure database
- Publishing: incrementally updates a database schema to match the schema of a source database
- Report: creates reports of the changes that result from publishing a database schema
- Script: creates a Transact-SQL script that publishes a database schema
I intend to use this new tool after I bulk upload my 10+ GB tab-separated text file to SQL Server 2008 R2 in preparation for a move to SQL Azure federations.
<Return to section navigation list>
Marketplace DataMarket, Social Analytics and OData
• Robin Van Steenburgh posted Decoding image blobs obtained from OData query services by Arthur Greef on 1/5/2012:
This post on the OData Query Service is by Principal Software Architect Arthur Greef.
Images retrieved using OData query services are returned as base 64 encoded containers. The following code will decode the image container into a byte array that contains just the image.
using System;
using System.IO;
using System.Linq;

string base64EncodedString = "<place your base64 encoded blob here>";
byte[] serializedContainer = Convert.FromBase64String(base64EncodedString);
byte[] modifiedSerializedContainer = null;
// The container wraps the image bytes with a 7-byte header; strip it when the
// signature bytes match (otherwise modifiedSerializedContainer stays null).
if (serializedContainer.Length >= 7 &&
    serializedContainer.Take(3).SequenceEqual(new byte[] { 0x07, 0xFD, 0x30 }))
{
    modifiedSerializedContainer = serializedContainer.Skip(7).ToArray();
}
File.WriteAllBytes("<image file name here>", modifiedSerializedContainer);
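[Not part of Arthur’s post: if you want to materialize the decoded bytes as an image object rather than writing them straight to a file, a minimal sketch, assuming the byte array holds a standard bitmap format such as PNG or JPEG, might look like this:]

// In addition to the namespaces above, add a reference to System.Drawing.dll.
using System.Drawing;
using System.Drawing.Imaging;

byte[] imageBytes = modifiedSerializedContainer;   // produced by the snippet above
using (var stream = new MemoryStream(imageBytes))
using (Image image = Image.FromStream(stream))
{
    Console.WriteLine("Decoded image: {0} x {1}", image.Width, image.Height);
    image.Save("decoded-copy.png", ImageFormat.Png);
}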
Joe McKendrick asserted “Growing interest in the cloud, and a desire for more vendor independence, may drive REST deeper into the enterprise BPM space” as a deck for his REST will free business process management from its shackles: prediction article of 1/3/2012 for ZDNet’s ServiceOriented blog:
The REST protocol has been known for its ability to link Web processes in its lightweight way, but at least one analyst sees a role far deeper in the enterprise. This year, we’ll see REST increasingly deployed to pull together business processes.
ZapThink’s Jason Bloomberg predicts the rise of RESTful business process management, which will enable putting together workflows on the fly, without the intervention of heavyweight BPM engines.
People haven’t really been employing REST to its full potential, Jason says. REST is more than APIs, it’s an “architectural style for building hypermedia applications,” which are more than glorified Web sites. REST is a runtime workflow, he explains. It changes the way we might look at BPM tools, which typically have been “heavyweight, integration-centric tools that typically rely upon layers of infrastructure.”
“With REST, however, hypermedia become the engine of application state, freeing us from relying upon centralized state engines, instead allowing us to take full advantage of cloud elasticity, as well as the inherent simplicity of REST.”
The shift will take time, beyond 2012. But growing interest in the cloud, and a desire for more vendor independence, mean it’s time for RESTful BPM.
As goes REST, so goes OData.
<Return to section navigation list>
Windows Azure Access Control, Service Bus, EAI Bridge, and Cache
• Ralph Squillace complained about Understanding Azure Namespaces, a bleg on 1/6/2012:
I've been busy recently writing sample applications that use Windows Azure Service Bus and Windows Azure web roles and Windows Azure (damn I'm tired of typing brand names now) Access Control Service namespaces. And each time I do, I create a new namespace and often I reuse namespaces for general services in what used to be called AppFabric -- the Service Bus, Access Control, and Caching.
If I'm creating a set of applications, and all I need is access control to that application, I'll need to use an ACS namespace to configure that application as a relying party (which I would call a relying or dependent application, but I don't write the names). But do I use an ACS namespace that is already in use? Do I create a new ACS namespace just for that application? Or are there categories of people or clients that I can group into an ACS namespace, trusted by one or more applications LIKE those used by those people or clients?
I'm curious what you do, but I'm doing more work figuring out the best way to go here from the perspective of the real world. More soon.
• Richard Seroter (@rseroter) wrote Microsoft Previews Windows Azure Application Integration Services and InfoQ published it, together with an interview with Microsoft’s Itai Raz on 1/6/2012:
In late December 2011, Microsoft announced the pre-release of a set of services labeled Windows Azure Service Bus EAI Labs. These enhancements to the existing Windows Azure Service Bus make it easier to connect (cloud) applications through the use of message routing rules, protocol bridging, message transformation services and connectivity to on-premises line of business systems.
There are three major components of the Windows Azure Service Bus EAI software. The first is referred to as EAI Bridges. Bridges form a messaging layer between multiple applications and support content-based routing rules for choosing a message destination. While a Bridge hosted in Windows Azure can only receive XML messages via HTTP, it can send its XML output to HTTP endpoints, Service Bus Topics, Service Bus Queues or to other Bridges. Developers can use the multiple stages of an XML Bridge to validate messages against a schema, enrich them with reference data, or transform them from one structure to another.
Transforms are the second component of the Service Bus EAI Labs release and are targeted at developers who need to change the XML structure of data as it moves between applications. To create these transformations that run in Windows Azure, Microsoft is providing a visual XSLT mapping tool that is reminiscent of a similar tool that ships with Microsoft’s BizTalk Server integration product. However, this new XSLT mapper has a richer set of canned operations that developers can use to manipulate the XML data. Besides basic string manipulation and arithmetic operators, this mapping tool also provides more advanced capabilities like storing state in user-defined lists, and performing If-Then-Else assessments when deciding how to populate the destination message. There is not yet word from Microsoft on whether they will support the creation of custom functions for these Transforms.
The final major component of this Labs release is called Service Bus Connect. This appears to build on two existing Microsoft products: the Windows Azure Service Bus Relay Service and the BizTalk Adapter Pack. Service Bus Connect is advertised as a way for cloud applications to securely communicate with onsite line of business systems like SAP, Siebel, and Oracle E-Business Suite as well as SQL Server and Oracle data repositories. Developers create what are called Line of Business Relays that make internal business data or functions readily accessible via secure Azure Service Bus endpoint.
Microsoft has released a set of tools and templates for Microsoft Visual Studio that facilitate creation of Service Bus EAI solutions. A number of Microsoft MVPs authored blog posts that showed how to build projects based on each major Service Bus EAI component. Mikael Hakansson explained how to configure a Bridge that leveraged content-based routing, Kent Weare demonstrated the new XSLT mapping tool, and Steef-Jan Wiggers showed how to use Service Bus Connect to publicly expose an Oracle database.
…
The InfoQ article concludes with the Itai Raz interview. Thanks to Rick G. Garibay for the heads-up.
• Maarten Balliauw (@maartenballiauw) asked How do you synchronize a million to-do lists? in a 1/5/2012 post:
Not this question, but a similar one, has been asked by one of our customers. An interesting question, isn’t it? Wait. It gets more interesting. I’ll sketch a fake scenario that’s similar to our customer’s question. Imagine you are building mobile applications to manage a simple to-do list. This software is available on Android, iPhone, iPad, Windows Phone 7 and via a web browser. One day, the decision to share to-do lists has been made. Me and my wife should be able to share one to-do list between us, having an up-to-date version of the list on every device we grant access to this to-do list. Now imagine there are a million of those groups, where every partner in the sync relationship has the latest version of the list on his device. In often a disconnected world.
My take: Windows Azure Service Bus Topics & Subscriptions
According to the Windows Azure Service Bus product description, it “implements a publish/subscribe pattern that delivers a highly scalable, flexible, and cost-effective way to publish messages from an application and deliver them to multiple subscribers.“ Interesting. I’m not going into the specifics of it (maybe in a next post), but the Windows Azure Service Bus gave me an idea: why not put all actions (add an item, complete a to-do) on a queue, tagged with the appropriate “group” metadata? Here’s the producer side:
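[Maarten illustrates the producer side with a code screenshot in the original post; the following is a minimal hedged sketch of the idea, not his code, using the Windows Azure Service Bus managed API (Microsoft.ServiceBus.Messaging). The namespace, credentials, topic name and SyncGroup value are placeholders:]

using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

// Connect to the Service Bus namespace (placeholder namespace and issuer credentials).
var address = ServiceBusEnvironment.CreateServiceUri("sb", "mytodonamespace", string.Empty);
var tokenProvider = TokenProvider.CreateSharedSecretTokenProvider("owner", "<issuer key>");
var factory = MessagingFactory.Create(address, tokenProvider);

// Every action (add an item, complete a to-do) becomes a message on one topic,
// tagged with the sync group it belongs to.
var topicClient = factory.CreateTopicClient("todoupdates");
var message = new BrokeredMessage("Buy milk");
message.Properties["SyncGroup"] = "maarten-family";
message.Properties["Action"] = "AddItem";
topicClient.Send(message);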
On the consumer side, our devices are listening as well. Every device creates its subscription on the service bus topic. These subscriptions are named per device and filtered on the SyncGroup metadata. The Windows Azure Service Bus will take care of duplicating messages to every subscription as well as keeping track of messages that have not been processed: if I’m offline, messages are queued. If I’m online, I receive messages targeted at my device:
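[Again a hedged sketch rather than Maarten’s code: each device creates its own subscription filtered on the SyncGroup property, then receives whatever queued up while it was offline:]

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

var address = ServiceBusEnvironment.CreateServiceUri("sb", "mytodonamespace", string.Empty);
var tokenProvider = TokenProvider.CreateSharedSecretTokenProvider("owner", "<issuer key>");

// One subscription per device, filtered on the sync group it participates in.
var namespaceManager = new NamespaceManager(address, tokenProvider);
if (!namespaceManager.SubscriptionExists("todoupdates", "maartens-phone"))
{
    namespaceManager.CreateSubscription("todoupdates", "maartens-phone",
        new SqlFilter("SyncGroup = 'maarten-family'"));
}

var factory = MessagingFactory.Create(address, tokenProvider);
var subscriptionClient = factory.CreateSubscriptionClient("todoupdates", "maartens-phone");

// Messages queue up while the device is offline and are delivered when it reconnects.
BrokeredMessage received;
while ((received = subscriptionClient.Receive(TimeSpan.FromSeconds(5))) != null)
{
    var action = (string)received.Properties["Action"];
    var item = received.GetBody<string>();
    // ... apply the action to the local copy of the to-do list ...
    received.Complete();
}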
The only limitation to this is keeping the number of topics & subscriptions below the limits of Windows Azure Service Bus. But even then: if I just make sure every sync group is on the same bus, I can scale out over multiple service buses.
How would you solve the problem sketched? Comments are very welcomed!
• Jim O’Neil continued his series with Photo Mosaics Part 9: Caching Analysis on 1/5/2012:
When originally drafting the previous post in this series, I’d intended to include a short write-up comparing the performance and time-to-completion of a photo mosaic when using the default in-memory method versus Windows Azure Caching. As I got further and further into it though, I realized there’s somewhat of a spectrum of options here, and that cost, in addition to performance, is something to consider. So, at the last minute, I decided to focus a blog post on some of the tradeoffs and observations I made while trying out various options in my Azure Photo Mosaics application.
Methodology
While I’d stop short of calling my testing ‘scientific,’ I did make every attempt to carry out the tests in a consistent fashion and eliminate as many variables in the executions as possible. To abstract the type of caching (ranging from none to in-memory), I added a configuration parameter, named CachingMechanism, to the ImageProcessor Worker Role. The parameter takes one of four values, defined by the following enumeration in Tile.vb
Public Enum CachingMechanism
None = 0
BlobStorage = 1
AppFabric = 2
InRole = 3
End Enum
The values are interpreted as follows (a short sketch of the AppFabric option follows the list):
- None: no caching used, every request for a tile image requires pulling the original image from blob storage and re-creating the tile in the requested size.
- BlobStorage: a ‘temporary’ blob container is created to store all tile images in the requested tile size. Once a tile has been generated, subsequent requests for that tile are drawn from the secondary container. The tile thumbnail is generated only once; however, each request for the thumbnail does still require a transaction to blob storage.
- AppFabric: Windows Azure Caching is used to store the tile thumbnails. Whether the local cache is used depends on the cache configuration within the app.config file.
- InRole: tile thumbnail images are stored in memory within each instance of the ImageProcessor Worker Role. This was the original implementation and is bounded by the amount of memory within the VM hosting the role.
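[A hedged C# sketch, not the actual Photo Mosaics source, of how the AppFabric option reads a tile thumbnail through Windows Azure Caching; the class and helper names are hypothetical:]

using Microsoft.ApplicationServer.Caching;

public class TileThumbnailCache
{
    // Reads the <dataCacheClient> settings (including the optional local cache) from app.config.
    private static readonly DataCacheFactory Factory = new DataCacheFactory();
    private static readonly DataCache Cache = Factory.GetDefaultCache();

    public byte[] GetTileThumbnail(string tileName, int tileSize)
    {
        string key = tileName + "_" + tileSize;   // one cache entry per tile per requested size
        byte[] thumbnail = Cache.Get(key) as byte[];
        if (thumbnail == null)
        {
            // Cache miss: pull the original from the tile library container, resize it,
            // and put it into the cache for subsequent requests.
            thumbnail = LoadAndResizeTileFromBlobStorage(tileName, tileSize);
            Cache.Put(key, thumbnail);
        }
        return thumbnail;
    }

    // Hypothetical helper standing in for the role's blob-storage and resizing logic.
    private byte[] LoadAndResizeTileFromBlobStorage(string tileName, int tileSize)
    {
        throw new System.NotImplementedException();
    }
}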
To simplify the analysis, the ColorValue of each tile (a 32-bit integer representing the average color of that tile) is calculated just once and then always cached “in-role” within the collection of Tile objects owned by the ImageLibrary.
For each request, the same inputs were provided from the client application and are shown in the annotated screen shot to the right:
- Original image: Lighthouse.jpg from the Windows 7 sample images (size: 1024 x 768)
- Image library: 800 images pulled from Flickr’s ‘interestingness’ feed. Images were requested as square thumbnails (default 75 pixels square)
- Tile size: 16 x 16 (yielding output image of size 16384 x 12288)
- Original image divided into six (6) slices to be apportioned among the three ImageProcessor roles (small VMs); each slice is the same size, 1024 x 128.
These inputs were submitted four times each for the various CachingMechanism values, including twice for the AppFabric value (Windows Azure Caching), once leveraging local (VM) cache with a time-to-live value of six hours and once without using the local cache.
All image generations were completed sequentially, and each ImageProcessor Worker Role was rebooted prior to running the test series for a new CachingMechanism configuration. This approach was used to yield values for a cold-start (first execution) versus a steady-state configuration (fourth execution).
Performance Results
The existing code already records status (in Windows Azure tables) as each slice is completed, so I created two charts displaying that information. The first chart shows the time-to-completion of each slice for the “cold start” execution for each CachingMechanism configuration, and the second chart shows the “steady state” execution.
In each series, the same role VM is shown in the same relative position. For example, the first bar in each series represents the original image slice processed by the role identified by AzureImageProcessor_IN_0, the second by AzureImageProcessor_IN_1, and the third by AzureImageProcessor_IN_2. Since there were six slice requests, each role instance processes two slices of the image, so the fourth bar represents the second slice processed by AzureImageProcessor_IN_0, the fifth bar corresponds to the second slice processed by AzureImageProcessor_IN_1, and the sixth bar represents the second slice processed by AzureImageProcessor_IN_2.
There is, of course, additional processing in the application that is not accounted for by the ImageProcessor Worker Role. The splitting of the original image into slices, the dispatch of tasks to the various queues, and the stitching of the final photo mosaic are the most notable. Those activities are carried out in an identical fashion regardless of the CachingMechanism used, so to simplify the analysis their associated costs (both economic and performance) are largely factored out of the following discussion.
Costs
System at Rest
There are a number of costs for storage and compute that are consistent across the five different scenarios. The costs in the table below occur when the application is completely inactive.
Note, there are some optimizations that could be made:
- Three instances of ImageProcessor are not needed if the system is idle, so using some dynamic scaling techniques could lower the cost. That said, there is only one instance each of ClientInterface and JobController, and for this system to qualify for the Azure Compute SLA there would need to be a minimum of six instances, two of each type, bringing the compute cost up about another $87 per month.
- Although the costs to poll the queues are rather insignificant, one optimization technique is to use an exponential back-off algorithm versus polling every 10 seconds.
Client Request Costs
Each request to create a photo mosaic brings in additional costs for storage and transactions. Some of the storage needs are very short term (queue messages) while others are more persistent, so the cost for one image conversion is a bit difficult to ascertain. To arrive at an average cost per request, I’d want to estimate usage on the system over a period of time, say a week or month, to determine the number of requests made, the average original image size, and the time for processing (and therefore storage retention). The chart below does list the more significant factors I would need to consider given an average image size and Windows Azure service configuration.
Note there are no bandwidth costs, since the compute and storage resources are collocated in the same Windows Azure data center (and there is never a charge for data ingress, for instance, submitting the original image for processing). The only bandwidth costs would be attributable to downloading the finished product.
I have also enabled storage analytics, so that’s an additional cost not included above, but since it’s tangential to the topic at hand, I factored that out of the discussion as well.
Image Library (Tile) Access Costs
Now let’s take a deeper look at the costs for the various caching scenarios, which draws our focus to the storage of tile images and the associated transaction costs. The table below summarizes the requirements for our candidate Lighthouse image, where:
- p is the number of pixels in the original image (786,432),
- t is the number of image tiles in the library (800), and
- s is the number of slices into which the original images was partitioned (6).
1 Caching transaction costs cover the initial loading of the tile images from blob storage into the cache, which technically needs to occur only once regardless of the number of source images processed. Keep in mind though that cache expiration and eviction policies also apply, so images may need to be periodically reloaded into cache, but at nowhere near the rate when directly accessing blob storage.
2 Assumes 128MB cache, plus the miniscule amount of persistent blob container storage. While 128MB fulfills space needs, other resource constraints may necessitate a larger cache size (as discussed below).
Observations
1: Throttling is real
The storage requirements for the 800 Flickr images, each resized to 16 x 16, amount to 560KB. Even with the serialization overhead in Windows Azure Caching (empirically a factor of three in this case) there is plenty of space to store all of the sized tiles in a 128MB cache.
Consider that each original image slice (1024 x 128) yields 131072 pixels, each of which will be replaced in the generated image with one of the 800 tiles in the cache. That’s 131072 transactions to the cache for each slice, and a total of 768K transactions to the cache to generate just one photo mosaic.
Now take a look at the hourly limits for each cache size, and you can probably imagine what happened when I initially tested with a 128MB cache without local cache enabled.
The first three slices generated fine, and on the other three slices (which were running concurrently on different ImageProcessor instances) I received the following exception:
ErrorCode<ERRCA0017>:SubStatus<ES0009>:There is a temporary failure. Please retry later. (The request failed, because you exceeded quota limits for this hour. If you experience this often, upgrade your subscription to a higher one). Additional Information : Throttling due to resource : Transactions.
In my last post, I mentioned this possibility and the need to code defensively, but of course I hadn’t taken my own advice! At this point, I had three options:
1. reissue the request directly to the original storage location whenever the exception is caught (recall that a throttling exception can be identified by checking the DataCacheException SubStatus for the value DataCacheErrorSubStatus.QuotaExceeded [9]); a minimal sketch of this fallback appears just after this list,
2. wait until the transaction ‘odometer’ is reset the next hour (essentially bubble the throttling up to the application level), or
3. provision a bigger cache.
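As a rough illustration of the first option, here is a minimal C# sketch (mine, not part of the original post) of catching the throttling exception and falling back to blob storage; the cache, tile container, and tile-name parameters are assumed to be supplied by the surrounding application:

using Microsoft.ApplicationServer.Caching;
using Microsoft.WindowsAzure.StorageClient;

static class TileSource
{
    // Hypothetical helper: try the distributed cache first; if the request is
    // throttled because the hourly transaction quota was exceeded, fall back to
    // reading the tile directly from its original blob.
    public static byte[] GetTile(DataCache cache, CloudBlobContainer tileContainer, string tileName)
    {
        try
        {
            byte[] tile = cache.Get(tileName) as byte[];
            if (tile != null)
                return tile;
        }
        catch (DataCacheException ex)
        {
            // Anything other than quota throttling is unexpected here.
            if (ex.SubStatus != DataCacheErrorSubStatus.QuotaExceeded)
                throw;
        }

        // Throttled (or simply not cached yet): go straight to blob storage.
        return tileContainer.GetBlobReference(tileName).DownloadByteArray();
    }
}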
Well, given the goal of the experiment, the third option was what I went with – a 512MB cache, not for the space but for the concomitant transactions-per-hour allocation. That’s not free “in the real world” of course, and would result in a 66% monthly cost increase (from $45 to $75), and that’s just to process a single image! That observation alone should have you wondering whether using Windows Azure Caching in this way for this application is really viable.
Or should it??… The throttling limitation comes into play when the distributed cache itself is accessed, not if the local cache can fulfill the request. With the local cache option enabled and the current data set of 800 images of about 1.5MB total in size, each ImageProcessor role can service the requests intra-VM, from the local cache, with the result that no exceptions are logged even when using a 128MB cache. Each role instance does have to hit the distributed cache at least once in order to populate its local cache, but that’s only 800 transactions per instance, far below the throttling thresholds.
2: Local caching is pretty efficient
I was surprised at how local caching compares to the “in-role” mechanism in terms of performance; the raw values when utilizing local caching are almost always better (although perhaps not to a statistically significant level). While both are essentially in-memory caches within the same memory space, I would have expected a little overhead for the local cache, for deserialization if nothing else, when compared to direct access from within the application code.
The other bonus here of course is that hits against the local cache do not count toward the hourly transaction limit, so if you can get away with the additional staleness inherent in the local cache, it may enable you to leverage a lower tier of cache size and, therefore, save some bucks!
How much can I store in the local cache? There are no hard limits on the local cache size (other than the available memory on the VM). The size of the local cache is controlled by an objectCount property configurable in the app.config or web.config of the associated Windows Azure role. That property has a default of 10,000 objects, so you will need to do some math to determine how many objects you can cache locally within the memory available for the selected role size.
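For reference, the same local cache settings can also be applied in code rather than in the .config file. The sketch below is illustrative only: it assumes a dataCacheClient section named "default" already supplies the cache endpoint and security settings, and it uses the 10,000-object default and a six-hour timeout as example values:

using System;
using Microsoft.ApplicationServer.Caching;

class LocalCacheSetup
{
    static DataCache CreateCache()
    {
        // Start from the "default" dataCacheClient section in app.config/web.config
        // (which supplies the cache endpoint and security settings), then turn on a
        // time-based local cache: up to 10,000 objects, kept for 6 hours.
        DataCacheFactoryConfiguration config = new DataCacheFactoryConfiguration("default");
        config.LocalCacheProperties = new DataCacheLocalCacheProperties(
            10000,                   // objectCount (the default mentioned above)
            TimeSpan.FromHours(6),   // same 6-hour TTL as the 21,600-second setting used later
            DataCacheLocalCacheInvalidationPolicy.TimeoutBased);

        return new DataCacheFactory(config).GetDefaultCache();
    }
}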
In my current application configuration, each ImageProcessor instance is a Small VM with 1.75GB of memory, and the space requirement for the serialized cache data is about 1.5MB, so there’s plenty of room to store the 800 tile images (and then some) without worrying about evictions due to space constraints.
3: Be cognizant of cache timeouts
The default time-to-live on a local cache is 300 seconds (five minutes). In my first attempts to analyze behavior, I was a bit puzzled by what seemed to be inconsistent performance. In running the series of tests that exercise the local cache feature, I had not been circumspect in the amount of time between executions, so in some cases the local cache had expired (resulting in a request back to the distributed cache) and in some cases the image tile was served directly from the VM’s memory. This all occurs transparently, of course, but you may want to experiment with the TTL property of your caches depending on the access patterns demonstrated by your applications.
Below is what I ultimately set it to for my testing: 21,600 seconds = 6 hours.
<localCache isEnabled="true" ttlValue="21600" objectCount="10000"/>
Timeout policy indirectly affects throttling. An aggressive ttlValue will tend to increase the number of transactions to the distributed cache (which counts against your hourly transaction limit), so you will want to balance the need for local cache currency against the economic and programmatic complexities introduced by throttling limits.
4: The second use of a role instance performs better
Recall that the role with ID 0 is responsible for both Slice 0 and Slice 3, ID 1 for Slices 1 and 4, and ID 2 for Slices 2 and 5. Looking at the graphs (see callout to the right), note that the second use of each role instance within a single execution almost always results in a shorter execution time. This is the case for the first run, where you might attribute the difference to a warm-up cost for each role instance (seeding the cache, for example), but it’s also found in the steady-state run and in scenarios where caching was not used and where the role processing should have been identical.
Unfortunately, I didn’t take the time to do any profiling of my Windows Azure Application while running these tests, but it may be something I revisit to solve the mystery.
5: Caching pits economics versus performance
Let’s face it, caching looks expensive; the entry-level 128MB cache is $45 per month. Furthermore, each cache level has a hard limit on the number of transactions per hour, so you may find (as I did) that you need to upgrade cache sizes not because of storage needs, but to gain more transactional capacity (or bandwidth or connections, each of which may also be throttled). The graph below highlights the stair-step nature of the transaction allotments per cache size in contrast with the linear charges for blob transactions.
With blob storage, the rough throughput is about 60MB per second, and given even the largest tile blob in my example, one should conservatively get 1000 transactions per second before seeing any performance degradation (i.e., throttling). In the span of an hour, that’s about 3.6 million blob transactions – over four times as many as needed to generate a single mosaic of the Lighthouse image. While that’s more than any of the cache options would seem to support, four images an hour doesn’t provide any scale to speak of!
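To make that arithmetic concrete, here is a tiny back-of-the-envelope calculation (mine, using the figures quoted above; the throughput figure is the same rough assumption the article makes):

using System;

class MosaicMath
{
    static void Main()
    {
        // Back-of-the-envelope check of the figures quoted above.
        const long tileLookupsPerMosaic = 1024L * 128 * 6;  // 6 slices of 1024 x 128 pixels = 786,432 lookups
        const long blobTransactionsPerHour = 1000L * 3600;  // ~1,000 transactions/sec sustained = 3.6 million/hour
        Console.WriteLine(blobTransactionsPerHour / tileLookupsPerMosaic);  // prints 4: roughly four mosaics/hour
    }
}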
BUT, that’s only part of the story: local cache utilization can definitely turn the tables! For the sake of discussion, assume that I've configured a local cache ttlValue of at least an hour. With the same parameters as I’ve used throughout this article, each ImageProcessor role instance will need to make only 800 requests per hour to the distributed cache. Those requests are to refresh the local cache from which all of the other 786K+ tile requests are served.
Without local caching, we couldn’t support the generation of even one mosaic image with a 128MB cache. With local caching and the same 128MB cache, we can sustain 400,000 / 800 = 500 ImageProcessor role ‘refreshes’ per hour. That essentially means (barring restarts, etc.) there can be 500 ImageProcessor instances in play without exceeding the cache transaction constraints. And by simply increasing the ttlValue and staggering the instance start times, you can eke out even more scalability (e.g., 1000 roles with a ttlValue of two hours, where 500 roles refresh their cache in the first hour, and the other 500 roles refresh in the second hour).
So, let’s see how that changes the scalability and economics. Below I’ve put together a chart that shows how much you might expect to spend monthly to provide a given throughput of images per hour, from 2 to 100. This assumes an average image size of 1024 x 768 and processing times roughly corresponding to the experiments run earlier: a per-image duration of 200 minutes when CachingMechanism = {0|1}, 75 minutes when CachingMechanism = 2 with no local cache configured, and 25 minutes when CachingMechanism = {2|3} with local cache configured.
The costs assume the minimum number of ImageProcessor Role instances required to meet the per-hour throughput and do not include the costs common across the scenarios (such as blob storage for the images, the cost of the JobController and ClientInterface roles, etc.). The costs do include blob storage transaction costs as well as cache costs.
For the “Caching (Local Disabled)” scenario, the 2 images/hour goal can be met with a 512MB cache ($75), and the 10 images/hour can be met with a 4GB cache. Above 10 images/hour there is no viable option (because of the transaction limitations), other than provisioning additional caches and load balancing among them. For the “Caching (Local Enabled)” scenario with a ttlValue of one hour a 128MB ($45) cache is more than enough to handle both the memory and transaction requirements for even more than 100 images/hour.
Clearly the two viable options here are Windows Azure Caching with local cache enabled or using the original in-role mechanism. Performance-wise that makes sense since they are both in-memory implementations, although Windows Azure Caching has a few more bells-and-whistles. The primary benefit of Windows Azure Caching is that the local cache can be refreshed (from the distributed cache) without incurring additional blob storage transactions and latency; the in-role mechanism requires re-initializing the “local cache” from the blob storage directly each time an instance of the ImageProcessor handles a new image slice. The number of those additional blob transactions (800) is so low though that it’s overshadowed by the compute costs. If there were a significantly larger number of images in the tile library, that could make a difference, but then we might end up having to bump up the cache or VM size, which would again bring the costs back up.
In the end though, I’d probably give the edge to Windows Azure Caching for this application, since its in-memory implementation is more robust and likely more efficient than my home-grown version. It’s also backed up by the distributed cache, so I’d be able to grow into a cache-expiration plan and additional scalability pretty much automatically. The biggest volatility factor here would be the threshold for staleness of the tile images: if they need to be refreshed more frequently than, say, hourly, the economics could be quite different given the need to balance the high-transactional requirements for the cache with the hourly limits set by the Windows Azure Caching service.
Final Words
Undoubtedly, there’s room for improvement in my analysis, so if you see something that looks amiss or doesn’t make sense, let me know. Windows Azure Caching is not a ‘one-size-fits-all’ solution, and your applications will certainly have requirements and data access patterns that may lead you to completely different conclusions. My hope is that by walking through the analysis for this specific application, you’ll be cognizant of what to consider when determining how (or if) you leverage Windows Azure Caching in your architecture.
As timing would have it, I was putting the finishing touches on this post, when I saw that the Forecast: Cloudy column in the latest issue of MSDN Magazine has as its topic Windows Azure Caching Strategies. Thankfully, it’s not a pedantic, deep-dive like this post, and it explores some other considerations for caching implementations in the cloud. Check it out!
Bruno Terkaly (@brunoterkaly) started a Service Bus series with Part 1-Cloud Architecture Series-Durable Messages using Windows Azure (Cloud) Service Bus Queues–Establishing your service through the Portal:
Introduction
The purpose of this post is to explain and illustrate the use of Windows Azure Service Bus Queues.
This technology solves some very difficult problems. It allows developers to send durable messages among applications, even when those applications sit behind network address translation (NAT) boundaries, are bound to frequently-changing, dynamically-assigned IP addresses, or both. Reaching endpoints behind these kinds of boundaries is extremely difficult; Windows Azure Service Bus Queues make the challenge very approachable.
There are many applications for this technology. We will use this pattern to implement the CQRS pattern in future posts.
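To give a flavor of what the code side of this series will look like, here is a minimal, hedged C# sketch of sending and receiving a durable message with the Service Bus Queue API from the Windows Azure SDK 1.6; the namespace, issuer key, and queue name ("orders") are placeholders you would fill in from the portal steps that follow:

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class QueueSketch
{
    static void Main()
    {
        // Placeholders: substitute the namespace and key from the portal.
        // "owner" is the default issuer name for a Service Bus namespace.
        TokenProvider credentials =
            TokenProvider.CreateSharedSecretTokenProvider("owner", "yourDefaultKey");
        Uri uri = ServiceBusEnvironment.CreateServiceUri("sb", "yourNamespace", string.Empty);

        // Create the durable queue if it doesn't already exist.
        NamespaceManager namespaceManager = new NamespaceManager(uri, credentials);
        if (!namespaceManager.QueueExists("orders"))
            namespaceManager.CreateQueue("orders");

        MessagingFactory factory = MessagingFactory.Create(uri, credentials);
        QueueClient client = factory.CreateQueueClient("orders");

        // The sender and receiver could be separate applications behind NAT;
        // the message is stored durably in the queue until it is received.
        client.Send(new BrokeredMessage("Hello from the publisher"));

        BrokeredMessage received = client.Receive();
        if (received != null)
        {
            Console.WriteLine(received.GetBody<string>());
            received.Complete();   // remove the message from the queue (PeekLock mode)
        }

        factory.Close();
    }
}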
Well known pattern in computer science
These technologies reflect well-known patterns in computer science, such as the "Pub-Sub" or publish-subscribe pattern. This pattern allows senders of messages (Publishers) to send messages to listeners (Subscribers) without knowing anything about the number or type of subscribers. Subscribers simply express an interest in receiving certain types of messages without knowing anything about the Publisher. It is a great example of loose coupling.
Publish-Subscribe Pattern: http://en.wikipedia.org/wiki/Publish/subscribe
Topics
Windows Azure also implements the concept of "Topics." In a topic-based system, messages are published to "topics" or named logical channels. Subscribers in a topic-based system will receive all messages published to the topics to which they subscribe, and all subscribers to a topic will receive the same messages. The publisher is responsible for defining the classes of messages to which subscribers can subscribe.
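For comparison, here is a hedged sketch of the topic flavor of publish-subscribe with the same SDK; the namespace, key, topic name ("news"), and subscription names ("audit", "dashboard") are all placeholder assumptions:

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class TopicSketch
{
    static void Main()
    {
        // Placeholders again for namespace, issuer and key.
        TokenProvider credentials =
            TokenProvider.CreateSharedSecretTokenProvider("owner", "yourDefaultKey");
        Uri uri = ServiceBusEnvironment.CreateServiceUri("sb", "yourNamespace", string.Empty);
        NamespaceManager namespaceManager = new NamespaceManager(uri, credentials);

        // One named logical channel ("news") with two independent subscriptions.
        if (!namespaceManager.TopicExists("news"))
            namespaceManager.CreateTopic("news");
        if (!namespaceManager.SubscriptionExists("news", "audit"))
            namespaceManager.CreateSubscription("news", "audit");
        if (!namespaceManager.SubscriptionExists("news", "dashboard"))
            namespaceManager.CreateSubscription("news", "dashboard");

        MessagingFactory factory = MessagingFactory.Create(uri, credentials);

        // The publisher knows nothing about how many subscribers there are.
        factory.CreateTopicClient("news").Send(new BrokeredMessage("headline"));

        // Each subscription receives its own copy of the message.
        SubscriptionClient audit = factory.CreateSubscriptionClient("news", "audit");
        BrokeredMessage copy = audit.Receive();
        if (copy != null)
        {
            Console.WriteLine(copy.GetBody<string>());
            copy.Complete();
        }

        factory.Close();
    }
}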
Getting Started at the portal
The next few screens will walk you through establishing a namespace at the portal.
Essential Download
To create a Service Bus Queue service running on Azure you’ll need to download the Azure SDK here:
Establishing a namespace for the service bus endpoint
Select “Service Bus, Access Control & Caching” as seen below.
Creating a new service bus endpoint
Click “New”
Providing a namespace, region. Selecting services.
The end result
Summary of information from portal
Thirumalai Muniswamy described Implementing Azure AppFabric Service Bus - Part 2 in a 1/4/2012 post to his Dot Net Twitter blog:
In my last post, I explained what Azure AppFabric is, its service types, and how relayed messaging works. Starting with this post, we will work through a real-world example using Windows Azure AppFabric relayed messaging.
The Azure Training Kit provided by the Azure team has lots of great material for learning each Azure concept. I hope this example will also be a valuable learning resource for those who are starting to implement Azure AppFabric in their applications.
Implementation Requirement:
We are going to take the example of a Customer Maintenance entry screen that performs CRUD (Create, Read, Update and Delete) operations on customer data. The service will run on an on-premises server and be exposed publicly using the Service Bus. The client Web application, running on another (or the same) local system or in the cloud, consumes the service to show the list of customers. The user can add a new customer, modify an existing customer selected from the list, or delete a customer from the list directly. The functionality is simple, but the point is to see how to implement it with the Service Bus and how the customer data flows from the service to the cloud and on to the client.
I am going to use the Northwind database configured in my on-premises SQL Server for data storage, so make sure you have the Northwind database available before implementing this source code. (You can get the database script from here.)
Requirements
- You need an Azure subscription with AppFabric enabled.
- Visual Studio 2010 (for development and compilation)
- Install Windows Azure SDK for .NET (http://www.windowsazure.com/en-us/develop/net/)
Note: The Service Bus and Caching libraries are included as part of the Windows Azure SDK from v1.6, so there is no need to install a separate setup for Azure AppFabric as we have done before.
Creating a Service Bus Namespace
Please skip this step if you have already created a namespace or know how to create one.
Step 1: Open the Azure Management Portal at https://windows.azure.com and select the Service Bus, Access Control & Caching tab.
The portal will open the Services panel on the left-hand side.
Step 2: Select the Service Bus node and select the subscription under which you want to create the Service Bus namespace.
Step 3: Press New menu button in the Service Namespace section. The portal will open Create a New Service Namespace popup window.
Step 4: Make sure Service Bus checkbox selected and provide the input as defined below.
- Enter a meaningful namespace in the Namespace textbox and verify its availability with Check Availability.
Note:
- The namespace should not contain special characters such as hyphens (-), underscores (_) or dots (.). It can contain only alphanumeric characters and it must start with a letter.
- The Service Namespace name must be greater than 5 and less than 50 characters in length.
- Because the namespace must be unique across all Service Bus namespaces created around the world, it is better to follow a basic standard for forming namespaces at the organization level.
For example, create a unique namespace at the application or business level, formed as defined below:
<DomainExtention><DomainName><ApplicationName>
For Ex: ComDotNetTwitterSOP
The Service Bus endpoint URI would be
http://ComDotNetTwitterSOP.servicebus.windows.net
Once the Namespace created at application level, we can add other important names through hierarchies at the end of the service bus URI.
http://<DomainExtention><DomainName><ApplicationName>.servicebus.windows.net/<ServiceName>/<EnvironmentName>/<VersionNo>
- The Service Name will specify the name of the service
- The Environment Name will specify whether the namespace deployed for Test, Staging and Prod
- The Version No will specify the version number of the release such as V0100(for Version 1.0), V0102 (for Version 1.2), V01021203 (for V 1.2.1203) etc.,
So the fully qualified namespace URI for Sales Invoice would be
http://ComDotNetTwitterSOP.servicebus.windows.net/SalesInvoice/Test/V0100
This is an example of how a Service Bus endpoint could be structured; decide on a convention that fits your organization's requirements.
- Select the Country / Region (it is good to select a region near where most of the consumers will reside, so the service is also near the selected Country / Region).
- Make sure the Subscription selected is correct or select the correct one from the drop down list.
- Press the Create Namespace button.
The portal will create the namespace and list under the subscription selected. Initially the namespace status will be Activating… for some time and then will become Active.
Step 5: Select the namespace you created; the Properties panel appears on the right-hand side. Press the View button under the Default Key heading in the properties.
The portal pops up a Default Key window with two values, Default Issuer and Default Key. These two values are needed to expose our service to the cloud, so press the Copy to Clipboard button and save them somewhere temporarily.
Note: When you press the Copy to Clipboard button, the system might display a Silverlight alert; press Yes to copy the value to the clipboard.
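Although the service code comes in the next post, here is a hedged C# sketch of where the namespace path and the Default Issuer / Default Key values you just copied end up when an on-premises WCF service listens on a relay endpoint. The IEchoService contract is a hypothetical stand-in for the Customer Maintenance service built later in the series, and the namespace, path and key values are placeholders following the naming convention above:

using System;
using System.ServiceModel;
using Microsoft.ServiceBus;

[ServiceContract]
interface IEchoService
{
    [OperationContract]
    string Echo(string text);
}

class EchoService : IEchoService
{
    public string Echo(string text) { return text; }
}

class RelayHostSketch
{
    static void Main()
    {
        // Placeholders: namespace and hierarchical path per the convention above.
        Uri address = ServiceBusEnvironment.CreateServiceUri(
            "sb", "ComDotNetTwitterSOP", "SalesInvoice/Test/V0100");

        TransportClientEndpointBehavior credentials = new TransportClientEndpointBehavior
        {
            TokenProvider = TokenProvider.CreateSharedSecretTokenProvider(
                "owner",            // Default Issuer copied from the portal
                "yourDefaultKey")   // Default Key copied from the portal
        };

        ServiceHost host = new ServiceHost(typeof(EchoService));
        host.AddServiceEndpoint(typeof(IEchoService), new NetTcpRelayBinding(), address)
            .Behaviors.Add(credentials);

        host.Open();   // registers the listener on the Service Bus relay endpoint
        Console.WriteLine("Listening on {0}; press Enter to exit.", address);
        Console.ReadLine();
        host.Close();
    }
}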
Keep in mind
- Only one on-premises service can listen on a particular Service Bus endpoint address (except with NetEventRelayBinding, which is designed to allow multiple listeners).
- When you attempt to use the same address for multiple services, only the first service will succeed; the remaining services will fail.
- An endpoint can share part of another endpoint’s address, but it can’t be nested under another endpoint’s complete address.
For example: when an endpoint is defined as http://ComDotnetTwitterSOP.servicebus.windows.net/SalesInvoiceReport/Test/V0100, its complete address can’t be reused or extended by another Service Bus endpoint. So defining the endpoint http://ComDotnetTwitterSOP.servicebus.windows.net/SalesInvoiceReport/Test/V0100/01012012 is wrong, but the following is accepted: http://ComDotnetTwitterSOP.servicebus.windows.net/SalesInvoiceReport/Test/V0102
Further steps will be published in the next post soon.
<Return to section navigation list>
Windows Azure VM Role, Virtual Network, Connect, RDP and CDN
Amit Kumar Agrawal posted Remote Desktop with Windows Azure Roles on 12/30/2011 (missed when published):
The Windows Azure SDK 1.3 and later adds the ability to use Remote Desktop Services to access Windows Azure roles. Visual Studio lets you configure Remote Desktop Services from a Windows Azure project. To enable Remote Desktop Services, you must create a working project that contains one or more roles and then publish it to Windows Azure.
Note: This ability to access a Windows Azure role is intended for troubleshooting or development only. The purpose of each virtual machine is to run a specific role in your Azure application and not to run other client applications.
To enable Remote Desktop Services follow these steps
- Open Solution Explorer, right-click the name of your project, and then click Publish.
- The Deploy Windows Azure project dialog box appears. At the bottom of the dialog box, click the Configure Remote Desktop connections link.
- The Remote Desktop Configuration dialog box appears. Check the Enable connections for all roles checkbox.
Note: Visual Studio is designed to enable or disable Remote Desktop Services for all roles in your project. However, it will write remote desktop configuration information for each role. If you manually modify this information to disable Remote Desktop Services for some roles and not others, Visual Studio will no longer be able to modify the configuration and will display a dialog box that communicates this.
- You can select an existing certificate from the drop-down list or you can create a new one.
Note: The certificates needed for a remote desktop connection are different from the certificates used for other Windows Azure operations. The remote access certificate must have a private key.
- To create a new certificate, select <create> from the drop-down list.
The Create Certificate dialog box appears.
- Type a friendly name for the new certificate and then click OK
- To upload this certificate to the Windows Azure Platform Management portal, click View.
- Click the Details tab.
- To copy this certificate to a file, click Copy to File
- To export a private key for this certificate, select Yes, export the private key and then click Next.
- To select the default export file format, click Next
- To protect this private key using a password, type a password. Confirm this password and then click Next.
- To obtain the path for this certificate file to use to upload the certificate, click Browse and copy the path shown in the Save As dialog box. Type the name of the file for this certificate in File name. Click Save, then click Next.
- To create this file, click Finish.
- Using the Windows Azure Platform Management portal, upload the certificate for the hosted service that you will connect to with Remote Desktop Services.
Note: If you attempt to deploy and you have not uploaded your Remote Desktop certificate to Windows Azure, you will receive an error message and your deployment will fail.
- In the Remote Desktop Configuration dialog box, type a Username and a Password.
Note: If the password does not meet the complexity requirements, a red icon will appear next to the password text box. A password that contains a combination of capital letters, lower case letters, and numbers or symbols will pass the complexity requirements.
- Choose an account expiration date. The expiration date will automatically block any remote desktop connections when the date passes.
- Click OK. A number of settings are added to the .cscfg and .csdef files to enable Remote Access Services (an illustrative sketch of these settings appears after the steps below).
- If you are ready to publish your Windows Azure application, in the Deploy Windows Azure Project dialog box, click OK. If you are not ready to publish, click Cancel. Your Remote Desktop configuration will still be saved, and you can publish your application at a later date.
- Once you have published your project to Windows Azure, log on to the Management Portal, and click Hosted Services, Storage Accounts & CDN in the lower right hand corner of the screen.
- Click Hosted Services to see the hosted services currently running.
- Select your role in the Management portal. In the Remote Access group, ensure that the Enable check box is selected and the Configure button is enabled. This shows that the deployment is enabled for Remote Desktop.
- Select the role instance that you want to connect to, which should enable the Connect button in the Management Portal interface.
- Click Connect. Your browser will prompt you to download a .RDP file, which you can open on your local computer.
- Open the .RDP file and enter the user and password that you set up in the earlier steps. If you are on a domain, you might have to put a \ in front of the username.
- You should now be logged into your remote session.
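For reference, the settings mentioned in the Click OK step above land in the service definition and configuration files along the following lines. This is an illustrative sketch only: the user name, encrypted password, expiration date, and certificate thumbprint are placeholders, and the exact XML Visual Studio generates may differ slightly.

<!-- ServiceDefinition.csdef: the RemoteAccess module is imported in every role,
     the RemoteForwarder module in only one role of the deployment. -->
<Imports>
  <Import moduleName="RemoteAccess" />
  <Import moduleName="RemoteForwarder" />
</Imports>

<!-- ServiceConfiguration.cscfg: placeholder values shown. -->
<ConfigurationSettings>
  <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.Enabled" value="true" />
  <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountUsername" value="yourUserName" />
  <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountEncryptedPassword" value="yourEncryptedPassword" />
  <Setting name="Microsoft.WindowsAzure.Plugins.RemoteAccess.AccountExpiration" value="2012-12-31T23:59:59.0000000+00:00" />
  <Setting name="Microsoft.WindowsAzure.Plugins.RemoteForwarder.Enabled" value="true" />
</ConfigurationSettings>
<Certificates>
  <Certificate name="Microsoft.WindowsAzure.Plugins.RemoteAccess.PasswordEncryption"
               thumbprint="yourCertificateThumbprint" thumbprintAlgorithm="sha1" />
</Certificates>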
<Return to section navigation list>
Live Windows Azure Apps, APIs, Tools and Test Harnesses
• Larry Franks (@larry_franks) posted Update: Deploying Ruby Applications to Windows Azure on 1/5/2012 to the (Windows Azure’s) Silver Lining blog:
Back when Brian and I first started this blog, I wrote about several methods for deploying Ruby applications to Windows Azure. There’s a new way to deploy that I wanted to cover before continuing the series on testing Ruby applications on Windows Azure, as this new deployment method is going to be the basis of the next few posts in this line.
During the fall, Windows Azure added a new feature that allowed you to specify the entry point for a role in the ServiceDefinition.csdef file. This means that you could directly specify the executable or script that you wanted to run instead of having to build a .NET wrapper. Steve Marx created an example of using this to run Python on Windows Azure (https://github.com/smarx/pythonrole), which I’ve adapted into an example for running Ruby on Windows Azure. You can find the example project at https://github.com/Blackmist/rubyrole.
One slight drawback to this example: it requires access to a Windows machine, because it relies on utilities provided as part of the Windows Azure SDK. Any version of the SDK listed should probably work, but I’ve only tested with the “Other” and “.NET” versions of the SDK.
Here’s a brief overview of how this sample works.
ServiceDefinition.csdef
The ServiceDefinition file defines the ports used by this role, per-instance file storage, custom environment variables, startup tasks, and finally the command to run as the program entry point for this role. Specifically, it does the following:
- Defines a public TCP port of 80, which is named ‘HttpIn’
- Defines per-instance local storage named ‘ruby’
- Defines the following startup tasks:
- installRuby.cmd - installs Ruby from RubyInstaller.org
- installDk.cmd - installs DevKit from RubyInstaller.org
- installDependencies.cmd - installs Bundler and then runs ‘bundle install’ to install any gems listed in Gemfile.
- Defines the application entry point - run.cmd
Note that the .cmd files specified all live in the \WorkerRole subdirectory of this sample. The path isn’t specified in the ServiceDefinition file entries.
The ‘HttpIn’ and ‘ruby’ resources allocated in this file are normally only accessible to .NET applications, however we can expose them as environment variables. ‘HttpIn’ is queried and used to create the ADDRESS and PORT environment variables, while the full physical path to the storage allocated for ‘ruby’ is stored in the RUBY_PATH environment variable. There’s also an EMULATED environment variable, which is used to determine if the project is running in the Windows Azure Emulator or on Windows Azure.
During initialization, the role will allocate storage and name it ‘ruby’ and open up port 80 as specified by ‘HttpIn’. It will then start running the startup commands. After those have completed, it will run ‘run.cmd’ to launch the application.
Here's the interesting portions of the ServiceDefinition file. For a complete listing of the file see https://github.com/Blackmist/rubyrole/blob/master/ServiceDefinition.csdef.
Port Definition
<Endpoints>
  <InputEndpoint name="HttpIn" protocol="tcp" port="80" />
</Endpoints>
Per-instance LocalStorage
<LocalResources>
  <LocalStorage name="ruby" cleanOnRoleRecycle="true" sizeInMB="1000" />
</LocalResources>
InstallRuby.cmd Startup Task
Note that this defines not only the command line to run, but also the environment variables to create for this task.
<Task commandLine="installRuby.cmd" executionContext="elevated">
  <Environment>
    <Variable name="EMULATED">
      <RoleInstanceValue xpath="/RoleEnvironment/Deployment/@emulated" />
    </Variable>
    <Variable name="RUBY_PATH">
      <RoleInstanceValue xpath="/RoleEnvironment/CurrentInstance/LocalResources/LocalResource[@name='ruby']/@path" />
    </Variable>
  </Environment>
</Task>
RUBY_PATH Variable Definition
<Variable name="RUBY_PATH">
  <RoleInstanceValue xpath="/RoleEnvironment/CurrentInstance/LocalResources/LocalResource[@name='ruby']/@path" />
</Variable>
<EntryPoint>
  <ProgramEntryPoint commandLine="run.cmd" setReadyOnProcessStart="true" />
</EntryPoint>
ServiceConfiguration.Cloud.cscfg and ServiceConfiguration.Local.cscfg
The ServiceConfiguration file specifies service configuration options such as the number of instances to create for your role. That’s all I’ve really specified for now; however, this file can also be used to enable remote desktop functionality for a deployment, as well as diagnostic configuration. Currently the only important value in here is the Instances count, which is 2 for cloud and 1 for local.
Why 2 for the cloud? Because Microsoft’s SLA guarantee requires at least two instances of a role. If one is taken down due to hardware failure, resource balancing, etc. you still have another one running. Setting the count to 1 for local is so that we only spin up one instance when we test using the Windows Azure emulator on your local machine.
The cloud version of this file is used when you deploy to the cloud, while the local version is used by the emulator. Here's the cloud version as an example:
<?xml version="1.0"?>
<ServiceConfiguration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" serviceName="RubyRole" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="2" osVersion="*">
  <Role name="WorkerRole">
    <ConfigurationSettings />
    <Instances count="2" />
  </Role>
</ServiceConfiguration>
The Application
The application is contained in the ‘/WorkerRole/app’ directory. For this sample it’s just a basic Sinatra application (app.rb) as follows:
require 'sinatra'

set :server, :thin
set :port, ENV['PORT']
set :bind, ENV['ADDRESS']

get '/' do
  "Hello World!"
end
One important thing to note is that this uses the PORT and ADDRESS environment variables to determine the port and address to listen on. If you look back at the ServiceDefinition.csdef file, these environment variables are populated there based on the 'HttpIn' endpoint. Using these two environment variables allows the application to run correctly whether it is run in the emulator (which may remap the port if you already have something on port 80) or in Windows Azure.
Typical Workflow
When using this, the typical workflow is:
- Create a new web application in the ‘/WorkerRole/app’ directory.
- Make any modifications needed to ‘/WorkerRole/run.cmd’ in order to launch the application, e.g. ‘call rails s’ instead of ‘ruby app.rb’.
- Launch ‘run.cmd’ from the ‘/rubyrole’ directory. This will launch the application in the Windows Azure emulator.
- After everything is working as desired, run ‘pack.cmd’ from the ‘/rubyrole’ directory to create the ‘RubyRole.cspkg’ deployment package.
- Browse to Windows.Azure.com and log in to your subscription, then create a new hosted service. Use RubyRole.cspkg as the package file and ServiceConfiguration.Cloud.cscfg as the configuration file.
The pack.cmd and run.cmd files in the '/rubyrole' directory depend on the cspack.exe and csrun.exe utilities installed by the Windows Azure SDK. Cspack.exe packages up the '/WorkerRole' folder into a deployment package, and csrun.exe launches the package in the Windows Azure Emulator. Here's the code for both:
pack.cmd
@echo off
if "%ServiceHostingSDKInstallPath%" == "" (
  echo Can't see the ServiceHostingSDKInstallPath environment variable. Please run from a Windows Azure SDK command-line (run Program Files\Windows Azure SDK\^<version^>\bin\setenv.cmd^).
  GOTO :eof
)
cspack ServiceDefinition.csdef /out:RubyRole.cspkg
run.cmd
@echo off
if "%ServiceHostingSDKInstallPath%" == "" (
  echo Can't see the ServiceHostingSDKInstallPath environment variable. Please run from a Windows Azure SDK command-line (run Program Files\Windows Azure SDK\^<version^>\bin\setenv.cmd^).
  GOTO :eof
)
cspack ServiceDefinition.csdef /copyOnly /out:RubyRole.csx
csrun RubyRole.csx ServiceConfiguration.Local.cscfg
if "%ERRORLEVEL%"=="0" (
  echo Browse to the port you see above to view the app. To stop the compute emulator, use "csrun /devfabric:shutdown"
)
Startup Tasks
I'm not going to list the source for all the startup tasks here, but I do want to call out one specific thing to be aware of: when you use run.cmd to launch the application in the Windows Azure Emulator, it's actually using the local copy of Ruby installed on your machine, along with whatever gems you have installed. So it's important that these startup tasks be crafted so that they don't clobber your installs when run in the emulator.
This is the purpose of the EMULATED environment variable; it will only exist when running in the emulator. Here's the source for installRuby.cmd, note that the first thing we do is check if EMULATED exists and exit to prevent from reinstalling Ruby on your local box.
REM Skip Ruby install if we're running under the emulator
if "%EMULATED%"=="true" exit /b 0

REM Strip the trailing backslash (if present)
if %RUBY_PATH:~-1%==\ SET RUBY_PATH=%RUBY_PATH:~0,-1%

cd /d "%~dp0"

REM Download directly from rubyinstaller.org
powershell -c "(new-object System.Net.WebClient).DownloadFile('http://rubyforge.org/frs/download.php/75465/rubyinstaller-1.9.3-p0.exe', 'ruby.exe')"

REM Install Ruby and DevKit
start /w ruby.exe /verysilent /dir="%RUBY_PATH%"

REM Ensure permissive ACLs so other users (like the one that's about to run Ruby) can use everything.
icacls "%RUBY_PATH%" /grant everyone:f
icacls . /grant everyone:f

REM Make sure Ruby was installed properly (will produce a non-zero exit code if not)
"%RUBY_PATH%\bin\ruby" --version
Summary
We’ve gotten to the point that we can deploy Ruby applications to Windows Azure without requiring Visual Studio .NET and a .NET wrapper, but we still require a Windows system for this approach. Hopefully we’ll have a solution for developing and deploying from non-Windows systems one day.
Next week, I’ll demonstrate how to use this sample to run Ruby tests in Windows Azure.
• Himanshu Singh reported Windows Azure and Bing Maps Bring the History of Spielberg’s “War Horse” to Life Through Interactive Website on 1/5/2012:
The latest Steven Spielberg film, War Horse tells the story of a horse’s journey to the World War 1 battlefields of Flanders. To help promote the film and bring its history to life, UK-based developers at Shoothill have developed the website, The War Horse Journey. Using Deep Zoom technology, Bing Maps, and Windows Azure, the War Horse Journey provides users with an interactive exploration of the world they will view when watching the film.
The website is split in two parts: a Gallery, which uses Deep Zoom technology to let the user explore a patchwork of historical images, maps and shots from the film; and a TimeMap, which uses Bing Maps to overlay today’s map of the WWI battle area with historical maps and photos. The site also enables users to explore videos and exclusive content from the British Imperial War Museum.
“It’s a labour of love,” says Shoothill’s Rod Plummer, “We’re creating a real journey for the user around the fictional horse’s story from the film.”
From a technical perspective, hosting the site on Windows Azure is a “fabulous alternative to conventional hosting,” according to Plummer. “With Windows Azure,” says Plummer, “Microsoft takes care of all the back-end software so we can concentrate on code performance, usability and the visitor’s total experience.”
Visit The War Horse Journey. Read more about this story in this post on the MSDN UK team blog and this post on Shoothill’s company blog.
MarketWire asserted “Delphi [Automotive] Unveils New MyFi Connecting With Safety Vehicle at 2012 CES; Delphi Uses Microsoft Windows Azure to House and Deliver Vehicle Content to Drivers” in an introduction to its Delphi [Automotive] MyFi(tm) Systems Connect With Safety and Provide Access to Online Services Through Cloud-Based Portal press release of 1/4/2012:
Harnessing deep infotainment, user experience and safety expertise in a fully integrated system, Delphi Automotive … is demonstrating the possibilities of its MyFi(tm) Connecting with Safety vision at the 2012 Consumer Electronics Show (Las Vegas Convention Center, CP26), January 10-13 in Las Vegas.
"Delphi has been at the forefront of developing innovative infotainment technologies since 1936 when we integrated the first radio in a production automobile's dashboard," said Jugal Vijayvargiya, general director of Delphi's Infotainment & Driver Interface product business unit. "In today's environment, innovation means creating infotainment solutions that provide the ultimate user experience while enhancing driver and passenger safety."
Recognizing the importance of drivers being focused on the driving task as well as the desire of consumers to be entertained, informed and connected 24/7, Delphi has applied its vehicle integration expertise to offer vehicle manufacturers connected systems that are designed to help maximize safety on the road.
MyFi systems allow drivers to enjoy the information and entertainment they expect while keeping their eyes on the road and hands on the wheel. Using voice recognition, text-to-speech, large touch-screens, reconfigurable displays and workload management technology, the connected systems tailor information available to drivers depending upon the driving environment. When the vehicle is in park, more information is available to users than when it is in drive. Additionally, when data from safety sensors is linked -- and certain unsafe conditions are detected such as lane drift, stopped traffic ahead and driver drowsiness -- audible and visual warnings redirect the driver's attention and if necessary, automatic braking is engaged.
At CES, Delphi will unveil its newest MyFi feature, a cloud-based portal that uses Windows Azure to bring OEMs and consumers a unique global connectivity solution. Delphi selected Windows Azure for its agility, efficiency and global reach. With Azure, Microsoft manages content storage and delivery, while Delphi provides the unique convenience features of its cloud-based solution. [Emphasis added.]
"Windows Azure allows us to rapidly respond to customer needs and to deploy back-end services efficiently," noted Doug Welk, chief engineer of Delphi Advanced Infotainment and Driver Interface. "Its global reach is helping Delphi to support customers worldwide and to provide contemporary, new value -- the ability of the vehicle to exchange data with the web."
"Imagine the opportunities this type of system makes possible," added Vijayvargiya. "Not only will it provide the functionality consumers expect today, it will enable much more. With this system's flexibility, vehicle owners can add new apps and functionality throughout vehicle life. Drivers can personalize the display of the vehicle's instrument cluster from home, selecting from multiple designs or even creating their own, arrange features and controls to make them function effectively in a way that wasn't possible before, and check diagnostics such as tire pressure, engine health and brake life from a portable tablet or smart phone. You can even create and store personalized accounts for different drivers. When companies like Delphi and Microsoft work together, the possibilities are limitless."
Unique Cohesive Value
Showcasing advanced automotive electronics that work together to enhance virtually every aspect of the driving experience, Delphi has merged safety, connectivity and user experience technology in an integrated vehicle system that provides the latest connectivity features, permits a remarkable range of communication, manages driver workload and mitigates driver distraction.
"Delphi's MyFi Connecting with Safety vehicle creates the optimal user experience," concluded Vijayvargiya. "It exemplifies our vision: a world where it is simple for drivers to interact with their vehicles from wherever they are, where being connected and entertained does not compromise safety and where drivers are always focused on the driving task."
Interviews at CES
Delphi will have technical specialists and business leaders available for interviews at CES. Please contact linda.s.ferries@delphi.com for additional information.
About Delphi
Delphi is a leading global supplier of electronics and technologies for automotive, commercial vehicle and other market segments. Operating major technical centers, manufacturing sites and customer support facilities in 30 countries, Delphi delivers real-world innovations that make products smarter and safer as well as more powerful and efficient. Connect to innovation at www.delphi.com
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
SOURCE: Delphi Corporation
Delphi Automotive is a spinoff from General Motors Corp.
<Return to section navigation list>
Visual Studio LightSwitch and Entity Framework 4.1+
• Beth Massi (@bethmassi) posted LightSwitch Community & Content Rollup– December 2011 on 1/5/2012:
Happy New Year everyone! I hope you all had a wonderful holiday and enjoyed some well-deserved time off. I sure did! Now that I’m back in the office I’m a few days overdue for this post so here ya go! A few months ago I started posting a rollup of interesting community happenings, content, samples and extensions popping up around Visual Studio LightSwitch. If you missed those rollups you can check them out here:
- LightSwitch Community & Content Rollup–September 2011
- LightSwitch Community & Content Rollup–October 2011
- LightSwitch Community & Content Rollup–November 2011
Usually December is a pretty slow month all around, but there were still a lot of awesome things around LightSwitch, especially the number of community articles this month! Check them out…
“LightSwitch Star” Contest
Do you have what it takes to be a LightSwitch Star? Show us your coolest, most productive, LightSwitch business application and you could win a Laptop and other great prizes!
In November The Code Project launched the “LightSwitch Star” contest. You just answer some questions and email them a screenshot or two. They’re looking for apps that show off the most productivity in a business as well as apps that use extensions in a unique, innovative way. Prizes are given away each month (see here for more details on prizes). In early December they announced the November winners and today they announced the December winners in each category:
Most Efficient Business Application
November
1st prize: Security Central
2nd prize: PTA LightSwitch
December
1st prize: Patient Tracker
2nd prize: Instant Assets
Most Ground-breaking Business Application
November
1st prize: Church+
2nd prize: Engineering App
December
1st prize: Health and Safety Management System
2nd prize: Orion
Congrats to all! Can’t wait to see what people submit in January. There were some really cool applications submitted in December. Here’s a breakdown of what was submitted last month (see all 23 apps that have been submitted so far here). Make sure to vote for your favorite ones; the grand prize is a laptop!
7 More Production Apps…
- Instant Assets - The one stop shop for all your asset, equipment and facilities management needs
- Patient Tracker - used to interface to a SharePoint list (used as the dataset) to provide timestamp and delay code inputs to the SharePoint list.
- PeopleTrac – People Management for Not-For-Profit Organizations - Demonstrates several LightSwitch capabilities including WCF RIA Services, native SQL, and Name and Address merge/purge
- The Health and Safety Management System - An application that helps organizations better manage and mitigate health and safety statistics and KPIs.
- Orion - A mini-CRM that helps with POS records, service request, field technicians workload, inventory, inventory tracking, fuel management and reports.
- Search SSRS Subscriptions – Manages subscriptions to SQL Server Reporting Services
- WindowsDevNews.com: A Visual Studio LightSwitch Application – The WindowsDevNews.com site uses Visual Studio LightSwitch for its back-end
3 More Tutorials/Samples/Videos…
- An Advanced Visual Studio LightSwitch Application
- Health and Safety Management System (Tutorial)
- YouTube Video - Using LightSwitch with MySQL
ComponentOne Scheduler for LightSwitch Released
ComponentOne released another LightSwitch extension last month in addition to the OLAP Control. This ready-to-use LightSwitch screen gives you a complete Outlook-style scheduling application! Check it out:
Notable Content this Month
Here are some more of the fun things the team and community released in December.
Extensions released in December (see all 71 of them here!):
- Background Group Layout Extension
- ComponentOne Scheduler for LightSwitch
- Extended Text Box Extension
- Image Button Extension
- Light Brown Theme
- Spursoft Context Menu Commands
Build your own extensions by visiting the LightSwitch Extensibility page on the LightSwitch Developer Center.
Team Articles:
In December I kicked off a series aimed at beginner developers just getting started with LightSwitch and was featured in the MSDN Flash Newsletter. This brought A LOT of traffic to my blog – in fact, it is at an all-time high since the month of December! WOW! This series is popular!
- Part 0: Getting Started Resources
- Part 1: What’s in a Table? Describing Your Data
- Part 2: Feel the Love. Defining Data Relationships
- Part 3: Screen Templates, Which One Do I Choose?
- Part 4: Too much information! Sorting and Filtering Data with Queries
- Part 5: May I? Controlling Access with User Permissions
- Part 6: I Feel Pretty! Customizing the "Look and Feel" with Themes
Matt Thalman also wrote the following article that addresses a top request in the LightSwitch Forums:
Creating a Custom Login Page for a LightSwitch Application
Community Articles:
- A Groundbreaking Control - ComponentOne Scheduler LightSwitch Extension
- Extending LightSwitch TextBox : Walk-Through
- Help Desk: An Advanced Visual Studio LightSwitch Application
- Levels of Validation and Filtering on User Membership
- LightSwitch: increasing the client-side timeout in code (VB)
- LightSwitch and the MEF story
- Metadata driven development, the Holy Grail of software development
- Microsoft LightSwitch – Championing the Citizen Developer
- Microsoft LightSwitch – Code Project LightSwitch Star Contest
- Microsoft LightSwitch – Maintaining a Primary Child Entity
- Microsoft LightSwitch – Multi Select Checklist
- Microsoft LightSwitch – Simple but Effective Application Defaults
- The LightSwitch JetPack Theme
- Using the ComponentOne FlexGrid Control in Visual Studio LightSwitch
- Using the Infragistics Reporting with OData In Visual Studio LightSwitch
- WindowsDevNews.com: A Visual Studio LightSwitch Application
Samples (see all of them here):
- Background Group Layout Extension - Source Code
- Extended Text Box Extension - Source Code
- How to Open Properties Window for Default Project with Macro
- LightSwitch Image Button Extension – Walkthrough
LightSwitch Team Community Sites
The Visual Studio LightSwitch Facebook Page has been increasing in activity thanks to you all. Become a fan! Have fun and interact with us on our wall. Check out the cool stories and resources.
Also here are some other places you can find the LightSwitch team:
LightSwitch MSDN Forums
LightSwitch Developer Center
LightSwitch Team Blog
LightSwitch on Twitter (@VSLightSwitch, #VisualStudio #LightSwitch)
Join Us!
The community has been using the hash tag #LightSwitch on twitter when posting stuff so it’s easier for me to catch it (although this is a common English word so you may end up with a few weird ones ;-)). Join the conversation! And if I missed anything please add a comment to the bottom of this post and let us know!
Sumit Sarkar described Getting started with Microsoft Visual Studio LightSwitch to Oracle, Sybase, and DB2 in a 1/3/2012 post to Progress DataDirect blog:
Interest in Microsoft Visual Studio LightSwitch is definitely trending up based on recent data access consulting projects. Organizations are getting their feet wet with SQL Server for a POC, and then expanding to other systems running Oracle, DB2, and Sybase. From these conversations, there is a buzz about Microsoft LightSwitch’s ability to quickly modernize proven tools such as MS Access, MS Excel, Lotus, etc. that were not necessarily designed for easy cloud or desktop deployments.
Benefits of using Connect for ADO.NET Entity Framework providers with Microsoft Visual Studio LightSwitch:
- Quickly develop and deploy LightSwitch applications for Oracle, DB2 iSeries and Sybase.
- Installation and deployment is easy!
- Absolutely no database client is required!
Review the advantages of running DataDirect Connect for ADO.NET for your next project.
The preliminary steps are:
1. Download and install a free 15-day trial today of the Progress DataDirect Connect for ADO.NET Oracle provider.
2. Download and install GA version of LightSwitch with latest patch. Note that there are known issues with pre-GA/Beta releases that have been addressed by Microsoft.
3. Launch Visual Studio LightSwitch
4. Attach to external Data Source
5. Choose ‘Oracle Database’ for DataDirect Connect for ADO.NET Entity Framework provider (note: Sybase and DB2 are listed under ‘Progress DataDirect’)
6. Enter connection information to data source.
7. Click on the Advanced tab and disable distributed transactions to support non-SQL Server connections.
8. Click OK to create the new external data source.
9. Create a new screen, and specify the newly created data source.
Once you have the data connectivity established with Progress DataDirect, you can find additional tutorials on the web:
http://www.lightswitchtutorial.com/2010/08/lightswitch-tutorial-creating-a-basic-application/
Corrected a few instances of Lightswitch to LightSwitch.
<Return to section navigation list>
Windows Azure Infrastructure and DevOps
• The SearchCloudComputing Staff posted The cloud market in 2012: Through the eyes of experts (including me) on 1/6/2012:
SearchCloudComputing.com staff asked our contributors to share their thoughts on the year ahead. In this round-up, we asked our cloud experts these two questions:
- What changes can we expect in the cloud computing market in 2012?
- Which cloud vendors will be the big winners of 2012?
So now that we’ve swept up all the glitter and confetti, put away the noisemakers and broken most of our New Year’s resolutions, let’s take a look at what possible ups and downs lie ahead for the cloud computing industry.
…
ROGER JENNINGS
Big data grew legs in late 2011 with major cloud providers offering beta versions of Apache Hadoop and MapReduce implementations, most of which will mature to fully supported products in 2012.
Microsoft’s SQL Server team announced the availability of a Community Technology Preview (CTP) of a Hadoop-based service on Windows Azure as a private preview. The CTP includes a Hive add-in and Hive ODBC Driver to connect with the Azure-based Hadoop service, as well as a JavaScript library for writing MapReduce programs. The Hive ODBC Driver lets analysts work with PowerPivot and PowerView tools, while the Hive add-in for [64-bit] Excel [2010] enables issuing Hive queries and manipulating Hadoop results in Excel.
Microsoft’s SQL Azure Labs also released a private CTP of Codename Data Explorer, which is designed to discover syndicated big data sets from the Windows Azure Marketplace DataMarket, formerly codenamed “Dallas.” Data Explorer lets analysts link information of interest with structured and unstructured Web resources in custom mashups. The CTP offers a prebuilt Data Explorer desktop client with a workspace and Office plug-in for Excel.
In addition, the SQL Azure Labs team released a trial version of Codename Social Analytics, which enables initial users to analyze real-time Twitter streams of tweets about Bill Gates or Windows 8 for quantitative “buzz” and user sentiment.
The Social Analytics Engagement client is based on Microsoft Research’s QuickView advanced search service and sentiment analysis technology. Sentiment analysis has been on Microsoft Research’s radar for many years. Its eXtreme Computing Group (XCG) released a public CTP of the Daytona MapReduce runtime for Windows Azure project in July 2011. Daytona competes with Amazon Web Service's Elastic Map Reduce, Apache Foundation's Hadoop Map Reduce, MapR's Apache Hadoop distribution and Cloudera Enterprise Hadoop.
With all this big data analytics activity occurring at the end of 2011, it’s a safe bet that at least a few of these and similar offerings from Microsoft’s competitors will release to the Web in early 2012.
…
Read the other contributors’ takes here.
Paul MacDougall asserted “Look for Linux and Hadoop twists as Microsoft bolsters its cloud platform. Some observers say the changes could move Azure into more direct competition with Amazon Web Services” in a deck for his Microsoft Azure In 2012: Watch Out Amazon? article of 1/4/2012 for Information Week:
Microsoft plans to bolster Windows Azure in the coming months as it looks to fill gaps in the cloud OS. Redmond's goal: to make Azure a more compelling environment for mission-critical enterprise applications and services while reducing migration hassles.
Observers suggest the changes will go so far as to move Azure from platform as a service (PaaS) into a more comprehensive infrastructure as a service (IaaS) play, a market where Amazon dominates with Amazon Web Services. PaaS typically offers users and developers a plug-and-play environment for apps, but key choices, like underlying OS and database, are limited. IaaS provides raw computing power, and users get more choices, but also more responsibilities.
Microsoft is reportedly making changes to the way Azure implements virtual machines (VMs) so that it can accommodate a wider variety of software--even Linux, which could run on top of Azure in a VM. Azure's present VM role is extremely limited by the fact that it does not offer a persistent state, meaning that data is lost in the event of a reboot, failover, or other interruption.
The addition of a so-called persistent VM to Azure, which would effectively create a hypervisor in the sky, means businesses could in theory upload VMs running Linux, SharePoint, SQL Server, or other "stateful" applications. Microsoft is said to be preparing a Community Technology Preview (CTP) of such features that could roll out soon. Company officials wouldn't comment.
Benjamin Day, principal at Benjamin Day Consulting in Boston and a Microsoft MVP, told me that he is "extremely confident" that Azure will gain persistent VM capabilities, although he is not certain when. "It is killer and it's going to be really valuable," Day said. "The fact that it has not been available has been just awful, because the Amazon platform has done it for years."
Day said the technology would allow businesses to upload virtually any job or application they are running in Hyper-V in Windows Server to Azure, making the service a more practical and potentially less costly option for many organizations. "This is where Azure has to go to be competitive," he said.
Microsoft, and this is confirmed, is also rolling out a CTP of Apache Hadoop for Azure. The idea is to make Azure a service that can handle so-called Big Data--large data sets that businesses collect from everything from call centers to electronic smart sensors embedded in their products.
The company has added tools atop Hadoop that allow users to set up and configure the framework on Azure "in a few hours instead of days" according to Val Fontama, the company's senior product manager for SQL Server, in a blog post.
Microsoft has also added JavaScript libraries that let programmers write JavaScript programs for MapReduce, the Google-inspired distributed computing framework that's at the foundation of Hadoop, and access them on Azure through a browser. The idea is to make JavaScript "a first-class programming language in Hadoop," said Fontama.
"These improvements reduce the barrier to entry by enabling customers to easily deploy and explore Hadoop on Windows," said Fontama. The Hadoop-on-Azure CTP will also offer an add-in for Hive, which layers data warehousing capabilities onto Hadoop. That will give users a way to interact with Hadoop data through Excel and Microsoft business intelligence tools. For programmers unfamiliar with the Hadoop environment, Microsoft has conveniently added its Metro interface, borrowed from Windows Phone and Windows 8, over Hadoop tools.
Azure, launched two years ago, is key to Microsoft's cloud strategy, but the company is cagey about how many business customers it's attracting. To be sure, there are some high-profile wins like Boeing, Toyota, and Fujitsu, but Redmond won't say how many users Azure has in total, nor does it break out revenue for the service. Customer fees vary depending on the amount of compute and storage resources consumed.
With the changes coming to Azure in 2012, Microsoft "wants to make sure you have no excuses for avoiding their platform," Day said. Amazon, which says it's got 20,000 active customers on AWS CloudFront, should take notice.
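For readers who haven't seen the MapReduce model Fontama mentions, here's a rough word-count sketch of my own. It uses Hadoop Streaming with two small Ruby scripts rather than Microsoft's JavaScript libraries (which I haven't tried yet), so treat it as an illustration of the programming model, not of the Hadoop-on-Azure tooling:
#!/usr/bin/env ruby
# wordcount_mapper.rb -- reads text from STDIN and emits "word<TAB>1" for each word
STDIN.each_line do |line|
  line.split.each { |word| puts "#{word.downcase}\t1" }
end

#!/usr/bin/env ruby
# wordcount_reducer.rb -- Hadoop Streaming sorts mapper output by key before it
# reaches the reducer, so counts for a given word arrive together and can be summed in one pass
current_word, count = nil, 0
STDIN.each_line do |line|
  word, n = line.chomp.split("\t")
  if word == current_word
    count += n.to_i
  else
    puts "#{current_word}\t#{count}" if current_word
    current_word, count = word, n.to_i
  end
end
puts "#{current_word}\t#{count}" if current_word
A streaming job submits the pair with the hadoop-streaming JAR, pointing -mapper and -reducer at the scripts and -input and -output at the cluster's storage paths.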
Steve Plank (@plankytronixx) pointed to a new Microsoft white paper in his The Cloud Skills Debate post of 1/4/2012:
The notion of computing in the cloud naturally attracts the interest of developers, but a number of IT Pros see it as a threat. In many cases the core skills transfer directly to cloud technologies, but some IT Pros may want to develop new skills to meet the demands of the new roles that are emerging. There is a de-emphasis on patching, upgrading and general service management, and a greater emphasis on monitoring, integration with the core on-premises infrastructure, and a deeper understanding of virtualisation and hybrid clouds.
A new white paper on cloud skills might be of help.
From the Cloud Computing: What IT Professionals Need to know whitepaper’s introduction:
Cloud computing promises new career opportunities for IT professionals. In many cases, existing core skill sets transfer directly to cloud technologies. In other instances, IT pros need to develop new skill sets that meet the demand of emerging cloud job roles.
Companies that consider moving to cloud computing will want to educate their IT professionals about the potential opportunities ahead so that they can build staff capabilities and skills ahead of the change. Chief Information Officers (CIO) who want to generate more business value from IT by necessity have to be in the front line of cloud skills education — both for themselves and to build training capacity for their IT staff.
The emerging cloud world offers those with the capability to build and grow their portfolio of skills. This paper explores the advantages of moving to the cloud and outlines the delta skill sets IT pros will want to acquire. It describes what the cloud offers and how it applies to and impacts existing infrastructure, including such issues as cost, security, data control, and integrity.
Stephen O’Grady (@sogrady) posted Revisiting the 2011 Predictions, Part 1 to his tecosystems (Redmonk) blog on 1/4/2012. Following are his 2011 predictions related to Windows Azure:
Predicting is an easier business than it once was. True, technology is hysterically accelerating rates of change and disruption, but that’s only relevant if the substance of your predictions matters. Which all too often, these days, it doesn’t. Analysts and pundits are able to prognosticate with relative impunity; who has the time to go back and check their accuracy? Pageview driven models, in fact, reward wilder predictions because the error cost is, generally, approaching zero. Unless you predicted, say, that Linux would be killed off by Windows NT, nobody will remember later.
I find value in reviewing my annual predictions, however. If they prove correct, that’s useful. If they were not, understanding the reasons why is important to adjusting our models moving forward.
Because I made the mistake of making better than a dozen predictions last year, this year’s review will be delivered in two parts. Part 1, below, will cover my predictions for browsers, the cloud, data, developers and programming language frameworks. Part 2, covering predictions within hardware, mobile, NoSQL, open source and programming languages, will hit tomorrow.
If you’d prefer to read last year’s first, they can be found here. …
Cloud
PaaS Adoption Will Begin to Show Traction, With Little Impact on IaaS Traction
The first Platform-as-a-Service providers essentially asked developers to trade choice for development speed. Like Ruby on Rails – itself the basis for multiple first generation PaaS platforms – PaaS was built for those that would embrace constraints. But PaaS platforms never saw the type of growth that Rails experienced, in part because of the further loss of control that the cloud represents. It’s one thing to have a web framework like Rails dictate the way that you build web applications; having PaaS platforms also choose the operating system, database, version control systems and more was too much.
Which is why second and third generation PaaS providers have furiously removed barriers to entry, adding additional runtimes, open sourcing the underlying platform and allowing you to pick your provider. Which, in turn, is why adoption of PaaS is accelerating. VMware CEO Paul Maritz calls PaaS “the 21st-century equivalent of Linux,” which explains not only why they feel compelled to compete in the space, but also why Red Hat might.
Virtually every vendor in this space is reporting growth similar to the Hacker News trajectories for Cloud Foundry and Openshift (below).
In spite of the growth of PaaS, however, none of the metrics we track reflect any decline in usage of general infrastructure platforms. Quite the contrary, in fact.
I count this as a hit. …
Data
…
Hadoop Will Become the MySQL of Big Data
EMC, HP, IBM, NetApp and even Oracle all have Hadoop – or in EMC’s case, MapReduce – plays in market. Microsoft actually deprecated its own Dryad initiative in favor of the Apache project. Players from AsterData to CouchBase to EnterpriseDB to MarkLogic to Tableau to Vertica have purpose built Hadoop connectors. The commercial distribution space, once essentially owned solely by Cloudera, has expanded to multiple third parties with varying points of differentiation. (Emphasis added.)
Hadoop interest elsewhere, meanwhile, has not slowed.
Need I say more about the growing ubiquity of Hadoop? I count this as a hit. …
Frameworks
Node.js Will Continue its Growth Trajectory
October was a rough month for Node.js, with posts like Node.js is Cancer and node.js Is VB6 – Does node.js Suck? following the tradition of March reddit discussions like Is NodeJS Wrong? The Trough of Disillusionment, it seemed, had arrived well ahead of schedule.
Except that interest metrics showed no commensurate decline. Node took – again – three of the Top 5 spots in inbound search queries within RedMonk Analytics. Which is unsurprising against the backdrop of Google’s Insights for Search numbers.
Over on GitHub, meanwhile, which itself has achieved dramatic growth, Node.js is the second most popular watched repository, ahead of Rails, jQuery, HTML5-Boilerplate, and Homebrew. Microsoft clearly perceives this growth, because it has worked with Joyent to create a stable build of Node for Windows which in turn led to an SDK for Azure.
All of which means nothing except that Node’s growth trajectory continues.
I count this as a hit.
<Return to section navigation list>
Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds
• Mary Jo Foley (@maryjofoley) asserted “Microsoft has a big System Center 2012 and private-cloud strategy event on tap for January 17” in a deck for her Microsoft to detail new private cloud and System Center strategy report of 1/5/2012 for ZDNet’s All About Microsoft blog:
Mark your calendars, private cloud fans. On January 17, Microsoft is holding a webcast to “lay out the new System Center 2012 vision.”
On hand for the two-hour event will be Satya Nadella, President of Microsoft’s Server and Tools Division, as well as Brad Anderson, Corporate Vice President of Management and Security. It starts at 8:30 a.m. PT/11:30 a.m. ET and will be broadcast here.
This could be the day that Microsoft announces the release to manufacturing of the 10 or so products that comprise the System Center 2012 family. Last year, Microsoft was predicting that these products would RTM before the end of calendar 2011, but that didn’t happen.
Here’s a chart from the company’s TechEd conference in May of last year that shows the original schedule for the various point products:
Microsoft officials said last year that the plan was to “launch” System Center 2012 in the early part of calendar 2012. “Launch” most likely means make generally available, in this case, I’d think. I’ve heard from several of my contacts that Microsoft’s plan is to time the launch around April, which is when the Softies will be holding their annual Microsoft Management Summit.
To date, Microsoft has sold the System Center point products independently. I’ve been hearing rumblings that the company may be about to change this strategy and sell one offering called System Center 2012 that includes all 10 or so individual pieces in a single bundle. …
Read more.
Turgut Haspolat (@turguthaspolat) described Some Experiences in Our Private Cloud Implementation for the Turkish government in a 12/30/2011 post that succinctly describes Microsoft System Center features:
We are carrying out a major project for the Turkish Public Procurement Authority as Fujitsu Technology Solutions. The project aims to ensure the execution of public procurement, from the definition of needs to the signing of the contract, in a completely electronic environment. Our electronic procurement project is considered one of the 11 components of the E-Government Projects in Turkey.
Private Cloud Implementation
First, we recognize the importance of the private cloud model. Thinking strategically, we chose private cloud technologies because our mission-critical operations have high requirements for security and availability and require a greater degree of control and ownership of the data.
A private cloud provides a more effective way to deploy, use, and manage VMs, applications, and other IT resources on hardware that’s dedicated to a single organization. Microsoft’s private cloud technologies, embodied in System Center 2012 and Hyper-V, make this possible. We gain virtualization and end-to-end service management capabilities by taking advantage of Microsoft System Center and Hyper-V.
Our aim is to give users access to our services a high percentage of the time while reducing unscheduled outages.
Business continuity
In this context, we started to prepare a business continuity architecture and strategy that treats the private cloud model as part of our infrastructure, taking advantage of Microsoft virtualization and the System Center suite to transform the way we deliver services to the business.
Business continuity is one of the most important issues we must address in our mission-critical business activities. We have also developed disaster prevention and emergency response plans.
We needed to ensure that our business-critical IT infrastructure is ready for any kind of threat to our business operations and services – from simple failures like unplugged cables or hard drive crashes, to more catastrophic events like earthquakes, hurricanes, other natural disasters, and terrorist attacks. We therefore put together an appropriate strategy for disaster readiness and resilience of critical business processes, focusing on how to continue critical business activities with minimal resources.
We’ve prepared a Business Continuity Plan based on the visualization and standardization of our entire business processes, in compliance with the BS 25999 standard. We started with a methodology that comprises defining the Program Scope, Threat Assessment, Business Impact Analysis, and Recovery Strategy.
First, we performed threat identification, analysis, and evaluation for all information system functional areas. Then, in the Business Impact Analysis phase, we prioritized business functions by assessing the potential financial (quantitative) and non-financial (qualitative) damage to our business when a particular event disrupts our operations and services.
We then defined Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) in accordance with this analysis, as well as the critical resources with the highest impact. After that, we specified the measures and strategies that ensure business recovery and continuity in case of disasters, failures, or other major outages. In the Recovery Strategy phase, we developed action plans that strike the optimum balance between improved responsiveness to an unexpected situation and the amount invested in business continuity measures.
Finally, we developed a Recovery Plan based on the risk evaluation and business impact analysis, with optimized action plans that ensure the system engineers in charge will respond quickly in their roles so that critical business functions can continue.
Effective business continuity and disaster recovery management requires not only technical expertise but also a detailed understanding of business requirements. At first we faced some challenges in adopting a private cloud infrastructure. A private cloud isn’t just a technology change; it also requires new attitudes, behaviors, and a cultural transformation. We’ve experienced this firsthand.
Microsoft System Center Suite
We have continued to make efforts to increase system availability by adopting the System Center suite. We have also put plans in place to ensure smooth business operation in the event of a disaster affecting IT systems. These plans minimize damage and ensure rapid recovery.
We decided to roll out the Microsoft System Center suite of infrastructure management solutions to introduce proactive monitoring of business-critical systems for the first time. The prior existence of a Microsoft Enterprise Agreement for Volume Licensing with Premier Support was a supporting argument in favor of Microsoft.
Our Microsoft consultants provided valuable technical support for our operations team during the deployment. We are close to completing the design, operation, and maintenance of our business continuity program; our methodology focuses on how to continue critical business activities with minimal resources. We are also about to sign the Microsoft System Center 2012 Private Cloud Technology Adoption Program agreement, which will help us move to the private cloud implementation smoothly.
We are architecting and building a system that follows industry best practices, standards, guidelines, and processes to deliver infrastructure solutions that meet the requirements of the business, including high availability, high performance, and security, based on ISO 27001, BS 25999, ITIL, and the MOF framework.
Microsoft System Center Configuration Manager (SCCM)
By using SCCM, we have started to deploy and update servers, desktop PCs, and laptop computers. Configuration Manager gives us visibility into which hardware and software assets we have, who is using them, and where they are. We have also simplified the management and delivery of software updates, and it helps ensure that our IT systems comply with desired configuration states and power management policies.
Microsoft System Center Operations Manager (SCOM)
We had been using SCOM 2007 R2 and moved to SCOM 2012 RC when it was released to meet our end-to-end service-management needs. With Operations Manager, we have a single, comprehensive view into state, health, and performance data across our network. Automated alerts are generated when potential issues related to availability, performance, configuration, or security are identified, enabling IT administrators to address them before they turn into major problems that can affect service levels.
AVICode
We had received many tickets about performance issues from Tier 1 support; our system was under considerable load. We have about 300,000 registered users, 1,000 database transactions per second, roughly 2 million requests served daily, and about 100,000 daily page views.
As you know, performance issues play a major role, especially in complex systems. Bottlenecks can be caused by many components, including the network, the application layer, system hardware, or software. So we had to do something. We already had a tool that comes with SCOM: AVIcode 5.7. It monitors .NET applications and provides real-time health alerts when those applications experience code failures or performance bottlenecks. Once implemented, it provided us with a high-level view of application behavior and health status, including an overall application quality assessment. In addition, AVIcode provides a dynamic baseline, and its real-time monitoring automatically generates a trend analysis of application performance data and correlates it with CPU, I/O, and memory utilization.
Thus, we easily identified application failures and performance issues, as well as hotspots where problem-resolution resources would be best focused. We were also empowered to make better-informed decisions regarding server consolidation and virtualization, and we optimized the application environment and prioritized troubleshooting efforts. We began monitoring the top performance, failure, connectivity, and security problems affecting applications, and we took measures to prevent bottlenecks, enabling faster resolution. We were able to dig into the data to better understand and more quickly resolve resource utilization problems along with their root causes.
By using AVIcode, we are able to determine whether our application meets its performance objectives and to identify bottlenecks. By leveraging performance metrics collected with counters, we can see response time, throughput, and resource utilization: how much CPU, memory, disk I/O, and network bandwidth our application consumes while performing its tasks. We realized that long SQL calls, file upload and download I/O times, long internal execution times resulting from complex code, and remote Web Services calls were causing performance issues, and we fixed them immediately. Thanks to AVIcode, we then observed a remarkable improvement in system performance, and the system now continuously improves itself.
Microsoft System Center Service Manager (SCSM)
We are using SCSM 2010 to meet our help-desk needs. With Service Manager, we have powerful, adaptable tools for problem, asset, and change management. Integrated with Configuration Manager, Operations Manager, and Active Directory Domain Services, it gives service-desk personnel the ability to access information across our infrastructure so they can work more efficiently and solve problems more quickly.
System Center Service Manager is now our primary ticketing system for all functions within the Technical Services department, including desktop support, server support, and network support. Even though none of our technicians had used Service Manager before, we were able to master it in just a few months.
Users can submit service-desk requests in several ways: by phone, by email, or through the built-in web portal in Service Manager.
The way that Service Manager integrates with Operations Manager and Configuration Manager is also streamlining IT operations. Technicians using Service Manager have access to all the information in Configuration Manager, allowing them to get a quick and accurate picture of the entire user environment. They can take over a user’s machine remotely and can just as easily use Configuration Manager to remotely install software or reimage the user’s PC.
<Return to section navigation list>
Cloud Security and Governance
David Linthicum (@DavidLinthicum) asserted “The Stop Online Piracy Act could oblige cloud providers to harm their own customers” in a deck for his How SOPA threatens the move to the cloud article of 1/4/2012 for InfoWorld’s Cloud Computing blog:
The Stop Online Piracy Act (SOPA), introduced in the U.S. House of Representatives in late October, would allow the Department of Justice and copyright holders to seek court orders to block payment processors and online advertising networks from doing business with foreign sites accused of infringing copyright.
If this thing passes, we could see court orders that bar search engines from linking to the allegedly infringing sites. Or most interesting, it would require domain name registrars to stop resolving queries that direct traffic to those sites -- and even require Internet service providers to block subscriber access to sites accused of infringing.
As you may expect, nobody likes this act due to the potential for abuse. Indeed, in the last 30 days we've seen the Internet in an uproar. This includes a movement to boycott Go Daddy, which has now changed its tune on SOPA from supporting to not supporting.
Although few cloud providers have chimed in on this controversial issue, we did hear from Lanham Napier, CEO of Rackspace:
SOPA would require that Rackspace and other Internet service providers censor their customers with little in the way of due process, trumping the protections present in the current Digital Millennium Copyright Act. What's more, the SOPA bill would seriously disrupt the Domain Name Service that is crucial to the smooth operation of the Web.
The bottom line is that this legislation sounds like a good idea for those who make a living by providing copyrighted content. However, giving government the power to pull domains and block access could lead to instances where the innocent are caught up in a legal mess they can't afford to fight -- without due process.
On this issue, cloud providers such as Rackspace, Amazon.com, and Microsoft are between the proverbial rock and a hard place. They'll be forced to carry out these actions. But as a result -- at a time when businesses are moving data and content to the cloud -- SOPA would provide a reason not to trust cloud computing providers for fear one day the providers would be legally obliged to turn off the users' business, without a day in court.
<Return to section navigation list>
Cloud Computing Events
• Eric Nelson (@ericnel) reported a FREE seminar on Windows Azure February 7th, Basingstoke on 1/5/2012:
ICS Solutions are one of the partners helping us deliver www.sixweeksofazure.co.uk which starts January 23rd. They also deliver regular free seminars on Windows Azure and the next takes place on February 7th 2012. Mark Hirst (an old friend of mine) is delivering it and hence it promises to be great.
The seminar will cover
- What is Cloud Computing?
- Why Microsoft?
- What is Azure?
- How can I use Azure Today?
- Demonstration
- Case Studies
- What are the costs?
- What are the benefits?
- What should I be doing next?
- The seminar will also cover the updates from the Microsoft Professional Developers Conference in October 2010.
P.S. If I’m free I hope to pop along for part of the day as well – so do say hi if you see me there.
• Ilan Rabinovitch reported DevOps Day Los Angeles will be returning to SCALE later this month on January 20th in a 1/4/2012 email message from devops@googlegroups.com:
The event is organized by the local DevOps meetups and co-hosted at the annual Southern California Linux Expo.
DevOps Day Los Angeles is a one-day event discussing topics around bridging the development and operations divide. The event will be held at the LAX Hilton as a lead-in to the Southern California Linux Expo (Jan 20-22). The event will include a combination of presentations, lightning talks, and panels.
DevOps Day Los Angeles
January 20, 2012
LAX Hilton Hotel - Los Angeles, CA
https://www.socallinuxexpo.org/scale10x/events/devops-day-la
Just a few of this year's sessions will include:
- nVentory - Your Infrastructure's Source of Truth (Christopher Nolan, Darren Dao, Jeff Roberts - eHarmony)
- Panel: Monitoring Sucks! (John Vincent, Simon Jakesch, and others)
- 3 Myths and 3 Challenges to Bring System Administration out of the Dark Ages (Mark Burgess)
- The DevOps Newly Wed Game!
and many more....
Following DevOps Day, SCALE will be held on January 21st and 22nd. The SCALE schedule is available at: https://www.socallinuxexpo.org/scale10x/schedule
We expect to have sessions recorded and live streamed. We will post a link closer to the event for those interested in attending or watching remotely.
Video and slides from our last event, held in February 2011, are available online at: www.socallinuxexpo.org/scale9x/video-audio-slides
We hope you will be able to join us at the LAX Hilton at the end of the month.
Regards,
Ilan Rabinovitch
ilan@fonz.net
Himanshu Singh asked At CodeMash Next Week? Don’t Miss The Training On Social Games using Windows Azure and HTML5 in a 1/4/2012 post to the Windows Azure blog:
If you plan to attend CodeMash next week in Sandusky, Ohio, don’t miss the full-day pre-compiler session, Building Social Games using HTML5 with Windows 8, Windows Azure, and .NET, which will cover the development of social games and feature Windows Azure.
In the morning, Nathan Totten along with members of the Windows Azure engineering and evangelism teams will be delivering short presentations on how to get started with building social games using Windows Azure, Windows 8 and leveraging .NET, Node.JS and HTML5. You’ll hear from experts Nathan Totten, Nick Harris, Jennifer Marsman, Wade Wegner, Brian Prince, and Glenn Block.
In the afternoon, we’ll let you get hands-on with these technologies building your own social game while having experts in the room to help you make your ideas real. We’ll have resources available, including a walkthrough of building your first social game with Windows Azure.
If you plan to join us for the hands-on portion, it will be helpful to have the prerequisite software listed below installed on your laptop. Happy coding!
Prerequisite
Windows Azure Camps Kit: The Windows Azure Camps Training Kit uses the new Content Installer to install all prerequisites, hands-on labs and presentations that are used for the Windows Azure Camp events.
Optional
If you’d like to work on any of the following scenarios, you will want to download the appropriate Toolkit ahead of time.
- Windows Azure Toolkit for Social Games: The Windows Azure Toolkit for Social Games is a set of guidance, samples, and tools that helps developers quickly get started building a casual or social game on Windows Azure.
- Windows Azure Toolkit for Windows Phone: The Windows Azure Toolkit for Windows Phone is designed to make it easier for you to build mobile applications that leverage cloud services running in Windows Azure.
- Windows Azure Toolkit for Windows 8: This toolkit includes Visual Studio project templates for a sample Metro style app and a Windows Azure cloud project.
Manual Install
If you’d like to manually install the tools and SDKs, you can use the following links:
- Visual Studio 2010 Express (or higher)
- Windows Azure SDK for .NET – November 2011 (or higher)
- IIS: Tracing
- SQL Server 2008 R2 Management Studio Express with SP1
Click here to learn more about this session.
<Return to section navigation list>
Other Cloud Computing Platforms and Services
Matty Noble described How Collections Work in the AWS SDK for Ruby in a 1/4/2012 post to the AWS blog:
Today we have a guest blog post from Matty Noble, Software Development Engineer, SDKs and Tools Team.
- rodica
We've seen a few questions lately about how to work with collections of resources in the SDK for Ruby, so I'd like to take a moment to explain some of the common patterns and how to use them. There are many different kinds of collections in the SDK. To keep things simple, I'll focus on Amazon EC2, but most of what you'll see here applies to other service interfaces as well.
Before we do anything else, let's start up an IRB session and configure a service interface to talk to EC2:
$ irb -r rubygems -r aws-sdk
> ec2 = AWS::EC2.new(:access_key_id => "KEY", :secret_access_key => "SECRET")
There are quite a few collections available to us in EC2, but one of the first things we need to do in any EC2 application is to find a machine image (AMI) that we can use to start instances. We can manage the images available to us using the images collection:
> ec2.images
 => <AWS::EC2::ImageCollection>
When you call this method, you'll notice that it returns very quickly; the SDK for Ruby lazy-loads all of its collections, so just getting the collection doesn't do any work. This is good, because often you don't want to fetch the entire collection. For example, if you know the ID of the AMI you want, you can reference it directly like this:
> image = ec2.images["ami-310bcb58"]
 => <AWS::EC2::Image id:ami-310bcb58>
Again, this returns very quickly. We've told the SDK that we want ami-310bcb58, but we haven't said anything about what we want to do with it. Let's get the description:
> image.description
 => "Amazon Linux AMI i386 EBS"
This takes a little longer, and if you have logging enabled you'll see a message like this:
[AWS EC2 200 0.411906] describe_images(:image_ids=>["ami-310bcb58"])
Now that we've said we want the description of this AMI, the SDK will ask EC2 for just the information we need. The SDK doesn't cache this information, so if we do the same thing again, the SDK will make another request. This might not seem very useful at first -- but by not caching, the SDK allows you to do things like polling for state changes very easily. For example, if we want to wait until an instance is no longer pending, we can do this:
> sleep 1 until ec2.instances["i-123"].status != :pending
The [] method is useful for getting information about one resource, but what if we want information about multiple resources? Again, let's look at EC2 images as an example. Let's start by counting the images available to us:
> ec2.images.to_a.size
[AWS EC2 200 29.406704] describe_images()
 => 7677
The to_a method gives us an array containing all of the images. Now, let's try to get some information about these images. All collections include Enumerable, so we can use standard methods like map or inject. Let's try to get all the image descriptions using map:
> ec2.images.map(&:description)
This takes a very long time. Why? As we saw earlier, the SDK doesn't cache anything by default, so it has to make one request to get the list of all images, and then one request for each returned image (in sequence) to get the description. That's a lot of round trips -- and it's mostly wasted effort, because EC2 provides all the information we need in the response to the first call (the one that lists all the images). The SDK doesn't know what to do with that data, so the information is lost and has to be re-fetched image by image. We can get the descriptions much more efficiently like this:
> AWS.memoize { ec2.images.map(&:description) }
AWS.memoize tells the SDK to hold on to all the information it gets from the service in the scope of the block. So when it gets the list of images along with their descriptions (and other information) it puts all that data into a thread-local cache. When we call Image#description on each item in the array, the SDK knows that the data might already be cached (because of the memoize block) so it checks the cache before fetching any information from the service.
We've just scratched the surface of what you can do with collections in the AWS SDK for Ruby. In addition to the basic patterns above, many of our APIs allow for more sophisticated filtering and pagination options. For more information about these APIs, you can take a look at the extensive API reference documentation for the SDK. Also don't hesitate to ask questions or leave feedback in our Ruby development forum.
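Matty's closing mention of filtering deserves a quick example. The following is my own sketch against the version 1 aws-sdk gem (the owner alias and filter name are my assumptions, not taken from the post); the point is that the criteria are applied server-side by EC2 before the collection is enumerated:
# Only Amazon-owned, EBS-backed images are described by the service
> amazon_ebs_images = ec2.images.with_owner("amazon").filter("root-device-type", "ebs")
> AWS.memoize { amazon_ebs_images.map(&:name) }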
A note about AWS.memoize
AWS.memoize works with EC2, IAM, and ELB; we'd like to extend it to other services, and we'd also like to hear what you think about it. Is the behavior easy to understand? Does it work well in practice? Where would this feature be most beneficial to your application?
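In my experience the biggest payoff from memoization is reading several attributes across a whole collection in one pass. A quick sketch of my own (the attribute list is arbitrary), building on the ec2 interface configured above:
# Inside the block the cached DescribeInstances response is reused, so this report
# should cost roughly one request rather than one per attribute per instance.
> report = AWS.memoize { ec2.instances.map { |i| [i.id, i.status, i.availability_zone] } }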
Jeff Barr (@jeffbarr) announced Help Wanted - Manager and Senior Developers for new AWS Media Product (for Amazon Instant Videos?) on 1/4/2012:
We are staffing up a brand-new AWS team to take advantage of some really interesting opportunities in the digital media space.
This team is being launched from an existing product. This particular product has seen exponential growth in the size of its user base on an annualized (run rate) basis, along with 30x revenue growth over the same period.
In order to address this opportunity, we need to hire a Senior Development Manager and multiple Senior Developers ASAP. You'll be able to start from scratch, building a large-scale distributed application as part of the Seattle-based team.
The team is currently searching for developers at our "SDE III" level. Successful applicants for an SDE III position typically have six or more years of development experience in the industry, along with a BS or MS in Computer Science. They are able to solve large problems in the face of ambiguity, and are able to work on the architecture and the code. They will also have launched projects of significant complexity in the recent past, and have the ability to balance technical complexity with business value.
The developers on this team will drive the architecture and the technology choices. They'll need to have a broad knowledge of emerging technologies and will know the ins and outs of Java, C or C++, and Linux or Windows. They will also have significant experience with networking, multi-threaded applications, interprocess communication, and the architecture of fault-tolerant systems.
We also need a senior manager of software development to build and run this team of top-performers. The manager will establish a project framework and will also be responsible for putting the right development practices in to place. The manager will also be responsible for providing technical leadership and guidance to the team. Well-qualified applicants will have been managing teams of developers for four or more years, and will have shipped one or more highly available large-scale internet applications.
If you are qualified for one of these positions and you would like to apply, please send your resume to newawsproject@amazon.com.
<Return to section navigation list>