Sunday, December 12, 2010

Windows Azure and Cloud Computing Posts for 12/11/2010+

A compendium of Windows Azure, Windows Azure Platform Appliance, SQL Azure Database, AppFabric and other cloud-computing articles.

• Updated 12/12/2010 with articles marked •

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

To use the above links, first click the post’s title to display the single article you want to navigate.

Cloud Computing with the Windows Azure Platform published 9/21/2009. Order today from Amazon or Barnes & Noble (in stock).

Read the detailed TOC here (PDF) and download the sample code here.

Discuss the book on its WROX P2P Forum.

See a short-form TOC, get links to live Azure sample projects, and read a detailed TOC of electronic-only chapters 12 and 13 here.

Wrox’s Web site manager posted on 9/29/2009 a lengthy excerpt from Chapter 4, “Scaling Azure Table and Blob Storage” here.

You can now freely download by FTP and save the following two online-only PDF chapters of Cloud Computing with the Windows Azure Platform, which have been updated for SQL Azure’s January 4, 2010 commercial release:

  • Chapter 12: “Managing SQL Azure Accounts and Databases”
  • Chapter 13: “Exploiting SQL Azure Database's Relational Features”

The two chapters are also available for download over HTTP at no charge from the book's Code Download page.

Tip: If you encounter articles from MSDN or TechNet blogs that are missing screen shots or other images, click the empty frame to generate an HTTP 404 (Not Found) error, and then click the back button to load the image.

Azure Blob, Drive, Table and Queue Services

Anton Staykov posted Windows Azure Storage Tips on 12/11/2010:

Windows Azure is a great platform. It has different components (such as Compute, Storage, SQL Azure, and AppFabric) that can be used independently. For example, you can use just Windows Azure Storage (be it Blob, Queue or Table) without using Compute (Windows Azure Roles), SQL Azure or AppFabric. And using just Windows Azure Storage is worthwhile on its own: the price is very competitive with other cloud storage providers (such as Amazon S3).

To use Windows Azure Storage from within your Windows Forms application, you just need to add a reference to the Microsoft.WindowsAzure.StorageClient assembly. This assembly is part of the Windows Azure SDK.

O.K. Assuming you have created a new Windows Forms application, added a reference to that assembly, tried to create your CloudStorageAccount using the static Parse or TryParse method, and then tried to build your application: don’t be surprised when you get the following error (warning):

Warning 5 The referenced assembly "Microsoft.WindowsAzure.StorageClient, Version=, Culture=neutral, PublicKeyToken=31bf3856ad364e35, processorArchitecture=MSIL" could not be resolved because it has a dependency on "System.Web, Version=, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a" which is not in the currently targeted framework ".NETFramework,Version=v4.0,Profile=Client". Please remove references to assemblies not in the targeted framework or consider retargeting your project.   

And you will not be able to build.

Well, some of you may not know this, but with Service Pack 1 of the .NET Framework 3.5, Microsoft introduced a new concept named the “.NET Framework Client Profile,” which is available for .NET Framework 3.5 SP1 and .NET Framework 4.0. The short version of what the Client Profile is follows:

The .NET Framework 4 Client Profile is a subset of the .NET Framework 4 that is optimized for client applications. It provides functionality for most client applications, including Windows Presentation Foundation (WPF), Windows Forms, Windows Communication Foundation (WCF), and ClickOnce features. This enables faster deployment and a smaller install package for applications that target the .NET Framework 4 Client Profile.

For the full version – check out the inline links.

To use Microsoft.WindowsAzure.StorageClient from within our Windows Forms application, go to the project Properties and, on the “Application” tab, change “Target Framework” to “.NET Framework 4” rather than the “* Client Profile” option:


The gotcha is that the default setting in Visual Studio is to use the Client Profile of the .NET Framework. You cannot change this option from the “New Project” wizard, so all new projects you create target the .NET Framework Client Profile (if you choose a .NET Framework 4 or 3.5 project template).
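If you prefer editing the project file directly, the same change amounts to removing (or emptying) one element in the .csproj. This is a sketch of the relevant fragment only, not a complete project file:

```xml
<PropertyGroup>
  <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
  <!-- Delete this element (or leave it empty) to target the full
       .NET Framework 4 instead of the Client Profile -->
  <TargetFrameworkProfile>Client</TargetFrameworkProfile>
</PropertyGroup>
```

After the change, Visual Studio reloads the project targeting the full framework, and the System.Web dependency of the StorageClient assembly resolves.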

Jerry Huang described Convenient SQL Server Backup to Amazon S3 or Windows Azure Storage in this 12/10/2010 post to his Gladinet blog:

Every SQL Server admin knows how to back up SQL Server. The technology has been very mature since SQL Server 2005, which added built-in support for the Volume Shadow Copy Service: SQL Server knows how to take a consistent snapshot of the database and expose the snapshot as files for backup applications to consume and store.

With Gladinet Cloud Backup, SQL Server backup to Amazon S3 or any other supported cloud storage service, such as Windows Azure or OpenStack, is as easy as set-it-and-forget-it, while restore is as simple as point and click. (Check Gladinet’s homepage for the supported cloud storage services.)

Get Cloud Backup

You can get Cloud Backup in two different ways: as a standalone installer package, or as an optional add-on for Gladinet Cloud Desktop. This tutorial uses the Gladinet Cloud Desktop add-on for illustration purposes.

Mount S3 Virtual Directory

The next step will involve mounting Amazon S3 as a virtual folder.


Snapshot Backup

SQL Server Backup is available in the Snapshot Backup add-on for Gladinet Cloud Desktop, as well as in the standalone Cloud Backup.


SQL Server Backup is in the Backup by Application section; “Application” here means applications that support the Volume Shadow Copy Service.


The SQL Server will show up and you can pick the database that you want to back up. (If SQL Server doesn’t show up, read this MSDN article about SQL Writer; most likely the SQL VSS Writer service is not started.)


In the next step, you pick Amazon S3 as the destination.


That’s it; the task will run in the background and back up the SQL Server to Amazon S3.

Before we go into the restore, first drop some data from the database to simulate a data loss or corruption scenario.

Now Restore

The first step is to pick the backup destination, since it is now the restore source.


Now select the snapshot you need.


That’s it; the database will be restored. You don’t need to restart SQL Server, and it continues to run.


<Return to section navigation list> 

SQL Azure Database and Reporting

Steve Peschka continued his CASI (Claims, Azure and SharePoint Integration) Kit series with part 6, Integrating SQL Azure with SharePoint 2010 and Windows Azure, of 12/12/2010:

This post is most useful when used as a companion to my five-part series on the CASI (Claims, Azure and SharePoint Integration) Kit.

  • Part 1: an introductory overview of the entire framework and solution and described what the series is going to try and cover.
  • Part 2: the guidance part of the CASI Kit. It starts with making WCF the front end for all your data – which could be datasets, Xml, custom classes, or just straight-up Html. In phase 1 we take your standard WCF service and make it claims aware – that’s what allows us to take the user token from SharePoint and send it across application or data center boundaries to our custom WCF applications. In phase 2 I’ll go through the list of all the things you need to do to take this typical WCF application from on premises to hosted up in Windows Azure. Once that is complete you’ll have the backend in place to support a multi-application, multi-datacenter solution with integrated authentication.
  • Part 3: describes the custom toolkit assembly that provides the glue to connect your claims-aware WCF application in the cloud and your SharePoint farm. I’ll go through how you use the assembly, talk about the pretty easy custom control you need to create (about 5 lines of code) and how you can host it in a page in the _layouts directory as a means to retrieve and render data in a web part. The full source code for the sample custom control and _layouts page will also be posted.
  • Part 4: the web part that I’m including with the CASI Kit. It provides an out of the box, no-code solution for you to hook up and connect with an asynchronous client-side query to retrieve data from your cloud-hosted service and then display it in the web part. It also has hooks built into it so you can customize it quite a bit and use your own JavaScript functions to render the data.
  • Part 5: a brief walk-through of a couple of sample applications that demonstrate some other common scenarios for using the custom control you build that’s described in part 3. One uses the control to retrieve some kind of user or configuration data, store it in the ASP.NET cache, and then use it in a custom web part. The other uses the custom control to retrieve data from Azure and use it in a custom task; in this case, a custom SharePoint timer job. The full source code for these sample applications will also be posted.

With the CASI Kit I gave you some guidance and tools to help you connect pretty easily and quickly to Windows Azure from SharePoint 2010, while sending along the token for the current user so you can make very granular security decisions. The original WCF application that the CASI Kit consumed just used a hard-coded collection of data that it exposed. In a subsequent build (and not really documented in the CASI Kit postings), I upgraded the data portion of it so that it stored and retrieved data using Windows Azure table storage. Now, I’ve improved it quite a bit more by building out the data in SQL Azure and having my WCF in Windows Azure retrieve data from there. This really is a multi-cloud application suite now: Windows Azure, SQL Azure, and (ostensibly) SharePoint Online. The point of this post is really just to share a few tips when working with SQL Azure so you can get it incorporated more quickly into your development projects.

Integration Tips

1. You really must upgrade to SQL Server 2008 R2 in order to be able to open SQL Azure databases with SQL Server Management Studio. You can technically make SQL Server 2008 work, but it requires a bunch of hacky workaround steps. 2008 R2 has support baked in, and you will get the best experience there. If you really want to go the 2008 workaround route, check out this link: There are actually a few good tips in that article, so it’s worth a read no matter what.

2. Plan on using T-SQL to do everything. The graphical tools are not available for working with SQL Azure databases, tables, stored procedures, etc. One thing I’ve found helpful, since I’m not a real T-SQL whiz, is to create a local SQL Server database first. In SQL Server Management Studio I create two connections – one to my local SQL instance and one to my SQL Azure instance. I create tables, stored procedures, etc. in the local SQL instance so I can use the graphical tools. Then I use Script [whatever SQL object] As…CREATE To…New SQL Query Window. That generates the T-SQL script to create my objects, and I can then paste it into a query window that I have open to my SQL Azure database.
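For example, the scripted CREATE for a simple table comes out looking something like this (table and column names here are made up for illustration; note that SQL Azure requires every table to have a clustered index before you can insert rows):

```sql
CREATE TABLE [dbo].[StoreInformation](
    [StoreId]   [int]           NOT NULL,
    [StoreName] [nvarchar](100) NULL,
    CONSTRAINT [PK_StoreInformation] PRIMARY KEY CLUSTERED ([StoreId] ASC)
)
GO
```

Scripts generated this way usually run against SQL Azure unchanged, apart from a few on-premise options (filegroups, certain index options) that you may need to strip out.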

3. This one’s important, folks – the default timeout is frequently not long enough to connect to SQL Azure. If you are using the .NET SqlClient classes, you can change it in the connection string, i.e. add "Connection Timeout=30;". If you are using SQL Server Management Studio, click the Advanced button on the dialog where you enter the server name and credentials and change the timeout to at least 30. The default is 15 seconds and fails often, but 30 seconds seems to work pretty well. Unfortunately there isn’t a way to change the connection timeout for an external content type (i.e. a BDC application definition) that uses a SQL Azure data source.

4. Don't use the database administrator account for connecting to your databases (i.e. the account you used to create the database itself). Create a separate account for working with the data. Unfortunately SQL Azure only supports SQL accounts, so you can't directly use the identity of the requesting user to make decisions about access to the data. You can work around that with a WCF application that front-ends the data in SQL Azure and uses claims authentication in Windows Azure, which is exactly the model the CASI Kit uses. Also, it takes a few steps to create a login that can be used to connect to data in a specific database. Here is an example:

--create the database first


--now create the login, then create a user in the database from the login


CREATE USER CustomerReader FROM LOGIN CustomerReader

--now grant rights to the user

GRANT INSERT, UPDATE, SELECT ON dbo.StoreInformation TO CustomerReader

--grant EXECUTE rights to a stored proc

GRANT EXECUTE ON myStoredProc TO CustomerReader
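Putting the whole sequence together, a complete sketch might look like this (the CREATE DATABASE and CREATE LOGIN steps run while connected to master; the password is a placeholder, and the database name follows the Customers example used later in this post):

```sql
-- create the database first (run connected to master)
CREATE DATABASE Customers
GO

-- now create the login (also run in master)...
CREATE LOGIN CustomerReader WITH PASSWORD = 'YourStr0ngP@ssword'
GO

-- ...then, connected to the Customers database, create a user from the login
CREATE USER CustomerReader FROM LOGIN CustomerReader
GO

-- grant rights to the user
GRANT INSERT, UPDATE, SELECT ON dbo.StoreInformation TO CustomerReader
GO

-- grant EXECUTE rights to a stored proc
GRANT EXECUTE ON myStoredProc TO CustomerReader
GO
```

Note that SQL Azure does not support the USE statement for switching databases, so the master-scoped and database-scoped statements have to be run over separate connections.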

For more examples and details, including setting server level rights for accounts in SQL Azure, see

5. You must create firewall rules for your SQL Azure server in order to allow communications from different clients. If you want to allow communication from other Azure services, you must create the MicrosoftServices firewall rule (which SQL Azure offers to create for you when you first create the database); it is a rule whose start and end of range are both 0.0.0.0. If you do not create this rule, none of your Windows Azure applications will be able to read, add or edit data in your SQL Azure databases! You should also create a firewall rule to allow communications from whatever server(s) you use to route to the Internet. For example, if you have a cable or DSL router or an RRAS server, you want to use your external address, or the NAT address in the case of something like RRAS.
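The firewall rules can also be managed with T-SQL against the master database. A sketch (the second rule's name and IP address are placeholders – substitute the external or NAT address of your own router or RRAS server):

```sql
-- Run in master with a server-level login.
-- The 0.0.0.0-to-0.0.0.0 range is the special rule that admits other
-- Microsoft/Azure services:
EXEC sp_set_firewall_rule N'MicrosoftServices', '0.0.0.0', '0.0.0.0'

-- Admit your own external (NAT) address:
EXEC sp_set_firewall_rule N'HomeOffice', '203.0.113.17', '203.0.113.17'

-- List the current rules:
SELECT * FROM sys.firewall_rules
```

Rules take effect within a few minutes; `sp_delete_firewall_rule` removes one by name.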

Those should be some good tips to get your data up and running.

Data Access

Accessing the data itself from your custom code – Windows Azure WCF, web part, etc. – is fortunately pretty much exactly the same as when retrieving data from an on-premise SQL Server. Here’s a quick code example, and then I’ll walk through a few parts of it in a little more detail:

//set a longer timeout because 15 seconds is often not
//enough; SQL Azure docs recommend 30
private string conStr = "server=tcp:...;Database=Customers;" +
    "User ID=CustomerReader;Password=FooBarFoo;Trusted_Connection=False;Encrypt=True;" +
    "Connection Timeout=30";

private string queryString = "select * from StoreInformation";

private DataSet ds = new DataSet();

using (SqlConnection cn = new SqlConnection(conStr))
{
    SqlDataAdapter da = new SqlDataAdapter();
    da.SelectCommand = new SqlCommand(queryString, cn);
    da.Fill(ds);

    //do something with the data
}


Actually there’s not really anything here that needs much explanation, other than the connection string. The main things worth pointing out – the parts that are possibly different from a typical connection to an on-premise SQL Server – are:

  • server: precede the name of your SQL Azure server with “tcp:”.
  • Trusted_Connection: this should be false, since you’re not using integrated security.
  • Encrypt: this should be true for any connection to a cloud-hosted database.
  • Connection Timeout: as described above, the default timeout of 15 seconds will frequently not be enough; I set it to 30 seconds here.

One other scenario I wanted to mention briefly here is data migration. You can use the BCP tool that comes with SQL Server to move data from an on-premise SQL Server to SQL Azure. The basic routine for migrating data goes like this:

1. Create a format file – this is used for both exporting the local data and importing it into the cloud. In this example I’m creating a format file for the Region table in the Test database, and saving it to the file region.fmt.

bcp Test.dbo.Region format nul -T -n -f region.fmt

2. Export data from local SQL – In this example I’m exporting out of the Region table in the Test database, into a file called RegionData.dat. I’m using the region.fmt format file I created in step 1.

bcp Test.dbo.Region OUT RegionData.dat -T -f region.fmt

3. Import data to SQL Azure. The main thing that’s important to note here is that when you are importing data into the cloud, you MUST include the “@serverName” suffix with the user name parameter; the import will fail without it. In this example I’m importing data to the Region table in the Customers database in SQL Azure, from the RegionData.dat file that I exported my local data into. My server name (the -S parameter) is the name of the SQL Azure server. For my user name (-U parameter) I’m using the format username@servername, as described above. I’m telling it to use the region.fmt format file created in step 1, and I’ve set a batch size (-b parameter) of 1000 records at a time.

bcp Customers.dbo.Region IN RegionData.dat -S -U -P FooBarFoo -f region.fmt -b 1000

That’s all for this post folks. Hopefully this gives you a good understanding of the basic mechanics of creating a SQL Azure database and connecting to it and using it in your SharePoint site. As a side note, I used the CASI Kit to retrieve this data through my Windows Azure WCF front end to SQL Azure and render it in my SharePoint site. I followed my own CASI Kit blog when creating it to try and validate everything I’ve previously published in there and overall found it pretty much on the mark. There were a couple of minor corrections I made in part 3 along with a quick additional section on troubleshooting that I added. But overall it took me about 30 minutes to create a new custom control and maybe 15 minutes to create a new Visual web part.

I used the CASI Kit web part to display one set of SQL Azure data from the WCF, and in the Visual web part I created an instance of the custom control programmatically to retrieve a dataset, and then I bound it to an ASP.NET grid. I brought it all together in one sample page that actually looks pretty good, and could be extended to displaying data in a number of different ways pretty easily. Here’s what it looks like:


Steve is a Microsoft Lead Architect for Portals.

Cihan Biyikoglu described How to scale out an app with SQL Azure Federations… The Quintessential Sales DB with Customer and Orders in a 12/11/2010 post:

Continuing on the SQL Azure Federations theme with this post, I’ll cover an important question: what would it take to build applications with SQL Azure Federations? So let’s walk that path… We’ll take the quintessential sales app with customers and orders. I’ll simplify the sample and schema to get you through the headlines, but we won’t lose any fidelity to the basic relationships in the data model. To recap the sample: as you would expect from a sales DB, customers have many orders, and orders have many order details (line items). We’ll focus on the implementation of two scenarios: getting the orders and order details for a given customer, and getting all orders for all customers.

Step 1 – Decide the Federation Key

Picking the right federation key is the most critical decision you will need to make, simply because the federation key is something the application needs to understand, and it will be tough to change later. How do you pick the federation key? Apps that fit the sharding pattern have most of their workload focused on a subset of the data. With the sample sales app, scenarios mostly revolve around a single customer and its orders: customers log in, place or view their orders, change demographic info, etc. Some apps may not have a slam-dunk choice like customer_id, or may have more than one dimension to optimize. If you have more than one natural choice for a federation key, you will need to think about which specific workload you want to optimize in order to make your pick. We’ll cover techniques for optimizing multiple dimensions in the advanced class.

It is also critical to pick a federation key that spreads the workload across many nodes instead of concentrating it: after picking a federation key, you don’t want all your load concentrated at a single federation member or node. In the sales app, customer_id as the federation key helps distribute the load across many nodes. One more important factor is the size of the largest federation key instance. Remember that atomic units cannot be split any further, so atomic units – that is, federation key instances – cannot span federation members. When you pick the federation key, the computational capacity requirements of the largest federation key instance should safely fit into a single SQL Azure database.

Step 2 – Denormalize the Schema

With the federation key set, the next step is to modify your schema a little to deploy it into SQL Azure Federations. It is a fairly simple change: we need to carry the federation key into all tables that need to be federated. In the sales app, the customer and orders tables naturally have customer_id as a column; however, the order_details table also needs to contain the customer_id column before the schema can be deployed. This is required so that SQL Azure knows how to redistribute your data when a SPLIT command is issued. The script below walks through how to deploy the schema to a SQL Azure federation member.

  1: -- Connect to 'master'
  3: GO
  5: -- CONNECT to 'SalesDB' and Create Federation
  7: GO
  9: -- Connect to the 'Orders_Federation' federation member covering customer_id 0.
 10: USE FEDERATION Orders_Federation(0) WITH RESET
 11: GO
 13: CREATE TABLE orders(
 14:   customerid bigint,
 15:   orderid bigint,
 16:   orderdate datetime,
 17:   primary key (orderid, customerid))
 18: FEDERATE ON (customerid)
 19: GO
 21: -- note that customerid – the federation key – needs to be part of all unique indexes but not nonunique indexes.
 22: CREATE INDEX o_idx1 on orders(orderdate)
 23: GO
 25: -- create a federated table 
 26: CREATE TABLE orderdetails(
 27:   customerid bigint,
 28:   orderdetailid bigint,
 29:   orderid bigint,
 30:   partid bigint,
 31:   primary key (orderdetailid, customerid))
 32: FEDERATE ON (customerid)
 33: GO
 35: ALTER TABLE orders 
 36: ADD CONSTRAINT orders_uq1 UNIQUE NONCLUSTERED (orderid,customerid)
 37: GO
 39: ALTER TABLE orderdetails 
 40: ADD CONSTRAINT orderdetails_fk1 FOREIGN KEY(orderid,customerid) REFERENCES orders(orderid,customerid)
 41: GO
 43: -- reference table
 44: CREATE TABLE uszipcodes(zipcode nvarchar(128) primary key, state nvarchar(128))
 45: GO

Step 3 – Make Application Federation Aware

For the large part of the workload that is focused on the atomic unit, life is fairly simple. Your app needs to issue one additional statement to rewire the connection to the right atomic unit. Here is the example statement that connects to customer_id 55:

  1: USE FEDERATION Orders_Federation(55) WITH RESET
  2: GO

Once you are connected, you can safely operate on the data that belongs to customer_id value 55. Even if SPLITs were to redistribute the data, you don’t need to remember a new physical database name or do anything different in your connection string. Your connection string still connects to the database containing the federation, and you then issue the USE FEDERATION statement to go to customer_id 55; we guarantee to land you in the correct federation member. Here is a dead simple implementation of GetData for a classic sales app with orders and orderdetails tables. The interesting thing to note here is that the connection string points to the SalesDB database, and that the net new code is lines 13 to 17 below.

  1: private void GetData(long customerid)
  2: {
  3:   long fedkeyvalue = customerid;
  5:   // Connection
  6:   SqlConnection connection = new SqlConnection(@"Server=tcp:sqlazure;Database=SalesDB;User ID=mylogin@myserver;Password=mypassword");
  7:   connection.Open();
  9:   // Create a DataSet.
 10:   DataSet data = new DataSet();
 12:   // Routing to Specific Customer
 13:   using (SqlCommand command = connection.CreateCommand())
 14:   {
 15:     command.CommandText = "USE FEDERATION orders_federation(" + fedkeyvalue.ToString() + ") WITH RESET";
 16:     command.ExecuteNonQuery();
 17:   }
 19:   // Populate data from orders table to the DataSet.
 20:   SqlDataAdapter masterDataAdapter = new SqlDataAdapter(@"select * from orders where (customerid=@customerid1)", connection);
 21:   masterDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 22:   masterDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 23:   masterDataAdapter.Fill(data, "orders");
 25:   // Add data from the orderdetails table to the DataSet.
 26:   SqlDataAdapter detailsDataAdapter = new SqlDataAdapter(@"select * from orderdetails where (customerid=@customerid1)", connection);
 27:   detailsDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 28:   detailsDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 29:   detailsDataAdapter.Fill(data, "orderdetails");
 31:   connection.Close();
 32:   connection.Dispose();
 34:   …
 35: }

For the parts of the application that need to fan out queries, in future versions there will be help from SQL Azure. In v1, however, you will need to hand-craft the queries by firing multiple queries at multiple federation members. If you are simply appending or union-ing result sets across your atomic units, you need to iterate through the federation members. This won’t be too hard. Here is the GetData function expanded to include the scenario where all customers across all federation members are displayed. The function assumes the special customer_id=0 means “return all customers,” so the T-SQL is adjusted to enable that through the added @customerid2 parameter in the where clause: where (customerid=@customerid1 or @customerid2=0). The net new code here is the loop spanning lines 12 to 42. Lines 37 to 42 grab the value required to rewire the connection to the next federation member, repeating the commands until a range high of NULL is hit. NULL is the special value used to represent the maximum value of the federation key in the sys.federation_member_columns system view.

  1: private void GetData(long customerid)
  2: {
  3:   long? fedkeyvalue = customerid;
  5:   // Connection
  6:   SqlConnection connection = new SqlConnection(@"Server=tcp:sqlazure;Database=SalesDB;User ID=mylogin@myserver;Password=mypassword");
  7:   connection.Open();
  9:   // Create a DataSet.
 10:   DataSet data = new DataSet();
 12:   do
 13:   {
 14:     // Routing to Specific Customer
 15:     using (SqlCommand command = connection.CreateCommand())
 16:     {
 17:       command.CommandText = "USE FEDERATION orders_federation(" + fedkeyvalue.ToString() + ") WITH RESET";
 18:       command.ExecuteNonQuery();
 19:     }
 21:     // Populate data from orders table to the DataSet.
 22:     SqlDataAdapter masterDataAdapter = new SqlDataAdapter(@"select * from orders where (customerid=@customerid1 or @customerid2=0)", connection);
 23:     masterDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 24:     masterDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 25:     masterDataAdapter.SelectCommand.Parameters.Add("@customerid2", SqlDbType.BigInt);
 26:     masterDataAdapter.SelectCommand.Parameters[1].Value = customerid;
 27:     masterDataAdapter.Fill(data, "orders");
 29:     // Add data from the orderdetails table to the DataSet.
 30:     SqlDataAdapter detailsDataAdapter = new SqlDataAdapter(@"select * from orderdetails where (customerid=@customerid1 or @customerid2=0)", connection);
 31:     detailsDataAdapter.SelectCommand.Parameters.Add("@customerid1", SqlDbType.BigInt);
 32:     detailsDataAdapter.SelectCommand.Parameters[0].Value = customerid;
 33:     detailsDataAdapter.SelectCommand.Parameters.Add("@customerid2", SqlDbType.BigInt);
 34:     detailsDataAdapter.SelectCommand.Parameters[1].Value = customerid;
 35:     detailsDataAdapter.Fill(data, "orderdetails");
 37:     using (SqlCommand command = connection.CreateCommand())
 38:     {
 39:       command.CommandText = "select cast(range_high as bigint) from sys.federation_member_columns";
 40:       fedkeyvalue = command.ExecuteScalar() as long?;  // DBNull (last member) maps to null
 41:     }
 42:   } while (customerid == 0 && fedkeyvalue != null);
 44:   connection.Close();
 45:   connection.Dispose();
 47:   …
 48: }

Below is the output of a sample sys.federation_member_columns view for the orders_federation. With this federation setup, range_low is inclusive and range_high is exclusive for the given federation member. NULL represents the maximum bigint value + 1.


The set of scenarios here is obviously not an exhaustive list. There will be more sophisticated types of queries that require cross-federation joins. In v1, those queries will need to be handled client side. In the future, things will get better with support from SQL Azure for fanout queries; fanout querying will enable easier querying across federation members.

Clearly, SQL Azure Federations require changes to your application. The changes typically impact the mechanics of the application – wiring up coordination of the partitioning effort with the federations – more than the business logic of the application. These changes come with the great benefits of massive scalability and the ability to massively parallelize your workload by engaging many nodes of the cluster. There are many applications out there that have been designed using horizontal partitioning and sharding principles over the last decade. Those applications did not have the luxury of an RDBMS platform that understood the data distribution and provided robust routing of connections. With SQL Azure Federations, horizontal partitioning and sharding become first-class citizens in the server; the bar for entry is lowered, and it becomes much easier to get the flexibility to redistribute data at will, without downtime. These properties translate to great elasticity, the best economics, and unlimited scale for applications, so I think the changes are well worth the effort.

Toto Gamboa described My First Shot At Sharding Databases on 12/11/2010:

Sometime in 2004, I was faced with the question of whether our library system’s design – which I started way back in 1992 using Clipper on Novell NetWare, and which was later ported to ASP/MSSQL – should be torn apart in favor of a more advanced, scalable and flexible design. The problem I would encounter most often is that academic organizations can have varying structures. In public schools, there are regional structures where libraries are shared by various schools in the region. In some organizations, a school has sister schools with several campuses, each with one or more libraries, but all managed by a single entity. In another setup, a single campus can have several schools in it, each with one or more libraries. These variations pose a lot of challenges in terms of programming and deployment – a possible design nightmare.

Each school or library would often emphasize its independence and uniqueness from other schools and libraries – for example, wanting its own library policies and control over its collections, users and customers – and yet have the desire to share its resources with others and interoperate with them. Even within a campus, one library can operate on a different time schedule from another library just a hallway apart. That presented a lot of challenges in terms of having a sound database design.

The old design, from 1992 to 2004, was a single database in which most tables had references to common tables called “libraries” and “organizations”. That was an attempt to partition content by library or organization (a school). Scaling wasn’t a concern at the time, as even the largest library in the country wouldn’t consume more than a few gigs of hard disk space. The challenge was that every query inside the system had to filter everything by library or school. As the features of our library system grew in number and became more advanced and complex, it was apparent that the old design, though very simple when contained in a single database, would soon burst into a problem. Coincidentally, I had been contemplating advancing the product in terms of feature set. Flexibility was my number one motivator; second was the possibility of doing it all over the web.

Then came the ultimate question: should I retain the design and improve on it, or should I be more daring and ambitious? I scoured the Internet for guidance on a sound design and, after a thorough assessment of current and possible future challenges, including scaling, I decided to break things apart and abandon the single-database mindset. The old design went into my garbage bin. Consequently, that was the beginning of my love of sharding databases to address issues of library organization, manageability and control and, to some extent, scalability.

The immediate question was how I was going to do the sharding. Picking the old schema out of the garbage bin, it was pretty obvious that breaking it apart by libraries was the most logical choice. I hadn't heard of the concept of a "tenant" then, but I didn't have to, as the logic behind choosing one is as ancient as it can be. There were other potential candidates for keys to shard the database on, like "school" or "organization", but the most logical was the "library". It is the only entity that can stand drubbing and scrubbing. I went on to design our library system with each database containing only one tenant, the library. As of this writing, our library system runs in various configurations: one school has several libraries inside its campus; another has several campuses scattered all over Metro Manila, some campuses having one or more libraries, but everything sits on a VPN accessing a single server.

Our design, though, is yet to become fully sharded at all levels, as another system acts as a common control for all the databases. This violates the concept of a truly sharded design, where there should be no shared entity among shards. A few tweaks here and there, though, would bring it into full compliance with the concept. Our current design is 100% sharded at the library level.

So Why Sharding?

The advent of computing in the cloud presents new opportunities, especially for ISVs. With platforms like Azure, we will be forced to rethink our design patterns. The most challenging part, perhaps, is designing not only to address concerns of scalability, but to make our applications tenant-aware and tenant-ready. This challenge is not unique to the cloud; a lot of on-premise applications can be designed this way, which could help in everyone's eventual transition to the cloud. But cloud or not, we can benefit a lot from sharding. In our case, we can support pretty much any configuration out there. We also get to mimic the real-world operation of libraries. And it eases up a lot of things like security and control.

Developing Sharded Applications

Aside from the databases themselves, the application needs to be fully aware that it is no longer accessing a single database where it can easily query everything without minding that other data exists somewhere else, possibly in a different database sitting on another server. Though the application becomes a bit more complex in design, it is often easier to comprehend and develop when an app instance minds only a single tenant, as opposed to an app instance trying to filter out other tenants just to get the information of one tenant.
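To make this concrete, here is a minimal sketch in Python of the per-tenant routing idea described above. The library IDs and connection strings are made up for illustration; the point is that an app instance resolves its single tenant (a library) to that tenant's own database and never filters across tenants:

```python
# Hypothetical shard catalog mapping each tenant (library) to its own database.
# In a sharded-by-library design, an app instance serves exactly one library,
# so every query runs against that library's dedicated database.
SHARD_CATALOG = {
    "lib-main": "Server=srv1;Database=Library_Main;",
    "lib-annex": "Server=srv1;Database=Library_Annex;",
    "lib-campus2": "Server=srv2;Database=Library_Campus2;",
}

def connection_string_for(library_id):
    """Resolve a tenant (library) to its dedicated shard's connection string."""
    try:
        return SHARD_CATALOG[library_id]
    except KeyError:
        raise LookupError("Unknown library/tenant: %s" % library_id)

print(connection_string_for("lib-annex"))
```

Adding a new library then means provisioning a new database and adding one catalog entry; no existing tenant's queries change.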

Our Library System

Currently our library system runs in sharded mode both on-premise and in cloud-like hosted environments. You might want to try its online search:

Looking at SQL Azure

Sharding isn't automatic on any Microsoft SQL Server platform, including SQL Azure. One needs to do it by hand and from the ground up. This might change in the future, though; I am quite sure Microsoft will see this as a compelling feature. SQL Azure is the only Azure-based product that currently has no natural/inherent support for scaling out. If I were Microsoft, I would offer a full SQL Server service alongside SQL Azure, as shared Windows hosting sites do, to ease adoption. Our system's database design readiness (being already sharded) would allow us to easily embrace such a service. I understand it could affect, possibly dampen, their SQL Azure efforts, but I would rather they reconsider than offer a very anemic product.

As of now, though we would have wanted to take our library system to Azure with a few minor tweaks, we just can't with the current version of SQL Azure, for the reasons stated below:

  • SQL Server Full Text Search. SQL Azure does not support this in its current version.
  • Database Backup and Restore. SQL Azure does not support this in its current version.
  • Reporting Services. SQL Azure does not support this in its current version.
  • $109.95 a month on Azure/SQL Azure versus $10 a month shared host with a full-featured IIS7/SQL Server 2008 DB

My Design Paid Off

Now I am quite happy that the potential of a multi-tenant, sharded database design, though ancient, is beginning to get attention with the advent of cloud computing. My 2004 database design is definitely laying the groundwork for cloud computing adoption. Meanwhile, I have to look for solutions to address what's lacking in SQL Azure. There could be some easy workarounds.

I'll find time to cover the technical aspects of sharding databases in future blogs. I am also thinking that PHISSUG should hold one of these sharding tech-sessions.

Toto is a database specialist, software designer, and Microsoft MVP.

The MSDN Library described The New Silverlight based SQL Azure Portal in a 12/2010 article:

There are currently two SQL Azure Portals which provide the user interface for provisioning servers and logins, and quickly creating databases. These portals will coexist until early 2011, when the old SQL Azure Portal is scheduled for decommissioning.

The new Silverlight based SQL Azure Portal is integrated with the new Windows Azure Platform Management Portal. It provides all the features included with the old SQL Azure Portal. You can access this new portal at New SQL Azure Portal.

The new portal provides access to a web-based database management tool for existing SQL Azure databases. The management tool supports basic database management tasks like designing and editing tables, views, and stored procedures, and authoring and executing Transact-SQL queries. For more information, see the MSDN documentation for Database Manager for SQL Azure. The database management tool is available when you select a database to manage in the new SQL Azure Portal and click the Manage button on the toolbar. See screenshot below.

SQL Azure Portal Screenshot

The Old SQL Azure Portal

The older SQL Azure Portal is still available at Old SQL Azure Portal. This portal is scheduled to be decommissioned around the first half of 2011.

posted SQL Azure Primer (part 4) - Creating Logins and Users on 12/6/2010 (missed when posted):

5. Creating Logins and Users

With SQL Azure, the process of creating logins and users is mostly identical to that in SQL Server, although certain limitations apply. To create a new login, you must be connected to the master database. When you're connected, you create a login using the CREATE LOGIN command. Then you need to create a user account in the user database and assign access rights to that account.

5.1. Creating a New Login

Connect to the master database using the administrator account (or any account with the loginmanager role granted), and run the following command (substituting a strong password of your own):

CREATE LOGIN test WITH PASSWORD = '<strongPasswordHere>'
At this point, you should have a new login available called test. However, you can't log in until a user has been created. To verify that your login has been created, run the following command, for which the output is shown in Figure 11:

select * from sys.sql_logins

Figure 11. Viewing a SQL login from the master database

If you attempt to create the login account in a user database, you receive the error shown in Figure 12. The login must be created in the master database.

Figure 12. Error when creating a login in a user database

If your password isn't complex enough, you receive an error message similar to the one shown in Figure 13. Password complexity can't be turned off.

Figure 13. Error when your password isn't complex enough


Selecting a strong password is critical when you're running in a cloud environment, even if your database is used for development or test purposes. Strong passwords and firewall rules are important security defenses against attacks to your database.

5.2. Creating a New User

You can now create a user account for your test login. To do so, connect to a user database using the administrator account (you can also create a user in the master database if this login should be able to connect to it), and run the following command:

CREATE USER test FROM LOGIN test
If you attempt to create a user without first creating the login account, you receive a message similar to the one shown in Figure 14.

Figure 14. Error when creating a user without creating the login account first

6. Assigning Access Rights

So far, you've created the login account in the master database and the user account in the user database. But this user account hasn't been assigned any access rights.

To allow the test account to have unlimited access to the selected user database, you need to assign the user to the db_owner group:

EXEC sp_addrolemember 'db_owner', 'test'

At this point, you're ready to use the test account to create tables, views, stored procedures, and more.


In SQL Server, user accounts are automatically assigned to the public role. However, in SQL Azure the public role can't be assigned to user accounts for enhanced security. As a result, specific access rights must be granted in order to use a user account.

7. Understanding Billing for SQL Azure

SQL Azure is a pay-as-you-go model, which includes a monthly fee based on the cumulative number and size of your databases available daily, and a usage fee based on actual bandwidth usage. However, as of this writing, when the consuming application of a SQL Azure database is deployed as a Windows Azure application or service, and it belongs to the same geographic region as the database, the bandwidth fee is waived.

To view your current bandwidth consumption and the databases you've provisioned from a billing standpoint, you can run the following commands:

SELECT * FROM sys.database_usage        -- databases defined
SELECT * FROM sys.bandwidth_usage -- bandwidth

The first statement returns the number of databases available per day of a specific type: Web or Business edition. This information is used to calculate your monthly fee. The second statement shows a breakdown of hourly consumption per database.

Figure 15 shows a sample output of the statement returning bandwidth consumption. This statement returns the following information:

  • time. The hour for which the bandwidth applies. In this case, you're looking at a summary between the hours of 1 AM and 2 AM on January 22, 2010.

  • database_name. The database for which the summary is available.

  • direction. The direction of data movement. Egress shows outbound data, and Ingress shows inbound data.

  • class. External if the data was transferred from an application external to Windows Azure (from a SQL Server Management Studio application, for example). If the data was transferred from Windows Azure, this column contains Internal.

  • time_period. The time window in which the data was transferred.

  • quantity. The amount of data transferred, in kilobytes (KB).
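As a sketch of how such rows might be post-processed, here is a small Python example. The rows and the per-GB rate are made up; it totals the external egress rows as one example of slicing the data, since (as noted above) same-region traffic between Windows Azure and SQL Azure is not billed:

```python
# Made-up rows shaped like the sys.bandwidth_usage output described above:
# (database_name, direction, class, quantity_in_kb)
rows = [
    ("MyDB", "Egress",  "External", 2048),
    ("MyDB", "Ingress", "External", 512),
    ("MyDB", "Egress",  "Internal", 4096),  # internal traffic: not billed
]

def billable_egress_kb(rows):
    """Sum only external egress; internal (same data center) traffic is free."""
    return sum(kb for (_db, direction, klass, kb) in rows
               if direction == "Egress" and klass == "External")

RATE_PER_GB = 0.15  # hypothetical rate; check current pricing
kb = billable_egress_kb(rows)
print(kb, kb / (1024.0 * 1024.0) * RATE_PER_GB)
```

The same shape of aggregation applies to ingress or to per-database breakdowns; only the filter changes.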

Visit for up-to-date pricing information.

Figure 15. Hourly bandwidth consumption

Following are links to earlier members of this series:

Julie Lerman (@julielerman) posted the slides from her Working With Sql Azure from Entity Framework On-Premises presentation to the CodeProject’s Virtual Tech Summit of 12/2/2010 (missed when posted):

Be sure to check out the related presentations.

<Return to section navigation list> 

Marketplace DataMarket and OData

• Kevin Ritchie (@KevRitchie) published Day 12 - Windows Azure Platform – Marketplace on 12/12/2010:

On the 12th day of Windows Azure Platform Christmas my true love gave to me the Windows Azure Marketplace.

What is the Windows Azure Marketplace?

You've probably heard of the Apple App Store; if not, it's a central point for application developers to market and sell their applications. The Windows Azure Marketplace is Microsoft's way of allowing Windows Azure application developers to do the same thing, but with one fundamental difference: the Marketplace has an additional market, the DataMarket.

The DataMarket is a place that allows leading commercial data providers and authoritative public data sources to make data, such as image files, databases, demographic reports and real-time feeds, readily available (at a price) to business end users and application developers. End users can consume this data using Microsoft Office tools like Excel, or use Business Intelligence tools like PowerPivot or SSRS (SQL Server Reporting Services) to mine the data and present it in a way that fits their business purposes.

Application developers can use the data feeds to create real-time, content-rich applications using a consistent RESTful approach. This means developers can consume the data using tools like Visual Studio; in fact, any development tool that supports HTTP will do.
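Since the feeds are plain HTTP, composing a query is just URL construction. Here is a hedged Python sketch; the service root and entity set are hypothetical, but the $-prefixed query options are standard OData:

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus        # Python 2

def odata_query_url(service_root, entity_set, **options):
    """Compose an OData query URL from $-prefixed options ($filter, $top, ...)."""
    query = "&".join("$%s=%s" % (name, quote_plus(str(value)))
                     for name, value in sorted(options.items()))
    return "%s/%s?%s" % (service_root.rstrip("/"), entity_set, query)

url = odata_query_url(
    "https://api.datamarket.example/Demographics",  # hypothetical feed root
    "CityStats",                                    # hypothetical entity set
    filter="Population gt 1000000",
    top=10,
)
print(url)
```

Any tool that can issue an HTTP GET against such a URL can then consume the Atom or JSON payload that comes back.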

Here’s a tad more info on both Markets

Well, this brings an end to the 12 days of Windows Azure Platform Christmas. For anyone that has been reading and is new to the Windows Azure Platform, I hope this has given you a good insight into what each component of the platform provides and why you should think about using it.

If anything, it has also been a good learning exercise for me.

I have only one thing left to say: Merry Christmas!

Oh and here are some links for more detail on the platform:

• MirekM updated the download of his Silverlight Data Source Control for OData at CodePlex on 12/10/2010 (site registration required):

Project Description
The library contains a Data Source Control for XAML forms. The control extends the proxy class generated by Visual Studio to access OData services and cooperates with other Silverlight GUI controls. The control supports master-detail, paging, filtering, sorting, validation and changing data. You can append your own validation rules. It works like a Data Source Control for RIA Services.

You can try the live demo here.
In detail, there are three basic classes: ODataSource, ODataSourceView and Entity.

  1. ODataSource is a control and you can use it in XAML Forms.
  2. ODataSourceView is a class which contains a data set. ODataSourceView has the following interfaces: IEnumerable, INotifyPropertyChanged, INotifyCollectionChanged, IList, ICollectionView, IPagedCollectionView, IEditableCollectionView.
  3. Entity is a class for one data record and has the following interfaces: IDisposable, INotifyPropertyChanged, IEditableObject, IRevertibleChangeTracking, IChangeTracking.

Here’s a capture of Mirek’s live demo:


• Chris Stein reported OData gaining traction in a 12/10/2010 post to the Simba Technologies blog:

It's been interesting to watch the list of producers of live OData services grow over the last few months. The early adopters (like Netflix, some municipal & federal government agencies, and several Microsoft projects) have been joined by the likes of Facebook, eBay and twitpic. The list is even gaining a sense of humour, as evidenced by the entry for Nerd Dinner: "Nerd Dinner is a website that helps nerds to meet and talk, not surprisingly it has adopted OData." You can check out the complete list here:

As of this writing, the list is up to 33 services with a good mix between large/high-profile entities with wide-ranging application possibilities and small initiatives often focused on just a slice of the tech community itself.  We need both, because it’s the big names with brand recognition that catch people’s attention, but small-scale tinkering by techies can be an excellent incubator from which new & creative ideas will eventually emerge.

OData, the technology, is "the web-based equivalent of ODBC, OLEDB, ADO.NET and JDBC" (Chris Sells, via MSDN). Just as the ODBC specification and its descendants have enabled data access between clients and servers across widely disparate hardware and OS platforms, the OData specification promises data access between "consumers" (applications using standard web technologies) and "producers" (data sources residing on various web server platforms).

OData, looked at as a movement, is creating an open-ended environment with a whole new range of business decisions for companies to deal with.  At the simplest level, the questions for a company choosing to publish an OData service are similar to those you would consider for a web application:

  • Scope:  Will the data be accessible only via a corporate Intranet, or will it be available on the open Internet?
  • Authentication/Security:  Will a user have to login to access the data?  Will some portion of the data be available without logging in?  The Netflix example is a good one to consider here—their movie catalogue is available to the general public, but a user’s movie queue is restricted to authenticated users only.
  • Will the data be Read only or Read-write?
  • What subset of fields from the database will be exposed?
  • What range of data from the database will be exposed?

The big difference, of course, is that you’re not the one creating the application.  You may also create an OData consumer application in addition to the OData producer data source, but the key is that you’re not the only one doing so and there are potentially many applications consuming your data feed.  This moves you up to a higher level of business questioning:

  • How might our data potentially be combined with other data sources (either by us or by others) and what value does that potentially add for our customers?
  • What type of new business relationships are possible with other producers of OData feeds?
  • What is the value to our business of this data (i.e. do we need to charge for access to it)?

If your company makes a product that generates data in some form, adding an ODBC driver opens it up to a wide range of general-purpose reporting tools so your customers can slice & dice the data to their hearts’ content.  The benefits of providing data connectivity to the desktop are proven and clear.  If you’re already doing this, what are your company’s plans for providing data connectivity to the web?  Let us know, we’d love to hear from you.

Simba Technologies was one of the first third-party providers of ODBC drivers for Microsoft data sources. The firm now offers a Simba ODBC Driver for Microsoft HealthVault, which links HealthVault data with Microsoft Excel and Access.

Vincent Phillipe-Lauzon posted Advanced using OData in .NET: WCF Data Services to the CodeProject on 12/10/2010:

Samples showcasing usage of OData in .NET (WCF Data Services); the samples go in increasing order of complexity, addressing more and more advanced scenarios.


In this article I give code samples showcasing usage of OData in .NET (WCF Data Services).

The samples will go in increasing order of complexity, addressing more and more advanced scenarios.


For some background on what OData is, I wrote an introductory article a few months ago: An overview of Open Data Protocol (OData). In that article I showed what the Open Data Protocol is and how to consume it with your browser.

In this article, I'll show how to expose data as OData using the .NET Framework 4.0. The technology in the .NET Framework 4.0 used to expose OData is called WCF Data Services. Prior to .NET 4.0, it was called ADO.NET Data Services, and before that, Astoria, which was Microsoft's internal project name.

Using the code

The code sample contains a Visual Studio 2010 Solution. This solution contains a project for each sample I give here.

The solution also contains a solution folder named "DB Scripts" with three files: SchemaCreation.sql (creates a schema with 3 tables into a pre-existing DB), DataInsert.sql (inserts sample data in the tables) & SchemaDrop.sql (drops the tables & schema if needed).

We use the same DB schema for the entire project:


Hello World: your DB on the Web

For the first sample, we'll do a Hello World: we'll expose our schema on the web. To do this, we need to:

  • Create an Entity Framework model of the schema
  • Create a WCF Data Service
  • Map the Data Context to our entity model
  • Declare permissions

Creating an Entity Framework model isn't the goal of this article. You can consult MSDN pages to obtain details:


Creating a WCF Data Service is also quite trivial: create a new item in the project:


We now have to map the data context to our entity model. This is done by editing the code behind of the service and simply replacing:

public class EmployeeDataService : DataService< /* TODO: put your data source class name here */ >

with:

public class EmployeeDataService : DataService<ODataDemoEntities>

Finally, we have to declare permissions. By default, Data Services are locked down; we can open read and write permissions. In this article, we'll concentrate on the read permissions. We simply replace:

// This method is called only once to initialize service-wide policies.
public static void InitializeService(DataServiceConfiguration config)
{
    // TODO: set rules to indicate which entity sets and service operations are visible, updatable, etc.
    // Examples:
    // config.SetEntitySetAccessRule("MyEntityset", EntitySetRights.AllRead);
    // config.SetServiceOperationAccessRule("MyServiceOperation", ServiceOperationRights.All);
    config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
}

with:

// This method is called only once to initialize service-wide policies.
public static void InitializeService(DataServiceConfiguration config)
{
    config.SetEntitySetAccessRule("Employees", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("Addresses", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("Departments", EntitySetRights.AllRead);

    config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
}

There you go: you can run your project and have your three tables exposed as OData feeds on the web.
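Hitting the service root of such a service returns an AtomPub service document listing the exposed entity sets, which is how generic clients discover them. As a sketch, here is how a client could parse one; the XML below is a trimmed, hand-written stand-in for what a service exposing our three tables would return:

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written stand-in for an AtomPub service document.
SERVICE_DOC = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Default</atom:title>
    <collection href="Employees"><atom:title>Employees</atom:title></collection>
    <collection href="Addresses"><atom:title>Addresses</atom:title></collection>
    <collection href="Departments"><atom:title>Departments</atom:title></collection>
  </workspace>
</service>
"""

def entity_sets(service_document):
    """Return the entity-set names advertised by an OData service document."""
    root = ET.fromstring(service_document)
    ns = {"app": "http://www.w3.org/2007/app"}
    return [c.get("href") for c in root.findall(".//app:collection", ns)]

print(entity_sets(SERVICE_DOC))
```

Appending an entity-set name from this list to the service root URL gives you that set's Atom feed.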

Exposing another Data Model

Unfortunately, the previous example is what 95% of the samples you'll find on the web are about. But WCF Data Services can be so much more.

In this section, we'll build a data model from scratch, one that isn't even connected to a database. What could we connect to... of course! The list of processes running on your box!

In order to do this, we're going to have to:

  • Create a process model class
  • Create a data model exposing the processes
  • Create a WCF Data Service
  • Map the Data Context to our data model
  • Declare permissions

We create the process model class as follows:

/// <summary>Represents the information of a process.</summary>
[DataServiceKey("ID")]
public class ProcessModel
{
    /// <summary>Name of the process.</summary>
    public string Name { get; set; }

    /// <summary>Process ID.</summary>
    public int ID { get; set; }
}

The key takeaways here are:

  • Always use simple types (e.g. integers, strings, dates, etc.): they are the only ones that map to Entity Data Model (EDM) types
  • Always declare the key properties (equivalent to primary key columns in a database table) with the DataServiceKey attribute. The concept of a key is central in OData, since you can single out an entity by using its ID.

If you violate one of those two rules, you'll get errors in your web service and little indication of why.

We then create the data model exposing the processes like this:

/// <summary>Represents a data model exposing processes.</summary>
public class DataModel
{
    /// <summary>Default constructor.</summary>
    /// <remarks>Populates the model.</remarks>
    public DataModel()
    {
        var processProjection = from p in Process.GetProcesses()
                                select new ProcessModel
                                {
                                    Name = p.ProcessName,
                                    ID = p.Id
                                };

        Processes = processProjection.AsQueryable();
    }

    /// <summary>Returns the list of running processes.</summary>
    public IQueryable<ProcessModel> Processes { get; private set; }
}

Here we could have exposed more of each process's properties for completeness, but we opted for simplicity.

The keys here are to:

  • Have ONLY properties returning IQueryable of models
  • Populate those collections in the constructor

Here we populate the model list directly, but sometimes (as we'll see in a later section) you can instead populate it with a deferred query, which performs better.

Creation and mapping of the data service and the permission declaration work the same way as in the previous sample. After you've done that, you have an OData endpoint exposing the processes on your computer. You can interrogate this endpoint with any type of client, such as LINQPad.

This example isn't very useful in itself, but it shows that you can expose any type of data as an OData endpoint. This is quite powerful because OData is a rising standard and as you've seen, it's quite easy to expose your data that way.

You could, for instance, have your production servers expose some live data (e.g. number of active sessions) as OData that you could consume at any time.

Exposing a transformation of your DB

Another very useful scenario, somehow a combination of the previous ones, is to expose data from your DB with a transformation. That might be accomplished by performing mappings in the entity model, but sometimes you might not want to expose the entity model directly, or you might not be able to do the mapping in the entity model. For instance, the OData data objects might be out of your control, yet you must use them to expose the data.

In this sample, we'll flatten the employee and its address into one entity at the OData level.

  • Create an Entity Framework model of the schema
  • Create an employee model class
  • Create a department model class
  • Create a data model exposing both model classes
  • Create a WCF Data Service
  • Map the Data Context to our data model
  • Declare permissions

Creation of the Entity Framework model is done the same way as in the Hello World section.

We create the employee model class as follows:

/// <summary>Represents an employee.</summary>
[DataServiceKey("ID")]
public class EmployeeModel
{
    /// <summary>ID of the employee.</summary>
    public int ID { get; set; }

    /// <summary>ID of the associated department.</summary>
    public int DepartmentID { get; set; }

    /// <summary>ID of the address.</summary>
    public int AddressID { get; set; }

    /// <summary>First name of the employee.</summary>
    public string FirstName { get; set; }

    /// <summary>Last name of the employee.</summary>
    public string LastName { get; set; }

    /// <summary>Address street number.</summary>
    public int StreetNumber { get; set; }

    /// <summary>Address street name.</summary>
    public string StreetName { get; set; }
}

We included properties from both the employee and the address, hence flattening the two models. We also renamed the EmployeeID to ID.

We create the department model class as follows:

/// <summary>Represents a department.</summary>
[DataServiceKey("ID")]
public class DepartmentModel
{
    /// <summary>ID of the department.</summary>
    public int ID { get; set; }

    /// <summary>Name of the department.</summary>
    public string Name { get; set; }
}

We create a data model exposing both models as follows:

/// <summary>Represents a data model exposing both the employee and the department.</summary>
public class DataModel
{
    // Keep the context alive for the lifetime of the model: the queries below
    // are deferred, so disposing the context in the constructor would break them.
    private readonly ODataDemoEntities dbContext = new ODataDemoEntities();

    /// <summary>Default constructor.</summary>
    /// <remarks>Populates the model.</remarks>
    public DataModel()
    {
        Departments = from d in dbContext.Department
                      select new DepartmentModel
                      {
                          ID = d.DepartmentID,
                          Name = d.DepartmentName
                      };

        Employees = from e in dbContext.Employee
                    select new EmployeeModel
                    {
                        ID = e.EmployeeID,
                        DepartmentID = e.DepartmentID,
                        AddressID = e.AddressID,
                        FirstName = e.FirstName,
                        LastName = e.LastName,
                        StreetNumber = e.Address.StreetNumber,
                        StreetName = e.Address.StreetName
                    };
    }

    /// <summary>Returns the list of employees.</summary>
    public IQueryable<EmployeeModel> Employees { get; private set; }

    /// <summary>Returns the list of departments.</summary>
    public IQueryable<DepartmentModel> Departments { get; private set; }
}

We basically do the mapping when we populate the employee query. Here, as opposed to the previous example, we don't physically populate the employees; we define a deferred query to fetch them, and that query simply maps the information.

We then create a WCF data service, map the data context to the data model and declare permissions as follows:

public class EmployeeDataService : DataService<DataModel>
{
    // This method is called only once to initialize service-wide policies.
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("Employees", EntitySetRights.AllRead);
        config.SetEntitySetAccessRule("Departments", EntitySetRights.AllRead);

        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}

This scenario is quite powerful. Basically, you can mix data from the database and from other sources, transform it, and expose it as OData while still benefiting from the query power of your model (e.g. the database).

Service Operations

Another useful scenario covered by WCF Data Services, the .NET 4.0 implementation of OData, is the ability to expose a service with input parameters, but whose output is treated like other entity sets in OData, that is, queryable in every way.

This is very powerful since the expressiveness of queries in OData is much lower than in LINQ, i.e. there are a lot of queries in LINQ you just can't express in OData. This is quite understandable, since queries are packed in a URL. Service operations fill that gap by allowing you to take parameters in, perform a complicated LINQ query and return the result as a queryable entity set.

Why would you query an operation that is itself the result of a query? Well, for one thing, you might want to page on the result, using take & skip. It might also be that the result still represents a mass of data and you're interested in only a fraction of it. For instance, you could have a service operation returning the individuals in a company with less than a given amount of sick leave; for a big company, that is still a lot of data!
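The paging itself is just arithmetic on OData's $skip and $top options. Here is a small Python sketch; the service root and operation name come from this article's sample project, while the page size is arbitrary:

```python
def paged_url(base_url, page, page_size):
    """Build the OData URL for one zero-based page of a result set."""
    sep = "&" if "?" in base_url else "?"
    return "%s%s$skip=%d&$top=%d" % (base_url, sep, page * page_size, page_size)

# The service operation from this article, with its own parameter plus paging:
op = "http://localhost:9793/EmployeeDataService.svc/GetDepartmentByMembership?employeeCount=2"
for page in range(3):
    print(paged_url(op, page, 25))
```

A client keeps requesting successive pages until one comes back with fewer than page_size entries.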

In this sample, we'll expose a service operation taking a number of employees in input and returning the departments with at least that amount of employees.

  • Create an Entity Framework model of the DB schema
  • Create a WCF Data Service
  • Map the Data Context to our entity model
  • Add a service operation
  • Declare permissions

The first three steps are identical to those in the Hello World sample.

We define a service operation within the data service as follows:

[WebGet]  // exposes the operation over HTTP GET
public IQueryable<Department> GetDepartmentByMembership(int employeeCount)
{
    var departments = from d in this.CurrentDataSource.Department
                      where d.Employee.Count >= employeeCount
                      select d;

    return departments;
}

We then add the security as follows:

// This method is called only once to initialize service-wide policies.
public static void InitializeService(DataServiceConfiguration config)
{
    config.SetEntitySetAccessRule("Employee", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("Address", EntitySetRights.AllRead);
    config.SetEntitySetAccessRule("Department", EntitySetRights.AllRead);

    config.SetServiceOperationAccessRule("GetDepartmentByMembership", ServiceOperationRights.AllRead);

    config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
}

Notice that we needed to enable read access on the service operation as well.

We can then hit the service operation with a URL such as:


You can read more about service operations on this MSDN article.

Consuming OData using .NET Framework

In order to consume OData, you can simply hit the URLs with an HTTP GET using whatever web library is at your disposal. In .NET, you can do better than that.

You can simply add a reference to your OData service and Visual Studio will generate proxies for you. This is quick and dirty and works very well.

In cases where you've defined your own data model (as in our DB transformation sample), you might want to share those models as data contracts between the server and the client. You would then have to define the proxy yourself, which isn't really hard:

/// <summary>Proxy to our data model exposed as OData.</summary>
public class DataModelProxy : DataServiceContext
{
    /// <summary>Constructor taking the service root in parameter.</summary>
    /// <param name="serviceRoot"></param>
    public DataModelProxy(Uri serviceRoot)
        : base(serviceRoot)
    {
    }

    /// <summary>Returns the list of employees.</summary>
    public IQueryable<EmployeeModel> Employees
    {
        get { return CreateQuery<EmployeeModel>("Employees"); }
    }

    /// <summary>Returns the list of departments.</summary>
    public IQueryable<DepartmentModel> Departments
    {
        get { return CreateQuery<DepartmentModel>("Departments"); }
    }
}

Basically, we derive from System.Data.Services.Client.DataServiceContext and define a property for each entity set that creates the corresponding query. We can then use it this way:

static void Main(string[] args)
{
    var proxy = new DataModelProxy(new Uri(@"http://localhost:9793/EmployeeDataService.svc/"));
    var employees = from e in proxy.Employees
                    orderby e.StreetName
                    select e;

    foreach (var e in employees)
    {
        Console.WriteLine("{0}, {1}", e.LastName, e.FirstName);
    }
}

The proxy basically acts as a data context! We treat it as any entity set source and can do queries on it. This is quite powerful since we do not have to translate the queries into URLs ourselves: the platform takes care of it!
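To see what the platform is doing under the hood, here is a rough sketch (Python; the helper name is mine) of the kind of OData URL the LINQ query above gets translated into:

```python
from urllib.parse import urlencode

def odata_query(service_root, entity_set, filter_expr=None, orderby=None, top=None):
    """Compose (roughly) the OData URL the client library generates from a
    LINQ query: each query clause becomes a $-prefixed query option."""
    options = {}
    if filter_expr:
        options["$filter"] = filter_expr
    if orderby:
        options["$orderby"] = orderby
    if top is not None:
        options["$top"] = top
    url = "%s/%s" % (service_root.rstrip("/"), entity_set)
    if options:
        url = url + "?" + urlencode(options, safe="$")
    return url

# The orderby-on-StreetName query from the Main method above:
url = odata_query("http://localhost:9793/EmployeeDataService.svc",
                  "Employees", orderby="StreetName")
```

The real translation is richer (projections, expansions, key lookups), but the principle is the same: LINQ expression trees become query-option strings.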


We have seen different scenarios using WCF Data Services to expose and consume data. We showed that there is no need to limit ourselves to a database or an Entity Framework model. We can also expose service operations to perform queries that would otherwise be impossible through OData, and we've seen an easy way to consume OData on a .NET client.

I hope this opens up the possibilities around OData. We typically see samples where a database is exposed on the web and it looks like Access 1995 all over again. But OData is much more than that: it enables you to expose your data on the web but to present it the way you want and to control its access. It's blazing fast to expose data with OData and you do not need to know the query needs of the client since the protocol takes care of it.


This article, along with any associated source code and files, is licensed under The Microsoft Public License (Ms-PL)

Vincent-Philippe is a Senior Solution Architect working in Montreal (Quebec, Canada).

<Return to section navigation list> 

Windows Azure AppFabric: Access Control and Service Bus

Suren Machiraju described Hybrid Cloud Solutions With Windows Azure AppFabric Middleware in a chapter-length post of 11/29/2010 (missed when posted):


Technical and commercial forces are causing Enterprise Architects to evaluate moving established on-premises applications into the cloud – the Microsoft Windows Azure Platform.

This blog post will demonstrate that there are established application architectural patterns that can get the best of both worlds: applications that continue to live on-premises while interacting with other applications that live in the cloud – the hybrid approach. In many cases, such hybrid architectures are not just a transition point — but a requirement since certain applications or data is required to remain on-premises largely for security, legal, technical and procedural reasons.

Cloud is new, hybrid cloud even newer. There are a bunch of technologies that have just been released or announced so there is no one book or source for authentic information, especially one that compares, contrasts, and ties it all together. This blog, and a few more that will follow, is an attempt to demystify and make sense of it all.

This blog begins with a brief review of two prevalent deployment paradigms and their influence on architectural patterns: On-premises and Cloud. From there, we will delve into a discussion around developing the hybrid architecture.

In authoring this blog posting we take an architect’s perspective and discuss the major building block components that compose this hybrid architecture. We will also match requirements against the capabilities of available and announced Windows Azure and Windows Azure AppFabric technologies. Our discussions will also factor in the usage costs and strategies for keeping these costs in check.

The blog concludes with a survey of interesting and relevant Windows Azure technologies announced at Microsoft PDC 2010 - Professional Developer's Conference during October 2010.

On-Premises Architectural Pattern

On-premises solutions are usually designed to access data (client or browser applications) repositories and services hosted inside of a corporate network. In the graphic below, Web Server and Database Server are within a corporate firewall and the user, via the browser, is authenticated and has access to data.

Figure 1: Typical On-Premises Solution (Source -

The Patterns and Practices Application Architecture Guide 2.0 provides an exhaustive survey of the on-premises Applications. This Architecture Guide is a great reference document and helps you understand the underlying architecture and design principles and patterns for developing successful solutions on the Microsoft application platform and the .NET Framework.

The Microsoft Application Platform is composed of products, infrastructure components, run-time services, and the .NET Framework, as detailed in the following table (source: .NET Application Architecture Guide, 2nd Edition).

Table 1: Microsoft Application Platform (Source -


The on-premises architectural patterns are well documented (above), so in this blog post we will not dive into prescriptive technology selection guidance. However, let’s briefly review some core concepts, since we will reference them while elaborating the hybrid cloud architectural patterns.

Hosting The Application

Commonly* on-premises applications (with the business logic) run on IIS (as an IIS w3wp process) or as a Windows Service Application. The recent release of Windows Server AppFabric makes it easier to build, scale, and manage Web and Composite (WCF, SOAP, REST and Workflow Services) Applications that run on IIS.

* As indicated in Table 1 (above) Microsoft BizTalk Server is capable of hosting on-premises applications; however, for sake of this blog, we will focus on Windows Server AppFabric

Accessing On-Premises Data

Your on-premises applications may access data from the local file system or network shares. They may also utilize databases hosted in Microsoft SQL Server or other relational and non-relational data sources. In addition, your applications hosted in IIS may well be leveraging Windows Server AppFabric Cache for your session state, as well as other forms of reference or resource data.

Securing Access

Authenticated access to these data stores is traditionally performed by inspecting certificates, user name and password values, or NTLM/Kerberos credentials. These credentials are defined either in the data sources themselves (such as SQL Logins), in local machine accounts, or in directory services (LDAP) such as Microsoft Windows Server Active Directory. They are generally verifiable within the same network, but typically not outside of it unless you are using Active Directory Federation Services (ADFS).


Every system exposes a different set of APIs and a different administration console to change its configuration – which obviously adds to the complexity; e.g., Windows Server for configuring network shares, SQL Server Management Studio for managing your SQL Server Databases; and IIS Manager for the Windows Server AppFabric based Services.

Cloud Architectural Pattern

Cloud applications typically access local resources and services, but they can also interact with remote services running on-premises or in the cloud. Cloud applications are usually hosted in a provider-managed runtime environment that provides hosting services, computational services, storage services, queues, management services, and load balancers. In summary, cloud applications consist of two main components: those that execute application code and those that provide data used by the application. To quickly acquaint you with the cloud components, here is a contrast with popular on-premises technologies:

Table 2: Contrast key On-Premises and Cloud Technologies


NOTE: This table is not intended to compare and contrast all Azure technologies. Some of them may not have an equivalent on-premises counterpart.

The graphic below presents an architectural solution for content delivery. While content creation and management are handled by on-premises applications, content storage (Azure Blob Storage) and delivery are handled by the cloud – the Azure Platform infrastructure. In the following sections we will review the components that enable this architecture.

Figure 2: Typical Cloud Solution

Running Applications in Windows Azure

Windows Azure is the underlying operating system for running your cloud services on the Windows Azure AppFabric Platform. The three core services of Windows Azure in brief are as follows:

  1. Compute: The compute or hosting service offers scalable hosting of services on 64-bit Windows Server platform with Hyper-V support. The platform is virtualized and designed to scale dynamically based on demand. The Azure platform runs web roles on Internet Information Server (IIS) and worker roles as Windows Services.

  2. Storage: There are three types of storage supported in Windows Azure: Table Services, Blob Services, and Queue Services. Table Services provide storage capabilities for structured data, whereas Blob Services are designed to store large unstructured files, such as videos, images, and batch files, in the cloud. Table Services are not to be confused with SQL Azure; typically you can store high-volume data in low-cost Azure Storage and use (relatively) expensive SQL Azure to store indexes to this data. Finally, Queue Services are the asynchronous communication channels for connecting services and applications, not only within Windows Azure but also from on-premises applications. Caching Services, currently available via Windows Azure AppFabric LABS, is another strong storage option.

  3. Management: The management service provides automated infrastructure and service management capabilities for Windows Azure cloud services. These capabilities include automatic and transparent provisioning of virtual machines and deploying services in them, as well as configuring switches, access routers, and load balancers.

A detailed review on the above three core services of Windows Azure is available in the subsequent sections.

Application code execution on Windows Azure is facilitated by Web and Worker roles. Typically, you would host Websites (ASP.NET, MVC2, CGI, etc.) in a Web Role and host background or computational processes in a Worker role. The on-premises equivalent for a Worker role is a Windows Service.

This architecture leaves a gray area: where should you host WCF services? The answer is – it depends! Let me elaborate. When building on-premises services, you can host WCF services in a Windows Service (e.g., in BizTalk, WCF receive locations can be hosted within an in-process host instance that runs as a Windows Service), and in Windows Azure you can host WCF services in Worker roles. While you can host a WCF service (REST or SOAP) in either a Worker role or a Web role, you will typically host these services in a Web role; the one exception is when your WCF service specifically needs to communicate via TCP/IP endpoints. Web roles are capable of exposing HTTP and HTTP(S) endpoints, while Worker roles add the ability to expose TCP endpoints.

Storing & Retrieving Cloud Hosted Data

Applications that require storing data in the cloud can leverage a variety of technologies, including SQL Azure, Windows Azure Storage, Azure AppFabric Cache, Virtual Machine (VM) instance local storage, and Azure Drive (XDrive). Determining the right technology is a balancing exercise in managing trade-offs, costs, and capabilities. The following sections attempt to provide prescriptive guidance on the usage scenarios each storage option is best suited for.

The key takeaway: whenever possible, co-locate data and the consuming application. Your data in the cloud, stored in SQL Azure or Windows Azure Storage, should be accessible via the ‘local’ Azure “fabric” to your applications. This has positive performance characteristics and a beneficial cost model when both the application and the data live within the same datacenter. Co-location is enabled via the 'Region' you choose for the Microsoft data center.

SQL Azure

SQL Azure provides an RDBMS in the cloud and functions, for all intents and purposes, like your on-premises version of SQL Server. This means your applications can access data in SQL Azure using the Tabular Data Stream (TDS) protocol; in other words, your application uses the same data access technologies (e.g., ADO.NET, LINQ to SQL, EF) that on-premises applications use to access information on a local SQL Server. The only thing you need to change is the SQL client connection string, pointing it to the server and database hosted by SQL Azure. Of course, I am glossing over some details: the SQL Azure connection string contains other specific settings such as encryption and an explicit username (in the very strict format username@servername) and password – I am sure you get the drift here.

SQL Azure Labs demonstrates an alternative mechanism that allows your application to interact with SQL Azure via OData. OData provides the ability to perform CRUD operations using REST and can return query results formatted as AtomPub or JSON.

Via .NET programming, you can interact at a higher level using a DataServicesContext and LINQ.
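As a reference point, the CRUD-over-REST mapping can be sketched as a simple verb table (Python; the helper is hypothetical, but the verbs are the ones OData v2 uses, including MERGE for partial updates):

```python
# The HTTP verbs OData (v2) uses for CRUD operations.
ODATA_VERBS = {
    "create": "POST",    # insert a new entity into an entity set
    "read":   "GET",     # query entities or service operations
    "update": "MERGE",   # partial update (PUT replaces the whole entity)
    "delete": "DELETE",  # remove an entity
}

def crud_request(operation, entity_url):
    """Return the (verb, url) pair a client would issue for a CRUD operation."""
    return ODATA_VERBS[operation.lower()], entity_url

verb, target = crud_request("read", "https://example/Data.svc/Employees(1)")
```

Whichever HTTP library you use, the request shape is the same; the payload format (AtomPub or JSON) is negotiated via the Accept header.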

Securing Access

When using TDS, access is secured using SQL accounts (i.e., your SQL login and password must be in the connection string); however, there is one additional layer of security you need to be aware of. In most on-premises architectures, your SQL Server database lives behind a firewall, or at most in a DMZ, and is rarely exposed directly. No matter where they are located, these repositories are not directly accessible to applications living outside the corporate boundaries. Even for on-premises solutions, you can use IPSEC to restrict the (range of) machines that can access the SQL Server machine. Ultimately, access to these databases is mediated and controlled by web services (SOAP, WCF).

What happens when your database lives on Azure, in an environment effectively available to the entire big bad world? SQL Azure also provides a firewall that you can configure to restrict access to a set of IP address ranges, which you would configure to include the addresses used by your on-premises applications, as well as any Microsoft Services (data center infrastructure which enable access from Azure hosted services like your Web Roles). In the graphic below you will notice the ‘Firewall Settings’ tab wherein the IP Address Range is specified.

Figure 3: SQL Azure Firewall Settings
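The firewall logic amounts to checking the caller's address against a list of inclusive IP ranges like those in the Firewall Settings tab. A minimal sketch of that check (Python; the range shown is a made-up example, not a real rule, and this is illustrative logic rather than the service's implementation):

```python
from ipaddress import ip_address

def is_allowed(client_ip, rules):
    """Return True when the client address falls inside any configured rule;
    each rule is an inclusive (start_ip, end_ip) range, as in the portal."""
    addr = ip_address(client_ip)
    return any(ip_address(lo) <= addr <= ip_address(hi) for lo, hi in rules)

# A made-up corporate address range:
rules = [("131.107.0.0", "131.107.255.255")]
```

Remember to include the public addresses your on-premises applications actually present to the Internet, which may differ from their LAN addresses because of NAT.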

When using OData, access is secured by configuration. You can enable anonymous access where the anonymous account maps to a SQL Login created on your database, or you can map Access Control Service (ACS) identities to specific SQL Logins, whereby ACS takes care of the authentication. In the OData approach, the Firewall access rules are effectively circumvented because only the OData service itself needs to have access to the database hosted by SQL Azure from within the datacenter environment.


You manage your SQL Azure database the same way you might administer an on-premises SQL Server database, by using SQL Server Management Studio (SSMS). Alternately, and provided you allowed Microsoft Services in the SQL Azure firewall, you can use the Database Manager for SQL Azure (codename “Project Houston”) which provides the environment for performing many of the same tasks you would perform via SSMS.

The Database Manager for SQL Azure is a part of the Windows Azure platform developer portal refresh, and will provide a lightweight, web-based database management and querying capability for SQL Azure databases. This capability allows customers to have a streamlined experience within the web browser without having to download any tools.

Figure 4: Database Manager for SQL Azure (Source - PDC 10)

One of the primary technical differences is that traditional SSMS management requires the use of port 1433, whereas “Project Houston” is web-based and leverages port 80. An awesome demo of the capabilities of the Database Manager is available here.

Typical Usage

SQL Azure is best suited for storing data that is relational, specifically where your applications expect the database server to perform join computations and return only the processed results. SQL Azure is a good choice for scenarios with high transaction throughput (view case studies here), since it has a flat-rate pricing structure based on the size of data stored, and query evaluation is distributed across multiple nodes.

Windows Azure Storage (Blobs, Tables, Queues)

Windows Azure Storage provides storage services for data in the form of metadata-enriched blobs, dictionary-like tables, and simple persistent queues. You access data via HTTP or HTTP(S) by leveraging the StorageClient library, which provides three classes (CloudBlobClient, CloudTableClient, and CloudQueueClient) to access the respective storage services. For example, CloudTableClient provides a DataServices context enabling use of LINQ for querying table data. All data stored by Windows Azure Storage can also be accessed via REST. The AppFabric CAT team plans to provide code samples via this blog site to demonstrate this functionality – stay tuned.

Securing Access

Access to any of the Windows Azure Storage blobs, tables, or queues is secured through symmetric key authentication, whereby an account name and account key pair must be included with each request. In addition, access to Azure blobs can be secured via the mechanism known as a Shared Access Signature.
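In practice, symmetric key authentication means each request carries an HMAC-SHA256 signature computed from the account key over a canonicalized description of the request. A simplified sketch (Python; the real canonical string-to-sign format defined by the storage service, with its header and resource rules, is elided here):

```python
import base64
import hashlib
import hmac

def sign_request(account_key_b64, string_to_sign):
    """HMAC-SHA256 over a canonicalized description of the request, keyed by
    the base64-decoded account key; the result travels with the request in
    the Authorization header. The exact canonical format is simplified away."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")

account_key = base64.b64encode(b"\x00" * 32).decode("ascii")  # dummy key
signature = sign_request(account_key, "GET\n\n/myaccount/mycontainer")
```

The StorageClient library does all of this for you; the sketch just shows why possessing the account key is equivalent to full access to the account.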


Currently there is no single tool for managing Windows Azure Storage. There are quite a few ‘samples’ as well as third-party tools. The graphic below is from the tool – Azure Storage Explorer available for download via Codeplex.

Figure 5: Azure Storage Explorer (Source – Codeplex)

In general, a Windows Azure storage account, which wraps access to all three forms of storage services, is created through the Windows Azure Portal but is effectively managed using the APIs.

Typical Usage

Windows Azure storage provides unique value propositions that make it fit within your architecture for diverse reasons.

Although Blob storage is charged both for the amount of storage used and for the number of storage transactions, the pricing model is designed to scale transparently to any size. This makes it best suited to storing files and larger objects (high-resolution videos, radiology scans, etc.) that can be cached or are otherwise not frequently accessed.

Table storage is billed the same way as Blob storage, but its unique approach to indexing at large scale makes it useful when you need to efficiently access data from datasets larger than SQL Azure’s 50 GB per-database maximum, and when you don’t expect to perform joins or are comfortable having the client download a larger chunk of data and perform the joins and computations client-side.

Queues are best suited for storing pointers to work that needs to be done, due to their limited message size (a maximum of 8 KB per message), in a manner that ensures ordered access. Often you will use queues to drive the work performed by your Worker roles, but they can also drive work performed by your on-premises services. Bottom line: queues can be used effectively by a Web role to exchange control messages asynchronously with a Worker role running within the same application, or to exchange messages with on-premises applications.
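The pointer-to-work pattern can be sketched as follows (Python; an in-memory stand-in for a real Azure queue, with the 8 KB message cap from above):

```python
from collections import deque

class InMemoryQueue:
    """In-memory stand-in for an Azure queue: FIFO access to small,
    opaque messages (capped at 8 KB per message at the time of writing)."""
    MAX_MESSAGE_BYTES = 8 * 1024

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        if len(message.encode("utf-8")) > self.MAX_MESSAGE_BYTES:
            # Too big for a queue message: store the payload in a blob
            # and enqueue a pointer to it instead.
            raise ValueError("message exceeds queue limit")
        self._messages.append(message)

    def get(self):
        return self._messages.popleft() if self._messages else None

work = InMemoryQueue()
work.put("blob://media/video-123.mp4")  # a pointer to the work, not the payload
next_item = work.get()
```

A Web role enqueues pointers like this, and one or more Worker roles (or on-premises services polling over REST) drain them, fetching the referenced blob when they actually process the item.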

Azure AppFabric Cache

The Azure AppFabric Cache, currently in CTP/Beta as of November 2010, should give you (technology is pre-release and hence the caveat!) the same high-performance, in-memory, distributed cache available with Windows Server AppFabric, as a hosted service. Since this is an Azure AppFabric Service, you are not responsible for managing servers participating in the data cache cluster. Your Windows Azure applications can access the cached data through the client libraries in the Windows Azure AppFabric SDK.

Securing Access

Access to a cache is secured by a combination of generated authentication tokens and authorization rules (e.g., Read/Write versus Read-only) as defined in the Access Control Service.

Figure 6: Access Control for Cache


At this time, creation of a cache, as well as securing access to it, is performed via the Windows Azure AppFabric LABS portal. Subsequent to its release, this will be available in the commercial Azure AppFabric portal.

Typical Usage

Clearly, the cache is best suited for keeping data closer to your application, whether that data is stored in SQL Azure, Windows Azure Storage, the result of a call to another service, or a combination of all of them. Generally, this approach is called the cache-aside model, whereby requests for data made by your application first check the cache and query the actual data source only if the data is not present, subsequently adding it to the cache. Typically we see the cache used in the following scenarios:

  • A scalable session store for your web applications running on ASP.NET.

  • Output cache for your web application.

  • Objects created from resultsets from SQL or costly web service calls, stored in cache and called from your web or worker role.

  • Scratch pad in the cloud for your applications to use.
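The cache-aside model described above can be sketched in a few lines (Python; the names are illustrative, not the AppFabric Cache client API):

```python
def cache_aside_get(key, cache, load_from_source):
    """Cache-aside read: serve from the cache when possible; on a miss,
    query the backing store and populate the cache for the next caller."""
    value = cache.get(key)
    if value is None:
        value = load_from_source(key)
        cache[key] = value
    return value

source_calls = []

def load(key):
    source_calls.append(key)  # stands in for SQL Azure or a costly web service
    return key.upper()

cache = {}
first = cache_aside_get("dept:42", cache, load)   # miss: hits the source
second = cache_aside_get("dept:42", cache, load)  # hit: served from cache
```

A production version would also set an expiry on each entry so the cache does not serve stale data indefinitely.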

Using VM local storage & Azure Drives

When you create an instance in Windows Azure, you are creating an instance of a virtual machine (VM). A VM, just like its on-premises counterpart, can make use of locally attached drives for storage, or of network shares. Windows Azure Drives are similar to, but not exactly the same as, network shares. They are NTFS-formatted drives stored as page blobs in Windows Azure Storage and accessed from your VM instance by drive letter. These drives can be mounted exclusively to a single VM instance as read/write, or to multiple VM instances as read-only. When such a drive is mounted, data read is cached in local VM storage, which enhances the read performance of subsequent accesses to the same data.

Securing Access

An Azure Cloud Drive is really just a façade on top of a page blob, so access to the drive effectively amounts to having access to the blob store via Windows Azure Storage account credentials (e.g., account name and account key).


As with blobs, there is limited tooling for managing Azure drives outside of the StorageClient APIs. In fact, there is no portal for creating Azure Cloud drives. However, there are samples on CodePlex that help you with creating and managing Azure drives.

The Windows Azure MMC snap-in can be downloaded from here – the graphic below provides a quick peek at the familiar look and feel of the MMC.

Figure 7: Windows Azure MMC Snap-In (Source - MSDN)

Typical Usage

Cloud drives have a couple of unique attributes that make them interesting choices in a hybrid architecture. To begin with, if you need drive-letter-based access to your data from your Windows Azure application and you want the data to survive VM instance failure, a cloud drive is your best option. Beyond that, you can mount a cloud drive to instances as read-only, which enables sharing of large reference files across multiple Windows Azure roles. What’s more, you can create a VHD (for example, using Windows 7’s Disk Management MMC) and load it to blob storage for use by your VM instances, effectively cloning an on-premises drive and extending its use into the cloud.

Hybrid Architectural Pattern: Exposing On-premises Services To Azure Hosted Services

Hybrid is very different: unlike a ‘pure’ cloud solution, hybrid solutions keep significant business processes, data stores, and services as on-premises applications, often due to compliance or deployment restrictions. A hybrid solution is one in which some applications of the solution are deployed in the cloud while others remain deployed on-premises.

For the purposes of this article, and as illustrated in the diagram, we will focus on the scenario where your on-premises applications are services hosted in Windows Server AppFabric that communicate with other portions of your hybrid solution running in the cloud.

Figure 8: Typical Hybrid Cloud Solution

Traditionally, to connect your on-premises applications to off-premises applications (cloud or otherwise), you would “poke” a hole in your firewall and configure NAT routing so that Internet clients can talk to your services directly. This approach has numerous issues and limitations, not least the management overhead, security concerns, and configuration challenges.


So the big question here is: How do I get my on-premises services to talk to my Azure hosted services?

There are two approaches you can take: You can use the Azure AppFabric Service Bus, or you can use Windows Azure Connect.

Using Azure AppFabric Service Bus

If your on-premises solution includes WCF services, WCF Workflow services, SOAP services, or REST services that communicate via HTTP(S) or TCP, you can use the Service Bus to create an externally accessible endpoint in the cloud through which your services can be reached. Clients of your solution, whether they are other Windows Azure hosted services or Internet clients, simply communicate with that endpoint, and the AppFabric Service Bus takes care of relaying traffic securely to your service and returning replies to the client.

The graphic below demonstrates the Service Bus functionality.

Figure 9: Windows Azure AppFabric Service Bus (Source -

The key value proposition of leveraging the Service Bus is that it is designed to transparently communicate across firewalls, NAT gateways, or other challenging network boundaries that exist between the client and the on-premises service. You get the following additional benefits:

  • The actual endpoint address of your services is never made available to clients.

  • You can move your services around, because clients are bound only to the Service Bus endpoint address, which is a virtual rather than a physical address.

  • If both the client and the service happen to be on the same LAN and could therefore communicate directly, the Service Bus can set them up with a direct link that removes the hop out to the cloud and back, thereby improving throughput and latency.

Securing Access to Service Bus Endpoints

Access to the Service Bus is controlled via the Access Control Service. Applications that use the Windows Azure AppFabric Service Bus are required to perform security tasks for configuration/registration or for invoking service functionality. Security tasks include both authentication and authorization using tokens from the Windows Azure AppFabric Access Control service. When permission to interact with the service has been granted by the AppFabric Service Bus, the service has its own security considerations that are associated with the authentication, authorization, encryption, and signatures required by the message exchange itself. This second set of security issues has nothing to do with the functionality of the AppFabric Service Bus; it is purely a consideration of the service and its clients.

There are four kinds of authentication currently available to secure access to Service Bus:

  • SharedSecret, a slightly more complex but easy-to-use form of username/password authentication.

  • SAML, which can be used to interact with SAML 2.0 authentication systems.

  • SimpleWebToken, which uses the OAuth Web Resource Authorization Protocol (WRAP) and Simple Web Tokens (SWT).

  • Unauthenticated, which enables interaction with the service endpoint without any authentication behavior.

Selection of the authentication mode is generally dictated by the application connecting to the Service Bus. You can read more on this topic here.

Cost considerations

Service Bus usage is charged by the number of concurrent connections to the Service Bus endpoint. When an on-premises or cloud-hosted service registers with the Service Bus and opens its listening endpoint, that counts as one connection; when a client subsequently connects to that Service Bus endpoint, it counts as a second connection. One very important point falls out of this that may affect your architecture: you will want to minimize concurrent connections to the Service Bus in order to keep subscription costs down. You will more likely want to use the Service Bus in the middle tier, more like a VPN to on-premises services, rather than allowing unlimited clients to connect through the Service Bus to your on-premises service.

To reiterate, the key value proposition of the Service Bus is Service Orientation; it makes it possible to expose application Services using interoperable protocols with value-added virtualization, discoverability, and security.

Using Windows Azure Connect

Recently announced at PDC 10, and expected for release by the end of 2010, Windows Azure Connect is an alternative means of connecting your cloud services to your on-premises services. Windows Azure Connect effectively offers IP-level, secure, VPN-like connections from your Windows Azure hosted roles to your on-premises services. The service is not yet available, and pricing details have not yet been released. From the information available to date, you can conclude that Windows Azure Connect will be used for connecting middle-tier services, rather than public clients, to your on-premises solutions. While the Service Bus is focused on exposing on-premises services as Azure endpoints without having to deal with firewall and NAT setup, Windows Azure Connect provides broad connectivity between your Web/Worker roles and on-premises systems such as SQL Server, Active Directory, or LOB applications.

Securing the solution

Given the distributed nature of a hybrid (on-premises and cloud) solution, your approach to security should match it: your architecture should leverage federated identity. This essentially means that you are outsourcing authentication, and possibly authorization.

If you want to flow your authenticated on-premises identities, such as domain credentials, into Azure hosted web sites or services, you will need an Identity Provider Security Token Service (IP-STS) such as Active Directory Federation Services 2.0 (ADFS 2.0). Your services, whether on-premises or in the cloud, can then be configured to trust credentials, in the form of claims, issued by the IP-STS. Think of the IP-STS simply as the component that can tell whether username and password credentials are valid. In this approach, clients authenticate against the IP-STS (for example, by sending their Windows credentials to ADFS) and, if valid, receive claims they can subsequently present to your websites or services for access. Your websites or services only have to evaluate these claims when executing authorization logic; Windows Identity Foundation (WIF) provides these facilities.

For additional information, review this session - SIA305 Windows Identity Foundation and Windows Azure for Developers. In this session you can learn how Windows Identity Foundation can be used to secure your Web Roles hosted in Windows Azure, how you can take advantage of existing on-premises identities, and how to make the most of the features such as certificate management and staged environments.

In the next release (ACS v2), a similar approach can be taken by using the Windows Azure AppFabric Access Control Service (ACS) as your IP-STS, thereby replacing ADFS. With ACS, you can define service identities, which are essentially logins secured by either a username and password or a certificate. The client calls out to ACS first to authenticate, and thereafter presents the claims received from ACS in calls to your service or website.

Finally, more advanced scenarios can be put into place that use ACS as a Relying-Party Security Token Service (RP-STS), acting as the authority for all your apps, both on-premises and in the cloud. ACS is then configured to trust identities issued by other identity providers and convert them to the claims expected by your applications. For example, you can take this approach to enable clients to authenticate using Windows Live ID, Google, and Yahoo, while still working with the same set of claims you built prior to supporting those providers.

Port Bridge, a point-to-point tunneling solution, is ideal when the application is not exposed as a WCF service or doesn’t speak HTTP. It works well with protocols such as SMTP, SNMP, POP, IMAP, RDP, TDS and SSH. You can read more about this here.

Optimizing Bandwidth – Data transfer Costs

One significant issue to reconcile in hybrid solutions is bandwidth; this cost concern is unique to the hybrid pattern.

The amount of data transferred in and out of a datacenter is billable, so an approach to optimizing your bandwidth cost is to keep as much traffic from leaving the datacenter as possible. This translates into architecting designs that transfer most data across the Azure “fabric” within affinity groups and minimize traffic between disparate data centers and externally. An affinity group ensures that your cloud services and storage are hosted together on the Windows Azure infrastructure. Windows Azure roles, Windows Azure storage, and SQL Azure services can be configured at creation time to live within a specific datacenter. By ensuring the Azure hosted components within your hybrid solution that communicate with each other live in the same data center, you effectively eliminate the data transfer costs. 

The second approach to optimizing bandwidth costs is to use caching. This means leveraging Windows Server AppFabric Cache on premise to minimize calls to SQL Azure or Azure Storage. Similarly, it also means utilizing the Azure AppFabric Cache from Azure roles to minimize calls to SQL Azure, Azure Storage, or to on-premises services via Service Bus.

Optimizing Resources

One of the most significant cost-optimization features of Windows Azure roles is their support for dynamic scale-up and scale-down: adding more instances as your load increases and removing them as the load decreases. Load-based dynamic scaling can be accomplished using the Windows Azure management API and SQL Azure storage. While there is no out-of-the-box support for this, there is a fair amount of guidance on how to implement it within your solutions (see the Additional References section for one such example). The typical approach is to configure your Azure Web and Worker roles to log health metrics (e.g., CPU load, memory usage, etc.) to a table in SQL Azure. Another Worker role is then responsible for periodically polling this table; when certain thresholds are reached or exceeded, it increases or decreases the instance count for the monitored service using the Windows Azure management APIs.
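The polling worker described above can be sketched roughly as follows. GetAverageCpu, GetCurrentInstanceCount, and SetInstanceCount are hypothetical helpers (the first would query the metrics table in SQL Azure, the last would call the Service Management API to change the deployment configuration), and the thresholds are illustrative only.

```csharp
using System;
using System.Threading;

public class ScalingWorker
{
    const int MinInstances = 2;   // illustrative floor
    const int MaxInstances = 8;   // illustrative ceiling

    public void Run()
    {
        while (true)
        {
            // Hypothetical helper: average CPU over the last five minutes,
            // read from the health-metrics table in SQL Azure.
            double avgCpu = GetAverageCpu(TimeSpan.FromMinutes(5));
            int current = GetCurrentInstanceCount("WebRole1");

            if (avgCpu > 75 && current < MaxInstances)
                SetInstanceCount("WebRole1", current + 1);   // scale out
            else if (avgCpu < 25 && current > MinInstances)
                SetInstanceCount("WebRole1", current - 1);   // scale in

            Thread.Sleep(TimeSpan.FromMinutes(5));
        }
    }

    // Stubs standing in for the SQL Azure query and the Service Management
    // API call; real implementations are left to the reader.
    double GetAverageCpu(TimeSpan window) { /* query metrics table */ return 0; }
    int GetCurrentInstanceCount(string roleName) { /* query deployment */ return MinInstances; }
    void SetInstanceCount(string roleName, int count) { /* call management API */ }
}
```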

Monitoring & Diagnostics

For on-premises services, you can use Windows Server AppFabric Monitoring, which logs tracking events into your on-premises instance of SQL Server; this data can be viewed both through IIS Manager and by querying the monitoring database.

For Azure roles in production you will leverage Windows Azure Diagnostics, wherein your instances periodically push batches of diagnostic data to Azure storage. For SQL Azure, you can examine query performance by querying the related Dynamic Management Views.

IntelliTrace is very useful for debugging solutions. With IntelliTrace debugging, you can log extensive debugging information for a Role instance while it is running in Windows Azure. If you need to track down a problem, you can then use the IntelliTrace logs to step through your code from within Visual Studio as though it were running in Windows Azure. In effect, IntelliTrace records key code execution and environment data while your service is running, and allows you to replay the recorded data from within Visual Studio. For more information on IntelliTrace, see Debugging with IntelliTrace.


Based on the discussion above, let’s consolidate our findings into this easy-to-read table.

Table 3: In Summary


It’s amply clear that the Hybrid Cloud architectural pattern is ideally suited for connecting on-premises appliances and applications with their cloud counterparts. The hybrid pattern is ideal because it leverages the best of both worlds: traditional on-premises applications and new distributed/multi-tenant cloud applications.

Let’s Make Plans To Build This Out Now!

In the sections above we presented a survey of the technologies available for building hybrid applications; recent announcements at PDC 10 are worth considering before you move forward.

First, you should consider the Composition Model within the Windows Azure Composite Applications Environment. The Composition Model ties together all of your Azure-hosted services (roles, storage, SQL Azure, etc.) as one composite application, enabling you to see the end-to-end picture in a single designer in Visual Studio. The Composition Model is a set of .NET Framework extensions that builds on the familiar Azure Service Model concepts and adds new capabilities for describing the application infrastructure, its components, and the relationships among the components, in order to treat the composite application as a single logical identity.

Second, if you were making calls back to an on-premises instance of Windows Server AppFabric, Windows Workflow is fully supported on Azure as part of the Composition Model, so you may choose to move the relevant workflows directly into the cloud. This is huge, since it enables you to develop once and deploy as needed!

Third, there is a new role in town: the Windows Azure VM Role. It enables third-party and legacy line-of-business applications to participate fully in the hybrid cloud solution (wow, without a major code do-over!).

And last but not least, Windows Azure now provides Remote Desktop functionality, which enables you to connect to a running instance of your application or service to monitor activity and troubleshoot common problems. Remote Desktop access, while critical for the VM Role, is also available to Web and Worker roles. This in turn enables another related new feature: the availability of full IIS, instead of just the hosted web core previously available to Web roles.

Additional References

While web links for additional background information are embedded in the text, the following references are provided as resources to move forward on your journey.

Stay Tuned!

This is just the beginning; our team is currently working on the next blog post, which builds on this topic by taking a scenario-driven approach to building a hybrid Windows Azure application.


Significant contributions from Paolo Salvatori, Valery Mizonov, Keith Bauer and Rama Ramani are acknowledged.

The PDF Version of this document is attached. Attachment: HYBRID CLOUD SOLUTIONS WITH WINDOWS AZURE APPFABRIC MIDDLEWARE.pdf

Kevin Ritchie (@KevRitchie) posted Day 11 - Windows Azure Platform - Azure AppFabric Caching on 12/11/2010:

On the 11th day of Windows Azure Platform Christmas my true love gave to me the Windows Azure AppFabric Caching Service.

What is the Windows Azure AppFabric Caching Service?

The Caching Service is a collection of distributed, in-memory caches that hold frequently accessed data from your Windows Azure application, the key goal being to improve application performance.

So how do you use it?

Well, to be able to use caching you first need to store data; you do this by programmatically inserting the data into the cache using the Caching API. With your frequently accessed data stored in the cache, you use a pattern called cache-aside to access it. This means you first check the cache for data before going to the database to retrieve it; here’s our performance improvement. What if the data isn’t in the cache? Easy: retrieve it from the database and store it in the cache, so the next time your application needs the data, it gets it directly from the cache.
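A minimal cache-aside sketch using the AppFabric Caching API might look like this; the cache key, the Product class, and the LoadProductFromDatabase helper are hypothetical stand-ins.

```csharp
using Microsoft.ApplicationServer.Caching;

[System.Serializable]
public class Product { /* cached objects must be serializable */ }

public class ProductRepository
{
    // GetDefaultCache picks up the cache configured in web.config/app.config
    // (the snippet shown on the provisioned cache page in the LABS portal).
    static readonly DataCache Cache = new DataCacheFactory().GetDefaultCache();

    public Product GetProduct(int id)
    {
        string key = "product:" + id;

        // 1. Check the cache first.
        var product = Cache.Get(key) as Product;
        if (product == null)
        {
            // 2. Cache miss: fall back to the database...
            product = LoadProductFromDatabase(id);   // hypothetical helper

            // 3. ...and populate the cache for the next caller.
            Cache.Put(key, product);
        }
        return product;
    }

    Product LoadProductFromDatabase(int id) { /* data access */ return new Product(); }
}
```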

The great thing about using the Caching Service is that, because it’s in the cloud, your cache is distributed across multiple Windows Azure instances, giving you scalability, availability and fault tolerance, all of which is automatically configured and administered for you. Less work!

Caching is not a new concept, but providing caching through the cloud is, and it’s a welcome addition to the increasingly popular Platform as a Service.

Tomorrow’s installment: The Azure Marketplace, links and stuff

The Windows Azure AppFabric LABS Team reminded developers with the Windows Azure AppFabric CTP December release – announcement and scheduled maintenance that the CTP will be updated on Wednesday, 12/15/2010:

The next update to the Windows Azure AppFabric LABS environment is scheduled for December 15, 2010 (Wednesday).  Users will have NO access to the AppFabric LABS portal and services during the scheduled maintenance down time. 


  • START: December 15, 2010, 10 am PST
  • END:  December 15, 2010, 6 pm PST

Impact Alert:

The AppFabric LABS environment (Service Bus, Access Control, Caching, and portal) will be unavailable during this period. Additional impacts are described below.

Action Required:

  • Existing accounts and Service Namespaces will be available after the services are deployed.
  • However, ACS Identity Providers, Relying Party Applications, Rule Groups, Certificates, Keys, Service Identities and Management Credentials will NOT be persisted and restored after the maintenance. The user will be responsible for both backing up and restoring any ACS entities they care to reuse after the Windows Azure AppFabric LABS December Release.
  • Cache users will see the web.config snippet on their provisioned cache page change automatically as part of this release. We advise Cache customers to apply the new snippet from the provisioned cache page and then redeploy their application.

Thank you for working in LABS and giving us valuable feedback.  Once the update becomes available, we'll post the details via this blog. 

Stay tuned for the upcoming LABS release!

<Return to section navigation list> 

Windows Azure Virtual Network, Connect, and CDN

Sam Vanhoutte (@SamVanhoutte) posted Azure Connect: VPN as a Service, a quick introduction to the CODit blog on 12/9/2010:

I just received a notification that my request to join the beta program for Azure Connect was approved, and that immediately got me started testing it out. Things look very straightforward.

I am using the phrase VPN as a Service for this feature, since it really explains it all.

Virtual Network Configuration of Windows Azure Role

Portal settings
  • After logging in on the Azure Portal, you can click the Virtual Network button in the left corner at the bottom of the screen:
  • After this, it is possible to enable the Virtual Network features for a specific subscription
  • When selecting a subscription, you can get the activation token from the portal by clicking the ‘Get Activation Token’ button.  That allows you to copy the activation token to the clipboard for later use.
Visual Studio project settings
  • In Visual Studio, with SDK 1.3 installed, you can paste the activation token into the properties of an Azure role in the property pages:
  • Now you can deploy the role to the Windows Azure portal.

Adding an ‘on-premise’ server to the Virtual Cloud Network

Installing the Azure Connect Client software agent
  • On the local servers, it is now possible to install the ‘Local endpoint’ by clicking the correct button.
  • This shows a link to download the software on the machine (on premise).  This link is only active for a while.
  • The installation package is very easy to install, by selecting the correct language and clicking Next-Next-Finish.  After the endpoint software is installed, be sure to open the TCP 443 outbound port.
  • As expected, the local endpoint agent runs as a Windows Service:
Adding a local endpoint to an Azure Endpoint group
  • An Azure Endpoint group can be created, by clicking the “Create Group” button in the ribbon of the management portal.
  • This pops up a wizard where you can provide a name for the group and where you can add the local endpoints and Azure roles that should be part of the group.  You can also indicate if the local endpoints are “interconnected” or not.  This specifies if the endpoints can reach each other. 
    (be careful: in some multi-tenant situations, this can be seen as a risk!)
  • I could immediately see my local computer name in the Local Endpoint list; in the Role list, I could only see the role that was configured with the activation token for this Connect group. 
  • Those are the only actions we need to take, and now we have IP/network connectivity between my local machine and my Azure role in the cloud.

Testing the connectivity

Since I had added Remote Desktop connectivity support to my Azure role (see my previous blog post: Exploring the new Azure property pages in Visual Studio), I am now able to connect to my role instance in the cloud.

  • After logging in on my machine, I was immediately able to connect to my local machine, using my machine name.  I had a directory shared on my local machine and I was able to connect to it.
  • For a nice test, I added a nice ‘cloud picture’ on my local share and selected it to be my desktop background in the cloud.  (The picture was taken on top of a mountain in the French Alps, with Mont Blanc in the background, for those wondering.) 
  • A part of my cloud desktop is here:


This was a very simple post, highlighting the way to set up the configuration between a Cloud app and local machines.  It really only took me about 5 minutes to get this thing working, knowing that I had never seen or tested this functionality before (only heard about it).

Some nice scenarios can now be implemented:

  • Making your Azure roles part of your Active Directory
  • Network connectivity between Cloud and Local (including other protocols, like UDP)

Definitely more to follow soon.

<Return to section navigation list> 

Live Windows Azure Apps, APIs, Tools and Test Harnesses

• Microsoft FUSELABS has hosted Montage, a new social networking/aggregation application running on Windows Azure. Here’s the result of a search on my name:


Obviously, Montage had a problem processing my black-and-white avatar:

Here’s a composite capture of my Windows Azure Montage on 12/12/2010:


Montage, which is dynamic and requires no user input after you select your topics and layout, competes with, which delivers static, dated pages.

Brian T. Horowitz reported HP, Intel, Microsoft Technology Enable Less-Invasive Virtual Colonoscopies to eWEEK’s Healthcare IT News blog on 12/10/2010:

With HP's multicore PCs, Microsoft's high-performance computing technology and Intel's Parallel Studio 2011, Massachusetts General has shortened a virtual colonoscopy from 30 minutes to under 4 minutes.

Using technology from HP, Intel and Microsoft, a researcher at Massachusetts General Hospital has developed an algorithm to make colonoscopies less invasive, faster and less expensive.

Hiro Yoshida, director of 3D imaging research in the radiology department at Massachusetts General Hospital and associate professor of radiology at Harvard Medical School, has developed a DTLS (datagram transport layer security) algorithm to run a virtual colonoscopy.

Colonoscopies test for colon cancer, the third-leading cause of cancer death in the United States, according to the American Cancer Society. Colonoscopies are recommended for people over 50 or those with a family history of colorectal cancer.

The algorithm runs on HP multicore systems as well as Microsoft's HPC (high-performance computing) Server 2008 and .NET 4.0 Framework. In addition, it incorporates Vectorform's image viewer and Intel's Parallel Studio 2011 to reduce the time of virtual colonoscopies, or CT colonographies, from 30 minutes to about 3 minutes, Yoshida told eWEEK.

Instead of sending a tube with a camera up from the anus to the cecum to look for polyps and if necessary remove them, imagery makes the process simpler. From the CT scan "we are able to reconstruct the colon and see the inside of a colon surface as if we are doing the colonoscopy exam through a camera," Yoshida said.

"The idea is you want to be able to have the results back while you're in front of the patient if it's at all possible," Steve Aylward, Microsoft's general manager for commercial health and life sciences, told eWEEK.

"The program's goal is to reduce waiting times for patients and medical experts by increasing the performance, reliability, and speed in processing and displaying images," James Reinders, Intel's director of software products, wrote in an e-mail to eWEEK. "By combining HP systems and Microsoft's technologies with the Intel Parallel Studio 2011 developer tool suite, Massachusetts General's radiology application is operating up to 10 times faster in demo trials, dramatically improving operational efficiency and easing the lives of patients."

Massachusetts General sought the advice of Microsoft and Intel to see how virtual colonoscopies could be sped up while avoiding the invasive procedures, chalky laxatives, sedation and higher costs of traditional colonoscopies, according to Aylward.

In the virtual colonoscopy method, patients don't have to take a laxative to clean out their system, according to Yoshida. The images are able to electronically remove the fecal matter so it doesn't affect the test results, he explained.

"Rather than take that chalky laxative and do the preparation the day before, what that does is automatically tag the items inside you and can flag the potential polyps—it's done in a matter of minutes," Aylward said.

If doctors discover a polyp during a virtual colonoscopy, however, patients will then need to undergo the traditional procedure.

With virtual colonoscopies, doctors and researchers wanted to reduce the time of patient preparation and recovery (two days combined).

"The expense of doing the virtual procedures is a fraction of what the traditional optical process is," Aylward said. "We're talking hundreds of dollars as opposed to multiple thousands of dollars to do the traditional procedure."

Various components of Intel Parallel Studio 2011, including Parallel Advisor and Parallel Amplifier, helped make the algorithms run faster, according to Intel's Reinders. 

The software analyzed Massachusetts General's virtual colonoscopy code to see how it could be optimized. Intel PP (Performance Primitives) substituted the standard code for faster versions of the data, Parallel Amplifier increased the speed of the code and Parallel Inspector fixed memory and threading errors. 

"The net effect of these optimization and correctness-checking steps resulted in code that was up to 10 times faster than the original," Reinders said.

Meanwhile, Vectorform built a 3D image viewer running on Microsoft's .NET to allow the images to be viewed on any screen, including mobile devices. Multitouch capabilities in Windows 7 give gastroenterologists flexibility in positioning the 3D images. In addition, by using the Windows Azure platform, the virtual colonoscopy will eventually be performed using cloud computing. [Emphasis added.]

"Hosting this type of service in the cloud allows the hospital or physician to pay per use, not have to pay for the infrastructure onsite and not have to pay for it themselves," Aylward said.

The virtual colonoscopy method is currently in a proof-of-concept phase and may be available for use in the third quarter of 2011, Yoshida said.

Bill Crounse, MD added more background about the new procedure in his 11/30/2010 Fine tuning virtual colonoscopy—a faster, better, less expensive screening test for colon cancer? post to the Microsoft HealthBlog:

I happen to be old enough to have had a screening colonoscopy on two occasions.  As anyone who has had this procedure performed will know, the exam itself isn’t nearly as bad as the preparation for it.  With good sedation, one hardly remembers the procedure.  However, drinking a gallon or so of go-litely bowel prep and then waiting for your gut to evacuate has never been high on this doctor’s list of favorite things to do.  For that reason alone, many people avoid having a screening colonoscopy even though it is a test that can save lives by detecting cancer early.

Avoiding the nasty prep and the invasive (some would say embarrassing) test are reasons why many people are attracted to an alternate test called “virtual” screening colonoscopy.  In this test, a CT scanner replaces the colonoscope, but unfortunately the gut-cleaning prep is still required.  The American Cancer Society has added virtual colonoscopy to its list of recommended screenings and studies have shown that the virtual exam is as reliable as the scope method in finding polyps or cancer (New England Journal of Medicine).

Now, clinicians at the Massachusetts General Hospital are testing a concept that could make virtual colonoscopy faster, less expensive and even easier for the patient.  Using the magic of computers and software, it is possible to digitally remove the normal contents of the gut and make the inside of the gut appear just as if the patient had done the usual bowel evacuation prep.  However, one issue with the virtual “cleaning” has been the amount of time needed to process the CT images so they can be interpreted by clinicians.  The time needed to run the necessary computer algorithms can take up to 60 minutes for each exam, far too long to be practical.  So Dr. Hiro Yoshida, a researcher with Massachusetts General who developed the datagram transport layer security (DTLS) algorithm used to electronically clean virtual colonoscopy images, knew it needed to be done faster.  Using Microsoft’s high performance computing (HPC) platform, Microsoft .Net 4.0, and the Intel Parallel Studio 2011 developer tool suite, researchers were able to reduce the time needed to run the algorithms from 60 minutes to just 3 minutes.  This might very well lead to faster, less expensive virtual exams that are also much more pleasant for patients because no bowel prep is required.

The collaboration also:

  • demonstrates advanced visualization capabilities with 2D to 3D conversion, gesture-based (multi-touch) navigation and computer-aided diagnostics on Windows 7 powered devices, including a Slate device, Windows Phone 7, or desktop PC
  • leverages Intel’s multi-core processor technologies, enabling programmers to make performance and reliability improvements to applications
  • uses a fully parallelized GPU-based volume rendering engine developed by Microsoft Research
  • demonstrates the capability to do high performance CPU and GPU-based computing with Windows HPC and .NET for colon cancer screening

Virtual Colonoscopy courtesy Massachusetts General Hospital

One disadvantage of virtual colonoscopy is that if a polyp or cancer is discovered with the CT scan, a patient must undergo a standard colonoscopy for final diagnosis and treatment.  However, as a screening test the virtual exam is likely to be favored by many patients who would rather avoid the prep and inconvenience associated with traditional colonoscopy.  By working together, computer scientists and clinicians may be finding ways to make virtual colonoscopy even better and more attractive to patients.  If so, many more lives could be saved.

Dr. Crounse is Microsoft’s Senior Director of Worldwide Health  

Wally McClure detailed My challenges in upgrading my Windows Azure App from SDK 1.2 to 1.3 in a 12/10/2010 post:

I finally got around yesterday evening to updating my Windows Azure application from the 1.2 SDK to the 1.3 SDK.  Wow, what a fun, strange trip that was.  Yes Virginia, it is possible to upgrade.  I ran across the following two issues:

  • Don't try to use beta features if you aren't signed up for them.  Ok, this one should have been obvious; unfortunately, it wasn't.  What was my first thought when I started my upgrade? Of course, it was to try out the ExtraSmall VM.  So, I plugged that into my .csdef file..........and my deployment promptly died with no info as to what had happened beyond some cryptic messages.  Finally, I dug through and figured out that the "Windows.Azure.......PassWord.Encryption.........IDonCareWhatYouTriedToDoException" meant that my app did not have the ExtraSmall VM beta enabled on Azure.  I went back to the Small VM size to deploy my app, and the app deployed.  Problem #1 solved.
  • Did I test my app after I deployed it?  Of course not.  It's my test app. There are a few users, but nothing huge.  I logged in this morning, and boom, the app died.  WTF had I done?  WTF was broken due to moving from 1.2 to 1.3?  I was getting this error:

    “SetConfigurationSettingPublisher needs to be called before FromConfigurationSetting can be used”

    I looked at the error, set some break points and ran locally. Boom, it didn't work here either.  Great, what was wrong?  Finally, I found the answer: the change to using full IIS 7 instead of Hosted Web Core had introduced a breaking change in the app.  So, I copied some code from my WebRole.cs to my Global.asax and things worked.  Sweetness.  Here is the code; it will look a little familiar to anyone developing for Azure.

        void Application_Start(object sender, EventArgs e)
        {
            // Code that runs on application startup
            CloudStorageAccount.SetConfigurationSettingPublisher(
                (configName, configSettingPublisher) =>
                {
                    var connectionString = RoleEnvironment.IsAvailable
                        ? RoleEnvironment.GetConfigurationSettingValue(configName)
                        : ConfigurationManager.AppSettings[configName];
                    configSettingPublisher(connectionString);
                });

            // For information on handling configuration changes,
            // see the MSDN topic at
            RoleEnvironment.Changing += RoleEnvironmentChanging;
        }

        private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
        {
            // If a configuration setting is changing,
            if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
            {
                // set e.Cancel to true to restart this role instance
                e.Cancel = true;
            }
        }
Now, my Azure app is back running and chugging along. Hopefully, this helps you out some.

My Strange Behavior of Windows Azure Platform Training Kit with Windows Azure SDK v1.3 under 64-bit Windows 7 provides details of problems upgrading the WAPTK v1.3 demo apps to the Windows Azure SDK v1.3.

Mike Wood reported New PowerShell Cmdlets on Azure in a 12/9/2010 post:

One of the new features released within the last month on Windows Azure was the Remote Desktop capability.  This gives you the ability to connect into one of your running instances via the standard Remote Desktop capability that most operations folks would be used to.  If you want some quick instruction to get this set up for yourself check out the MSDN docs.  If you want to hear about it from Ryan Dunn and Steve Marx, you can watch the recent Channel 9 Cloud Cover episode (the remote desktop feature discussions start at about 27:08 in the video). Pay really close attention to their warnings about making changes to your instances via RDP without updating your deployment so that those changes are included in the next deployment of that service.  That really is the danger of this type of feature in the Azure environment.

Something that Ryan and Steve didn’t talk about is that there are a few new PowerShell cmdlets that are included only on Azure role instances.  These were shown off a bit at PDC during one of the demos.  Ever since then I’ve been waiting to see all the cmdlets that would be made available.  Turns out, there are four:

  • Set-RoleInstanceStatus – Set the status of the instance to either Busy or Ready.  (This is what was shown in the PDC Demo)
  • Get-ConfigurationSetting – Get a value from the ServiceConfiguration file.
  • Get-LocalResource – Get a handle to a local resource that was defined in the service definition file for the service.
  • Get-RoleInstance – Get role instance information for the current instance or another instance in the deployment.

To get started with these cmdlets, after you remote into the instance you’ll first have to load the snap-in that contains them (which is something I originally forgot about, leading me to think they didn’t make it into the release).  Open up PowerShell on the remote instance and type the following:

Get-PSSnapin -Registered


This will list the PowerShell snap-ins that are registered on the machine but aren’t loaded.  This step really isn’t necessary, other than that you’ll need the name of the snap-in before you can load it.  Once you know the name you can skip straight to loading it with:

Add-PSSnapin Microsoft.WindowsAzure.ServiceRuntime

This tells PowerShell to load the snap-in and make its cmdlets available to the PowerShell session.  Now we can get the list of available commands with the following:

Get-Command -PsSnapin Microsoft.WindowsAzure.ServiceRuntime


You can then use PowerShell’s excellent built-in command discovery by typing help {commandName} -full; for example:

Help Get-ConfigurationSetting -full

The Cmdlets

The Set-RoleInstanceStatus cmdlet makes a lot of sense for the main use case of the Remote Desktop feature – troubleshooting issues.  You can remote onto the machine and then use this cmdlet to take the instance out of the load balancer rotation by calling:

Set-RoleInstanceStatus -Busy

This informs the fabric to take the instance out of the load balancer rotation, but it only lasts as long as the PowerShell instance that called it is running (even if the PowerShell process dies unexpectedly! I killed the process just to see how it would react, and the instance came back online), or until you call the cmdlet again with a -Ready argument.  Note that this isn’t instantaneous: there is some time during which the fabric agent communicates the change to the fabric and the load balancers are updated.  But if you run this command you should see the instance go “busy” in the management portal within a minute or so.  I did this with a single-instance deployment, and when I attempted to hit the site I got “Connection failed”, which makes sense.

The Get-ConfigurationSetting cmdlet could be useful in scripts to pull specific settings out of the service configuration file.  Note that you have to ask for a setting that is configured; you can’t use this to get a list of the settings or iterate over them. 

You can use Get-LocalResource to get at the local resources that were defined in the ServiceDefinition file; it exposes the RoleEnvironment.GetLocalResource method to PowerShell.  This is useful for getting the physical path on the server where a local resource is stored, so if you have a local resource set aside for scratch file I/O during your processing you can use this cmdlet to get the physical directory and take a peek at the on-machine local store.  For troubleshooting this makes a lot of sense.

The Get-RoleInstance cmdlet is probably the most powerful cmdlet of the four.  It gives you the ability to take a peek into the deployment by getting access to data for any of the running instances of the deployment.  You can get at data for the roles that are running, number of instances of each role, how they are spread across update and fault domains and the endpoints for each of the instances. 


It will be interesting to see what other cmdlets appear on the Azure boxes in the future.  While I think the main idea of these cmdlets is to help in troubleshooting issues on deployed instances, I could also see a use for them in PowerShell scripts run during instance start-up (though I’ve not tried that… maybe another blog post).

<Return to section navigation list> 

Visual Studio LightSwitch

No significant articles today.

<Return to section navigation list> 

Windows Azure Infrastructure

Avkash Chauhan reported Publishing Azure package may st[i]ck in 95% upload status when using new Azure Portal and Chrome Browser on 12/10/2010 to his MSDN blog:

We have seen a problem with the new Azure portal and the Chrome browser: if you log in to the Azure Portal in Chrome and publish an Azure service package, the upload process always gets stuck at 95% and stays there forever. If you try to resize the window, the browser might crash as well.

This is a known issue, and the Azure Portal team is working hard to fix it. The current workaround is to use IE instead.

<Return to section navigation list> 

Windows Azure Platform Appliance (WAPA), VM Roles, Hyper-V and Private Clouds

Sam Vanhoutte (@SamVanhoutte) explained Creating your own virtual machine on Azure: Introducing VM Role on 12/11/2010 in a post to the CODit blog:

Apparently, I was lucky to be in the first batch of beta participants for the 3 new Azure beta programs:

  • I received the confirmation for the extra small instances rather fast
  • Wednesday evening, I received confirmation for the Azure Connect beta program.  You can read my post on that here: Azure Connect: VPN as a Service, a quick introduction.
  • And later that night, I received confirmation for the VM Role beta program.


This post will demonstrate how to upload a virtual hard disk to the Azure storage and how to configure it to run it as your own instance(s) in the cloud.

Creating the VHD

Microsoft Hyper-V on server, or VMWare workstation on laptop?

I was very happy with Windows Virtual PC.  It was lightweight, free and pretty stable.  It allowed us, developers, to use it on our laptops on the road, at our customers’ sites and in the office.  But there is one important limitation: Virtual PC does not support a 64-bit guest OS.  And that became a big problem with the introduction of Windows Server 2008 R2; that operating system is only available in 64-bit, which makes sense. 

Because of this, we needed an alternative:

  • Hyper-V: free, stable, Microsoft-product, but not running on Windows 7
  • VMWare Workstation: not free, stable, non-Microsoft product, but running on Windows 7

So, I decided to use VMWare for the creation of my virtual machines for Azure. 

To leverage the full capabilities of the VM Roles, it would be better to use Hyper-V though, since that allows the usage of differencing disks, which would be a big timesaver for incremental updates to a virtual machine image.  But for now, I’ve used VMWare, since I didn’t have a physical Hyper-V server available.

Preparing the Virtual Machine

I took the following steps to create the Virtual machine.

  • Created a virtual machine
  • Installed Windows Server 2008 R2 Enterprise edition on it
  • I enabled the Application Server and Web Role (and therefore also the .NET Framework 3.5.1 features)
  • I installed all updates, but disabled the auto update feature.
Installing the Windows Azure Integration Components
  • The Windows Azure Integration Components are a set of components that are needed to be able to communicate with the Windows Azure Fabric controller.  They need to be installed on the virtual machine, prior to uploading the machine to the Azure storage.
  • Load the iso file that is found in the Azure SDK (iso directory - wavmroleic.iso) as a DVD/IDE in your virtual machine
  • Start the installer on the virtual machine, by double clicking WaIntegrationComponents-x64.msi
  • Follow the installation screen (Next-next-finish)

Please make sure that the firewall windows service is enabled (see ‘Things to know’ at the end of this post)

Sysprepping the Virtual machine

There are some specificities with VM Roles that are different from a typical hosting/IaaS solution.  I discussed some of these in my previous post: Hosting BizTalk Server on Azure VM Role.  One of the important things to know is that the machine name of the Virtual Machine is changed and assigned at runtime by the Azure Fabric controller.  This makes sense, since it is possible to configure multiple instances for a VM Role.

To make the virtual machine ready for these actions, the machine needs to be ‘sysprepped’.  This is a typical action that is taken when creating a base image for a machine.  It provides a different SID and name for the machine.

  • Run the %windir%\system32\sysprep\sysprep.exe tool
  • In the popup window, select “Enter System Out-of-Box Experience (OOBE)” for the System Cleanup Action
  • Make sure that Generalize is selected and select Shutdown in the Shutdown options
  • Click OK

vm0 sysprep

[Only with VMWare] : Convert to VHD file
  • This step is only needed when using VMWare. 
  • I used the tool VMDK to VHD Converter and that allowed me to convert my VMWare virtual disk to a VHD.

Uploading the Virtual Hard disk (VHD)

The VHD file now needs to be uploaded.  For this, the Azure SDK 1.3 comes with a command-line tool: CSUPLOAD.  Using this tool, we can upload the VHD to our Azure subscription.

  • Open the Azure SDK Command prompt
  • Type the following command to link the context to your current subscription:
    csupload Set-Connection “SubscriptionId=[SubscriptionId];CertificateThumbprint=[tp]”
    The subscription id can be found on the management portal in the property grid, when selecting a subscription.
    The certificate thumbprint needs to be a private certificate that is installed on the local machine and that is linked to the management certificates on the Azure portal.  On the Azure portal, you can easily copy the thumbprint from the property grid.
  • Now the upload command needs to be done
    csupload Add-VMImage -LiteralPath [localpathtovhd] -Location “West Europe” -Name [cloudname]
    Passing in the local path to the VHD is important.  Next to that, the location (data center) where the page blob will be created needs to be specified.  And optionally you can specify a friendly name that will be used on the Azure management portal.
  • When this is done, a verification window pops up.  This indicates that the image will be mounted to verify some settings (like Operating System, etc).  This verification is important because it can prevent you from uploading a VHD of several GBs, and then finding out there is an incorrect configuration.
    If the operating system you’re uploading the VHD from does not support mounting a VHD, you can use the -SkipVerify parameter of the CSUpload tool to skip this step.
  • After the verification, the upload to the Azure data center starts, and that seems like a good moment to get some sleep (like I did), take a break, or do something else.  The upload will typically take some hours, and for that reason the csupload tool can recover from connection errors.
  • As soon as the upload process starts, you can already look at the Azure portal and see the status of the VHD as pending.  The VHD is created as a page blob.
  • After the upload is completed, the status of the VHD changes to committed on the Azure portal.  If the connection breaks, it is possible to re-execute the command and the upload will resume where it stopped.  (see ‘Things to know’)
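The resume behavior described above is easy to picture: split the file into fixed-size pages, track which pages have been committed, and on a retry send only the rest. Here is a toy Python sketch of that bookkeeping — an illustration of the idea, not csupload’s actual implementation; `put_page` and `page_size` are invented names for this sketch (real Azure page blobs use 512-byte pages):

```python
def upload_resumable(path, committed, put_page, page_size=512):
    """Upload the pages of `path` that aren't in `committed` yet.

    `committed` is the set of byte offsets already uploaded (persisted
    between runs); `put_page(offset, data)` performs one page upload.
    Re-running after an interruption skips the committed pages, which
    is what makes the upload resumable.
    """
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(page_size):
            if offset not in committed:
                put_page(offset, chunk)
                committed.add(offset)
            offset += len(chunk)
    return committed
```

On a resume, the caller reloads `committed` from wherever it was persisted and calls the function again with the same file; only the missing pages are sent.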

Configure the Azure Role Service

Create Visual Studio Project
  • In Visual Studio (after installing the Azure SDK 1.3 and the VMRole Beta unlocker), you can create a new Visual Studio Cloud project and add a VMRole to the Cloud project.
  • On the property page of the VMRole project, you can specify the credentials and the VHD that is uploaded to the Azure platform.
  • On the Configuration tab page, you can specify the number of instances and the size of the instances (notice that I can use the Extra small instance size here, part of the beta program)
  • It is also required to open the endpoints when needed (like when IIS needs to be used, you’ll typically create an HTTP endpoint on port 80 or 443)
Publishing the project to Azure
  • Now it is straightforward to publish this project to Azure, using Visual Studio, or by uploading it on the Azure management portal.
  • After the upload is done, you can see on the management portal the process of the Virtual machine.
  • The roles take the following statuses
    • Initializing
    • Waiting for host
    • Setting up Windows for first use
    • Starting Windows
    • Starting Traffic
    • Ready


Things to know & troubleshooting

Don’t disable the Windows Firewall service

On my virtual machine (locally), I had disabled the Windows Firewall service.  Because of that, the installation of the Windows Azure Integration Components failed with the following exception (copying it here for bing/google search):
Product: Windows Azure VM Role Integration Components -- Error 1720. There is a problem with this Windows Installer package. A script required for this install to complete could not be run. Contact your support personnel or package vendor. Custom action InstallScriptAction script error -2147023143, : Line 81, Column 5,

After I enabled the Windows Firewall service again, things worked smoothly.

Resuming the upload
  • Just before the upload finished, I received a timeout exception on the csupload command.  So, I decided to find out if the resumability of the CSUpload tool works as announced.
  • I re-executed the command a first time and noticed the upload started again at 3.2%…  Knowing that I had seen the upload pending at 98% the last time I checked, I decided to retry (Ctrl+C to exit the command).
  • The second time, the resume started at 87.3%.  So I tried again.
  • And, third time lucky: now I immediately received a confirmation of the upload, and the management portal also reflected this correctly.
Don’t install the Windows Azure SDK on the Virtual Machine

It is not supported to install the Windows Azure SDK inside a VM intended for use with VM role. The Windows Azure SDK and the Integration Components are mutually exclusive.

No support for startup tasks

Unfortunately there seems to be no support for startup tasks in VM Roles, so startup tasks will have to be ‘embedded’ in the virtual machine’s startup process.  This might change after the beta, of course.


Again, just as with Azure Connect, I was very surprised by the simplicity of the process of getting things to work.  Of course, it takes more time because of the preparation and the upload, but besides that, things look solid and stable.

Things that I hope to see soon:

  • Base images on Azure that can be downloaded, so that only the differencing disks need to be uploaded.  This would save a lot of time and bandwidth.  (download is cheaper/faster than upload)
  • Start up tasks.

Great stuff.

Great tutorial, Sam!

<Return to section navigation list> 

Cloud Security and Governance


No significant articles today.

<Return to section navigation list> 

Cloud Computing Events

• Joey deVilla reminded would-be Windows Azure developers Don’t Forget AzureFest This Saturday! on 12/11/2010:


What is AzureFest?
  • AzureFest is an event where you’ll see how quick and easy it is to develop and deploy cloud applications with Windows Azure.
  • We’ll show you how to set up an Azure account and then take a traditional ASP.NET application and turn it into an Azure application.
  • AzureFest is run by ObjectSharp with the assistance of Microsoft Canada. ObjectSharp is a Toronto-based company that specializes in building software for customers and training developers (and they’ve helped us a great deal with TechDays!). ObjectSharp is a Microsoft Gold Certified Partner. They’re also very personable, very funny people.
  • AzureFest happens this Saturday, December 11th at Microsoft Canada’s Mississauga office (1950 Meadowvale, off Mississauga Road, just north of Highway 401), and there are two sessions:
    • Morning session: 9:00 a.m. to noon
    • Afternoon session: 1:00 p.m. to 4:00 p.m.
  • Azurefest is FREE! Just visit the AzureFest page for registration details.
What Will You Learn?
  • How to set up your Azure account
  • How to take a traditional ASP.NET application that would typically live in an on-premises server or at a hosting service and deploy it to the Azure cloud
  • How to publish applications to the Azure Developer Portal
  • How to set up the Azure SDK and Azure Tools for Visual Studio on your laptop
  • How to use AppFabric
What Will We Provide?
  • The tools you’ll need to install on your machine to prepare yourself for Azure
  • Hands-on instruction and expert assistance
  • Post-event technical resources so you can continue learning afterwards and take your skills to the next level
  • Power and network access
  • Snacks and refreshments
What Do You Need to Bring?
  • Your own laptop, which should be capable of running Microsoft’s developer tools (A machine with a decent processor and RAM running at least Windows Vista, or preferably, Windows 7)
  • A credit card – the event is free, but activating an Azure account requires a credit card, even if you’re setting up for a trial period.
  • Some experience building ASP.NET applications and services
What’s This About a Bonus for My User Group?
  • For each Azure account activation by a member of a Microsoft/.NET user group at AzureFest, we’ll donate $25.00 to that person’s user group. So invite all your friends from your user group to come to AzureFest and give your user group some extra funding! (When you register for AzureFest, the registration page will ask you which user group you belong to, so we’ll know how much to give each user group.)

This article also appears in Global Nerdy.

Mike Wood announced on 12/11/2010 MSDN Boot Camps coming up in Ohio and Indiana:

Microsoft is planning a road show of two development-focused boot camps: Windows Azure Boot Camps and Windows 7 Development Boot Camps. 

The Windows Azure Boot Camps are two-day events focused on giving you hands-on experience deploying applications to the cloud.  I’ve been a trainer at several of these events, and the people I’ve talked to about them afterwards have all said they got a lot of good information from them.  The boot camps cover the Azure Platform and how to leverage the cloud.


For Azure Boot Camps in my area we have:

Cincinnati – Jan 25-26
Columbus – Feb 8-9

Check out the full schedule for a city near you.

At the FTW (For the Win) Windows 7 Development Boot Camps you can expect a day of interactive sessions covering things like jump lists, IE 9 development, multi-touch interfaces, and location and sensor integration.  If you write code that will run on a Windows client, you will benefit from this boot camp. 

For the FTW events in my area:

Indianapolis – Jan 19
Columbus – Jan 25
Cincinnati – Jan 27

Both of these boot camps are hands-on, interactive sessions.  You’ll want to bring your laptop and have the tools loaded (see the event details for what to install).  If you have questions about these, I’d recommend talking to your local Microsoft Developer Evangelist (in my neck of the woods that’s Jennifer Marsman, Jeff Blankenburg and Brian H. Prince).

Bill Zack reminded would-be developers on 12/11/2010 about Free Windows Azure Training on 12/15 through 12/17/2010:


Here is an excellent opportunity to get free Windows Azure training.  If you are exploring the possibilities of cloud-based applications and Windows Azure, then you don’t want to miss this free live training course, “Developing Cloud Applications with Windows Azure.” It will be delivered via virtual classroom on December 15, 16, and 17, 2010 from 12:00-4:00pm PST and will include hands-on labs, real-world examples, and new content from the Windows Azure team.

This course is designed for advanced Application Developers, Software Engineers, Application & Solutions Architects, Software Product Managers, CTOs and Enterprise Architects, and will be team-taught by two of the most respected technologists in the field in an interactive, engaging radio-talk-show format:

  • Dave Platt is a developer, consultant, author, and “.NET Professor” at Harvard University. He’s written nearly a dozen Microsoft Press books on web services and various aspects of programming, and is an accomplished instructor and presenter. Dave was named by Microsoft as a Software Legend in 2002. 
  • Manu Cohen‐Yashar is a leading global expert in Microsoft Cloud technologies, Distributed Systems and Application Security. Through hands-on engagement with enterprise customers and his position as a professor at SELA Technical College in Tel Aviv, Manu brings a uniquely pragmatic approach to software development. He is the author of many courses and a top-rated speaker at Microsoft TechEd & Metro Early Adoption Programs.

Register now for all three sessions!
  • Session 1 | Dec. 15, 2010 from 12pm-4pm PST
  • Session 2 | Dec. 16, 2010 from 12pm-4pm PST
  • Session 3 | Dec. 17, 2010 from 12pm-4pm PST

For more details, see my Windows Azure and Cloud Computing Posts for 12/9/2010+ post.

<Return to section navigation list> 

Other Cloud Computing Platforms and Services

Adam DuVander posted 50 New APIs This Week: USA Today, Google, SimpleGeo and Shutterfly to the ProgrammableWeb blog on 12/12/2010:

This week we had 50 new APIs added to our API directory, including the 17 we covered earlier. The 33 remaining include an online coupons and discounts service, shipping rate calculator, online payment service, cloud-based telephony application platform, website reputation service and a photo sharing platform.

Among the more interesting items:

AuctionInc Shipping API: AuctionInc provides e-commerce and auction tools, including a full-featured shipping-rate calculator. The service calculates shipping rates, including insurance, taxes and handling, for UPS, FedEx, DHL and the USPS. It can include local pickup, drop shipping from multiple locations and intelligent packaging for multiple items. The service is available as an API using XML-RPC.

 API: is a database service in the cloud coming in 2011. Its services include database tools, file storage, a social data model, identification and authentication, and a developer API. With the API, developers can create their own applications in a variety of languages for a variety of platforms.

DoubleClick API: DoubleClick is a large advertising network specializing in online display ads. The company is a subsidiary of Google and provides an API for customers to manage their accounts. DoubleClick’s API is SOAP-based and does not provide helper libraries, so developers need to interact directly with the service using XML. There are currently 19 separate WSDLs describing the services and commands available, including procedures for logging users in and organizing advertising campaigns.

Google Geocoding API: The Google Geocoding API is a web service for the Google Maps API. The Geocoding API provides a direct way to access a geocoder via an HTTP request, though it must be used in conjunction with Google Maps. Additionally, the service allows you to perform the converse operation (turning coordinates into addresses); this process is known as “reverse geocoding.” Also, you can choose to limit your search to certain bounded areas (Viewport Biasing) or regions (Region Biasing).

 API: The service is a simple online storage solution. The site provides several upload options and responds to new uploads with a short link and a secret deletion link. The API provides access to the major functionality of the service. Applications using the API can upload files, edit descriptions and get information about files, folders and user accounts.

Ixaris Opn API: Ixaris Opn is a platform for external financial networks and allows third-party financial institutions to provide payment services through publicly-available APIs. For developers, Ixaris makes complex global payments easy. Full documentation is not yet available, though we did cover the Secret Path Developer Challenge on the blog.

KooKoo API: KooKoo acts as an interface between your applications and telephony applications, letting users set up an extra delivery channel for their web applications. It can perform telephony functions like placing calls, receiving calls, sending SMS and gathering user input, and it acts as another web page in an application that is accessible from the phone rather than the browser. With KooKoo, users can build telecom applications, IVRs (Interactive Voice Response), office PBXs and outbound campaigns. The API lets users take advantage of their existing web development skills to build telephony apps. The API is RESTful and responses are formatted in XML.

OCR Terminal API: This API has been created to let users and partners interact programmatically with OCR Terminal, an online OCR (optical character recognition) service that converts PDFs to Word, JPEGs to Word and scanned images into editable documents. It is an HTTP-based API for OCR: it allows developers to upload scanned images (JPEG, TIFF, etc.) or PDFs and convert them to editable document formats (like DOC, RTF and TXT) or searchable PDFs. Full documentation is available by request.

Shutterfly API: Shutterfly is an online photo printing service. Using the Shutterfly API, developers can create new and innovative applications using the Shutterfly service. The API can upload and organize images, authenticate Shutterfly users and even place orders.

SimpleGeo Context API: SimpleGeo Context provides relevant contextual information such as weather, demographics, or neighborhood data for a specific location. You give SimpleGeo Context a latitude and longitude, and it turns that point into structured data that is meaningful and useful to real people. The SimpleGeo API lets developers make more meaningful location apps by using a latitude/longitude to look up real-time information on a location.

SimpleGeo Places API: SimpleGeo Places is a free database of business listings and points of interest (POIs) that enables real-time community collaboration. Using proprietary data processes, SimpleGeo Places leverages a mixture of crowdsourcing and automation technologies to clean, update, and validate place data that the community creates in real time. The SimpleGeo API tools and SDKs make it faster and much simpler for developers to take their location-aware app ideas from sketch to launch.

USA Today Best-Selling Books API: USA Today’s Best-Selling Books API provides a method for developers to retrieve USA Today’s weekly compiled list of the nation’s best-selling books, which is published each Thursday. In addition, developers can retrieve archived lists going back to the book list’s launch on Thursday, Oct. 28, 1993. The Best-Selling Books API can also be used to retrieve a title’s history on the list and metadata about each title.

USA Today Sports Salaries API: USA Today’s sports databases contain information about the salaries of baseball, football, basketball, and hockey players for the past several years (since 1988 for baseball, and around 2000 for the others). In addition to salaries, there is extensive information available for the MLB, NFL, NBA and NHL, including player, position, and team data. The USA Today Sports Salaries API allows developers to programmatically access the USA TODAY Sports Salaries database.
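The reverse-geocoding call described in the Google Geocoding API entry above boils down to a single HTTP GET whose query string carries the coordinates. A minimal Python sketch of building such a request follows; the endpoint path and the `latlng`/`sensor` parameters match the pattern documented for the service at the time, but treat them as assumptions and check the current documentation before relying on them (no request is actually sent here):

```python
from urllib.parse import urlencode

def reverse_geocode_url(lat, lng, sensor=False):
    """Build a reverse-geocoding request URL for the Google Geocoding API.

    The service turns a latitude/longitude pair into structured address
    data; the JSON output format is selected via the path segment.
    """
    base = "http://maps.googleapis.com/maps/api/geocode/json"
    params = {"latlng": f"{lat},{lng}", "sensor": str(sensor).lower()}
    return f"{base}?{urlencode(params)}"

print(reverse_geocode_url(40.714224, -73.961452))
```

Actually fetching the URL (e.g. with `urllib.request.urlopen`) would return a JSON document whose results array holds the matched addresses, per the service’s documentation.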

David Makogon described working with MongoDB and Windows Azure in this 00:46:23 video clip:


Unfortunately, the videographer used the camcorder’s mic, so the audio quality is poor.

David is a former Windows Azure MVP and now is a Microsoft Azure developer-evangelist.

Anand Rajaraman (Kosmix, Inc.) and Jeffrey D. Ullman (Stanford Univ.) made their 300-page Mining of Massive Datasets book publicly available. From the Preface:

This book evolved from material developed over several years by Anand Rajaraman and Jeff Ullman for a one-quarter course at Stanford. The course, CS345A, titled “Web Mining,” was designed as an advanced graduate course, although it has become accessible and interesting to advanced undergraduates.

What the Book Is About

At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort. The principal topics covered are:

  1. Distributed file systems and map-reduce as a tool for creating parallel algorithms that succeed on very large amounts of data.
  2. Similarity search, including the key techniques of minhashing and locality-sensitive hashing.
  3. Data-stream processing and specialized algorithms for dealing with data that arrives so fast it must be processed immediately or lost.
  4. The technology of search engines, including Google’s PageRank, link-spam detection, and the hubs-and-authorities approach.
  5. Frequent-itemset mining, including association rules, market-baskets, the A-Priori Algorithm and its improvements.
  6. Algorithms for clustering very large, high-dimensional datasets.
  7. Two key problems for Web applications: managing advertising and recommendation systems.
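Topic 2’s minhashing idea fits in a few lines of code: the probability that two sets share the same minimum hash value equals their Jaccard similarity, so the fraction of agreeing positions across many hash functions estimates it. This toy Python sketch is an illustration of the technique, not code from the book, and assumes integer set elements:

```python
import random

def minhash_signature(items, hash_funcs):
    # One signature entry per hash function: the minimum hash over the set.
    return [min(h(x) for x in items) for h in hash_funcs]

def estimate_jaccard(a, b, n_hashes=200, seed=42):
    """Estimate |a ∩ b| / |a ∪ b| by comparing minhash signatures."""
    rng = random.Random(seed)
    prime = 2_147_483_647
    # Random linear hash functions h(x) = (m*x + c) mod prime, fixed at
    # creation time via default arguments.
    funcs = [
        (lambda x, m=rng.randrange(1, prime), c=rng.randrange(prime):
            (m * x + c) % prime)
        for _ in range(n_hashes)
    ]
    sig_a = minhash_signature(a, funcs)
    sig_b = minhash_signature(b, funcs)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / n_hashes

a, b = set(range(100)), set(range(50, 150))  # true Jaccard = 50/150 ≈ 0.33
print(round(estimate_jaccard(a, b), 2))
```

The book develops the full theory, including how locality-sensitive hashing bands these signatures to find candidate pairs without comparing every pair of sets.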


The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point.

Support on the Web

You can find materials from past offerings of CS345A at: There, you will find slides, homework assignments, project requirements, and in some cases, exams.


We would like to thank Foto Afrati and Arun Marathe for critical readings of the draft of this manuscript. Errors were also reported by Shrey Gupta, Mark Storus, and Roshan Sumbaly. The remaining errors are ours, of course.

<Return to section navigation list>