Wednesday, May 02, 2012

Windows Azure and Cloud Computing Posts for 4/28/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.


Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

Azure Blob, Drive, Table, Queue and Hadoop Services

Carl Nolan (@carl_nolan) described a Generic based Framework for .Net Hadoop MapReduce Job Submission on 4/29/2012:

Over the past month I have been working on a framework to allow composition and submission of MapReduce jobs using .Net. I have put together two previous blog posts on this, so rather than putting together a third on the latest change I thought I would create a final composite post. To understand why, let's run through a quick version history of the code:

  1. Initial release, where values were treated as strings and serialization was handled through Object.ToString()
  2. Made minor modifications to the submission APIs
  3. Modified the Reducer and Combiner types to allow In-Reducer optimizations through the ability to yield a Tuple of the key and value in the Cleanup
  4. Modified the Combiner and Reducer base classes such that data out of the mapper, in and out of the combiner, and in to the reducer uses a binary formatter; thus changing the base classes from strings to objects; meaning the classes can now cast to the expected type rather than performing string parsing
  5. Added support for multiple mapper keys; with supporting utilities

The latest change takes advantage of the fact that the objects are serialized in binary format. This has allowed the base abstract classes to move away from object-based APIs to ones based on generics, which hopefully greatly simplifies the creation of .Net MapReduce jobs.

As always, to submit MapReduce jobs one can use the following command-line syntax:

MSDN.Hadoop.Submission.Console.exe -input "mobile/data/debug/sampledata.txt" -output "mobile/querytimes/debug"
-mapper "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryReducer, MSDN.Hadoop.MapReduceFSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduce\Release\MSDN.Hadoop.MapReduceFSharp.dll"


The mapper and reducer parameters are .Net types that derive from the abstract Map and Reduce base classes shown below. The input, output, and file options are analogous to the standard Hadoop streaming submissions. The mapper and reducer options (more on the combiner option later) allow one to define a .Net type derived from the appropriate abstract base classes. Under the covers standard Hadoop streaming is used, with controlling executables handling the StdIn and StdOut operations and activating the required .Net types. The “file” parameter is required to specify the DLL for the .Net type to be loaded at runtime, in addition to any other required files.
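Under standard Hadoop streaming, a mapper is simply a process that reads raw input lines from StdIn and writes tab-separated key/value lines to StdOut; that is the contract the controlling executables manage on behalf of the .Net types. A minimal Python sketch of that contract (the field positions mirror the phone-query samples below, but the names are mine, not part of the framework):

```python
import sys

# Minimal sketch of the Hadoop streaming contract the framework builds on:
# a mapper reads raw lines and emits "key<TAB>value" lines.
def streaming_mapper(lines):
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3:                 # skip malformed records
            yield fields[3], fields[1]      # device platform -> query time

if __name__ == "__main__":
    sample = ["2012-04-28\t00:00:12\tquery\tWindowsPhone\n"]
    for key, value in streaming_mapper(sample):
        sys.stdout.write(f"{key}\t{value}\n")
```

The framework's value is that it hides this stream plumbing, plus the binary serialization of values, behind the typed base classes.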

As always the source can be downloaded from:

Mapper and Reducer Base Classes

The following definitions outline the abstract base classes from which one needs to derive. Let's start with the C# definitions:

C# Abstract Classes

namespace MSDN.Hadoop.MapReduceBase
{
    [Serializable]
    public abstract class MapReduceBase<V2>
    {
        protected MapReduceBase();
        public virtual IEnumerable<Tuple<string, V2>> Cleanup();
        public virtual void Setup();
    }

    [Serializable]
    public abstract class MapperBaseText<V2> : MapReduceBase<V2>
    {
        protected MapperBaseText();
        public abstract IEnumerable<Tuple<string, V2>> Map(string value);
    }

    [Serializable]
    public abstract class MapperBaseXml<V2> : MapReduceBase<V2>
    {
        protected MapperBaseXml();
        public abstract IEnumerable<Tuple<string, V2>> Map(XElement element);
    }

    [Serializable]
    public abstract class MapperBaseBinary<V2> : MapReduceBase<V2>
    {
        protected MapperBaseBinary();
        public abstract IEnumerable<Tuple<string, V2>> Map(string filename, Stream document);
    }

    [Serializable]
    public abstract class CombinerBase<V2> : MapReduceBase<V2>
    {
        protected CombinerBase();
        public abstract IEnumerable<Tuple<string, V2>> Combine(string key, IEnumerable<V2> values);
    }

    [Serializable]
    public abstract class ReducerBase<V2, V3> : MapReduceBase<V2>
    {
        protected ReducerBase();
        public abstract IEnumerable<Tuple<string, V3>> Reduce(string key, IEnumerable<V2> values);
    }
}

The equivalent F# definitions are:

F# Abstract Classes

namespace MSDN.Hadoop.MapReduceBase

[<AbstractClass>]
type MapReduceBase<'V2>() =
    abstract member Setup: unit -> unit
    default this.Setup() = ()
    abstract member Cleanup: unit -> IEnumerable<string * 'V2>
    default this.Cleanup() = Seq.empty

[<AbstractClass>]
type MapperBaseText<'V2>() =
    inherit MapReduceBase<'V2>()
    abstract member Map: value:string -> IEnumerable<string * 'V2>

[<AbstractClass>]
type MapperBaseXml<'V2>() =
    inherit MapReduceBase<'V2>()
    abstract member Map: element:XElement -> IEnumerable<string * 'V2>

[<AbstractClass>]
type MapperBaseBinary<'V2>() =
    inherit MapReduceBase<'V2>()
    abstract member Map: filename:string -> document:Stream -> IEnumerable<string * 'V2>

[<AbstractClass>]
type CombinerBase<'V2>() =
    inherit MapReduceBase<'V2>()
    abstract member Combine: key:string -> values:IEnumerable<'V2> -> IEnumerable<string * 'V2>

[<AbstractClass>]
type ReducerBase<'V2, 'V3>() =
    inherit MapReduceBase<'V2>()
    abstract member Reduce: key:string -> values:IEnumerable<'V2> -> IEnumerable<string * 'V3>

The objective in defining these base classes was threefold: to support creating .Net Mappers and Reducers; to provide Setup and Cleanup operations that enable In-Place Mapper/Combiner/Reducer optimizations, using IEnumerable and sequences for publishing data from all classes; and to provide a simple submission mechanism analogous to submitting Java-based jobs.

The generic types V2 and V3 equate to the names used in the Java definitions. The current input type into the Mapper is a string (this would normally be V1). This is needed because in streaming jobs the mapper performs the projection from the textual input.

For each class a Setup function is provided to allow one to perform tasks related to the instantiation of the class. The Mapper's Map and Cleanup functions return an IEnumerable of key/value tuples; these tuples represent the mapper's output and are written to file using binary serialization.

The Combiner and Reducer take in an IEnumerable of values for each key and reduce it into a key/value enumerable. Once again, Cleanup allows for return values to enable In-Reducer optimizations.

Binary and XML Processing and Multiple Keys
As one can see from the abstract class definitions, the framework also supports submitting jobs with Binary and XML based Mappers. To use Mappers derived from these types, a “format” submission parameter is required; the supported values are Text, Binary, and Xml, with Text the default.

To submit a binary streaming job one just has to use a Mapper derived from the MapperBaseBinary abstract class and use the binary format specification:

-format Binary

In this case the input into the Mapper will be a Stream object that represents a complete binary document instance.

To submit an XML streaming job one just has to use a Mapper derived from the MapperBaseXml abstract class and use the XML format specification, along with a node to be processed within the XML documents:

-format Xml -nodename Node

In this case the input into the Mapper will be an XElement node derived from the XML document based on the nodename parameter.

Using multiple keys from the Mapper is a two-step process. First, the Mapper needs to be modified to output a string-based key in the correct format. This is done by passing the set of string key values into the Utilities.FormatKeys() function, which concatenates the keys using the necessary tab character. Second, the job has to be submitted specifying the expected number of keys:

MSDN.Hadoop.Submission.Console.exe -input "stores/demographics" -output "stores/banking"
-mapper "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementReducer, MSDN.Hadoop.MapReduceFSharp"
-file "%HOMEPATH%\Projects\MSDN.Hadoop.MapReduce\Release\MSDN.Hadoop.MapReduceFSharp.dll"
-nodename Store -format Xml -numberKeys 2

This parameter equates to the corresponding Hadoop streaming job configuration parameter (the number of tab-separated fields treated as the map output key).
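As a rough illustration of the two steps above, here is how a composite key behaves in streaming terms. Utilities.FormatKeys belongs to the framework; this Python stand-in (my own naming) just shows the tab-joining it performs:

```python
# Stand-in for the framework's Utilities.FormatKeys: a composite key is the
# key parts joined with tabs, because Hadoop streaming treats the first
# -numberKeys tab-separated fields of an output line as the key.
def format_keys(*keys):
    return "\t".join(keys)

# A mapper emitting (FormatKeys(business, bank), sales) with -numberKeys 2
# therefore groups reducer input on the (business, bank) pair.
composite_key = format_keys("Professional", "United Security")
```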


To demonstrate the submission framework, here are some sample Mappers and Reducers with the corresponding command line submissions:

C# Mobile Phone Range (with In-Mapper optimization)

Calculates the mobile phone query time range for a device with an In-Mapper optimization yielding just the Min and Max values:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MSDN.Hadoop.MapReduceBase;

namespace MSDN.Hadoop.MapReduceCSharp
{
    public class MobilePhoneRangeMapper : MapperBaseText<TimeSpan>
    {
        private Dictionary<string, Tuple<TimeSpan, TimeSpan>> ranges;

        private Tuple<string, TimeSpan> GetLineValue(string value)
        {
            try
            {
                string[] splits = value.Split('\t');
                string devicePlatform = splits[3];
                TimeSpan queryTime = TimeSpan.Parse(splits[1]);
                return new Tuple<string, TimeSpan>(devicePlatform, queryTime);
            }
            catch (Exception)
            {
                return null;
            }
        }

        public override void Setup()
        {
            this.ranges = new Dictionary<string, Tuple<TimeSpan, TimeSpan>>();
        }

        public override IEnumerable<Tuple<string, TimeSpan>> Map(string value)
        {
            var range = GetLineValue(value);
            if (range != null)
            {
                if (ranges.ContainsKey(range.Item1))
                {
                    var original = ranges[range.Item1];
                    if (range.Item2 < original.Item1)
                    {
                        // Update Min amount
                        ranges[range.Item1] = new Tuple<TimeSpan, TimeSpan>(range.Item2, original.Item2);
                    }
                    if (range.Item2 > original.Item2)
                    {
                        // Update Max amount
                        ranges[range.Item1] = new Tuple<TimeSpan, TimeSpan>(original.Item1, range.Item2);
                    }
                }
                else
                {
                    ranges.Add(range.Item1, new Tuple<TimeSpan, TimeSpan>(range.Item2, range.Item2));
                }
            }
            return Enumerable.Empty<Tuple<string, TimeSpan>>();
        }

        public override IEnumerable<Tuple<string, TimeSpan>> Cleanup()
        {
            foreach (var range in ranges)
            {
                yield return new Tuple<string, TimeSpan>(range.Key, range.Value.Item1);
                yield return new Tuple<string, TimeSpan>(range.Key, range.Value.Item2);
            }
        }
    }

    public class MobilePhoneRangeReducer : ReducerBase<TimeSpan, Tuple<TimeSpan, TimeSpan>>
    {
        public override IEnumerable<Tuple<string, Tuple<TimeSpan, TimeSpan>>> Reduce(string key, IEnumerable<TimeSpan> value)
        {
            var baseRange = new Tuple<TimeSpan, TimeSpan>(TimeSpan.MaxValue, TimeSpan.MinValue);
            var rangeValue = value.Aggregate(baseRange, (accSpan, timespan) =>
                new Tuple<TimeSpan, TimeSpan>(
                    (timespan < accSpan.Item1) ? timespan : accSpan.Item1,
                    (timespan > accSpan.Item2) ? timespan : accSpan.Item2));
            yield return new Tuple<string, Tuple<TimeSpan, TimeSpan>>(key, rangeValue);
        }
    }
}

MSDN.Hadoop.Submission.Console.exe -input "mobilecsharp/data" -output "mobilecsharp/querytimes"
-mapper "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeMapper, MSDN.Hadoop.MapReduceCSharp"
-reducer "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeReducer, MSDN.Hadoop.MapReduceCSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduceCSharp\Release\MSDN.Hadoop.MapReduceCSharp.dll"

C# Mobile Min (with Mapper, Combiner, Reducer)

Calculates the mobile phone minimum time for a device with a combiner yielding just the Min value:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using MSDN.Hadoop.MapReduceBase;

namespace MSDN.Hadoop.MapReduceCSharp
{
    public class MobilePhoneMinMapper : MapperBaseText<TimeSpan>
    {
        private Tuple<string, TimeSpan> GetLineValue(string value)
        {
            try
            {
                string[] splits = value.Split('\t');
                string devicePlatform = splits[3];
                TimeSpan queryTime = TimeSpan.Parse(splits[1]);
                return new Tuple<string, TimeSpan>(devicePlatform, queryTime);
            }
            catch (Exception)
            {
                return null;
            }
        }

        public override IEnumerable<Tuple<string, TimeSpan>> Map(string value)
        {
            var returnVal = GetLineValue(value);
            if (returnVal != null) yield return returnVal;
        }
    }

    public class MobilePhoneMinCombiner : CombinerBase<TimeSpan>
    {
        public override IEnumerable<Tuple<string, TimeSpan>> Combine(string key, IEnumerable<TimeSpan> value)
        {
            yield return new Tuple<string, TimeSpan>(key, value.Min());
        }
    }

    public class MobilePhoneMinReducer : ReducerBase<TimeSpan, TimeSpan>
    {
        public override IEnumerable<Tuple<string, TimeSpan>> Reduce(string key, IEnumerable<TimeSpan> value)
        {
            yield return new Tuple<string, TimeSpan>(key, value.Min());
        }
    }
}

MSDN.Hadoop.Submission.Console.exe -input "mobilecsharp/data" -output "mobilecsharp/querytimes"
-mapper "MSDN.Hadoop.MapReduceCSharp.MobilePhoneMinMapper, MSDN.Hadoop.MapReduceCSharp"
-reducer "MSDN.Hadoop.MapReduceCSharp.MobilePhoneMinReducer, MSDN.Hadoop.MapReduceCSharp"
-combiner "MSDN.Hadoop.MapReduceCSharp.MobilePhoneMinCombiner, MSDN.Hadoop.MapReduceCSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduceCSharp\Release\MSDN.Hadoop.MapReduceCSharp.dll"

F# Mobile Phone Query

Calculates the mobile phone range and average time for a device:

namespace MSDN.Hadoop.MapReduceFSharp

open System
open MSDN.Hadoop.MapReduceBase

type MobilePhoneQueryMapper() =
    inherit MapperBaseText<TimeSpan>()

    // Performs the split into key/value
    let splitInput (value:string) =
        try
            let splits = value.Split('\t')
            let devicePlatform = splits.[3]
            let queryTime = TimeSpan.Parse(splits.[1])
            Some(devicePlatform, queryTime)
        with
        | :? System.ArgumentException -> None

    // Map the data from input name/value to output name/value
    override self.Map (value:string) =
        seq {
            let result = splitInput value
            if result.IsSome then
                yield result.Value
        }

type MobilePhoneQueryReducer() =
    inherit ReducerBase<TimeSpan, (TimeSpan*TimeSpan*TimeSpan)>()

    override self.Reduce (key:string) (values:seq<TimeSpan>) =
        let initState = (TimeSpan.MaxValue, TimeSpan.MinValue, 0L, 0L)
        let (minValue, maxValue, totalValue, totalCount) =
            values
            |> Seq.fold (fun (minValue, maxValue, totalValue, totalCount) value ->
                (min minValue value, max maxValue value, totalValue + (int64)(value.TotalSeconds), totalCount + 1L)) initState
        Seq.singleton (key, (minValue, TimeSpan.FromSeconds((float)(totalValue/totalCount)), maxValue))

MSDN.Hadoop.Submission.Console.exe -input "mobile/data" -output "mobile/querytimes"
-mapper "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.MobilePhoneQueryReducer, MSDN.Hadoop.MapReduceFSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduceFSharp\Release\MSDN.Hadoop.MapReduceFSharp.dll"

F# Store XML (XML in Samples)

Calculates the total revenue, within the store XML, based on demographic data; also demonstrating multiple keys:

namespace MSDN.Hadoop.MapReduceFSharp

open System
open System.Collections.Generic
open System.Linq
open System.IO
open System.Text
open System.Xml
open System.Xml.Linq
open MSDN.Hadoop.MapReduceBase

type StoreXmlElementMapper() =
    inherit MapperBaseXml<decimal>()

    override self.Map (element:XElement) =
        let aw = ""
        let demographics = element.Element(XName.Get("Demographics")).Element(XName.Get("StoreSurvey", aw))
        seq {
            if not (demographics = null) then
                let business = demographics.Element(XName.Get("BusinessType", aw)).Value
                let bank = demographics.Element(XName.Get("BankName", aw)).Value
                let key = Utilities.FormatKeys(business, bank)
                let sales = Decimal.Parse(demographics.Element(XName.Get("AnnualSales", aw)).Value)
                yield (key, sales)
        }

type StoreXmlElementReducer() =
    inherit ReducerBase<decimal, int>()

    override self.Reduce (key:string) (values:seq<decimal>) =
        let totalRevenue = values |> Seq.sum
        Seq.singleton (key, int totalRevenue)

MSDN.Hadoop.Submission.Console.exe -input "stores/demographics" -output "stores/banking"
-mapper "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.StoreXmlElementReducer, MSDN.Hadoop.MapReduceFSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduceFSharp\bin\Release\MSDN.Hadoop.MapReduceFSharp.dll"
-nodename Store -format Xml

F# Binary Document (Word and PDF Documents)

Calculates the pages per author for a combination of Office Word and PDF documents:

namespace MSDN.Hadoop.MapReduceFSharp

open System
open System.Collections.Generic
open System.Linq
open System.IO
open System.Text
open System.Xml
open System.Xml.Linq

open DocumentFormat.OpenXml
open DocumentFormat.OpenXml.Packaging
open DocumentFormat.OpenXml.Wordprocessing

open iTextSharp.text
open iTextSharp.text.pdf

open MSDN.Hadoop.MapReduceBase

type OfficePageMapper() =
    inherit MapperBaseBinary<int>()

    let (|WordDocument|PdfDocument|UnsupportedDocument|) extension =
        if String.Equals(extension, ".docx", StringComparison.InvariantCultureIgnoreCase) then
            WordDocument
        else if String.Equals(extension, ".pdf", StringComparison.InvariantCultureIgnoreCase) then
            PdfDocument
        else
            UnsupportedDocument

    let dc = XNamespace.Get("")
    let cp = XNamespace.Get("")
    let unknownAuthor = "unknown author"
    let authorKey = "Author"

    let getAuthorsWord (document:WordprocessingDocument) =
        let coreFilePropertiesXDoc = XElement.Load(document.CoreFilePropertiesPart.GetStream())
        // Take the first dc:creator element and split based on a ";"
        let creators = coreFilePropertiesXDoc.Elements(dc + "creator")
        if Seq.isEmpty creators then
            [| unknownAuthor |]
        else
            let creator = (Seq.head creators).Value
            if String.IsNullOrWhiteSpace(creator) then
                [| unknownAuthor |]
            else
                creator.Split(';')

    let getPagesWord (document:WordprocessingDocument) =
        // return page count
        Int32.Parse(document.ExtendedFilePropertiesPart.Properties.Pages.Text)

    let getAuthorsPdf (document:PdfReader) =
        // For PDF documents perform the split on a ","
        if document.Info.ContainsKey(authorKey) then
            let creators = document.Info.[authorKey]
            if String.IsNullOrWhiteSpace(creators) then
                [| unknownAuthor |]
            else
                creators.Split(',')
        else
            [| unknownAuthor |]

    let getPagesPdf (document:PdfReader) =
        // return page count
        document.NumberOfPages

    // Map the data from input name/value to output name/value
    override self.Map (filename:string) (document:Stream) =
        let result =
            match Path.GetExtension(filename) with
            | WordDocument ->
                // Get access to the word processing document from the input stream
                use document = WordprocessingDocument.Open(document, false)
                // Process the word document with the mapper
                let pages = getPagesWord document
                let authors = getAuthorsWord document
                // close document
                document.Close()
                Some(pages, authors)
            | PdfDocument ->
                // Get access to the pdf processing document from the input stream
                let document = new PdfReader(document)
                // Process the pdf document with the mapper
                let pages = getPagesPdf document
                let authors = getAuthorsPdf document
                // close document
                document.Close()
                Some(pages, authors)
            | UnsupportedDocument ->
                None
        if result.IsSome then
            snd result.Value
            |> Seq.map (fun author -> (author, fst result.Value))
        else
            Seq.empty

type OfficePageReducer() =
    inherit ReducerBase<int, int>()

    override self.Reduce (key:string) (values:seq<int>) =
        let totalPages = values |> Seq.sum
        Seq.singleton (key, totalPages)

MSDN.Hadoop.Submission.Console.exe -input "office/documents" -output "office/authors"
-mapper "MSDN.Hadoop.MapReduceFSharp.OfficePageMapper, MSDN.Hadoop.MapReduceFSharp"
-reducer "MSDN.Hadoop.MapReduceFSharp.OfficePageReducer, MSDN.Hadoop.MapReduceFSharp"
-combiner "MSDN.Hadoop.MapReduceFSharp.OfficePageReducer, MSDN.Hadoop.MapReduceFSharp"
-file "%HOMEPATH%\MSDN.Hadoop.MapReduceFSharp\bin\Release\MSDN.Hadoop.MapReduceFSharp.dll"
-file "C:\Reference Assemblies\itextsharp.dll" -format Binary

Optional Parameters

To cover some additional Hadoop streaming options, a few optional parameters are available.

-numberReducers X

As expected, this specifies the maximum number of reducers to use.


-debug

The -debug option turns on verbose mode and specifies a job configuration to keep failed task outputs.

To view the supported options one can use the help parameter, which displays:

Command Arguments:
-input (Required=true) : Input Directory or Files
-output (Required=true) : Output Directory
-mapper (Required=true) : Mapper Class
-reducer (Required=true) : Reducer Class
-combiner (Required=false) : Combiner Class (Optional)
-format (Required=false) : Input Format |Text(Default)|Binary|Xml|
-numberReducers (Required=false) : Number of Reduce Tasks (Optional)
-numberKeys (Required=false) : Number of MapReduce Keys (Optional)
-file (Required=true) : Processing Files (Must include Map and Reduce Class files)
-nodename (Required=false) : XML Processing Nodename (Optional)
-debug (Required=false) : Turns on Debugging Options

UI Submission

The provided submission framework works from the command line. However, there is nothing to stop one submitting the job through a UI, albeit one that opens a command console. To this end I have put together a simple UI that supports submitting Hadoop jobs.


This simple UI supports all the necessary options for submitting jobs.

Code Download

As mentioned, the actual executables and source code can be downloaded from:

The source includes not only the .Net submission framework but also all the Java classes necessary to support the Binary and XML job submissions. These rely on a custom Streaming JAR, which should be copied to the Hadoop lib directory. There are two versions of the Streaming JAR: one for running in Azure and one for running locally, the difference being that they have been compiled with different versions of the Java compiler. Just remember to use the appropriate version (dropping the -local and -azure suffixes) when copying to your Hadoop lib folder.

To use the code one just needs to reference the EXEs in the Release directory. This folder also contains MSDN.Hadoop.MapReduceBase.dll, which contains the abstract base class definitions.

Moving Forward

In a separate post I will cover what is actually happening under the covers.

As always if you find the code useful and/or use this for your MapReduce jobs, or just have some comments, please do let me know.

Wade Wegner (@WadeWegner) described a Simple Capped Exponential Back-Off for Queues in a 4/27/2012 post:

Recently Steve Marx and I spent a few hours working on a best practices document for Windows Azure. As expected, this was a fun and educational experience: plenty of goofing around, but also some really good discussion on things to think about when building applications for Windows Azure. One of the items we discussed is a better approach for sleeping inside the Worker Role when pulling from queues. Rather than defaulting to a retry every 10 seconds, we decided that the best approach is to exponentially back off on your queue reads while capping the interval with an upper bound.

The primary value of this is to decrease the number of storage transactions when reading from your queue, and therefore reduce both bandwidth and transaction costs.

There are plenty of other good posts on this topic that provide a lot more detailed justification and rationale for this approach:

The logic and approach are deceptively simple, and I thought I'd share a really simple yet effective example. (Incidentally, credit goes to Steve for very quickly putting together the basis of it.)

Here’s the code:

   string queueName = "queuetest";

   int minInterval = 1;
   int interval = minInterval;

   int exponent = 2;
   int maxInterval = 60;

   CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;
   CloudQueueClient queueClient = account.CreateCloudQueueClient();
   CloudQueue queue = queueClient.GetQueueReference(queueName);

   while (true)
   {
      var msg = queue.GetMessage();
      if (msg != null)
      {
         // do something
         interval = minInterval;

         Trace.WriteLine(string.Format("Interval reset to {0} seconds", interval));
      }
      else
      {
         Trace.WriteLine(string.Format("Sleep for {0} seconds", interval));

         // back off before polling again, doubling the wait up to the cap
         Thread.Sleep(TimeSpan.FromSeconds(interval));
         interval = Math.Min(maxInterval, interval * exponent);
      }
   }

As I said, really simple. The magic is in the last line, where we take the smaller of the maximum interval and the product of the interval and the exponent. Once that product grows larger than the maximum interval, the interval is simply pinned at the maximum.
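To see the capping behavior in isolation, the interval progression can be sketched in a few lines of Python (a stand-alone rendering of the same arithmetic, not part of Wade's sample):

```python
def backoff_intervals(min_interval=1, exponent=2, max_interval=60, polls=8):
    """Sleep intervals (seconds) an idle poller uses between empty queue reads."""
    intervals = []
    interval = min_interval
    for _ in range(polls):
        intervals.append(interval)
        # grow geometrically, but never past the cap
        interval = min(max_interval, interval * exponent)
    return intervals

print(backoff_intervals())  # [1, 2, 4, 8, 16, 32, 60, 60]
```

The printed sequence matches the emulator trace: doubling until the 60-second ceiling is reached, then holding steady.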

Here’s the output in the Windows Azure Compute Emulator:

   Sleep for 1 seconds
   Sleep for 2 seconds
   Sleep for 4 seconds
   Sleep for 8 seconds
   Sleep for 16 seconds
   Sleep for 32 seconds
   Sleep for 60 seconds
   Sleep for 60 seconds

Now, the application will continue to sleep until it finds a message in the queue, at which point the interval is reset back to one. To test this I used the Azure Storage Explorer and created a new queue message.


Once the message is created the output is as follows:

   Interval reset to 1 seconds
   Sleep for 1 seconds
   Sleep for 2 seconds
   Sleep for 4 seconds
   Sleep for 8 seconds

And so forth.

You can find all the source code for this sample in my CappedExponentialBackOff repository on GitHub.

Pretty simple but quite useful. I hope this helps!

Avkash Chauhan (@avkashchauhan) described his Cloud Fair 2012 Presentation on Apache Hadoop, Windows Azure and Open Source on 4/25/2012 (missed when published):

Recently I was given an opportunity to talk about “Apache Hadoop on Windows Azure” and “Open Source & Cloud Computing” at Cloud Fair 2012 Seattle. If you wish to get the presentations, please see the info below:

Presentation Topic: Apache Hadoop on Windows Azure

Since Microsoft's adoption of open source Apache Hadoop in its cloud offering and the CTP release of “Apache Hadoop on Windows Azure”, Hadoop has become one of the most exciting and widely adopted technologies for analyzing large amounts of data in very simple ways. The service is offered as an elastic service for both on-premises and public clouds, based on Microsoft and Apache Hadoop technologies. Scale-invariant insight, information processing, and analytics are now available to all participants in the Microsoft ecosystem, cloud and enterprise. Best of all, these capabilities have been bridged into the vibrant domains of Office BI and collaboration, data warehousing, and visualization/reporting.


You will learn how to unlock business insights from all your structured and unstructured data, including large volumes of data not previously activated, with Microsoft's Big Data solution. Using a live demonstration, I will explain how you can use the enterprise-class Hadoop-based solutions designed by Microsoft on both Windows Server and Windows Azure. This talk is developer oriented, and the tutorial includes installation, configuration, and simplified Big Data analysis with JavaScript.

Please download the Presentation from the link below:

Presentation Title: Open Source and Cloud Computing

Targeting the open source developer community will unlock significant growth opportunities for any cloud service vendor; however, for an open source developer it is very hard to decide which cloud option to choose. Cost reduction is one of the main value propositions of both cloud services and open source tools, so combining the two promises a significant reduction in cost; other hidden and unknown costs may, however, appear later. A significant portion of the development community uses the LAMP stack, and when you add Java, JavaScript, Ruby, CGI, Python, Node.js, *SQL, *DB, and Hadoop to the list, you have the majority of the applications currently striving to move to the cloud.
This interactive session targets open source developers, suggesting which cloud services could be the best option and what they should really look for in a cloud platform. Attendees will learn about cloud services, both big and small, that have successfully adopted open source application support in their service offerings. Attendees can use this information to understand what technical limitations exist and how to overcome these hurdles.

Please download the Presentation from the link below:

<Return to section navigation list>

SQL Azure Database, Federations and Reporting

Bud Aaron described How to set up a database-driven Azure site in a 4/26/2012 post to Red Gate Software’s ACloudyPlace blog:

This article is really a review of the alternative ways to get started with setting up a database-driven Azure site. I'll start by using the SQL Azure Management Portal and SQL Server Management Studio. Then I'll use Visual Studio to create a SQL Azure database (Web edition, 1 GB). I'm going to add a single Customer table, and I'm going to see if I can deploy a LightSwitch application using that database.

SQL Azure is NOT SQL Server

Yes, SQL Azure is NOT SQL Server. I've wanted to say that for weeks; now I've said it. I keep hearing from so many sources that SQL Azure is SQL Server in the cloud, and to a degree that is correct. SQL Azure is a relational database system with many similarities to SQL Server; after all, both are Microsoft products. Beyond that, many things take a divergent direction from the developer's perspective.

Taking a broad look, it takes planning to create any application that uses a database to store information. Let's assume that you have elected to use SQL Azure as your database system. Now you need to do two things to get started:

A. Name and generate a database.

B. Build tables for that database.

For me, the choice with SQL Server has always been simple: use SQL Server Management Studio, largely because of its useful built-in wizards. I've never been put in the position of having to use scripts for these tasks, but with SQL Azure and SSMS you are stuck with scripts.

There is a way of avoiding the use of scripts, however, since SQL Azure offers a management tool of its own, much like SSMS, and Visual Studio 11 has some terrific tools available. We're therefore going to explore these options for creating the database and table using the tools that are currently available. I think you'll be pleasantly surprised (or at least surprised), so sit back and enjoy the ride.

Creating a database

Many things about SQL Azure are covered ad nauseam in dozens of places. For my initial foray here I want to talk about two things.

A. Creating a database with the tools we all know and

B. Creating tables in the new database.

The tools we will use are the SQL Azure Management portal, SQL Server Management Studio and Visual Studio 2010. So let’s look first at creating a database.

SQL Azure Management Portal

I’m going to assume that you’ve already created a SQL Azure server and taken note of its name, URL, administrator name and password, because you will need these. Note that you cannot log in to a SQL Azure instance with Windows Authentication. You must use a user name and password. Here’s a sample of what you might see.

When logging in, your user name will take the form (using the information above) of dncadmin@i46it8qgsh. Note that you cannot use admin, administrator or any of several other short, easily-remembered names for your Administrator Login name. Actually, all of the rules that Microsoft employs make perfect sense, because you will certainly want your database to be as secure as possible.

Create a database using the Portal

From the above screen, use the Create button in the ribbon’s Database group to create your SQL Azure database. My suggestion is that you always use the Portal to create your databases. The Create button presents the following screen:

SQL Azure has a concept you won’t see in an on-premises SQL Server instance: size. The size is determined from this screen. The Web Edition offers two sizes:

The business edition started out with 10, 20, 30, 40 and 50 GB sizes. That has now been increased to 7 sizes as shown here:

For this discussion I chose a Web Edition with a size of 1GB.

And now we have a new database. The point is that the SQL Azure Portal makes the creation of a database a simple selection of two options, Edition and Maximum Size, and the entry of a name.

You can do a number of things now such as Test Connectivity.

You can also Create another database, Drop the database or Manage the database. So let’s explore Manage for a minute.

Manage your database using the portal

First, you will need your Administrator Login from the properties and your password to open the Manage operation. Note that this is a Silverlight application. Click the Manage button in the ribbon to get the Logon screen.

SQL Manager fills in the Server and Database information. Note that the User name is the Administrator Login name with @ and the unique part of the server name appended. Once you’re logged on you get a very busy screen with a bunch of selections. We’re going to explore the design part for this discussion, but you can always explore on your own if you like.

In the lower left you will see Overview, Administrator, and Design. For now click the Design button to get the following screen.

And suddenly we’re back to something most of us are familiar with:

You can change the table name and begin designing your new table. Interestingly there are 36 data types, very similar to plain old SQL Server:

bigint, binary, bit, char, date, datetime, datetime2, datetimeoffset, decimal, float, geography, geometry, hierarchyid, image, int, money, nchar, ntext, numeric, nvarchar, nvarchar(max), real, smalldatetime, smallint, smallmoney, sql_variant, sysname, text, time, timestamp, tinyint, uniqueidentifier, varbinary(max), varchar, varchar(max) and xml

Obviously you’ll be familiar with many of these types but we’ll get into the meaning of all of these another time. Here’s a sample Customer table filled in. Again this is all very familiar territory with Wizards used for easing the work.

SQL Server Management Studio

Now let’s explore the very different experience using SQL Server Management Studio. For this example I’m using SQL Server 2012 Management Studio just to ensure we have the latest updates.

Things start out quite normally, except that Windows Authentication is not allowed. The Server Name is the full name from your Azure portal, and the Login name is the Administrator Login name followed by the @ symbol and the unique part of the server name. But before we go any further, let’s explore what we have when logging on to a local SQL Server instance. Right-clicking Databases brings up this dialog with a plethora of options.

Selecting New Database… brings up the following wizard. You set various things in a Wizard fashion and Voila! you have a new database.

On the other hand connecting to a SQL Azure instance using SSMS gives you the following options when right clicking Databases:

So you get an extremely truncated group of options. Making things worse, if you select New Database… you get (whoops!!!) a SQL script window instead of a wizard:

But let’s back up a minute and do some additional things. When you log in to SSMS you need to add some parameters. When a SQL Azure instance is built, that instance includes a MASTER database. When you log on, you need to do so in the master context, so click the Options button on the logon screen. You will get the following:

Select the Connection Properties tab for this:

On the dropdown to the right of Connect to database select <Browse Server…>. After a few seconds you will get this:

Highlight master to get the following screen and also check Encrypt connection. Now click the Connect button to proceed.

Creating a database using SSMS

When you are connected to the master database, you can create new databases on the server and modify or drop existing databases. The steps below describe how to accomplish several common database management tasks through SSMS. They assume that you are connected to the master database with the server-level principal login that you created when you set up your server.

When you right click on databases you get the following script:

You can use the CREATE DATABASE statement to create a new database. The statement below creates a new database named mySSMSTestDB, and specifies that it is a Business Edition database with a maximum size of 20 GB.

CREATE DATABASE mySSMSTestDB (MAXSIZE=20GB, EDITION='business')

You can then use the ALTER DATABASE statement to modify that database, for example if you want to change the name, maximum size, or edition (business or web) of the database. The statement below modifies the database you created in the previous step to change the maximum size to 5 GB and change the edition to web.

ALTER DATABASE mySSMSTestDB MODIFY (MAXSIZE=5GB, EDITION='web')

You could finally use the DROP DATABASE statement to delete the database. The statement below deletes the mySSMSTestDB database.

DROP DATABASE mySSMSTestDB

The master database has the sys.databases view that can be used to view the details of all databases. To view all existing databases, use the following statement.

SELECT * FROM sys.databases

In SQL Azure the USE statement is not supported; you need to establish a connection directly to the relevant database. Also, Transact-SQL statements that create or modify databases must be run within their own batch and cannot be grouped with other statements.
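Because USE is unavailable, the database context has to be set in the connection itself. Here is a minimal ADO.NET sketch of that pattern; the server name, login, and password below are placeholders, not values from the article:

```csharp
using System.Data.SqlClient;

class DirectConnection
{
    static void Main()
    {
        // SQL Azure has no USE statement, so name the target database
        // in the connection string instead of switching context later.
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "yourserver.database.windows.net", // placeholder server
            InitialCatalog = "mySSMSTestDB",                // the database to work in
            UserID = "dncadmin@yourserver",                 // login@server form
            Password = "{YOUR PASSWORD}",
            Encrypt = true // encrypt the connection, as recommended above
        };

        using (var connection = new SqlConnection(builder.ConnectionString))
        {
            connection.Open();
            // Any commands executed here run in the mySSMSTestDB context.
        }
    }
}
```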

Aside from the fact that you create and alter databases using scripts, it’s not too bad, but I personally still prefer the wizards. The problem is that it doesn’t get better.

NOTE: I did notice one oddity. Sometimes ‘Web’ and ‘Business’ work with the quotes and sometimes they only work without them. Your mileage may vary, but if it fails one way, try it the other.

Creating tables using SSMS

If you select the new database and expand it, then right click Tables you will get the following and I’m quitting here. For you dyed-in-the-wool scripting fans this will probably work but I lost it at this point.

Creating databases using Visual Studio 11

My tool of choice when working with local SQL instances has always been SQL Server Management Studio. I generate databases and tables using that tool and then deploy them using Visual Studio. I always knew that VS provided some good database tools – I just didn’t use them. Now, I’m going to go a little crazy. I will stick to one small SQL Azure database (Web edition, 1 GB). I’m going to add a single Customer table and I’m going to see if I can deploy a LightSwitch application using that database. I am NOT going to get fancy. If this works I’ll be proud and we will both have learned something (I hope). To duplicate this you will need the Visual Studio 11 Beta.

Creating the database

I’m going to connect to my SQL Azure SQL instance and create a single database that I’m calling MyLightSwitchDB, so let’s fire up Visual Studio 11 and get started. The interface is likely to surprise you if you haven’t used it before. It comes in two flavors, dark or light. I’m using the light interface. Gone are the pretty icons. Instead most are very austere and functional.

Clicking the icon that looks like an American wall-socket plug with a + sign (3rd from the left) will bring up the Add Connection dialog.

Once filled out just click OK. Because the database does not exist you will get the following:

Again click Yes and the system will create your database in SQL Azure. You can verify this by going to the Azure portal and checking the database section. You should see a new Web edition DB with 1GB size.

In the Visual Studio 11 SERVER EXPLORER, if you right-click the database name you can shift to the SQL SERVER OBJECT EXPLORER, which shows the following view.

At this point you can right click Tables and elect to Add New Table….

Now you have all of the goodness of SQL Server Management Studio with the addition of on-the-fly script generation. Start by changing [Table] in the script view to [Customers]. Also change the primary key name to CustomerID. Truthfully, I struggled for a bit finding where to set the identity specification, but it’s in the PROPERTIES window. That window has little right-pointing arrows in the bar at the left of the window. Clicking that arrow expands the choices thusly:

Setting (Is Identity) to True fills in the Increment and Seed values with suggested values of 1 which is fine here. It also changes the script being generated to an appropriate value.

Now fill in the additional values and click the Update arrow at the top of the screen.

You have the option of generating and executing an update script or simply updating the database. Once you’ve updated the SQL Azure database you can move on by opening a LightSwitch application.

Once the application is created select the Attach to External Data Source option.

Highlight Database as shown and click Next.

You will need to fill in the Server name and credentials. The Connect to a database dropdown will at that point offer MyLightSwitchDB as a selection. On clicking OK you will get a new dialog that will take a little time to populate.

Once populated, make the selections shown above and click Finish. You should see the following screen indicating that things have gone as expected.

With the problems I’ve had over the last few weeks with both Azure and LightSwitch this exercise has made me feel very good about future prospects. At this point I would suggest making a Detail Screen and an Editable Grid Screen so you can explore LightSwitch connected to SQL Azure in more detail.

This article has been mostly about the creation of SQL Azure databases and the generation of tables using several available tools. At this point using the Azure Management Portal probably offers the best options overall. Visual Studio 11 has made some large strides into the management of SQL Azure as we’ve shown above. My favorite tool to this point, SQL Server Management Studio, can be used but the Wizards are not available at this point even with the SQL 2012 version. It’s possible this will change but for me it has now become a choice of the Azure portal or Visual Studio 11.

Hope this lengthy, detailed post makes up for the recent lack of SQL Azure content.

Full disclosure: I’m a paid contributor to Red Gate Software’s ACloudyPlace blog.

<Return to section navigation list>

MarketPlace DataMarket, Social Analytics, Big Data and OData

Lori MacVittie (@lmacvittie) claimed to be “Bridging the Gap between Big Data and Business Agility” in her When Big Data Meets Cloud Meets Infrastructure post of 5/2/2012 to F5’s DevCentral blog:

I’m a huge fan of context-aware networking. You know, the ability to interpret requests in the context they were made – examining user identity, location, client device along with network condition and server/application status. It’s what imbues the application delivery tier with the agility necessary to make decisions that mitigate operational risk (security, availability, performance) in real-time.


In the past, almost all context could be deduced from the transport (connection) and application layer. The application delivery tier couldn’t necessarily “reach out” and take advantage of the vast amount of data “out there” that provides more insight into the conversation being initiated by a user. Much of this data falls into the realm of “big data” – untold amounts of information collected by this site and that site that offer valuable nuggets of information about any given interaction.

Because of its expanded computing power and capacity, cloud can store information about user preferences, which can enable product or service customization. The context-driven variability provided via cloud allows businesses to offer users personal experiences that adapt to subtle changes in user-defined context, allowing for a more user-centric experience.

-- “The power of cloud”, IBM Global Business Services

All this big data is a gold mine – but only if you can take advantage of it. For infrastructure and specifically application delivery systems that means somehow being able to access data relevant to an individual user from a variety of sources and applying some operational logic to determine, say, level of access or permission to interact with a service.

It’s collaboration. It’s integration. It’s an ecosystem.

It’s enabling context-aware networking in a new way. It’s really about being able to consume big data via an API that’s relevant to the task at hand. If you’re trying to determine if a request is coming from a legitimate user or a node in a known botnet, you can do that. If you want to understand what the current security posture of your public-facing web applications might be, you can do that. If you want to verify that your application delivery controller is configured optimally and is up to date with the latest software, you can do that.

What’s more important, however, is perhaps that such a system is a foundation for integrating services that reside in the cloud where petabytes of pertinent data has already been collected, analyzed, and categorized for consumption. Reputation, health, location. These are characteristics that barely scratch the surface of the kind of information that is available through services today that can dramatically improve the operational posture of the entire data center.

Imagine, too, if you could centralize the acquisition of that data and feed it to every application without substantially modifying the application? What if you could build an architecture that enables collaboration between the application delivery tier and application infrastructure in a service-focused way? One that enables every application to enquire as to the location or reputation or personal preferences of a user – stored “out there, in the cloud” – and use that information to make decisions about what components or data the application includes? Knowing a user prefers Apple or Microsoft products, for example, would allow an application to tailor data or integrate ads or other functionality specifically targeted for that user, that fits the user’s preferences. This user-centric data is out there, waiting to be used to enable a more personal experience. An application delivery tier-based architecture in which such data is aggregated and shared to all applications shortens the development life-cycle for such personally-tailored application features and ensures consistency across the entire application portfolio.

It is these kinds of capabilities that drive the integration of big data with infrastructure. First as a means to provide better control and flexibility in real-time over access to corporate resources by employees and consumers alike, and with an eye toward future capabilities that focus on collaboration inside the data center better enabling a more personal, tailored experience for all users.

It’s a common refrain across the industry that network infrastructure needs to be smarter, make more intelligent decisions, and leverage available information to do it. But actually integrating that data in a way that makes it possible for organizations to codify operational logic is something that’s rarely seen.

Until now.

My (@rogerjenn) free US Air Carrier Flight Delays, Monthly data set remains in Preview limbo since 4/27/2012 in the Widows Azure Marketplace DataMarket:

You probably won’t be able to open this image from the preceding link because of its status:


The public Marketplace DataMarket data set is identical to the on-premises version I created with Microsoft Codenames “Data Hub” and “Data Transfer,” as described in my Creating a Private Data Marketplace with Microsoft Codename “Data Hub” of 4/27/2012.

My (@rogerjenn) Creating a Private Data Marketplace with Microsoft Codename “Data Hub” of 4/27/2012 begins:

• Updated 4/30/2012 with link to OakLeaf’s new US Air Carrier Flight Delays, Monthly (free) data set on the public Windows Azure Marketplace DataMarket.


SQL Server Labs describes their recent Codename “Data Hub” Consumer Technical Preview (CTP) as “An Online Service for Data Discovery, Distribution and Curation.” At its heart, “Data Hub” is a private version of the public Windows Azure Marketplace DataMarket that runs as a Windows Azure service. The publishing process is almost identical to the public version, except for usage charges and payment transfers. “Data Hub” enables data users and developers, as well as DBAs, to:

  • Make data in SQL Azure discoverable and accessible in OData (AtomPub) format by an organization’s employees
  • Enable data analysts and business managers to view and manipulate data from the Marketplace with Service Explorer, Excel, and Excel PowerPivot
  • Publish datasets for further curation and collaboration with other users in the organization
  • Federate data from the Windows Azure Marketplace DataMarket for the organization’s employees (in addition to the organization’s uploaded data)

The initial CTP supports the preceding features but is limited to SQL Azure as a data source and OData (AtomPub) as the distribution format. Microsoft is considering other data source and distribution formats.

Glenn Gailey (@ggailey777) continued his OData on Windows 8: Part 2—The OData Client Library for Metro series on 5/27/2012:

In the previous post Running WCF Data Services on Windows 8 Consumer Preview - Part 1, I described running WCF Data Services on the Windows 8 desktop…but what I have really been wanting to do is write a Win8 Metro style app that consumes an OData feed. I started messing around with the Windows 8 Metro quickstart that consumes Atom feeds (just like OData, right?), but I really didn’t want to have to parse XML on the client, and it took a bit of time for me to really “grok” the new Metro templates in Visual Studio 2012.

I was glad to see that in the meantime Phani Raj has gone ahead and completed the exact same OData-based Windows 8 Metro app that I was planning, which accesses the Netflix OData feed and displays titles data grouped by genre. (Great minds think alike—but some just code faster…way to go Phani!) You can see the details of his new app in the post Developing Windows 8 Metro style applications that consume OData.

The best news is that rather than having to parse raw Atom XML, this app uses a preview release of the OData client library for Windows 8 Metro style apps, which Phani has uploaded as a .zip in this post. The library still only speaks Atom, but it works very much like the WCF Data Services 5.0 client library for Silverlight—with only the async APIs included.

Please leave any comments about his app or the new OData library on Phani’s blog post.

My (@rogerjenn) Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData updated 4/24/2012 begins:

Or even MediumData, for that matter:

Neither the Codename “Data Transfer” utility nor Codename “Data Hub” application CTPs would load 500,000 rows of a simple Excel worksheet saved as a 17-MB *.csv file to an SQL Azure table.

imageThe “Data Transfer” utility’s Choose a File to Transfer page states: “We support uploading of files smaller than 200 MB, right now,” but neither preview publishes a row count limit that I can find. “Data Hub” uses “Data Transfer” to upload data, so the maximum file size would apply to it, too.

Both Windows Azure SaaS offerings handled a 100-row subset with aplomb, so the issue appears to be row count, not data structure.

Update 4/24/2012 8:15 AM PDT: A member of the Microsoft Data Hub/Transfer team advised that the erroneous row count and random upload failure problems I reported for Codename “Data Transfer” were known issues and the team was working on them. I was unable to upload the ~500,000-row files with Codename “Data Hub”; see the added “Results with Codename “Data Hub” Uploads” section at the end of the post.

Update 4/23/2012 10:00 AM PDT: Two members of Microsoft Data Hub/Transfer team reported that they could upload the large test file successfully. Added “Computer/Internet Connection Details” section below. Completed tests to determine maximum file size I can upload. The My Data page showed anomalous results but only the 200k row test actually failed on 4/23. See the Subsequent Events section.

<Return to section navigation list>

Windows Azure Service Bus, Access Control, Identity and Workflow

Haishi Bai (@HaishiBai2010) described Writing fault-resilient Service Bus client code to handle transient faults in a 4/30/2012 post:

Transient faults are exceptions that your clients may receive because of temporary conditions on the services you are trying to invoke, or because of temporary network errors. When you retry the failed operation, chances are everything will work correctly. When programming against certain Azure services -- including SQL Azure, Service Bus, Storage Service, and Caching Service – you need to be ready to handle transient errors, no matter how innocent and harmless the method seems to be. For example, NamespaceManager.QueueExists() may throw exceptions because of transient errors.

Because the method is usually used in role initialization code, the error may cause your web role or worker role to fail to start. Another example is that BrokeredMessage.Complete() may fail due to transient errors as well. When you have multiple worker roles competing for jobs, this error may cause a job to be processed multiple times. Of course, most transient faults don’t cause much of a problem, and if you ignore them your clients will probably work fine most of the time – most of the time. And years of experience with software tell us that when it fails, it almost always fails at the most inconvenient time.

Microsoft’s Patterns & Practices team released the Transient Fault Handling Framework to help Azure service consumers incorporate retry logic into their client code. The idea is simple – the framework automatically retries operations based on pre-defined error-detection strategies and retry policies. You can find guidance, the NuGet package, and the source code of the framework from the link at the beginning of this paragraph.
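As a rough sketch of what using the framework looks like for the QueueExists() case mentioned earlier (the type, strategy, and namespace names below follow the framework’s samples; verify them against the version you actually download, and treat the queue name as a placeholder):

```csharp
using System;
using Microsoft.Practices.TransientFaultHandling; // namespace per the framework's samples
using Microsoft.ServiceBus;

class RetryDemo
{
    static void Main()
    {
        var namespaceManager = new NamespaceManager(
            ServiceBusEnvironment.CreateServiceUri("sb", "{SB NAMESPACE}", string.Empty),
            TokenProvider.CreateSharedSecretTokenProvider("owner", "{SECRET KEY}"));

        // Retry up to 5 times, waiting 1 second at first and adding 2 seconds
        // per attempt, but only for faults the Service Bus strategy deems transient.
        var retryPolicy = new RetryPolicy<ServiceBusTransientErrorDetectionStrategy>(
            new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2)));

        // The policy re-runs the delegate whenever a transient fault is detected.
        bool exists = retryPolicy.ExecuteAction(() => namespaceManager.QueueExists("jobs"));
        Console.WriteLine("Queue exists: {0}", exists);
    }
}
```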

The library is nice, but there is still quite a bit of code you need to write. What I’d like to have is something like ReliableSqlConnection, which gives you retry logic while letting you write code just as before. I’ve created an open-source library that provides implementations such as ReliableQueueClient so you can enjoy the benefits of retry logic without writing any retry code yourself. The following code shows how to connect to a Service Bus queue and send a message using the library:

ReliableQueueClient client = new ReliableQueueClient("{SB NAMESPACE}", 
     TokenProvider.CreateSharedSecretTokenProvider("owner", "{SECRET KEY}"));
client.Send("some message");
As you can see, you can easily new-up a client and send/receive messages. Behind the scenes the library takes care of quite a few details for you:
  • Manages messaging entity lifetimes. For instance, it reconstructs BrokeredMessage instances between retries. It also tries to reuse Service Bus resources by keeping objects such as NamespaceManager around.
  • Invokes the Transient Fault Handling Framework.
  • Provides sample implementations of both synchronous and asynchronous calls.

When this post was written, the project was still a proof of concept. Check the GitHub repository for updates or to participate.

<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

Steve Plank (@plankytronixx) posted Video: Windows Azure Connect: a real-world solution on 4/30/2012:

imageK3 Retail are the UK’s biggest Microsoft Dynamics partner. They’ve built a system that allows them to expose on-premises applications and data such as Dynamics, SQL Server Reporting Services, Email and File-Servers to a cloud-based IIS web application.

imageThis video show how they are doing this in the real-world; not in a lab or controlled environment.

Bruno Terkaly (@brunoterkaly) described Networking in the Cloud–Understanding Windows Azure Traffic Manager on 4/26/2012:

Exercise 1: Windows Azure Traffic Manager
It manages traffic. What kind of traffic? The Traffic Manager manages incoming traffic to your web roles that are hosted in Windows Azure Data Centers.

Exercise 1: Task 1 - Understanding High Level Concepts
Before diving into any how-to's, let's discuss the basics.
  1. You may have Azure instances of your application in multiple data centers throughout the world and you want to manage traffic among all your running instances
  2. Traffic Manager works by applying an intelligent policy engine to the Domain Name Service (DNS) queries on your domain name(s).
  3. Traffic Manager relies on the concept of "Policies"
  4. You will typically define multiple policies to manage your incoming traffic to your running instances of your applications
  5. These policies dictate which hosted service receives the request
  6. Traffic Manager provides many capabilities.
    • It provides a responsive customer experience
    • It ensures higher availability
      • You can define how failover takes place.
      • If traffic is sent to a primary service and, if this service goes offline, traffic is routed to the next available service in a list
  7. Each Policy gets:
    • A "DNS" name
    • A list of your Azure hosted instances
    • User-defined criteria (the criteria determine how incoming traffic is routed and managed)

Exercise 1: Task 2 - How Traffic Manager Works
The following is a quick walkthrough of what happens when an incoming request hits your company URL (company domain w/IP address):
  1. Customers will go to your company domain.
  2. You will use Traffic Manager to capture any traffic to your company domain.
  3. This traffic will be re-directed to Traffic Manager, which is hosted in MS data centers.
    • Specifically, the traffic will travel to the Traffic Manager Domain.
  4. The policy engine now takes over to re-route traffic.
  5. You will specify some load balancing rules in your policies that dictate how traffic is routed
  6. The user's DNS resolver will use the IP address provided by Traffic Manager.
  7. The user will now use the returned IP address to access your hosted services
    • The user may be re-directed to a running instance in another data center

Exercise 1: Task 3 - How Traffic Manager will route a request
Traffic Manager maintains a network performance table.

You may have a global application that scales across multiple data centers, and Traffic Manager understands the best-performing endpoints in terms of response time.

  1. Traffic Manager monitors your Azure application instances
    • It executes periodic HTTP GET requests to the endpoint you include in the policies that you define
      • It considers the service to be available if its monitoring endpoint responds with an HTTP status code of 200 OK within 5 seconds
  2. Traffic Manager maintains a network performance table that it updates periodically and contains the round trip time between various IP addresses around the world and each Windows Azure data center
    • It forwards requests to the closest hosted service in terms of its network latency
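The monitoring rule above (a GET that must return 200 OK within 5 seconds) is easy to reproduce against your own endpoint before handing it to Traffic Manager. A minimal sketch, with a placeholder URL standing in for your monitoring endpoint:

```csharp
using System;
using System.Net;

class ProbeCheck
{
    static void Main()
    {
        // Mimic Traffic Manager's health probe: GET the monitoring endpoint
        // and require an HTTP 200 response within 5 seconds.
        var request = (HttpWebRequest)WebRequest.Create(
            "http://yourservice.cloudapp.net/Probe.aspx"); // placeholder endpoint
        request.Timeout = 5000; // milliseconds

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine(response.StatusCode == HttpStatusCode.OK
                    ? "Considered available"
                    : "Considered unavailable: " + (int)response.StatusCode);
            }
        }
        catch (WebException) // timeouts and 4xx/5xx responses both land here
        {
            Console.WriteLine("Considered unavailable");
        }
    }
}
```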

Exercise 2: Getting started – How to implement Traffic Manager
A Traffic Manager policy can maximize performance by forwarding traffic to the hosted service that offers the best performance for any given client. But there is more to it: there are also issues of failover and of routing with a round-robin approach.
Exercise 2: Task 1 - Lab = Windows Azure Traffic Manager
The best way to learn about Traffic Manager is to work through the lab in the Windows Azure Platform Training Kit. I’m not going to do the lab for you here. I will just point out some key points to prepare you.
The Windows Azure Portal is a great place to start implementing Traffic Manager
  1. You will be allowed to choose a load balancing method. There are 3 options at the portal.
    • Performance – fastest response time
    • Failover
      • When using a failover policy, if the primary hosted service is offline, traffic is sent to the next one in a sequence defined by the policy.
    • Round Robin
      • The round robin load balancing method distributes load evenly among each of the hosted services assigned to the policy. It keeps track of the last hosted service that received traffic and sends traffic to the next one in the list of hosted services.
      • Please note the various parameters specified in a Traffic Manager Policy
  2. At the command line, you can use nslookup [domain name] to look up the address chosen by Traffic Manager in response to an incoming service request
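The round-robin method described above boils down to remembering the last hosted service that received traffic and handing the next request to the following one in the list. Conceptually it looks like this (an illustration of the idea only, not Traffic Manager's actual implementation; the service names are made up):

```csharp
using System;

class RoundRobinPolicy
{
    private readonly string[] hostedServices;
    private int last = -1;

    public RoundRobinPolicy(string[] hostedServices)
    {
        this.hostedServices = hostedServices;
    }

    // Return the next hosted service in the list, wrapping around at the end.
    public string Next()
    {
        last = (last + 1) % hostedServices.Length;
        return hostedServices[last];
    }

    static void Main()
    {
        var policy = new RoundRobinPolicy(
            new[] { "myapp-us.cloudapp.net", "myapp-eu.cloudapp.net" });

        Console.WriteLine(policy.Next()); // myapp-us.cloudapp.net
        Console.WriteLine(policy.Next()); // myapp-eu.cloudapp.net
        Console.WriteLine(policy.Next()); // myapp-us.cloudapp.net again
    }
}
```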


Windows Azure Traffic Manager is a load balancing solution that enables the distribution of incoming traffic among different hosted services in your Windows Azure subscription, regardless of their physical location. Traffic routing occurs as the result of policies that you define, based on one of the criteria listed below.

In a nutshell, understanding the way Traffic Manager routes requests is the key. We learned how Traffic Manager addresses network load balancing in terms of:

  1. Performance
    • Traffic is forwarded to the closest hosted service in terms of network latency
  2. Round Robin
    • Traffic is distributed equally across all hosted services
  3. Failover
    • Traffic is sent to a primary service and, if this service goes offline, to the next available service in a list
MSDN - Windows Azure Traffic Manager
Overview of Windows Azure Traffic Manager
Features at the Windows Azure Web Site

<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Brian Goldfarb (@bgoldy) posted Announcing Native Windows Azure Libraries and Special Free Pricing using Twilio for Windows Azure Customers to the Windows Azure Team blog on 5/2/2012:

imageOur friends over at Twilio have been working to make it easier for developers to integrate text message and phone services into applications hosted on Windows Azure using native libraries for Java, PHP and .NET. To sweeten the pot, Twilio and Windows Azure have teamed up to offer 1000 free text messages or inbound voice minutes when you activate your new Twilio account.

imageGet started today with step-by-step tutorials on how to integrate Twilio services in your application (.NET, PHP, Java) from the Windows Azure Developer Center and take advantage of the free offer from Twilio and Windows Azure.

Sending text messages from Windows Azure has never been so easy. For example, with .NET:

//grab the library for .NET using NuGet

PM> Install-Package Twilio

It can be this easy:

// put your account info
string accountSID = "your_twilio_account";
string authToken = "your_twilio_authentication_token";
// grab your account
TwilioRestClient client = new TwilioRestClient(accountSID, authToken);
Twilio.Account account = client.GetAccount();
// send an SMS message
SMSMessage result = client.SendSmsMessage(
    "+14155992671", "+12069419717", "Windows Azure and Twilio ROCK!");
// catch the error
if (result.RestException != null)
{
    string message = result.RestException.Message;
}

Sign up for 1,000 free text messages or incoming voice minutes today!

Mary Jo Foley (@maryjofoley) provided more background on the Twilio agreement in a 5/2/2012 post to ZD Net’s All About Microsoft blog:

It’s no secret Microsoft has been courting startups like mad to get them on board with Microsoft technologies. Sometimes, those efforts pay off.

In Twilio’s case, a handshake at the South by Southwest Conference in March led to a full-on partnership with the Windows Azure cloud team, the first fruits of which were announced on May 2.

Twilio, which calls itself a “cloud communications company,” offers RESTful programming interfaces for telephony and text messaging to devs who want to build them into applications. GroupMe, another startup which was acquired by Microsoft’s Skype division last year, was built on top of Twilio’s APIs.

As of today, Microsoft and Twilio are offering Azure developers 1,000 free text messages or inbound voice minutes when they activate their new Twilio accounts. The idea is devs will be able to integrate text and phone services into applications that are hosted on Windows Azure by using Twilio’s helper libraries available for Java, PHP, C#/.Net and node.js, according to Twilio officials.

Jon Plax, Twilio Director of Product Management, said he was surprised that a handshake with the Azure team back in March led to such a quick onboarding by Microsoft. He said Microsoft officials said they had heard some of their developers were interested in the ability to more easily incorporate text messages into their apps. Microsoft and Twilio worked together on the code as well as the technical documentation for Twilio on Azure, which is available in the Azure developer center, Plax said.

It’s worth noting that Twilio’s back-end infrastructure is still hosted on Amazon Web Services and that isn’t changing as a result of the partnership with Microsoft.

“We have a friendly relationship with a variety of cloud vendors,” Plax said.

Matias Woloski (@woloski) answered Why node.js on Windows? (for Microsoft centric devs) on 5/1/2012:

Yesterday, on an internal discussion list, someone said:

as a long-time .NET developer I don’t see the allure of node. We already have lots of framework support for scalability/async as well as very good tooling and library support. My sense is that node.js has a lot of catch-up to do

Here is my answer:

I don’t think node.js is only about the non-blocking capabilities. That’s what made it popular and novel in the beginning. However, there are many other reasons why you would want to use node.js:

  • Community/open source: The node.js open source community is vibrant. NuGet, as of today, has 5,751 packages; Node has 9,450. It’s amazing the things you can reuse, and the quality is really good. Just to pick an example: we developed MarkdownR, a collaborative markdown editor, by mashing up things like ShareJS, showdown, ace and Azure storage. ShareJS implements OT (Operational Transformation), the same principle used in Google Wave to merge operations from multiple concurrent clients. It builds on top of browserchannel (which is kind of the SignalR equivalent in node). With this I want to illustrate the kind of things (high-level building blocks) that you can rely on.

  • One language to rule them all: It’s JavaScript all over. This could be good or bad for some people, but arguably there are many devs out there who love JavaScript. And if you mix it with mongo, you have the full stack. If you want to take it further, I recommend you look at this:

  • Multi-device/multi-platform: It runs on Windows and Linux, and the benchmarks are pretty good. This makes it a great choice for developing software that runs on cloud and on-premise, on any device (ARM/x86/x64) and platform (even on a $35 computer, the Raspberry Pi). For instance, I am currently developing a logging/auditing server on node and mongo. I can run it on Azure and provide the “Enterprise” version running on the customer premises (or even the customer’s cloud), and it’s the same code. I could even set up a server with the same code in a different cloud (AWS, Heroku, nodejitsu, etc.) within minutes. More flexibility.

  • Developer friendly: For development, you can use Mac, Linux or Windows with a simple text editor like Sublime Text 2, or go with WebMatrix to get IntelliSense. All free.

  • NoSQL: If you are interested in NoSQL, you have tons of options and drivers for node.js.

  • Mix and match: And to make it even more interesting, you can run node.js as part of an ASP.NET solution with iisnode (mix and match). So it’s not an all-or-nothing decision.

Then Glenn Block jumped into the conversation and added:

  • Lightweight: It is also extremely lightweight. In node you start with a single exe on Windows, or command on Mac. It’s hard to really grok that until you actually use it.

  • Designed for real time: Aside from that another very interesting aspect is that node was designed specifically with real time web apps in mind i.e. WebSockets etc. There are some very compelling modules for building realtime applications which is one of the sweet spots for the community around node.

And last but not least, Glenn puts emphasis on:

  • Microsoft is not selling node: Microsoft is not trying to tell .net devs to use node, we are trying to make Azure a more open place.

I also recommend reading this post by Rick, who also comes from a Microsoft perspective and compares the processing model of IIS with the event loop in node.js:

Steve Plank (@plankytronixx) posted Article: I’ve built a SaaS solution on Windows Azure: now, how do I know what to charge my customers? on 5/1/2012:

When we move from writing shrink-wrapped software that a customer installs with a setup.exe in their own datacenter to the completely different world of providing the software’s functionality as a service over the Internet, probably our first port of call is to base it on a cloud platform like Microsoft, Google, Amazon, Rackspace and so on.

That’s an architectural decision, mostly technical. The commercial question is: “how do we charge for our service?”. Think about the models that are out there – Office 365’s model, for example, where each individual customer is charged a per-user, per-month fee. That seems to be the most common. But what about others? An upfront cost for the establishment of the service, then a monthly charge no matter how many users are on the system; or a direct charge to each customer for the resources they use each month, such as database, storage and bandwidth (yes – this results in a variable monthly charge, but it does have the advantage of closely reflecting your cost of providing the service in the first place). I heard an interesting one the other day from a logistics company who charge per delivery. In other words, a “per-transaction” cost. The customer would probably be happy with that – as they are more successful, they make more transactions. As business shrinks, the transaction count shrinks, but so do the costs – there is a direct relationship between the success of the business and the costs of doing business.

All these models are great, but at the end of the day – you, as the service provider, have to still be able to make money while providing the service. Price your service too low and you lose money, price it too high and your competitors eat in to your customer base.

What you really need to know is how much each of your customers is costing you to service: how much database, storage and bandwidth they use, and what their resource usage is like. From this you can calculate a reasonable price that still allows you to make a profit.
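That calculation can be sketched in a few lines. All unit rates and usage figures below are made up for illustration; they are not actual Windows Azure prices:

```csharp
using System;

class TenantPricing
{
    // Compute what one tenant costs you for the month, given metered usage
    // and hypothetical unit rates your cloud provider charges you.
    static decimal MonthlyCost(decimal storageGb, decimal bandwidthGb, decimal databaseGb)
    {
        const decimal storageRate = 0.14m;    // per GB of storage (illustrative)
        const decimal bandwidthRate = 0.12m;  // per GB of bandwidth (illustrative)
        const decimal databaseRate = 9.99m;   // per GB of database (illustrative)
        return storageGb * storageRate
             + bandwidthGb * bandwidthRate
             + databaseGb * databaseRate;
    }

    static void Main()
    {
        decimal cost = MonthlyCost(12.5m, 30m, 2m);   // metered usage for one tenant
        decimal price = cost * 1.4m;                  // price at cost plus a 40% margin
        Console.WriteLine("Cost: {0:F2}  Price: {1:F2}", cost, price);
    }
}
```

Once metering (for example, via the Cloud Ninja Metering Block mentioned below) supplies the per-tenant usage numbers, a model like this lets you test different margins and charging schemes against real data.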

This is where a new Codeplex project can be useful. The Cloud Ninja Metering Block is an extensible and reusable software component designed to assist software developers with the metering of tenant resource usage in a multi-tenant solution on the Windows Azure platform.

There’s even a demo site you can have a look at which shows a collection of Windows Azure services and what resources they are actually consuming:


Maybe you’d like to try this out for yourself. If you have released a Beta service on Windows Azure, you could use this project to monitor usage and get a reasonable idea of what your pricing model should be before you go into a full commercial release. From then on, knowing exactly who is using what resources will allow you to fine-tune your pricing.

Himanshu Singh (@himanshuks) posted Real World Windows Azure: Interview with Soluto Founder Tomer Dvir to the Windows Azure Blog on 5/1/2012:

As part of the Real World Windows Azure series, I connected with Soluto Founder Tomer Dvir about how Windows Azure helped their hosted server environment scale to millions of downloads after the solution won Best in Show at TechCrunch Disrupt. Read Soluto’s success story here. Here’s what he had to say:

Himanshu Kumar Singh: Tell me about the idea behind Soluto.

Tomer Dvir: Co-founder Ishay Green and I have been programming software since before we were teenagers. In 2008, we started a business and built Soluto, which helps PC users manage the Windows-based PCs of their friends and family over the web. As technology becomes ubiquitous, we want to help people do more and enjoy their devices. And we wanted to do it in Tel Aviv, where we could be at the center of all that innovation.

HKS: When did you launch Soluto?

TD: We publicly launched Soluto in May 2010 at the inaugural TechCrunch Disrupt, a worldwide competition for IT startup companies. Everyone at TechCrunch was talking about Facebook and Twitter applications, and we showed up with PC stuff. But the idea that Soluto could help make people happier with their technology really resonated. Soluto won Best of Show, and suddenly we were an international thought leader, with media coverage in global outlets such as The New York Times, CNN, and the BBC.

HKS: How did winning the award affect your business?

TD: In the month after TechCrunch Disrupt, almost 1 million people downloaded Soluto. As people use Soluto, it collects information about the PCs they work on and then analyzes and presents that data for use in managing other PCs, which requires a lot of computing capacity. We supported Soluto with a hosted environment running SQL Server and Amazon Web Services, but the system failed under the sudden demand. Our server environment couldn't scale up fast enough. We needed a better, more flexible solution, but we still wanted to avoid the risk of a big IT investment.

HKS: What did you do?

TD: We wanted to power the application with cloud technology. We expected that transferring a live application to the cloud would be challenging, and quickly determined we could support Soluto most effectively with Windows Azure. Because we could utilize platform services and not just infrastructure (like storage), and also work with familiar tools such as .NET and Microsoft Visual Studio, we knew that we could get to market much faster with Windows Azure than we would with Amazon or other cloud services.

HKS: How are you using Windows Azure?

TD: Soluto users connect to Windows Azure through a web browser and use Soluto to see what has been going not-so-well on a PC; it lets people do simple, safe things to make things work better and do more with their devices. Each PC added to the service starts sending events about things that are not working well, and also about what’s available on that PC. Using the data from all agents to calculate statistics and detect patterns, Soluto can help people make better decisions by showing data in aggregate.

HKS: What can users do with Soluto?

TD: Soluto users can remotely understand PC problems, add updates, or install applications, often in a few mouse-clicks and with no action required by the user. From any location, I can update my mother’s Skype, turn on her firewall, or install Dropbox on her PC. If her PC is off, the command will be stored on Windows Azure and executed when she turns her machine on.

Soluto uses Windows Azure Storage to save PC data, deliver it to users, and process it with data from other machines. Soluto users generate tens of millions of data transactions every day, making Soluto one of the largest consumers of Windows Azure resources among startups worldwide.

HKS: What are some of the benefits that you’re seeing from using Windows Azure?

TD: Platform-as-a-service features (such as Table storage and Compute) allowed us to quickly develop components that are easily scalable, without thinking about the back-end “pipes”. Demand can spike by as much as 30 million transactions per day, but the team can quickly and easily scale Windows Azure to avoid any break in service.

After migrating to Windows Azure, we grew rapidly. By the beginning of 2012, our application had 3 million downloads. With Windows Azure, we have complete elasticity and endless scalability and we’re ready to serve any peak in consumer demand.

HKS: How do you think Windows Azure compares to other cloud services?

TD: Platform-as-a-service is the number one benefit. Windows Azure is not just infrastructure to run an operating system – it provides services that just work. We don’t have to think about how those services run in the background (even though we were curious enough to learn – we’re geeks after all…). Also, we didn’t have to learn new development tools—or build a server infrastructure—so we could stay focused on the unique value we offer.

HKS: What’s next for Soluto?

TD: We’re still perfecting the user experience and adding more value, in order to bring more happiness to more people. But with Windows Azure, we have the capacity and flexibility to serve our global user base and maintain our high-profile industry position in the meantime. Using Windows Azure, we have the capacity to process hundreds of terabytes of data. That releases our growth potential and gives us the agility to watch the market evolve, learn from our customers, enhance our products, and develop a profitable model.

Read how others are using Windows Azure.

Vikas Bhatia posted Announcing Casablanca, a Native Library to Access the Cloud From C++ to the Windows Azure blog on 4/30/2012:

Today we are announcing Casablanca, a Microsoft incubation effort to support cloud based client-server communication in native code using a modern asynchronous C++ API design.

Casablanca is a project to start exploring how to best support C++ developers who want to take advantage of the radical shift in software architecture that Windows Azure represents.

Here’s what you get with Casablanca:

  • Support for writing native-code REST services for Windows Azure, including Visual Studio integration
  • Convenient libraries for accessing Windows Azure blob and queue storage from native clients as a first class Platform-as-a-Service (PaaS) feature
  • Support for accessing REST services from native code on Windows Vista, Windows 7 and Windows 8 Consumer Preview by providing asynchronous C++ bindings to HTTP, JSON, and URIs
  • A Visual Studio extension SDK to help you write C++ HTTP client side code in your Windows 8 Metro style app
  • A consistent and powerful model for composing asynchronous operations based on C++ 11 features
  • A C++ implementation of the Erlang actor-based programming model
  • A set of samples and documentation

We have released Casablanca on DevLabs to get feedback from you on what you think your needs are and how we can improve. Please use the forums to give us feedback.

For additional details

Liam Cavanagh (@liamca) continued his What I Learned Building a Startup on Microsoft Cloud Services: Part 12 – Your Customers are my Customers series on 4/27/2012:

I am the founder of a startup called Cotega and also a Microsoft employee within the SQL Azure group where I work as a Program Manager. This is a series of posts where I talk about my experience building a startup outside of Microsoft. I do my best to take my Microsoft hat off and tell both the good parts and the bad parts I experienced using Azure.

Over the years I have constantly been amazed at how customers end up using products I work on. No matter how much thought and planning has been put into a product, it seems you can never fully appreciate how customers will use it until they actually start using it. I suppose that is a key reason for building a “Minimum Viable Product” and getting your product out the door as soon as possible, so that you can get this feedback and iterate quickly.

One example of this became very clear early on in the Cotega beta. Initially, I thought that charting of logged data would be really useful for DBAs to visualize the trends that were happening over time within their database. This was a key reason for adding the feature. After talking with various people, I started to realize that although charting was in fact useful to DBAs, what many of them really wanted was to take these charts and embed them in their own web sites so that their customers could see the health of the system. In a way, their customers would be my customers. This was really a surprise to me and something I had never thought of. As it turned out, it was very easy to implement because it was just a matter of creating a new MVC page that accepted the name of the chart and the user name used to build the chart. Then I could use either <OBJECT> or <IFRAME> tags to embed that page. Here is an example that shows a historical look at how long it takes to connect to my SQL Azure database. The numbers to the right indicate how many milliseconds it took to complete the connection.
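The embed URL construction described above can be sketched as a small helper. The method name and base URL below are illustrative, not Cotega's actual code:

```csharp
using System;

class ChartEmbed
{
    // Builds an <iframe> embed tag pointing at a chart page that takes
    // the chart name and user name as query-string parameters.
    static string BuildEmbedTag(string chartPageUrl, string chartName, string userName)
    {
        // Escape user-supplied values so spaces and symbols survive the URL.
        string src = string.Format("{0}?chartName={1}&userName={2}",
            chartPageUrl,
            Uri.EscapeDataString(chartName),
            Uri.EscapeDataString(userName));

        return string.Format(
            "<iframe src=\"{0}\" frameborder=\"0\" scrolling=\"no\" width=\"500\" height=\"400\"></iframe>",
            src);
    }

    static void Main()
    {
        Console.WriteLine(BuildEmbedTag("http://example.com/Chart", "Customer Database", "liam"));
    }
}
```

The MVC page behind that URL would look up the logged data for the given user and chart and render it, so any site that pastes the tag gets a live view.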

This chart will be loaded dynamically each time you load this page. Here is the code that I used to embed this chart. Notice that I used an iframe, because WordPress does not work well with the Object tag.

<iframe src=" Customer Database&amp;userName=liam" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="500" height="400"></iframe>

Cloud Competitive Advantage

As it turned out, this ended up being a real competitive advantage for me over traditional on-premises monitoring solutions. Since Cotega is hosted completely in the Azure environment, it is very easy for me to make these charts available to be embedded in customers’ web sites. If Cotega were an on-premises system, this would have been much more difficult due to firewall and other issues. Also, if the charting were only used by DBAs, this capability to embed charts and data would not be nearly as critical.

Protovis Charting

While I am on the subject, you might be interested to learn how I chose to implement the charting. In the early stages of the beta, I used a charting control from Infragistics. This control was great and very easy to use (although certainly not cheap). In the end, I decided to use a charting control from Protovis because it has the ability to copy and paste charts directly from the Dashboard so that they can be used in something like Excel or Word. If you are using Internet Explorer, give this a try by right-clicking on the above chart, copying it, and pasting it into a Word document. Very cool, right? The other nice part about Protovis is that it is easy and free.

<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

Beth Massi (@bethmassi) posted LightSwitch Community & Content Rollup–April 2012 on 5/1/2012:

Last Fall I started posting a rollup of interesting community happenings, content, samples and extensions popping up around Visual Studio LightSwitch. If you missed those rollups, you can check them all out here: LightSwitch Community & Content Rollups

Content around Visual Studio 11 Beta continued to roll out in April, following its release back on February 29th. If you haven’t done so already, I encourage you to give it a spin by downloading Visual Studio 11 Beta. Also make sure to check out these LightSwitch Beta resources and community sites:

Have an idea?: LightSwitch UserVoice site
Need to report a bug?: LightSwitch Connect site
Have a question?: LightSwitch Beta Forum
Need to learn what’s new?: LightSwitch Developer Center

LightSwitch in Visual Studio Beta Resources

If you haven’t noticed, the LightSwitch team has been releasing a lot of good content around the next version of LightSwitch in Visual Studio 11. Check out the LightSwitch Developer Center for a list of key Beta resources to explore. Here’s some of our more popular content:

Companion Client Examples

One of the biggest features of LightSwitch in Visual Studio 11 is the Open Data Protocol (OData) support. Not only can you consume OData services in LightSwitch, the middle-tier services are also now exposed as OData service endpoints reachable by other clients. In April, the community posted many more examples of alternative clients that you can build against your LightSwitch services. I’d like to particularly thank Michael Washington, who has been on fire this month posting most of these. Awesome!

Excel PowerPivot:
Creating and Consuming LightSwitch OData Services

Windows 8:
Using LightSwitch OData Services in a Windows 8 Metro Style Application

Windows Phone:
Consume a LightSwitch OData Service from a Windows Phone application

A Full CRUD DataJs and KnockoutJs LightSwitch Example Using Only An .Html Page

JQuery Mobile:
A Full CRUD LightSwitch JQuery Mobile Application

Communicating With LightSwitch Using Android App Inventor

Unity 3D:
Using Visual Studio LightSwitch To Orchestrate A Unity 3D Game

MSDN Magazine Column: Leading LightSwitch (April 2012)

One of our other LightSwitch community rock stars started a new LightSwitch column in MSDN Magazine in March. Jan van der Haegen continues his journey into the depths of LightSwitch with his second article in the April issue:

Leading LightSwitch: The LightSwitch MVVM Model
In this month’s Leading LightSwitch column, Jan explains MV3, a variation of the MVVM application architecture used for LightSwitch apps that is even more powerful than the original application architecture.

New Visual Studio LightSwitch Books Released

Paul Ferrill released his new book in the beginning of April on building SharePoint Apps with LightSwitch. It sounds like this is a short book for the LightSwitch beginner looking to connect to SharePoint data.

Also released was Pro Visual Studio LightSwitch 2011 Development by Tim Leung and Yann Duran. You may have seen these guys helping answer questions in the LightSwitch forums and they really know their stuff.

More LightSwitch books to check out:

Xpert 360’s Dynamics CRM Online Adapter LightSwitch Extension Released

In April Xpert 360 released a FREE LightSwitch extension that allows you to connect to Dynamics CRM. Looks like they are also working on one for connecting to Salesforce. Check out these resources to get started:

Other Commercial LightSwitch Extensions from our Partners

Many of our Visual Studio partners who build LightSwitch extensions posted to the #LightSwitch Twitter feed this month to remind us of these great products, some with new features recently added.

MSDN Webcast: What's New with LightSwitch in Visual Studio 11 (Level 200)

Join me Friday, May 11th at 1:00 PM PST for What's New with LightSwitch in Visual Studio 11.

Microsoft Visual Studio LightSwitch is the simplest way to build business applications and data services for the desktop and the cloud. LightSwitch contains several new features and enhanced capabilities in Visual Studio 11. In this demonstration-heavy webcast, we walk through the major new features, such as creating and consuming OData services, new controls and formatting, new features with the security system and deployment, and much more.

Register here.

Notable Content this Month

Extensions (see all 85 of them here!):

In addition to the above, we had a few more extensions from the community released this month.

Samples (see all 80 of them here):

Team Articles:

Lots more articles from the team this month on all the new features in LightSwitch in Visual Studio 11.

We also had a couple top requested How To posts that apply to all versions of LightSwitch:

Community Content:

LightSwitch Team Community Sites

Become a fan of Visual Studio LightSwitch on Facebook. Have fun and interact with us on our wall. Check out the cool stories and resources. Here are some other places you can find the LightSwitch team:
LightSwitch MSDN Forums
LightSwitch Developer Center
LightSwitch Team Blog
LightSwitch on Twitter (@VSLightSwitch, #VisualStudio #LightSwitch)

Eric Erhardt described Updating Records in LightSwitch using Stored Procedures in a 4/30/2012 post to the Visual Studio LightSwitch blog:

Since stored procedures provide benefits over directly using TSQL commands, a lot of database administrators like to completely hide database tables from applications. Instead, they will expose a View to read data from the table, and then create stored procedures to allow inserting, updating and deleting records in the table.

The problem is, Visual Studio LightSwitch doesn’t understand how to call stored procedures to insert, update and delete records out of the box. Thus, any database that only allows stored procedures to update records seems to be unusable in LightSwitch.

However, LightSwitch can still support these databases, but it just needs a little more work to get it hooked up. I’ll show you a simple way to manipulate these records in LightSwitch.

General Approach

To work with stored procedures in our data model, we are going to use one of my favorite features in LightSwitch: attaching to a custom WCF RIA Service. The reason this is one of my favorite features is that it provides the flexibility you need to bring any data into your application. If you have data stored in a text file, a spreadsheet, out on the web, or wherever, you can write a WCF RIA Service to expose that data, and LightSwitch will seamlessly work with it.

Overall, the approach will be:

  • Create an Entity Framework .edmx model that can write using the database’s stored procedures
  • Create a WCF RIA Service that talks to the Entity Framework model
  • Import the WCF RIA Service into LightSwitch and use it as a normal data source

The stand alone VS LightSwitch 2011 doesn’t have an Entity Framework designer in it. In order to build the Entity Framework .edmx model, you will need one of the following:

I will use the standard Northwind database for this article. However, I will add 3 standard stored procedures to it: InsertCustomer, UpdateCustomer, and DeleteCustomer. These will be simple stored procedures that call INSERT, UPDATE, and DELETE TSQL commands respectively. See the appendix at the end of this article or the MSDN Code Gallery: Updating Records in LightSwitch using Stored Procedures sample for this article for the stored procedures’ definitions.

Entity Framework Model

First start by creating a new Class Library in Visual Studio (not the stand alone Visual Studio LightSwitch, but Visual Studio Pro or Visual Studio Express), name it “NorthwindService”. In the Solution Explorer, right-click the “NorthwindService” project and select “Add” –> “New Item…”. Select “ADO.NET Entity Data Model” and name it “Northwind.edmx”.


Select “Generate from database” and enter the connection information to your Northwind database and click “Next”. Expand the “Tables” node and select “Customers”. (Note: you could have selected a view if the table isn’t exposed.) Also expand the “Stored Procedures” node and select “InsertCustomer”, “UpdateCustomer”, and “DeleteCustomer”. (Note these stored procedures don’t exist in a default Northwind database. They need to be added. See the gallery sample for the commands to create them.)


Click Finish. This generates an entity model that contains the Customer entity. We want to insert, update and delete through our stored procedures, so we need to tell the Entity Framework how to do that. Right-click the Customer entity and select “Stored Procedure Mapping”. At the bottom of the screen, you will see a window with three grid rows:


Click on each of these dropdowns and select the corresponding stored procedure for each action. The Entity Framework will automatically hook up the parameter mapping if the names match. If the names don’t match, you can easily map each property to the corresponding parameter. When you are finished, your mapping screen should look like this:


You now have an Entity Framework data model that will insert, update and delete records using your stored procedures.
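With the mapping in place, ordinary Entity Framework code now goes through the stored procedures. A minimal sketch, assuming the generated NorthwindEntities context and an existing Customer row (the customer ID and contact name are illustrative):

```csharp
using System.Linq;

class StoredProcedureMappingDemo
{
    static void Main()
    {
        using (var context = new NorthwindEntities())
        {
            // An ordinary-looking update...
            Customer customer = context.Customers.First(c => c.CustomerID == "ALFKI");
            customer.ContactName = "Maria Anders";

            // ...but SaveChanges now executes the mapped UpdateCustomer
            // stored procedure instead of emitting a direct UPDATE statement.
            context.SaveChanges();
        }
    }
}
```

Nothing in the calling code mentions the stored procedures; the mapping in the .edmx is what redirects the insert, update and delete commands.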

Creating the WCF RIA Service

The next thing we need is a way for LightSwitch to consume this Entity Framework model. That is where WCF RIA Services fits into the picture. There are two ways to create a WCF RIA Service. If you have a Visual Studio Professional installation, you can right-click the project in the Solution Explorer, select “Add” –> “New Item…”, and choose “Web” –> “Domain Service Class”.


Name the file “NorthwindService.cs” (or .vb) and click “Add”. In the “Add New Domain Service Class” dialog, accept the defaults and click OK.

If you are working with Visual Studio Express, you can do all the steps the above Add New Item template does by hand:

  • Add References to the following .NET assemblies:
    • System.ComponentModel.DataAnnotations
    • System.ServiceModel.DomainServices.EntityFramework (Look in %ProgramFiles(x86)%\Microsoft SDKs\RIA Services\v1.0\Libraries\Server if it isn’t under the .Net tab)
    • System.ServiceModel.DomainServices.Server (Look in %ProgramFiles(x86)%\Microsoft SDKs\RIA Services\v1.0\Libraries\Server if it isn’t under the .Net tab)
  • Create a new Code File by “Add” –> “New Item…” and select “Code File” named “NorthwindService”

No matter which route you take, copy and paste the following code into your NorthwindService.cs file:


using System.Data;
using System.Linq;
using System.ServiceModel.DomainServices.EntityFramework;
using System.ServiceModel.DomainServices.Server;

namespace NorthwindService
{
    public class NorthwindService : LinqToEntitiesDomainService<NorthwindEntities>
    {
        [Query(IsDefault = true)]
        public IQueryable<Customer> GetCustomers()
        {
            return this.ObjectContext.Customers;
        }

        public void InsertCustomer(Customer customer)
        {
            if (customer.EntityState != EntityState.Detached)
            {
                this.ObjectContext.ObjectStateManager.ChangeObjectState(customer, EntityState.Added);
            }
            else
            {
                this.ObjectContext.Customers.AddObject(customer);
            }
        }

        public void UpdateCustomer(Customer currentCustomer)
        {
            this.ObjectContext.Customers.AttachAsModified(currentCustomer, this.ChangeSet.GetOriginal(currentCustomer));
        }

        public void DeleteCustomer(Customer customer)
        {
            if (customer.EntityState != EntityState.Detached)
            {
                this.ObjectContext.ObjectStateManager.ChangeObjectState(customer, EntityState.Deleted);
            }
            else
            {
                this.ObjectContext.Customers.Attach(customer);
                this.ObjectContext.Customers.DeleteObject(customer);
            }
        }
    }
}

Imports System.ComponentModel.DataAnnotations
Imports System.Data
Imports System.ServiceModel.DomainServices.EntityFramework
Imports System.ServiceModel.DomainServices.Server

Public Class NorthwindService
    Inherits LinqToEntitiesDomainService(Of NorthwindEntities)

    <Query(IsDefault:=True)>
    Public Function GetCustomers() As IQueryable(Of Customer)
        Return Me.ObjectContext.Customers
    End Function

    Public Sub InsertCustomer(ByVal customer As Customer)
        If customer.EntityState <> EntityState.Detached Then
            Me.ObjectContext.ObjectStateManager.ChangeObjectState(customer, EntityState.Added)
        Else
            Me.ObjectContext.Customers.AddObject(customer)
        End If
    End Sub

    Public Sub UpdateCustomer(ByVal currentCustomer As Customer)
        Me.ObjectContext.Customers.AttachAsModified(currentCustomer, Me.ChangeSet.GetOriginal(currentCustomer))
    End Sub

    Public Sub DeleteCustomer(ByVal customer As Customer)
        If customer.EntityState <> EntityState.Detached Then
            Me.ObjectContext.ObjectStateManager.ChangeObjectState(customer, EntityState.Deleted)
        Else
            Me.ObjectContext.Customers.Attach(customer)
            Me.ObjectContext.Customers.DeleteObject(customer)
        End If
    End Sub
End Class

You now are ready to import this WCF RIA Service into a LightSwitch application.

LightSwitch Application

Open Visual Studio LightSwitch and create a new LightSwitch application. Name it “Northwind”. Make sure the solution is being shown by opening Tools –> Options; under “Projects and Solutions” ensure that “Always show solution” is checked. Then, in the Solution Explorer, right-click on the solution and select “Add –> Existing Project”. Navigate to the Class Library you created above. (The Open File Dialog may filter only *.lsproj files. Either type the full path into the dialog, or navigate to the folder, type * in the File name text box, and press “Enter” to select the NorthwindService.csproj/vbproj project.)

Now click the “Attach to external Data Source” link in LightSwitch and select “WCF RIA Service”.


Click Next. Since the new LightSwitch application doesn’t have any references, no classes are shown. Click the “Add Reference” button and select the “Projects” tab and double-click the NorthwindService project.


Select the “NorthwindService.NorthwindService” class and click Next.


Select all the entity check boxes and click “Finish”.


One last thing needs to happen in order for you to use this data source. You need to copy the “NorthwindEntities” connection string from the NorthwindService class library’s App.Config into your application’s web.config. Open the App.config file and copy the line starting with “<add name=”NorthwindEntities”…”. Paste this line under the <connectionStrings> node in the LightSwitch web.config. To open the LightSwitch web.config:

Switch your Solution Explorer to “File View”.


Then click on the “Show All Files” tool bar button.


Under the “ServerGenerated” project (or just “Server” project if you are using Visual Studio 11) you will find the “Web.config” file.
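Once pasted, the entry in the LightSwitch web.config will look roughly like the following. The metadata resource names, server, and connection details shown here are placeholders illustrating a typical Entity Framework connection string; use the exact line copied from your own App.config:

<connectionStrings>
  <add name="NorthwindEntities"
       connectionString="metadata=res://*/Northwind.csdl|res://*/Northwind.ssdl|res://*/Northwind.msl;provider=System.Data.SqlClient;provider connection string=&quot;Data Source=.\SQLEXPRESS;Initial Catalog=Northwind;Integrated Security=True&quot;"
       providerName="System.Data.EntityClient" />
</connectionStrings>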

That’s it. Now you have a fully functioning data source that can insert, update and delete records using stored procedures. You can create your LightSwitch screens and business logic as usual, and quickly and easily make a functioning application using stored procedures.

I have uploaded a sample application in both C# and VB to the MSDN Code Gallery at


p.s. For more information on what the Entity Framework can do with views and stored procedures, see Julie Lerman’s excellent article: Stored Procedures in the Entity Framework

Stored Procedures Used in This Article
CREATE PROCEDURE [dbo].[InsertCustomer]
    @CustomerID nchar(5),
    @CompanyName nvarchar(40),
    @ContactName nvarchar(30),
    @ContactTitle nvarchar(30),
    @Address nvarchar(60),
    @City nvarchar(15),
    @Region nvarchar(15),
    @PostalCode nvarchar(10),
    @Country nvarchar(15),
    @Phone nvarchar(24),
    @Fax nvarchar(24)
AS
INSERT INTO [Northwind].[dbo].[Customers]
           ([CustomerID], [CompanyName], [ContactName], [ContactTitle], [Address],
            [City], [Region], [PostalCode], [Country], [Phone], [Fax])
     VALUES
           (@CustomerID, @CompanyName, @ContactName, @ContactTitle, @Address,
            @City, @Region, @PostalCode, @Country, @Phone, @Fax)

CREATE PROCEDURE [dbo].[UpdateCustomer]
    @CustomerID nchar(5),
    @CompanyName nvarchar(40),
    @ContactName nvarchar(30),
    @ContactTitle nvarchar(30),
    @Address nvarchar(60),
    @City nvarchar(15),
    @Region nvarchar(15),
    @PostalCode nvarchar(10),
    @Country nvarchar(15),
    @Phone nvarchar(24),
    @Fax nvarchar(24)
AS
UPDATE [Northwind].[dbo].[Customers]
   SET [CompanyName] = @CompanyName
      ,[ContactName] = @ContactName
      ,[ContactTitle] = @ContactTitle
      ,[Address] = @Address
      ,[City] = @City
      ,[Region] = @Region
      ,[PostalCode] = @PostalCode
      ,[Country] = @Country
      ,[Phone] = @Phone
      ,[Fax] = @Fax
 WHERE CustomerID = @CustomerID

CREATE PROCEDURE [dbo].[DeleteCustomer]
    @CustomerID nchar(5)
AS
DELETE FROM [Northwind].[dbo].[Customers]
 WHERE CustomerID = @CustomerID

Matt Sampson continued his OData series with OData Apps in LightSwitch, Part 3 on 4/30/2012:

Welcome back! In this post we are going to wrap up the OData application we started back in Part 1 and continued in Part 2.

To refresh everyone on how we got here, we started with the idea that we would use some new features in LightSwitch for Visual Studio 11 (Beta) that would allow us to attach to an OData service. Specifically we are attaching to the Commuter API OData service.

We wanted to solve 4 basic problems:

  1. Where is my stop? (we solved this in Part 1 and 2 with a Bing maps extension)
  2. When is my train arriving? (we solved this in Part 1)
  3. How do I get there? (we solved this in Part 1 and 2 with Route information and maps)
  4. How many escalators will be broken today? (we saved this problem for Part 3)

To find a solution to this last problem we are going to pull in data from the “Incidents” entity in the OData service. This entity contains all the information regarding broken escalators and things of that nature.

Retrieving “Incidents” data

If we open up our OData app and look at the Incidents entity in the Entity Designer, it should look like this:


Note: If you don’t have the Incidents entity in your entity designer, then right click on the TransitDataData data source and select “Update data source”. Then you will be able to import the Incidents entity.

You can see we don’t have any relationships on this entity. There are indeed relationships defined on this entity in the OData service, but they are all many-to-many relationships, and LightSwitch does not provide support for many-to-many relationships out of the box.

That’s too bad, but it’s not a deal breaker. I would really like to relate the Incident data with our Stops data so that on our Stops List and Detail screen we could show all the incidents related to that stop (for example, the Metro Center stop has a broken escalator, and trains are all delayed because of snow).

Fortunately for us we can still make this work and pull in information for the Incidents entity and relate it to a specific Stop by creating a Custom RIA Data source. In general, Custom RIA Data Sources are beneficial for a wide array of problems, like aggregating data. Our custom RIA data source is going to enable us to create our own query to pull in the Incidents data from the Transit Data OData service and relate it with a corresponding Stop.

Custom RIA Data Source – Creation

Let’s get started on making our basic Custom RIA data source.

First of all, make sure, in Visual Studio, under Tools –> Options –> Projects and Solutions that you have the “Always show solution” checkbox checked.

Now, right click on your solution and select Add –> New Project.

Let’s just add a “Class Library” (I’m using a C# Class Library because my LightSwitch project is a C# project). Call the new project “RIAService” and click OK.

In our new project, double click on the “Properties” icon and change the Target Framework to .NET Framework 4.0. This will be important later on when we try to attach to our Custom RIA data source (I’ll explain when we get there).

Now right click our new project and select Add –> New Item. Select Visual C# Items –> Web –> Domain Service Class. Call it TransitData and hit OK. The only reason why I chose this file type is because it automatically adds a bunch of project references that we are going to need (you certainly could have just selected a basic Class file and added the references yourself). We still need to add one more reference to our project. So right click the References node and select Add Reference. Add a reference to System.Data.Services.Client.dll.

Your list of References should look like this:


Now we need to add a service reference to the Transit Data OData service. Right click the References node again and select “Add Service Reference” and use this OData service - (this is of course the same one that we attached to with our LightSwitch application in Part 1). Call the namespace TransitDataServiceReference. It should look something like this -

Once you hit OK it will attach to that OData service and generate code for the corresponding entities in the service. You’ll see in a minute how we have an Incidents entity available to us in our custom RIA data service project. It’s important to understand that we are in no way using any code from our LightSwitch project here, nor am I referring to the Incidents entity that is available in our LightSwitch project under the TransitDataData source. The Transit Data service that we just attached to, and the Incidents entity that we have generated code for, are specific to this RIA project.

At this point we have a basic custom RIA data service. We just have to add a bit of code at this point before we can add the RIA service to LightSwitch as a data source.

Custom RIA Data Source – Adding Code

Let’s add some code now to the TransitData.cs file. Copy and paste the below code into your file:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.ServiceModel.DomainServices.Hosting;
using System.ServiceModel.DomainServices.Server;
using RIAService.TransitDataServiceReference;

namespace RIAService
{
    public class TransitDomainService : DomainService
    {
        [Query(IsDefault = true)]
        public IQueryable<Incident> GetIncidents()
        {
            return null;
        }

        [Query]
        public IQueryable<Incident> GetIncidentsByStop(string stopId)
        {
            TransitData myTransitData = new TransitData(new Uri(""));
            return myTransitData.Stops.Where(s => s.StopId == stopId).SelectMany(s => s.Incidents);
        }
    }
}

namespace RIAService.TransitDataServiceReference
{
    [MetadataType(typeof(Metadata))]
    public partial class Incident
    {
        public class Metadata
        {
            [Key]
            public string IncidentId { get; set; }
        }
    }
}

That’s it for code that we’ll need to write. Now let’s go over it:
Note how we have a “using RIAService.TransitDataServiceReference” statement at the top. When we added the service reference, all of the generated code was placed under the TransitDataServiceReference namespace. This allows us to directly reference the Incident entity and the TransitData class.

For a custom RIA data service we need to have a default query type which is what the GetIncidents() method is. This query would typically be the “All” query or the query that returns all of the records for the entity. For our purposes we don’t need to do anything more than just return null here.

The GetIncidentsByStop() method we define as a “Query” as well. This means that when we attach to this RIA Data service through LightSwitch that this query will show up in the Query designer. This is really the query that’s going to do all our work. The code in this method is:

  1. Creating a new instance of the TransitData data service (for which you have to pass in the Uri of the actual OData service)
  2. Querying on all of the “Stops” and retrieving all the Incidents that have a “StopId” corresponding to the “stopId” that was passed into the method. (For example, if we pass in the Metro Center stop it will retrieve all the Incidents that are related to that stop).
  3. Those incidents related to the stopId are then returned

At the bottom of the code snippet you’ll notice that we have a separate class. The Incidents entity that was generated does not have a primary key defined on it. We are going to need one before we try to attach to this RIA data service with LightSwitch. So we need to specify which field is the primary key, which is all we are doing here.

Extending upon the “Incident” class that was generated we define an inner class called Metadata. Then we specify that the IncidentId property is the primary key by giving it the “Key” attribute.

More information on partial classes: Partial Classes and Methods (C# Programming Guide) and Partial (Visual Basic)

More information on creating a metadata class for RIA services: How to Add Metadata Classes

More information on the “Key” attribute: A guide through WCF RIA Services attributes

Add Our RIA Data Source

Go ahead and build the RIAService project. Remember we made our RIA project target the .NET 4.0 Framework? That was important; otherwise you would get an error when trying to attach to it here, since the LightSwitch project also targets the .NET 4.0 Framework.

Open up the LightSwitch project, and let’s add this data source to it, like so:


If the RIA service doesn’t show up here, then select Add Reference –> Solution –> Projects and select the RIAService project.


Click Next, select all the entities and queries (there should only be the Incidents entity and GetIncidentsByStop query).

We should now have the data source, entity and query in our designer:


Modify our Stops List and Detail Screen

We’ll need to modify our Stops List and Detail screen now to include the Incidents information that we want (my screen is specifically called StopsDCMetroListDetail).

Double click on our screen to bring it up in the screen designer. With the screen designer open, select “Add Data Item…” and select the GetIncidentsByStop query like so:


Hit OK and it will be added to the screen designer.

Now drag and drop the data for GetIncidentsByStop onto our screen so that it looks like this:


Now we’ll need to bind our stopId query parameter to whatever Stop we currently have selected on our screen.

So, double click on the stopId query parameter, and then in the “Properties” window bind it to StopsDCMetro.SelectedItem.StopId so that it looks like this:


This effectively says that the “stopId” query parameter is going to be equal to whatever Stop we currently have selected on our screen.

And that is it! Save it, and re-build the solution.

F5 IT!

F5 it and open up the Stops DC Metro List and Detail screen. You should see a list of Incidents beneath the Details for the Stop. So it will look something like this now:


You can see in this instance that not even the Pentagon is immune to the ubiquitous broken escalator problem.

Wrapping it Up

To summarize what we’ve done here, I showed you how to create a RIA Service wrapper around a data source that has many-to-many relationships. We specifically utilized this RIA Service to relate data between the Stops and the Incidents entities (so that we can tell how many escalators will be busted at our Metro stop).

That pretty much does it for our OData sample app. I will be publishing all this code (C# and VB.NET) out on the MSDN Code Gallery and will update the blog as soon as I have that done.

Please let me know if you have any questions or requests for future topics.

Beth Massi (@bethmassi) described Using the Save and Query Pipeline to Flag and Filter Data with LightSwitch in Visual Studio 11 on 4/26/2012 (missed when published):

Note: This information applies to LightSwitch in Visual Studio 11 (LightSwitch V2)

In business applications sometimes we need to flag records with additional attributes in response to business rules and then consistently filter those flagged records in some way or another throughout the application. For instance we may have critical or historical data that must always be stored in our databases and never deleted. I used to work in the health care industry and it was important to keep historical patient data. Therefore we would not allow a user to delete a patient’s hospital visit information from the system. Over time, however, this causes our data sets to get very large and distracts users from finding the relevant information they need. Hence, we need a way to flag these records and then filter this data out of our result sets across the entire application.

Last year I wrote an article based on LightSwitch in Visual Studio 2010 (LightSwitch V1) showing how you can use the Save & Query pipeline to archive and filter records instead – effectively marking them deleted but not actually deleting them from the system: Using the Save and Query Pipeline to “Archive” Deleted Records. With LightSwitch in Visual Studio 11 (LightSwitch V2) the filtering mechanism has been improved using the new entity set filters.

In this post I’ll show you how to use the new entity set filter methods to apply global filtering of records. But first, let’s recap how we can flag these records using the save pipeline.

Tapping into the Save Pipeline

The save pipeline runs in the middle tier anytime an entity is being updated, inserted or deleted. This is where you can write business logic that runs as changes are processed on the middle tier and saved to data storage. Let’s take an example where we don’t want to ever physically delete customers from the system. Here’s our data model for this example. Notice that I’ve created a required field called “IsDeleted” on Customer of type Boolean. I’ve unchecked “Display by Default” in the properties window so that the field isn’t visible on any screens.


In order to mark the IsDeleted field programmatically when a user attempts to delete a customer, just select the Customer entity in the data designer and drop down the “Write Code” button and select Customers_Deleting method.


Here are the 2 lines of code we need to write:

Private Sub Customers_Deleting(entity As Customer)
    'First discard the changes, in this case this reverts the deletion
    entity.Details.DiscardChanges()
    'Next, change the IsDeleted flag to "True"
    entity.IsDeleted = True
End Sub

Notice that first we must call DiscardChanges in order to revert the entity back to its unchanged state. Then we simply set the IsDeleted field to True, which puts the entity into a changed state. So the appropriate save pipeline methods (Customers_Updating & Customers_Updated) will now run automatically. You can check the state of an entity by using the entity.Details.EntityState property.

This save pipeline code remains unchanged from LightSwitch V1. However in LightSwitch V2, we need to filter the data differently. Let’s see how.

Using Entity Set Filters

Now that we’re successfully marking deleted customers, the next thing to do is filter them out of our queries so that they don’t display to the user on any of the screens they work with. In LightSwitch V2 a new query interception method, EntitySet_Filter, has been added which allows you to specify a filter that is applied whenever an entity set is referenced. (See the recent article on the LightSwitch Team Blog by Michael Simons: Filtering Data using Entity Set Filters for details on this new feature.)

So instead of using the Customers_All_PreprocessQuery like we had to in LightSwitch V1 as demonstrated here, we can apply the IsDeleted filter in the Customers_Filter method. Using the entity set Filter method is preferred over the All_PreprocessQuery method because the Filter method will execute any time the entity set is referenced, even when the entity has relationships and is not the direct target of a query. This enables true row level filtering across the application.

So for this example, select the Customer entity in the data designer and drop down the “Write Code” button and select Customers_Filter method.


Here we need to write a filter predicate as a lambda expression.

Private Sub Customers_Filter(ByRef filter As Expressions.Expression(Of Func(Of Customer, Boolean)))
    filter = Function(c) c.IsDeleted = False
End Sub

These expressions can seem a little tricky at first but they just take some practice. You can think of these as in-line functions. This one takes a customer parameter and then you return a Boolean expression used to filter the records. You can filter on anything you want, even filter based on security permissions like Michael showed in his blog post. (For more information on lambda expressions see: Lambda Expressions (Visual Basic) and Lambda Expressions (C#).)

Now that we’ve got our Save and Query pipelines set up to handle the “deleted” Customers, we can run the application. When a user deletes a Customer, the behavior is exactly the same as if the Customer was physically deleted from the database.


And if we peek inside the actual Customer table in the database you can see them flagged correctly.


Also, now when we write a query where Customers is not the direct target, the row filtering will still apply. For instance, we have a parent table called Regions. If we wanted to create a query “Regions Over 10 Customers” we would write our query like so:

Private Sub RegionsOver10Customers_PreprocessQuery(ByRef query As IQueryable(Of Region))
    query = From r In query
            Where r.Customers.Count > 10
End Sub

In LightSwitch V1 the additional filter in the Customers_All_PreProcessQuery would not be called when we executed this query. It would include the count of the Customers flagged IsDeleted – which isn’t what we want. You would have to include that filter in this query as well. So if you had a lot of queries in your application that indirectly reference the Customer, then this could get difficult to maintain. In LightSwitch V2 the Customers_Filter method is always called anytime the Customer entity set is referenced, regardless if it is the direct target of a query or not, so this query would always return the expected results.

Because we want to encourage using this approach to achieve true row level filtering, for new LightSwitch V2 projects there is no entry point for EntitySet_All_PreProcessQuery methods anymore. With upgraded projects from V1 they will still run the same as before and you will see them as modeled queries in the Solution Explorer. So there are no breaking changes to your application, but you will want to strongly consider using the entity set Filter methods instead.

Wrap Up

I hope this post has demonstrated how powerful and simple LightSwitch can be when working with data in the save and query pipelines. We’ve beefed up the global filtering technique so that it can handle scenarios like I showed above as well as row level security scenarios Michael demonstrated. If you were using the _All_PreProcessQuery methods, then they will continue to run when you upgrade to Visual Studio 11. However you should be using the entity set Filter methods instead for applying global filters that will run no matter how the entities are accessed.

Return to section navigation list>

Windows Azure Infrastructure and DevOps

David Linthicum (@DavidLinthicum) asserted “For a successful cloud implementation, IT must prepare ahead of time and not merely adjust after the fact” in a deck for his To win with cloud computing, change IT first article of 5/1/2012 for InfoWorld’s Cloud Computing blog:

Most organizations consider the move to cloud-based platforms as simple additions to the existing portfolio of IT systems. However, if internal IT does not change around the usage of most cloud services, enterprise IT won't get the full benefits. Indeed, many initial uses of cloud computing resources in such organizations will end up in failure. The dirty little secret is that most of the change to IT needs to occur before the first implementation to make cloud computing holistically successful.

But most IT organizations are not wired that way. IT doesn't like to prepare ahead of time; it would much rather react. Thus, when cloud storage is adopted, for example, enterprise IT spends the months after the migration trying to adjust internal systems to take full advantage of the new cloud storage resources. The result is less than optimal, resulting in another silo with patchwork links to other core enterprise systems.

To avoid this fate, follow these two core rules:

First, any addition or replacement of an enterprise IT resource (storage, compute, and so on) requires enough preparation so that IT resources can become at least 90 percent productive after a shift to a cloud service, whether IaaS, PaaS, or SaaS. This means you've done enough configuration and development so that existing core business systems can exploit the value of this cloud computing service right away. This is a tactical move, but essential nonetheless.

Second, there needs to be an ongoing holistic enterprise architecture program that considers the use of cloud-based services. The use of cloud computing needs to exist in a much larger plan, not tacked on or added in isolation. But most enterprise architecture departments may not have the control required to truly drive the systemic and longer-term changes that will make cloud computing effective. This is a strategic move, but few organizations acknowledge or enable it.

You'd think these suggestions are common sense, but experience shows they are not. The reality of corporate politics and budget cycles means that these rules, even if known, typically fall by the wayside. That needs to change if we're going to make the cloud work.

Wely Lau (@Wely_Live) posted An Introduction to Windows Azure (Part 1) to his blog on 4/30/2012:

This post was also published at A Cloud[y] Place blog.

Windows Azure is the Microsoft cloud computing platform which enables developers to quickly develop, deploy, and manage their applications hosted in a Microsoft data center. As a PaaS provider, Windows Azure not only takes care of the infrastructure, but will also help to manage higher level components including operating systems, runtimes, and middleware.

This article will begin by looking at the Windows Azure data centers and will then walk through each of the available services provided by Windows Azure.

Windows Azure Data Centers

Map showing global location of datacenters

Slide 17 of WindowsAzureOverview.pptx (Windows Azure Platform Training Kit)

Microsoft has invested heavily in Windows Azure over the past few years. Six data centers across three continents have been developed to serve millions of customers. They have been built with an optimized power efficiency mechanism, self-cooling containers, and hardware homogeneity, which differentiates them from other data centers.

The data centers are located in the following cities:

  • US North Central – Chicago, IL
  • US South Central – San Antonio, TX
  • West Europe – Amsterdam
  • North Europe – Dublin
  • East Asia – Hong Kong
  • South-East Asia – Singapore

Windows Azure Datacenters- aerial and internal views

Windows Azure data centers are vast and intricately sophisticated.

Images courtesy of Microsoft

Windows Azure Services

Having seen the data centers, let’s move on to discuss the various services provided by Windows Azure.

Microsoft has previously categorized the Windows Azure Platform into three main components: Windows Azure, SQL Azure, and Windows Azure AppFabric. However, with the recent launch of the Metro-style Windows Azure portal, there are some slight changes to the branding, but the functionality has remained similar. The following diagram illustrates the complete suite of Windows Azure services available today.

The complete suite of Windows Azure services available today

The complete suite of Windows Azure services available today

A. Core Services
1. Compute

The Compute service refers to computation power, usually in the form of provisioned Virtual Machines (VMs). In Windows Azure, the compute containers are often referred to as ‘roles’. At the moment, there are three types of roles:

(i) Web Roles

Web Roles offer a predefined environment, set-up to allow developers to easily deploy web applications. Web server IIS (Internet Information Services) has been preinstalled and preconfigured to readily host your web application.

(ii) Worker Roles

Worker Roles allow the developer to run an application’s background processes that do not require user interface interaction. Worker Roles are perfectly suitable to run processes such as scheduled batch jobs, asynchronous processing, and number crunching jobs.

(iii) VM Roles

VM Roles enable developers to bring their customized Windows Server 2008 R2 VM to the cloud, and configure it. VM Roles are suitable for cases where the prerequisite software requires lengthy, manual installation.

Using VM Roles has one substantial drawback. Unlike Web Roles and Worker Roles, whereby Windows Azure will automatically manage the OS, VM Roles require developers to actively manage the OS.

Apart from ‘roles’, there are two other essential terms, namely ‘VM Size’ and ‘Instance’.

  • VM Size denotes the predefined specifications that Windows Azure offers for the provisioned VM. The following diagram shows various Windows Azure VM Sizes.

Various Windows Azure VM Sizes, and the associated costs

Slide 21 of WindowsAzureOverview.pptx (Windows Azure Platform Training Kit)

  • Instance refers to the actual VM that is provisioned. Developers will need to specify how many instances they need after selecting the VM Size.

Screenshot showing VM size
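Behind the portal UI, the instance count lives in the service configuration (.cscfg) file. A schematic fragment is shown below; the service and role names are illustrative:

<ServiceConfiguration serviceName="MyCloudService"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <!-- Two instances of this role will be provisioned -->
    <Instances count="2" />
  </Role>
</ServiceConfiguration>

Because the instance count is plain configuration, it can be changed at runtime without redeploying the application.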

2. Storage

Windows Azure Storage is a scalable and highly available cloud storage service.

The first step in using Windows Azure Storage is to create a storage account by specifying storage account name and the region:

Screenshot- creating a storage account

There are four types of storage abstraction that are available today:

(i) BLOB (Binary Large Object) Storage

Blob Storage provides a highly scalable, durable, and available file system in the cloud. Blob Storage allows customers to store any file type such as video, audio, photos, or text.

(ii) Table Storage

Table Storage provides structured storage that can be used to store non-relational tabular data. A Table is a set of entities, which contain a set of properties. An application can manipulate the entities and query over any of the properties stored in a Table.

(iii) Queue Storage

Queue Storage is a reliable and persistent messaging delivery that can be used to bridge applications. Queues are often being used to reliably dispatch asynchronous work.

(iv) Azure Drive

Azure Drive (aka X-Drive) provides the capability to store durable data by using the existing Windows NTFS APIs. Azure Drive is essentially a VHD Page Blob mounted as an NTFS drive by a Windows Azure instance.
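For a flavor of how the Blob and Queue abstractions are used from .NET, here is a rough sketch using the SDK 1.x storage client library (Microsoft.WindowsAzure.StorageClient). The account name, key, and container/queue names are placeholders, not values from this article:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class StorageSketch
{
    static void Main()
    {
        // Placeholder connection string - substitute your own account name and key
        var account = CloudStorageAccount.Parse(
            "DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey");

        // Blob: create a container and upload a small text blob
        var blobs = account.CreateCloudBlobClient();
        var container = blobs.GetContainerReference("photos");
        container.CreateIfNotExist();
        container.GetBlobReference("hello.txt").UploadText("Hello, blob storage");

        // Queue: enqueue a message for a worker role to pick up asynchronously
        var queues = account.CreateCloudQueueClient();
        var queue = queues.GetQueueReference("workitems");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage("process-order-42"));
    }
}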

3. Database

SQL Azure database is a highly available database service built on existing SQL Server technology. Developers do not have to setup, install, configure, or manage any of the database infrastructure. All developers need to do is define the database name, edition, and size. Developers are then ready to bring the objects and data to the cloud:

Screenshot- creating a database

SQL Azure uses the same T-SQL language and the same tools as SQL Server Management Studio to manage databases. SQL Azure is likely to lead to a shift in the responsibility of DBAs toward a more logical administration, as SQL Azure handles physical administration. For example, a SQL Azure database will be replicated to three copies to ensure high-availability.
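For example, a Web-edition database can also be created directly with T-SQL while connected to the master database of your SQL Azure server (the database name below is illustrative):

-- Run against the master database of your SQL Azure server
CREATE DATABASE NorthwindCloud (EDITION = 'web', MAXSIZE = 1 GB);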

Although some variations exist today, Microsoft plans to support the features unavailable in SQL Azure in the future. Users can always vote and provide feedback to the SQL Azure team for upcoming feature consideration.

Coming up in my next article, I will carry on the discussion with the additional services that Windows Azure offers, including ‘Building Block Services’, Data Services, Networking, and more, so make sure you keep an eye out for it!

Christian Weyer (@christianweyer) explained Measuring performance of your HTTP-based .NET applications: Performance Counters for HttpWebRequest on 4/27/2012:

Just found this and thought I’d share it with you: Network Class Library Team (System.Net): New Performance Counters for HttpWebRequest

Each of the seven green circles represents one of the six performance counters (there are two ‘5’ items because 5 is the average lifetime, and there are two code paths that will affect that counter).

Note: Be aware that ‘new’ means new in .NET 4.0 (the blog post is from Aug 2009).

Greg Oliver (@GoLiveMSFT) explained Auto-scaling Azure with WASABi–From the Ground Up in a 4/26/2012 post (missed when published):

(The information in this post was created using Visual Studio 2010, the January refresh of the Windows Azure Platform Training Kit and v1.6 of the Windows Azure SDK in April of 2012. The autoscaler was installed in the compute emulator. The “app to be scaled” was deployed to Azure.)

The Microsoft Enterprise Library Autoscaling Application Block (WASABi) lets you add automatic scaling behavior to your Windows Azure applications. You can choose to host the block in Windows Azure or in an on-premises application. The Autoscaling Application Block can be used without modification; it provides all of the functionality needed to define and monitor autoscaling behavior in a Windows Azure application.

A lot has been written about the WASABi application block, notably in these two locations:

In this post, I demonstrate the use of WASABi in a compute emulator hosted worker role scaling both web and worker roles for the “Introduction to Windows Azure” hands on lab from the training kit (the “app to be scaled”). I use a performance counter to scale down the web role and queue depth to scale up the worker. Here are a few links:

Here’s an overview of the steps:

  1. Create a new cloud app, add a worker role.
  2. Using the documentation in the link above, use Nuget to add the autoscaling application block to your worker role.
  3. Use the configuration tool to set up app.config correctly.
  4. Start another instance of Visual Studio 2010. Load up “Introduction to Windows Azure”, Ex3-WindowsAzureDeployment, whichever language you like.
  5. Change the CSCFG file such that you’ll deploy 2 instances of the web role and 1 instance of the worker role.
  6. Modify WorkerRole.cs so that queue messages won’t get picked up from the queue. We want them to build up so the autoscaler thinks the role needs to be scaled.
  7. Modify WebRole.cs so that the required performance counter is output on a regular basis.
  8. Deploy the solution to a hosted service in Azure.
  9. Set up the Service Information Store file. This describes the “app to be scaled” to the autoscaler.
  10. Set up rules for the autoscaler. Upload the autoscaler’s configuration files to blob storage.
  11. Run the autoscaler, watch your web role and worker role instance numbers change. Success!

And now some details:

For #3 above there are a few things to be configured:

  1. Nothing under “Application Settings”. You can ignore this section.
  2. The Data Point Storage Account must be in the Azure Storage Service rather than the storage emulator, because WASABi uses the new upsert operation, which is not supported in the emulator. (How many hours did this one cost me???)
  3. Your autoscaling rules file can be stored in a few different places. I chose blob storage. It can be either storage emulator or Azure Storage Service. An example of a rule is “if CPU % utilization goes above 80% on average for 5 minutes, scale up”.
    The Monitoring Rate is the rate at which the autoscaler will check for runtime changes in the rules. You can ignore the certificate stuff for now. It’s there in case you want to encrypt your rules.
  4. Next is the Service Information Store. This file describes the service that you want to monitor and automate: in it you tell the system which hosted service, which roles, and so on. The same storage configuration options as for the rules file apply.
  5. You can ignore the “Advanced Options” for now.
  6. When you’re done with the configuration tool, take a look at app.config. Down near the bottom is a section for <system.diagnostics>. In it, the Default listener is removed in a couple places. Unless you configure your diagnostics correctly, the effect of this is that the trace messages from the autoscaler will go nowhere. My advice: remove the lines that remove the default listener. This will cause the trace messages to go into the Output window of Visual Studio. As follows:
    (If you have read my previous blog post about Azure Diagnostics (link), you already know how to get the trace log to output to table storage, so this step isn’t necessary.)

For #6 above, I just commented the line that gets the CloudQueueMessage:
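The change amounts to commenting out the dequeue in the worker's Run loop, something like the following (method and variable names are assumed from the lab, not copied from it):

```csharp
while (true)
{
    // Commented out so messages pile up and the queue-depth rule fires:
    // CloudQueueMessage msg = queue.GetMessage();
    // if (msg != null)
    // {
    //     CreateThumbnail(msg);
    //     queue.DeleteMessage(msg);
    // }
    Thread.Sleep(1000);
}
```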

For #8 above, reference my earlier blog post (link) on configuration of diagnostics to switch on the CPU% performance counter. Here are the lines you’ll need to add to the DiagnosticMonitorConfiguration object:
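A sketch of those lines, assuming the standard SDK 1.x diagnostics API (the sample rate and transfer period shown are illustrative):

```csharp
var config = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Sample CPU% locally, then push the samples to table storage periodically.
config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
{
    CounterSpecifier = @"\Processor(_Total)\% Processor Time",
    SampleRate = TimeSpan.FromSeconds(30)
});
config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(1);

DiagnosticMonitor.Start(
    "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
```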


For #9 above, there are a few things I want to point out.

  1. In the <subscription>, the certificate stuff points to where to find your management certificate in the WASABi host. WASABi knows where to find the matching certificate in your subscription once it has your subscription ID.
  2. In the <roles> elements, the alias attributes point at (match – case sensitive) the target attributes in your rules in #10 (coming next).
  3. In the <storageAccount> element you need to define the queue that you’ll be sampling and call it out by name. The alias attribute of the <queue> matches to the queue attribute of the <queueLength> element in the rules file.


For #10 above, this is the fun part. :)

To show the operation of performance counters causing a scaling event, I set up a rule to scale down the web role if CPU% is less than 5%. Since I’m not driving any load to it, this will be immediately true. Scaling should happen pretty soon. To show the operation of queue depth causing a scaling event, I set up a rule to scale up if the queue gets greater than 5 deep. Then I push a few queue messages up by executing a few web role transactions (uploading pictures).

Most folks initially learning about autoscaling wisely think that we need to set up some kind of throttle so that the system doesn’t just scale up to infinity or down to zero because of some logic error. This is built into WASABi. Here’s how this manifests in this example:
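In WASABi, the throttle takes the form of constraint rules, which bound the instance counts that reactive rules may request. A sketch, with illustrative rule names, targets, and limits:

```xml
<constraintRules>
  <rule name="DefaultThrottle" enabled="true" rank="1">
    <actions>
      <range target="WebRole" min="1" max="2"/>
      <range target="WorkerRole" min="1" max="3"/>
    </actions>
  </rule>
</constraintRules>
```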


Next, we need to think about how to define the criteria we’ll use to scale: which performance counter to use, how often it gets evaluated, and how it’s evaluated (min/max/avg, whatever). The role’s code might be outputting more than one counter, so we need some way to reference them; the same goes for the queue. With WASABi, this is done with an “operand”. The operands in this example are defined as follows:


Here, the alias attributes are referenced in the rules we’re about to write. The queue attribute value “q1” is referenced in the Service Information Store file that we set up in step #9. The queueLength operand in this case looks for the max depth of the queue in a 5 minute window. The performanceCounter operand looks for the average CPU% over a 10 minute window.
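A sketch of what those operand definitions might look like in the rules file (the aliases and role source are illustrative):

```xml
<operands>
  <performanceCounter alias="CPU_Avg_10m" source="WebRole"
                      performanceCounterName="\Processor(_Total)\% Processor Time"
                      aggregate="Average" timespan="00:10:00"/>
  <queueLength alias="Queue_Max_5m" queue="q1"
               aggregate="Max" timespan="00:05:00"/>
</operands>
```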

You should be able to discern from this that autoscaling is not instantaneous. It takes time to evaluate conditions and respond appropriately. The WASABi documentation gives a good treatment of this topic.

Finally, we write the actual rules where the operands are evaluated against boundary conditions. When the boundary conditions are met, the rules engine determines that a scaling event should take place and implements the appropriate Service Management API to do it.


Note that the target names are case-sensitive. Note also the match between the operand name in the reactive rule, and the alias in the operand definition.
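A sketch of what the reactive rules described here might look like (rule names, operand aliases, and targets are illustrative):

```xml
<reactiveRules>
  <rule name="ScaleUpOnQueueDepth" enabled="true" rank="10">
    <when>
      <greater operand="Queue_Max_5m" than="5"/>
    </when>
    <actions>
      <scale target="WorkerRole" by="1"/>
    </actions>
  </rule>
  <rule name="ScaleDownOnIdleCpu" enabled="true" rank="10">
    <when>
      <less operand="CPU_Avg_10m" than="5"/>
    </when>
    <actions>
      <scale target="WebRole" by="-1"/>
    </actions>
  </rule>
</reactiveRules>
```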

For #11 above, run the autoscaler and the “app to be scaled”, then upload 6 pictures. Because we’ve disabled the ingest of queue messages in the worker role, thumbnails won’t be generated. It might be easier to use really small pictures. Now sit back and wait about 10 minutes. After about 5 minutes you should expect the worker role to scale up; after about 10 minutes you should expect the web role to scale down. Watch the messages that appear in the Output window or table storage; they offer an informative view into how WASABi works. Because the actual scaling of the roles causes all of them to go to an “updating…” status, you might not see the distinct events unless you spread them out with your time values.

Please note this is a sample and doesn’t include best practices advice on how often to sample performance counters and queues, how often to push the information to table storage, and so on. I leave that to the authoritative documentation included with WASABi.

<Return to section navigation list>

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds


No significant articles today.

<Return to section navigation list>

Cloud Security and Governance

Nathan Totten and Nick Harris produced Episode 78 - Security and Compliance on 4/29/2012:

Join Nate and Nick each week as they cover Windows Azure. You can follow and interact with the show at @CloudCoverShow.

In this episode, we are joined by Krishna Anumalasetty — Principal Program Manager for Compliance on Windows Azure — who discusses compliance, security, disaster recovery and privacy on Windows Azure. Krishna walks us through some of the common questions related to building compliant applications on Windows Azure and shows the new Windows Azure Trust Center.

In the News:

In the Tip of the Week, we discuss a blog post by Ranjith Ramakrishnan of Opstera. Ranjith gives five tips to optimize your applications and deployments to reduce your Windows Azure bill.

CLARIFICATION - Nathan mentioned HIPAA compliance several times as an example of a type of compliance certification. It should be noted that Windows Azure is not HIPAA compliant.

Follow @CloudCoverShow
Follow @cloudnick
Follow @ntotten

David Linthicum (@DavidLinthicum) asserted “When it comes to technology dependency and risk of legal compliance, the cloud is just like everything else” as a deck for his 2 more cloud myths busted: Lock-in and locked up article of 4/27/2012 for InfoWorld’s Cloud Computing blog:

The world of cloud computing grows like a weed in summer, and many assumptions are being made that just aren't correct. I've previously exposed four cloud myths you shouldn't believe. Now it's time for me to climb up on my soapbox and correct a few more.

Myth 1: Cloud computing is bringing back vendor lock-in. The notion that using cloud computing features (such as APIs) created by one provider or another causes dreaded lock-in seems to be a common mantra. The reality is that using any technology, except the most primitive, causes some degree of dependency on that technology or its service provider. Cloud providers are no exception.

Here's the truth about technology, past, present, and future: Companies that create technology have no incentive to fly in close enough formation to let you move data and code willy-nilly among their offerings and those provided by their competitors. The cloud is no different in that respect.

We can talk about open source distributions and emerging standards until we're blue in the face, but you'll find that not much changes in terms of true portability. As long as the profitability and intellectual property value of technology and service providers trump data and code portability, this issue will remain. It's not a new situation.

Myth 2: Cloud computing use will put you in jail. Yes, you need to consider compliance issues when moving to any new platform, including private, public, and hybrid clouds. However, stating in meetings that moving data and processes to cloud-based platforms somehow puts you at risk for arrest is a tad bit dramatic, don't you think? Yet I hear that attention-getting claim frequently.

We've been placing data, applications, and processes outside of our enterprises for years, and most rules and regulations you find in vertical markets (such as health care and finance) already take this into account. Cloud computing is just another instance of using computing resources outside your span of control, which is nothing new, and typically both perfectly legal and not at all risky. Cut out the false drama as an excuse to say no.

<Return to section navigation list>

Cloud Computing Events

Eric Boyd (@EricDBoyd) reported on 5/1/2012 Chicago Windows Azure Kick Start on May 3rd:

I’ve been in Redmond all week hanging out with some awesome folks on the Microsoft campus. This evening, I will be heading back to Chicago to host a Windows Azure Kick Start. If you are new to the cloud and Windows Azure, or would just like some time to get hands-on and create an app in Windows Azure, tomorrow’s Windows Azure Kick Start is for you. There are only 4 seats left, so if you’d like to attend, you should register soon.

Register for the Windows Azure Kick Start

Tomorrow, May 3rd, you will get to spend the day with some of the nation’s leading cloud experts, learning how to build a web application that runs in Windows Azure. We will show you how to sign up for free time in the cloud, and how to build a typical web application using the same ASP.NET tools and techniques you already use today. We will explore web roles, cloud storage, SQL Azure, and common scenarios. We will save time for open Q&A, and even cover what should not be moved to the cloud. This will be a hands-on event, so you will need a laptop configured with the required pieces. We will have help onsite to get the right bits installed as well.

Lunch and prizes will be provided and you could get lucky and win a Kinect!

Microsoft – Chicago Office
200 E Randolph St
Chicago, Illinois 60601
9:00AM – 5:00PM

To make the best use of your time at the Windows Azure Kick Start Event, we recommend you prepare the following requirements before the event:

  • A computer or laptop: Operating Systems Supported: Windows 7 (Ultimate, Professional, and Enterprise Editions); Windows Server 2008; Windows Server 2008 R2; Windows Vista (Ultimate, Business, and Enterprise Editions) with either Service Pack 1 or Service Pack 2
  • Install the free Windows Azure SDK and required software using Web Platform Installer.
  • Set up a Free Windows Azure Platform Trial. If you have MSDN, you should activate your MSDN Azure benefits.
  • The sample code and handbook for the labs will be provided at the event.
  • Consider bringing a power strip or extension cord to stay charged all day.

Register for the Windows Azure Kick Start

Steve Plank (@plankytronixx) reported Event: Free UK Windows Azure Bootcamp: Online on 4/27/2012 (Missed when published):

We’re going to be running an online version of the popular Windows Azure Bootcamp on the 21st May. We’re very aware that the London and Liverpool bootcamps due to run in early May are now full, so we thought that by running a similar, slightly shortened version of the bootcamp through Lync, anybody who didn’t manage to get a place on the in-person bootcamps could benefit from a good deal of the same material delivered online. There will be hands-on labs, and you will be expected to have a laptop with all the correct software installed. The registration link has very specific instructions for setting up a developer machine.

Although this video shows what goes on during an in-person bootcamp, it should give you a general “feel” for what the online version will be like.

If you’d like to register, please take this link.

<Return to section navigation list>

Other Cloud Computing Platforms and Services

No significant articles today.

<Return to section navigation list>