Tuesday, December 27, 2011

Mashup Big Data with Microsoft Codename “Data Explorer” - An Illustrated Tutorial

Microsoft described Codename “Data Explorer” in its 10/12/2011 product announcement as follows:

image"Data Explorer” is a new concept which provides an innovative way to gain new insights from the data you care about. With "Data Explorer” you can discover relevant data sources, enrich your data by combining with other sources, and then publish and share your insights with others.

You can get started with “Data Explorer” by identifying the data you care about from the sources you work with (e.g. Excel spreadsheets, files, SQL Server databases, Windows Azure Marketplace, etc.). [Emphasis added.]

The dataset for this tutorial is a Windows Azure Marketplace DataMarket OData document from Microsoft’s Social Analytics team. I’ve been working with the API for Microsoft Codename “Social Analytics” since its introduction on 10/25/2011. My Microsoft tests Social Analytics experimental cloud article of 12/1/2011 for SearchCloudComputing.com describes how Social Analytics datasets enable discovery of consumer interest in and sentiment about products (Windows 8) or individuals (Bill Gates.)

imageNote: The design of OakLeaf tutorials for Codename “Data Explorer” is modeled on Beth Massi’s (@bethmassi) tutorials for Visual Studio LightSwitch because both products are intended for an audience that includes non-programmers.

Prerequisites: You need an invitation to Social Analytics’ private beta, which contains an invitation code (Windows Azure Marketplace DataMarket Account Key) to gain access to the Social Analytics OData Windows 8 dataset used in this tutorial. Apply for it here; select the Windows 8 slice for the VancouverWindows8 dataset. (“Vancouver” was Social Analytics’ earlier codename.) The invitation is required to use the Cloud (hosted in Windows Azure) implementation also. You must have Office 2010 SP1 or 2007 installed on the machine running the Data Explorer software.

Tip: If you have a Social Analytics Account Key, you can open and save the completed Cloud version of this tutorial’s sample app. See step 38 near the end of this post.

My downloadable Codename “Social Analytics” WinForms Client Sample App displays individual Tweets, Facebook posts, and occasional Stack Overload questions in a DataGrid control and summarizes daily counts of items referring to Windows 8 as well as their positive and negative sentiment (tone) in a graph, shown here for 12/25/2011 and the preceding 21 days:

imageClick figures to display full-size (1024 x 768 px) screen captures.

The following sections describe creating a Microsoft Codename “Data Explorer” desktop mashup that emulates most features of my sample Winform client app.

Obtaining a Social Analytics Account Key, Downloading the “Data Explorer” Desktop Client and Adding the Windows 8 Dataset

1. When you receive the Social Analytics invitation email, follow its instructions to enable access to the VancouverWindows8 dataset.

2. Log into the Windows Azure Marketplace DataMarket (Marketplace), click the My Account button and the navigation pane’s Account Keys link to open the Account Keys page:


3. Copy the Default Account Key’s Base64-encoded value to a safe place for reuse in step 10.

4. When you receive the Data Explorer invitation email, follow its instructions to activate your account for the hosted Cloud Client.

Note: The Cloud Client lets you publish your mashup to share it with others but isn’t required to complete this tutorial. The only difference between the Cloud and Desktop clients is that you save Snapshots to SQL Azure instead of SQL Server databases in step 32 and have the option to publish (host) them for private or public consumption in step 37 and later.

Tip: You can’t publish a populated snapshot with the Cloud version that was current when this post was written.

5. Visit the Microsoft Codename “Data Explorer” landing page, watch a one-minute video, and read about the product’s features:


6. Click the Download "Data Explorer" Desktop Client link to open the SQL Azure Labs Codename "Data Explorer" Client (Labs Release) page, download the x86 or x64 version of the DataExplorer.msi file which installs the Data Explorer workspace as well as an Office plugin that integrates Data Explorer into Excel.

Note: The installer version must correspond to the version of Microsoft Office 2010 SP1 or Office 2007 installed.

7. Run it to install the Desktop client, which adds a Microsoft Codename “Data Explorer” node to your All Programs menu with Microsoft Codename “Data Explorer” Home Page and Microsoft Codename “Data Explorer” choices.

8. Click the Microsoft Codename “Data Explorer” choice image to start the Desktop Client, click New to open a dialog to name your first mashup, and type SocialAnalyticsDesktop or the like:


9. Click OK to open the Add Data page:


Tip: Choose Data Explorer Mashup to open a *.import file if you want to start a new mashup from an existing Desktop mashup saved on your machine or accessible on your LAN. The Cloud version disables the Link To File button.

10. Click Windows Azure Marketplace to open a WindowsAzureMarketplace1 placeholder, mark the Specify a Windows Azure Marketplace Account Key option, and paste the Account Key you copied in step 3:


Note: The image and image buttons open a dialog for sending positive or negative feedback to the Data Explorer team. image is the help button.

Learn About the Formula Language (M), Select a Feed and Collection to Serve as the Primary Data Source, Hide Unwanted Fields, and Reorder Column Sequence

11. Click the image button and choose Data Explorer Help to open the Data Explorer Help page. Click the Help page’s Data Explorer Formula Language Primer link to display Help/Language Primer Page:


Data Explorer uses a Microsoft modeling language, called M, as its formula language. M is a domain-specific language (DSL) that has been in limbo since the demise of the “Oslo” program. Oslo was a Microsoft development program intended to “apply model-driven principles to building applications and services,” primarily with SQL Server. Microsoft’s Don Box posted an Update on SQL Server Modeling CTP (Repository/Modeling Services, "Quadrant" and "M") to MSDN’s Model Citizen blog on 9/22/2010. In it, he said, inter alia:

image… Given the increasing adoption of both OData and EDM, we have decided to focus our investments in those technologies to make our modeling vision a reality. One important aspect of that focus is that we will not bring “Oslo” repository to market. We believe that taking a loosely coupled, federated approach using OData and EDM will ultimately get more models exposed sooner than an approach based on building a common repository database.

We created a visual tool codenamed “Quadrant” to help people query, update and visualize information that is stored in SQL Server databases. As with the “Oslo” repository, customers told us that they wanted to work with information from a variety of sources, not just data stored in an RDBMS. Customers also told us that they wanted the experience to be native to the tools they are already using; specifically either Visual Studio or Microsoft Office. Given this feedback, we are not bringing “Quadrant” to market. Instead, we will work to make the experience with OData and EDM in Visual Studio and Microsoft Office even better.

Finally, we created a language codenamed “M” for defining schema, constraints, queries, and transformations. While we used “M” to build the “Oslo” repository and “Quadrant,” there has been significant interest both inside and outside of Microsoft in using “M” for other applications. We are continuing our investment in this technology and will share our plans for productization once they are concrete.

M’s syntax is similar to Excel’s formula language. Examples of M statements appear in this tutorial as captured images:


12. Click Continue to open a list of feeds (datasets) available to the Windows Live ID you use to log into the Marketplace and select the Microsoft Codename “Social Analytics” feed:



Note: If the Filter, Order, Column Names, and Transform tools icons don’t appear as shown above, click the More Tools button to display them and change the button name to Less Tools. Column widths are fixed in the Client version used to write this tutorial.

13. Click the + symbol in the selected Feeds cell to add a Feeds button to the navigation pane and display a list of Data Analytics collections. Select the ContentItems collection:


Tip: Periodically open the SocialAnalyticsDesktop list and select Save to save your work to the local or hosted *.import file at the conclusion of each step.

14. Click the ContentItems’ + symbol to add a ContentItems button to the navigation pane and display the first 17 or more rows of the data source, depending on your display resolution:



Note: As you compose the final data presentation, named steps appear in a series of buttons under the tools. You can click any button to redisplay the data in the selected state. Right-clicking most buttons opens a menu with Edit, Rename and Delete choices.

15. Click the Filter group’s Show Columns button to open a grid with check boxes to select which columns to display. To match the Winforms Sample App’s DataGrid control, mark the Id, Title, Summary, PublishedOn, Tone, CalculatedToneId, ToneReliability and ContentTypeId columns only:



Note: Alternatively, the Hide Columns tool lets you mark the checkboxes of columns to hide.

16. Click Done when finished selecting columns to show and click the Orders group’s Reorder Columns button to add a set of sliding buttons. Drag the ContentItemTypeId button to the right of the Id column:



17. Click Done to establish the new column order and hide the column-header buttons.

Sort the ContentItems Collection and Remove Rows with Null Tone Values by Filtering

The Winforms Sample app orders the ContentItems collection by PublishedOn in descending DateTime order, which reflects the order of the stream of tweets and posts. To apply this sort order to the collection:

18. Click the Order group’s Sort Rows button to open a list box for each column with --None--, Ascending, and Descending choices. Select Descending above the PublishedOn column:



19. Click Done to save the sort order, which displays null values for CalculatedToneId and ToneReliability in the latest four rows as shown above.

20. To remove the items with null Tone values, click the Filter Rows button, select the CalculatedToneId column and the not-equal (<>) operator, type null in the Value text box, and click Add Requirement:



Tip: You can add another requirement with an and clause by specifying it and clicking Add Requirement again or an or clause by clicking Add Alternative.

21. Click Done to apply the filter:



Add Lookup Columns for ContentItemName and ToneValue

One of Data Explorer’s most important features is the ability to add lookup columns to display related information from other data sources. For example, the Winform client displays ContentItemName (Tweet, Retweet, Reply, etc.) in the second column based on the value of the data source’s ContentItemId column. The ContentItemTypes enumeration provides the lookup data source with ContentItemId and ContentItemName fields.

22. To add ContentItemTypes as a new resource (table), click to open the Feeds placeholder, right-click the ContentItemTypes icon and choose Add As New Resource:


Tip:  Use As Next Task is the default, as you saw in step 14.

23. Click the Insert Column button to open a gallery of column types to insert:


24. Choose Lookup Column to open the editing pane, type ContentTypeName as the New Column name, select ContentItemTypes as the table containing the lookup data and Name as the column value to be retrieved:



25. Click Done to add the lookup column in the rightmost position.

Note:  The Tones table provides lookup data for the CalculatedToneId column. The Winform client doesn’t substitute a Tone value for CalculatedToneId because the app has a list of Tones table rows. If you want to translate CalculateToneId values, add the Tones collection as a new table resource and add a ToneValue lookup list.

26. Click the Reorder Columns tool to add the moveable buttons to the edit pane, move ContentTypeName to the right of ContentTypeId and, if you added it, ToneValue to the right of CalculatedToneID.

27. Click Done to complete the reordering process and the the Hide Columns tool. Mark the check boxes for ContentTypeID and CalculateToneId and click Done to hide the columns. Your completed mashup appears as shown here:


Note: Summary values appear only for Facebook, Stack Overload and blog posts.

Save a Snapshot of Your Mashup to Local Storage and Use It as a New Data Source

A snapshot of a mashup preserves the design and data that’s current when you take the snapshot. The original mashup and snapshots are saved to your My Documents\Data Explorer\My Workspace folder, as shown here with two snapshots:


Note: Workspace.zip is a hidden archive

28. Click the My Workspace button in the header to return to the My Workspace page, which displays a tile for your current mashup, SocialAnalyticsDesktop, as well as tiles for previous mashup snapshots you saved. With your current mashup selected, you can preview live (current) results by clicking one of the navigation pane’s placeholders:


29. Click the Snapshot link to replace the preview with a Create Snapshot pane. Edit the Snapshot Name textbox to provide as you want:


30. Click the image button to add a tile for the new snapshot and replace the Create Snapshot pane with a Configure Snapshot pane:


31. Use SQL Server Management Studio (SSMS) 2008 R2 [Express] or later to add an empty database to a local or networked SQL Server instance.

Tip: Data Explorer won’t add a new database for you.

32. Click the Database icon to add a Select Storage pane. Type the Server Name, Database Name you create in step 31, and a unique Schema Name, Snapshot2 for this example:


Note: The Please Specify How to Connect dialog opens when you click one of the arrows or click the Update button.

33. Click Update to replace the lower pane with:


Note: You might be asked to provide your Account Key for Microsoft Codename “Social Analytics” data if this is the first snapshot you’ve created (see step 40 for an example).

34. Click the Start image button to start the snapshotting process, which displays a Snapshot Is Running … progress pane:


Note: Downloading ContentItems from the Windows Azure Marketplace takes several minutes to complete, depending on your Internet connection speed:


35. Click to select the tile for the newly completed snapshot and click the ContentItems button to preview the table:


36. Double-click the newly created snapshot tile to use it as a data source for a new mashup:



Note: Data Explorer adds an integer row Index Column (refer to step 23) to substitute for Id as the primary key for the table, as shown here in SSMS:


For more information about primary key differences between Desktop and Cloud snapshot versions, see my Microsoft Codename “Data Explorer” Cloud Version Fails to Save Snapshots of Codename “Social Analytics” Data post of 12/26/2011.

Publishing a Publicly Accessible Mashup with the Cloud Version

When this post was written, it was impossible to populate and publish a snapshot created with Data Explorer’s Cloud (hosted) version, but you can publish the mashup itself. When you complete the preceding steps in the Cloud version’s UI and click the My Workspace button, the page appears as follows:


37. Click the Publish button to publish the replace the Preview Live Results pane with:


38. Click the https://ws24675086.dataexplorer.sqlazurelabs.com/Published/SocialAnalyticsMashup link to verify that you can access the published mashup, which displays the following page:


39. Click the Mashup button, select Save, and open SocialAnalyticsMashup.import in your Downloads library. Click Content items to display the Specify How to Connect page, which is the same as that in step 10.

40. Select the Specify an OData Feed Key option and past your Account Key into the Feed Key text box:


41. Click Continue and More Tools to display the following page, which you can use as a data source for another mashup:


Tip: Be patient. Downloading the data from the Windows Azure Marketplace might take 10 minutes or more.

imageMy (@rogerjenn) Problems with Microsoft Codename “Data Explorer” - Aggregate Values and Merging Tables - Solved post of 12/29/2011 describes how to emulate the WinForms client’s daily summary data, saved as ContentItems.csv, for creating and recreating the graph of Tweet Count (a.k.a., “Buzz”), Tones Positive, Tones Negative, [Tone] Reliability Pos[itive], and [Tone] Reliabilty Neg[itive] values.

Stay tuned for more Data Explorer tutorials.