Friday, December 30, 2011

Problems with Microsoft Codename “Data Explorer” - Aggregate Values and Merging Tables - Solved

Update 12/30/2011: Added details of creating ContentItemsDailySummary.csv or ContentItemsDailySummary.xlsx and graphs from the DailySummary table with the Data Explorer Add-in for Excel. See steps 11 through 18 at the end of this post.


Update 12/29/2011: Miguel Llopis replied on 12/28/2011 regarding my answer in the forum to Alejandro Lopez Lago about issues with the Merge action:

imagePlease try using the "Optional Prefix" field in the Merge builder for either one of the two tables and let us know if that fixes the issue. We are currently working, as we speak, to improve usability of this Merge builder so any feedback is really welcome.

Adding a prefix with any field selected in either list box creates a usable table that can be fixed up by further modification. This is a cumbersome workaround that needs fixing (see steps 5 through 10 at the end of this post).


Update 12/28/2011: Alejandro Lopez Lago replied as follows to my thread about this problem in the Microsoft codename Data Explorer forum:

The error with List.Average occurs because it is trying to average a resource or task named "ToneReliability" instead of the column. To fix this, put square brackets around the name.

Steps 2 and 3 below have been updated to reflect Alejandro’s solution. This problem is solved, but the square-bracket requirement needs documentation.


My downloadable Codename “Social Analytics” WinForms Client Sample App automatically generates a summary Users\UserName\AppData\Local\ContentItems.csv file while retrieving rows of the VancouverWindow8 dataset from the Windows Azure Marketplace DataMarket. The summary file contains the following columns:

Column Name Contents
Published On The date the ContentItem was published (time value removed)
Tweet Count The number of matched items (a.k.a., buzz) on the publish date
Tones Positive The number of matched items having a positive tone on the publish date
Tones Negative The number of matched items having a negative tone on the publish date
Reliability Pos The average of ToneReliability values for positive items on the publish date
Reliability Neg The average of ToneReliability values for negative items on the publish date

The file is used to optionally recreate the graph when reopening the WinForm Sample App, as shown here:

image

The sample Microsoft Code “Data Explorer” mashup created in my Mashup Big Data with Microsoft Codename “Data Explorer” - An Illustrated Tutorial post of 12/27/2011 accurately replicates the layout of the WinForm sample’s DataGrid control:

image

by the ContentItems table snapshot in the Desktop client:

image

imageNote: The Summary column appears to the right of Tone Reliability in the WinForm UI and is hidden in the screen capture.

A local SQL Server SocialAnalyticsSnapshot database contains the ContentItems snapshot’s data:

image 

It’s easy to generate a table that emulates the ContentItems.csv file by executing a series of T-SQL SELECT … INTO statements to create a table containing Published On and Tweet Count columns and then append Tones Positive and average Reliability Pos as well as Tones Negative and average Reliability Neg columns with Left Outer Joins. Following are the corresponding T-SQL statements:

image

image

Note: CalculatedToneId was the original name of the field and does not appear in ContentItems’ columns list. The looked up name, ToneValue, isn’t usable in the projection.

image

image

I intended to emulated the above process by creating the corresponding “Data Explorer” tables:

1. Successfully created a DailyTotals table by duplicating and renaming ContentItems, ChangedFormat by trimming time from Published On, and Summarized with the Count(_) function:

image

image

image

image

2. Attempted to cCreated a DailyPositives table with Positive count and average Tone Reliability to merge with DailyTotals:

image

image

image

Note: Usage of List.Average(ColumnName) is a topic of the Data Explorer Team’s The Data Explorer Formula Language post of 11/17/2011 and the spelling appears to be correct.

image

Update: As noted at the beginning of this post, square brackets must surround field names.

3. Repeated step 2 for Negatives and ReliabilityNeg: with a similar failure.

image

4. Attempted to merge DailyItems with DailyPositives to add Positives count, which did not succeed:

image

Selecting Published On in the left (unidentified) list resulted in what appeared to be the correct expression but threw an (unidentified) error:

image

Note: Selecting any column name from either list throws the error icon shown above. The error message is “Exception: A join operation must not result in a table with duplicate column names ("PublishedOn").”


Update 12/29/2011 9:00 AM PST: Workaround for Duplicate Column Name Error

5. Add a prefix to the Right Table’s rows, as suggested by Miguel Llopis, and select PublishedOn in either list box to produce the following interim result:

image

image

6. Click Done to display the table and click the Merge button again to create a Merge1 action, select DailyNegatives as the Right Table, specify Rt2 or the like as the right table prefix to add three more rows to the table:

image

image

Note: It appears that you don’t need to select any column name in the list box in this case.

6. Click Done, right-click the Merge1 button and rename the resource Daily Summary:

image

7. With Daily Summary selected, click the Hide Columns button and mark the Rt.PublishedOn and Rt2.PublishedOn check boxes:

image

image

8. Click Done, click the Reorder Columns button and move the Rt2.Negatives button to the right of Rt.Postives:

image

image

9. Click Done, click Rename Columns, and retype the column names without Rt and Rt2 prefixes to match those of the WinForm client:

image

image

Note: Press Enter to set the rightmost column name.

10. Click Done to display the final DailySummary result, which is identical in format to that generated by the WinForm client:

image

Note: If you add these modifications to the Cloud (hosted) version of the project and save a snapshot, you can enable users to download the same CSV file as the WinForm version saves.


Update 12/29/2011 9:00 AM PST: Creating ContentItemsDailySummary.csv or ContentItemsDailySummary.xlsx Files from the DailySummary table with the Data Explorer Add-in for Excel

Running the DataExplorer.msi file to install the Desktop Client also installs an Data Explorer Excel Add-in to Excel 2007 or 2010. The Add-in adds a Data Explorer group to Excel’s Data ribbon:

image

11. Click the Import Data button to open the eponymous dialog and select the mashup and resource to use, SocialAnalyticsMashup2 and Daily Summary for this example:

image

12. If you have Excel open, click OK to select whether to import the data to an existing (open) worksheet or a new worksheet:

image

13. Click OK to populate the worksheet with the mashup data:

image

14. Click the File tab, choose Save As to open the Save As dialog and choose CSV (Comma Delimited) (*.csv) or Excel Workbook (*.xlsx) to save the file:

image

15. To make an overall comparison of the summary data with that of the WinForm client, create a graph of the Tweet Counts, Tones Positive and Tones Negative:

image

Note: The first valid data appears to occur on 10/27/2011, about two weeks after the Data Explorer team announced availability of the private CTP on 10/12/2011.

16. Double click the graph’s abscissa (y-axis) to open the Format Axis dialog, change its scale to logaritmic to math the WinForm client’s scale:

image

17. Click Close to display the reformatted graph:

image

18. Optionally, select and right-click the Tones Positives line, choose Format Data Series to open the Format Data Series Dialog, select the Line Color option, open the Color picker, and select Green. Do the same for the Tones Negative line, but select Red:

image


Conclusion: The present design of the Merge action needs considerable improvement, especially as to specifying merged projections. Otherwise, Data Explorer behaves as expected.


See my Microsoft Codename “Data Explorer” Cloud Version Fails to Save Snapshots of Codename “Social Analytics” Data post of 12/27/2011 for details of another Data Explorer problem that Microsoft is investigating.


0 comments: