Problems with Microsoft Codename “Data Explorer” - Aggregate Values and Merging Tables - Solved
Update 12/30/2011: Added details of creating ContentItemsDailySummary.csv or ContentItemsDailySummary.xlsx and graphs from the DailySummary table with the Data Explorer Add-in for Excel. See steps 11 through 18 at the end of this post.
Update 12/29/2011: Miguel Llopis replied on 12/28/2011 regarding my answer in the forum to Alejandro Lopez Lago about issues with the Merge action:
Please try using the "Optional Prefix" field in the Merge builder for either one of the two tables and let us know if that fixes the issue. We are currently working, as we speak, to improve usability of this Merge builder so any feedback is really welcome.
Adding a prefix with any field selected in either list box creates a usable table that can be fixed up by further modification. This is a cumbersome workaround that needs fixing (see steps 5 through 10 at the end of this post).
Update 12/28/2011: Alejandro Lopez Lago replied as follows to my thread about this problem in the Microsoft codename Data Explorer forum:
The error with List.Average occurs because it is trying to average a resource or task named "ToneReliability" instead of the column. To fix this, put square brackets around the name.
Steps 2 and 3 below have been updated to reflect Alejandro’s solution. This problem is solved, but the square-bracket requirement needs documentation.
My downloadable Codename “Social Analytics” WinForms Client Sample App automatically generates a summary Users\UserName\AppData\Local\ContentItems.csv file while retrieving rows of the VancouverWindow8 dataset from the Windows Azure Marketplace DataMarket. The summary file contains the following columns:
Column Name | Contents |
Published On | The date the ContentItem was published (time value removed) |
Tweet Count | The number of matched items (a.k.a., buzz) on the publish date |
Tones Positive | The number of matched items having a positive tone on the publish date |
Tones Negative | The number of matched items having a negative tone on the publish date |
Reliability Pos | The average of ToneReliability values for positive items on the publish date |
Reliability Neg | The average of ToneReliability values for negative items on the publish date |
The file is used to optionally recreate the graph when reopening the WinForm Sample App, as shown here:
The sample Microsoft Code “Data Explorer” mashup created in my Mashup Big Data with Microsoft Codename “Data Explorer” - An Illustrated Tutorial post of 12/27/2011 accurately replicates the layout of the WinForm sample’s DataGrid control:
by the ContentItems table snapshot in the Desktop client:
Note: The Summary column appears to the right of Tone Reliability in the WinForm UI and is hidden in the screen capture.
A local SQL Server SocialAnalyticsSnapshot database contains the ContentItems snapshot’s data:
It’s easy to generate a table that emulates the ContentItems.csv file by executing a series of T-SQL SELECT … INTO statements to create a table containing Published On and Tweet Count columns and then append Tones Positive and average Reliability Pos as well as Tones Negative and average Reliability Neg columns with Left Outer Joins. Following are the corresponding T-SQL statements:
Note: CalculatedToneId was the original name of the field and does not appear in ContentItems’ columns list. The looked up name, ToneValue, isn’t usable in the projection.
I intended to emulated the above process by creating the corresponding “Data Explorer” tables:
1. Successfully created a DailyTotals table by duplicating and renaming ContentItems, ChangedFormat by trimming time from Published On, and Summarized with the Count(_) function:
2. Attempted to cCreated a DailyPositives table with Positive count and average Tone Reliability to merge with DailyTotals:
Note: Usage of List.Average(ColumnName) is a topic of the Data Explorer Team’s The Data Explorer Formula Language post of 11/17/2011 and the spelling appears to be correct.
Update: As noted at the beginning of this post, square brackets must surround field names.
3. Repeated step 2 for Negatives and ReliabilityNeg: with a similar failure.
4. Attempted to merge DailyItems with DailyPositives to add Positives count, which did not succeed:
Selecting Published On in the left (unidentified) list resulted in what appeared to be the correct expression but threw an (unidentified) error:
Note: Selecting any column name from either list throws the error icon shown above. The error message is “Exception: A join operation must not result in a table with duplicate column names ("PublishedOn").”
Update 12/29/2011 9:00 AM PST: Workaround for Duplicate Column Name Error
5. Add a prefix to the Right Table’s rows, as suggested by Miguel Llopis, and select PublishedOn in either list box to produce the following interim result:
6. Click Done to display the table and click the Merge button again to create a Merge1 action, select DailyNegatives as the Right Table, specify Rt2 or the like as the right table prefix to add three more rows to the table:
Note: It appears that you don’t need to select any column name in the list box in this case.
6. Click Done, right-click the Merge1 button and rename the resource Daily Summary:
7. With Daily Summary selected, click the Hide Columns button and mark the Rt.PublishedOn and Rt2.PublishedOn check boxes:
8. Click Done, click the Reorder Columns button and move the Rt2.Negatives button to the right of Rt.Postives:
9. Click Done, click Rename Columns, and retype the column names without Rt and Rt2 prefixes to match those of the WinForm client:
Note: Press Enter to set the rightmost column name.
10. Click Done to display the final DailySummary result, which is identical in format to that generated by the WinForm client:
Note: If you add these modifications to the Cloud (hosted) version of the project and save a snapshot, you can enable users to download the same CSV file as the WinForm version saves.
Update 12/29/2011 9:00 AM PST: Creating ContentItemsDailySummary.csv or ContentItemsDailySummary.xlsx Files from the DailySummary table with the Data Explorer Add-in for Excel
Running the DataExplorer.msi file to install the Desktop Client also installs an Data Explorer Excel Add-in to Excel 2007 or 2010. The Add-in adds a Data Explorer group to Excel’s Data ribbon:
11. Click the Import Data button to open the eponymous dialog and select the mashup and resource to use, SocialAnalyticsMashup2 and Daily Summary for this example:
12. If you have Excel open, click OK to select whether to import the data to an existing (open) worksheet or a new worksheet:
13. Click OK to populate the worksheet with the mashup data:
14. Click the File tab, choose Save As to open the Save As dialog and choose CSV (Comma Delimited) (*.csv) or Excel Workbook (*.xlsx) to save the file:
15. To make an overall comparison of the summary data with that of the WinForm client, create a graph of the Tweet Counts, Tones Positive and Tones Negative:
Note: The first valid data appears to occur on 10/27/2011, about two weeks after the Data Explorer team announced availability of the private CTP on 10/12/2011.
16. Double click the graph’s abscissa (y-axis) to open the Format Axis dialog, change its scale to logaritmic to math the WinForm client’s scale:
17. Click Close to display the reformatted graph:
18. Optionally, select and right-click the Tones Positives line, choose Format Data Series to open the Format Data Series Dialog, select the Line Color option, open the Color picker, and select Green. Do the same for the Tones Negative line, but select Red:
Conclusion: The present design of the Merge action needs considerable improvement, especially as to specifying merged projections. Otherwise, Data Explorer behaves as expected.
See my Microsoft Codename “Data Explorer” Cloud Version Fails to Save Snapshots of Codename “Social Analytics” Data post of 12/27/2011 for details of another Data Explorer problem that Microsoft is investigating.
0 comments:
Post a Comment