Tuesday, November 15, 2011

Microsoft Codename “Social Analytics” ContentItems Missing CalculatedToneId and ToneReliability Values

While finishing a SocialAnalyticsWinFormsSample C# application on 11/11/2011, I discovered that members of the VancouverWindows8 dataset’s ContentItems collection had begun returning null CalculatedToneId and ToneReliability values on 11/10/2011 at about 7:12:25 PM UTC.
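Pinpointing when the null values began amounts to finding the newest item that still carries a tone value. The following is a minimal Python sketch, not the sample application's C# code. It assumes that ContentItems rows have already been retrieved and loaded as dicts keyed by the PublishedOn (a datetime), CalculatedToneId, and ToneReliability property names. The function name and the sample rows are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

def last_toned_item_time(items: list[dict]) -> Optional[datetime]:
    """Return the latest PublishedOn value among items that still carry a
    CalculatedToneId, or None if every item's tone value is null."""
    toned = [item["PublishedOn"] for item in items
             if item.get("CalculatedToneId") is not None]
    return max(toned) if toned else None

# Hypothetical sample rows shaped like ContentItems entities; the tone IDs are placeholders.
sample = [
    {"PublishedOn": datetime(2011, 11, 10, 19, 12, 25, tzinfo=timezone.utc), "CalculatedToneId": 1},
    {"PublishedOn": datetime(2011, 11, 10, 20, 5, 0, tzinfo=timezone.utc), "CalculatedToneId": None},
    {"PublishedOn": datetime(2011, 11, 9, 8, 30, 0, tzinfo=timezone.utc), "CalculatedToneId": 2},
]
print(last_toned_item_time(sample))  # 2011-11-10 19:12:25+00:00
```

This produces the same answer as sorting the DataGridView's Published On column in descending order and scanning down to the first row that has a tone value.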

Update 11/15/2011 9:00 AM PST: Added a new version of the sample Windows Forms client application that displays daily Calculated Tone Positive and Negative counts. See end of post.

Update 11/14/2011 1:45 PM PST: The CalculatedToneId and ToneReliability values reappeared at about 1:35 PM PST. See end of post.

Here’s the form’s DataGridView control, sorted in descending order on the Published On column, displaying the last of 5,000 items with tone values:


CalculatedToneId and ToneReliability data are important when performing sentiment analysis or opinion mining on a particular topic, Windows 8 in this instance. Microsoft is one of the sponsors of the Opinion Mining, Sentiment Analysis, and Opinion Spam Detection project of Bing Liu and Minqing Hu at the Department of Computer Science, University of Illinois at Chicago (UIC).


According to a Tweet from Richard Orr (@richorr), the Data Analytics Team is investigating the matter now:


My (@rogerjenn) application will be completed and available for download after Microsoft’s Social Analytics Team fixes the problem.


Update 11/14/2011 1:45 PM PST: The CalculatedToneId and ToneReliability values reappeared at about 1:35 PM PST, as illustrated in this screen capture of the test application in progress:


The screen capture shows a later version of the chart that adds daily counts of Tone Positives and Negatives. The execution time values are erroneous, apparently because of a bug that causes the System.Diagnostics.Stopwatch object to reset at random intervals. I’m considering adding average tone reliability values and optional point labels to the chart.

Click the image for a 1,024 x 768 pixel version. Notice that the Calibri text font isn’t antialiased, despite AntiAliasing = All and TextAntiAliasing = High settings. Text elsewhere in the form is antialiased as expected.
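The daily Tone Positive and Negative counts, along with the average tone reliability values under consideration, amount to a group-by on each item's publication date. The following is a minimal Python sketch, not the sample application's C# code. It assumes rows are loaded as dicts keyed by PublishedOn (a datetime), CalculatedToneId, and ToneReliability, and it groups by the calendar date of PublishedOn in whatever time zone those datetimes carry. POSITIVE_TONE_ID and NEGATIVE_TONE_ID are hypothetical placeholders, so look up the dataset's actual tone ID values before using them.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical tone IDs: replace with the dataset's actual positive/negative tone values.
POSITIVE_TONE_ID = 1
NEGATIVE_TONE_ID = 2

def daily_tone_summary(items, positive_id=POSITIVE_TONE_ID, negative_id=NEGATIVE_TONE_ID):
    """Map each publication date to (positive count, negative count, mean ToneReliability).
    Items whose CalculatedToneId is null are skipped entirely."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    rel_sum = defaultdict(float)
    rel_n = defaultdict(int)
    days = set()
    for item in items:
        tone = item.get("CalculatedToneId")
        if tone is None:
            continue  # rows with null tone data contribute nothing
        day = item["PublishedOn"].date()
        days.add(day)
        if tone == positive_id:
            pos[day] += 1
        elif tone == negative_id:
            neg[day] += 1
        reliability = item.get("ToneReliability")
        if reliability is not None:
            rel_sum[day] += reliability
            rel_n[day] += 1
    summary = {}
    for day in sorted(days):
        mean = rel_sum[day] / rel_n[day] if rel_n[day] else None
        summary[day] = (pos[day], neg[day], mean)
    return summary

# Hypothetical sample rows; the last one mimics an item affected by the null-tone problem.
rows = [
    {"PublishedOn": datetime(2011, 11, 9, 10, 0), "CalculatedToneId": POSITIVE_TONE_ID, "ToneReliability": 0.8},
    {"PublishedOn": datetime(2011, 11, 9, 12, 0), "CalculatedToneId": NEGATIVE_TONE_ID, "ToneReliability": 0.6},
    {"PublishedOn": datetime(2011, 11, 11, 1, 0), "CalculatedToneId": None, "ToneReliability": None},
]
for day, (positive, negative, mean_rel) in daily_tone_summary(rows).items():
    print(day, positive, negative, mean_rel)
```

Skipping null-tone rows means that days affected by an outage like the 11/10 to 11/14 one drop out of the summary rather than appearing as zero counts. Zero counts would be easy to mistake for a genuine lack of positive or negative Tweets.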

Following is the graph for all 100,000 requested rows:


The abrupt increase in the daily Tweet count on 10/27/2011 is believed to be a data-sampling artifact.


For more details about Codename “Social Analytics,” see:

