Saturday, November 19, 2011

My Microsoft Codename “Social Analytics” Windows Form Client Detects Anomaly in VancouverWindows8 Dataset

I just completed a new feature that enables users to store and display daily Tweet, positive Tone, and negative Tone counts, as well as average positive and negative ToneReliability values, in an Excel-compatible comma-separated values file (ContentItems.csv).
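For readers who want to reproduce this kind of file outside the Windows Form client, here is a minimal Python sketch of writing one row per day. The column names are hypothetical (this post doesn't list the actual header row), and the sample values are illustrative only, not taken from the dataset.

```python
import csv
import io

# Hypothetical column names; the real ContentItems.csv header may differ.
FIELDS = ["Date", "Tweets", "PositiveTones", "NegativeTones",
          "AvgPositiveReliability", "AvgNegativeReliability"]

def write_daily_rows(rows, stream):
    """Write a header plus one CSV line per day to any text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Illustrative values only, not real counts from VancouverWindows8.
sample = [{"Date": "2011-11-01", "Tweets": 100, "PositiveTones": 10,
           "NegativeTones": 5, "AvgPositiveReliability": 0.8,
           "AvgNegativeReliability": 0.7}]

buffer = io.StringIO()
write_daily_rows(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to a stream rather than a fixed path keeps the sketch easy to test; passing an open file handle for ContentItems.csv would work the same way.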

Updated 11/19/2011 9:30 AM PST: Rate of new Tweets added appears to be returning to normal:


Notice the added display of Days, Average Tweets/Day and Average Tones/Day.
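The three summary values can be derived from the daily rows along these lines. This is a sketch, not the client's actual code; treating Tones/Day as the sum of positive and negative Tone counts is an assumption, and the function name is hypothetical.

```python
def summarize(daily_tweets, daily_positive, daily_negative):
    """Return (days, average tweets/day, average tones/day).

    Assumes a day's tone count is its positive plus negative tones.
    """
    days = len(daily_tweets)
    if days == 0:
        return 0, 0.0, 0.0
    total_tones = sum(daily_positive) + sum(daily_negative)
    return days, sum(daily_tweets) / days, total_tones / days

# Illustrative numbers only, not values from the dataset.
days, avg_tweets, avg_tones = summarize([100, 300], [10, 20], [5, 15])
```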

Sampling Tweets about Windows 8 in DataSift’s (@DataSift) Tweet-stream indicated a rate of about two per minute (2,880/day) on 11/19/2011 at 9:00 AM:


I’m not sure why the salience (similar to Codename “Social Analytics” Tone) is positive, as indicated by the smiley, given the Tweet’s content. Nor can I tell why DataSift identified the language of rvackooij’s Tweet as French. The “ij” at the end of his alias corresponds to “y” in Dutch, and his last name is Ackooij. The image symbol indicates the author’s Klout rating; the human figure represents demographic data.

Using DataSift’s OnDemand pricing calculator with the Codename “Social Analytics” average of 180 Tweets per hour for the past 22 days as the number of DataSift Interactions per Hour, the cost per month to retrieve a similar amount of data would be about US$160:
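To put the two rates quoted above on a common footing, here is the arithmetic as a short Python sketch. The 30-day month is an assumption, and the sketch doesn't attempt to reproduce DataSift's pricing; the US$160 figure comes from the calculator itself.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption; the calculator's month length isn't stated

# Rates quoted in this post.
datasift_per_minute = 2          # sampled from DataSift's stream
social_analytics_per_hour = 180  # 22-day figure used in the calculator

datasift_per_hour = datasift_per_minute * MINUTES_PER_HOUR
datasift_per_day = datasift_per_hour * HOURS_PER_DAY
social_analytics_per_day = social_analytics_per_hour * HOURS_PER_DAY
interactions_per_month = social_analytics_per_day * DAYS_PER_MONTH

print(datasift_per_hour, datasift_per_day)
print(social_analytics_per_day, interactions_per_month)
```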


For more information about DataSift’s data analytics offering, see Leena Rao’s “DataSift Launches Powerful Twitter Data Analysis And Business Intelligence Platform” TechCrunch article of 11/16/2011.

Updated 11/18/2011 4:00 PM PST: Results for 11/18/2011 show a continuing abrupt decrease in new Tweets added to the VancouverWindows8 dataset on 11/16 and 11/17/2011, as well as a marked reduction in the ratio of positive to negative messages (highlighted below):


When you start the Microsoft Codename “Social Analytics” Windows Form Client app after having saved a ContentItems.csv file, the following message box appears:


Note: The file extension was changed to *.csv after the above capture was taken.

Clicking Yes runs the program and creates a new CSV file. Clicking No reads the existing ContentItems.csv file, populates the DataGrid control with its values, and regenerates the graph as shown here:
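The Yes/No behavior amounts to a simple load-or-refresh decision. Here it is as a Python sketch rather than the client's actual code; the function name, the `download_rows` callback, and the two-column layout are all hypothetical.

```python
import csv
import os
import tempfile

def load_or_refresh(path, replace_existing, download_rows, fieldnames):
    """Return daily rows, reusing the saved CSV unless asked to replace it."""
    if os.path.exists(path) and not replace_existing:
        # "No" branch: read the saved file instead of downloading again.
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    # "Yes" branch (or no saved file yet): fetch rows and write a new CSV.
    rows = download_rows()
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows

def fake_download():
    """Stand-in for the real DataMarket download; values are illustrative."""
    return [{"Date": "2011-11-01", "Tweets": "100"}]

with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "ContentItems.csv")
    first = load_or_refresh(csv_path, True, fake_download, ["Date", "Tweets"])
    # Second call answers "No": the downloader is never invoked.
    second = load_or_refresh(csv_path, False, lambda: [], ["Date", "Tweets"])
```

Rows read back from the file are all strings, so a real client would convert counts to numbers before populating a grid or graph.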


Notice that the Tweet count for yesterday (11/16/2011, highlighted in the DataGrid) is 819 versus an average of about 4,000 Tweets/day for the past 20 days.
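A quick way to quantify the drop is to compare the day's count with the trailing average. The sketch below uses the two figures just quoted; the 0.5 cutoff is an arbitrary assumption for illustration, not a threshold the client applies.

```python
def drop_ratio(day_count, trailing_average):
    """Return the day's count as a fraction of the trailing average."""
    if trailing_average <= 0:
        raise ValueError("trailing_average must be positive")
    return day_count / trailing_average

def is_anomalous(day_count, trailing_average, threshold=0.5):
    """Flag a day whose count falls below threshold * trailing average."""
    return drop_ratio(day_count, trailing_average) < threshold

# Figures quoted above: 819 Tweets versus about 4,000 Tweets/day.
ratio = drop_ratio(819, 4000)
flagged = is_anomalous(819, 4000)
```

With these inputs the day comes in at roughly one-fifth of the average, so it would be flagged under any reasonable cutoff.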

For reference, here’s a screen capture of the project after downloading 100,000 data rows from the Team’s Windows Azure Marketplace DataMarket site:


I’ve asked the Codename “Social Analytics” team if this reflects an actual dramatic decrease in Windows 8 buzz or is a sampling artifact. I’ll report when I learn more.