Wednesday, November 02, 2011

Using the Microsoft Codename “Social Analytics” API with Excel PowerPivot and Visual Studio 2010

Update 1/2/2012: Following are links to other OakLeaf blog posts about Codename “Social Analytics”:


Update 11/5/2011: Microsoft’s Richard Orr said Brad Calder’s Ingestion Engine isn’t used to generate the Twitter feed. See below.

The SQL Azure Labs team announced the availability of a new Microsoft Codename “Social Analytics” program on 10/25/2011. See my SQL Azure Labs Unveils Codename “Social Analytics” Skunkworks Project post of 11/1/2011 for more details.


Table of Contents
  • Introduction
  • Testing the Windows 8 DataFeed with the Excel 2010 Power-Pivot Add-in
  • Details about a Possibly Related “Ingestion Engine” Data Source
  • Creating a Service Reference to the VancouverWindows8 Dataset in Visual Studio 2010

Introduction

The Microsoft Social Analytics Team described Getting Access to the Social Analytics Lab on 10/26/2011. Following are two sections from this post that relate to the API:

Using Social Analytics API

When you are ready to start using the Social Analytics API with data about Windows 8, start with the following steps:

1. Get your account key here. <We provide programmatic access to Social Analytics data through Windows Azure DataMarket. This step provides you with the secret (account key) linking your LiveID to the Windows 8 data>

2. Copy your account key to use in LinqPad, PowerPivot or Visual Studio.

3. Review instructions for using LinqPad, PowerPivot or C# to access your dataset <We want you to be able to use the API as part of your usual development process. Let us know what we can do to make it simpler and easier for you to achieve your social web integration goals.>

Note: You may see an “Explore this Dataset” option on the DataMarket offer page. This explorer is not compatible with the Social Analytics source and should not be used to explore the data.

For additional information, you can read the full API documentation.

Additional Resources

Here are some other links that you may find useful:

That's basically it!

If you click on one these URLs and don’t have access, go to this "shortcut" to request access.

I received my invitation key on 11/1/2011.

Testing the Windows 8 DataFeed with the Excel 2010 Power-Pivot Add-in

Here’s how I tested the API with the PowerPivot add-in for Excel 2010:

1. Download and install the x86 or x64 version of Excel PowerPivot from the download page.

image2. Launch Excel 2010, click the PowerPivot tab and the PowerPivot Window button to open the PowerPivot for Excel button.

image3. Click the Get External Data group’s From Azure DataMarket button to open the Table Import Wizard. Type Bill Gates or Windows 8 DataFeed (for this example) as the Friendly connection name and the corresponding URL in the Azure DataMarket Dataset URL text box:

Note: Project “Vancouver” was the original codename for the “Social Analytics” project.

4. Click Find to open your Windows Azure Marketplace DataMarket Account page, copy it to the Account Key text box and mark the Save My Account Key check box:

image

5. Click Test Connection to verify your entries and click OK to dismiss the Connection Succeeded message.

6. Click Next to open the Wizard’s Select Tables and Views dialog, which displays a list of the data source’s collections, and mark the Content Items check box:

image

7. Click the Preview & Filter button to open the Preview Selected Table dialog and, to improve readability, clear the checkboxes for all columns except Title, Summary, HtmlUrl, PublishedOn, CalculatedToneID, ToneReliability, ImageUrl, and ContentTypeItemId:

image 

Note: Neither Bing nor Google search returned a hit for “CalculatedToneId,” which I find strange. Following is the content of the Tones collection:

image

I’m interested in the calculation method for CalculatedToneId. Microsoft has conducted much research into “sentiment analysis,” to which CalculatedToneID is related. For more details about recent work, see the Target-dependent Twitter Sentiment Classification paper presented by Long Jiang, Mo Yu, and Ming Zhou of Microsoft Research Asia, Beijing, China and Xiaohua Liu and Tiejun Zhao of the School of Computer Science & Technology, Harbin Institute of Technology, Harbin, China.

See the later OData Messages section for the XML document behind the above preview.

8. Click OK to return to the Select Tables and Views dialog, which now displays Applied Filters in the Filter Details column:

image

9. Click Finish to open the Importing dialog:

image

10. After a few minutes, Retrieved #0,000 Rows status messages begin to appear. When all available rows are downloaded, the Importing dialog updates:

image

Note: If you click Stop Import before downloading all available rows, you might not find a table when you close the dialog.

11. Click Close to display the rows in a worksheet and adjust the column widths appropriately. Apply a sort order on a column, such as PublishedOn:

image

Note: If you apply a descending sort, you’ll notice that the data is quite up-to-date. The second row’s tweet was about 15 minutes old when I checked it after the download completed (the image below was captured about 13 minutes later):

image

Details about a Possibly Related “Ingestion Engine” Data Source

Update 11/5/2011: Richard Orr (@richorr) of the Social Analytics Team responded to my question with the following tweet on 11/5/2011:

image

Brad Calder and 26 other members of the Windows Azure Storage team presented a Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency technical paper at the 23rd ACM Symposium on Operating Systems Principles held in Cascais, Portugal on 10/23 through 10/26/2011. The “Introduction” contains a description of a real-time Ingestion Engine:

Windows Azure Storage (WAS) is a scalable cloud storage system that has been in production since November 2008. It is used inside Microsoft for applications such as social networking search, serving video, music and game content, managing medical records, and more. In addition, there are thousands of customers outside Microsoft using WAS, and anyone can sign up over the
Internet to use the system.

WAS provides cloud storage in the form of Blobs (user files), Tables (structured storage), and Queues (message delivery). These three data abstractions provide the overall storage and workflow for many applications. A common usage pattern we see is incoming and outgoing data being shipped via Blobs, Queues providing the overall workflow for processing the Blobs, and intermediate service state and final results being kept in Tables or Blobs.

An example of this pattern is an ingestion engine service built on Windows Azure to provide near real-time Facebook and Twitter search. This service is one part of a larger data processing pipeline that provides publically searchable content (via our search engine, Bing) within 15 seconds of a Facebook or Twitter user’s posting or status update. Facebook and Twitter send the raw public content to WAS (e.g., user postings, user status updates, etc.) to be made publically searchable. This content is stored in WAS Blobs. The ingestion engine annotates this data with user auth, spam, and adult scores; content classification; and classification for language and named entities. In addition, the engine crawls and expands the links in the data. While processing, the ingestion engine accesses WAS Tables at high rates and stores the results back into Blobs. These Blobs are then folded into the Bing search engine to make the content publically searchable. The ingestion engine uses Queues to manage the flow of work, the indexing jobs, and the timing of folding the results into the search engine. As of this writing, the ingestion engine for Facebook and Twitter keeps around 350TB of data in WAS (before replication). In terms of transactions, the ingestion engine has a peak traffic load of around 40,000 transactions per second and does between two to three billion transactions per day (see Section 7 for discussion of additional workload profiles). … [Emphasis added.]

From the Conclusions section:

… As our examples demonstrate, the three storage abstractions, Blobs, Tables, and Queues, provide mechanisms for storage and workflow control for a wide range of applications. Not mentioned, however, is the ease with which the WAS system can be utilized. For example, the initial version of the Facebook/Twitter search ingestion engine took one engineer only two months from the start of development to launching the service. This experience illustrates our service's ability to empower customers to easily develop and deploy their applications to the cloud. … [Emphasis added.]

My assumption is that a Windows Azure worker process could make the conversion from WAS Tables to OData’s AtomPub format. The fact that the “engine crawls and expands the links in the data” makes related entities accessible.

OData Messages

imageAll data delivered from the Windows Azure Marketplace DataMarket is formated as AtomPub documents. Here’s IE9 displaying the first part of the collections list at https://api.datamarket.azure.com/Vancouver/VancouverWindows8/:

image

Note: For more information on OData, see the OData.org site.

Here’s the first part of the payload for a request for the Tones collection (GET https://win8.social.azure.com/VancouverAuthenticatedApi.svc/Tones) shown in a Display Selected Table dialog above as provided by Fiddler2:

image

Notice the link to the CalculatedToneContentItems collection.

Creating a Service Reference to the VancouverWindows8 Dataset in Visual Studio 2010

The C# sample code for a .NET console project specifies the VancouverBillGates dataset. To change to the Windows8 dataset, and remove source code errors, do the following:

1. Open VS 2010, choose New, Project and open the C# node.

2. Select Console Application and type SocialAnalyticsExample as the Name:

image

3. Click OK to create the solution, which opens with default references and Program class.

4. Open Microsoft Connect’s Business Platform Customer Programs’ Codename “Social Analytics” page and locate the Sample Code for Visual Studio section:

image

5. Copy the code to the Clipboard, press Ctrl+A to select all source code in the Program.cs file, and press Ctrl+P to paste the new code to Program.cs. You will notice several errors due to lack of a SocialAnalyticServices.VancouverBillGates Service reference.

6. For consistency, replace both instances of VancouverBillGates with VancourverWindows8. 

7. Replace the two placeholders with your Windows LiveID and DataMarket Accound Key.

8. Copy the SERVICE_URI string to the clipboard.

9. Right-click the Reference node in Solution Explorer and choose Add Service Reference to open the eponymous dialog.

10. Paste the SERVICE_URI string to the Address text box, click Go to list the namespace’s services (entities) and type VancouverWindows8 as the Namespace name:

image

11. Click OK to add the service reference and close the dialog. Your VS 2010 window appears as follows:

image

12. Press F5 to compile and run the project. The resulting command window lists the available SocialProfileTypes (data sources):

image

Note: You can’t filter ContentItems collections by their SocialProfileType because ContentItem entities don’t have a SocialProfileType property.

Stay tuned for extended code examples.

0 comments: