Friday, September 05, 2008

Microsoft Introduces Data Mining “Cloud Services”

The SQL Server team recently made a stealth release of a technology preview of Table Analysis Tools for the Cloud during the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD08), held in Las Vegas from August 24 to 27, 2008.

The preview takes your data in the form of an Excel worksheet or a .csv file, uploads it to SQL Server 2008 Table Analysis Tools running as a Web service, and adds the analysis results to the original worksheet, a second worksheet, or a thin-client (browser) window. This description indicates to me that the application is better classified as a traditional Web service or Software as a Service (SaaS) application than as a true Cloud Computing candidate.

Here’s a promotional sample of the Excel 2007 add-in displaying a forecast in a chart:

Following are feature descriptions by Microsoft senior software engineer Bogdan Crivat from his Data Mining for the Cloud (or how I spent my summer) post of August 29, 2008:

  • Analyze Key Influencers — detects the columns that impact your target column. It presents a report of those values in other columns that correlate strongly with values in your target column.
  • Detect Categories (clustering, for data miners) — identifies groups of table rows that share similar characteristics. A categories report is generated, which details the characteristics of each category
  • Fill From Example — to some extent, similar to Excel’s Autofill feature: it learns from a few examples and extends the learned patterns to the remaining rows in the table
  • Forecasting — analyzes vertical series of numeric data, detects periodicity, trends and correlations between series and produces a forecast for those series
  • Highlight Exceptions — finds the interesting (or unusual, or out-of-ordinary) rows in your table
  • Scenario Analysis — What-If and Goal-Seek tools based on a probabilistic model built on top of your data.
  • Prediction Calculator — a tool for generating prediction scorecards
  • Market Basket Analysis — analyzes transaction tables to identify groups of items that appear together in transactions

The thin-client (browser) version doesn’t implement all the preceding features yet.
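To make the Market Basket Analysis item in the list above more concrete, here is a minimal Python sketch of the underlying idea: counting how often pairs of items appear together in transactions. It is only an illustration under stated assumptions; it is not the algorithm the Table Analysis Tools use, and the function name `frequent_pairs`, the `min_support` parameter, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support=2):
    """Return item pairs that occur together in at least min_support transactions.

    Hypothetical helper for illustration only; not part of any Microsoft tool.
    """
    pair_counts = Counter()
    for items in transactions:
        # Deduplicate and sort so (a, b) and (b, a) count as the same pair.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Made-up sample data: each inner list is one transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
]

print(frequent_pairs(transactions))
```

With this sample data, only the bread/milk and bread/butter pairs reach the threshold of two transactions. A production tool would typically go further, for example by deriving rules with confidence scores, which this sketch does not attempt.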

There’s more about the development of the tools and KDD08 in Jamie MacLennan’s KDD 2008 and Incredibly Awesome SQL 2008 Data Mining Demos post of August 25, 2008.

Brent Ozar’s SQL Server Data Mining in the Cloud post of August 27, 2008 has a detailed and fully illustrated tutorial for the Excel version using sample data from PerfMon. Brent describes the tools as “Incredibly Awesome.”

SSDS is an obvious candidate for a future data source for Data Mining Cloud Services, but it will take considerable cooperative effort between the SSDS and Data Mining teams to enable direct data transfer from SSDS to the table tool via today’s plain old XML (POX) or the future AtomPub or JSON wire formats. However, hooking up with SSDS might qualify the Web service as a true Cloud Service.

Note: Thanks to C. C. Chai for the heads-up on this new service.