Sunday, July 03, 2005

A Business Intelligence Demo with Real-World Data

The Gartner Group attributes much of the 10.3 percent growth of the total relational database market from 2003 to 2004 to new business intelligence features (data analysis, warehousing and reporting), as well as the weakening US dollar. Gartner's May 2005 "No Clear Winner in Overall RDBMS Market Share Race" report gave IBM 34.1%, Oracle 33.7%, and Microsoft 20% of the total market. Teradata, Sybase, and others claimed 2.9%, 2.3% and 6.6%, respectively. Microsoft's market share increase from 18.7% in 2003 is surprising when you consider that prospective licensees were then expecting Yukon to be released to manufacturing in mid-2004, not November 2005.

eWeek magazine's Lisa Vaas quotes Microsoft director of SQL Server product management, Tom Rizzo: "BI is a tremendous growth driver for us, especially Reporting Services, which we've seen a ton of customers buying and deploying. That's why we invested so heavily in BI technologies across SQL Server. ... We put a down payment many years ago, and now it's paying off in terms of revenue growth."

Unlike its major database competitors, Microsoft includes business intelligence integration, development, reporting and management features in the basic license fee for all editions of SQL Server 2000 (except MSDE) and SQL Server 2005 (except the Express and Workgroup editions, which do include Reporting Services).

SQL Server 2005 has a raft of new and improved business intelligence (BI) features. But the standard sample online transaction processing (OLTP) and data warehouse (DW) databases are based on the relational AdventureWorks sample database. Demonstrating the performance of the extract, transformation, and loading (ETL) features of Integration Services (SSIS, formerly Data Transformation Services, DTS) isn't practical with tables that contain only a few hundred or a few thousand rows. You need partitions containing multimillion-row dimension and fact tables to emulate the BI systems of, for example, nationwide or multinational retailers.
The table-size issue with performance testing is similar to the one I describe in my FTP Online articles about SQL Server 2005's xml data type and data encryption features.

Microsoft's Project REAL is "a reference implementation of a business intelligence (BI) system using real large-scale data from a real customer." Phase 1 of the project used source data from a large electronics retailer. Phase 2's "real customer" is Barnes & Noble, the largest U.S. bookseller, which contributed the masked source data and BI scenario. Barnes & Noble has about 40,000 employees and 800 stores in the U.S.

According to the Technical Overview for Phase 2, Project REAL's goal "is to discover the best practices for creating BI systems with SQL Server 2005 and to build a system that exhibits as many of those best practices as we can. This project is not just a demo—we are creating this system for ongoing operation. It is a complete system, including daily incremental updates of the data, large multiuser workloads, and system monitoring."

The Technical Overview is the first in a promised series of articles that will be based on the B&N source data and analytical model. Hopefully, the Project REAL team will also provide dynamic, online demonstrations of simulated ad hoc and preprogrammed BI reports. Making Project REAL's warehoused data and Reporting Services accessible to developers through Web service methods, similar to those for TerraServer or MapPoint maps, would be a major SQL Server 2005 marketing coup.

Hey, Tom Rizzo—are you listening?

--rj