Wednesday, November 16, 2005

Google Base and Bulk Uploads with Microsoft Access

I learned that Google Base went live last night from a soundbyte on my local NBC TV affiliate's (KNTV) 5:00 am local news. So I checked Memeorandum for the details, read the Google Base announcement, created a Google account, read a few help files, looked over a few sample tables, and decided to try a simple Bulk Upload operation from a tab delimited *.txt file.

Access 11+ offers the simplest, rapid-application-development (RAD) approach to creating and exporting tables in a variety of formats, including tab-delimited text files. Creating a custom table requires running a make-table query that aliases field names to Google's required standard attribute names—such as title and id—optional standard attribute names, like currency and quantity, and custom attribute names. Click here for a list of standard and optional attribute names and value restrictions.

I found a few gotchas in the tab-separated file bulk upload process, which Google describes only for Excel worksheets, so here's a preview of a bulk-uploaded product list. An outline of the process for generating and uploading a NwindProducts.txt file that you create from a Make-Table query against the Northwind Categories, Products, and Suppliers tables follows the preview.

Update: Google Groups has a Google Base Help Discussion group with a few messages in Basics, Bulk Uploads, Individual Items, Search Results, and Miscellaneous categories. Bulk Upload Preview For a preview of the initial list of NwindProducts items in the Google Base UI, click here to search for the two Sir Rodney's products, which opens the following page:

Click the Sir Rodney's Marmalade link or click here to open the details page shown here:


Click the All Items by OakLeaf Systems link or click here to display a list of all the uploaded items, which has links to detail pages similar to the preceeding. OakLeaf's authorid attribute value is 1016205.

Beta Version Issue: The source of the 44 results count is a mystery—perhaps a Beta version problem related to publishing. Overnight, the number of results increased to 77 because Google Base doesn't delete items with 'id' values that aren't included in subsequent bulk updates.


Hopefully, problems shown above with high-order ANSI characters in manufacturer attribute values will be corrected in future versions.

Completing Your Personal Profile
If you have or create a Google account, which you need for most Google software-as-a-service (SaaS) applications, you'll probably find it worthwhile to add the additional default attribute values that apply to Google Base only. The following screen capture displays the fields common to most Google SaaS apps:


With the exception of the Web Site field, the Google Base-specific fields and their default values for the Products and related Item Types are shown here:



Notice the integration of Google Maps with the Item Location attribute value.

Creating the Tab-Delimited File with Access 11
1. Create a SELECT query with column aliases to corresponding standard attribute names, as shown here for three of the eight columns of the MakeNwindProducts query:


Notice that USD is a constant for all currency cells. Here's the Jet SQL statement for the Make-Table version of the SELECT query:




SELECT Products.ProductName AS title,
   Categories.CategoryName AS product_type,
   Products.QuantityPerUnit AS description,
   Products.ProductID AS id,
   Format(UnitPrice, "#0.00") AS price,
   "USD" AS [currency],
   Products.UnitsInStock AS quantity,
   Suppliers.CompanyName AS manufacturer
INTO NwindProducts
FROM Suppliers INNER JOIN (Categories
   INNER JOIN Products ON
     (Categories.CategoryID =
      Products.CategoryID) AND
     (Categories.CategoryID =
      Products.CategoryID)) ON
     (Suppliers.SupplierID =
      Products.SupplierID)
WHERE (((Products.Discontinued)=No))
ORDER BY Products.ProductID;

2. Execute the Make-Table query to create the NwindProducts table, which should appear as follows:

3. Open the NwindProducts table in Design mode, select the currency field, and set the Unicode Compression property value to Yes, as shown here:

Warning: If you don't make this change, the USD constant will appear in the exported text file as six Unicode bytes, and Google Base will reject your upload.

4. With the NwindProducts table selected, choose Export to open the Export 'NwindProducts' To ... dialog, select Text Files (*.txt; *.csv; *.tab; *.asc) in the Save as Type list, navigate to the appropriate folder in which to save the file —C:\GoogleBase for this example—and type NwindProducts.txt as the file name. Click Export to open the Export Text Wizard.

5. Accept the default Delimited option and, if you haven't created an Export Format previously, click Next, select the Tab option, mark the Include Field Names on First Row option, and select {None} as the Text Qualifier, as shown here:


Warning: If you don't make the preceding changes, your upload will fail.
Click Next.

6. Click Advanced to open the NwindProducts Export Specification dialog, and accept the previously set values for File Format, Field Delimiter, and Text Qualifier; and the the default Language (English) and Code Page (Western European - Windows). Set the Date Order to YMD, the Date Delimiter to - (hyphen), mark the Leading Zeros in Dates check box, and accept the remaining defaults, as shown here:



Caution: The Google help file states that bulk import accepts UTF-8, Latin1, or ASCII encoding. The Export Specification doesn't include a Latin1 option, but the default Western European (Windows) code page is the equivalent of Latin1. Selecting the UTF-8 code page causes incorrect display of characters with diacriticals in the Beta version; selecting US-ASCII omits diacriticals.

Note: The date settings create date values that comply with the XML Schema date datatype—such as 2005-09-05. The sample table doesn't include Jet DateTime fields.

7. Click Save As to open the Save Import/Export Specification input box, change the default Name to NwindProducts Export to Google Base and click OK twice to save the specification. Click Next, verify the path and file name for export, and click Finish to save the tab-delimited file.

8. Open the file in Notepad and verify that the currency field doesn't include a $ prefix and USD appears in the currency column, as shown here:



Uploading the Tab-Delimited File to Google Base

9. From Google Base's home page, click the Bulk Upload Files link

10. If want to set up an FTP account, follow the Google instructions for obtaining an account for the Google Base FTP server. An FTP account isn't required for the Direct Upload default .

11. If you haven't previously uploaded a file, open the Specify a Bulk Upload File page and type NwindProducts.txt as the file name, select Products as the item type, and accept the default English and US Dollar selections, as shown here:



Beta Version Limitation: You can only upload a single feed from a Google account.

12. Click the Specify Bulk Upload File button to specify the file to upload.

13. Click the Upload and Process This File link. If the tab-delimited file's data format and content is correct, the following status page opens:


14. Return to the Google Base home page and your uploaded items appear as shown here (after a few hours):



Hopefully a Beta Version Problem: It takes an inordinately long time to fully publish a compact list of 69 items (the 77 items appears to be an interim count). After about four hours, only six of the items gained 'Published' status. Overnight all 77 items became 'Published.' Is Google Base running on Sun Servers by any chance?

Definite Beta Version Problem: The "How do I delete my item?" help topic states: "If you're updating your items via bulk uploads, simply delete the item(s) from your bulk upload, then submit the bulk upload again." Deleting the 8 discontinued items from the tab-delimited file did not delete them from the list. Discontinued items—such as Alice Mutton or Perth Pasties—appear with a modified time of 12:23 p.m. All non-discontinued items—such as Sir Rodney's Scones—have 1:49 p.m. as the modified time.

15. To edit an entry, click the Edit link for an item that has 'Published' status to display the autogenerated editing form, as partially shown here:



16. To display the following list of [some] OakLeaf entries, clidk the All Items by OakLeaf Systems link:


Hopefully a Beta Version Problem: The source of the 44 search results count is a mystery.

Initial Conclusions
Google Base's beta incarnation doesn't appear to me to be a serious competitor to eBay, CraigsList, QuickBase, or other forms-based database applications offered as SaaS Web apps. The apparent lack of ability to create multiple named tables having identical item_type attributes is a serious limitation, as is the inability to make lists accessible only to specific Google or other user accounts, group roles, or the like.

Other Google Base watchers have posted entries with similar reservations about the immediate utility of the service other than for classified advertising listings. As time permits, I'll add additional links to comments regarding Google Base's initial commercial viability. In the meantime, check this later Memeorandum page snapshot. Stay tuned for the Google Base blog, which currently requires a login ID and password—as did Google Base itself just before the beta went public.

If Google lifts is ban against multiple Bulk Updates from a single account, I'll run a similar test with SQL Server 2005's Adventure Works product listings and SQL Server Integration Service's (SSIS) text export feature.

Initial tests with an Atom 0.3 (Atom.xml) file generated by Blogger for the OakLeafBlog, saved as a local XML file with FireFox 1.5 RC2, and Bulk Uploaded as the Reference Articles item type showed several problems. The description attribute contains HTML markup and error messaages state that the value is limited to a maximum of 10,000 characters. Thus only the shorter OakLeafBlog articles publish to the list; HTML markup contributes substantially to description length.

Help Center's "What do I include in 'Description'?" topic says "Please ensure that the description does not contain any HTML as we don't currently recognize or display HTML tags in your item." Help Center also says the maximum description length is 1,000 characters. Neither statement appears to be true.
I also plan to give XML files and RSS 2.0 feeds a try with bulk uploads. Click here for a detailed list of all Google Base XML attributes.

Technorati:

Tuesday, November 15, 2005

Ray Ozzie's "Internet Service Disruption" Memo and Microsoft Access

Dave Winer published copies of Bill Gates' October 30, 2005 internal e-mail message and Ray Ozzie's memo of October 28, 2005 on November 9, 2005. Bill Gates' "Internet Software Services" e-mail laid the historical foundation for Ray Ozzie's initial definition of "The Internet Services Disruption" to Microsoft's increasingly ossified weltanschauung. Ray Ozzie's memo attempts to instill leaders of Microsoft's three new divisions—Platform Products & Services, Business, and Entertainment & Devices—with a "'services-enhanced software' mindset." Internal distribution of the message and memo preceded the pair's November 1, 2005 announcement of Windows Live and Office Live at a San Francisco flackfest, and followed by a few days the brief exposure of Google Base on October 25, 2005. Update 11/21/2005: Mini-Microsoft analyzes Ray Ozzie's memo in "A Disruptive Defrag for Microsoft." The beta version of Google Base went live in the evening of November 15, 2005, and Ray Ozzie started version 3 of his blog at MSN Spaces on November 16. A November 20, 2005 post ("Really Simple Sharing") describes a proposed RSS extension for content synchronization specification called Simple Sharing Extensions (SSE). The SSE FAQ is here. Microsoft is releeasing the specification under the Creative Commons Attribution-ShareAlike license. View the Memeorandum 11/21/2005 page snapshot.

Update 12/7/2005: ComputerWorld's Robert L. Mitchell criticizes a pre-beta version of Office Live in his "Office Live: The demo" post. Microsoft responds in the comments.

Update 12/11/2005: The New York Times' Steve Lohr writes in "Can This Man Reprogram Microsoft?"—"The man whom Mr. Gates is counting on to make a difference is Ray Ozzie, a soft-spoken 50-year-old who joined the company just eight months ago. He has the daunting task of galvanizing the troops to address the Internet services challenge, shaking things up and quickening the corporate pulse." ZDNet's Dan Farber comments in "Reprogramming Microsoft and the GoogleNet PC"—"Steve Lohr's New York Times article 'Can this man reprogram Microsoft' doesn't offer much that hasn't already been endlessly reported about Ray Ozzie's background and mission to bring the services economy to Microsoft. He does bring up Google's discussions with thin-client system maker Wyse Technology: 'The discussions are focused on a $200 Google-branded machine that would likely be marketed in cooperation with telecommunications companies in markets like China and India, where home PC's are less common, said John Kish, chief executive of Wyse.'"

Update 4/19/2006: David Kirkpatrick's "Microsoft's new brain" article for Fortune (April 18, 2006) has this deck: "Brutal competition. A stock going nowhere. Microsoft is in crisis, so Bill Gates has unleashed his new hire, software genius Ray Ozzie, to remake the company - and conquer the Web." The article describes in depth a June 2005 meeting of the top 15 Microsoft executives run by recently hired Ray Ozzie. The article claims that "Put simply, Ozzie's assignment is to Webify everything: To intertwine Microsoft's entire product line - software for consumers, software for businesses, Xboxes, all of it - with the vast and ever-growing power of the Net. 'Everything we do should have a presence on the Web,' Ozzie says." Adam Green says here that "Everyone would have to admit that with Google Base turning out to be the world's biggest RSS database and Ray Ozzie announcing Microsoft's synchronization and replication protocol based on RSS, Dave Winer is having the best week ever!. ... Microsoft and Google are being maneuvered into a massive game of chicken. I'll show everyone my Office data if you'll show your search data, and Dave is instigating it." Google Base, Access, SharePoint and Office Live Apparently, Google Base got Ray Ozzie's attention directed to online databases. His memo mentions Access twice, first in the context of "lightweight development":

The rapid growth of application assembly using things such as REST, JavaScript and PHP suggests that many developers gravitate toward very rapid, lightweight ways to create and compose solutions. We have always appreciated the need for lightweight development by power users in the form of products such as Access and SharePoint. We should revisit whether we’re adequately serving the lightweight model of development and solution composition for all classes of development. [Emphasis added.]
And next in a paragraph labeled "rapid solutions":
How can we utilize our extant products and our knowledge of the broad historical adoption of forms-based applications to jump-start an effort that could dramatically surpass offerings from Quick[B]ase to Salesforce.com? How could we build it to scale to hundreds of millions of users at an unimaginably low cost that would change the game? How could we re-shape our client-side software offerings such as Access and Groove, and our server offerings such as SharePoint, to grow and thrive in the presence of such a service? Could these rapid solutions encourage a new ISV ecosystem and business model? [Emphasis and link added].
Notice the association of Access with SharePoint in both topics. Integration of Access tables with Windows SharePoint Services (WSS 3.0) lists appears to be one of Access 12's major "new" features. (Access 2003 already includes basic WSS 2.0 list import, export, and linking capabilities.) Microsoft Watch's Mary Jane Foley reported on September 29, 2005 that Kevin Johnson, co-president of the Platform Products & Services division, "... hinted there might be another managed service SKU around SharePoint Portal Server in the works." InformationWeek's Barbara Darrow speculated on October 26, 2005: "Within a year, the Redmond, Wash.-based company plans to offer hosted implementations of SharePoint as well as CRM and ERP applications, several sources said." Darrows quotes Bill Gates from a September 15, 2005 interview in Computer Reseller News:
"SharePoint today runs primarily on premises. We have some partners who are doing hosted SharePoint. We are looking at what our role is in helping people with SharePoint,” Gates said [at] Microsoft’s Professional Developers Conference in Los Angeles. “So technologically, the server equals service thing year by year is making good progress."

Update 11/29/2005: Microsoft Watcher Mary Jo Foley says in "Services Microsoft Should, Could (and Just Might) Develop"—"There's no way that Microsoft is going to leave the online classified market to Google & Co. There has to be a GoogleBase competitor in Microsoft's line-up. (Anyone out there heard a codename?)" Micro Persuasion's Steve Rubel claims "Google, Microsoft to Go Hard After Classifieds," and provides a link to Yardley.ca's "Microsoft’s ‘Fremont’ a Craigslist competitor" and TechCrunch's "Microsoft “Fremont” to Launch" items.

The obvious question is how Microsoft can scale and monetize a free (or very low-cost), multi-tenanted, Internet-based version of intranet-oriented WSS 3.0 or SharePoint Portal Server.

The QuickBase Model for Google Base

Ozzie's mention of Intuit's QuickBase in the second topic is significant when viewed in the light of Google's potential entry into the online database business. QuickBase began life as a consumer-oriented, forms-based application for sharing database content on the Internet. Lack of consumer interest in paying $14.95/month for up to 15 databases caused QuickBase to morph into a pricey WSS act-alike for corporate workgroups (US$249/month, minimum, for up to 10 users, 5 MB of data, and 100 MB of attachments.) New Access 12 database templates are likely to mimic packaged QuickBase project management, customer service, professional services, marketing, legal, real estate and IT management applications. WSS 3.0 includes new, lightweight project-management features. Project management is one of QuickBase's most popular pre-built applications, and 37signals offers Basecamp—a Web-based project management app—for US$12 - $99 per month. According to Microsoft's "A discussion of what's new in Access 12" blog, the Access 12 templates are "compatible with SharePoint, so users who want to build collaborative apps on SharePoint with them can do so easily."

Note: BusinessWeek's Rob Hof wrote a brief—but glowing—review of Basecamp ("Teamwork, Supercharged—We test one of the latest Web-based management tools: Basecamp") for the November 21, 2005 issue's cover story, "The Web Smart 50." The Richard Bird of R.BIRD & Company, Inc. mentioned in the review has an About Design blog. Basecamp was the first Ruby on Rails project from which 37signals extracted the Rails Web framework. You'll find Rob's interview of 37signals' founder—"37 Signals, 1 Clear Message—CEO Jason Fried's startup philosophy can be summed up in three short words: Keep it simple." (BusinessWeek links might require registration/subscription.") I'm betting that lightweight project management will become the first poster child for a surfeit of low-cost, ad-supported, or hybrid SaaS. Initial Google Base standard data entry and display forms cover course schedules, events and activities, housing, jobs, news and articles, people profiles, products, recipes, reference articles, reviews, services, travel, vehicles, and wanted ads. But Google claims to offer the ability to define your own table fields and design custom forms, much like Access projects, WSS custom lists, or QuickBase databases. Here's what Google says about Google Base content:

Post your items on Google. Google Base is Google’s database into which you can add all types of content. We’ll host your content and make it searchable online for free.

Examples of items you can find in Google Base: • Description of your party planning service • Articles on current events from your website • Listing of your used car for sale • Database of protein structures

You can describe any item you post with attributes, which will help people find it when they search Google Base. In fact, based on the relevance of your items, they may also be included in the main Google search index and other Google products like Froogle and Google Local.

There's no question that Google Base represents a serious competitive threat to QuickBase, CraigsList, eBay, newspaper classifieds, and—potentially—an online version of Access delivered as an Internet service, WSS reincarnated as an Internet "services platform," or both.
PBS—not InfoWorld—columnist Robert X. Cringely discusses Google Base in his November 17, 2005 piece, "Google-Mart: Sam Walton Taught Google More About How to Dominate the Internet Than Microsoft Ever Did":
With the advent of widespread GoogleBase (again a bit-schlepping app that can be used in a thousand ways -- most of them not even envisioned by Google) there's suddenly a new kind of marketplace for data with everything a transaction in the most literal sense as Google takes over the role of trusted third-party info-escrow agent for all world business. That's the goal. ... Microsoft can't compete. Yahoo probably can't compete. Sun and IBM are like remora, along for the ride. And what does it all cost, maybe $1 billion? That's less than Microsoft spends on legal settlements each year.
Salesforce.com and "Intelligent Reaction" Ozzie mentions Salesforce.com in the same sentence as QuickBase when he implores Microsoft executives to "... jump-start an effort that could dramatically surpass offerings from Quick[B]ase to Salesforce.com." It's an interesting coincidence that Adam Bosworth, who was responsible for designing and delivering Access 1.x, is Google's Engineering VP and, according to ZDNet's Dan Farber, a "long-time friend" of Salesforce.com. Farber's "Bosworth: Intelligent reaction, not intelligent design" blog entry summarizes Salesforce.com's (and Google's) approach to incremental application development and contrasts it with (by implication Microsoft's) ponderous "five-year plans" that reek of "state socialism" in application and API design. Bosworth mentioned in his recorded presentation to Salesforce.com's September 2005 Dreamforce developer's conference that only 20% of Microsoft's original use models for Access turned out to be correct. Bosworth implied that lack of early user input and reaction to use models was responsible for Access 1.0 taking two years to release. No matter what apps Microsoft offers in its initial Office Live beta, user input from frequent Community Technical Previews might prevent the generally poor marks assigned to the Windows Live beta by many initial users. Ray Ozzie would have done well to recommend—or demand—that all Microsoft "Executive Staff and direct reports" listen to Bosworth's "Intelligent Reaction" presentation. Conclusion Ozzie's omission of references to other, more popular Microsoft Office members—Outlook, Word, and Excel (PowerPoint gets a peripheral mention)—and his emphasis on Access and SharePoint indicates to me that an on-line database or list manager with an Access or SharePoint-like front end might be one of Office Live's first offerings. Let me know in the comments what apps you believe are in store for the Office Live beta. Questions of the day: Would you—or your organization—entrust Microsoft or Google with your business data? Would it be worth encrypting your confidential data to gain the benefits of software as a service (SaaS)? --rj

P.S. [Updated 11/13/2005] Read Chris Church's comment (1) for more insight on Ray Ozzie's reference to "forms-based applications" and the future of Office Live. Chris concludes:

I expect Microsoft will continue to beef up the InfoPath and SharePoint connection, and continue to add forms and form-like capabilities to SharePoint itself, while on another front begin providing problem/process specific mini-solutions in Access. By leveraging what customers already have they can continue corralling the market while they improve process management, workflow, and content management functionality. I think the amalgamation of much of this will debut in a very limited form in Office Live. Or, I could be wrong (grin).

Technorati: Other Links: PBS columnist Robert X. Cringely's "Its Deja Vu All Over Again: We've Seen Microsoft's New Live Strategy Before" article has an iteresting take on the press conference and "Paper War: Microsoft Is Leaking Internal Documents to Make Us Think They Have a Plan" lambastes the memo and message. Mini-Microsoft takes on the Live! demo fiasco and offers links to other "interesting blog posts" about the event. eWeek's John Pallato weighs in with an op-ed piece, "Microsoft Memos Reveal On-Demand Anxiety," David Coursey claims "Microsoft 'Live Era' Meets Dead Air" and asks "What Can Ray Ozzie Achieve in Redmond?", and Darryl K. Taft claims "Microsoft's Strategy Memos Don't Tell All," citing several well-respected industry analysts who agree. InfoWorld's Ephraim Schwarts says "Microsoft is Stuck on the C: Drive: Despite its new service offerings, Redmond will have a hard time transitioning from the desktop software model." Vnunet.com's Tom Sanders reports that "Google Base goes far beyond an Ebay competitor" and quotes Gartner VP, Whit Andrews: "If it were me, I would be creating an online database such that that online database could feed other online applications." Forrester's Charlene Li unexpectedly gives Microsoft's SaaS press conference the high-five and expects Google to tie Blogger and Google Groups into Google Base. You can read an official Intuit blog about new QuickBase features in the upcoming Fall Release.

Thursday, November 10, 2005

"Expert One-on-One Visual Basic 2005 Database Programming"

I've completed the final sample code review of Expert One-on-One Visual Basic 2005 Database Programming for WROX with the Visual Studio 2005 RTM bits (build 8.0.50727.42) and its companion SQL Server 2005 RTM (version 9.0.1399).

Update 12/19/2005: The book was published today.

Update 1/6/2006: I've added an Amazon Connect Author "Blog" to the book's detail page on Amazon.com. The posts will include notices of sample source code updates and errata. See my "A First Look at Amazon Connect Author 'Blogs'" post for some issues I encountered when posting to Amazon Connect.

Update: 1/17/2006: Added a review of Tom Rizzo's Pro SQL Server 2005 title by InfoWorld columnist Sean McCown (click the General Database link) and corrected Tom's current job description at Microsoft.

Following are the book's parts (bold) and chapters:

I. ADO.NET 2.0 Basics 1. Migrating from ADO to ADO.NET 2. Introducing New ADO.NET 2.0 Features 3. Adopting Best Practices for Data-Centric Projects II. Data Binding in Windows Forms and Controls 4. Programming TableAdapters, BindingSources, and DataGridViews 5. Adding Data Validation and Concurrency Management 6. Applying Advanced DataSet Techniques III. Data Binding in ASP.NET 2.0 7. Working with ASP.NET 2.0 DataSources and Bound Controls 8. Applying Advanced ASP.NET 2.0 Data Techniques 9. Publishing Data-Driven Web Services IV. SQL Server 2005 and ADO.NET 2.0 10. Upgrading from SQL Server 2000 to 2005 11. Creating SQL Server Projects 12. Exploring the xml Datatype

Tom Rizzo, Microsoft's former Director of Product Management for SQL Server, is the book's technical editor. Tom is one of several authors of Apress's recently published Pro SQL Server 2005. (Tom is now running product management for the SharePoint Portal Server team.)

--rj

Technorati:

Wednesday, November 09, 2005

Adam Bosworth: "Learning from the Web" and Google Base

Microsoft XML guru Dare Obasanjo summarizes Adam Bosworth's "Learning from the Web" article posted in the Association for Computing Machinery (ACM) Queue, and concludes:

The article ends by arguing that database vendors should add native support for the Atom Protocol and wire format. I find this interesting since based on conversations on the atom-protocol list, it is clear that Google is very interested in the Atom API. Perhaps they have already built this Atom store that Adam is arguing for and will expose the Atom API as a way to interact with it. Perhaps this Atom store accessible via Atom feeds and the Atom API is Google Base? Speculation is fun.
Note: The Google Base URL (base.google.com) now opens a login form. Click here for Business Week's initial Google Base article and here for early Google Base screen captures.

Update: 11/25/04: Google Base is now live and is accepting uploads in tab-separated-value (TSV), XML, Atom 0.3, and RSS 1.0/2.0 formats.

Craig Ogg's Software Voices blog offers similar speculation in a "Did Adam Bosworth Reveal the Real Google Base at the MySql Users Conference" post. Gordon Gould, Craig's business parter, also ruminates on Google's intentions for Base. The abstract of Adam's paper, "Database Requirements in the Age of Scalable Services," contains a link to an audio recording of his presentation. Adam distinguishes information stored in a database from content stored in Web pages, and suggests that databases with integrated query processors won't scale to Google's requirements—i.e., a billion or more queries per day. Note: Adam Bosworth is Google's Vice President of Engineering. He and Brad Silverberg were founders of Analytica, which Borland purchased to provide the foundation for its Quattro Pro spreadsheet. While at Microsoft, he was responsible for designing and delivering Microsoft Access 1.x and the HTML rendering engine for Internet Explorer 4.0. Subsequently, he, Tod Nielsen (Access's first and foremost marketing guru), and other Microsoft employees founded Crossgain, which BEA acquired. Borland recently appointed Tod Nielsen as CEO, after his brief stint as an Oracle VP. Bosworth's article makes the following two points about database scaling and caching:

3. Have databases enabled people to harness Moore’s law in parallel? This would mean that databases could scale more or less linearly to handle both the volume of the requests coming in and even the complexity. The answer is no. 4. Do databases optimize caching when it is OK to be stale? No.
Obviously, Google Base requires the capability to scale-out and cache slowly-changing or unchanging "catalog" data. SQL Server 2005 Enterprise Edition's partitioning and new Scalable Shared Databases feature addresses point 3. All except Express editions' capabilitity to invalidate ASP.NET 2.0 page caches with Query Notifications answers "Yes" to point 4. Two other points Bosworth raises have the potential to be solvable with SQL Server 2005's native XML data type and support for XQuery, full-text search, or both:
5. Do databases handle flexible graphs (or trees) well? No, they do not. 6. Have the databases learned from the Web and made their queries simple and flexible? No, just ask a database if it has anyone who, if they have an age, are older than 40; and if they have a city, live in New York; and if they have an income, earn more than $100,000. This is a nightmare because of all the tests for NULL.
The scalability of SQL Server 2005's XQuery implementations, even with XML indexes, remains to be seen. However, I wouldn't expect to be able to search Google Base with XQuery expressions. Bosworth concludes his article with this paean to Oracle on page 5:
Oracle has done a remarkable job of adding XML to its database in the various ways that customers might want. In so doing, it has added a lot of these capabilities. Its ROWID type allows some forms of flexible linkage. But none [of the database vendors] really show that they have learned from the Web.
I don't understand how a ROWID pseudocolumn can provide a better form of "flexible linkage" for XML documents than an auto-incrementing (int identity) primary key. It's my opinion that SQL Server 2005's XML data type will support RSS 2.0/Atom documents and do so more easily and efficiently than Oracle 10g. But accessing a production database with an RSS 2.0/Atom wire protocol appears to me to have serious security repercussions and performance problems. Finally, the jury's still out on the issue of whether SQL Server 2005's xml columns will partition and scale to Googlesque requirements. --rj Technorati:

Monday, November 07, 2005

New SQL Server 2005 Features Announced November 7

Microsoft announced two new SQL Server 2005 features in conjunction with the November 7, 2005 San Francisco SQL Server 2005/Visual Studio 2005/BizTalk Server 2006 release event: Scalable Shared Databases for SQL Server 2005 Enterprise Edition According to Knowledge Base article 910378:

The scalable shared database feature allows you to attach a read-only reporting database to multiple server instances over a storage area network (SAN). A reporting database is a read-only database that is built from one or more production databases that are used exclusively for reporting purposes. To be made into a scalable shared database, a reporting database must reside on one or more dedicated, read-only volumes whose primary purpose is to host the reporting database or a coordinated set of reporting databases. These volumes are known as reporting volumes.

SQL Server Management Studio Express (SSMSE) for SQL Server 2005 Express SSMSE is the November Community Technical Preview (CTP) for the promised SQL Server Express Manager replacement. According to the SSMSE Overview:
Microsoft SQL Server Management Studio Express (SSMSE) provides a graphical management tool for SQL Server 2005 Express Edition (SQL Server Express) instances. SSMSE can also manage relational engine Instances created by any edition of SQL Server 2005. SSMSE cannot manage Analysis Services, Integration Services, SQL Server 2005 Mobile Edition, Notification Services, Reporting Services, or SQL Server Agent.
Here's a capture of SSMSE after executing the Instnwnd.sql script to create the Northwind sample database from SQL Server 2000.

You can't run the SSMSE setup program if you've installed the full version of SSMS; the two products won't run side by side. Click here to view the SSMSE readme.

--rj

Technorati:

P.S. Steve Ballmer announced at the San Francisco release event that Visual Studio 2005 Express editions would be available for free download from the Microsoft Web site until November 7, 2006. More details, including download links and rewards for registering the downloads, are available on Dan Fernandez' blog.

Sunday, October 30, 2005

OpenOffice 2.0 Base Matches Microsoft Access? Surely You Jest.

Updated 10/30/2010: See end of post.

eWeek's October 21, 2005 "OpenOffice Packs a Powerful New Database Punch" article by Lisa Vaas opens with the following lead: "OpenOffice.org's latest update includes a database that matches Microsoft's popular and competing Access database, experts say." If you believe this sentence, I have a bridge to sell you.

Full Disclosure: I make a significant part of my living by writing books about and consulting on Microsoft Access and other Microsoft databases, development platforms, and operating systems.

I've been testing the RC2 and release version of OOo 2.0's Base feature with registered (i.e., attached or linked) SQL Server 2005/Express tables, plus Access 2003 (Jet) *.mdb and comma-separated-value (*.csv) files for the past 10 days or so. My first two sets of tests, which I've documented here and here, indicate that OOo 2.0 Base isn't even close to competitive with Access 2000+ as a desktop database development platform.

The only "expert" Ms. Vaas quoted—who wasn't an OpenOffice.org member—was RedMonk analyst Stephen O'Grady. Mr. O'Grady, however, admitted: "I have not evaluated [Base] to the extent that I can call it 100 percent at functional parity with Access". I can find no evidence at all in the article that any "expert" said or even implied that "OpenOffice.org's latest update includes a database that matches Microsoft's popular and competing Access database ...".

The Details

Ms. Vaas quotes Mr. O'Grady as saying, "Both large and small businesses have investments in Access as a database. From that standpoint, the capabilities in previous versions of OpenOffice.org just didn't cut it. They couldn't help businesses make the transition."

OOo Base 2.0 still won't "help businesses make the transition" with existing Access applications that include Access macros or VBA code, have crosstab queries, or use complex forms and reports. You'll need to manually convert Access macros and VBA code to OOo macros in the OOo Basic dialect, where possible. Base uses OOo Writer forms, which raise fewer events than Access data entry forms. Form controls also raise fewer events than their Access counterparts.

Data Entry Forms

You can't import Access forms to OOo Base. The OOo 2.0 Base Form Wizard lets you create a conventional single-table or master-details form. This master-details form displays a Northwind Orders row and its Order Details line items:

After you create the basic form design with the Wizard, you can customize it with common form controls (widgets). It's a major project to create a navigation form that's equivalent to one you can produce in a few minutes with Access's built-in Switchboard manager.

Formatted Reports

Nor can you import Access reports. The OOo 2.0 Base Report Wizard creates reports with HTML-like tables to display column names and values. There's no evident method for adding subreports, which are critical for most business applications. The following was about the best I could do with the Report Wizard and a complex query from the AdventureWorks 2005 database:

Access provides a Report Chart Wizard to generate reports that include charts whose data source is a table or query. I wasn't able to find a direct method of adding such a chart to an OOo Base report.

HSQLDB Files Are NOT Stored as XML

Ms. Vaas states on the second page of her article, "The full version of the HSQL database engine included in OOo Base stores data in XML files." HSQLDB stores data in a proprietary binary format as a data file that's stored in a compressed (ZIPped ) *.odb file.

Here's a capture of the first few records of the Northwind Customers table from the 1.5-MB decompressed data file that contains all tables from the Northwind sample database:

Stating that HSQLDB stores data in XML files seems to me to be a gratuitous attempt to represent HSQLDB storage as similar to the OpenDocument 1.0 XML format for other OOo documents. There is no way that anyone who had investigated HSQLDB's storage format could confuse the binary format with XML.

If OOo Base can incorporate proprietary binary formats in what are purported to be OpenDocument 1.0-compliant files, why not incorporate *.mdb, *.doc, *.xls, or *.ppt binary images in corresponding *.od? files?

Embedded HSQLDB Does NOT Have JDBC (or ODBC) Support

eWeek's Steven J. Vaughan-Nichols says in his October 20, 2005 "Why OpenOffice.org 2.0 Is Your Best Choice" column: "OpenOffice.org's Base Java-based HSQLDB database engine is a solid database. Its real selling point to me is that it has both good SQL and JDBC (Java Database Connect) support. In short, while I still look to FoxPro for PC-based databases, I can see using OpenOffice.org 2.0 to model a serious, server-based DBMS. The idea of even attempting that same job with Microsoft Access makes me ill."

OOo Base uses the embedded version of HSQLDB that stores its data in the *.ods file. According to Frank Schoenheit of Sun Microsystems Germany, there are no current plans to implement either a JDBC or ODBC driver for the embedded version. Embedded HSQLDB isn't a server-based DBMS and, according to Schoenheit, "is not capable of concurrent multi-user access."

Mr. Vaughan-Nichols offers no explanation why using Access as a desktop database development application makes him "ill." I find Access to be well-suited to creating sophisticated database front ends with minimal effort and development investment. He also mentions that OOo 2.0 lacks a grammar checker but doesn't mention OOo 2.0's omission of a Microsoft Outlook clone.

Query Designer and Relations(hips) Windows

OOo Base's Query Designer window is a da Vinci database tools clone. SQL Server 2000 Enterprise Manager, SQL Server 2005 Management Studio, Visual Studio 2005, and Access data projects (*.adp) use the da Vinci tools.

Here's an example of an T-SQL query against the AdventureWorks database running under SQL Server 2005 Express Edition:

The preceding query serves as the data source for the earlier report sample.

Here's an example of a graphical design for the preceding query:

On the whole, I found OOo Base's query design features to rival those of Access data projects. They probably are an adequate substitute for Access's default query-by-example (QBE) designer. Just don't try executing a Jet crosstab query with PIVOT and TRANSFORM keywords. SQL syntax error messages are cryptic, at best.

The Relations(hips) window also clones the Access version and enables specifying relations between primary and foreign key fields. This screen capture shows the relationships between the four AdventureWorks tables:

The Relations(hips) designer persists the table icons, but not the relation lines for SQL Server 2005 tables. However, the designer does save Jet database relationships correctly.

The Table Wizard for speeding creation of common table designs successfully duplicates the appearance and operation of Access's Table Wizard.

Exporting HSQLDB table data in formats other than spreadsheet or *.csv files doesn't appear to be supported. The 64K row limit for spreadsheet files would appear to limit the number of rows in a *.csv file.

Programming Event-Handlers

Like Access, OOo Base doesn't support recording macros. The OpenOffice.org API (Application Programming Interface) provides a set of platform- and language-independent interfaces for programming event handlers. Universal Network Objects (UNO) implements the API for languages that OOo supports. The 900-page OOo Developer's Guide provides API/UNO documentation with an emphasis on Java and C++ sample code.

Choosing Tools, Macros, Organize Macros displays a choice of four programming languages—OpenOffice.org Basic, JavaScript, BeanShell, and Python. Choosing OpenOffice.org Basic opens this dialog:

The OpenOffice.org Macros node contains sets of standard (global) read-only macros. Following is the OOo Basic IDE displaying the ChooseMarket event-handler:

You won't find many of the conveniences offered by Access's VBA Editor—such as IntelliSense, statement completion, and color-coded reserved words. Documentation for migrating from Access VBA to OOo Base macros—such as "Porting Excel/VBA to Calc/Star Basic," one of the how-to documents for OOo version 1.0—would enable developers to estimate the time required to port VBA code behind forms or in modules to OOo Basic.

Performance Comparisons

Unlike ZDNet's George Ou, I haven't published performance or resource consumption comparisons between Access 2003 and OOo Base at this point. Most of my tests so far have been conduced with with simple tables having fewer than 5,000 rows. I've noticed that opening an embedded HSQLDB clone of the Access 2000 Northwind database (tables only), executing SELECT and INSERT queries, and navigating records in a simple master-details form appear slower than the same operations in Access 2003. Mr. Ou recently gathered more detailed memory and CPU resource consumption comparisons by taking advantage of Sysinternals' free Process Explorer utility.

Comparing performance with directly-connected and linked Jet and SQL Server Northwind-style tables that have, say, 500,000 Orders rows and 2,000,000 Order Details rows and timing execution of the INSERT statements is relatively easy. What's not so simple is generating these rows and timing their addition to an embedded HSQLDB database. Importing *.csv data 64K rows at a time isn't my idea of a great way to spend a weekend, so I'll probably need to get up to speed writing lengthy DDL batches or OOo Basic macros to generate dynamic SQL for the project. Performance tests on complex queries also require evaluating the influence of indexes added to WHERE clause fields.

Conclusion

The development investment to migrate a moderately complex Access front end to OOo 2.0—even without moving the underlying data from the Jet *.mdb file to HSQLDB and the *.odb file—is likely to be much greater than the potential cost recovery from moving to free OOo 2.0. Changing from Windows to Linux requires adopting another RDBMS—such as MySQL, PostgreSQL, or Oracle 10g Express Edition—for multi-user applications or using the embedded HSQLDB for single-user projects that require OpenDocument 1.0-complient files. (This assumes that embedding table data in a compressed, proprietary binary format qualifies as OpenDocument-complient.) As mentioned earlier, embedded HSQLDB databases don't support multiple concurrent users.

Moving data from other RDBMS tables to HSQLDB appears to require comma-separated-value (CSV) or fixed-width files. Importing CSV files with date, time, or date/time fields is a very tedious process, as described in the "OpenOffice Base 2.0 vs. Microsoft Access 2003 - Part II HSQLDB Tables" post's "Importing CSV Dates, Times, or Both" section.

OOo 2.0's Base feature is version 1.0 and appears to have been rushed to release with the other updated OOo 2.0 applications. Base v.1.0 might be useful for simple database projects with one or two tables, such as address lists for mail merge operations or a set of Customers and Contacts tables. But from what I've seen so far, I wouldn't advise clients to migrate from Access multi-user production applications that require automated object navigation, or sophisticated forms and reports. Neither would I advise clients to invest developer time to gain proficiency in writing OOo Basic macros without assurance of substantial usability and developer improvements to Base in forthcoming upgrades.

Frank Shoenheit provides this insight on future OOo Base upgrades:

"This is what I would like to focus [on] for 2.x: Make Base capable to create database-driven 'solutions' or 'applications' or however you would call this. This includes the auto-run forms, dialog-base forms, reasonable scripting support, perhaps a more comprehensible form API, and much more.

So, my priorities here would be: Improve the application, so that people can work with it without too much database knowledge, and then improve (or expose) the export core functionality."

Of course, it might be a while before OOo 2.x with the capability of creating database-driven solutions or applications appears.

[Minor update on 10/23/2005]

According to ZDNet's David Berlind, "... the Massachusetts Senate Post Audit Committee has apparently called for hearing to take place at 1pm on Monday October 31st ..." to discuss "Massachusetts' recent ratification of the OpenDocument Format (ODF) as one of two standard file formats (the other is Adobe's Portable Document Format) that all state agencies must start using on January 1, 2007." ("Is Massachusetts' OpenDocument decision on the rocks?" October 24, 2005).

[Minor updates on 10/25/2005]

1. Is there a connection between Google's forthcoming Google Base and OpenOffice.org Base? The short-lived exposure of Google Base screen captures indicates that you'll be able to upload small database and other document files (<10MB) directly and files >=10MB to Google Base via FTP.

Google's following Terms of Service, however, might cause potential users to have second-thoughts about using Google Base as a future replacement for Intuit's QuickBase service.

“By submitting, posting or displaying Content on or through Google services which are intended to be available to the general public, you grant Google a worldwide, non-exclusive, royalty-free license to reproduce, modify, adapt, publish and otherwise use, with or without attribution such Content on Google services solely for the purpose of displaying, distributing and promoting Google services.”

There's no indication from the screen captures I've seen that one can restrict file access to specific users or user groups. Stay tuned for posts that point to public sample OOo Base (*.odb), Jet (*.mdb), and SQL Server (*.mdf/*.ldf) examples for linking after Google Base goes live. Search Engine Roundtable reports that Google Base would "be introduced during the 'Google Zeitgeist '05 Partner Forum' to be held [10/25 to 10/27/2005] at Google HQ in California."

My take is that the phantom preview was intended to demonstrate Google's hype factor—not their shadowy weltanshauung—to Zeitgeist attendees. 2. Information Week: "Microsoft Lines Up Politician Support In Mass. Format Battle" by W. David Gardner; ZDNet: "InfoWeek: OpenDocument tangled in MA political showdown" by David Berlind; and BetaNews: "Mass. Secretary Attacks Open Doc Plan" by Nate Mook.

[A Later Update on 10/30/2005]

William Grosso comments on the speculation surrounding Google Base and proposes a new unit of measure of Web-application effort named the craig. The effort unit reminds me of the Technocracy movement's attempt to quantify physical and intellectual work by the erg or Joule: "10,000 lines of debugged AJAX code is worth how many ergs?"

PBS's Robert X. Cringely weighs in with a pre-Google Base (10/20/2005) article titled "Energy Crisis: It's Hard to Make a Buck When You are Storing Everyone's Stuff for Free." Cringely notes that "NOW we know why Google bought those 30 acres on the Columbia River in Oregon right next to a generating station from the Bonneville Power Administration. It's a source of cheap, uninterruptible power."

Cringely's latest (10/27/2005) piece, "Changing the Guard: Retiring Baby Boomers Are Going to Invigorate open Source," postulates that future retirees with IT talent will devote their golden years to writing code for Open Source projects.

Technorati:

Saturday, October 29, 2005

Oracle Releases Free 10g Express Edition Beta

Not to be outdone by Microsoft's October 28, 2005 posting of the release version of SQL Server 2005 Express Edition (requires subscription) on MSDN, Oracle the same day posted a beta version of Oracle 10g Release 2 Express Edition for Linux x86 on the Oracle Technical Network (OTN). The Oracle 10.2 XE download, which requires an OTN account, weighs in at 198.5 MB, while SQL Server 2005 Express is a more svelte 54 MB. Martin Lamonica's "Oracle to offer free database" CNet article, also of 10/28/2005, quotes Oracle's Andrew Mendelsohn, senior vice president of Oracle's server technologies division:

There is definitely a market there [for low-end databases] and a demand. And we want them to be using Oracle and not MySQL or SQL Server Express. It's definitely a reaction to the market interest. Even though the database is initially free, standards progress and those university students who are playing with the database today will eventually be working at corporations and making product decisions. We want to have mind-share with those people.

Oracle 10g and SQL Server 2005 Express editions share a 1-CPU with 1-GB RAM limit and a 4-GB maximum database size. IBM has offered a 90-day trial version of its IBM DB2 Universal Database Express Edition since 10/9/2004, and a free download of its DB2 v8.2.3 Personal Developers Edition for single-user desktop database applications since 9/23/2005 (requires registration). The DB2 Express Edition supports up to 2 CPUs and 4 GB RAM on Linux and Windows with these license prices. IBM's Express Edition appears to have been an attempt to counter Oracle Database Standard Edition One—introduced more than a year ago—that carries a license fee of $149 per user. In my opinion, DB2 Personal Developer Edition is likely to become a victim of the "free database" wars as a result of its single-user, desktop-database stigma. The future of Oracle Standard Edition One, which was aimed primarily at workgroups, also is open to question in the current environment. In the meantime, check out the SQL Server 2005 Express Edition specs. The last-minute demise of SQL Server Express Manager for Express database management in favor of a promised restriced version of SQL Server 2005 Management Studio might dissuade developers from adopting SQL Server 2005 Express Edition until Microsoft releases a replacement set of visual management tools. Update: The November 2005 Community Technical Preview of SQL Server Management Studio Express (SSMSE) is now available for download. --rj Technorati: