Tuesday, November 29, 2005

Windows Live "Fremont" vs. Google Base Classifieds

Steve Rubel's Micro Persuasions post "Google, Microsoft to Go Hard After Classifieds" opened the Windows Live Fremont beta's kimono on November 29, 2005, by hitting the #2 position on Memeorandum. According to InfoWorld's Elizabeth Montalbano's "Microsoft to test classified service by year-end" article, Fremont will be rechristened Windows Live Classifieds when it goes into production in the first half of 2006.

Update: Microsoft rechristened Windows Live Classifieds as Windows Live Expo on or about 1/4/2006. A new MSN Spaces Windows Live Expo blog, which has two entries as of 1/19/2006, announced the name change.

Steve's post includes links to Greg Yardley's "Microsoft’s ‘Fremont’ a Craigslist competitor" item that includes marketing information about Fremont obtained from a Chinese MSN Places posting. Greg links to Adam Herscher, a Microsoft program manager (PM) who said, inter alia:

Unfortunately, Froogle Local isn't much help yet -- I can't find these products anywhere in Los Angeles county. I guess it's going to take some time and effort to woo over merchants. [I'm also wondering whether this is a market Windows Live Fremont intends to compete in, or if Fremont's goal is just to be a "Craigslist-killer"]. I imagine other competing services will crop up over time too. Google Base, which powers Froogle Local, provides a mechanism for merchants to do bulk imports. But in addition to allowing merchants to regularly push this information to Google Base and n other services, it'd make sense to support a standardized pull mechanism too.
Note: Adam removed the first paragraph's sentence in brackets from his current online posts, but Greg had the foresight to add a link to the cached version from which the missing sentence was extracted.

Clicking Adam's Windows Live Fremont link opens this logon page for the beta version:


Unfortunately, access to the beta version currently is limited to folks with pre-registered group e-mail addresses in the microsoft.com domain:


Products Listings vs. Classified Ads
Google got the jump on Microsoft by signing up organizations who could quickly add large numbers of entries by bulk-uploading files to Google Base.

As an example, most of the more than 13 million Products items come from bulk uploads of ShopLocal LLC information on retail chain stores—CircuitCity, CompUSA, KMart, OfficeDepot, RadioShack, Staples, Target, and Walgreens, as of November 29, 2005. SF Mobile, Inc.'s single store appeared with 2,523 items on November 30. Products listings power the Froogle Local shopping service, which has more in common with yellow pages than newspaper-style classified advertising.

Note: The number of Products items grew by roughly 735,000, from 13,745,669 on November 29 to 14,480,050 on November 30, 2005. The number then dropped to 10,005,921 as of December 2, 2005, possibly as the result of removing spam or inactivating items that don't comply with Google Base's Program Policies.

Google relies on bulk uploads for the majority of entries for common classified item types. Online rental agents/aggregators—such as Southern California's ApartmentHunters (851 of 52,406 entries)—dominate the Rentals category. The New York Times contributes its real estate classifieds to the Housing category (10,610 of 432,575 entries). Similarly, used-car wholesalers and marketers—such as CarCast—upload bulk listings to Google Base's Vehicles category. Item types less amenable to bulk uploads have far fewer entries; for example, the entire Services item type has only 8,567 entries and Events and Activities has 3,991 items.

Note: This entry was updated on December 2, 2005. Most item counts were made on November 29, 2005.

Microsoft appears to be aiming at social networking to drive traffic and, thus, entries to the Fremont database. Here's the Fremont promo text quoted from the Chinese blog:

This product represents a unique offering by Microsoft to address the person-to-person marketplace. The product, code-named “Fremont,” is a dynamic new listing service that enables people to easily buy, sell, or swap among friends, co-workers, or the public. Fremont enhances your ability to:

* Connect with those you trust – your messenger buddies and your coworkers,
* Locate items in your neighborhood or across the country through integration with MapPoint and Windows Live™ Local (formerly Virtual Earth),
* List easily, instantly, and for free.
Note: Virtual Earth's name changes to Windows Live™ Local. (Pinging local.live.com returns 65.55.241.141; tracert times out at an msn.net router, so prepare for a move in the near future.)

Dan Farber's ZDNet "Between the Lines" blog entry of November 30, 2005, "Google Base and Fremont–signs of the Web 2.0 plateau", quotes Microsoft product unit manager Gary Wiseman, who describes Fremont as "a free listing service, with a bunch of twists to make it very unique, such as integration with social networks, in particular integration with MSN Messenger." (Microsoft promises a new Windows Live Messenger beta is "coming soon.") Farber expects Google Base to "become more of a classifieds engine." CNet's Elinor Mills adds more details from Wiseman in her November 29, 2005 "Microsoft tests classifieds service" article.

Windows Live Fremont appears to encourage manual entries by individuals—typified by Craigslist—rather than the bulk uploads that contribute the majority of Google Base entries (e.g., the 13 million Products entries). According to Wiseman, Fremont posters can limit item visibility to those in "their MSN Messenger buddy list ..., in their MSN Spaces network, or in a specific domain name e-mail group."

Dare Obasanjo's December 2, 2005 "Windows Live Fremont: A Social Marketplace" post goes deeper into Fremont's social networking aspect by likening the service to "the bulletin boards in our [college] dorm hallways."
Obviously, Microsoft doesn't want Fremont to poach users from the MSN Shopping service, which received a major facelift this year. Thus Fremont appears to be targeted as, in Adam Herscher's original words, a "Craigslist killer." It's not surprising that Microsoft has Craigslist in its sights; according to the Pew Internet & American Life Project, Craigslist is #1 in the "Top classified sites" list posted by ZDNet on November 30, 2005. Craigslist grew its unique audience 156% between September 2004 and September 2005, while second-place Trader Publishing Company grew "only" 90%.

According to Search Engine Watch's "What's New in Shopping Search 2005" article for November 30, 2005:
MSN Shopping now uses product feeds from eBay, PriceGrabber and Shopping.com in addition to the feeds it has always used from major merchants. The result is more than 27 million product offers from over 7,000 stores. But MSN goes beyond simply aggregating product offerings.

MSN Shopping has a team of category managers who specialize in particular areas, such as consumer electronics and jewelry, and tailor the search/browse experience to meet the needs of consumers shopping for those types of products. The company also aggressively "cleans" data from product feeds to avoid duplicate offerings.
Shailesh Prakash, lead product manager for MSN Shopping, explained how MSN Shopping processes eBay auction content in his December 2, 2005 MSN Shopping Insider post, "eBay Selection on MSN Shopping":
Inventory on eBay is constantly changing and in order to bring our consumers the freshest catalog of choices possible, we parse, load, classify and match the tens of millions of eBay items on a daily basis. We have invested in building out our software platform to handle such high churn workloads and have expanded our server infrastructure to efficiently and quickly ingest the inventory available on eBay, along with the catalogs of our existing merchants and aggregators. All told, we sift through hundreds of millions of items every day.
Obviously, adding "high-churn" items from product auctions requires real-time updates to the back-end database. It appears to me that the latency of the current Google Base back end wouldn't support real-time processing of auction items.

Note: The MSN Shopping Insider blog has entries posted by Microsoft marketing folks, so the content is mostly PR-related. See "Bulk-Uploaded Items Disappear from Google Base" for a description of issues with Google Base's duplicate item inactivation process.

Thus it's certain that Fremont will have far fewer entries and, consequently, page views than Google Base or MSN Shopping. Google's combination of Froogle Local Shopping, a Craigslist clone, and other item types—Events and Activities, News and Articles, Recipes, Reference Articles, and Reviews that aren't related to ecommerce—will result in many more page views and thus greater opportunity for Google Base advertising revenue. However, BusinessWeek's Robert Hof gives Froogle a thumbs-down in his December 5, 2005 "Froogle: Shopping Made Complex" product review. According to Hof, "Froogle still can't match the leaders [Shopping.com, Shopzilla, and Yahoo! Shopping] in what really matters: quickly researching and finding just the products you want." Forbes' Rachel Rosmarin is even less enthusiastic in her earlier "Google's Empty Stocking" article about three-year-old Froogle, which is still in beta. Obviously, the problems need fixing if Google wants to give Craigslist or competing retail sites a run for the money.

Greg Sterling, the Kelsey Group's local Internet marketing maven, reports from the 2005 Interactive Local Media Conference (ILM 05) that localized search now constitutes 35% to 40% of all search activity. Greg also reported on an ILM 05 demonstration by Erik Jorgensen, general manager for MSN search, of a forthcoming enhancement to the Windows Live Local service: Pictometry aerial oblique imagery, which provides maps made from angled shots taken from low-flying aircraft. Greg discusses offline vs. online inventory information in his "More on 'Froogle Local'" post and contends that "Google Base is ultimately about local" in his earlier "More Thoughts on Google Base" post.

Local search and shopping clearly are today's hot topics among Web marketers and analysts. Even CNet Networks has climbed on the bandwagon with local shopping features for technology products.

Channel Intelligence provides local availability data for more than 1,600 top-selling products. CNet has a group of participating consumer-electronics retailers (Best Buy, Circuit City, CompUSA, and OfficeMax) similar to that originally reported for Google Base Products listings.

Note: It's likely to be a challenge to select appropriate (related but non-competitive) ads for placement on Froogle Local pages, which are sure to be the largest consumer of Google Base's database server resources.

Open vs. Closed Databases
Craigslist has a much more granular set of predefined classified advertising categories than Google Base, a pattern that Fremont will follow with a predetermined—probably SQL Server 2005—database schema. Mills quotes Wiseman: "We started this before anyone knew about Google Base. Having seen what Google Base is doing, I don't think they were aiming for a classifieds service. They don't have a taxonomy of listings like we do. They see it as an open database."

Note: Clay Shirky compares ad hoc, user-defined tags, such as Technorati's, with formal ontologies, typified by the original Web page subject classification system used by Yahoo!, in his widely-read "Ontology is Overrated: Categories, Links, and Tags" essay. See "Problems Uploading a Google Base Custom Item Type from a TSV File" for more information on taxonomies for use with Google Base items.

The concept of an open database where online users can define their own schema/metadata is interesting, to say the least. Google Base undoubtedly will be of more interest to developers than Fremont, especially if Google releases a Google Base API. Stay tuned for future posts on customizing Google Base Item Type categories and their attribute name/value pairs.
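Google hasn't documented how Google Base stores user-defined item types, so the following is only a minimal Python sketch, assuming a generic entity-attribute-value design: each item's attribute name/value pairs become rows in a skinny three-column (item_id, attribute, value) table instead of columns in a fat, mostly empty table. The sample items and the `pivot_to_skinny` and `find_items` helpers are hypothetical.

```python
from typing import Any

# Hypothetical items: each dict holds only the attributes its poster chose.
# In a fixed-schema ("fat") table, every attribute name would need its own
# column, and most cells would be empty (NULL).
items: dict[str, dict[str, Any]] = {
    "item1": {"item_type": "Products", "title": "Widget", "price": 19.95},
    "item2": {"item_type": "Recipes", "title": "Chili", "cooking_time": "2 hours"},
    "item3": {"item_type": "Products", "title": "Gadget", "price": 5.00, "color": "red"},
}


def pivot_to_skinny(fat: dict[str, dict[str, Any]]) -> list[tuple[str, str, Any]]:
    """Pivot sparse items into (item_id, attribute, value) rows; empty cells vanish."""
    return [
        (item_id, name, value)
        for item_id, attrs in fat.items()
        for name, value in attrs.items()
    ]


def find_items(rows: list[tuple[str, str, Any]], attribute: str, value: Any) -> set[str]:
    """Return the IDs of items that carry the given attribute name/value pair."""
    return {item_id for item_id, name, v in rows if name == attribute and v == value}


skinny = pivot_to_skinny(items)
attribute_names = {name for _, name, _ in skinny}
print(len(items) * len(attribute_names), "cells in a fat table;", len(skinny), "skinny rows")
# 15 cells in a fat table; 10 skinny rows
print(sorted(find_items(skinny, "item_type", "Products")))  # ['item1', 'item3']
```

The skinny form needs no schema change when a poster invents a new attribute, which is the appeal of an open database; the cost is that every filter becomes a scan or join over name/value rows rather than a lookup on a typed column.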

UIs for Open and Closed Databases
Google Base began life as a general-purpose online database that's intended to serve as a backend to a multitude of Google—and, potentially, third-party—front ends. Thus classified advertisers and buyers aren't likely to be forced to deal with Google Base's current arcane data entry UI.

The Kelsey Group's Greg Sterling comments in his "Google Base Part III" post of November 17, 2005:
People have also been saying how Google Base is a limited or comparatively poor user experience (I made some earlier comments along these lines).

After chatting with Google about it, I think the company is not that focused on the Google Base user experience (although there undoubtedly will be refinements). Google isn’t going to create hundreds of competitive vertical experiences around the data it collects.
Forrester Research's Charlene Li compares Fremont's UI with that of Google Base and Craigslist in her November 30, 2005 "Why Microsoft’s classifieds service will be better than Google Base" post. Charlene is one of only two bloggers I've encountered who've been given a Fremont preview. A better UI comparison might be Fremont and Froogle Local, which appears to be the first Google front end for Google Base. Charlene's earlier "Google Base goes live - it's more than just about classifieds" post outlines her view of the future for "Base-enabled applications."

Rashmi Sinha, whose background includes "Human Categorization - or how people make category judgments," offers a more thorough analysis of the Google Base UI in her "The blooming of information architecture at Google: A close look at facets, tags & categories in Google Base" post. Rashmi takes justifiable exception to Google Base's use of tags (called Names) as keywords for filtering items.

Other Trade Press and Blog Coverage: ZDNet's Garrett Rogers is late to the party with his December 1, 2005 "Google races to integrate users with services" Google blog item, as is the Associated Press with this brief "Microsoft develops classified service" article found in USA Today. ComputerWorld's IT BlogWatch for November 30, 2005, "Competition for GoogleBase," offers links to blogs reporting on Fremont, including this post. (The author, Richi Jennings, who lives in Berkshire, UK, isn't directly related to me, although he and I might have many common but distant relatives in and around Redruth, Cornwall. Richi recently uncovered an eBay phishing scam described in InfoWorld's December 5, 2005 "Ebay tricked by phony e-mail" article.)

Ben Charney wrote "Microsoft Testing Its Own 'Google Base'" for PC Magazine on November 29, 2005. This version of the article sheds no new light whatsoever on the Fremont beta. However, the "full story" on eWeek says, "Microsoft plans the first public test sometime later this month, according to a Microsoft spokeswoman." Obviously "this month" should read "next month," as there was only one day left in November.

Mary Jo Foley chimes in with a superficial "Microsoft 'Fremont': Redmond's GoogleBase Killer" piece. Seattle Post-Intelligencer reporter Todd Bishop joins the chorus with a "Microsoft tries classifieds" article. Microsoft Monitor blogger Joe Wilcox takes both Microsoft and Google to task for using betas as "soft launches" for new or upgraded applications in "Even More Beta Blues."

--rj

Sunday, November 27, 2005

Bulk-Uploaded Items Disappear from Google Base

If the lack of real-time addition of your items to Google Base doesn't discourage you from using the beta version of this new service, perhaps delayed disappearance of items that you bulk-upload to Google Base beta 1 will.

You might also be put off by spurious indication of bulk-upload failures and no indication from the Google gods that they don't like your uploaded content. Google Groups' Google Base Help Discussion group has several active threads from beta testers who've lost the entries they uploaded (click here and here for examples.)

Since the original Atom 0.3 XML upload I documented in my "Google Base and Atom 0.3 Bulk Uploads" post, all subsequent uploads of News and Articles or Reference Articles items have changed from Active Items to Inactive Items after publication. These uploads were made with second and third Google accounts associated with my sbcglobal.net (DSL) e-mail addresses rather than my usual compuserve.com (dialup) address. Publishing usually takes several hours; I often wait overnight before reviewing item status.

Note: Brian Smith of Comparison Engines reports that Froogle users are having similar problems. His "Traveling Today, Google Base - Is Google Really Ready for User Submitted Information?" post is dated November 1, 2005, so it's not likely that the relationship between Froogle and Google Base reported by the New York Times' John Markoff and Michael Barbaro in "Google's Shopping Service to List User's Local Stores" is responsible for the problem.

My first two bulk uploads of the same transformed Atom 0.3 XML file, Atom03Dest.xml, to the second and third Google accounts succeeded. Occasionally, uploads indicate a Failed status (0 Items, 0 Errors) as a result of "Bad data." Despite the reported errors, the fdbd page displays Active Items (50) Inactive Items (0) as shown here:


Here's a capture of the Details page for the preceding upload:


The "Bad data" error message is less than informative. There is no valid reason for encountering bad data from a known-good Atom 0.3 XML file, which tested OK in a subsequent bulk upload as shown here:


Here are the first few of the 50 unpublished but Active items:



Label attribute values appear as expected for the News and Articles item type in edit and preview modes.

About 1.5 hours after I uploaded the Modified items, Google Base marked the added items with Published status:


Rechecking the fdbd page the next morning showed that all the above 50 entries had been marked inactive:


What's more, the same 50 entries added to the products list that I bulk-uploaded with a tab-separated values file to an account with 69 items also became Inactive:



Attempting to open any Inactive Items list by clicking the Inactive Items (50) link displays the following error page:



Although a message indicates that Inactive Items will be removed in the future, this did not occur after more than 72 hours. The error message when attempting to display inactive items prevents removing them to enable re-uploading the original or a modified file.

There were no e-mail or on-line messages from Google indicating the reason for inactivating these bulk-uploaded items. If the reason for inactivation is item duplication, users should be so advised.

Eliminating Item Duplication with Code
I concluded that Google Base uses a simple hash function or cryptographic hash function to create a unique n-byte field value for each item. After publication, it appears that Google tests items for duplication by comparing their hash values with those of previously posted items. However, Google doesn't publish what value(s) they include in the hash, nor whether duplication tests apply to inactive items.
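To make the suspected mechanism concrete, here's a minimal Python sketch of hash-based duplicate detection using SHA-1 from the standard `hashlib` module. Because Google doesn't publish which values it hashes, the field list is purely an assumption, as are the `fingerprint` function and `DuplicateChecker` class. Like the behavior reported later in this post, the checker never forgets the fingerprints of items that have since been inactivated.

```python
import hashlib

# ASSUMPTION: Google hasn't said which fields go into its hash; these are guesses.
HASHED_FIELDS = ("item_type", "title", "description")


def fingerprint(item: dict[str, str]) -> str:
    """Return a SHA-1 hex digest over the assumed fields of an item."""
    h = hashlib.sha1()
    for field in HASHED_FIELDS:
        value = item.get(field, "").encode("utf-8")
        # Length-prefix each value so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(value).to_bytes(4, "big"))
        h.update(value)
    return h.hexdigest()


class DuplicateChecker:
    """Remembers the fingerprint of every item ever posted, active or inactive."""

    def __init__(self) -> None:
        self.seen: set[str] = set()

    def accept(self, item: dict[str, str]) -> bool:
        """Return True if the item is new (stays active), False if it's a duplicate."""
        fp = fingerprint(item)
        if fp in self.seen:
            return False
        self.seen.add(fp)
        return True


checker = DuplicateChecker()
article = {"item_type": "News and Articles", "title": "Atom 0.3 uploads", "description": "Test item"}
print(checker.accept(article))        # True: first upload
print(checker.accept(dict(article)))  # False: identical re-upload is flagged
changed = dict(article, description=article["description"] + " [Modified]")
print(checker.accept(changed))        # True: an altered description changes the hash
```

Under this model, any change to a hashed field is enough to get past the check, which is consistent with the workaround described next.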
I added code to my Atom.xml file transformation application to make minor changes to the Atom 0.3 dateTime element values (issued, modified, and created), as well as to the text of the description attribute (adding [Modified for Google Base bulk upload on 2005-11-29T07:48:51-08:00] to the end of the content element value, with the system date/time to assure uniqueness).
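The transformation application's code isn't published, so here's only a minimal Python sketch of the same idea using the standard library's `xml.etree.ElementTree`. It rewrites each Atom 0.3 entry's issued, modified, and created elements and appends the bracketed upload note to the content element. Overwriting the dates with the upload timestamp is an assumption (the original makes unspecified "minor changes"), and `make_unique` and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional

ATOM03_NS = "http://purl.org/atom/ns#"  # Atom 0.3 namespace URI
ET.register_namespace("", ATOM03_NS)    # serialize Atom 0.3 as the default namespace


def make_unique(atom_xml: str, stamp: Optional[str] = None) -> str:
    """Return a copy of an Atom 0.3 feed whose entries' dates and content differ."""
    if stamp is None:
        # Local time with UTC offset, e.g. 2005-11-29T07:48:51-08:00
        stamp = datetime.now().astimezone().isoformat(timespec="seconds")
    root = ET.fromstring(atom_xml)
    for entry in root.iter(f"{{{ATOM03_NS}}}entry"):
        # ASSUMPTION: replace all three Atom 0.3 date elements with the upload time.
        for tag in ("issued", "modified", "created"):
            elem = entry.find(f"{{{ATOM03_NS}}}{tag}")
            if elem is not None:
                elem.text = stamp
        # Append the upload note so the description text is unique as well.
        content = entry.find(f"{{{ATOM03_NS}}}content")
        if content is not None:
            content.text = (content.text or "") + f" [Modified for Google Base bulk upload on {stamp}]"
    return ET.tostring(root, encoding="unicode")


sample = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <entry>
    <title>Sample article</title>
    <issued>2005-11-20T10:00:00-08:00</issued>
    <modified>2005-11-20T18:00:00Z</modified>
    <created>2005-11-20T18:00:00Z</created>
    <content type="text/plain">Original text.</content>
  </entry>
</feed>"""

print(make_unique(sample, "2005-11-29T07:48:51-08:00"))
```

Passing an explicit `stamp` makes the output repeatable for testing; omitting it uses the system date/time, as the original application does.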
Click here to display the 50 News and Articles items uploaded to the Oakleaf_Systems alias.

Interim Conclusion
Altering item content appears to prevent inactivation of subsequent uploads of otherwise duplicate News and Articles and Reference Articles items.
Arbitrarily and silently inactivating users' bulk-uploaded items that comply with Google Base's Editorial Guidelines and Program Policies indicates to me that Google Base beta 1 isn't even close to useful for its advertised applications. (I don't believe that item duplication violates the Editorial Guidelines' "No Repetition: Avoid gimmicky repetition" rule, and I can find no restriction on duplicate items in Google Base's terms of service.) My tests indicate that Google even includes inactive items in tests for duplication.

However, Google Base doesn't enforce its Program Policies' "Affiliates: Posting is not permitted for the promotion of affiliate sites or products sold through an affiliate marketing relationship." As an example, operators of "The Mall, Online Store" at http://www.free-poker-tips.org/ have posted 233,348 referral links (as of November 29, 2005) to Amazon.com items.

Failure to advise users of the reason for rejecting bulk-uploaded content demonstrates utter contempt for ordinary users. Dan Gillmor decries Google's increasing hubris in his recent Financial Times article.

"Ordinary users" are those who haven't cut the special deals apparently reserved for retail chains—such as CircuitCity, CompUSA, KMart, OfficeDepot, RadioShack, Staples, Target, and Walgreens—or affiliate spammers that bulk upload hundreds of thousands or more items for Froogle Local links.

Question: Why is BestBuy missing from a Froogle Local stores list that includes CircuitCity, CompUSA, RadioShack and Target? Most early trade press articles and blog entries on Froogle Local include BestBuy as a major player in the Froogle Local beta.

Probable answer: BestBuy isn't a client of ShopLocal LLC, which supplies the data for products carried by each store of a retail chain. All other stores listed at the bottom of Froogle Local pages are ShopLocal clients.

--rj

Note: As of November 29, 2005, there were 13,745,669 Products items in Google Base. News and Articles reported 8,078 items and there were 32,801 Reference Articles.

Wednesday, November 23, 2005

Jim Gray Podcast About SQL Server 2005 (and Later)

The Aussie SQL Down Under site features frequent podcasts about new features of SQL Server 2005. Their latest (November 22, 2005) one-hour production features Microsoft Research's Jim Gray discussing the future of SQL Server, LINQ, and T-SQL, among a host of other SQL Server 2005 topics. To make access to particular topics easier, I've logged the WMA version of the podcast with brief descriptions of most major topics:

* 00:00 - Introduction, Jim Gray's CV, and how he came to Microsoft Research.
* 05:20 - Why it took five years to release SQL Server 2005. "Database systems have become ecosystems in which you have the traditional tabular data store, an XML store, data mining, cubes, an extract-transform-load service, a whole security model, management [applications], and self-tuning [features]."
* 07:00 - Unification of SQL Server and programming languages. The SQL Server team expected to ship V.Next in 2003, but underestimated the effort required to unify SQL Server and the .NET Framework. It was a very painful experience.
* 09:30 - Issues with feature currency and large development teams. The currency inside SQL Server is a dataset or a Tabular Data Stream, although we're gradually moving away from TDS toward the Web services model. The T-SQL command-in/dataset-out model is today's key to unifying access to relational data, text, and XML.
* 11:15 - Release frequency. Annual releases are very destabilizing, but less frequent releases result in huge changes instead of "lots of little ones."
* 13:45 - CLR, LINQ, and "T-SQL is dead" - "FORTRAN isn't dead." Any CLR program has T-SQL at its root. T-SQL is loosely typed and late-bound, so it's very easy to write.
* 17:05 - Looseness of T-SQL typing. LINQ is wonderful, but it's compiled and its data definitions are static.
* 19:30 - DB2 and Oracle are much more strict about data typing. T-SQL uses data-type coercion. Jim mentions an ANSI flag to prevent coercion (but I'm not aware that such a flag exists).
* 20:45 - LINQ. "I'm wildly enthusiastic about LINQ." Microsoft isn't very good at supporting embedded SQL because of type conflicts between T-SQL and programming languages. LINQ treats tables as classes and rows as objects. Tables are enumerable; you can do a For Each on a table or on the answer to a query. Tables are collections, so cursors go away. "The syntax is a little screwy to make IntelliSense work."
* 25:00 - What's the story on DLinq and XLinq? Both will become extremely important to folks who like to program in VB and C#. "It's one of the things that might attract you away from T-SQL because it really [offers] early-binding. The amount of gunk you need to write for ADO.NET to get the null program to work is just disgusting." The big selling point for LINQ is that it's so easy to get started.
* 26:45 - CLR types vs. SQL types. "It may seem like a mismatch, but I declare everything to be a SQL type and everything works out great. ... A friend wrote the 'Null Memo,' which was an impassioned plea that we get rid of null values, but we had a group of theory guys who loved three-state logic. We're stuck with nulls."
* 29:00 - Object purists want to treat the database as a repository for objects. "Just put my objects in the database." The result is a fat table with many sparse columns, which pivots to a skinny table with three columns.
* 31:45 - Inheritance in LINQ. A LINQ table is a minimalist class that doesn't support much inheritance. The specification is silent about how inheritance works in the LINQ model. One way inheritance would work is with "a universal relationship at the bottom."
* 34:00 - Inheritance in T-SQL. T-SQL doesn't have a class concept at all. It's so loosely typed that its only classes are tables, but you can't pass tables as parameters. "T-SQL is a great scripting language, but it's never going to be as clean as C#, ever, period."
* 35:15 - Break.
* 37:00 - TerraServer, SkyServer, and spatial indexing in SQL Server 2005.
* 44:45 - Where are spatial applications heading? Billions of cellphones mean that location services are central to future applications. Going beyond four dimensions (latitude, longitude, altitude, and time) is difficult.
* 48:10 - Very large databases. VLDBs are in our future and most will be spatially oriented. The goal is to tell users about things that are nearby.
* 49:40 - Evolution of SQL Server. "I came to Microsoft to scale up SQL Server. We've done a reasonable job of scaling up and scaling down, but we haven't done a good job of scaling out to self-organizing arrays of SQL Server instances. Over the next five years, we'll deliver on scale-out; what Oracle calls 'RAC'. We're getting beat up pretty badly about that, because it's the one thing we don't do."
* 52:00 - SQL Server parity with DB2 and Oracle. "We made a decision not to chase DB2 or Oracle tailpipes. Instead, we made SQL Server solve the next generation rather than the last generation problems. So we added data mining, automanagement, XML support, and a bunch of things we think are forward-looking."
* 52:20 - Limited resources caused a few things to slide. "In the next five years you'll see many things that were thrown out of the lifeboat just before SQL Server 2005 shipped: WinFS, LINQ, better integration with Visual Studio, more data mining rules, deeper XML support, and more Web services."
* 54:00 - Scale-out will have a major Web services component. "Having Web services and Service Broker built into SQL Server means that you don't need IIS any more."
* 55:15 - What's coming up in Jim Gray's world? "We are working very hard to get scientific literature as well as scientific data online. PubMed Central is run by the National Library of Medicine (NLM) on SQL Server and has the abstracts—mostly in XMLish format—of all of the [NLM's] medical literature. The U.S. Congress has mandated that any research that the National Institutes of Health (NIH) sponsors be deposited with the NLM and be published within six months of its publication in a journal. This is called taxpayer access, so if you get some exotic disease, you can go to the NLM and see the research that your tax dollars paid for, instead of paying $50 to get a copy of it. We've made a portable version of PubMed that's been installed in the U.K., Italy, and South Africa, and will be installed in Japan and elsewhere. The copies federate with one another using Web services. When a document is deposited in one place, it goes to all the other places. PubMed is a poster child for XML Web services."
* 59:03 - End.
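Gray's 26:45 remark about three-state logic refers to SQL's treatment of NULL: a comparison involving NULL yields UNKNOWN rather than TRUE or FALSE, and a WHERE clause keeps only rows whose predicate is TRUE. The following minimal Python model, with None standing in for NULL/UNKNOWN, illustrates the effect. The function names are hypothetical, and this is a sketch of standard SQL semantics rather than T-SQL or SQL Server code.

```python
from typing import Any, Callable, Optional

# SQL-style three-valued logic: True, False, or None (UNKNOWN, produced by NULLs).
Tri = Optional[bool]


def sql_and(a: Tri, b: Tri) -> Tri:
    """FALSE dominates AND; otherwise any UNKNOWN makes the result UNKNOWN."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a: Tri, b: Tri) -> Tri:
    """TRUE dominates OR; otherwise any UNKNOWN makes the result UNKNOWN."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_not(a: Tri) -> Tri:
    """NOT UNKNOWN is still UNKNOWN."""
    return None if a is None else not a


def sql_eq(x: Any, y: Any) -> Tri:
    """Any comparison with NULL (None) is UNKNOWN, even NULL = NULL."""
    if x is None or y is None:
        return None
    return x == y


def where(rows: list[dict], predicate: Callable[[dict], Tri]) -> list[dict]:
    """Like a WHERE clause: keep only rows whose predicate is exactly TRUE."""
    return [r for r in rows if predicate(r) is True]


rows = [{"id": 1, "region": "WA"}, {"id": 2, "region": None}, {"id": 3, "region": "CA"}]
print([r["id"] for r in where(rows, lambda r: sql_eq(r["region"], "WA"))])           # [1]
print([r["id"] for r in where(rows, lambda r: sql_not(sql_eq(r["region"], "WA")))])  # [3]
```

The second query shows why nulls bother purists: the row with an unknown region appears in neither the result for region = 'WA' nor its negation.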

Tuesday, November 22, 2005

Forum Shopping for Office Document Standards Bodies?

Microsoft announced on November 21, 2005 that the Microsoft Office Open XML formats—to be implemented next year by the Office 12 version of Word, Excel, and PowerPoint—would be submitted to the European Computer Manufacturers Association (ECMA International) and ultimately to the International Standards Organization (ISO, a.k.a. International Organization for Standardization). Backers of Microsoft's ECMA submission are Apple, Barclays Capital, the British Library, BP, Essilor (France), Intel, NextPage, StatOil ASA and Toshiba. The story quickly rose to the top at Memeorandum.

Update November 30, 2005: For more details on the Office Open XML formats and their submission to ECMA for standardization, see Brian Jones' Office XML Formats blog and Robert Scoble's e-mail interview with Jean Paoli, a co-editor of the W3C XML 1.0 specification and the driving force behind Microsoft's InfoPath application. Jupiter Research's Joe Wilcox sheds light on Microsoft's choice of ECMA International as the standards body.

Update December 9, 2005: eWeek's Peter Galli reports that "Microsoft's Office Standard Gets Green Light" from ECMA "[a]t the General Assembly meeting held in Nice on December 8, 2005."

Update December 13, 2005: Microsoft published "Ecma International Standardization of OpenXML File Formats Frequently Asked Questions," and ZDNet's David Berlind comments on the FAQ in his "Microsoft releases FAQ on Ecma submission" article.

OpenOffice.org and its cohorts—Adobe, Corel, IBM, KDE, and Sun Microsystems—chose the Organization for the Advancement of Structured Information Standards (OASIS) as the initial standards body for the OASIS Open Document Format for Office Applications (OpenDocument format or ODF). OASIS submitted the ODF standard to the ISO/International Electrotechnical Commission (IEC) Joint Technical Committee 1 (ISO/IEC JTC1) on September 30, 2005.

Microsoft has had a long-term relationship with ECMA, having participated in the ECMAScript (JavaScript) standardization process and later submitted .NET's C# language and the Common Language Infrastructure (CLI) for ECMA and ISO/IEC standardization. Sun Microsystems, the commercial force behind OpenOffice.org, has favored OASIS since the origins of the Electronic Business XML (ebXML) and Universal Business Language (UBL) standardization process. OASIS's aegis extends to many Web services standards, such as UDDI 2.0 and 3.0.2, WS-Security, and Web Services Distributed Management (WSDM). Microsoft refused to support ebXML and UBL but was a very active participant—together with IBM—in the development of the UDDI and WS-Security specifications, plus the OASIS standards processes. Microsoft's choice of ECMA, which has no history in the document standards business, appears to me to be an example of standards-body "forum shopping."

Forum Shopping Defined
Wikipedia defines "forum shopping" as "the practice adopted by some plaintiffs to get their legal case heard in the court thought most likely to provide a favourable judgment, or by some defendants who seek to have the case moved to a different court." For example, it's a common practice for individual and class-action plaintiffs to try product liability cases in southeastern Texas state courts because these courts have a history of making unusually large awards to plaintiffs. Similarly, defendants in employee non-competition actions prefer California state courts, because California state law generally favors individual rights and disdains corporate non-compete clauses.

An example is the recent action filed in Washington state court by Microsoft against Google and Kai-Fu Lee, a former vice president of Microsoft's Interactive Services Division. Google, in turn, filed a motion in California state court to throw out Microsoft's Washington complaint. At present, the Washington action is scheduled for trial in early 2006.

Note: The same CNet News.com article also describes Microsoft's non-compete action against Adam Bosworth and Tod Nielsen, who eventually ended up at Google and Borland, respectively. Click here for more on Bosworth at Google.

Shopping for Standards Bodies that Support Your View of Intellectual Property Rights
Primary candidates for "open" XML document format standards bodies are the W3C, OASIS, ECMA, and, possibly, IETF. (IETF is the standards body for the Atom 1.0 XML syndication format.) There is, however, considerable controversy over what constitutes an "open standard." Microsoft patent attorney Nicos L. Tsilas contends in his recent "The Threat to Innovation, Interoperability, and Government Procurement Options From Recently Proposed Definitions of 'Open Standards'" paper that standards with reasonable and non-discriminatory (RAND) patent licenses qualify as "open standards."

Note: "Government Procurement Options" in the paper's title obviously refers to the Commonwealth of Massachusetts' decision to restrict state purchase of office productivity applications to those that support ODF.

Following are links to intellectual property rights policies of the preceding four standards bodies, plus ISO/IEC:

Note: GTW Associates' "Intellectual Property Rights Policies of Selected Standards Developers" page has links to most recognized standards-setting organizations, except IETF. ECMA, ISO/IEC, and OASIS fully support RAND patent licensing; OASIS also defines a royalty-free (RF) licensing mode.

W3C "seeks to issue Recommendations that can be implemented on a Royalty-Free (RF) basis." However, section 7.5.3 of W3C Patent Policy, "Alternative Licensing Terms," permits a Patent Advisory Group (PAG) to "propose that specifically identified patented technology be included in the Recommendation even though such claims are not available according to the W3C RF licensing requirements of this policy."

Here's the IETF's IPR policy:

In general, IETF working groups prefer technologies with no known IPR claims or, for technologies with claims against them, an offer of royalty-free licensing. But IETF working groups have the discretion to adopt technology with a commitment of fair and non-discriminatory terms, or even with no licensing commitment, if they feel that this technology is superior enough to alternatives with fewer IPR claims or free licensing to outweigh the potential cost of the licenses.

Note: The IETF would be a logical candidate for standardizing Microsoft's recently announced Simple Sharing Extensions for RSS and OPML (SSE), but SSE's close relationship with RSS 2.0 appears to have led Microsoft to license copyrights in the spec under the Creative Commons Attribution-ShareAlike License (version 2.5). RSS 2.0 uses this license. So far, there's no indication that Microsoft intends to extend SSE to Atom 0.3.

Thus ECMA and OASIS became the finalists in the shopping list, but Sun pre-empted OASIS with ODF. Only ECMA remains to Microsoft as an unabashed champion of RAND licensing. However, Microsoft's initial offer of RF-mode licenses for the Office 2003 XML schema and subsequent change to a "covenant not to sue" moots the RAND-mode licensing issue for all forums.

Ecma International is a European organization, so the European Union's xenophobic bureaucrats might consider ECMA preferable to US-based OASIS as the standards body for the millions (or billions) of mostly superfluous Microsoft Office documents the EU produces each year.

An advantage of ECMA appears to be the speed at which standards emerge from TCs. For example, Microsoft, Hewlett-Packard, and Intel submitted the C# and CLI specifications to ECMA on October 31, 2000 and ECMA ratified the two standards on December 14, 2001. The ECMA standard's gestation period was 1-1/8 years.

In contrast, Arbortext, Boeing, Corel, CSW Informatics, Drake Certivo, National Archive of Australia, New York State Office of the Attorney General, Society of Biblical Literature, Sony, Stellent and Sun Microsystems founded the OASIS Open Office XML Format TC in December 2002. The first TC draft was approved in March 2003, the second in December 2004, and the third in March 2005. OpenDocument was approved as an OASIS Standard in May 2005, almost 2-1/2 years after formation of the TC and more than twice as long as the ECMA process.
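The gestation-period comparison in the two preceding paragraphs is easy to check with date arithmetic. The following Python sketch uses the ECMA dates given above; because only months are given for the OASIS TC's formation (December 2002) and ODF's approval (May 2005), it assumes the first day of each month, so the OASIS figure is approximate.

```python
from datetime import date

DAYS_PER_YEAR = 365.25

# ECMA C#/CLI: submitted October 31, 2000; ratified December 14, 2001
ecma_days = (date(2001, 12, 14) - date(2000, 10, 31)).days

# OASIS ODF: TC formed December 2002; approved as an OASIS Standard May 2005.
# Assumption: day 1 of each month, because only the months are known.
oasis_days = (date(2005, 5, 1) - date(2002, 12, 1)).days

ecma_years = ecma_days / DAYS_PER_YEAR
oasis_years = oasis_days / DAYS_PER_YEAR
ratio = oasis_days / ecma_days

print(f"ECMA:  {ecma_days} days (~{ecma_years:.2f} years)")
print(f"OASIS: {oasis_days} days (~{oasis_years:.2f} years)")
print(f"OASIS/ECMA ratio: {ratio:.2f}")
```

The script reports 409 days (about 1.12 years) for ECMA and 882 days (about 2.41 years) for OASIS, a ratio of about 2.16, which agrees with the "1-1/8 years" and "more than twice as long" figures above.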

Will Two Similar XML Document Standards Emerge?

The real issue, as I see it, is: How will ISO/IEC JTC1 react when the ECMA working group submits, in 2006, a set of XML document standards that's almost identical (or at least very similar) to the ODF standard OASIS submitted in 2005? Will JTC1 require the competing ECMA and OASIS "standards" to be rationalized into a single ISO/IEC standard?

Sun Microsystems' Tim Bray questions the need for two XML document standards in his recent "Thought Experiments" post (updated November 27, 2005). His distaste for the Office Open XML format undoubtedly derives from his employer's position as the promulgator of OpenOffice.org and ODF. This conclusion is supported by Bray's position as co-chair of the IETF Atom Working Group. As Microsoft's Dare Obasanjo points out in his "Tim Bray's Hypocrisy and Competing XML Formats" post:

I find it extremely ironic that one of the driving forces behind creating a redundant and duplicative XML format for website syndication would be one of the first to claim that we only need one XML format to solve any problem. For those who aren't in the know, Tim Bray is one of the chairs of the Atom Working Group in the IETF whose primary goal is to create a competing format to RSS 2.0 which does basically the same thing. In fact Tim Bray has written a decent number of posts attempting to explain why we need multiple XML formats for syndicating blog posts, news and enclosures on the Web.

Update December 9, 2005: ZDNet's David Berlind analyzes the ECMA standards path and the issue of potentially duplicate (or almost-duplicate) ISO standards in his extended "Microsoft standard proposal turns spotlight to Ecma's process" post.

Open Standards vs. Open Source

Open Source advocates insist that standards bodies adapt their Intellectual Property Rights (IPR) policies to accommodate the GNU General Public License (GPL). David Berlind, one of ZDNet's open-source bloggers, raises the issue of what qualifies to him as an "open standard" in his "Public discourse under way over definition of 'open'", "Apache falls victim to OASIS patent shelter," and "Open source: Are Microsoft and other holdouts about to crack?" posts. Joe Wilcox weighs in again with "Assessing the Fallout," an analysis that emphasizes openness and Microsoft's moves to displace Adobe products with equivalents or act/look-alikes engineered in Redmond. Dennis Hamilton sheds additional light on the issue, including copyright ownership, in his November 30, 2005 "Open Standards are not Open Source" post.

Note: I'll believe that IBM is a legitimate "open source" and "open standards" proponent when they open-source their current version of DB2, and other commercial applications and operating systems under royalty-free, fully sublicensable terms.

The degree of the openness of the "open standards" process is difficult to resolve. As an example, the OASIS ODF Technical Committee (TC) has 14 members. Three members are Sun employees, three work for IBM, and one each is employed by Adobe Systems, Intel, and OASIS. Three are listed as individuals: Patrick Durusau is Director of Research and Development at the Society of Biblical Literature; Gary Edwards is principal of Open Business Stack Systems; and David Faure is the maintainer of the KWord and KOffice libraries.

As an example of ECMA TC membership, the TC39 - TG2 - C# technical group has the following 14 nominated representatives: BEA Systems, Borland, COSC of the University of Canterbury, HP, Hitachi, IBM, Indiana University, Intel, IT University of Copenhagen, Macromedia, Mainsoft, Microsoft, Novell, and Plum Hall.

Presumably, the 11 backing organizations listed in the press release will join the future ECMA TC/TG. However, only the 18 Ordinary Members of ECMA have a right to vote. Of the backing organizations, only Microsoft, Intel, and Toshiba are Ordinary Members. How membership status affects an individual organization's right to insist on modifications to a proposed standard isn't clear from ECMA's Web site.

Update November 29, 2005: David Berlind's semi-autobiographical paean to openness in standards, "It's not about OpenDocument vs MS. It's about open standards," didn't identify the employers of the OASIS ODF TC's members or mention the control that Sun and IBM exert over ODF through the TC.

Microsoft's Covenant Not To Sue

Traditionally, Microsoft has favored RAND-mode patent licensing but has granted RF licenses for some IP, such as the Office 2003 XML schemas. The terms of these licensing modes require application developers and users to sign a written contract. Brian Jones says, regarding changes to Microsoft's licensing approach:

[I]n order to clear up any other uncertainties related to how and where you can use our formats, we are moving away from our royalty free license, and instead we are going to provide a very simple and general statement that we make an irrevocable commitment not to sue. I'm not a lawyer, but from what I can see, this "covenant not to sue" looks like it should clear the way for GPL development which was a concern for some folks.

Robert Scoble asked Jean Paoli, "Do I need to sign, or agree to, any licensing agreements to use the formats?" Paoli responded:

No, for the specifications and in our work with Ecma International, we are offering a broad “covenant not to sue” to anyone who uses our formats. This is a new approach that continues our open and royalty-free approach. We think it will be broadly appealing to developers, including most open source developers. ([B]y the way you did not have to sign anything even before this announcement.)

Whether a covenant not to sue does "clear up any other uncertainties," which appear to be the former requirements to apply for and receive a royalty-free license, remains to be seen. Brian added a November 22, 2005 post that explains the covenant not to sue in greater detail. The post includes a link to the "Microsoft Covenant Regarding Office 2003 XML Reference Schemas," which Brian advises will apply to the Office 12 schemas when they're released. Attorney Andy Updegrove's "Microsoft's Format Covenant Fails Comparison Test with Sun's" post purports to compare Microsoft's covenant for Office XML documents with Sun's "Sun OpenDocument Patent Statement" for the ODF published by OASIS. (As Microsoft did for the Office 2003 XML reference schemas, Sun had earlier, in 2002, offered a reciprocal royalty-free IP license for the OpenOffice.org XML File Format Specification.) eWeek's Steven J. Vaughan-Nichols expands on Updegrove's analysis to castigate Microsoft (as usual) in his November 23, 2005 "Legal Analyst Sees Defects in Microsoft Open XML Initiative" piece for the magazine's Linux & Open Source department.

David Berlind's 11/28/2005 ZDNet blog post, "Top open source lawyer blesses new terms on Microsoft's XML file format," says that Larry Rosen, the Open Source Initiative's (OSI) first secretary and general secretary, has endorsed Microsoft's "covenant not to sue" approach. The blog item contains the full text of Rosen's statement. ZDNet's Martin LaMonica reports in his "Mass. warms to Microsoft Office standard" article: "The state is 'optimistic' that Microsoft's Office Open XML document formats will meet the standard for an 'open format' set by Massachusetts, according to a statement issued Wednesday by Gov. Mitt Romney." CMP TechWeb ran a similar story by W. David Gardner, "Mass. Flips, Sides With Microsoft," on the same date.

Judging from most of the 50+ comments to the Scobleizer post and the 450 or so on Slashdot, few, if any, free-lance Open Source proponents would accept any free license or assertion of no license required from Microsoft for the Office Open XML format, no matter who endorses them. The issue for these hard-core folks won't be settled until Microsoft open-sources Office and Windows. eWeek's David Coursey takes up this issue in his "Bill Gates Is Not the Next Linus Torvalds" op-ed article.

More from the trade press on 11/22/2005: Ingrid Marson, ZDNet UK: "Microsoft's standardization move divides experts"; David Berlind, ZDNet's Between the Lines blogger: "Microsoft ECMA/ISO move could give Office formats new lease on life"; John Carroll's ZDNet blog (Carroll is a programmer and Microsoft employee): "Office XML as ECMA and ISO Standard?"; Paul Murphy's ZDNet blog: "OASIS? XML? Permanence?"; Steven Vaughan-Nichols and Mary Jane Foley, eWeek: "From the Outside Looking In: Analysts, Developers on Microsoft, Open Standards"; Steven Vaughan-Nichols, eWeek: "Liar, Liar, Pants on Fire: Microsoft and Open Standards"; David Coursey, eWeek: "How Open Can Microsoft's Formats Be?"; Peter Galli, eWeek: "Microsoft Opens Office File Formats"; Elizabeth Montalbano and Simon Taylor, Computerworld: "Update: Microsoft to open Office document format". Some ZDNet pieces indicated that the reporter(s) hadn't checked all resources available on 11/21/2005. Entries from InfoWorld and InformationWeek were (surprisingly) missing in action as of 3:00 p.m. PST.

--rj

Disclaimer: I am not an attorney and this post does not purport to offer legal advice.


Friday, November 18, 2005

Recent Articles - ReportViewer, SQL Server 2005, ADO.NET 2.0, and LINQ

Here are links to some of my recent articles and an interview about SQL Server 2005 for Visual Studio Magazine and the .NETInsight newsletter:

Build Client-Side Reports Easily
VS 2005’s new ReportViewer control and its built-in Report Designer enable Smart Client and Web page designers to lay out, format, embed, export, and print interactive reports without running an SQL Server Report Server.

Interview by Visual Studio Magazine editor-in-chief, Patrick Meader, about SQL Server 2005

Roger Jennings discusses the new data features that made it into Visual Studio and SQL Server 2005—and a couple features that were dropped late in the process.

Streamline Mapping With Orcas and LINQ
Use a pair of LINQ Technology Preview add-ins that integrate with .NET 2.0 and VS 2005 to take a look into the future of VS and data.

Manage Data With VS 2005
Visual Studio 2005's new visual data tools and data-bound controls, together with ADO.NET 2.0 data sources, simplify creating scalable, data-intensive Smart Client and Web applications.

Thursday, November 17, 2005

Google Base and Atom 0.3 Bulk Uploads

Adding Web pages or blog posts as News and Articles or Reference Articles Item Types to Google Base is problematic for content owners. Google's draconian Terms of Service for the content you upload gives Google carte blanche to "reproduce, modify, adapt, publish, and otherwise use, with or without attribution such Content," as well as "to use your trademarks, service marks, trade names, proprietary logos, domain names and any other source or business identifiers."

I'm certainly not enthusiastic about the Google folks modifying or adapting and then publishing my content without attribution. ZDNet's Garrett Rogers discusses these and related issues in his November 16, 2005 post, "Google Base: Preparing for the worst?"

Note: If you're searching on "Google Base," you're likely to see references to "All your base are belong to us," an idiosyncratic Internet message (apparently in pidgin, akin to "him belong me") that's explained in this lengthy Wikipedia entry. It's an interesting sidelight that Al Qaeda means "the base" in Arabic. The Wiki entry also mentions common derivatives, such as "all your data are belong to us," which is more germane to the Terms of Service issue.

Despite my misgivings about Google's Terms of Service, I decided to invest a few hours of Visual Studio 2005 programming time to clean up Blogger's Atom 0.3 XML file for this site, add some optional tags and attributes, and publish it to Google Base. I'll provide details on the VB 2005 code I used to manipulate the Atom.xml XmlDomDocument object in a future post. As I mentioned in the "Initial Conclusions" section of my earlier "Google Base and Bulk Uploads with Microsoft Access" post:

Initial tests with an Atom 0.3 (Atom.xml) file generated by Blogger for the OakLeafBlog, saved as a local XML file with FireFox 1.5 RC2, and Bulk Uploaded as the Reference Articles item type showed several problems. The description attribute contains HTML markup and error messages state that the value is limited to a maximum of 10,000 characters. Thus only the shorter OakLeafBlog articles publish to the list; HTML markup contributes substantially to description length. Help Center's "What do I include in 'Description'?" topic says "Please ensure that the description does not contain any HTML as we don't currently recognize or display HTML tags in your item." Help Center also says the maximum description length is 1,000 characters.
I was surprised by the inconsistencies between the help topics and the result of an initial test with a moderate-size Atom.xml document from a Google application (Blogger). So I temporarily increased the size of the main page to include all OakLeafBlog posts (50 as of this post), which would permit more complete tests and let me evaluate issues that relate to creating Google Base-enabled XML files.

Note: Atom 1.0 is the current version of the Atom specification, and Tim Bray says it's awaiting an IETF RFC number as of mid-November 2005. Blogger continues to use the outdated Atom 0.3 spec for syndication, as does the Google Base - Atom 0.3 Specification page. Venture capitalist Bill Burnham says in his "RSS and Google Base: Google Feeds Off The Web" post, "Google intends to build the world's largest RSS 'reader' which in turn will become the world's largest XML database." Alan Wood at Folknology refers to Adam Bosworth's MySQL presentation and provides an Atom-oriented analysis. Most readers treat RSS and Atom feeds similarly.

The ultimate objective of this exercise is to determine whether any benefits accrue to Web site publishers (or, for this example, bloggers) by publishing copies of linked content on Google Base. Much of the initial Google Base content, such as real-estate listings, consists of links to existing Web pages. Presumably, Google will have spidered the source site's pages previously. Technorati's Niall Kennedy posits:
Why should you go to the trouble of submitting your information to Google Base? You will be completely sure that Google has all your latest content complete with the appropriate link back to your site. Feeding the content directly to Google may help your posts place better in Google search results.
Whether posts uploaded to Google Base gain precedence in Google search results remains to be seen. Mine haven't so far.

Update 11/22/2005: John Markoff and Michael Barbaro of the New York Times report that Google Base can now power a local version of Google's Froogle shopping service. There's no announcement of the new feature in the Google Blog; the Google Base Blog still wants a username and password for access. InfoWorld's Jon Udell posts "Dueling simplicities," which analyzes the potential relationships between Microsoft's proposed Simple Sharing Extensions for RSS and OPML (SSE) specification, Adam Bosworth's "Learning from the Web" presentation, and Google Base. This post follows "The two-way data web" article, which was written before (but published after) the release of the SSE specification.

Update 11/30/2005: The "official" Google Base blog, http://googlebase.blogspot.com/, added an entry with tips on bulk-uploading items.

Completing Your Personal Profile
If you have or create a Google account, which you need for most Google applications, you'll probably find it worthwhile to add the additional default attribute values that apply to Google Base only. See this section in the preceding "Google Base and Bulk Uploads with Microsoft Access" post for details.

Creating the Raw XML Bulk Upload File
The http://oakleafblog.blogspot.com/atom.xml document contains data for 50 posts (<entry> groups) in a 498-KB file for an average of about 10,000 characters per <entry>. FireFox 1.5 RC2 displays the HTML tags in the <content> elements, as shown here, which transform to Google Base description attribute values:
FireFox 1.5's View Page Source command displays the Atom 0.3 source code and enables saving the Atom 0.3 source code to a physical file, which is required for bulk XML file uploads:


The stylesheet employed by Internet Explorer 5+ strips the HTML markup from the XML document's content element but won't display or enable saving the unformatted <content> value locally, as shown here:


Thus, you'll need to substitute FireFox for IE to generate and save a file—OakLeafBlogAtom.xml for this example—for the Bulk Upload operation. (Only FireFox 1.5 RC2 and RC3 have been tested to date.)
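A browser isn't the only way to get the raw file. Fetching the feed URL programmatically returns the bytes the server sends, with no stylesheet applied, so the escaped HTML in the <content> elements survives intact. The following Python sketch is an alternative to the FireFox step, not the method used in this post; the function name save_feed is hypothetical, and whether a given blog still serves Atom 0.3 at its atom.xml URL is an assumption you'd need to verify.

```python
import shutil
import urllib.request
from pathlib import Path


def save_feed(url: str, destination: Path, timeout: float = 30.0) -> int:
    """Copy the raw bytes at `url` to `destination` and return the file size.

    Nothing renders the document, so no XSLT or CSS stylesheet touches it:
    escaped HTML inside <content> elements is saved exactly as sent.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination.stat().st_size
```

For example, save_feed("http://oakleafblog.blogspot.com/atom.xml", Path("OakLeafBlogAtom.xml")) would produce a file for the Bulk Upload operation, assuming the URL still returns the Atom 0.3 document.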

Uploading the Atom 0.3 XML File as a Reference Articles Item Type
The Specify a Bulk Upload page's Choose an Existing Type list doesn't offer the News and Articles Item Type, which would be more appropriate for a list of blog posts. (News and Articles and Wanted Item Types appear in the Choose an Existing Item Type list on the Post an Item page for ad hoc items.) News and Articles supports the following standard attributes, in addition to title and description: author, expiration_date, label, news_source, pages, and publish_date. (It's unfortunate that Google didn't adopt standardized metadata terms, such as those of the Dublin Core Metadata Initiative, DCMI.)

Note: Niall Kennedy's "Google Base blog import instructions" post describes for Movable Type or TypePad Pro users how to output your last n blog posts to an Atom.xml file with his Movable Type Google Base template.

Thus, you're stuck with Reference Articles, which doesn't include several attributes that would be useful for qualifying searches. Reference Articles (presumably included in the "Research Studies and Publications - scholarly literature" Information Type) appear to be limited to author, expiration_date, label, pages, publication_name, publication_volume, and publish_date. (Where is publication_number?) However, you can use the Google Base Provider Namespace to define your own custom attribute taxonomy in the Atom 0.3 document.

Update 11/25/2005: You're no longer stuck with Reference Articles as the Item Type for Blogger Atom 0.3 feeds. The Bulk Upload page's Choose an Existing Type list now includes News and Articles and Wanted Ads Item Types. Google also added Blogs, Coupons, Rentals, and Comic Books as standard search categories to the default home page. Rapid ad hoc changes like this demonstrate another advantage of Web-based services.

The process for uploading an Atom 0.3 XML file is similar to that for uploading a tab-separated value text file to create a list of the Products Item Type:

1. After logging in with your Google account, navigate to the Google Base home page and click the Post Multiple Items with a Bulk Upload File link to open the My Items page.

2. Click the Specify a Bulk Upload File link, type the FileNameAtom.xml file name in the text box, select Reference Articles in the Item Type list, and click Specify Bulk Upload File to open the My Items page.

3. Click Browse, navigate to and double-click the file you saved with FireFox to specify it as the source of the registered FileNameAtom.xml file, as shown here:


4. Click Upload and Process This File. Wait a few minutes (or hours), and then press F5 to determine the publication status of the file. If you can't stand the wait, click the Active Items link after it displays a count of 1 or more to review unpublished items in the list:

5. Click one of the Edit links to display the item in the standard editing form for the Reference Articles Item Type:



Notice the HTML markup in the Description attribute textarea. This example has a substantially lower proportion of markup characters to content than most OakLeafBlog posts. It would be possible—but certainly tedious—to remove the tags manually and add Details attribute-value pairs and Labels keywords tags.

Viewing the Items as a Google Base User
To emulate a search by an ordinary Google Base user, follow this drill:

1. Sign out of your account, navigate to the Google Base home page, type a unique search term, such as xlinq for OakLeafBlog posts, and click Search Base to display the results. Alternatively, click here.


As expected, clicking the OakLeaf Consulting link or here displays all active items for authorid=1063521.

2. Click one of the titles to open the linked page whose URL appears in green, or click here.

Fixing Feed Errors
The inclusion of HTML markup in the description attribute isn't a problem for ordinary users, because they don't see the attribute value. However, large amounts of markup combined with lengthy content can result in failure to post overlength entry groups. In this case, the My Items page displays an error message:


Note: It might take several hours for the preceding warning to appear. Bulk Updates don't occur in real time.

Clicking the Details link displays this page with error messages:


To overcome this problem, you must edit the content element of overlength entries, remove the HTML tags, test for content length, and then trim the string value if it's more than 10,000 characters.
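The three steps in the preceding paragraph (remove the tags, test the length, trim) can be sketched in Python. This isn't the VB 2005 code this post refers to; it's an illustrative equivalent that uses the standard library's HTMLParser to keep only text nodes, and it assumes the 10,000-character limit reported by the Bulk Upload error messages (the Help Center cites 1,000). The helper names are hypothetical.

```python
from html.parser import HTMLParser

MAX_DESCRIPTION_CHARS = 10_000  # limit reported by the Bulk Upload error messages


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp;, &#8217;, etc.
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")  # keep words in adjacent elements apart

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup: str) -> str:
    """Return the text of `markup` with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def to_description(markup: str, limit: int = MAX_DESCRIPTION_CHARS) -> str:
    """Strip tags, then trim to at most `limit` characters."""
    text = strip_html(markup)
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back up to the last space so a word isn't split, if there is one
    space = cut.rfind(" ")
    return cut[:space] if space > 0 else cut
```

With ElementTree, the text of an escaped <content> element already arrives as HTML markup, so it can be passed straight to to_description. Because every tag becomes a space, markup inside a word (such as wor<b>l</b>d) splits it, and the text of any <script> or <style> element is kept; both are acceptable trade-offs for typical blog posts.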

Serious Bug in g:label Custom Attributes Documentation
Google has created its own taxonomy of Atom 0.3 extensions, identified by an xmlns:g="http://base.google.com/ns/1.0" namespace attribute added to the feed element. The Google Base - Atom 0.3 Specification page includes an example that uses this namespace to add several predefined elements (g:image_link, g:expiration_date, g:job_function, g:location, and g:label) to specify non-standard attributes for a specific Item Type. The example for the <g:label> elements is incorrect. The label item of the Google Base - XML Attributes page has the same error.

Following is an abbreviated version of a Blogger Atom 0.3 test file with the Google Base extension namespace attribute and multiple g:label elements added in accordance with the preceding XML document example and attribute specification. Technorati tag names provide the values of the multiple g:label elements.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3" xml:lang="en-US"
    xmlns:g="http://base.google.com/ns/1.0">
  <link href="https://www.blogger.com/atom/11646261"
      rel="service.post" title="OakLeaf Systems"
      type="application/atom+xml" />
  <link href="https://www.blogger.com/atom/11646261"
      rel="service.feed" title="OakLeaf Systems"
      type="application/atom+xml" />
  <title mode="escaped" type="text/html">
    OakLeaf Systems
  </title>
  <tagline mode="escaped" type="text/html">
    OakLeaf Systems is a Northern California
    software consulting organization specializing
    in developing and writing about Microsoft SQL
    Server/.NET database and Web services projects.
  </tagline>
  <link href="http://oakleafblog.blogspot.com"
      rel="alternate" title="OakLeaf Systems"
      type="text/html" />
  <id>tag:blogger.com,1999:blog-11646261</id>
  <modified>2005-11-19T14:32:23Z</modified>
  <generator url="http://www.blogger.com/" version="5.15">
    Blogger
  </generator>
  <info mode="xml" type="text/html">
    <div xmlns="http://www.w3.org/1999/xhtml">
      This is an Atom formatted XML site feed.
      It is intended to be viewed in a Newsreader
      or syndicated to another site. Please visit
      Blogger Help for more info.
    </div>
  </info>
  <convertLineBreaks xmlns="http://www.blogger.com/atom/ns#">
    true
  </convertLineBreaks>
  <entry xmlns="http://purl.org/atom/ns#">
    <link href="https://www.blogger.com/atom/11646261/113227683042337068"
        rel="service.edit"
        title="Google Base and Atom 0.3 Bulk Uploads"
        type="application/atom+xml" />
    <author>
      <name>--rj</name>
    </author>
    <issued>2005-11-17T16:34:00-08:00</issued>
    <modified>2005-11-18T21:48:40Z</modified>
    <created>2005-11-18T01:20:30Z</created>
    <link href="http://oakleafblog.blogspot.com/2005/11/google-base-and-atom-03-bulk-uploads.html"
        rel="alternate"
        title="Google Base and Atom 0.3 Bulk Uploads"
        type="text/html" />
    <id>tag:blogger.com,1999:blog-11646261.post-113227683042337068</id>
    <title mode="escaped" type="text/html">
      Google Base and Atom 0.3 Bulk Uploads
    </title>
    <content mode="escaped" type="text/html"
        xml:base="http://oakleafblog.blogspot.com"
        xml:space="preserve">
      Content with HTML tags removed.
    </content>
    <draft xmlns="http://purl.org/atom-blog/ns#">
      false
    </draft>
    <g:label>Databases</g:label>
    <g:label>Google Base</g:label>
    <g:label>XML</g:label>
    <g:label>Atom</g:label>
    <g:label>RSS 2.0</g:label>
    <g:label>Google</g:label>
  </entry>
</feed>

Click here for a more readable version of the preceding sample file from Google Groups (in print format).

Uploading the complete 256-KB file as a Reference Article resulted in a Failure status report in the My Items page with a single instance of "Bad data" as the reason for the failure. The Upload page reported 0 Items Processed, 0 Items Succeeded, and 0 Active Items. However, after a few hours (overnight), the Active Items page reported all items had Published status. (The Upload page data didn't change.)

Fixing the g:label Attribute Specification Bug
Opening in the Edit page the few entries that had a single Technorati tag—and thus a single <g:label> element, typically LINQ—showed the tag name in the Label textarea. The text associated with the Label control suggests "Keywords or phrases that describe your item. Maximum of 10. Separate with commas." Based on this hint, I changed the <g:label> elements from:


<g:label>Databases</g:label>
<g:label>Google Base</g:label>
<g:label>XML</g:label>
<g:label>Atom</g:label>
<g:label>RSS 2.0</g:label>
<g:label>Google</g:label>

to:

<g:label>
Databases, GoogleBase, XML, Atom, RSS 2.0, Google
</g:label>
 
This change solved the Failure problems, reported Success as the status, and processed all 50 items, as shown here:
Note: The Google Base - XML Attributes page's image item states that a comma-separated list—such as <g:label> leater, power locks, sunroof, ABS </g:label>—is Not acceptable. (It's doubtful that the list isn't acceptable because of leading or trailing spaces or a missing "h" in "leater").
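The fix described above (a single <g:label> element holding a comma-separated list instead of one element per tag) can be applied to every entry in a feed with ElementTree. This Python sketch is an illustration, not the post's VB 2005 code; the function name merge_labels is hypothetical, and the cap of 10 labels comes from the Label control's "Maximum of 10" hint.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"        # Atom 0.3 namespace
GBASE_NS = "http://base.google.com/ns/1.0"  # Google Base extension namespace

# Serialize Google Base elements with the g: prefix rather than ns0/ns1
ET.register_namespace("g", GBASE_NS)


def merge_labels(feed: ET.Element, max_labels: int = 10) -> None:
    """Replace each entry's <g:label> elements with one comma-separated <g:label>."""
    for entry in feed.findall(f"{{{ATOM_NS}}}entry"):  # findall returns a list
        labels = entry.findall(f"{{{GBASE_NS}}}label")
        if not labels:
            continue
        values = [el.text.strip() for el in labels if el.text and el.text.strip()]
        for el in labels:
            entry.remove(el)
        if values:
            merged = ET.SubElement(entry, f"{{{GBASE_NS}}}label")
            merged.text = ", ".join(values[:max_labels])
```

A typical run would be tree = ET.parse("OakLeafBlogAtom.xml"), then merge_labels(tree.getroot()), then tree.write("OakLeafBlogAtom-fixed.xml", encoding="UTF-8", xml_declaration=True). ElementTree's default parser discards the xml-stylesheet processing instruction and writes the Atom elements with a generated prefix, which is equivalent XML; check that the output still uploads cleanly.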

The fix to the g:label attribute format also fixed the missing Labels entries problem with multiple <g:label> elements, as shown here:


The Label tags appear on the edit page immediately after Google processes the upload, so you don't need to wait for Published status to test your editing application.

Use Labels to Refine Google Base User Searches
When you add Label tags to your entries, users can refine their searches by clicking links that return entries that match all tags for an entry as shown here:


Notice that comma-separated Names (tag) values appear under the Titles.
Click here to open the preceding interactive Google Base page, click the More... link to display all Names combinations, and try the various refinement choices. Click the publisher's moniker—Roger Jennings for this example—or click here to display a list of all items (not just Reference Articles) contributed by the publisher (authorid=1071203).

Conclusion
Google needs to clean up its Atom 0.3 documentation to minimize developers' wild-goose chases. The current (beta) UI undoubtedly will confuse potential users. For example, I wouldn't have known the benefit of adding Label tags for search refinement if I hadn't written a simple VB.NET 2005 project to clean up the description attribute (<content> element) value and add the Google Base namespace and a <g:label> element in the correct format.

Robert Niles, editor of the USC Annenberg Online Journalism Review, concludes: "Right now, the UI is geared more toward people upload information than those looking for it." BusinessWeek's Rob Hof thinks folks are "Ganging Up on Google." The Solution Watch blog offers a positive review and links to other detailed Google Base reviews.

The World Resources Institute (WRI) claims to have submitted information to Google Base "on a 5 million-record database on sustainable development for 200 countries over a period of up to a century." However, a search of Google Base on "World Resources Institute" returns only 4,253 items that were entered between November 15, 2005 and November 28, 2005 as Research Studies and Publications Item Type. This Item Type appears to have been replaced by Reference Articles. The status of the remaining 4.996 million (purported) items isn't clear as of December 2, 2005.

Watch for updates to this post as other developers add their content to Google Base and keep an eye on the Google Base Help Discussion group to see what problems users encounter.
