Sunday, November 27, 2005

Bulk-Uploaded Items Disappear from Google Base

If the lack of real-time addition of your items to Google Base doesn't discourage you from using the beta version of this new service, perhaps delayed disappearance of items that you bulk-upload to Google Base beta 1 will.

You might also be put off by spurious indications of bulk-upload failures and by the absence of any word from the Google gods that they don't like your uploaded content. Google Groups' Google Base Help Discussion group has several active threads from beta testers who've lost the entries they uploaded (click here and here for examples).

Since the original Atom 0.3 XML upload I documented in my "Google Base and Atom 0.3 Bulk Uploads" post, every subsequent upload of News and Articles or Reference Articles items has changed from Active Items to Inactive Items after publication. These uploads were made with second and third Google accounts associated with my (DSL) e-mail addresses rather than my usual (dialup) address. Publishing usually takes several hours; I often wait overnight before reviewing item status.

Note: Brian Smith of Comparison Engines reports that Froogle users are having similar problems. His "Traveling Today, Google Base - Is Google Really Ready for User Submitted Information?" post is dated November 1, 2005, so it's not likely that the relationship between Froogle and Google Base reported by the New York Times' John Markoff and Michael Barbaro in "Google's Shopping Service to List User's Local Stores" is responsible for the problem.

My first two bulk uploads of the same transformed Atom 0.3 XML Atom03Dest.xml file to the second and third Google accounts succeeded. Occasionally, uploads will indicate a Failed status (0 Items, 0 Errors) as a result of "Bad data." Despite the reported errors, the fdbd page displays Active Items (50) Inactive Items (0) as shown here:

Here's a capture of the Details page for the preceding upload:

The "Bad data" error message is less than informative. There is no valid reason for encountering bad data from a known-good Atom 0.3 XML file, which tested OK in a subsequent bulk upload as shown here:

Here are the first few of the 50 unpublished but Active items:

Label attribute values appear as expected for the News and Articles item type in edit and preview modes.

About 1.5 hours after I uploaded the Modified items, Google Base marked the added items with Published status:

Rechecking the fdbd page the next morning showed that all the above 50 entries had been marked inactive:

What's more, the same 50 entries, added to the products list that I bulk-uploaded as a tab-separated values file to an account with 69 items, also became Inactive:

Attempting to open any Inactive Items list by clicking the Inactive Items(50) link displays the following error page:

Although a message indicates that Inactive Items will be removed in the future, this did not occur after more than 72 hours. The error message when attempting to display inactive items prevents removing them to enable re-uploading the original or a modified file.

There were no e-mail or on-line messages from Google indicating the reason for inactivating these bulk-uploaded items. If the reason for inactivation is item duplication, users should be so advised.

Eliminating Item Duplication with Code
I concluded that Google Base uses a simple or cryptographic hash function to create a unique n-byte field value for each item. After publication, Google appears to test items for duplication by comparing their hash values with those of previously posted items. However, Google doesn't publish which value(s) it includes in the hash, or whether duplication tests apply to inactive items.
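To make the hypothesis concrete, here is a minimal Python sketch of how hash-based duplicate detection could work. It is purely illustrative: Google doesn't document its method, so the choice of SHA-256, the fields that are hashed (title and content), the whitespace and case normalization, and the function names item_fingerprint and find_duplicates are all my assumptions.

```python
import hashlib


def item_fingerprint(item, fields=("title", "content")):
    """Return a SHA-256 hex digest over the chosen fields of an item.

    Which fields a real service hashes is unknown; these are assumptions.
    """
    h = hashlib.sha256()
    for name in fields:
        value = item.get(name, "")
        # Collapse whitespace and ignore case so trivial differences
        # still produce the same fingerprint (an assumed normalization).
        normalized = " ".join(value.split()).lower()
        h.update(name.encode("utf-8"))
        h.update(b"\x00")  # separator so field boundaries stay unambiguous
        h.update(normalized.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def find_duplicates(new_items, seen_fingerprints):
    """Split new_items into (accepted, rejected) against known digests.

    seen_fingerprints is a set that is updated in place with each
    accepted item's digest.
    """
    accepted, rejected = [], []
    for item in new_items:
        fp = item_fingerprint(item)
        if fp in seen_fingerprints:
            rejected.append(item)
        else:
            seen_fingerprints.add(fp)
            accepted.append(item)
    return accepted, rejected
```

Under this model, re-uploading an unchanged item yields the same digest and is rejected, while any change to a hashed field yields a different digest. If date values or inactive items also take part in the comparison, they would simply be additional inputs to the same scheme.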
I added code to my Atom.xml file transformation application to make minor changes to the Atom 0.3 dateTime element values (issued, modified, and created), as well as to the text of the description attribute. The code appends a marker such as [Modified for Google Base bulk upload on 2005-11-29T07:48:51-08:00], built from the system date/time, to the end of the content element value to assure uniqueness.
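The following Python sketch shows one way to apply that kind of change to an Atom 0.3 feed. It is not the code from my transformation application. The function name make_entries_unique, the one-second date shift, and the assumption that each entry's content element holds plain text are illustrative choices. It also assumes date values carry a numeric offset such as -08:00.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

ATOM03_NS = "http://purl.org/atom/ns#"  # Atom 0.3 namespace
DATE_ELEMENTS = ("issued", "modified", "created")


def make_entries_unique(atom_xml, stamp, shift_seconds=1):
    """Return atom_xml with each entry's dates shifted and content marked.

    stamp is the timezone-aware date/time written into the marker text.
    """
    # Serialize Atom 0.3 as the default namespace (no prefix).
    ET.register_namespace("", ATOM03_NS)
    root = ET.fromstring(atom_xml)
    marker = " [Modified for Google Base bulk upload on %s]" % stamp.isoformat()

    for entry in root.iter("{%s}entry" % ATOM03_NS):
        # Nudge each dateTime value that is present by a small amount.
        for name in DATE_ELEMENTS:
            el = entry.find("{%s}%s" % (ATOM03_NS, name))
            if el is not None and el.text:
                original = datetime.fromisoformat(el.text.strip())
                shifted = original + timedelta(seconds=shift_seconds)
                el.text = shifted.isoformat()
        # Append the marker to plain-text content (assumed: no child markup).
        content = entry.find("{%s}content" % ATOM03_NS)
        if content is not None:
            content.text = (content.text or "") + marker

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<feed xmlns="http://purl.org/atom/ns#" version="0.3"><entry>'
        "<title>Sample</title>"
        "<issued>2005-11-27T10:00:00-08:00</issued>"
        '<content type="text/plain">Body</content>'
        "</entry></feed>"
    )
    pacific = timezone(timedelta(hours=-8))
    print(make_entries_unique(sample, datetime.now(pacific).replace(microsecond=0)))
```

Using the current system date/time for the marker, as in the usage lines at the bottom, means that each run of the transformation produces different content text even when the source feed is unchanged.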
Click here to display the 50 News and Articles items uploaded to the Oakleaf_Systems alias.

Interim Conclusion
Altering item content appears to prevent inactivation of subsequent uploads of otherwise duplicate News and Articles and Reference Articles items.
Arbitrarily and silently inactivating users' bulk-uploaded items that comply with Google Base's Editorial Guidelines and Program Policies indicates to me that Google Base beta 1 isn't even close to useful for its advertised applications. (I don't believe that item duplication violates the Editorial Guidelines' "No Repetition: Avoid gimmicky repetition" rule, and I can find no restriction on duplicate items in Google Base's terms of service.) My tests indicate that Google even includes inactive items in tests for duplication.

However, Google Base doesn't enforce its Program Policies' "Affiliates: Posting is not permitted for the promotion of affiliate sites or products sold through an affiliate marketing relationship." As an example, the operators of "The Mall, Online Store" have posted 233,348 referral links to items (as of November 29, 2005).

Failure to advise users of the reason for rejecting bulk-uploaded content demonstrates utter contempt for ordinary users. Dan Gillmor decries Google's increasing hubris in his recent Financial Times article.

"Ordinary users" are those who haven't cut the special deals apparently reserved for retail chains—such as CircuitCity, CompUSA, KMart, OfficeDepot, RadioShack, Staples, Target, and Walgreens—or affiliate spammers that bulk upload hundreds of thousands or more items for Froogle Local links.

Question: Why is BestBuy missing from a Froogle Local stores list that includes CircuitCity, CompUSA, RadioShack and Target? Most early trade press articles and blog entries on Froogle Local include BestBuy as a major player in the Froogle Local beta.

Probable answer: BestBuy isn't a client of ShopLocal LLC, which supplies the data for products carried by each store of a retail chain. All other stores listed at the bottom of Froogle Local pages are ShopLocal clients.


Note: As of November 29, 2005, there were 13,745,669 Products items in Google Base. News and Articles reported 8,078 items and there were 32,801 Reference Articles.