Monday, December 12, 2005

Google Base and Blogger Items Missing from Google Search

The foremost incentives for adding user-supplied content to Google Base are the ability to search database content and speed the appearance of content in searches conducted by major search engines—or, at least, by the Google Web search engine. As of December 12, 2005—almost a month after I bulk-uploaded my first 50 Google Base test items—targeted Google searches for any of my 3,500+ custom and News and Articles items failed to return a single result.

Updated December 14, 2005: This item was updated with the following test searches and new results for Google Web search on blog entries.

Click here to test a search on 3845 334510 3311 "This US industry", which should—but doesn't—return this Google Base custom NAICS item: 334510 - Electromedical and Electrotherapeutic Apparatus Manufacturing.

Similarly, a search on the partial phrase "Modified for Google Base bulk upload on 2005", which is present in every News and Articles item's Description field, returns results for other OakLeaf blog entries, but not my Google Base items.

Others have reported the same problem in the Google Base Help Discussion group. Click here and here to read the threads. Google Base service reps don't appear to participate in the group discussion, so there's no indication whether the Google folks intend to fix this problem. Google, however, actively promotes Google Base as an aid to the Google search engine. The November 16, 2005 "'Google Base' Has Grand Ambitions" Associated Press story by Matthew Fordahl says:

Launched as a "beta test" early Wednesday, Google Base has the potential to make instantly available a vast sea of content including — but not limited to — recipes, job ads, photos, DNA sequences, real estate listings and individual standalone databases. Normally, it takes Web "crawlers" days or weeks to scour the Web and feed Google's main search engine with updated information, but they usually can't penetrate content buried in databases. This tool will make locating anything that's been uploaded nearly instantaneous, provided it finds users willing to provide the content. Submitters will also be able to describe what they uploaded with keywords — making searches and filters easier and more reliable.
The AP story goes on to quote Google's Salar Kamangar, vice president of product management: "This is all part of our efforts to make it really easy for anyone with information to make it accessible from Google. We just felt like this piece was just missing before." The capability to make any Google Base information accessible outside of the user-unfriendly Google Base UI—e.g., from Google search—appears to be missing, despite Kamangar's "Grand Ambitions" for the new software as a service (SaaS) project.

eWeek's Elinor Mills quotes Kamangar in her November 15, 2005 "Google Base service goes live" article: "We think about it being a utility so people can more efficiently post information to us. If there is more information in the search results the search experience is better. It is not a separate property we are trying to monetize. We are not at all focused on commerce or local commerce or classifieds." However, it didn't take long for Google management to merge Google Base retail store data entries that enable Froogle Local Shopping (a localized commerce service) for a select group of retail chain merchants.
The following screen capture displays a typical Froogle Local Shopping page for a search on digital camera near oakland. Google Base stores the retail branch locations and inventory items:

PC World magazine's Juan Carlos Perez appears to have received conflicting information from Kamangar, as regurgitated in his November 16, 2005 "Google Base Debuts for Hosting All Content" article:
In addition to appearing on Google Base, items posted there may surface in Google's main Web index, in the Froogle comparison-shopping site, and in the Google Local listing of businesses. [Emphasis added.] In fact, Google does not intend to promote Google Base as a service for information searchers, since the plan is to make Google Base data appear in the company's various search services, said Salar Kamangar, a Google vice president of product management, in an interview. "Our primary goal with Google Base is to extend the ways we have of collecting content to make more information available to searchers," Kamangar said. "Google Base is intended as an information store for other Google properties." The Google Base search service is primarily geared toward those who feed content to it, so they can see how their results appear and can experiment with labels and attributes, he said. "We're not driving [search] users to Google Base," he said. "This content will be searchable in some way from other Google properties."
For example, an item posted for sale will appear in Froogle searches, while a business listing will appear in Google Local. In a matter of weeks, Google's general Web search will begin delivering Google Base results that are appropriate to that service, Kamangar said.
The Google Base main help screen's "Quick Facts about Google Base" offers the following bullet point: Reach: Items you submit to Google Base can be found on Google Base and, depending on their relevance, may also appear on Google properties like Google, Froogle and Google Local.

The obvious question is: Who determines what results "are appropriate to" Google search users? A Google high priest, ayatollah, or mullah? The Google Base Police? A panel of politically correct censors?

The Google Base Help Center's "Where Will My Item Appear?" topic muddies the waters with the following disclaimer:
The type of information in your item will determine which Google property will display it.

For example, any items you're selling will appear on Froogle. Your master's thesis or short story will appear on Google. And your glowing review of the new restaurant down the block would appear on Google Local.

Please note that these are guidelines only. We're unable to make predictions or guarantees about where your content will appear. [Emphasis added.]
If Google can't even make "predictions ... about where your content will appear," why bother to add items if you aren't a major retail chain with local inventory tracked by It's a good bet that a substantial amount of your carefully added content will end up in the Google Base bit bucket.

Imagine the hue and cry in blogs and the trade press if Microsoft were to offer a database in which some Windows Live or Office Live program manager (or admin) decided "where [or if] your content will appear."

Google Web Search Misses New Blog Entries for Almost 10 Days
Google Base items aren't the only elements missing from Google search.
My earlier "Problems Uploading a Google Base Custom Item Type from a TSV File" entry has a brief section near the end that deals with Google Base and DNA sequences. Surprisingly, a Google search on "dna sequence" oakleaf "google base" returned no items until December 14, 2005.
Performing the same search on Yahoo! on December 6, 2005 returned a reference to the entire OakLeaf blog, as shown here:

An identical MSN Web search on the same day returned these two results that point at the entire blog for the initial reference (identified by the reference to Pedro [Beltrao]) and the latest entry in which the search term appears:

Searching with Amazon's A9 a few days later and marking the Blog Search check box returned no Web results (for which A9 relies on Google) but found results for 12/12/2005 and 12/8/2005 updates to the original 12/5/2005 version with IceRocket search:

Both IceRocket references point to the appropriate individual post.
Amazon's Alexa and the Lycos search services returned no results.
To give credit where credit is due, the Google Blog Search service—like most Google services, still a Beta version—returned the following references the day after this item's initial posting:

If you or your customers or clients can't find what you're searching for with the Google Web search engine, that doesn't mean that it isn't on the Web. MSN, Yahoo, and A9 search found my new blog additions a day or so after posting. Taking "days or weeks" to spider a new blog post reminds me of the early days of site submissions to Alta Vista and Yahoo!.