Thursday, December 29, 2005

A First Look at Amazon Connect Author "Blogs"

This post is a bit off-topic, but writing books about databases consumes a substantial part of my time and contributes a significant part of my income. So I decided to create an author "blog" on Amazon's newly-announced Amazon Connect service that I discovered via Memeorandum today. I added the quotes around blog because Amazon includes the term plog in some of its Connect URLs and blog in other URLs and pages. According to TDavid, plog is an acronym for "product log," not "Web log." Plogs don't have much in common with today's blogs, in my opinion, so I plan to stick with plog in future references to Amazon Connect's message collections.

Click here to display the Amazon page for Special Edition Using Microsoft Office Access 2003. Scroll past Product Details to the Amazon Connect section, which displays my plog in its natural habitat. Click the Roger Jennings' Author Profile Page link to read an extended bio, bibliography (of currently-verified titles), and five Favorite Books, Albums, Movies, and Periodicals.

I discovered many posting problems with Amazon Connect that don't occur with standalone blogging tools, such as Blogger. My conclusion is that authors should consider the current Amazon Connect incarnation to be a beta-grade application. One of the characteristics of free (usually ad-supported) online utilities and tools is their perpetual beta state, which permits mini (weekly) or micro (daily or hourly) feature updates and bug fixes.

Following are some of the issues I encountered when adding and editing a few messages in my Amazon Connect plog:

1. Creating an author plog involves a Byzantine process of vetting your credentials as the author of each book sold by Amazon that you want to add to your bibliography. You must provide the e-mail address and telephone number of an agent, editor, or publisher of the book who will verify that you are its author. Fortunately, my current QUE acquisitions editor for Special Edition Using Microsoft Office Access [12] was at work today to verify my current QUE and SAMS titles. Titles from other publishers were verified after January 2, 2006.

2. As several other bloggers have noted, Amazon Connect doesn't publish an RSS 2.0 or Atom 1.0 file for syndicating author plogs. This omission appears to me to be intended to create a walled garden aimed primarily at visits by Amazon customers. By itself, lack of an RSS or Atom file is sufficient to disqualify the Amazon Connect message collection as a blog. Lack of RSS/Atom syndication will discourage authors from spending significant amounts of time updating their plogs.

3. The WYSIWYG HTML editor lets you insert links to and images of books and other products from the Amazon site, as well as links to external pages. However, the editor doesn't offer a direct HTML editing choice. Thus, you can't add arbitrary images or format your "messages" other than with the editor's toolbar buttons.

4. You can't test external links in message preview mode. You must preview the message before you can post it; posting is required to test links, which open from an amazon.com redirect. It's very easy to forget to post (and thus lose) a new "message."

5. There's no indication of how to save a new or edited message. The only button on the editing page is "Preview your message," which you must click to open a page with a "Post your message" button.

6. All edits to existing messages I posted on December 30, 2005 returned an "An error occurred trying to post your message" warning. However, the edits persisted. The error message appears to be spurious.

7. Adding Favorite Books, Music and Movies to your Author Profile is limited to a list of books, albums, and DVDs or tapes you recently browsed on the Amazon site or that you've marked as owned. Fortunately, Amazon offers most of my favorite books, Brazilian music CDs, and film tapes/DVDs.

8. The Books, Music, and Movies You Own panes in the Build Your List of Favorite ... dialog are broken if you haven't marked any items as owned. For example, a "You don't have any items in ${hash-get canonName} you ownBooks You Own" message appears in the pane if you haven't marked a book you own.

9. Book thumbnail images inserted into message text need top, right, and bottom margins increased.

10. You can't format your author bio ("About Me") with HTML tags, nor can you add working HTML anchors. This omission is additional evidence of Amazon Connect's walled-garden policy. My bio, in particular, needs external links to avoid the need to add unnecessary trivia that's available elsewhere. I'd also like to be able to emphasize words with italic or bold fonts. The About Me textarea should be replaced with an upgraded WYSIWYG HTML editor.

11. The HTML editor needs a spell-checker similar to or, hopefully, better than that provided by Blogger. Spelling gaffes are especially embarrassing for authors.

12. Pasting content from other HTML sources can cause a substantial increase in font size. The HTML editor displays text in a fixed-size serif font so you don't see the font size change until you preview the content in a sans-serif font. The HTML editor doesn't enable font-size selection or correction. Workarounds for this problem are cumbersome, to be generous. Here's a screen capture that shows the increased font size starting at "Magazine Express, Inc."

13. It appears that you can't change the recipient list for a message nor can you delete a message. (If you can, I haven't found how to do it by experimentation or from the FAQ.) The inability to change the recipient list results in the need to duplicate posts that apply to bibliography items added after the post (post post?) or instruct readers to click the See All of AuthorName's Posts link. This is a very serious problem for me.

The preceding are issues (probably bugs) that I've encountered so far. I'll update this post as problems are corrected or new problems appear.

On the positive side:

1. If you log on with your Amazon account, which is difficult to avoid, you can edit your Author Profile and messages anywhere that you can read them. You might need to re-enter your password now and then to enable editing.

2. Amazon's automated e-mail system for verification of your bibliography appears to be quick and informative. You receive immediate e-mail confirmation of verification attempts and verifications that succeed. Amazon help-desk personnel intervene promptly when a verification issue occurs.

3. The ability to associate specific messages with a particular book (or group of books) is a great idea, but the ability to edit the book (recipient) list for a message is crucial.

--rj Technorati: SQL Server Express

Tuesday, December 27, 2005

Squidoo Beta Problems - December 2005

I've been testing Squidoo lenses as a supplement to the OakLeaf blog since the Web application entered the public beta stage in early December 2005. In the process, I created and/or edited three moderately large lenses with the Squidoo beta version(s) that were in effect from December 20 to December 27, 2005. Hopefully, other lens authors will find this account of my recent experiences useful. On December 28, I added some recommendations to increase Squidoo's usefulness. The suggestions appear at the end of this post. I added additional problems and suggestions in January 2006 after creating a very large lens.

Note: Web applications allow their developers to "go live" with updated beta versions at will. Thus, new features get added, some bugs get excised, and other regression bugs appear without notice. This means that the problems reported here might have been fixed by the time you read this or new bugs might have shown up. For the Squidoo team's view of their product's status, check the SquidooStatus blog. Unless otherwise noted, the problems appear when editing or viewing with IE 6.0 or Firefox 1.5.

Squidoo Problems

Following are the more important issues that are currently affecting my three lenses, all of which have been reported to the Squidoo development team:

1. One of my off-topic lenses, No More U.S. Custom Surfboards?, lost about 75% of its content in the 12-hour+ period from 6:00 p.m. December 20, 2005 to 6:30 a.m. December 21, 2005. Two Linked List modules and a Text module disappeared from the lens, which was confirmed as successfully published. My recommendation is to keep local backups of your lenses as IE Web Archive - Single File (*.mht) files or Firefox Web Page Complete (*.htm) files. This problem was reported to be fixed by a code update on December 22, 2005. I haven't encountered a recurrence of the problem, but I still recommend keeping up-to-date backups.

2. After December 20, 2005, I was unable to add this small *.jpg image file to a Text module of the custom surfboards page with IE 6 SP1 and the latest patches. There was no prior problem uploading and inserting the same image. Browsing for the file, inserting the file name in the text box, and clicking Upload Photo bypasses the Save operation and doesn't add the uploaded image to the module. Other sites also exhibit similar problems with images that previously appeared in text modules. The problem does not occur with Firefox 1.5. For software development purposes, I have IE 6 set as my default browser; other viewers and/or LensMasters might not have Firefox 1.5 installed. This problem appears to have been fixed as of 1/20/2006. (Updated 1/20/2006.)

3. The WYSIWYG HTML editor for the text of the Introduction module became inoperative on December 26, 2005. The HTML editor is frozen. For now, you must edit Introduction text with the pop-up HTML editing dialog. This problem appeared to be fixed as of about 13:00 on December 27, 2005.

4. Adding Technorati tag anchors, such as <a href="http://technorati.com/tag/surfboards" rel="tag">surfboards</a>, to the Introduction text doesn't work because Squidoo strips the rel="tag" attribute/value pair from the anchors. The reason for doing this is unknown. This problem appeared to be fixed as of about February 5, 2006.

5. Adding similar tag anchors to module or link Description text doesn't work, because Squidoo prepends http://www.squidoo.com/surfboards/%22 to the URL and strips the rel="tag" attribute to create the following unusable hybrid: http://www.squidoo.com/surfboards/%22http://technorati.com/tag/surfboards%22. The presence of the rel="tag" attribute appears to cause the erroneous URL problem. This problem appeared to be fixed as of about February 5, 2006.

6. Attempting to claim a Squidoo lens for a Technorati account results in this Quick Claim message: "Good news! We've detected that your blog platform, WordPress, supports Quick Claim, which is the easiest way to claim your blog." Obviously, a Squidoo lens isn't a WordPress blog. (Updated 2/5/2006.)

However, there's now a method for adding a lens to a Technorati account because both Squidoo and Technorati have fixed claim problems. Following is the workaround:

A. Start the Technorati blog claim process and proceed as if you were claiming a conventional Blogger or WordPress blog. The claim will fail.

B. Click Configure Blog, then click the Skip This Step link to bypass the WordPress Username and Password page, and click it again to bypass the Technorati Embed page.

C. On the Technorati Link page, copy the link from the Step 3 text box to the Clipboard. The link will appear similar to <a href="http://technorati.com/claim/75fm94x63">Technorati Profile</a>.

D. Open your lens and its Introduction module for editing, and click the HTML button to open the HTML editing window. Move to the bottom of the window, and paste the link into the text.

E. Click Update to close the window, Save the module, and click the Technorati Profile link to verify that it points to your Profile. Then click Publish.

F. Click Claim Blog Now to open the Configure Blog Claim page and complete the entries as you would for a conventional blog.

G. Clear the Show Technorati Embed on my Site check box, and then click Save changes.

Note: The preceding process failed for the OakLeaf databases lens, which was the first lens built in the Squidoo beta, but succeeded for all other lenses, as shown here:

7. I was unable to use IE 6.0 to add a Subtitle or Description element to a Linked List module after adding a link to the module and saving it. However, there was no problem doing this with Firefox 1.5. Firefox displays Add a Subtitle and Add a Description links if the Subtitle and Description elements are empty; IE 6.0 doesn't. This problem appeared to be fixed at about 13:00 on December 27, 2005.

8. The Search Results page doesn't return consistent results for lens titles or tag names. For example, a search on oakleaf or database (but not databases) returns the OakLeaf Databases Lens, and a search on microsoft or access returns the Microsoft Access Lens. Searching on hsqldb (a tag name) returns both lenses. The reason for the failure of databases is unknown. [This problem returned on 1/24/2006 and was fixed as of 1/26/2006.] However, searching on custom or surfboard[s] doesn't return my more recent custom surfboards lens, nor does a search on any of the lens's assigned tags, such as blanks, epoxy, grubby clark, etc. This failure indicates that a problem occurred with Squidoo's search system prior to publishing the surfboard lens. The result is that Squidoo users can't find my surfboards lens by ordinary search methods. As of 1/18/2006, searching on custom or surfboard[s] and most other tags now returns my custom surfboards lens. However, using custom as the search term places my lens (now rated #111) on the second results page, following several unclaimed lenses. It appears that tag-based searches still require some work. (Updated 1/19/2006.)

Note: On 1/12/2006, the Squidoo Status blog reported problems with corruption of the Squidoo search database. The intermittent nature of the search problem might be due to repeated corruption of the search database.

9. RSS modules don't update automatically to reflect changes to the specified blog. For example, I added a "First Look at Amazon Connect Author 'Blogs'" post. My update frequency for the Recent OakLeaf Blog Atom Links RSS module of the OakLeaf Databases Lens was set at once/day. Several days after adding the post, it didn't appear in the list. I had to manually edit and refresh the list to add the new post. The same problem applies to the RSS module of my Microsoft Access Lens. (Added January 11, 2006, reverified 1/19/2006.)

10. Star ratings (1-Poor through 5-Exceptional) appear to be arbitrary and there's no indication how users can assign ratings to a lens. Squidoo FAQ #15 states: "Your lens may be #406 overall, but still be the top ranked lens and the best rated lens (based on the user-rating star system) in your category." My No More U.S. Custom Surfboards? lens ranked #176 (on 1/11/2006) but originally had a user rating of 1-Poor. C. S. Lewis - his life and writings ranked #217 but has a user rating of 4.5 stars. How did ordinary viewers assign these ratings and what prevents a "LensMaster" from logging out (to act as an ordinary viewer) and assigning five stars to his or her lenses? (Updated 1/11/2006.)

Note: Part of the answer to the above appears to be that only users with Squidoo accounts can assign star ratings to lenses. I was able to assign 5-star ratings to all my lenses by logging on with another account that has no lenses, and clicking the 5-Excellent star repeatedly. (Unlike most polls, you can "vote" as often as you want in a single session.) You don't see the star rating until you log out and view the lens as an ordinary user. Check out my The Black Scholar lens as an example. (Updated 1/11/2006.)

11. Another recent problem is the disappearance of newly added items from Link List modules. The empty text boxes for the added items appear momentarily in IE 6.0, and then disappear, as shown below in IE 6 SP1 with the latest patches:

You must save the Link List with the empty item and then edit the list to complete the entry. This problem began in IE 6 about 1/1/2006 but, like the problem with images in Text modules, does not occur when using Firefox 1.5. Squidoo is running LAMP, which usually implies Firefox as the default browser. However, there are many folks who have only IE 5+ installed. (Updated 1/11/2006.)

This problem appears to have been fixed as of 1/17/2006.

12. Squidoo's ranking method for lenses (LensRank) is unfathomable, to be charitable. For example, my expansive The Black Scholar Lens ranked #297 on 1/19/2006, #317 on 1/17/2006 and #767 on 1/16/2006, which was a considerable improvement over #1,996 on 1/14/2006. The lens has been updated almost daily since January 1 and has more than 200 outbound links as of 1/16/2006. According to Squidoo FAQ #16:

"LensRank does a full sweep of all lenses in the Squidoo system and ranks them according to things like frequency of update, traffic, inbound and outbound links and more. We recalculate the rankings every day or so. So, if you just built a lens and are disheartened to see that it's at the bottom of the LensRank pile, or behind several unclaimed UFG lenses, don't worry. Give it a day or two and we bet you'll see the LensRank improve (ie: get smaller...remember, LensRank #1 is best). LensRank is built to reward human-made, updated, curated lenses.

Right now, as we are figuring out the proper LensRank and search ratings and fine-tuning the algorithms, you may find that our search function is not returning the lenses you would expect. In a perfect world, your handmade lens will have a higher LensRank--and therefore be a higher search result--than any unclaimed UFG lens. We are working to make this so. But you can do it too by making a great lens, including a lot of links, and updating often."

The original problem might have resulted from a hiatus in operation of the "ranking engine" during early January 2006. Minor daily changes to the lens, however, haven't resulted in substantial reductions to LensRank, which ranges randomly from about 280 to 320. Unlike my Surfboards lens, it appears impossible to affect the rating of The Black Scholar lens by making additions to or updating it. (Updated 2/5/2006.)

13. Module relocation in the Your Lens Layout frame doesn't work. Moving a module any distance in the main editing page is a PITA. This problem was fixed as of 2/5/2006; relocation now works like a champ.
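
Problem items 4 and 5 above are easy to check mechanically. The following is only a rough sketch in Python, using the standard library's html.parser and urllib.parse modules; the function and class names are my own inventions for illustration, not anything Squidoo or Technorati provides. It lists the anchors in an HTML fragment that still carry rel="tag", and it decodes the unusable hybrid URL from item 5 to show that %22 is simply an encoded double quote, which suggests (but doesn't prove) that the quoted href value was treated as a relative path.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class TagAnchorFinder(HTMLParser):
    """Collects the href of every <a> element whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tag_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, e.g. rel="tag nofollow"
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "tag" in rel_tokens and attr_map.get("href"):
            self.tag_hrefs.append(attr_map["href"])


def find_tag_anchors(html):
    """Hypothetical helper: return the hrefs of all rel="tag" anchors in html."""
    finder = TagAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.tag_hrefs


# The anchor as entered, and the same anchor with rel="tag" stripped (item 4).
submitted = '<a href="http://technorati.com/tag/surfboards" rel="tag">surfboards</a>'
stripped = '<a href="http://technorati.com/tag/surfboards">surfboards</a>'

print(find_tag_anchors(submitted))
print(find_tag_anchors(stripped))

# The hybrid URL from item 5; decoding %22 exposes the literal double quotes.
mangled = "http://www.squidoo.com/surfboards/%22http://technorati.com/tag/surfboards%22"
print(unquote(mangled))
```

Running the check against the published lens HTML, rather than the text you typed into the editor, is what reveals whether the attribute survived.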

I'll continue to update this post from time to time with additional problems I encounter or fixes/workarounds I find for the preceding issues.

Suggestions for Squidoo Improvements

Following are suggestions that I believe would make Squidoo more useful to me and to readers of my lenses:

1. Permit lensmasters and, perhaps, ordinary Squidoo users to mark lenses as having no significant content, containing deliberately misleading information, or being spam (splenses). Presently, the Report This Lens link applies only to lenses that contain adult content and aren't flagged as adults-only lenses.

2. Add the capability to insert limited-size images into the Description element of Linked List items (similar to that for Text modules). Doing this could improve the page flavor of sites with a large number of links or links with longer descriptions (see the No More U.S. Custom Surfboards? lens's Linked List modules as examples.)

3. Add category sub-elements with domain-based tags to the RSS 2.0 <channel> element, such as <category domain="http://del.icio.us/tag" >surfboards</category> and <category domain="http://www.technorati.com/tag" >surfboards</category>, to support future tag-aware readers. Tag values could be extracted from the Squidoo Related Tags list, but it would be helpful to include an RSS editing feature similar to that for Squidoo tags. This suggestion is in addition to fixing the problems with Technorati tags in preceding problem items 4 and 5.

4. Consider adding RSS 2.0 <item> elements for all modules, preferably with domain-based <category> tags. Doing this might cause RSS readers to attempt to display individual modules instead of the entire Squidoo page.

5. Fix the alignment of the image added to the Introduction text for IE and Firefox browsers in View mode. (Alignment is OK in Edit mode with Firefox 1.5, but not IE 6.0 SP1). Text should be raised about 5px and the image dropped about 10px to correct the problem shown below in IE 6:

6. Fix the user-ranking (star) system and the LensRank algorithm, and explain (in some detail) the theory of ranking lenses. The current explanation appears not to be applicable to this Squidoo implementation. (Updated 1/14/2006.)

7. Correct problems with existing modules and current features before adding new modules, such as Indeed.

8. Increase the maximum number of characters of Text modules from 2,500 to at least 20K. The text limit is inappropriate for this module type and requires splitting text (including lists) into multiple Text modules. This problem was fixed for lists as of 2/5/2006 by the addition of the Text List Module. (See The Black Scholar Issue List as an example that replaces three Text modules).

9. If possible, fix Firefox 1.5's SmartNavigation failure. When readers return from an outbound link, Firefox moves to the top of the lens page; IE 6.0 returns to the location from which the user clicked the link. This is a serious problem for readers of a large lens, such as The Black Scholar.
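
To make suggestion 3 concrete, here is a minimal sketch in Python of an RSS 2.0 <channel> that carries domain-based <category> elements. It uses the standard library's xml.etree.ElementTree; the build_channel function, the description text, and the choice to emit one category per tag per domain are my assumptions for illustration. This is not a description of how Squidoo generates its feeds.

```python
import xml.etree.ElementTree as ET

# The two tag domains named in suggestion 3.
TAG_DOMAINS = ["http://del.icio.us/tag", "http://www.technorati.com/tag"]


def build_channel(title, link, description, tags):
    """Hypothetical helper: build an <rss> tree whose channel has tagged categories."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for tag in tags:
        for domain in TAG_DOMAINS:
            # RSS 2.0 allows an optional domain attribute on <category>.
            category = ET.SubElement(channel, "category", domain=domain)
            category.text = tag
    return rss


rss = build_channel(
    "No More U.S. Custom Surfboards?",
    "http://www.squidoo.com/surfboards/",
    "Placeholder description for illustration only",
    ["surfboards"],
)
print(ET.tostring(rss, encoding="unicode"))
```

The same loop could be applied to per-module <item> elements (suggestion 4), because RSS 2.0 permits <category> inside <item> as well as inside <channel>.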

Let me know what you think about these suggestions and add your own as comments.


Thursday, December 15, 2005

Squidoo and Google Base: A Tale of Two Aggregators

Other than the obvious relationship between 23-skidoo (a roaring-twenties slang term for "I'm outta here," which might have originated in the Barney Google comic strip) and the Squidoo moniker, similarities between Squidoo and Google Base aren't readily apparent. Squidoo and Google Base are in the beta stage, and both aggregate Other People's Content (OPC, a.k.a. unpaid media). Both services store OPC and its search attributes in one or more databases, which makes them fodder for this blog.

Updated December 27, 2005: The Important Beta Bugs section has been moved to the "Squidoo Beta Problems - December 2005" post of December 27, 2005. This update also includes minor revisions for events that occurred after December 20, 2005.

Squidoo represents an attempt to reduce the information overload that's resulted from the exponential growth of Internet content. Squidoo pages (called lenses) let writers share their expertise on a particular topic with readers. As Adam Bosworth, vice-president of Google, stated in his "Collaboration, Customization, and Communication" presentation to the Second International Conference on Service Oriented Computing (ICSOC04):

"What has been new is information overload. Email long ago became a curse. Blogreaders only exacerbate the problem. ... "What will be new is people coming together to rate, to review, to discuss, to analyze, and to provide 100,000 Zagat’s, models of trust for information, for goods, and for services. ... "What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn’t matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be."

Squidoo lenses are intended to be "models of trust for information." Whether lenses can gain similar stature for goods and services remains to be seen.

Google Base, on the other hand, appears to contribute to information overload by duplicating OPC in what could become the world's largest freely-accessible, Internet-based content database. Google Base's saving grace is the ability to categorize content for more precise searching.

Adding Items to Squidoo vs. Google Base

Squidoo offers hosted Web pages to which self-anointed "experts" can add Modules for free-text entry or that generate links to their blogs, Web sites, Flickr images, books, magazine or online articles, items for sale, and/or geographic location with Google Maps (which needs a minor address input fix as of December 15, 2005, but works OK with a fully validated USPS address).

A Squidoo module is a special-purpose UI for adding and editing a particular type of data, such as RSS 2.0 or Atom 0.3/1.0 for automating multiple links to blog entries, generating and reorganizing lists of books on Amazon.com, manually adding links to Web pages, and processing Flickr photos. Modules enable Squidoo's developers to quickly add, modify, or delete data types. For example, an inoperative Technorati module disappeared on December 15, 2005 and a new Google Maps module arrived the same day.

Squidoo has a much more user-friendly item creation and editing UI than Google Base. I was able to build a reasonably complete "Oakleaf Database Lens" with links to about 70 individual items in less than an hour. Automated, direct import of this blog's Atom 0.3 feed and links to my books on Amazon.com streamlined the lens creation process. There appears to be no limit to the number of items you can import; however, lenses are limited to a single page and modules don't appear to have paging options. Thus, the practical limitation on the number of imported items probably is page height.

Squidoo lacks Google Base's capability to define custom item types, such as taxonomies, and has minimal module customization features. However, Squidoo more than compensates for these limitations by dramatically simplifying the process of bulk-uploading RSS 2.0/Atom 0.3 feeds. Another benefit of aggregating your content with Squidoo is that no Squidoo items self-destruct in 30 days.

The RSS/Atom Feed module updates automatically at user-defined intervals of 30 minutes to one week. Thus, maintaining a current blog entry list is a no-brainer compared with the bulk-update agony described in my "Google Base and Atom 0.3 Bulk Uploads" and "Bulk-Uploaded Items Disappear from Google Base" posts.
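
For readers curious about what turning a feed into a list of links involves, the following is a rough sketch in Python, not Squidoo's implementation. It pulls each entry's title and its alternate link from an Atom feed with the standard library's xml.etree.ElementTree. The two namespace URIs are the ones used by Atom 0.3 and Atom 1.0; the extract_links function and the sample feed (with example.com URLs) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Atom 0.3 and Atom 1.0, in that order.
ATOM_NAMESPACES = ("http://purl.org/atom/ns#", "http://www.w3.org/2005/Atom")

# Hypothetical Atom 0.3 feed, for illustration only.
SAMPLE_FEED = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Example Blog</title>
  <entry>
    <title>First post</title>
    <link rel="alternate" type="text/html" href="http://example.com/first"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link rel="service.edit" type="application/atom+xml" href="http://example.com/edit/2"/>
    <link rel="alternate" type="text/html" href="http://example.com/second"/>
  </entry>
</feed>"""


def extract_links(feed_xml):
    """Return (title, href) pairs for each entry's rel="alternate" link."""
    root = ET.fromstring(feed_xml)
    results = []
    for ns in ATOM_NAMESPACES:
        for entry in root.findall(f"{{{ns}}}entry"):
            title = entry.findtext(f"{{{ns}}}title", default="").strip()
            href = None
            for link in entry.findall(f"{{{ns}}}link"):
                # A missing rel attribute is treated here as "alternate".
                if link.get("rel", "alternate") == "alternate":
                    href = link.get("href")
                    break
            if href:
                results.append((title, href))
    return results


for title, href in extract_links(SAMPLE_FEED):
    print(f"{title}: {href}")
```

A scheduled job that re-runs something like extract_links at the chosen interval and replaces the stored list would account for the automatic updating described above.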

The Long Tail, Aggregators, and Filterers

Almost all blogs, including this one, and Web sites are destined for the obscurity of the Long Tail. August Capital's David Hornik concludes in his December 13, 2005 "Where's The Money In The Long Tail?" VentureBlog post: "[T]here are essentially two general classes of technology [that] will benefit economically from the Long Tail -- aggregators and filterers." Hornik goes on to analyze the roles of aggregators and filterers:

The aggregators are those web businesses that seek to collect up as much of the Long Tail content as is possible, so as to make their "stores" a one stop shop for content no matter how popular or obscure. ... The value to consumers from these content aggregators is that they need not shop in dozens of places on the web in order to acquire a diverse set of content. As a result, aggregators are able to extract a disproportionate amount of value for the sale of each individual piece of content. And while creators are likely to sell slightly more content as a result of the increased ease of salability, they will not likely emerge from the obscurity of the Tail merely because they are made available for sale on Amazon or iTunes.

The filterers are those businesses that make it easier to find the content in which we are interested, despite the increasing proliferation of content creators, hosts, aggregators, etc. The purest form of filterer is the search engine. But the more obscure the content, the less effective the generalized search engine will be. ... Again, while these different filtering technologies may make it slightly more likely that an end user finds his or her way to a piece of obscure content, it will not likely be sufficient to catapult an artist into the mainstream. The beneficiary of the filtering is the end user and the filterer, not the content owner per se.

Hornik believes "that it is difficult to be an aggregator without also being a filterer. It will be hard to sustain the scale necessary for an aggregation business if you don't initially also provide some of your own filtering tools."

Filtering by Tagging

Squidoo enables you to assign an arbitrary set of tags to a lens, but not to individual items; Squidoo calls the lens's tag collection a Tag Cloud. As an example, the Oakleaf Database Lens has 30 database-related and a few off-topic tags. Typing database as the tag name in Squidoo's home page and clicking Find It returns a filtered list of lenses that share the tag, as shown here:

You can filter lenses with multiple matching tags by clicking one or more of the Get Picky items, for example sql and xml. You clear the additional filter terms by entering the same or a new key name in the Search text box. (The UI lacks instructions for clearing added search terms.)

Google Base lets you assign custom details name/value attribute pairs and add arbitrary tags to individual items. Searching for individual items by tag name(s) or attribute value(s) enables finer-grained filtering, which is important when filtering large page populations that have popular or broadly scoped tags.
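
The difference in filtering granularity can be sketched in a few lines of Python. Everything here is hypothetical: the lens names, item titles, tags, the blog_type attribute, and both functions are made up for illustration and don't reflect either service's actual search code. Lens-level tags can only return whole lenses, while item-level tags and name/value attributes can narrow the result to individual items.

```python
def lenses_with_tags(lenses, required_tags):
    """Lens-level filtering: names of lenses whose tags include every required tag."""
    required = {tag.lower() for tag in required_tags}
    return [name for name, tags in lenses.items()
            if required <= {tag.lower() for tag in tags}]


def items_matching(items, required_tags=(), **attributes):
    """Item-level filtering: match on tags and on name/value attribute pairs."""
    required = {tag.lower() for tag in required_tags}
    matches = []
    for item in items:
        item_tags = {tag.lower() for tag in item["tags"]}
        has_attrs = all(item["attributes"].get(k) == v for k, v in attributes.items())
        if required <= item_tags and has_attrs:
            matches.append(item["title"])
    return matches


# Hypothetical data for illustration only.
LENSES = {
    "Lens A": ["database", "sql", "xml"],
    "Lens B": ["database", "surfboards"],
}
ITEMS = [
    {"title": "Item 1", "tags": ["database", "sql"], "attributes": {"blog_type": "technical"}},
    {"title": "Item 2", "tags": ["database"], "attributes": {"blog_type": "personal"}},
]

print(lenses_with_tags(LENSES, ["database"]))
print(lenses_with_tags(LENSES, ["database", "sql", "xml"]))
print(items_matching(ITEMS, ["database"], blog_type="technical"))
```

Adding terms in the first function plays the role of the Get Picky items: each extra tag can only shrink the list of lenses, never single out one item inside a lens.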

Google is the Internet's most popular and eclectic filtering service, with basic Web page, blog, regional (Google Froogle Local), consumer product (Froogle), video, and (recently) music searches. Google Base intends to automatically assign user-created content to a specific Google filtering property based on its item type. The Google Base Help Center's "Where Will My Item Appear?" topic states:

The type of information in your item will determine which Google property will display it.

For example, any items you're selling will appear on Froogle. Your master's thesis or short story will appear on Google. And your glowing review of the new restaurant down the block would appear on Google Local.

Please note that these are guidelines only. We're unable to make predictions or guarantees about where your content will appear.

Thus, Google, not you, will decide where or if your content might appear. My "Google Base and Blogger Items Missing from Google Search" post has more details on these issues.

Increase Blog or Site Traffic with Squidoo? Not Likely.

Squidoo's FAQ includes these two unsubstantiated bullet points:

• Increase your traffic: Your lens points (if you want it to) to your blog and to your website. In addition, lenses have huge credibility with search engines, so a lens is going to increase the traffic to your website even for people who don't visit your lens.
• [You should build a lens if you] have a Web site and you're not happy with your PageRank in Google, a lens will increase it. That's because a lens provides exactly what search engines are looking for: authoritative insight so people can find what they're looking for.

Seth Godin, the well-known entrepreneur and author who founded Squidoo, might have authored the preceding claims, but I've found no evidence whatsoever that "lenses have huge credibility with search engines" or "a lens provides exactly what search engines are looking for." I've yet to find more than one Squidoo page with a Google page rank other than 0; the SquidBlog has a rank of 4/10. Even Chris Anderson's "The Long Tail" lens (#2 of Squidoo's Top 100 lenses) ranks 0 with Google, despite the fact that Chris is credited with coining "The Long Tail" as a proper noun. Chris's blog has a Google rank of 6/10; this blog has a rank of 3/10. To my knowledge, both ranks have been the same for several months.

Note: Chris's post of December 15, 2005 quotes the same passages from David Hornik's post that appear in the preceding "The Long Tail, Aggregators, and Filterers" section; the similarity is coincidental.

As far as I can determine, potential lens users must visit the Squidoo site and search for the "expert's" topic to obtain the desired information or links to it. Squidoo appeared to have about 6,500 lenses as of December 15, 2005. But there's no easy way to separate Squidoo's pre-built and unclaimed "starter" lenses from actively-maintained lenses with useful content, so the number of the latter is up in the air. One would need unique-visitor counts to determine the value of Squidoo as a whole and page-view counts for individual lens activity.

Adding Squidoo Lenses as Google Base Blog Items

You can add an entry for your Squidoo Lens as a Google Base blog item by signing in to Google Base and selecting the Create Your Own Data Type option on the Dashboard page's Post Your Item region. Type Blogs in the text box, as shown here:

Click Next to open the Edit Item page, which displays custom Details attributes (Home page, Author, and Blog type) for the custom Blogs item type, which doesn't expire. Add up to 10 Labels (tag names), content to the Description textarea, and an optional Squidoo page capture, as shown here:

Click Preview to verify your work, as shown here, and then click Publish to add the item to Google Base:

The newly added items appear in the Posted Items page, as shown here:

Clicking a link that displays the OakLeaf icon opens the page. The other two links open the Oakleaf Systems's [sic] Items page, which is identical to the preceding screen capture.

Whether adding items for your Squidoo Lens(es) to Google Base increases their visibility in Google Web search remains to be seen, because Google search doesn't appear to index Google Base items at this time.

I ran Ping-O-Matic on the Oakleaf Database Lens immediately after completion and initial tests. A lens isn't a blog, so the lens's RSS 2.0 file contains only one item group with title and description elements. Content from modules isn't included as item groups.
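If you want to confirm how many item groups a feed exposes, a few lines of Python will do it. The following is only a sketch: the SAMPLE_RSS text is a made-up stand-in for a lens's RSS 2.0 file (I haven't reproduced Squidoo's actual markup), and summarize_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed text standing in for a lens's RSS 2.0 file;
# the element values are placeholders, not Squidoo's real output.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example lens</title>
    <link>http://www.example.com/lens</link>
    <description>Placeholder channel description</description>
    <item>
      <title>Example lens</title>
      <description>Placeholder lens summary</description>
    </item>
  </channel>
</rss>"""


def summarize_items(rss_text):
    """Return one (title, description) tuple per <item> group in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        items.append((title, description))
    return items


if __name__ == "__main__":
    found = summarize_items(SAMPLE_RSS)
    print(len(found), "item group(s) in the feed")
```

A feed that behaves like a blog's would return one tuple per post; a lens feed of the kind described above would return a single tuple.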

Searching Google, MSN, and Yahoo with "squidoo : lenses : " site:www.squidoo.com returns varying numbers of links to others' lenses. After about 10 days I obtained a reference to my lens from Google and MSN (but not Yahoo) search with "squidoo : lenses : oakleaf " site:www.squidoo.com as the search term. It took more than a week for Google to re-spider the Squidoo site.

Comparative Terms of Use

Both Squidoo and Google Base are in the beta testing stage, so there's no guarantee that the services will remain in operation or preserve your content. Both services can terminate your access to the service at will. Thus the termination provisions of the Terms of Use agreements are important. The following is the text of paragraph 12. SQUIDOO CAN TERMINATE YOUR USE OF THE SERVICE of the Squidoo Terms of Service:

Squidoo, in its sole discretion, may terminate your password, account or use of or access to the Service (including CO-OP Services), and remove and discard any Lens, for any reason. Some of the more likely reasons for termination are inaccurate information in the registration form, harassment of other Lensmasters, hacking Lenses or the Service, illegal transactions via Squidoo, and otherwise acting in violation of the terms or the spirit of these Terms of Service or other policies making a Lensmaster a fine, upstanding member of the Squidoo community. If Squidoo terminates your use of the Service, then your Lens may be immediately taken down, and your Squidoo Earnings Account may be closed. In this event, Squidoo will not be responsible to you or any third party for any termination of your access to the Service.

The following is the text of Paragraph 10. TERMINATION of the Google Base Terms of Service:

You may cancel your use of Google services and/or terminate this Terms of Service with or without cause at any time by providing notice to Google at googlebase-support@google.com; provided, however, that a terminated account may continue to exist for up to two business days before such cancellation takes effect. Google may at any time and for any reason, including a period of account inactivity, terminate your access to Google services, terminate this Terms of Service, or suspend or terminate your account. In the event of termination, your account will be disabled and you may not be granted access to your account or any files or other content contained in your account although residual copies of information may remain in our system. Except as set forth above or unless Google has previously canceled or terminated your use of Google services (in which case subsequent notice by Google shall not be required), if you have provided an alternate email address, Google will notify you via email of any such termination or cancellation, which shall be effective immediately upon Google's delivery of such notice. Sections 13 through 19 of the Terms of Service (including the section regarding limitation of liability), shall survive expiration or termination.

Business Models

Squidoo adds Google text advertisements to a panel on the right side of each lens. Squidoo claims to share its AdSense revenue with lensmasters by an ambiguous co-operative model, which is inoperative during the beta period. The Squidoo FAQs say the following about the revenue-sharing program:

DOES SQUIDOO MAKE A PROFIT? Yes, Squidoo is an old-fashioned corporation, with real employees and investors. We're not legally organized as a co-op, not in the sense that we've got granola all over the floor or that we are owned and controlled by volunteers. Instead, we've structured the organization so that we're in a partnership with our lensmasters. It's a co-op in the sense that the more you give, the more you get. All lensmasters with traffic get a pro-rated share in the income that we get from the Google AdSense ads that run on every page, for example. In addition, a rotating slate of lensmasters is invited to participate in the panel that chooses the charities that get the money from our charity pool.

We divide up the money we receive in a very public way. First, we pay our bills. That's direct out of pocket expenses like rent and servers and salary and benefits expenses (our CEO doesn't take a salary, and neither does our board of directors). Then, with no other deductions, we pay 5% of our post-expense revenue directly to the charity pool, 50% directly to our lensmasters and retain the rest to pay off investors and employees. Don't quit your day job yet, but you should know that as we all grow, our goal as a co-op is to pay as much money as we can to our lensmasters and to charity.

HOW CAN I MAKE MORE MONEY? Every lens carries Google AdSense ads. Those are used to pay our expenses and to generate royalties for all our lensmaster partners. If you want to increase your royalties, though, you should consider adding commercial modules that the visitors to your lenses will appreciate. You can sell books, point to eBay auctions or generate sales for more than 500 of our partners. Every single one of these modules generates directly attributable revenue for your lens, and we pay a royalty to you or to your chosen charity based on that income.

Build good lenses, feature great stuff and earn more royalties.

As with recording and movie production contracts, the primary issue with advertising and related income distribution is the amount of Squidoo's current and future "direct out of pocket expenses like rent and servers and salary and benefits expenses." A corollary of Parkinson's Law is "Expenses rise to meet [or exceed] income." There's no mention of recovery of start-up expenses, nor is there any cap on amounts to be paid for rent, servers, salary, and benefits. In other words, don't count on Squidoo to pay the rent.
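To see how sensitive the lensmaster pool is to expenses, here's a back-of-the-envelope sketch of the split as the FAQ describes it: pay expenses first, then 5% of what's left to charity, 50% to lensmasters, and the rest retained. The split_revenue function name and the dollar amounts are mine and purely illustrative; Squidoo hasn't published any figures.

```python
def split_revenue(gross_revenue, expenses):
    """Split revenue per the FAQ's description: expenses first, then
    5% of the remainder to charity, 50% to lensmasters, rest retained.
    Assumes nothing is distributed when expenses meet or exceed revenue."""
    post_expense = max(gross_revenue - expenses, 0)
    charity = post_expense * 5 / 100
    lensmasters = post_expense * 50 / 100
    retained = post_expense - charity - lensmasters
    return {
        "post_expense": post_expense,
        "charity": charity,
        "lensmasters": lensmasters,
        "retained": retained,
    }


if __name__ == "__main__":
    # Illustrative numbers only: the same gross revenue with rising expenses.
    for expenses in (60_000, 90_000, 100_000):
        print(expenses, split_revenue(100_000, expenses))
```

With these made-up numbers, raising expenses from 60,000 to 90,000 against 100,000 of revenue cuts the lensmaster pool from 20,000 to 5,000, which is the point of the Parkinson's Law corollary.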

Preliminary Conclusion

Squidoo's closest Web relative probably is Wikipedia. Both services depend on OPC—specifically, information and links provided by individuals who consider themselves experts in a particular subject. Following are the most significant differences between the two services:

  • Wikipedia restricts a topic to a single entry (page).
  • Squidoo encourages multiple lenses (pages) for a particular topic, and a single lens may cover multiple topics, which are specified by tags.
  • Anyone with an Internet connection and a Wikipedia account can edit any topic, which qualifies Wikipedia as a collaborative application. Wikipedia's volunteer editors focus on different fixes, such as countering systemic bias; fact and reference checking; fixing punctuation; and improving grammar. Wikipedia administrators address issues such as vandalism and edit wars.
  • Only the designated lensmaster(s) for a specific Squidoo lens can edit its content. Squidoo relies on individual contributors and doesn't edit content or provide any facility for user feedback by public comments, numeric ratings, or stars. (The five rating stars in the lens's right pane's Lens Stats section currently are unused and read-only.) If enabled in the lensmaster's profile, viewers can send e-mail comments to the lensmaster.
  • Squidoo "use[s] an automated algorithm—LensRank—to rank the lenses. We look at user ratings, lensmaster reputation, clickthrough rates, frequency of updates, inbound and outbound links, and other factors and give the lens a number." (Apparently, lower is better, but—despite a claim of transparency—Squidoo doesn't disclose the basis of its ranking algorithm.)

Safari Software's Bob Walsh, a Squidoo enthusiast, provides independent insight on the value of Squidoo to users, small independent software vendors (micro-ISVs), and doubters. Jeff Jarvis at Buzz Machine adds a positive review of Squidoo's Web 2.0 preview.

The key to Squidoo's ultimate success and its usefulness to contributors will be ubiquity. Users must associate Squidoo with targeted, accurate information on a wide variety of topics. Wikipedia currently has articles for about 870,000 topics. Type define: database, define: client-server, or define: whatever in the Google search bar and an en.wikipedia.org/wiki/topic link usually shows up in the Definitions list. Ensuring that major search engines return reasonably ranked Squidoo lens references for common search terms will require some heavyweight search-engine optimization (SEO).

Squidoo shares the "guide" approach with the New York Times Company's About.com, but doesn't require guides to be vetted. About.com has been in business since 1997, so it has substantial depth of "consumer information and advice" that's provided by 475 official guides on 50,000+ topics. For example, a search on database returns 12,233 references; databases returns 5,363. This surfeit of links minimizes About.com's usefulness to neophyte users. On the other hand, About.com claims it's "a top 15 Web property used by one out of every five people on the Internet," with 22 million monthly users. This might be why the NY Times Co. paid magazine publisher Primedia $410 million to acquire About.com in February 2005. Whether there's a sweet-spot of search specificity and reach for Squidoo remains to be seen.

Squidoo must be prepared to eradicate spam lenses (splenses?) or parked lenses (plenses?) that have minimal or no content. As an example, the Figure Skating lens contains a single link to a page for a Massachusetts figure skating club. Handling spam requires surveillance by Squidoo employees and feedback from viewers or lensmasters. There appear to be thousands of autogenerated empty lenses that should be eradicated. A feature similar to Google Base's Report Bad Item link could prove useful in combatting Squidoo spam and other abuse.

Note: Squidoo provides lensmasters a Report This Lens link in the Lens Stats group of the right frame. The link only appears when a lensmaster is logged on to another's lens. Clicking the link displays this message: "Report this lens only if it contains obscene or illegal content, or if it has adult content and the lens is not marked as Adult." The link also displays a single-line Comments text box.

This link isn't equivalent to Google Base's Report Bad Item link because of its restriction to obscene, illegal, or adult content and lensmaster-only access. Viewers should be able to report lenses as spam or having no or deliberately misleading content to Squidoo. Sending e-mail to the offending lensmaster isn't likely to have the desired (or any) effect.


Monday, December 12, 2005

Google Base and Blogger Items Missing from Google Search

The foremost incentives for adding user-supplied content to Google Base are the ability to search database content and speed the appearance of content in searches conducted by major search engines—or, at least, by the Google Web search engine. As of December 12, 2005—almost a month after I bulk-uploaded my first 50 Google Base test items—targeted Google searches for any of my 3,500+ custom and News and Articles items failed to return a single result.

Updated December 14, 2005: This item was updated with the following test searches and new results for Google Web search on blog entries.

Click here to test a search on 3845 334510 3311 "This US industry", which should—but doesn't—return this Google Base custom NAICS item: 334510 - Electromedical and Electrotherapeutic Apparatus Manufacturing.

Similarly, a search on the partial phrase "Modified for Google Base bulk upload on 2005", which is present in every News and Articles item's Description field, returns results for other OakLeaf blog entries, but not my Google Base items.

Others have reported the same problem in the Google Base Help Discussion group. Click here and here to read the threads. Google Base service reps don't appear to participate in the group discussion, so there's no indication if the Google folks intend to fix this problem. Google, however, actively promotes Google Base as an aid to the Google search engine. The November 16, 2005 "'Google Base' Has Grand Ambitions" Associated Press story by Matthew Fordahl says:

Launched as a "beta test" early Wednesday, Google Base has the potential to make instantly available a vast sea of content including — but not limited to — recipes, job ads, photos, DNA sequences, real estate listings and individual standalone databases. Normally, it takes Web "crawlers" days or weeks to scour the Web and feed Google's main search engine with updated information, but they usually can't penetrate content buried in databases. This tool will make locating anything that's been uploaded nearly instantaneous, provided it finds users willing to provide the content. Submitters will also be able to describe what they uploaded with keywords — making searches and filters easier and more reliable.
The AP story goes on to quote Google's Salar Kamangar, vice president of product management: "This is all part of our efforts to make it really easy for anyone with information to make it accessible from Google. We just felt like this piece was just missing before." The capability to make any Google Base information accessible outside of the user-unfriendly Google Base UI—e.g., from Google search—appears to be missing, despite Kamangar's "Grand Ambitions" for the new software as a service (SaaS) project.

eWeek's Elinor Mills quotes Kamangar in her November 15, 2005 "Google Base service goes live" article: "We think about it being a utility so people can more efficiently post information to us. If there is more information in the search results the search experience is better. It is not a separate property we are trying to monetize. We are not at all focused on commerce or local commerce or classifieds." However, it didn't take long for Google management to add Google Base retail store data entries that enable Froogle Local Shopping (a localized commerce service) for a select group of retail chain merchants.
The following screen capture displays a typical Froogle Local Shopping page for a search on digital camera near oakland. Google Base stores the retail branch locations and inventory items:

PC World magazine's Juan Carlos Perez appears to have received conflicting information from Kamangar, as regurgitated in his November 16, 2005 "Google Base Debuts for Hosting All Content" article:
In addition to appearing on Google Base, items posted there may surface in Google's main Web index, in the Froogle comparison-shopping site, and in the Google Local listing of businesses. [Emphasis added.] In fact, Google does not intend to promote Google Base as a service for information searchers, since the plan is to make Google Base data appear in the company's various search services, said Salar Kamangar, a Google vice president of product management, in an interview. "Our primary goal with Google Base is to extend the ways we have of collecting content to make more information available to searchers," Kamangar said. "Google Base is intended as an information store for other Google properties." The Google Base search service is primarily geared toward those who feed content to it, so they can see how their results appear and can experiment with labels and attributes, he said. "We're not driving [search] users to Google Base," he said. "This content will be searchable in some way from other Google properties."
For example, an item posted for sale will appear in Froogle searches, while a business listing will appear in Google Local. In a matter of weeks, Google's general Web search will begin delivering Google Base results that are appropriate to that service, Kamangar said.
The Google Base main help screen's "Quick Facts about Google Base" offers the following bullet point: Reach: Items you submit to Google Base can be found on Google Base and, depending on their relevance, may also appear on Google properties like Google, Froogle and Google Local.

The obvious question is: Who determines what results "are appropriate to" Google search users? A Google high priest, ayatollah, or mullah? The Google Base Police? A panel of politically correct censors?

The Google Base Help Center's "Where Will My Item Appear?" topic muddies the waters with the following disclaimer:
The type of information in your item will determine which Google property will display it.

For example, any items you're selling will appear on Froogle. Your master's thesis or short story will appear on Google. And your glowing review of the new restaurant down the block would appear on Google Local.

Please note that these are guidelines only. We're unable to make predictions or guarantees about where your content will appear. [Emphasis added.]
If Google can't even make "predictions ... about where your content will appear," why bother to add items if you aren't a major retail chain with local inventory tracked by ShopLocal.com? It's a good bet that a substantial amount of your carefully added content will end up in the Google Base bit bucket.

Imagine the hue and cry in blogs and the trade press if Microsoft were to offer a database in which some Windows Live or Office Live program manager (or admin) decided "where [or if] your content will appear."

Google Web Search Misses New Blog Entries for Almost 10 Days
Google Base items aren't the only elements missing from Google search.
My earlier "Problems Uploading a Google Base Custom Item Type from a TSV File" entry has a brief section near the end that deals with Google Base and DNA sequences. Surprisingly, searching Google with "dna sequence" oakleaf "google base" returned no items until December 14, 2005.
Performing the same search on Yahoo! on December 6, 2005 returned a reference to the entire OakLeaf blog, as shown here:

An identical MSN Web search on the same day returned these two results that point at the entire blog for the initial reference (identified by the reference to Pedro [Beltrao]) and the latest entry in which the search term appears:


Searching with Amazon's A9 a few days later and marking the Blog Search check box returned no Web results (for which A9 relies on Google) but found results for 12/12/2005 and 12/8/2005 updates to the original 12/5/2005 version with IceRocket search:


Both IceRocket references point to the appropriate individual post.
Amazon's Alexa and the Lycos search services returned no results.
To give credit where credit is due, the Google Blog Search service—like most Google services, still a Beta version—returned the following references the day after this item's initial posting:



If you or your customers or clients can't find what you're searching for with the Google Web search engine, that doesn't mean that it isn't on the Web. MSN, Yahoo, and A9 search found my new blog additions a day or so after posting. Taking "days or weeks" to spider a new blog post reminds me of the early days of site submissions to Alta Vista and Yahoo!.


Monday, December 05, 2005

Problems Uploading a Google Base Custom Item Type from a TSV File

Google Base lets you define custom item types to supplement the standard item types—such as Products, News and Articles, Reference Articles, recipes, and the like. This entry describes what appear to me to be bugs or errors in creating and bulk-uploading files for custom item types.

You create custom item types by bulk-uploading a user-defined, tab-separated-values (TSV) file with a .txt extension that specifies a combination of predefined and optional custom attributes. The predefined title attribute, which has a maximum length of 80 characters, and a description attribute are mandatory. (The description attribute reportedly has a maximum length of 65,536 characters for TSV files; Atom 0.3 XML files limit the description attribute (content element) to 10,000 characters.)

A unique id value (an optional primary key) for an item ensures that items retain their original identity when updated. The TSV file must contain a first row of attribute name headers.

Custom attributes require a c: prefix for the attribute name, as in c:attribute_name; data type assignment (c:attribute_name:data_type) is optional. Valid data types include string, integer, decimal, dateTime, location, URL and boolean. A row of TSV data follows for each entry.
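The header rules above are easier to see in a small example. The following Python sketch assembles a header row and data rows into TSV text. The helper names (custom_header, build_tsv) and the naics_code and sic_code attribute names are my own inventions for illustration; only the c: prefix, the optional :data_type suffix, and the list of data types come from the description above.

```python
# Data types listed above for custom attributes.
VALID_TYPES = {"string", "integer", "decimal", "dateTime", "location", "URL", "boolean"}


def custom_header(name, data_type=None):
    """Build a custom attribute header: c:name or c:name:data_type."""
    if data_type is None:
        return f"c:{name}"
    if data_type not in VALID_TYPES:
        raise ValueError(f"Unknown data type: {data_type}")
    return f"c:{name}:{data_type}"


def tsv_line(fields):
    """Join fields with tabs; tabs or line breaks inside a field would corrupt the row."""
    for field in fields:
        if "\t" in field or "\n" in field or "\r" in field:
            raise ValueError(f"Field contains a tab or line break: {field!r}")
    return "\t".join(fields)


def build_tsv(headers, rows):
    """Return TSV text whose first row holds the attribute name headers."""
    lines = [tsv_line(headers)]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("Row length doesn't match the header row")
        lines.append(tsv_line(row))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    headers = [
        "id", "title", "description",
        custom_header("naics_code", "integer"),  # hypothetical attribute name
        custom_header("sic_code", "string"),     # hypothetical attribute name
    ]
    rows = [["3345103845", "Sample title", "Sample description", "334510", "3845"]]
    print(build_tsv(headers, rows), end="")
```

The sketch rejects fields with embedded tabs or line breaks rather than guessing how Google Base would treat them.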

Note: Read my earlier "Google Base and Bulk Uploads with Microsoft Access" entry for a description of the basic bulk-update process for predetermined item types, such as Products.

Problem 1 (Google): The help topic states that boolean values may be Yes/No or True/False. Only true or false is allowed; True or False throws Bad Data errors during bulk uploads. [This problem was reported to be corrected the week of December 5, 2005].
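If your source data stores Yes/No or True/False values, it's safest to normalize them to lower-case true or false before generating the upload file. Here's a minimal sketch; the to_base_boolean name is hypothetical, and the accepted input spellings (including the -1 that Access/Jet uses for True) are my assumptions about what a source table might contain.

```python
def to_base_boolean(value):
    """Normalize common truthy/falsy spellings to lower-case 'true' or 'false'."""
    text = str(value).strip().lower()
    if text in {"true", "yes", "-1", "1"}:   # Jet stores True as -1
        return "true"
    if text in {"false", "no", "0"}:
        return "false"
    raise ValueError(f"Not a recognizable boolean value: {value!r}")


if __name__ == "__main__":
    for raw in ("True", "Yes", -1, "No", 0):
        print(repr(raw), "->", to_base_boolean(raw))
```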

You substitute a custom for a standard item type by specifying the registered file name in the Specify a Bulk Upload File page, selecting the Submit a Custom Item Type option for Item Type, and typing the custom type name in the adjacent text box. Figure 1 is a screen capture of the partial page for the NAICS Industry Taxonomy that the next section describes.

Figure 1

Problem 2 (Google): Figure 1's help topic states that you can "upload up to 100000 items per bulk upload, with a maximum file size of 10MB." Doing so might be possible, but uploading more than a few hundred items results in this error message: "You have exceeded the activity limit for this account on the beta version of Google Base. Not all of your active items may be displayed on your Dashboard." [Tests with about 3,850 items indicate that a maximum of 3,000 appear in the Dashboard and you can browse 1,000 items.]

Bulk-uploading TSV file formatting and operational problems aren't likely to become apparent unless you create a reasonable number of "real-world" custom items that have a combination of predefined and custom attribute name/value pairs. The following sections describe the problems and their solutions or workarounds for two moderately complex custom items created from data sources that aren't subject to copyright restrictions.

Item Types for Industrial Taxonomies

Most classified advertising and shopping sites have predetermined merchant and product taxonomies. Industrial taxonomies are hierarchical lists of business and governmental activities that categorize, usually by number, the activities of manufacturers, distributors, wholesalers, and retailers of products, as well as service organizations.

Note: Google Base has a list of sequential (not hierarchical) Business Location Bulk Upload Categories for categorizing Froogle Local stores. Categories 1 to 473 are sequential by class (e.g., Business-to-Business, Education, Entertainment, Government Offices) and specialty (e.g., Manufacturers, Music Schools, Movie Theaters, Courts). Google added categories 474 to 515 (as of December 5, 2005) by an ad hoc process. Proprietary taxonomies of this nature are very common, but difficult to maintain (as Google has discovered).

Many industrial taxonomies have appeared to standardize data entry and collection. Most standardized industrial taxonomies are created by governmental or quasi-governmental agencies for collecting, categorizing, and summarizing economic activity. You might liken a standardized taxonomy's numerical values to "Web 2.0" tags, but the numbers have no meaning to users without additional text information, which might require access to a current copy of the taxonomy.

Note: Taxonomy tags require the category number and its specific descriptive title to be useful. Thus taxonomy tags don't qualify as folksonomy tags, which users choose arbitrarily. Tom Gruber's "Ontology of Folksonomy: A Mash-up of Apples and Oranges" paper differentiates between ontology and folksonomy, and distinguishes categories from tags. Many items require multiple tags that can be used, for instance, to search for parents in the hierarchy. The "Tag formats: Can't we all just get along?" post by 37signals.com's Matt Linderman describes the various types of multiple tag formatting in common use today.

One of the first such hierarchical industry/activity lists was the U.S. Standard Industrial Classification (SIC), which was established in 1939 and last updated in 1987. The SIC uses a four-digit code for the most detailed entries of the hierarchy. The North American Industry Classification System (NAICS) supplanted SIC in 1997. Although SIC is obsolete, it's still in common use by many mature small and medium-sized businesses (SMBs).

NAICS is the standard industrial taxonomy for the 1994 North American Free Trade Agreement (NAFTA). NAICS's six-digit codes define U.S. industries; five-digit codes define industry names common to the three NAFTA signatories. The U.S. Bureau of the Census administers NAICS and updates it every five years. The 2002 update to NAICS has 667 six-digit U.S. industry codes, which include farming, manufacturing, wholesaling, and other sectors that aren't associated with retailing. The domain of the hierarchical NAICS ontology has 1,833 entries with two-, three-, four-, five-, and six-digit codes. You can download NAICS 2002 tables, flat files, or both, from links on this page.

NAICS is a candidate for generating a Google Base custom item type that users can search to determine the correct NAICS code for their business. For example, Figure 2 shows that the search for "computer" as a keyword returns 21 items from the NAICS Industry Taxonomy item type. Click here to open the Posted Items page.

Figure 2

Figure 3 is a screen capture of a typical NAICS item as seen by a typical user. Click here to open the OakLeafSystems's [sic] Items page.


Figure 3

Notice that the item type name appears, by default, as a single label attribute value. NAICS tables provide detailed descriptions of most five-digit and six-digit NAFTA industry and all six-digit U.S. industry codes. Users can infer products from industry descriptions; industry descriptions also are useful for categorizing businesses for sale.

Note: A very complex Access (Jet) SQL query generates a table for exporting complete NAICS listings, which must include five- and six-digit NAICS codes, and adds LEFT JOINs for cross-references to SIC codes and descriptions, plus United Nations International Standard Industrial Classification (UN-ISIC or ISIC) Rev. 3.1 codes and descriptions. The table with unique-valued rows for all NAICS codes and their related SIC and UN-ISIC codes contains 3,940 items, which include additions to NAICS codes for six-digit NAFTA industries with the sixth digit = 0. Additional items are added for NAICS codes that have more than one SIC or ISIC code. UN-ISIC codes aren't included in this section's screen captures. If you're interested in the Jet SQL or Transact-SQL query to create the sample table, please leave a comment.

Problem 3 (Google): All custom item type items you add expire in 30 days, regardless of the predefined expiration_date value you supply. Monthly expiration might be appropriate for Products items, but Taxonomies definitely should not expire. The appropriate solution is for Google to add an unmarked-by-default "Expires in 30 days" check box to the Specify a Bulk Upload File page. Optionally, users can then specify an expiration_date value; for example, 2007-12-31 would be appropriate for NAICS 2002 data, because NAICS will be updated in 2007.

An append query regenerates the Access 2003 table (NAICS2000Codes) shown by Figure 4 in datasheet view:


Figure 4

Notice that the NAICS_Code, NAICS_Title, SIC_Code, and SIC_Title column names use upper and lower case characters, rather than Google Base's standard lower-case naming convention. Appending the SIC_Code text to NAICS_Code text generates the unique id value.
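The id concatenation is simple enough to check outside Access. This Python sketch builds the id by appending the SIC code text to the NAICS code text and reports any duplicates before you upload. The function names are hypothetical, treating a missing SIC code as an empty string is my assumption, and the second code pair in the demonstration is made up.

```python
from collections import Counter


def make_item_id(naics_code, sic_code=None):
    """Append the SIC code text to the NAICS code text to form the id.
    A missing SIC code (assumed possible for unmatched rows) adds nothing."""
    return f"{naics_code}{sic_code or ''}"


def find_duplicate_ids(code_pairs):
    """Return a sorted list of ids that occur more than once."""
    counts = Counter(make_item_id(naics, sic) for naics, sic in code_pairs)
    return sorted(item_id for item_id, count in counts.items() if count > 1)


if __name__ == "__main__":
    pairs = [
        ("334510", "3845"),
        ("334510", "3845"),   # deliberate duplicate for the demonstration
        ("443112", "5731"),   # illustrative pair, not taken from the table
    ]
    print(find_duplicate_ids(pairs))
```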

Note: Figure 3 shows normalization of the column names to Details attribute names by replacing the underscore with a space. However, if you look carefully in Figure 2, Google Base demotes the item type name and Details attribute/value pairs to lower case.

Problem 4 (Google): Lower-casing mixed-case item names and attribute name/value pairs disguises names, values, or both that contain upper-case abbreviations or acronyms. Lower-casing isn't necessary for case-insensitive searches.

Problem 5 (Access 2003): An Access append query to an empty table—rather than a make-table query—is required because the description column often gains the Text data type—rather than the required Memo data type that can handle more than 255 characters. Most description values exceed a length of 255 characters. The id column is specified as the primary key to alert users to potential unique value errors.

Problem 6 (Access 2003): Exporting the content of Access Memo columns to TSV or CSV fields with the Export Text Wizard truncates the description value at 255 characters. The simplest workaround is to export the table to an Excel 2003 workbook, add the c: prefix and optional :integer or :string suffix to the custom field name values, save the workbook in Text (Tab delimited) (*.txt) format, and then open the .txt file in Notepad and replace all double-quotes (") with nothing. Alternatively, export the table to SQL Server 2005 Developer edition or higher and use SQL Server Integration Services (SSIS) to export a LATIN-1-encoded TSV file.

Note: Excel 2003's Save As Text (Tab Delimited) selection doesn't enable specifying empty text qualifiers, so fields whose value starts with an alphabetic character gain "text" qualifiers. Using SSIS to create a LATIN-1 TSV file from an imported Access table is a complex process. Leave a comment if you're interested in this method, and I'll add the illustrated procedure in a subsequent item.
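If you'd rather not strip every double-quote in Notepad (which also removes quotes that belong in the text), a short script can remove only the text qualifiers. The following Python sketch is an alternative I'm suggesting, not part of the procedure above: it reads the tab-delimited text with the standard csv module, which understands quoted fields and doubled quotes, and writes the fields back without qualifiers. The function names are hypothetical.

```python
import csv
import io


def strip_text_qualifiers(tsv_text):
    """Re-emit tab-delimited text without "text" qualifiers.
    Quotes that are part of a value (doubled inside a qualified field) are kept."""
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t", quotechar='"')
    lines = []
    for fields in reader:
        for field in fields:
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"Field needs a qualifier to survive: {field!r}")
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


def encode_latin1(text):
    """Encode as Latin-1; raises UnicodeEncodeError for characters outside it."""
    return text.encode("latin-1")


if __name__ == "__main__":
    exported = 'id\ttitle\n5233\t"Retail sale of household appliances, articles and equipment"\n'
    cleaned = strip_text_qualifiers(exported)
    print(cleaned, end="")
    print(len(encode_latin1(cleaned)), "bytes")
```

To produce the upload file, write the bytes returned by encode_latin1 to a .txt file opened in binary mode.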

Bulk-uploading the correctly formatted TSV file (NAICS2002Codes.txt for this example) as the NAICS Industry Taxonomy custom item type throws an "Attribute has too many values" error when attempting to upload the item on file line 174, as shown in Figure 5.


Figure 5

The full text expansion of the Bad Data for SIC Code 2099 is: Food Preparations, NEC (except bouillon, marshmallow creme, spices, extracts, peanut butter, perishable prepared foods, tortillas, tea, spices, dip mix, salad dressing mix, seasoning mix, and vinegar), which contains 15 commas.

Problem 7 (Google): The predefined label type permits up to 10 phrases that have a maximum of 40 characters each in the CSV (comma-separated-values) format without "text" text specifiers. Google Base treats the details values as the CSV text for label attribute values. The only current domain-wide workaround is to replace commas with semicolons in the description field. A better solution to the problem is for Google Base to add all CSV values, flag the item for editing, and prevent it from publishing until edits alter the offending values.
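You can screen a table for values that are likely to trip this error before uploading. The Python sketch below counts the comma-separated values in a field and applies the comma-to-semicolon workaround. It assumes, as my tests suggest but Google doesn't document, that the 10-value label limit is what triggers the error for details values; the function names and the sample string are made up.

```python
MAX_CSV_VALUES = 10  # label limit quoted above; assumed to apply to details values too


def count_csv_values(value):
    """Count the values Google Base would see if it splits the text on commas."""
    return len(value.split(","))


def exceeds_value_limit(value, limit=MAX_CSV_VALUES):
    """True if the text would be treated as more than `limit` values."""
    return count_csv_values(value) > limit


def commas_to_semicolons(value):
    """Apply the workaround: replace every comma with a semicolon."""
    return value.replace(",", ";")


if __name__ == "__main__":
    # Made-up value with 11 comma-separated phrases.
    sample = ", ".join(f"item {n}" for n in range(11))
    print(exceeds_value_limit(sample))
    print(exceeds_value_limit(commas_to_semicolons(sample)))
```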

Added 12/12/2005: A related Google Base problem is inconsistent treatment of values containing commas in custom details fields. Figure 5A illustrates truncation of the uploaded UN-ISIC Title (highlighted) for UN-ISIC Code 5233. The full uploaded text for this title is Retail sale of household appliances, articles and equipment. The clause preceding the comma is missing.



Figure 5A (Added 12/12/2005)

Click here to open the Items page for NAICS code 443112.
The title attribute truncates at 100 characters and adds an unrendered <b> ... </b> HTML tag to the value, as shown for NAICS code 334511 in Figure 6. Click here to open the OakLeafSystems's [sic] Items page.



Figure 6

The full title length for the entry shown in Figure 6 is 113 characters.

Problem 8 (Google): The help topic for the title attribute states that its maximum length is 80 characters. Tests show the correct value is 100 characters. Also, HTML tags aren't valid in Google Base attribute values. The workaround is to truncate the title attribute to a maximum of 100 characters and add ... to the truncated value.
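
A minimal Python sketch of the truncation workaround follows. It assumes that the 100-character limit applies to the whole value, including the appended " ..." marker; if the limit applies only to the text before the marker, the arithmetic needs adjusting. The function name is hypothetical.

```python
MAX_TITLE_LENGTH = 100  # limit observed in the tests described above
ELLIPSIS = " ..."


def truncate_title(title: str, limit: int = MAX_TITLE_LENGTH) -> str:
    """Return the title unchanged if it fits; otherwise cut it and add ' ...'."""
    if len(title) <= limit:
        return title
    # Reserve room for the marker so the result never exceeds the limit.
    keep = limit - len(ELLIPSIS)
    return title[:keep].rstrip() + ELLIPSIS


long_title = "x" * 113  # same length as the title shown in Figure 6
print(len(truncate_title(long_title)))
```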

Adding Labels to Filter Items by NAICS Sector, Subsector, and Industry Group
It's a common requirement of hierarchical lists to enable filtering items at upper levels of the hierarchy. For this example, the NAICS hierarchy consists of Sector (two digits) -> Subsector (three digits) -> Industry Group (four digits) -> Industry (five digits) -> NAFTA/U.S. Industry (six digits). There are one or more items for each Industry and NAFTA/U.S. Industry members. Although you can filter on parent numbers or strings, it's more convenient to let users specify filter criteria with one or more predefined label values. You can assign up to nine hierarchy values as label values; one label value is reserved for the custom item type. NAICS Industrial Codes from OakLeaf_Systems replaces NAICS Industry Codes from OakLeafSystems for this example.

Note: The custom item type label doesn't appear in the Labels editing textarea until Google Base publishes the original or updated item. As of December 7, 2005, publishing appears to occur more quickly (one hour or less) than in the past (several hours or more).

Numerical values require descriptions to be useful. Thus, a typical label attribute value for the three NAICS parent codes is 11 - Agriculture; Forestry; Fishing ..., 111 - Crop Production, 1111 - Oilseed and Grain Farming or 81 - Other Services (except Public A ..., 811 - Repair and Maintenance, 8112 - Electronic and Precision Equi ....

Individual attributes are limited to 40 characters, so an ellipsis indicates a truncated title.

Note: VBA code behind a simple form with a single command button iterates the enhanced NAICS2002Codes table and updates the value of the source table's label column with the appropriate label attribute. If you're interested in the VBA code for this process, add a comment.
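
The label-building logic can be sketched in Python as follows. This is not the VBA code mentioned in the note above; it is an illustration that reproduces the sample label values shown earlier. It assumes that each label is cut at 36 characters before " ..." is appended, and that commas in titles become semicolons. The function names and the titles dictionary are hypothetical.

```python
MAX_LABEL_LENGTH = 40
ELLIPSIS = " ..."


def parent_codes(naics_code: str) -> list[str]:
    """Sector (2 digits), Subsector (3) and Industry Group (4) prefixes."""
    return [naics_code[:digits] for digits in (2, 3, 4)]


def make_label(code: str, title: str) -> str:
    """Build 'code - title', limited to 40 characters in total."""
    # A comma would split the label into several values, so use a semicolon.
    label = f"{code} - {title.replace(',', ';')}"
    if len(label) <= MAX_LABEL_LENGTH:
        return label
    keep = MAX_LABEL_LENGTH - len(ELLIPSIS)
    return label[:keep].rstrip() + ELLIPSIS


def hierarchy_labels(naics_code: str, titles: dict[str, str]) -> str:
    """Comma-separated label attribute value for the three parent codes."""
    return ", ".join(make_label(code, titles[code])
                     for code in parent_codes(naics_code))


# Parent titles keyed by code; in practice these come from the NAICS table.
titles = {
    "11": "Agriculture, Forestry, Fishing and Hunting",
    "111": "Crop Production",
    "1111": "Oilseed and Grain Farming",
}
print(hierarchy_labels("111110", titles))
```

Looking up the two-digit prefix directly is a simplification: NAICS sectors that span several two-digit codes (such as Manufacturing, 31-33) need one dictionary entry per prefix.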

Problem 9 (Google): Restricting the length of individual label attributes to 40 characters is an issue when constructing precise keyword filters such as those derived from hierarchical lists. There is no workaround, so Google should increase the maximum length of an individual label attribute to at least 80 (and preferably 100) characters for custom (or all) item types. Regardless of the added length, the Posted Items list and the PostingAlias's items page need multiple lines to display label values.

Note: To display the following figures from Google Base, open the application without signing in, click the Google Base Beta image to return to the home page, type OakLeaf_Systems in the Find Items Posted by Others text box, and click Search Base to display the default Posted Items list. Type 33 in the search text box, and click the naics industrial taxonomy item to display a few items from Sector 33.

Filtering by label attribute values leaves much to be desired in the current (December 7, 2005) beta version. Only one or two of the three hierarchy attributes appear in entries on the Posted Items page, and inclusion of label attribute values is inconsistent in the list. The More or Less option that appears with short label terms also is missing.

Figure 7 illustrates the first part of the default initial Posted Items page for the enhanced NAICS taxonomy bulk-uploaded to Google Base on December 7, 2005.


Figure 7

Click here to open the current Posted Items page, which displays items in random order. Notice that NAICS code 335122 has no visible hierarchy attribute values; 333924 has one (333 - machinery manufacturing).

Figure 8 illustrates the OakLeaf_Systems's items page for an item that has three hierarchy labels with lengths that exceed 40 characters. Notice that the page is missing the initial Sector label.


Figure 8

Note: The enhanced NAICS2002Codes.txt bulk-upload file (2.2 MB) includes UN-ISIC Rev. 3.1 codes and titles, which appear as Details attributes in Figure 8. The advantage of UN-ISIC codes is that they have a coordinated products taxonomy—United Nations Central Product Classification (CPC or UN-CPC) Ver 1.1. A similar NAICS-coordinated product taxonomy—North American Product Classification System (NAPCS)—is under development for 12 specific NAICS sectors. The initial NAPCS taxonomy is scheduled for completion by the end of 2005. Given governmental productivity issues in this area, don't hold your breath for imminent NAPCS page activation.

Click here to open the OakLeaf_Systems's items page for NAICS code 335122. Clicking either visible hierarchy item with trailing ellipsis returns no matching items.

Figure 9 illustrates the OakLeaf_Systems's items page for an industry with one fairly short label: 333 - Machinery Manufacturing.


Figure 9

Click here to open the OakLeaf_Systems's items page for NAICS code 333924. Clicking the 333 - Machinery Manufacturing hierarchy item returns 62 items. Click here to display the Posted Items list. Clicking the 3339 - Other General Purpose Machine ... hierarchy entry with trailing ellipsis returns no matching items.

Problem 10 (Google): You can't filter searches with label attribute values that aren't accessible in the Posted Items or PostingAlias (e.g. OakLeaf_Systems) items page. Sufficient space should be allocated to display all individual label attribute values on both pages. If this is impractical for Posted Items, it must be allocated on the PostingAlias items page.

Problem 11 (Google): Adding an ellipsis to indicate partial individual label values prevents matching identical values to filter by hierarchy. Unless there is a simple workaround, the problem must be fixed.

Problem 12 (Google): The Refine Your Search element only displays the lower-cased custom item name; the remaining space for links usually is occupied by inactive Details items. The more ... or fewer ... links that appear for labels in the Products Posted Items list are missing from custom item lists. There is no valid reason for this omission.

Problem 13 (Google): Details items occupy space in the Posted Items list but serve no purpose there. Users should be able to click a Details attribute in this list to filter the list, rather than manually typing Details attribute values in the search text box.

Preliminary Problem Summary
One of the original examples of potential Google Base content cited by Google was the "genome of the 1918 influenza pandemic," according to the November 15, 2005 "Google Base service goes live" article by CNet's Linda Mills. There's considerable controversy about publishing the structure of or recreating the 1918 virus, as Jamais Cascio reports in his original "Sequencing the Killer Flu" and later "Safety in Knowledge" posts. Mills' earlier "Google wants your car listings, events" article includes a reference to "database of protein structures" from a late-October 2005 pre-beta screen capture. Google references to "genome" and "protein structures" have disappeared with good reason: Current Google Base search and filtering capabilities for custom item types are far too primitive and restrictive for sophisticated scientific searches.

DNA Direct's Jason R. Bobe quotes from Chapter 26, "Googling Your Genes," of The Google Story, published November 15, 2005:
One of the most exciting Google projects involves biological and genetic research that could foster important medical and scientific breakthroughs. Through this effort, Google may help accelerate the era of personalized medicine ...

Over dinner and plenty of wine in February 2005, Sergey Brin discussed the prospects for genetics and Google with the maverick biologist Dr. Craig Venter... Not long after the dinner in California, Brin and Page teamed up with Venter...[who is quoted as saying:] "Working with Google, we are trying to generate a gene catalogue to characterize all the genes on the planet and understand their evolutionary development. Geneticists have wanted to do this for generations." Over time, Venter said, Google will build up a genetic database, analyze it, and find meaningful correlations for individuals and populations. [It is utilizing the 30,000 genes discovered by Venter and scientists from the National Institutes of Health when they were racing to beat one another to map the human genome.]
Note: You can read a longer Chapter 26 excerpt courtesy of the Washington Post and learn more about the Human Genome Project from Nature's "Genome Gateway" portal.

Odeo's November 22, 2005 Future Tense podcast from American Public Media, "Googling your genes," offers a brief interview with The Google Story's co-author, David Vise, about searching genetic information in a Google Base context.

However, eWeek's Ben Charney quotes an unnamed Venter representative in his November 21, 2005 "Google Gene Project Comes into Question" article: "We do not have any ongoing projects with Google."

Pedro Beltrão casts a jaundiced eye on Google Base as a means of sharing scientific information—such as simple protein sequences as a biological sequence custom item type—in his November 17, 2005 "Google Base and Bioinformatics II" post. Click here to display a sample DNA sequence with associated images. Pedro follows up with a later "Google Base simple tricks" post that offers examples of manually-entered search URLs.

Google Base Beta has a long way to go before it can be considered a truly universal, searchable database. If you can't create an easily searchable and filterable flat table that represents a simple hierarchical industrial taxonomy, it's not likely that Google Base will be useful to store and search/filter items other than the current short-list of predefined item types. Thus, it's not surprising that Google's early references to "genome" and "protein databases" have disappeared.

Technorati: