Sunday, October 02, 2005

Blog Search Engine Comparisons - Google, MSN, Yahoo!, Technorati, Feedster and Ice Rocket

You might have noticed that I've been adding Technorati subject tags to the end of my posts since mid-September. The most common tags are LINQ, XLinq, and DLinq because I've been testing and writing about Microsoft's new Language Integrated Query, Visual Basic 9.0, and C# 3.0 enhancements since their September 13 public release as technology previews at the Professional Developers Conference (PDC) 2005 in Los Angeles.

Update 10/5/2005: Technorati responded on 10/4/2005 to my technical service request of 10/3/2005 and re-spidered the OakLeaf blog. That corrected the problem of the missing tagged posts, as you can see by clicking the XLinq and DLinq tag links. The blog-wide topic tags are also operational; give XLinq a try, expand the Tagged link, and click a topic tag, such as SQL Server 2005.

Original 10/3/2005: Posts I tagged with LINQ and XLinq from September 13 to 26 appear on Technorati Tags pages, as you can see by clicking the LINQ and XLinq links in the preceding paragraph. However, the five posts after September 26 don't appear on the Tags pages, despite my having manually pinged the Technorati site repeatedly. DLinq tags I added after September 26 don't register on the Tags pages either. I also added 16 topic tags to Technorati's Blog Finder on September 30. Although the topic tags appear in my Account Profile, searches on any of the added tags don't return links to OakLeaf pages. The effect is as if Technorati had embargoed my blog on September 27.

This problem piqued my curiosity about how well other search services handle blog posts, so here are search results as of October 3, in descending order of the number of valid hits:

Google Blog Search and Blogger Search for LINQ blogurl: returned all 17 posts through October 1, and 18 items after this October 2 post.

Feedster Search for LINQ and the RSS URL for the Atom feed returned 15 posts. (The first page gave 28 and the second page gave 16 as the result count; 15 is a manual count.)
Technorati Search for LINQ and rj originally returned 12 hits, of which 11 were my posts from September 26 or earlier. Update 10/5/2005: This search now returns 19 posts in the last 67 days, which moves Technorati to the number 1 position in these results.

Ice Rocket Search for LINQ and rj returned 10 hits on the first try, with September 29 as the most recent date, then expanded to 42 posts. Searching for LINQ and Author=rj returned 43 posts, starting with the latest entry (October 3). Tag=LINQ AND rj returns 39 hits. Most of the 30+ results were duplicates, which might stem from a problem distinguishing original posting dates from the dates of minor edits or updates.

MSN Search with and LINQ returned 9 posts. Note that you must open the Settings page, clear the 'Group results from the same site' checkbox, and click Save to obtain more than two posts per site.

Google Search for oakleafblog and LINQ returned 8 non-similar hits.

Google Reader (beta, requires registration) finds 10 sources for LINQ, but none from this blog. A search on the OakLeaf blog address returns all current posts; strangely, adding LINQ returns only one.

Yahoo! Search with and LINQ OR DLinq OR XLinq returned 1 post, dated September 18. As of 10/7/2005, it returns 9 or 10 rather random results; some of the linked posts don't include LINQ, XLinq, or DLinq. Repeated searches appear to increase the number of results.

(Your results might differ due to posts added after the preceding tests.)

Obviously Google Blog Search is the champ in this case, but that's understandable because Google owns Blogger. (Blogger uses Google Blog Search for Blogger Search.) Business Week's Stephen Baker calls Google Blog Search "lackluster," but it works for me. Technorati Text Search doesn't return OakLeaf posts after September 26. (Update: All OakLeaf posts through 10/3/2005 are now returned.) I was surprised that MSN Search returned about 50% fewer links than Google Blog Search and that Yahoo! Search returned only a single hit.
Maybe Yahoo! is minimizing blog results in preparation for its future blog and RSS search tool. So far, I haven't found an explanation for the Technorati problems. I'm waiting for Technorati's tech service team to respond to a "My site is not properly indexed" problem report. (Update: Technorati responded to my tech service request and re-spidered the site on 10/4/2005.)

--rj

Updated 10/3/2005: Added Feedster and Ice Rocket Search results. Ice Rocket's Blogs Trend Tool provides a time-based graphical display of the percentage of all blogs that mention up to three search terms, such as LINQ, XLinq, and/or DLinq. I would have added Bloglines if this Ask Jeeves offshoot had returned results from a LINQ search dated later than December 8, 2004. Yahoo!'s Russell Beattie asks "What's up with Bloglines," but doesn't answer the question. Inexplicably, Barry Diller intends to drop the Ask Jeeves trademark, which seems to me to be the operation's primary asset.

Updated 10/4/2005: See the inline updates for the fix by Technorati technical service.

Updated 10/8/2005: The Wall Street Journal posted a "New Search Engines Help Users Find Blogs" story on 10/7/2005 (free), which contains the following controversial quotation: "David Sifry, chief executive of Technorati, says his company gets an edge from exclusive deals in which some blog-hosting companies ping Technorati before anyone else. After receiving a heads-up, Technorati visits the blog and updates its database." Business Week's Stephen Baker comments on the quote in "Is Technorati making deals to get data first?" Jon Udell's "What is the future of open blog infrastructure?" item comments on Baker's story. Personally, I doubt that "exclusive dealers" would add the complexity required to delay pings to Technorati's "competitors." I also added results for the new beta version of Google Reader. Its results differ surprisingly from those of Google Blog Search; it looks to me as if Google Reader needs some work.