Wednesday, November 16, 2005

Google Base and Bulk Uploads with Microsoft Access

I learned that Google Base went live last night from a soundbyte on my local NBC TV affiliate's (KNTV) 5:00 am local news. So I checked Memeorandum for the details, read the Google Base announcement, created a Google account, read a few help files, looked over a few sample tables, and decided to try a simple Bulk Upload operation from a tab delimited *.txt file.

Access 11+ offers the simplest, rapid-application-development (RAD) approach to creating and exporting tables in a variety of formats, including tab-delimited text files. Creating a custom table requires running a make-table query that aliases field names to Google's required standard attribute names—such as title and id—optional standard attribute names, like currency and quantity, and custom attribute names. Click here for a list of standard and optional attribute names and value restrictions.

I found a few gotchas in the tab-separated file bulk upload process, which Google describes only for Excel worksheets, so here's a preview of a bulk-uploaded product list. An outline of the process for generating and uploading a NwindProducts.txt file that you create from a Make-Table query against the Northwind Categories, Products, and Suppliers tables follows the preview.

Update: Google Groups has a Google Base Help Discussion group with a few messages in Basics, Bulk Uploads, Individual Items, Search Results, and Miscellaneous categories. Bulk Upload Preview For a preview of the initial list of NwindProducts items in the Google Base UI, click here to search for the two Sir Rodney's products, which opens the following page:

Click the Sir Rodney's Marmalade link or click here to open the details page shown here:

Click the All Items by OakLeaf Systems link or click here to display a list of all the uploaded items, which has links to detail pages similar to the preceeding. OakLeaf's authorid attribute value is 1016205.

Beta Version Issue: The source of the 44 results count is a mystery—perhaps a Beta version problem related to publishing. Overnight, the number of results increased to 77 because Google Base doesn't delete items with 'id' values that aren't included in subsequent bulk updates.

Hopefully, problems shown above with high-order ANSI characters in manufacturer attribute values will be corrected in future versions.

Completing Your Personal Profile
If you have or create a Google account, which you need for most Google software-as-a-service (SaaS) applications, you'll probably find it worthwhile to add the additional default attribute values that apply to Google Base only. The following screen capture displays the fields common to most Google SaaS apps:

With the exception of the Web Site field, the Google Base-specific fields and their default values for the Products and related Item Types are shown here:

Notice the integration of Google Maps with the Item Location attribute value.

Creating the Tab-Delimited File with Access 11
1. Create a SELECT query with column aliases to corresponding standard attribute names, as shown here for three of the eight columns of the MakeNwindProducts query:

Notice that USD is a constant for all currency cells. Here's the Jet SQL statement for the Make-Table version of the SELECT query:

SELECT Products.ProductName AS title,
   Categories.CategoryName AS product_type,
   Products.QuantityPerUnit AS description,
   Products.ProductID AS id,
   Format(UnitPrice, "#0.00") AS price,
   "USD" AS [currency],
   Products.UnitsInStock AS quantity,
   Suppliers.CompanyName AS manufacturer
INTO NwindProducts
FROM Suppliers INNER JOIN (Categories
   INNER JOIN Products ON
     (Categories.CategoryID =
      Products.CategoryID) AND
     (Categories.CategoryID =
      Products.CategoryID)) ON
     (Suppliers.SupplierID =
WHERE (((Products.Discontinued)=No))
ORDER BY Products.ProductID;

2. Execute the Make-Table query to create the NwindProducts table, which should appear as follows:

3. Open the NwindProducts table in Design mode, select the currency field, and set the Unicode Compression property value to Yes, as shown here:

Warning: If you don't make this change, the USD constant will appear in the exported text file as six Unicode bytes, and Google Base will reject your upload.

4. With the NwindProducts table selected, choose Export to open the Export 'NwindProducts' To ... dialog, select Text Files (*.txt; *.csv; *.tab; *.asc) in the Save as Type list, navigate to the appropriate folder in which to save the file —C:\GoogleBase for this example—and type NwindProducts.txt as the file name. Click Export to open the Export Text Wizard.

5. Accept the default Delimited option and, if you haven't created an Export Format previously, click Next, select the Tab option, mark the Include Field Names on First Row option, and select {None} as the Text Qualifier, as shown here:

Warning: If you don't make the preceding changes, your upload will fail.
Click Next.

6. Click Advanced to open the NwindProducts Export Specification dialog, and accept the previously set values for File Format, Field Delimiter, and Text Qualifier; and the the default Language (English) and Code Page (Western European - Windows). Set the Date Order to YMD, the Date Delimiter to - (hyphen), mark the Leading Zeros in Dates check box, and accept the remaining defaults, as shown here:

Caution: The Google help file states that bulk import accepts UTF-8, Latin1, or ASCII encoding. The Export Specification doesn't include a Latin1 option, but the default Western European (Windows) code page is the equivalent of Latin1. Selecting the UTF-8 code page causes incorrect display of characters with diacriticals in the Beta version; selecting US-ASCII omits diacriticals.

Note: The date settings create date values that comply with the XML Schema date datatype—such as 2005-09-05. The sample table doesn't include Jet DateTime fields.

7. Click Save As to open the Save Import/Export Specification input box, change the default Name to NwindProducts Export to Google Base and click OK twice to save the specification. Click Next, verify the path and file name for export, and click Finish to save the tab-delimited file.

8. Open the file in Notepad and verify that the currency field doesn't include a $ prefix and USD appears in the currency column, as shown here:

Uploading the Tab-Delimited File to Google Base

9. From Google Base's home page, click the Bulk Upload Files link

10. If want to set up an FTP account, follow the Google instructions for obtaining an account for the Google Base FTP server. An FTP account isn't required for the Direct Upload default .

11. If you haven't previously uploaded a file, open the Specify a Bulk Upload File page and type NwindProducts.txt as the file name, select Products as the item type, and accept the default English and US Dollar selections, as shown here:

Beta Version Limitation: You can only upload a single feed from a Google account.

12. Click the Specify Bulk Upload File button to specify the file to upload.

13. Click the Upload and Process This File link. If the tab-delimited file's data format and content is correct, the following status page opens:

14. Return to the Google Base home page and your uploaded items appear as shown here (after a few hours):

Hopefully a Beta Version Problem: It takes an inordinately long time to fully publish a compact list of 69 items (the 77 items appears to be an interim count). After about four hours, only six of the items gained 'Published' status. Overnight all 77 items became 'Published.' Is Google Base running on Sun Servers by any chance?

Definite Beta Version Problem: The "How do I delete my item?" help topic states: "If you're updating your items via bulk uploads, simply delete the item(s) from your bulk upload, then submit the bulk upload again." Deleting the 8 discontinued items from the tab-delimited file did not delete them from the list. Discontinued items—such as Alice Mutton or Perth Pasties—appear with a modified time of 12:23 p.m. All non-discontinued items—such as Sir Rodney's Scones—have 1:49 p.m. as the modified time.

15. To edit an entry, click the Edit link for an item that has 'Published' status to display the autogenerated editing form, as partially shown here:

16. To display the following list of [some] OakLeaf entries, clidk the All Items by OakLeaf Systems link:

Hopefully a Beta Version Problem: The source of the 44 search results count is a mystery.

Initial Conclusions
Google Base's beta incarnation doesn't appear to me to be a serious competitor to eBay, CraigsList, QuickBase, or other forms-based database applications offered as SaaS Web apps. The apparent lack of ability to create multiple named tables having identical item_type attributes is a serious limitation, as is the inability to make lists accessible only to specific Google or other user accounts, group roles, or the like.

Other Google Base watchers have posted entries with similar reservations about the immediate utility of the service other than for classified advertising listings. As time permits, I'll add additional links to comments regarding Google Base's initial commercial viability. In the meantime, check this later Memeorandum page snapshot. Stay tuned for the Google Base blog, which currently requires a login ID and password—as did Google Base itself just before the beta went public.

If Google lifts is ban against multiple Bulk Updates from a single account, I'll run a similar test with SQL Server 2005's Adventure Works product listings and SQL Server Integration Service's (SSIS) text export feature.

Initial tests with an Atom 0.3 (Atom.xml) file generated by Blogger for the OakLeafBlog, saved as a local XML file with FireFox 1.5 RC2, and Bulk Uploaded as the Reference Articles item type showed several problems. The description attribute contains HTML markup and error messaages state that the value is limited to a maximum of 10,000 characters. Thus only the shorter OakLeafBlog articles publish to the list; HTML markup contributes substantially to description length.

Help Center's "What do I include in 'Description'?" topic says "Please ensure that the description does not contain any HTML as we don't currently recognize or display HTML tags in your item." Help Center also says the maximum description length is 1,000 characters. Neither statement appears to be true.
I also plan to give XML files and RSS 2.0 feeds a try with bulk uploads. Click here for a detailed list of all Google Base XML attributes.