Monday, April 30, 2012

Flurry of Outages on 4/19/2012 for my Windows Azure Tables Demo in the South Central US Data Center

•• Updated 4/30/2012 with root cause analysis (RCA) details.

Updated 4/26/2012 with details of a similar outage in the North Central U.S. data center on 4/25/2012 and another South Central U.S. data center outage on the same date. See the marked items near the end of this post.

My OakLeaf Systems Azure Table Services Sample Project (Tools v1.4 with Azure Storage Analytics) demo application, which runs on two Windows Azure compute instances in Microsoft’s South Central US (San Antonio) data center, incurred an extraordinary number of compute outages on 4/19/2012. Following is the Pingdom report for that date:

image

image

The Mon.itor.us monitoring service showed similar downtime.

This application, which I believe is the longest (almost) continuously running Azure application in existence, usually runs within Microsoft’s Compute Service Level Agreement for Windows Azure: “Internet facing roles will have external connectivity at least 99.95% of the time.”
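For perspective, meeting that SLA allows only about 21.6 minutes of downtime in a 30-day month (0.05% of 43,200 minutes), or roughly 4.4 hours per year.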

The following table from my Uptime Report for my Live OakLeaf Systems Azure Table Services Sample Project: March 2012 post of 4/3/2012 lists outages and response times from Pingdom for the last 10 months:

image

The Windows Azure Service Dashboard reported Service Management Degradation in the Status History but no problems with existing hosted services:

image

[RESOLVED] Partial Service Management Degradation

19-Apr-12
11:11 PM UTC We are experiencing a partial service management degradation in the South Central US sub region. At this time some customers may experience failed service management operations in this sub region. Existing hosted services are not affected and deployed applications will continue to run. Storage accounts in this sub region are not affected either. We are actively investigating this issue and working to resolve it as soon as possible. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

20-Apr-12
12:11 AM UTC We are still troubleshooting this issue and capturing all the data that will allow us to resolve it. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

1:11 AM UTC The incident has been mitigated for new hosted services that will be deployed in the South Central US sub region. Customers with hosted services already deployed in this sub region may still experience service management operations failures or timeouts. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

1:47 AM UTC The repair steps have been executed and successfully validated. Full service management functionality has been restored in the affected cluster in the South Central US sub-region. We apologize for any inconvenience this has caused our customers.

However, Windows Azure Worldwide Service Management did not report problems:

image

I have requested a Root Cause Analysis from the Operations Team and will update this post when I receive a reply. See below.

Mon.itor.us reported that my OakLeaf Systems Azure Table Services Sample Project (Tools v1.4 with Azure Storage Analytics) demo suffered another 30-minute outage on 4/25/2012 starting at 8:10 PM.

•• Avkash Chauhan, a Microsoft Sr. Escalation Engineer and frequent Windows Azure and Hadoop on Azure blogger, reported the root cause of the outage as follows:

At approximately 6:45 AM on April 19th, 2012 two network switches failed simultaneously in the South Central US sub region. One role instance in your ‘oakleaf’ compute deployment was behind each of the affected switches. While the Windows Azure environment is segmented into fault domains to protect from hardware failures, in rare instances, simultaneous faults may occur.

As this was a silent and intermittent failure, detection mechanisms did not alert our engineering teams of the issue. We are taking action to correct this at platform and network layers to ensure efficient response to such issues in the future. Further, we will be building additional intelligence into the platform to handle such failures in an automated fashion.

We apologize for any inconvenience this issue may have caused.

Thank you,

The Windows Azure team


• The Windows Azure Operations Team reported [Windows Azure Compute] [North Central US] [Yellow] Windows Azure Service Management Degradation on 4/24/2012:

Apr 24 2012 10:00PM We are experiencing a partial service management degradation with Windows Azure Compute in the North Central US sub region. At this time some customers may experience errors while deploying new hosted services. There is no impact to any other service management operations. Existing hosted services are not affected and deployed applications will continue to run. Storage accounts in this region are not affected either. We are actively investigating this issue and working to resolve it as soon as possible. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

Apr 24 2012 11:30PM We are still troubleshooting this issue, and capturing all the data that will allow us to resolve it. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

Apr 25 2012 1:00AM We are working on the repair steps in order to address the issue. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.

Apr 25 2012 1:40AM The repair steps have been executed successfully, the partial service management degradation has been addressed and the resolution verified. We apologize for any inconvenience this caused our customers.

Following is a snapshot of the Windows Azure Service Dashboard details pane on 4/25/2012 at 9:00 AM PDT:

image

The preceding report was similar to the one I reported for the South Central US data center above. That problem affected hosted services (including my Windows Azure tables demo app). I also encountered problems creating new Hadoop clusters on 4/24/2012. Apache Hadoop on Windows Azure runs in the North Central US (Chicago) data center, so I assume services hosted there were affected, too.

Update 4/26/2012: Microsoft’s Brad Sarsfield reported in a reply to my Can't Create a New Hadoop Cluster thread of 4/25/2012 on the HadoopOnAzureCTP Yahoo! Group:

Things should be back to normal now. Sorry for the hiccup. We experienced a few hours of deployment unreliability on Azure.

Update 4/25/2012 9:30 AM PDT: The problem continues with this notice:

Apr 25 2012 3:12PM We are experiencing a partial service management degradation with Windows Azure Compute in the North Central US sub region. At this time some customers may experience errors while carrying out service management operations on existing hosted services. There is no impact to creation of new hosted services. Deployed applications will continue to run. Storage accounts in this region are not affected either. We are actively investigating this issue and working to resolve it as soon as possible. Further updates will be published to keep you apprised of the situation. We apologize for any inconvenience this causes our customers.


Remembrance of Things Past: Fluidyne Instrumentation, Urethane Foam, Artificial Kidneys and Flowmeters

I started Fluidyne Instrumentation from my Oakland, California apartment in 1970, after selling my first company, Polytron Corporation, to Olin Corporation (previously Olin Mathieson Chemical Corp.). Olin was best known for its Winchester rifles and ammunition but had ventured (mis-adventured, as it turned out) into the chemical business. By 1970, Polytron was the largest U.S. manufacturer of rigid polyurethane foam (R-PUF) components, trademarked Polycel and Autofroth, with chemical plants in Richmond, California and Brookpark, Ohio. Rigid [poly]urethane foam is commonly used to insulate refrigerators, freezers, refrigerated warehouses, trucks and rail cars; to raise sunken ships; and to manufacture surfboards by foam-in-place techniques. Polytron also manufactured (in Richmond) machines to mix and dispense R-PUF for insulation and marine salvage.

Below is an early Fluidyne foam dispensing machine with a reciprocating mixing head for filling 4-foot by 8-foot insulated building panels for refrigerated warehouses:

A 3,450-rpm motor drove the pin-type mixer in this close-up of the foam mixing head for the dispensing machine pictured above:

Fluidyne had its offices for several years in the second and third floors of this architecturally significant “flatiron” style building at 17th St. and San Pablo Ave, Oakland, 94612.

Image courtesy of Catherine Haley

Urethane Foam for Surfboards and Salvaging Ships

My No More U.S. Custom Surfboards? Squidoo Lens describes a major event in the foam surfboard industry and includes a brief description of the use of urethane foam to raise the Navy’s U.S.S. Frank Knox:

Salvaging a Beached Destroyer with Polyurethane Foam
A bit off the foam-surfboard topic but an interesting foam-related story

In the early 1960s, Polytron developed and patented a method for raising sunken ships with urethane foam, which led us into some interesting projects for the U.S. Navy. Our largest project in 1965 was refloating the U.S.S. Frank Knox (DDR-742), which had run hard aground (at 16 knots) on Pratas Reef in the South China Sea.

The official U.S. Navy story and a brief third-party account of the salvage effort make interesting reading. Unfortunately, the Time and Newsweek articles about the grounding aren't available on the Web.

As mentioned in the official account, use of explosives to free the vessel resulted in blowing much of the foam back out of its hull. Thus, Polytron made multiple air/sea shipments of more than 100,000 pounds of foam components for the project. The Navy's final challenge was chopping and scraping the foam out of the vessel at Subic Bay prior to a trip to Yokosuka, Japan, for remaining repairs.

Subsequently, William E. Lowery, a former Polytron vice-president, and I started a company called Polycel One, which manufactured a single-component semi-rigid urethane foam sealant of the same name delivered from an aerosol can. This clip from the October 1980 issue of Popular Science magazine shows Polycel One in action:

image

We sold Polycel One to W. R. Grace and Co. in 1979.

Fluidyne Analytical Instruments for Polyurethane Foam

Fluidyne’s initial product line consisted of instruments that measured the reactivity of urethane components by determining the foam’s rate of rise after the components were mixed; machines, trademarked Microshot, for accurately dispensing very small amounts of urethane foam and elastomers (rubber); and ultimately positive-displacement fluid flowmeters, automotive fuel-economy measurement systems, and computer-based data acquisition and control systems. Max Machinery Co. of Lafayette and later Healdsburg, California, owned by John K. Max (deceased), manufactured the instruments, dispensing machines, and flowmeters for Fluidyne.

The Journal of Cellular Plastics published my “Instrumental Analysis of the Performance Characteristics of Rigid Urethane Foaming Systems” article in its May 1969 issue (subscription to SAGE periodicals required).

L. H. Hanusa and R. N. Hunt’s The Microcomputer in the Laboratory: Data Acquisition and Calculation of Foam Reactivity Profiles was published in the Journal of Cellular Plastics, Volume 18 (2), January 1982. From the introduction:

Foam reactivities are determined in a variety of ways. The most prevalent characterization is visual observation and the determination of discrete values. For example, with lab or machine free rise samples, values such as initiation, gellation or string point, tack, and rise time are generally determined with just a stick and a stopwatch. In 1964, ASTM issued a Standard Method (D2237) for determining the “Rate of Rise Properties of Urethane Foam Systems.” The method describes a “continuous” measurement of the volume expansion of Urethane foam systems as a function of time. A comprehensive paper by Roger Jennings (ref. 1) described both method and equipment for the “instrumental determination of various characteristics of rigid urethane foaming systems.”

In fact, articles continue to be published describing either simplified or specialized techniques for studying the expansion polymerization of urethane-like foam systems. The equipment described by R. Jennings was produced by “Fluidyne Instrumentation.” In fact, many “Fluidynes” can be found throughout the Industry. Some have been modified either for special applications or to improve the operation of the equipment. In our own labs, we have such modified “Fluidyne Equipment” along with equipment developed by Bayer AG (ref.2).

The equipment was extensively used for quality control of our products. Despite this, it was often bypassed in our development efforts because of time constraints. The problem was the laborious and time consuming steps required to obtain data from the charts and carry out subsequent calculations. To overcome these problems, we decided to interface a Radio Shack TRS-80 microcomputer with the Fluidyne equipment. The conversions greatly increased the usefulness and facilitated the operation of this equipment. Rapid changes in integrated microcircuitry provide alternatives to traditional analog circuits. Selected applications are described. …

This paper was originally presented at the [Society of the Plastics Industry] (SPI) Polyurethane Division 26th Annual Technical Conference, November 1-4, 1981, Fairmont Hotel, San Francisco, CA.

Urethane Elastomer Processing Equipment for Artificial Kidneys and Skateboard Wheels

Over the years, Fluidyne urethane dispensing equipment became the world standard for manufacturing hollow-fiber kidney hemodialysis cartridges. The cartridges consist of a plastic cylinder containing thousands of strands of hollow synthetic fibers anchored at both ends by cast-in-place urethane elastomer dispensed from a Fluidyne machine. The machines injected liquid urethane components into cartridges spinning in a centrifuge; spinning kept the urethane at the ends of the cylinder. After curing, a microtome sliced the relatively soft ends to enable blood to flow through the fibers, as shown in the diagram below (courtesy of Wikipedia):

image 

Another major market for urethane elastomer processing equipment was the production of skateboard and rollerblade wheels.


Positive Displacement Flowmeters for Measuring Fuel Economy

Fluidyne became well known for its positive displacement (piston) low-flow meters, which the world’s major automobile manufacturers and their suppliers used for measuring vehicular fuel consumption. Here’s the first Fluidyne Model 214 Positive-Displacement Flowmeter manufactured for me by Max Machinery:

This view shows the same flowmeter (before black anodizing) with Fluidyne’s first analog flow rate transmitter, which consisted of a small d-c generator magnetically coupled to the Brooks piston flowmeter mechanism sandwiched between two machined flanges:

Caterpillar Tractor Co. used Fluidyne 214 flowmeters with the 5P2150 fuel flow meter, which Fluidyne manufactured for the firm under contract.

One of the few references to Fluidyne piston flowmeters on the Internet today is in United States Patent 4192185, Flowmeter for Liquids, published on 3/11/1980:

Mention should be made of the fact that a flowmeter for liquids is known (see "Precision Automotive Fuel Economy Testing System" published by Fluidyne Instrumentation, O[a]kland, California), in which four measuring cylinders are provided. The pistons within these cylinders are connected via piston rods to a common crank shaft and a single counting element is provided to count the piston strokes of the measuring piston. This flowmeter is, naturally, extremely expensive and difficult to fabricate and, because of the more complex mechanism, may not be as reliable in practice as a system using free-floating measuring pistons.

Fluidyne piston flowmeters weren’t extremely expensive; they cost less than competing methods for measuring low flow rates, such as fluid Wheatstone bridges, which were then used for automotive fuel flow measurement.

Fluidyne Helix flowmeters gained widespread use for measuring larger flows of more viscous liquids, such as bunker fuels for cargo and passenger vessels and diesel fuel for large generators. At the right is a copy of an advertisement for Helix flowmeters from the November 1977 issue of Chemical & Engineering News magazine.

Max Machinery, Inc. (@MaxFlowMeters) continued to manufacture piston, gear and helical flowmeters after protracted litigation with Fluidyne in the Northern District of California federal court over the rights to product designs, registered trademarks and copyrights. The litigation ultimately concluded in Fluidyne’s favor with a 7-figure settlement from Max Machinery.


Fluidyne Data Acquisition Systems Using Electronic Calculators and Computers

Fluidyne manufactured electronic data acquisition systems for its flowmeters in a plant in Santa Rosa, California. Wang Laboratories announced the Wang 700 programmable calculator and priced it at US$4,900 in 1969. I purchased one in 1970 to perform gas flow calculations for an apple vacuum-drying plant in Sebastopol, California, that I was designing at the time for Vacu-Dry Company. (At that time, running Fluidyne wasn’t a full-time job.) As a side note, Vacu-Dry and Polytron shared the same patent and trademark attorney firm, Eckhoff, Hoppe, Slick, Mitchell & Anderson of San Francisco, California. Ernest Anderson’s advice regarding protection of Fluidyne trademarks and copyrights was critical to Fluidyne’s success in its litigation with Max Machinery.

The Wang 700 calculator I received had a connector marked I/O on the back. I was interested in using the Wang 700 to make real-time urethane foam rise-rate and pressure calculations from Fluidyne foam instruments, as well as liquid flow measurement calculations by connecting it to one or more flowmeters. I called Wang Labs and talked to Dr. An Wang about the connector’s purpose, but it turned out that it wasn’t wired. (Dr. Wang often answered the phone after working hours in those days.) He sent a tech to the shop to wire the connector, and I designed an interface with wire-wrapped DEC DTL (diode-transistor logic) boards to handle the BCD (binary-coded decimal) output of the flowmeter electronics.

One of the most interesting applications for data acquisition systems with Wang calculators was an automated system for screening carriers of Tay-Sachs disease with an Abbott Laboratories ABA-100 Bichromatic Analyzer developed by physicians from Toronto’s Hospital for Sick Children. Fluidyne manufactured the electronic interface between the ABA-100 and the Wang Calculator under contract to Abbott Laboratories, Inc. and assisted in programming the automated screening tests. The FDA required Fluidyne to conform to its Good Manufacturing Practices (GMPs) for medical devices, which included NASA-grade component soldering. The Fluidyne Interface was instrumental in later screening procedures for Sickle Cell Anemia carriers.

Fluidyne continued producing interfaces for Wang computers, including the 2200 series (which had a BASIC interpreter in ROM), the Commodore CBM (a commercial version of the Commodore PET), DEC PDP-11s, and (ultimately) IBM-compatible PCs.

Programming Intel Personal Computers with Microsoft Windows and Writing Books, Articles and Blogs

After the litigation with Max Machinery concluded, I decided to write computer software. I wrote a program to manage the business of American Leak Detection franchisees in dBASE III+ Developer Edition (compiled with Clipper), a few other dBASE front ends, and then a WordPerfect macro-to-Microsoft WordBasic converter for my wife, Alexandra. I became acquainted with Ron Person, who wrote Special Edition Using Microsoft Word and Special Edition Using Microsoft Excel for QUE Books, at the Bay Area Microsoft Word Users Group and became technical editor for his then-current Word title. He was too busy to write Special Edition Using Microsoft Access and recommended me to QUE as the author. The rest is history, including 12 more editions of Special Edition Using Microsoft Access, the latest of which was renamed Microsoft Access 2010 In Depth.

My Amazon Author Page lists the 30+ books I’ve written about Microsoft operating systems and related software.

For decade-old details of my proclivity for BASIC programming, see my An (Almost) Lifetime of BASIC post of 4/25/2006, which contains the archive of my entry in Apress's 2001 VB @ 10 Years Project as previously cached by Google.com. However, since about mid-2010, I’ve adopted .NET’s C# as my programming language of choice. I became a contributing editor for Fawcette Technical Publications’ Visual Basic Programmers Journal (VBPJ) and wrote many cover articles for it and its successor, Visual Studio Magazine.

Subsequent Events

I currently curate the OakLeaf Systems and Android MiniPCs and TVBoxes blogs and write articles for TechTarget’s SearchCloudComputing.com site and other cloud computing Web sites.

Most of Fluidyne’s registered trademarks for the products described above have expired because of disuse.

My apologies to Marcel Proust for using the earlier English translation of his À la recherche du temps perdu classic.


Friday, April 27, 2012

Creating a Private Data Marketplace with Microsoft Codename “Data Hub”

•• Updated 5/11/2012 with change of dataset name from US Air Carrier Flight Delays, Monthly to US Air Carrier Flight Delays to reflect change of SQL Azure dataset to new multi-month/year format. (See my Creating An Incremental SQL Azure Data Source for OakLeaf’s U.S. Air Carrier Flight Delays Dataset post of 5/8/2012 for more information about the new dataset.)
• Updated 4/30/2012 with link to OakLeaf’s new US Air Carrier Flight Delays, Monthly (free) data set on the public Windows Azure Marketplace DataMarket.

Introduction

SQL Azure Labs describes its recent Codename “Data Hub” Community Technology Preview (CTP) as “An Online Service for Data Discovery, Distribution and Curation.” At its heart, “Data Hub” is a private version of the public Windows Azure Marketplace DataMarket that runs as a Windows Azure service. The publishing process is almost identical to the public version, except for usage charges and payment transfers. “Data Hub” enables data users and developers, as well as DBAs, to:

  • Make data in SQL Azure discoverable and accessible in OData (AtomPub) format by an organization’s employees
  • Enable data analysts and business managers to view and manipulate data from the Marketplace with Service Explorer, Excel, and Excel PowerPivot
  • Publish datasets for further curation and collaboration with other users in the organization
  • Federate data from the Windows Azure Marketplace DataMarket for the organization’s employees (in addition to the organization’s uploaded data)

The initial CTP supports the preceding features but is limited to SQL Azure as a data source and OData (AtomPub) as the distribution format. Microsoft is considering other data sources and distribution formats.


•• Note: OakLeaf’s US Air Carrier Flight Delays, Monthly data sets are publicly accessible at https://oakleaf.clouddatahub.net/ by clicking the Government or Transportation category link. To issue a query with Data Explorer, do this:

  • Click the Sign In button at the top right of the page
  • Log in with your Windows Live ID
  • Click the US Air Carrier Flight Delays link on the landing page
  • Click the Add to Collection button
  • Click the US Air Carrier Flight Delays link in the My Collection page
  • Click the Explore Data button to open the Data Explorer
  • Click the Run Query button to display the first 23 rows of data:

image

For a more detailed description, see the Exploring the User’s Experience section at the end of this post.

Alternatively, you can register with the public Windows Azure Marketplace Datamarket and then subscribe to the same datasets from OakLeaf’s new US Air Carrier Flight Delays data set. After you subscribe to the free dataset, you can also use it as a data source for Apache Hadoop on Windows Azure Hive tables.


Obtaining CTP Invitations

“Data Hub” and the “Data Transfer” CTP are invitation-only CTPs. You can request an invitation by clicking the Start Here link for “Data Hub” to open the Welcome page:

Click images to display full-size screen captures.

Complete the questionnaire and wait for the e-mail that advises you’ve been approved as a user. At present, users are limited to use of the CTP for three weeks.

“Data Hub” integrates “Data Transfer” for uploading comma-separated-value (*.csv) data files to existing or new SQL Azure database tables. As noted below, I found that using “Data Transfer” independently of “Data Hub” worked for some large files that “Data Hub” wouldn’t process. Therefore, I recommend you apply for the “Data Transfer” CTP by clicking the Start Here button on the landing page:

image

My Test-Drive SQL Azure Labs’ New Codename “Data Transfer” Web UI for Copying *.csv Files to SQL Azure Tables or Azure Blobs of 11/30/2011 was an early tutorial. I’ll post an updated tutorial for using Codename “Data Transfer” with On_Time_Performance_YYYY_MM.csv files shortly.


Creating Your “Data Hub” Marketplace

After you receive your “Data Hub” invitation, follow the instructions in the e-mail and complete the Create Your Marketplace page. The domain prefix, oakleaf for this example, must be unique within the clouddatahub.net domain. Specify the Start and End IP Addresses for your expected users. (0.0.0.0 and 255.255.255.255 admit all users, making your Marketplace public):

image

Click Create Marketplace to take ownership of the subdomain:

image

After the creation process completes, your homepage (at https://oakleaf.clouddatahub.net for this example) appears as shown here:

image

Your Account Key is equivalent to a password for “Data Hub” administration by users with Live IDs other than the administrator’s.


Provisioning a SQL Azure Database to Store *.csv Data

Click the Publish Data menu link to open the Welcome to the Publishing Portal page, which has Connect, Publish, Approve, Federate and View Marketplace menu links.

Click the Connect link to open the Connect to your Data Sources page, which offers five prebuilt sample data sources, AdventureWorks … US Data Gov:

image

I previously provisioned the On_Time_Performance data source from one of the four Free Trial SQL Azure Databases you receive with your invitation, and the On_Time_Performance2 data source from an existing database created with the “Data Transfer” CTP.

To use one of the four free SQL Azure databases, click the Free Trial SQL Azure Databases link in the left pane of the preceding page to open the Create New Database page. Type a unique name for your database (On_Time_Performance_Test for this example):

image

Then click Create to add the database to the Data Sources list and display the Upload a File page.


Understanding the FAA’s On_Time_Performance.csv Files

The Creating the Azure Blob Source Data section of my Using Data from Windows Azure Blobs with Apache Hadoop on Windows Azure CTP post of 4/6/2012 described the data set I wanted to distribute via a publicly accessible, free Windows Azure DataMarket dataset. The only differences between it and the tab-delimited *.txt files uploaded to blobs that served as the data source for an Apache Hive table were:

  • Inclusion of column names in the first row
  • Addition of a formatted date field (Hive tables don’t have a native date or datetime datatype, so Year, Month and DayOfMonth fields were required.)
  • Field delimiter character (comma instead of tab)
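
To make the relationship between the two formats concrete, here’s a minimal Python sketch of that conversion. The exact field list and file names are assumptions for illustration only; the published *.csv files carry nine columns, including Year, Month, DayOfMonth, Carrier and Dest, but I haven’t reproduced the full schema here.

```python
import csv
from datetime import date

# Hypothetical column subset for illustration; the published *.csv files have
# nine columns, including Year, Month, DayOfMonth, Carrier and Dest.
COLUMNS = ["Year", "Month", "DayOfMonth", "FlightDate",
           "Carrier", "Origin", "Dest", "DepDelayMinutes", "ArrDelayMinutes"]

def txt_to_csv(txt_path, csv_path):
    """Convert a tab-delimited On_Time_Performance_YYYY_MM.txt file to the
    *.csv layout described above."""
    with open(txt_path, newline="") as src, open(csv_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")          # tab-delimited source
        writer = csv.writer(dst)                          # comma-delimited output
        writer.writerow(COLUMNS)                          # column names in row one
        for year, month, day, carrier, origin, dest, dep, arr in reader:
            flight_date = date(int(year), int(month), int(day)).isoformat()
            writer.writerow([year, month, day, flight_date,  # formatted date field
                             carrier, origin, dest, dep, arr])

txt_to_csv("On_Time_Performance_2012_1.txt", "On_Time_Performance_2012_1.csv")
```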

Following is a screen capture of the first 20 data rows of the ~500,000-row On_Time_Performance_2012_1.csv table:

image

You can download sample On_Time_Performance_YYYY_MM.csv files from the OnTimePerformanceCSV folder of my Windows Live SkyDrive account. The files are narrowed versions of the On_Time_On_Time_Performance_YYYY_MM.csv files from the U.S. Department of Transportation’s Research and Innovative Technology Administration (RITA) site. For more information about these files, see The FAA On_Time_Performance Database’s Schema and Size section of my Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics” article of 3/26/2012. Each original *.csv file has 83 columns, about 500,000 rows and an average size of about 225 MB.

On_Time_Performance_2012_1.csv has 486,133 data rows and nine columns, and weighs in at 16.5 MB. Other On_Time_Performance_YYYY_MM.csv files with similar row counts and sizes are being added daily. The On_Time_Performance_Test subfolder also holds truncated versions of the files with 100, 1,000, 10,000, 100,000, 150,000 and 200,000 rows, which I used for diagnosing the large-file *.csv upload problem with “Data Hub”.
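
If you want to generate truncated test files like those yourself, a few lines of Python suffice; the output file-naming convention below is mine, not necessarily the one used in the SkyDrive subfolder.

```python
# Write header-preserving slices of a full *.csv file for upload testing.
SOURCE = "On_Time_Performance_2012_1.csv"
ROW_COUNTS = [100, 1000, 10000, 100000, 150000, 200000]

with open(SOURCE, newline="") as src:
    header, *data = src.readlines()

for count in ROW_COUNTS:
    # e.g. On_Time_Performance_2012_1_10000.csv = header + 10,000 data rows
    with open(SOURCE.replace(".csv", f"_{count}.csv"), "w", newline="") as dst:
        dst.write(header)
        dst.writelines(data[:count])
```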

Tab-delimited sample On_Time_Performance_YYYY_MM.txt files (without the first row of column names and formatted date) for use in creating blobs to serve as the data source for Hive databases are available from my Flight Data Files for Hadoop on Azure SkyDrive folder.

Provision of the files through a private Azure DataMarket service was intended to supplement the SkyDrive downloads. I also plan to provide the full files in a free submission to the Windows Azure Marketplace Datamarket site in early May.


Uploading *.csv Files to Your Data Source

Download one of the smaller test files from my SkyDrive On_Time_Performance_Test subfolder to your local machine. Choose the 10,000-row file if you have a DSL connection with slow upload speed.

Click the Upload a File page’s Browse button and navigate to the *.csv file you downloaded:

image

Click the Upload button to start uploading the file to an Azure blob. You receive no feedback on the progress of the upload until the Update the Table Settings page opens:

image

All data rows are useful, so accept the default Include settings and click Submit to transfer the data from an Azure blob to an SQL Azure table named for the *.csv file, which replaces the above page with a Loading File to Database message. It often takes longer to load the file to the database than to upload it to a blob.

When the Success message appears, click the Connect menu link to verify that the new database appears in the Data Sources list:

image


Publishing the New Data Source

Click the Publish menu link to open the My Offerings page and click the Add Offering button to open the 1. Data Source page.

Select the table(s) you want to include in the data source, type a Friendly Name for the column, click the Columns button for the table to display the column list, and clear the columns that you don’t want to be queryable:

image

Indexes are required on all queryable columns. Month and Year don’t need to be queryable because they are the same for all rows in the table.

Click the 2. Contacts button to save the Data Source information and open the Contacts page. Type your Name, Email alias and Phone Number:

image

Click the 3. Details button to open the Details page. Complete the required data, select from one to four categories, open the Documentation list and add links to appropriate documentation URLs, and add a logo image:

image

Click the 4. Status/Review menu link to open the Status/Review page. Click the Request Approval button to send a request to the Data Hub team to approve the entry.

image

Note: I will post a tutorial on federating content from the public Windows Azure Marketplace DataMarket after Microsoft approves my pending submission, which is identical to this private Marketplace entry.


Previewing and Approving the Marketplace Submission

Click the Approve menu link to open the My Approvals page. Click the Approve/Decline button to open the message pane, mark the Approve option, type an optional message to the requester, and click the Display Actions button to open Preview Offering choices:

image

Click Preview Offering in Marketplace to verify that the offering details appear as expected:

image

Click the Explore This Dataset link to open the dataset in the Service Explorer and click the Run Query button to display the first 22 of 100 rows with this default URL query https://api-oakleaf.clouddatahub.net/Data.ashx/default/US_Air_Carrier_Flight_Delays_Monthly/preview/On_Time_Performance_2012_1?$top=100:

image

Type a Carrier code, such as WN, in the Carrier text box and OAK in the Dest[ination] text box to provide data for Southwest Airlines flights to Oakland with this URL query: https://api-oakleaf.clouddatahub.net/Data.ashx/default/US_Air_Carrier_Flight_Delays_Monthly/preview/On_Time_Performance_2012_1?$filter=Carrier%20eq%20%27WN%27%20and%20Dest%20eq%20%27OAK%27&$top=100:

image

You also can visualize and export data, as well as click the Develop button to open a text box containing the current URL query. Click the XML button to display formatted OData (AtomPub) content:

image
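
Service Explorer simply composes OData URLs, so the same feed can be consumed from code. Here’s a minimal Python sketch (using the requests package) against the preview URL shown above; the Basic-authentication scheme with the marketplace Account Key is an assumption borrowed from the public DataMarket convention, so adjust it to whatever your invitation e-mail specifies.

```python
import requests
import xml.etree.ElementTree as ET

# Preview URL from the Service Explorer session above (Southwest flights to OAK).
URL = ("https://api-oakleaf.clouddatahub.net/Data.ashx/default/"
       "US_Air_Carrier_Flight_Delays_Monthly/preview/On_Time_Performance_2012_1")
PARAMS = {"$filter": "Carrier eq 'WN' and Dest eq 'OAK'", "$top": 100}

# Assumption: HTTP Basic auth with any user name and the Account Key as password.
ACCOUNT_KEY = "<your Account Key>"

response = requests.get(URL, params=PARAMS, auth=("user", ACCOUNT_KEY))
response.raise_for_status()

# The payload is an OData (AtomPub) feed; each <entry> element is one table row.
ns = {"atom": "http://www.w3.org/2005/Atom",
      "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
      "d": "http://schemas.microsoft.com/ado/2007/08/dataservices"}
feed = ET.fromstring(response.content)
for entry in feed.findall("atom:entry", ns):
    props = entry.find("atom:content/m:properties", ns)
    print(props.find("d:Carrier", ns).text, props.find("d:Dest", ns).text)
```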

Return to the My Approvals page and click the Send button to notify the originator of the approval and clear the My Approvals counts.


Publishing the Offering

Click the Publish menu link to display the offering with the Draft Approved as the status and the Publish option enabled:

image

Click the Publish button to publish your offering for others to use:

image


Exploring the User’s Experience

Click the Marketplace menu link to return to the landing page, and type a search term (OakLeaf for this example) to find your offering:

image

Click the offering name link to open it as an ordinary user and display the Add to Collection button:

image

Click Add to Collection to add the offering to the My Collection list:

image

When you sign out and navigate to the default URL, https://oakleaf.clouddatahub.net/, the landing page appears as shown here:

image

The Science & Statistics item is the default Data.gov offering. Clicking the Government or Transportation and Navigation link opens the user view of the offering landing page with a Sign In to Add to Your Collection button:

image

The service URL is: https://api-oakleaf.clouddatahub.net/default/US_Air_Carrier_Flight_Delays_Monthly/. Navigating to this URL and clicking the Show All Data button displays the default collections (for an earlier data source):

image
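
To see programmatically what the service exposes, you can fetch the OData service document at that URL; it’s a standard AtomPub service document whose collection elements name the queryable sets. Here’s a minimal sketch, with the same authentication caveat as the query example earlier in this post:

```python
import requests
import xml.etree.ElementTree as ET

SERVICE_URL = ("https://api-oakleaf.clouddatahub.net/default/"
               "US_Air_Carrier_Flight_Delays_Monthly/")

ns = {"app": "http://www.w3.org/2007/app",
      "atom": "http://www.w3.org/2005/Atom"}

# Add auth=("user", account_key) if the service rejects anonymous requests.
response = requests.get(SERVICE_URL)
response.raise_for_status()

service_doc = ET.fromstring(response.content)
for collection in service_doc.findall(".//app:collection", ns):
    title = collection.find("atom:title", ns)
    print(collection.get("href"), "-", title.text if title is not None else "")
```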


OakLeaf Systems Site Is Creeping up on 1,000,000 Pageviews

Just looked at Blogger’s new Stats page for the OakLeaf Systems site and found it’s had 947,476 pageviews in its history. Pageviews appear to be running consistently close to 60,000 per month, which encourages me to continue publishing it.

image

An editor at DZone, a syndicator of selected OakLeaf blog posts, reported today: “Our DZone readers are really enjoying your content - your posts regularly get over 1,000 views within a couple days of getting republished.”

I’m not sure what the chart above measures; perhaps it’s average pageviews per post. Another question is “Where is Google in the Traffic Sources list?” Here’s the answer:

image

As expected, “cloud computing” is the most popular search term:

image

Thanks for reading my blog!