Friday, May 04, 2012

Five Months of U.S. Air Carrier Flight Delay Data Available on the Windows Azure Marketplace DataMarket

• Updated 5/12/2012 for a new, expandable SQL Azure data source, currently with five (up from two) months of data, and a name change.

My (@rogerjenn) Creating a Private Data Marketplace with Microsoft Codename “Data Hub” post of 4/27/2012 describes a set of monthly On_Time_Performance_YYYY_MM.csv files for February 2012 and earlier, which are a narrowed version of the U.S. Federal Aviation Administration (FAA)’s On_Time_On_Time_Performance_YYYY_MM.csv files. These files are available in *.zip archives for each month since January 1987 from the Bureau of Transportation’s Research and Innovative Technology Administration site.

For more information about these files, see The FAA On_Time_Performance Database’s Schema and Size section of my Analyzing Air Carrier Arrival Delays with Microsoft Codename “Cloud Numerics” article of 3/26/2012. Each original *.csv file has 83 columns, about 500,000 rows and an average size of about 225 MB. The narrowed version has 9 columns and the same number of rows.
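The narrowing step described above can be sketched in a few lines of Python. This is only an illustration of dropping unwanted columns, not the process I used to create the files; the column names in the sample (Year, Month, Carrier, ArrDelayMinutes, TailNum) are assumptions chosen for the demonstration, so substitute the nine columns you actually need.

```python
import csv
import io


def narrow_csv(src, dst, keep):
    """Copy only the columns listed in `keep` from `src` to `dst`.

    `src` and `dst` are open text-mode file objects; `keep` is a list of
    header names. Returns the number of data rows written.
    """
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops every column not named in `keep`.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return rows


# Tiny in-memory sample standing in for an 83-column monthly file.
# The header names here are hypothetical.
sample = io.StringIO(
    "Year,Month,Carrier,TailNum,ArrDelayMinutes\n"
    "2012,1,WN,N123SW,12\n"
    "2012,1,AA,N456AA,0\n"
)
narrowed = io.StringIO()
count = narrow_csv(sample, narrowed, ["Year", "Month", "Carrier", "ArrDelayMinutes"])
print(count)
print(narrowed.getvalue())
```

For a real monthly file, open the source and destination with `open(path, newline="")` and pass the file objects in place of the `StringIO` buffers. Rows are streamed one at a time, so a 225 MB file does not need to fit in memory.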

Update 5/4/2012: Monthly On_Time_Performance_YYYY_MM.csv files for January 2011 through February 2012 are now available from my SkyDrive account. Files for January through May 2011 were added on 5/4. These files can be used by Microsoft to reproduce my problems uploading the December 2011 file to my Windows Azure Marketplace DataMarket dataset.

To subscribe to the data set, go to the Windows Azure Marketplace DataMarket landing page, create an account if you don’t have one, log in, and type OakLeaf in the Search the Marketplace text box to display the data and app offers:


Click the US Air Carrier Flight Delays, Monthly link to open the Offer page:


Click the Subscribe button to open the Sign Up page and mark the I Have Read and Agree to … check box to enable the Sign Up button:


Click the Sign Up button to open the Thank You page:


Optionally, click the Explore This Dataset link to open the data set exploration page, type the abbreviation for your favorite air carrier (WN = Southwest Airlines for me) and click the Run Query button to return the query results for the default month and year, which is January 2012 for this offer:
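DataMarket data sets are exposed as OData feeds, so the same carrier, year and month query can also be expressed as a URL with a $filter option. The following Python sketch shows roughly how such a URL could be assembled. The service root, the entity set name (Flights) and the column names (Carrier, Year, Month) are placeholders I made up for illustration; they are not the actual values for this offer, and the sketch doesn't cover the account-key authentication that a real request needs.

```python
from urllib.parse import quote, urlencode


def build_query_url(service_root, entity_set, carrier, year, month, top=100):
    """Build an OData query URL filtered by carrier, year and month.

    All names passed in are supplied by the caller; nothing here is
    specific to a particular DataMarket offer.
    """
    # OData string literals escape a single quote by doubling it.
    safe_carrier = carrier.replace("'", "''")
    filter_expr = (
        f"Carrier eq '{safe_carrier}' "
        f"and Year eq {int(year)} and Month eq {int(month)}"
    )
    # Keep "$" and "'" readable; encode spaces as %20 rather than "+".
    query = urlencode(
        {"$filter": filter_expr, "$top": int(top)},
        quote_via=quote,
        safe="$'",
    )
    return f"{service_root.rstrip('/')}/{entity_set}?{query}"


# Hypothetical service root and entity set, for illustration only.
url = build_query_url("https://example.invalid/odata", "Flights", "WN", 2012, 1)
print(url)
```

The $top option caps the number of rows returned, which is worth setting when a single month holds about 500,000 rows.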


When this update was posted, the data set covered October 2011 through February 2012. Additional months for 2012 will be added as they become available from the FAA, and months for 2011 and earlier as I feel the urge.

Subscribing adds an Active entry to the status column in your My Data list. Disabling an earlier version requires a request to the Marketplace staff, who suspend the entry and disable the Use and Cancel options.


The goal is to upload at least 30 more months of data to the data set if I can arrange for a freebie or reduced price on the 25-GB or more SQL Azure database size required. See my Microsoft Codename “Data Transfer” and “Data Hub” Previews Don’t Appear Ready for BigData post updated 5/3/2012 for details about the data upload problem with *.csv files.