Sunday, December 02, 2012

Windows Azure and Cloud Computing Posts for 11/29/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.

Updated 12/2/2012 5:00 PM PST with new articles marked ••.
Updated 11/30/2012 12:00 PM PST with new articles marked •.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue, HDInsight and Media Services

•• Denny Lee (@dennylee) described Getting your Pig to eat ASV blobs in Windows Azure HDInsight in a 12/2/2012 post:

Recently I was asked how I could get my Pig scripts to access files stored in Azure Blob Storage from the command prompt. While it is possible to do this from the HDInsight Interactive JavaScript console, to automate scripts and use the grunt interactive shell, it is easier to run these commands from the command line. To do this, you will need to:

  • Ensure your HDInsight Azure cluster is connected to your Azure Blob Storage subscription/account
  • Familiarize yourself with the pig / grunt interactive shell
Connecting HDInsight Azure to Azure Blob Storage

1) To do this, go to the Manage Cluster live tile, as noted in the screenshot below.


2) Click on Set up ASV to enter your Azure Blob Storage account information.


3) Specify the Azure storage account and passkeys as noted below.


And now, you’ve connected your Azure Blob Storage account to your HDInsight Azure cluster.

Accessing the Pig/Grunt Interactive Shell

To access your Pig/Grunt interactive shell, from the Metro interface:

1) Click on the Remote Desktop live tile.


2) Once you’ve logged in, click on the Hadoop Command Line shortcut located in the top left corner


3) From the Hadoop command line shell, switch to the Pig folder and execute pig.cmd:

cd c:\apps\dist\pig-{version}-SNAPSHOT
bin\pig

and now you’re able to grunt!


Quick Pig / Grunt Example

Now that you’re working with the grunt interactive console, you can run some simple Pig commands. For example, the statements below allow you to load and view the first line and “schema” of the logs you have loaded.

A = LOAD 'asv://weblog/sample/' USING TextLoader as (line:chararray);
illustrate A;

The results of the illustrate command are noted below:

2012-10-27 20:27:20,266 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at:
...

------------------------------------------------------------------------

...

| ASV_LOGS     | line:chararray

...

|              | 2012-12-14 19:26:31 W3SVCPING 10.0.0.0 GET /Cascades/atom.aspx - 80 - 10.0.0.24 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+5.1;+.NET+CLR+1.1.4322;+MSOffice+12) - 200 0 0 144532 484 1124 |

------------------------------------------------------------------------

...


• Andy Cross (@andybareweb) continued his series with Bright HDInsight Part 2: Smart compressed output on 11/29/2012:

Following on from Bright HDInsight Part 1, which dealt with consuming a compressed input stream in a Map/Reduce job, we’ll now see how to extend this scenario to emit compressed data from the output of your Map/Reduce job.

Scenario

Again, the scenario is very much one of reducing overhead on the management, security and storage of your data. If you are to leave your resulting work at rest in a remote system, you should reduce its footprint as much as possible.

Reiteration of tradeoff

Remember that you are shifting from an IO-bound to a compute-bound problem – compression requires inflation prior to utilisation of the data. You should run metrics on this to see if you’re truly saving what you think you might be.

Command Line

Again, this is achieved by using an argument on the command line:

mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec

Using the C# SDK for HDInsight

To do this, in your Configure method, simply append the flag to AdditionalGenericArguments as below:

config.AdditionalGenericArguments.Add("-D \"mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec\"");
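
For context, here is a minimal sketch of where that call might sit in a streaming job’s Configure override, mirroring the Part 1 example later in this post; the input path and output folder are placeholders, not values from the original article:

public override HadoopJobConfiguration Configure(ExecutorContext context)
{
    // Placeholder paths; substitute your own input and output locations.
    HadoopJobConfiguration config = new HadoopJobConfiguration();
    config.InputPath = "/input/data.log";
    config.OutputFolder = "/output/output" + DateTime.Now.ToString("yyyyMMddhhmmss");

    // Ask the Hadoop framework to Gzip-compress the job output.
    config.AdditionalGenericArguments.Add("-D \"mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec\"");

    return config;
}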


• Andy Cross (@andybareweb) started a series with Bright HDInsight Part 1: Consume Compressed Input on 11/29/2012:

This is the first of a series of quick tips on HDInsight (Hadoop on Azure), and deals with how to consume a compressed input stream.

The scenario

The HDInsight product and Big Data solutions in general by definition deal with large amounts of data. Data at rest incurs a cost, whether in terms of managing it, securing it or simply storing it. Where possible, it is good practice to reduce the weight of data in a lossless way, without losing or compromising the data quality.

The standard way to do this is by using compression on the static data, where common strings are deflated from the static file and indexed so that they can be later repositioned in an inflation stage.

The problem with this for HDInsight or Hadoop in general is that the stream becomes a binary stream which it cannot access directly.

Configuring a Hadoop Job to read a Compressed input file

In order to configure a Hadoop Job to read the Compressed input, you simply have to specify a flag on the job command line. That flag is:

-D "io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec"


This causes an additional map task to be undertaken which loads the input as a GzipStream before your map/reduce job begins. NOTE: This can be a time consuming activity – if you’re planning on loading this file many times for parsing, your efficiency saving by doing this will be limited.

A 2GB gzipped example file of mine was inflated to 11GB, taking 20 minutes.

Filename dependency

If you try this approach, you might find a problem where the input strings to your Map job still seem to be binary. This is a sign of the stream not being inflated by the infrastructure. There is one last thing you must ensure in order to trigger the system to begin the inflation process – the filename needs to have the relevant extension. As an example, to use GZip, the filename must end with .gz, such as “mylogfile.log.gz”.

Using the HDInsight C# Streaming SDK to control Input Compression

In order to use the C# Streaming SDK with this flag, one simply modifies the Job.Configure override to add an additional generic argument specifying this flag.

public override HadoopJobConfiguration Configure(ExecutorContext context)
{
    HadoopJobConfiguration config = new HadoopJobConfiguration();

    config.InputPath = "/input/data.log.gz";
    config.OutputFolder = "/output/output" + DateTime.Now.ToString("yyyyMMddhhmmss");

    config.AdditionalGenericArguments.Add("-D \"io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec\"");

    return config;
}


You will find an additional Task specified in your JobTracker, which takes the stream and inflates it for your runtime code.

A better way

If you can control your input stream (i.e. it is not provided by a third party), you should look at a better compression algorithm than the one shown here, Gzip. A better approach is to use LZO, a splittable algorithm that allows you to better distribute your work. The Gzip algorithm must be processed in a sequential way, which makes it very hard to distribute the workload. LZO (which is configured by using com.hadoop.compression.lzo.LzoCodec) is splittable, allowing better workload distribution.



<Return to section navigation list>

Windows Azure SQL Database, Federations and Reporting, Mobile Services

• Jamie Thomson (@jamiet) reported Olympics data available for all on Windows Azure SQL Database and Power View on 11/30/2012:

Are you looking around for some decent test data for your BI demos? Well, if so, Microsoft have provided some data about all medals won at the Olympics Games (1900 to 2008) at OlympicsData workbook - Excel, SSIS, Azure sample; it provides analysis over athletes, countries, medal type, sport, discipline and various other dimensions. The data has been provided in an Excel workbook along with instructions on how to load the data into a Windows Azure SQL Database using SQL Server Integration Services (SSIS).

Frankly though, the rigmarole of standing up your own Windows Azure SQL Database (OK, SQL Azure database) is both costly (SQL Azure isn’t free) and time consuming (the provided instructions aren’t exactly an idiot’s guide, and getting SSIS to work properly with Excel isn’t a barrel of laughs either). To ease the pain for all you BI folks out there that simply want to party on the data, I have loaded it all into the SQL Azure database that I use for hosting AdventureWorks on Azure.

You can read more about AdventureWorks on Azure below however I’ll summarise here by saying it is a SQL Azure database provided for the use of the SQL Server community and which is supported by voluntary donations.

To view the data the credentials you need are:

  • Server mhknbn2kdz.database.windows.net
  • Database AdventureWorks2012
  • User sqlfamily
  • Password sqlf@m1ly

Type those into SSMS and away you go, the data is provided in four tables [olympics].[Sport], [olympics].[Discipline], [olympics].[Event] & [olympics].[Medalist]:
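
If you would rather take a quick peek from code than from SSMS, here is a minimal ADO.NET sketch using the credentials above; the query simply selects the top rows of the medalist table, whatever its columns turn out to be:

using System;
using System.Data.SqlClient;

class OlympicsQuickLook
{
    static void Main()
    {
        // Credentials as published above; encryption is on since this is a public SQL Azure server.
        var builder = new SqlConnectionStringBuilder
        {
            DataSource = "mhknbn2kdz.database.windows.net",
            InitialCatalog = "AdventureWorks2012",
            UserID = "sqlfamily",
            Password = "sqlf@m1ly",
            Encrypt = true
        };

        using (var connection = new SqlConnection(builder.ConnectionString))
        using (var command = new SqlCommand("SELECT TOP 10 * FROM [olympics].[Medalist];", connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Print the first column of each row just to prove the connection works.
                    Console.WriteLine(reader[0]);
                }
            }
        }
    }
}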

I figured this would be a good candidate for a Power View report so I fired up Excel 2013 and built such a report to slice’n’dice through the data – here are some screenshots that should give you a flavour of what is available:

A view of all the available data

All Olympics data

Where do all the gymnastics medals go?

Where do all the gymnastics medals go?

Which countries do top ten all-time medal winners come from?

Which countries do the top 10 medal winners of all time come from?

You get the idea. There is masses of information here and if you have Excel 2013 handy Power View provides a quick and easy way of surfing through it. To save you the bother of setting up the Power View report yourself you can have the one that I took these screenshots from, it is available on my SkyDrive at OlympicsAnalysis.xlsx so just hit the link and download to play to your heart’s content. Party on, people!


As I said above the data is hosted on a SQL Azure database that I use for hosting “AdventureWorks on Azure” which I first announced in March 2012 at AdventureWorks2012 now available for all on SQL Azure. I’ll repeat the pertinent parts of that blog post here:

I am pleased to announce that as of today … [AdventureWorks2012] now resides on SQL Azure and is available for anyone, absolutely anyone, to connect to and use for their own means.

This database is free for you to use but SQL Azure is of course not free so before I give you the credentials please lend me your eyes for a short while longer. AdventureWorks on Azure is being provided for the SQL Server community to use and so I am hoping that that same community will rally around to support this effort by making a voluntary donation to support the upkeep which, going on current pricing, is going to be $119.88 per year. If you would like to contribute to keep AdventureWorks on Azure up and running for that full year please donate via PayPal to adventureworksazure@hotmail.co.uk

Any amount, no matter how small, will help. If those 50+ people that retweeted me beforehand all contributed $2 then that would just about be enough to keep this up for a year. If the community contributes more than we need then there are a number of additional things that could be done:

  • Host additional databases (Northwind anyone??)
  • Host in more datacentres (this first one is in Western Europe)
  • Make a charitable donation

That last one, a charitable donation, is something I would really like to do. The SQL Community have proved before that they can make a significant contribution to charitable organisations through purchasing the SQL Server MVP Deep Dives book and I harbour hopes that AdventureWorks on Azure can continue in that vein. So please, if you think AdventureWorks on Azure is something that is worth supporting please make a contribution.

I’d like to emphasize that last point. If my hosting this Olympics data is useful to you please support this initiative by donating. Thanks in advance.


Josh Twist (@joshtwist) described Using excel to access your Mobile Services data in an 11/28/2012 post to his Joy of Code blog:

If you have Microsoft Excel it’s easy to connect it to your Mobile Service SQL database to take a look at your data. This is great if you use Mobile Services for error reporting etc and want to perform some analysis on those numbers.

First, you need to get the server name for your Mobile Service. Log in to the Windows Azure portal and locate your Mobile Service. Go to the CONFIGURE tab and you’ll see the name of your SQL Database and its Server.


Server names are autogenerated random strings so you may want to make a note.

Now that you have the name of your server, we need to open up the server to accept connections from your machine. Navigate to the SQL Databases tab in the portal:


Once in the SQL Databases part of the portal, click on the SERVERS tab to list all your servers. Now locate the name of your server from earlier, and click on the first column (its name) to select the item.


Next, click on the CONFIGURE tab to configure your server. This will show you a list of allowed IP addresses. The SQL guys have made it easy for you, and provided a quick option to add your own IP to the list.

You should see your current IP (CURRENT CLIENT IP ADDRESS) and they have even provided a button that you can click to enable this IP.


Click the ADD TO ALLOWED IP ADDRESSES button and then press SAVE at the bottom of the screen.


We’re almost done with the portal. The last thing we need to do is get the full server name. Click the DASHBOARD tab


On the right hand side of the DASHBOARD you should see your administrator login name, the option to reset your password (in case you can’t remember it) and the MANAGE URL.


If you already know your admin login and password (you do, right?) then all you need from here is the manage URL. Note that we don’t want the ‘https://’ scheme at the start of the name. Now, in Excel open a new workbook. Click the Data tab and the From Other Sources dropdown and choose From SQL Server.


You’ll now be prompted for the server name. Enter the name from the MANAGE URL field above but without the ‘https://’ scheme at the start.


Click Next; this will connect to your SQL database server. Now you need to choose the database (this is the first part of the SQL Database field in the first screen shot above).


And now you can choose which tables you want to view.


When you’re happy with your choice click Finish (or you can click Next to configure this connection if you’d like to reuse it again in the future, I’ll save that as an exercise for the reader).

Finally, decide how you’d like the data to be imported, I’m going to go for a good old fashioned Table:


And finally you’ll be prompted for those credentials once more…


et Voila!


And don’t worry, any changes you make aren’t synced back to the server so you won’t break your live service.



<Return to section navigation list>

Big Data, Marketplace DataMarket, OData and Cloud Numerics

• Barb Darrow (@gigabarb) reported kudos for Window Azure in her 6 key takeaways from the Obama For America tech team post of 11/29/2012 to GigaOm’s Cloud blog:

You can tell that the Obama For America tech team is used to working in close quarters. At their panel at AWS re: Invent Thursday, the more than a half dozen on stage — including CTO Harper Reed — often finished each other’s sentences, stepped on each other’s lines and generally seemed to have a pretty good time.

It was hard (nigh impossible) to track the banter, but here are some highlights — without attribution. When Amazon posts the video, I will add the link. It’s definitely must-see viewing.

What technologies rocked their world?

There were big shoutouts to Cloudability, Chartbeat, Puppet (see disclosure), Asgard and New Relic. “If you’re not using New Relic, you’re paying too much for devops people and you’re doing it wrong,” said one team member.

What left you cold?

Anti-shout out to Exchange.

What are your cardinal rules?

“We don’t care what technology you run as long as it works … and that’s why we’re all moving to Azure,” noted one member. All that was missing was a drum roll. [Emphasis added.]

Why no women?

There were no females on stage, but the team gave props to women on the team, including Carol Davidsen. She wrote the Optimizer that analyzed TV demographics and allowed the campaign to buy “really cheap” targeted ads instead of using that money to pay for ads on big national outlets. Because of Optimizer, “we were able to go niche,” said Reed.

Techies need to make their workplace more hospitable to women, he said. “There’s a lot of neck beardiness — you guys here need to get better,” he told the audience that was mostly male and mostly (presumably) techies. “Innovation comes from diversity.”

Will their work for the campaign be productized?

They don’t know. But if it doesn’t live beyond the campaign itself, someone noted, “we have failed.”

What next for them?

Some of the team are looking for jobs, so check them out. Starting salary? Oh, about $750K.

Disclosure: Puppet Labs is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, founder of Giga Omni Media, is also a venture partner at True.

Full disclosure: I’m a registered GigaOm analyst.

No significant OData articles today


<Return to section navigation list>

Windows Azure Service Bus, Access Control Services, Caching, Active Directory, WIF and Workflow

•• Vittorio Bertocci (@vibronet) described Using the BootstrapContext property in .NET 4.5 in an 11/30/2012 post:

A few minutes ago, while exchanging mails with some colleagues, I suddenly realized that, although we gave various heads-ups about the change from BootstrapToken to BootstrapContext when moving from WIF v1 to the .NET Framework 4.5, we didn’t really provide the details of the new model in a search engine–friendly fashion.

We do have sample code that shows you how to use the new object model, however that’s cozily packed away in the July 2012 Identity Training Kit (lab WebSitesAndIdentity, exercise Ex3-InvokingViaDelegatedAccess) hence opaque to search engines. Well, let’s shine a bit of light on it with a brief Friday night’s post! :-)

First of all: what the heck am I talking about? Let’s build a bit of context for the readers who didn’t land there while searching for “BootstrapContext” or “BootstrapToken in 4.5”.

With the classic WS-Federation authentication settings for a Web site, a user attempting to access a protected resource (aspx page, controller route, etc) gets redirected to one identity provider (your local ADFS, Windows Azure Active Directory, etc) to sign in. Once the user signs in, his/her browser session gets redirected back to the Web site by some javascript which triggers a POST that, among other things, contains the security token certifying the identity of the user and the successful outcome of the authentication operation. WIF deserializes and validates the token; if everything is as expected, the user is considered authenticated, a session cookie gets created and the content of the token is deserialized into a ClaimsPrincipal. Subsequent requests for Web site resources will be accompanied by the session token, which is enough for WIF to establish that the user is authenticated without the need to re-trigger the identity provider dance. Oh delightful surprise: we call a token that was used to create a session a bootstrap token. So far so good, right? Usual stuff.

Now: by default, the session cookie contains all the necessaire to reconstitute the ClaimsPrincipal at every postback. That includes, for example, the list of claims describing the user; that does NOT include, however, the token from the identity provider that originally carried the user’s claims. After all, the token absolved its function the moment it provided enough data for the Web site to validate it (signature from the expected issuer, validity timeframe, etc) and to obtain the required user claims. Once a token has been used for establishing a session, maintaining its raw bits would significantly bloat the size of the session cookie without obvious reasons: what counts at this point is the set of claims describing the user, how they got communicated is just an implementation detail. Or is it?

Although the above remains valid in the general case, there are a few important exceptions. Sometimes the Web site will want to hold on to the bits of the bootstrap token: the classic example is the case in which the Web site needs to invoke a backend service by flowing (in some capacity) the identity of its user, which in turn requires communicating the identity of said user to an external entity (for example, an STS). The bootstrap token already contains a nicely marshallable-through-boundaries (William Chaucer would NOT like me) representation of the user identity, and in fact various delegation flows (such as the famous ActAs you might have heard about) call for its use.

Fantastic! Now that I did due diligence and provided you some background about what the bootstrap token is and why you would want to access it in your app, we can get to the fun part: how to do it. I’ll just show you, and add the commentary afterwards.

First of all, you need to let WIF know that you want to opt in into saving the bootstrap token in the session cookie. You can accomplish this by flipping a switch in the config file:

 <system.identityModel>
    <identityConfiguration saveBootstrapContext="true">

That’s pretty straightforward. The next fragment is a tad more involved.
Say that you got to the point in your code where you want to access the bootstrap token. Here’s how you do it, formatted for your screens:

BootstrapContext bootstrapContext = 
  ClaimsPrincipal.Current.Identities.First().BootstrapContext 
                                          as BootstrapContext;
SecurityToken st = bootstrapContext.SecurityToken;

When you elect to save the bootstrap token, WIF will make it available to you in the BootstrapContext property of the first (and usually only) ClaimsIdentity of the identities collection stored by ClaimsPrincipal in the current thread.

You might be wondering: why all the casting, first to BootstrapContext and then to SecurityToken? Good eye, my friend! :-)

In v1 we did have a property BootstrapToken, of type SecurityToken, which stored the token bits directly. We were able to offer that property because all classes were in the same assembly, hence we had no restrictions on types.
When we moved the WIF classes in .NET 4.5, we managed to put ClaimsPrincipal at the very core of .NET, mscorlib.dll. That has a long list of advantages, which I won’t restate here; however it also introduces some constraints. Case in point: the SecurityToken class now lives in a different assembly, System.IdentityModel.dll, which has to be explicitly referenced in order to be loaded. That led to the decision of exposing the bootstrap token in a property of type Object, and to rely on you to make the appropriate casting operation. In my experience (I’ve been evangelizing WIF around the world for many years) the situations in which you need to get to the bootstrap token are not super common, and when they occur they are usually fairly advanced, hence I hope that those two extra casting operations won’t add too much burden :-).
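
To close the loop on the delegation scenario mentioned earlier, here is a rough sketch of requesting an ActAs token from an STS using the bootstrap token. This is my own illustration, not the training kit lab; the STS address, binding configuration, credentials and backend realm are all placeholder assumptions that you would replace with whatever your STS actually exposes:

using System.IdentityModel.Protocols.WSTrust;
using System.IdentityModel.Tokens;
using System.Linq;
using System.Security.Claims;
using System.ServiceModel;
using System.ServiceModel.Security;

public static class DelegationHelper
{
    // Requests a delegation token from an STS, flowing the caller's bootstrap token via ActAs.
    public static SecurityToken GetActAsToken()
    {
        // Retrieve the bootstrap token exactly as shown above.
        BootstrapContext bootstrapContext =
            ClaimsPrincipal.Current.Identities.First().BootstrapContext as BootstrapContext;
        SecurityToken bootstrapToken = bootstrapContext.SecurityToken;

        // Placeholder binding and STS endpoint; adjust to match your STS.
        var binding = new WS2007HttpBinding(SecurityMode.TransportWithMessageCredential);
        binding.Security.Message.ClientCredentialType = MessageCredentialType.UserName;
        binding.Security.Message.EstablishSecurityContext = false;

        var factory = new WSTrustChannelFactory(
            binding,
            new EndpointAddress("https://sts.example.com/trust/13/usernamemixed"))
        {
            TrustVersion = TrustVersion.WSTrust13
        };
        // The Web site authenticates to the STS with its own (placeholder) credentials.
        factory.Credentials.UserName.UserName = "webapp-service-account";
        factory.Credentials.UserName.Password = "placeholder";

        var rst = new RequestSecurityToken
        {
            RequestType = RequestTypes.Issue,
            AppliesTo = new EndpointReference("https://backend.example.com/"),
            ActAs = new SecurityTokenElement(bootstrapToken),
            KeyType = KeyTypes.Bearer
        };

        // The issued token can then be attached to the call to the backend service.
        return factory.CreateChannel().Issue(rst);
    }
}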


• Steven Martin (@stevemar_msft) posted Windows Azure Active Directory: Making it easier to establish Identity Management in the cloud to the Windows Azure blog on 11/30/2012:

Identity Management and Access Control are anchors for virtually all critical processes, collaboration and transactions. To that end, it was no accident that Access Control was one of the very first cloud services we delivered within Windows Azure. In the years since, customers have often reminded us that in addition to being highly reliable and easy to use, identity management should be delivered by a cloud directory available for use by any business, large or small, around the world.

Built with the customer insights gained over the years from Windows Server Active Directory (the identity management system that 95% of enterprises use to manage identities in their on-premise computing environments), Windows Azure Active Directory is built for business and built for the cloud.
To help make identity management in the cloud accessible and available to every business and organization in the world, we are announcing today that two key features of Windows Azure Active Directory are available at no charge.

  • Access control provides centralized authentication and authorization by integrating with consumer identity providers, such as Facebook, or using on-premises Windows Server Active Directory. By having Access Control available you can create a single application that allows users to log in with their Organizational Credentials stored in Windows Azure AD or Windows Server AD, or to log in using popular consumer identity services like Microsoft Account, Facebook, Google, or Twitter. Historically, Access Control has been priced based on the number of transactions. We are now making it free.
  • Core Directory & Authentication enables capabilities such as single sign-on, user and group management, directory synchronization and directory federation. These features are currently free in the Windows Azure AD Developer Preview and will remain free after it reaches general availability.

With these changes Microsoft is making it easier than any other vendor to get started with identity management in the cloud. Try out Windows Azure Active Directory today and experience how easy it is to create an identity for your organization.

For more technical details on Active Directory, check out Alex Simons’ posts Enhancements to Windows Azure Active Directory Preview and Windows Azure now supports federation with Windows Server Active Directory.



<Return to section navigation list>

Windows Azure Virtual Machines, Virtual Networks, Web Sites, Connect, RDP and CDN

• Jim O’Neil (@jimoneil) continued his series with Practical Azure #4: Using the Content Delivery Network from Microsoft Dev Radio on 11/30/2012:

Content Delivery Networks are key to delivering great experiences in media and content-rich applications regardless of your end users’ locations, and in Windows Azure it’s dead simple to use. Check out the fourth episode of my DevRadio Practical Azure series to find out how!

Download:
  • MP3
  • MP4 (iPod, Zune HD)
  • High Quality MP4 (iPad, PC)
  • Mid Quality MP4 (WP7, HTML5)
  • High Quality WMV (PC, Xbox, MCE)

And here are the Handy Links for this episode:


• Thomas Conté (@tomconte) described How to use the Windows Azure Service Management API from cURL to create Virtual Machines on 11/29/2012:

Recently a partner asked me for some simple examples of using the Windows Azure Management API from that most universal Swiss army knife of HTTP debug tools, namely the infamous cURL. Specifically, he was looking for an example of how to create a Linux machine from the command line using just cURL.

There is already a blog post out there showing you how to access the Management API using cURL (http://blog.toddysm.com/2012/01/accessing-windows-azure-rest-apis-with-curl.html) however it doesn't cover the creation of VMs, so here are a few more samples.

First, the blog post above shows you how to create a new certificate and upload it to the Windows Azure management portal. But there is a simpler way, using a "Publish Settings" profile, that you use for example with the cross-platform Command Line Tools.

Here is the article that explains how to download this publishing profile: https://www.windowsazure.com/en-us/manage/linux/common-tasks/manage-certificates/

As explained in the article, you can extract from the profile the Base64-encoded certificate necessary to authenticate your calls to the Management API.

Let's say you saved the Base64 version in a file called cert.base64; here's how I decoded it:

python -c 'print open("cert.base64", "rb").read().decode("base64")' > cert.pfx

Then I checked it using OpenSSL: (password is empty)

openssl pkcs12 -info -in cert.pfx

And finally you can convert it to PEM format: (again, password is empty)

openssl pkcs12 -in cert.pfx -out cert.pem -nodes

Now you can use this certificate with the -E option in cURL!

Here are some examples... Again, the goal is to create a Linux virtual machine; in order to do that, we must first create a Hosted Service, and then "deploy" a Linux VM inside this Hosted Service. Here are the steps using cURL. All the URLs include your subscription ID (shown as <subscription-id> below), which you can find in the publishing profile as well.

Create a Hosted Service:

curl -v -d @CreateHostedService.xml \
-H 'Content-Type: application/xml' \
-H 'x-ms-version: 2010-10-28' \
-E cert.pem: \
https://management.core.windows.net/<subscription-id>/services/hostedservices

The CreateHostedService.xml file contains the parameters for the Hosted Service:

<?xml version="1.0" encoding="utf-8"?>
<CreateHostedService xmlns="http://schemas.microsoft.com/windowsazure">
  <ServiceName>testfromapi</ServiceName>
  <Label>VGVzdCBmcm9tIEFQSQ==</Label>
  <Description></Description>
  <Location>West Europe</Location>
</CreateHostedService>

The most important parts in this file are the Location element, in which you indicate the data center you want to create this Hosted Service in (I chose West Europe), and the ServiceName element, that will be used later to deploy a VM.

List the Hosted Service:

curl -H 'Content-Type: application/xml' \
-H 'x-ms-version: 2010-10-28' \
-E cert.pem: \
https://management.core.windows.net/<subscription-id>/services/hostedservices/testfromapi

This should return your newly created Hosted Service.

List OS images:

curl -H 'Content-Type: application/xml' \
-H 'x-ms-version: 2012-03-01' \
-E cert.pem: \
https://management.core.windows.net/<subscription-id>/services/images

This will return a list of all available OS images. Let's use Ubuntu: CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd

Now the last step, create the VM:

curl -v -d @TestLinuxVM.xml \
-H 'Content-Type: application/xml' \
-H 'x-ms-version: 2012-03-01' \
-E cert.pem: \
https://management.core.windows.net/<subscription-id>/services/hostedservices/testfromapi/deployments

Please note the URL references the name of the Hosted Service we created earlier ("testfromapi" in my example).

Now let's look at the XML file:

<Deployment xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/windowsazure">
  <Name>testlinux</Name>
  <DeploymentSlot>Production</DeploymentSlot>
  <Label>testlinux</Label>
  <RoleList>
    <Role i:type="PersistentVMRole">
      <RoleName>testlinux</RoleName>
      <OsVersion i:nil="true" />
      <RoleType>PersistentVMRole</RoleType>
      <ConfigurationSets>
        <ConfigurationSet i:type="LinuxProvisioningConfigurationSet">
          <ConfigurationSetType>LinuxProvisioningConfiguration</ConfigurationSetType>
          <HostName>testlinux</HostName>
          <UserName>azure</UserName>
          <UserPassword>P@ssw0rd!</UserPassword>
          <DisableSshPasswordAuthentication>false</DisableSshPasswordAuthentication>
        </ConfigurationSet>
        <ConfigurationSet i:type="NetworkConfigurationSet">
          <ConfigurationSetType>NetworkConfiguration</ConfigurationSetType>
          <InputEndpoints>
            <InputEndpoint>
              <LocalPort>22</LocalPort>
              <Name>SSH</Name>
              <Protocol>tcp</Protocol>
            </InputEndpoint>
          </InputEndpoints>
        </ConfigurationSet>
      </ConfigurationSets>
      <DataVirtualHardDisks />
      <Label>dGVzdGF0b3Rv</Label>
      <OSVirtualHardDisk>
        <MediaLink>https://portalvhdsfxcplfk8zny2r.blob.core.windows.net/vhds/testfromapi-testlinux-2012-11-29-593.vhd</MediaLink>
        <SourceImageName>CANONICAL__Canonical-Ubuntu-12.04-amd64-server-20120924-en-us-30GB.vhd</SourceImageName>
      </OSVirtualHardDisk>
      <RoleSize>Small</RoleSize>
    </Role>
  </RoleList>
</Deployment>

This one is more complicated! There are good articles on Michael Washam's blog if you want to dig into the details, but here are the important parts:

  • In the LinuxProvisioningConfigurationSet, enter the login and password for the user that will be created on the machine.
  • The InputEndpoint definition opens port 22 (SSH).
  • In the OSVirtualHardDisk section, you must indicate the URL of the blob that will contain the new OS Disk for the machine; you can choose any name for the VHD, as long as it is unique! Also make sure the URL points to a Storage Account located in the same data center as the Hosted Service you created above, otherwise the call will fail (you cannot boot a machine in Dublin with an OS Disk in Singapore :)

This should give you a running Linux VM.

You can retrieve the details of the VM once it's created:

curl -H 'Content-Type: application/xml' \
-H 'x-ms-version: 2012-08-01' \
-E cert.pem: \
https://management.core.windows.net/<subscription-id>/services/hostedservices/testfromapi/deployments/testlinux

You will get an XML response including, among other details, the status of the instance (Running, Stopped, etc.) and, most importantly, the public port that was assigned for SSH when the VM was created (look for the PublicPort element). This should let you connect to your new VM via SSH.
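
If you want to script that last step rather than eyeball the XML, here is a minimal sketch that parses a saved copy of the response with LINQ to XML and pulls out the PublicPort element. The file name is an assumption (save the curl output with -o deployment.xml), and the namespace is the same windowsazure schema used in the payloads above:

using System;
using System.Linq;
using System.Xml.Linq;

class DeploymentDetails
{
    static void Main()
    {
        // Assumes the GET deployment response was saved as deployment.xml,
        // e.g. by appending "-o deployment.xml" to the curl command above.
        XNamespace ns = "http://schemas.microsoft.com/windowsazure";
        XDocument doc = XDocument.Load("deployment.xml");

        // This VM only exposes the SSH endpoint, so the first PublicPort element is the one we want.
        var publicPort = doc.Descendants(ns + "PublicPort").FirstOrDefault();
        Console.WriteLine("SSH public port: " + (string)publicPort);
    }
}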


Yung Chou announced the availability of his TechNet Radio: Virtually Speaking with Yung Chou – Managing VMs with System Center 2012 App Controller in Windows Azure presentation on 11/29/2012:

In today’s episode Yung Chou shows us how to use System Center 2012 App Controller to easily configure, deploy and manage virtual machines and services across private and public clouds. In part one of this series he demos for us how to connect App Controller to Windows Azure.


Download

After watching this video, follow these next steps:

Step #1 – Start Your Free 90 Day Trial of Windows Azure and deploy VMs in the cloud
Step #2 – Download and install Windows Server 2012 and System Center 2012
Step #3 – Learn, build, and experiment IaaS

Resources:

Websites & Blogs:

Videos:

Virtual Labs:


Michael K. Campbell posted Taking Windows Azure Virtual Machines for a Test Drive to the DevPro blog on 11/29/2012:

In my articles "Microsoft Azure and the Allure of 100 Percent Application Availability" and "Microsoft Windows Azure: Why I Still Haven't Tried It," I wrote about Windows Azure and my experiences with the platform. Since then I've had some great experiences working with SQL Azure, largely due to some fantastic uptimes. However, I've also wanted to test the Windows Azure VM Role, which has recently become much easier to take for a test drive than what was previously possible. Although I'm still evaluating how these virtual machines (VMs) stack up against competing offerings and their ability to handle my specific workloads, I'm already seeing that there are a lot of things that make Azure VMs an attractive option.

Easier VM Provisioning Means There's No Need to Upload VHD Files

Windows Azure VMs have been available for a while now, so it's not like they just became available for testing. However, Microsoft recently announced new Azure Management Portal improvements that include the ability to quickly provision VMs directly from existing base images instead of having to create them by uploading VHD files to Microsoft's data centers. This enhancement was the exact breakthrough that I've been waiting for that led me to take these VMs for a test drive. Although the ability to take an on-premises machine and upload it to the cloud was a neat option, up until now it was simply way too encumbering and labor intensive for my tastes, to the point that it has prevented me from trying out the Windows Azure VM Role. As I've been wondering about whether Windows Azure VMs would be worth my time and effort, I've actually been very happy using competing offerings -- most especially with Cytanium's cloud offerings where I've been able to find dirt-cheap VMs that have experienced excellent uptime and performance.

Great Pricing and Features

Although I've been more than happy with my Cytanium VM, pricing is one factor that has kept the Windows Azure VM Role in the back of my mind. Azure's VMs appear to be priced slightly higher compared to competing offerings, but the one big exception is Windows Azure's Extra-Small (XS) VM role. For roughly $9 per month, this role provides a VM with 768MB of RAM, a shared CPU, and roughly 40GB of storage with 20GB allocated to a system drive and another 20GB assigned for application data. Granted, that's an obscenely tiny and low-powered server, but the application that I'm going to be hosting doesn't need very much horsepower at all. Instead, what I actually need are several low-end hosts that can be geographically distributed throughout the globe for better availability testing and redundancy purposes. In my case, the Azure XS role is actually a decent fit (even if I couldn't actually wait long enough for a Windows Server 2008 R2 XS instance to boot, whereas a Windows 2012 XS instance took a few minutes to boot but was responsive once it was up and running).

Another big thing that has kept me interested in potentially using Azure VMs is the fact that Microsoft has several data centers that are spread throughout the world. This option is something that makes Azure more attractive in terms of providing options that can alleviate potential outages stemming from natural disasters, along with the kinds of manmade disasters that might otherwise render an entire data center offline.

As I started looking into some of the redundancy options available for Azure VMs, I bumped into several impressive findings. The following highlights some of the bigger benefits that I found:

  • Availability Sets. In terms of availability, one of the realities of hosting entire VMs is that they will periodically need to be rebooted, which is an unfortunate consequence in terms of what that does to uptime. Furthermore, Microsoft acknowledges that the company periodically needs to update hosted VMs to keep everything running smoothly and securely. Accordingly, Microsoft has created a set of incredibly easy configuration options that you can use to create pools or groups of linked servers so you can distribute downtime by putting machines into different availability sets to decrease downtime overall.
  • Load Balancing. Along the same lines, Azure VMs can be easily teamed or load-balanced to not only increase uptime, but also to serve as a mechanism for an even distribution of workloads among multiple peered machines.
  • Linux VMs. Okay, this one doesn't have anything to do with availability or redundancy -- at least for my purposes. However, I find it remarkable that Microsoft is actually offering Linux hosting at all. Linux hosting lets me easily deploy a whole host of application servers (presumably running Windows) for an application and then spin up a Memcached appliance when needed. In other words, the mere fact that Microsoft provides Linux hosting proves to me that there are still departments at Microsoft who get it, which is sadly the complete opposite of what I frequently complain and kvetch about in terms of what we see from the Windows and Office folks who frequently drive me batty.

In the end, I'm still in the process of evaluating how well Azure VMs will meet my needs. But so far I like what I'm seeing especially in terms of the features and options that are geared toward helping me achieve greater uptime and availability.

Related Content:

Magnus Mårtensson (@noopman) described Continuous Delivery to Windows Azure Not Only Microsoft Style with TeamCity in 14 detailed steps on 11/29/2012:

Here is the write up on how to use Git, GitHub and TeamCity to push-build-test your Windows Azure Web Sites (WAWS) and deploy them by means of Continuous Delivery (CD) to Windows Azure. Because the Windows Azure Platform is so openly manageable and supports open standards and protocols, Not Only Microsoft (noMicrosoft) products are enabled to take full advantage of the power of no-friction, no-hassle publication to Windows Azure! After reading this blog post you can remove any deployment work from your teams and put full focus on building your Applications to provide the fastest Business Value too! We are also going to run TeamCity on a Windows Azure Virtual Machine (VM or WAVM).

This scenario is for those who already have and/or want to have complete and full control of their Continuous Integration environment and like to use noMicrosoft products. The rest of us can just very happily go on and use Team Foundation Service TFService. (You can if you want have Gangnam Style for your CD! Sorry couldn’t resist.)

Shameless plug and “comical” side note

The Windows Azure Fast (AzureFast) contest is still on-going it seems and you can check (and if you like support with votes and comments) my video “Windows Azure ‘Super Fast’ Web Sites” here:

This is a true story and I hope you like it as much as I enjoyed producing it!

Back to order with a gentle disclaimer: There is serious risk this post will run long since it contains quite a few steps. I am redoing the whole thing as I type this, in order to make sure it’s reproducible and to be able to get the screen shots for this post.

Background

At the recent Windows Azure Conf (AzureConf) I did a session entitled Continuous Delivery Zen with Windows Azure:

Html 5 embedded version here (or go to the link above):

I wanted to prove that any kind of third party tooling-interaction works well with Windows Azure. When I asked my friends at JetBrains if it was possible to use TeamCity to do Continuous Delivery they naturally said ‘yes’ and some digging and researching ensued. It was not completely easy to nudge everything into place but most of the steps were really powerful and easy to perform.

In the video I make a promise at 46:18 that I would write this blog post after the conference. There was just one more thing I wanted to fix in my scenario ;~) before this blog post. That thing – pushing to Git in Windows Azure from TeamCity – turned out to be a little tricky for a relative Git n00b such as myself so that, naturally, took a while. But without any (further) ado here are the steps to reproduce for your projects:

What we are going to do
  1. Create a Windows Azure Virtual Machine (VM) with Windows Server 2012.
  2. Remote to our server and install TeamCity on it.
  3. During the installation I will add a data disk to the machine stored in Windows Azure Storage. This disk will be used by TeamCity to store data and configuration related to the running of my TeamCity Services.
  4. Also I will configure port forwarding on port 80 in the Windows Azure Portal to my VM and configure the local firewall on my Windows Server to allow traffic on this port. The same port will naturally be configured in the TeamCity setup wizard.
  5. Use temporary storage for the TeamCity Build Agent.
  6. Enable MSBuild on the VM
  7. Install Git on my VM.
  8. Create a new Windows Azure Web Site where our site will be deployed.
  9. Set up my Web Site to support Git publishing. I will NOT set up GitHub integration for the Web Site. TeamCity will be monitoring my GitHub repository for any changes and pull that code when I push.
  10. Create a repository on GitHub to store my web site code.
  11. Clone this repository to my local machine.
  12. Create a brand new template ASP.NET MVC 4.0 Web Site in the local repository with a Unit Test project.
  13. Push my web site to the GitHub repository.
  14. Configure TeamCity to pull any changes from my Repository, build the code, run the Unit Tests and push my Web Site on to Git for my Web Site in Windows Azure.

After all of this, each time TeamCity pushes a new version to Git on Windows Azure that server will (once again build my Web Site) version my push and deploy my new version of the Web Site.

As you can see there are a LOT of steps to be done here. So let’s get going!

Taking advantage of the redeploy old version feature in the Windows Azure portal

A quick note that compared to my demo at AzureConf the last steps have changed slightly. That was the tricky change I wanted to get done before this write up. At AzureConf, I mentioned that I use WebDeploy. This version below does not. Instead I push my code using a TeamCity Command Line task to push my site to my Git repository in Windows Azure. The advantage of this approach is that the Windows Azure Git server will keep a record of my old deployments and enable me to “step back in time” by redeploying an old version of my site. This process takes seconds only and is very powerful. I will show you this near the end of my post. (Don’t scroll down just read on, dear reader.)

Step 1) Create a Windows Azure Virtual Machine with Windows Server 2012

Nothing could be simpler than getting full control of your very own VM in Windows Azure! In the Windows Azure Portal you simply click New => Compute => Virtual Machine => Quick Create. Set a VM name, which is the DNS name. Mine is teamcityrocks.cloudapp.net. Select Windows Server 2012; a Small will do. Set the Administrator password. Choose the location. I picked North Europe, aka. Dublin, Ireland. Hit “Create Virtual Machine”!

Creating a Virtual Machine in Windows Azure.

It takes Windows Azure a few minutes to allocate a slot in a physical VM hosting server, download a fresh image of our server to it, launch it, configure it and hook it into the network and load balancers to supply it with its DNS name. Great time to fetch coffee. Or beer.

The VM is Created - a brand new Windows Server 2012

Step 2) Remote to the VM and install TeamCity

In fact I will not click the “Finish” step in the TeamCity installation wizard until step 4.

A newly created VM in Windows Azure has only one endpoint configured on it. That’s the Remote Desktop endpoint. If that was not in there it would be impossible to connect to the server.

There is only one endpoint by default on the VMs and that's the Remote Desktop Endpoint

All you need to do now is hit connect at the bottom of the page and log in using the credentials you set up for ‘Administrator’ when you created the VM.

Click connect to remote to the instance.

Et voilà! ;~) (I know just a few phrases of French.)

Our new Windows Azure Virtual Machine

Next download the latest release of TeamCity on the JetBrains Web Site. In my case that turned out to be TeamCity 7.1.2. TeamCity has a 60-day evaluation period which is perfect for this demo.

Note: Naturally you can install TeamCity any way you please or you can use a pre-existing already installed TeamCity Server and simply skip this step.

Download and launch the TeamCity installer.

Follow all of the default installation steps until “Specify Server Configuration Directory”.

Step 3) Add a Data Disk to the VM for TeamCity data storage

You are able to specify where you want to store data for TeamCity during this setup step. If you open up Windows Explorer on the VM you’ll note that you have two disks by default on the VM:

  • C: the OS disk on which you are currently installing TeamCity.
  • D: Labeled ‘Temporary Storage’. According to the WAVM documentation “Each VHD also has a temporary storage disk, which is labeled as drive D. The applications and processes running in the VM can use this disk for transient and temporary storage.” We will use this disk later as a temporary work location for TeamCity.

We want one more disk. This disk will be for persistent storage of data TeamCity will want to keep. Two really cool things about Cloud features that you can take advantage of in Windows Azure in this case are “self-provisioning” and “pay-as-you-go”. We are simply going to requisition a data disk which will be stored as a “Page Blob” in Windows Azure Storage (WA Storage blog). The size of this page blob as we define it isn’t really very important because even if I define a disk of say 1 TB and only use a fraction of this for storage I will only pay for what I use in Windows Azure! Amazing.

To add a data disk to a VM you go to the Windows Azure portal, browse to the VM and select “Attach” => “Attach empty disk”. I went with a 100 GB data disk for this demo.

Attach an empty disk to the virutal machine

It is also possible to later detach data disks from VMs right here in the portal if you want to copy that data somewhere else. Or you can upload a pre-existing data disk as a VHD to Windows Azure Storage and migrate your current TeamCity server to a VM in Windows Azure. Possibilities are endless.

Once the adding of the disk completes you head back to your Remote Desktop connection and open up “Disk Management”. Immediately your system recognizes the new disk and prompts you to “Initialize Disk”. Initialize it and then right click the disk in Disk Management and select to create a “New Simple Volume…”. Follow that wizard and perhaps label your disk “Data”. Once that completes you have a brand new empty data disk attached to your VM:

Our new volume on the virtual machine

We return now to the TeamCity installation wizard we are following and specify our new data disk for “server configuration directory”. In my case the drive was “F:\”. The TeamCity installation now commences.

Step 4) Set port 80 in the Windows Azure portal, in the server firewall and in TeamCity

After a while you get to specify ‘80’ for the “TeamCity server port”. In order for traffic to flow from the Internet to TeamCity you have to configure and open up that path through the data center and into your Windows Server.

In the Windows Azure portal select the VM and go to “Endpoints” => “Add Endpoint”.

Add an endpoint to the virtual machine

I configured it to be “TeamCityHttp” with external and internal port 80. Our Windows Server does not have the Internet Information Server Service activated by default so that port should be available to us.

Specify endpoint details

Once that port is configured in the Windows Azure Load Balancers (by defining it in the Windows Azure Portal) we return to the Remote Desktop and open up “Windows Firewall with Advanced Security”. Go to “Inbound Rules” => “New Rule…”. Select “Port”, “Specific local port” = 80 and “Allow the Connection”. Finish the Wizard by naming your Firewall rule aptly:

Configure the same port in the Windows Server Firewall

Step 5) Use temporary storage for the TeamCity Build Agent

After the TeamCity Server has installed the next step is to configure the default Build Agent. TeamCity works in such a way that the Service will control a set of Build Agents which will perform the actual builds for the Service. The only thing I have configured here is to use our VM Temporary storage for Build Agent “tempDir” and “workDir”. They are set to ‘d:\temp’ and ‘d:\work’ respectively.

Set temp and work directories in TeamCity to the temporary disk.

Next step is to configure the accounts for running the TeamCity Services. I selected “SYSTEM account” for both of them; Build Agent and TeamCity Server.

The installation completes and you get to browse to the TeamCity server on the local machine, the VM over Remote Desktop.

TeamCity is installed

Hit “Proceed” and let TeamCity create its database, initialize and start up. Accept the “License Agreement for JetBrains® TeamCity”.

Create the TeamCity administrator account.

Create an admin account

And so the installation of TeamCity on a Windows Azure Virtual Machine with Windows Server 2012 concludes!

You can now browse to the TeamCity Services using the VM:

TeamCity is now installed

Or even better in your browser as usual:

Browse to the VM externally

We will configure the TeamCity Services a little bit later. There is one more thing we need to do on our VM and that is to install Git.

Step 6) Enable MSBuild on the VM

At this step you have to make sure you can run the latest version of MSBuild on the build server. You will also need the correct Visual Studio Build Process Templates on the VM. This might mean you have to install Visual Studio on the Build Server. What you need to do to make sure you live up to the Microsoft License requirements is up to you. For this demo I will copy my msbuild folder from C:\Program Files (x86)\MSBuild to the same location on the VM. This can be accomplished with a copy from local machine and paste to the build server. Please note that this is probably not the way you need to set this up for a production build environment, but I’ll leave this step up to you.

Enable msbuild templates on the VM

Step 7) Install Git on the VM

When we (finally get to) push our Web Site to Windows Azure we will do so using Git. Therefore we need to download Git for Windows and install it on the VM. I happened to get the Git-1.8.0-preview20121022 version.

Download and launch Git on the server.

The only thing I changed in the Git installation Wizard was to “Run Git from the Windows Command Prompt” because that is what we are going to do from TeamCity when we deploy our WAWS.

Set to use Git from the Command Prompt

Step through and finish the rest of the installation. Verify that your server is property Git-Enabled by opening up a command window and typing the command ‘Git’.

Test that git was properly installed.

You should get a response saying that git offers you a set of commands to run.

Step 8) Create a new Windows Azure Web Site where our site will be deployed

Now it’s time to start creating the site we want to deploy to in Windows Azure. This task is accomplished in the Windows Azure Portal under New => Compute => Web Site => Quick Create. Enter a url and a region and hit “Create Web Site”.

Create a new Windows Azure Web Site (WAWS)

Seconds later the WAWS is created:

Creating a WAWS takes only seconds

And you can browse to a nice template Web Site with a public Url in Windows Azure: (My site was called continuousdeliverywithteamcity.azurewebsites.net.)

A new WAWS site has only a template site.

Step 9) Set up the Web Site to support Git publishing

Click on the Web Site in the portal and then on “Set up Git publishing”. A Git repository for your WAWS is created in Windows Azure.

Set up a Git Repository in Windows Azure for your Web Site.

And finishes:

Git is enabled in seconds.

Right now you can go and integrate with a project you already have on CodePlex, GitHub or BitBucket. We are NOT going to do that here. Instead we want TeamCity to monitor changes in our Source Control system and take action as we push/commit.

Another option is to not use Git at all but instead opt to integrate with the Team Foundation Service that Microsoft provides. This may be a better option for teams that are more used to running TFS.

Step 10) Create a repository on GitHub to store my web site code

For this demo I am creating a new empty GitHub repository. Naturally you can use any repository with a Web Site you already own. For the demo feel free to follow along and just create a new demo repository.

Create a Git Repository in GitHub for Source Control.

I selected to add a .gitignore file for csharp. We will make one modification to this file later on.

Step 11) Clone this repository to my local machine

I like to use GitHub for Windows and the Git Shell.

Copy the Git Http Read+Write Access Url from the newly created GitHub repository. Mine was https://github.com/MagnusDemo/teamcitydemowebsite.git

Open up the Git Shell in a new directory and use the git clone command:

git clone https://github.com/MagnusDemo/teamcitydemowebsite.git

You now have the Git repository cloned from GitHub to your local system.

Clone the GitHub repository to the local machine

Step 12) Create your Web Site in the Git repository

Naturally, again, you can use any Web Site you like here. It can be ASP.NET or PHP out of the box and you can use your favorite (Visual Studio) IDE. Honestly use notepad if you want! I’m going to go with Visual Studio and do File => New => Project => ASP.NET MVC 4.0 (and .NET Framework 4.5) Web Application:

Create an empty ASP.NET MVC 4.0 project

I also opt to create a Unit Test project:

Also create a Unit Test project

When the project has been created I make just a simple modification to the template on the index page: “Team City Deployed this!”

Make a custom change to the template

You can naturally configure your VM to run the Microsoft unit test framework, but we will swap that for NUnit in this demo. Remove the reference to Microsoft.VisualStudio.QualityTools.UnitTestFramework.dll in your Unit Test project. Then right-click on “References” and select “Manage NuGet packages…”. Search for “NUnit” and install it in your project. Then go into your HomeControllerTests.cs and correct it to use NUnit:

Set up NUnit
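For reference, here is a minimal sketch of what the converted HomeControllerTests.cs might look like. It assumes the default HomeController and the namespaces that the MVC 4 template generated for my project names; your generated code will likely differ a little.

using NUnit.Framework;
using System.Web.Mvc;
using TeamCityDemoWebApplication.Controllers;

namespace TeamCityDemoWebApplication.Tests.Controllers
{
    // [TestFixture] and [Test] replace MSTest's [TestClass] and [TestMethod]
    [TestFixture]
    public class HomeControllerTests
    {
        [Test]
        public void Index_ReturnsTheIndexView()
        {
            // Arrange: create the controller generated by the MVC 4 template
            var controller = new HomeController();

            // Act: invoke the Index action
            var result = controller.Index() as ViewResult;

            // Assert: NUnit's Assert replaces the MSTest Assert class
            Assert.IsNotNull(result);
        }
    }
}

The important change is simply swapping the MSTest attributes and Assert class for their NUnit counterparts.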

Now you should hit F5 in Visual Studio to ensure your Web Application runs locally. Also you should run the Unit Tests in your Solution to see that those run OK.

Step 13) Push my web site to the GitHub repository

When you use a modern project template in Visual Studio it is going to use the NuGet Package Manager and install some packages as library references along with your template. For the default setup I chose there are 30 (!) such packages. When you build your code it is possible to “Enable NuGet Package Restore” for your solution. We are not going to use that method here, though I’m certain that is also a good option.

Do NOT do this:

It is possible to set up NuGet package restore for a solution

When you enable this, the build process will include a step that downloads any missing NuGet packages before compiling your code. Again, we are not doing this.

Instead we are going to make one small change to our .gitignore file which was added when we created the GitHub repository. We will comment out the ignore for “packages” which will make it so that we will commit all of our packages along with our Solution.

Make sure packages are pushed to GitHub.

The default entry is “packages” and we change it to “#packages”, effectively removing that ignore.

Now it’s time to run the following Git commands at our Git Shell prompt:

  • git add . (Adds all of our files to the Git Repository.)
  • git commit -am “Initial Commit.” (Commits my changes, i.e. all of the adds, with the message “Initial Commit.”)
  • git push (Pushes the local commit to the remote repository – the GitHub source.)

323 files just got pushed to GitHub.

Add, commit and push the lot to GitHub

(Can I just comment that I think that’s a LOT of files for being just a template web application? But that’s a side note and not really important right now.)

Over in your repository on GitHub in the browser you can now verify that indeed the code has been pushed there.

See the Source Code pushed to GitHub

Intermission

Wow – that’s a lot of work that we have just done! Time to just step back and see what we’ve done so far, where we are at, and what remains:

  • We have a WAVM running Windows Server 2012, TeamCity and Git.
  • We have a WAWS configured with Git.
  • We have a GitHub repository.
  • We have a Solution with a Web Application and a Unit Test project pushed to that repository.

The thing we have not done is configure TeamCity to tie all of this together. That’s our next and final big step. The last step is done automatically by Windows Azure and then we have some verifications to do. Plus I promised you I’d cover Windows Azure integrated Web Application Version Control and the “stepping back in time” feature.

Step 14) Configure TeamCity

Here is what we want TeamCity to do:

  • Notice when we push changes to the GitHub repo.
  • Pull those changes down locally on the VM.
  • Build the Solution.
  • Run the Unit Tests.
  • If everything succeeds go ahead and push the new Web Application to the Git Server on Windows Azure.

Go back to the browser and the TeamCity tab and click on “Create a project”. Name the project appropriately.

Create a new TeamCity Project

On the next screen select “Create build configuration” and enter the following data:

Create a Build Configuration

Then click on “VCS Settings >>”

On the screen “Version Control Settings” I select “VCS Checkout Mode” = “Automatically on Agent (if supported by VCS roots)” and then click above that on “Create and attach a new VCS root”.

Add Version Control Settings that point back to GitHub

Select “Git” from the drop down “Type of VCS”.

The “Fetch Url” is the copied from the GitHub Repository under the “Git Read-Only” link. Mine is git://github.com/MagnusDemo/teamcitydemowebsite.git

Set the Git read url

I made one more change here. For some reason I had problems with the version of git bundled with TeamCity when pulling code from GitHub in this demo. Instead of figuring out why, I set “Path to Git” = “C:\Program Files (x86)\Git\bin\git.exe”.

Set the path to the Git.exe

No other settings or changes are required in this screen. At the bottom “Test Connection” to make sure TeamCity can read your GitHub repository. Then hit “Save”. Back in “Version Control Settings” click “Add Build Step >>” and then select “Runner Type” = “Visual Studio (sln)” since we are building a Visual Studio Solution.

Name your Build step and select the “Solution File Path”:

Select the Solution to build

The result should be something like this:

The first build step is set up

Add another build step and make it an “NUnit“ step.

Create an NUnit build step for the tests

In this screen the most important part is to set the path to the .dll containing your tests. Mine was “TeamCityDemoWebApplication.Tests\bin\Release\TeamCityDemoWebApplication.Tests.dll”. Also set .NET Runtime Version to 4.0.

Configure the version of the tests and point to the unit tests

Your result after this should be two build steps: 1) Build in Release mode, and 2) Run the Unit Tests.

Both build and test build steps are now added

Next go back to the TeamCity Project overview screen (one step up) and “Create build configuration” again. This configuration is for pushing the Solution to Git in Windows Azure.

Create a Build Configuration for the deployment step

Select our existing “VCS root” and check “Display options” for “Show changes from snapshot dependencies”. Then “Add Build Step” = “Command Line”. On this screen there are a few specifics we need to control.

First we need to set the path to the .exe we want to use. That would be the path to the git.exe installed on the VM previously. In my case it’s “C:\Program Files (x86)\Git\bin\git.exe”.

Second we need to supply Command Parameters to make a git push. The base for this command is

git push <destination> <branch>

The command is “git push” the destination will be figured out below and the branch will be ‘master’ which is the default branch and the only one we have.

Regarding the destination we have to figure out how to git push with user credentials in one go. There are a few things we need to learn here:

A) Supplying a username and password in a git request means prepending them to the destination url in the following format:

https://<username>:<password>@<destination url>/<repositoryname>.git

If we don’t supply the credentials we will be prompted for username and/or password and this won’t work when running a command line on a build server. We have to push with no extra prompts.

B) We need to find the credentials to use for access to our Git Repository in Windows Azure for our WAWS.

Each user has ONE (and only one) set of personal credentials for accessing deployments in Windows Azure. You can set and reset those in the portal. The same credentials will be used for all of your WAWS. These are NOT the credentials we will be using here.

Additionally each WAWS you create also gets a set of Web Site specific credentials. This is one special set of credentials for one WAWS. There is a really neat trick to how you can find the base for this url in the WA Portal.

Go to the WAWS in the Windows Azure Portal and click into the Configure section. Copy the Git “Deployment Trigger Url”. Mine is:

https://$continuousdeliverywithteamcity:K6bTXnAxtwNDbGDpuNC8k3yBcJyw4y2x6lrSZKJP6G3hKmCvb6Wr6XQ0j0g4@continuousdeliverywithteamcity.scm.azurewebsites.net/deploy

Yes I am showing you my username and password here but it’s not a problem. By the time I publish this blog post I will have deleted this demo WAWS so you won’t be able to use the credentials for anything.

The only thing we need to do now is modify the end of the url slightly and replace “deploy” with our WAWS name followed by “.git”:

https://$continuousdeliverywithteamcity:K6bTXnAxtwNDbGDpuNC8k3yBcJyw4y2x6lrSZKJP6G3hKmCvb6Wr6XQ0j0g4@continuousdeliverywithteamcity.scm.azurewebsites.net/continuousdeliverywithteamcity.git

Putting this Url together with the rest of the command makes the whole thing this in my case:

git push https://$continuousdeliverywithteamcity:K6bTXnAxtwNDbGDpuNC8k3yBcJyw4y2x6lrSZKJP6G3hKmCvb6Wr6XQ0j0g4@continuousdeliverywithteamcity.scm.azurewebsites.net/continuousdeliverywithteamcity.git master

Paste your version into “Command Parameters” and hit Save.

The settings for the Git push Command Line step

This should be similar to your result:

The Deploy step defined

There are two more things we want to configure in the TeamCity Deploy Step: 1) Build Triggers and 2) Dependencies.

Start by clicking on “Dependencies” => “Add new Snapshot Dependency” and select the “Full Build” to depend on.

Add a build dependency

Next click on “Add new artifact dependency” and set that to the same “Full Build” (selected by default for me). Configure the rest of the settings same as this screen shot:

Add an artifact dependency

Now click on “5 Build Triggers” => “Add new trigger” => “VCS Trigger” and configure it accordingly:

Add a build trigger

This should be the result of adding the Build Trigger:

Added the build trigger

Now you have your TeamCity Project set up with two Build Configurations.

Installation and configuration are DONE! I know there has been a lot of configuration in this step of the blog post, but NOW it’s time to test this bad girl!

TeamCity Deploy Project with two parts.

Test the result

Click on “Projects” in the top left and you have a screen like this. Click on “Run” for the “Full Build” project and hopefully, fingers crossed, some magic will happen!

When the Full Build and Deploy Build Configurations both have run it should look like this:

First successful build on both parts

The last thing TeamCity does is push to the Git repository in Windows Azure for our WAWS.

Next the Windows Azure Git Server will take over and do the following with the Web Site:

  • Build it. (Again)
  • Store the new version for the future.
  • Deploy the new version.

All this “just happens” for us automatically. It’s the default setup in Windows Azure.

Go back to the tab you had open to your site (or reopen one if you have closed it). My site url was http://continuousdeliverywithteamcity.azurewebsites.net/. Here is my result:

The site successfully deployed

Make the following change to one of your unit tests and push that change to GitHub:

Assert.Fail(“Making sure TeamCity runs my tests.”);

This should be similar to your result. As you can see the Full Build failed on the build server because one test failed.

Fail the build on purpose due to a failing Unit Test

Remove the fail from the test and make a new change to your index page. Now we will deploy a newer version of the code and see that this is automatically deployed to our WAWS:

Make a new change to the index page

After pushing these TWO changes to GitHub the build passes through my TeamCity Service and deploys to my WAWS:

The new version is now deployed

The “Step Back in Time” feature

After having done all of this it’s nice to get a little treat! We have now deployed twice to our WAWS. One time with the message “Team City Deployed this!” and the second time with the message “Team City Deployed this! New Change!”. If you go to the portal and click on Deployments for the WAWS you will see them both:

Two deployment versions in the Windows Azure Portal

The mind-blowing treat is that you can click on the previous deployment and select “Redeploy” at the bottom of the page. This will redeploy your previous Web Site version in a matter of seconds!

Step back in time and redeploy the old version

When you browse back to the site and refresh you can see the result:

The old version is now redeployed

Yes, that’s right: you are back to the previous version of the site! But wait, there’s more! We are running this Web Site on one single free instance. Even if you scale this Web Site out to multiple instances, say three instances serving your Web Application for three times the throughput, the steps back and forth in history are just as fast. All of your Web Sites, even scaled out, will redeploy any version you pick in sync! That is just #awesome!

(This is the reason I did not want to use WebDeploy from our TeamCity server, because then I’d effectively bypass this built-in functionality. Sure, I could do the same from TeamCity by re-running a previous Deploy build step, but that’s not integrated, is it?)

Summary

Rounding off, I re-remind myself to write shorter blog posts. #EpicFail

Most of this setup is really simple and straightforward. TeamCity is an excellent tool for those who love to take 100% control of their build environment, but I’m sorry to say its advanced nature takes some learning and getting used to. Once you master the tool, though, you have awesome power! (And awesome responsibility.)

Before I sign off I’d like to take this opportunity to thank the people who helped out in my research with git and TeamCity stumblings. I want to call them out and give credit where credit is due:

  • Maxim Manuylov and Hadi Hariri at JetBrains for helping me set up and configure TeamCity.
  • David Ebbo for assisting me with git and that gnarly one-liner I needed to push from TeamCity to WAWS.
  • Brady Gaster for cheering on and giving helpful hints and encouragement.

This is it folks – Every time my code is pushed it gets built, tested and deployed automagically using TeamCity in Windows Azure to my Windows Azure Web Site. Build fails and Unit Test fails will stop the deployment. If some other kind of logical error gets deployed I can step back in time and redeploy an old version of my site in a matter of seconds!

image_thumb11


<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

The Windows Azure Customer Advisory Team (CAT) reported We Have Moved to MSDN on or before 11/30/2012:

image_thumb75_thumb5All new content will be located on the MSDN Library.

See you there!

image_thumb22


<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

The Visual Studio LightSwitch Team (@VSLightSwitch) added How to: Create or Add an HTML Client Project and other topics to their HTML Client Screens MSDN documentation:

imageYou can use an HTML client project to add cross-browser, mobile web client capabilities to your Visual Studio LightSwitch application. HTML clients are based on HTML5 and JavaScript and optimized for touch-first displays on mobile devices such as tablet computers running a variety of operating systems.

You can add an HTML Client project to an existing LightSwitch desktop or web application, or you can create a solution that contains only an HTML Client project. In either case, a single Server project contains the data layer for your application. Each solution can contain multiple HTML Client projects but only one Desktop Client project.

Note

When you add an HTML Client project to a LightSwitch solution, it’s upgraded to Microsoft LightSwitch HTML Client Preview 2 for Visual Studio 2012. The file structure of the solution is modified, and you can no longer open the solution on a computer that doesn’t have Microsoft LightSwitch HTML Client Preview 2 for Visual Studio 2012 installed.

To create an HTML Client solution
  1. On the menu bar, choose File, New, Project.

  2. In the New Project dialog box, choose either the LightSwitch HTML Application (VB) or LightSwitch HTML Application (C#) template.

    Note

    JavaScript is the programming language for the HTML Client project. Depending on your choice, either Visual Basic or C# is the programming language for the Server project.

  3. Name the project, and then choose the OK button.

    The solution is added to Solution Explorer with Server and Client projects.

To add an HTML Client project to an existing solution
  1. In Solution Explorer, choose the Solution node.

  2. On the menu bar, choose Project, Add Client.

  3. In the Add Client dialog box, choose HTML Client.

  4. Name the project, and then choose the OK button.

    A client project is added to Solution Explorer and set as the Startup project.


Beth Massi (@bethmassi) reported New LightSwitch VS 2012 “How Do I?” Videos Released! in an 11/29 post:

imageCheck it out! We just released a couple new videos on the LightSwitch Developer Center’s “How Do I?” video section. These videos continue the Visual Studio 2012 series where I walk through the new features available in LightSwitch in VS 2012.

How Do I: Perform Automatic Row-Level Filtering of Data?

In this video, see how you can perform row level filtering by using the new Filter methods in LightSwitch in Visual Studio 2012. Since LightSwitch is all about data, one of the features we added in Visual Studio 2012 is the ability to filter sets of data no matter how or what client is accessing them. This allows you to set up system-wide filtering on your data to support row level security as well as multi-tenant scenarios.
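To give a feel for the feature (this is only a rough sketch with hypothetical entity and property names, and the partial Filter method itself is generated for you by the LightSwitch designer when you choose to write a Filter method), a row-level filter written in C# in the server project looks roughly like this:

using System;
using System.Linq.Expressions;

public partial class ApplicationDataService
{
    // Hypothetical Customers entity set with a SalesPersonName property.
    // The filter runs on the server for every query, no matter which client issues it.
    partial void Customers_Filter(ref Expression<Func<Customer, bool>> filter)
    {
        // Only return the rows that belong to the currently logged-in user.
        filter = c => c.SalesPersonName == this.Application.User.Name;
    }
}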

How Do I: Deploy a LightSwitch App to Azure Websites?

In this video learn how you can deploy your LightSwitch applications to the new Azure Websites using Visual Studio 2012. Azure Websites are for quick and easy web application and service deployments.

You can start for free and scale as you go. One of the many great features of LightSwitch is that it allows you to take your applications and easily deploy them to Azure.


<Return to section navigation list>

Windows Azure Infrastructure and DevOps

BusinessWire reported Quest Software Delivers Enhanced Monitoring and Management for Windows Azure-based Applications in an 11/28/2012 press release:

imageQuest Software, Inc. (now part of Dell) today announced the general availability of Foglight® for Windows® Azure Applications, as well as beta availability of Foglight for Cloud Cost Management. Both are available immediately via the new Foglight On-Demand platform. This enhanced product offering, together with a new, currently free, cost management tool, enables developers to easily optimize the performance of applications built on the Windows Azure platform.

image_thumb75_thumb6News Facts:

  • Foglight for Windows Azure Applications - Now generally available, Foglight for Windows Azure Applications is a quick-to-deploy diagnostic and monitoring solution that helps assure the performance and availability of applications built on the Windows Azure platform. In addition, the solution provides real-time data on the health of the application, together with actual end-user experience measurements. Specifically, Foglight for Windows Azure Applications provides developers and DevOps specialists with these benefits:
    • Quality of Service insight enables a deeper understanding of the application’s performance quality, especially when it is performing less optimally than intended.
    • Synthetic availability testing ensures 24/7 application availability by combining actual website page load times with internal Azure application performance assurance data to ensure a complete view of all aspects of the application service delivery. With this information, developers can quickly identify and resolve performance issues no matter where they originate.
    • Real user monitoring enables developers and DevOps specialists to determine how many visitors the application is supporting, where users are coming from, and what browsers/devices they are using, so they can tailor and improve appeal accordingly.
    • Transaction analysis provides the capability to track which transactions take longest, which are the most common, which consume the most time overall, and which have the greatest negative impact on a user’s experience. This enables developers to pinpoint where to focus efforts to improve the user experience.
    • Application and infrastructure health enables developers to identify problems with the Azure infrastructure or the application, and receive warnings if there are issues that have the potential to affect end users. In addition, developers also can receive alerts on issues that already are impacting end users.
  • Foglight for Cloud Cost Management – Available as a beta today, this new, currently free, billing and usage management tool analyzes Windows Azure accounts to help identify and minimize waste. The product drills into subscription data to provide a detailed view of resource utilization that can be interpreted to show exactly where expenses originate in the cloud. By plugging into an organization’s Windows Azure account, customers using the product can slice and dice billing detail within the application for customized reporting and project mapping. Specifically, the product provides:
    • Increased insight into billing
    • Features that enable developers to manage and forecast expenses
    • The ability to manage multiple Windows Azure accounts

Through a partnership with Satory Global, a consulting firm recognized as a Microsoft Azure Circle West Region partner of the year, Quest also offers developers planning and architecture assistance for future Azure applications, as well as migration services to efficiently move an application from on-premises to the cloud, or to build the application directly in the cloud.

Quotes:

Steve Rosenberg, general manager, Performance Monitoring Business Unit, Quest Software

“The enhanced functionality in Foglight for Windows Azure Applications and the accompanying new and free tools open the door for Quest to provide valuable services at a nominal price to small- and mid-sized organizations that are leveraging Azure today. For large enterprises, these new solutions provide the assurance that when they are ready to leverage cloud platforms, or look to use SaaS-based application performance monitoring (APM), Quest will be there to support them.”

Anne Isaacs, CEO, Satory Global

“As Microsoft continues to extend Azure’s functionality and versatility, our clients recognize Foglight for Windows Azure Applications to be a valuable tool to manage and monitor applications hosted either entirely in Azure, or in hybrid environments.”

Supporting Resources:

About Quest Software (now a part of Dell)

Dell Inc. (NASDAQ: DELL) listens to customers and delivers innovative technology and services that give them the power to do more. Quest, now a part of Dell’s Software Group, provides simple and innovative IT management solutions that enable more than 100,000 global customers to save time and money across physical and virtual environments. Quest products solve complex IT challenges ranging from database management, data protection, identity and access management, monitoring, user workspace management to Windows management. For more information, visit http://www.quest.com or http://www.dell.com.


<Return to section navigation list>

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

image_thumb75_thumb7No significant articles today


<Return to section navigation list>

Cloud Security and Governance

• Dan Reagan (@danreagan) described Protecting your Windows Store app from unauthorized use in an 11/29/2012 thread in the Building Windows Store apps with C# or VB forum:

imageWindows Store app licensing provides base piracy protection for all apps acquired from the Windows Store. It protects your apps from being installed on unauthorized machines and from various attempts to tamper with your app or the system on which it is running. Recently, we’ve seen attempts from a small group of users to circumvent these protections, and we wanted to provide you with some techniques that you can use to further ensure that only valid customers use your apps.

About the licensing system
When customers acquire your app from the Store, the licensing system registers a license for that app. This license is tied to a customer’s Microsoft account, and is validated against the user’s machine when the app is installed, launched or updated. If a valid license does not exist, the app won’t run.

To help make sure customers run your app in the way that you intended, the licensing system also validates the integrity of the app logic installed by your app package. If this has been altered after installation, Windows displays a glyph in the lower-right corner of the app’s tile in the Start screen, letting customers know that something might be wrong with the app. When customers try to launch an app in this state, Windows prompts them to repair the app, which starts a reinstallation of the app from the Store. When this happens, a customer won’t lose any of the data or settings associated with the app, regardless of whether the alteration happened because of a hardware failure, potential fraudulent actions, or some other reason.

This design helps ensure users don’t mistakenly or unintentionally work around the protection systems that we have put in place to safeguard your apps. And, after the app is re-installed, customers can quickly return to enjoying the app as when they initially acquired it.

Circumvention attempts
Unfortunately, a few people have attempted to circumvent the Windows licensing system to get apps for free. These attempts fall into two types: modifying Windows system components, or modifying a specific app.

Modifying Windows system components
When an attacker wants to get access to a wide set of apps for free, or circumvent existing licensing restrictions, they may attempt to hack Windows directly. This strategy requires an individual to use their administrative privileges to replace or modify the behavior of Windows system components. When a user makes this decision, it often has implications well beyond the ability to acquire apps for free. The download packages that enable these exploits often carry malicious payloads that have significant adverse effects such as system instability, personal data loss and reduced performance.

We have seen people use this method to extend trial durations, convert trials to full-featured applications, and install an application on more than the allowed five machines. Fortunately, Windows RT devices aren’t affected by these attempts, because these devices prevent installation of exploits that require installing untrusted code.

To prevent users from employing this strategy to illegally use your apps, we have a receipt feature that allows you to validate a user’s access to your app and service. Your app is able to obtain a signed receipt for any app-related transaction made through the Windows Store, such as the initial purchase of the app and any in-app purchases. Your app can then use this info to determine what services or features it can access for that user. A sample of a user’s receipt info is shown here.

<Receipt Version="1.0" 
ReceiptDate="2012-11-25T11:34:05-08:00" 
CertificateId="b809e47cd0110a4db043b3f73e83acd917fe1336" 
ReceiptDeviceId="4e362949-acc3-fe3a-e71b-89893eb4f528">
  <AppReceipt
    Id="182A6BB6-A7CE-4040-94E9-44AF572D7FD5"
    AppId="contoso.SampleApp_5q2xcn1j1t576"
    PurchaseDate="2012-11-14T15:48:12-08:00"
    LicenseType="Full"/>
  <ProductReceipt
    Id="90576E27-037A-4AE1-8BB8-32BD5E92E940"
    ProductId="In App Product 1"
    PurchaseDate="2012-11-18T15:49:23-08:00" 
    ProductType="Durable"
    AppId="contoso.SampleApp_5q2xcn1j1t576"/>
  <Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
      ...
  </Signature>
</Receipt>


The ID of each receipt element is unique per user (and device) and you can validate it on your server to confirm that the transaction is legitimate for that user and not a fraudulent transaction. This is especially valuable when your app has its own authentication mechanism because it allows you to validate that each user that appears to have purchased your app is, in fact, a unique customer. For more information about how to validate the receipt see Verifying purchases using receipts.
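To make that concrete, here is a minimal, hedged sketch of server-side C# that reads the interesting attributes out of a receipt like the sample above. It deliberately omits the signature check against the certificate identified by CertificateId, which you still must perform as described in Verifying purchases using receipts:

using System;
using System.Xml.Linq;

public static class ReceiptReader
{
    // receiptXml is the string the client obtained from
    // Windows.ApplicationModel.Store.CurrentApp.GetAppReceiptAsync()
    // and posted to your service.
    public static void InspectReceipt(string receiptXml)
    {
        XDocument doc = XDocument.Parse(receiptXml);
        XNamespace ns = doc.Root.Name.Namespace;

        XElement appReceipt = doc.Root.Element(ns + "AppReceipt");
        string receiptId   = (string)appReceipt.Attribute("Id");
        string appId       = (string)appReceipt.Attribute("AppId");
        string licenseType = (string)appReceipt.Attribute("LicenseType");

        // The Id is unique per user and device: record it server-side so the same
        // receipt cannot be replayed by a different account. LicenseType tells you
        // whether this is a Trial or Full license.
        Console.WriteLine("App {0}, receipt {1}, license {2}", appId, receiptId, licenseType);
    }
}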

Modifying an app
We’ve seen some people attempt to alter an app to modify its behavior, or to get at your intellectual property, by modifying your app directly. Often, this strategy is used to hide advertisements or to gain access to content or features to which they otherwise wouldn’t have access.

There are several steps you can take to protect against this strategy, such as safeguarding the elements that ship with your app and designing your app so that the sensitive elements live on your server rather than in your app.

In the app, you can both obfuscate and encrypt the sensitive portions of your app. We believe that having a rich choice of obfuscation and encryption options—tuned to the types of exploits you are trying to thwart—will help you to take the appropriate steps in protecting your apps as necessary.

You can also protect your more sensitive algorithms and business logic from reverse engineering by putting them on a server and not in your app. This keeps them in an environment that is completely in your control and requires that you only pass the initial data and the results between your service and your app. For even more protection, you can encrypt and decrypt static data or a data stream within your app by using the DataProtectionProvider class in the Windows Runtime.
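As a small illustration, a sketch of protecting and unprotecting a string with the DataProtectionProvider class might look like the following; the “LOCAL=user” protection descriptor is just one of the available scopes, so pick whichever matches your scenario:

using System.Threading.Tasks;
using Windows.Security.Cryptography;
using Windows.Security.Cryptography.DataProtection;
using Windows.Storage.Streams;

public static class SensitiveData
{
    // Encrypts a string so it can be stored or streamed by the app.
    public static async Task<IBuffer> ProtectAsync(string plainText)
    {
        var provider = new DataProtectionProvider("LOCAL=user");
        IBuffer input = CryptographicBuffer.ConvertStringToBinary(plainText, BinaryStringEncoding.Utf8);
        return await provider.ProtectAsync(input);
    }

    // Decrypts data previously protected above; the protection descriptor travels
    // with the blob, so the parameterless constructor is used here.
    public static async Task<string> UnprotectAsync(IBuffer protectedData)
    {
        var provider = new DataProtectionProvider();
        IBuffer output = await provider.UnprotectAsync(protectedData);
        return CryptographicBuffer.ConvertBinaryToString(BinaryStringEncoding.Utf8, output);
    }
}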


@danreagan || Team Manager, Windows Store Developer Solutions #WSDevSol || Want more solutions? See our blog! http://aka.ms/t4vuvz


Chris Hoff (@Beaker) posted a Quick Quip: Capability, Reliability and Liability…Security [Engineer] Licensing in an 11/29/2012 post:

imageEarlier today, I tweeted the following and was commented on by Dan Kaminsky (@dakami):

@beaker “In general, security engineers are neither.”

— Dan Kaminsky (@dakami) November 28, 2012

image_thumb2…which I explained with:

We practice and focus on physical infrastructure, network and application security as relevant discipline slices, but “information?”…

— [Christofer] Hoff (@Beaker) November 28, 2012

This led to a very interesting comment by Preston Wood who suggested something very interesting from the perspective of both leadership and accountability:

@beaker @dakami It’s time for a degreed or licensed requirement for security decision makers – just like other critical professions

— Preston Wood (@preston_wood) November 28, 2012

…and that brought forward another insightful comment:

@beaker @dakami would be nice if there were engineers out there that did accept liability.It’d end charlatanism quickly.

— ǝɔʎoſ ʇʇɐW (@openfly) November 28, 2012

Pretty interesting, right? Engineers, architects, medical practitioners, etc. all have degrees/licenses and absorb liability upon failure. What about security?

What do you think about this concept?

Previous attempts to license “Software Engineers” in the State of California failed, and I expect similar efforts for “Security Engineers” will meet the same fate. The cost of errors and omissions insurance for “Security Engineers” is likely to be astronomic.


<Return to section navigation list>

Cloud Computing Events

•• Andy Cross (@andybareweb) and Richard Conway (@azurecoder) will present HDInsight on Azure to the UK Windows Azure User Group in London, UK on 1/25/2013:

4 sessions:

Tyler Doerksen (@tyler_gd) announced on 11/28/2012 a Vancouver Web Camp, Dec 4, In-Person and Online:

Vancouver Web Camp

imageThe name of the game is building beautiful, interactive, and fast web sites! On December 4th join Brady Gaster, Jon Galloway, and Xinyang Qiu direct from Redmond as they show you how to build sites using:

  • ASP.NET
  • HTML5
  • jQuery
  • WebAPI
  • SignalR
  • image_thumb75_thumb8and of course Windows Azure

If you cannot make it in person to Vancouver you can join me online for the virtual event stream. Register Here…

Check out this list of great topics:

  • What’s new in ASP.NET 4.5
  • Building and deploying websites with ASP.NET MVC 4
  • Creating HTML5 Applications with jQuery
  • Building a service layer with ASP.NET Web API
  • Leveraging your ASP.NET development skills to build Office Apps
  • Building and leveraging social services in ASP.NET
  • Building for the mobile web
  • Realtime communications with SignalR
  • Using Cloud Application Services

In-Person Registration

Online Registration


<Return to section navigation list>

Other Cloud Computing Platforms and Services

• Andrew Brust (@andrewbrust) posted Amazon Redshift: ParAccel in, costly appliances out to ZDNet’s Big on Data blog on 11/30/2012:

imageBack in July, Data Warehouse vendor ParAccel announced it had a new investor: Amazon. Then yesterday, Amazon announced its new cloud Data Warehouse as a service offering: Redshift. And, none too surprisingly, it turns out that Redshift is based on ParAccel’s technology. I spoke to Rich Ghiossi and John Santaferraro, ParAccel’s VPs of Marketing and Solutions/Product Marketing, respectively, who explained some of the subtleties to me and helped me think through some others.

We don't need no stinkin' appliances
imageParAccel takes a rather radical approach compared to other vendors in the Massively Parallel Processing (MPP) Data Warehouse category: the company designed its software to run on commodity hardware. Most MPP vendors (including Teradata, HP/Vertica, IBM/Netezza, EMC/Greenplum and Microsoft) sell their products only in the form of an appliance. Inside those appliance cabinets typically lies a cluster of finely tuned server, storage and networking hardware. It’s an optimized, high-performance approach to data warehousing. It’s also expensive, and it keeps certain customers out. ParAccel decouples MPP technology from expensive appliance hardware.

Down with false choices
imageHadoop, of course, takes the commodity hardware approach as well. And that likely accounts for its runaway popularity as a Big Data platform. But MPP is big data technology too, as I’ve said many times before:

The problem with Hadoop, though, is that its native query mechanism is MapReduce code, rendering it incompatible with the massive product and skillset ecosystem around SQL. Over the last several months, vendors such as Cloudera and Microsoft have sought a fusion of SQL and Hadoop. Other vendors, like Rainstor and Hadapt, have been pursuing that fusion for a while.

Pure play
But why hybridize SQL with Hadoop, when MPP data warehouses that can handle Petabyte-scale big data workloads use SQL natively? Chiefly, the reason has been because MPP carried the appliance barrier-to-entry, so you had to choose between SQL on an appliance and Hadoop on commodity hardware. ParAccel smashed that dichotomy, but the company is still growing and so, for many, the dichotomy has stood.

imageBut Amazon is attacking that dichotomy further, because now ParAccel-based, petabyte-scale MPP technology is elastic. It’s available in the cloud, on demand, running on a cluster sized according to your needs. You don’t have to build the cluster, and you don’t have to provision the hardware.

Appliances only scale up to what’s inside them, and that may be a lot more than needed initially. As far as elasticity goes, that’s the worst of both worlds. With Redshift, and these are Amazon’s own words, "Scaling a cluster to improve performance or increase capacity is simple and incurs no downtime."

This opens up all sorts of scenarios. Amazon claims the cost of Redshift is under $1000 US per Terabyte, per year. So many organizations could quite easily keep their core data warehouse in the cloud. But Redshift seems to lend itself to ephemeral use too: why spin up an Elastic MapReduce Hadoop cluster to analyze your data when you can spin up an MPP data warehouse (that your existing BI tools can query) just as easily?

On-prem, and off
Of course, at $1,000/TB/year, you’ll be paying at least $1 million a year for a Petabyte data warehouse. But when you factor in the hardware, storage, personnel/management, power and other costs of running such a large warehouse on premise, that ain’t so bad. If you’re really working at Petabyte scale, that number shouldn’t bother you.

Does that mean on-premise MPP data warehouses are passé? I wouldn’t say so. First, there’s the issue of bandwidth constraints on data movement that I cited in my news piece on Redshift yesterday. But second, the full on-premise ParAccel product includes features like On-Demand Integration Services, extensibility, user-defined functions, embedded analytics and certain optimizations that Redshift doesn’t offer.

Shifting winds
This is definitely a case of "use the right tool for the right job." But the appliance-shy, who have been trying to run their data warehouses on conventional, non-MPP relational databases and have found performance lacking, now have some choices, including the ability to try-before-they-buy by using Redshift in the cloud.

And which conventional relational database might Amazon wish customers to "shift" that warehouse from? Well there’s a big one that uses a lot of red in its logo. Just sayin’.


• David Linthicum (@DavidLinthicum) asserted “Amazon.com's new data warehousing service is aimed at pulling customers from traditional providers like IBM, Oracle, and Teradata” in a deck for his Big data gets Amazoned with Redshift article of 11/30/2012 for InfoWorld’s Cloud Computing blog:

imageBig data is, well, big. And messy. And intimidating to many IT shops and would-be business users. Its complexity also suggests high prices, which also gives would-be adopters pause. So Amazon.com's announcement of its low-cost Redshift big data service as part of the Amazon Web Services (AWS) suite is a big deal.

imageRetailers have long complained about being "Amazoned" -- that is, undercut in price and surpassed in both availability and service by Amazon.com's online store. AWS has "Amazoned" the hosted services world as well, making what we now call IaaS cheap and easy. Now big data looks as if it might get "Amazoned," too.

Amazon's sales pitch for Redshift lays down the gauntlet for the big data vendors such as IBM, Oracle, and Teradata, which typically come out of the pricier, more complex business intelligence or data warehousing worlds: "Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift offers you fast query performance when analyzing virtually any size data set using the same SQL-based tools and business intelligence applications you use today."

Redshift is a clear attempt to steal big business in big database from the traditional enterprise providers. In other words, Redshift targets competitors that pull billions out of enterprises through ongoing licenses and support fees.

To net some of those billions, Amazon has made it easy for you to test-drive Redshift by incorporating it into the AWS Management Console, where you can launch a Redshift cluster. Clusters start with a few hundred gigabytes of data and scale to a petabyte or more, for less than $1,000 per terabyte per year. (I suspect most enterprises will blow well past a terabyte, so estimate your likely usage before you make the move.)

Most enterprises hate to pay those huge fees to support huge databases. In the past, there were no other options, but now there is one. If Redshift lives up to just a portion of the expectations set by Amazon, it could force enterprises to rethink how they consume database technology. The cost benefits alone could drive some quick changes.

What may keep many enterprises off Redshift is the fact that most of those charged with maintaining data warehouses in larger enterprises are not big fans of cloud computing. They consider the technology too new, insecure, and unreliable.

On the other hand, if Redshift can do what the on-premise databases can do at a fraction of the budget, those concerns will evaporate -- or be overruled by business units. In that case, bye-bye Oracle, hello guys who also sell books.


Jeff Barr (@jeffbarr) described The New AWS Data Pipeline in an 11/29/2012 post:

imageData. Information. Big Data. Business Intelligence. It's all the rage these days. Companies of all sizes are realizing that managing data is a lot more complicated and time consuming than in the past, despite the fact that the cost of the underlying storage continues to decline.

image_thumb111Buried deep within this mountain of data is the "captive intelligence" that you could be using to expand and improve your business. You need to move, sort, filter, reformat, analyze, and report on this data in order to make use of it. To make matters more challenging, you need to do this quickly (so that you can respond in real time) and you need to do it repetitively (you might need fresh reports every hour, day, or week).

Data Issues and Challenges
Here are some of the issues that we are hearing about from our customers when we ask them about their data processing challenges:

Increasing Size - There's simply a lot of raw and processed data floating around these days. There are log files, data collected from sensors, transaction histories, public data sets, and lots more.

Variety of Formats - There are so many ways to store data: CSV files, Apache logs, flat files, rows in a relational database, tuples in a NoSQL database, XML, JSON to name a few.

Disparate Storage - There are all sorts of systems out there. You've got your own data warehouse (or Amazon Redshift), Amazon S3, Relational Database Service (RDS) database instances running MySQL, Oracle, or Windows Server, DynamoDB, other database servers running on Amazon EC2 instances or on-premises, and so forth.

Distributed, Scalable Processing - There are lots of ways to process the data: On-Demand or Spot EC2 instances, an Elastic MapReduce cluster, or physical hardware. Or some combination of any and all of the above just to make it challenging!

Hello, AWS Data Pipeline
Our new AWS Data Pipeline product will help you to deal with all of these issues in a scalable fashion. You can now automate the movement and processing of any amount of data using data-driven workflows and built-in dependency checking.

Let's start by taking a look at the basic concepts:

A Pipeline is composed of a set of data sources, preconditions, destinations, processing steps, and an operational schedule, all defined in a Pipeline Definition.

The definition specifies where the data comes from, what to do with it, and where to store it. You can create a Pipeline Definition in the AWS Management Console or externally, in text form.

Once you define and activate a pipeline, it will run according to a regular schedule. You could, for example, arrange to copy log files from a cluster of Amazon EC2 instances to an S3 bucket every day, and then launch a massively parallel data analysis job on an Elastic MapReduce cluster once a week. All internal and external data references (e.g. file names and S3 URLs) in the Pipeline Definition can be computed on the fly so you can use convenient naming conventions like raw_log_YYYY_MM_DD.txt for your input, intermediate, and output files.

Your Pipeline Definition can include a precondition. Think of a precondition as an assertion that must hold in order for processing to begin. For example, you could use a precondition to assert that an input file is present.

AWS Data Pipeline will take care of all of the details for you. It will wait until any preconditions are satisfied and will then schedule and manage the tasks per the Pipeline Definition. For example, you can wait until a particular input file is present.

Processing tasks can run on EC2 instances, Elastic MapReduce clusters, or physical hardware. AWS Data Pipeline can launch and manage EC2 instances and EMR clusters as needed. To take advantage of long-running EC2 instances and physical hardware, we also provide an open source tool called the Task Runner. Each running instance of a Task Runner polls the AWS Data Pipeline in pursuit of jobs of a specific type and executes them as they become available.

When a pipeline completes, a message will be sent to the Amazon SNS topic of your choice. You can also arrange to send messages when a processing step fails to complete after a specified number of retries or if it takes longer than a configurable amount of time to complete.

From the Console
You will be able to design, monitor, and manage your pipelines from within the AWS Management Console:

API and Command Line Access
In addition to the AWS Management Console access, you will also be able to access the AWS Data Pipeline through a set of APIs and from the command line.

You can create a Pipeline Definition in a text file in JSON format; here's a snippet that will copy data from one Amazon S3 location to another:

{
  "name" : "S3ToS3Copy",
  "type" : "CopyActivity",
  "schedule" : {"ref" : "CopyPeriod"},
  "input" : {"ref" : "InputData"},
  "output" : {"ref" : "OutputData"}
}

Coming Soon
The AWS Data Pipeline is currently in a limited private beta. If you are interested in participating, please contact AWS sales.

Stay tuned to the blog for more information on the upcoming public beta.


Jeff Barr (@jeffbarr) announced Amazon ElastiCache - Now With Auto Discovery in an 11/29/2012 post:

imageAmazon ElastiCache gives you the power to improve application performance by adding a flexible, scalable caching layer between your application and your database.

Today we are happy to announce a new feature that will make ElastiCache even easier to use. Our new Auto Discovery feature will allow your applications to automatically and transparently adapt to the addition or deletion of cache nodes from your cache clusters. Your applications can now react more quickly to changes in your cache cluster without downtime or manual intervention.

image_thumb111To use Amazon ElastiCache you have to set up a cache cluster. A cache cluster is a collection of cache nodes. You choose the number and the type of nodes to match the performance needs of your application. In the past, if you changed the nodes in your cache cluster (for example, by adding a new node), you would have to update the list of node endpoints manually. Typically, updating the list of node endpoints involves reinitializing the client by shutting down and restarting the application, which can result in downtime (depending on how the client application is architected). With the launch of Auto Discovery, this complexity has been eliminated.

All ElastiCache clusters (new and existing!) now include a unique Configuration Endpoint, which is a DNS Record that is valid for the lifetime of the cluster. This DNS Record contains the DNS names of each of the nodes that belong to the cluster. Amazon ElastiCache will ensure that the Configuration Endpoint always points to at least one such “target” node. A query to the target node then returns endpoints for all the nodes in the cluster. To be a bit more specific, running a query means sending the config command to the target node. We implemented this command as an extension to the Memcached ASCII protocol (read about Adding Auto-Discovery to Your Client Library for more information).
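As an illustration only, the C# sketch below opens a raw socket to the Configuration Endpoint and issues the config command by hand. The exact command text (“config get cluster”) and the response layout are assumptions based on the Auto Discovery documentation referenced above; in real code, prefer the ElastiCache Cluster Client:

using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

public static class AutoDiscovery
{
    // Asks the cluster's Configuration Endpoint for the current node list.
    public static string GetClusterConfig(string configEndpoint, int port = 11211)
    {
        using (var client = new TcpClient(configEndpoint, port))
        using (var stream = client.GetStream())
        using (var writer = new StreamWriter(stream) { AutoFlush = true })
        using (var reader = new StreamReader(stream))
        {
            // Protocol extension described in the Auto Discovery docs (assumed syntax).
            writer.Write("config get cluster\r\n");

            // The payload is a version number followed by a space-separated list
            // of hostname|ip|port entries, terminated by an END line.
            var response = new StringBuilder();
            string line;
            while ((line = reader.ReadLine()) != null && line != "END")
            {
                response.AppendLine(line);
            }
            return response.ToString();
        }
    }
}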

You can then connect to the cluster nodes just as before and use the Memcached protocol commands such as get, set, incr, and decr. The Configuration Endpoint is accessible programmatically through the ElastiCache API, via the command line tools, and from the ElastiCache Console:

To take advantage of Auto Discovery, you will need to use a Memcached client library that is able to use this new feature. To get started, you can use the ElastiCache Cluster Client, which takes the popular SpyMemcached client and adds Auto Discovery functionality. We have a Java client available now (view source), which can be downloaded from the ElastiCache Console:

We plan to add Auto Discovery support to other popular Memcached client libraries over time; a PHP client is already in the works.

ElastiCache remains 100% Memcached-compatible so you can keep using your existing Memcached client libraries with new and existing clusters, but to take advantage of Auto Discovery you must use an Auto Discovery-capable client.


Emil Protalinski (@EmilProtalinski) reported Following Amazon price cuts, Google slashes Cloud Storage prices by another 10% for a total of 30% in an 11/29/2012 post to The Next Web:

1306463_34868199

imageGoogle on Monday announced new Cloud Platform features, including new storage and compute capabilities, lower prices, and more European Datacenter support. The price cut was some 20 percent, but after Amazon cut its S3 prices by about 25 percent on Wednesday, Google responded on Thursday by adding another 10 percent price slash on top, for a total 30 percent reduction.

The new prices apply to all Cloud Storage regions and the new DRA Storage. Here’s the new table with all the information you need to get up to speed (see the Google Cloud Storage Pricing page for more):

image

For comparison, here is Amazon’s storage table:

image

imageGoogle is clearly eager to start a price war with Amazon. Here’s what the company had to say about the move:

We are committed to delivering the best value in the marketplace to businesses and developers looking to operate in the cloud. That’s why today we are reducing the price of Google Cloud Storage by an additional 10%, resulting in a total price reduction of over 30%.

As far as we’re concerned, this sounds like marketing spin. Google looked at Amazon’s prices, saw that they were lower, and decided it had to react. Every other explanation is just an attempt to make Google customers feel like they are being treated well, in our opinion.

image_thumb111The reality is Amazon offers great bang for your buck when it comes to using its cloud, and Google is struggling to match them. Amazon has been reducing prices ever since it first launched its storage services.

Going forward, it looks like Google and Amazon will be battling on price in more than just the tablet space. That shouldn’t surprise anyone as the two firms have been trading punches for months now.

See also – Google introduces six-month trial for Cloud SQL, ups storage 10x to 100GB, adds EU datacenter and Amazon’s Maps API now available to all developers, becomes part of Mobile App SDK

Image credit: Hans Thoursie

It will be interesting to see whether Microsoft meets Amazon or Google prices for Windows Azure Blob Storage.


Jeff Barr (@jeffbarr) reported an Amazon S3 Storage Price Reduction (24 to 28%) in an 11/29/2012 post:

imageI'm writing to you from the floor of AWS re:Invent, where a capacity crowd is learning all about the latest and greatest AWS developments. As part of the welcoming keynote, AWS Senior VP Andy Jassy announced that we’re reducing prices again. This is our 24th price reduction - we continue to innovate on our customers’ behalf, and we’re delighted to pass savings on to you.

image_thumb111We’re reducing the price of Amazon S3 storage by 24-28% in the US Standard Region, and making commensurate price reductions in all our nine regions worldwide as well as reducing the price of Reduced Redundancy Storage (RRS). Here are the new prices for Standard Storage in the US Standard Region:

image

The new prices are listed on the Amazon S3 pricing announcement page. The new prices take effect on December 1, 2012 and will be applied automatically.

Andy also announced that Amazon S3 now stores 1.3 trillion objects and is regularly peaking at over 800,000 requests per second. We’ve often talked about the benefits of AWS’s scale. This massive scale is enabling us to make these Amazon S3 price reductions across all of our nine Regions world-wide.

We are also reducing the per-gigabyte storage cost for EBS snapshots, again world-wide. Here are the new prices:

image


Chris Hoff (@Beaker) answered Why Amazon Web Services (AWS) Is the Best Thing To Happen To Security & Why I Desperately Want It To Succeed in an 11/29/2012 post to his Rational Survivability blog:

imageMany people who may only casually read my blog or peer at the timeline of my tweets may come away with the opinion that I suffer from confirmation bias when I speak about security and Cloud.

That is, many conclude that I am pro Private Cloud and against Public Cloud.

I find this deliciously ironic and wildly inaccurate. However, I must also take responsibility for this, as anytime one threads the needle and attempts to present a view from both sides with regard to incendiary topics without planting a polarizing stake in the ground, it gets confusing.

Let me clear some things up.

Digging deeper into what I believe, one would actually find that my blog, tweets, presentations, talks and keynotes highlight deficiencies in current security practices and solutions on the part of providers, practitioners and users in both Public AND Private Cloud, and in my own estimation, deliver an operationally-centric perspective that is reasonably critical and yet sensitive to emergent paths as well as the well-trodden path behind us.

I’m not a developer. I dabble in little bits of code (interpreted and compiled) for humor and to try and remain relevant. Nor am I an application security expert for the same reason. However, I spend a lot of time around developers of all sorts, those that write code for machines whose end goal isn’t to deliver applications directly, but rather help deliver them securely. Which may seem odd as you read on…

The name of this blog, Rational Survivability, highlights my belief that the last two decades of security architecture and practices, while useful in foundation, require a rather aggressive tune-up of priorities.

Our trust models, architecture, and operational silos have not kept pace with the velocity of the environments they were initially designed to support and unfortunately as defenders, we’ve been outpaced by both developers and attackers.

Since we’ve come to the conclusion that there’s no such thing as perfect security, “survivability” is a better goal. Survivability leverages “security” and is ultimately a subset of resilience but is defined as the “…capability of a system to fulfill its mission, in a timely manner, in the presence of attacks, failures, or accidents.”

Sharp readers will immediately recognize the parallels between this definition of “survivability,” how security applies within context, and how phrases like “design for failure” align. In fact, this is one of the calling cards of a company that has become synonymous with (IaaS) Public Cloud: Amazon Web Services (AWS.) I’ll use them as an example going forward.

So here’s a line in the sand that I think will be polarizing enough:

image_thumb111I really hope that AWS continues to gain traction with the Enterprise. I hope that AWS continues to disrupt the network and security ecosystem. I hope that AWS continues to pressure the status quo and I hope that they do it quickly.

Why?

Almost a decade ago, the Open Group’s Jericho Forum published their Commandments. Designed to promote a change in thinking and operational constructs with respect to security, what they presciently released upon the world describes a point at which one might imagine taking one’s most important assets and connecting them directly to the Internet and the shifts required to understand what that would mean to “security”:

  1. The scope and level of protection should be specific and appropriate to the asset at risk.
  2. Security mechanisms must be pervasive, simple, scalable, and easy to manage.
  3. Assume context at your peril.
  4. Devices and applications must communicate using open, secure protocols.
  5. All devices must be capable of maintaining their security policy on an un-trusted network.
  6. All people, processes, and technology must have declared and transparent levels of trust for any transaction to take place.
  7. Mutual trust assurance levels must be determinable.
  8. Authentication, authorization, and accountability must interoperate/exchange outside of your locus/area of control
  9. Access to data should be controlled by security attributes of the data itself
  10. Data privacy (and security of any asset of sufficiently high value) requires a segregation of duties/privileges
  11. By default, data must be appropriately secured when stored, in transit, and in use.

These seem harmless enough today, but were quite unsettling when paired with the notion of “de-perimeterization,” which was often misconstrued to mean the immediate disposal of firewalls. Many security professionals appreciated the commandments for what they expressed, but the design patterns, availability of solutions and belief systems of traditionalists constrained traction.

Interestingly enough, now that the technology, platforms, and utility services have evolved to enable these sorts of capabilities, and in fact have stressed our approaches to date, these exact tenets are what Public Cloud forces us to come to terms with.

If one were to look at what public cloud services like AWS mean when aligned to traditional “enterprise” security architecture, operations and solutions, and map that against the Jericho Forum’s Commandments, the result is a near-perfect opportunity for a rethink.

Instead of being focused on implementing “security” to protect applications and information at the network layer — which is more often than not blind to both, contextually and semantically — public cloud computing forces us to shift our security models back to protecting the things that matter most: the information and the conduits that traffic in it (applications).

As networks become more abstracted, so do existing security models. This means that we must think about security programmatically, embedded as a functional delivery requirement of the application.
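To make that a little more concrete, here is a minimal sketch, in Python, of what “security as a delivery requirement” can look like in practice. It is purely illustrative (the service, bucket and prefix are invented for this example): a fine-grained, data-scoped access policy is declared alongside the application code that depends on it, rather than bolted onto a network segment after the fact.

import json

# Hypothetical example: a reporting service that only ever needs to read
# one S3 prefix. The access policy travels with the application and is
# scoped to the data itself, not to an IP range or subnet.
# (Standard AWS IAM policy JSON; the names and ARNs are placeholders.)
REPORTING_READ_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetObject"],
            "Resource": [
                "arn:aws:s3:::example-reports-bucket",
                "arn:aws:s3:::example-reports-bucket/reports/*",
            ],
        }
    ],
}

def render_policy():
    """Serialize the policy so it can ship with the application's
    deployment artifacts and be attached by whatever provisioning
    tooling the pipeline already uses."""
    return json.dumps(REPORTING_READ_POLICY, indent=2)

if __name__ == "__main__":
    print(render_policy())

The point isn’t the particular policy language; it’s that the scope of protection is specific to the asset at risk (commandment #1 above) and lives where developers can version, review and test it.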

“Security” in complex, distributed and networked systems is NOT a tidy simple atomic service. It is, unfortunately, represented as such because we choose to use a single noun to represent an aggregate of many sub-services, shotgunned across many layers, each with its own context, metadata, protocols and consumption models.

As the use cases for public cloud obscure and abstract these layers — flatten them — we’re left with the core of what we should focus on:

Build secure, reliable, resilient, and survivable systems of applications, comprised of secure services, atop platforms that are themselves engineered to do the same, in such a way that the information which transits them inherits these qualities.

So if Public Cloud forces one to think this way, how does one relate this to practices of today?

Frankly, enterprise (network) security design patterns are a crutch. The screened-subnet DMZ pattern with its perimeters is outmoded. As Gunnar Peterson eloquently described, our best attempts at “security” over time are always some variation of firewalls and SSL. This is the sux0r. Importantly, this is not stated to blame anyone or suggest that a bad job is being done, but rather that a better one can be.

It’s not like we don’t know *what* the problems are; we just don’t invest in solving them as long-term projects. Instead, we deploy compensating controls that defer what is becoming increasingly inevitable: the compromise of applications that are poorly engineered and defended by systems that have no knowledge or context of the things they are defending.

We all know this, but yet looking at most private cloud platforms and implementations, we gravitate toward replicating these traditional design patterns logically after we’ve gone to so much trouble to articulate our way around them. Public clouds make us approach what, where and how we apply “security” differently because we don’t have these crutches.

Either we learn to walk without them, or we simply don’t move forward.

Now, let me be clear. I’m not suggesting that we don’t need security controls, but I do mean that we need a different and better application of them, at a different level, protecting things that aren’t tied to physical topology or addressing schemes…or operating systems (hypervisors included).

I think we’re getting closer. Beyond infrastructure as a service, platform as a service gets us even closer.

Interestingly, at the same time we see the evolution of computing with Public Cloud, networking is also undergoing a renaissance, and as this occurs, security is coming along for the ride. Because it has to.

As I was writing this blog (ironically, in the parking lot at VMware awaiting the start of a meeting to discuss abstraction, networking and security), James Staten (Forrester) tweeted something from Werner Vogels’ keynote at AWS re:Invent:

Werner: “There’s no excuse not to use fine grained security to make your apps secure from the start.” Echoing @kindervag Zero Trust

— Staten7 (@Staten7) November 29, 2012

I couldn’t have said it better myself :)

So while I may have been, and will continue to be, a thorn in the side of platform providers to improve their “survivability” capabilities and help us get from here to there, I reiterate the title of this scribbling: Amazon Web Services (AWS) Is the Best Thing To Happen To Security & I Desperately Want It To Succeed.

I trust that’s clear?

/Hoff

P.S. There’s so much more I could/should write, but I’m late for the meeting :)


Andrew Brust (@andrewbrust) asserted “Amazon Web Services steps into the world of cloud-based data warehousing, and Jaspersoft's right there with them” in a deck for his Amazon announces “Redshift” cloud data warehouse, with Jaspersoft support article of 11/28/2012 for ZD Net’s Big Data blog:

In Las Vegas today, Amazon Web Services (AWS) is having its first ever AWS re:Invent conference, catering to partners and customers of Amazon's cloud platform. And for the worlds of BI and Big Data, Amazon’s announcing what could be a ground-breaking new offering: its "Redshift" Data Warehouse as a Service.

Partners Gathering
While details are still forthcoming, there are already partnerships forming. Specifically, open source BI provider Jaspersoft announced its support for Redshift today as well. The company explained that its data visualization and analytics technology can integrate with Redshift, and can be used in a standalone fashion or be embedded within applications. Jaspersoft can connect to Redshift whether the former is running on-premise or in the cloud.

Amazon's Web site for Redshift also notes MicroStrategy as a Redshift partner.

Is it Big Data?
Amazon is pitching Redshift as a Big Data platform and Jaspersoft is pitching its support of Redshift as a Big Data analytics solution. According to Jaspersoft personnel, Redshift is a relational data warehouse of Amazon's own fashioning that can handle multi-terabyte and even petabyte-scale data volumes. And AWS' site makes it clear that Redshift employs a clustered, Massively Parallel Processing (MPP) architecture, which would be necessary for such workloads. AWS' site also suggests that, like many MPP products, Redshift is based on PostgreSQL.
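If that PostgreSQL lineage holds, the practical upshot is that ordinary Postgres tooling should be able to talk to a Redshift cluster. As a rough sketch (the endpoint, port, database, credentials and table below are placeholders, and details may change as the service moves out of limited preview), a query from Python via the standard psycopg2 driver would look something like this:

import psycopg2  # standard PostgreSQL driver; Redshift is expected to speak the same wire protocol

# All connection details below are illustrative placeholders.
conn = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    port=5439,          # assumed Redshift listener port; confirm against your cluster's endpoint
    dbname="dev",
    user="masteruser",
    password="REPLACE_ME",
)

cur = conn.cursor()
cur.execute("SELECT status, COUNT(*) FROM weblogs GROUP BY status ORDER BY 2 DESC;")
for status, hits in cur.fetchall():
    print(status, hits)

cur.close()
conn.close()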

But data warehousing in the cloud can be tricky, given how bandwidth constraints can impede the upload of the large volumes of data necessary to get an implementation kicked off. Perhaps AWS Import/Export’s support of customers shipping physical hard drives, whose contents can then be loaded into AWS’ Simple Storage Service (S3), can mitigate the bandwidth impediments.

Post-Transactional Cloud
Whether we call it data warehousing, BI or Big Data, Redshift puts it in the cloud, and takes AWS well beyond the transactional database capabilities of its Relational Database Service (RDS).

Microsoft, Google and Rackspace: it’s your move now.


James Hamilton posted Redshift: Data Warehousing at Scale in the Cloud to his Perspectives blog on 11/28/2012:

I’ve worked in or near the database engine world for more than 25 years. And, ironically, every company I’ve ever worked at has been working on a massive-scale, parallel, clustered RDBMS system. The earliest variant was IBM DB2 Parallel Edition released in the mid-90s. It’s now called the Database Partitioning Feature.

Massive, multi-node parallelism is the only way to scale a relational database system, so these systems can be incredibly important. Very high-scale MapReduce systems are an excellent alternative for many workloads. But some customers and workloads want the flexibility and power of being able to run ad hoc SQL queries against petabyte-sized databases. These are the workloads targeted by massive, multi-node relational database clusters, and there are now many solutions out there, with Oracle RAC perhaps the most well-known among many others including Vertica, GreenPlum, Aster Data, ParAccel, Netezza, and Teradata.

What’s common across all these products is that big databases are very expensive. Today, that is changing with the release of Amazon Redshift. It’s a relational, column-oriented, compressed, shared nothing, fully managed, cloud hosted, data warehouse. Each node can store up to 16TB of compressed data and up to 100 nodes are supported in a single cluster.

Amazon Redshift manages all the work needed to set up, operate, and scale a data warehouse cluster, from provisioning capacity to monitoring and backing up the cluster, to applying patches and upgrades. Scaling a cluster to improve performance or increase capacity is simple and incurs no downtime. The service continuously monitors the health of the cluster and automatically replaces any component, if needed.

The core node on which the Redshift clusters are built includes 24 disk drives with an aggregate capacity of 16TB of local storage. Each node has 16 virtual cores and 120 GB of memory and is connected via a high speed 10Gbps, non-blocking network. This is a meaty core node, and Redshift supports up to 100 of these in a single cluster.

There are many pricing options available (see http://aws.amazon.com/redshift for more detail) but the most favorable comes in at only $999 per TB per year. I find it amazing to think of having the services of an enterprise-scale data warehouse for under a thousand dollars per terabyte per year. And, this is a fully managed system, so much of the administrative load is taken care of by Amazon Web Services.
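As a quick back-of-the-envelope check against the reserved rate quoted in the highlights below: $0.228 per hour for a 2TB XL node works out to roughly $0.228 × 8,760 hours ≈ $2,000 per year, or just under $1,000 per terabyte per year.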

Service highlights from: http://aws.amazon.com/redshift

  • Fast and Powerful – Amazon Redshift uses a variety of innovations to obtain very high query performance on datasets ranging in size from hundreds of gigabytes to a petabyte or more. First, it uses columnar storage and data compression to reduce the amount of IO needed to perform queries. Second, it runs on hardware that is optimized for data warehousing, with local attached storage and 10GigE network connections between nodes. Finally, it has a massively parallel processing (MPP) architecture, which enables you to scale up or down, without downtime, as your performance and storage needs change.

You have a choice of two node types when provisioning your own cluster, an extra large node (XL) with 2TB of compressed storage or an eight extra large node (8XL) with 16TB of compressed storage. You can start with a single XL node and scale up to a 100 node eight extra large cluster. XL clusters can contain 1 to 32 nodes while 8XL clusters can contain 2 to 100 nodes.

  • Scalable – With a few clicks of the AWS Management Console or a simple API call, you can easily scale the number of nodes in your data warehouse to improve performance or increase capacity, without incurring downtime. Amazon Redshift enables you to start with a single 2TB XL node and scale up to a hundred 16TB 8XL nodes for 1.6PB of compressed user data. Resize functionality is not available during the limited preview but will be available when the service launches.
  • Inexpensive – You pay very low rates and only for the resources you actually provision. You benefit from the option of On-Demand pricing with no up-front or long-term commitments, or even lower rates via our reserved pricing option. On-demand pricing starts at just $0.85 per hour for a two terabyte data warehouse, scaling linearly up to a petabyte and more. Reserved Instance pricing lowers the effective price to $0.228 per hour, under $1,000 per terabyte per year.
  • Fully Managed – Amazon Redshift manages all the work needed to set up, operate, and scale a data warehouse, from provisioning capacity to monitoring and backing up the cluster, and to applying patches and upgrades. By handling all these time consuming, labor-intensive tasks, Amazon Redshift frees you up to focus on your data and business insights.
  • Secure – Amazon Redshift provides a number of mechanisms to secure your data warehouse cluster. It currently supports SSL to encrypt data in transit, includes web service interfaces to configure firewall settings that control network access to your data warehouse, and enables you to create users within your data warehouse cluster. When the service launches, we plan to support encrypting data at rest and Amazon Virtual Private Cloud (Amazon VPC).
  • Reliable – Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. All data written to a node in your cluster is automatically replicated to other nodes within the cluster and all data is continuously backed up to Amazon S3. Amazon Redshift continuously monitors the health of the cluster and automatically replaces any component, as necessary.
  • Compatible – Amazon Redshift is certified by Jaspersoft and MicroStrategy, with additional business intelligence tools coming soon. You can connect your SQL client or business intelligence tool to your Amazon Redshift data warehouse cluster using standard PostgreSQL JDBC or ODBC drivers.
  • Designed for use with other AWS Services – Amazon Redshift is integrated with other AWS services and has built-in commands to load data in parallel to each node from Amazon Simple Storage Service (S3) and Amazon DynamoDB, with support for Amazon Relational Database Service and Amazon Elastic MapReduce coming soon. (A short load sketch follows this list.)
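Putting the last two highlights together, a typical first workflow would presumably be to connect with a standard PostgreSQL driver and issue Redshift’s COPY command to load data in parallel from S3. The sketch below reuses the same placeholder cluster and table as the earlier connection example; the exact COPY options and credential syntax should be checked against Amazon’s documentation once the service is generally available.

import psycopg2

# Placeholder connection details, as in the earlier sketch.
conn = psycopg2.connect(
    host="examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="masteruser",
    password="REPLACE_ME",
)
conn.autocommit = True
cur = conn.cursor()

# Load pipe-delimited log files from an S3 prefix in parallel across the
# cluster's nodes. The bucket, prefix and credentials are illustrative.
cur.execute(
    "COPY weblogs "
    "FROM 's3://example-logs-bucket/weblogs/' "
    "CREDENTIALS 'aws_access_key_id=AKIAEXAMPLE;aws_secret_access_key=EXAMPLEKEY'"
)

cur.close()
conn.close()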

Petabyte-scale data warehouses no longer need to command retail prices upwards of $80,000 per core. You don’t have to negotiate an enterprise deal and work hard to get the 60 to 80% discount that always seems magically possible in the enterprise software world. You don’t even have to hire a team of administrators. Just load the data and get going. Nice to see.


<Return to section navigation list>
