Sunday, March 04, 2012

Windows Azure and Cloud Computing Posts for 2/24/2012+

A compendium of Windows Azure, Service Bus, EAI & EDI, Access Control, Connect, SQL Azure Database, and other cloud-computing articles.


Note: This post includes articles published on 2/24/2012 through 3/2/2012 while I was preparing for and attending the Microsoft Most Valuable Professionals (MVP) Summit 2012 in Seattle, WA, and the following weekend.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue and Hadoop Service

Denny Lee (@dennylee) posted BI and Big Data–the best of both worlds! on 2/29/2012:

As part of the excitement of the Strata Conference this week, Microsoft has been talking about Big Data and Hadoop. It started off with Dave Campbell’s question: Do we have the tools we need to navigate the New World of Data? Some of the tooling call-outs specific to Microsoft include references to PowerPivot, Power View, and the Hadoop JavaScript framework (Hadoop JavaScript – Microsoft’s VB shift for Big Data).

As noted in GigaOM’s article Microsoft’s Hadoop play is shaping up, and it includes Excel, the great call-out is:

to make Hadoop data analyzable via both a JavaScript framework and Microsoft Excel, meaning many millions of developers and business users will be able to work with Hadoop data using their favorite tools.

Big Data for Everyone!

The title of the Microsoft BI blog post says it best: Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop – it’s about helping make Big Data accessible to everyone by using one of the most popular and powerful BI tools – Excel.

So what does accessible to everyone mean – in the BI sense? It’s about being able to go from this (which is a pretty nice view of a Hive query against the Hadoop on Azure Hive Console)


and getting it into Excel or PowerPivot.

The most important call-out here is that you can use PowerPivot and Excel to merge data sets not just from Hadoop, but also to bring in data sets from SQL Server, SQL Azure, PDW, Oracle, Teradata, Reports, Atom feeds, text files, other Excel files, and via ODBC – all within Excel! (thanks @sqlgal for that reminder!)

From here users can manipulate the data using Excel macros and the PowerPivot DAX language, respectively. Below is a screenshot of data extracted from Hive and placed into PowerPivot for Excel.


Even cooler – data-visualization wise – once your PowerPivot for Excel workbook is uploaded to SharePoint 2010 with SQL Server 2012, you can create an interactive Power View report.


For more information on how to get PowerPivot and Power View to connect to Hadoop (in this case, it’s Hadoop on Azure, but conceptually they are the same), please reference the links below:
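Under the covers, PowerPivot and Excel reach Hive through the Hive ODBC driver. As a rough orientation (this sketch is not from Denny’s post), querying Hive over ODBC from .NET looks something like the following; the DSN, table and column names are made up for the example:

using System;
using System.Data.Odbc;

class HiveOdbcSample
{
    static void Main()
    {
        // "HadoopOnAzureHive" is a hypothetical DSN configured for the Hive ODBC driver
        using (var conn = new OdbcConnection("DSN=HadoopOnAzureHive"))
        {
            conn.Open();

            // Hypothetical table and columns; any HiveQL aggregate works the same way
            var hiveQl = "SELECT country, COUNT(*) FROM weblogs GROUP BY country";
            using (var cmd = new OdbcCommand(hiveQl, conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("{0}: {1}", reader[0], reader[1]);
                }
            }
        }
    }
}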

So what’s so Big about Big Data?

As noted in the post What’s so Big about Big Data?, Big Data is important because of the sheer amount of machine-generated data that needs to be made sense of.

As noted by Alexander Stojanovic (@stojanovic), the Founder and General Manager of Hadoop on Windows and Azure:

It’s not just your “Big Data” problems, it’s about your BIG “Data Problems”

To learn more, check out my 24HOP (24 Hours of PASS) session:

Tier-1 BI in the Age of Bees and Elephants

In this age of Big Data, data volumes become exceedingly large while the technical problems and business scenarios become more complex. This session provides concrete examples of how these can be solved. Highlighted will be the use of Big Data technologies including Hadoop (elephants) and Hive (bees) with Analysis Services. Customer examples including Klout and Yahoo! (with their 24TB cube) will highlight both the complexities and the solutions to these problems.

Making this real, a great case study showcasing this is the one at Klout, described in the blog post Big Data, Bigger Brains. And below is a link to Bruno Aziza (@brunoaziza) and Dave Mariani’s (@dmariani) YouTube video on how Klout Leverages Hadoop and Microsoft BI Technologies To Manage Big Data.


Avkash Chauhan (@avkashchauhan) explained Primary Namenode and Secondary Namenode configuration in Apache Hadoop in a 2/27/2012 post:

Apache Hadoop Primary Namenode and secondary Namenode architecture is designed as below:

Namenode Master:

The conf/masters file defines the master nodes of any single or multi-node cluster. On the master, conf/masters looks like this:

localhost

The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run. When the master box also acts as a Hadoop slave, you will see the same hostname listed in both the masters and slaves files.

On the master, conf/slaves looks like this:

localhost

If you have additional slave nodes, just add them to the conf/slaves file, one per line. Be sure that your namenode can ping the machines listed in your slaves file.

Secondary Namenode:

If you are building a test cluster, you don’t need to set up the secondary namenode on a different machine (much like a pseudo-distributed install). However, if you’re building out a real distributed cluster, moving the secondary namenode to another machine is a great idea: you can have the secondary Namenode on a different machine than the primary NameNode in case the primary Namenode goes down.

The masters file contains the name of the machine where the secondary namenode will start. In case you have modified the scripts to change your secondary namenode details, i.e. location & name, be sure that when the DFS service starts it reads the updated configuration script so it can start the secondary namenode correctly.

In a Linux-based Hadoop cluster, the secondary namenode is started by bin/start-dfs.sh on the nodes specified in the conf/masters file. Initially, bin/start-dfs.sh calls bin/hadoop-daemons.sh, where you specify the name of the masters/slaves file as a command-line option.

Start Secondary Name node on demand or by DFS:

The location of your Hadoop conf directory is set using the $HADOOP_CONF_DIR shell variable. Different distributions, i.e. Cloudera or MapR, have set it up differently, so have a look at where your Hadoop conf folder is.

To start the secondary namenode on any machine, use the following command:

$HADOOP_HOME/bin/hadoop --config $HADOOP_CONF_DIR secondarynamenode

When the secondary namenode is started by DFS, it works as below:

$HADOOP_HOME/bin/start-dfs.sh starts SecondaryNameNode

>>>> "$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts masters start secondarynamenode

In case you have changed the secondary namenode hosts file name to, say, “hadoopsecondary”, then when starting the secondary namenode you would need to provide that hosts file name, and be sure these changes are picked up when bin/start-dfs.sh starts by default:

"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hosts hadoopsecondary start secondarynamenode

which will start the secondary namenode on ALL hosts specified in the file “hadoopsecondary”.

How Hadoop DFS Service Starts in a Cluster:


In Linux based Hadoop Cluster:

  1. Namenode Service: starts the Namenode on the same machine from which we start DFS.
  2. DataNode Service: looks into the slaves file and starts the DataNode on all slaves using the following command:
     #> $HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode
  3. SecondaryNameNode Service: looks into the masters file and starts the SecondaryNameNode on all hosts listed in the masters file using the following command:
     #> $HADOOP_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode

Alternative to backup Namenode or Avatar Namenode:

The secondary namenode is created as a backup to the primary namenode, to keep the cluster going in case the primary namenode goes down. There are alternatives to the secondary namenode available in case you want to build namenode HA. One such method is to use an avatar namenode. An avatar namenode can be created by migrating the namenode to an avatar namenode, and the avatar namenode must be built on a separate machine.

Technically, once migrated, the avatar namenode is a hot standby for the namenode, so the avatar namenode is always in sync with the namenode. If you create a new file on the master namenode, you can also read it from the standby avatar namenode in real time.

In standby mode, the avatar namenode is a read-only namenode. At any given time you can transition the avatar namenode to act as the primary namenode; when needed, you can switch from standby mode to full active mode in just a few seconds. To do that, you must have a VIP for namenode migration and an NFS share for namenode data replication.

The Namenode Master paragraph is a bit mysterious.


Wely Lau (@wely_live) explained Uploading Big Files in Windows Azure Blob Storage using PutListBlock in a 2/25/2012 post:

Windows Azure Blob Storage can be thought of as a file system in the cloud. It enables us to store any unstructured data file such as text, images, video, etc. In this post, I will show how to upload a big file into Windows Azure Storage. Please be informed that we will be using block blobs in this case. For more information about block blobs and page blobs, please visit here.

I assume that you know how to upload a file to Windows Azure Storage. If you don’t, I recommend checking out this lab from the Windows Azure Training Kit.

Uploading a blob (commonly-used technique)

The following snippet shows how to upload a blob using a commonly-used technique, blob.UploadFromStream(), which eventually invokes the Put Blob REST API.

// fu is assumed to be an ASP.NET FileUpload control on the page
protected void btnUpload_Click(object sender, EventArgs e)
{
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    var blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("image2");
    container.CreateIfNotExist();

    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    string name = fu.FileName;
    CloudBlob blob = container.GetBlobReference(name);
    blob.UploadFromStream(fu.FileContent);
}

The above code snippet works well in most cases. However, it can upload at most 64 MB per file (for a block blob), so for larger files it’s recommended to use another technique, which I am going to describe in more detail.

Uploading a blob by splitting it into chunks and calling PutBlockList

The idea of this technique is to split a block blob into smaller chunks (blocks), upload them one-by-one or in parallel, and eventually join them all by calling PutBlockList().

protected void btnUpload_Click(object sender, EventArgs e)
{
    CloudBlobClient blobClient;
    var storageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
    blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
    container.CreateIfNotExist();

    var permission = container.GetPermissions();
    permission.PublicAccess = BlobContainerPublicAccessType.Container;
    container.SetPermissions(permission);

    string name = fu.FileName;
    CloudBlockBlob blob = container.GetBlockBlobReference(name);

    int maxSize = 1 * 1024 * 1024; // 1 MB threshold; larger files are uploaded in blocks below

    if (fu.PostedFile.ContentLength > maxSize)
    {
        byte[] data = fu.FileBytes;
        int id = 0;
        int byteslength = data.Length;
        int bytesread = 0;
        int index = 0;
        List<string> blocklist = new List<string>();
        int numBytesPerChunk = 250 * 1024; //250KB per block

        // Upload full 250 KB blocks while more than one chunk of data remains
        do
        {
            byte[] buffer = new byte[numBytesPerChunk];
            int limit = index + numBytesPerChunk;
            for (int loops = 0; index < limit; index++)
            {
                buffer[loops] = data[index];
                loops++;
            }
            bytesread = index;
            string blockIdBase64 = Convert.ToBase64String(System.BitConverter.GetBytes(id));

            blob.PutBlock(blockIdBase64, new MemoryStream(buffer, true), null);
            blocklist.Add(blockIdBase64);
            id++;
        } while (byteslength - bytesread > numBytesPerChunk);

        // Upload whatever is left over as the final (smaller) block
        int final = byteslength - bytesread;
        byte[] finalbuffer = new byte[final];
        for (int loops = 0; index < byteslength; index++)
        {
            finalbuffer[loops] = data[index];
            loops++;
        }
        string blockId = Convert.ToBase64String(System.BitConverter.GetBytes(id));
        blob.PutBlock(blockId, new MemoryStream(finalbuffer, true), null);
        blocklist.Add(blockId);

        blob.PutBlockList(blocklist);
    }
    else
        blob.UploadFromStream(fu.FileContent);
}
Explanation of the code snippet

Since the idea is to split the big file into chunks, we need to define the size of each chunk, in this case 250 KB. By dividing the actual size by the size of each chunk, we know how many chunks we need.


We also need a list of strings (in this case, the blocklist variable) to record which blocks belong to one group. Then we loop through each chunk, upload it by calling blob.PutBlock(), and add its ID (as a Base64 string) to the blocklist.

Note that there’s a left-over block that isn’t uploaded inside the loop; we need to upload it separately. When all blocks are successfully uploaded, we finally call blob.PutBlockList(), which commits all the blocks that we’ve uploaded previously.

Pros and Cons
The benefits (pros) of the technique

There are a few benefits to using this technique:

  • In the event that uploading one of the blocks fails due to some condition like a connection time-out, lost connection, etc., we just need to re-upload that particular block, not the entire big file / blob.
  • It’s also possible to upload the blocks in parallel, which might result in a shorter upload time (see the sketch after this list).
  • The first technique only allows you to upload a block blob of at most 64 MB. With this technique, you can go far beyond that limit.
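Here is a rough sketch of that parallel variant (an illustrative sketch, not from the original post), using the same 250 KB block size and the 2012-era StorageClient library as the snippets above:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.StorageClient;

public static class ParallelBlockUploader
{
    // Uploads 250 KB blocks in parallel, then commits them in index order.
    public static void Upload(CloudBlockBlob blob, byte[] data)
    {
        const int numBytesPerChunk = 250 * 1024;
        int blockCount = (data.Length + numBytesPerChunk - 1) / numBytesPerChunk;
        var blockIds = new string[blockCount];

        Parallel.For(0, blockCount, i =>
        {
            int offset = i * numBytesPerChunk;
            int size = Math.Min(numBytesPerChunk, data.Length - offset);

            // BitConverter always yields 4 bytes, so every block ID has the same
            // length, which the blob service requires within a single blob.
            string blockId = Convert.ToBase64String(BitConverter.GetBytes(i));
            blockIds[i] = blockId;

            blob.PutBlock(blockId, new MemoryStream(data, offset, size), null);
        });

        blob.PutBlockList(blockIds); // commits the blocks in index order
    }
}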
The drawbacks (cons) of the technique

Despite the benefits, there are also a few drawbacks:

  • You have more code to write. As you can see from the sample, you can simply call the one-line blob.UploadFromStream() in the first technique, but you need to write 20+ lines of code for the second technique.
  • It incurs more storage transactions, which may lead to higher cost in some cases. Referring to a post by the Azure Storage team: the more chunks you have, the more storage transactions are incurred.

Large blob upload that results in 100 requests via PutBlock, and then 1 PutBlockList for commit = 101 transactions

Summary

I’ve shown you how to upload a file with a simple technique at the beginning. Although it’s easy to use, it has a few limitations. The second technique (using PutBlockList) is more powerful, as it can do more than the first one. However, it certainly has some pros and cons as well.


<Return to section navigation list>

SQL Azure Database, Federations and Reporting

Cihan Biyikoglu (@cihangirb) started a Scale-First Approach to Database Design with Federations: Part 1 – Picking Federations and Picking the Federation Key series on 2/29/2012:

Scale-out with federations means you build your application with scale in mind as your first and foremost goal. Federations are built around this idea and allow applications to annotate their schema with additional information declaring their scale-out intent. Generating your data model and your database design are basic parts of app design. At this point in the app lifecycle, it is also time to pick your federations and federation keys and work the scale-first principles into your data and database design.

In a series of posts, I’ll walk through the process of designing, coding and deploying applications with federations. If you’d like to design a database with scalability in mind using sharding [as] the technique, this post will help you get there as well... This is a collection of my personal experience designing a number of sharded databases in a past life, the experience of some of the customers in the preview program we started back in June 2011, plus the experiences of customers who are going live with federations.

In case you missed earlier posts, here is a quick definition of federation and federation key:

A federation is an object defined to scale out parts of your schema. Every database can have many federations. Federations use federation members, which are regular SQL Azure databases, to scale out your data in one or many tables and all the associated rich programming properties like views, indexes, triggers and stored procs. Each federation has a name and a federation key, which is also called a distribution scheme. The federation key, or federation distribution scheme, defines three properties (a rough DDL sketch follows the list below):

  • A federation key label, to help self-document the meaning of the federation key, like tenant_id or product_id, etc.
  • A data domain to define the distribution surface for your data. In v1, the supported data domains are INT, BIGINT, UNIQUEIDENTIFIER (guid) and VARBINARY – up to 900 bytes.
  • A distribution style, to define how the data is distributed over the data domain. At this point the distribution style can only be RANGE.
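For illustration, here is a minimal sketch of how those three properties show up in the DDL, issued from ADO.NET against the root database. The federation name, key label and connection string are made up, not taken from the post:

using System;
using System.Data.SqlClient;

class CreateFederationSample
{
    static void Main()
    {
        // Hypothetical SQL Azure connection string; run this against the root database
        const string connStr =
            "Server=tcp:yourserver.database.windows.net;Database=SalesDb;" +
            "User ID=yourlogin@yourserver;Password=<password>;Encrypt=True;";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Federation name = CustomerFederation, key label = cid,
            // data domain = BIGINT, distribution style = RANGE
            using (var cmd = new SqlCommand(
                "CREATE FEDERATION CustomerFederation (cid BIGINT RANGE)", conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
    }
}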

Let’s assume you are designing the AdventureWorks (AW) database. You have a basic idea of the entities at play, like Stores, Customers, Orders, Products, SalesPeople, etc., and are thinking about how you’d like to scale it out…

Picking your Federations:

Picking your federations and federation keys is much like other steps of the data modeling and database design process, like normalization. You can come up with multiple alternatives, each optimizing for different situations. You can get to your final design only by knowing the transactions and queries that are critical to your app.

Step #1: Identify entities you want to scale out: You first identify the entities (collections of tables) in your database that are going to be under pressure and need scale-out. These entities are the candidate federations in your design.

Step #2: Identify tables that make up these entities: After identifying the central entities you want to scale out, like customer, order and product, it is fairly easy to traverse the relationships and identify the groups of objects tied to these entities through relationships, access patterns and properties. Typically these are groups of tables with one-to-one or one-to-many relationships.

Picking your Federation Key:

Step #3: Identify the Federation Key: The federation key is the key used for distributing the data and defines the boundary of atomic units. Atomic units are the rows across all scaled-out tables (federated tables) that share the same federation key instance. One important rule in federations is that an atomic unit cannot be SPLIT. Ideal federation keys have the following properties:

  1. The atomic unit is the target of most query traffic & transaction boundaries.
  2. It distributes the workload equally to all members – decentralizing load across many atomic units as opposed to concentrating the load.
  3. Atomic units cannot be split, so the largest atomic unit must not exceed the boundaries of a single federation member.

Walking through AW

The database design for AW is something I hope you are already familiar with. You can find details on the schema for SQL Azure here: http://msftdbprodsamples.codeplex.com/releases/view/37304. For this app we want to be able to handle millions of customers, 100 million orders, and millions of products. These are the largest entities in our database. I’ll add a few more details on the workload to help guide our design; here are our most popular transactions and queries;

‘place an order’, ‘track/update orders’, ‘register/update a customer’, ‘get customer orders’,’get top ordering customers’, ‘register/update products’, ’get top selling products’

and here are some key transactions and consistent query requirements;

‘place an order’, ‘import/export orders for a customer and stores’, ‘monthly bill for customers’

Step #1: We have the classic sales database setup with customer, order and product in AW. In this example, we expect orders to be the most active part; those tables will be the target of most of our workload. We expect many customers and also need to handle cases where there are large product catalogs.

Step #2: In AW, the Store and Customer tables make up the customer entity. The SalesTerritory, SalesOrderHeader and SalesOrderDetail tables contain the properties of orders. The Customer and Order entities have a one-to-many relationship. On the other hand, the Product entity has a many-to-many relationship back to the Order and Customer entities. When scaling out, you can align one-to-one and one-to-many relationships together but not many-to-many relationships. Thus we can only group Customers and Orders together, but not Products.

Step #3: Given the Customer (Store and Customer tables) and Order (SalesOrderHeader, SalesOrderDetail, SalesTerritory) entities, we can think of a few setups for the federation key.

- StoreID as the Federation Key:

  • Well, that would work for all transactions, so principle #1 of an ideal federation key is taken care of! That is great.
  • However, stores vary in size and may not distribute the load well if the customer and order traffic variance between stores is too wide. Not so great on principle #2 of ideal federation keys.
  • StoreID as a federation key means all customers and all their orders in a store form a single atomic unit (AU). If you have stores that could get large enough to challenge the computational capacity of a single federation member, you will hit the ‘split the atom’ case and get stuck, because you cannot.

So StoreID may be too coarse a granule to distribute load equally, and it may produce too large an AU if a store gets ‘big’. By the way, TerritoryID is a similar alternative with a very similar set of issues, so the same argument applies to it as well.

- OrderID as the Federation Key:

  • OrderID certainly satisfies principles #2 and #3 of an ideal federation key but has an issue with #1, so let’s focus on that.
  • It could work as well, but it is too fine a granule for queries that deal with orders per customer. It also won’t align with the transactional requirements of importing/exporting customer and store orders. Another important note: with this setup, we will need a separate federation for the Customer entity. It means that queries that may be common, like ‘get all orders of a customer’ or ‘get order of customer dated X/Y/Z’, will need to hit all, or at least multiple, members. Also, with this setup we lose the ability to transact multiple orders from a customer, which we may want to do when importing or exporting a customer’s orders.

Fan-out is not necessarily bad. It promotes parallel execution and can provide great efficiencies. However, efficiencies are lost when we hit all members and can’t fast-eliminate members that don’t have any data to return, and when the cost of parallelization overwhelms the processing of the query. With OrderID as the federation key, queries like ‘get orders of a customer’ or ‘get top products per customer’ will have to hit all members.

- OrderDetailID as the Federation Key:

  • The case of OrderDetailID has the same issues as the OrderID case above, but amplified for principle #1. With this setup, we lose the transactional boundary to place a single order in one transaction. There will also be more queries that need to be fanned out, like ‘get all customer orders’ or ‘get an order’… it makes assembling an order a full fan-out query, which can get quite expensive.

- CustomerID as the Federation Key:

  • With CustomerID, principles #2 and #3 will not likely be an issue. The only case is where a customer gets so large that it overwhelms the computational capacity of a member. For most cases, CustomerID is a great way to decentralize the load, getting around the issues StoreID or TerritoryID would create.
  • This setup also satisfies almost all of #1, with two exceptions. One is ‘get top selling products across all customers’; however, that case isn’t satisfied by any of the other alternatives either. This setup does focus the DML (INSERT/UPDATE/DELETE) transactions and allows both multiple-order and single-order placement transactions to work seamlessly, so it looks like a good choice from a #1 standpoint. The second exception is import/export at the store boundary; for example, importing all of a store’s customers and orders will not be possible in a single transaction with this setup. Some people may be able to live with consistency at the customer level and be OK relaxing consistency at the store level. You need to ask yourself: can you work with eventual consistency, without transactions at the store level, by depending on things like datetime stamps or some other store-level sequence generator? If you can, this is the right choice.

How about Product Entity?

We have not touched on how to place products in this case. To remind you of the issue: the Product entity has a many-to-many relationship with Orders and Customers, so it cannot be aligned with a federation built on orders and/or customers. Well, you have 3 choices when it comes to products.

- Products as a Central Table: In this case, you’d leave the Product entity in the root database. That risks making the root and products a bottleneck, especially if you have a fast-updating catalog of products and you don’t build caching facilities to minimize hitting the root for product information for popular queries and transactions in the system. The advantage of this setup is that the product entity can be maintained in a single place with full consistency.

- Product as a Reference Table: In this case, you would place a copy of the Product entity in each federation member. You pay more for storing this redundant information, updating the product catalog has to be done across many copies of the data, and you need to live with eventual consistency on the product entity across members – that is, at any moment in time, the copies of products in members 1 & 2 may not be identical. The upside is that this gives you good performance, such as local joins.

- Product as a Separate Federation: In this case, you have a separate federation with a key like ProductID that holds the Product entity – all the tables associated with it. Products stay fully consistent, there is no redundancy, and you set up products with great scale characteristics. You can independently decide how many nodes to engage and choose to expand if you run out of computational capacity for processing product queries. The downside compared to the reference table option is that you no longer enjoy local joins.

To tie all this together: given the constraints, most people will choose CustomerID as the federation key and place customers and orders in the same federation, and given the requirement to handle large catalogs, most people will choose a separate federation for products.
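To make that concrete, here is a rough sketch (not from the post) of how an application connection gets routed to the federation member that owns a single customer’s atomic unit; the federation name, key label, schema and credentials are all hypothetical:

using System;
using System.Data.SqlClient;

class FederationRoutingSample
{
    static void Main()
    {
        // Hypothetical connection string; connect to the federation root database first
        const string connStr =
            "Server=tcp:yourserver.database.windows.net;Database=AdventureWorksFed;" +
            "User ID=yourlogin@yourserver;Password=<password>;Encrypt=True;";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Route this connection to the member that owns customer 1234.
            // FILTERING = ON scopes queries to that single atomic unit.
            using (var route = new SqlCommand(
                "USE FEDERATION CustomerFederation (cid = 1234) WITH RESET, FILTERING = ON", conn))
            {
                route.ExecuteNonQuery();
            }

            // Ordinary queries now run against the customer's federated tables
            using (var query = new SqlCommand("SELECT COUNT(*) FROM SalesOrderHeader", conn))
            {
                Console.WriteLine("Orders for customer 1234: {0}", query.ExecuteScalar());
            }
        }
    }
}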

In this case, I chose a challenging database schema to highlight the variety of options; for readily partition-able workloads life isn’t this complicated. However, for sophisticated schemas with a variety of relationships, there are pros and cons to evaluate, much like in other data and database design exercises. It is clear that designing with federations puts ‘scale’ as the top goal and concern, and that may mean you accept some compromises on transactions and query processing.

In part 2, I’ll cover the schema changes that you need to make with federations, which is step 2 in designing or migrating an existing database design over to federations.


Steve Jones (@way0utwest) asserted The Cloud is good for your career in a 2/29/2012 post to the Voice of the DBA blog:

I think most of us know that the world is not really a meritocracy. We know that the value of many things is not necessarily intrinsic to the item; value is based on perception. That’s a large part of the economic theory of supply and demand. The more people want something, the more it should cost.

I ran across a piece that surveyed some salaries and it shows that people working with cloud platforms are commanding higher salaries, even when the underlying technologies are the same. It seems crazy, but that’s the world we live in. Perceptions drive a lot of things in the world, especially the ones that don’t seem to make sense.

Should you specialize in cloud technologies? In one sense, the platform is not a lot different than what you might run in your local data center. Virtualized machines, connections and deployments to remote networks, and limited access to the physical hardware. There are subtle differences, and learning about them and working with them, could be good for your next job interview, when the HR person or technology-challenged manager asks you about Azure.

Part of your career is getting work done, using your skills and talents in a practical and efficient manner. However a larger part of it, IMHO, is the marketing of your efforts and accomplishments. The words you choose, the way you present yourself, these things matter. I would rather be able to talk about SQL Azure as a skill, and then relate that to local Hyper-V installations of SQL Server than the other way around.

If you are considering new work, or are interested in the way cloud computing might fit into an environment, I’d urge you to take a look at SQL Azure or the Amazon web services. You can get a very low cost account for your own use, and experiment. You might even build a little demo for your next interview that impresses the person who signs the paychecks.


Cihan Biyikoglu (@cihangirb) asked What would you like us to work on next? in a 2/28/2012 post:

10 weeks ago we shipped federations in SQL Azure and it is great to see the momentum grow. This week, I'll be spending time with our MVPs in Redmond and we have many events like this where we get together with many of you to talk about what you'd like to see us deliver next on federations. I'd love to hear from the rest of you who don't make it to Redmond or to one of the conferences to talk to us; what would you like us to work on next?

If you like what you see today, what would make it even better? If you don't like what you see today, why? What experience would you like to see taken care of, simplified? Could be big or small. Could be NoSQL or NewSQL or CoSQL or just plain vanilla SQL. Could be in APIs, tools or in Azure or in the SQL Azure gateway or the fabric or the engine? Could be the obvious or the obscure...

Open field; fire away and leave a comment or simply tweet #sqlfederations and tell us what you'd like us to work on next...


Peter Laudati (@jrzyshr) posted Get Started with SQL Azure: Resources on 2/28/2012:

Earlier this month, SQL Azure prices were drastically reduced, and a new 100 MB for $5 a month pricing level was introduced. This news has certainly gotten some folks looking at SQL Azure for the first time. I thought I’d share some resources to help you get started with SQL Azure.

Unlike the new “Developer Centers” for .NET, Node.js, Java, and PHP on WindowsAzure.com, there does not appear to be a one-stop shop for finding all of the information you’d need or want for SQL Azure. The information is out there, but it is spread around all over the place.

I’ve tried to organize these into three high-level categories based on the way one might think about approaching this platform:

  • What do I need to know about SQL Azure to get started?
  • Can I migrate my data into it?
  • How do I achieve scale with it?

Note: This is by no means an exhaustive list of every SQL Azure resource out there. You may find (many) more that I’m not aware of. You also may come across documentation & articles that are older and possibly obsoleted by new features. Be wary of articles with a 2008 or 2009 date on the byline.

Getting Started
Understanding Storage Services in Windows Azure

Storage is generally provided “as-a-service” in Windows Azure. There is no notion of running or configuring your own SQL Server ‘server’ in your own VM. Windows Azure takes the hassle of managing infrastructure away from you. Instead, storage services are provisioned via the web-based Windows Azure management portal, or using other desktop-based tools. Like in a restaurant, you essentially look at a menu and order what you’d like.

Storage services in Windows Azure are priced and offered independently of compute services. That is, you do not have to host your application in Windows Azure to use any of the storage services Windows Azure has to offer. For example, you can host an application in your own datacenter, but store your data in Azure with no need to ever move your application there too. Exploring Windows Azure’s storage services is an easy (and relatively low-cost) way to get started in the cloud.

There are currently three flavors of storage available in Windows Azure:

  • Local storage (in the compute VMs that host your applications)
  • Non-relational storage (Blobs, Tables, Queues – a.k.a. “Windows Azure Storage”)
  • Relational storage (SQL Azure)

For a high-level overview of these, see: Data Storage Offerings in Windows Azure

Note: This post is focused on resources for SQL Azure only. If you’re looking for information on the non-relational storage services (Blobs, Tables, Queues), this post is “not the droids you’re looking for”.

Getting Started With SQL Azure

I started my quest to build this post at the new WindowsAzure.com (“new” in December 2011). Much of the technical content for the Windows Azure platform was reorganized into this new site. Some good SQL Azure resources are here. Others are still elsewhere. Let’s get started…

Start here:

  • What is SQL Azure? – This page on WindowsAzure.com explains what SQL Azure is and the high-level scenarios it is good for.
  • Business Analytics – This page on WindowsAzure.com explains SQL Azure Reporting at a high-level.
  • Overview of SQL Azure – This whitepaper is linked from the “Whitepapers” page on WindowsAzure.com. It appears to be older (circa 2009), but it provides a still-relevant overview of SQL Azure
  • SQL Azure Migration Wizard (Part 1): SQL Azure – What Is It? – This screencast on Channel 9 by my colleagues Dave Bost and George Huey may have “Migration Wizard” in the title, but it goes here in the first section. These guys provide a good high-level overview of what SQL Azure is.

Get your hands dirty with the equivalent of a “Hello World” example:
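As a quick orientation before diving into the tools (this sketch is not from Peter’s post), a first connection from C# looks roughly like the following; the server name, database and credentials are placeholders for the values shown in the Windows Azure management portal:

using System;
using System.Data.SqlClient;

class SqlAzureHelloWorld
{
    static void Main()
    {
        // Placeholder server, database and credentials -- copy the real connection
        // string from the Windows Azure management portal ("View connection strings")
        const string connStr =
            "Server=tcp:yourserver.database.windows.net,1433;Database=HelloAzureDb;" +
            "User ID=yourlogin@yourserver;Password=<password>;Encrypt=True;";

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (var cmd = new SqlCommand("SELECT @@VERSION", conn))
            {
                Console.WriteLine(cmd.ExecuteScalar());
            }
        }
    }
}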

BIG FLASHING NOTE #1:

You must have SQL Server Management Studio 2008 R2 SP1 to manage a SQL Azure database! SSMS 2008 and SSMS 2008 R2 are just NOT good enough. If you don’t have SSMS 2008 R2 SP1, it will cause a gap in the space time continuum! The errors you will receive if you don’t have SSMS 2008 R2 SP1 are obscure and not obvious indicators of the problem. You may be subject to losing valuable hours of your personal time seeking the correct solution. Be sure you have the right version.

BIG FLASHING NOTE #2:

You CAN run SQL Server Management Studio 2008 R2 SP1 even if you’re NOT running SQL Server 2008 R2 SP1. For example, if you need to still run SQL Server 2008 R2, 2008, or older edition, you can install SSMS 2008 R2 SP1 side-by-side without impacting your existing database installation. Disclaimer: Worked on my machine.

BIG FLASHING TIP:

How can I get SQL Server Management Studio 2008 R2 SP1?

Unfortunately, I found it quite difficult to parse through the documentation to find the proper download for this. Searching for “ssms 2008 r2 sp1 download” on Google or Bing will present you with Microsoft Download Center pages that have multiple file download options. I present you with two options here:

  • Microsoft SQL Server 2008 R2 SP1 – Express Edition – This page contains multiple download files to install the Express edition of SQL Server. The easiest thing to do here is download either SQLEXPRWT_x64_ENU.exe or SQLEXPRWT_x86_ENU.exe depending on your OS-version (32 vs 64 bit). These files contain both the database and the management tools. When you run the installation process, you can choose to install ONLY the management tools if you don’t want to install the database on your machine.
  • Microsoft SQL Server 2008 R2 SP1 – This page contains multiple download files to install just SP1 to an existing installation of SQL Server 2008 R2. If you already have SQL Server Management Studio 2008 R2, you can run SQLServer2008R2SP1-KB2528583-x86-ENU.exe or SQLServer2008R2SP1-KB2528583-x64-ENU.exe, depending on your OS version (32 or 64 bit) to upgrade your existing installation to SP1.

The reason I call so much attention to this issue is because it is something that WILL cause you major pain if you don’t catch it. While some documents call out that you need SSMS 2008 R2 SP1, many do not provide the proper download links and send you on a wild goose chase looking for them. Thank me. I’ll take a bow.

The next place I recommend spending time reading is the SQL Azure Documentation on MSDN.

Content here is broken down into three high-level categories:

You can navigate the tree on your own, but some topics of interest might be:

  • SQL Azure Overview - I’m often asked what’s the difference between SQL Azure & SQL Server. This sheds some light on that.
  • Guidelines & Limitations – This gets a little more specific on SQL Server features NOT supported on SQL Azure.
  • Development: How-to Topics – There’s a smorgasbord of “How To” links here on how to connect to SQL Azure from different platforms and technologies.
  • Administration – All the details you need to know to manage your SQL Azure databases. See the “How-to” sub-topic for details on things like backing up your database, importing/exporting data, managing the firewall, etc.
  • Accounts & Billing in SQL Azure – Detailed info on pricing & billing here. (Be sure to see my post clarifying some pricing questions I had.)
  • Tools & Utilities Support – Many of the same tools & utilities you use to manage SQL Server work with SQL Azure too. This is a comprehensive list of them and brief overview of what each does.

Windows Azure Training Kit – No resource list would be complete without the WATK! The WATK contains whitepapers, presentations, demo code, and labs that you can walkthrough to learn how to use the platform. This kit has grown so large, it has its own installer! You can selectively install only the documentation and sample labs that you want. The SQL Azure related content here is great!

Migrating Your Data

The resources in the previous section should hopefully give you a good understanding of how SQL Azure works and how to do most basic management of it. The next task most folks want to do is figure out how to migrate their existing databases to SQL Azure. There are several options for doing this.

Start here: Migrating Databases to SQL Azure – This page in MSDN provides a high-level overview of the various options.

Three migration tools you may find yourself interested in:

  • SQL Azure Migration Wizard – The SQLAzureMW is an open source project on CodePlex. It was developed by my colleague George Huey. This is a MUST have tool in your toolbox!

    SQLAzureMW is designed to help you migrate your SQL Server 2005/2008/2012 databases to SQL Azure. SQLAzureMW will analyze your source database for compatibility issues and allow you to fully or partially migrate your database schema and data to SQL Azure.

    SQL Azure Migration Wizard (SQLAzureMW) is an open source application that has been used by thousands of people to migrate their SQL database to and from SQL Azure. SQLAzureMW is a user interactive wizard that walks a person through the analysis / migration process.

  • SQL Azure Migration Assistant for MySQL
  • SQL Azure Migration Assistant for Access

Channel 9 SQL Azure Migration Wizard (Part 2): Using the SQL Azure Migration Wizard - For people who are new to SQL Azure and just want to get an understanding of how to get a simple SQL database uploaded to SQL Azure, George Huey & Dave Bost did a Channel 9 video on the step-by-step process of migrating a database to SQL Azure with SQLAzureMW. This is a good place to get an idea of what’s involved.

Tips for Migrating Your Applications To The Cloud – MSDN Magazine article by George Huey & Wade Wegner covering the SQLAzureMW.

Overview of Options for Migrating Data and Schema to SQL Azure – I found this Wiki article on TechNet regarding SQL Azure Migration. It appears to be from 2010, but with updates as recent as January 2012. The information here appears valid still.

Scaling with SQL Azure

Okay, you’ve figured out how to get an account and get going. You’re able to migrate your existing databases to SQL Azure. Now it’s time to take it to the next level: Can you scale?

Just because you can migrate your existing SQL Server database to SQL Azure doesn’t mean it will scale the same. SQL Azure is a multi-tenant “database-as-a-service” that is run in the Azure datacenters on commodity hardware. That introduces a new set of concerns regarding performance, latency, competition with other tenants, etc.

I recommend watching this great video from Henry Zhang at TechEd 2011 in Atlanta, GA:

There is a 150GB size limit on SQL Azure databases (recently up from 50GB). So what do you do if your relational needs are greater than that limit? It’s time to learn about the art of sharding and SQL Azure Federations. While SQL Azure may take away the mundane chores of database administration (clustering/replication/etc.), it does introduce problems which require newer skillsets to solve. This is a key example of that.

Start off by watching this video by Cihan Biyikoglu from TechEd 2011 in Atlanta, GA:

In this talk Cihan explains what a database federation is and how federations work in SQL Azure.

Note: This talk is from May 2011, when SQL Azure Federations were only available as a preview/beta. The SQL Azure Federations feature was officially released into production in December 2011, so there may be variances between the May video and the current service features. He released a short updated video here.

Next read George Huey’s recent MSDN Magazine Article:

  • SQL Azure: Scaling Out with SQL Azure Federation
    In this article, George covers the what, the why, and the how of SQL Azure Federations, and how the SQL Azure Migration Wizard and SQL Azure Federation Data Migration Wizard can help simplify the migration, scale-out, and merge processes. This article is geared to architects and developers who need to think about using SQL Azure and how to scale out to meet user requirements. (Cihan, from the previous video, was a technical reviewer on George’s article!)

Follow that up George Huey as a guest on Cloud Cover Episode #69:

  • Channel 9 Episode 69 - SQL Azure Federations with George Huey – George covers a lot of the same information with Wade Wegner in a Channel 9 Cloud Cover session on SQL Azure Federations. As in the article above, this video covers what SQL Azure Federations is, the process of migrating / scaling out using SQL Azure Federations, and some of the architectural considerations during the design process.

As a follow-up to the SQL Azure Migration Wizard, George has also produced another great tool:

  • SQL Azure Federation Data Migration Wizard (SQLAzureFedMW)
    SQL Azure Federation Data Migration Wizard simplifies the process of migrating data from a single database to multiple federation members in SQL Azure Federation.
    SQL Azure Federation Data Migration Wizard (SQLAzureFedMW) is an open source application that will help you move your data from a SQL database to (1 to many) federation members in SQL Azure Federation. SQLAzureFedMW is a user interactive wizard that walks a person through the data migration process.

That about wraps up my resource post here. Questions? Feedback? Leave it all below in the comments! Hope this helped you on your way to learning SQL Azure.


Gregory Leake posted Protecting Your Data in SQL Azure: New Best Practices Documentation Now Available on 2/27/2012:

We have just released new best practices documentation for SQL Azure under the MSDN heading SQL Azure Business Continuity. The documentation explains the basic SQL Azure disaster recovery/backup mechanisms, and best practices to achieve advanced disaster recovery capabilities beyond the default capabilities. This new article explains how you can achieve business continuity in SQL Azure to enable you to recover from data loss caused by failure of individual servers, devices or network connectivity; corruption, unwanted modification or deletion of data; or even widespread loss of data center facilities. We encourage all SQL Azure developers to review this best practices documentation.


<Return to section navigation list>

MarketPlace DataMarket, Social Analytics, Big Data and OData

My (@rogerjenn) Windows 8 Preview Engagement and Sentiment Analysis with Microsoft Codename “Social Analytics” post of 3/4/2012 begins:

The VancouverWindows8 data set created by the Codename “Social Analytics” team and used by my Microsoft Codename “Social Analytics” Windows Form Client demo application is ideal for creating a time series of Twitter-user engagement and sentiment during the first few days of availability of the Windows 8 preview bits.

Following is a screen capture showing summary engagement (tweet count per day) and sentiment (Tone) data for 481,098 tweets, retweets and replies about Windows 8 from 2/12 to 3/4/2012:


Note: The execution time to retrieve the data was 04:55:05. Hour data is missing from the text box. (This problem has been corrected in the latest downloadable version.) The latest data (for 3/4/2012) isn’t exact because it’s only for 23 hours.

The abrupt increase in Tweets/Day from an average of 24,205 to more than 100,000 on 2/29/2012 is to be expected because Microsoft released the Windows 8 Consumer Preview and Windows 8 Server Beta on that date.

For more information about the downloadable “Social Analytics” sample app, see the following posts (in reverse chronological order):


Paul Patterson described Microsoft LightSwitch – Consuming OData with Visual Studio LightSwitch 2011 in a 3/3/2012 post:

The integrated OData (Open Data Protocol) data source features included with Microsoft Visual Studio 2011 (Beta) LightSwitch look exciting.

Just how simple is it to consume OData with LightSwitch? Check it out…

Launch the new Visual Studio 2011 Beta and select to create a new LightSwitch project. I named mine LSOData001.

Next, in the Solution Explorer, Right-Click the Data Sources node and select Add Data Source…

Then, from the Attach Data Source Wizard, select OData Service from the Choose a Data Source Type dialog…

Microsoft has produced a great resource for OData developers. www.OData.org is where you’ll find some terrific information about all things OData, including a directory of OData publishers.

For this example post, I wanted to try out the Northwind OData service that is available. So, in the connection information dialog that resulted from the step above, I entered the source address for the OData service, and then tested the connection…

With a successful connection, I closed the Test Connection dialog and then clicked Next. This action opens the Choose your Entities dialog. From that dialog I selected the Customers AND the Orders (not shown below). Make sure to also select Orders.

Clicking the Finish button (above) causes LightSwitch to evaluate the entities selected. In the example I am using, the Customers and Orders entities each have some constraints that need to be included as part of the operation to attach the datasource. Shippers, Order_Details, and Employees are entities that need to be included in the operation. Also, note the warning about the many-to-many relationship that is not supported.

Next, I click the Continue button to… continue :)

…and PRESTO! The entity designer appears, showing the Customer entity.

Next I quickly add a List Detail screen so that I can actually see that the binding works when I run the application.

From the entity designer, I click “Screen…”…

Then I define what kind of screen, including the data source, to create…

I click the OK button (above), which results in the display of the screen designer. And being the over-enthusiastic sort of fella I am, I click the start button just to give myself that immediate rush of accomplishment….

Way COOL!!… er, I mean, yeah that’s what I knew would happen!

This opens a lot of opportunity, this whole OData goodness. I have lots of new ideas for posts now.


Glenn Gailey (@ggailey777) posted Blast from the Past: Great Post on Using the Object Cache on the Client on 3/2/2012:

In the course of trying to answer a customer question on how to leverage the WCF Data Services client for caching, I came across a great blog post on the subject—which I had almost completely forgotten about:

Working with Local Entities in Astoria Client

(This post—by OData developer Peter Qian—is, in fact, so old that it refers to WCF Data Services by its original code name “Astoria.”)

The customer was looking for a way to maintain a read-only set of relatively static data from an OData service in memory so that this data could be exposed by his web app. As Peter rightly points out, the best thing to do is use a NoTracking merge option when requesting the objects. In this case, the object data is available in the Entities collection of the DataServiceContext and can be exposed in various ways. The entity data stored by the context is wrapped in an EntityDescriptor that includes the entity’s tracking and metadata, so some fancier coding is involved to expose this cached data as an IQueryable<T>, which has the LINQ-iness that we all really want.

Just re-read the post yourself, and see if you agree with me that it’s a rediscovered gem for using the WCF Data Services client.
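A minimal sketch of the general idea follows (this is not Peter’s original code, and it simplifies things by caching into a plain local list); the service URI, entity set name and Category type are made up:

using System;
using System.Collections.Generic;
using System.Data.Services.Client;
using System.Linq;

// Hypothetical entity shape; "CategoryID" follows the client's key-naming convention
public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
}

public static class CategoryCache
{
    private static List<Category> _cache;

    public static IQueryable<Category> Categories
    {
        get
        {
            if (_cache == null)
            {
                var ctx = new DataServiceContext(new Uri("http://example.com/Catalog.svc/"));

                // NoTracking: the context materializes the objects but does not track
                // changes, which is all we need for read-only, relatively static data
                ctx.MergeOption = MergeOption.NoTracking;

                _cache = ctx.CreateQuery<Category>("Categories").ToList();
            }

            // Callers get LINQ to Objects over the cached copy, not service round trips
            return _cache.AsQueryable();
        }
    }
}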

Tip:

Remember to try and avoid the temptation to use IQueryable<T> collections on the client as the data source for another OData feed (using the reflection provider). This kind of republishing using the WCF Data Services client can get you into a lot of problems. This is because the WCF Data Services client library does not support the full set of LINQ queries used by an OData service.

David Campbell asked Do we have the tools we need to navigate the New World of Data? in a 2/29/2012 post to the SQL Server Team blog:

Last October at the PASS Summit we began discussing our strategy and product roadmap for Big Data including embracing Hadoop as part of our data platform and providing insights to all users on any data. This week at the Strata Conference, we will talk about the progress we have been making with Hortonworks and the Hadoop ecosystem to broaden the adoption of Hadoop and the unique opportunity for organizations to derive new insights from Big Data.

In my keynote tomorrow, I will discuss a question that I’ve been hearing a lot from the customers I’ve been talking to over the past 18 months, “Do we have the tools we need to navigate the New World of Data?” Organizations are making progress at learning how to refine vast amounts of raw data into knowledge to drive insight and action. The tools used to do this, thus far, are not very good at sharing the intermediate results to produce “Information piece parts” which can be combined into new knowledge.

I will share some of the innovative work we’ve been doing both at Microsoft and with members of the Hadoop community to help customers unleash the value of their data by allowing more users to derive insights by combining and refining data regardless of the scale and complexity of data they are working with. We are working hard to broaden the adoption of Hadoop in the enterprise by bringing the simplicity and manageability of Windows to Hadoop based solutions, and we are expanding the reach with a Hadoop based service on Windows Azure. Hadoop is a great tool but, to fully realize the vision of a modern data platform, we also need a marketplace to search, share and use 1st and 3rd party data and services. And, to bring the power to everyone in the business, we need to connect the new big data ecosystem to business intelligence tools like PowerPivot and Power View.

There is an amazing amount of innovation going on throughout the ecosystem in areas like stream processing, machine learning, advanced algorithms and analytic languages and tools. We are working closely with the community and ecosystem to deliver an Open and Flexible platform that is compatible with Hadoop and works well with leading 3rd party tools and technologies enabling users of non-Microsoft technologies to also benefit from running their Hadoop based solutions on Windows and Azure.

We have recently reached a significant milestone in this journey, with our first series of contributions to the Apache Hadoop projects. Working with Hortonworks, we have submitted a proposal to the Apache Software Foundation for enhancements to Hadoop to run on Windows Server and are also in the process of submitting further proposals for a JavaScript framework and a Hive ODBC Driver.

The JavaScript framework simplifies programming on Hadoop by making JavaScript a first class programming language for developers to write and deploy MapReduce programs. The Hive ODBC Driver enables connectivity to Hadoop data from a wide range of Business Intelligence tools including PowerPivot for Excel.

In addition, we have also been working with several leading Big Data vendors like Karmasphere, Datameer and HStreaming and are excited to see them announce support for their Big Data solutions on our Hadoop based service on Windows Server & Windows Azure.

Just 10 years ago, most business data was locked up behind big applications. We are now entering a period where data and information become “first class” citizens. The ability to combine and refine these data into new knowledge and insights is becoming a key success factor for many ventures.

A modern data platform will provide new capabilities including data marketplaces which offer content, services, and models; and it will provide the tools which make it easy to derive new insights taking Business Intelligence to a whole new level.

This is an exciting time for us as we also prepare to launch our data platform that includes SQL Server 2012 and our Big Data investments. To learn more on what we are doing for Big Data you can visit www.microsoft.com/bigdata.


OData.org published OData Service Validation Tool Update: Ground work for V3 Support on 2/28/2012:

OData Service Validation Tool was updated with 5 new rules and the groundwork for supporting OData V3 services:

  • 4 new common rules
  • 1 new entry rule
  • Added version annotations to rules to differentiate rules by version
  • Modified the rule selection engine to support rule versions
  • Changed the rule engine API to support rule versions
  • Changed the database schema to log request and response headers
  • Added a V3 checkbox for online and offline validation scenarios

This update brings the total number of rules in the validation tool to 152. You can see the list of the active rules here and the list of rules that are under development here.

Moreover, with this update the validation tool is positioned to support rules for OData V3 payload validation.

Keep on validating and let us know what you think either on the mailing list or on the discussions page on the Codeplex site.


The Microsoft BI Team (@MicrosoftBI) asked Do We Have The Tools We Need to Navigate The New World of Data? in a 2/28/2012 post:

Today, with the launch of the 2012 Strata Conference, the SQL Server Data Platform Insider blog is featuring a guest post from Dave Campbell, Microsoft Technical Fellow in Microsoft’s Server and Tools Business. Dave discusses how Microsoft is working with the big data community and ecosystem to deliver an open and flexible platform as well as broadening the adoption of Hadoop in the enterprise by bringing the simplicity and manageability of Windows to Hadoop based solutions.

Be sure to also catch Dave Campbell’s Strata Keynote streaming live via http://strataconf.com/live at 9:00am PST Wednesday February 29th, 2012 where he will cover the topics mentioned in the SQL Server DPI blog post in more detail. [See post above.]

Finally, to learn more about Microsoft’s Big Data Solutions, visit www.microsoft.com/bigdata.


The Microsoft BI Team (@MicrosoftBI) reported Strata Conference Is Coming Up! on 2/27/2012:

The O'Reilly Strata Conference is the place to be to find out how to make data work for you. It starts tomorrow, Tuesday, February 28th, and goes through Thursday, March 1st. We're excited to be participating in this great event. We'll be covering a wide range of hot topics including Big Data, unlocking insights from your data, and self-service BI.

If you're attending the event, you'll want to check out our 800 sq. ft. booth. We'll be at #301, and will have 3 demo pods featuring our products and technologies, including what's new with Microsoft SQL Server 2012 (such as Power View), as well as Windows Azure Marketplace demos. We have a run-down of the sessions we'll be speaking at, but first, a special note about February 29: from 1:30-2:10 PM, Microsoft Technical Fellow Dave Campbell will be hosting Office Hours. This is your chance to meet with the Strata presenters in person. Dave Campbell is a Technical Fellow working in Microsoft's Server and Tools business. His current product development interests include cloud-scale computing, realizing value from ambient data, and multidimensional, context-rich computing experiences. For those attending the conference, there is no need to sign up for this session. Just stop by the Exhibit Hall with your questions and feedback.

For those who aren’t able to attend in person, we’ll have representatives from the Microsoft BI team tweeting from the event. Be sure you are following @MicrosoftBI on Twitter so you can participate. Start submitting your questions today to @MicrosoftBI and we will incorporate questions into the Office Hours for Dave to answer.

Be sure to also catch Dave Campbell’s Strata Keynote streaming live via http://strataconf.com/live at 9:00am PST Wednesday February 29th, 2012. Mac Slocum will also be interviewing Dave at 10:15 A.M. PST on February 29th. You can watch this interview at the link above as well.

Now, a summary of what we’ll be presenting on!

SQL and NoSQL Are Two Sides Of The Same Coin

The nascent NoSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing NoSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor's product to another.

A necessary condition for the network effect to take off in the NoSQL database market is the availability of a common abstract mathematical data model and an associated query language for NoSQL that removes product differentiation at the logical level and instead shifts competition to the physical and operational level. The availability of such a common mathematical underpinning of all major NoSQL databases can provide enough critical mass to convince businesses, developers, educational institutions, etc. to invest in NoSQL.

In this article we developed a mathematical data model for the most common form of NoSQL—namely, key-value stores as the mathematical dual of SQL’s foreign-/primary-key stores. Because of this deep and beautiful connection, we propose changing the name of NoSQL to coSQL. Moreover, we show that monads and monad comprehensions (i.e., LINQ) provide a common query mechanism for both SQL and coSQL and that many of the strengths and weaknesses of SQL and coSQL naturally follow from the mathematics.

In contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus coSQL. While the coSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for “small” data as well. On the other hand, it is possible to scale SQL databases by careful partitioning.

What this all means is that coSQL and SQL are not in conflict, like good and evil. Instead they are two opposites that coexist in harmony and can transmute into each other like yin and yang. Because of the common query language based on monads, both can be implemented using the same principles.

Do We Have The Tools We Need To Navigate The New World Of Data?

In a world where data is increasing 10x every 5 years and 85% of that information is coming from new data sources, how do our existing technologies for managing and analyzing data stack up? This talk discusses some of the key implications that Big Data will have on our existing technology infrastructure and where we need to go as a community and ecosystem to make the most of the opportunity that lies ahead.

Unleash Insights On All Data With Microsoft Big Data

Do you plan to extract insights from mountains of data, including unstructured data that is growing faster than ever? Attend this session to learn about Microsoft’s Big Data solution that unlocks insights on all your data, including structured and unstructured data of any size. Accelerate your analytics with a Hadoop service that offers deep integration with Microsoft BI and the ability to enrich your models with publicly available data from outside your firewall. Come and see how Microsoft is broadening access to Hadoop through dramatically simplified deployment, management and programming, including full support for JavaScript.

Hadoop + JavaScript: what we learned

In this session we will discuss two key aspects of using JavaScript in the Hadoop environment. The first is how we can reach a much broader set of developers by enabling JavaScript support on Hadoop. The JavaScript fluent API, which works on top of other languages like PigLatin, lets developers define MapReduce jobs in a style that is much more natural, even for those who are unfamiliar with the Hadoop environment.

The second is how to enable simple experiences directly through an HTML5-based interface. The lightweight web interface gives developers the same experience they would get on the server, and provides a zero-installation experience across all client platforms. It also allowed us to use HTML5 support in the browsers to provide basic data visualization for quick data analysis and charting.

During the session we will also share how we used other open source projects like Rhino to enable JavaScript on top of Hadoop.
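
To make "defining a MapReduce job" concrete for readers who haven't touched Hadoop, here is a tiny, framework-agnostic word-count sketch in plain Python. It only illustrates the map and reduce phases conceptually; it is not the Hadoop JavaScript fluent API described in the abstract above.

    # Framework-agnostic sketch of the MapReduce pattern (word count).
    # Illustrative Python, not the Hadoop JavaScript fluent API.
    from collections import defaultdict

    def map_phase(lines):
        # A mapper emits (key, value) pairs - here, (word, 1).
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    def reduce_phase(pairs):
        # A reducer groups by key and aggregates the values.
        totals = defaultdict(int)
        for word, count in pairs:
            totals[word] += count
        return dict(totals)

    sample = ["Hadoop makes Big Data accessible", "Big Data needs big tools"]
    print(reduce_phase(map_phase(sample)))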

Democratizing BI at Microsoft: 40,000 Users and Counting

Learn how Microsoft manages a 10,000-person IT organization utilizing Business Intelligence capabilities to drive communication of strategy, performance monitoring of key analytics, employee self-service BI, and leadership decision-making throughout the global Microsoft IT organization. The session will focus on high-level BI challenges and needs of IT executives, Microsoft IT's BI strategy, and the capabilities that helped to drive BI internal use from 300 users to over 40,000 users (and growing) through self-service BI methodologies.

Data Marketplaces for your extended enterprise: Why Corporations Need These to Gain Value from Their Data

One of the most significant challenges faced by individuals and organizations is how to discover and collaborate with data within and across their organizations, which often stays trapped in application and organizational silos. We believe that internal data marketplaces or data hubs will emerge as a solution to this problem of how data scientists and other professionals can work together in a friction-free manner on data inside corporations and between corporations, and unleash significant value for all.

This session will cover this concept in two dimensions.

Piyush from Microsoft will walk through the concept of internal data markets – an IT-managed solution that allows organizations to efficiently and securely discover, publish and collaborate on data from various sub-groups within an organization, and from partners and vendors across the extended organization.

Francis, from ScraperWiki, will talk through stories of both how people have already used data hubs, and stories which give signs of what is to come. For example – how Australian activists use collaborative web scraping to gather a national picture of planning applications, and how Nike is releasing open corporate data to create disruptive innovation. There'll be a section where the audience can briefly tell how they use the Internet to collaborate on working with data, and the session ends with a challenge to use open data as a weapon.

Additional Information

For additional details on Microsoft’s presence at Strata and Big Data resources visit this link.

Finally, if you’re not able to attend Strata and want a fun way to interact with our technologies, be sure to participate in our #MSBI Power View Contest happening Tuesdays and Thursdays through March 6th 2012.


The Microsoft BI Team (@MicrosoftBI) posted Ashvini Sharma’s Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop on 2/24/2012:

In our second Big Data technology guest blog post, we are thrilled to have Ashvini Sharma, a Group Program Manager in the SQL Server Business Intelligence group at Microsoft. Ashvini discusses how organizations can provide insights for everyone from any data, any size, anywhere by using Microsoft’s familiar BI stack with Hadoop.

Whether through blogs, Twitter, or technical articles, you've probably heard about Big Data and the recognition that organizations need to look beyond traditional databases to achieve the most cost-effective storage and processing of extremely large data sets, unstructured data, and/or data that comes in too fast. As the prevalence and importance of such data increases, many organizations are looking at how to leverage technologies such as those in the Apache Hadoop ecosystem. Recognizing one size doesn't fit all, we began detailing our approach to Big Data at the PASS Summit last October. Microsoft's goal for Big Data is to provide insights to all users from structured or unstructured data of any size. While very scalable, accommodating, and powerful, most Big Data solutions based on Hadoop require highly trained staff to deploy and manage. In addition, the benefits are limited to a few highly technical users who are as comfortable programming their requirements as they are using advanced statistical techniques to extract value. For those of us who have been around the BI industry for a few years, this may sound similar to the early 90s, when the benefits of our field were limited to a few within the corporation through Executive Information Systems.

Analysis on Hadoop for Everyone

Microsoft entered the Business Intelligence industry to enable orders of magnitude more users to make better decisions from applications they use every day. This was the motivation behind being the first DBMS vendor to include an OLAP engine with the release of SQL Server 7.0 OLAP Services, which enabled Excel users to ask business questions at the speed of thought. It remained the motivation behind PowerPivot in SQL Server 2008 R2, a self-service BI offering that allowed end users to build their own solutions without dependence on IT, as well as provided IT with insights on how data was being consumed within the organization. And, with the release of Power View in SQL Server 2012, that goal will bring the power of rich interactive exploration directly into the hands of every user within an organization.

Enabling end users to merge data stored in a Hadoop deployment with data from other systems or with their own personal data is a natural next step. In fact, we also introduced the Hive ODBC driver, currently in Community Technology Preview, at the PASS Summit in October. This driver allows connectivity to Apache Hive, which in turn facilitates querying and managing large datasets residing in distributed storage by exposing them as a data warehouse.

This connector brings the benefits of the entire Microsoft BI stack and ecosystem to Hive. A few examples include (a minimal ODBC connection sketch in Python follows this list):

  • Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel
  • Build a PowerPivot workbook using data in Hive
  • Build Power View reports on top of Hive
  • Instead of manually refreshing a PowerPivot workbook based on Hive on their desktop, users can use PowerPivot for SharePoint's scheduled data refresh feature to refresh a central copy shared with others, without worrying about the time or resources it takes.
  • BI professionals can build BI Semantic Models or Reporting Services reports on Hive in SQL Server Data Tools
  • Of course all of the 3rd party client applications built on the Microsoft BI stack can now access Hive data as well!
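
The post itself doesn't include code, but once the Hive ODBC driver and a DSN are configured, any ODBC-capable client can issue HiveQL against Hive. Here is a minimal sketch using Python's pyodbc; the DSN name and table are hypothetical placeholders, not part of the original post. Excel and PowerPivot go through the same driver; they just hide the connection details behind their import dialogs.

    # Minimal sketch: query Hive through the Hive ODBC driver via a configured DSN.
    # "HiveSample" and "hivesampletable" are placeholder names.
    import pyodbc

    conn = pyodbc.connect("DSN=HiveSample", autocommit=True)
    cursor = conn.cursor()
    cursor.execute("SELECT country, COUNT(*) FROM hivesampletable GROUP BY country")
    for country, total in cursor.fetchall():
        print(country, total)
    conn.close()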

Klout is a great customer that's leveraging the Microsoft BI stack on Big Data to provide mission-critical analysis for both internal users and its customers. In fact, Dave Mariani, the VP of Engineering at Klout, has taken some time out to describe how they use our technology. This is recommended viewing, not just to see examples of possible applications but also to get a better understanding of how new options complement technology you are already familiar with. Dave also blogged about their approach here.

Best of both worlds

As we mentioned in the beginning of this blog article, one size doesn't fit all, and it's important to recognize the inherent strengths of the available options when choosing when to use what. Hadoop broadly provides:

  • an inexpensive and highly scalable store for data in any shape,
  • a robust execution infrastructure for data cleansing, shaping and analytical operations typically in a batch mode, and
  • a growing ecosystem that provides highly skilled users many options to process data.

The Microsoft BI stack is targeted at a significantly larger user population and provides:

  • functionality in tools such as Excel and SharePoint that users are already familiar with,
  • interactive queries at the speed of thought,
  • business layer that allows users to understand the data, combine it with other sources, and express business logic in more accessible ways, and
  • mechanisms to publish results for others to consume and build on themselves.

Successful projects may use both of these technologies in a complementary manner, like Klout does. Enabling this choice has been the primary motivator for providing Hive ODBC connectivity, as well as investing in providing a Hadoop-based distribution for Windows Server and Windows Azure.

More Information

This is an exciting field, and we’re thrilled to be a top-tier Elite sponsor of the upcoming Strata Conference between February 28th and March 1st 2012 in Santa Clara, California. If you’re attending the conference, you can find more information about the sessions here. We also look forward to meeting you at our booth to understand your needs.

Following that, on March 7th, we will be hosting an online event that will allow you to immerse yourself in the exciting New World of Data with SQL Server 2012. More details are here.

For more information on Microsoft’s Big Data offering, please visit http://www.microsoft.com/bigdata.


<Return to section navigation list>

Windows Azure Access Control, Service Bus and Workflow

Alan Smith reported PDF and CHM versions of Windows Azure Service Bus Developer Guide Available & Azure Service Bus 2-day Course in a 3/1/2012 post:

I've just added PDF and CHM versions of "Windows Azure Service Bus Developer Guide"; you can get them here.

The HTML browsable version is here.

I have the first delivery of my 2-day course “SOA, Connectivity and Integration using the Windows Azure Service Bus” scheduled for 3-4 May in Stockholm. Feel free to contact me via my blog if you have any questions about the course, or would be interested in an on-site delivery. Details of the course are here.


Alan Smith announced availability of this Windows Azure Service Bus Developer Guide in a 2/29/2012 post:

I've just published a web-browsable version of "Windows Azure Service Bus Developer Guide". "The Developers Guide to AppFabric" has been re-branded, and has a new title of "Windows Azure Service Bus Developer Guide". There is not that much new in the way of content, but I have made changes to the overall structure of the guide. More content will follow, along with updated PDF and CHM versions of the guide.


<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

Avkash Chauhan (@avkashchauhan) described Windows Azure CDN and Referrer Header in a 2/27/2012 post:

The Windows Azure CDN, like any other CDN, attempts to be a transparent caching layer: it keeps things transparent and doesn't care who the referring site might be. So it is correct to say that the Windows Azure CDN has no dependency on the Referrer header. Any client solution built around the Referrer header will have no direct impact on how the Windows Azure CDN works.

If you try to control CDN access based on the referrer, that may not be a good idea, because it is very easy to use tools like wget or curl to create an HTTP request that carries your referrer URL. The Windows Azure CDN does not publicly support any mechanism for authentication or authorization of requests. If a request is received, it's served. There is no way for the Windows Azure CDN to reject requests based on the Referrer header or on any other header.
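
To illustrate how trivially a referrer can be forged, here is a minimal sketch (not from Avkash's post) that requests a hypothetical CDN URL with a spoofed Referer header; the equivalent is a one-liner with curl's --referer or wget's --referer= option.

    # Sketch: any client can claim any referrer. The CDN URL below is a placeholder.
    import urllib.request

    req = urllib.request.Request(
        "http://az12345.vo.msecnd.net/images/logo.png",  # hypothetical CDN endpoint
        headers={"Referer": "http://www.example.com/"},  # spoofed referrer
    )
    with urllib.request.urlopen(req) as response:
        print(response.status, len(response.read()), "bytes")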


<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Himanshu Singh posted Real World Windows Azure: Interview with Rajasekar Shanmugam, CTO at AppPoint on 3/2/2012:

As part of the Real World Windows Azure interview series, I talked to Rajasekar Shanmugam, Chief Technology Officer at AppPoint, about using Windows Azure for its cloud-based process automation and systems integration services. Here's what he had to say.

Himanshu Kumar Singh: Tell me about AppPoint.

Rajasekar Shanmugam: AppPoint is an independent software vendor and we provide process automation and system integration services to enterprises. At the core of our solutions is a business-application infrastructure with a unique model-driven development environment to help simplify the development process. We originally developed our platform on Microsoft products and technologies, including the .NET Framework 4 and SQL Server 2008 data management software.

HKS: What led you to consider a move to the cloud?

RS: We traditionally used our platform to develop solutions that were deployed on-premises at customer locations, but we started seeing interest from our customers for cloud-based computing solutions. In fact, several of our customers had previously deployed AppPoint solutions by using third-party hosting providers, but with limited success. Some of these customers who attempted cloud-based deployment struggled to integrate the solution with their enterprise resource planning [ERP] systems, still had to invest in additional software for virtual machines, and didn't get the benefits of true cloud computing.

HKS: Tell me about the move to the cloud with Windows Azure.

RS: To meet customer demand for cloud-based process automation solutions, we developed a version of our existing business process automation platform, called AppsOnAzure, on Windows Azure. Windows Azure was the obvious choice for our platform, which is already built on the .NET Framework and offers a complete end-to-end architecture.

HKS: Can you tell me about a customer using AppsOnAzure?

RS: Yes, Biocon is a prominent biotechnology company in India that earned revenues of approximately U.S.$516 million in 2010. As part of its manufacturing process, Biocon manages artwork for drug labels, leaflets, and packages, which is regulated by the Food and Drug Administration and involves multiple stakeholders and approval cycles. This artwork management process was manual, time-consuming and sometimes error-prone, which led to expensive product recalls. To help speed the artwork process and minimize errors, Biocon turned to us for a cloud-based solution.

HKS: How did AppsOnAzure help Biocon?

RS: We used AppsOnAzure to develop an automated artwork management system for Biocon in only eight weeks. This system automates and controls the entire artwork life cycle process, from concept to product and eliminates the paper-based approach. Biocon employees can upload documents and files, which are processed through worker roles in Windows Azure and stored in Blob Storage. Biocon uses SQL Azure to store its relational data, including product information for the drugs it manufactures. We also implemented Windows Azure AppFabric Service Bus to connect to Microsoft Dynamics, the on-premises ERP system that Biocon uses.

HKS: Is AppsOnAzure useful for other companies?

RS: Yes. There has been a tremendous change in mindset, with customers moving their LOB/business-critical apps onto the cloud. Though the above solution was developed for Biocon, we've found that it is relevant across a variety of industries, and we already have several customers evaluating it. We license the solution on an unlimited-user basis, and customers can either sign up for Windows Azure themselves, paying for hosting and storage on a per-use basis directly with Microsoft, or they can choose to have AppPoint manage instances of Windows Azure for them and pay a flat-rate monthly subscription fee.

HKS: What are some of the benefits you’ve seen from using Windows Azure?

RS: We’ve already seen some dramatic benefits as a result of our move to Windows Azure. The first is speed to market: the development process is so straightforward with Windows Azure that we expect to have 20 more cloud-based process automation solutions ready for customers to evaluate in less than eight months. This faster time to market coupled with less investment in infrastructure also lowers customer barriers to adoption, which has led to more companies evaluating a move to the cloud with us.

And as a Microsoft Partner, we receive ongoing support from Microsoft, which helps us reach new customers. While some customers embrace the cloud, other customers can be more hesitant without proper education about the cloud-computing model. Being part of the Microsoft partner community and working with the Windows Azure Incubation sales team to help educate customers on the benefits of cloud computing has a tremendous influence on our ability to tap into new markets.

Read the full case study. Learn how other companies are using Windows Azure.


Yvonne Muench (@yvmuench) asked and answered Can the Cloud Help Me Thrive in a Down Market? in a 2/29/2012 post:

Ever wonder how that Washington Post magically appears outside the door of your hotel in downtown Tokyo? It's probably due to a Vancouver, Canada ISV called NewspaperDirect. But wait, isn't the newspaper industry in a downward spiral of declining circulation, declining ad rates, and shuttered daily newspapers? Many blame technology and the Internet. Yet NewspaperDirect is growing rapidly. And they're using Windows Azure. In fact, they're one of the largest users in Canada (read more in this recent case study). I reached out to Igor Smirnoff, VP of Digital, to learn what role Windows Azure plays in helping them change with the times.

It turns out their original business, started 12 years ago, is a digital printing platform – enabling those out-of-market newspapers to be delivered to your hotel halfway across the world. They are supplied on demand to hotels, airports, cruise ships, retail outlets, corporate and government offices, home subscribers etc., and sold via a traditional license to print. But it’s a niche product – it has a small customer base and a high price point. Not a growth business.

They also have a newer digital platform, started six years ago. As traditional papers scramble to figure out digital news delivery NewspaperDirect’s ePublishing platform, SmartEdition, lets them outsource to provide readers with a high quality, full-content digital replica of their printed edition. Today over a thousand newspaper publishers partner with NewspaperDirect for their ePaper solution, such as Canada’s Globe and Mail, UK’s The Times and just recently The New York Times to name a few. I mention large known brands, but really this platform allows the ISV to reach a broader market, including smaller regional and community newspapers. And unlike other ePublishing vendors who license software or charge development and operational fees, NewspaperDirect’s business model is revenue sharing with zero-operational cost to publishers. NewspaperDirect takes care of hosting, payments, customer service, and technical support.

But where it gets really interesting is their aggregated online newspaper kiosk, a global consumer offering called PressDisplay.com. This just happens to be the part of their business on Windows Azure. It’s like Netflix for newspapers. You don’t subscribe to a particular title. It’s all you can eat, from 2000+ newspapers, from 95 countries, covering 52 languages. A companion PressReader app pushes the content to your end device for anytime offline reading. PressDisplay is a subscription service where customers pay a monthly fee for unlimited access to the entire collection of titles. PressDisplay.com’s companion app, PressReader, is a top three media app in many European countries and the world’s most popular newspaper app for Android. Those troubled newspapers – they get to list for free. Any time their paper is opened by a reader they not only get a royalty fee, they also get credits from the Audit Bureau of Circulation for a “sold digital replica”. This improves their circulation numbers and helps prop up their ad rates. What a sweet deal! Much better than printing at money losing levels in order to get higher circulation numbers…

I recently had a vacation in Rome so I played around with it on my iPad. I browsed three newspapers with easy page flipping and zoom in/out functionality. As news unfolded I variously searched on Amanda Knox, Steve Jobs, and Silvio Berlusconi. Fascinating to compare coverage of Seattle’s own Knox in the Seattle Times versus the British Tabloids, and to read the tributes to Jobs from the Middle East. Berlusconi, on the other hand, didn’t fare too well no matter where I read about him… Editors, researchers and students could have a field day with this. Search engines only show articles freely available online; because many newspapers only have 30% - 60% of their print version online, that leaves a lot out. NewspaperDirect has 100% replicas. “All the news fit to print”, so to speak.

When NewspaperDirect switched to Windows Azure from third-party hosting, they enjoyed on-demand scalability and more time to focus on their core business. That's great, but it's a familiar story. Here's what I really love about how this company and their app use Windows Azure:

  • A bold bet. They moved the fastest growing part of their business to Windows Azure. You’ve got to admire that. Not long ago PressReader offered a few hundred titles available on computers and a limited number of phones; today more than 2000 publications are available on whatever device readers choose.
  • In-house innovation. They launch improvements on their Windows Azure-hosted kiosk first, and then into the white label ePapers. So in a sense they use Windows Azure and the kiosk as a public demo center, showing off the newest bells and whistles.
  • Any end device. Their app works on iOS, Kindle, Blackberry and Android handhelds and tablets… oh and of course PCs, smartphones, and Windows 7 Slate PCs. Other ISVs – are you listening? Windows Azure works really well on competitive end devices.
  • It helps traditional newspapers adapt and survive. This ISV’s Windows Azure-hosted offerings provide a chance for newspapers to keep up ad rates protecting traditional revenue sources, while modernizing content, increasing its availability, and making it look cool to new audiences like younger people, classrooms, early adopters. It offers things like translation on the fly, social media integration, and audio for blind readers.

I recently saw a quote in an article in The New Yorker by Arianna Huffington: “People love to talk about the death of newspapers, as if it’s a foregone conclusion. I think that’s ridiculous,” she says. “Traditional media just need to realize that the online world isn’t the enemy. In fact, it’s the thing that will save them, if they fully embrace it.”

I think she’s on to something. Technological change as the good guy. That’s refreshing.


Liam Cavanagh (@liamca) continued his series with What I Learned Building a Startup on Microsoft Cloud Services: Part 6 – Securing the service and building trust on 2/28/2012:

I am the founder of a startup called Cotega and also a Microsoft employee within the SQL Azure group where I work as a Program Manager. This is a series of posts where I talk about my experience building a startup outside of Microsoft. I do my best to take my Microsoft hat off and tell both the good parts and the bad parts I experienced using Azure.

I knew my next step was to get an SSL certificate for my domain so that I could secure communications through https and encrypt the user's connection information that is stored in the Cotega system database. Before I did this I needed to create a domain name. Unfortunately Windows Azure does not provide domain name registration or SSL certificates, so I decided to look into using GoDaddy, which is also where this blog is hosted. I found a domain name I liked, Cotega, through the GoDaddy auctions for about $10. I was also pretty lucky because there was a deal on the GoDaddy SSL Certificate for $12 / year. If I were to do this again and there was not a deal on the GoDaddy SSL certificate, I would probably take a closer look at FreeSSL's free SSL certificate, as I have read about other startups who chose this option.

Integrating SSL Certificates and CNames

Since I am hosting my web site on DiscountASP.NET, I needed to load the certificate there. To configure the domain name, you need to set up a CName. A CName is basically a DNS alias: when someone enters www.cotega.com into a browser, the name resolves in the background to my DiscountASP.NET-hosted IP address. This also works with the SSL certificate that is hosted with my web site, so that both https and http leverage this IP address.

If I had known at the start that I was going to host my web site on DiscountASP.net, I would have also used them to provide my domain name. Luckily I was able to use my previously created GoDaddy SSL Certificate and domain name with only minor configuration changes on DiscountASP.net.

Self Signed Certificates

The other thing that I wanted to use a certificate for is the encryption of users' connection strings that are stored in the Cotega system database. For this I wanted to have a really high level of encryption, so I chose to create an unsigned 2,048-bit MD5 X.509 certificate using the makecert.exe utility. You would not want to use a makecert-generated certificate for your SSL traffic, as browsers will reject an unsigned certificate. If you want to learn more about this, there is a great overview of the issues here. However, for encrypting the connection information within my MVC controller, this was a perfect solution: because I control the entire encryption process, the "man in the middle" attack does not apply, and I could implement a really high level of encryption for my users.
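
As a rough illustration of the approach described above (the actual implementation is .NET code inside an MVC controller using the makecert-generated certificate), here is a minimal Python sketch of encrypting a connection string with a certificate's public key, using the cryptography package; the file name and connection string are placeholders.

    # Sketch: encrypt a connection string with an X.509 certificate's public key
    # so only the holder of the private key can recover it. Assumes an RSA cert;
    # the file name and connection string are placeholders.
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    with open("encryption_cert.pem", "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read())

    connection_string = b"Server=tcp:example.database.windows.net;User ID=user;Password=secret"
    ciphertext = cert.public_key().encrypt(
        connection_string,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    # Store ciphertext in the system database; decrypt later with the private key.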

Working with Customers to Build Trust

One of the interesting differences I learned in building services at Microsoft vs. Cotega is in the area of trust. At both Microsoft and Cotega, I spend a lot of time making sure a user's information is secured. However, at Microsoft I did not have to spend as much time explaining this to customers. I think this comes from their past experiences with Microsoft, where they have learned of Microsoft's privacy and security policies and have grown over time to trust them. However, at Cotega, the service and company are new and have not built the level of trust that Microsoft has. As a result, I found that it was important for me to work closely with these customers to build that level of trust, which has primarily come from one-on-one discussions. A side benefit of this close relationship with customers is the early feedback I get on the service. But I will talk about that more later…


Ryan Dunn (@dunnry) described Setting Up Diagnostics Monitoring In Windows Azure in a 2/27/2012 post:

In order to actually monitor anything in Windows Azure, you need to use the Diagnostics Manager (DM) that ships out of the box. SaaS providers like AzureOps rely on this data in order to tell you how your system is behaving. The DM actually supports a few data sources that it can collect and transfer:

  • Performance Counters
  • Trace logs
  • IIS Logs
  • Event Logs
  • Infrastructure Logs
  • Arbitrary logs

One of the most common issues I hear from customers is that they don't know how to get started using the DM, or they think they are using it and just cannot find the data where they think it should be. Hopefully, this post will clear up a bit about how the DM actually works and how to configure it. The next post will talk about how to get the data once you are set up.

Setting Up the Diagnostics Manager

Everything starts by checking a box. When you check the little box in Visual Studio that says "Enable Diagnostics", it actually modifies your Service Definition to include a role plugin. Role plugins are little things that can add to your definition and configuration, similar to a macro. If you have ever used Diagnostics or the RDP capability in Windows Azure, you have used a role plugin. For most of their history, these plugins have been exclusively built by Microsoft, but there is really nothing stopping you from using them yourself (that is another topic).


If we check the 'diagnostics' folder under the plugins directory in the SDK folder, you will find the magic that is used to launch the DM.

<?xml version="1.0" ?>
<RoleModule xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition"
            namespace="Microsoft.WindowsAzure.Plugins.Diagnostics">
  <Startup priority="-2">
    <Task commandLine="DiagnosticsAgent.exe" executionContext="limited" taskType="background" />
    <Task commandLine="DiagnosticsAgent.exe /blockStartup" executionContext="limited" taskType="simple" />
  </Startup>
  <ConfigurationSettings>
    <Setting name="ConnectionString" />
  </ConfigurationSettings>
</RoleModule>

Here, we can see that the DM is implemented as a pair of startup tasks. Notice, it is using a task type of background (the other is blocking until it gets going). This means that the DM exists outside the code you write and should be impervious to your code crashing and taking it down as well. You can also see that the startup tasks listed here will be run with a default priority of -2. This just tries to ensure that they run before any other startup tasks. The idea is that you want the DM to start before other stuff so it can collect data for you.

You can also see in the definition the declaration of a new ConfigurationSettings element with a single Setting called 'ConnectionString'. If you are using Visual Studio when you import the Diagnostics plugin, you will see that the tooling automatically combines the namespace with the setting name and creates a new Setting called Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString. This setting will not exist if you are building your csdef or cscfg files by hand. You must remember to include it.

Once you have the plugin actually enabled in your solution, you will need to specify a valid connection string in order for the DM to operate. Here you have two choices:

  1. Running in emulation, it is valid to use "UseDevelopmentStorage=true" as ConnectionString.
  2. Before deploying to cloud, you must remember to update that to a valid storage account (i.e. "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=nrIXB.")
Common Pitfalls

It seems simple enough, but here come the first set of common pitfalls I see:

  1. Forgetting to set the ConnectionString to a valid storage account and deploying with 'UseDevelopmentStorage=true'. This has become less of a factor in 1.6+ SDK tooling because you will notice the checkbox that says, "Use publish storage account as connection string when you publish to Windows Azure". However, tooling will not help you here for automated deploys or when you forget to check that box.
  2. Using "DefaultEndpointsProtocol=http" in the connection string (note the missing 's' from 'https'). While it is technically possible to use the DM with an http connection, it is not worth it. Just use https and save yourself the hassle of troubleshooting this later. (A quick script that catches both this and pitfall #1 is sketched after this list.)
  3. Setting an invalid connection string. Hard to believe, but I see it all the time now on AzureOps. This usually falls into two categories: deleting a storage account, and regenerating a storage key. If you delete a storage account, but forget to remove that as the ConnectionString, things won't work (shocking, I know). Further, if you decide to regenerate the primary or secondary storage keys and you were using them, stuff won't work here either. Seems obvious, but you won't actually get any warning on this. Stuff won't work and you will have to figure that out yourself. A good 3rd party provider (like AzureOps) will let you know however.
  4. Forgetting to co-locate the diagnostics storage account with the hosted service. This one might not show itself until you see the bill. The diagnostics agent can be pretty chatty. I have seen GBs of data logged in a single minute. Forgetting to co-locate that would run you a pretty hefty bandwidth bill in addition to slowing you down.
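
As a quick guard against pitfalls #1 and #2 above, a pre-deployment script can scan the .cscfg for the diagnostics connection string. This is a sketch, not part of the original post; the file name is a placeholder.

    # Sketch: flag a diagnostics connection string that still points at development
    # storage or uses http instead of https. The .cscfg file name is a placeholder.
    import xml.etree.ElementTree as ET

    NS = "{http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration}"
    NAME = "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString"

    tree = ET.parse("ServiceConfiguration.Cloud.cscfg")
    for setting in tree.iter(NS + "Setting"):
        if setting.get("name") == NAME:
            value = setting.get("value", "")
            if "UseDevelopmentStorage=true" in value:
                print("WARNING: diagnostics still points at development storage")
            elif "DefaultEndpointsProtocol=http;" in value:
                print("WARNING: diagnostics connection string is not using https")
            else:
                print("Diagnostics connection string looks OK")
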
Best Practices

Setting up the Diagnostics Manager is not terribly hard, but it is easy to get wrong if you are not familiar with it. There are also some subtle ways you can shoot yourself in the foot here. Here are some things you can do that will make your life easier:

  1. Always separate your diagnostics storage account from other storage accounts. This is especially important for production systems. You do not want diagnostics competing with your primary storage account for resources. There is an account wide, 5000 transactions per second limit across tables, queues, and blobs. When you use a single storage account for both, you could unintentionally throttle your production account.
  2. If possible, use a different diagnostics storage account per hosted service. If that is not practical, at least try to separate storage accounts for production versus non-production systems. It turns out that querying diagnostics data can be difficult if there are many different systems logging to the same diagnostics tables. What I have seen many times is someone use the same diagnostics account for load testing against non-production systems as their production system. What happens is that the amount of data for the non-production system can greatly exceed the production systems. The query mechanism for then finding production data is akin to finding a needle in the haystack. It can take a very long time in some cases to query even for simple things.
  3. Don't use the Anywhere location. This applies to all storage accounts and all hosted services. This might seem obvious, but I see it all the time. It is possible to use the Anywhere location with affinity groups and avoid pitfall #4, but it is not worth the hassle. Additionally, if you have a 3rd party (like AzureOps) that is monitoring your data, we cannot geo-locate a worker next to you to pull your data. We won't know where you are located and it could mean big bandwidth bills for you.

At this point, if you have enabled the DM, and remembered to set a valid connection string, you are almost home. The last thing to do is actually get the data and avoid common pitfalls there. That is the topic for the next post.


The Windows Showcase Team posted a 00:03:06 Microsoft Codename Data Hub Overview video on 2/26/2012:



Kevin Remde (@KevinRemde) continued his series with Screencast: System Center 2012 Unified Installer (Part 2 of 3) on 3/1/2012:

Continuing where we left off in yesterday's part 1 screencast, in which we discussed the downloads, the prerequisites, and what you need in place for the installation, today in part 2 we discuss how to configure your servers.

“Is it really all that complicated, Kevin?”

Actually, yes. If you read the User Guide, you'll see that there are several things you need to do to prepare the servers prior to using the Unified Installer to do the deployment to them. So in today's screencast, I walk you through that process.

NOTE: It is best viewed in full screen at 1280x768 if you can.

Make sure you come back tomorrow for part 3, where we actually run the Unified Installer, and deploy the entire set of components in System Center into my test lab!

---

Did you find this useful? Are you doing this install along with me? Let us know in the comments. And I’ll see you back here tomorrow for Part 3.


Kevin Remde (@KevinRemde) started a series with Screencast: System Center 2012 Unified Installer (Part 1 of 3) on 2/29/2012:

Like many of you, I've been playing with the Release Candidate of System Center 2012. And also like many of you, I have heard that there is something called the "Unified Installer", that supposedly will allow a person to install many – or even all – of the components of System Center 2012 on to multiple servers.

“Really? That’s cool.”

Yeah, that's what I thought, too. But after looking at it for a bit, I came to realize quickly that it wasn't really as straightforward as just launching a Setup.exe and clicking Next –> Next –> Next –> Install –> Finish. In fact, it requires no small amount of thought and pre-work to get things downloaded, extracted, and configured properly prior to ever launching the Unified Installer.

In other words: It was a set of screencasts just screaming to be created.

So that’s what I’ve done. Here is Part 1, where I describe how to download and extract the components, plus download and prepare the prerequisites that are, um.. pre-required.

NOTE: It is best viewed in full screen at 1280x768 if you can.

As promised, here are the links mentioned in the screencast:

---

Did you find this useful? Are you going to do this install along with me? Let us know in the comments. And I’ll see you back here tomorrow for Part 2.

The Microsoft Server and Cloud Platform Team (@MSCloud) reported Brad Anderson's MMS 2012 Keynote Abstracts are Available! on 2/29/2012:

Check out the abstracts for Corporate Vice President Brad Anderson's MMS 2012 Keynotes:

Day 1: Microsoft Private Cloud. Built for the Future. Ready Now.
Cloud computing and the delivery of true IT as a Service is one of the most profound industry shifts in decades. Join Brad Anderson, Corporate Vice President of Microsoft's Management and Security Division, as he shares Microsoft's vision for cloud computing and shows how System Center 2012, as part of the Microsoft private cloud, will enable you to deliver the promise of cloud computing in your organization today.

Day 2: A World of Connected Devices
Clouds and cloud-connected devices are changing the world of work and our daily interactions. Tech-savvy and always-connected, people want faster, more intuitive technology, uninterrupted services, and the freedom to work anywhere, anytime, on a variety of devices. Join Brad Anderson, Corporate Vice President of the Management and Security Division at Microsoft to learn how System Center 2012 and Windows Intune can help IT embrace this new reality today, and in the future, by making the right intelligent infrastructure investments.

Watch the interview with Brad on TechNet Edge to learn more about why he is excited to be back at MMS 2012 again this year.


<Return to section navigation list>

Cloud Security and Governance

Richard Santalesa reported NIST Releases Public Draft SP800-53 Addressing Cybersecurity Threats & Privacy Controls in a 2/29/2012 post to the Info Law Group blog:

Yesterday the National Institute of Standards and Technology (NIST) released the 4th iteration of what will ultimately be a mainstay document for federal agencies required to comply with provisions of the Federal Information Security Management Act (FISMA) and FIPS 200. As a result it should have a significant effect on federal cloud security practices that will ultimately also affect commercial non-governmental cloud usage.

Weighing in at 375 pages, NIST's Special Publication 800-53, Rev. 4, entitled Security and Privacy Controls for Federal Information Systems and Organizations, is the first "public draft" of this revision. Previous iterations of parts of SP800-53 were released essentially piecemeal (i.e. Appendix J, Privacy Control Catalog, was earlier distributed separately, etc.). Given the breadth and scope of SP800-53, follow-up posts will examine specific notable sections of this important NIST SP. In addition, the public comment period for SP 800-53 runs until April 6, 2012. Comments may be sent via email to sec-cert@nist.gov.

According to NIST, the major changes in this latest public draft include:

  • New security controls and control enhancements;
  • Clarification of security control requirements and specification language;
  • New tailoring guidance including the introduction of overlays;
  • Additional supplemental guidance for security controls and enhancements;
  • New privacy controls and implementation guidance;
  • Updated security control baselines;
  • New summary tables for security controls to facilitate ease-of-use; and
  • Revised minimum assurance requirements and designated assurance controls.

NIST notes that "[m]any of the changes were driven by particular cyber security issues and challenges requiring greater attention including, for example, insider threat, mobile and cloud computing, application security, firmware integrity, supply chain risk, and the advanced persistent threat (APT)."

Interestingly, despite the cloud-heavy focus of many recent NIST SP's and reports, the release stresses that "in most instances, with the exception of the new privacy appendix, the new controls and enhancements are not labeled specifically as 'cloud' or 'mobile computing' controls or placed in one section of the catalog." In following posts I'll explore the ramifications of this orientation and examine why NIST's approach makes sense in light of the current infosec and threat landscape. We'll also dig through the expected additional markup versions of Appendices D, F and G following the comment period and Appendices E and J, containing security and privacy controls. Stay tuned.

To discuss the latest SP800-53 public draft or expected implications of the recommended controls on your entity's security and data infrastructure please feel free to contact me or any of the InfoLawGroup team of attorneys.

 


Integracon (@Integracon) reported Failure to Meet Requirements Hinders LA Cloud Implementation in a 2/29/2012 post:

After two years of trying and thousands of dollars in investment, the City of Los Angeles withdrew its plans to migrate the police department to Google's hosted email and office application.[1] Most of the city departments have already migrated, but the 13,000-member police force is forced to remain on Novell GroupWise because the cloud solution could not meet FBI requirements.

The city contracted with CSC to facilitate the systems integration in 2009, but both CSC and Google failed again and again to meet commitments to implement security requirements. This delay brought the migration of the LAPD to a standstill. Google and CSC both lost a percentage of service fees, and Google must pay for the Novell GroupWise service. The city is so frustrated that it is considering suing both organizations for failure to meet compliance requirements after assurances that the requirements would be met.

Google and CSC claim that the FBI requirements are not compatible with cloud implementations. They also claim that they were not aware of the FBI requirements when they committed to the project. But Jeff Gould, CEO of IT consulting firm Peerstone Research, suggests that their excuses are disingenuous. Google and CSC should have known the requirements clearly detailed in the CJIS policy document when they signed the contract.

This fiasco could have been prevented if the cloud providers had done the proper research to understand the exact requirements for the LAPD in advance. This raises a fundamental issue for cloud services. When a municipality contracts with a cloud vendor, it should make certain the provider fully understands the various requirements in advance. In fact, this is a lesson that any business should keep in mind when sourcing cloud services. Make sure that both the service that facilitates your systems integration and the cloud provider understand the full extent of the requirements in advance.

Integracon is a leader in cloud implementations because our engineers focus on details and delivery. Proper preparation and planning are essential for successful implementation, and meeting all requirements is not optional. To learn more about our depth of experience in cloud service implementation, contact Integracon today. We have the deep experience that is essential and can offer a list of satisfied customers who continue to rely on our service and support.

[1] Jaikumar Vijayan. “Los Angeles police drop plans to move to Google Apps.” December 27, 2011 http://news.techworld.com/data-centre/3326744/los-angeles-police-drop-plans-to-move-to-google-apps/?cmpid=TD1N10&no1x1&olo=daily%20newsletter.


Scott M. Fulton, III (@SMFulton3) described the Expert Panel at RSA 2012: Who's Responsible for Cloud Security? in a 2/27/2012 post to the ReadWriteCloud blog:

"Whose problem is this? Whose problem is a vulnerability in an app? Is it the app developers? Is it the service provider of the operating system? Or is it the distribution center of the application?"

These aren't questions presented to an expert panel by attendees at the Cloud Security Alliance Summit at RSA in San Francisco this morning. These are questions coming from that panel - specifically, from a professional security analyst whose firm openly experiments with app store security, including from Google's app stores for Android and Chrome OS.



Pictured above, from left to right: Philippe Courtot, CEO, Qualys; Don Godfrey, security consultant, Humana; Matt Johansen, Threat Research Center manager, WhiteHat Security; Patrick Harding, CTO, Ping Identity.

Matt Johansen runs the Threat Research Center for WhiteHat Security, a private analysis firm that specializes in determining the relative security characteristics of Web sites and Web apps on behalf of their proprietors. Sometimes their research extends outside the security of the app itself, and into the environment in which it's distributed and propagated.

Speaking as one of a powerhouse panel assembled by Qualys CEO Philippe Courtot, Johansen related some of WhiteHat's experiences with testing the fringes of Google security. He noted that consumers' expectations of responsibility are based on consumers' history - when someone buys tainted food, they blame the supermarket, even though legally the farmer may be at fault. Maybe there should be some sort of code review process at Google, he suggested.

Maybe. "When I was doing some research on the Chrome OS, we uploaded an extension to the Chrome Web store called, 'Malicious Extension,'" admitted Johansen. "There was absolutely no code review process there at all." The app contained fake buttons which read, "Steal cookie," and the like. For a while, it stayed available for download until WhiteHat took it down. But before that, he approached Google to demonstrate the problem and to ask them the string of questions that open this article.

"I've never gotten the same answer twice from anyone that I've asked," he remarked. "It's an interesting problem, and I think we're going to see it more and more. One of the scariest facts about it is, the iPad didn't exist more than two years ago... [So] we don't really know the answer to these problems. Who's problem is it to fix this vulnerability in an app that you're installing on your operating system, and that has permissions that it maybe shouldn't."

Everyone who's installed an app on a smartphone has seen the permissions screen which informs the user what kinds of information may be shared. A banking app should be expected to communicate a certain quantum of personal data, specifically with the bank. That's if the app works properly. If it doesn't, it may share something else instead. Or it may share the right data with the wrong source. If that ends up compromising the integrity of someone's bank accounts, who's responsible? It's such a new industry, Johansen pointed out, that the question really hasn't had time to be answered before the technology behind it became suddenly ubiquitous.

The Cloud as Agitator

To an ever-greater extent, the mobile app serves as a facilitator between a device and a cloud-based service. It's a "cloud" service, as opposed to a conventional Web server, because its structure is virtual, its location is variable, and the resources it provides are made to appear local - as though the user installed them on his phone.

That doesn't change everything, though, argued panelist and Ping Identity CTO Patrick Harding. "The cloud doesn't solve developers building insecure applications," Harding told the RSA audience Monday morning. "They're going to do that no matter what. What people are finding, though, is that SaaS applications [developers] specifically have a business incentive to seriously write secure applications. But as you drift down the stack, so to speak, the risk goes up. If you talk about IaaS and people deploying to the cloud there, you're not getting the same level of analysis and control as somebody like a Salesforce or a Google, or someone like that, might have."

Matt Johansen may have a different perspective. One service WhiteHat provides, for example, is asset discovery - taking inventory of a customer's digital resources. A Web app serves as the public doorway for data stored elsewhere, he explained. With respect to a vulnerability management job, WhiteHat often finds that its clients have no clue how many Web apps they have, nor how many Web sites they need the firm to analyze. "That seems to be one of the harder questions to answer for a lot of people," said Johansen, "and I think that's very telling. I think that's kinda scary. If you have a footprint on the Internet with your applications, and you don't even know the size of them, how are you going to manage every entry point into your data when you don't even know where the doors are?"

Ping's Patrick Harding took the opportunity of speaking before the CSA Summit to stomp just a bit further on one of his pet peeves: the growing uselessness of passwords as lynchpins for authenticity. Cloud computing only exacerbates this problem, Harding believes, because cloud-based resources typically require authentication.

"I actually think that passwords are the Achilles' heel of cloud security," Harding said, striking a familiar theme. "For all the money that people are going to spend on encrypting their data and putting Web app firewalls in front of them... if I can get your password from any one of the applications that you use, I've got instant access to all that data, essentially."

Harding noted that in his research, Web apps that use a person's e-mail address as her identifier (Google Apps being the most prominent of these) tend to provoke that person to utilize the same password for each app. One very dangerous discovery that Ping made, in conjunction with Google, is that when corporate e-mail addresses are used to identify apps users, the apps password ends up being the e-mail password.

"With the cloud, what you start to see is a lot more applications available for users. It's that much cheaper, it's that much quicker to deploy applications out in the cloud," stated Harding. "So there's just going to be more of them. Every one of those applications is going to end up being accessible from my laptop, from my mobile phone, from my iPad... it could be any point at any time. That whole anywhere, anytime access is just ending up forcing the exposure of login forms to the outside world."

Grafting Identity Back Onto APIs

One class of resource whose architects often eschew the need for identity and authentication is the API. A growing number of Web apps are actually remote clients for open APIs, as the panel acknowledged. Many architects believe anonymous access is a necessary feature of open APIs, and that security is a matter best addressed by security architects - API architects need to focus on providing the answer, not questioning the questioner.

I asked the CSA panelists how, if they were indeed the ones tasked with securing open APIs, they would approach that task without introducing identity back into the picture and wrecking the developers' vision of beauty through anonymity. Ping Identity's Patrick Harding commended me for asking a question that answered itself.

"API architects are in the wild, wild west," Harding responded. "They love it because it's simple and easy, and completely forget about securing them in any way at all. The only standards that exist in the REST world for security, up until the last two years, was HTTP basic, and SSL. The same stuff we've had for, I don't know, 20 years. It's crazy."

OAuth, which we've talked about here in RWW, does offer one method of trusting someone else with the task of authenticating and authorizing the user, thus giving API developers a way to take the subject off their hands without ignoring security altogether. Harding suggests more API architects look into OAuth. "It doesn't speak to, 'Is my API secure, per se?'" he noted. "How do I know that SQL injections aren't being slapped through that API effectively, via JSON messages?"

WhiteHat's Matt Johansen acknowledges OAuth adds identity to the mix, but endorses it as what needs to be done. "Tokenization and checking the source and destination... is adding identity to the problem," he said, "but it is helping solve it."
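To make the tokenization idea concrete, here is a minimal sketch - plain Python, with a purely hypothetical authorization server URL - of the pattern the panelists describe: an otherwise open REST endpoint delegates authentication to an OAuth 2.0 token introspection call and still validates its JSON payload rather than trusting it. It is an illustration of the approach, not any vendor's implementation (a real introspection call would also authenticate the API itself as a client):

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical authorization server endpoint (RFC 7662-style introspection).
    INTROSPECTION_URL = "https://auth.example.com/oauth2/introspect"

    def introspect(bearer_token: str) -> dict:
        """Ask the authorization server whether the token is active and for whom."""
        data = urllib.parse.urlencode({"token": bearer_token}).encode()
        request = urllib.request.Request(INTROSPECTION_URL, data=data, method="POST")
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    def handle_api_request(headers: dict, body: str) -> tuple:
        """Reject anonymous callers, then treat the payload as data, not as code."""
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer "):
            return 401, "missing bearer token"
        claims = introspect(auth[len("Bearer "):])
        if not claims.get("active"):
            return 401, "token expired or revoked"
        try:
            payload = json.loads(body)   # parse the JSON; never splice it into SQL strings
        except ValueError:
            return 400, "malformed JSON"
        # ...hand `payload` to parameterized queries only...
        return 200, "hello, " + str(claims.get("sub", "unknown subject"))

    # A request with no token is rejected before any network call is made:
    print(handle_api_request({}, "{}"))   # -> (401, 'missing bearer token')

The point Harding makes still stands: the token check answers "who is calling?", while the payload handling - parsing the JSON and using parameterized queries - is what keeps SQL injection out of the API.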

The Cloud Security Alliance holds its annual Summit event as part of the RSA Conference, complete with its own panel session, keynote speaker, and innovator awards.


<Return to section navigation list>

Cloud Computing Events

Bruce Kyle posted Western US Dev Camps Showcase Cloud, HTML5, Phone, Students on 3/1/2012:

Mostly hosted on university campuses across the West Region, these DevCamps give both student and professional developers the opportunity to choose a technology and start learning and building applications for it. Camps covering the web, phone, and cloud platforms will be presented simultaneously.

clip_image002

Developer Camps (DevCamps for short) are free, fun, no-fluff events for developers, by developers. Attendees learn from experts in a low-key, interactive way and then get hands-on time to apply what they’ve learned.

About Cloud Camp

At the Azure DevCamps, you’ll learn what’s new in developing cloud solutions using Windows Azure. Windows Azure is an internet-scale cloud computing and services platform hosted in Microsoft data centers. Windows Azure provides an operating system and a set of developer services used to build cloud-based solutions. The Azure DevCamp is a great place to get started with Windows Azure development or to learn what’s new with the latest Windows Azure features.

About HTML5 Web Camp

As developers, we keep hearing a lot about HTML5, but many of us don’t know what it actually means or what it is truly capable of. If you want to learn about it, your choices are limited: you can either pick up a book or attend an expensive conference that often only scratches the surface. The HTML5 Web Camp is an opportunity to connect with designers and developers, see what’s possible, and learn how you can start using HTML5 today. The HTML5 Web Camp is a completely free event that will start with demos and presentations and end with hands-on lab sessions, where you can walk through the materials and get your questions answered. Space is limited.

About Phone Camp

Take your apps to the next level at the Windows Phone Dev Camp. We’ll be covering how to bring the cloud to your app by consuming external APIs, add UX polish with Expression Blend, reach new markets through globalization and localization, and keep your app running at peak performance through advanced debugging and testing. In the afternoon, we’ll break into labs to reinforce what we covered, and offer a chance to present your application for a chance to win in our application competition.

Registration Links

Separate registration links are provided for the Cloud, HTML5, and Phone camps in each city:

  • Los Angeles – 3/30: Register | Register | Register
  • Irvine – 4/20: Register | Register | Register
  • Redmond – 4/27: Register | Register | Register
  • Los Altos Hills – Coming Soon
  • Denver – 5/18: Blog | Blog | Blog
  • Phoenix – 5/25: Register | Register | Register

If you’re a student interested in attending a DevCamp (Phone Camp, particularly), we’d love to have you join us for a night of preparation and pizza! We want to be sure you have all the tools you need to be successful with the professional developer community.

image

I’ll post links for the student registration events as soon as they’re available. In the meantime, mark your calendar.


Bruno Terkaly (@brunoterkaly) described Azure/Cloud, HTML5, Windows Phone Free Training Events Coming To Your City in a 2/29/2012 post:

Developer Camps (DevCamps for short) are free, fun, no-fluff events for developers, by developers. You learn from experts in a low-key, interactive way and then get hands-on time to apply what you’ve learned. Where else can you get so much good stuff all in a day?

Register today at the following links:

Each city offers Cloud, HTML5, and Windows Phone camps:

  • Los Angeles – March 30
  • Irvine – April 20
  • Redmond – April 27
  • Denver – May 18
  • Phoenix – May 25

Come prepared

We want you to hit the ground running.

image

Please come prepared.
You don’t want to waste your time at an Azure DevCamp downloading and installing files; setup can take quite some time and will cut into your productivity. I will also provide some posts for those who could not attend.

This post includes:

image

Azure DevCamp - Hardware: Minimum

The following minimum hardware is needed to install the required software:

image

** Important Note **

Running on Macintosh computers is not recommended; the Storage and Compute emulators do not always work properly there.

Azure DevCamp - Software: Supported Operating Systems

Don’t expect Windows XP to work. Here are the supported operating systems:

image

Visual Studio and SQL Server Express

Various versions of Visual Studio will work. Please be sure you have one of the following:

image

Free Trial Account / Azure SDK

Free Trial
image
image


Labs and PowerPoints

You are about to install the Windows Azure Camps Training Kit.

The Windows Azure Camps Training Kit includes the hands-on labs and presentations that are used for the Windows Azure Camp events.

Installer for Azure Web Camps: http://www.contentinstaller.net/Install/ContentGroup/WAPCamps

Video – to verify you are ready

The main thing to remember is that most of the labs can be done with the emulators, which means you don’t necessarily need to deploy your app to the cloud (a Microsoft data center); you can run most of these labs entirely on your local computer. This is important because there is often not enough network bandwidth at large events.
Currently, there is no audio, but the video should be easy to follow.

Video to verify setup: http://brunoblogfiles.com/videos/PrepareForAzureDevCamp.wmv

Conclusion

Please verify you can run the sample demonstrated in the video above. This will give you a huge head start once you attend the boot camp.


The Microsoft MVP Award Program posted Talking Cloud with Windows Azure MVP Magnus Martensson on 2/28/2012:

  • imageWindows Azure MVP Magnus Martensson
  • From: Sweden
  • Time as an MVP: 2 months
  • Blog: Magnus Martensson
Which technical communities are you most active in?

In forums I could certainly become more active. My BIG passion is to actually meet people and make them as excited and passionate about the greatest job in the world as I am! (Lowly Stackoverflow profile)

Then you need a great blog to share your thoughts and code samples. It does not hurt to make it very professional: http://magnusmartensson.com/

How did you first start in community?

Sharing knowledge and a passion for technology was the obvious way into communities years ago. Today, I am one of the driving forces behind both the Swedish .NET User Group (Swenug) and the Swedish Windows Azure Group (SWAG). I started with the goal to share, and my attitude has always been to give away my current knowledge as a guarantee that I will gain new knowledge in the process.

What's the best technical tip you have for implementing Cloud deployment?

Without doubt it is to design for Windows Azure in your applications but to add a layer of abstraction that shields your code from the Windows Azure dependency. The reason we do this is to make sure your code runs in a standard server environment and/or on a build server, but more importantly from a unit test! The same SOLID principles of good design apply to Windows Azure as they do in all other code projects. We must never forget this!
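As a rough illustration of that tip, here is a minimal sketch - in Python rather than .NET, with invented names such as BlobStore and ReportService - of coding against a small storage abstraction so that unit tests and build servers never touch the real Windows Azure services; a production implementation would simply wrap the cloud storage SDK behind the same interface:

    from abc import ABC, abstractmethod

    class BlobStore(ABC):
        """The thin abstraction the application depends on instead of cloud APIs."""
        @abstractmethod
        def put(self, name: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, name: str) -> bytes: ...

    class InMemoryBlobStore(BlobStore):
        """Test double: behaves like blob storage, needs no emulator or cloud account."""
        def __init__(self) -> None:
            self._blobs = {}

        def put(self, name: str, data: bytes) -> None:
            self._blobs[name] = data

        def get(self, name: str) -> bytes:
            return self._blobs[name]

    class ReportService:
        """Application code sees only BlobStore, never a concrete cloud dependency."""
        def __init__(self, store: BlobStore) -> None:
            self._store = store

        def save_report(self, report_id: str, text: str) -> None:
            self._store.put("reports/" + report_id, text.encode("utf-8"))

    # In a unit test or on a build server this runs with no cloud dependency at all:
    store = InMemoryBlobStore()
    ReportService(store).save_report("42", "quarterly numbers")
    assert store.get("reports/42") == b"quarterly numbers"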

What do you tell people who are considering using the Cloud, but aren't sure about making the move?

First of all, we are all going to wind up there in the end. It will all be the Cloud. It may not be a public cloud in all situations, but all IT will be as a Service. IT will be a utility, and the paradigm change in how we think about our applications is an imminent reality. Let’s pretend, without really knowing, that this is the real reality for us down the line: wouldn’t you rather get going now, when there is potential to take advantage of the frontier spirit out there in the Clouds, rather than being one of the stragglers who come in late to the party and miss out on all the opportunities!? There are stakes to be claimed in the Cloud now – get going!
Secondly, the Cloud is the most exciting and empowering change I have ever seen in our industry, and I have never had so much fun as when I’ve worked with the Cloud! IT becomes so much easier on PaaS!

I’ve also written about this on my blog, in Find a Cloud you don’t have to pull the virtual cord from.

Do you have a blog/website link to Cloud related Tips or deployment stories you would like to share?

I share knowledge and code through my OSS project Azure Contrib. This code is available as NuGet packages. I find that the best way to relate to code today is to extract general (reusable) pieces of functionality and put them out there on the blog, but also as OSS through CodePlex and as NuGet packages.

What words of advice do you have for new MVPs?

If you are already an MVP you’ve been through the same thing I have, but I would advise any new MVP to continuously build on the professionalism of a public profile. Today my ego site, my Facebook business page, and my Twitter account are my most-used channels of interaction with communities, clients, and fans.

Also, approach the whole public presentation gambit professionally. I belong to a professional network here in Sweden organized by Microsoft Sweden, called MEET – Microsoft Extended Expert Team – which is a network of “the usual suspects” who always show up and speak at Microsoft events. It’s a great way for Microsoft to give back to us enthusiastic and frequent speakers, and a much-appreciated initiative for those of us in the network. The activities include coaching from professional speaking, presentation, theatrical, and pedagogical coaches. Be professional, improve, evolve, hire professional coaches who train speakers for a living!


Ricardo Villalobos (@ricvilla) recommended Don’t miss it! Windows Azure Discovery Event in Boulder, CO March 20th, 2012 in a 2/27/2012 post:

Join us for this free, invitation-only Windows Azure Discovery Event, brought to you by Metro – Microsoft’s Early Adopter Program – and the Global Windows Azure incubation team. Our goal is to help ISVs and software startups understand the latest updates to Microsoft’s cloud computing offerings on the Windows Azure Platform, discuss the opportunities the cloud presents, and show the resources available to ISVs and startups to get started using Windows Azure and SQL Azure today. I will personally be talking about different topics around the Windows Azure platform, including the Management portal, using BLOB storage, support for Node.js, and Service Bus messaging, among others.

Register now!

I hope to see you there.


Abhishek Lal reported he’ll be Presenting at TechDays Hong Kong in a 2/27/2012 post:

One week to go for TechDays in Hong Kong! I am looking forward to meeting developers at this premier technology training event and will be presenting several sessions on Windows Azure, which is part of the following key focus areas:

  • VIEW Soon-to-be-released products such as Microsoft System Center 2012, SQL Server 2012 and Windows Server® 8
  • LEARN about Windows® Phone 7.5; Kinect™ for Windows, Azure™, Visual Studio®
  • EXPLORE the latest technologies in hands-on labs, with certification for early bird participants
  • SHARE field experiences and expert tips and tricks

Below is a list of the specific Azure related sessions:

  • 5th May @ 10:00am – PBC258 – Abhishek Lal – Windows Azure: An Overview
  • 6th May @ 9:00am – PBC353 – Scott Golightly – Controlling Application Access with Windows Azure
  • 6th May @ 11:00am – PBC305 – Abhishek Lal – Using Microsoft Visual Studio® to Build Applications that Run on Windows Azure
  • 6th May @ 1:30pm – PBC216 – Sam Leung – Understanding the Application Portfolio Assessment and Migration Strategy to Windows Azure
  • 6th May @ 4:45pm – PBC384 – Abhishek Lal – Windows Azure Service Bus Introduction: Why, What, How
  • 7th May @ 9:30am – PBC276 – Ben Ng – A Lap Around Microsoft Dynamics CRM and Microsoft Dynamics CRM Online
  • 7th May @ 11:00am – PBC389 – Scott Golightly – Windows Azure and Windows Phone – Creating Great Apps
  • 7th May @ 1:30pm – PBC283 – Matt Valentine – Coding 4Fun – Kinect, Microcontrollers and Windows Phone
  • 7th May @ 3:15pm – PBC379 – Abhishek Lal – Windows Azure Service Bus: Advanced Messaging Features


<Return to section navigation list>

Other Cloud Computing Platforms and Services

James Downey (@james_downey) asked Eventual Consistency: How Eventual? How Consistent? in a 3/2/2012 post:

In my last post on Riak, I discussed how an application developer could use the values of N (number of replication nodes), R (read quorum), and W (write quorum) to fine tune the availability, latency, and consistency trade-offs of a distributed database.

The idea of fine tuning consistency with the values of N, W, and R was first made popular in 2007 when Amazon engineers published an article explaining the design principles of Dynamo, a highly available and scalable distributed database used internally at Amazon. The principles popularized by the article were incorporated into three popular open-source NoSQL databases: Riak, Cassandra, and Project Voldemort.

Recall that W+R>N assures consistency. Such a system of overlapping nodes is referred to as a strict quorum, in that every read is guaranteed to return the latest write. Many developers, however, choose to configure W+R≤N for the sake of greater availability or lower latency. Such a system, known as a weak or partial quorum, does not guarantee consistency. But these systems do use various mechanisms to guarantee eventual consistency, meaning that in the absence of additional writes to a key, the key’s value will eventually become consistent across all N nodes. (For a good summary of eventual consistency, see the post on the topic by Werner Vogels, Amazon’s CTO.)
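As a quick sanity check on that arithmetic, here is a tiny sketch (plain Python, with made-up example configurations) that classifies an N/W/R setting as a strict or partial quorum:

    # W + R > N guarantees the read set overlaps the write set, so every read
    # sees the latest acknowledged write; anything else only promises eventual
    # consistency.
    def quorum_kind(n: int, w: int, r: int) -> str:
        return "strict quorum" if w + r > n else "partial quorum (eventually consistent)"

    for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 1)]:
        print("N=%d, W=%d, R=%d: %s" % (n, w, r, quorum_kind(n, w, r)))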

But eventual consistency is an odd guarantee. How long might eventual take? What is the probability of an inconsistent read? Perhaps because the answers to these questions depend on many factors, NoSQL providers do not provide specifics.

Now a group of computer science graduate students at Berkeley is in pursuit of answers. Using data samples from Internet-scale companies and statistical methods (most notably Monte Carlo analysis), the team has put together mathematical models and a simulation tool to determine both average and upper-bound answers to these key questions regarding eventual consistency. They refer to the model as Probabilistically Bounded Staleness (PBS). The model factors in the values of N, W, and R as well as sample latencies for read and write operations. As of yet, the model does not account for nodes entering or leaving the cluster.
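To give a feel for the kind of question PBS answers, here is a toy Monte Carlo sketch - not the team's actual model - that estimates how often a read issued t milliseconds after a write returns stale data. It assumes W=1 and invents exponentially distributed replica propagation delays purely for illustration:

    import random

    def stale_read_probability(n: int = 3, r: int = 1, t_ms: float = 10.0,
                               trials: int = 100_000) -> float:
        """Fraction of reads that miss the latest write, under toy assumptions."""
        stale = 0
        for _ in range(trials):
            # Assumed propagation delay to each replica: exponential, 15 ms mean.
            delays = [random.expovariate(1 / 15.0) for _ in range(n)]
            replicas_read = random.sample(range(n), r)
            # The read is stale if none of the R replicas it contacts has the write yet.
            if all(delays[i] > t_ms for i in replicas_read):
                stale += 1
        return stale / trials

    for t in (1, 10, 50, 100):
        print("N=3, R=1, read %d ms after write: ~%.3f stale" % (t, stale_read_probability(t_ms=t)))

Raising R, waiting longer after the write, or shrinking the propagation delays all drive the estimate toward zero - exactly the staleness-versus-configuration trade-off that PBS quantifies with real latency data.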

PBS is a brilliant application of statistics to a computer science problem. I enjoyed hearing Peter Bailis, a member of the research team, describe the research at a Basho-sponsored meet-up this Tuesday. To learn the details of PBS, visit the team’s web site. If your business depends on the tradeoffs of eventual consistency, this research is of tremendous importance.


Rod Mercado (@RodMatDell) offered Dell’s Virtual Network Architecture: Our Point of View in a 2/27/2012 post:

It’s time to re-think networking.

Today, companies are faced with an explosion of data growth. Applications have been migrated from laptops to “the cloud,” while an increasingly mobile workforce demands ubiquitous, secure access to all resources from any device. The proliferation of cloud, virtualization, mobility, and traffic growth means businesses have to become more agile and flexible just to keep up.

This requires a new approach. Dell’s Virtual Network Architecture (VNA) is built for today’s dynamic IT environments and positions the network as an enabler for the business - one that intelligently connects you to the workloads, applications and data you need to effectively grow your business.

An open framework for efficient IT infrastructure and workload intelligence, VNA allows customers to achieve more, get real results faster and maximize efficiency through…

  • Fabrics that fit, so customers can scale performance up and out for all types of data centers and campus environments. Customers have the ability to use low-power, small form factor systems with a distributed core (video) or scale to larger chassis systems as their performance and density needs require.
  • Virtualized services that are the foundation of Dell FTOS and the Open Automation framework, which provides server-like plug-and-play networking.
  • Simplifying the complex through tighter solution integration, using end-to-end 10Gb Ethernet solutions with Dell servers (@Dell_Servers), storage (@Dell_Storage), and networking that enable customers to realize the full power of 10Gb Ethernet in their IT environments.
  • Mobilizing users by enabling networking for the campus, remote facilities, and corporate-issued or personal devices, like smartphones and tablets. (Dell Wireless Networking Video)

For Dell Networking, we’re happy to announce that we have fully integrated Force10 Networks and are accelerating our product, solutions and sales efforts. In the six months since acquiring Force10 Networks, Dell has dramatically extended the worldwide availability of Force10 products like the Dell Force10 Z9000 (video). The Dell Force10 products, technology and engineering have played, and will continue to play, a critical role in Dell VNA.

Throughout 2012, Dell Networking will continue to exhibit technology leadership through a number of product launches from Force10, PowerConnect and M-Series Blade IO. We will feature these at upcoming events including Interop and Dell Storage Forum. We’d like to invite you to come visit us at these events.

For detailed information on Dell Networking and VNA, visit Dell TechCenter where you can find these and other resources...

Continue the conversation on twitter by following @DellNetworking and hashtags #DoMoreIT and #DellVNA.


<Return to section navigation list>
