Monday, January 10, 2011

Windows Azure and Cloud Computing Posts for 1/10/2011+

A compendium of Windows Azure, Windows Azure Platform Appliance, SQL Azure Database, AppFabric and other cloud-computing articles.

Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:

To use the above links, first click the post’s title to display the single article you want to navigate.


Azure Blob, Drive, Table and Queue Services

No significant articles today.


<Return to section navigation list> 

SQL Azure Database and Reporting

Noel Yuhanna published an eight-page SQL Azure Raises The Bar On Cloud Databases analysis for Forrester Research on 11/2/2010 (missed when posted). From the Executive Summary and Table of Contents:

Over the past six months, Forrester interviewed 26 companies using Microsoft SQL Azure to find out about their implementations. Most customers stated that SQL Azure delivers a reliable cloud database platform to support various small to moderately sized applications as well as other data management requirements such as backup, disaster recovery, testing, and collaboration.

Unlike other DBMS vendors such as IBM, Oracle, and Sybase that offer public cloud databases largely using the Amazon Elastic Compute Cloud (Amazon EC2) platform, Microsoft SQL Azure is unique because of its multitenant architecture, which allows it to offer greater economies of scale and increased ease of use. Although SQL Azure currently has a 50 GB database size limit (which might increase in the near future), a few companies are already using data sharding with SQL Azure to scale their databases into hundreds of gigabytes.

Application developers and database administrators seeking a cloud database will find that SQL Azure offers a reliable and cost-effective platform to build and deploy small to moderately sized applications.

Table of Contents

  • Microsoft SQL Azure Takes A Leading Position Among Cloud Databases
      • Microsoft’s Enhancements To SQL Azure Give It Key Advantages
      • SQL Azure Has Limitations That Microsoft Will Probably Address
      • Other Database Vendors Currently Offer Only Basic Public Cloud Database Implementations
  • Case Studies Show That SQL Azure Is Ready For Enterprise Use
      • Case Study: TicketDirect Handles Peak Loads With A Cost-Effective SQL Azure Solution
      • Case Study: Kelley Blue Book Leverages SQL Azure In A Big Way
      • Case Study: Large Financial Services Company Uses SQL Azure For Collaboration
  • Recommendations
  • SQL Azure Offers A Viable Cloud Database Platform To Support Most Apps

The analysis concludes with these Recommendations:

SQL AZURE OFFERS A VIABLE CLOUD DATABASE PLATFORM TO SUPPORT MOST APPS

Application developers and information management professionals considering the cloud for data management should evaluate the Microsoft SQL Azure cloud database, as it offers a multitenant architecture and delivers a cost-effective, on-demand, zero-administration database platform.

When considering SQL Azure:

    • Start with smaller applications. Focus initially on deploying smaller applications that store a few gigabytes of data. SQL Azure can leverage TDS, REST, and SOAP protocols; therefore, any application that supports such protocols can take advantage of the Azure cloud.
    • Consider data sharding for larger databases. SQL Azure can also support a data sharding model, breaking a larger data set down into multiple logical databases. With sharded data, applications can manage larger databases of hundreds of gigabytes, but this model is only applicable to applications that do not require automated consistency and integrity across shards.
    • Give priority to security. Ideally, to ensure security and compliance, you’ll want to move only those applications that contain nonsensitive data. However, if you plan to move sensitive data such as credit card numbers, Social Security numbers, or other private information, take extra precautions, including data-in-motion and data-at-rest encryption, data masking, auditing, and monitoring.

<Return to section navigation list> 

MarketPlace DataMarket and OData

Srinivasan Sundara Rajan explained how to Get DisConnected on the Cloud with WCF Data Services: SQL Azure Data Access Patterns on 1/10/2011:

Connected and DisConnected Data Access Patterns
In multi-tiered applications, the EIS Tier (Enterprise Information Systems) is the last in the sequence, and how efficiently it is accessed is key to the performance and scalability of the system.

Connected Access Pattern: A user or an application is constantly connected to a data source while it is operating on the data, i.e., the client deals with the data present on the server without keeping any local copy of the data.

Some of the implementations of the Connected Access Pattern are:

  • CURSORS in Databases
  • JDBC Result Set and equivalent implementations in JPA (Java Persistence API)

DisConnected Access Pattern: A user or an application is not constantly connected to a data source while it is operating on data, i.e., the client stores a copy of the data taken from the server in a local cache. All the operations are done on this local copy, and the changes are finally made on the server in one go with the final copy (a short sketch follows the list of implementations below).

Some of the implementations of the DisConnected Access Pattern are:

  • ADO.NET Implementation
  • Service Data Object (SDO) Implementation
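
As a point of reference, the following is a minimal C# sketch of the disconnected ADO.NET pattern the article refers to; the connection string, table, and column names are placeholders, not part of the article. The data is copied into a local DataSet, edited offline, and pushed back to the server in one batch.

using System;
using System.Data;
using System.Data.SqlClient;

class DisconnectedAdoNetSample
{
    static void Main()
    {
        // Placeholder connection string and table; substitute your own.
        const string connectionString =
            "Server=tcp:myserver.database.windows.net;Database=MyDb;User ID=user@myserver;Password=...;";

        var adapter = new SqlDataAdapter("SELECT ProductId, Name FROM Products", connectionString);
        var builder = new SqlCommandBuilder(adapter);   // auto-generates INSERT/UPDATE/DELETE commands

        var data = new DataSet();
        adapter.Fill(data, "Products");                 // opens, fills, and closes the connection

        // Work on the local copy; no connection is held while editing.
        data.Tables["Products"].Rows[0]["Name"] = "Updated name";

        adapter.Update(data, "Products");               // sends the accumulated changes back in one go
        Console.WriteLine("Changes saved.");
    }
}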

Cloud and DisConnected Architecture
The following points emphasize how the DisConnected architecture is most suitable to cloud-based implementations:

  • Database instances reside on virtual servers, and in theory virtual servers can be migrated across a cloud platform, so there is no guarantee that a long-running transaction will find the same database connection available to complete the transaction.
  • In a multi-tenant SaaS model, the potential for locking tables and rows is high when we adopt a connected model; a disconnected model ensures that the database is locked minimally, using an optimistic approach.
  • Database PaaS platforms such as Amazon RDS naturally implement Web service calls for DB access, making them automatically support the disconnected architecture.

SQL Azure
The SQL Azure Database provides a cloud-based database management system (DBMS). This technology lets on-premises and cloud applications store relational data on Microsoft servers in Microsoft data centers. As with other cloud technologies, an organization pays only for what it uses, increasing and decreasing usage (and cost) as the organization's needs change. Using a cloud database also allows converting what would be capital expenses, such as investments in disks and DBMS software, into operating expenses.

WCF Data Services
WCF Data Services enables the creation and consumption of data services for the Web or an intranet by using the Open Data Protocol (OData). OData enables you to expose your data as resources that are addressable by URIs. This enables you to access and change data by using the semantics of representational state transfer (REST), specifically the standard HTTP verbs of GET, PUT, POST, and DELETE.

SQL Azure data can be exposed as WCF Data Services by using the Entity Framework provider. The WCF Data Service framework provides an API that allows data to be created and consumed over HTTP using a RESTful service. All the database operations can be exposed as a service in this manner.
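
For illustration, here is a minimal sketch of such a service, assuming an Entity Framework model named BlogEntities (with Blogs and Posts entity sets) has already been generated from a SQL Azure database; the names are placeholders, not part of the article's sample.

using System.Data.Services;
using System.Data.Services.Common;

// Added to the web role as a WCF Data Service (.svc) item; exposes the
// Entity Framework model over OData/REST.
public class BlogDataService : DataService<BlogEntities>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        // Grant access only to the entity sets you intend to expose.
        config.SetEntitySetAccessRule("Blogs", EntitySetRights.All);
        config.SetEntitySetAccessRule("Posts", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}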

The Visual Studio development tool provides an easy interface to create an entity data model, and the resulting data service can be deployed on the Windows Azure platform.

As HTTP-based RESTful services are used as an access mechanism, WCF Data Services naturally provide a DisConnected data architecture for cloud-based databases such as SQL Azure.
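
On the consuming side, a client proxy generated with Visual Studio's Add Service Reference works in the same disconnected style. In this hypothetical sketch, the service URI and the generated BlogEntities/Blog client types are assumptions, not part of the article.

using System;
using System.Data.Services.Client;

class DataServiceClientSample
{
    static void Main()
    {
        // BlogEntities and Blog are the client-side types generated by Add Service Reference.
        var context = new BlogEntities(new Uri("http://myapp.cloudapp.net/BlogDataService.svc"));

        // Changes are tracked locally; nothing goes over the wire yet.
        var blog = new Blog { Name = "Disconnected blog" };
        context.AddToBlogs(blog);

        // A single batch of HTTP requests pushes all pending changes to the service.
        context.SaveChanges(SaveChangesOptions.Batch);
    }
}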

The following figure, courtesy of the vendor, shows the disconnected architecture of WCF Data Services.

Summary
Tight coupling of the data access layer with the rest of the layers makes a multi-tiered application inefficient, and this is even more evident with cloud-based data access. WCF Data Services provides a perfect solution for data access for a cloud database such as SQL Azure.

Srinivasan works at Hewlett Packard as a Solution Architect.


<Return to section navigation list> 

Windows Azure AppFabric: Access Control and Service Bus

Christian Weyer reported thinktecture StarterSTS now officially powered by Windows Azure in a 1/10/2011 post:

A few hours ago I got the final notice that StarterSTS is now officially allowed admittance to the Azure Cloud olymp:

OK, Dominick [Baier, pictured at right]: Up to releasing 1.5…

Translated, this means that StarterSTS passed the Microsoft Platform Ready (MPR) test.

For more details about MPR, see my Old OakLeaf Systems’ Azure Table Services Sample Project Passes New Microsoft Platform Ready Test post of 11/5/2010.


The Windows Azure AppFabric Team reported FabrikamShipping sample updated to include Access Control December CTP Release on 1/10/2011:

Check out the latest updates to the FabrikamShipping SaaS sample application. This sample provides a great example of how the Access Control service helps solve real-life identity federation requirements.

You can find more details in Vittorio Bertocci’s blog post.

If you want to learn more about the Access Control service, please visit our website or the MSDN developers website.

If you haven’t started checking out Windows Azure AppFabric already make sure to take advantage of our free trial offer. Click the link below and get started!

[Free trial offer link]

As always, feel free to leave us feedback on the comments section of this blog.


Vittorio Bertocci (@vibronet) announced a New FabrikamShipping SaaS Release for Windows Azure SDK 1.3 and December Labs Release of ACS:


I hope you took some time to rest last month, because now it’s time to dive right back in!

To get you back into your cloud habit, I would suggest a SaaS entrée. It’s already been a few weeks since the online instance of FabrikamShipping SaaS (*) was updated: today we are releasing the corresponding source code package, which has been revved to take advantage of the Windows Azure SDK 1.3 and the December Labs release of ACS. As usual, you can hit the live instance here and you can download the source code package here (the enterprise companion stays the same, a testament to good service orientation if you will).

The new release also contains many other improvements; here’s a list of the most visible ones.

  • Updated to the Windows Azure SDK 1.3 and December Labs Release of ACS. Already mentioned.
  • Custom Home Realm Discovery Pages. As you know, ACS offers a minimal HRD page, automatically generated according to the list of IPs you configured, so that you can easily test your solutions. However, in production you probably want to offer your users an experience that is consistent with your web application’s look and feel: to that purpose, ACS also offers a way of obtaining the list of the supported IPs programmatically, so that you can create your own dynamic HRD page using whatever skin you deem appropriate. In the new FabrikamShipping SaaS release we eliminated all the test HRD pages (for account activation, sign-in and more) in favor of custom-built ones, so that you can study how to implement the same mechanism in your own solutions.
  • Added Facebook as Accepted Identity Provider. In the first release we supported Windows Live Id and Google as IPs for the small business subscriptions. Many asked us to add Facebook, so we did.
  • The Subscription Process Now Accepts All Supported IPs. When you start the provisioning process you need to secure the session. Whereas in the first release we admitted only Windows Live Id, in line with the Windows Azure subscriptions themselves, now we support the same list of IPs as the ones allowed in the small business editions: Facebook, Windows Live Id, Google
  • Self-Service Provisioning Retry. FabrikamShipping SaaS uses prerelease software, and as such it is occasionally subject to downtimes and errors. Sometimes a subscription provisioning will fail because of an occasional error, but thanks to the architecture of the provisioning execution engine a simple retry right afterward will succeed. Until now the retry functionality was exposed only in the administrative console, which meant that if you stumbled into one such occasional error your provisioning would be stuck until I or one of our valiant admins noticed the error in the console and retried. In this new release we surface the retry directly in the provisioning workflow screen, so that you can invoke it yourself without waiting for us. We need that less and less, hence the hope is that you’ll never have to use it, but just in case…

Just saying that the sample has been updated to 1.3 does not really do justice to all the individual improvements. For example, our Southworks friends Sebastian and Matias came out with a great way of showing off startup tasks: in the first FabrikamShipping SaaS we were manually including the files that Cspack needs to run in our worker role provisioning engine, now the entire SDK gets installed as a startup task.

Grab the new source code package and have fun!

(*) Don’t know what I am talking about? For an introduction to the FabrikamShipping SaaS scenario check out this video. For a deeper dive into the solution, here’s the recording of a session I gave at TechEd Europe last November.


Wade Wegner posted a template for PowerPoint (*.pptx) presentations about the Windows Azure AppFabric Service Bus in 1/2010.



<Return to section navigation list> 

Windows Azure Virtual Network, VM Role, Connect, RDP and CDN

Steve Plank (@plankytronixx) posted Video Presentation - Windows Azure Connect: from scratch to his Plankytronixx blog on 1/10/2011:

In this video/demo we go from scratch for folks with no knowledge of Windows Azure Connect (previously known as “Project Sydney”), covering what it is, how to create a virtual network and how to domain-join an instance running in the Windows Azure cloud. The demo gives click-by-click instructions and leaves nothing out (aside from assuming obvious things like VS is installed and the Windows Azure SDK/Tools are installed). Even if you don’t yet have a Windows Azure account, the very last few minutes talk about how to get a free account so you can try this technology out. How to deal with annoying little things like certificate management is also covered in click-by-click detail, along with a couple of shortcuts and a troubleshooting section with a demo.

[Video: Windows Azure Connect: from scratch]

Click the image to watch the video.


Avkash Chauhan posted Windows Azure VM Role - Handling Error : The expiry time is invalid. A parameter was incorrect. Details: The expiry time is invalid on 1/9/2011:

Once you have the VM role enabled for your subscription, you can start working on the VM role using the step-by-step guide below:

http://msdn.microsoft.com/en-us/WAZPlatformTrainingCourse_VMRoleLab

After you finish creating your VHD for the VM role and uploading it with the CSUPLOAD script, you might encounter the following error:

C:\Program Files\Windows Azure SDK\v1.3>csupload add-vmimage -Connection
"SubscriptionID=*********************************************;CertificateThumbprint=********************************************" -LiteralPath C:\Applications\Azure\baseimage.vhd -Name baseimage.vhd

Windows(R) Azure(TM) Upload Tool 1.3.0.0
for Microsoft(R) .NET Framework 3.5
Copyright (c) Microsoft Corporation. All rights reserved.

Using temporary directory C:\Applications\Azure...
Found previously prepared data for VHD C:\Applications\Azure\baseimage.vhd.
Found previously created VM image.

Name                  : baseimage.vhd
Label                 : baseimage.vhd
Description           : baseimage.vhd
Status                : Pending
Uuid                  : 8fd4e45a-3245-1789-ae34-2gf634a235c2
Timestamp             : 2011-01-03T11:12:21Z
MountSizeInBytes      : 32212254720
CompressedSizeInBytes : 5517869568

Recovering from an unexpected error: A parameter was incorrect. Details: The expiry time is invalid.
Waiting before continuing with upload...
Waiting before continuing with upload...
Waiting before continuing with upload...
Waiting before continuing with upload...

Reason:

It is possible that your machine's clock is not synchronized correctly. A wrong time on the host machine where the CSUPLOAD tool is running could cause this error.

Solution:

To solve this problem, set your machine's clock correctly for your geographical location (or resynchronize it with a time server) and then rerun the CSUPLOAD tool.
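
For example, on recent Windows versions one way to check and resynchronize the clock from an elevated command prompt is the built-in w32tm utility:

rem Show the current time source and offset, then force an immediate resync.
w32tm /query /status
w32tm /resync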


<Return to section navigation list> 

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Dom Green (@domgreen) described Scaling Instances on the Dev Fabric in an 1/10/2011 post:

Whilst looking at how to scale applications in Windows Azure I have had the need to see how the nodes react when the number of cloud instances of a web or worker role changes.

One way to view how the application would react and see the number of instances scale up or down is to use the development fabric that ships with the Azure SDK.

If we have a simple “Hello World” Azure application and configure it to use a single web and worker role, we will have the following configuration and nodes.


ServiceConfiguration for cloud application.


WebRole and Development Fabric.

Now that the application is running within the development fabric, we want to increase the number of instances that are running for one of the nodes. We can easily go back into Visual Studio and alter the ServiceConfiguration.cscfg file to specify the number of instances that we wish to have for our web role.


Updated ServiceConfiguration.
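
For reference (since the screenshots don't reproduce here), the relevant fragment of ServiceConfiguration.cscfg looks roughly like this; the service and role names are placeholders, and the Instances count attribute is the value being changed:

<?xml version="1.0"?>
<ServiceConfiguration serviceName="WindowsAzureDevFabricProject"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="WebRole1">
    <Instances count="2" />  <!-- raised from 1 to add a second instance -->
    <!-- ConfigurationSettings omitted -->
  </Role>
  <!-- Worker role omitted -->
</ServiceConfiguration>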

The problem with updating the service configuration in Visual Studio is that the changes do not get applied to the running service within the developer fabric. To get these changes applied we have to update the configuration of the running service with the newly saved config. This is done by executing an update on the service with the CSRun command-line tool.

We can open the Windows Azure SDK Command Prompt and browse to the application directory containing the ServiceConfiguration.cscfg for the project we are running ( ..\Projects\WindowsAzureDevFabricProject\WindowsAzureDevFabricProject ). From here we make the following call into the development fabric with CSRun:

csrun /update <deployment-id>;<new-configuration-to-apply>

csrun /update 46;ServiceConfiguration.cscfg

The deployment ID (46 in our example) can be found by looking in the development fabric simulator environment. Using this and the new configuration (updated in Visual Studio), we have now been able to scale up our web role and add an extra node to the application.


CSRun Command and Development Fabric with added node

You will now be able to scale by increasing and decreasing the number of nodes in the development fabric, giving you the ability to debug and view how your application handles these configuration changes.


Greg Olliver described Accommodating Your Batch Window in Windows Azure on 1/10/2011:

Certain functions of a system need to run once per day in a typical “batch window” scenario.

You might already be aware that Azure compute is paid by the CPU-hour.  If you’re occupying the CPU, whether or not actively using it, you’re still paying for it.  You occupy the CPU ‘slot’ by having a role deployed, regardless of state – it doesn’t have to be in the ‘Ready’ state.

So how do you get your batch of work done without paying for the slot 24 hours per day?

The answer is found in the Windows Azure Service Management API.  The CreateDeployment API can be used to deploy an app into an existing Hosted Service.  You can do this in various ways including client code, a PowerShell script, or a web or worker role that’s running in Windows Azure.

There are a couple of versions of the Service Management API: managed code and REST.  At the time of this writing, the REST version of the API is the most complete and feature rich.  The managed version doesn’t currently support the ability to CreateDeployment.

For my purposes, I looked at how to use an Azure role to deploy an app.  The PowerShell script already exists and is easy to use – as depicted in one of the Hands on Labs in the Windows Azure Platform Training Kit.  By creating code that runs in a worker role you can then replicate (multi-instance) it in a management pack for your service, thus giving it high availability and fault tolerance.

The sample code that I provide is in a web role.  I used the web page to display diagnostics as I went along.  The project is just a new cloud solution with a web role as provided by Visual Studio 2010.  I’m using the 1.3 version of the Windows Azure SDK, but everything I’ve done here can also be done with the 1.2 version.  In my code, I:

  1. Load the certificate
  2. Initialize storage
  3. (user clicks “test” button)
  4. Get ServiceConfiguration.cscfg from blob storage
  5. Build request and payload
  6. Submit the request
  7. Display the ID of the request as returned from the platform.
The source code is posted on CodePlex.
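
To make the flow concrete, here is a compressed, hypothetical sketch of steps 1-7 (it is not the CodePlex sample itself; the subscription ID, service name, certificate thumbprint, and payload file are placeholders, and error handling is omitted):

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;
using System.Text;

class CreateDeploymentSketch
{
    static void Main()
    {
        // Placeholders - substitute your own values.
        const string subscriptionId = "<subscription-id>";
        const string serviceName    = "<dns-prefix-of-hosted-service>";   // case sensitive
        const string thumbprint     = "<management-certificate-thumbprint>";
        string payload = File.ReadAllText("CreateDeployment.xml");        // built as described below

        // 1. Load the management certificate from the local store.
        var store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        var cert = store.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false)[0];
        store.Close();

        // 5. Build the request against the CreateDeployment URI (staging slot here).
        var uri = string.Format(
            "https://management.core.windows.net/{0}/services/hostedservices/{1}/deploymentslots/staging",
            subscriptionId, serviceName);
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "POST";
        request.ContentType = "application/xml";
        request.Headers.Add("x-ms-version", "2010-04-01");   // required API version header
        request.ClientCertificates.Add(cert);                 // authenticates the call

        byte[] body = Encoding.UTF8.GetBytes(payload);         // payload must be UTF-8
        using (var stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        // 6-7. Submit the request and display the operation ID returned by the platform.
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            Console.WriteLine("Request ID: " + response.Headers["x-ms-request-id"]);
        }
    }
}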

Following are signposts to help you avoid some of the bumps on the road:

Certificate Handling

When you are accessing the Service Management API from the client, you must use a certificate to authenticate the program to the portal.  You must upload the management certificate to the Azure portal and have the original certificate on the client.  The PowerShell script (or windows client) references the local version and matches it to the version that’s in the portal. 

When you’re using a web or worker role, the cert has to be loaded into the role as well as into the management certificate store, so the role can match its copy to the one in the management certificate store. Hence, you actually have to export the certificate twice: once as the DER version (to a file with a .CER extension) and once as the private-key version (.PFX extension). The source code that I’ve provided has additional details.

Request URI

One of the parameters in the request URI is <service-name>.  The value that you need to use is the one in the ‘DNS Prefix’ field in the management portal.  It’s case sensitive.

Creating the Payload

The web request for CreateDeployment requires a payload. I tried a couple of ways to create the payload and decided I like using XDocument and XElement (System.Xml.Linq) to manage this. The other way I tried is plain old embedded text.

If you like the plain old text method, there’s a wrinkle to avoid.  Don’t precede your XML with a newline.  I tried this at first for readability, but the request fails with InvalidXMLRequest.

GOOD

payload = string.Format(@"<?xml version=""1.0"" encoding=""utf-8"" standalone=""yes""?>
<CreateDeployment xmlns=""http://schemas.microsoft.com/windowsazure""> <Name>{0}</Name>

BAD

payload = string.Format(@"
<?xml version=""1.0"" encoding=""utf-8"" standalone=""yes""?>
<CreateDeployment xmlns=""http://schemas.microsoft.com/windowsazure""> <Name>{0}</Name>

If you prefer the System.Xml.Linq approach, there is another wrinkle to avoid.

XNamespace xn = "http://schemas.microsoft.com/windowsazure";
XDocument doc = new XDocument(
                         new XElement(xn + "CreateDeployment",
                         new XElement(xn + "Name", "gotest-20101231"),

After building up the XDocument, you need to get it into a string before transmitting it.  This string must be UTF-8 encoded.  XDocument.Save wants a StringWriter, and StringWriter.ToString() outputs UTF-16.  (If you forget and do this, you’ll get ProtocolError.)  The solution is to subclass StringWriter, adding an encoder that outputs UTF-8.  Again, details are in the provided source.
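
A common shape for that subclass (a general sketch of the pattern; the class in the provided source may differ in detail) is:

using System.IO;
using System.Text;

// StringWriter reports UTF-16 by default; overriding Encoding makes
// XDocument.Save emit an encoding="utf-8" declaration instead.
public class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

// Usage: var writer = new Utf8StringWriter();
//        doc.Save(writer);                  // doc is the XDocument built above
//        string payload = writer.ToString();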

Referencing REST Resources

In general, REST resource designators are case sensitive in the Windows Azure REST API.  The portal will enforce lower case on some things, but your deployment package might be upper and lower case, so be careful with that. 

Multiple Versions

There have been several releases of the API, and older versions are still supported.  Hence, when you build your request, you need to specify x-ms-version in a request header.  Check carefully for the correct version; the value given for CreateDeployment is 2009-10-01 (or later) in the Request Headers section of the documentation.  If you read further you’ll find that it must be 2010-04-01 if you include StartDeployment or TreatWarningsAsError in your payload.


PRNewswire reported “Lokad uses Windows Azure Platform from Microsoft to boost time-to-value and expand retail market opportunity for new highly scalable and cost-effective sales forecasting offering in the cloud” as a teaser for a Lokad Announces Innovative Cloud-based Sales Forecasting Service for the Retail Industry at NRF 2011 press release of 1/10/2011:

Lokad today announced Salescast, a web application for sales forecasting hosted on Windows Azure, for the retail industry.  The announcement was made at the NRF's 100th Annual Exposition and Conference.

Sales forecasting has long been a key to success in the retail industry because good forecasts are critical to ensure optimal inventory levels. Too much inventory, and costs explode; too little inventory and there is nothing to sell.

While software solutions for sales forecasting have existed for decades, most forecasting toolkits prove to be very labor intensive because the forecasting models need to be manually refined, and they usually do not address the complex sales forecasting challenges posed by promotions or cannibalization. These in-house forecasting solutions are also often very costly. They require hiring at least one qualified statistician, designing a clever model for the business, and purchasing expensive software licenses and servers to run the computations. These costs are so high that usually only very large companies can afford them. And the results often remain unsatisfactory.

To meet the sales forecasting challenge in the retail industry, Lokad delivers forecasting as a cloud-based, on-demand service that requires little or no manpower on the retailer side.  Because it is pay-per-use and delivered by the cloud, the Salescast service can be implemented quickly and cost-effectively to speed time to value and it delivers the most accurate forecasts in the retail industry in minutes rather than months.

"Classically, forecasting a large number of product sales requires manually building and tweaking a model for every single reference – a laborious process that would require larger retailers to hire a dozen statisticians to complete the job," said Joannes Vermorel, CEO, Lokad. "At Lokad, we leverage the unprecedented processing capabilities offered by Windows Azure in order to automate the whole forecasting process, assessing the best forecasting model for every product and establishing correlations between products. The results are more accurate forecasts, delivered within 60 minutes instead of months, and at a cost that simply doesn't compare to classical solutions."

"At NRF 2011, Microsoft and our partners such as Lokadare showcasing how, together, we are delivering exceptional business value in enabling retail customers to deploy and manage mission-critical solutions such as sales forecasting on the Windows cloud computing platform," said Brendan O'Meara, managing director, Retail and Hospitality, Microsoft. "Delivering world-class cloud computing solutions is an important part of Microsoft's commitment to ensuring that our retail enterprise customers realize the maximum business value from their IT investments. No other company brings together the breadth of consumer and enterprise cloud capabilities in conjunction with a familiar technology and productivity platform, and a broad developer ecosystem, to provide retail enterprises with the IT flexibility and choice they need to run their businesses and compete in today's global economy."

Lokad has built its own proprietary forecasting technology on the Windows Azure platform. The Salescast solution is implemented in C# / .NET 4.0. Web front ends rely on ASP.NET. Data storage relies on Blob, Table and Queue Storage as provided by Windows Azure. Microsoft SQL Azure is also used to store transactional data.

Without the cloud computing capabilities of Windows Azure, Lokad was not able to scale its forecasting service and could only serve small midmarket retailers.  Windows Azure enables Lokad to deliver much more accurate forecasts at a much larger scale. …

Read the rest here.


Mary Jo Foley (@maryjofoley) concluded her 2011 to be a big year for Microsoft in ERP post of 1/10/2011 with:

Last year, Microsoft execs told me to expect the next releases of the Dynamics ERP products to incorporate the same underlying payments and commerce infrastructure as BPOS/Office 365. These payment and commerce components will be built on top of Windows Azure and Azure’s service bus (the AppFabric). [Emphasis added.]

Read Mary Jo’s entire post here.

<Return to section navigation list> 

Visual Studio LightSwitch

Rowan Miller described EF Feature CTP5: Pluggable Conventions of the forthcoming Entity Framework v5 in a 1/10/2011 post to the ADO.NET Team blog:

We have released Entity Framework Feature Community Technology Preview 5 (CTP5). Feature CTP5 contains a preview of new features that we are planning to release as a stand-alone package in Q1 of 2011 and would like to get your feedback on. Feature CTP5 builds on top of the existing Entity Framework 4 (EF4) functionality that shipped with .NET Framework 4.0 and Visual Studio 2010 and is an evolution of our previous CTPs.

If you aren’t familiar with Code First then you should read the Code First Walkthrough before tackling this post.

Code First includes a set of simple, model-wide behaviors that provide sensible configuration defaults for the parts of your model that have not been explicitly configured using Data Annotations or the Fluent API. These default behaviors are referred to as Conventions. One of the most commonly requested features is the ability to add custom conventions or switch off some of the default conventions. Based on your feedback we’ve bumped up the priority of this feature and CTP5 includes an early preview of “Pluggable Conventions”.

Limitations (Please Read)

The functionality included in CTP5 is an early preview that was included to get your feedback. There are a number of rough edges and the API surface is likely to change for the RTM release.

The Pluggable Conventions feature we have implemented so far allows you to perform configuration based on information obtained by reflection from the types and members in your model. It does not allow you to read information about the shape of your model. This does impose some restrictions; for example, you cannot read the model to find out if a given property is a foreign key. These restrictions are something we will remove in the future but not before the initial RTM in 2011.

The Model

Here is a simple console application that uses Code First to perform data access. If we run the application we’ll get an exception because Code First can’t find a property to use as the primary key for the two entities. This is because the primary key properties don’t conform to the default conventions.

using System;
using System.Collections.Generic;
using System.Data.Entity;
using System.Data.Entity.Database;
namespace ConventionSample
{
    class Program
    {
        static void Main(string[] args)
        {
            DbDatabase.SetInitializer(
                new DropCreateDatabaseIfModelChanges<BlogContext>());

            using (var ctx = new BlogContext())
            {
                ctx.Blogs.Add(new Blog { Name = "Test Blog" });
                ctx.SaveChanges();

                foreach (var blog in ctx.Blogs)
                {
                    System.Console.WriteLine("{0}: {1}", blog.BlogKey, blog.Name);
                }
            }
        }
    }

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Post> Posts { get; set; }
    }

    public class Blog
    {
        public int BlogKey { get; set; }
        public string Name { get; set; }
        public virtual ICollection<Post> Posts { get; set; }
    }

    public class Post
    {
        public int PostKey { get; set; }
        public string Title { get; set; }
        public string Abstract { get; set; }
        public string Content { get; set; }
        public int BlogKey { get; set; }
        public virtual Blog Blog { get; set; }
    }
}
Our First Custom Convention

We could just use DataAnnotations or the Fluent API to configure the primary key for each of these entities but if we have a larger model this is going to get very repetitive. It would be better if we could just tell Code First that any property named ‘<TypeName>Key’ is the primary key. We can now do this by creating a custom convention:

using System;
using System.Data.Entity.ModelConfiguration.Configuration.Types;
using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
public class MyKeyConvention :
IConfigurationConvention<Type, EntityTypeConfiguration>
{
    public void Apply(Type typeInfo, Func<EntityTypeConfiguration> configuration)
    {
        var pk = typeInfo.GetProperty(typeInfo.Name + "Key");
        if (pk != null)
        {
            configuration().Key(pk);
        }
    }
} 

We then register our convention by overriding OnModelCreating in our derived context:

public class BlogContext : DbContext
{
    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Conventions.Add<MyKeyConvention>();
    }
}
A Closer Look at IConfigurationConvention

IConfigurationConvention allows us to implement a convention that is passed some reflection information and a configuration object.

Why Not Just IConvention?

You may be wondering why it’s not just IConvention? As mentioned earlier this first part of pluggable conventions allows you to write simple conventions using reflection information but does not allow you to read any information about the model. Later on (after our first RTM) we intend to expose the other half of our pluggable convention story that allows you to read from the model and perform much more granular changes.

Why Func<EntityTypeConfiguration>?

We want to avoid needlessly creating configurations for every type and property in your model; the configuration is lazily created when you decide to use it. We know the use of Func isn’t ideal, and we’ll have something cleaner ready for RTM.

Am I restricted to Type and EntityTypeConfiguration?

No, the generic parameters are pretty flexible and Code First will call your convention with any combination of MemberInfo/Configuration that it finds in the model. For example you could specify PropertyInfo and StringPropertyConfiguration and Code First will call your convention for all string properties in your model:

using System;
using System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive;
using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
using System.Reflection;

public class MakeAllStringsShort :
    IConfigurationConvention<PropertyInfo, StringPropertyConfiguration>
{
    public void Apply(PropertyInfo propertyInfo,
        Func<StringPropertyConfiguration> configuration)
    {
        configuration().MaxLength = 200;
    }
}

The first generic can be either of the following:

  • Type (System)
  • PropertyInfo (System.Reflection)

The second generic can be any of the following:

  • ModelConfiguration (System.Data.Entity.ModelConfiguration.Configuration)
  • EntityTypeConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Types)
  • PropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties)
    • PrimitivePropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)
      • DateTimePropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)
      • DecimalPropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)
      • LengthPropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)
        • StringPropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)
        • BinaryPropertyConfiguration (System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive)

Important: Take note of the namespace included after each type. The Code First API currently includes types with the same names in other namespaces. Using these types will result in a convention that compiles but is never called. If the namespace ends in .Api then you have the wrong namespace. This is a rough edge that we will address before we RTM.

    Most combinations are valid; however, specifying Type and PropertyConfiguration (or any type derived from PropertyConfiguration) will result in the convention never being called, since there is never a PropertyConfiguration for a Type. If you were to specify PropertyInfo and EntityTypeConfiguration, you will be passed the configuration for the type that the property is defined on. For example, we could re-write our primary key convention as follows:

    using System;
    using System.Data.Entity.ModelConfiguration.Configuration.Types;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    using System.Reflection;
    
    public class MyKeyConvention :
        IConfigurationConvention<PropertyInfo, EntityTypeConfiguration>
    {
        public void Apply(PropertyInfo propertyInfo, Func<EntityTypeConfiguration> configuration)
        {
            if (propertyInfo.Name == propertyInfo.DeclaringType.Name + "Key")
            {
                configuration().Key(propertyInfo);
            }
        }
    }
    
    
    More Examples

    Make all string properties in the model non-unicode:

    using System;
    using System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    using System.Reflection;
    
    public class MakeAllStringsNonUnicode :
        IConfigurationConvention<PropertyInfo, StringPropertyConfiguration>
    {
        public void Apply(PropertyInfo propertyInfo,
            Func<StringPropertyConfiguration> configuration)
        {
            configuration().IsUnicode = false;
        }
    }

    Ignore properties that conform to a common pattern:
    using System;
    using System.Data.Entity.ModelConfiguration.Configuration.Types;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    using System.Reflection;
    
    public class IgnoreCommonTransientProperties :
        IConfigurationConvention<PropertyInfo, EntityTypeConfiguration>
    {
        public void Apply(PropertyInfo propertyInfo,
            Func<EntityTypeConfiguration> configuration)
        {
            if (propertyInfo.Name == "LastRefreshedFromDatabase" &&
                propertyInfo.PropertyType == typeof(DateTime))
            {
                configuration().Ignore(propertyInfo);
            }
        }
    }

    Ignore all types in a given namespace:
    using System;
    using System.Data.Entity.ModelConfiguration.Configuration;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    
    public class IgnoreAllTypesInNamespace :
        IConfigurationConvention<Type, ModelConfiguration>
    {
        private string _namespace;
    
        public IgnoreAllTypesInNamespace(string ns)
        {
            _namespace = ns;
        }
    
        public void Apply(Type typeInfo,
            Func<ModelConfiguration> configuration)
        {
            if (typeInfo.Namespace == _namespace)
            {
                configuration().Ignore(typeInfo);
            }
        }
    }

    Because this convention does not have a default constructor we need to use a different overload of ModelBuilder.Conventions.Add.

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Post> Posts { get; set; }
    
        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Conventions.Add(
                new IgnoreAllTypesInNamespace("ConventionSample.UnMappedTypes"));
        }
    }

    Use TPT mapping for inheritance hierarchies by default:
    using System;
    using System.Data.Entity.ModelConfiguration.Configuration.Types;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    
    public class TptByDefault :
        IConfigurationConvention<Type, EntityTypeConfiguration>
    {
        public void Apply(Type typeInfo,
            Func<EntityTypeConfiguration> configuration)
        {
            configuration().ToTable(typeInfo.Name);
        }
    }
    
    

    Use a custom [NonUnicode] attribute to identify properties that should be non-unicode; note the use of the AttributeConfigurationConvention helper class that is included in CTP5.

    using System;
    using System.Data.Entity.ModelConfiguration.Configuration.Properties.Primitive;
    using System.Data.Entity.ModelConfiguration.Conventions.Configuration;
    using System.Reflection;
    
    [AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
    public class NonUnicodeAttribute : Attribute
    { }
    
    public class ApplyNonUnicodeAttribute
        : AttributeConfigurationConvention<PropertyInfo, StringPropertyConfiguration, NonUnicodeAttribute>
    {
    
        public override void Apply(PropertyInfo propertyInfo, StringPropertyConfiguration configuration, NonUnicodeAttribute attribute)
        {
            configuration.IsUnicode = false;
        }
    }
    
    
    Removing Existing Conventions

    As well as adding in your own conventions you can also disable any of the default conventions. For example you may wish to disable the convention that configures integer primary keys to be identity by default:

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Post> Posts { get; set; }
    
        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Conventions.Remove<StoreGeneratedIdentityKeyConvention>();
        }
    }

    The conventions that can be removed are:

    Namespace: System.Data.Entity.ModelConfiguration.Conventions.Edm

    • AssociationInverseDiscoveryConvention
      Looks for navigation properties on classes that reference each other and configures them as inverse properties of the same relationship.
    • ComplexTypeDiscoveryConvention
      Looks for types that have no primary key and configures them as complex types.
    • DeclaredPropertyOrderingConvention
      Ensures the primary key properties of each entity precede other properties.
    • ForeignKeyAssociationMultiplicityConvention
      Configures relationships to be required or optional based on the nullability of the foreign key property, if included in the class definition.
    • IdKeyDiscoveryConvention
      Looks for properties named Id or <TypeName>Id and configures them as the primary key.
    • NavigationPropertyNameForeignKeyDiscoveryConvention
      Looks for a property to use as the foreign key for a relationship, using the <NavigationProperty><PrimaryKeyProperty> pattern.
    • OneToManyCascadeDeleteConvention
      Switches cascade delete on for required relationships.
    • OneToOneConstraintIntroductionConvention
      Configures the primary key as the foreign key for one:one relationships.
    • PluralizingEntitySetNameConvention
      Configures the entity set name in the Entity Data Model to be the pluralized type name.
    • PrimaryKeyNameForeignKeyDiscoveryConvention
      Looks for a property to use as the foreign key for a relationship, using the <PrimaryKeyProperty> pattern.
    • PropertyMaxLengthConvention
      Configures all String and byte[] properties to have max length by default.
    • StoreGeneratedIdentityKeyConvention
      Configure all integer primary keys to be Identity by default.
    • TypeNameForeignKeyDiscoveryConvention
      Looks for a property to use as the foreign key for a relationship, using the <PrincipalTypeName><PrimaryKeyProperty> pattern.

    Namespace: System.Data.Entity.ModelConfiguration.Conventions.Edm.Db

    • ColumnOrderingConvention
      Applies any ordering specified in [Column] annotations.
    • ColumnTypeCasingConvention
      Converts any store types that were explicitly configured to be lowercase. Some providers, including MS SQL Server and SQL Compact, require store types to be specified in lower case.
    • PluralizingTableNameConvention
      Configures table names to be the pluralized type name.

    Namespace: System.Data.Entity.ModelConfiguration.Conventions.Configuration

    These conventions process the data annotations that can be specified in classes. For example, if you wanted to stop Code First from taking notice of the StringLength data annotation, you could remove the StringLengthAttributeConvention, as sketched after the following list.

    • ColumnAttributeConvention
    • ComplexTypeAttributeConvention
    • ConcurrencyCheckAttributeConvention
    • DatabaseGeneratedAttributeConvention
    • ForeignKeyAttributeConvention
    • InversePropertyAttributeConvention
    • KeyAttributeConvention
    • MaxLengthAttributeConvention
    • NotMappedPropertyAttributeConvention
    • NotMappedTypeAttributeConvention
    • RequiredNavigationPropertyAttributeConvention
    • StringLengthAttributeConvention
    • TableAttributeConvention
    • TimestampAttributeConvention
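
    As a minimal sketch of that example, following the same pattern as the StoreGeneratedIdentityKeyConvention removal shown earlier (StringLengthAttributeConvention lives in the System.Data.Entity.ModelConfiguration.Conventions.Configuration namespace listed above):

    public class BlogContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }
        public DbSet<Post> Posts { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Code First will now ignore [StringLength] data annotations.
            modelBuilder.Conventions.Remove<StringLengthAttributeConvention>();
        }
    }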

    Namespace: System.Data.Entity.Database

    • IncludeMetadataInModel
      Adds the EdmMetadata table, used by DbContext to check if the database schema matches the current model.
    Summary

    In this post we covered the Pluggable Conventions feature that is included in EF Feature CTP5. This is an early preview and still has a number of rough edges but we included it because we really want your feedback.

    As always we would love to hear any feedback you have by commenting on this blog post.

    For support please use the Entity Framework Pre-Release Forum.

    Visual Studio LightSwitch depends on Entity Framework for data access, so posts about EF v5 go in this section.

     


    <Return to section navigation list> 

    Windows Azure Infrastructure

    Steve Ballmer announced the departure of Bob Muglia, president of Microsoft's Server and Tools Division, in Steve Ballmer E-mail to Employees on Bob Muglia Transition posted by Microsoft PressPass on 1/10/2011:

    Text of an internal e-mail from Microsoft chief executive officer Steve Ballmer to employees regarding changes in leadership within the Server & Tools Business.

    From: Steve Ballmer
    Sent: Monday, January 10, 2011
    To: Microsoft – All Employees
    Subject: STB - Building on Success, Moving Forward

    There are very few $15B businesses in the software industry, and Microsoft is the only company that has built three of them.  While Windows and Office are household words, our Server and Tools Business has quietly and steadily grown to be the unquestioned leader in server computing.  We have driven the industry forward and established the foundation for an entire generation of business applications.  We have overcome significant competitive challenges.  Over the past twenty years, the outstanding leadership from everyone involved in STB has made it a $15B business today.

    We are now ready to build on our success and move forward into the era of cloud computing.  Once again, Microsoft and our STB team are defining the future of business computing.  In October, we completed an incredibly successful PDC where we detailed the future of the cloud, outlining Platform as a Service and demonstrating the rapid advancement of Windows Azure.

    The best time to think about change is when you are in a position of strength, and that’s where we are today with STB – leading the server business, successful with our developer tools, and poised to lead the rapidly emerging cloud future.  Bob Muglia and I have been talking about the overall business and what is needed to accelerate our growth. In this context, I have decided that now is the time to put new leadership in place for STB. This is simply recognition that all businesses go through cycles and need new and different talent to manage through those cycles. Bob has been a phenomenal partner throughout this process, and he and his leadership team have the right strategy in place.

    In conjunction with this leadership change, Bob has decided to leave Microsoft this summer. He will continue to actively run STB as I conduct an internal and external search for the new leader.  Bob will onboard the new leader and will also complete additional projects for me.

    Bob has been a founder and leader of our server business from its earliest inception.  He has led our Developer, Office, and Mobile Devices Divisions, and key parts of Windows NT and our Online Services business.  I’ve worked with him in many capacities over the years and I’ve always appreciated his customer focus, technical depth, people leadership skills, and his positive energy. I want to thank Bob for his hard work, many accomplishments, and his focus on putting Microsoft first for 23 years.

    We enter this new decade with STB providing the platform for today’s business solutions, and uniquely well-positioned to drive the future of cloud computing.   I believe STB will continue to lead the industry with outstanding products and services for our customers and exceptional results for our business. 

    Thanks,
    Steve


    Mary Jo Foley (@maryjofoley) published Muglia's e-mail to the Microsoft troops: 'I'm moving on to new opportunities outside of Microsoft' in a 1/10/2011 post to ZDNet’s All About Microsoft blog:

    “Last week I celebrated 23 years working for Microsoft. During that time Microsoft has grown from a brilliant, yet awkward and aggressive adolescent riding a rocket ship into a mature industry leader. I’ve learned an amazing amount from the people with whom I’ve worked, from the customers I’ve served, and from the many partners who share this industry. I feel blessed to have had the privilege of working with so many great people. Later this year, I’m moving on to new opportunities outside of Microsoft, so I wanted to take a few minutes to share with you what’s important to me in life and leadership.

    “The foundation of who I am is based on living with integrity. Integrity requires principles, and my primary principle is to focus on doing the right thing, as best I can. The best thing, to the best of my ability, for our customers, our products, our shareholders, and of course, our people.

    “Other principles, or guideposts by which I live, are learning from and listening to others to make the best decision possible; not being afraid to admit a mistake and change a decision when it is wrong; being consistently honest, even when it hurts; treating our customers, partners, and people with the respect they deserve, with the expectation that each of my actions forms the basis of a lifetime relationship; and finally, being willing to admit and apologize when I have not lived up to these principles.

    “Integrity is my cornerstone for leading people. Leading starts by setting a strategy - not one that I’ve dreamed up myself but something that my team has worked together to create. The strategy provides the North Star for each person.

    The second part of leading people is creating a structure that enables collaboration and provides clarity of roles and responsibilities. This is more than an organizational structure; it is creating a system so people can work together effectively and productively, in an environment that makes it possible for each person to give their best.

    “Leading is more than strategy and structure; it’s all about people. Choosing the right people for each role is critical, but insufficient. Even more important is empowering every person to be their best, to work with others, to be as creative as possible. It’s about providing each and every person with encouragement when they’ve done something amazing and constructive feedback when they are off track.

    “While each individual is important, success requires a team. The team is more important than the needs or capabilities of any individual. This is what makes a team much more than the sum of the parts, and a joy to be part of.

    “That brings me to delivering results. Results are built when people act with integrity and deliver their best. Results are all about the positive impact we have on the world – transforming personal lives and revolutionizing ways of doing business.

    “As a leader at Microsoft, I have a responsibility for delivering results to our shareholders. STB has performed well – with revenue growing from $9.7B in 2006 before I took over, to $14.9B reported last July, and operating income climbing from $3B to $5.5B over the same period. That’s over a 50% increase in revenue with a near doubling in income. That growth continued during the first quarter of our FY11. There are few organizations in the industry who have demonstrated the same results.

    “I am incredibly excited by the emergence of cloud computing, and the opportunity it represents to shape business and the way people live for years to come. I have deeply enjoyed my role in positioning Microsoft as a leader and innovator in cloud computing.

    “The coming months are a time of transition.  During this time, I will be fully engaged in leading STB until new leadership is in place. After that, I will continue to do everything I can to help Microsoft, STB, and all of you.

    “I particularly want to recognize the outstanding work done by my current team in the Server and Tools Business over the past four years. We rapidly built a series of best-of-breed products that changed the way businesses run, while helping our customers and partners be successful. We’ve led the industry while facing tough competitors, most notably Linux, VMware, and Oracle. We succeeded by focusing on the simple idea that our customers make smart decisions, so we need to provide the best solution for everything our customers want to do with our products.

    “Microsoft is a blessing in my life and a blessing for my family. I love working with our customers and partners. Most of all, I treasure the wonderful and bright people with whom I’m privileged to work each day. I hope that in some way, large or small, I have helped each of you to lead your life with your own deep sense of integrity, that you help to bring out the best in other people, and deliver the results that matter most to you.

    “My best to you, with thanks,

    “bob”

    Mary Jo, who broke the Muglia story with her Server and Tools chief Muglia to leave Microsoft in summer 2011 post of the same date, is "betting [Amitabh] Srivastava[, who runs the Server & Cloud Division, will] become the new STB chief."

    I’m not sanguine about Srivastava’s chances for taking his former boss’s place.

    Derrick Harris asked Does Bob Muglia’s Departure Spell Doom for Windows Azure? and answered “No” in a 1/10/2011 post to GigaOm’s Structure blog:

    Microsoft Server and Tools Business President Bob Muglia is leaving Microsoft this summer, as a result of CEO Steve Ballmer’s decision to seek new leadership for that division. Ballmer announced the news via an email to employees, in which he states that an internal and external replacement search is underway. This news comes just months after Chief Software Architect, and fellow cloud visionary, Ray Ozzie announced his resignation in October. I’ve been as high as anybody on Microsoft over the past couple years, especially when it comes to cloud computing, but this news has [left me] wholly unsure of what to expect from Microsoft’s cloud efforts going forward.

    Microsoft’s legacy as a PC-centric company with too great a reliance on that business model fairly attracted a good number of skeptics, but the reality — from my perspective — is that Microsoft is a formidable competitor in both the cloud and the data center, and that’s all because of the work coming out of Muglia’s division. Windows Azure is a Grade-A cloud platform, and Microsoft’s cloud services lineup (which includes office, collaboration and email applications) is probably the best in the business right now, at least when it comes to wooing large customers. In the data center, Hyper-V is among the fastest-growing hypervisors in terms of installations and support, and System Center Virtual Machine Manager is adding more cloud-like features on a regular basis. Most estimates still have Microsoft trailing VMware by a mile in terms of overall installations, but customers seem to like Hyper-V’s free price tag, at least, and are choosing it in greater numbers every year. What’s more, as Ballmer acknowledges in his email announcing Muglia’s departure, the Server and Tools Business — which also houses Windows Server and SQL Server — does $15 billion in annual business.

    It might be cloud computing where Microsoft, under Muglia’s leadership, has been the most impressive. Looking even past the technical features of Windows Azure, the decision to sell Windows Azure Appliances to partners could turn the cloud business into the PC business, as might the tight alignment between PaaS and SaaS on the platform. Microsoft’s cloud efforts aren’t perfect, of course – its seeming unwillingness to develop true hybrid capabilities between Hyper-V/System Center and Windows Azure is questionable – but, all in all, Microsoft is among the best and most-complete cloud vendors around.

    Muglia’s departure comes on the heels of Chief Software Architect Ray Ozzie’s resignation in October, and that combination doesn’t bode well for Microsoft’s cloud future. With two of its most-respected cloud leaders leaving in such short order, one has to ask what’s going on at Microsoft and what Steve Ballmer has in mind. Under the guidance of Ozzie and Muglia, Microsoft’s cloud computing efforts were defying expectations by creating a truly unique cloud platform and showing that Microsoft was willing to sacrifice high on-premise software margins in the name of innovation. If Ballmer’s decision to replace Muglia has anything to do with remedying that margin issue, the result might be that Microsoft’s cloud strategy will ultimately fulfill its critics’ prophecies.

    Related content from GigaOM Pro (sub req’d):

    The preceding was pasted from the HTML source for the post at 1:45 PM on 1/10/2011 because the text wasn’t visible on the blog’s page at that time.

    Arik Hesseldahl compared margins of the three Microsoft groups in his Head Of Microsoft’s Servers And Business Unit Leaving This Summer post to All Things Digital’s New Enterprise blog:

    Muglia was only elevated to the job two years ago, nearly to the day. The Server and Tools Business, at $14.9 billion in annual revenue (fiscal 2010), is Microsoft’s third-largest division behind the Windows/Windows Live Division and the Microsoft Business Division, both of which reported revenues north of $18 billion in 2010. On Muglia’s watch, sales at STB grew more than 12 percent, and its operating margins went from 31 percent in 2008 to 37 percent in 2010. However, STB is nowhere near as profitable as the other two divisions: the Business Division reported operating margins of 63 percent in 2010, while Windows saw 70 percent. Ballmer says in his memo that he’s eager to see stronger growth from STB. [Emphasis added.] 


    Adron Hall (@adronbh) described Windows Azure and the PaaS Context in a 1/10/2011 post:

    image PaaS stands for Platform as a Service.  The new concept of Devops* (Developer + Operations) has allowed cloud computing to reach an apex of agility for business.  For developers, PaaS provides a clean and agile experience around staging and deployment.  PaaS also offers the highest level of cost savings for most prospective enterprise and mid-size business users of cloud computing services.  Windows Azure has positioned the vast majority of its services as a platform.

    Working with a platform, instead of an infrastructure-based cloud computing service, allows Devops to focus almost solely on business problems.  In addition, it avoids unnecessary IT staffing levels in most organizations.  With staff refocused on business problems and the majority of hardware issues eliminated, an organization’s costs go down while its return on investment dramatically increases.

    The Ideal PaaS Scenario, Athenaeum Corporation

    Imagine a company, which I’ll call the Athenaeum Corporation, that has around 250 people and provides a web-based, on-demand service.  Right now it has four geographically dispersed data centers that incur real estate, staffing, energy, and other costs.  In each of those data centers there are network switches and dedicated web servers connected to clustered databases with failover.  Each set of clustered databases is set up to replicate among all the geographically dispersed locations every day on a near real-time basis.  The website that these locations host is then balanced by load balancers, which also require maintenance and administration.

    The headquarters of this company is located away from the data centers, but has a smaller duplicate data center of its own that also receives replicated data and hosts the website.  This is for internal and development purposes.  The development team consists of approximately 45 people out of the 250 staff.  The network operations staff is about 25 people, with internal IT making up another 15 people.  Altogether, the direct support of development and operations accounts for 85 people out of the 250-person staff.

    At the headquarters are approximately 280 machines, ranging from desktop PCs to laptops.  These machines are used to support operations, sales, accounting, support, and every other part of the company.  These 280 machines are connected to approximately 60 internal servers that provide things like Exchange services, file-sharing directories, instant-messaging communications, SharePoint services, and other IT-related tools.  In addition, there are other switches, cabling, and items related to the routing, load balancing, and usage of these internal services.

    The Athenaeum Corporation that I’ve described is a perfect scenario for cloud computing services, which could literally save it hundreds of thousands of dollars.  While saving that money, the company could also improve its physical service, its uptime, and its system processing performance, among other things, just by migrating to the Windows Azure Platform.

    Before jumping into how a company like the Athenaeum Corporation might adopt PaaS with the Windows Azure Platform, let’s quickly review the services that the Windows Azure Platform provides.

    The Platform of Windows Azure

    image

    The core Windows Azure Platform is made up of compute and storage.  The compute is broken up into Web, Worker, and CGI Roles.  The storage is broken up into Table, Blob, and Queue services.  All of these features can be used through a platform SDK (as has been done throughout this book) or through RESTful Web Service APIs.  The operating system itself is abstracted away, so only the platform is of concern to developers.
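
    As a rough illustration of the SDK side of that statement (as opposed to the REST APIs), here is a minimal sketch using the Microsoft.WindowsAzure.StorageClient library that shipped with the 1.x SDK; the container, blob, and queue names are placeholders, and method names may differ slightly in later SDK releases:

    using Microsoft.WindowsAzure;
    using Microsoft.WindowsAzure.StorageClient;

    class StorageSketch
    {
      static void Main()
      {
        // Development storage emulator; swap in CloudStorageAccount.Parse("...") for a real account.
        CloudStorageAccount account = CloudStorageAccount.DevelopmentStorageAccount;

        // Blob storage: create a container and upload a small text blob.
        CloudBlobClient blobClient = account.CreateCloudBlobClient();
        CloudBlobContainer container = blobClient.GetContainerReference("samples");
        container.CreateIfNotExist();
        container.GetBlobReference("hello.txt").UploadText("Hello from a Web or Worker Role");

        // Queue storage: enqueue a message for a Worker Role to pick up later.
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("workitems");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage("process-order-42"));
      }
    }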

    Beyond the core compute and storage elements, the Windows Azure Platform cloud has the Windows Azure AppFabric and the SQL Azure relational database for service bus, security access control, and storage of highly structured data.  The AppFabric is made up of two core features: the access control and the service bus.  SQL Azure is essentially a clustered, high-end instance of SQL Server running with a hot-swappable backup that is managed by Microsoft in its data centers.

    image722322The Windows Azure AppFabric is one of the features of the Windows Azure Platform that makes it easy to work with on-premises, internal, and disparate systems alongside Windows Azure Platform or other cloud services.  With AppFabric access control, security, claims-based identification, and other authentication mechanisms can be used for seamless single sign-on experiences.  With the systems secured through access control, the AppFabric service bus can then be used to manage and keep communication flowing between those disparate systems.  Together, the AppFabric Access Control and Service Bus provide a way to integrate whatever systems a business enterprise, government, or other entity may have.

    imageWith SQL Azure, a hosted, high-end solution for relational data storage needs is provided.  One big concern is that database sizes are limited to 50 GB of storage.  Although the 50 GB limit exists, once that size has been reached the data most likely should not be contained solely in a relational data store anyway.  This is when the other Windows Azure storage mediums come into play.  But for data under 50 GB, a relational data store set up to work seamlessly in Windows Azure provides additional platform capabilities for developers to port traditionally hosted applications into the cloud with minimal changes.
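
    For databases under that limit, porting an application is often little more than a connection-string change.  A minimal ADO.NET sketch follows, in which the server, database, credentials, and the Customers table are all placeholders:

    using System;
    using System.Data.SqlClient;

    class SqlAzureSketch
    {
      static void Main()
      {
        // SQL Azure speaks ordinary TDS over port 1433 with an encrypted connection and the
        // user@server login form; the server, database, credentials, and table are placeholders.
        const string connectionString =
          "Server=tcp:yourserver.database.windows.net,1433;" +
          "Database=Athenaeum;User ID=youruser@yourserver;Password=yourpassword;Encrypt=True;";

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Customers", connection))
        {
          connection.Open();
          Console.WriteLine("Customer rows: {0}", command.ExecuteScalar());
        }
      }
    }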

    Now that the platform is covered, how would the Athenaeum Corporation move their system & website operations into the Windows Azure Platform for increased capabilities and decreased costs?  The first thing needed is a breakdown of the individual systems and interoperations.

    1. Relational databases in each of the geographically dispersed data centers with failover databases.
    2. Headquarters has 280 PCs and Laptops.
    3. Headquarters has 60 internal IT maintained servers with custom applications, file-sharing, and other tools running on Windows Server.
    4. Load balancing is done for the web based on demand services in house.
    5. Four data centers, geographically dispersed, with their respective real estate, staffing, energy, and other costs.
    6. Network operations requiring approximately 25 staff for 24-hour-a-day operational uptime.
    7. Web-based on-demand services.

    I’ll start breaking down these 7 key functionalities and describe how the move to the Windows Azure Platform would change their costs (because the first item can be handled in two different ways, the breakdown below runs to eight entries).  The relational databases in each of the data centers can be moved in a couple of different ways.

    1. The first option is to move the databases into a single primary SQL Azure instance.  The databases are most likely located at each of the data centers for content-distribution reasons, which made sense before; with the move to the cloud, the Windows Azure CDN could be used instead, and the database would likely have better access to the geographically dispersed web presence points.
    2. The second is to move the databases to affinity points within the cloud that already match the current locations, porting the replication functionality for the specific data that each site needs.
    3. The 280 PCs and laptops would still need connectivity and access to all of the existing applications they have now.  The cloud changes little in regard to this situation.  However, redundant machines could be removed, and implementing SaaS-based solutions, which I’ll discuss further in the next section, would dramatically decrease the cost of the machines each employee needs, along with the support, administration, and maintenance of the software they currently use.
    4. The 60 internal servers at headquarters that IT maintains could be migrated completely, especially if they’re all running a Windows operating system.  For anything that isn’t, one may want to look to a virtualization solution at a cloud provider such as AWS or Rackspace.  In Windows Azure, internal servers hosting IIS applications could likely be moved to Web or Worker Roles.  Anything such as Ruby on Rails, PHP, or Java that is hosted via IIS can be moved to a CGI Role in Windows Azure.  Anything with other complexities can be installed on a Windows Azure VM Role.
    5. Current in-house load balancing can be eliminated entirely.  There is no need for in-house management of this with a PaaS like the Windows Azure Platform.  So mark this off the cost list; it is included in the cost of the service and requires no configuration, management, or other interaction.
    6. Each data center that previously provided a geographic location for the web presence can be brought into the Windows Azure cloud.  There are two primary locations in North America at this time, and several more in other countries throughout the world.  With this ability, the need to have four different data centers is removed.  In most cases, the centers in which Windows Azure is located also undergo significant security and penetration testing at a physical level, which effectively increases the security of each of the geographic access points.  That removes one more cost while providing more for the money.
    7. Network operations are effectively simplified by the removal of routing, load balancing, and other concerns that previously had to be handled in house.  The cloud offers 24x7x365 operational uptime.  This eliminates the need for the in-house staffing, with only four to six staff needed for this particular scenario, primarily to maintain data and to ensure that custom systems remain maintained and operational within the Windows Azure cloud.
    8. The last item is easily moved into the Windows Azure Platform using a Windows Azure Web Role.  This provides everything needed to operate a SaaS Web Application with the Windows Azure Portal PaaS.

    That last point, moving the Athenaeum software into the Windows Azure cloud, is really about SaaS on the Windows Azure Platform.


    Jonathan Feldman explained Fundamentals: Calculating TCO in 10 Steps in a 1/10/2011 Cloud Computing Brief for InformationWeek::Analytics (premium subscription required):

    image How much will that project cost? IT needs to have a reliable answer, both for its own budgeting purposes and for the CFO. Our report walks you through 10 steps to create a credible TCO for any IT project. PLUS: A customizable worksheet. (S2351210)

    Table of Contents

      image 3 Author's Bio
      4 Executive Summary
      5 The TCO Rationale
      5 Figure 1: IT Budget Timeframe
      6 Step 1. Identify the Lifecycle
      6 Step 2. Rethink Traditional Assumptions
      7 Step 3. Use Columns
      7 Figure 2: Percent of Total Operating Budget Allocated to IT Expenditures
      8 Step 4. State All Assumptions
      9 Step 5. List All Known Cash Flows
      9 Figure 3: IT Operating Expense vs. Capital Expense
      10 Step 6. Ask Finance for Cost of Capital
      11 Step 7. Calculate Net Present Value
      11 Step 8. Compare With Another Project
      11 Step 9. Consider Graphics
      12 Step 10. Congratulate Yourself
      12 Figure 4: Cost of WAN Links
      13 Related Reports
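
    The report itself sits behind the paywall, but the arithmetic behind Step 7 is standard: net present value discounts each period’s cash flow by the cost of capital obtained in Step 6.  A minimal sketch, with invented cash flows and a 10 percent rate purely for illustration:

    using System;

    class NpvSketch
    {
      // NPV = sum over t of cashFlows[t] / (1 + rate)^t, where t = 0 is the up-front outlay.
      static double NetPresentValue(double rate, double[] cashFlows)
      {
        double npv = 0.0;
        for (int t = 0; t < cashFlows.Length; t++)
          npv += cashFlows[t] / Math.Pow(1.0 + rate, t);
        return npv;
      }

      static void Main()
      {
        // Hypothetical project: $100K up front, then $40K of net benefit in each of three years.
        double[] cashFlows = { -100000, 40000, 40000, 40000 };
        Console.WriteLine("NPV at a 10% cost of capital: {0:C}", NetPresentValue(0.10, cashFlows));
      }
    }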


    <Return to section navigation list> 

    Windows Azure Platform Appliance (WAPA), Hyper-V, Server App-V and Private Clouds

    Brian Lauge Pedersen reminded developers about Microsoft Server Application Virtualization in a 1/10/2011 post:

    image Microsoft Server Application Virtualization builds on the technology used in client Application Virtualization, allowing for the separation of application configuration and state from the underlying operating system.

    image

    Microsoft Server Application Virtualization converts traditional server applications into state separated "XCopyable" images without requiring code changes to the applications themselves, allowing you to host a variety of Windows 2008 applications on the Windows Azure worker role.

    If you haven’t read about this technology, which we are bringing to the market later this year, you can read about it on this blog:

    http://blogs.technet.com/b/systemcenter/archive/2010/12/22/microsoft-server-application-virtualization-ctp-released-run-more-of-your-applications-on-windows-azure.aspx


    <Return to section navigation list> 

    Cloud Security and Governance

    Mike Vizard posted The Pressing Need for IT Security Automation on 12/21/2010 (missed when posted):

    image The challenge with IT security today isn’t just that the bad guys get more sophisticated with each passing year; it’s that the number of people dedicated to maintaining security within the enterprise is either staying constant or shrinking at a time when the number of assets that need to be defended is increasing.

    This situation is creating increased problems for IT organizations of all sizes. A recent survey of 1,963 IT professionals conducted by eEye Digital Security, a provider of vulnerability assessment and management software, finds that 60 percent of professionals acknowledge having unpatched vulnerabilities in at least 25 percent of their applications.

    The primary reason why this situation exists, according to the survey, is that IT organizations don’t have enough people on hand to manage the task.

    But eEye Digital CTO Marc Maiffert says that IT organizations need to come to grips with a new security reality. Rather than relying on unstructured manual processes for dealing with security issues, IT organizations need to start embracing frameworks that automate as much of the remediation process as possible.

    Right now, the application environment that IT organizations are trying to secure is simply too fractured. Without some automated approach, IT organizations are counting on having the security expertise needed to manage each of those application environments on hand. Given the current state of the economy, the likelihood of that happening is minimal, says Maiffert.

    And in a new era of zero-day attacks, the probability that security professionals can identify and remediate vulnerabilities before something bad happens to the organization is even more remote.

    Unfortunately, resistance to automation in the ranks of security professionals has been building up for several years now, largely because there is a lack of trust caused by not being able to see into the security processes that are being automated. But as the scope of the problem continues to grow and more transparency comes to the automation process, Maiffert says that it’s only a matter of time before automating the vulnerability identification and remediation process becomes the standard operating procedure.

    After all, the limited security resources that IT organizations do have in terms of people should be focused on far more challenging security issues than vulnerability management.


    <Return to section navigation list> 

    Cloud Computing Events

    Jim O’Neil announced Firestarter: Transforming IT with Cloud Computing to be held in multiple east-coast cities between 2/1 and 3/24/2011:

    As a fitting complement to last fall’s Windows Azure Firestarter series put on by my colleagues Brian Hitney and Peter Laudati and myself, East Region IT Professional Evangelists Bob Hunt and Yung Chou will be kicking off the new year with their own set of all-day Firestarters, taking a deep, IT professional-focused look at cloud computing.

    Date: City (click the city name to register)

    February 1: Chevy Chase, MD
    February 3: Malvern, PA
    February 8: Iselin, NJ
    February 10: New York City
    March 8: Coral Gables, FL
    March 10: Tampa, FL
    March 22: Waltham, MA
    March 24: Alpharetta, GA

    Session 1 - Cloud Computing Essentials for IT Pros

    While cloud computing is emerging as a promising IT service delivery vehicle, to cloud or not to cloud? That is not the question. For IT professionals, it is crucial to recognize the opportunities and play a key role in the transformation from existing infrastructure-focused IT into a service-oriented, user-centric, IT-as-a-Service environment. Are you ready for the challenge to lead and transform your IT organization?

    This session clarifies myths and provides pertinent information about cloud computing from an IT professional’s viewpoint. We will review the architecture and examine the service delivery models of cloud computing while stepping through business scenarios and operations that employ public cloud and private cloud.

    Session 2 - Physical, Virtualized, and Cloud

    With virtualization and cloud computing, IT now has many options to deliver services on premises and beyond. How does virtualization work? What should be virtualized? Is virtualization relevant to cloud computing? And what is cloud? Do we need cloud? Is cloud the right solution? Which resources should go in the cloud, or not at all? How do we get from point A to point B? This session connects some of these dots and helps IT professionals gain insight into transitioning existing IT into a cloud-friendly and cloud-ready environment.

    Session 3 - Public Cloud: What, Why, and How

    Instead of worrying about plumbing, public cloud gives IT the power to focus on business logic, with the ability to scale resource capacities up or down depending on business demand. Further, with a pay-for-use business model, IT won’t be wasting money on providing services you thought you might need but never got around to using. This session demonstrates employing public cloud for information workers to carry out work routines while IT realizes the benefits on a daily basis.

    Session 4 - Private Cloud: What, Why, and How

    Cloud computing, with attributes like pooled computing resources, automated management, scalability, and on-demand provisioning, is a real opportunity to transform enterprise IT from a device-dependent, infrastructure-focused deployment structure into a customer-centric, service-oriented delivery vehicle. For the enterprise, the road to IT as a Service runs through a common management framework and private cloud.

    This session presents the criteria, examines strategies, and walks through the processes of assessing and constructing a private cloud, with demonstrations of delivering and managing Infrastructure as a Service (IaaS) in an enterprise setting.

    Jim is a Developer Evangelist for Microsoft who covers the Northeast District.


    <Return to section navigation list> 

    Other Cloud Computing Platforms and Services

    Todd Hoff explained Riak's Bitcask - A Log-Structured Hash Table for Fast Key/Value Data in an 1/10/2011 post to the High Scalability blog:

    How would you implement a key-value storage system if you were starting from scratch? The approach Basho settled on with Bitcask, their new backend for Riak, is an interesting combination: RAM holds a hash map from each key to a file pointer for its value, while a log-structured file system handles writes efficiently.  In this excellent Changelog interview, some folks from Basho describe Bitcask in more detail.

    The essential Bitcask:

    • Keys are stored in memory for fast lookups. All keys must fit in RAM.
    • Writes are append-only, which means writes are strictly sequential and do not require seeking. Writes are write-through. Every time a value is updated the data file on disk is appended and the in-memory key index is updated with the file pointer.
    • Read queries are satisfied with O(1) random disk seeks. Latency is very predictable if all keys fit in memory because there's no random seeking around through a file.
    • For reads, the file system cache in the kernel is used instead of writing a complicated caching scheme in Riak.
    • Old values are compacted or "merged" to free up space.
    • Get and set concurrency are implemented using vector clocks by the software layer above Bitcask.
    • The key to value index exists in memory and in the filesystem in hint files. The hint file is generated when data files are merged. On restart the index only needs to be rebuilt for non-merged files which should be a small percentage of the data.

    Eric Brewer (of CAP theorem fame) came up with the idea behind Bitcask by observing that if you have the capacity to keep all keys in memory, which is quite likely on modern systems, you can build a storage system that is relatively easy to design and implement. The commit log can be used as the database itself, providing atomicity and durability. Only one write is required to persist the data; separate writes to a data file and a commit log are not necessary.

    When a value is updated, it is first appended to the on-disk commit log. Then the in-memory hash table that maps keys to disk pointers is updated to point to the file and the offset of the record in the file. So a read takes just one file I/O: the hash lookup yields the file pointer, and you just seek to the offset and read the value. For writes, it’s just an append to the file. Pretty slick. It’s a good compromise between a pure in-memory database and a disk-based data store backed by a virtual memory layer.
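
    A toy version of that read/write path is easy to sketch. This is not Basho’s Erlang code, just a C# illustration with invented record framing; it omits crash recovery (rebuilding the index from the data and hint files on restart), merging, CRCs, and concurrency control:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    // Toy Bitcask-style store: one append-only data file plus an in-memory
    // index of key -> (value offset, value length). All keys must fit in RAM.
    public class TinyBitcask : IDisposable
    {
      private struct Entry { public long Offset; public int Length; }

      private readonly FileStream log;
      private readonly Dictionary<string, Entry> index = new Dictionary<string, Entry>();

      public TinyBitcask(string path)
      {
        log = new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite);
      }

      public void Put(string key, byte[] value)
      {
        log.Seek(0, SeekOrigin.End);                             // writes are strictly sequential appends
        byte[] keyBytes = Encoding.UTF8.GetBytes(key);
        log.Write(BitConverter.GetBytes(keyBytes.Length), 0, 4); // record: [key len][key][value len][value]
        log.Write(keyBytes, 0, keyBytes.Length);
        log.Write(BitConverter.GetBytes(value.Length), 0, 4);
        long valueOffset = log.Position;                         // the value bytes start here
        log.Write(value, 0, value.Length);
        log.Flush();                                             // write-through, as Bitcask does
        index[key] = new Entry { Offset = valueOffset, Length = value.Length };
      }

      public byte[] Get(string key)
      {
        Entry entry;
        if (!index.TryGetValue(key, out entry)) return null;     // the key index lives entirely in memory
        var buffer = new byte[entry.Length];
        log.Seek(entry.Offset, SeekOrigin.Begin);                // at most one seek per read
        int read = 0;
        while (read < buffer.Length)
          read += log.Read(buffer, read, buffer.Length - read);
        return buffer;
      }

      public void Dispose() { log.Dispose(); }
    }

    On restart, a real implementation rebuilds the in-memory index by scanning the data files, or the hint files produced during merging, as described above.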

    Some potential issues:

    • If you suspect you will have more keys than RAM then an architecture that keeps a working set in memory would be a better choice.
    • It will be slower than a pure in-memory database.
    • Syncing on every write could be a little painful. Write throughput could be increased if writes were buffered and the data was replicated synchronously to a backup node for high availability.
    • I trust operating system caches not.  An OS cache can't know your access patterns. A custom cache might introduce complexity, but it's hard to believe it wouldn't perform better or be more tunable when things go wrong. Basho seems happy with this approach, but it still makes me queasy. What happens when the traffic has a uniform distribution, or a pareto-like distribution? Benchmark your app!


    Lydia Leong (@cloudpundit) responded to readers’ issues with her recent posts by explaining The process of a [Gartner] Magic Quadrant in a 1/10/2011 post to her Cloud Pundit blog:

    image There’ve been a number of ongoing dialogues on Twitter, on Quora, and various people’s blogs about the Magic Quadrant. I thought, however, that it would be helpful to talk about the process of doing a Magic Quadrant, before I write another blog post that explains how we came to our current cloud-and-hosting MQ. We lay this process out formally in the initiation letters sent to vendors when we start MQ research, so I’m not giving away any secrets here, just exposing a process that’s probably not well known to people outside of analyst relations roles.

    image A Magic Quadrant starts its life with a market definition and inclusion criteria. It’s proposed to our group of chief analysts (each sector we cover, like Software, has a chief), who are in charge of determining what markets are MQ-worthy, whether or not the market is defined in a reasonable way, and so forth. In other words, analysts can’t decide to arbitrarily write an MQ, and there’s oversight and a planning process, and an editorial calendar that lays out MQ publication schedules for the entire year.

    The next thing that you do is to decide your evaluation criteria, and the weights for these criteria — in other words, how you are going to quantitatively score the MQ. These go out to the vendors near the beginning of the MQ process (usually about 3 months before the target publication date), and are also usually published well in advance in a research note describing the criteria in detail. (We didn’t do a separate criteria note for this past MQ for the simple reason that we were much too busy to do the writing necessary.) Gartner’s policy is to make analysts decide these things in advance for fairness — deciding your criteria and their weighting in advance makes it clear to vendors (hopefully) that you didn’t jigger things around to favor anyone.

    In general, when you’re doing an MQ in a market, you are expected to already know the vendors well. The research process is useful for gathering metrics, letting the vendors tell you about small things that they might not have thought to brief you on previously, and getting the summary briefing of what the vendor thought were important business changes in the last year. Vendors get an hour to tell you what they think you need to know. We contact three to five reference customers provided by the vendor, but we also rely heavily upon what we’ve heard from our own clients. There should generally not be any surprises involved for either the analysts or the vendors, assuming that the vendors have done a decent job of analyst relations.

    Client status and whatnot doesn’t make any difference whatsoever on the MQ. (Gartner gets 80% of its revenue from IT buyers who rely on us to be neutral evaluators. Nothing a vendor could pay us would ever be worth risking that revenue stream.) However, it generally helps vendors if they’ve been more transparent with us, over the previous year. That doesn’t require a client relationship, although I suspect most vendors are more comfortable being transparent if they have an NDA in place with us and can discuss these things in inquiry, rather than in the non-NDA context of a briefing (though we always keep things confidential if asked to). Ongoing contact tends to mean that we’re more likely to understand not just what a vendor has done, but why they’ve done it. Transparency also helps us to understand the vendor’s apparent problems and bad decisions, and the ways they’re working to overcome them. It leads to an evaluation that takes into account not just what the vendor is visibly doing, but also the thought process behind it.

    Once the vendors have gone through their song and dance, we enter our numeric scores for the defined criteria into a tool that then produces the Magic Quadrant graph. We cannot arbitrarily move vendors around; you can’t say, well, gosh, that vendor seems like they ought to be a Leader / Challenger / Visionary / Niche Player, let’s put them in that box, or X vendor is superior to Y vendor and they should come out higher. The only way to change where a vendor is placed is to change their score on the criterion. We do decide the boundaries of the MQ (the scale of the overall graph compared to the whitespace in the graph) and thus where the axes fall, but since a good MQ is basically a scatterplot, any movement of axis placement alters the quadrant placement of not just one vendor but a bunch.

    Once the authoring analysts get done with that, and have done all the write-up text, it goes into a peer review process. It’s formally presented in a research meeting, any analyst can comment, and we get challenged to defend the results. Content gets clarified, and in some cases, text as well as ratings get altered as people point out things that we might not have considered.

    Every vendor then gets a fact-check review; they get a copy of the MQ graphic, plus the text we’ve written about them. They’re entitled to a phone call. They beg and plead; the ones who are clients call their account executives and make promises or threats. Vendors are also entitled to escalate into our management chain, and to the Ombudsman. We never change anything unless the vendor can demonstrate something is erroneous or unclear.

    MQs also get management and methodologies review — ensuring that the process has been followed, basically, and that we haven’t done anything that we could get sued for. Then, and only then, does it go to editing and publication. Theoretically the process takes four months. It consumes an incredible amount of time and effort.

    Lydia’s articles that prompted this post are at Windows Azure and Cloud Computing Posts for 1/7/2011+ (scroll down).


    James Hamilton described Google Megastore: The Data Engine Behind GAE in this 1/9/2011 update to a 2008 post:

    image Megastore is the data engine supporting the Google Application Engine. It’s a scalable structured data store providing full ACID semantics within partitions but lower consistency guarantees across partitions.

    image I wrote up some notes on it back in 2008 Under the Covers of the App Engine Datastore and posted Phil Bernstein’s excellent notes from a 2008 SIGMOD talk: Google Megastore. But there has been remarkably little written about this datastore over the intervening couple of years until this year’s CIDR conference papers were posted. CIDR 2011 includes Megastore: Providing Scalable, Highly Available Storage for Interactive Services.

    My rough notes from the paper:

    • Megastore is built upon BigTable
    • Bigtable supports fault-tolerant storage within a single datacenter
    • Synchronous replication based upon Paxos and optimized for long distance inter-datacenter links
    • Partitioned into a vast space of small databases each with its own replicated log
      • Each log stored across a Paxos cluster
      • Because they are so aggressively partitioned, each Paxos group only has to accept logs for operations on a small partition. However, the design does serialize updates on each partition
    • 3 billion writes and 20 billion read transactions per day
    • Support for consistency unusual for a NoSQL database but driven by (what I believe to be) the correct belief that inconsistent updates make many applications difficult to write (see I Love Eventual Consistency but …)
    • Data Model:
      • The data model is declared in a strongly typed schema
      • There are potentially many tables per schema
      • There are potentially many entities per table
      • There are potentially many strongly typed properties per entity
      • Repeating properties are allowed
    • Tables can be arranged hierarchically where child tables point to root tables
      • Megastore tables are either entity group root tables or child tables
      • The root table and all child tables are stored in the same entity group
    • Secondary indexes are supported
      • Local secondary indexes index a specific entity group and are maintained consistently
      • Global secondary indexes, which index across entity groups, are asynchronously updated and eventually consistent
      • Repeated indexes: supports indexing repeated values (e.g. photo tags)
      • Inline indexes provide a way to denormalize data from source entities into a related target entity as a virtual repeated column.
    • There are physical storage hints:
      • “IN TABLE” directs Megastore to store two tables in the same underlying BigTable
      • “SCATTER” attribute prepends a 2 byte hash to each key to cool hot spots on tables with monotonically increasing values like dates (e.g. a history table).
      • “STORING” clause on an index supports index-only-access by redundantly storing additional data in an index. This avoids the double access often required of doing a secondary index lookup to find the correct entity and then selecting the correct properties from that entity through a second table access. By pulling values up into the secondary index, the base table doesn’t need to be accessed to obtain these properties.
    • 3 levels of read consistency:
      • Current: Last committed value
      • Snapshot: Value as of start of the read transaction
      • Inconsistent reads: used for cross entity group reads
    • Update path:
      • A transaction writes its mutations to the entity group’s write-ahead log and then applies the mutations to the data (write-ahead logging).
      • A write transaction always begins with a current read to determine the next available log position. The commit operation gathers mutations into a log entry, assigns it an increasing timestamp, and appends it to the log, which is maintained using Paxos (a toy sketch of this commit loop appears after these notes).
      • Update rates within an entity group are seriously limited by:
        • When there is log contention, one wins and the rest fail and must be retried.
        • Paxos only accepts a very limited update rate (order 10^2 updates per second).
        • Paper reports that “limiting updates within an entity group to a few writes per second per entity group yields insignificant write conflicts”
        • Implication: programmers must shard aggressively to get even moderate update rates and consistent update across shards is only supported using two phase commit which is not recommended.
    • Cross entity group updates are supported by:
      • Two-phase commit with the fragility that it brings
      • Queueing and asynchronously applying the changes
    • Excellent support for backup and redundancy:
      • Synchronous replication to protect against media failure
      • Snapshots and incremental log backups
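
    To make the update path in the notes above concrete, here is a toy sketch of the optimistic commit loop: read the next log position, try to append, and retry on contention. The types are invented for illustration, and the sketch glosses over Paxos, replication, and timestamps entirely:

    using System;
    using System.Collections.Generic;

    // Stand-in for a single entity group's replicated log (no Paxos, replication, or timestamps here).
    class EntityGroupLog
    {
      private readonly List<string> entries = new List<string>();
      private readonly object gate = new object();

      public long NextPosition { get { lock (gate) { return entries.Count; } } }

      // The append succeeds only if the caller's expected position is still the next free slot,
      // which is what serializes contending writers within one entity group.
      public bool TryAppend(long expectedPosition, string logEntry)
      {
        lock (gate)
        {
          if (expectedPosition != entries.Count) return false;  // lost the race; the caller must retry
          entries.Add(logEntry);                                // Megastore runs this step through Paxos
          return true;
        }
      }
    }

    class CommitSketch
    {
      static void Commit(EntityGroupLog log, string gatheredMutations)
      {
        while (true)
        {
          long position = log.NextPosition;                 // the "current read" for the next log position
          if (log.TryAppend(position, gatheredMutations))   // append the mutations as one log entry
            return;
          // Write conflict on this log position: re-read and retry, as the paper describes.
        }
      }
    }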

    Overall, an excellent paper with lots of detail on a nicely executed storage system. Supporting consistent read and full ACID update semantics is impressive although the limitation of not being able to update an entity group at more than a “few per second” is limiting. [Emphasis added.]

    The paper: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf.

    Thanks to Zhu Han, Reto Kramer, and Chris Newcombe for all sending this paper my way.

    James wrote the original post while he was a Microsoft employee involved with Data Center Futures.

    For more details about Google Megastore, see Windows Azure and Cloud Computing Posts for 1/8/2011+ (scroll down).


    Ayende Rahien (@ayende, a.k.a. Oren Eini) described  The design of RavenDB’s attachments in a 1/6/2011 post:

    image I got a question on attachments in RavenDB recently:

    I know that RavenDb allows for attachments. Thinking in terms of facebook photo albums - would raven attachments be suitable?

    And one of the answers from the community was:

    imageWe use attachments and it works ok. We are using an older version of  RavenDB (Build 176 unstable), and the thing I wish would happen is that attachments were treated like regular documents in the DB. That way you could query them just like other documents. I am not sure if this was changed in newer releases, but there was talk about it being changed.

    If I had to redesign again, I would keep attachments out of the DB cause they are resources you could easily off load to a CDN or cloud service like Amazon or Azure. If the files are in your DB, that makes it more work to optimize later.

    In summary: You could put them in the DB, but you could also put ketchup on your ice cream. :)

    I thought that this is a good point to stop and explain a bit about the attachment support in RavenDB. Let us start from the very beginning.

    The only reason RavenDB has attachments support is that we wanted to support the notion of Raven Apps (see Couch Apps) which are completely hosted in RavenDB. That was the original impetus. Since then, they evolved quite nicely. Attachments in RavenDB can have metadata, are replicated between nodes, can be cascade deleted on document deletions and are HTTP cacheable.

    One of the features that was requested several times was automatically turning a binary property into an attachment on the client API. I vetoed that feature for several reasons:

    • It makes things more complicated.
    • It doesn’t actually give you much.
    • I couldn’t think of a good way to explain the rules governing this without it being too complex.
    • It encourages storing large binaries in the same place as the actual document.

    Let us talk in concrete terms here, shall we? Here is my model class:

    public class User
    {
      public string Id {get;set;}
      public string Name {get;set;}
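      // Small (16 - 32 bytes): cheap to keep inside the User document itself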
      public byte[] HashedPassword {get;set;}
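      // Potentially large (1 KB - 2 MB): better stored as a separate attachment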
      public Bitmap ProfileImage {get;set;}
    }

    From the point of view of the system, how is it supposed to make a distinction between HashedPassword (16 – 32 bytes, which should be stored inside the User document) and ProfileImage (1 KB – 2 MB, which should be stored as a separate attachment)?

    What is worse, and the main reason why attachments are clearly separated from documents, is that there are some things that we don’t want to store inside our document, because that would mean:

    • Whenever we pull the document out, we have to pull the image as well.
    • Whenever we index the document, we need to load the image as well.
    • Whenever we update the document we need to send the image as well.

    Do you sense a theme here?

    There is another issue: whenever we update the user, we invalidate all the user data. But when we are talking about large files, changing the password doesn’t mean that you need to invalidate the cached image. For that matter, I really want to be able to load all the images separately and concurrently. If they are stored in the document itself (or even if they are stored as an external attachment with client magic to make it appear that they are in the document), you can’t do that.

    You might be familiar with this screen:

    image_thumb[1]

    If we store the image in the Animal document, we run into all of the problems outlined above.

    But if we store it as a URL reference to the information, we can then:

    • Load all the images on the form concurrently.
    • Take advantage of HTTP caching.
    • Only update the images when they are actually changed.

    Overall, that is a much nicer system all around.
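
    To make that concrete, a revised model along those lines might look like the sketch below; the ProfileImageUrl property and whatever the URL points at are assumptions for illustration, not part of RavenDB’s API:

    public class User
    {
      public string Id {get;set;}
      public string Name {get;set;}
      public byte[] HashedPassword {get;set;}   // small, stays inside the document
      public string ProfileImageUrl {get;set;}  // hypothetical: points at the attachment, a CDN, or blob storage
    }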


    <Return to section navigation list> 
