Friday, October 14, 2011

Windows Azure and Cloud Computing Posts for 10/14/2011+

A compendium of Windows Azure, SQL Azure Database, AppFabric, Windows Azure Platform Appliance and other cloud-computing articles.


Note: This post is updated daily or more frequently, depending on the availability of new articles in the following sections:


Azure Blob, Drive, Table, Queue and Hadoop Services

Scott M. Fulton, III interviewed Hortonworks CEO Eric Baldeschwieler for Hadoop, the 'Data Cloud', a 10/14/2011 post to the ReadWriteCloud blog:

Not long ago at all, Oracle laid claim to building the systems that managed a majority of the world's data. This year, the group making the same claim is a spinoff from Yahoo.

The onset of Internet-size databases and cloud architectures brought about architectural quandaries about the nature of relational databases that no one had ever thought they'd have to ponder just a few years earlier. Making tremendous strides in this department in a very short period of time -- literally last June -- is Hortonworks, the newly independent company that produces the Apache-licensed open source database system Hadoop, and the latest partner of Microsoft. This week, ReadWriteWeb talks with Hadoop co-creator and Hortonworks CEO Eric Baldeschwieler.

Our topics: How Hadoop will integrate with Windows Server; whether the structure of data is better suited for Hadoop than other systems, as some NoSQL proponents claim; and the surprising revelation that Baldeschwieler does not perceive Hadoop as an integral part of the NoSQL camp. This despite the fact that Hadoop and NoSQL are often said in the same breath.


Scott Fulton, ReadWriteWeb: Do you perceive at this point Hadoop as a role within Windows Server, or deserving of a role, in the same way that Domain Name Server is a role and Internet Information Server is a role [within Server Manager]?

Eric Baldeschwieler: It's different, in that Hadoop is assembled out of a number of computers. It's kind of the opposite, where I think Windows Server is a perfectly fine component for building a Hadoop cluster. A Hadoop cluster needs to be built on a set of machines that have an operating system. I don't think that Hadoop becomes a service of Windows; rather, you build Hadoop services out of collections of computers, and they could run Windows or they could run whatever other operating system the organization is most comfortable with.

The exciting thing about this partnership [with Microsoft] to us is that it makes Hadoop accessible to a large number of organizations that have selected Windows as their preferred operating system.

RWW: So rather than as a function or a unit of Windows, you prefer to see it as a function of the companies that have put together a collection of all their computers, all of which happen to run Windows?

EB: All of which may happen, yes. I would think of it in similar terms to a cloud offering. I think in some sense Hadoop is a data cloud. Once you've assembled the Hadoop service, you shouldn't care too much what operating system it runs on. Now, you may choose to use applications that have been customized for Windows with Hadoop... There are opportunities to do differentiated things, but Hadoop itself is very OS-agnostic. The services that we're going to build for Hadoop will continue to be operating system-agnostic.

Is cloud data different from "regular" data?

RWW: Is there something intrinsic to the nature of structured data, the class we typically see with RDBMS like SQL Server, Oracle, DB2, that makes it unworkable in a cloud configuration, where it's spread out over multiple processors and devices, and which makes Hadoop suitable as the alternative?

EB: There's nothing about the data per se. What I would say is, Hadoop is much, much simpler than competitive offerings. It's really a very simple layer that's been built from the bottom up with two goals: One is massive scale-out. None of these traditional systems were built with a design point of reaching thousands, or tens of thousands, of nodes. Competing databases, for example, will go up to a dozen or two dozen nodes in a huge installation. So Hadoop was built from the bottom up to be really broadly scalable.

Another differentiation is, Hadoop was built from the bottom up to be built on very low-cost commodity hardware. It's built with the assumption that the hardware it's running on will break, and that it must continue to work even if the hardware breaks. Those two goals of the Hadoop design -- being able to run on thousands of nodes, and being able to run on commodity hardware -- are just very different design points than those traditional systems. Traditional systems have basically exported that part of the problem down to very complicated hardware solutions. There are many, many problems which can be solved much more simply if you're not thinking about scale-out... It's like looking at two animals that evolved from a very, very different ecosystem.

RWW: There are folks who draw a dotted line around the concept of NoSQL, and put the Hadoop elephant squarely in the middle of that. They tell me the basic principle of NoSQL is that big data should never have been designed for a structured query language. It should not be designed for unions and joins and all the mathematical extrapolation; big data was meant to be simpler, key/value pairs, and that architecture of key/value should be embraced as something that all data should eventually follow. Would you agree with that, even partly?

EB: Not as stated. To begin with, Hadoop supports several data processing languages that have exactly the primitives you listed. Hive supports a subset of SQL directly; and Pig supports all of the classic table operators as well -- joins, unions, all the sets, and relational operators. So we're not at all averse to doing relational algebra and that kind of processing. Hadoop is also really a processing engine.

I think, in general, when you describe the NoSQL movement, I think more of real-time stores such as MongoDB, Riak, or Redis -- there's dozens of these guys. Hadoop is a slightly different beast. The problem they're trying to solve is how to respond to queries in milliseconds with a growing core of data. Hadoop is different again because it's not about millisecond response; Hadoop is really focused on batch processing. Hadoop can really lower the price point by using commodity hardware and by making assumptions that your work will have a certain amount of latency to it, it's very data scanning-intensive instead of seeking-intensive. SQL databases, and even NoSQL stores, are architected so that you can pull up a particular row of data very, very quickly. Hadoop is architected so that you can scan a petabyte of data very, very quickly. There's different design points.

With all these things, I think many of these systems defy simple categorization. As Hadoop evolves, it's able to do a better job with lower-latency work. As databases evolve, they're able to scale to larger systems; and as NoSQL stores evolve, they're able to handle more complex queries.

The place where the other systems exceed what Hadoop can do is in what I would call high-bandwidth random access. If you really need to pull a set of unrelated key/value pairs out of your data as fast as possible, other systems are going to be better. But if you want to scan the last year's data, and understand or find new patterns, or run a machine-learning algorithm to infer the best way to personalize pages for users -- lots and lots of data, where the requirement is to process all the data, that's where Hadoop really excels.

Complementary, not transitional

RWW: Microsoft's folks tell me they're working on a system, that they'll be putting in beta shortly, to help users transition databases for use with Hadoop. I was going to ask, why wouldn't all databases qualify for this transition? But if I understand what you're saying correctly, there's still something to be said for the type of analysis where you're looking for the one thing in a million, the needle in the haystack, instead of a sequential read?

EB: Yea, there are workloads that traditional databases are going to do more cost-effectively than Hadoop. There's no doubt about that. You're not going to see people replacing transactional databases with Hadoop, when what they're trying to build is transactional systems. What does make sense in nearly all cases that I'm aware of is to take a copy of the data you have out of your transactional systems, and also store them on Hadoop. The incremental cost of doing that is very low, and it lets you use that data for the things that Hadoop is good for.

For example, what we're seeing a lot of at Yahoo is, data lives in transactional systems for weeks and months -- maybe 30, 60, 90 days. Because that's how long it's needed to serve production needs. But then there's data which should be kept for one, two or more years inside Hadoop. That lets you do investigation at a really granular level year-over-year, multi-year. It really lets you understand the data in a different way, to have it all available. Whereas the online [data] simply isn't all available because it's not cost-effective to keep petabytes of data in your transactional system. And even if it is, one of the things we see a lot of in production use cases is, the kind of work that people want to do with Hadoop can just crush one of those traditional systems.

So you see people partitioning the work; for example, there's a production system that was both serving user requests that required a transactional database, and that's also doing reporting -- aggregate activity of the users over [several] days, for example. What happened was, that reporting function was just bogging down the database. It was no longer able to meet its transactional requirements in a timely manner, as the load grew. So what they did was move the aggregational work into Hadoop, and left the transactional work in the transactional database. Now they're getting their reports more quickly and they're fielding many more transactions per second off the existing traditional system.

RWW: So there are complementary use cases that can not only be explained but diagrammed for people -- how they can partition their systems so, when they need that granular observation over one to two years, they have a way --

EB: Absolutely. The same applies to business analysis systems. Those also don't scale [very well]. So what we see is, people will use Hadoop to generate a new perception of the data that's useful in these systems, [which] can then be used by analysts who are trained on those tools already to explore that data. That's something that Yahoo does a lot of; they'll explore what they like to call "segments" of the data using traditional tools, but they can only host a tiny fraction of all their data in the data marts and cubes. So they'll use Hadoop to produce that view, and load it into a data mart... It's not a case of Hadoop replacing it, but using Hadoop where it complements the other system well.


Allen White (@SQLRunr) described PASS Summit 2011 - The Final Day with the details of David DeWitt's Day 3 Keynote in a 10/14/2011 post to his SQLBlog.com blog:

Big Data breaks down to massive volumes of records, whether recorded in relational databases or not. By 2020, we'll be managing data in the range of zettabytes, averaging around 35 ZB. Data is being generated by automated sources, producing the incredible quantities we're anticipating. The dramatic drop in the cost of hardware is the reason behind the increase in the amount of data we keep.

eBay uses a parallel data system, averaging about 10 PB on 256 nodes, while Facebook and Bing use NoSQL systems, managing 20 PB and 150 PB, respectively. He told us that NoSQL doesn't mean that SQL is dead; it means Not Only SQL: other systems will manage data as well as relational systems. It incorporates more data model flexibility, relaxed consistency models such as eventual consistency, low upfront costs, etc.

The NoSQL model stores data as it arrives, without data cleansing or staging, and the data is evaluated as it arrives. It can use a Key/Value Store method, as in MongoDB, Couchbase, etc., where the data model is very flexible and supports single-record retrievals through a key, or other systems like Hadoop, which Microsoft is now supporting. Relational systems impose structure, whereas NoSQL uses an "unstructured" model. Relational systems provide maturity and reliability, and NoSQL systems provide flexibility.

This is NOT a paradigm shift. SQL is NOT going away. (Codasyl to relational in the 70s was a paradigm shift.) Businesses will end up with data in both systems.

Big Data started at Google, because they had massive amounts of click-stream data that had to be stored and analyzed. It had to scale to petabytes and thousands of nodes. It had to be fault tolerant and simple to program against. They built a distributed file system called GFS and a programming system called MapReduce. Hadoop = HDFS + MapReduce. Hadoop and MapReduce make it easy to scale to huge amounts of data, with fault tolerance and low software and hardware costs.

HDFS is the underpinnings of the entire ecosystem. It's scalable to thousands of nodes, and assumes that failures are common. It's a write-once, read-many system that uses a traditional native file system and is highly portable. Large files are broken up into 64 MB blocks that are stored separately on the native file system. The blocks are replicated to handle hardware failures, so that block 1, after being written on its original node, will also be stored on two additional nodes (2 and 4). This allows a high level of fault tolerance.

Inside Hadoop there's a name node, which has one instance per cluster. There's also a backup node in case the name node fails, and there are data nodes. In HDFS the name node is always checking the state of the data nodes, ensuring that they are alive and balanced. The application has to send a message to the name node to find out where to put the data it needs to write. The name node reports where to place the data, but then gets out of the way and lets the application manage the data writes. Data retrieval is similar: the application asks the name node where the data lives, then gets it from the nodes where it's written.

Failures are handled as an intrinsic part of HDFS. The multiple writes ensure that the data is stored on nodes on multiple devices, so that even rack or switch failures allow access to the data on another device that's still available. When additional hardware is added, the data nodes are rebalanced to make use of it. HDFS is highly scalable and doesn't make use of mirroring or RAID, but you have no clue where your data really is.

MapReduce is a programming framework to analyze the data sets stored in HDFS. Map tasks process the data in each of the smaller chunks, then Reduce tasks consolidate those intermediate results until the work is done. There's a JobTracker function which manages the workload, and TaskTracker functions which manage the data analysis against all the blocks. The JobTracker task lives on top of the Name Node, and the TaskTracker tasks live on the systems with the Data Nodes.

The actual number of map tasks is larger than the number of nodes in the cluster. This allows map tasks to take over work from tasks that fail. Failures are detected by master pings. MapReduce is highly fault tolerant, relatively easy to write, and removes the burden of dealing with failures from the programmer. The downside is that the schema is embedded in the application code. There is no shared schema, and there's no declarative query language. Both Facebook's HIVE language and Yahoo's PIG language use Hadoop's MapReduce methodology in their implementations.
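
To make the map and reduce roles concrete, here is a minimal word-count sketch that uses Hadoop Streaming with Python. This is purely illustrative and not part of Dr. DeWitt's keynote; Hadoop's native API is Java, but Streaming pipes records through stdin/stdout, so any scripting language can supply the two functions.

    # mapper.py -- emit a (word, 1) pair for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            sys.stdout.write("%s\t1\n" % word)

    # reducer.py -- Hadoop sorts mapper output by key, so all counts for a
    # given word arrive on consecutive lines and can be summed in one pass
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                sys.stdout.write("%s\t%d\n" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        sys.stdout.write("%s\t%d\n" % (current_word, current_count))

The JobTracker schedules roughly one map task per HDFS block and re-runs any task whose node stops responding, which is the fault-tolerance behavior described above.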

Hive introduces a richer environment than pure MapReduce and approaches standard SQL in functionality. Facebook runs 150K jobs daily, and maybe 500 of those are pure MapReduce applications; the rest are HiveQL. In a side-by-side comparison, a couple of standard queries ran about 4 times longer than the same queries using Microsoft's PDW (next release, not yet available).

Sqoop provides a bridge between the world where unstructured data exists and the structured data warehouse world. Sqoop is a command-line utility to move data between those two worlds. Some analyses are hard to do in a query language and are more appropriate for a procedural language, so moving data between them makes sense. The problem with Sqoop is that it's fairly slow.
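
As an illustration of the bridge Sqoop provides, a typical import command pulls a relational table into HDFS as delimited files that MapReduce or Hive jobs can then read; the connection string, credentials and table name below are invented:

    sqoop import --connect jdbc:mysql://dbserver/sales --username reporting \
          --table orders --target-dir /warehouse/orders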

The answer, logically, is to build a data management system that understands both worlds. Dr. DeWitt terms this kind of system an Enterprise Data Manager. Relational systems and Hadoop are designed to solve different problems. The technologies complement each other, and each is best used where appropriate.

It's so wonderful that PASS brings Dr. DeWitt in to help us get back to the fundamentals behind what we do every day. I love the technical keynotes and really wish Microsoft would learn that marketing presentations aren't why we're here.


<Return to section navigation list>

SQL Azure Database and Reporting

My (@rogerjenn) Quentin Clark at PASS Summit: 150 GB Max. Database Size, Backup and Live Federation Scaleout for SQL Azure post, updated 10/14/2011, begins:

Quentin, who's corporate vice president of the Database Systems Group in Microsoft's SQL Server organization, reported during his 10/13/2011 PASS keynote that a Service Release by the end of 2011 will increase maximum database size to 150 GB, enable creation and live expansion of SQL Azure Federations in the Windows Azure Portal, and add backup to Azure storage.

and continues with detailed, illustrated descriptions of:

  • Increased SQL Azure Database Size, Live Expansion of SQL Azure Federations, and Backup to Azure Storage
  • Nicholas Dritsas: Deploy On-Premises SQL Server Database to SQL Azure and Backup to Windows Azure in SSMS 2012
  • Cihan Biyikoglu: New Metro SQL Azure Management Portal with No-Downtime, Click-to-Scale Federation

My (@rogerjenn) Quentin Clark at PASS Summit: SQL Azure Reporting Services and Data Sync CTPs Available from Azure Portal post of 10/14/2011 starts with:

Note: This post was split from the original (10/13/2011) version of my Quentin Clark at PASS Summit: 150 GB Max. Database Size and Live Federation Scaleout for SQL Azure post and updated 10/14/2011 with links to new highlights and FAQs for SQL Azure and SQL Azure Data Sync from the http://www.microsoft.com/windowsazure/features/reporting/ site.

Quentin, pictured at right, who's corporate vice president of the Database Systems Group in Microsoft's SQL Server organization, reported during his 10/13/2011 PASS keynote the current public availability of SQL Azure Reporting Services and SQL Azure Data Sync Services CTPs in an upgraded Management Portal. A Service Release by the end of 2011 will implement release versions of these and the other new SQL Azure features described in my Quentin Clark at PASS Summit: 150 GB Max. Database Size and Live Federation Scaleout for SQL Azure post.

Updated 10/14/2011 with links to the updated Business Intelligence and SQL Azure Data Sync (CTP) landing pages and new FAQs for Reporting Services and Data Sync.


Gregory Leake posted Announcing: Upcoming SQL Azure Q4 2011 Service Release to the Windows Azure blog on 10/14/2011:

Yesterday at the SQL PASS Summit, we announced several enhancements to SQL Azure that will be delivered with our next service release. The Q4 SR marks a further advancement to our scale on demand capabilities for the data tier. Key enhancements include a 3x increase in maximum database size from 50 GB to 150 GB, and SQL Azure Federation to greatly simplify database elastic scale. Existing applications should be unaffected by the upgrade, and further information on tool updates and how to use these new features will be available on this blog when the rollout of the Q4 SR is complete (expected by end of this year). Also announced in Thursday’s keynote is the immediate availability of the next Community Technology Preview (CTP) releases for SQL Azure Reporting and SQL Azure Data Sync. Both CTPs are now broadly available to SQL Azure customers and can be accessed via the Windows Azure Management portal.

What’s coming in the Q4 2011 Service Release?

The following new features will be included with the Q4 SR:

  • Increased database size: The maximum database size for individual SQL Azure databases will be expanded from 50 GB to 150 GB.
  • SQL Azure Federation: With Federation, databases can be elastically scaled out using the sharding database pattern based on database size and the application workload. This new feature will make it dramatically easier to set up sharding, automate the process of adding new shards, and provide significant new functionality for easily managing database shards. Click here for more information.
  • New SQL Azure Management Portal: The new portal will have a Metro-style user interface with significant new features including the ability to more easily monitor databases, drill-down into schemas, query plans, spatial data, indexes/keys, and query performance statistics (see screen shots below).
  • Expanded support for user-controlled collations.

New Features for SQL Server Management Studio

New features for SQL Server Management Studio (SSMS) were also demonstrated at the PASS Summit yesterday. Specifically, we will be introducing new capabilities for SSMS that will enable on-premises databases (data and schema) to be moved directly to SQL Azure or to Windows Azure Storage for storage in the cloud, further enhancing hybrid IT scenarios. The updated version of SSMS with these features will first ship in SQL Server 2012 and will also be offered on the web as a free download early next year. We will provide future blog posts when the tool updates are available for download, so stay tuned!

View PASS Summit Keynote with Demonstrations

Click here to watch demonstrations of many of these new features, made during the PASS Summit keynote by Quentin Clark, corporate vice president, SQL Server Database System Group at Microsoft. Click here for more information about Windows Azure and SQL Azure sessions at PASS Summit 2011.


Gregory Leake posted Announcing The New SQL Azure Data Sync CTP Release in a 10/14/2011 post to the Windows Azure blog:

We are very pleased to announce the broad availability of the next Community Technology Preview (CTP) release of SQL Azure Data Sync. This release supports customers' journey to the cloud by enabling hybrid IT environments between on-premises and the cloud, geo-location, and synchronization with remote offices. Key updates include a fresh UI, enhanced filtering, and improved sync group configuration for greater ease and flexibility for syncing data across and within cloud and on-premises databases.

While a CTP release, SQL Azure Data Sync is now available for trial directly within the Windows Azure Management Portal. This new release replaces the CTP2 release and existing users of SQL Azure Data Sync CTP2 should migrate to the new release. The new CTP is open to all SQL Azure subscribers and does not require a separate registration, so you can start using this release and providing feedback today!

What’s New in the Updated SQL Azure Data Sync CTP

The following new features are available in the latest CTP:

  • Greater ease of use with the new Management Portal:
    • The new Management Portal provides a rich graphical interpretation of the databases being synchronized and is used to configure, manage and monitor your sync topology.
  • Greater flexibility with enhanced filtering and sync group configuration:
    • Filtering: Specify a subset of table columns or specific rows.
    • Sync group configuration: Specify conflict resolution as well as sync direction per group member.
  • Greater access for all users:
    • The new CTP is available to all SQL Azure users for trial, and does not require a separate registration process.

Please click here to access our product documentation pages and learn more about the technologies. Also, an updated SQL Azure Data Sync FAQ is available here.

Background Information

SQL Azure Data Sync is a cloud-based data synchronization service built on the Microsoft Sync Framework technologies. Instead of writing complex custom logic to replicate data between databases, SQL Azure Data Sync is a service with a point-and-click interface to setup your synchronization rules and schedules. The service provides bi-directional data synchronization and data management capabilities allowing data to be easily shared across SQL Azure databases within multiple data centers. The following scenarios are possible:

Cloud to Cloud Synchronization

  • Geographically co-locate data with applications around the world to provide the most responsive experience for your users. Use it in conjunction with Windows Azure Traffic Manager.
  • Create one or more copies of data for scale-out. For example, separate your cloud-based reporting workload from your OLTP workload.

Enterprise (on-premises) With cloud

  • Produce hybrid applications, extending on-premises applications with cloud applications and allowing data to be shared.
  • Share data between branch or worldwide offices through the cloud.
  • Aggregate data in the cloud from retail offices to provide cross-location insight and operations.
  • Make data collected or aggregated in the cloud available to on-premises applications.

In addition, SQL Azure Data Sync enables one-way synchronization and bi-directional synchronization, including sync-to-hub and sync-from-hub synchronizations spanning both SQL Azure and on-premises SQL Server instances running in your private data centers.

Sharing Your Feedback

For community-based support, post a question to the SQL Azure MSDN forums. The product team will do its best to answer any questions posted there.

To suggest new SQL Azure Data Sync features or vote on existing suggestions, click here.

To log a bug in this release, use the following steps:

  1. Navigate to https://connect.microsoft.com/SQLServer/Feedback.
  2. You will be prompted to search our existing feedback to verify your issue has not already been submitted.
  3. Once you verify that your issue has not been submitted, scroll down the page and click on the orange Submit Feedback button in the left-hand navigation bar.
  4. On the Select Feedback form, click SQL Server Bug Form.
  5. On the bug form, select Version = SQL Azure Data Sync Preview
  6. On the bug form, select Category = SQL Azure Data Sync
  7. Complete your request.
  8. Click Submit to send the form to Microsoft.

If you have any questions about the feedback submission process or about accessing the new SQL Azure Data Sync CTP, please send us an email message: sqlconne@microsoft.com.

Click here for more information about Windows Azure and SQL Azure sessions at PASS Summit 2011. Click here to watch demonstrations of many of these new features, made during the PASS Summit keynote by Quentin Clark, corporate vice president, SQL Server Database System Group at Microsoft.


Liam Cavanagh (@liamca) reported SQL Azure Data Sync Preview is now available in a 10/14/2011 post:

We are pleased to announce the availability of SQL Azure Data Sync Preview. The Preview does not require a registration code and is available to anyone who has a SQL Azure account.

Improvements for the Preview release include:

  • Now hosted on the Windows Azure Management site - https://windows.azure.com.
  • A completely redesigned UI which makes performing common tasks easy, straightforward and intuitive. The UI includes built-in tutorials and Help.
  • Richer sync group configuration options:
    • Synchronization schedules from 5 minutes to 1 month, and anywhere in between.
    • Settable conflict resolution policy.
    • More granular definition of synchronization data sets - down to the table and row level.
  • Synchronization direction selectable for each database.

Unfortunately we have not been able to add the functionality to update sync group configuration when the schema changes. We intend to add this functionality in a future service release.

During the Preview, like the previous CTPs, using the service is free – though there are the normal Windows Azure and SQL Azure charges. Pricing for the production version has not yet been announced.

Data Sync Preview is not backward compatible with CTP2. If you are upgrading from CTP2, you need to remove all databases from the CTP2 sync groups, de-provision each database using deprov.exe, delete all CTP2 sync groups, and uninstall the CTP2 client agent.

The CTP2 site will be taken down in a few weeks. The sooner you upgrade the better.

For More Information

Check out the SQL Azure Data Sync post on Windows Azure blog.

Other sites you might want to check out

Give it a try. We think you will like it.


Steve Marx (@smarx) and Wade Wegner (@WadeWegner) interviewed Roger Doherty (@doherty100) on 10/14/2011:

Join Wade and Steve each week as they cover the Windows Azure Platform. You can follow and interact with the show at @CloudCoverShow.

In this episode, Roger Doherty, Technical Evangelist for SQL Server and SQL Azure, joins Steve and Wade to discuss recent updates to SQL Azure database.

In the news:

Tweet to Roger at @doherty100.


<Return to section navigation list>

MarketPlace DataMarket and OData

Turker Keskinpala (@tkes) posted Geospatial Properties to the OData.org blog on 10/14/2011:

OData supports geospatial data types as a new set of primitives. They can be used just like any other primitives—passed in URLs as literals, as types and values for properties, projected in $select, and so on. Like other primitives, there is a set of canonical functions that can be used with them.

The only restriction, relative to other primitives, is that geospatial types may not be used as entity keys or in ETags (see below).

The rest of this entry goes into more detail about the geospatial type system that we support, how geospatial types are represented in $metadata, how their values are represented in Atom and JSON payloads, how they are represented in URLs, and what canonical functions are defined for them.

Modeling
Primitive Types

Our type system is firmly rooted in the OGC Simple Features (OGC SF) geometry type system. We diverge from their type system in only three ways.


Figure 1: The OGC Simple Features Type Hierarchy

First, we expose a subset of the type system and a subset of the operations. For details, see the sections below.

Second, the OGC type system is defined for only 2-dimensional geospatial data. We extend the definition of a position to be able to handle a larger number of dimensions. In particular, we handle 2d, 3dz, 3dm, and 4d geospatial data. See the section on Coordinate Reference Systems (CRS) for more information.

Third, the OGC type system is designed for flat-earth geospatial data (termed geometrical data hereafter). OGC does not define a system that handles round-earth geospatial data (termed geographical data). Thus, we duplicate the OGC type system to make a parallel set of types for geographic data. We refer to whether a type is geographical or geometrical as its topology.

Some minor changes in representation are necessary because geographic data is in a bounded surface (the ellipsoid), while geometric data is in an infinite surface (the plane). This shows up, for example, in the definition of a Polygon. We make as few changes as possible; see below for details. Even when we make changes, we follow prior art wherever possible.

Coordinate Reference Systems

Although we support many Coordinate Reference Systems (CRS), there are several limitations (as compared to the OGC standard):

  • We only support CRS designated by an SRID. This should be an official SRID as standardized by the EPSG (European Petroleum Survey Group). In particular, we don't support custom CRS defined in the metadata, as does GML.
    • Thus, some data will be inexpressible. For example, there are hydrodynamics readings represented in a coordinate system where each point has coordinates [lat, long, depth, time, pressure, temperature]. This lets them do all sorts of cool analysis (e.g., spatial queries across a surface defined in terms of the temperature and time axes), but is beyond the scope of OData.
    • There are also some standard coordinate systems that don't have codes standardized by EPSG. So we couldn't represent those. Ex: some common systems in New Zealand & northern Europe have standards but no ID code.
  • The CRS is part of the static type of a property. Even if that property is of the base type, that property is always in the same CRS for all instances.
    • The CRS is static under projection. The above holds even between results of a projection.
    • There is a single "varies" SRID value. This allows a service to explicitly state that the CRS varies on a per-instance basis. This still does not allow the coordinate system to vary between sub-regions of a shape (e.g., the various points in a GeographyMultiPoint).
  • Geometric primitives with different CRS are not type-compatible under filter, group, sort, any action, or in any other way. If you want to filter an entity by a point defined in State Plane, you have to send literals in State Plane. OData will not transform UTM zone 10 to Washington State Plane North for you.
    • There are client-side libraries that can do some coordinate transforms for you.
    • Servers could expose coordinate transform functions as non-OGC function extensions. See below for details.

Nominally, the Geometry/Geography type distinction is redundant with the CRS. Each CRS is inherently either round-earth or flat-earth. However, OData does not automatically resolve this. Implementations need not know which CRS match which model type. The modeler will have to specify both the type & the CRS.

There is a useful default CRS for Geography (round earth) data: WGS84 (SRID 4326). This is the coordinate system used for GPS. Implementations will use that default if none is provided.

The default CRS for Geometry (flat earth) data is SRID 0. This represents an arbitrary flat plane with unit-less dimensions.

New Primitive Types
The Point types—Edm.GeographyPoint and Edm.GeometryPoint

"Point" is defined as per the OGC. Roughly, it consists of a single position in the underlying topology and CRS. Edm.GeographyPoint is used for points in the round-earth (geographic) topology. Edm.GeometryPoint is a point in a flat-earth (geometric) topology.

These primitives are used for properties with a static point type. All entities of this type will have a point value for this property.

Example properties that would be of type point include a user's current location or the location of a bus stop.

The LineString types—Edm.GeographyLineString and Edm.GeometryLineString

"LineString" is defined as per the OGC. Roughly, it consists of a set of positions with "linear" interpolation between those positions, all in the same topology and CRS, and represents a path. Edm.GeographyLineString is used for geographic LineStrings; its segments are great elliptical arcs. Edm.GeometryLineString is used for geometric coordinates; it uses ordinary linear interpolation.

These primitives are used for properties with a static path type. Example properties would be the path for a bus route entity, or the path that I followed on my jog this morning (stored in a Run entity).
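
As a sketch of what such a property looks like on the wire, a jogging route held in an Edm.GeographyLineString property might appear in a JSON payload as follows (the coordinates are invented; the GeoJSON-based representation is covered in the "Entities in JSON" section below):

    "Route": { "type": "LineString",
               "coordinates": [ [-122.136, 47.642], [-122.134, 47.645], [-122.130, 47.648] ] }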

The Polygon types—Edm.GeographyPolygon and Edm.GeometryPolygon

"Polygon" is defined as per the OGC. Roughly, it consists of a single bounded area which may contain holes. It is represented using a set of LineStrings that follow specific rules. These rules differ between geometric and geographic topologies (see below).

These primitives are used for properties with a static single-polygon type. Examples include the area enclosed in a single census tract, or the area reachable by driving for a given amount of time from a given initial point.

Some things that people think of as polygons, such as the boundaries of states, are not actually polygons. For example, the state of Hawaii includes several islands, each of which is a full bounded polygon. Thus, the state as a whole cannot be represented as a single polygon. It is a Multipolygon.

Different representation rules come into play with Polygons in geographic coordinate systems. The OGC SF definition of a polygon says that it consists of all the points between a single outer ring and a set of inner rings. However, "outer" is not well-defined on a globe. Thus, we need to slightly alter the definition.

OData uses the same definition for Polygon as SQL Server. A polygon is the set of points "between" a set of rings. More specifically, we use a left-foot winding rule to determine which region is in the polygon. If you traverse each ring in the order of its control points, then all points to the left of the ring are in the polygon. Each polygon is the set of points which are to the left of all rings in its set of boundaries.

Thus the total set of special rules for the boundary LineStrings:

  • In either coordinate system, each LineString must be a ring. In other words, its start and end must be the same point.
  • In Edm.GeometryPolygon, the first ring must be "outer," and all others must be "inner." Inner rings can be in any order with respect to each other, and each ring can be in either direction.
  • In Edm.GeographyPolygon, each ring must follow the left-foot winding rule. The rings may be in any order with respect to each other.
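
To illustrate these rules, a square Edm.GeographyPolygon with a single hole might be written in the GeoJSON-based payload format as follows (the coordinates are invented): the outer ring runs counter-clockwise and the inner ring clockwise, so the area between them lies to the left of each ring.

    "Area": { "type": "Polygon",
              "coordinates": [
                [ [-127.0, 45.0], [-126.0, 45.0], [-126.0, 46.0], [-127.0, 46.0], [-127.0, 45.0] ],
                [ [-126.6, 45.4], [-126.6, 45.6], [-126.4, 45.6], [-126.4, 45.4], [-126.6, 45.4] ] ] }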
The Union types—Edm.GeographyCollection and Edm.GeometryCollection

"GeometryCollection" is defined as per the OGC. Roughly, it consists of a union of all points that are contained in a set of disjoint shapes, each of which has the same CRS. Edm.GeographyCollection is used for points in the round-earth (geographic) topology. Edm.GeometryCollection is in a flat-earth (geometric) topology.

These primitives are used for properties that represent a shape that cannot be defined using any of the other geospatial types. Each value is represented by unioning together sub-shapes until the right set of positions is represented.

Example properties that would be of type geography collection are hard to find, but they do show up in advanced geospatial activities, especially as the result of advanced operations.

The MultiPolygon types—Edm.GeographyMultiPolygon and Edm.GeometryMultiPolygon

"MultiPolygon" is defined as per the OGC. Roughly, it consists of the shape that is a union of all polygons in a set, each of which has the same CRS and is disjoint from all others in the set. Edm.GeographyMultiPolygon is used for points in the round-earth (geographic) topology. Edm.GeometryMultiPolygon is in a flat-earth (geometric) topology.

MultiPolygon values are often used for values that seem like polygons at first glance, but have disconnected regions. Polygon can only represent shapes with one part. For example, when representing countries, it seems at first that Polygon would be the appropriate choice. However, some countries have islands, and those islands are fully disconnected from the other parts of the nation. MultiPolygon can represent this, while Polygon cannot.

MultiPolygon is different from a container of polygons in that it represents a single shape. That shape is described by its sub-regions, but those sub-regions are not actually, themselves, useful values. For example, a set of buildings would be stored as a collection of polygons. Each element in that collection is a building. It has real identity. However, a state is a MultiPolygon. It might contain, for example, a polygon that covers the left two-thirds of the big island in Hawaii. That is just a "bit of a country," which only has meaning when it is unioned with a bunch of other polygons to make Hawaii.

The MultiLineString types—Edm.GeographyMultiLineString and Edm.GeometryMultiLineString

"MultiLineString" is defined as per the OGC. Roughly, it consists of the shape that is a union of all line strings in a set, each of which has the same CRS. Edm.GeographyMultiLineString is used for points in the round-earth (geographic) topology. Edm.GeometryMultiLineString is in a flat-earth (geometric) topology.

MultiLineString values are used for properties that represent the union of a set of simultaneous paths. This is not a sequence of paths—that would be better represented as a collection of line strings. MultiLineString could be used to represent, for example, the positions at which it is unsafe to dig due to gas mains (assuming all the pipes lacked width).

The MultiPoint types—Edm.GeographyMultiPoint and Edm.GeometryMultiPoint

"MultiPoint" is defined as per the OGC. Roughly, it consists of the shape that is a union of all points in a set, each of which has the same CRS. Edm.GeographyMultiPoint is used for points in the round-earth (geographic) topology. Edm.GeometryMultiPoint is in a flat-earth (geometric) topology.

MultiPoint values are used for properties that represent a set of simultaneous positions, without any connected regions. This is not a sequence of positions—that would be better represented as a collection of points. MultiPoint handles the far more rare case when some value can be said to be all of these positions, simultaneously, but cannot be said to be just any one of them. This usually comes up as the result of geospatial operations.

The base types — Edm.Geography and Edm.Geometry

The base type represents geospatial data of an undefined type. It might vary per entity. For example, one entity might hold a point, while another holds a multi-linestring. It can hold any of the types in the OGC hierarchy that have the correct topology and CRS.

Although core OData does not support any functions on the base type, a particular implementation can support operations via extensions (see below). In core OData, you can read & write properties that have the base types, though you cannot usefully filter or order by them.

The base type is also used for dynamic properties on open types. Because these properties lack metadata, the server cannot state a more specific type. The representation for a dynamic property MUST contain the CRS and topology for that instance, so that the client knows how to use it.

Therefore, spatial dynamic properties cannot be used in $filter, $orderby, and the like without extensions. The base type does not expose any canonical functions, and spatial dynamic properties are always the base type.

Edm.Geography represents any value in a geographic topology and given CRS. Edm.Geometry represents any value in a geometric topology and given CRS.

Each instance of the base type has a specific type that matches an instantiable type from the OGC hierarchy. The representation for an instance makes clear the actual type of that instance.

Thus, there are no instances of the base type. It is simply a way for the $metadata to state that the actual data can vary per entity, and the client should look there.

Spatial Properties on Entities

Zero or more properties in an entity can have a spatial type. The spatial types are regular primitives. All the standard rules apply. In particular, they cannot be shredded under projection. This means that you cannot, for example, use $select to try to pull out the first control position of a LineString as a Point.

For open types, the dynamic properties will all effectively be of the base type. You can tell the specific type for any given instance, just as for the base type. However, there is no static type info available. This means that dynamic properties need to include the CRS & topology.

Spatial-Primary Entities (Features)

This is a non-goal. We do not think we need these as an intrinsic. We believe that we can model this with a pass-through service using vocabularies. If you don't know what the GIS community means by Features, don't worry about it. They're basically a special case of OData's entities.

Communicating
Metadata

We define new types: Edm.Geography, Edm.Geometry, Edm.GeographyPoint, Edm.GeometryPoint, Edm.GeographyLineString, Edm.GeometryLineString, Edm.GeographyPolygon, Edm.GeometryPolygon, Edm.GeographyCollection, Edm.GeometryCollection, Edm.GeographyMultiPoint, Edm.GeometryMultiPoint, Edm.GeographyMultiLineString, Edm.GeometryMultiLineString, Edm.GeographyMultiPolygon, and Edm.GeometryMultiPolygon. Each of them has a facet that is the CRS, called "SRID".
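
In $metadata this appears as an ordinary property declaration carrying the SRID facet. A minimal sketch (the entity type and property names are invented) might look like:

    <EntityType Name="Store">
        <Key><PropertyRef Name="ID" /></Key>
        <Property Name="ID" Type="Edm.Int32" Nullable="false" />
        <Property Name="Location" Type="Edm.GeographyPoint" SRID="4326" />
    </EntityType>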

Entities in Atom

Entities are represented in Atom using the GML Simple Features Profile, at compliance level 0 (GML SF0). However, a few changes are necessary. This GML profile is designed to transmit spatial-primary entities (features). Thus, it defines an entire document format which consists of shapes with some associated ancillary data (such as a name for the value represented by that shape).

OData entities are a lot more than just geospatial values. We need to be able to represent a single geospatial value in a larger document. Thus, we use only those parts of GML SF0 that represent the actual geospatial values. This is used within an entity to represent the value of a particular property.

This looks like:

<entity m:type="FoursquareUser">
    <property name="username" m:type="Edm.String">Neco Fogworthy</property>
    <property name="LastKnownPosition" m:type="Edm.GeographyPoint">
        <gml:Point gml:srsName="http://www.opengis.net/def/crs/EPSG/0/4326"
                   xmlns:gml="http://www.opengis.net/gml">
            <gml:pos>45.12 -127.432 NaN 3.1415</gml:pos>
        </gml:Point>
    </property>
</entity>
Entities in JSON

We will use GeoJSON. Technically, GeoJSON is designed to support the same feature-oriented perspective as is GML SF0. So we are using only the same subset of GeoJSON as we do for GML SF0. We do not allow the use of types "Feature" or "FeatureCollection." Use entities to correlate a geospatial type with metadata.

Furthermore, "type" SHOULD be ordered first in the GeoJSON object, followed by coordinates, then the optional properties. This allows recipients to more easily distinguish geospatial values from complex type values when, for example, reading a dynamic property on an open type.

This looks like:

{ "d" : { "results": [ { "__metadata":
        { "uri": "http://services.odata.org/Foursquare.svc/Users('Neco447')",
        "type": "Foursquare.User" }, "ID": "Neco447",
        "Name": "Neco Fogworthy", "FavoriteLocation": { "type": "Point",
            "coordinates": [-127.892345987345, 45.598345897] }, "LastKnownLocation":
            { "type": "Point", "coordinates": [-127.892345987345,
            45.598345897], "crs": { "type": "name", "properties":
            { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "bbox":
            [-180.0, -90.0, 180.0, 90.0] } }, { /* another User Entry */ } ], "__count":
        "2", } } 
Dynamic Properties

Geospatial values in dynamic properties are represented exactly as they would be for static properties, with one exception: the CRS is required. The recipient will not be able to examine metadata to find this value, so the value must specify it.
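
A dynamic property carrying a geospatial value would therefore look something like this in a JSON payload (the property name is invented; note the explicit "crs" member, which a static property could omit):

    "SightingLocation": { "type": "Point",
                          "coordinates": [-127.892345987345, 45.598345897],
                          "crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } } }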

Geospatial Literals in URIs

Geospatial URL literals are represented using WKT, with a common extension. There are at least three common extensions to WKT (PostGIS, ESRI, and SQL Server). They disagree in many places, but those that allow including an SRID all use the same approach. As such, they all use (approximately) the same representation for values with 2d coordinates. Here are some examples:

/Stores?$filter=Category/Name eq "coffee" and geo.distance(Location,
        POINT(-127.89734578345 45.234534534)) lt 900.0
/Stores?$filter=Category/Name eq "coffee" and geo.intersects(Location,
        SRID=12345;POLYGON((-127.89734578345 45.234534534,-127.89734578345 45.234534534,-127.89734578345
        45.234534534,-127.89734578345 45.234534534)))
/Me/Friends?$filter=geo.distance(PlannedLocations, SRID=12345;POINT(-127.89734578345
        45.234534534)) lt 900.0 and PlannedTime eq datetime'2011-12-12T13:36:00'
Not usable everywhere

Geospatial values are neither equality comparable nor partially-ordered. Therefore, the results of these operations would be undefined.

Furthermore, geospatial types have very long literal representations. This would make it difficult to read a simple URL that navigates along a series of entities with geospatial keys.

Geospatial primitives MUST NOT be used with any of the logical or arithmetic operators (lt, eq, not, add, etc).

Geospatial primitives MUST NOT be used as keys.

Geospatial primitives MUST NOT be used as part of an entity's ETag.

Distances

Some queries, such as the coffee shop search above, need to represent a distance.

Distance is represented the same in the two topologies, but interpreted differently. In each case, it is represented as a double scalar value. The units are interpreted by the topology and coordinate system for the property with which it is compared or calculated.

Because a plane is uniform, we can simply define distances in geometric coordinates to be in terms of that coordinate system's units. This works as long as each axis uses the same unit for its coordinates, which is the general case.

Geographic topologies are not typically uniform, because they use angular measures. The distance between longitude -125 and -124 is not the same at all points on the globe. It goes to 0 at the poles. Thus, the underlying coordinate system measures position well, but does not work for describing a distance.

For this reason, each geographic CRS also defines a unit that will be used for distances. For most CRSs, this is meters. However, some use US feet, Indian feet, German meters, or other units. In order to determine the meaning of a distance scalar, the developer must read the reference (http://www.epsg-registry.org/) for the CRS involved.

New Canonical Functions

Each of these canonical functions is defined on certain geospatial types. Thus, each geospatial primitive type has a set of corresponding canonical functions. An OData implementation that supports a given geospatial primitive type SHOULD support using the corresponding canonical functions in $filter. It MAY support using the corresponding canonical functions in $orderby.

The canonical functions are named just like Simple Features-compliant extension methods. This means that individual server extensions for standard OGC functions feel like core OData. This works as long as we explicitly state (or reference) the set of functions allowed in geo.

Currently, these canonical functions are defined in two dimensions, as that is all that is standardized in OGC SF. Each function is calculated by first projecting the points to 2D (dropping the Z & M coordinates).

geo.distance

Geo.distance is a canonical function defined between points. It returns a distance, as defined above. The two arguments must use the same topology & CRS. The distance is measured in that topology. Geo.distance is one of the corresponding functions for points. Geo.distance is defined as equivalent to the OGC SF Distance method for their overlapping domain, with equivalent semantics for geographical points.

geo.intersects

Geo.intersects identifies whether a point is contained within the enclosed space of a polygon. Both arguments must be of the same topology & CRS. It returns a Boolean value. Geo.intersects is a canonical function for any implementation that includes both points and polygons. Geo.intersects is equivalent to OGC SF's Intersects in their area of overlap, extended with the same semantics for geographic data.

geo.length

Geo.length returns the total path length of a linestring. It returns a distance, as defined above. Geo.length is a corresponding function for linestrings. Geo.length is equivalent to the OGC SF Length operation for geometric linestrings, and is extended with equivalent semantics to geographic data.

All other OGC functions

OData does not require these, because we want to make it easier to stand up a server that is not backed by a database. Some are very hard to implement, especially in geographic coordinates.

A provider that is capable of handling OGC SF functions MAY expose those as Functions on the appropriate geospatial primitives (using the new Function support).

We are reserving a namespace, "geo," for these standard functions. If the function matches a function specified in Simple Features, you SHOULD place it in this namespace. If the function does not meet the OGC spec, you MUST NOT place it in this namespace. Future versions of the OData spec may define more canonical functions in this namespace. The namespace is reserved to allow exactly these types of extensions without breaking existing implementations.

In the SQL version of the Simple Features standard, the function names all start with ST_ as a way to provide namespacing. Because OData has real namespaces, it does not need this pseudo-namespace. Thus, the name SHOULD NOT include the ST_ when placed in the geo namespace. Similarly, the name SHOULD be translated to lowercase, to match other canonical functions in OData. For example, OGC SF for SQL's ST_Buffer would be exposed in OData as geo.buffer. This is similar to the Simple Features implementation on CORBA.
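
For example, a server that chooses to expose the OGC SF Buffer operation could accept a query like the following; this is a hypothetical extension, since geo.buffer is not one of the required canonical functions:

    /Stores?$filter=geo.intersects(geo.buffer(Location, 100.0), POINT(-127.89734578345 45.234534534))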

All other geospatial functions

Any other geospatial operations MAY be exposed by using Functions. These functions are not defined in any way by this portion of the spec. See the section on Functions for more information, including namespacing issues. They MUST NOT be exposed in the geo namespace.

Examples
Find coffee shops near me
/Stores?$filter=Category/Name eq "coffee" and geo.distance(Location,
        POINT(-127.89734578345 45.234534534)) lt 0.5&$orderby=geo.distance(Location, POINT(-127.89734578345
        45.234534534))&$top=3
Find the nearest 3 coffee shops, by drive time

This is not directly supported by OData. However, it can be handled by an extension. For example:

/Stores?$filter=Category/Name eq "coffee"&$orderby=MyNamespace.driving_time_to(POINT(-127.89734578345
        45.234534534), Location)&$top=3

Note that while geo.distance is symmetric in its args, MyNamespace.driving_time_to might not be. For example, it might take one-way streets into account. This would be up to the data service that is defining the function.

Compute distance along routes
/Me/Runs?$orderby=geo.length(Route) desc&$top=15
Find all houses close enough to work

For this example, let's assume that there's one OData service that can tell you the drive time polygons around a point (via a service operation). There's another OData service that can search for houses. You want to mash them up to find you houses in your price range from which you can get to work in 15 minutes.

First,

/DriveTime(to=POINT(-127.89734578345 45.234534534), maximum_duration=time'0:15:00')

returns

{ "d" : { "results": [ { "__metadata": { "uri":
        "http://services.odata.org/DriveTime(to=POINT(-127.89734578345 45.234534534),
        maximum_duration=time'0:15:00')", "type": "Edm.Polygon"
        }, "type": "Polygon", "coordinates": [[[-127.0234534534,
        45.089734578345], [-127.0234534534, 45.389734578345], [-127.3234534534, 45.389734578345],
        [-127.3234534534, 45.089734578345], [-127.0234534534, 45.089734578345]], [[-127.1234534534,
        45.189734578345], [-127.1234534534, 45.289734578345], [-127.2234534534, 45.289734578345],
        [-127.2234534534, 45.189734578345], [-127.1234534534, 45.289734578345]]] } ], "__count":
        "1", } }

Then, you'd send the actual search query to the second endpoint:

/Houses?$filter=Price gt 50000 and Price lt 250000 and geo.intersects(Location,
        POLYGON((-127.0234534534 45.089734578345,-127.0234534534 45.389734578345,-127.3234534534
        45.389734578345,-127.3234534534 45.089734578345,-127.0234534534 45.089734578345),(-127.1234534534
        45.189734578345,-127.1234534534 45.289734578345,-127.2234534534 45.289734578345,-127.2234534534
        45.189734578345,-127.1234534534 45.189734578345)))

Is there any way to make that URL shorter? And perhaps do this in one query? Not yet.

This is actually an overly-simple polygon for a case like this. This is just a square with a single hole in it. A real driving polygon would contain multiple holes and a lot more boundary points. So that polygon in the final query would realistically be 3-5 times as long in the URL.

It would be really nice to support reference arguments in URLs (with cross-domain support). Then you could represent the entire example in a single query:

/Houses?$filter=Price gt 50000 and Price lt 250000 and geo.intersects(Location,
        Ref("http://drivetime.services.odata.org/DriveTime(to=POINT(-127.89734578345 45.234534534), maximum_duration=time'0:15:00')"))

However, this is not supported in OData today.

In Closing

This set of new primitive types for OData allows it to represent many types of geospatial data. It does not handle everything. Future versions may increase the set of values that can be represented in OData.

For example, the OGC is working to standardize the set of types used for non-linear interpolation. Similarly, many geospatial implementations are just starting to get into the intricacies of geographic topologies. They are discovering cases which do not work with the current geometry-based standards. As the geospatial community solves these problems and extends the standards, OData will likely incorporate new types.

As ever, please use the mailing list to tell us what you think about this proposal.


Turker Keskinpala (@tkes) reported new Vocabularies in OData in a 10/14/2011 post to the OData.org blog:

We blogged about design thoughts on vocabularies a while back, and while our goals for supporting vocabularies in OData have stayed similar, the vocabulary syntax has evolved from what we initially explored. In this blog post, I’ll explain the current state of vocabulary support in the OData protocol.

What are vocabularies?

A vocabulary is a namespace containing terms. A term is a named metadata extension for a data source such as an OData service. A term may optionally have a single value, or a collection of named properties. The OData metadata document is insufficiently expressive to enable a certain class of experiences. Clients and data providers can cooperate to enable richer experiences by enhancing OData metadata with vocabularies.

Following are a few examples of extensions to OData that may leverage vocabularies:

· Inventing validation metadata so that a service may describe valid ranges, value lists, expressions, etc. for properties of entity types.

· Visualization metadata may be defined to support generic browsing and visualization of data published via OData.

· Adapting microformats or defining RDF vocabularies in terms of vocabularies, to bridge and integrate OData services, other linked data and semantic web technologies.

We are not aiming to specify how to capture the semantics of a vocabulary, or how to enforce it; these are left to the definers and appliers of vocabularies, the producers and consumers who choose to understand the vocabulary. This approach makes it possible to re-use many existing vocabularies, and their shared semantic meaning, in OData.

Design Principles

The following design principles shape the vocabularies support in OData:

  • Use of vocabularies does not disturb use by clients that aren’t concerned with vocabularies.
  • Vocabulary definers, appliers, service owners, and service clients are independent
  • Vocabulary mechanics address common needs without ascribing high-level semantics
  • Reuse of a predecessor vocabulary by successor vocabulary is recognizable by existing clients of the predecessor vocabulary
Vocabulary Annotations

To support vocabularies, we specify a new CSDL syntax used to define and apply terms. This enables the definition and application of vocabularies using familiar EDM constructs and existing reference mechanisms. In the examples below we look at the new syntax in the context of three broad overlapping categories:

  • Simple Metadata Extensions
  • Property Attachment
  • Structural

These categories reflect how we are currently thinking about the problem and we are looking for your feedback on these as well.

Simple Metadata Extensions

One of the use cases for vocabulary annotations is extending service metadata with static information expressed as term annotations. An example of this is extending service metadata using a term which describes a valid range of values for certain properties.

Imagine having the following CSDL:

<Schema Namespace="ExampleModel" Alias="ExampleModel"
xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<EntityType Name="Person">
<Key>
<PropertyRef Name="PersonId" />
</Key>
<Property Name="PersonId" Type="Int32" Nullable="false" />
<Property Name="FirstName" Type="String" Nullable="true" />
<Property Name="LastName" Type="String" Nullable="true" />
<Property Name="Age" Type="Int32" Nullable="true" />
</EntityType>
</Schema>

One may want to annotate this Person entity with a Range term to extend the service metadata to provide additional validation information for the Age property. This can be achieved in two different ways, inline and out of line, respectively:

<Schema Namespace="c" Alias="ExampleModel"
xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<Using NamespaceUri="http://fabrikam.com/vocabularies/Validation"
Alias="Validation" />
<EntityType Name="Person">
<Key>
<PropertyRef Name="PersonId" />
</Key>
<Property Name="PersonId" Type="Int32" Nullable="false" />
<Property Name="FirstName" Type="String" Nullable="true" />
<Property Name="LastName" Type="String" Nullable="true" />
<Property Name="Age" Type="Int32" Nullable="true">
<TypeAnnotation Term="Validation.Range">
<PropertyValue Property="Min" Decimal="16" />
</TypeAnnotation>
</Property>
</EntityType>
</Schema>

or

<Schema Namespace="c" Alias="ExampleModel"
xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<Using NamespaceUri="http://fabrikam.com/vocabularies/Validation" Alias="Validation" />
<Annotations Target="ExampleModel.Person.Age">
<TypeAnnotation Term="Validation.Range">
<PropertyValue Property="Min" Decimal="16" />
</TypeAnnotation> 
</Annotations>
<EntityType Name="Person">
<Key>
<PropertyRef Name="PersonId" />
</Key>
<Property Name="PersonId" Type="Int32" Nullable="false" />
<Property Name="FirstName" Type="String" Nullable="true" />
<Property Name="LastName" Type="String" Nullable="true" />
<Property Name="Age" Type="Int32" Nullable="true" />
</EntityType>
</Schema>

In both forms shown above, the “Validation” identifier used to qualify the term name in the annotations is an alias for the globally unique namespace, specified by NamespaceUri="http://fabrikam.com/vocabularies/Validation".

Extending the metadata with a term as shown in the example may mean that the minimum value of the Age property of a Person entity cannot be less than 16. This enables clients that understand the ‘Range’ term to validate user input and provide useful user feedback without first performing additional round trips to the data service.
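
As a hedged sketch (not part of the original post) of what such a client might do, the following reads the out-of-line Range annotation from the CSDL shown above with LINQ to XML and uses it to validate a value locally; the Validation.Range term and its Min/Max properties are the hypothetical vocabulary from this example:

// Rough sketch only: a client that understands the example Validation.Range term reads the
// out-of-line annotation from the CSDL above and validates a value before calling the service.
using System;
using System.Linq;
using System.Xml.Linq;

static class RangeValidator
{
    static readonly XNamespace Edm = "http://schemas.microsoft.com/ado/2009/11/edm";

    public static bool IsValid(XDocument csdl, string target, decimal value)
    {
        var annotation = csdl.Descendants(Edm + "Annotations")
            .Where(a => (string)a.Attribute("Target") == target)
            .SelectMany(a => a.Elements(Edm + "TypeAnnotation"))
            .FirstOrDefault(t => (string)t.Attribute("Term") == "Validation.Range");
        if (annotation == null) return true; // no Range term applied, nothing to check

        decimal? min = ReadBound(annotation, "Min");
        decimal? max = ReadBound(annotation, "Max");
        return (min == null || value >= min) && (max == null || value <= max);
    }

    static decimal? ReadBound(XElement typeAnnotation, string property)
    {
        var pv = typeAnnotation.Elements(Edm + "PropertyValue")
            .FirstOrDefault(p => (string)p.Attribute("Property") == property);
        return pv == null
            ? (decimal?)null
            : decimal.Parse((string)pv.Attribute("Decimal"),
                  System.Globalization.CultureInfo.InvariantCulture);
    }
}

// Example: RangeValidator.IsValid(XDocument.Load("metadata.csdl"), "ExampleModel.Person.Age", 15) returns false.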

Another good example for this category of annotations is extensions with terms related to how entities should be displayed. For example, a client may not be interested in displaying the PersonId while displaying a list of People. It would be possible to ‘tag’ the PersonId property with a Hide term in addition to specifying the range for the Age property as follows (showing only the out-of-line syntax for brevity):

<Using NamespaceUri="http://fabrikam.com/vocabularies/Validation"
Alias="Validation" />
<Using NamespaceUri="http://fabrikam.com/vocabularies/Display" Alias="Display" />
<Annotations Target="ExampleModel.Person.PersonId">
<TypeAnnotation Term="Display.Hide" />
</Annotations>
<Annotations Target="ExampleModel.Person.Age">
<TypeAnnotation Term="Validation.Range">
<PropertyValue Property="Min" Decimal="16" />
</TypeAnnotation>
</Annotations>

In the example above, there are no properties to set for the term Display.Hide and simply tagging the PersonId property with this term may tell clients to hide the PersonId field while displaying the list of person entities.

Property Attachment

Another broad category of vocabulary annotations that one can think of is property attachments. There may be cases where it’s useful to extend metadata with properties that are not originally in the data model but are available as terms in a vocabulary.

A good example can be a scenario where the data service extends the metadata with a Title term to describe a single summarizing line of text to display while visualizing certain entities. Without a well-known term, a generic client has no choice but to use a heuristic to select content to display per entity. Clients that support the Title term would display the entities with the title intended by the user of the vocabulary.

Suppose the CSDL in the earlier example actually looks like the following:

<Schema Namespace="ExampleModel" Alias="ExampleModel" xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<EntityType Name="Person">
<Key>
<PropertyRef Name="PersonId" />
</Key>
<Property Name="PersonId" Type="Int32" Nullable="false" />
<Property Name="FirstName" Type="String" Nullable="true" />
<Property Name="LastName" Type="String" Nullable="true" />
<Property Name="FriendlyName" Type="String" Nullable="true" />
<Property Name="Age" Type="Int32" Nullable="true" />
</EntityType>
<EntityContainer Name="MyContainer" m:IsDefaultEntityContainer="true">
<EntitySet Name="People" Type="ExampleModel.Person" />
</EntityContainer>
</Schema>

Suppose that a service wants to extend the metadata with a Title term to specify a value for the Title of the Person EntityType. This would enable clients that support the Title term to display a list of person entities with the Title desired by the service author. The EntityType can be annotated with a Title term as follows:

<Annotations Target="ExampleModel.Person">
<ValueAnnotation Term="Display.Title" Path="FriendlyName" />
</Annotations>

The annotation in the example above extends the metadata using the Title term and a string value that is assigned to it. Originally, the Person EntityType had no concept of a Title. Annotating the EntityType with the Title term conceptually attaches a Title property whose value is the same as that of the FriendlyName property of the entity.

We also support expressions as part of a term annotation. In fact, in each of the preceding cases (with the exception of Display.Hide) there was a simple expression in the form of a literal or path. More complex expressions may apply functions to values, to enable richer uses of terms. For example, suppose that the preferred Title of the Person entity is in the form “LastName, FirstName.” This requires a more complex expression. A ValueAnnotation to support this may look like the following:

<Annotations Target="ExampleModel.Person">
   <ValueAnnotation Term="Display.Title" >
      <Apply Function="Display.Concat">
         <Path>LastName</Path>
         <String>, </String>
         <Path>FirstName</Path>
      </Apply>
   </ValueAnnotation>
</Annotations>

In this example, the behavior of Display.Concat is defined as part of the Display vocabulary to mean, “Concatenate my arguments, resulting in a single aggregate string value.”

Structural

The last use case we will discuss today makes use of terms that reference other terms. As an example of this imagine a vocabulary named Market, which describes a term named Product, and a related term named Review. The following CSDL represents a target model, as well as annotations for each term:

<Schema Namespace="c" Alias="ExampleModel"
xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<Using NamespaceUri="http://fabrikam.com/vocabularies/Market" 
Alias="Market" />
<EntityType Name="Sku">
<Key>
<PropertyRef Name="Id" />
</Key>
<Property Name="Id" Type="Int32" Nullable="false" />
<Property Name="Name" Type="String" />
<Property Name="Manufacturer" Type="String" />
<Property Name="Description" Type="Int32" />
<Property Name="PricePerUnit" Type="Decimal" />
<NavigationProperty Name="CustomerFeedback"
ToRole="CustomerFeedback" FromRole="Sku"
Relationship="ExampleModel.SkuFeedback />
<TypeAnnotation Term="Market.Product">
<PropertyValue Property="Title" Path="Name" />
<PropertyValue Property="Description" Path="Description" />
<PropertyValue Property="Reviews" Path="CustomerFeedback" />
</TypeAnnotation>
</EntityType>
<EntityType Name="CustomerFeedback">
<Key>
<PropertyRef Name="Id" />
</Key>
<Property Name="Id" Type="Int32" Nullable="false" />
<Property Name="CustomerName" Type="String" />
<Property Name="Comment" Type="String" />
<Property Name="Rating" Type="Int32" />
<NavigationProperty Name="Sku"
ToRole="Sku" FromRole="PurchaserFeedback"
Relationship="ExampleModel.SkuFeedback />
<TypeAnnotation Term="Market.Review">
<PropertyValue Property="Author" Path="CustomerName" />
<PropertyValue Property="Description" Path="Comment" />
<PropertyValue Property="Rating" Path="Rating" />
</TypeAnnotation>
</EntityType>
</Schema>

In this case the TypeAnnotation for Market.Product references the TypeAnnotation for Market.Review indirectly, by specifying a path to a NavigationProperty (e.g. CustomerFeedback) that relates the two target EntityType definitions.

As exemplified below, clients concerned with products and reviews can use the Market vocabulary to mine and visualize product and review information from an OData service. They won’t need other built-in knowledge of any one specific service.

Vocabulary Definitions

It is possible to annotate a model using terms from a vocabulary which doesn’t have a formal definition. A formal definition is not required, and there are likely to be common cases where no formal definition exists.

However, it is possible to formally define vocabularies in EDM. Suppose we are defining Validation and Display vocabularies. Here is how the different terms that were used in the annotation examples above can be defined in CSDL:

Display Vocabulary

<Schema NamespaceUri="http://vocabularies.foo.com/Display" Alias=”Display” xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<ValueTerm Name="Title" Type="Edm.String" />
<EntityType Name="Hide" BaseType="Edm.TypeTerm" />
</Schema>

Validation Vocabulary

<Schema NamespaceUri="http://vocabularies.foo.com/Validation" Alias="Validation" xmlns="http://schemas.microsoft.com/ado/2009/11/edm">
<EntityType Name="Range" BaseType="Edm.TypeTerm">
<Property Name="Min" Type="Edm.Decimal" Nullable="True" />
<Property Name="Max" Type="Edm.Decimal" Nullable="True" />
</EntityType>
</Schema>

We can use a <ValueTerm> to define a term with a single value of a specified type. A value term is applied to EDM using <ValueAnnotation>. <ValueTerm> must appear directly under the Schema element and must specify the Name and the Type of the term.

On the other hand, an EntityType that derives from Edm.TypeTerm is used to define a type term. A type term is applied to EDM using <TypeAnnotation>.

A vocabulary definition may have a NamespaceUri, which specifies a URI value that uniquely identifies the definition model.

A formal definition provides a common way to communicate the valid forms of term annotations for a given vocabulary. A formal definition also makes it possible to provide better vocabulary definition and validation experiences through tools and frameworks.

This represents our current thinking around supporting vocabularies in OData. We are looking forward to your feedback, so please let us know your thoughts via the OData Mailing List.


Shayne Burgess (@shayneburgess) asserted “This blog talks about a new feature delivered in the WCF Data Services October CTP that can be downloaded here” in an introduction to his Introducing the OData Library post of 10/14/2011 to the Astoria Team blog:

WCF Data Services’ latest CTP includes a new stand-alone library for working directly with OData. The library makes public some underpinnings of WCF Data Services (the server and client library), and we made this library stand-alone to allow its use independently of WCF Data Services. The library provides a low-level implementation of some components needed to build an OData producer/consumer. Specifically, we focused on the core tasks of reading/writing OData from streams in the library’s first version, and in the future we hope to add more fundamental OData functionality (possibly OData URI reading and writing). However, we haven’t made any final plans on what we will add, and we welcome your feedback.

I want to take a minute to explain this library’s relation to the existing WCF Data Services products; this library doesn’t replace WCF Data Services. If you want a great end-to-end solution for creating and exposing your data via an OData endpoint, then the WCF Data Services server library is (and will continue to be) the way to go. If you want a great OData Feed-consuming client with auxiliary support, like code generation and LINQ translation, then WCF Data Services’ client library is still your best bet. However, we also recognize that people are exploring creative possibilities with OData, and to help them build their own solutions from scratch we made the components we use as part of the WCF Data Services stack available as a stand-alone library.

We have published the OData library’s latest source code on CodePlex (http://odata.codeplex.com) as shared source for developers on .NET and other platforms.

The CodePlex source code includes the samples that I have attached to this blog post, and I’ll walk through a couple of those samples to illustrate reading and writing OData.

Writing a Single Entity

To hide the details of stream reading and writing, the OData Library uses an abstraction called a Message, which consists of stream and header interfaces (IODataRequestMessage, IODataResponseMessage). The example below walks through the basics of single-entry writing using an implementation of the messages that works over HTTPClient (this implementation is available in the samples project).

The library uses a class called the ODataMessageWriter to write the actual body of a single ODataMessage (request or response). The ODataMessageWriter has a bunch of methods on it that can be used for writing small non-streaming payloads, such as single properties or individual complex-type values. For larger payloads (such as entities and collections of entities) the ODataMessageWriter has methods that create streaming writers for each payload type. The example shows how to use the ODataMessageWriter methods to create an ODataEntryWriter that can be used to write a single OData entry.

Finally, the sample goes on to use the ODataEntryWriter to write a single Customer entry along with four primitive properties and two deferred links. The samples project includes a few samples that show how to write an expanded navigation link as well.

Writing Full Sample

57 lines of source code elided for brevity
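
The full listing ships with the attached samples. As a rough, hedged sketch of the pattern described above (and not the elided sample itself), writing a single entry looks roughly like the following; type and method names follow the post's description and later ODataLib releases (ODataMessageWriter, CreateODataEntryWriter, ODataEntry, ODataProperty), and the CTP's exact signatures may differ slightly.

// Rough sketch only -- not the elided sample. API names follow the post's description;
// the CTP's exact signatures may differ. The Customer type and property names are hypothetical.
using System;
using Microsoft.Data.OData;

static class WriterSketch
{
    public static void WriteCustomer(IODataResponseMessage message)
    {
        var settings = new ODataMessageWriterSettings();
        using (var messageWriter = new ODataMessageWriter(message, settings))
        {
            // Create the streaming writer for a single entry.
            ODataWriter writer = messageWriter.CreateODataEntryWriter();

            var entry = new ODataEntry
            {
                TypeName = "MyModel.Customer",          // hypothetical type
                Properties = new[]
                {
                    new ODataProperty { Name = "CustomerID", Value = "ALFKI" },
                    new ODataProperty { Name = "CompanyName", Value = "Alfreds Futterkiste" }
                }
            };

            writer.WriteStart(entry);   // start the entry (navigation links would be written here)
            writer.WriteEnd();          // finish the entry
            writer.Flush();
        }
    }
}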

Reading a Single Entity

Let’s look at an example that shows OData deserialization via the library. The example method below demonstrates how to issue a request to the Netflix OData feed for the set of Genres and parse the response.

The example below makes use of the same ODataMessage classes as the previous example (the HTTPClientMessage), but first creates an HTTPClientRequestMessage that targets the Genres URL for the OData Netflix feeds, and then executes the request to get an HTTPClientResponseMessage that represents the response returned by the Netflix services. For readability, the example just outputs the data in the response to a text file afterwards.

The example below uses an IEdmModel, which was not used in the writer example above. When the ODataMessageReader is created, an IEdmModel is passed in as a parameter – the IEdmModel is essentially an in-memory representation of the metadata about the service that is exposed via the $metadata URL. For a client component, the easiest way to create the IEdmModel is to use the ReadMetadata method in the OData Library, which creates an in-memory IEdmModel by parsing a $metadata document from the server. For a server, you would generally use the APIs included in the Edm Library (Microsoft.Edm.dll) to craft a model. Providing a model for OData parsing brings key benefits:

  • The reader will validate that the entities and properties in the documents being parsed conform to the model specified
  • Parsing is done with full type fidelity (i.e., wire types are converted to the model types when parsed); this is especially important when parsing JSON because the JSON format only preserves four types and the OData protocol supports many more. There are configuration options to change how this is done, but I won’t discuss them here for space reasons.
  • If the service defines feed customizations, the model contains their definitions, and the readers (and writers) can only apply them correctly if provided with the model.
  • JSON can only be parsed when a model is provided (this is a limitation of the library and we may add JSON parsing without a model at some point in the future). ATOM parsing without a model is supported.

In the example below an ODataFeedReader is created from the ResponseMessageReader to read the contents of the response stream. The reader works like the XmlReader in the System.Xml library, with which many of you will be familiar. Calling the Read() method moves the reader through the document, and each time Read() is called the reader changes to a specific state that depends on what the reader is currently reading, which is represented by an “Item”. For instance, when the reader reads an entry in the feed, it will go to the StartEntry state, and the Item on the reader will be the ODataEntry being read – there are similar states for Feeds and Links. Importantly, when the reader is in a start state (StartEntry, StartFeed, StartLink, etc.) the reader will have an Item it has created to hold the Entry/Feed/Link that it is reading, but the Item will be mostly empty because the reader has not actually read it yet. It’s only when the reader gets to the end states (EndEntry, EndFeed, EndLink) that the Item will be fully populated with data. …

122 lines of source code elided for brevity
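
Again, the full listing is in the attached samples; the following is only a rough sketch of the read loop described above. The state names here follow later ODataLib releases (FeedStart, EntryEnd, and so on); the CTP described in this post refers to them as StartFeed, EndEntry, etc., and exact signatures may differ.

// Rough sketch only -- not the elided sample. State and namespace names may differ in the CTP.
using System;
using Microsoft.Data.Edm;      // EdmLib (shipped as Microsoft.Edm.dll in the CTP)
using Microsoft.Data.OData;

static class ReaderSketch
{
    public static void DumpEntries(IODataResponseMessage response, IEdmModel model)
    {
        var settings = new ODataMessageReaderSettings();
        using (var messageReader = new ODataMessageReader(response, settings, model))
        {
            ODataReader reader = messageReader.CreateODataFeedReader();
            while (reader.Read())
            {
                // The Item is only fully populated once the reader reaches the end state.
                if (reader.State == ODataReaderState.EntryEnd)
                {
                    var entry = (ODataEntry)reader.Item;
                    foreach (ODataProperty property in entry.Properties)
                    {
                        Console.WriteLine("{0} = {1}", property.Name, property.Value);
                    }
                }
            }
        }
    }
}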

This is a quick introduction to the new OData Library included in this CTP. The post’s attached samples walk through the basics of OData feed creation and consumption via the library. We welcome any feedback you have on the library so don’t hesitate to contact us.


Abhiram Chivukula posted Announcing WCF Data Services Oct 2011 CTP for .NET 4 and Silverlight 4 in a 10/13/2011 post to the Astoria Team blog:

I’m very excited to announce the release of the October 2011 CTP of the next version of the WCF Data Services libraries. This release includes libraries for .NET 4 and Silverlight 4 with new client and server features in addition to those included in our earlier October 2010, March 2011, and June 2011 CTPs.

Below is a brief summary of the features available in this CTP. Subsequent blog posts will discuss each feature in more detail and provide examples of how to use each.

Actions:

The inability to kick off a related (non-CRUD) server process via an OData hypermedia action was an omission somewhat mitigated by low-fidelity workarounds, such as modeling hypermedia actions as entities. Actions will provide an ROA-underpinned means to inject behaviors into an otherwise data-centric model without obfuscating its data aspects, and (like navigation properties) will be advertised in the payload.

This CTP supports invoking ServiceOperations via handcrafted URL parameters, and also enables invoking parameterless actions that can return void, a single object, or a collection of objects in JSON or ATOM format.

Though this release contains the lower layers of Actions support, which enables custom OData providers to use them, it doesn’t yet enable Actions over the EF provider out of the box; a refresh of WCF Data Services following the release of the next Entity Framework/.NET Framework will enable this natively.

Spatial:

The ubiquity of location-aware devices demands a data type suited to communicating geospatial data, so this CTP delivers 16 new spatial OData primitives and some corresponding operations which data consumers can perform on spatial values in filter, select, and orderby clauses.

Spatial primitives follow the OGC’s Simple Features standard, but unlike other primitives, the associated operation set is extensible, which allows some servers to expose deep algorithms for powerful functionality while other servers expose only basic operations. Since the server advertises these advanced capabilities via the Actions feature, they’re discoverable by generic clients.

This CTP allows addition of spatial type properties to models via both Reflection and Custom Service Providers (EF-based services don’t yet support spatial properties), and read/write support (in ATOM or JSON formats) for all spatial types supported by SQL Server 2008 R2. The release also enables querying for all entities ordered/filtered by distance to a location, with all code running server-side; e.g., find all coffee shops near me.

Though this release contains the lower layers of Spatial support, which enables custom OData providers to use them, it doesn’t yet enable Spatial properties over EF-based services out of the box; a refresh of WCF Data Services following the release of the next Entity Framework/.NET Framework will enable this natively.

Vocabularies:

Those from the Linked Data and RDF worlds will feel at home with Vocabularies, but for those unfamiliar with the idea, a Vocabulary is a collection of terms sharing a namespace, and a term is a metadata extension with an optional value expression that’s applicable to arbitrary Entity Data Models (EDMs). Terms allow data producers to specify how data consumers can richly interpret and handle data. A simple vocabulary might indicate a property’s acceptable value range, whereas a complex vocabulary might specify how to convert an OData person entity into a vCard entity.

This CTP allows data service authors to configure the service for annotation through annotation files and serve a $metadata endpoint enriched with terms.

ODataLib:

The ODataLib .NET client and server libraries allow flexible low-level serialization/deserialization according to the OData Protocol Specifications.

With the exception of $batch, ODataLib now supports deserialization of all OData constructs in addition to the last CTP’s serialization support. Furthermore, ODataLib now ships with EdmLib, a new in-memory metadata system that makes it easy to build an EDM Model of a service for OData serialization/deserialization.

Frequently Asked Questions

Q1: What are the prerequisites?

A1: See the download center page for a list of prerequisites, supported operating systems, etc.

Q2: Does this CTP install side-by-side with previously released CTPs (March & June) that are currently on my development machine?

A2: No. Installing this CTP will automatically uninstall any previously installed CTPs on the machine.

Q3: Does this CTP install side-by-side with the .NET 4 and Silverlight 4 versions that are currently on my development machine?

A3: By and large, this install is side by side with existing .NET 4 and SL4 bits; however, that was not possible in all cases, so some VS files will be modified by the CTP installer to enable the Add Service Reference gesture in Visual Studio 2010 to make use of the new features in this CTP. The files should be restored to their original state when this CTP is uninstalled.

Q4: Does this CTP include support for Windows Phone 7?

A4: No. You can download the Windows Phone 7 SDK, which includes the OData client, from here. The Windows Phone 7 client does not yet support the new features (Spatial, Actions, etc.).

Known Issues, Limitations, and Workarounds

Incorrect reference to Data Services assembly in a project after adding WCF Data Services item template in Visual Studio Express:

After adding a WCF Data Services item template to a Data Services server project, the project will have a reference to System.Data.Services.dll from .NET Framework 4. You will need to remove that reference, replace it with a reference to Microsoft.Data.Services.dll from the bin\.NETFramework directory in the Data Services October 2011 CTP installation directory (by default, it is at %programfiles%\Microsoft Data Services June 2011 CTP), and add references to Microsoft.Data.OData.dll and System.Spatial.dll.


Using add service reference in a website project results in .NET Framework 4 client-side code being generated instead of the expected October 2011 CTP code generation:

Add service reference for website projects is not supported for this CTP. This issue should be resolved by next public release.


Custom element annotation support in OData Library:

There is no support for custom element annotations in the OData Library for this CTP. This issue should be resolved by next public release.


A service using the Entity Framework provider, POCO classes with proxy and a model that has decimal keys will result in an InvalidProgramException:

This is a known issue and will be resolved in the next release of Entity Framework.


Spatial and non-standard coordinate systems:

Geospatial values in ATOM only support the default coordinate system; JSON has full coordinate system support.


Support for Windows Phone 7:
The OData Windows Phone 7 client is included in the Windows Phone 7.1 SDK. The Windows Phone 7 client only supports features shipped as part of the .NET Framework 4 and does not support any OData V3 features included in this release.


Support for Datajs client library:
The OData datajs client library only supports features shipped as part of the .NET Framework 4 and does not support any OData V3 features included in this release.

Giving Feedback

The following forum can be used to provide feedback on this CTP:

http://social.msdn.microsoft.com/Forums/en-US/adodotnetdataservices/threads

We look forward to hearing your thoughts on the release!


<Return to section navigation list>

Windows Azure AppFabric: Apps, Access Control, WIF and Service Bus

Neil MacKenzie (@mknz) posted On Not Using owner With the Azure AppFabric Service Bus on 10/14/2011:

The Windows Azure AppFabric Service Bus team has been good about providing sample code for the various features it develops. There are also a fair number of blog posts (and books) showing how to use these features. Pretty much all the sample code authenticates operations on the Service Bus via a service identity/password combination where the service identity takes the special value of owner.

In this post, I am going to show how to create and configure a service identity other than owner, and specifically how to ensure that this service identity is less privileged than owner. The intent is to show that, notwithstanding the lack of examples, it is pretty easy to use an alternate service identity. As of mid-September, the Service Bus supports ACS v2 – and that is what the material in this post was tested with.

Service Bus Namespace

The starting point for all work with the Service Bus is the service namespace, which is created and configured on the Windows Azure Portal. In creating a namespace, a unique name must be provided, as well as the location of the Windows Azure datacenter which will host the service endpoints for the namespace. The namespace name becomes the first part of the URI used for these service endpoints. For example, if the namespace name is gress then the service gateway for the gress namespace becomes:

https://gress.servicebus.windows.net/

The various services supported by the Service Bus – including relayed and brokered messaging – are hosted at service paths under the service gateway:

https://{namespace}.servicebus.windows.net/{path}

For example, the following are valid service paths:

https://gress.servicebus.windows.net/topics/interestingtopic

https://gress.servicebus.windows.net/boringtopic/subscriptions/somesubscription

https://gress.servicebus.windows.net/EchoService/

Note that a service path must be specifically associated with a service for the path to have any meaning with regard to a service. For example, although the first service path above probably specifies a topic named topics/interestingtopic, there is nothing to prevent it being used instead for a relayed service. The second probably specifies a subscription named somesubscription on a topic named boringtopic. The third likely specifies a relayed service named EchoService.

The paths for a namespace nominally form an infinite tree with the / indicating a branch point. For example, the following shows a namespace with two branches, one for services and the other for topics:

  • https://gress.servicebus.windows.net/services/EchoService
  • https://gress.servicebus.windows.net/services/OneWayService
  • https://gress.servicebus.windows.net/topics/interesting
  • https://gress.servicebus.windows.net/topics/boring
Service Bus Claims

The Service Bus uses the Windows Azure Access Control Service (ACS) for authentication and authorization. The ACS is a Security Token Service (STS) that exists as part of a claims-based infrastructure. Although it supports limited identity provider (IdP) capability, the primary function of the ACS is transforming claims in support of federated identity.

This post assumes and requires only minimal understanding of claims-based authentication. The Microsoft Patterns and Practices Team has recently published the second edition of its excellent Guide to Claims-Based Identity and Access Control. This is a good resource for a much deeper understanding of the topic.

When a Service Bus namespace is created, it is associated with a default service identity named owner and a base64-encoded key. This service identity has absolute control of the namespace, including all service paths in it, so it is essential that the associated key be kept secure. This is made even more important by the fact that it is not possible to change the key on the Windows Azure Portal. The default service identity and key can be used to authenticate Service Bus API requests against services hosted by the namespace.

The Service Bus supports three claims which authorize specific operations on a service path. These claims are all of type net.windows.servicebus.action with values:

  • Listen
  • Manage
  • Send

The different values authorize the obvious sets of operations on service paths. For example, the Send claim is required to perform the following operations:

  • Send messages to a listener at a service namespace (relay)
  • Get the queue description
  • Send to the queue
  • Get the topic description
  • Send to the topic

The complete list of operations and associated authorization requirements is provided in the MSDN documentation, which provides an official take on the material in this post.

When the default service identity, owner, authenticates against the ACS, a rule is invoked that assigns all three claims to owner, thereby authorizing it to perform any operation on any service path in the namespace. Note that this discussion elides a lot of claims-based identity plumbing that is, in practice, hidden from the user of the Service Bus API. Although possible, it is strongly recommended that none of the claims be removed from the authorization provided to owner. Doing so can cause significant problems with the namespace.

It is possible to create another service identity with limited privileges, e.g. without the Manage claim. Indeed, it is possible to create a service identity and associate it with only one of the claims. This allows, for example, a sending service identity to be created with only the Send claim and a listening service identity to be created with only the Listen claim. Doing so allows a sending application to be distributed without the possibility of a service identity with a Manage claim being compromised.

By default, owner has Manage, Send and Listen claims over the infinite path tree of the namespace. It is possible to restrict additional service identities to only part of the service path tree. For example, if a service identity is used only to send messages to a topic there is no need for that service identity to have any claims other than Send on the specific service path associated with the topic. The ACS supports this type of restriction, and evaluates whether a service identity has the required permission for an operation by doing a longest string match on the service path. Essentially, it works its way back along the service path seeking a portion of the path for which the service identity has the needed permission. The service identity is authorized to perform a specific operation only if somewhere on the service path it is configured with the required claim and that permission has not been removed later in the path. This permission inheritance is similar to that on a file system. Note that it can cause problems if owner has permissions removed for a service path.

Best Practices

Suggesting that something is a best practice is probably a fair bit above my pay grade. But here goes with some recommended practices:

  • Do not use owner for an operation that does not require the full privileges and claims of owner.
  • Use a service identity with the minimum set of claims needed for a service path.
  • Associate a service identity with the longest service path possible.

Consider the following two full service paths representing a topic and an associated subscription:

  • https://gress.servicebus.windows.net/topics/interestingtopic
  • https://gress.servicebus.windows.net/topics/interestingtopic/subscriptions/mysubscription

If the topic and subscription are created out of band, distinct service identities can be created to authorize sending messages to the topic and receiving messages from the subscription. The sending service identity can be given the Send claim for the topics/interestingtopic service path. The listening service identity can be given the Listen claim for the topics/interestingtopic/subscriptions/mysubscription service path. Separating the privileges like this minimizes the possible damage from a compromised service identity.

SBAzTool

Using an alternative service identity is a little bit more complicated than using owner. However, the Windows Azure Service Bus team has provided a command-line utility, SBAzTool, that makes it very easy to add service identities and associate them with service paths.

The SBAzTool is one of the samples provided with the Windows Azure AppFabric SDK v1.5 (September 2011). It is in the ServiceBus/ExploringFeatures/Authorization/ManagingAccessControl directory of the samples. The source code is also supplied in case you want to roll your own service management utility.

The following commands create two new service identities, topicSender and subscriptionListener, and grant topicSender the Send claim on the topic (from the previous section) and subscriptionListener the Listen claim on the associated subscription.

sbaztool -n gress -k xkx…6o= storeoptions

sbaztool makeid topicSender
sbaztool grant Send /topics/interestingtopic topicSender

sbaztool makeid subscriptionListener
sbaztool grant Listen /topics/interestingtopic/subscriptions/mysubscription subscriptionListener

sbaztool clearoptions

The storeoptions option is used to store the namespace and the owner key in isolated storage for the SBAzTool – and the clearoptions option removes the information from isolated storage. The SBAzTool can also be used to delete service identities and revoke permissions, as well as display information about service identities and service paths.

SBAzTool makes it really easy to configure service identities and service paths so that service identities other than owner can be used. Consequently, there is no real excuse for not doing so.

Having done all this, we can replace the following code:

TokenProvider tokenProvider =
TokenProvider.CreateSharedSecretTokenProvider("owner", ownerKey);

with

TokenProvider tokenProvider =
TokenProvider.CreateSharedSecretTokenProvider("topicSender", topicSenderKey);
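
For completeness, here is a minimal sketch of how the scoped topicSender identity might then be used to send to the topic from the earlier example; the key value is a placeholder for the key issued when the identity was created.

// Minimal sketch: sending to topics/interestingtopic with the least-privileged topicSender identity.
// The key value is a placeholder for the key generated when topicSender was created.
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

class TopicSenderSketch
{
    static void Main()
    {
        const string topicSenderKey = "<base64 key issued for topicSender>";

        TokenProvider tokenProvider =
            TokenProvider.CreateSharedSecretTokenProvider("topicSender", topicSenderKey);

        var serviceUri = ServiceBusEnvironment.CreateServiceUri("sb", "gress", string.Empty);
        MessagingFactory factory = MessagingFactory.Create(serviceUri, tokenProvider);

        // topicSender holds only the Send claim on topics/interestingtopic, which is all it needs here.
        MessageSender sender = factory.CreateMessageSender("topics/interestingtopic");
        sender.Send(new BrokeredMessage("hello from a least-privileged identity"));

        factory.Close();
    }
}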

Windows Azure Portal – Access Control Service

The configuration performed by SBAzTool can also be performed manually on the ACS section of the Windows Azure Portal. The UI for ACS configuration on the portal may appear overkill in terms of what has been discussed so far. However, ACS has a much broader use than authorization on the Service Bus.

Clemens Vasters recently made a video of a presentation in which he steps carefully through the process of configuring an issuer and service path. Consequently, I will not repeat the steps here, and will merely put the configuration performed by SBAzTool into the context of the portal.

The makeid option with SBAzTool creates a Service Identity (visible on the portal) that can be authenticated with either a symmetric key or a password. Clemens suggests that the values of these should be the same, and that is what SBAzTool does.

The grant option with SBAzTool creates a Relying Party Application (visible on the portal) for the full service path, e.g.

http://gress.servicebus.windows.net/topics/interestingtopic

Note that the full service path is normalized to specify http rather than any other protocol. Additionally, the tool specifies a token format of SWT and sets the token lifetime to 1200s.

The tool also creates a Service Group (visible on the portal) and associates it with the relying party application configuration for the service path. It also associates the service group with each intermediate service path en route to the service path. This service group generates the appropriate Manage, Send and Listen claims configured by the grant (and revoke) option of SBAzTool. The tool also adds a default service group which generates the Manage, Send and Listen claims for owner on the service path and each intermediate service path. This is to prevent owner not having full rights on the namespace.

Note that currently there is a bug in the portal which causes a problem when following the instructions in the video. Specifically, when creating a relying party application on the portal it is not possible to save the configuration without generating a token signing key. However, doing so causes an error when accessing the service path from an application. The workaround is to go to the Certificates and keys section of the ACS configuration on the portal and delete the created token-signing certificate. Everything works fine once that is done.

Summary

My intent in this post was not to give a deep explanation of how to use a claims-based identity framework to secure the Service Bus. Rather, my intent was to provide the minimum amount of information allowing the reader to replace the use of owner with that of a less-privileged service identity. Specifically, I wanted to show how easy it is to do so using the SBAzTool.


Avkash Chauhan (@avkashchauhan) described Integrating BizTalk Server application with Windows Azure Service Bus Queues in a 10/13/2011 post:

I was asked to find some resources on BizTalk and Windows Azure integration, so after my research I found the following article by Paolo Salvatori from Microsoft about integrating a BizTalk Server application with Windows Azure Service Bus Queues. The article is very detailed and requires a good long sitting to consume; however, in the end the reading is absolutely worth your time.

Read: How to integrate a BizTalk Server application with Service Bus Queues and Topics

This article shows how to integrate a BizTalk Server 2010 application with Windows Azure Service Bus Queues, Topics, and Subscriptions to exchange messages with external systems in a reliable, flexible, and scalable manner. Queues and topics, introduced in the September 2011 Windows Azure AppFabric SDK, are the foundation of a new cloud-based messaging and integration infrastructure that provides reliable message queuing and durable publish/subscribe messaging capabilities to both cloud and on-premises applications based on Microsoft and non-Microsoft technologies. .NET applications can use the new messaging functionality either from a brand-new managed API (Microsoft.ServiceBus.Messaging) or via WCF thanks to a new binding (NetMessagingBinding), and any Microsoft or non-Microsoft applications can use a REST-style API to access these features.
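
As a small, hedged sketch of the WCF side mentioned above (not taken from Paolo's article), a one-way service can listen on a Service Bus queue via NetMessagingBinding roughly as follows; the namespace, queue name, contract, and key are hypothetical placeholders, and configuration details are simplified.

// Rough sketch: a one-way WCF service listening on an existing Service Bus queue via
// NetMessagingBinding. The queue must already exist; all names and the key are placeholders.
using System;
using System.ServiceModel;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

[ServiceContract]
interface IOrderService
{
    [OperationContract(IsOneWay = true)]   // NetMessagingBinding requires one-way operations
    void SubmitOrder(string orderXml);
}

class OrderService : IOrderService
{
    public void SubmitOrder(string orderXml) { Console.WriteLine("Received: " + orderXml); }
}

class QueueListenerSketch
{
    static void Main()
    {
        var address = ServiceBusEnvironment.CreateServiceUri("sb", "mynamespace", "orderqueue");
        var host = new ServiceHost(typeof(OrderService));
        var endpoint = host.AddServiceEndpoint(typeof(IOrderService), new NetMessagingBinding(), address);
        endpoint.Behaviors.Add(new TransportClientEndpointBehavior
        {
            TokenProvider = TokenProvider.CreateSharedSecretTokenProvider("owner", "<key>")
        });
        host.Open();
        Console.ReadLine();   // keep listening until a key is pressed
        host.Close();
    }
}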



<Return to section navigation list>

Windows Azure VM Role, Virtual Network, Connect, RDP and CDN

No significant articles today.


<Return to section navigation list>

Live Windows Azure Apps, APIs, Tools and Test Harnesses

Avkash Chauhan (@avkashchauhan) reported Windows Azure Autoscaling Application Block (WASABi) BETA is ready for you to try... in a 10/14/2011 post:

The beta release of WASABi (Windows Azure Autoscaling Application Block) is ready for download from NuGet.

A list of the October release's new features and latest changes appears below:

New Features:

  • The Stabilizer increases hysteresis in the block's scaling operations by preventing reactive rules from performing repeated and erratic scaling actions.
  • Application throttling provides a new reactive rule action for changing a Windows Azure configuration setting for a specific role when a rule condition is satisfied. Through this, you can limit or disable certain relatively expensive operations in your application when the load is above certain thresholds.
  • E-mail notifications allow for hypothetical scaling. The block still evaluates the rules and makes a decision (recommendation) on what should be scaled and how, but instead of performing the change it sends a notification to configured recipients (human operators) to scale manually or to perform some other action.
  • Collection of role instance count values for every role defined in the service information store (this is mainly used by the stabilizer feature, but users could also use it in operands for rule conditions).
  • Ability to define custom actions for reactive rules.
  • Ability to specify custom operands for use in reactive rule conditions.
  • New "Max" and "Growth" aggregate functions in operand definition.
  • Enhanced log messages for rules evaluation and scaling to better determine what is happening in the system.
    • Added an Evaluation ID to correlate all log messages for a specific evaluation pass.
    • Added JSON payloads to log messages to assist tools that monitor and interpret the messages.
    • Provided utility classes and constant definitions to read and parse the log messages.
  • Configuration tool enhancements, including better names and descriptions for configuration objects, validation, and a Windows Azure connection string editor.

A developer guide is available at CodePlex, along with a training video and other links.

WASAbi rather than WAzAsAbi as the acronym was a bit of a stretch.


Bruce Kyle reported availability of a new ISV Video: Social Media Goes Mobile with Glassboard on Azure in a 10/14/2011 post to the US ISV Evangelism blog:

You can now share your private data with groups of your friends, coworkers, and those outside your work organization courtesy of a new free product, Glassboard, from social media company Sepia Labs. Users can connect with each other from Windows Phone, Android, or iPhone devices. Soon users will also be able to connect from within a Silverlight part in Office 365.

ISV Video Link: Social Media Goes Mobile with Glassboard on Azure.

Walker Fenton and Brian Reischl of Sepia Labs talk with me about why they chose Windows Azure to connect to the various phones. Brian explains how they used WCF with REST on Windows Azure. They do a demo of each phone using Azure to process messages across phone platforms using notifications. They show how you can share photos, videos, and locations.

For more information about the technologies used by Sepia Labs, see:

About Glassboard

Glassboard is an app for sharing privately with groups (or 'boards' as we call them). With Glassboard you can message a group of people quickly & easily (looks like a text on your phone but it isn't!), share photos and videos, and even show your location when appropriate. Everything within a board is _completely_ private. Only the board chair can invite you to a board, and there is no way for someone not invited to search or discover a board. It's your own private corner of the Internet.

Sepia Labs is a spinoff from NewsGator, the SharePoint partner behind Social Sites.


<Return to section navigation list>

Visual Studio LightSwitch and Entity Framework 4.1+

No significant articles today.


Return to section navigation list>

Windows Azure Infrastructure and DevOps

James Urquhart (@jamesurquhart) posted Cloud, open source, and new network models: Part 1 to C|Net News’ The Wisdom of Clouds blog on 10/14/2011:

What is the network's role in cloud computing? What are the best practices for defining and delivering network architectures to meet the needs of a variety of cloud workloads? Is there a standard model for networking in the cloud?

(Credit: Amazon Web Services)

Last week's OpenStack developer summit in Boston was, by all accounts, a demonstration of the strength and determination of its still young open-source community. Nowhere was this more evident than in the standing-room-only sessions about the Quantum network services project.

I should be clear that though I worked on Quantum planning through Cisco's OpenStack program, I did not personally contribute code to the project. All opinions expressed here are mine, and not necessarily my employer's.

Why is Quantum important in the context of cloud networking? Because, I believe, it represents the model that makes the most sense in cloud infrastructure services today--a model that's increasingly become known as "virtual networking."

In this context, virtual networking refers to a new set of abstractions representing OSI Model Layer 2 concepts, like network segments and interfaces, and Layer 3 concepts, like gateways and subnets, while removing the user from any interaction with actual switches or routers. In fact, there are no direct representations of either switches or routers in most cloud networks.

The diagram below comes from my "Cloud and the Future of Networked Systems" talk, most recently given at Virtual Cloud Connect in late September:

(Credit: James Urquhart)

Here's what's interesting about the way cloud networking is shaking out:

  • From the perspective of application developers, the network is getting "big, flat, and dumb", with less complexity directly exposed to the application. The network provides connectivity, and--as far as the application knows--all services are explicitly delivered by virtual devices or online "cloud services."

    There may be multiple network segments connected together (as in a three-tier Web architecture), but in general the basic network segment abstractions are simply used for connectivity of servers, storage, and supporting services.

  • From the service provider's perspective, that abstraction is delivered on a physical infrastructure with varying degrees of intelligence and automation that greatly expands the deployment and operations options that the application owner has for that abstraction. Want cross-data-center networks? The real infrastructure can make that happen without the application having to "program" the network abstraction to do so.

Using an electric utility analogy (which I normally hate, but it works in this case), the L2 abstraction is like the standard voltage, current, and outlet specifications that all utilities must deliver to the home. It's a commodity mechanism, with no real differentiation within a given utility market.

The underlying physical systems capabilities (at the "real" L2 and L3), however, are much like the power generation and transmission market today. A highly competitive market, electric utility infrastructure differentiates on such traits as efficiency, cost, and "greenness." We all benefit from the rush to innovate in this market, despite the fact that the output is exactly the same from each option.

Is abstraction really becoming a standard model for cloud? Well, I would say there is still a lot of diversity in the specifics of implementation--both with respect to the abstraction and the underlying physical networking--but there is plenty of evidence that most clouds have embraced the overall concept. It's just hard to see most of the time, as network provisioning is usually implicit within some other service, like a compute service such as Amazon Web Services's EC2, or a platform service, such as Google App Engine.

Remember, public cloud services are multitenant, so they must find a way to share physical networking resources among many independent service consumers. The best way to address this is with some level of virtualization of network concepts--more than just VLANs (though VLANs are often used as a way to map abstractions to physical networks).

Most services give you a sense of controlling network ingress and egress to individual VMs or (in the case of platform services) applications. Amazon's Security Groups are an example of that. A few, such as GoGrid and Amazon's Virtual Private Cloud (pictured at the top of this post), give you a subnet level abstraction.

In part 2 of this series, I'll explain how Quantum explicitly addresses this model, and the next steps that Quantum faces in expanding the applicability of its abstractions to real world scenarios. In the meantime, if you use cloud services, look closely at how networking is handled as a part of those services.

You'll see evidence of a new network model in the making.


<Return to section navigation list>

Windows Azure Platform Appliance (WAPA), Hyper-V and Private/Hybrid Clouds

Matthew Weinberger (@M_Wein) reported Microsoft Highlights Cloud Features of SQL Server 2012 in a 10/14/2011 post to the TalkinCloud blog:

At the PASS Summit 2011 keynote session, Quentin Clark, corporate vice president of Microsoft SQL Server Database Systems Group, highlighted the cloud readiness of Microsoft SQL Server 2012, including interoperability with SQL Azure and appliances such as HP’s.

For starters, Clark highlighted just how deeply the SQL Azure and SQL Server 2012 solutions are integrated:

“It’s our goal to help customers achieve ultimate scalability, performance and deployment flexibility for mission-critical workloads running in the public and private cloud. Built from the same code base, SQL Server 2012 and SQL Azure provide a complete database solution for the enterprise. The latest releases offer huge advancements that address the evolving needs of IT.”

Now, The VAR Guy has some in-depth analysis of the HP Enterprise Database Consolidation Appliance optimized for SQL Server, which at the high level places your database in a private cloud — a solution that Microsoft claims is the first of its kind, reducing database deployment times from weeks to minutes.

In addition, Microsoft is launching two of its famous community technology previews (CTPs) for solutions designed to connect on-premises and cloud-based SQL databases and boost BI, as per the press release:

  • SQL Azure Reporting enables developers to use familiar tools to create and deploy operational reports to the cloud, which can easily be embedded in an application or browser.
  • SQL Azure Data Sync enables easier sharing and synchronization between multiple SQL Server and SQL Azure databases, helping customers to extend their enterprise data to the cloud and synchronize data between SQL Azure databases for geo-availability.

And down the line, Microsoft is promising new features such as SQL Azure Federation, which would let developers build and scale out applications across multiple databases; an increase in SQL Azure size to 150GB; and SQL Server 2012 backups directly to SQL Azure.

In other words, Microsoft is serious about tightening the bonds between its server and cloud businesses. TalkinCloud will be watching very closely for more updates.


<Return to section navigation list>

Cloud Security and Governance

David Navetta (@DavidNavetta) described SEC Issues Guidance Concerning Cyber Security Incident Disclosure in a 10/14/2011 post to the InformationLawGroup blog:

(Co-authored by Nicole Friess) Publicly traded businesses now have yet another set of guidelines to follow regarding security risks and incidents. On October 13, 2011 the Securities and Exchange Commission (SEC) Division of Corporation Finance released a guidance document that assists registrants in assessing what disclosures should be made in the face of cyber security risks and incidents. The guidance provides an overview of disclosure obligations under current securities laws – some of which, according to the guidance, may require a disclosure of cyber security risks and incidents in financial statements.

Drawing from certain SEC forms and regulations, the guidance emphasizes that registrants should disclose the risk of cyber incidents “if these issues are among the most significant factors that make an investment in the company speculative or risky.” Registrants are expected to evaluate security risks, and if a registrant determines that disclosure is required, the registrant is expected to “describe the nature of the material risks and specify how each risk affects the registrant,” avoiding generic disclosures.

The SEC indicated that, in analyzing cyber security risks and whether those risks should be reported, registrants should take the following into account:

  • prior cyber incidents and the severity and frequency of those incidents;
  • the probability of cyber incidents occurring and the quantitative and qualitative magnitude of those risks, including the potential costs and other consequences resulting from misappropriation of assets or sensitive information, corruption of data or operational disruption; and
  • the adequacy of preventative actions taken to reduce cyber security risks in the context of the industry in which they operate and risks to that security, including threatened attacks of which they are aware.

Additionally, the guidance advises registrants to address risks and incidents in their MD&A “if the costs or other consequences associated with one or more known incidents or the risk of potential incidents represent a material event, trend, or uncertainty that is reasonably likely to have a material effect on the registrant’s results of operations, liquidity, or financial condition or would cause reported financial information not to be necessarily indicative of future operating results or financial condition.” Other situations requiring disclosure include if one or more incidents has materially affected a registrant’s “products, services, relationships with customers or suppliers, or competitive conditions” and if an incident is involved in a material pending legal proceeding to which a registrant or any of its subsidiaries is a party. Registrants are also expected to disclose certain security incidents on financial statements, as well as the effectiveness of disclosure controls and procedures on filings with the SEC.

While cyber security risk has always been a potential financial disclosure issue, and something that directors and officers need to take into account, the SEC guidance really highlights the issue and brings it to the fore. Even so, materiality is still going to be a big issue, and not every breach will need to be reported, as many (if not most) will likely not involve the potential for a material impact on the company.

What the guidance document does stress, however, is process and risk assessment. One read of this guidance is that companies are going to have to forecast and estimate more carefully, internally, the impact of cyber incidents and the consequences of failing to implement adequate security. This analysis will go well beyond the privacy-related security issues on which most companies have focused (due to various privacy laws and regulator activity) and will implicate key operational issues affected by security breaches. It will be interesting to see how this affects the internal corporate dynamics between CIOs and their business counterparts. This guidance may provide additional leverage for security risk managers to obtain bigger budgets, new technology and more personnel.

This guidance may impact the traditional breach notification process as well. Companies may now need to analyze not only whether notice to affected individuals is necessary, but also whether shareholders should receive a disclosure in financial statements. The new guidance also raises the specter of directors and officers (D&O) lawsuits. We saw a D&O suit in the Heartland data breach that went nowhere; does this guidance give plaintiffs more legs? Only time will tell.


<Return to section navigation list>

Cloud Computing Events

Steve Plank (@plankytronixx) reported on 10/14/2011 The New Series of the Windows Azure Bootcamp UK Continues in London & Edinburgh!:

Registration is now open for the November edition of the Windows Azure BootCamp (powered by Tech.Days UK).

The Windows Azure Camps are events that will show you how to take advantage of cloud computing. This is a free, full-day training session to get you up to speed with Microsoft’s Windows Azure. We’ll take you on a journey from learning about the cloud, through writing some code and deploying it to the cloud, to making a simple application available on the internet.

There will be experienced people available to guide you through each exercise. Once you have the basics in place, you’ll be up and running. To get your applications running, you’ll need an Azure subscription, so we can provide you with a special free pass that entitles you to four compute instances, 3 GB of storage and two* 1 GB Web Edition databases. You can carry on using this subscription until it expires, even after you’ve left the training course. You do not need a credit card to activate this free pass.

When you walk away from this bootcamp, you can either deactivate the Azure application before you leave or leave it running so that, when you get home, you can continue with your Windows Azure coding adventures. In any case, you will walk away with the code you’ve written on your laptop and the ability to modify it, test it locally and deploy it to your free Windows Azure subscription any time you choose!

Pre-requisites:

  • A laptop with either Visual Studio 2008 or Visual Studio 2010 installed, along with the Windows Azure SDK and Tools version 1.5. Bring the power supply; you will be using the laptop all day.
  • A basic knowledge of programming concepts and familiarity with Visual Studio
  • A basic knowledge of web-programming and how Internet applications work
  • An understanding of the Microsoft web-stack (Windows Server, IIS, basic security etc.)
  • A request for a Windows Azure Pass (which can take up to 5 days to process, so please allow time for this)

Spaces are limited, so register your place in the Windows Azure Bootcamp today!

Click here to register for the Windows Azure Bootcamp in Edinburgh on Friday 11 November from 9am onwards.

Click here to register for the Windows Azure Bootcamp in London on Friday 25 November from 9am onwards.

We look forward to seeing you at the Windows Azure Bootcamp.


<Return to section navigation list>

Other Cloud Computing Platforms and Services

No significant articles today.


<Return to section navigation list>

Technorati Tags: Windows Azure, Windows Azure Platform, Azure Services Platform, Azure Storage Services, Azure Table Services, Azure Blob Services, Azure Drive Services, Azure Queue Services, SQL Azure Database, SADB, Open Data Protocol, OData, Windows Azure AppFabric, Azure AppFabric, Windows Server AppFabric, Server AppFabric, Cloud Computing, Visual Studio LightSwitch, LightSwitch, Amazon Web Services, AWS
