Canonical information modelling - a best practice for SOA?

I recently attended the second annual “Canonical Model Management Forum” at the Washington Plaza Hotel in Washington, DC (see here for my post about last year’s first meeting, including Forrester’s definition of canonical modelling). Enterprise and information architects from a number of government agencies, as well as several major banks, insurance companies, retailers, credit-card operators, and other private-sector firms, attended the meeting.

There was one vendor sponsor (DigitalML, the vendor of IgniteXML). Attendees gave a number of presentations about their environments, what had motivated them to establish a canonical model, how that work had turned out, and the important lessons learned.

Last year I also had some recent Forrester survey results to share. We have not yet rerun that survey, but we are about to, so I’ll post some key results once the data is available.

Last year’s post is still the place to go to get the general overview about why to do canonical modelling, the main use cases, some areas of controversy (still raging), and a list of best practices I heard attendees agree upon.

What’s new in 2011?

Based both on what I heard at this meeting, and on other recent interviews:

  • Canonical modelling is becoming more common, and delivering more value. I see this not only in the growth of DigitalML’s customer base, but also in the increasing number of organisations I’ve interviewed who are implementing canonical information models as part of their data integration, application integration, B2B integration, or data services layer implementation, using a wide range of tools. So I predict our survey data will show this too - I wish I had it now!

The main motivation for canonical models is still to increase reuse of shared services, and to make those services easier and faster to consume. One large oil company I interviewed recently has been measuring this since going live in 2009, and found that 40% of new requests for access to data can now be satisfied by an existing data service. At this year’s CMM Forum, Novartis presented results of its canonical modelling efforts since early 2009 showing reuse ranging from 20% on early projects up to 100% on later projects, in domains for study management, drug delivery, and customer master.

  • Tool support is even more important to make the effort scalable and cost-effective. Creating the initial model is not that hard compared to the effort to spawn and manage the many physical XSDs or other artifacts derived from it. Without tool support, managing change across a population of schemas linked to a canonical model, which is in turn linked to multiple industry standards in a constant state of evolution, requires significant and error-prone manual effort. The tool features customers cite most often as giving them significant help in sustaining their information architecture and canonical modelling programs are: mapping the dependencies between these various models while linking to a common vocabulary, taking subsets to limit change impact, dividing models into multiple federated domains, and interfacing with other modelling tools and metadata / services repositories.
  • Knowledgeable architects view canonical modelling as a best practice for SOA. While Doug Stacey of Allstate was presenting their approach to implementing a federated canonical model, he said that they were able to obtain support for their information architecture program and canonical model from their Chief Architect by getting agreement that "If you're going to do SOA, you've got to do something to control this" [the data model]. The person sitting next to me, an architect from a large financial services company in the mortgage business, remarked under her breath “who doesn’t?” Many heads around the room were nodding. The only area where any apparent differences emerged was how far down into the domains one can reasonably push governance and control, vs. how light a hand one may apply in order to be more pragmatic and focus more on connections between domains.
  • Federated models are becoming increasingly common. I have seen this pattern emerge in several recent examples, both at the forum (such as Allstate) and elsewhere. Federation is still largely a practice of large companies, but it seems to be the only way to make SOA or a canonical model scale across a large and diverse enterprise (including in the federal space). The last time I did research on federated SOA, the federation capabilities of SOA registry/repositories were relatively immature, such that most of those doing federation had to stick with using the same repository everywhere to make it possible (federating across multiple instances). Now it appears to be possible to federate across a limited number of connection types among the leading registry/repositories (HP, IBM, and Software AG CentraSite), making a more heterogeneous approach feasible.
  • A sizable proportion of canonical models are also supporting data access layers. When I gave my presentation at the CMM Forum, an update on the state of canonical modelling, I asked for a show of hands of how many of those present (perhaps 35 architects) were using their canonical model in conjunction with a service-oriented data access layer, and about 40% held up their hands. The Novartis presentation showed a specific example — a “virtual data layer offering data in Common Information Model.” At Forrester we’re currently updating our research on the Data Services market, and a number of the companies we’re interviewing about their implementations have made canonical models a key part of their strategy for data services.
  • “Big Data” lies on the horizon, offering great promise for more insights. Combining “Big Data” (e.g. Hadoop, or large warehouses - such as 150 billion rows - on Netezza appliances) with a data services layer that has a canonical model is still a rare and emergent practice, but I expect it to start growing slowly as a design pattern that will appeal to organisations using SOA that also have “Big Data” resources available. Most analytics use cases for “Big Data” will still access those resources directly, through APIs or frameworks, but by also “publishing” key analytical insights up through the data services layer, a much broader population can then take advantage of these assets than would otherwise be able to do so. Given that Customer is one of the most common entities found in data access layers (striving for the elusive “single view of the customer”), “Big Data” containing information about customer behaviour via web, mobile, location-based, or smart-grid applications appears the most attractive for early exploitation.

But one might ask - what’s the model of the “Big Data”? The underlying data may be structured, semi-structured, or unstructured (think Twitter streams), but once analytics extract insights from the data, a structure emerges that complements the canonical model. However, whereas the canonical model expresses a need to govern and standardise, the model of “Big Data” is often dynamic by nature, so don’t try to standardise it, except where particular insights such as phone calling behaviour are reasonably stable over the life of a system. Insights from the Twitter stream will never have that kind of stability - think “trending topics.”

  • MDM golden masters are increasingly becoming sources for data services. MDM initiatives are often painful and long, but when they finally begin to deliver results, these make excellent sources for data services, and therefore play a key role in your canonical model. I’ve seen multiple examples of this recently. A few architects referred to this reuse of MDM data (golden masters that had been staged from original sources) in an almost offhand manner, but it was only one of several sources they were aggregating. There’s much more to MDM than just standardising the model, but if your organisation has gone to the trouble of doing the kind of quality scrubbing, de-duping, and information merging that MDM requires, it makes another great place to start in looking for sources for your canonical model, along with whatever industry standards for data exist in your industry context.
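
To make the change-impact point from the tool-support item above a little more concrete, here is a minimal Python sketch. It assumes a hypothetical dependency graph in which an industry standard feeds the canonical model, which in turn feeds derived XSDs and a subset. Every model and schema name is invented for illustration, and real modelling tools track far richer metadata than this.

```python
from collections import deque

# Hypothetical dependency edges: each key is a model, each value lists
# the artifacts derived directly from it. All names are illustrative.
DERIVED_FROM = {
    "industry-standard": ["canonical-model"],
    "canonical-model": ["customer.xsd", "policy.xsd", "claims-subset"],
    "claims-subset": ["claims.xsd"],
}


def impacted_artifacts(changed, edges):
    """Return every artifact downstream of `changed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # A change to the industry standard ripples through everything downstream.
    print(impacted_artifacts("industry-standard", DERIVED_FROM))
    # A change confined to a subset touches far fewer schemas.
    print(impacted_artifacts("claims-subset", DERIVED_FROM))
```

The second call returns a much shorter list than the first, which is the practical argument for taking subsets: a change inside the hypothetical `claims-subset` node only reaches the schemas derived from that subset.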

There’s much more I could say, but in the interest of time, I’ll stop there. The combination of canonical modelling, data services and “Big Data” is generating a lot of activity and opportunities for innovation, so watch this space for more titbits as they emerge.

And if you’re a Forrester client, and have a canonical modelling initiative you’re considering kicking off, please submit an inquiry to [email protected] on this topic, and I’d be happy to discuss the details of your program. Be sure to send some basic info about what you’re doing as part of the inquiry, so I can prepare to give you maximum value.

Posted by Mike Gilpin
