It is by now a truism that most IT organisations are planning an IT infrastructure strategy that includes cloud computing and that an internal cloud (aka private cloud) is a fundamental part of that strategy.
While trying to avoid getting sucked into the vortex that is defining cloud computing, I think it's safe to say that a cloud computing environment includes two things. The first is the ability for IT resource consumers, such as application developers, to fulfil their own resource requests through self service. The second is automated provisioning (aka orchestration) of computing resources like virtual servers, network connectivity and storage. Merely deploying virtualisation so that a single physical server can host multiple virtual servers, while admirable and useful in itself, is not cloud computing.
In talking to a number of informed people, it's clear that private cloud implementations are moving forward in many organisations, with RFPs already out to vendors, aiming for contract award in 2011 and implementation in 2012.
The question is, how will that private cloud be used, and what are the downstream effects of moving to a private cloud? In our work, we run across a number of scenarios: some make sense, some don't, and some are incomprehensible. I thought it might be useful to share what we see, and how we predict the organisations implementing private clouds will use them.
A planning assumption that we bring to the table is that no organisation is going to insert a full-blown cloud infrastructure into its existing production application environment, for the following reasons:
- A key principle of CIOs everywhere is: don't mess with something that's in and working. Change introduces the potential for disruption and failure. So why insert an entire new infrastructure into one in which applications are quite happily humming along? Note: this does not imply that virtualisation will stay out of existing environments. One of the most felicitous aspects of virtualisation is that it provides a great benefit, cost reduction via server consolidation, without introducing much change at the application level.
- Most existing applications won't benefit from being placed into a cloud computing environment. Most production applications are written assuming a static topology and manual administration, so they can't take advantage of self service and automated elasticity. Therefore, inserting a cloud computing infrastructure into the production environment will provide little improvement for these applications. Moreover, the leisurely march of virtualisation into production environments should call into question the belief that IT organisations are going to disperse cloud computing capabilities throughout their production infrastructure overnight.
- Cloud computing is expensive and disruptive to IT organisations. We constantly see organisations underestimate the cost and change involved in moving to cloud computing. The very fact that a new term, devops, had to be coined to describe how IT must operate in a cloud environment should provide a clue about this.
So, to summarise: putting cloud computing into an existing production environment is disruptive and expensive, and provides few benefits. This should explain our assumption that most IT organisations will not retrofit cloud computing into their production computing environments.
Given this, many IT organisations are directing their initial private cloud initiatives at serving developers, which makes a lot of sense. Developers are typically underserved by existing processes, and offering them a self-service option improves productivity. Crucially, it also avoids many issues associated with production private clouds, such as how to reconcile heavyweight existing processes like ITIL with agile, self-service resource assignment. Moreover, developers are expensive employees, so eliminating long waits for resources reduces costs.
The question is, if an organisation's initial foray into a private cloud is aimed at developers, what are the subsequent use scenarios? In other words, once developers begin using the private cloud for development (and, of course, testing) purposes, what happens? Here are common use scenarios and their implications:
Scenario one: Agile development, static operations
In this scenario, software and QA engineers are provided a private cloud for development purposes, but when it comes time for production deployment, the application is operated according to the existing processes (which, remember, were created to manage inelastic applications with static topologies, often in a process-heavy, ITIL-like environment).
We believe satisfaction with this strategy depends on what proportion of newly developed applications assume and use the elastic automation associated with cloud computing. Whether to select this approach might therefore depend on organisation-specific projections of future application elasticity requirements. If the proportion of applications requiring elasticity is low, this scenario might be perfectly acceptable: static operations techniques would suit the majority of newly developed applications, and for the minority that require elasticity, an exception could be made and a more agile operations environment provided.
The challenge with this scenario is that it conflicts with what we see as the increasingly common nature of future applications. That is, applications are changing, with more highly variable workloads, much larger scale, and more complex deployment topologies that are difficult to manage manually. In a phrase, there is an impedance mismatch between the future of applications and the operational assumptions of this scenario.
Scenario two: Agile development, semi-agile operations
In this scenario, new applications are placed into production in an operations infrastructure that can support elasticity, complex topologies and automated administration, while existing applications continue to operate in the older, static operations environment. One might think of this as building an add-on to the existing data centre environment, one that operates by new rules.
In a way, this scenario is consistent with the history of computing. New computing platforms don't displace what already exists; they accrete to what's in place. What commonly happens is that most new applications are deployed on the new platform, while work on existing platforms is limited to minor upgrades of existing applications. And, of course, over time the new platform comes to host the great majority of applications.
This is an attractive scenario, in that it reduces overall disruption and provides a good deployment option for applications developed in and designed for the cloud. It avoids the challenges associated with the impedance mismatch of the previous scenario.
Two things to watch out for in this scenario:
First, the disconcerting way in which applications edge from "development" to "production" without any official recognition or acknowledgement. IT operations may find itself responsible for applications that it had no idea were going to move into production, and that require agile, elastic infrastructure. That is to say, IT operations may be challenged to provide a production cloud environment well before it planned to do so. This "premature" productisation will inevitably cause problems and force a hurried catch-up.
Second, it's easy to underestimate the change necessary to operate an agile infrastructure. End-to-end automation carries implications well beyond installing a cloud software stack and declaring "open for cloud business." Just as it's traditional for new platforms to accrete around old ones, it's also traditional for IT organisations to overemphasise technology and underrate people and process. The likely outcome is that cloud applications will suffer many problems when put into production, as the operations group learns on the fly how to manage automated, elastic applications.
Scenario three: Agile development, bypassed operations
This scenario presents an existential challenge to the mainstream infrastructure operations organisation and, indirectly, a threat to the financial underpinnings of the entire IT organisation. In this scenario, developers attempt to use the private cloud but, for various reasons, find some element of the environment unsatisfactory and choose to develop or deploy in a public cloud environment.
An encounter we had recently illustrates how this might come to pass. In discussing cloud computing with an infrastructure manager, we described the need for resource user self service. He allowed that he was fine with greater agility, but said that resource requests had to be forwarded to an operations administrator, who would evaluate each request and, if it were appropriate, provision the resources himself and then send the developer enough information to begin using them. The manager really didn't understand the difference between true self service and email-enabled resource requests. I wouldn't care to hear his response to the need for elastic applications that provision resources directly in response to system load.
This response is typical of organisations confronting innovative developments (I wrote about sustaining versus disruptive innovation last week and concluded that cloud computing is a disruptive innovation). When confronting a disruptive innovation, organisations commonly attempt to force-fit it into existing processes and assumptions, usually unsuccessfully.
In this scenario, developers quite happily begin to use the private cloud but, when confronted with operations' unwillingness to support self service, application elasticity and the like, become dissatisfied with the offering. They then choose either (1) to deploy the application outside the internal data centre or, more worryingly, (2) to turn their backs on the private cloud altogether and develop and deploy in a public cloud environment.
This resistance from operations can be blunt or subtle, but in the end it falls short of what developers want. One of the main points we emphasise with our clients is that cloud computing reduces the friction in obtaining and using computing resources. It eliminates the endless requests, meetings, telephone calls, emails and escalations, not to mention the often heavy-handed rationing of resources that expects developers to justify why they want a server (or storage, or whatever).
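To make the difference between true self service and email-enabled resource requests more concrete, here is a minimal Python sketch of the kind of decision an elastic application makes for itself when it provisions resources in response to system load. The target load per server, the minimum and maximum server counts, and the function names are hypothetical, chosen only for illustration; real cloud platforms expose their own scaling policies and provisioning APIs.

```python
import math

# Hypothetical policy values, for illustration only.
TARGET_REQUESTS_PER_SERVER = 100  # load one virtual server is assumed to handle
MIN_SERVERS = 2                   # floor kept running for availability
MAX_SERVERS = 20                  # ceiling, e.g. a quota set by the cloud team


def desired_server_count(current_requests_per_sec: float) -> int:
    """Return how many virtual servers the application should run for the observed load."""
    needed = math.ceil(current_requests_per_sec / TARGET_REQUESTS_PER_SERVER)
    # Clamp the result to the allowed range.
    return max(MIN_SERVERS, min(MAX_SERVERS, needed))


def scaling_action(running: int, current_requests_per_sec: float) -> int:
    """Return servers to add (positive), release (negative), or zero for no change.

    In a self-service cloud the application would act on this number at once;
    in an email-mediated process each nonzero result becomes a ticket.
    """
    return desired_server_count(current_requests_per_sec) - running


if __name__ == "__main__":
    print(scaling_action(3, 850))   # load calls for 9 servers, so add 6
    print(scaling_action(10, 150))  # load calls for 2 servers, so release 8
```

Under these assumptions, an application running three servers that sees 850 requests per second would ask for six more. In a self-service cloud that request goes straight to the provisioning layer and is satisfied in minutes; in the email-mediated model it waits in an administrator's queue while the load spike comes and goes.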
Placing an unresponsive production infrastructure behind an agile development environment risks investing in a development cloud that ends up unused. Even worse, this scenario holds the potential for stranded investment, as expensive production environments lie fallow while applications are deployed into public clouds that support low-friction interaction.
Overall, organisations looking to deploy private clouds should thoroughly understand what they're signing up for. A development cloud is an appropriate start, but it is not sufficient as a long-term plan. Inevitably, a development cloud will be the first step toward a larger production environment capable of supporting self service, elastic provisioning, and agile operations fully committed to cloud computing characteristics. Anything less will, in the end, fall short.