Once you move your core IT systems into private or public cloud networks, your work isn't over. Now you have a different set of technology issues to deal with: managing the cloud to ensure that your investments pay off for your enterprise and deliver the efficiencies and ROI that you're expecting.
Cloud management and monitoring have become even more important in the wake of a massive outage of Amazon's Elastic Compute Cloud (EC2) service. The IT world got to see just what happens when a cloud environment runs into problems, taking the operations of many companies down with it. There have been several other serious cloud outages recently as well.
Getting the performance that your enterprise is paying for is "one of the big 'gotchas' for public clouds," says Mary Johnston Turner, an analyst at IDC. In a recent study of 250 user companies, service-level agreement (SLA) performance guarantees ranked second in importance after the specific needs of the applications themselves, she says.
"Enterprises are very concerned about performance," she says. "One of the reasons you're seeing so much interest in private clouds is because IT leaders are responsible for getting good performance to their users" and they aren't always ready to hand those huge responsibilities over to third-party cloud vendors.
And that, she adds, is not just a cloud problem; it stems from the complexity of the composite applications that are being moved into cloud environments.
"It's a huge challenge," Turner says. "Users need to be investing in application performance management [products] that are built for composite applications and virtualised environments. There's a whole category now."
The idea, Turner says, is to monitor application performance independently as traffic crosses the network or the cloud, and then to measure that performance at the point where it reaches the end user, whether that is inside or outside the firewall.
For David Ting, vice president of engineering at IGN.com, one of the largest video game review websites in the world, monitoring his company's cloud performance is critical because the business lives or dies based on the ability of its 25.4 million users to connect with the site's ad-supported online properties.
"For us, performance is money because page views are key," he says. "We're ad-supported, so every page view counts" and helps the company bring in revenue. "These are things that we watch very carefully."
To make it all work, IGN Entertainment, a division of media giant News Corp, uses performance monitoring tools from New Relic that allow its staff to continuously watch the performance of its sites in the cloud. "We depend very heavily on that tool," Ting says. "For us it's about response time and transactions per second for our IGN websites."
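As a rough illustration of the two metrics Ting names, the sketch below derives average response time and transactions per second from a batch of request timings. The function name and input layout are hypothetical, chosen for illustration only; this is not New Relic's API or IGN's actual tooling.

```python
def summarize(durations, window_seconds):
    """Return (mean response time in seconds, transactions per second).

    durations      -- per-request response times in seconds (assumed input)
    window_seconds -- length of the observation window in seconds (assumed input)
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    count = len(durations)
    # Mean response time: total time spent serving divided by request count.
    mean_response = sum(durations) / count if count else 0.0
    # Throughput: requests completed per second of the observation window.
    throughput = count / window_seconds
    return mean_response, throughput


if __name__ == "__main__":
    # Four requests observed over a two-second window.
    mean_rt, tps = summarize([0.1, 0.3, 0.2, 0.2], window_seconds=2.0)
    print(f"mean response time: {mean_rt:.3f}s, throughput: {tps:.1f} tps")
```

Watching both numbers together matters for an ad-supported site: throughput can hold steady while response time creeps up, and the reverse is also possible.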
Tracking performance as cloud deployment expands
IGN.com has been using the New Relic tools for about 18 months. It started out by moving non-production development and other applications to the cloud to see how things worked. Now IGN.com is putting some new projects onto cloud servers, including a social media stack, so the company can ramp up applications and scale them as needed, Ting says. The network's disaster recovery infrastructure is also slated to move to the cloud.
"It could eventually all go to the cloud," Ting says of the company's IT systems. "Performance stability would have to be more certain in the future for us to do that, but we're watching that."
The monitoring from New Relic provides performance metrics IGN couldn't get when it was using other tools, he says. The old tools "did OK for physical machine monitoring, but didn't do application stack monitoring at all without a lot of work from the engineering team."
By watching the New Relic management tools, IT workers can spin up more cloud-based servers, bring down poorly performing instances of applications, then add new instances as needed to keep response times low for users, he says. With the previous tools, Ting's team gained insight only into uptime, not response time.
"New Relic gave us tremendous visibility into the response time," which allows IT staffers to take actions on servers even when the servers are running, Ting explains. For example, "we have found instances where one memcached server performed much worse than others in the pool. Upon further investigation, we found one of the memory modules to be defective. In the Nagios world, that server would be running in the pool until it dies."
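The memcached example shows why an uptime check alone can miss a degraded server: the machine is still "up," only slower than its peers. A minimal sketch of that idea follows, assuming latency samples per server are already being collected. The server names, the `find_slow_servers` helper, and the 2x threshold are all hypothetical and do not describe how New Relic or Nagios work internally.

```python
from statistics import median


def find_slow_servers(latencies_ms, factor=2.0):
    """Return sorted names of servers whose median latency exceeds
    `factor` times the median across the pool.

    latencies_ms -- dict mapping server name to a list of latency samples (ms)
    factor       -- how far above the pool median counts as "slow" (assumed)
    """
    # Summarize each server by its own median so one stray sample is ignored.
    per_server = {
        name: median(samples)
        for name, samples in latencies_ms.items()
        if samples
    }
    if not per_server:
        return []
    # Compare every server against the typical server in the pool.
    pool_median = median(per_server.values())
    return sorted(
        name for name, value in per_server.items()
        if value > factor * pool_median
    )


if __name__ == "__main__":
    pool = {
        "cache-1": [1.1, 1.3, 1.2],
        "cache-2": [1.0, 1.2, 1.1],
        "cache-3": [6.5, 7.2, 6.9],  # responds, but far slower than its peers
        "cache-4": [1.2, 1.4, 1.3],
    }
    print(find_slow_servers(pool))
```

All four servers in the sample pool would pass a simple up/down check; only a comparison of response times singles out the slow one for investigation.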
IGN.com is using Amazon's EC2 for its forays into the cloud today, Ting says. With New Relic, IGN.com can watch over all the parts of its three-tiered architecture, from its front end to its databases to its API tier. The management tools help ensure that user response times stay optimal and don't spike.
"We can look at what's running on the cloud" using plugins that collect data and send all the analytics back to the New Relic tools, Ting says. "They give you very detailed reporting on how the server group is performing," he adds.
"The amount of data and the precision of the data is tremendous," Ting says. "This is where we can start looking at the metrics and be able to make intelligent business decisions with it."
In addition to moving its IT infrastructure, IGN.com has been exploring the cloud to host many of its more than 100 websites for increased performance and uptime, Ting says. The main sites include IGN.com, Askmen.com, Gamespy.com, Fileplanet.com, Teamxbox.com and Gamestats.com.
So far, the trials have been looking positive, Ting says. "We've got some infrastructure pieces moving out into the cloud," he notes. "It's in the experimental stage right now, and we're checking performance."