When Hurricane Sandy struck New York, images of the storm-battered city soon eclipsed other pictures from its destructive course across the Caribbean and Atlantic. The storm also made clear nature’s effect on business. The New York Stock Exchange was closed on October 29-30 while, with eight days to go until the US Presidential Election, the Huffington Post found its New York data centre flooded.
Damage to other sites in New York and Newark, disrupted communications infrastructure, and data replication and re-synching that took far longer than originally anticipated meant that the IT team had to race against time to have the site up and running again for the election. While they succeeded, they also learned valuable lessons about disaster recovery.
The Modern Business:
As the example of the Huffington Post shows, modern, 24/7 businesses need their IT services to be constantly available. This is as true of banks as it is of online news sites - for instance, Amazon Web Services and EC2 outages have affected businesses in a wide variety of industries, cutting them off from their websites and other cloud-based services for hours or even days.
Disaster recovery is no longer just about recreating a physical working environment but also ensuring there is no, or at worst minimal, disruption to IT services.
One of the key issues with disaster recovery in this modern IT environment is that verifying the entire process will work correctly is costly and time-consuming. Disaster recovery teams must inspect disaster recovery locations to ensure that they can be brought online quickly and effectively, and that all IT services can be recovered.
Ideally, organisations will also perform disaster recovery drills to ensure that, if and when the worst happens, each part of the business knows its role. The expense and time involved, not to mention the effects of removing key personnel from their duties for drills, mean that such tests are rare.
Testing - The Elephant in the Room:
This approach simply won’t work for modern IT services. Even if an organisation had the resources to test disaster recovery every month, that is still far too long for data and applications that change on a 24/7 basis.
Traditionally, testing whether backed-up data can be recovered in the event of a disaster has been sporadic at best. Research conducted by Vanson Bourne in 2013 found that enterprises only test their backups every three months, and only 7.4% of all backups each time, due to the time and effort required.
We have the technology to ensure that this isn’t the case and that, at the very least, testing happens regularly. The increased automation of IT functions can be extended to testing as well, whether on a daily basis or even every time a backup is made. While automation removes the man-hours needed for testing, the growth of virtualisation, with its corresponding scalability and flexibility, makes it simpler to create separate infrastructure so that testing won’t impact the production environment.
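As a rough illustration of what an automated recoverability check might look like, the Python sketch below restores a backup archive into a scratch directory, well away from production, and compares every restored file against checksums recorded when the backup was taken. The function names (`build_manifest`, `verify_backup`) and the archive-plus-manifest layout are assumptions made for this example only; they do not describe any particular backup product.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(archive: Path, expected: dict[str, str]) -> list[str]:
    """Restore the archive into a temporary directory and return the
    relative paths that are missing or whose contents differ from the
    manifest. An empty list means every expected file was recovered."""
    with tempfile.TemporaryDirectory() as scratch:
        shutil.unpack_archive(str(archive), scratch)
        restored = build_manifest(Path(scratch))
    return sorted(
        name for name, digest in expected.items()
        if restored.get(name) != digest
    )
```

A scheduler could call `verify_backup` after each backup job and raise an alert whenever the returned list is not empty. Checking that an application actually starts from the restored data would need more than this, but the same restore-then-compare pattern applies.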
Automated backup testing won’t address every single issue with disaster recovery: for example, it wouldn’t have protected the Huffington Post’s New York and Newark sites from additional damage, or solved all the issues with communications infrastructure. However, it removes one part of the regular disaster recovery testing process, easing the cost and time taken at least a little. It also means organisations can always be confident that at least one part of the process will operate exactly as planned.
Disaster Recovery in Practice:
An example of disaster recovery working as planned can be seen at Catalent, a worldwide drug development, delivery and supply company. When one of its Japanese facilities was directly threatened by power outages in the wake of an earthquake and ensuing tsunami, Catalent had full confidence in the reliability of its backed-up servers.
Catalent replicated these to its second, unaffected site in Japan and continued operating with no disruption to IT services. Less than two weeks later, Catalent’s site in Corby, Northamptonshire was destroyed in a fire, including vital servers running inventory, production and shipping applications. Again, Catalent had confidence that those applications and their data were fully protected and that it could quickly resume operations.
There is no way of knowing when disaster will strike an organisation, and sometimes only pure luck will separate a Catalent from a Huffington Post. However, businesses should ensure that the odds are as far in their favour as possible.
Regular testing of physical disaster recovery procedures, together with automated data protection testing, will both reduce costs and ensure that, regardless of what might happen, the always-on, IT-centric modern business isn’t crippled by disaster.
Posted by Ian Wells, VP North-West Europe, Veeam