Leading businesses like Netflix, Flickr, Etsy and Google love to fail -- but not in the ways you might think. Their engineers routinely perform internal failure tests to find defects in their systems, log bugs, and then test against the repairs. These companies understand that failure is a normal operating condition that must be anticipated, accommodated and designed into IT systems.
This continuous improvement strategy involves more than just ensuring that systems have high availability. It’s about supporting the nonstop business, 24 hours a day, 365 days a year, even when the inevitable failures happen. Systems must be designed to fail intelligently, allowing the business to continue to function under failure and under attack.
According to the Accenture Technology Vision 2014, true resiliency—the ability of IT systems to maintain wholly acceptable levels of operational performance during planned and unplanned disturbances—is what will help organizations mitigate risks to revenue and brand reputation caused by service disruptions. It’s time to architect resiliency into all dimensions of the nonstop enterprise, including applications, business processes, infrastructure and security.
The road to resiliency
As businesses go digital, they become more susceptible to disruption. Arguably, the vulnerability that CIOs feel most acutely is to cyber-attacks. The more professional and prolific attacks become, the greater the role that cyber-security plays in business continuity.
IT leaders should take a business-driven approach to managing risk across the enterprise. Initial investments in resiliency and active defense should be prioritized around the business processes that carry the most downside revenue risk if they fail. This prioritization will provide the CIO with the data needed to move from a compliance-focused stance to one that is more threat-centric and tied to strategic risk.
Cyber-attacks aside, businesses that are striving to become digital have to keep up with always-on expectations. Simply posting notices about planned downtime is no longer acceptable. Business partners and consumers alike have less and less tolerance for service interruptions in any form. The repercussions of these always-on expectations ripple throughout the software lifecycle.
Businesses in every industry are beginning to embrace agile practices across development, deployment, and operations. The challenge of transitioning to agile at scale is being met by a suite of operational tactics and technologies, including DevOps (business-driven integration of software development and IT operations), performance monitoring and failure tracing, workload management, and software-defined networking (SDN). These practices and technologies pave the road to resiliency by making it possible to build always-on software and hardware systems.
The right mindset
Resiliency is not achieved by putting cybersecurity structures in place and deploying best-of-breed highly available systems. It’s not about compliance. It’s a shift in mindset to the idea of 100 percent uptime. It’s about having a deep understanding of the constant threats of business disruptions—from hurricanes, hackers or internal upgrades—and the risks that those threats pose to maintaining operational continuity and brand value.
The time to start architecting for resiliency is right now—not when customers expect it or when losses in trade secrets, revenue, or brand value have reached painful levels. Leading CIOs who truly understand the concept of resiliency have begun transitioning their organizations to an always-on state. Knowing that it is neither simple nor cheap, they are taking a pragmatic approach, phasing in resiliency over time in anticipation of a time when their entire business is digital and always-on.
Posted by Michael Biltz, director of the Accenture Technology Vision. Follow him on Twitter at @mjbiltz.