How management failings led to RBS IT catastrophe

The £56 million fine received by the Royal Bank of Scotland (RBS) today brought an end to the lengthy investigation into the bank’s high-profile IT failure during the summer of 2012.


However, the FCA also said that the wider business failed to monitor risks around IT - a function that is central to the overall running of the bank. This was partly due to a lack of IT knowledge among group management, as well as other factors, such as incomplete audits of IT, including mainframe systems, in the preceding 12 months. Responsibility for managing these risks also fell to the board, which did not properly review group-wide governance policy measures.

Such policies were "limited in scope" because they "addressed recovering from a single low probability but high impact event" such as the total loss of a data centre, rather than smaller but more probable disruptions like software failure.

The aftermath

Following the outage, regulators in the UK and Ireland began investigations into the issue, with the Central Bank of Ireland fining Ulster Bank last week for failing to ensure stability of infrastructure, which had been outsourced to RBS in 2005. The FCA also subsequently launched a wider investigation into robustness of IT systems used by all UK banks.

Meanwhile, RBS CEO Ross McEwan pledged to invest £750 million over three years to improve the resilience of the bank's systems. This included remedial action to simplify its legacy estate and attempt to prevent further occurrences. For example, it completed the separation of its batch processing systems for individual banks within the group in May, meaning that outages will not have a ripple effect on other arms of its business in future.

RBS is also attempting to halve the number of core banking platforms over the next two years.

“In this year alone we have gone from a single overnight batch to now running four overnight separated batches on different parts of our business,” said McEwan in a recent earnings call.

“That is one of the heaviest lifting I think any financial services organisation could do in their lives. And we have done that in the last 12 months and nobody noticed it. And at the same time we have double batched, we have doubled security, we've got a little bit of work to do on the ATM and the point-of-sale technology, fronts that we connect into, to do by the end of this year.”

So is the FCA right?

So is the FCA correct to say that the outage was due to a lack of safeguards rather than a lack of investment in IT? It is likely that both aspects played a major role, said TechMarketView financial services analyst Peter Roe, with the hugely complex and intertwined legacy systems used by the bank as the real underlying cause.

While huge amounts were clearly spent on IT each year, the money was not necessarily targeted in the right place, he said.

“A huge amount of resource has been devoted to keeping the lights on, and also making systems compliant with all the new regulation, so there was little in the past that was able to be spent on real change,” he said.

“[The batch processing software upgrade] is an example of how legacy systems increase the vulnerability of the banks, particularly when systems have to change, as they do.

“It was not a lack of investment – because of the amount that they were spending – but it shows the complexity of changing IT within a bank, [and the danger of] not having the correct process and quality control, because of the fiendishly complicated structures and systems.”

He added: “It is a bit like a game of ‘pick-up-sticks’ – you try and pick one up without moving the other ones – but with software. If you move one system you are going to move others, and if you are not careful you can bring the whole lot of them down.”

According to Lev Lesokhin, executive vice president at software quality analysis firm CAST, legacy infrastructure was ultimately to blame.

“The underlying issue is the creaking infrastructure which the largest (and oldest) UK banks use. This is under increasing pressure to deliver ‘Google-like’ customer services demanded by customers today,” he said.

“Western banking systems are particularly exposed because they were the first to install computer systems, and investment in those systems has since been neglected as tightening budgets have meant less is spent on modernisation and quality assurance.

“Until these underlying issues are addressed and industry standards put in place, we will continue to see glitches like this.”

